FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
-
Main Idea
-
Focus on category-level 6D pose and size estimation from a monocular RGB-D image.
-
Design a box-cage based online 3D deformation mechanism for training-set augmentation, which increases the generalization ability of FS-Net and saves hardware resources.
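The box-cage idea can be illustrated with a minimal sketch. It assumes that the cage is the axis-aligned bounding box of the object points and that each point follows the cage by trilinear interpolation of the eight corner displacements. This free-form-deformation style interpolation is an assumption: these notes do not specify FS-Net's exact scheme. The helper names `bounding_box`, `deform_with_box_cage`, and `stretch_offsets` are hypothetical.

```python
# Sketch of a box-cage deformation for point-cloud augmentation.
# Assumption: the cage is the axis-aligned bounding box and every point
# follows it by trilinear interpolation of the 8 corner displacements.


def bounding_box(points):
    """Return (min_corner, max_corner) of the axis-aligned box around points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def deform_with_box_cage(points, corner_offsets):
    """Move every point by the trilinear blend of the cage-corner offsets.

    corner_offsets[i][j][k] is the (dx, dy, dz) of the corner on the min (0)
    or max (1) side along x (i), y (j) and z (k).
    """
    lo, hi = bounding_box(points)
    # Guard against flat boxes to avoid division by zero.
    extent = [max(hi_a - lo_a, 1e-9) for lo_a, hi_a in zip(lo, hi)]
    deformed = []
    for p in points:
        # Normalized position of the point inside the cage, each in [0, 1].
        u, v, w = ((p[a] - lo[a]) / extent[a] for a in range(3))
        offset = [0.0, 0.0, 0.0]
        for i in (0, 1):
            for j in (0, 1):
                for k in (0, 1):
                    weight = ((u if i else 1 - u)
                              * (v if j else 1 - v)
                              * (w if k else 1 - w))
                    for a in range(3):
                        offset[a] += weight * corner_offsets[i][j][k][a]
        deformed.append(tuple(p[a] + offset[a] for a in range(3)))
    return deformed


def stretch_offsets(points, scales):
    """Corner offsets that stretch the cage about its center by a per-axis factor."""
    lo, hi = bounding_box(points)
    center = [(lo_a + hi_a) / 2 for lo_a, hi_a in zip(lo, hi)]
    offsets = [[[None, None] for _ in range(2)] for _ in range(2)]
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                corner = (hi[0] if i else lo[0],
                          hi[1] if j else lo[1],
                          hi[2] if k else lo[2])
                offsets[i][j][k] = tuple(
                    scales[a] * (corner[a] - center[a]) for a in range(3))
    return offsets


if __name__ == "__main__":
    import random
    pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (0.5, 1.0, 1.5)]
    scales = [random.uniform(-0.2, 0.2) for _ in range(3)]  # arbitrary range
    print(deform_with_box_cage(pts, stretch_offsets(pts, scales)))
```

A per-axis stretch is the simplest case. Moving corners independently gives non-uniform shapes such as tapering. Because new offsets can be sampled on every training iteration, no deformed copies need to be stored, which is the sense in which such an augmentation is "online".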
-
The method is based on shape features.
-
Run at 20 FPS (close to real time).
-
FS-Net can efficiently extract category-level pose features from less training data.
-
Challenge of category-level 6D pose estimation:
- Variation in object shape and color within the same category.
-
-
Contribution
-
Propose a fast shape-based network [FS-Net] to estimate category-level 6D object pose and size. Thanks to its efficient category-level pose feature extraction, the framework runs at 20 FPS on a GTX 1080 Ti GPU.
-
Propose a 3DGC autoencoder that reconstructs the observed points, so that its latent feature learns orientation information. Then design a decoupled rotation mechanism to fully decode this orientation information. The decoupled mechanism naturally handles circularly symmetric objects.
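One way to picture a decoupled rotation output is the following sketch. It assumes the network predicts two object axes as independent, not necessarily orthogonal 3D vectors, and that the rotation is assembled from them afterwards. For a circularly symmetric object only the symmetry axis is constrained, so the second axis can be chosen freely. The function names are hypothetical and this is not FS-Net's code.

```python
import math

# Sketch: assemble a rotation matrix from two separately predicted axes.
# Assumption: a primary axis (e.g. the symmetry axis) and a secondary axis
# are predicted independently by two decoders.


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    if n < 1e-12:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / n for c in v)


def rotation_from_two_axes(primary, secondary=None):
    """Return a 3x3 rotation matrix whose columns are the object axes.

    If secondary is None (circularly symmetric object), rotation about the
    primary axis is unobservable, so any perpendicular axis is acceptable.
    """
    a = _normalize(primary)
    if secondary is None:
        # Pick the world axis least aligned with a, so the projection is stable.
        secondary = min(((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
                        key=lambda e: abs(_dot(e, a)))
    # Gram-Schmidt: remove the component of the secondary axis along a.
    proj = _dot(secondary, a)
    b = _normalize(tuple(s - proj * ai for s, ai in zip(secondary, a)))
    c = _cross(a, b)  # third axis; makes the frame right-handed
    return [[a[r], b[r], c[r]] for r in range(3)]
```

Decoupling the two axes means an error in the secondary axis cannot corrupt the primary one, and symmetric objects simply skip the secondary prediction.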
-
Propose a novel box-cage based 3D deformation mechanism to augment the training data. With this mechanism, the pose accuracy of FS-Net is improved by 7.7%.
-
-
Model
-
Use YOLOv3 to detect the object location with RGB input.
-
Use a 3DGC autoencoder to perform 3D segmentation and observed-point reconstruction. Through this process, the latent feature learns orientation information. A novel decoupled rotation mechanism then decodes the orientation information from this feature.
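Point-cloud reconstruction is commonly trained and evaluated with a Chamfer distance. The sketch below assumes that kind of loss for the observed-point reconstruction. These notes do not name the loss FS-Net uses, and Chamfer variants differ (squared or unsquared distances, sum or mean).

```python
# Sketch of a symmetric Chamfer distance between two point sets, a common
# reconstruction loss for point-cloud autoencoders (an assumption here).


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(pred, target):
    """Mean squared nearest-neighbor distance, pred->target plus target->pred."""
    if not pred or not target:
        raise ValueError("point sets must be non-empty")
    forward = sum(min(_sq_dist(p, q) for q in target) for p in pred) / len(pred)
    backward = sum(min(_sq_dist(q, p) for p in pred) for q in target) / len(target)
    return forward + backward
```

This brute-force version is O(N·M) and only meant for illustration. Real training code would use a batched GPU implementation.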
-
Use PointNet to estimate the translation and object size.
Note: use the box-cage based 3D deformation mechanism during training to increase the generalization ability of FS-Net.
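A point-based network such as PointNet often regresses translation and size as residuals rather than absolute values. Translation is predicted relative to the centroid of the segmented points, and size relative to a per-category mean size. The sketch below assumes that parameterization. These notes do not say whether FS-Net uses it, and `recover_translation_and_size` and `center_points` are hypothetical names.

```python
# Sketch: decode translation and size from predicted residuals.
# Assumption: offsets are relative to the centroid of the segmented points
# and to a per-category mean size.


def centroid(points):
    n = len(points)
    return tuple(sum(p[a] for p in points) / n for a in range(3))


def center_points(points):
    """Shift points so their centroid is at the origin (network input)."""
    c = centroid(points)
    return [tuple(p[a] - c[a] for a in range(3)) for p in points]


def recover_translation_and_size(segmented_points, t_residual,
                                 mean_size, size_residual):
    """Hypothetical decoding: absolute value = reference + predicted residual."""
    c = centroid(segmented_points)
    translation = tuple(ci + ri for ci, ri in zip(c, t_residual))
    size = tuple(m + r for m, r in zip(mean_size, size_residual))
    return translation, size
```

Predicting small residuals around a sensible reference keeps the regression targets in a narrow range, which is the usual motivation for this design.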
-
-
Data and Metrics
-
Dataset
- NOCS
- LINEMOD
-
Evaluation Metrics
-
Category-Level Pose Estimation
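- 3D IoU at thresholds of 25%, 50%, and 75% (IoU25, IoU50, IoU75)
- n° m cm (rotation error below n degrees and translation error below m cm)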
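The two category-level metrics listed above can be sketched as follows. This assumes rotations are 3x3 nested lists and translations are in meters. The 3D IoU is deliberately simplified to axis-aligned boxes in a shared frame. Benchmark implementations compute it for boxes placed by the estimated pose and size, with special handling for symmetric objects, so this is not the official evaluation code.

```python
import math

# Sketch of the category-level metrics (simplified; not benchmark code).


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two rotations, in degrees."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    tr = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_theta))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance in cm, assuming inputs in meters."""
    return 100.0 * math.dist(t_pred, t_gt)


def within_n_deg_m_cm(R_pred, t_pred, R_gt, t_gt, n_deg, m_cm):
    """True if the pose passes the n degree, m cm criterion."""
    return (rotation_error_deg(R_pred, R_gt) < n_deg
            and translation_error_cm(t_pred, t_gt) < m_cm)


def _volume(box):
    return math.prod(box[1][a] - box[0][a] for a in range(3))


def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned boxes given as (min_corner, max_corner)."""
    inter = 1.0
    for a in range(3):
        overlap = min(box_a[1][a], box_b[1][a]) - max(box_a[0][a], box_b[0][a])
        if overlap <= 0:
            return 0.0
        inter *= overlap
    return inter / (_volume(box_a) + _volume(box_b) - inter)
```

A prediction counts toward IoU50, for example, when `iou_3d_axis_aligned(...) > 0.5`. The reported score is the fraction (or average precision) of predictions passing each threshold.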
-
Instance-Level Pose Estimation
- ADD(-S) (ADD-S for symmetric objects)
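ADD averages the distance between corresponding model points transformed by the predicted and ground-truth poses. ADD-S replaces the correspondence with the closest transformed point, so that symmetric objects are not penalized for an equivalent pose. The sketch below follows that standard definition. The 10%-of-diameter acceptance threshold is the common convention and an assumption here, and the function names are hypothetical.

```python
import math

# Sketch of the ADD and ADD-S instance-level pose metrics.


def transform(R, t, p):
    """Apply rotation R (3x3 nested list) and translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """Mean distance between corresponding points under the two poses."""
    return sum(math.dist(transform(R_pred, t_pred, p), transform(R_gt, t_gt, p))
               for p in model_points) / len(model_points)


def add_s_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """Mean closest-point distance, used for symmetric objects."""
    pred = [transform(R_pred, t_pred, p) for p in model_points]
    gt = [transform(R_gt, t_gt, p) for p in model_points]
    return sum(min(math.dist(g, q) for q in pred) for g in gt) / len(gt)


def pose_is_correct(error, diameter, ratio=0.1):
    """Common convention: accept if the error is below 10% of the diameter."""
    return error < ratio * diameter
```

The 180-degree case shows why ADD-S exists. A symmetric object flipped about its axis has a large ADD but an ADD-S near zero.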
-
-
-
Result
-
FS-Net is robust to the size of the training set and extracts category-level features well. Even with only 20% of the training data, FS-Net still achieves state-of-the-art performance.
-
The 3D deformation mechanism significantly improves the robustness and performance of FS-Net.
-
The average reconstruction error of FS-Net is 0.86, which is 72.9% and 18.9% lower than that of [Shape-Prior] and [CASS], respectively.
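The "X% lower" figures follow the usual relative-reduction formula r = (baseline − ours) / baseline, so the baseline implied by a stated reduction is ours / (1 − r). The sketch below only illustrates that arithmetic. The implied baselines it prints (about 3.17 for Shape-Prior and 1.06 for CASS) are back-computed from rounded percentages, not quoted from the paper, and may differ slightly from the reported values.

```python
# Relative reduction: r = (baseline - ours) / baseline  =>  baseline = ours / (1 - r)


def relative_reduction(baseline, ours):
    return (baseline - ours) / baseline


def implied_baseline(ours, reduction):
    """Baseline error implied by 'ours is `reduction` lower than the baseline'."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return ours / (1.0 - reduction)


if __name__ == "__main__":
    for name, r in (("Shape-Prior", 0.729), ("CASS", 0.189)):
        print(name, round(implied_baseline(0.86, r), 2))
```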
-
FS-Net achieves better pose estimation results with a simpler reconstruction task (reconstructing only the observed points).
-
For category-level pose estimation, FS-Net outperforms the other state-of-the-art methods in both accuracy and speed.
-
Specifically, on the 3D detection metric IoU50, FS-Net outperforms the previous best method, NOCS, by 11.7% and runs 4 times faster.
-
Note: for instance-level pose estimation, FS-Net achieves comparable results in both accuracy [97.6%] and speed [20 FPS].
-
Limitations and Future Work
- Future work: adopt 3D object detection techniques to detect objects directly from point clouds.