FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Architecture

  • Main Idea

    • Focus on category-level 6D pose and size estimation from a monocular RGB-D image.

    • Design a box-cage based online 3D deformation mechanism for training-set augmentation, which increases the generalization ability of FS-Net and saves hardware resources.

    • The method is shape-feature-based.

    • Runs at 20 FPS (close to real time).

    • FS-Net can efficiently extract category-level pose features with less training data.

    • Challenge of category-level 6D pose estimation:

      • Variation of object shape and color within the same category.

  • Contribution

    1. Propose a fast shape-based network [FS-Net] to estimate category-level 6D object size and pose. Due to the efficient category-level pose feature extraction, the framework runs at 20 FPS on a GTX 1080 Ti GPU.

    2. Propose a 3DGC autoencoder to reconstruct the observed points for latent orientation feature learning, and design a decoupled rotation mechanism to fully decode the orientation information. This decoupled mechanism naturally handles objects with circular symmetry (see the sketch after this list).

    3. Propose a novel box-cage based 3D deformation mechanism to augment the training data. With this mechanism, the pose accuracy of FS-Net is improved by 7.7%.
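
    A minimal NumPy sketch of the decoupled-rotation idea, assuming the network predicts two axis vectors (the paper's "green" and "red" vectors); the Gram-Schmidt assembly and function name below are my own illustration, not the authors' code:

    ```python
    import numpy as np

    def rotation_from_decoupled_vectors(green, red):
        """Assemble a rotation matrix from two predicted axis vectors."""
        g = green / np.linalg.norm(green)   # unit first axis
        r = red - np.dot(red, g) * g        # Gram-Schmidt: remove component along g
        r = r / np.linalg.norm(r)
        b = np.cross(g, r)                  # third axis completes a right-handed frame
        return np.stack([g, r, b], axis=1)  # columns are the rotated basis axes

    # For an object with circular symmetry about the green axis, only `green`
    # needs supervision; any perpendicular `red` gives an equally valid pose.
    R = rotation_from_decoupled_vectors(np.array([0.0, 1.0, 0.1]),
                                        np.array([1.0, 0.0, 0.0]))
    ```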


  • Model

    1. Use YOLOv3 on the RGB image to detect the 2D object location.

    2. Use a 3DGC autoencoder to perform 3D segmentation and observed-points reconstruction; through this process, the latent feature learns orientation information. A novel decoupled rotation mechanism then decodes the orientation.

    3. Use PointNet to estimate the translation and object size.
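
    Steps 1 and 2 are connected by lifting the depth pixels inside the detected 2D box into a camera-frame point cloud. A minimal sketch of that standard pinhole back-projection, assuming intrinsics fx, fy, cx, cy (the function and parameter names are mine, not the authors'):

    ```python
    import numpy as np

    def backproject_depth(depth, bbox, fx, fy, cx, cy):
        """Lift depth pixels inside a 2D detection box to camera-frame 3D points.

        depth : (H, W) depth map in meters
        bbox  : (x0, y0, x1, y1) detection box in pixel coordinates
        """
        x0, y0, x1, y1 = bbox
        us, vs = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
        z = depth[y0:y1, x0:x1]
        valid = z > 0                          # drop missing depth readings
        u, v, z = us[valid], vs[valid], z[valid]
        x = (u - cx) * z / fx                  # pinhole back-projection
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)     # (N, 3) point cloud

    # Example: a synthetic 64x64 depth map and a detection box
    depth = np.full((64, 64), 0.8)
    pts = backproject_depth(depth, (10, 10, 50, 50),
                            fx=600.0, fy=600.0, cx=32.0, cy=32.0)
    ```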

    Note: during training, the box-cage based 3D deformation mechanism augments the data online to increase the generalization ability of FS-Net (a simplified sketch follows).
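
    A simplified sketch of the idea: the object's 3D bounding box acts as a cage, and stretching the cage deforms the enclosed points while the size label is updated to match. The paper deforms the cage faces more generally; the per-axis scaling and the names below are my own simplification:

    ```python
    import numpy as np

    def box_cage_deform(points, size, scale_range=(0.8, 1.2), rng=None):
        """Online shape augmentation in the spirit of box-cage 3D deformation.

        points : (N, 3) object points in the canonical frame
        size   : (3,) bounding-box extent label
        """
        rng = np.random.default_rng() if rng is None else rng
        s = rng.uniform(*scale_range, size=3)  # independent stretch per cage axis
        deformed = points * s                  # points follow the stretched cage
        new_size = size * s                    # the size label must be updated too
        return deformed, new_size

    # Example: augment a random object point cloud of 1024 points
    pts = np.random.randn(1024, 3) * 0.05
    new_pts, new_size = box_cage_deform(pts, size=np.array([0.1, 0.2, 0.1]))
    ```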



  • Data and Metrics

    • Dataset

      • NOCS
      • LINEMOD
    • Evaluation Metrics

      • Category-Level Pose Estimation

        • 3D IoU at 25%, 50%, and 75% thresholds
        • n° m cm (rotation error < n degrees and translation error < m cm)
      • Instance-Level Pose Estimation

        • ADD(-S)
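
    The n° m cm criterion is straightforward to state in code. A minimal sketch, assuming translations are expressed in centimeters (the helper names are mine):

    ```python
    import numpy as np

    def rotation_error_deg(R_pred, R_gt):
        """Geodesic angle between two rotation matrices, in degrees."""
        cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    def n_deg_m_cm(R_pred, t_pred, R_gt, t_gt, n=5.0, m=5.0):
        """Pose counts as correct if rotation error < n degrees AND
        translation error < m cm."""
        return (rotation_error_deg(R_pred, R_gt) < n
                and np.linalg.norm(t_pred - t_gt) < m)

    # Example: a 3-degree rotation about z plus a 1 cm offset passes 5° 5 cm
    theta = np.radians(3.0)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    print(n_deg_m_cm(Rz, np.array([1.0, 0.0, 0.0]), np.eye(3), np.zeros(3)))  # True
    ```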

  • Result

    1. FS-Net is robust to the size of the training dataset and has good category-level feature extraction ability. Even with 20% of the training data, FS-Net still achieves state-of-the-art performance.

    2. 3D deformation mechanism significantly improves the robustness and performance of FS-Net.

    3. The average reconstruction error of FS-Net is 0.86, which is 72.9% and 18.9% lower than that of [Shape-Prior] and [CASS], respectively.

    4. FS-Net achieves better pose estimation results via a simpler reconstruction task (reconstructing only the observed points rather than the complete object model).

    5. For category-level pose estimation, FS-Net outperforms other state-of-the-art methods in both accuracy and speed.

    6. Specifically, on the 3D detection metric IoU 50, FS-Net outperforms the previous best method, NOCS, by 11.7%, while running 4 times faster.

Note: for instance-level pose estimation, FS-Net achieves comparable results in both accuracy (97.6%) and speed (20 FPS).


  • Limitation and Future Work

    • Future work
      • Adopt 3D object detection techniques to directly detect objects from point clouds.