PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

Arch

Contribution
1. A novel deep 3D keypoint Hough voting network with instance semantic segmentation for 6DoF pose estimation from a single RGBD image.
2. State-of-the-art 6DoF pose estimation performance on YCB and LineMOD datasets.
3. An in-depth analysis of our 3D-keypoint-based method and a comparison with previous approaches, demonstrating that 3D keypoints are a key factor in boosting 6DoF pose estimation performance. We also show that jointly training 3D keypoint prediction and semantic segmentation further improves performance.

Model
1. The Feature Extraction module extracts per-point features from an RGBD image.
2. These features are fed into modules MK, MC and MS, which predict, for each point, the translation offsets to the keypoints, the translation offset to the object center, and the semantic label, respectively.
3. A clustering algorithm is then applied to distinguish different instances with the same semantic label, and the points on each instance vote for that instance's target keypoints.
4. Finally, a least-squares fitting algorithm is applied to the predicted keypoints to estimate the 6DoF pose parameters (rotation and translation).
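
Steps 3 and 4 can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the notes above: the points are assumed to belong to one already-segmented instance, the per-point votes are aggregated with a plain mean as a simple stand-in for a clustering step, and the least-squares fit uses the standard closed-form SVD (Kabsch) solution. The names vote_keypoints and fit_pose are hypothetical helpers, not the authors' code.

```python
import numpy as np

def vote_keypoints(points, offsets):
    """Aggregate per-point votes into keypoint estimates.

    points:  (N, 3) points of a single instance in the camera frame.
    offsets: (N, K, 3) predicted translation from each point to each keypoint.
    Returns (K, 3). Assumption: votes are averaged instead of clustered.
    """
    votes = points[:, None, :] + offsets  # (N, K, 3), one vote per point and keypoint
    return votes.mean(axis=0)

def fit_pose(model_kps, pred_kps):
    """Least-squares rigid fit: find R, t minimizing sum ||R @ m_i + t - p_i||^2.

    model_kps: (K, 3) keypoints in the object frame.
    pred_kps:  (K, 3) voted keypoints in the camera frame.
    """
    mu_m = model_kps.mean(axis=0)
    mu_p = pred_kps.mean(axis=0)
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t

# Synthetic, noise-free check with a known pose (illustrative values only).
rng = np.random.default_rng(0)
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])

model_kps = rng.normal(size=(8, 3))            # keypoints in the object frame
scene_kps = model_kps @ R_true.T + t_true      # same keypoints in the camera frame
points = rng.normal(size=(100, 3))             # stand-in for instance points
offsets = scene_kps[None, :, :] - points[:, None, :]  # ideal offset predictions

pred_kps = vote_keypoints(points, offsets)
R_est, t_est = fit_pose(model_kps, pred_kps)
```

With ideal offsets, R_est and t_est match R_true and t_true up to floating-point error. With noisy network predictions, a plain mean is sensitive to outlier votes, which is one reason to aggregate votes with a clustering step instead.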