PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation
-
Main Idea
-
A novel data-driven, keypoint-based method for robust 6DoF object pose estimation from a single RGBD image.
-
Propose a deep Hough voting network that detects the 3D keypoints of objects and then estimates the 6D pose parameters by least-squares fitting.
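For reference, the two stages above can be written as follows. This is a paraphrase in our own notation, not the paper's equations verbatim: $p_i$ is the $i$-th observed 3D point, $o_i^{\,j}$ its predicted offset to keypoint $j$, $kp_j$ the keypoint obtained by aggregating the votes $\hat{kp}_i^{\,j}$, and $p'_j$ the same keypoint expressed in the object frame.

```latex
\hat{kp}_i^{\,j} = p_i + o_i^{\,j}, \qquad i = 1,\dots,N,\; j = 1,\dots,M

(R^\ast, t^\ast) = \arg\min_{R \in SO(3),\; t \in \mathbb{R}^3} \sum_{j=1}^{M} \left\lVert kp_j - \left(R\, p'_j + t\right) \right\rVert^2
```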
-
An extension of 2D-keypoint approaches, which work well for RGB-based 6DoF estimation. With the extra depth information, it can fully exploit the geometric constraints of rigid objects, and it is easy for a network to learn and optimize.
-
Introduce an instance semantic segmentation module into the network and optimize it jointly with keypoint voting, to handle scenes with multiple objects.
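Joint optimization means one objective covers both tasks. A minimal sketch, assuming a weighted sum of an L1 loss on keypoint offsets, an L1 loss on center offsets, and a focal loss (without the optional alpha balancing term) on semantic labels, with all weights set to 1.0. The function names, weights, and exact loss forms are assumptions for illustration, not the reference implementation.

```python
import numpy as np


def offset_l1_loss(pred, gt, mask):
    """L1 loss on predicted offsets, averaged over foreground points.

    pred, gt: (N, K, 3) offsets (use K=1 for the center); mask: (N,) bool, True on the object.
    """
    per_point = np.abs(pred - gt).sum(axis=(1, 2))   # L1 summed over K targets and xyz
    n_fg = max(int(mask.sum()), 1)                    # avoid dividing by zero
    return float(per_point[mask].sum() / n_fg)


def focal_loss(logits, labels, gamma=2.0):
    """Focal loss -(1 - q)^gamma * log(q), with q the softmax probability of the true class."""
    z = logits - logits.max(axis=1, keepdims=True)    # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    q = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return float(np.mean(-((1.0 - q) ** gamma) * np.log(q)))


def multi_task_loss(kp_pred, kp_gt, ctr_pred, ctr_gt, logits, labels, fg_mask,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of keypoint-offset, semantic and center-offset losses (weights assumed)."""
    w_kp, w_sem, w_ctr = weights
    return (w_kp * offset_l1_loss(kp_pred, kp_gt, fg_mask)
            + w_sem * focal_loss(logits, labels)
            + w_ctr * offset_l1_loss(ctr_pred, ctr_gt, fg_mask))


# Usage on random targets: 5 points, 8 keypoints, 3 classes.
rng = np.random.default_rng(0)
kp_gt = rng.normal(size=(5, 8, 3))
ctr_gt = rng.normal(size=(5, 1, 3))
labels = np.array([0, 1, 1, 0, 2])
confident = 20.0 * np.eye(3)[labels]                  # large margin on the true class
fg = labels > 0                                       # hypothetical: class 0 is background
perfect = multi_task_loss(kp_gt, kp_gt, ctr_gt, ctr_gt, confident, labels, fg)
noisy = multi_task_loss(kp_gt + 0.1, kp_gt, ctr_gt, ctr_gt, confident, labels, fg)
```

Sharing one objective lets gradients from segmentation shape the same per-point features used for voting, which is the motivation for training the two heads together.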
-
Extend the PVNet method to 3D keypoints, using the extra depth information to fully exploit the geometric constraints of rigid objects.
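One way to see the rigid-object constraint mentioned above: a rigid motion preserves the distances between 3D keypoints, whereas the distances between their 2D projections change with the pose (here, only with depth). This sketch is illustrative only; the pinhole intrinsics f=500 and c=320 are arbitrary assumptions.

```python
import numpy as np


def pairwise_dists(x):
    """All pairwise Euclidean distances between the rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def project(points, f=500.0, c=320.0):
    """Hypothetical pinhole projection with focal length f and principal point (c, c)."""
    return f * points[:, :2] / points[:, 2:3] + c


rng = np.random.default_rng(2)
obj_kps = rng.normal(scale=0.05, size=(8, 3))          # keypoints in the object frame
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                                       # make R a proper rotation
near = obj_kps @ R.T + np.array([0.0, 0.0, 1.0])       # same object at 1 m ...
far = obj_kps @ R.T + np.array([0.0, 0.0, 2.0])        # ... and at 2 m depth

# Rigid motion keeps 3D distances; perspective projection does not keep 2D distances.
same_3d = np.allclose(pairwise_dists(near), pairwise_dists(far))
same_2d = np.allclose(pairwise_dists(project(near)), pairwise_dists(project(far)))
```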
-
-
Contributions
-
A novel deep 3D-keypoint Hough voting network with instance semantic segmentation for 6DoF pose estimation from a single RGBD image.
-
State-of-the-art 6DoF pose estimation performance on the YCB-Video and LineMOD datasets.
-
An in-depth analysis of our 3D-keypoint-based method and a comparison with previous approaches, demonstrating that 3D keypoints are a key factor in boosting 6DoF pose estimation performance. We also show that jointly training 3D keypoint voting and semantic segmentation further improves performance.
-
-
Model
-
The feature extraction module extracts per-point features from an RGBD image.
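Per-point features presuppose turning the RGBD image into a point cloud. A minimal sketch of that back-projection, assuming a standard pinhole camera with intrinsics fx, fy, cx, cy and a depth map in which zero marks missing measurements. The function name and the depth_scale parameter are hypothetical, and the learned feature extractor itself is not shown.

```python
import numpy as np


def rgbd_to_points(rgb, depth, fx, fy, cx, cy, depth_scale=1.0):
    """Back-project an RGBD image into camera-frame points with colors.

    rgb: (H, W, 3) image; depth: (H, W) raw depth values.
    depth_scale converts raw depth units to metres (hypothetical default 1.0).
    Pixels with zero depth are treated as missing and dropped.
    """
    h, w = depth.shape
    v, u = np.indices((h, w))                # v: row index, u: column index
    z = depth.astype(np.float64) / depth_scale
    valid = z > 0
    x = (u[valid] - cx) * z[valid] / fx      # inverts the pinhole model u = fx * x / z + cx
    y = (v[valid] - cy) * z[valid] / fy
    points = np.stack([x, y, z[valid]], axis=1)
    colors = rgb.reshape(-1, rgb.shape[-1])[np.flatnonzero(valid)]
    return points, colors


# Usage on a toy 2x2 image with one missing depth pixel.
rgb = np.arange(12).reshape(2, 2, 3)
depth = np.array([[1.0, 2.0], [0.0, 4.0]])
points, colors = rgbd_to_points(rgb, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Keeping the color of each surviving pixel aligned with its 3D point is what makes a per-point fusion of appearance and geometry possible afterwards.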
-
These features are fed into modules M_K, M_C and M_S, which predict, for each point, the translation offsets to the keypoints, the translation offset to the object center, and the semantic label, respectively.
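The outputs of M_K, M_C and M_S become votes once each offset is added to the point that predicted it. A minimal sketch, assuming NumPy arrays with the shapes given in the docstring; the function and argument names are hypothetical.

```python
import numpy as np


def decode_predictions(points, kp_offsets, ctr_offsets, seg_logits):
    """Turn per-point network outputs into votes and labels.

    points:      (N, 3) camera-frame points
    kp_offsets:  (N, K, 3) offsets from each point to K keypoints (M_K output)
    ctr_offsets: (N, 3) offset from each point to the object center (M_C output)
    seg_logits:  (N, C) per-point class scores (M_S output)
    """
    kp_votes = points[:, None, :] + kp_offsets   # (N, K, 3): each point votes for every keypoint
    ctr_votes = points + ctr_offsets             # (N, 3): each point votes for the center
    labels = seg_logits.argmax(axis=1)           # (N,): predicted semantic label
    return kp_votes, ctr_votes, labels


# Toy check: with perfect offsets, every vote lands exactly on its target.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))
keypoints = rng.normal(size=(4, 3))
center = keypoints.mean(axis=0)
kp_votes, ctr_votes, labels = decode_predictions(
    points,
    keypoints[None, :, :] - points[:, None, :],
    center - points,
    np.tile([0.1, 2.0], (6, 1)),
)
```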
-
A clustering algorithm is then applied to distinguish different instances that share the same semantic label, and the points on each instance vote for that instance's target keypoints.
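A minimal sketch of this step for the points of one semantic class, assuming a flat-kernel mean shift on the center votes (our choice of clustering for this sketch) and a bandwidth picked by hand. For brevity, each instance's keypoints are a plain average of its votes rather than a second round of clustering; all names are hypothetical.

```python
import numpy as np


def mean_shift(votes, bandwidth, iters=30):
    """Flat-kernel mean shift: move each mode to the mean of the votes within `bandwidth`."""
    modes = votes.copy()
    for _ in range(iters):
        d = np.linalg.norm(modes[:, None, :] - votes[None, :, :], axis=-1)  # (N, N)
        w = (d < bandwidth).astype(np.float64)
        cnt = w.sum(axis=1, keepdims=True)
        shifted = (w @ votes) / np.maximum(cnt, 1.0)
        modes = np.where(cnt > 0, shifted, modes)   # keep a mode in place if it has no neighbours
    return modes


def group_modes(modes, radius):
    """Merge converged modes closer than `radius`; returns instance centers and per-point ids."""
    centers, ids = [], np.empty(len(modes), dtype=int)
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < radius:
                ids[i] = j
                break
        else:
            centers.append(m)
            ids[i] = len(centers) - 1
    return np.array(centers), ids


def instance_keypoints(kp_votes, ctr_votes, bandwidth):
    """Split one class into instances via center votes, then average each instance's keypoint votes."""
    modes = mean_shift(ctr_votes, bandwidth)
    centers, ids = group_modes(modes, bandwidth / 2)
    kps = np.stack([kp_votes[ids == j].mean(axis=0) for j in range(len(centers))])
    return centers, ids, kps


# Toy scene: two instances of the same class whose centers are 1 m apart.
rng = np.random.default_rng(0)
true_ctr = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
true_kps = true_ctr[:, None, :] + np.array([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])  # (2, 2, 3)
owner = np.repeat([0, 1], 20)                    # which instance each point lies on
ctr_votes = true_ctr[owner] + rng.normal(scale=0.01, size=(40, 3))
kp_votes = true_kps[owner] + rng.normal(scale=0.01, size=(40, 2, 3))
centers, ids, kps = instance_keypoints(kp_votes, ctr_votes, bandwidth=0.2)
```

Clustering on center votes rather than raw point positions separates touching instances, since points on different objects vote for different centers even when they are spatially adjacent.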
-
Finally, a least-squares fitting algorithm is applied to the predicted keypoints to estimate the 6DoF pose parameters.
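The least-squares problem between object-frame keypoints and predicted camera-frame keypoints has a closed-form solution via SVD (the Kabsch/Arun method). A minimal sketch, assuming known one-to-one correspondences; we assume, but do not claim, that this matches the fitting step used in the paper.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Least-squares R, t minimizing sum_j ||dst_j - (R @ src_j + t)||^2 (SVD / Kabsch solution).

    src: (M, 3) keypoints in the object frame
    dst: (M, 3) corresponding keypoints predicted in the camera frame
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Toy check: recover a known pose from 8 noise-free keypoint correspondences.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                                   # make Q a proper rotation
t_true = np.array([0.1, -0.2, 0.8])
obj_kps = rng.normal(scale=0.05, size=(8, 3))
cam_kps = obj_kps @ Q.T + t_true
R_est, t_est = fit_rigid_transform(obj_kps, cam_kps)
```

Centering both point sets first decouples rotation from translation, so the rotation comes from a single 3x3 SVD and the translation follows directly from the centroids.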
-
-
Data and Metrics
-
Dataset
- YCB-Video
- LineMOD
-
Evaluation Metrics
- ADD
- ADD(-S)
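The metrics above are only named. A minimal sketch of the commonly used definitions, assuming the object model is given as an (N, 3) point array and poses as (R, t) pairs: ADD averages distances between corresponding model points, ADD-S uses closest-point distances so that symmetric objects are not penalized, and ADD(-S) picks ADD-S for symmetric objects and ADD otherwise. Function names are hypothetical, and accuracy thresholds are not shown.

```python
import numpy as np


def _apply(points, R, t):
    return points @ R.T + t


def add(model_pts, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean distance between corresponding model points under the two poses."""
    diff = _apply(model_pts, R_pred, t_pred) - _apply(model_pts, R_gt, t_gt)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_s(model_pts, R_pred, t_pred, R_gt, t_gt):
    """ADD-S: for each ground-truth point, distance to the closest predicted point."""
    pred = _apply(model_pts, R_pred, t_pred)
    gt = _apply(model_pts, R_gt, t_gt)
    d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)  # (N, N)
    return float(d.min(axis=1).mean())


def add_or_add_s(model_pts, R_pred, t_pred, R_gt, t_gt, symmetric):
    """ADD(-S): ADD-S for symmetric objects, ADD otherwise."""
    metric = add_s if symmetric else add
    return metric(model_pts, R_pred, t_pred, R_gt, t_gt)


# Toy check: a square is symmetric under a 90-degree rotation about z.
square = np.array([[1.0, 0, 0], [0, 1.0, 0], [-1.0, 0, 0], [0, -1.0, 0]])
Rz90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
eye, zero = np.eye(3), np.zeros(3)
add_val = add(square, Rz90, zero, eye, zero)      # each point moves sqrt(2)
adds_val = add_s(square, Rz90, zero, eye, zero)   # the point set maps onto itself
```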
-
-
Results
1. Results on the YCB-Video Dataset
2. Results on the LineMOD Dataset
-
Limitations and Future Work
- pdf | code | Presentation