PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

[Figure: network architecture]

  • Main Idea

    • A novel data-driven method [keypoint-based approach] for robust 6DoF object pose estimation from a single RGBD image.

    • Propose a deep Hough voting network to detect 3D keypoints of objects and then estimate the 6D pose parameters by least-squares fitting.

    • An extension of 2D-keypoint approaches that work well for RGB-based 6DoF estimation. The extra depth information lets the method fully exploit the geometric constraints of rigid objects, and the formulation is easy for a network to learn and optimize.

    • Introduce an instance semantic segmentation module into the network and jointly optimize it with keypoint voting [to handle scenes with multiple objects].

    • Extend the [PVNet] method to 3D keypoints with extra depth information and fully utilize the geometric constraints of rigid objects.
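The point-wise voting idea above can be illustrated with a small sketch. It assumes the network has already produced, for every visible point of one object, a 3D offset toward a given keypoint; each point then casts the vote `point + offset`, and the votes are aggregated by a simple Gaussian mean-shift iteration. The function name `vote_keypoint`, the `bandwidth` value and the median initialisation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vote_keypoint(points, offsets, bandwidth=0.05, iters=20):
    """Aggregate per-point votes for a single 3D keypoint.

    points:  (N, 3) visible object points in camera coordinates
    offsets: (N, 3) predicted translation offsets from each point to the keypoint
    bandwidth, iters: hypothetical mean-shift settings chosen for illustration
    """
    votes = points + offsets              # every point casts one vote
    center = np.median(votes, axis=0)     # robust starting guess
    for _ in range(iters):
        d2 = np.sum((votes - center) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
        total = w.sum()
        if total < 1e-12:                 # no vote close enough to the centre
            break
        new_center = (w[:, None] * votes).sum(axis=0) / total
        if np.linalg.norm(new_center - center) < 1e-9:
            break
        center = new_center
    return center
```

The kernel weighting down-weights votes far from the current estimate, so a minority of badly predicted offsets has little influence on the final keypoint.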


  • Contribution

    1. A novel deep 3D-keypoint Hough voting network with instance semantic segmentation for 6DoF pose estimation from a single RGBD image.

    2. State-of-the-art 6DoF pose estimation performance on YCB and LineMOD datasets.

    3. An in-depth analysis of our 3D-keypoint-based method and a comparison with previous approaches, demonstrating that 3D keypoints are a key factor in boosting 6DoF pose estimation performance. We also show that jointly training 3D keypoint voting and semantic segmentation further improves performance.


  • Model

    1. The Feature Extraction module extracts the per-point feature from an RGBD image.

    2. These features are fed into modules M_K, M_C and M_S, which predict, for each point, the translation offsets to the keypoints, the translation offset to the center point, and the semantic label, respectively.

    3. A clustering algorithm is then applied to distinguish different instances with the same semantic label and points on the same instance vote for their target keypoints.

    4. Finally, a least-squares fitting algorithm is applied to the predicted keypoints to estimate the 6DoF pose parameters.
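The final fitting step can be sketched as follows. Given the keypoints in the object coordinate frame and the corresponding keypoints detected in the camera frame, one common closed-form solution to the least-squares rigid alignment uses an SVD of the cross-covariance matrix. Whether the authors' implementation matches this exact variant is an assumption; `fit_pose` and its argument names are hypothetical.

```python
import numpy as np

def fit_pose(kps_obj, kps_cam):
    """Return R (3x3) and t (3,) minimising
    sum_i || R @ kps_obj[i] + t - kps_cam[i] ||^2.

    kps_obj: (M, 3) keypoints in the object coordinate frame
    kps_cam: (M, 3) detected keypoints in the camera coordinate frame
    """
    mu_obj = kps_obj.mean(axis=0)
    mu_cam = kps_cam.mean(axis=0)
    # 3x3 cross-covariance between the centred point sets
    H = (kps_obj - mu_obj).T @ (kps_cam - mu_cam)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_cam - R @ mu_obj
    return R, t
```

At least three non-collinear keypoints are needed for the rotation to be uniquely determined; using more keypoints averages out errors in individual detections.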

    [Figure: model overview]


  • Data and Metrics

    • Dataset

      • YCB-VIDEO
      • LineMOD
    • Evaluation Metrics

      • ADD
      • ADD(-S)
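For reference, the two metrics can be written in a few lines. This sketch follows the commonly used definitions: ADD is the mean distance between corresponding model points under the predicted and ground-truth poses, and ADD-S replaces the one-to-one correspondence with the closest ground-truth point, which makes it suitable for symmetric objects. The function names are illustrative, and the brute-force nearest-neighbour search is only practical for small point sets.

```python
import numpy as np

def add_metric(model_pts, R_pred, t_pred, R_gt, t_gt):
    """Mean distance between corresponding transformed model points."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_s_metric(model_pts, R_pred, t_pred, R_gt, t_gt):
    """Mean distance from each predicted point to its closest ground-truth point."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    # (m, m) matrix of pairwise distances; O(m^2) memory
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    return dists.min(axis=1).mean()
```

For an object that looks identical after a 180 degree rotation, a prediction that is off by exactly that rotation yields a large ADD but an ADD-S near zero, which is why symmetric objects are usually scored with ADD-S.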

  • Result

    1. Results on the YCB-VIDEO dataset

    [Figure: results on the YCB-VIDEO dataset]

    2. Results on the LineMOD dataset

    [Figure: results on the LineMOD dataset]


  • Limitations and Future Work