GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

[Figure: GDR-Net architecture]

  • Main Idea

    • Propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations.

    • Extensive experiments show that the approach outperforms state-of-the-art methods on the LM, LM-O and YCB-V datasets while remaining real-time and robust.

    • Establish 2D-3D correspondences while computing the final 6D pose estimate in a fully differentiable way.

    • Propose to learn the PnP optimization, exploiting the fact that the correspondences are organized in image space, which gives a significant boost in performance, outperforming all prior works.
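
The phrase "organized in image space" can be made concrete with a small sketch. It assumes the network predicts, for every RoI pixel, a 3D object coordinate and a visibility flag, and that the dense correspondence map is formed by pairing each pixel's 2D position with its predicted 3D point. The function name, the input layout and the zero-filling of masked pixels are illustrative assumptions, not details taken from these notes.

```python
def build_dense_correspondences(xyz_map, mask):
    """Pair every pixel position (u, v) with its predicted 3D point (x, y, z).

    xyz_map: H x W grid of (x, y, z) tuples (hypothetical network output).
    mask:    H x W grid of 0/1 visibility flags (hypothetical network output).
    Returns an H x W grid of 5-tuples (u, v, x, y, z). Pixels outside the
    mask are zero-filled, which is an assumption made for this sketch.
    """
    height, width = len(xyz_map), len(xyz_map[0])
    dense = []
    for v in range(height):
        row = []
        for u in range(width):
            if mask[v][u]:
                x, y, z = xyz_map[v][u]
                row.append((float(u), float(v), x, y, z))
            else:
                row.append((0.0, 0.0, 0.0, 0.0, 0.0))
        dense.append(row)
    return dense
```

Because the result keeps the H x W grid layout, a convolutional module can consume it directly, whereas a classical PnP solver would treat the same correspondences as an unordered set.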


  • Contribution

    1. Revisit the key ingredients in direct 6D pose regression and observe that, by choosing appropriate representations for the pose parameters, methods based on direct regression show competitive performance compared with state-of-the-art correspondence-based indirect methods.

    2. Propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to boost the performance of direct 6D pose regression by leveraging the geometric guidance from dense correspondence-based intermediate representations.
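
These notes do not say which pose representations are meant by "appropriate". As an illustration only, the sketch below decodes one widely used continuous rotation parameterization, a 6D vector turned into a rotation matrix by Gram-Schmidt orthogonalization. The function names are hypothetical, and whether this matches the authors' exact parameterization is an assumption.

```python
import math


def _normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rot6d_to_matrix(r6):
    """Map a 6D vector (two stacked 3-vectors) to a 3x3 rotation matrix.

    The first vector is normalized, the second is made orthogonal to it
    and normalized, and the third axis is their cross product.
    """
    a1, a2 = r6[:3], r6[3:]
    b1 = _normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = _normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    # b1, b2, b3 become the columns of the rotation matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]
```

Any 6D input with two non-parallel, non-zero halves decodes to a valid rotation, so a regression head can output it without extra constraints.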


  • Model

    1. Given an RGB image I, GDR-Net takes the zoomed-in RoI (obtained with Dynamic Zoom-In during training and from off-the-shelf detections at test time) as input and predicts several intermediate geometric features.

    2. The Patch-PnP module directly regresses the 6D object pose from the Dense Correspondences (M_2D-3D) and the Surface Region Attention (M_SRA).
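
The training-time RoI in step 1 can be sketched as a random perturbation of the ground-truth box followed by a padded square crop. The 25% shift and scale ranges and the 1.5 padding factor below are assumed defaults for illustration, and the function name and box layout are hypothetical; the notes above do not specify any of them.

```python
import random


def dynamic_zoom_in(box, shift_ratio=0.25, scale_ratio=0.25, pad=1.5, rng=random):
    """Jitter a ground-truth box and return a square crop (cx, cy, side).

    box: (cx, cy, w, h) in pixels.
    shift_ratio, scale_ratio, pad: assumed values, not taken from the notes.
    rng: the random module or any object with a uniform(a, b) method.
    """
    cx, cy, w, h = box
    # Shift the center by a random fraction of the box size.
    cx += w * rng.uniform(-shift_ratio, shift_ratio)
    cy += h * rng.uniform(-shift_ratio, shift_ratio)
    # Pad the longer side, then rescale it randomly.
    side = max(w, h) * pad * (1.0 + rng.uniform(-scale_ratio, scale_ratio))
    return cx, cy, side
```

Training on perturbed crops like this is meant to make the pose network tolerant of imperfect detections, since at test time the RoI comes from a separate detector.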

    [Figure: GDR-Net model overview]


  • Data and Metrics

    • Dataset

      • YCB-Video
      • Occlusion LineMOD
    • Evaluation Metrics

      • ADD-S
      • n°, n cm
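
Both metrics can be written down in a few lines. The sketch below follows their usual definitions: ADD-S averages, over the model points, the distance from each ground-truth-transformed point to the closest estimate-transformed point, and n°, n cm accepts a pose when the rotation error is below n degrees and the translation error below n centimeters. Function names are hypothetical, translations are assumed to be in centimeters, and the brute-force nearest-neighbor search is only suitable for small point sets.

```python
import math


def transform(R, t, p):
    """Apply rotation R (3x3 nested list) and translation t to point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def add_s(R_gt, t_gt, R_est, t_est, points):
    """Mean closest-point distance between the two transformed models."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    est = [transform(R_est, t_est, p) for p in points]
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(points)


def rotation_error_deg(R_gt, R_est):
    """Angle in degrees of the relative rotation R_gt^T R_est."""
    # trace(R_gt^T R_est) equals the element-wise product sum.
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def within_n_deg_n_cm(R_gt, t_gt, R_est, t_est, n_deg=5.0, n_cm=5.0):
    """True if both the rotation and translation errors are under threshold."""
    rot_ok = rotation_error_deg(R_gt, R_est) < n_deg
    trans_ok = math.dist(t_gt, t_est) < n_cm
    return rot_ok and trans_ok
```

For a symmetric object the two metrics can disagree: a 90° rotation that maps the model onto itself gives an ADD-S of zero but still fails a 5°, 5 cm check.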

  • Result

    1. Results on the YCB-Video dataset

       [Figure: results on YCB-Video]

    2. Results on the Occlusion LineMOD dataset

       [Figure: results on Occlusion LineMOD]


  • Limitations and Future Work

    • Future work
      • Extend our work to more challenging scenarios, such as the lack of annotated real data and unseen object categories or instances.