PR-GCN: A Deep Graph Convolutional Network with Point Refinement for 6D Pose Estimation

[Figure: PR-GCN architecture]

  • Main Idea

    • Proposes a novel deep learning approach, the Graph Convolutional Network with Point Refinement (PR-GCN), to simultaneously address the two issues below in a unified way.

      • (1) ineffective representation of depth data.
      • (2) insufficient integration of different modalities.
    • Introduces the Point Refinement Network (PRN) to polish 3D point clouds, recovering missing parts with noise removed.

    • Introduces the Multi-Modal Fusion Graph Convolutional Network (MMF-GCN) to strengthen RGB-D combination; it captures geometry-aware inter-modality correlation through local information propagation in the graph convolutional network.

    • Extensive experiments are conducted on three widely used benchmarks (the LM, LM-O, and YCB-V datasets), and state-of-the-art performance is reached.

    • The PRN and MMF-GCN modules also generalize well to other frameworks.
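The geometry-aware local propagation mentioned for MMF-GCN can be illustrated with a minimal sketch. Everything below is an assumption for illustration: mean-aggregation over a k-nearest-neighbor graph stands in for whatever graph convolution the paper actually uses, and the function names and feature dimensions are made up.

```python
import numpy as np

def knn_graph(points, k):
    """Indices of the k nearest neighbors of each 3D point (excluding itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def fuse_step(points, rgb_feat, geo_feat, W, k=3):
    """One hypothetical propagation step: each node aggregates the concatenated
    RGB + geometry features of its k nearest 3D neighbors, then applies a
    learned projection W with a ReLU."""
    nbrs = knn_graph(points, k)
    x = np.concatenate([rgb_feat, geo_feat], axis=1)       # (N, Dr + Dg)
    agg = x[nbrs].mean(axis=1)                             # neighbor average
    return np.maximum(0.0, np.concatenate([x, agg], axis=1) @ W)
```

The key point the bullet makes is that the graph is built over 3D coordinates, so RGB features are mixed according to geometric (not pixel-grid) neighborhoods.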


  • Contribution

    1. Propose the PR-GCN approach to 6D pose estimation by enhancing depth representation and multi-modal combination.

    2. Present the PRN module with a regularized multi-resolution regression loss for point-cloud refinement. To the best of our knowledge, this is the first work to apply 3D point generation to this task.

    3. Develop the MMF-GCN module to capture local geometry-aware inter-modality correlation for RGB-D fusion.
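The multi-resolution regression loss in contribution 2 is not spelled out in these notes. A common way to score generated point clouds against ground truth at several resolutions is a weighted Chamfer distance; the sketch below is such a stand-in (the weights, subsampling scheme, and function names are assumptions, not the paper's exact loss):

```python
import numpy as np

def chamfer(pred, gt):
    """Symmetric Chamfer distance between two point sets."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def multires_loss(preds, gt, weights):
    """Weighted sum of Chamfer distances between predicted clouds at several
    resolutions and correspondingly subsampled ground truth."""
    loss = 0.0
    for w, p in zip(weights, preds):
        idx = np.linspace(0, len(gt) - 1, len(p)).astype(int)  # assumed subsampling
        loss += w * chamfer(p, gt[idx])
    return loss
```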


PR-GCN introduces a Point Refinement Network (PRN) to improve the quality of depth representation, together with a Multi-Modal Fusion Graph Convolutional Network (MMF-GCN) to fully explore local geometry-aware inter-modality correlations for sufficient combination.

  • Model

    1. Given an RGB-D image, the model first localizes objects in the RGB image and generates their raw 3D point clouds from the depth channel.

    2. PRN generates refined 3D points to polish shape clues [improve the quality of depth representation].

    3. MMF-GCN integrates multi-modal features by propagating local geometry-aware information and leveraging refined 3D points [fully explore local geometry-aware inter-modality correlations for sufficient combination].

    4. The 6D pose is finally inferred from the features delivered by MMF-GCN.

    [Figure: PR-GCN pipeline]
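The four steps above can be sketched as a single function. Every callable here (`detector`, `prn`, `mmf_gcn`, `pose_head`) is a hypothetical stand-in for the corresponding network component, not the paper's actual interface:

```python
def estimate_pose(rgb, depth, detector, prn, mmf_gcn, pose_head):
    """Hypothetical end-to-end flow matching steps 1-4 of the pipeline."""
    poses = []
    for roi, raw_points in detector(rgb, depth):  # step 1: detect + back-project
        refined = prn(raw_points)                 # step 2: refine the point cloud
        fused = mmf_gcn(roi, refined)             # step 3: multi-modal fusion
        poses.append(pose_head(fused))            # step 4: regress the 6D pose
    return poses
```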


  • Data and Metrics

    • Dataset

      • YCB-Video
      • LINEMOD
      • Occlusion LINEMOD
    • Evaluation Metrics

      • Average Distance (ADD)
      • ADD-Symmetric (ADD-S)

    Note: ADD and ADD-S are designed for general objects and symmetric objects, respectively.
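Both metrics have standard definitions in the 6D-pose literature: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S averages the distance to the *closest* transformed point, which makes it invariant to object symmetry. A small numpy version:

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid pose (rotation R, translation t) to an (N, 3) point set."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under both poses."""
    a = transform(points, R_gt, t_gt)
    b = transform(points, R_pred, t_pred)
    return np.linalg.norm(a - b, axis=1).mean()

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean closest-point distance; tolerant of symmetric objects."""
    a = transform(points, R_gt, t_gt)
    b = transform(points, R_pred, t_pred)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

For a symmetric object, a pose that is wrong only up to a symmetry can score a large ADD but a near-zero ADD-S, which is exactly why the two metrics are paired.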


  • Result

1. Result on the YCB-Video Dataset


2. Result on the LINEMOD Dataset


3. Result on the Occlusion LINEMOD Dataset



  • Limitation and Future Work