PR-GCN: A Deep Graph Convolutional Network with Point Refinement for 6D Pose Estimation

[Figure: PR-GCN architecture]

  • Main Idea

    • Proposes a novel deep learning approach, the Graph Convolutional Network with Point Refinement (PR-GCN), to simultaneously address the two issues below in a unified way.

      • (1) ineffective representation of depth data.
      • (2) insufficient integration of different modalities.
    • Introduces the Point Refinement Network (PRN) to polish 3D point clouds, recovering missing parts with noise removed.

    • Introduces the Multi-Modal Fusion Graph Convolutional Network (MMF-GCN) to strengthen RGB-D combination; it captures geometry-aware inter-modality correlation through local information propagation in the graph convolutional network.

    • Extensive experiments are conducted on three widely used benchmarks (the LM, LM-O, and YCB-V datasets), and state-of-the-art performance is reached.

    • The PRN and MMF-GCN modules also generalize well to other frameworks.
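The geometry-aware local propagation mentioned for MMF-GCN can be illustrated with a minimal sketch. Everything below is an assumption for illustration: mean-aggregation over a k-nearest-neighbor graph stands in for whatever graph convolution the paper actually uses, and the function names and feature dimensions are made up.

```python
import numpy as np

def knn_graph(points, k):
    """Indices of the k nearest neighbors of each 3D point (excluding itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def fuse_step(points, rgb_feat, geo_feat, W, k=3):
    """One hypothetical propagation step: each node aggregates the concatenated
    RGB + geometry features of its k nearest 3D neighbors, then applies a
    learned projection W with a ReLU."""
    nbrs = knn_graph(points, k)
    x = np.concatenate([rgb_feat, geo_feat], axis=1)       # (N, Dr + Dg)
    agg = x[nbrs].mean(axis=1)                             # neighbor average
    return np.maximum(0.0, np.concatenate([x, agg], axis=1) @ W)
```

The key point the bullet makes is that the graph is built over 3D coordinates, so RGB features are mixed according to geometric (not pixel-grid) neighborhoods.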


  • Contribution

    1. Propose the PR-GCN approach to 6D pose estimation by enhancing depth representation and multi-modal combination.

    2. Present the PRN module with a regularized multi-resolution regression loss for point-cloud refinement. To the best of our knowledge, this is the first work to apply 3D point generation to this task.

    3. Develop the MMF-GCN module to capture local geometry-aware inter-modality correlation for RGB-D fusion.
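The multi-resolution regression loss in contribution 2 is not spelled out in these notes. A common way to score generated point clouds against ground truth at several resolutions is a weighted Chamfer distance; the sketch below is such a stand-in (the weights, subsampling scheme, and function names are assumptions, not the paper's exact loss):

```python
import numpy as np

def chamfer(pred, gt):
    """Symmetric Chamfer distance between two point sets."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def multires_loss(preds, gt, weights):
    """Weighted sum of Chamfer distances between predicted clouds at several
    resolutions and correspondingly subsampled ground truth."""
    loss = 0.0
    for w, p in zip(weights, preds):
        idx = np.linspace(0, len(gt) - 1, len(p)).astype(int)  # assumed subsampling
        loss += w * chamfer(p, gt[idx])
    return loss
```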


PR-GCN introduces a Point Refinement Network (PRN) to improve the quality of depth representation, together with a Multi-Modal Fusion Graph Convolutional Network (MMF-GCN) to fully explore local geometry-aware inter-modality correlations for sufficient combination.

  • Model

    1. Given an RGB-D image, the model first localizes objects in the RGB image and generates their raw 3D point clouds from the depth channel.

    2. PRN generates refined 3D points to polish shape clues [improve the quality of depth representation].

    3. MMF-GCN integrates multi-modal features by propagating local geometry-aware information and leveraging refined 3D points [fully explore local geometry-aware inter-modality correlations for sufficient combination].

    4. The 6D pose is finally inferred from the features delivered by MMF-GCN.

    [Figure: PR-GCN pipeline]
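The four steps above can be sketched as a single function. Every callable here (`detector`, `prn`, `mmf_gcn`, `pose_head`) is a hypothetical stand-in for the corresponding network component, not the paper's actual interface:

```python
def estimate_pose(rgb, depth, detector, prn, mmf_gcn, pose_head):
    """Hypothetical end-to-end flow matching steps 1-4 of the pipeline."""
    poses = []
    for roi, raw_points in detector(rgb, depth):  # step 1: detect + back-project
        refined = prn(raw_points)                 # step 2: refine the point cloud
        fused = mmf_gcn(roi, refined)             # step 3: multi-modal fusion
        poses.append(pose_head(fused))            # step 4: regress the 6D pose
    return poses
```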


  • Data and Metrics

    • Dataset

      • YCB-Video
      • LINEMOD
      • Occlusion LINEMOD
    • Evaluation Metrics

      • Average Distance (ADD)
      • ADD-Symmetric (ADD-S)

    Note: ADD and ADD-S are designed for general objects and symmetric objects, respectively.
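Both metrics have standard definitions in the 6D-pose literature: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S averages the distance to the *closest* transformed point, which makes it invariant to object symmetry. A small numpy version:

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid pose (rotation R, translation t) to an (N, 3) point set."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under both poses."""
    a = transform(points, R_gt, t_gt)
    b = transform(points, R_pred, t_pred)
    return np.linalg.norm(a - b, axis=1).mean()

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean closest-point distance; tolerant of symmetric objects."""
    a = transform(points, R_gt, t_gt)
    b = transform(points, R_pred, t_pred)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

For a symmetric object, a pose that is wrong only up to a symmetry can score a large ADD but a near-zero ADD-S, which is exactly why the two metrics are paired.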


  • Result

1. Result on the YCB-Video Dataset


2. Result on the LINEMOD Dataset


3. Result on the Occlusion LINEMOD Dataset



  • Limitation and Future Work