6D Pose Estimation with Correlation Fusion

Arch

Contributions
1. Propose intra- and inter-correlation modules to exploit the consistent and complementary information within and between RGB and depth modalities for 6D pose estimation.
2. Explore multiple strategies for fusing the intra- and inter-modality information flow to learn discriminative multi-modal features.
3. Demonstrate that the proposed method achieves state-of-the-art performance on widely used benchmarks for 6D pose estimation, including the LineMOD and YCB-Video datasets.
4. The method can benefit robot grasping tasks by providing an accurate estimation of object pose.

Model
1. The first stage consists of semantic segmentation and feature extraction.
2. The second stage models the intra- and inter-correlation within and between the RGB and depth modalities, followed by multiple strategies for fusing these modules.
3. An additional stage applies iterative refinement to obtain the final 6D pose estimate.
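
The second stage can be illustrated with a minimal NumPy sketch. These notes do not specify how the correlation modules are computed, so the sketch assumes scaled dot-product attention as a stand-in: intra-correlation attends within one modality, inter-correlation attends from one modality to the other, and fusion is plain concatenation (only one of the possible strategies). The function names `correlate` and `fuse`, the feature shapes, and the random inputs are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def correlate(query, context):
    """Assumed correlation module: scaled dot-product attention.

    query:   (N, d) per-point features of one modality
    context: (M, d) per-point features of the same or the other modality
    Returns an (N, d) array in which every query row is replaced by a
    similarity-weighted average of the context rows.
    """
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)  # (N, M)
    return weights @ context


def fuse(rgb, depth):
    """Assumed fusion strategy: concatenate all four correlated feature sets."""
    rgb_intra = correlate(rgb, rgb)        # within the RGB modality
    depth_intra = correlate(depth, depth)  # within the depth modality
    rgb_inter = correlate(rgb, depth)      # RGB features attending to depth
    depth_inter = correlate(depth, rgb)    # depth features attending to RGB
    return np.concatenate(
        [rgb_intra, depth_intra, rgb_inter, depth_inter], axis=-1
    )


# Hypothetical inputs: 5 points with 8-dimensional features per modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((5, 8))
depth = rng.standard_normal((5, 8))

fused = fuse(rgb, depth)
print(fused.shape)  # (5, 32)
```

The sketch requires both modalities to share the same feature dimension so that the dot product is defined. In a learned model, the queries and contexts would normally pass through trainable projections first, which are omitted here for brevity.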