6D Pose Estimation with Correlation Fusion

[Figure: architecture overview]

  • Main Idea

    • Propose a novel Correlation Fusion (CF) framework which models the feature correlation within and between RGB and depth modalities to improve the performance of 6D pose estimation.

    • Propose two modules, Intra-modality Correlation Modeling (IntraMCM) and Inter-modality Correlation Modeling (InterMCM), which use a self-attention mechanism to select prominent features within and across the two modalities.

    • IntraMCM is designed to learn prominent modality-specific features.

    • InterMCM is designed to capture complementary features from the other modality.

    • The first work to explore effective intra- and inter-modality fusion in 6D pose estimation.

    • Achieves state-of-the-art performance on the LineMOD and YCB-Video datasets.

    • The CF method can benefit a real-world robot grasping task by providing accurate object pose estimation.
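
To make the intra- versus inter-modality distinction concrete, here is a minimal sketch of attention applied within one modality and across two. It assumes plain scaled dot-product attention with no learned projections, and it assumes both modalities share the same feature dimension; the notes above only say that a self-attention mechanism is used, so the paper's exact formulation may differ. The names `attend`, `rgb`, and `depth` are hypothetical, and the code uses only the Python standard library to stay dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product attention (assumed form, no learned weights).

    Each query is replaced by a weighted average of the context vectors,
    which act as both keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out


# Toy per-pixel features: three locations, two channels per modality.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]

# Intra-modality: RGB features attend to other RGB features.
intra_rgb = attend(rgb, rgb)
# Inter-modality: RGB features attend to depth features.
inter_rgb = attend(rgb, depth)
```

The only difference between the two calls is where the context comes from: the same modality for the intra case, the other modality for the inter case.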


  • Contribution

    1. Propose intra- and inter-correlation modules to exploit the consistent and complementary information within and between RGB and depth modalities for 6D pose estimation.

    2. Explore multiple strategies for fusing the intra- and inter-modality information flow to learn discriminative multi-modal features.

    3. Demonstrate that the proposed method achieves state-of-the-art performance on widely used 6D pose estimation benchmarks, including the LineMOD and YCB-Video datasets.

    4. The method can benefit robot grasping tasks by providing an accurate estimation of object pose.


  • Model

    1. The first stage consists of semantic segmentation and feature extraction.

    2. The second stage models the intra- and inter-correlation within and between the RGB and depth modalities, followed by multiple strategies for fusing the modules.

    3. An additional stage applies iterative refinement to obtain the final 6D pose estimate.
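
The iterative refinement stage can be pictured as repeatedly applying a small corrective pose to the current estimate. The sketch below assumes each iteration produces a residual rotation and translation that is composed with the current pose; the notes do not say how the paper's refinement network produces these residuals, so they are hard-coded stand-ins here, and `compose`, `refine`, and `rot_z` are hypothetical helper names.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(residual, pose):
    """Apply residual (dR, dt) after pose (R, t): x -> dR (R x + t) + dt."""
    dR, dt = residual
    R, t = pose
    new_t = [a + b for a, b in zip(matvec(dR, t), dt)]
    return matmul(dR, R), new_t


def refine(initial_pose, residuals):
    """Fold a sequence of residual poses into the initial estimate."""
    pose = initial_pose
    for residual in residuals:
        pose = compose(residual, pose)
    return pose


# Initial estimate: no rotation, object 1 unit in front of the camera.
initial = (rot_z(0.0), [0.0, 0.0, 1.0])
# Stand-in residuals; in practice a refinement network would predict these.
residuals = [(rot_z(0.10), [0.01, 0.0, 0.0]),
             (rot_z(0.05), [0.0, 0.0, -0.02])]
R, t = refine(initial, residuals)
```

Because both residual rotations are about the same axis, the refined rotation here is simply a rotation of 0.15 radians about z.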

    [Figure: model overview]


  • Data and Metrics

    • Dataset

      • LineMOD
      • YCB-Video
    • Evaluation Metrics

      • ADD
      • ADD(-S)
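
As a reference for the two metrics, the sketch below follows their usual definitions: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S averages the distance from each ground-truth point to its closest predicted point, which makes it suitable for symmetric objects. The function names and the toy point set are illustrative assumptions, not code from the paper.

```python
import math


def transform(points, R, t):
    """Apply x -> R x + t to every 3D point."""
    return [[sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
            for p in points]


def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed points."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)


def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to the closest
    predicted point (brute force, quadratic in the number of points)."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)


# Toy model that is symmetric under a 180 degree rotation about z.
points = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flip_z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
zero = [0.0, 0.0, 0.0]

add = add_metric(points, identity, zero, flip_z, zero)      # 2.0
add_s = add_s_metric(points, identity, zero, flip_z, zero)  # 0.0
```

The example shows why symmetric objects need ADD-S: the flipped pose is visually indistinguishable from the ground truth, yet ADD penalizes it by the full point displacement while ADD-S reports zero error.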

  • Result

1. Results on the LineMOD Dataset

[Figure: results on the LineMOD dataset]

2. Results on the YCB-Video Dataset

[Figure: results on the YCB-Video dataset]


  • Limitations and Future Work