FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Arch

  • Main Idea

    • Proposes FFB6D, an approach that outperforms the state of the art by a large margin without any time-consuming post-refinement procedure.

    • FFB6D learns to combine appearance information (RGB) and geometry information (depth), both for representation learning and for output representation selection.

    • At the representation learning stage, bidirectional fusion modules are built into the full flow of the two networks, with fusion applied at each encoding and decoding layer. Each network can therefore leverage local and global complementary information from the other to obtain better representations.

    • At the output representation stage, a simple but effective 3D keypoint selection algorithm takes the texture and geometry information of objects into account, which simplifies keypoint localization for precise pose estimation.
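
The bidirectional fusion idea above can be illustrated with a small NumPy sketch. The notes do not say how features are exchanged, so everything here is an assumption made for illustration: each location gathers the features of its k nearest neighbours (in 3D) from the other branch, max-pools them, concatenates them with its own features, and applies a shared linear layer. The names `knn_gather_maxpool`, `fuse`, `k`, and the random weights are hypothetical.

```python
import numpy as np

def knn_gather_maxpool(query_xyz, source_xyz, source_feat, k=4):
    """For each query location, max-pool the features of its k nearest source locations."""
    # Pairwise squared distances between queries and sources, shape (Q, S).
    d2 = ((query_xyz[:, None, :] - source_xyz[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]      # (Q, k) indices of nearest sources
    return source_feat[idx].max(axis=1)      # (Q, C_source)

def fuse(own_feat, gathered_feat, weight):
    """Concatenate own and gathered features, then apply a shared linear layer + ReLU."""
    fused = np.concatenate([own_feat, gathered_feat], axis=1)
    return np.maximum(fused @ weight, 0.0)

rng = np.random.default_rng(0)
# Toy inputs (assumed): 64 pixels with 8-d features, 32 points with 16-d features.
pix_xyz, pix_feat = rng.normal(size=(64, 3)), rng.normal(size=(64, 8))
pt_xyz, pt_feat = rng.normal(size=(32, 3)), rng.normal(size=(32, 16))
w_point_to_pixel = rng.normal(size=(8 + 16, 8))    # hypothetical layer weights
w_pixel_to_point = rng.normal(size=(16 + 8, 16))   # hypothetical layer weights

# Point-to-pixel direction: image features enriched with geometry features.
pix_fused = fuse(pix_feat, knn_gather_maxpool(pix_xyz, pt_xyz, pt_feat), w_point_to_pixel)
# Pixel-to-point direction: point features enriched with appearance features.
pt_fused = fuse(pt_feat, knn_gather_maxpool(pt_xyz, pix_xyz, pix_feat), w_pixel_to_point)
```

In a full-flow design, an exchange like this would be repeated at every encoding and decoding layer rather than once at the end.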


  • Contribution

    1. A novel full flow bidirectional fusion network for representation learning from a single scene RGBD image, which can be generalized to more applications, such as 3D object detection.

    2. A simple but effective automatic 3D keypoint selection algorithm [SIFT-FPS] that leverages texture and geometry information of object models.

    3. State-of-the-art 6D pose estimation performance on the YCB-Video, LineMOD, and Occlusion LineMOD datasets.

    4. In-depth analysis to understand various design choices of the system.
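
The keypoint selection in contribution 2 can be sketched as follows, assuming that SIFT-FPS means running farthest point sampling (FPS) over candidate 3D points that come from SIFT features detected on the object's texture. Only the FPS half is shown; the candidate array and the function name `farthest_point_sampling` are hypothetical stand-ins, and the SIFT detection step is omitted.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start_index=0):
    """Greedily pick points so that each new pick is farthest from those already chosen."""
    points = np.asarray(points, dtype=float)
    selected = [start_index]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[start_index], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

# Hypothetical candidates: 3D locations of texture features on the object model.
candidates = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
])
fps_idx = farthest_point_sampling(candidates, 3)
keypoints = candidates[fps_idx]   # picks candidates 0, 4 and 3
```

Restricting FPS to textured candidates keeps the keypoints spread over the object while avoiding locations that are hard to identify in the image.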


  • Model

    1. A CNN and a point cloud network are used for representation learning on the RGB image and the point cloud, respectively.

    2. Throughout the flow of the two networks, bidirectional fusion modules are added as communication bridges.

    3. The extracted per-point features are then fed into an instance semantic segmentation module and a 3D keypoint voting module to obtain per-object 3D keypoints.

    4. Finally, the pose is recovered with a least-squares fitting algorithm.
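
Step 4 can be illustrated with the standard SVD-based least-squares fit between two sets of corresponding 3D points. This is a minimal sketch: it assumes the keypoints predicted in the camera frame are already matched one-to-one with the keypoints defined on the object model, and the function name `fit_rigid_transform` is hypothetical.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (SVD-based fit)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a known pose from 8 noiseless keypoints.
rng = np.random.default_rng(0)
model_kps = rng.normal(size=(8, 3))                # keypoints in the object frame
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])
camera_kps = model_kps @ R_true.T + t_true         # keypoints in the camera frame

R_est, t_est = fit_rigid_transform(model_kps, camera_kps)
```

With noisy keypoint predictions the same fit returns the pose that minimizes the summed squared distance between the transformed model keypoints and the predicted ones.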

    [Figure: FFB6D model overview]


  • Data and Metrics

    • Dataset

      • YCB-Video
      • LineMOD
      • Occlusion LineMOD
    • Evaluation Metrics

      • ADD
      • ADD-S
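
For reference, the two metrics can be computed as in the sketch below, using their usual definitions: ADD averages the distance between corresponding model points under the predicted and ground-truth poses, while ADD-S averages the distance to the closest point instead, which makes it suitable for symmetric objects. The function names and the toy object are hypothetical.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform to an (N, 3) array of points."""
    return points @ R.T + t

def add_metric(points, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean distance between corresponding transformed model points."""
    pred, gt = transform(points, R_pred, t_pred), transform(points, R_gt, t_gt)
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_s_metric(points, R_pred, t_pred, R_gt, t_gt):
    """ADD-S: mean distance from each predicted point to its closest ground-truth point."""
    pred, gt = transform(points, R_pred, t_pred), transform(points, R_gt, t_gt)
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)   # (N, N)
    return dists.min(axis=1).mean()

# Toy symmetric object: four points on a circle in the xy-plane.
model = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
R_gt, t_gt = np.eye(3), np.zeros(3)
# Prediction is off by a 90 degree rotation about z, a symmetry of this object.
R_pred = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_pred = np.zeros(3)

add_value = add_metric(model, R_pred, t_pred, R_gt, t_gt)       # sqrt(2): penalized
add_s_value = add_s_metric(model, R_pred, t_pred, R_gt, t_gt)   # 0: symmetry ignored
```

The toy case shows why ADD-S is the metric reported for symmetric objects: the predicted pose is visually indistinguishable from the ground truth, yet ADD still reports a large error.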

  • Result

1. Result on the YCB-Video Dataset

[Figure: results on the YCB-Video dataset]

2. Result on the LineMOD Dataset

[Figure: results on the LineMOD dataset]

3. Result on the Occlusion LineMOD Dataset

[Figure: results on the Occlusion LineMOD dataset]


  • Limitations and Future Work