FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Architecture

  • Main Idea

    • Focus on category-level 6D pose and size estimation from a monocular RGB-D image.

    • Design a box-cage based online 3D deformation mechanism for training-set augmentation, which increases the generalization ability of FS-Net and saves hardware resources.

    • The method is shape-feature-based.

    • Runs at 20 FPS (close to real time).

    • FS-Net can efficiently extract category-level pose features with less training data.

    • Challenge of category-level 6D pose estimation:

      • Variation of object shape and color within the same category.

  • Contribution

    1. Propose a fast shape-based network [FS-Net] to estimate category-level 6D object size and pose. Due to the efficient category-level pose feature extraction, the framework runs at 20 FPS on a GTX 1080 Ti GPU.

    2. Propose a 3DGC autoencoder to reconstruct the observed points for latent orientation feature learning, and design a decoupled rotation mechanism to fully decode the orientation information. This decoupled mechanism naturally handles objects with circular symmetry (see the sketch after this list).

    3. Propose a novel box-cage based 3D deformation mechanism to augment the training data. With this mechanism, the pose accuracy of FS-Net is improved by 7.7%.
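
    A minimal NumPy sketch of the decoupled-rotation idea, assuming the network predicts two axis vectors (the paper's "green" and "red" vectors); the Gram-Schmidt assembly and function name below are my own illustration, not the authors' code:

    ```python
    import numpy as np

    def rotation_from_decoupled_vectors(green, red):
        """Assemble a rotation matrix from two predicted axis vectors."""
        g = green / np.linalg.norm(green)   # unit first axis
        r = red - np.dot(red, g) * g        # Gram-Schmidt: remove component along g
        r = r / np.linalg.norm(r)
        b = np.cross(g, r)                  # third axis completes a right-handed frame
        return np.stack([g, r, b], axis=1)  # columns are the rotated basis axes

    # For an object with circular symmetry about the green axis, only `green`
    # needs supervision; any perpendicular `red` gives an equally valid pose.
    R = rotation_from_decoupled_vectors(np.array([0.0, 1.0, 0.1]),
                                        np.array([1.0, 0.0, 0.0]))
    ```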


  • Model

    1. Use YOLOv3 on the RGB image to detect the 2D object location.

    2. Use a 3DGC autoencoder to perform 3D segmentation and observed-points reconstruction; through this process, the latent feature learns orientation information. A novel decoupled rotation mechanism then decodes the orientation.

    3. Use PointNet to estimate the translation and object size.
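
    Steps 1 and 2 are connected by lifting the depth pixels inside the detected 2D box into a camera-frame point cloud. A minimal sketch of that standard pinhole back-projection, assuming intrinsics fx, fy, cx, cy (the function and parameter names are mine, not the authors'):

    ```python
    import numpy as np

    def backproject_depth(depth, bbox, fx, fy, cx, cy):
        """Lift depth pixels inside a 2D detection box to camera-frame 3D points.

        depth : (H, W) depth map in meters
        bbox  : (x0, y0, x1, y1) detection box in pixel coordinates
        """
        x0, y0, x1, y1 = bbox
        us, vs = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
        z = depth[y0:y1, x0:x1]
        valid = z > 0                          # drop missing depth readings
        u, v, z = us[valid], vs[valid], z[valid]
        x = (u - cx) * z / fx                  # pinhole back-projection
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)     # (N, 3) point cloud

    # Example: a synthetic 64x64 depth map and a detection box
    depth = np.full((64, 64), 0.8)
    pts = backproject_depth(depth, (10, 10, 50, 50),
                            fx=600.0, fy=600.0, cx=32.0, cy=32.0)
    ```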

    Note: during training, the box-cage based 3D deformation mechanism augments the data online to increase the generalization ability of FS-Net (a simplified sketch follows).
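
    A simplified sketch of the idea: the object's 3D bounding box acts as a cage, and stretching the cage deforms the enclosed points while the size label is updated to match. The paper deforms the cage faces more generally; the per-axis scaling and the names below are my own simplification:

    ```python
    import numpy as np

    def box_cage_deform(points, size, scale_range=(0.8, 1.2), rng=None):
        """Online shape augmentation in the spirit of box-cage 3D deformation.

        points : (N, 3) object points in the canonical frame
        size   : (3,) bounding-box extent label
        """
        rng = np.random.default_rng() if rng is None else rng
        s = rng.uniform(*scale_range, size=3)  # independent stretch per cage axis
        deformed = points * s                  # points follow the stretched cage
        new_size = size * s                    # the size label must be updated too
        return deformed, new_size

    # Example: augment a random object point cloud of 1024 points
    pts = np.random.randn(1024, 3) * 0.05
    new_pts, new_size = box_cage_deform(pts, size=np.array([0.1, 0.2, 0.1]))
    ```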



  • Data and Metrics

    • Dataset

      • NOCS
      • LINEMOD
    • Evaluation Metrics

      • Category-Level Pose Estimation

        • 3D IoU at 25%, 50%, and 75% thresholds
        • n° m cm (rotation error < n degrees and translation error < m cm)
      • Instance-Level Pose Estimation

        • ADD(-S)
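
    The n° m cm criterion is straightforward to state in code. A minimal sketch, assuming translations are expressed in centimeters (the helper names are mine):

    ```python
    import numpy as np

    def rotation_error_deg(R_pred, R_gt):
        """Geodesic angle between two rotation matrices, in degrees."""
        cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    def n_deg_m_cm(R_pred, t_pred, R_gt, t_gt, n=5.0, m=5.0):
        """Pose counts as correct if rotation error < n degrees AND
        translation error < m cm."""
        return (rotation_error_deg(R_pred, R_gt) < n
                and np.linalg.norm(t_pred - t_gt) < m)

    # Example: a 3-degree rotation about z plus a 1 cm offset passes 5° 5 cm
    theta = np.radians(3.0)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    print(n_deg_m_cm(Rz, np.array([1.0, 0.0, 0.0]), np.eye(3), np.zeros(3)))  # True
    ```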

  • Result

    1. FS-Net is robust to the size of the training dataset and has good category-level feature extraction ability. Even with 20% of the training data, FS-Net still achieves state-of-the-art performance.

    2. 3D deformation mechanism significantly improves the robustness and performance of FS-Net.

    3. The average reconstruction error of FS-Net is 0.86, which is 72.9% and 18.9% lower than that of [Shape-Prior] and [CASS], respectively.

    4. FS-Net achieves better pose estimation results via a simpler reconstruction task (reconstructing only the observed points rather than the complete object model).

    5. For category-level pose estimation, FS-Net outperforms other state-of-the-art methods in both accuracy and speed.

    6. Specifically, on the 3D detection metric IoU 50, FS-Net outperforms the previous best method, NOCS, by 11.7%, while running 4 times faster.

Note: for instance-level pose estimation, FS-Net achieves comparable results in both accuracy (97.6%) and speed (20 FPS).


  • Limitation and Future Work

    • Future work
      • Adopt 3D object detection techniques to directly detect objects from point clouds.