Related Work on 6D Object Pose Estimation
-
Instance-level 6D pose estimation:
-
The training set and the test set contain the same objects; huge progress has been made in recent years.
- PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation.
- Unified Framework for Multi-View Multi-Class Object Pose Estimation.
- Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation.
- A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth.
- PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes.
- PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation.
- CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation.
- DPOD: 6D Pose Object Detector and Refiner.
- G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features.
- DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion.
Note: (Ours) refers to the FS-Net model.
-
-
Category-level 6D pose estimation:
-
The major challenge of category-level pose estimation is the intra-class object variation, including shape and color variation.
-
A common strategy maps the different objects in the same category into a uniform model via RGB features or fused RGB-D features.
-
Wang et al. [Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation] trained a modified Mask R-CNN to predict the normalized object coordinate space (NOCS) map of different objects from RGB features, then computed the pose from the observed depth and the NOCS map with the Umeyama algorithm.
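The Umeyama step above aligns the NOCS-space points to the observed depth points with a similarity transform. A minimal numpy sketch of the closed-form Umeyama (1991) solution, assuming noise-free paired points (the function name and interface are my own):

```python
import numpy as np

def umeyama(src, dst):
    """Estimate similarity transform (scale s, rotation R, translation t)
    with dst ~= s * R @ src + t, from paired 3D point sets of shape (N, 3)."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                           # guard against reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)    # variance of source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In the NOCS pipeline, the recovered scale also gives the object size, since the NOCS map lives in a normalized unit space.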
-
Chen et al. [Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation] proposed to learn a canonical shape space (CASS) to tackle intra-class shape variations with fused RGB-D features.
-
Tian et al. [Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation] trained a network to predict the NOCS map of different objects, using a uniform shape prior learned from a shape collection together with fused RGB-D features.
-
Chen et al. [6-PACK: Category-Level 6D Pose Tracker with Anchor-Based Keypoints].
Note: (Ours) refers to the FS-Net model.
Note: Although these methods achieved state-of-the-art performance, two issues remain. (Read the FS-Net paper.)
-
-
-
FS-Net:
-
Instance-Level Pose Estimation
-
Template-matching based.
-
Render synthetic image patches from viewpoints distributed on a sphere around the 3D model of the object and store them as a database of templates. The input image is then searched against this template database sequentially, in a sliding-window fashion.
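The sliding-window search step can be sketched as follows: a toy numpy example scoring every window of a grayscale image against one template with normalized cross-correlation (real pipelines repeat this for one rendered template per viewpoint; the function name is my own):

```python
import numpy as np

def match_template(image, template):
    """Exhaustive sliding-window search: score every window of `image`
    against `template` with normalized cross-correlation (NCC) and
    return the top-left corner of the best-scoring window."""
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t) + 1e-12
    best, best_pos = -np.inf, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            win = image[y:y + h, x:x + w]
            wc = win - win.mean()
            score = (wc * t).sum() / (np.linalg.norm(wc) * t_norm + 1e-12)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

This brute-force scan is quadratic in image size per template, which is why practical template-matching systems rely on cascades or efficient response maps.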
-
These methods align the template to the observed image or depth map via hand-crafted or deep-learning feature descriptors. Since they need the 3D object model to generate the template pool, their application to category-level 6D pose estimation is limited.
-
-
Correspondence-based.
-
Models are trained to establish 2D-3D or 3D-3D correspondences. The pose is then recovered by solving a perspective-n-point (PnP) problem for 2D-3D correspondences, or an SVD problem for 3D-3D correspondences.
-
(2D-3D) correspondences:
-
(3D-3D) correspondences:
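For the 3D-3D case, the rigid pose has a closed-form SVD solution (the Kabsch / orthogonal Procrustes alignment referred to above); a minimal numpy sketch, assuming noise-free paired points (2D-3D correspondences instead go through a PnP solver such as OpenCV's `cv2.solvePnP`):

```python
import numpy as np

def kabsch(P, Q):
    """Rigid pose (R, t) with Q ~= R @ P + t from paired 3D-3D
    correspondences of shape (N, 3), via SVD of the cross-covariance."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])           # guard against reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

Unlike the Umeyama variant used with NOCS maps, this solves for rotation and translation only; with noisy network-predicted correspondences it is typically wrapped in RANSAC.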
-
-
Voting-based.
Note: The generation of canonical 3D keypoints relies on the known 3D object model, which is not available when predicting a category-level pose.
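One common way voting pipelines (e.g., PVNet-style methods) define these canonical keypoints is farthest point sampling on the model's point cloud; a small numpy sketch (function name mine), which makes the dependence on the known model explicit:

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedily select k well-spread keypoints from a model point
    cloud of shape (N, 3): repeatedly pick the point farthest from
    all points chosen so far."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())          # farthest from the chosen set
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[idx]
```

Without a model point cloud to sample from, there is no canonical keypoint set, which is exactly the category-level limitation the note describes.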
-
-
Category-Level Pose Estimation
-
[Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation]. Maps the different objects in the same category to a NOCS map, then uses semantic segmentation and the known camera parameters to obtain the observed point cloud. The 6D pose and size are calculated by the Umeyama algorithm from the NOCS map and the observed points.
-
[Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation]. Estimates the 6D pose by learning a canonical shape space with dense-fusion features.
Note: Since the RGB feature is sensitive to color variation, the performance of these methods in category-level pose estimation is limited.
-
-
3D Data Augmentation
-
Online data augmentation techniques such as [translation, random flipping, shifting, scaling, and rotation] are applied to the original point clouds during training.
- Problem: these operations cannot change the shape properties of the object.
-
Part-aware augmentation operates on the semantic parts of the 3D object with five manipulations: [dropout, swap, mix, sparing, and noise injection].
- Problem: how to decide the semantic parts is ambiguous.
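As an illustration of the dropout manipulation only, a toy sketch that removes one semantic part from a labeled point cloud; the per-point `part_labels` array is assumed given, and obtaining it is exactly the ambiguous part-decision step noted above:

```python
import numpy as np

def part_dropout(pc, part_labels, rng):
    """Part-aware dropout: remove all points belonging to one
    randomly chosen semantic part. pc: (N, 3) points;
    part_labels: (N,) integer part id per point (assumed given)."""
    parts = np.unique(part_labels)
    drop = rng.choice(parts)              # pick one part to remove
    return pc[part_labels != drop]
```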
-
The box-cage based 3D data augmentation mechanism from [FS-Net] can generate various shape variants while avoiding the semantic-part decision procedure.
- Advantages:
- Does not require an extra training process to obtain the box-cage of the object.
- Does not need the target shape to learn the deformation operation.
- The mechanism is entirely online, which saves training time and storage space.
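A rough sketch of the box-cage idea: anchor the cloud to its bounding box and stretch each axis independently, which changes the object's shape (unlike rigid augmentation) with no learning and no stored variants. Simple per-axis scaling here is my stand-in for FS-Net's exact cage manipulation:

```python
import numpy as np

def box_cage_deform(pc, rng, low=0.8, high=1.2):
    """Deform an (N, 3) point cloud by stretching it independently
    along each axis of its axis-aligned bounding box -- an online,
    training-free way to create shape variants."""
    lo, hi = pc.min(0), pc.max(0)
    center = (lo + hi) / 2                    # bounding-box center
    scales = rng.uniform(low, high, size=3)   # independent per-axis stretch
    return (pc - center) * scales + center    # entirely online, nothing stored
```

Because the three axes are scaled by different random factors, distance ratios inside the cloud change, i.e., the augmentation produces genuine shape variants rather than resized copies.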
-
-