6D Pose Estimation DATASETS
Instance Vs. Category
- Instance:
- Instance-level tasks usually treat each object as a class and need an exact 3D model of each object.
- These datasets treat each object as a class, and the objects in their test sets are seen during the training phase.
- Challenges of Instances:
- Viewpoint variability (VP)
- Test scenes containing the target objects are sampled to produce sequences that are widely distributed over the pose space: [0, 360] degrees in roll, [−180, 180] degrees in pitch, and [−180, 180] degrees in yaw.
- As the pose space gets wider, more training data is needed for a 6D estimator to achieve reasonable viewpoint coverage of the target object.
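To make the size of this pose space concrete, the following is a minimal sketch that draws random viewpoints from the roll, pitch, and yaw ranges quoted above and converts them to rotation matrices. The function names and the Z-Y-X composition order are assumptions made for illustration, not something the datasets prescribe. Note also that sampling Euler angles uniformly does not give a uniform distribution over rotations; it only illustrates the angular ranges.

```python
import numpy as np

def euler_to_rotation(roll_deg, pitch_deg, yaw_deg):
    """Build a 3x3 rotation matrix as Rz(yaw) @ Ry(pitch) @ Rx(roll).

    The Z-Y-X composition order is an assumed convention.
    """
    r, p, y = np.radians([roll_deg, pitch_deg, yaw_deg])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(r), -np.sin(r)],
                   [0.0, np.sin(r), np.cos(r)]])
    ry = np.array([[np.cos(p), 0.0, np.sin(p)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(p), 0.0, np.cos(p)]])
    rz = np.array([[np.cos(y), -np.sin(y), 0.0],
                   [np.sin(y), np.cos(y), 0.0],
                   [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def sample_viewpoints(n, seed=0):
    """Draw n rotations with roll in [0, 360] and pitch, yaw in [-180, 180] degrees."""
    rng = np.random.default_rng(seed)
    roll = rng.uniform(0.0, 360.0, n)
    pitch = rng.uniform(-180.0, 180.0, n)
    yaw = rng.uniform(-180.0, 180.0, n)
    return np.stack([euler_to_rotation(*angles) for angles in zip(roll, pitch, yaw)])

rotations = sample_viewpoints(1000)  # array of shape (1000, 3, 3)
```

Even a thousand samples cover this three-dimensional angular range only sparsely, which is why wider pose spaces demand more training data.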
- Texture-less objects (TL)
- Texture is important information for RGB cameras, which capture and represent a scene with three basic color channels: red, green, and blue.
- An object of interest can easily be distinguished from the background or from other instances in the scene when it is sufficiently textured.
- Surface texture makes it possible to define discriminative features that represent the object of interest.
- When objects are texture-less, this discriminative property disappears, making methods strongly dependent on the depth channel to estimate 6D object poses.
- Occlusion (O)
- One of the most common challenges observed in 6D object pose estimation.
- Occlusion occurs when an object of interest is partly or completely blocked by other objects existing in the scene.
- Naive occlusion is handled by either modelling it during an off-line training phase or engineering a part-based approach that infers the 6D pose of the object of interest from its unoccluded (occlusion-free) parts.
- Severe occlusion gives rise to false-positive estimations, degrading the methods' performance.
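One simple way to quantify how much of an object is blocked is the fraction of its silhouette that is hidden. The sketch below assumes that both the full (unoccluded) silhouette mask and the visible mask of the object are available, for example from rendering the 3D model at the ground-truth pose. The function name `occlusion_ratio` is hypothetical, and these notes do not specify any threshold separating occlusion from severe occlusion.

```python
import numpy as np

def occlusion_ratio(full_mask, visible_mask):
    """Return the fraction of the object's silhouette that is hidden (0 = fully visible)."""
    full = np.asarray(full_mask, dtype=bool)
    # Only count visible pixels that actually belong to the object.
    visible = np.asarray(visible_mask, dtype=bool) & full
    total = full.sum()
    if total == 0:
        raise ValueError("object has no pixels in the full mask")
    return 1.0 - visible.sum() / total

# Toy example: an 8-pixel silhouette whose right half is hidden.
full = np.zeros((4, 4), dtype=bool)
full[1:3, :] = True
visible = full.copy()
visible[1:3, 2:] = False
print(occlusion_ratio(full, visible))  # 0.5
```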
- Severe Occlusion (SO)
- Clutter (C)
- Clutter is a challenge mainly associated with complicated image backgrounds, in which objects of interest cannot even be detected by the naked eye.
- Several methods handle this challenge by training the algorithms with cluttered background images.
- However, relying on background images reduces the methods' generalization capability and makes them data-dependent.
- Similar-looking distractors (SLD)
- Similar-looking distractors, along with similar-looking object classes, are among the biggest challenges in 6D object pose recovery.
- When the similarity is in the depth channel, 6D pose estimators are strongly confused because shape features are no longer discriminative.
- This lack of discriminative shape information can be compensated for by RGB, provided there is no color similarity.
- Multiple Instance (IM)
- Bin Picking (BP)
- Category:
- Category-level tasks target generalization to unseen objects without exact 3D models.
- These datasets treat objects from the same category as a class, so their test sets, which contain unseen objects, evaluate generalization ability.
- Challenges of Category:
- Distribution shift between source and target
- Any 6D pose estimator working at the category level is tested on instances in the target domain.
- Because the objects in the target domain differ from those in the source domain, there is a shift between the marginal probability distributions of the two domains.
- Additionally, this distribution shift itself varies, since the instances in the target domain are unseen by the 6D pose estimator.
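In standard domain-adaptation notation, the shift in marginals can be stated as follows. The symbols are introduced here only for illustration and do not come from these notes: $X$ denotes the input (RGB-D observations of objects), and $P_S$ and $P_T$ are the source and target distributions.

```latex
P_S(X) \neq P_T(X),
\qquad
\{x_i^{S}\}_{i=1}^{n_S} \sim P_S \ \text{(training instances)},
\qquad
\{x_j^{T}\}_{j=1}^{n_T} \sim P_T \ \text{(unseen test instances)}
```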
- High intra-class variations
- Despite the fact that instances from the same category typically have similar physical properties, they are not exactly the same.
- Texture and color variations are seen in the RGB channels, while geometry and shape discrepancies are observed in the depth channel.
- Geometric dissimilarities relate to the scale and dimensions of the instances; shape-wise, instances look different when they have extra parts beyond the common ones.
- Category-level 6D object pose estimators handle intra-class variation during training, using data from instances in the source domain.
Note: instance-level challenges can also be observed at the category level, but not the other way around.
NOCS: Normalized Object Coordinate Space. A NOCS map renders the object's local XYZ coordinates as RGB, with the alpha channel indicating whether the object is present at each pixel.
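The XYZ-to-RGB encoding can be sketched as below. This is an illustrative version only: it assumes the object's points are normalized so that the diagonal of their tight bounding box has length 1 and the box is centered at (0.5, 0.5, 0.5), and the helper names `normalize_to_nocs` and `nocs_to_rgba` are hypothetical.

```python
import numpy as np

def normalize_to_nocs(points):
    """Map object-space points (N, 3) into the unit cube, centered at 0.5.

    Assumes normalization by the tight bounding-box diagonal.
    """
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    diagonal = np.linalg.norm(hi - lo)
    if diagonal == 0:
        raise ValueError("degenerate object: all points coincide")
    center = (lo + hi) / 2.0
    return (pts - center) / diagonal + 0.5

def nocs_to_rgba(nocs_xyz, mask):
    """Encode per-pixel NOCS coordinates (H, W, 3) as an 8-bit RGBA image.

    X, Y, Z go to R, G, B; alpha is 255 where the object is present, else 0.
    """
    nocs = np.asarray(nocs_xyz, dtype=float)
    present = np.asarray(mask, dtype=bool)
    rgba = np.zeros(present.shape + (4,), dtype=np.uint8)
    rgba[..., :3] = np.round(np.clip(nocs, 0.0, 1.0) * 255).astype(np.uint8)
    rgba[..., 3] = 255
    rgba[~present] = 0  # background pixels carry no coordinates and no alpha
    return rgba
```

Because every instance of a category is mapped into the same normalized cube, the same color corresponds to roughly the same object part across instances, which is what lets a category-level estimator relate unseen objects to those seen in training.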