6D Pose Estimation Datasets

Instance vs. Category

  • Instance:
  • Instance-level tasks usually treat each object as a class and need an exact 3D model of each object.
  • In these datasets, the objects in the test set have already been seen during the training phase.
  • Challenges of Instances:

    • Viewpoint variability (VP)
    • Test scenes containing the target objects are sampled to produce sequences widely distributed over the pose space: [0°, 360°], [−180°, 180°] and [−180°, 180°] in roll, pitch, and yaw, respectively.
    • The wider the pose space, the more training data a 6D estimator needs to achieve reasonable viewpoint coverage of the target object.
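
To make the data-versus-coverage trade-off concrete, the sketch below tiles (roll, pitch, yaw) into fixed angular bins and measures what fraction of bins a fixed number of uniformly sampled training poses reaches. The bin size, the narrow comparison ranges, and the helper names (num_viewpoint_bins, coverage) are hypothetical illustrations; only the full ranges come from the notes.

```python
import math
import random

def num_viewpoint_bins(ranges_deg, bin_deg):
    """Number of angular bins needed to tile the given (lo, hi) ranges."""
    count = 1
    for lo, hi in ranges_deg:
        count *= math.ceil((hi - lo) / bin_deg)
    return count

def coverage(ranges_deg, bin_deg, n_samples, seed=0):
    """Fraction of bins hit by n_samples poses drawn uniformly from the ranges."""
    rng = random.Random(seed)
    sizes = [math.ceil((hi - lo) / bin_deg) for lo, hi in ranges_deg]
    hit = set()
    for _ in range(n_samples):
        key = []
        for (lo, hi), n in zip(ranges_deg, sizes):
            angle = rng.uniform(lo, hi)
            # Clamp so that an angle exactly equal to hi falls into the last bin.
            key.append(min(int((angle - lo) // bin_deg), n - 1))
        hit.add(tuple(key))
    return len(hit) / num_viewpoint_bins(ranges_deg, bin_deg)

FULL = [(0, 360), (-180, 180), (-180, 180)]  # roll, pitch, yaw ranges from the notes
NARROW = [(0, 90), (0, 90), (0, 90)]         # hypothetical restricted pose space

print(num_viewpoint_bins(FULL, 10))    # 46656
print(num_viewpoint_bins(NARROW, 10))  # 729
print(coverage(FULL, 10, 5000), coverage(NARROW, 10, 5000))
```

With the same 5000 samples, the narrow space is almost fully covered while the full space is only sparsely covered, which is why wider pose distributions call for more training data.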

    • Texture-less objects (TL)

    • Texture is important information for RGB cameras, which capture and represent a scene with three basic colors (channels): red, green, and blue.
    • If an object of interest is sufficiently textured, it can easily be distinguished from the background or from any other instances in the scene.
    • Surface texture makes it possible to define discriminative features that represent the object of interest.
    • When objects are texture-less, this discriminative property disappears, making methods strongly dependent on the depth channel to estimate the 6D poses of objects.
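
One way to see why texture-less surfaces offer few discriminative features is to measure local intensity change. The sketch below uses a hypothetical "texture score" (mean absolute horizontal plus vertical gradient over a grayscale patch); it is an illustration of the idea, not a feature detector used by any particular pose estimator.

```python
def texture_score(img):
    """Mean of |dx| + |dy| over a grayscale patch given as a 2D list."""
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = abs(img[y][x + 1] - img[y][x])
            dy = abs(img[y + 1][x] - img[y][x])
            total += dx + dy
    return total / ((h - 1) * (w - 1))

uniform = [[128] * 8 for _ in range(8)]  # texture-less surface
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]  # strong texture

print(texture_score(uniform))  # 0.0
print(texture_score(checker))  # 510.0
```

A near-zero score means gradient-based RGB features have little to anchor on, so depth becomes the main cue.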

    • Occlusion (O)

    • It is one of the most common challenges in 6D object pose estimation.
    • Occlusion occurs when an object of interest is partly or completely blocked by other objects existing in the scene.
    • Mild occlusion is handled either by modelling it during an offline training phase or by engineering a part-based approach that infers the 6D pose of the object of interest from its unoccluded (occlusion-free) parts.
    • Severe occlusion gives rise to false-positive estimates, degrading the methods’ performance.
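
The part-based idea can be sketched in a heavily simplified form: each visible part votes for the object's 2D center (its pixel position plus a predicted offset), occluded parts are skipped, and a per-axis median suppresses outlier votes. This is an assumption-laden illustration. Real part-based methods recover the full 6D pose (for example via keypoints and PnP), and the function name and data below are hypothetical.

```python
from statistics import median

def center_from_visible_parts(parts):
    """parts: list of ((px, py), (ox, oy), visible). Returns the median center vote."""
    votes = [(px + ox, py + oy) for (px, py), (ox, oy), visible in parts if visible]
    if not votes:
        return None  # fully occluded: no part can vote
    xs, ys = zip(*votes)
    return median(xs), median(ys)

parts = [
    ((30, 20), (20, 20), True),
    ((60, 50), (-10, -10), True),
    ((45, 35), (5, 5), True),
    ((70, 30), (-19, 11), True),
    ((0, 0), (5, 5), True),         # noisy vote from a visible but confusing part
    ((10, 10), (100, 100), False),  # occluded part, excluded from voting
]
print(center_from_visible_parts(parts))  # (50, 40)
```

The median keeps the estimate stable despite one bad vote. When occlusion is severe, few reliable votes remain, and the bad ones start to dominate, which produces the false positives described above.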

    • Severe Occlusion (SO)

    • Clutter (C)

    • Clutter is a challenge mainly associated with complicated image backgrounds, in which objects of interest may not be detectable even by the naked eye.
    • Several methods handle this challenge by training the algorithms on images with cluttered backgrounds.
    • However, relying on background images weakens the generalization capability of these methods, making them data-dependent.
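
Training with cluttered backgrounds is often done by cut-and-paste compositing: a masked object crop is pasted onto a background image at a random location. The sketch below assumes images are plain 2D lists of pixel values and uses random noise as a stand-in for a real cluttered photo. The function name is hypothetical.

```python
import random

def paste_on_background(obj, mask, background, rng):
    """Paste the masked pixels of obj onto a copy of background at a random offset."""
    bh, bw = len(background), len(background[0])
    oh, ow = len(obj), len(obj[0])
    top = rng.randint(0, bh - oh)   # randint is inclusive, so the crop always fits
    left = rng.randint(0, bw - ow)
    out = [row[:] for row in background]  # copy so the background can be reused
    for y in range(oh):
        for x in range(ow):
            if mask[y][x]:
                out[top + y][left + x] = obj[y][x]
    return out, (top, left)

rng = random.Random(0)
background = [[rng.randint(0, 255) for _ in range(16)] for _ in range(16)]  # stand-in clutter
obj = [[200] * 4 for _ in range(4)]
mask = [[1, 1, 1, 0]] * 4  # last column is not part of the object
composite, (top, left) = paste_on_background(obj, mask, background, rng)
```

A model trained on such composites learns what the chosen backgrounds look like, which is exactly the data dependence noted above.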

    • Similar-looking distractors (SLD)

    • Similar-looking distractors, together with similar-looking object classes, are among the biggest challenges in 6D object pose recovery.
    • When the similarity lies in the depth channel, 6D pose estimators are strongly confused because discriminative shape features cannot be selected.
    • This lack of shape cues can be compensated by RGB if there is no color similarity.
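
The compensation described above can be illustrated with nearest-template matching. Two hypothetical templates share an identical depth-based shape descriptor but differ in color, so shape distance alone ties, while adding a weighted color term resolves the match. Descriptors, weights, and names are illustrative assumptions (math.dist needs Python 3.8+).

```python
import math

# Hypothetical templates: identical shape descriptors (depth), different colors (RGB).
TEMPLATES = {
    "mug_red": {"shape": (1.0, 0.5), "color": (200, 30, 30)},
    "mug_blue": {"shape": (1.0, 0.5), "color": (30, 30, 200)},
}

def rank(query, w_shape=1.0, w_color=0.0):
    """Templates sorted by weighted shape + color distance (lower is better)."""
    scored = []
    for name, t in TEMPLATES.items():
        d = w_shape * math.dist(query["shape"], t["shape"])
        d += w_color * math.dist(query["color"], t["color"]) / 255.0  # scale colors to [0, 1]
        scored.append((d, name))
    return sorted(scored)

query = {"shape": (1.0, 0.5), "color": (190, 40, 35)}
depth_only = rank(query)
print(depth_only[0][0] == depth_only[1][0])  # True: shape alone cannot separate them
print(rank(query, w_color=1.0)[0][1])        # mug_red
```

If the two templates also had the same color, the color term would tie as well, which is the hardest case for similar-looking distractors.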

    • Multiple Instance (IM)

    • Bin Picking (BP)

  • Category:

  • Category-level tasks aim to generalize to unseen objects without exact 3D models.
  • These datasets treat all objects of the same category as one class, so their test sets, which contain unseen objects, focus on evaluating generalization ability.
  • Challenges of Categories:

    • Distribution shift between source and target
    • Any category-level 6D pose estimator is tested on instances in the target domain.
    • Since the objects in the target domain differ from those in the source domain, there is a shift between the marginal probability distributions of the two domains.
    • Moreover, this distribution shift itself varies, as the instances in the target domain are unseen by the 6D pose estimator.
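
A shift between marginal distributions can be quantified from samples. The sketch below uses the squared maximum mean discrepancy (MMD) with a linear kernel, which reduces to the squared distance between the feature means of the two domains. The 2D features (for example, normalized object dimensions) and the data are hypothetical, and a linear kernel only detects mean shifts; richer kernels such as a Gaussian would capture more.

```python
def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    dim = len(source[0])
    mu_s = [sum(p[i] for p in source) / len(source) for i in range(dim)]
    mu_t = [sum(p[i] for p in target) / len(target) for i in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

source = [(0, 0), (2, 2)]  # hypothetical source-domain features
target = [(3, 1), (5, 3)]  # hypothetical target-domain features

print(linear_mmd2(source, source))  # 0.0
print(linear_mmd2(source, target))  # 10.0
```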

    • High intra-class variations

    • Although instances from the same category typically have similar physical properties, they are not identical.
    • Texture and color variations appear in the RGB channels, while geometry and shape discrepancies appear in the depth channel.
    • Geometric dissimilarities relate to the scale and dimensions of the instances; in terms of shape, instances look different when they physically have extra parts beyond the common ones.
    • Category-level 6D object pose estimators handle intra-class variation during training, using data from the instances belonging to the source domain.
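
One common way to use source-domain instances against intra-class variation is to build a category-level prior and describe each instance as a deviation from it. The sketch below does this only for bounding-box extents in millimetres. The instances, values, and helper names are hypothetical, and this is not claimed to be the specific method the notes refer to.

```python
# Hypothetical source-domain instances of one category: extents (w, h, d) in mm.
SOURCE_MUGS = {
    "mug_a": (100, 120, 100),
    "mug_b": (80, 100, 80),
}

def mean_extent(instances):
    """Category prior: per-axis mean extent over the source instances."""
    vals = list(instances.values())
    return tuple(sum(v[i] for v in vals) / len(vals) for i in range(3))

def residuals(instances, prior):
    """Per-instance deviation from the category prior."""
    return {name: tuple(e - p for e, p in zip(ext, prior)) for name, ext in instances.items()}

prior = mean_extent(SOURCE_MUGS)
print(prior)                          # (90.0, 110.0, 90.0)
print(residuals(SOURCE_MUGS, prior))  # {'mug_a': (10.0, 10.0, 10.0), 'mug_b': (-10.0, -10.0, -10.0)}
```

An estimator trained this way predicts a residual around the prior rather than an absolute size, but the prior only reflects the source instances it was computed from.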

Note: instance-level challenges can also be observed at the category level, but not the other way round.
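
The practical difference between the instance-level and category-level settings described above lies in how the test objects relate to the training objects. The sketch below splits the same hypothetical records both ways. The record layout (category, object_id, frame_id) and the frame-based split rule are assumptions for illustration.

```python
# Hypothetical records: (category, object_id, frame_id).
RECORDS = [
    ("mug", "mug_a", 0), ("mug", "mug_a", 1),
    ("mug", "mug_b", 0), ("mug", "mug_b", 1),
    ("can", "can_a", 0), ("can", "can_a", 1),
    ("can", "can_b", 0), ("can", "can_b", 1),
]

def instance_level_split(records):
    """Same objects in train and test; only the frames differ."""
    train = [r for r in records if r[2] == 0]
    test = [r for r in records if r[2] == 1]
    return train, test

def category_level_split(records, held_out):
    """Whole objects are held out, so test objects are unseen during training."""
    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test

def object_ids(rs):
    return {r[1] for r in rs}

tr, te = instance_level_split(RECORDS)
print(object_ids(te) <= object_ids(tr))  # True: every test object was seen
tr, te = category_level_split(RECORDS, held_out={"mug_b", "can_b"})
print(object_ids(te) & object_ids(tr))   # set(): no test object was seen
```

In the category-level split, every test category still appears in training, but no test instance does.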

NOCS (Normalized Object Coordinate Space): renders the object’s local XYZ coordinates as RGB, with the alpha channel indicating whether the object is present at that pixel.
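
A minimal sketch of the NOCS encoding, assuming the usual convention: the object's tight bounding box is centered in a unit cube and scaled so its diagonal has length 1, and each normalized XYZ coordinate is then stored as an RGB value, with alpha 0 for background pixels. The helper names are hypothetical, and degenerate inputs (a zero-size bounding box) are not handled.

```python
import math

def to_nocs(points):
    """Map object-frame 3D points into [0, 1]^3: center the tight bounding box
    at (0.5, 0.5, 0.5) and scale its diagonal to length 1."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    diag = math.dist(lo, hi)
    return [tuple((c - m) / diag + 0.5 for c, m in zip(p, center)) for p in points]

def nocs_rgba(nocs_point):
    """Encode a NOCS coordinate as an 8-bit RGBA pixel; None marks background."""
    if nocs_point is None:
        return (0, 0, 0, 0)
    r, g, b = (round(255 * c) for c in nocs_point)
    return (r, g, b, 255)

pts = [(0.0, 0.0, 0.0), (2.0, 2.0, 1.0), (1.0, 1.0, 0.5)]
nocs = to_nocs(pts)
print(nocs[2])          # (0.5, 0.5, 0.5)
print(nocs_rgba(None))  # (0, 0, 0, 0)
```

Because every instance is normalized the same way, objects of different sizes within a category share one coordinate space, which is what lets a single network predict NOCS maps for unseen instances.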