Category-Level 3D Correspondence in Camera Space via Morphable Object Priors
:::info Stub — Full Engineering Breakdown Coming This paper was featured on Hugging Face Daily Papers on 2026-05-27 with 6 upvotes. A full breakdown with production viability rating, implementation notes, and honest limitations is being written. Subscribe to AI Letters → :::
| Authors | Leonhard Sommer et al. |
| Year | 2026 |
| HF Upvotes | 6 |
| arXiv | 2605.28257 |
| Download | |
| HF Page | View on Hugging Face |
Abstract
Understanding 3D objects from images is fundamental to robotics and AR/VR applications. While recent work has made progress in category-level pose estimation, current representations fail to capture the fine-grained semantics needed for reasoning about object parts, functions, and interactions. In this work, we study category-level 3D correspondence in camera space -- predicting, from a single image, 3D locations that remain consistent across instances within a category -- and show that it can emerge without explicit correspondence supervision by learning a shared morphable object prior. To enable research in this direction, we introduce HouseCorr3D, the first large-scale benchmark for monocular category-level 3D correspondence with 178k images across 50 household object categories, 280 unique instances, and 3D keypoint annotations directly on CAD models. Crucially, HouseCorr3D provides amodal correspondence labels for occluded regions and explicit symmetry annotations, addressing key limitations of existing datasets. We further propose Morpheus, a method that learns morphable category-level shape priors by disentangling canonical shape, deformation, and object pose. Through this shared canonical grounding, semantically meaningful 3D correspondences in camera space emerge implicitly. These emerging 3D correspondences set a new state of the art on HouseCorr3D, demonstrating that semantic 3D object understanding can arise without direct correspondence supervision. Data and code are publicly available at https://github.com/GenIntel/HouseCorr3D.
Engineering Breakdown
The Problem
While recent work has made progress in category-level pose estimation, current representations fail to capture the fine-grained semantics needed for reasoning about object parts, functions, and interactions.
The Approach
In this work, we study category-level 3D correspondence in camera space -- predicting, from a single image, 3D locations that remain consistent across instances within a category -- and show that it can emerge without explicit correspondence supervision by learning a shared morphable object prior. To enable research in this direction, we introduce HouseCorr3D, the first large-scale benchmark for monocular category-level 3D correspondence with 178k images across 50 household object categories, 280 unique instances, and 3D keypoint annotations directly on CAD models.
Key Results
These emerging 3D correspondences set a new state of the art on HouseCorr3D, demonstrating that semantic 3D object understanding can arise without direct correspondence supervision.
Research Areas
This paper contributes to the following areas of AI/ML engineering:
- Machine learning
- Deep learning
- Neural networks
- Model optimization
- AI systems
- Categorylevel
:::tip Subscribe Get weekly breakdowns of papers like this in AI Letters - the newsletter for engineers building production AI systems. :::
