RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

:::info Stub: Full Engineering Breakdown Coming
This paper was auto-fetched from arXiv on 2026-06-01. A full breakdown with production viability rating, implementation notes, and honest limitations is being written.
:::


| | |
|---|---|
| Authors | Ulrich Prestel et al. |
| Year | 2026 |
| Field | Computer Vision |
| arXiv | 2605.31535 |
| Categories | cs.CV, cs.AI, cs.LG |

Abstract

Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, turning self-supervised NVS into a well-posed single-model scaling problem. A minimal dynamic state, treated as a nuisance factor, absorbs time-varying content and enables stable training on unconstrained real-world video. Importantly, RayDer keeps static-scene NVS as its target task: dynamic content is leveraged purely as scalable supervision, not reconstructed as in dynamic-scene (4D) NVS. Across multiple model sizes and orders of magnitude in data, RayDer exhibits clean power-law scaling with data and compute, and outperforms static-scene data mixtures. On a large number of benchmarks, RayDer achieves strong zero-shot open-set performance competitive with state-of-the-art supervised approaches. Project Page: https://compvis.github.io/rayder

Engineering Breakdown

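The Problem

Video data is abundant, yet self-supervised novel view synthesis (NVS) has been hard to scale. The abstract names two causes: training on realistic videos is brittle, and multi-network system designs have hard-to-predict scaling behavior.
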
The Approach

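RayDer is a unified, feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, which turns self-supervised NVS into a well-posed single-model scaling problem. A minimal dynamic state, treated as a nuisance factor, absorbs time-varying content so that training stays stable on unconstrained real-world video. The target task remains static-scene NVS: dynamic content serves purely as scalable supervision and is not reconstructed, unlike in dynamic-scene (4D) NVS.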
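
To make "self-supervised from video" concrete, here is a minimal sketch of held-out-frame supervision, a common setup in self-supervised NVS: some frames of a clip act as context, one frame is held out as the target, and a photometric loss compares the rendered target with the real frame. The abstract does not describe RayDer's sampling or loss, so everything here is an assumption for illustration. The names `split_context_target` and `photometric_mse` are hypothetical, and plain mean squared error stands in for whatever loss the paper uses.

```python
import random


def split_context_target(num_frames, num_context, rng):
    """Pick context frame indices and one held-out target index from a clip.

    Hypothetical helper: illustrates held-out-frame supervision in general,
    not the sampling strategy used by RayDer.
    """
    if not 1 <= num_context < num_frames:
        raise ValueError("need 1 <= num_context < num_frames")
    # Draw num_context + 1 distinct frame indices; the last one is the target.
    chosen = rng.sample(range(num_frames), num_context + 1)
    target = chosen[-1]
    context = sorted(chosen[:-1])
    return context, target


def photometric_mse(predicted, actual):
    """Mean squared error between two equally sized flat lists of pixel values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Toy usage: an 8-frame clip with 3 context frames and 1 held-out target.
rng = random.Random(0)
context_ids, target_id = split_context_target(8, 3, rng)

# Stand-ins for a rendered target view and the real held-out frame.
rendered = [0.0, 1.0]
real = [0.0, 0.0]
loss = photometric_mse(rendered, real)
```

No camera poses or depth labels appear anywhere in this loop: the only supervision is the held-out frame itself, which is what lets raw video serve as training data.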

Key Results

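Across multiple model sizes and orders of magnitude in data, RayDer exhibits clean power-law scaling with data and compute, and outperforms static-scene data mixtures. On a large number of benchmarks, it achieves strong zero-shot open-set performance competitive with state-of-the-art supervised approaches.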
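
A power-law scaling claim is usually checked by fitting a straight line in log-log space. The sketch below fits `y = a * x**(-b)` by least squares on `(log x, log y)` and extrapolates one decade further. The function name `fit_power_law` is hypothetical, and the data points are synthetic values generated from a known curve. They are not results from the paper, which the abstract does not quantify.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares on (log x, log y).

    Returns (a, b). All xs and ys must be positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0.0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = intercept + slope * log x, so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


# Synthetic, made-up points generated from a = 5.0, b = 0.3 (not paper results).
data_sizes = [1e4, 1e5, 1e6, 1e7]
losses = [5.0 * n ** -0.3 for n in data_sizes]

a, b = fit_power_law(data_sizes, losses)

# Extrapolate the fitted curve to a dataset ten times larger than any point used.
predicted_loss = a * 1e8 ** -b
```

On real measurements the points will not lie exactly on a line, so the residuals of the log-log fit are a quick indicator of how "clean" the scaling actually is.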

Research Areas

This paper contributes to the following areas of AI/ML engineering:

Image recognition
Object detection
Vision transformers
Convolutional networks
Multimodal learning
Self-supervised learning
