Nvidia wants to scale robot simulation training with Lyra 2.0

Key Points

- Nvidia researchers present Lyra 2.0, a system that generates coherent 3D environments extending up to 90 meters from a single photo.
- The system stores already generated 3D geometry as a reference and trains specifically against quality degradation, addressing two central weaknesses of previous video models.
- According to Nvidia, Lyra 2.0 outperforms six competitors and can export the generated scenes to physics engines such as Isaac Sim in order to train robots in generated environments.

Nvidia researchers have unveiled Lyra 2.0, a system that generates large, coherent 3D environments from a single photograph.

The resulting scenes can be explored in real time and used directly in robot simulations.

Existing AI models for 3D scene generation struggle with long camera paths: the further the virtual camera moves from its starting point, the more colors and structures distort. When the camera returns to a previously seen location, the model often reinvents the environment from scratch.

Nvidia researchers aim to solve this problem with Lyra 2.0. The system takes a single photo and generates camera-controlled videos that simulate a virtual walkthrough of a scene. These videos are then automatically converted into 3D representations that can be viewed in real time and used in simulation environments.
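If it helps to picture the flow, here is a loose Python sketch of that two-stage pipeline. Every name in it (`CameraPose`, `generate_camera_video`, `lift_to_3d`) is a hypothetical placeholder chosen for illustration, not Lyra 2.0's actual API:

```python
# Hypothetical sketch of the two-stage pipeline described above.
# None of these functions are a real API; they only illustrate the data flow.

from dataclasses import dataclass

@dataclass
class CameraPose:
    position: tuple[float, float, float]   # x, y, z in meters
    rotation: tuple[float, float, float]   # roll, pitch, yaw in radians

def generate_camera_video(photo_path: str, trajectory: list[CameraPose]):
    """Stage 1 (hypothetical): a video model renders one frame per camera
    pose, simulating a walkthrough starting from the input photo."""
    ...

def lift_to_3d(frames):
    """Stage 2 (hypothetical): the generated frames are fused into a
    3D representation that can be viewed in real time."""
    ...

# A straight-line walkthrough, stepping one meter forward per frame:
trajectory = [CameraPose((0.0, 0.0, float(z)), (0.0, 0.0, 0.0)) for z in range(90)]
frames = generate_camera_video("input_photo.jpg", trajectory)
scene = lift_to_3d(frames)
```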

According to the research paper, the generated scenes can span roughly 90 meters.

How Lyra 2.0 fixes the two biggest problems in 3D scene generation

Current video models fail at two fundamental challenges, according to the researchers. First, the model forgets previously seen areas as soon as they leave the frame. Second, small errors accumulate during step-by-step video generation, building up into significant distortions over time.

To tackle the first problem, Lyra 2.0 stores the 3D geometry for every generated frame. When the camera moves back toward a previously visited area, the system retrieves the earlier frames and uses their spatial information as a reference.
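The following Python sketch shows one simple way such a pose-keyed geometry cache could work in principle. It is an illustration of the idea, not Nvidia's implementation, and all names (`GeometryMemory`, `store`, `retrieve`) are hypothetical:

```python
import math

class GeometryMemory:
    """Toy sketch of a cache of generated frames and their 3D geometry,
    keyed by camera position. Illustrative only, not Lyra 2.0's code."""

    def __init__(self, radius_m: float = 2.0):
        self.radius_m = radius_m   # how close counts as "revisiting" a spot
        self.entries = []          # list of (camera_position, frame, geometry)

    def store(self, position, frame, geometry):
        self.entries.append((position, frame, geometry))

    def retrieve(self, position):
        """Return stored frames and geometry for previously visited nearby
        viewpoints, so the video model can use them as a spatial reference."""
        hits = []
        for past_pos, frame, geometry in self.entries:
            if math.dist(position, past_pos) <= self.radius_m:
                hits.append((frame, geometry))
        return hits

# Hypothetical usage in a generation loop:
# memory = GeometryMemory()
# for pose in trajectory:
#     references = memory.retrieve(pose.position)      # empty on first visit
#     frame = video_model.generate(pose, references)   # model still renders the image
#     memory.store(pose.position, frame, estimate_geometry(frame))
```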

The video model still handles the actual image generation, which means errors in the stored geometry don't bleed directly into new frames.

To prevent drift, the researchers deliberately expose the model to its own flawed outputs during training. This teaches it to recognize and correct quality degradation instead of passing errors along.
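This training trick is conceptually similar to scheduled sampling in autoregressive models: condition the model on degraded rollouts so it learns to recover rather than compound errors. A minimal sketch under that assumption, with `model` and `corrupt` as hypothetical stand-ins:

```python
import random

def train_step(model, clean_frames, corrupt):
    """Toy sketch of anti-drift training: instead of always conditioning on
    ground-truth context, sometimes feed the model degraded context that
    mimics its own flawed outputs, so it learns to correct accumulated
    errors. All names are illustrative, not Lyra 2.0's real API."""
    context = clean_frames[0]
    loss = 0.0
    for target in clean_frames[1:]:
        # With some probability, degrade the context the way the model's own
        # rollouts would, exposing the model to its flawed outputs.
        if random.random() < 0.5:
            context = corrupt(context)
        prediction = model.predict_next(context)
        loss += model.loss(prediction, target)
        # Autoregressive rollout: the model's output becomes the next context.
        context = prediction
    return loss
```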

Lyra 2.0 outperforms six competing methods

In benchmark tests on two datasets, Lyra 2.0 beats six other methods - including GEN3C, Yume-1.5, and CaM - across nearly all measured criteria, such as image quality, style consistency, and camera control, according to Nvidia. A faster variant of the model generates videos roughly 13 times quicker at comparable quality.

The generated 3D scenes can be explored step by step through an interactive interface and exported as meshes to physics engines like Nvidia Isaac Sim. This could let robots train in fully generated environments without needing to capture real-world 3D data, the company says. For now, though, Lyra 2.0 only supports static scenes.
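As a rough illustration of the export step: the open-source trimesh library can write a triangle mesh to a standard file format that simulators can then import. The geometry below is a stand-in, not actual Lyra 2.0 output, and the import path into Isaac Sim is only noted in a comment:

```python
import numpy as np
import trimesh

# Stand-in geometry: in practice, vertices and faces would come from the
# 3D scene reconstructed from the generated video.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])  # two triangles forming a quad

mesh = trimesh.Trimesh(vertices=vertices, faces=faces)
# The exported file can then be brought into a physics engine such as
# Nvidia Isaac Sim (typically via its USD-based asset pipeline) as a
# static visual and collision asset for robot training.
mesh.export("generated_scene.obj")
```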
