NVIDIA Releases Cosmos 3: A Frontier Foundation Model for Physical AI
NVIDIA has released Cosmos 3, a unified foundation model that combines physical reasoning, world generation, and action generation within a single open model. The model is designed to understand the real world, predict what's likely to happen next, and generate actions for specific environments, embodiments, and tasks.
Cosmos 3 supports multiple input and output modalities through its unified architecture, including text, images, videos, and action sequences. Two Cosmos 3 models are currently available: Cosmos 3 Nano, a compact 16B-parameter version optimized for efficient inference, and Cosmos 3 Super, a 64B-parameter model designed for maximum quality and capability.
NVIDIA is open-sourcing six synthetic data generation (SDG) datasets on Hugging Face to support physical AI development. The datasets cover robotics, physics simulation, spatial reasoning, human motion, driving, and warehouse environments.
Cosmos 3 has been evaluated across multiple benchmark suites, including VANTAGE-Bench, Traffic Anomaly Reasoning (TAR), PAI-Bench, R-Bench Physics-IQ, and RoboLab. The model leads several public leaderboards for physical AI tasks.
NVIDIA is providing a fully open set of training recipes for Cosmos 3, including code, configs, and workflows for adapting the model to new domains, embodiments, and datasets. The release also includes NVIDIA NIM microservices for optimized, production-ready deployment on NVIDIA GPUs.
Developers can download the Cosmos 3 models, training scripts, and datasets from Hugging Face and GitHub. NVIDIA is encouraging collaboration and contributions to the Cosmos ecosystem through its community on GitHub and Discord.
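For readers who want to script the download step, the following is a minimal sketch using the huggingface_hub Python library. The article does not name the repositories, so the repository ID below is a hypothetical placeholder, and the helper names (build_download_args, download) are illustrative rather than part of any NVIDIA tooling. Replace the placeholder with the actual ID listed on NVIDIA's Hugging Face page.

```python
from pathlib import Path

# Hypothetical placeholder: the article does not give repository names.
EXAMPLE_REPO_ID = "nvidia/EXAMPLE-cosmos-3-repo"


def build_download_args(repo_id, repo_type="model", target_root="downloads", patterns=None):
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    repo_type is "model" for checkpoints or "dataset" for datasets.
    patterns optionally restricts the download to matching file names.
    """
    if repo_type not in ("model", "dataset"):
        raise ValueError("repo_type must be 'model' or 'dataset'")
    if "/" not in repo_id:
        raise ValueError("repo_id must look like 'organization/name'")
    # Store each repository in its own folder, named after the repo.
    local_dir = Path(target_root) / repo_id.split("/", 1)[1]
    args = {
        "repo_id": repo_id,
        "repo_type": repo_type,
        "local_dir": str(local_dir),
    }
    if patterns:
        args["allow_patterns"] = list(patterns)
    return args


def download(repo_id, **options):
    """Fetch a repository snapshot and return the local path.

    Requires network access and `pip install huggingface_hub`.
    """
    # Imported lazily so the helper above works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_args(repo_id, **options))
```

Calling download(EXAMPLE_REPO_ID, patterns=["*.json"]) would fetch only the JSON files of a repository, which can be a cheap way to inspect configuration before pulling multi-gigabyte weights. For a dataset, pass repo_type="dataset".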