NVIDIA Cosmos

Part of the NVIDIA Cosmos project family — the training and serving framework repository.

Cosmos-Framework

Cosmos-Framework is an end-to-end framework for training and serving world models, including the Cosmos 3 model family. Everything lives in a single top-level cosmos_framework/ Python package:

- Training: distributed FSDP / TP / CP / PP trainer, native DCP checkpoints with HuggingFace safetensors import/export, and JSONL / WebDataset / LeRobot dataset adapters. Entry point: cosmos_framework.scripts.train. See docs/training.md.
- Inference: Diffusers / Transformers / vLLM backends with offline batch generation and online serving (Ray + Gradio). Entry point: cosmos_framework.scripts.inference.

Ecosystem-facing shim libraries (lightweight standalone wrappers for downstream projects) live under packages/.

Cosmos 3

Cosmos 3 is our newest model family [Report] [Website]. It is a suite of omnimodal world models designed to jointly process and generate language, images, video, audio, and action sequences within a unified Mixture-of-Transformers architecture. By supporting flexible input-output configurations, it unifies the critical modalities for Physical AI, subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. To try Cosmos 3 through a guided experience, visit [Cosmos].

Framework Documentation

- Quickstart
- Setup
- Training (Supervised Fine-Tuning)
  - JSONL Dataset
- Inference
- Policy Server
- Agent Skills
- Reference

Setup

For more details and alternative installation methods, see Setup. Before installing, make sure your machine meets the System Requirements. If you want a curated PyTorch + CUDA environment, start from the recommended NVIDIA NGC base image.

Install system dependencies:

sudo apt-get install -y --no-install-recommends curl ffmpeg git-lfs libx11-dev tree wget

Install the package with uv (pick the dependency group that matches your CUDA toolkit — see CUDA Variants):

# CUDA 13.0 (recommended)
uv sync --all-extras --group=cu130-train
# Or, for CUDA 12.8:
# uv sync --all-extras --group=cu128-train
source .venv/bin/activate && export LD_LIBRARY_PATH=

If you are starting from the recommended NGC image (nvcr.io/nvidia/pytorch:26.06-py3), see the one-shot quickstart.

Training

For the full guide (data preparation, base-checkpoint conversion, parallelism strategies, mixed precision, resuming), see Training. The number of GPUs required depends on the recipe. The shipped recipes under examples/ are 8-GPU configurations (tested on 8× H100 80 GB), each launched via its paired launch script, e.g.:

bash examples/launch_sft_vision_nano.sh

To run on a different number of GPUs, adjust NPROC_PER_NODE and the parallelism degrees (DP, CP, FSDP shard) in the recipe to match your model and hardware.
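When changing the GPU count, a quick arithmetic check can catch a mismatched layout before launch. The sketch below assumes the common convention that the product of the parallelism degrees equals the world size (NPROC_PER_NODE times the number of nodes). How Cosmos-Framework actually composes the DP, CP, and FSDP shard degrees is defined by the recipe and docs/training.md, so the function name `check_layout` and the dictionary keys are hypothetical.

```python
from math import prod


def check_layout(nproc_per_node: int, nnodes: int, degrees: dict[str, int]) -> int:
    """Return the world size if the degrees multiply out to it, else raise.

    Assumption: world_size == product of all parallelism degrees. This is a
    common convention, not a documented Cosmos-Framework rule.
    """
    world_size = nproc_per_node * nnodes
    if any(d < 1 for d in degrees.values()):
        raise ValueError(f"every degree must be >= 1, got {degrees}")
    total = prod(degrees.values())
    if total != world_size:
        raise ValueError(
            f"degrees {degrees} multiply to {total}, "
            f"but {nproc_per_node} x {nnodes} = {world_size} GPUs are available"
        )
    return world_size


# Hypothetical 8-GPU layout: 2-way data parallel, 1-way context parallel,
# 4-way FSDP sharding.
print(check_layout(8, 1, {"dp": 2, "cp": 1, "fsdp_shard": 4}))
```

Halving the machine to 4 GPUs with the same degrees raises a ValueError, which is the signal to lower one of the degrees in the recipe as well.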

Inference

See Inference for the full guide — launch commands, supported modes, parallelism presets, and troubleshooting.

Quick single-GPU launch:

python -m cosmos_framework.scripts.inference \
    --parallelism-preset=latency \
    -i "inputs/omni/t2v.json" \
    -o outputs/omni_nano \
    --checkpoint-path Cosmos3-Nano \
    --seed=0
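To generate several samples from the same prompt, the launch above can be repeated with different seeds. The sketch below only builds and prints the commands, using the flags from the example above and nothing else. The helper `inference_cmd` and the per-seed output sub-directory naming are hypothetical conventions, not part of the framework.

```python
import shlex
import sys


def inference_cmd(seed: int, out_root: str = "outputs/omni_nano") -> list[str]:
    """Build the single-GPU launch command shown above for one seed.

    Only flags from the example above are used; the seed_<n> output
    sub-directory is an illustrative choice.
    """
    return [
        sys.executable, "-m", "cosmos_framework.scripts.inference",
        "--parallelism-preset=latency",
        "-i", "inputs/omni/t2v.json",
        "-o", f"{out_root}/seed_{seed}",
        "--checkpoint-path", "Cosmos3-Nano",
        f"--seed={seed}",
    ]


for seed in range(3):
    # Dry run: print the command instead of executing it.
    print(shlex.join(inference_cmd(seed)))
```

To actually launch, pass each list to `subprocess.run(cmd, check=True)` from inside the activated environment.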

Policy Server

See Policy Server for the full guide.

Agent Skills

Coding agents (Claude Code, Codex CLI, Cursor, and other AGENTS.md-aware tools) can load task-specific instructions for this repo from the bundled AgentSkills. Each skill is a self-contained SKILL.md that the agent invokes automatically when the user's request matches its description.

Skills live in .agents/skills/ (canonical) and are mirrored under .claude/skills/ for Claude Code:

Skill	When it activates
`cosmos3-setup`	Installation, environment setup, checkpoint downloads, Docker.
`cosmos3-codebase-nav`	"Where is X" / "where do I change parameter Y" questions across `cosmos_framework/`.
`cosmos3-inference`	Running offline or online inference, parallelism, sampling parameters.
`cosmos3-post-training`	SFT post-training end-to-end: data prep, DCP conversion, launch, export.
`cosmos3-env-troubleshoot`	Diagnosing install/runtime errors (ImportError, CUDA, Docker, checkpoint failures).

See AGENTS.md for the canonical repo map that agents load first; the skills above are referenced from there.
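Because the skills exist in two places, it can help to confirm that the mirror matches the canonical directory after adding or renaming a skill. The sketch below assumes each skill is a sub-directory containing a SKILL.md file (for example .agents/skills/cosmos3-setup/SKILL.md). That layout is an assumption based on the description above, and the helper names are hypothetical.

```python
from pathlib import Path


def skill_names(root: Path) -> set[str]:
    """Names of sub-directories of `root` that contain a SKILL.md file."""
    if not root.is_dir():
        return set()
    return {p.parent.name for p in root.glob("*/SKILL.md")}


def mirror_diff(repo: Path) -> tuple[set[str], set[str]]:
    """Return (missing from mirror, extra in mirror) relative to .agents/skills/."""
    canonical = skill_names(repo / ".agents" / "skills")
    mirror = skill_names(repo / ".claude" / "skills")
    return canonical - mirror, mirror - canonical


if __name__ == "__main__":
    # Run from the repository root.
    missing, extra = mirror_diff(Path.cwd())
    print("missing from .claude/skills:", sorted(missing))
    print("only in .claude/skills:", sorted(extra))
```

Two empty lists mean the directories agree on skill names; the check does not compare file contents.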

Reference

Topic	What it covers
Setup	Hardware/software prerequisites, `uv` install paths, CUDA variants, Docker base image, and base-checkpoint downloading.
Code Structure	Repository layout and a per-subpackage tour of `cosmos_framework/` — where each concern lives and where to add new code.
Training	Launching multi-GPU and multi-node runs; parallelism strategies; mixed precision; resuming.
Inference (from a trained checkpoint)	Loading a trained checkpoint into one of the inference backends.
Policy Server	Running the server-client pipeline for Cosmos3-Policy-DROID.
FAQ	Troubleshooting (OOM, NCCL hangs, slow training), environment variables, and common pitfalls.
AGENTS.md + Agent Skills	Repo map and task-specific `SKILL.md` files loaded automatically by AGENTS.md-aware coding agents (Claude Code, Codex CLI, Cursor, etc.).
