
NVIDIA Cosmos | 🤗 Cosmos 3

Part of the NVIDIA Cosmos project family — the training and serving framework repository.

Cosmos-Framework

Cosmos-Framework is an end-to-end framework for training and serving world models, including the Cosmos3 model family. Everything lives in a single top-level cosmos_framework/ Python package:

  • Training — distributed FSDP / TP / CP / PP trainer, native DCP checkpoints with HuggingFace safetensors import/export, JSONL / WebDataset / LeRobot dataset adapters. Entry point: cosmos_framework.scripts.train. See docs/training.md.
  • Inference — Diffusers / Transformers / vLLM backends with offline batch generation and online serving (Ray + Gradio). Entry point: cosmos_framework.scripts.inference. Ecosystem-facing shim libraries (lightweight standalone wrappers for downstream projects) live under packages/.

Cosmos 3

Cosmos 3 is our newest model family [Report] [Website]. It is a suite of omnimodal world models that jointly process and generate language, images, video, audio, and action sequences within a unified Mixture-of-Transformers architecture. Because it supports flexible input-output configurations, a single model family covers the roles of vision-language models, video generators, world simulators, and world-action models for Physical AI. For a guided way to try Cosmos 3, visit [Cosmos].

Framework Documentation

Setup

For more details and alternative installation methods, see Setup. Before installing, make sure your machine meets the System Requirements. If you want a curated PyTorch + CUDA environment, start from the recommended NVIDIA NGC base image.

Install system dependencies:

sudo apt-get install -y --no-install-recommends curl ffmpeg git-lfs libx11-dev tree wget

Install the package with uv (pick the dependency group that matches your CUDA toolkit — see CUDA Variants):

# CUDA 13.0 (recommended)
uv sync --all-extras --group=cu130-train
# Or, for CUDA 12.8:
# uv sync --all-extras --group=cu128-train
source .venv/bin/activate && export LD_LIBRARY_PATH=
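
The choice between the two dependency groups can also be scripted, for example in a provisioning step. The following is a minimal sketch, assuming the CUDA toolkit version is already known as a string such as "13.0". The helpers `pick_uv_group` and `uv_sync_command` are hypothetical and not part of cosmos_framework; only the two group names shown above are taken from this README.

```python
# Hypothetical helpers for illustration; not part of cosmos_framework.
# Only the two dependency groups named in this README are listed.
KNOWN_GROUPS = {
    (13, 0): "cu130-train",
    (12, 8): "cu128-train",
}


def pick_uv_group(cuda_version: str) -> str:
    """Map a CUDA toolkit version such as '13.0' to a uv dependency group."""
    parts = cuda_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'major.minor', got {cuda_version!r}")
    key = (int(parts[0]), int(parts[1]))
    if key not in KNOWN_GROUPS:
        raise ValueError(
            f"no known dependency group for CUDA {cuda_version}; see CUDA Variants"
        )
    return KNOWN_GROUPS[key]


def uv_sync_command(cuda_version: str) -> list[str]:
    """Build (but do not run) the uv sync command for the given CUDA version."""
    return ["uv", "sync", "--all-extras", f"--group={pick_uv_group(cuda_version)}"]
```

The command is returned as an argument list so it can be passed to `subprocess.run` without shell quoting concerns. Unknown versions raise rather than fall back to a default, since a mismatched CUDA group is better caught at install time.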

If you are starting from the recommended NGC image (nvcr.io/nvidia/pytorch:25.09-py3), see the one-shot quickstart.

Training

For the full guide (data preparation, base-checkpoint conversion, parallelism strategies, mixed precision, resuming), see Training. The number of GPUs required depends on the recipe; the shipped recipes under examples/ are 8-GPU configurations (tested on 8× H100 80 GB) launched via their paired launch shells, e.g.:

bash examples/launch_sft_vision_nano.sh

Users may adjust the GPU count to match their model and underlying hardware architecture — tune NPROC_PER_NODE and the parallelism degrees (DP/CP/FSDP shard) in the recipe accordingly.
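
When changing the GPU count, a common source of launch failures is a set of parallelism degrees that no longer multiplies to the number of ranks. The following is a minimal sketch of that sanity check, assuming the degrees in a recipe together form the full device mesh, so that their product equals the world size. The function name and the degree names used in the usage example are illustrative; how the shipped recipes actually name and combine these degrees is covered in Training.

```python
from math import prod


def check_parallelism(nproc_per_node: int, nnodes: int = 1, **degrees: int) -> int:
    """Return the world size if the parallelism degrees multiply to it.

    `degrees` are illustrative keyword names (e.g. dp_replicate, fsdp_shard, cp);
    they are assumptions for this sketch, not the framework's config keys.
    """
    world_size = nproc_per_node * nnodes
    for name, degree in degrees.items():
        if degree < 1:
            raise ValueError(f"{name} must be >= 1, got {degree}")
    total = prod(degrees.values())
    if total != world_size:
        raise ValueError(
            f"degrees {degrees} multiply to {total}, "
            f"but {nnodes} node(s) x {nproc_per_node} GPUs = {world_size} ranks"
        )
    return world_size
```

For example, `check_parallelism(8, dp_replicate=1, fsdp_shard=4, cp=2)` passes for a single 8-GPU node, while dropping `cp` to 1 without raising another degree is rejected.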

Inference

See Inference for the full guide — launch commands, supported modes, parallelism presets, and troubleshooting.

Quick single-GPU launch:

python -m cosmos_framework.scripts.inference \
  --parallelism-preset=latency \
  -i "inputs/omni/t2v.json" \
  -o outputs/omni_nano \
  --checkpoint-path Cosmos3-Nano \
  --seed=0
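
To repeat the quick launch over several seeds, the same command can be assembled programmatically. The following is a minimal sketch that only builds and prints the commands, using exactly the flags shown above. The helper `build_inference_cmd` and the per-seed output directory layout (`seed_<n>` subfolders) are assumptions made for this example, not conventions of the framework.

```python
import shlex
import sys


def build_inference_cmd(
    input_json: str,
    output_dir: str,
    checkpoint: str,
    seed: int = 0,
    preset: str = "latency",
) -> list[str]:
    """Assemble the inference launch command as an argument list."""
    return [
        sys.executable,
        "-m",
        "cosmos_framework.scripts.inference",
        f"--parallelism-preset={preset}",
        "-i",
        input_json,
        "-o",
        output_dir,
        "--checkpoint-path",
        checkpoint,
        f"--seed={seed}",
    ]


if __name__ == "__main__":
    for seed in range(3):
        cmd = build_inference_cmd(
            "inputs/omni/t2v.json",
            f"outputs/omni_nano/seed_{seed}",  # hypothetical per-seed layout
            "Cosmos3-Nano",
            seed=seed,
        )
        # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute.
        print(shlex.join(cmd))
```

Building an argument list rather than a shell string keeps paths with spaces intact and makes the command easy to hand to `subprocess.run(cmd, check=True)`.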

Policy Server

See Policy Server for the full guide.

Reference

| Topic | What it covers |
| --- | --- |
| Setup | Hardware/software prerequisites, uv install paths, CUDA variants, Docker base image, and base-checkpoint downloading. |
| Code Structure | Repository layout and a per-subpackage tour of cosmos_framework/: where each concern lives and where to add new code. |
| Training | Launching multi-GPU and multi-node runs; parallelism strategies; mixed precision; resuming. |
| Inference (from a trained checkpoint) | Loading a trained checkpoint into one of the inference backends. |
| Policy Server | Running the server-client pipeline for Cosmos3-Nano-Policy-DROID. |
| FAQ | Troubleshooting (OOM, NCCL hangs, slow training), environment variables, and common pitfalls. |

