Cosmos-Drive-Dreams
External Links: Paper | Arxiv Paper | Paper Website
On This Page: Models | Dataset | Toolkits | SDG Pipeline
This is the official code repository of Cosmos-Drive-Dreams - a Synthetic Data Generation (SDG) pipeline built on Cosmos World Foundation Models for generating diverse and challenging scenarios for Autonomous Vehicle use-cases.
We open-source our model weights, pipeline toolkits, and a dataset of 81,802 clips, including Cosmos-generated videos with paired HDMap and LiDAR.
<video controls autoplay loop src=https://github.com/user-attachments/assets/af926ed2-6f93-4e9d-8afe-95095792e8d8>
News
- 2025-10-29: World Scenario Rendering feature added to toolkits! This enhanced rendering mode (used in Cosmos-Transfer 2.5) provides high-fidelity 3D geometry-based control signals with rich laneline and bounding box patterns compared to traditional HDMap rendering. See the toolkit for details.
- 2025-10-22: Data preprocessing, post-training, and inference scripts for the LiDAR tokenizer and diffusion models are released! See Huggingface for our model cards.
- 2025-06-10: Model, Toolkits, and Dataset (including cosmos-generated video, HDMap, and LiDAR) are released! Stay tuned for the paired GT RGB videos.
https://github.com/user-attachments/assets/43c5b921-ef23-4d5d-8ab4-2a7a58a0cb77
Condition Videos: HDMap / LiDAR Depth / World Scenario
Cosmos-Drive Open-source Summary
| Name | Type | Link |
|---|---|---|
| Cosmos-7B-AV-Sample (Paper Sec. [2.1]) | model | base_model.pt |
| Cosmos-7B-Multiview-AV-Sample (Paper Sec. [2.1]) | model | Huggingface Link |
| Cosmos-Transfer1-7B-Sample-AV (Paper Sec. [2.2]) | model | Huggingface Link |
| Cosmos-7B-Single2Multiview-Sample-AV (Paper Sec. [2.3]) | model | Huggingface Link |
| Cosmos-7B-LiDAR-GEN-Sample-AV (Paper Sec. [3]) | model | Huggingface link |
Cosmos-Drive-Dreams Dataset
The Cosmos-Drive-Dreams Dataset contains labels (HDMap, BBox, and LiDAR) for 5,843 10-second clips collected by NVIDIA, along with 81,802 synthetic video samples generated by Cosmos-Drive-Dreams from these labels (roughly 14 synthetic videos per labeled clip). Each synthetically generated video is 121 frames long and captures a wide variety of challenging scenarios, such as rain, snow, and fog, that may not be readily available in real-world driving datasets. This dataset is ready for commercial and non-commercial use.
Detailed information can be found on the Huggingface page.
Download
```text
usage: scripts/download.py [-h] --odir ODIR [--file_types {hdmap,lidar,synthetic}[,…]]
                           [--workers N] [--clean_cache]

required arguments:
  --odir ODIR           Output directory where files are stored.

optional arguments:
  -h, --help            Show this help message and exit.
  --file_types {hdmap,lidar,synthetic}[,…]
                        Comma-separated list of data groups to fetch.
                          • hdmap     → common folders + 3d_* HD-map layers
                          • lidar     → common folders + lidar_raw
                          • synthetic → common folders + cosmos_synthetic
                        Default: hdmap,lidar,synthetic (all groups).
  --workers N           Parallel download threads (default: 1). Increase on fast
                        networks; reduce if you hit rate limits or disk bottlenecks.
  --clean_cache         Delete the temporary HuggingFace cache after each run to
                        reclaim disk space.

common folders (always downloaded, regardless of --file_types):
  all_object_info, captions, car_mask_coarse, ftheta_intrinsic,
  pinhole_intrinsic, pose, vehicle_pose
```
Here are some examples:
```shell
# download all (about 3 TB)
python scripts/download.py --odir YOUR_DATASET_PATH --workers YOUR_WORKER_NUMBER

# download HDMap only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types hdmap --workers YOUR_WORKER_NUMBER

# download LiDAR only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types lidar --workers YOUR_WORKER_NUMBER

# download synthetic video only (about 700 GB)
python scripts/download.py --odir YOUR_DATASET_PATH --file_types synthetic --workers YOUR_WORKER_NUMBER
```
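After a download finishes, it can be useful to confirm that the always-downloaded common folders listed above are actually present. The helper below is our own illustrative sketch, not part of the repository's scripts; the folder names are taken verbatim from the usage text above.

```python
from pathlib import Path

# "Common folders" that scripts/download.py fetches regardless of --file_types,
# per the usage text above.
COMMON_FOLDERS = [
    "all_object_info", "captions", "car_mask_coarse",
    "ftheta_intrinsic", "pinhole_intrinsic", "pose", "vehicle_pose",
]

def missing_common_folders(dataset_root: str) -> list[str]:
    """Return the names of expected common folders absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in COMMON_FOLDERS if not (root / name).is_dir()]
```

An empty return value means all common folders are in place; anything listed likely indicates an interrupted or partial download worth re-running.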
Tutorial
- Visualizing the structured labels here
Cosmos-Drive-Dreams Toolkits
- Visualizing the structured labels here
- Editing the ego trajectory interactively to produce novel scenarios here
- Converting the Waymo Open Dataset to our format here
- Rectifying f-theta camera images to more common pinhole camera images here
Cosmos-Drive-Dreams SDG Pipeline
We provide a simple walkthrough including all stages of our SDG pipeline through example data available in the assets folder; no additional data download is necessary. For large-scale sampling, please download the above Cosmos-Drive-Dreams Dataset.
0. Installation and Model Downloading
We recommend using conda for managing your environment. Detailed instructions for setting up Cosmos-Drive-Dreams can be found in INSTALL.md.
1. Preprocessing Condition Videos
cosmos-drive-dreams-toolkits/render_from_rds_hq.py renders condition videos from the RDS-HQ dataset. It supports three rendering modes:
- HDMap: Traditional 2D projection-based HD map + bounding box rendering (CPU-only), used in Cosmos-Transfer1-7B-Sample-AV
- LiDAR: LiDAR depth rendering (requires GPU), used in Cosmos-Transfer1-7B-Sample-AV
- World Scenario: Enhanced 3D geometry-based rendering with rich visual details (requires GPU), used in Cosmos-Transfer2.5-2B/auto/multiview
In this example, we will only render the HD map + bounding box condition videos.
```shell
cd cosmos-drive-dreams-toolkits

# Generate multi-view condition videos.
# To generate front-view videos only, replace `-d rds_hq_mv` with `-d rds_hq`.
python render_from_rds_hq.py -i ../assets/example -o ../outputs -d rds_hq_mv --skip lidar --skip world_scenario

cd ..
```
This automatically launches multiple Ray jobs for data parallelism, but since we are only processing one clip here, it will use a single worker. The script should finish in under a minute and produce a new directory at outputs/hdmap:
```text
outputs/
└── hdmap/
    ├── ftheta_camera_cross_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_cross_right_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_front_wide_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_rear_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ...
```
The suffix _0 indicates the first chunk of the video; each chunk is 121 frames long.
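The output filenames above follow the pattern `<clip_uuid>_<start_timestamp>_<end_timestamp>_<chunk_index>.mp4`. The parser below is a hypothetical helper (not part of the toolkit) showing how to recover the clip id and chunk index when iterating over rendered videos:

```python
import re

# Pattern inferred from the directory listing above:
# "<clip_uuid>_<start_timestamp>_<end_timestamp>_<chunk_index>.mp4"
CHUNK_NAME = re.compile(
    r"^(?P<clip>[0-9a-f-]+)_(?P<start>\d+)_(?P<end>\d+)_(?P<chunk>\d+)\.mp4$"
)

def parse_chunk_name(filename: str) -> tuple[str, int]:
    """Return (clip_id, chunk_index) for a rendered condition-video filename."""
    m = CHUNK_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m["clip"], int(m["chunk"])
```

For example, chunk index 0 identifies the first 121-frame chunk of a clip.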
2. Prompt Rewriting
A prompt describing a possible manifestation of the example can be found in assets/example/captions/2d23*.txt. We can use a VLM (Qwen3, to be exact) to augment this single prompt into many variations as follows:
```shell
python scripts/rewrite_caption.py -i assets/example/captions -o outputs/captions
```
The output will be saved at outputs/captions/2d23*.json.
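The exact schema of the rewritten-caption JSON files is not documented here; as an illustration only, the sketch below ASSUMES each file holds a JSON list of prompt strings (an assumption, not the confirmed format), and collects them keyed by clip id:

```python
import json
from pathlib import Path

# Illustrative sketch only. ASSUMPTION: each rewritten-caption file is a JSON
# list of prompt strings, one file per clip, named "<clip_id>.json".
def load_rewritten_prompts(caption_dir: str) -> dict[str, list[str]]:
    """Map clip id (file stem) -> list of rewritten prompt variations."""
    prompts = {}
    for path in sorted(Path(caption_dir).glob("*.json")):
        prompts[path.stem] = json.loads(path.read_text())
    return prompts
```

Such a loader makes it easy to pair each condition video with its prompt variations in the generation steps that follow.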
3. Front-view Video Generation
Next, we use Cosmos-Transfer1-7B-Sample-AV to generate a 121-frame RGB video from the HD map condition video and a text prompt.
```shell
PYTHONPATH="cosmos-transfer1" python scripts/generate_video_single_view.py \
    --caption_path outputs/captions \
    --input_path outputs \
    --video_save_folder outputs/single_view \
    --checkpoint_dir checkpoints/ \
    --is_av_sample \
    --controlnet_specs assets/sample_av_hdmap_spec.json
```
For a detailed description of how to run this model and adjust its inference parameters, see this readme.
4. Multiview Video Generation
After the single-view videos have been generated, we use Cosmos-Transfer1-7B-Sample-AV-Single2MultiView to extend them into multi-view videos.
```shell
CUDA_HOME=$CONDA_PREFIX PYTHONPATH="cosmos-transfer1" python scripts/generate_video_multi_view.py \
    --caption_path outputs/captions \
    --input_path outputs \
    --input_view_path outputs/single_view \
    --video_save_folder outputs/multi_view \
    --checkpoint_dir checkpoints \
    --is_av_sample \
    --controlnet_specs assets/sample_av_hdmap_multiview_spec.json
```
For a detailed description of how to run this model and adjust its inference parameters, see this readme.
5. Filtering via VLM
Coming soon
Citation
```bibtex
@misc{nvidia2025cosmosdrivedreams,
  title  = {Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models},
  author = {Ren, Xuanchi and Lu, Yifan and Cao, Tianshi and Gao, Ruiyuan and Huang, Shengyu and Sabour, Amirmojtaba and Shen, Tianchang and Pfaff, Tobias and Wu, Jay Zhangjie and Chen, Runjian and Kim, Seung Wook and Gao, Jun and Leal-Taixe, Laura and Chen, Mike and Fidler, Sanja and Ling, Huan},
  year   = {2025},
  url    = {https://arxiv.org/abs/2506.09042}
}

@misc{nvidia2025cosmostransfer1,
  title  = {Cosmos Transfer1: World Generation with Adaptive Multimodal Control},
  author = {NVIDIA},
  year   = {2025},
  url    = {https://arxiv.org/abs/2503.14492}
}
```