Cosmos-Drive-Dreams
External Links: Paper | Arxiv Paper | Paper Website
On This Page: Models | Dataset | Toolkits | SDG Pipeline
This is the official code repository of Cosmos-Drive-Dreams - a Synthetic Data Generation (SDG) pipeline built on Cosmos World Foundation Models for generating diverse and challenging scenarios for Autonomous Vehicle use-cases.
We open-source our model weights, pipeline toolkits, and a dataset of 81,802 clips, including Cosmos-generated videos with paired HDMap and LiDAR.
<video controls autoplay loop src=https://github.com/user-attachments/assets/af926ed2-6f93-4e9d-8afe-95095792e8d8>
News
- 2025-10-29: World Scenario Rendering feature added to toolkits! This enhanced rendering mode (used in Cosmos-Transfer 2.5) provides high-fidelity 3D geometry-based control signals with rich laneline and bounding box patterns compared to traditional HDMap rendering. See the toolkit for details.
- 2025-10-22: Data preprocessing, post-training, and inference scripts for the LiDAR tokenizer and diffusion models are released! See Huggingface for our model cards.
- 2025-06-10: Model, Toolkits, and Dataset (including cosmos-generated video, HDMap, and LiDAR) are released! Stay tuned for the paired GT RGB videos.
https://github.com/user-attachments/assets/43c5b921-ef23-4d5d-8ab4-2a7a58a0cb77
Condition Videos: HDMap / LiDAR Depth / World Scenario
Cosmos-Drive Open-source Summary
| Name | Type | Link |
|---|---|---|
| Cosmos-7B-AV-Sample (Paper Sec. [2.1]) | model | base_model.pt |
| Cosmos-7B-Multiview-AV-Sample (Paper Sec. [2.1]) | model | Huggingface Link |
| Cosmos-Transfer1-7B-Sample-AV (Paper Sec. [2.2]) | model | Huggingface Link |
| Cosmos-7B-Single2Multiview-Sample-AV (Paper Sec. [2.3]) | model | Huggingface Link |
| Cosmos-7B-LiDAR-GEN-Sample-AV (Paper Sec. [3]) | model | Huggingface link |
Cosmos-Drive-Dreams Dataset
The Cosmos-Drive-Dreams Dataset contains labels (HDMap, BBox, and LiDAR) for 5,843 10-second clips collected by NVIDIA, along with 81,802 synthetic video samples generated by Cosmos-Drive-Dreams from these labels (roughly 14 synthetic videos per labeled clip). Each synthetically generated video is 121 frames long and captures a wide variety of challenging scenarios, such as rain, snow, and fog, that may not be readily available in real-world driving datasets. This dataset is ready for commercial and non-commercial use.
Detailed information can be found on the Huggingface page.
Download
```text
usage: scripts/download.py [-h] --odir ODIR [--file_types {hdmap,lidar,synthetic}[,…]]
                           [--workers N] [--clean_cache]

required arguments:
  --odir ODIR           Output directory where files are stored.

optional arguments:
  -h, --help            Show this help message and exit.
  --file_types {hdmap,lidar,synthetic}[,…]
                        Comma-separated list of data groups to fetch.
                          • hdmap     → common folders + 3d_* HD-map layers
                          • lidar     → common folders + lidar_raw
                          • synthetic → common folders + cosmos_synthetic
                        Default: hdmap,lidar,synthetic (all groups).
  --workers N           Parallel download threads (default: 1). Increase on fast
                        networks; reduce if you hit rate limits or disk bottlenecks.
  --clean_cache         Delete the temporary HuggingFace cache after each run to
                        reclaim disk space.

common folders (always downloaded, regardless of --file_types):
  all_object_info, captions, car_mask_coarse, ftheta_intrinsic,
  pinhole_intrinsic, pose, vehicle_pose
```
Here are some examples:
```shell
# download all (about 3 TB)
python scripts/download.py --odir YOUR_DATASET_PATH --workers YOUR_WORKER_NUMBER

# download HDMap only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types hdmap --workers YOUR_WORKER_NUMBER

# download LiDAR only
python scripts/download.py --odir YOUR_DATASET_PATH --file_types lidar --workers YOUR_WORKER_NUMBER

# download synthetic video only (about 700 GB)
python scripts/download.py --odir YOUR_DATASET_PATH --file_types synthetic --workers YOUR_WORKER_NUMBER
```
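After a download finishes, it can be useful to confirm that the always-downloaded common folders listed above are actually present. The helper below is our own illustrative sketch, not part of the repository's scripts; the folder names are taken verbatim from the usage text above.

```python
from pathlib import Path

# "Common folders" that scripts/download.py fetches regardless of --file_types,
# per the usage text above.
COMMON_FOLDERS = [
    "all_object_info", "captions", "car_mask_coarse",
    "ftheta_intrinsic", "pinhole_intrinsic", "pose", "vehicle_pose",
]

def missing_common_folders(dataset_root: str) -> list[str]:
    """Return the names of expected common folders absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in COMMON_FOLDERS if not (root / name).is_dir()]
```

An empty return value means all common folders are in place; anything listed likely indicates an interrupted or partial download worth re-running.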
Tutorial
- Visualizing the structured labels here
Cosmos-Drive-Dreams Toolkits
- Visualizing the structured labels here
- Editing the ego trajectory interactively to produce novel scenarios here
- Converting the Waymo Open Dataset to our format here
- Rectifying f-theta camera images to more common pinhole camera images here
Cosmos-Drive-Dreams SDG Pipeline
We provide a simple walkthrough including all stages of our SDG pipeline through example data available in the assets folder; no additional data download is necessary. For large-scale sampling, please download the above Cosmos-Drive-Dreams Dataset.
0. Installation and Model Downloading
We recommend using conda for managing your environment. Detailed instructions for setting up Cosmos-Drive-Dreams can be found in INSTALL.md.
1. Preprocessing Condition Videos
cosmos-drive-dreams-toolkits/render_from_rds_hq.py renders condition videos from the RDS-HQ dataset. It supports three rendering modes:
- HDMap: Traditional 2D projection-based HD map + bounding box rendering (CPU-only), used in Cosmos-Transfer1-7B-Sample-AV
- LiDAR: LiDAR depth rendering (requires GPU), used in Cosmos-Transfer1-7B-Sample-AV
- World Scenario: Enhanced 3D geometry-based rendering with rich visual details (requires GPU), used in Cosmos-Transfer2.5-2B/auto/multiview
In this example, we will only render the HD map + bounding box condition videos.
```shell
cd cosmos-drive-dreams-toolkits

# Generate multi-view condition videos.
# To generate front-view videos only, replace `-d rds_hq_mv` with `-d rds_hq`.
python render_from_rds_hq.py -i ../assets/example -o ../outputs -d rds_hq_mv --skip lidar --skip world_scenario

cd ..
```
This automatically launches multiple Ray jobs for data parallelism, but since we are only processing one clip here, it will use a single worker. The script should finish in under a minute and produce a new directory at outputs/hdmap:
```text
outputs/
└── hdmap/
    ├── ftheta_camera_cross_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_cross_right_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_front_wide_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ├── ftheta_camera_rear_left_120fov
    │   └── 2d23a1f4-c269-46aa-8e7d-1bb595d1e421_2445376400000_2445396400000_0.mp4
    ...
```
The suffix _0 indicates the first chunk of the video; each chunk is 121 frames long.
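The output filenames above follow the pattern `<clip_uuid>_<start_timestamp>_<end_timestamp>_<chunk_index>.mp4`. The parser below is a hypothetical helper (not part of the toolkit) showing how to recover the clip id and chunk index when iterating over rendered videos:

```python
import re

# Pattern inferred from the directory listing above:
# "<clip_uuid>_<start_timestamp>_<end_timestamp>_<chunk_index>.mp4"
CHUNK_NAME = re.compile(
    r"^(?P<clip>[0-9a-f-]+)_(?P<start>\d+)_(?P<end>\d+)_(?P<chunk>\d+)\.mp4$"
)

def parse_chunk_name(filename: str) -> tuple[str, int]:
    """Return (clip_id, chunk_index) for a rendered condition-video filename."""
    m = CHUNK_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m["clip"], int(m["chunk"])
```

For example, chunk index 0 identifies the first 121-frame chunk of a clip.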
2. Prompt Rewriting
A prompt describing a possible manifestation of the example can be found in assets/example/captions/2d23*.txt. We can use a VLM (Qwen3, to be exact) to augment this single prompt into many variations as follows:
```shell
python scripts/rewrite_caption.py -i assets/example/captions -o outputs/captions
```
The output will be saved at outputs/captions/2d23*.json.
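The exact schema of the rewritten-caption JSON files is not documented here; as an illustration only, the sketch below ASSUMES each file holds a JSON list of prompt strings (an assumption, not the confirmed format), and collects them keyed by clip id:

```python
import json
from pathlib import Path

# Illustrative sketch only. ASSUMPTION: each rewritten-caption file is a JSON
# list of prompt strings, one file per clip, named "<clip_id>.json".
def load_rewritten_prompts(caption_dir: str) -> dict[str, list[str]]:
    """Map clip id (file stem) -> list of rewritten prompt variations."""
    prompts = {}
    for path in sorted(Path(caption_dir).glob("*.json")):
        prompts[path.stem] = json.loads(path.read_text())
    return prompts
```

Such a loader makes it easy to pair each condition video with its prompt variations in the generation steps that follow.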
3. Front-view Video Generation
Next, we use Cosmos-Transfer1-7B-Sample-AV to generate a 121-frame RGB video from the HD map condition video and a text prompt.
```shell
PYTHONPATH="cosmos-transfer1" python scripts/generate_video_single_view.py \
    --caption_path outputs/captions \
    --input_path outputs \
    --video_save_folder outputs/single_view \
    --checkpoint_dir checkpoints/ \
    --is_av_sample \
    --controlnet_specs assets/sample_av_hdmap_spec.json
```
For a detailed description of how to run this model and adjust its inference parameters, see this readme.
4. Multiview Video Generation
After the single-view videos have been generated, we use Cosmos-Transfer1-7B-Sample-AV-Single2MultiView to extend them into multi-view videos.
```shell
CUDA_HOME=$CONDA_PREFIX PYTHONPATH="cosmos-transfer1" python scripts/generate_video_multi_view.py \
    --caption_path outputs/captions \
    --input_path outputs \
    --input_view_path outputs/single_view \
    --video_save_folder outputs/multi_view \
    --checkpoint_dir checkpoints \
    --is_av_sample \
    --controlnet_specs assets/sample_av_hdmap_multiview_spec.json
```
For a detailed description of how to run this model and adjust its inference parameters, see this readme.
5. Filtering via VLM
Coming soon
Citation
```bibtex
@misc{nvidia2025cosmosdrivedreams,
  title  = {Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models},
  author = {Ren, Xuanchi and Lu, Yifan and Cao, Tianshi and Gao, Ruiyuan and Huang, Shengyu and Sabour, Amirmojtaba and Shen, Tianchang and Pfaff, Tobias and Wu, Jay Zhangjie and Chen, Runjian and Kim, Seung Wook and Gao, Jun and Leal-Taixe, Laura and Chen, Mike and Fidler, Sanja and Ling, Huan},
  year   = {2025},
  url    = {https://arxiv.org/abs/2506.09042}
}

@misc{nvidia2025cosmostransfer1,
  title  = {Cosmos Transfer1: World Generation with Adaptive Multimodal Control},
  author = {NVIDIA},
  year   = {2025},
  url    = {https://arxiv.org/abs/2503.14492}
}
```