📖 Please read the HuggingFace Model Card first! The model card contains comprehensive details on model architecture, inputs/outputs, licensing, and tested hardware configurations. This GitHub README focuses on setup, usage, and frequently asked questions.
Prerequisites
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.x with nvcc (required to compile flash-attn from source). If you don't have it, see Troubleshooting for a fallback using PyTorch's built-in SDPA.
- Python 3.12
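The software prerequisites above can be checked before running the setup steps. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper that is not part of this repository, and it does not verify that a CUDA-capable GPU is present.

```python
import shutil
import sys


def check_prerequisites(required_python=(3, 12)):
    """Return a dict describing whether each software prerequisite looks satisfied."""
    # Compare only major.minor, since the prerequisite is Python 3.12.
    python_ok = sys.version_info[:2] == required_python
    # nvcc is only needed to compile flash-attn from source.
    nvcc_path = shutil.which("nvcc")
    return {
        "python_ok": python_ok,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "nvcc_path": nvcc_path,  # None means nvcc was not found on PATH
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print(report)
    if report["nvcc_path"] is None:
        print("nvcc not found: consider the SDPA fallback described under Troubleshooting.")
```

A missing nvcc is not fatal, because the SDPA fallback avoids building flash-attn altogether.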
Hardware requirements
| Configuration | VRAM |
|---|---|
| Single-sample inference (num_traj_samples=1) | ~24 GB |
| Multi-sample inference (num_traj_samples=16) | ~40 GB |
| Multi-sample inference with CFG (num_traj_samples=16) | ~60 GB |
Measured on an NVIDIA H100 80GB GPU.
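As a rough planning aid, the table can be turned into a lookup that lists which configurations should fit on a given GPU. This is only a sketch: `feasible_configurations` is a hypothetical helper, the figures are the approximate values from the table, and actual usage may differ on other hardware.

```python
# Approximate VRAM figures copied from the table above (measured on an H100 80GB).
VRAM_REQUIREMENTS_GB = [
    ("single-sample (num_traj_samples=1)", 24),
    ("multi-sample (num_traj_samples=16)", 40),
    ("multi-sample with CFG (num_traj_samples=16)", 60),
]


def feasible_configurations(available_vram_gb):
    """Return the names of configurations whose approximate VRAM need fits."""
    return [name for name, need in VRAM_REQUIREMENTS_GB if available_vram_gb >= need]


if __name__ == "__main__":
    # A 48 GB GPU should fit the first two configurations but not CFG.
    print(feasible_configurations(48))
```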
Getting Started
1. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
2. Set up the environment
uv venv a1_5_venv
source a1_5_venv/bin/activate
uv sync --active
Note: If uv sync fails on flash-attn, see Troubleshooting below.
3. Authenticate with HuggingFace
The model and dataset require access to gated resources. Request access here:
Then authenticate:
hf auth login
Get your token at: https://huggingface.co/settings/tokens
Note: The physical_ai_av package (auto-installed via dependencies) streams data from the HuggingFace dataset. You must have accepted the dataset access request above before running inference.
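Before starting a long download, it can help to confirm that a token is actually available locally. The sketch below uses only the standard library and rests on two assumptions: that the token is exposed through the `HF_TOKEN` environment variable or stored in a `token` file under the HuggingFace cache directory (`~/.cache/huggingface` unless `HF_HOME` is set). `find_hf_token` is a hypothetical helper, and it cannot tell whether access to the gated model and dataset has been granted.

```python
import os
from pathlib import Path


def find_hf_token():
    """Return a locally stored HuggingFace token, or None if none is found."""
    # An HF_TOKEN environment variable takes precedence when it is set.
    token = os.environ.get("HF_TOKEN")
    if token:
        return token.strip()
    # Assumed default location of the token written by the login command;
    # HF_HOME relocates the cache directory when it is set.
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    print("token found" if find_hf_token() else "no token found: run the login step first")
```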
Running Inference
Test script
NOTE: This script downloads a small amount of example data and the model weights (22 GB). Downloading the weights can be slow depending on network bandwidth; for reference, it takes around 2.5 minutes on a 100 MB/s wired connection.
python src/alpamayo1_5/test_inference.py
To obtain more trajectories and reasoning traces, increase the num_traj_samples argument in the script.
Interactive notebooks
We provide notebooks that demonstrate the different capabilities of Alpamayo 1.5 under notebooks/, including standard model inference, incorporating navigation guidance, modifying the number of cameras, and visual question answering.
Inference methods
Alpamayo 1.5 provides two inference methods:
- sample_trajectories_from_data_with_vlm_rollout -- Full pipeline: the VLM generates chain-of-causation reasoning, then a diffusion expert produces trajectory predictions conditioned on the VLM's hidden states. This is the primary inference method used by the test script and most notebooks.
- generate_text -- Text-only generation for visual question answering (VQA). Returns extracted text fields.
Project Structure
alpamayo_1.5_release/
├── notebooks/
│   ├── inference.ipynb              # Standard model inference
│   ├── inference_cam_num.ipynb      # Inference with different camera counts
│   ├── inference_nav.ipynb          # Inference with navigation guidance
│   └── inference_vqa.ipynb          # Visual question answering
├── src/
│   └── alpamayo1_5/
│       ├── action_space/
│       │   └── ...                  # Action space definitions
│       ├── diffusion/
│       │   └── ...                  # Diffusion model components
│       ├── geometry/
│       │   └── ...                  # Geometry utilities and modules
│       ├── models/
│       │   └── ...                  # Model components and utility functions
│       ├── __init__.py              # Package marker
│       ├── config.py                # Model and experiment configuration
│       ├── helper.py                # Utility functions
│       ├── load_physical_aiavdataset.py  # Dataset loader
│       └── test_inference.py        # Inference test script
├── pyproject.toml                   # Project dependencies
└── uv.lock                          # Locked dependency versions
Troubleshooting
Flash Attention issues
The model uses Flash Attention 2 by default. flash-attn requires CUDA Toolkit (specifically nvcc) at build time. If you see build errors during uv sync:
Option A: Install without flash-attn and use SDPA fallback
uv sync --active --no-install-package flash-attn
Then load the model with PyTorch's built-in scaled dot-product attention:
import torch

from alpamayo1_5.models.alpamayo1_5 import Alpamayo1_5

# Load with PyTorch's built-in SDPA instead of Flash Attention 2.
model = Alpamayo1_5.from_pretrained(
    "nvidia/Alpamayo-1.5-10B",
    dtype=torch.bfloat16,
    attn_implementation="sdpa",
).to("cuda")
Option B: Install CUDA Toolkit, then retry
Install CUDA Toolkit 12.x (e.g., via your package manager or NVIDIA's install guide), ensure nvcc is on your PATH, then re-run:
uv sync --active
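If the same code has to run on machines with and without flash-attn, the attention backend can be chosen at runtime instead of being hard-coded. The sketch below is one possible approach: `pick_attn_implementation` is a hypothetical helper, and `"flash_attention_2"` is assumed to be the accepted value for the default backend (it is the name used by HuggingFace Transformers; only `"sdpa"` appears in this README).

```python
import importlib.util


def pick_attn_implementation():
    """Return "flash_attention_2" if flash-attn is importable, else "sdpa"."""
    # find_spec returns None when the top-level package is not installed.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    # The result could be passed as attn_implementation=... to from_pretrained.
    print(pick_attn_implementation())
```

Note that this only detects whether the package is installed; a flash-attn build that is present but incompatible with the installed PyTorch or CUDA version would still fail at import time.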
Frequently Asked Questions (FAQ)
How does Alpamayo 1.5 relate to Alpamayo 1?
Alpamayo 1.5 expands upon the architecture released in Alpamayo 1 and fully realizes what is described in our paper "Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail". Specifically:
| Feature | Description | Alpamayo 1 | Alpamayo 1.5 |
|---|---|---|---|
| Chain-of-Causation (CoC) reasoning | Hybrid auto-labeling with human in the loop for reasoning traces | ✅ Included | ✅ Included |
| Vision-Language-Action architecture | Cosmos-Reason backbone + action expert | ✅ Included | ✅ Included |
| Trajectory prediction | 6.4s horizon, 64 waypoints at 10 Hz | ✅ Supported | ✅ Supported |
| RL post-training | Reinforcement learning for reasoning/action consistency | ❌ Not RL post-trained | ✅ RL post-trained |
| Navigation conditioning | Explicit navigation inputs | ❌ Not supported | ✅ Supported |
| General VQA | Supports visual question answering | ❌ Not supported | ✅ Supported |
| Flexible multi-camera support | Supports a variable number of input cameras | ❌ Not supported | ✅ Supported |
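The trajectory figures in the table are mutually consistent: 64 waypoints sampled at 10 Hz span 64 / 10 = 6.4 s. A short sketch of the corresponding time offsets follows; `waypoint_times` is a hypothetical helper, and it assumes that the first waypoint lies one sample period (0.1 s) after the current time, which this README does not state.

```python
def waypoint_times(num_waypoints=64, rate_hz=10.0):
    """Return the time offset in seconds of each predicted waypoint."""
    # Assumption: the first waypoint is one sample period after the current time.
    return [(i + 1) / rate_hz for i in range(num_waypoints)]


if __name__ == "__main__":
    times = waypoint_times()
    print(len(times), times[0], times[-1])
```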
Does Alpamayo 1.5 accept navigation inputs?
Yes! Please see notebooks/inference_nav.ipynb for examples.
Does Alpamayo 1.5 support general VQA?
Yes! Please see notebooks/inference_vqa.ipynb for examples.
Was Alpamayo 1.5 post-trained with Reinforcement Learning (RL)?
Yes! Alpamayo 1.5 has undergone RL post-training, achieving improvements in reasoning quality and reasoning-trajectory alignment as a result.
Does Alpamayo 1.5 accept different numbers of cameras?
Yes! Please see notebooks/inference_cam_num.ipynb for examples. Note that model accuracy may degrade with fewer cameras, and the size of the degradation depends on the specific scenario. For instance, Alpamayo 1.5 would be expected to struggle to see cross-traffic during a right turn if given only a front-facing camera.
What are the minimum GPU requirements?
You need an NVIDIA GPU with at least 24 GB VRAM for inference. Tested configurations include RTX 3090, A100, H100, and B200. Running on GPUs with less memory (e.g., 16 GB) will likely result in CUDA out-of-memory errors. Please refer to our hardware requirements for more information.
Can I use this model in production / commercial applications?
No. The model weights are released under a non-commercial license. This release is intended for research, experimentation, and evaluation purposes only. See the License section and the HuggingFace Model Card for details.
License
Apache License 2.0 - see LICENSE for details.
Disclaimer
Alpamayo 1.5 is a pre-trained reasoning model designed to accelerate research and development in the autonomous vehicle (AV) domain. It is intended to serve as a foundation for a range of AV-related use cases, from instantiating an end-to-end backbone for autonomous driving to enabling reasoning-based auto-labeling tools. In short, it should be viewed as a building block for developing customized AV applications.
Important notes:
- Alpamayo 1.5 is provided solely for research, experimentation, and evaluation purposes.
- Alpamayo 1.5 is not a fully fledged driving stack. Among other limitations, it lacks access to critical real-world sensor inputs, does not incorporate required diverse and redundant safety mechanisms, and has not undergone automotive-grade validation for deployment.
By using this model, you acknowledge that it is a research tool intended to support scientific inquiry, benchmarking, and exploration—not a substitute for a certified AV stack. The developers and contributors disclaim any responsibility or liability for the use of the model or its outputs.
Citation
If you use Alpamayo 1.5 in your research, please cite:
@article{nvidia2025alpamayo,
  title={{Alpamayo-R1}: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail},
  author={NVIDIA and Yan Wang and Wenjie Luo and Junjie Bai and Yulong Cao and Tong Che and Ke Chen and Yuxiao Chen and Jenna Diamond and Yifan Ding and Wenhao Ding and Liang Feng and Greg Heinrich and Jack Huang and Peter Karkus and Boyi Li and Pinyi Li and Tsung-Yi Lin and Dongran Liu and Ming-Yu Liu and Langechuan Liu and Zhijian Liu and Jason Lu and Yunxiang Mao and Pavlo Molchanov and Lindsey Pavao and Zhenghao Peng and Mike Ranzinger and Ed Schmerling and Shida Shen and Yunfei Shi and Sarah Tariq and Ran Tian and Tilman Wekel and Xinshuo Weng and Tianjun Xiao and Eric Yang and Xiaodong Yang and Yurong You and Xiaohui Zeng and Wenyuan Zhang and Boris Ivanovic and Marco Pavone},
  year={2025},
  journal={arXiv preprint arXiv:2511.00088},
}