Towards the Era of Post-Training for Autonomous Driving

The missing infrastructure for Physical AI post-training in AD. Open-source. Production-validated.

Highlights

WorldEngine is a post-training framework for Physical AI that systematically addresses the long-tail safety-critical data scarcity problem in autonomous driving.
Data-driven long-tail discovery: Failure-prone scenarios are automatically identified from real-world driving logs by the pre-trained agent itself — no manual design, no synthetic perturbations.
Photorealistic interactive simulation via 3D Gaussian Splatting (3DGS): Each discovered scenario is reconstructed into a fully controllable, real-time-renderable simulation environment with independent dynamic agent manipulation.
Behavior-driven scenario generation: Leverages Behavior World Model (BWM) to generalize and synthesize diverse traffic variations from existing long-tail scenarios, expanding sparse safety-critical events into a dense, learnable distribution.
RL-based post-training on synthesized safety-critical rollouts substantially outperforms scaling pre-training data alone — competitive with a ~10× increase in pre-training data.
Production-scale validation: Deployed on a mass-produced ADAS platform trained on 80,000+ hours of real-world driving logs, reducing simulated collision rate by up to 45.5% and achieving zero disengagements in a 200 km on-road test.

News

[2026/04/09] Official dataset released. See OpenDriveLab/WorldEngine (Hugging Face) or OpenDriveLab/WorldEngine (ModelScope).
[2026/04/10] Official code repository established.

Benchmark

We compare different post-training paradigms on the nuPlan dataset, evaluating on both open-loop and closed-loop metrics across common and rare driving scenarios.

Note: These results are from an early stage. Stable checkpoints and corresponding results are coming soon.

Metric notes:

Open-loop PDMS is aligned with NAVSIM v1.1 PDM Score. Common denotes the standard navtest split; Rare denotes the navtest_failures subset — failure-prone rare-case scenarios extracted from navtest.

Closed-loop Success Rate is defined as the fraction of simulated driving episodes completed without collision or off-road failure.

Closed-loop Ego Progress (EP) measures the route progress made by the ego vehicle during SimEngine closed-loop testing, reflecting whether the agent makes meaningful forward progress rather than merely avoiding collision or off-road failure.

Closed-loop PDMS* is the PDM Score obtained via SimEngine closed-loop testing, where the planner interacts with reactive agents in simulation under real-time rendering.
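The closed-loop Success Rate and Ego Progress definitions above can be sketched as simple aggregations over per-episode outcomes. This is a minimal illustration, not SimEngine's evaluator: the `EpisodeResult` record and its field names are hypothetical, and averaging ego progress over episodes is an assumption about how EP is aggregated.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one simulated episode (hypothetical record layout)."""
    collided: bool
    off_road: bool
    ego_progress: float  # fraction of the route completed, in [0, 1]


def success_rate(episodes: Iterable[EpisodeResult]) -> float:
    """Fraction of episodes finished without collision or off-road failure."""
    results: List[EpisodeResult] = list(episodes)
    if not results:
        raise ValueError("at least one episode is required")
    successes = sum(1 for e in results if not (e.collided or e.off_road))
    return successes / len(results)


def mean_ego_progress(episodes: Iterable[EpisodeResult]) -> float:
    """Average route progress over all episodes (assumed aggregation)."""
    results: List[EpisodeResult] = list(episodes)
    if not results:
        raise ValueError("at least one episode is required")
    return sum(e.ego_progress for e in results) / len(results)
```

Reporting both numbers together matters because an agent that stops early can avoid every collision while making almost no progress.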

Training notes:

Rare logs are failure-prone scenarios automatically extracted from navtrain by the pre-trained agent itself (see Rare Case Extraction).

Common logs are the standard cases in navtrain.

Method	Open-loop PDMS ↑ (common)	Open-loop PDMS ↑ (rare)	Closed-loop SR ↑ (rare)	Closed-loop EP ↑ (rare)	Closed-loop PDMS* ↑ (rare)
Base model	85.64	47.14	73.66	46.71	60.98
Supervised fine-tuning on rare logs	87.50	52.55	74.51	47.59	61.87
Post-training on common logs	87.69	49.36	69.63	51.02	60.21
Post-training on rare logs	88.51	59.20	73.35	51.86	62.78
Post-training on rare synthetic replays	82.61	62.69	87.20	32.49	63.22
Post-training on rare rollouts w/o Behaviour WM	88.53	61.88	77.96	56.74	67.33
Post-training with WorldEngine	88.95	59.83	88.89	47.66	70.12

Key findings:

Post-training on rare logs substantially improves rare open-loop PDMS over supervised fine-tuning (59.20 vs. 52.55), but does not improve rare closed-loop SR, indicating that fixed rare logs alone are insufficient for robust interactive behavior.
Post-training on common logs provides limited long-tail benefit and degrades rare closed-loop performance, reducing SR from 73.66% to 69.63% and PDMS* from 60.98 to 60.21, confirming the importance of long-tail event discovery.
The full WorldEngine pipeline achieves the best overall rare closed-loop performance, with the highest SR (88.89%) and PDMS* (70.12). It improves rare closed-loop SR by +15.23 percentage points and PDMS* by +9.14 over the base model, while maintaining strong common open-loop performance.
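The deltas quoted in the key findings can be rechecked directly from the benchmark table. The snippet below copies the rare-split closed-loop numbers for four of the methods; the dictionary layout and the `delta_vs_base` helper are illustrative names, not part of the released code.

```python
# Rare-split closed-loop results copied from the benchmark table.
RESULTS = {
    "Base model": {"sr": 73.66, "ep": 46.71, "pdms_star": 60.98},
    "Post-training on common logs": {"sr": 69.63, "ep": 51.02, "pdms_star": 60.21},
    "Post-training on rare logs": {"sr": 73.35, "ep": 51.86, "pdms_star": 62.78},
    "Post-training with WorldEngine": {"sr": 88.89, "ep": 47.66, "pdms_star": 70.12},
}


def delta_vs_base(method: str, metric: str) -> float:
    """Difference between a method and the base model, rounded to 2 decimals."""
    return round(RESULTS[method][metric] - RESULTS["Base model"][metric], 2)


for name in RESULTS:
    if name != "Base model":
        print(name, delta_vs_base(name, "sr"), delta_vs_base(name, "pdms_star"))
```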

Qualitative Results — Closed-Loop Simulation on nuPlan

Each pair shows the Base model vs WorldEngine post-trained model on the same rare-case scenario. Left: front-camera rendering; Right: BEV trajectory visualization.

On-Road Deployment — Night Urban Driving

Zero disengagements in 200 km on-road testing on a mass-produced ADAS platform.

System Architecture

WorldEngine consists of two tightly coupled subsystems:

Module	Function	Core Technology
SimEngine	Closed-loop simulation with ego & agents	Hydra, Ray, rendering
AlgEngine	End-to-end model training & evaluation	MMDetection3D, UniAD/VADv2/HydraMDP

Roadmap

Core platform integration (SimEngine + AlgEngine)
Multi-GPU distributed simulation and training
Rare case extraction and fine-tuning pipeline
Comprehensive documentation and usage guides
Hugging Face / ModelScope dataset
Open-source release (code, data, early pre-trained models)
arXiv preprint
Behavior World Model integration
Stable pre-trained models

Getting Started

Documentation Overview

WorldEngine provides comprehensive guides for each stage of your workflow:

Guide	Purpose	Key Topics
Installation	Set up both conda environments	Two-environment setup (simengine + algengine), dependencies, troubleshooting
Data Organization	Prepare datasets and checkpoints	Data structure, Hugging Face/ModelScope downloads, symlinks
Quick Start	Run your first experiment in 5 min	Quick test tutorial, understanding results, complete pipeline
SimEngine Usage	Master closed-loop simulation	Rollout scripts, distributed testing, configuration, metrics
AlgEngine Usage	Train and fine-tune models	Training from scratch, evaluation, rare case extraction, RL fine-tuning

Installation

WorldEngine requires two separate conda environments (simengine and algengine) because the two subsystems have different Python requirements.

Full installation guide: docs/installation.md

Quick Test

Verify your installation with a pre-trained model:

# Set up environment variable
export WORLDENGINE_ROOT=$(pwd)

# Option 1: Single GPU test 
bash scripts/closed_loop_test.sh

# Option 2: Multi-GPU test (Default 8 GPUs)
bash scripts/multigpu_closed_loop_test.sh

What this does:

Loads a pre-trained VADv2 model (50% training data, epoch 8)
Runs closed-loop simulation on 288 rare-case test scenarios
Evaluates with NAVSIM v1 PDMS (collision avoidance, progress, comfort, etc.)
Saves results to experiments/closed_loop_exps/e2e_vadv2_50pct/navtest_failures_NR/
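For readers unfamiliar with PDMS, the sketch below shows how the sub-scores are usually combined. It assumes the publicly documented NAVSIM v1 aggregation, in which no-collision (NC) and drivable-area compliance (DAC) act as multiplicative gates on a weighted average of ego progress (EP), time-to-collision (TTC), and comfort (C) with weights 5, 5, and 2. WorldEngine's evaluator may differ in detail, and the function name is illustrative.

```python
def pdm_score(nc: float, dac: float, ep: float, ttc: float, comfort: float) -> float:
    """Combine sub-scores into a NAVSIM v1-style PDM Score in [0, 1].

    nc and dac are multiplicative penalties; ep, ttc and comfort are
    averaged with weights 5, 5 and 2 (assumed NAVSIM v1 weighting).
    """
    scores = {"nc": nc, "dac": dac, "ep": ep, "ttc": ttc, "comfort": comfort}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    weighted_average = (5.0 * ep + 5.0 * ttc + 2.0 * comfort) / 12.0
    return nc * dac * weighted_average
```

Because NC and DAC multiply the result, a single at-fault collision or a drivable-area violation drives the score for that scenario to zero regardless of progress or comfort. The benchmark table appears to report scores on a 0 to 100 scale, so values from this function would be multiplied by 100 for comparison.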

Detailed quick start tutorial: docs/quick_start.md

Deep Dive by Module

After the quick test, explore each subsystem in detail:

SimEngine - Photorealistic Closed-Loop Simulation

Learn how to run simulations, generate rollouts, and test models:

Rollout scripts for data generation (no model required)
Testing scripts for model evaluation (single/multi-GPU)
Ray distributed simulation for large-scale testing
Reactive vs non-reactive agent modes
Configuration guide for all Hydra parameters

SimEngine Usage Guide
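As a rough picture of how a multi-GPU test can split work, the sketch below shards a list of scenarios across workers in round-robin order. It is not SimEngine's scheduler (the real pipeline relies on Ray for distribution); the function name is hypothetical, and the numbers reuse the 288 rare-case scenarios and the default of 8 GPUs mentioned in the quick test.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_round_robin(items: Sequence[T], num_workers: int) -> List[List[T]]:
    """Assign item i to worker i % num_workers, preserving order per worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [list(items[worker::num_workers]) for worker in range(num_workers)]


# 288 scenarios over 8 workers gives 36 scenarios per worker.
shards = shard_round_robin(list(range(288)), 8)
print([len(shard) for shard in shards])
```

Round-robin assignment keeps shard sizes within one item of each other, which is usually enough when scenarios have similar length.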

AlgEngine - End-to-End Model Training & Fine-Tuning

Learn how to train models, extract rare cases, and fine-tune:

Training from scratch
Open-loop evaluation on test sets
Rare case extraction from evaluation failures
RL-based fine-tuning on long-tail scenarios
Multi-GPU training with distributed data parallel

AlgEngine Usage Guide
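Rare case extraction from evaluation failures can be pictured as a filter over per-scenario scores produced by the pre-trained agent. The sketch below is an assumption-heavy illustration: the source does not state the selection criterion, so the score threshold, the token-to-score mapping, and the function name are all hypothetical.

```python
from typing import Dict, List


def extract_rare_cases(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return scenario tokens scoring below `threshold`, worst first.

    `scores` maps a scenario token to the agent's score on that scenario
    (for example a per-scenario PDMS in [0, 1]); both are assumptions.
    """
    failures = [(score, token) for token, score in scores.items() if score < threshold]
    # Sort by score, then token, so the output order is deterministic.
    return [token for _, token in sorted(failures)]


example_scores = {"scene_a": 0.90, "scene_b": 0.20, "scene_c": 0.00, "scene_d": 0.75}
print(extract_rare_cases(example_scores, threshold=0.5))
```

The resulting token list would then define the failure-prone subset used for fine-tuning or for closed-loop testing.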

Scene Reconstruction - 3D Gaussian Splatting-based method, MTGS

WorldEngine's simulation environments are powered by MTGS (Multi-Traversal Gaussian Splatting), a 3D Gaussian Splatting-based reconstruction method:

Multi-traversal reconstruction from nuPlan data
Photorealistic rendering for closed-loop simulation
Asset generation for SimEngine scenes

MTGS Repository

Citation

If any part of our work helps your research, please consider citing us and giving our repository a star.

If you use the Render Assets (MTGS), please also cite:

@article{li2025mtgs,
  title={MTGS: Multi-Traversal Gaussian Splatting},
  author={Li, Tianyu and Qiu, Yihang and Wu, Zhenhua and Lindstr{\"o}m, Carl and Su, Peng and Nie{\ss}ner, Matthias and Li, Hongyang},
  journal={arXiv preprint arXiv:2503.12552},
  year={2025}
}

If you use the augmented scenario data, please also cite:

@inproceedings{zhou2025nexus,
  title={Decoupled Diffusion Sparks Adaptive Scene Generation},
  author={Zhou, Yunsong and Ye, Naisheng and Ljungbergh, William and Li, Tianyu and Yang, Jiazhi and Yang, Zetong and Zhu, Hongzi and Petersson, Christoffer and Li, Hongyang},
  booktitle={ICCV},
  year={2025}
}

@article{li2025optimization,
  title={Optimization-Guided Diffusion for Interactive Scene Generation},
  author={Li, Shihao and Ye, Naisheng and Li, Tianyu and Chitta, Kashyap and An, Tuo and Su, Peng and Wang, Boyang and Liu, Haiou and Lv, Chen and Li, Hongyang},
  journal={arXiv preprint arXiv:2512.07661},
  year={2025}
}

If you find AlgEngine useful, please also cite:

@article{11353028,
  author={Liu, Haochen and Li, Tianyu and Yang, Haohan and Chen, Li and Wang, Caojun and Guo, Ke and Tian, Haochen and Li, Hongchen and Li, Hongyang and Lv, Chen},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Reinforced Refinement With Self-Aware Expansion for End-to-End Autonomous Driving},
  year={2026},
  volume={48},
  number={5},
  pages={5774-5792},
  keywords={Adaptation models;Self-aware;Autonomous vehicles;Pipelines;Planning;Training;Reinforcement learning;Uncertainty;Data models;Safety;End-to-end autonomous driving;reinforced finetuning;imitation learning;motion planning},
  doi={10.1109/TPAMI.2026.3653866}
}

If you find the data-scaling results helpful, please also cite:

@article{tian2025simscale,
  title={SimScale: Learning to Drive via Real-World Simulation at Scale},
  author={Haochen Tian and Tianyu Li and Haochen Liu and Jiazhi Yang and Yihang Qiu and Guang Li and Junli Wang and Yinfeng Gao and Zhang Zhang and Liang Wang and Hangjun Ye and Tieniu Tan and Long Chen and Hongyang Li},
  journal={arXiv preprint arXiv:2511.23369},
  year={2025}
}

Contributing

We welcome contributions from the community. You can:

Report bugs - Open an Issue
Improve documentation - Submit a Pull Request
Contribute code - Fork, develop, and submit a PR

Please read our contributing guidelines before submitting PRs.

For questions:

Check the documentation first
Search existing Issues

License

All content in this repository is under the Apache-2.0 license.

The released data is based on nuPlan and is under the CC-BY-NC-SA 4.0 license.

Related Resources

We thank the open-source contributors of the following projects, which made this work possible:

Project	Description
	Multi-traversal Gaussian Splatting for scene reconstruction
	Large scale driving simulation
	Collaboration-friendly NeRF toolkit
	3D detection framework
	End-to-end autonomous driving framework
	Vectorized autonomous driving framework
	Non-reactive autonomous vehicle simulation benchmark
	Large-scale autonomous driving dataset
	Compositional driving simulation platform
	Distributed execution framework
	Configuration management framework