Awesome Agentic World Modeling
This repository accompanies the paper Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond, and provides a taxonomy-aligned bibliography of 400+ cited works and 100+ representative systems. Papers are grouped by taxonomy section and listed in reverse chronological order within each subsection to support literature navigation, comparison, and ongoing updates. Released under the MIT License. Check out our poster here.
[!TIP] 👋 Join the discussion, share your work in progress, and help us grow the agentic world modeling community together.
[!NOTE] 📚 If you find this resource useful, please cite and star the repo:
@article{chu2026agenticworldmodelingfoundations,
  title = {Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond},
  author = {Meng Chu and Xuan Billy Zhang and Kevin Qinghong Lin and Lingdong Kong and Jize Zhang and Teng Tu and Weijian Ma and Ziqi Huang and Senqiao Yang and Wei Huang and Yeying Jin and Zhefan Rao and Jinhui Ye and Xinyu Lin and Xichen Zhang and Qisheng Hu and Shuai Yang and Leyang Shen and Wei Chow and Yifei Dong and Fengyi Wu and Quanyu Long and Bin Xia and Shaozuo Yu and Mingkang Zhu and Wenhu Zhang and Jiehui Huang and Haokun Gui and Haoxuan Che and Long Chen and Qifeng Chen and Wenxuan Zhang and Wenya Wang and Xiaojuan Qi and Yang Deng and Yanwei Li and Mike Zheng Shou and Zhi-Qi Cheng and See-Kiong Ng and Ziwei Liu and Philip Torr and Jiaya Jia},
  year = {2026},
  eprint = {2604.22748},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2604.22748}
}
Table of Contents
- Taxonomy Overview
- L1: Predictor
- L2: Simulator
- L3: Evolver
- Benchmarks & Evaluation
- Related Surveys
- Welcome to Contribute
Taxonomy Overview
| Level | Definition | Key Capability | Physical | Digital | Social | Scientific |
|---|---|---|---|---|---|---|
| L1 Predictor | One-step local transition | Prediction accuracy, robustness, identifiability | RSSM, V-JEPA, TD-MPC2 | LLM pred., Othello-WM | ToMnet, BToM | GNN, FNO |
| L2 Simulator | Multi-step rollout respecting governing laws | Long-horizon coherence, intervention sensitivity, constraint consistency | DreamerV3, Sora, Cosmos | WebDreamer, Code2World | Generative Agents, CICERO | GraphCast, NeuralGCM |
| L3 Evolver | Design → Execute → Observe → Reflect with model revision | Active information expansion, autonomous execution, belief revision | AdaptSim, Self-Modeling | AlphaEvolve, FunSearch | Evolving Constitutions, AgentSociety | A-Lab, AI Scientist |
L1: Predictor
Methods learning local one-step operators: state inference, forward dynamics, observation decoding, and inverse dynamics.
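As a rough illustration of the four operator types named above, here is a minimal sketch on a hypothetical 1-D point mass. The class `ToyPredictor`, its state layout, and the time step `dt` are assumptions made for this example and do not come from any listed paper; learned predictors replace these hand-written formulas with trained networks.

```python
from dataclasses import dataclass
from typing import Tuple

State = Tuple[float, float]  # (position, velocity); hypothetical toy state


@dataclass(frozen=True)
class ToyPredictor:
    """Hand-written stand-in for the four L1 operators on a 1-D point mass."""

    dt: float = 0.1  # assumed integration step

    def infer_state(self, obs_prev: float, obs_curr: float) -> State:
        # State inference: recover (position, velocity) from two position readings.
        return obs_curr, (obs_curr - obs_prev) / self.dt

    def forward(self, state: State, action: float) -> State:
        # Forward dynamics: one-step transition under an acceleration command.
        pos, vel = state
        return pos + vel * self.dt, vel + action * self.dt

    def decode(self, state: State) -> float:
        # Observation decoding: the sensor only sees position.
        return state[0]

    def inverse(self, state: State, next_state: State) -> float:
        # Inverse dynamics: which action explains the observed velocity change?
        return (next_state[1] - state[1]) / self.dt
```

In this toy setting `inverse` exactly undoes `forward` on the velocity component, which is what makes the action identifiable from two consecutive states.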
Representation Learning
- VJEPA (arXiv, 2026) — Variational JEPA as probabilistic world model.
- V-JEPA 2 (arXiv, 2025) — Scaled V-JEPA; action-conditioned world model from video.
- DINOv2 (TMLR, 2024) — Self-supervised vision features; strong transfer.
- V-JEPA (TMLR, 2024) — Video JEPA; temporal prediction in feature space.
- I-JEPA (CVPR, 2023) — Image Joint-Embedding Predictive Architecture.
- SPR (ICLR, 2021) — Self-Predictive Representations; temporal consistency for data-efficient RL.
- MoCo (CVPR, 2020) — Momentum contrast for unsupervised visual features.
- SimCLR (ICML, 2020) — Simple contrastive learning with strong augmentation.
- CURL (ICML, 2020) — Contrastive unsupervised representations for RL.
- CPC (arXiv, 2018) — Contrastive Predictive Coding; predicts future in latent space.
- β-VAE (ICLR, 2017) — Disentangled representations via increased KL penalty.
- VQ-VAE (NeurIPS, 2017) — Discrete codebook tokenization.
- VAE (ICLR, 2014) — Variational autoencoder; foundational latent variable model.
Model-Based RL
- DreamerV3 (Nature, 2025) — Generalizes across 150+ tasks; unified symlog world model.
- TD-MPC2 (ICLR, 2024) — Temporal difference-aligned dynamics; 317M parameters.
- DreamerV2 (ICLR, 2021) — Discrete latent representations; human-level Atari.
- EfficientZero (NeurIPS, 2021) — MuZero + self-supervised consistency.
- Dreamer (ICLR, 2020) — Latent imagination via RSSM; multi-step backpropagation.
- MuZero (Nature, 2020) — Value-aligned dynamics without reconstruction; masters Go, chess, Atari.
- DeepMDP (ICML, 2019) — Bellman-aligned latent state abstraction.
- MBPO (NeurIPS, 2019) — Model-Based Policy Optimization; short-horizon rollouts + off-policy RL.
- PETS (NeurIPS, 2018) — Probabilistic Ensemble + Trajectory Sampling.
- World Models (NeurIPS, 2018) — VAE + MDN-RNN; influential early architecture.
- E2C (NeurIPS, 2015) — Embed to Control; locally linear latent dynamics from images.
- PILCO (ICML, 2011) — Gaussian process dynamics for data-efficient policy search.
Token & Diffusion-Based
- DIAMOND (NeurIPS, 2024) — U-Net diffusion transition operator.
- Delta-IRIS (ICML, 2024) — Delta-based tokenization for world models.
- IRIS (ICLR, 2023) — VQ-VAE + Transformer autoregressive world model.
- STORM (NeurIPS, 2023) — Stochastic Transformer + VAE world model.
- Latent Diffusion (CVPR, 2022) — Diffusion in latent space; high-quality decoding.
- TransDreamer (arXiv, 2022) — Transformer-XL replacing RSSM.
L2: Simulator
Systems composing operators into multi-step rollouts satisfying governing laws.
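The idea of composing a one-step operator into a rollout that is checked against a governing law can be sketched as follows. Everything here is a toy assumption for illustration: the point-mass `step` function, the `speed_limit` constraint standing in for a "law", and the choice to raise an error on violation rather than correct the state.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (position, velocity); hypothetical toy state


def step(state: State, action: float, dt: float = 0.5) -> State:
    # One-step transition of a 1-D point mass under an acceleration command.
    pos, vel = state
    return pos + vel * dt, vel + action * dt


def speed_limit(state: State, v_max: float = 3.0) -> bool:
    # Toy "governing law": the speed may never exceed v_max.
    return abs(state[1]) <= v_max


def rollout(
    step_fn: Callable[[State, float], State],
    state: State,
    actions: Sequence[float],
    constraint: Callable[[State], bool],
) -> List[State]:
    # Apply the one-step operator repeatedly, checking the constraint each step.
    trajectory = [state]
    for t, action in enumerate(actions, start=1):
        state = step_fn(state, action)
        if not constraint(state):
            raise ValueError(f"constraint violated at step {t}: {state}")
        trajectory.append(state)
    return trajectory
```

Calling `rollout(step, (0.0, 0.0), [2.0, 2.0, 0.0], speed_limit)` returns the initial state followed by three predicted states, while a sequence of larger accelerations trips the constraint check.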
Physical World
- HWM (arXiv, 2026) — Hierarchical latent world model + multi-scale planning.
- BridgeV2W (arXiv, 2026) — Action-conditioned embodied video generation.
- Yume (arXiv, 2025) — Video diffusion interactive world generation.
- RoboScape (arXiv, 2025) — Physics-informed robotic video world model.
- PIN-WM (arXiv, 2025) — Differentiable rigid-body physics + 3DGS.
- GAIA-2 (arXiv, 2025) — Latent diffusion multi-view AD generation.
- Aether (arXiv, 2025) — CogVideoX geometry-aware fine-tune.
- Cosmos (arXiv, 2025) — NVIDIA autoregressive + diffusion hybrid.
- LWM (ICLR, 2025) — RingAttention long-context LLM world model.
- DreamerV3 (Nature, 2025) — RSSM + symlog loss for generalist long-horizon rollout.
- DreMa (arXiv, 2024) — Compositional 3DGS digital twins for manipulation.
- Vista (NeurIPS, 2024) — Diffusion driving world model.
- iVideoGPT (NeurIPS, 2024) — Transformer + VQ-VAE interactive prediction.
- DIAMOND (NeurIPS, 2024) — U-Net diffusion as Atari simulator.
- Sora (OpenAI, 2024) — DiT video diffusion world simulator.
- VideoPoet (ICML, 2024) — LLM multimodal video tokenizer.
- Genie (ICML, 2024) — Latent action discovery; generative interactive environment.
- OccWorld (arXiv, 2024) — GPT 3D occupancy prediction for AD.
- Copilot4D (ICLR, 2024) — VQ-VAE + discrete point diffusion.
- DriveDreamer (ECCV, 2024) — Diffusion AD generation.
- Lumiere (SIGGRAPH, 2024) — Space-time U-Net diffusion.
- GAIA-1 (arXiv, 2023) — Transformer video generation for AD.
- DayDreamer (CoRL, 2023) — RSSM on real-world robots.
- Diffuser (ICML, 2022) — Diffusion trajectory planning.
- DreamingV2 (arXiv, 2022) — DreamerV2 + reconstruction-free objective.
- DreamerPro (ICML, 2022) — RSSM + prototypical representations.
- PathDreamer (ICCV, 2021) — Autoregressive visual world model for VLN.
- Plan2Explore (ICML, 2020) — Dreamer + self-supervised exploration.
- MuZero (Nature, 2020) — Value-aligned dynamics with MCTS for long-horizon planning.
Digital World
- Code2World (arXiv, 2026) — VLM code rendering as environment.
- RWML (arXiv, 2026) — LLM + RL sim-to-real alignment.
- gWorld (arXiv, 2026) — VLM code rendering for web simulation.
- WebWorld (arXiv, 2026) — Fine-tuned VLM web simulator.
- MobileDreamer (arXiv, 2026) — LLM GUI sketch prediction.
- Word2World (arXiv, 2025) — LLM text-based world model evaluation.
- NeuralOS (arXiv, 2025) — RNN + pixel diffusion for desktop GUI.
- WebSynthesis (arXiv, 2025) — LLM + MCTS trajectory synthesis.
- GameCraft (arXiv, 2025) — Diffusion game video generation.
- GameFactory (ICCV, 2025) — Action-controlled interactive game video generation.
- WebDreamer (TMLR, 2025) — LLM web state simulation + tree search.
- WMA (ICLR, 2025) — LLM web transition prediction.
- GameGen-X (ICLR, 2025) — Interactive open-world game video world model.
- GameNGen (ICLR, 2025) — U-Net diffusion runs DOOM at 20 FPS.
- CodeWM (arXiv, 2024) — LLM + MCTS code world model generation.
- WorldCoder (NeurIPS, 2024) — LLM incremental code synthesis world model.
- GameGAN (CVPR, 2020) — GAN neural game engine from gameplay videos.
Social World
- PolicySim (arXiv, 2026) — LLM platform policy sandbox.
- AIvilization (arXiv, 2026) — Large-scale sandbox economy simulation.
- MASim (arXiv, 2025) — Multilingual agent social simulation.
- SWM-AP (arXiv, 2025) — Social world model for mechanism design.
- OASIS (arXiv, 2024) — 1M-agent social simulation at scale.
- Project Sid (arXiv, 2024) — 1000 LLM agents with emergent civilization.
- Werewolf (arXiv, 2024) — LLM + RL strategic deception.
- Sotopia (ICLR, 2024) — LLM social evaluation framework.
- AvalonBench (NeurIPS, 2023) — LLM deductive social reasoning.
- Generative Agents (UIST, 2023) — LLM reflective memory stream in Smallville.
- CICERO (Science, 2022) — LLM + strategic planning for human-level Diplomacy.
- Social Simulacra (UIST, 2022) — GPT prompt-chain community simulation.
- Deal or No Deal (EMNLP, 2017) — RNN + RL self-play negotiation.
Scientific World
- Lingshu-Cell (arXiv, 2026) — Masked discrete diffusion cellular world model.
- Aurora (arXiv, 2025) — 3D Swin Earth system foundation model.
- GenCast (Nature, 2025) — Spherical ensemble diffusion forecasting.
- NeuralGCM (Nature, 2024) — Hybrid physics-NN general circulation model.
- BAX (npj Computational Materials, 2024) — Bayesian algorithm execution for targeted materials discovery.
- GraphCast (Science, 2023) — GNN autoregressive weather in under 1 minute.
- ClimaX (ICML, 2023) — ViT climate foundation model.
- Pangu-Weather (Nature, 2023) — 3D Earth Transformer weather forecasting.
- FNO (ICLR, 2021) — Fourier Neural Operator; 1000x speedup for PDEs.
- GNS (ICML, 2020) — Graph Network Simulator; learned particle dynamics.
- ChemBO (AISTATS, 2020) — Bayesian optimization for synthesizable small molecules.
- P3BO (ICML, 2020) — Population-based black-box optimization for biological sequence design.
L3: Evolver
Systems closing the design → execute → observe → reflect loop to autonomously revise their models.
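A minimal sketch of such a loop, assuming a deliberately simple setting: the "model" is a single scalar gain, the environment is any callable mapping an input to an observation, and the revision rule is a damped correction. The function name `evolve`, the probe schedule, and the parameters `lr` and `tol` are illustrative assumptions, not taken from any listed system.

```python
from typing import Callable, List, Tuple


def evolve(
    environment: Callable[[float], float],
    belief: float,
    rounds: int = 5,
    lr: float = 0.5,
    tol: float = 1e-6,
) -> Tuple[float, List[float]]:
    """Run a toy design -> execute -> observe -> reflect loop.

    `belief` is the model's current estimate of the gain k in y = k * x.
    """
    history = []
    for i in range(rounds):
        x = float(i + 1)            # Design: choose the next probe input.
        predicted = belief * x      # What the current model expects to see.
        observed = environment(x)   # Execute the probe and observe the outcome.
        error = observed - predicted
        if abs(error) > tol:        # Reflect: revise the model if it was wrong.
            belief += lr * error / x
        history.append(belief)
    return belief, history
```

With `environment=lambda x: 2.0 * x` and an initial belief of `0.0`, each round halves the gap between the belief and the true gain, so the estimate moves toward 2.0 without ever being told the answer directly.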
Physical World
- Self-Modeling (npj Robotics, 2025) — Robot detects morphology changes and retrains kinematic model.
- AdaptSim (CoRL, 2023) — Sim-parameter adaptation via Bayesian optimization.
Digital World
- AUI (arXiv, 2025) — VLM + adaptive UI grounding.
- AlphaEvolve (DeepMind, 2025) — LLM + evolutionary coding agent for algorithm discovery.
- SWE-agent (arXiv, 2024) — LLM + shell interface with regression gates.
- CodeIt (ICML, 2024) — LLM code generation + self-play fine-tuning.
- FunSearch (Nature, 2024) — LLM + evolutionary search discovers math algorithms.
Social World
- Evolving Constitutions (arXiv, 2026) — LLM constitution revision via genetic programming.
- AgentSociety (arXiv, 2025) — LLM multi-agent simulation with behavioral drift tracking.
Scientific World
- BioLab (bioRxiv, 2025) — Autonomous biological laboratory agent.
- OriGene (bioRxiv, 2025) — Self-evolving virtual disease biologist for therapeutic target discovery.
- Biomni (bioRxiv, 2025) — Foundation model for biological experimentation.
- AI Scientist v2 (arXiv, 2025) — Agentic tree search for workshop-level discovery.
- Co-Scientist (arXiv, 2025) — Multi-agent tournament for biomedical hypothesis generation.
- MOOSE-Chem2 (NeurIPS, 2025) — Hierarchical hypothesis search for chemistry.
- MOOSE-Chem (ICLR, 2025) — Rediscovered chemistry hypotheses from pre-2024 data.
- AI Scientist (arXiv, 2024) — Full-paper generation + peer review loop.
- SDL Lasers (Science, 2024) — Multi-site self-driving lab for organic lasers.
- A-Lab (Nature, 2023) — Autonomous robotic lab; 41 novel compounds in 17 days.
- BacterAI (Nature Microbiology, 2023) — Zero-knowledge iterative amino acid requirement mapping.
- CAMEO (Nature Comms, 2020) — Bayesian active learning at synchrotron beamline.
- Yeast Cycles (PNAS, 2019) — Closed-loop experiment design for yeast metabolism.
- Robot Scientist (Automated Experimentation, 2010) — Robot scientist framework for autonomous scientific discovery.
Benchmarks & Evaluation
Physical
- RoboCasa (arXiv, 2024) — 100+ kitchen task completion.
- CALVIN (arXiv, 2021) — Multi-step language-conditioned manipulation.
- Meta-World (CoRL, 2019) — Success rate over 50 manipulation tasks.
- nuScenes (CVPR, 2019) — 3D detection and tracking; mAP, NDS.
- Atari 100k (arXiv, 2019) — Human-normalized score; 26 games, 100k steps.
Digital
- GameWorld (arXiv, 2026) — Standardized multimodal game-agent evaluation.
- OSWorld (arXiv, 2024) — Desktop OS task success.
- SWE-bench (ICLR, 2024) — Multi-file patch resolved rate.
- WebArena (ICLR, 2024) — 812 web task success rate.
Social
- Sotopia (ICLR, 2024) — 7-dimensional social score.
- Hi-ToM (arXiv, 2023) — Higher-order theory of mind.
- FANToM (arXiv, 2023) — Conversational false-belief accuracy.
Scientific
- DiscoveryBench (NeurIPS, 2024) — Evidence-based hypothesis accuracy.
- Minecraft (MCU) (arXiv, 2023) — Open-world tech-tree completion.
- ScienceWorld (EMNLP, 2022) — 30 elementary science experiments.
Related Surveys
- Yue et al. (arXiv, 2025) — Visual world model roadmap G1-G4.
- Zhang, P-F et al. (arXiv, 2025) — Robotic manipulation world models.
- Li et al. (arXiv, 2025) — Embodied world models (3-axis).
- Kong et al. (arXiv, 2025) — 3D/4D world modeling.
- Wei, Jiaqi et al. (arXiv, 2025) — AI-for-Science autonomous discovery.
- Tu et al. (arXiv, 2025) — AD world models.
- Feng et al. (arXiv, 2025) — AD world models.
- Ding et al. (ACM CSUR, 2025) — Understanding vs predicting world models.
- Kang et al. (arXiv, 2025) — How far is video generation from world model.
- Zhu et al. (arXiv, 2024) — Sora / video world models.
- Moerland et al. (FnT ML, 2023) — Model-based RL.
Welcome to Contribute
We welcome contributions! This project is actively maintained. If you know a paper or benchmark that should be listed, open an issue with the link and target section.
Automatic Paper Agent
Papers and benchmarks share the same submission flow: open an issue containing an awwm-paper block. AI agents can use the repository skill at skills/add-paper/SKILL.md to generate it. For plain arXiv-link submissions, include lines such as Section: L2 and Subsection: Digital; the workflow cannot infer taxonomy placement from the URL alone.
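For a sense of what a plain arXiv-link submission has to carry, here is a hypothetical parser for an issue body. It is a sketch only and not the repository's actual workflow code; the function name `parse_plain_submission` and both regular expressions are assumptions for illustration.

```python
import re

# Assumed patterns: an arXiv abs URL and "Section:" / "Subsection:" lines.
ARXIV_RE = re.compile(r"https?://arxiv\.org/abs/\d{4}\.\d{4,5}(?:v\d+)?")
FIELD_RE = re.compile(r"^\s*(Section|Subsection)\s*:\s*(\S.*?)\s*$", re.MULTILINE)


def parse_plain_submission(body: str) -> dict:
    """Extract the paper URL and taxonomy placement from an issue body."""
    match = ARXIV_RE.search(body)
    if match is None:
        raise ValueError("no arXiv link found")
    fields = {key.lower(): value for key, value in FIELD_RE.findall(body)}
    missing = [key for key in ("section", "subsection") if key not in fields]
    if missing:
        # Placement cannot be inferred from the URL, so it must be stated.
        raise ValueError(f"missing placement line(s): {', '.join(missing)}")
    return {"paper_url": match.group(0), **fields}
```

An issue body that contains only the link raises an error in this sketch, mirroring the point that placement has to be stated explicitly.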
{
  "section": "L2",
  "subsection": "Digital",
  "title": "Paper title",
  "paper_url": "https://arxiv.org/abs/2601.00001",
  "venue": "arXiv",
  "year": 2026,
  "summary": "Concise contribution phrase.",
  "code_url": "https://github.com/org/repo",
  "homepage_url": "https://project-name.github.io/"
}
For a benchmark, set "section": "Benchmark" and choose the regime as the subsection (Physical / Digital / Social / Scientific). Everything else stays the same.
The GitHub Action parses the block, inserts the entry in reverse chronological order under the right section, and opens a PR for maintainer review. code_url (rendered as a live GitHub-stars badge when on github.com) and homepage_url (rendered as a Homepage badge) are optional. Valid section / subsection pairs:
- L1 (Predictor): subsections Representation, Model-Based-RL, Token-Diffusion.
- L2 (Simulator): subsections Physical, Digital, Social, Scientific.
- L3 (Evolver): subsections Physical, Digital, Social, Scientific.
- Benchmark: subsections Physical, Digital, Social, Scientific.
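If you want to check a submission locally before opening an issue, the pairs above and the reverse chronological ordering can be sketched as below. This is a hypothetical helper, not the GitHub Action itself: `check_placement` and `insert_index` are illustrative names, and placing a new entry ahead of existing entries from the same year is an assumption about tie-breaking.

```python
from typing import List, Sequence

# Valid section / subsection pairs, copied from the list above.
REGIMES = {"Physical", "Digital", "Social", "Scientific"}
VALID_SUBSECTIONS = {
    "L1": {"Representation", "Model-Based-RL", "Token-Diffusion"},
    "L2": REGIMES,
    "L3": REGIMES,
    "Benchmark": REGIMES,
}


def check_placement(section: str, subsection: str) -> List[str]:
    """Return a list of problems; an empty list means the pair is valid."""
    if section not in VALID_SUBSECTIONS:
        return [f"unknown section: {section!r}"]
    if subsection not in VALID_SUBSECTIONS[section]:
        allowed = ", ".join(sorted(VALID_SUBSECTIONS[section]))
        return [f"{subsection!r} is not valid under {section}; use one of: {allowed}"]
    return []


def insert_index(years: Sequence[int], new_year: int) -> int:
    """Position for a new entry in a list already sorted newest first.

    Assumption: a new entry goes before existing entries of the same year.
    """
    for index, year in enumerate(years):
        if year <= new_year:
            return index
    return len(years)
```

For example, `check_placement("L1", "Digital")` reports a problem because `Digital` is only a subsection of L2, L3, and Benchmark.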
The legacy combined form ("section": "L2-Digital", no subsection) is still accepted but should not be used for new submissions. You can also submit a traditional PR if you prefer.
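Converting a legacy payload to the current two-field form could look like the sketch below. The function `normalize_placement` is hypothetical, and splitting at the first hyphen is an assumption chosen so that hyphenated subsection names such as Model-Based-RL stay intact; the actual workflow may handle this differently.

```python
def normalize_placement(payload: dict) -> dict:
    """Return a copy of the payload with separate section and subsection keys."""
    result = dict(payload)
    if result.get("subsection"):
        # Already in the current form; nothing to do.
        return result
    # Legacy combined form: split only at the first hyphen.
    section, separator, subsection = str(result.get("section", "")).partition("-")
    if not separator or not subsection:
        raise ValueError("payload has neither a subsection nor a combined section")
    result["section"] = section
    result["subsection"] = subsection
    return result
```

The input dictionary is left unmodified, so the original payload can still be logged or compared against the normalized copy.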