Awesome Agentic World Modeling

This repository accompanies the paper Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond. It provides a taxonomy-aligned bibliography of 400+ cited works and 100+ representative systems. Papers are grouped by taxonomy section and listed in reverse chronological order within each subsection to support literature navigation, comparison, and ongoing updates. The repository is released under the MIT License. Check out our poster here.

[!TIP] 👋 Join the discussion, share your work in progress, and help us grow the agentic world modeling community together.

[!NOTE] 📚 If you find this resource useful, please cite the paper and the repo:

@article{chu2026agenticworldmodelingfoundations,
  title         = {Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond},
  author        = {Meng Chu and Xuan Billy Zhang and Kevin Qinghong Lin and Lingdong Kong and Jize Zhang and Teng Tu and Weijian Ma and Ziqi Huang and Senqiao Yang and Wei Huang and Yeying Jin and Zhefan Rao and Jinhui Ye and Xinyu Lin and Xichen Zhang and Qisheng Hu and Shuai Yang and Leyang Shen and Wei Chow and Yifei Dong and Fengyi Wu and Quanyu Long and Bin Xia and Shaozuo Yu and Mingkang Zhu and Wenhu Zhang and Jiehui Huang and Haokun Gui and Haoxuan Che and Long Chen and Qifeng Chen and Wenxuan Zhang and Wenya Wang and Xiaojuan Qi and Yang Deng and Yanwei Li and Mike Zheng Shou and Zhi-Qi Cheng and See-Kiong Ng and Ziwei Liu and Philip Torr and Jiaya Jia},
  year          = {2026},
  eprint        = {2604.22748},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2604.22748}
}

Overview

| Level | Definition | Key Capability | Physical | Digital | Social | Scientific |
| --- | --- | --- | --- | --- | --- | --- |
| L1 Predictor | One-step local transition | Prediction accuracy, robustness, identifiability | RSSM, V-JEPA, TD-MPC2 | LLM pred., Othello-WM | ToMnet, BToM | GNN, FNO |
| L2 Simulator | Multi-step rollout respecting governing laws | Long-horizon coherence, intervention sensitivity, constraint consistency | DreamerV3, Sora, Cosmos | WebDreamer, Code2World | Generative Agents, CICERO | GraphCast, NeuralGCM |
| L3 Evolver | Design → Execute → Observe → Reflect with model revision | Active information expansion, autonomous execution, belief revision | AdaptSim, Self-Modeling | AlphaEvolve, FunSearch | Evolving Constitutions, AgentSociety | A-Lab, AI Scientist |
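
The L3 loop in the table (Design, Execute, Observe, Reflect, then revise the model) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the hidden law `y = 3x`, the names `environment` and `evolve`, the fixed probe, and the gradient-style update are hypothetical and not taken from the paper or any listed system.

```python
def environment(x: float) -> float:
    # Hidden ground truth the agent cannot read directly (toy law: y = 3x).
    return 3.0 * x


def evolve(steps: int = 10, lr: float = 0.5) -> float:
    k = 0.0  # current model of the world: y_hat = k * x
    for _ in range(steps):
        x = 1.0             # Design: choose an experiment (a fixed probe here)
        y = environment(x)  # Execute the experiment and Observe the outcome
        error = y - k * x   # Reflect: compare the outcome with the prediction
        k += lr * error * x # Revise the model in light of the error
    return k


k_final = evolve()
print(k_final)
```

With each pass the prediction error halves, so the estimate approaches the hidden coefficient. A real Evolver would also choose informative experiments rather than repeating one probe.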

L1: Predictor

Methods that learn local one-step operators: state inference, forward dynamics, observation decoding, and inverse dynamics.
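
As a rough illustration of the four one-step operators named above, here is a minimal sketch on a toy 1-D world. The class `ToyPredictor`, its `gain` parameter, and the linear dynamics are hypothetical choices for this example. Real L1 methods learn these maps from data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPredictor:
    """Toy 1-D world: latent position z, observation o = gain * z."""
    gain: float = 2.0  # observation scale, assumed for illustration

    def infer_state(self, obs: float) -> float:
        # State inference: observation -> latent state
        return obs / self.gain

    def forward(self, z: float, action: float) -> float:
        # Forward dynamics: (state, action) -> next state
        return z + action

    def decode(self, z: float) -> float:
        # Observation decoding: latent state -> predicted observation
        return self.gain * z

    def inverse(self, z: float, z_next: float) -> float:
        # Inverse dynamics: (state, next state) -> action explaining the change
        return z_next - z


model = ToyPredictor()
z = model.infer_state(4.0)
z_next = model.forward(z, 0.5)
pred_obs = model.decode(z_next)
action_hat = model.inverse(z, z_next)
print(z, z_next, pred_obs, action_hat)
```

Each method is a single local step. Nothing here chains predictions over time, which is the boundary between L1 and L2.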

Representation Learning

VJEPA (arXiv, 2026) — Variational JEPA as probabilistic world model.
V-JEPA 2 (arXiv, 2025) — Scaled V-JEPA; action-conditioned world model from video.
DINOv2 (TMLR, 2024) — Self-supervised vision features; strong transfer.
V-JEPA (TMLR, 2024) — Video JEPA; temporal prediction in feature space.
I-JEPA (CVPR, 2023) — Image Joint-Embedding Predictive Architecture.
SPR (ICLR, 2021) — Self-Predictive Representations; temporal consistency for data-efficient RL.
MoCo (CVPR, 2020) — Momentum contrast for unsupervised visual features.
SimCLR (ICML, 2020) — Simple contrastive learning with strong augmentation.
CURL (ICML, 2020) — Contrastive unsupervised representations for RL.
CPC (arXiv, 2018) — Contrastive Predictive Coding; predicts future in latent space.
β-VAE (ICLR, 2017) — Disentangled representations via increased KL penalty.
VQ-VAE (NeurIPS, 2017) — Discrete codebook tokenization.
VAE (ICLR, 2014) — Variational autoencoder; foundational latent variable model.
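
To make the "increased KL penalty" in the β-VAE entry concrete, the sketch below computes the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, then weights it by β. The function names and the default `beta=4.0` are illustrative assumptions. The reconstruction error is passed in as a plain number rather than computed from a decoder.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(recon_error: float, mu: Sequence[float],
                  logvar: Sequence[float], beta: float = 4.0) -> float:
    # beta = 1 gives the standard VAE objective; beta > 1 strengthens the KL term
    return recon_error + beta * gaussian_kl(mu, logvar)


print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))
print(beta_vae_loss(1.0, [1.0], [0.0], beta=4.0))
```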

Model-Based RL

DreamerV3 (Nature, 2025) — Generalizes across 150+ tasks; unified symlog world model.
TD-MPC2 (ICLR, 2024) — Temporal difference-aligned dynamics; 317M parameters.
DreamerV2 (ICLR, 2021) — Discrete latent representations; human-level Atari.
EfficientZero (NeurIPS, 2021) — MuZero + self-supervised consistency.
Dreamer (ICLR, 2020) — Latent imagination via RSSM; multi-step backpropagation.
MuZero (Nature, 2020) — Value-aligned dynamics without reconstruction; masters Go, chess, Atari.
DeepMDP (ICML, 2019) — Bellman-aligned latent state abstraction.
MBPO (NeurIPS, 2019) — Model-Based Policy Optimization; short-horizon rollouts + off-policy RL.
PETS (NeurIPS, 2018) — Probabilistic Ensemble + Trajectory Sampling.
World Models (NeurIPS, 2018) — VAE + MDN-RNN; influential early architecture.
E2C (NeurIPS, 2015) — Embed to Control; locally linear latent dynamics from images.
PILCO (ICML, 2011) — Gaussian process dynamics for data-efficient policy search.
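
The symlog transform mentioned in the DreamerV3 entry compresses large magnitudes symmetrically around zero, and symexp inverts it. The sketch below uses only the Python standard library and is not DreamerV3's implementation.

```python
import math


def symlog(x: float) -> float:
    # sign(x) * ln(|x| + 1): roughly linear near zero, logarithmic for large |x|
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x: float) -> float:
    # Inverse of symlog: sign(x) * (exp(|x|) - 1)
    return math.copysign(math.expm1(abs(x)), x)


for value in (-1000.0, -1.0, 0.0, 1.0, 1000.0):
    print(value, symlog(value), symexp(symlog(value)))
```

Predicting targets in symlog space keeps rewards and values of very different scales in a comparable numeric range.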

Token & Diffusion-Based

DIAMOND (NeurIPS, 2024) — U-Net diffusion transition operator.
Delta-IRIS (ICML, 2024) — Delta-based tokenization for world models.
IRIS (ICLR, 2023) — VQ-VAE + Transformer autoregressive world model.
STORM (NeurIPS, 2023) — Stochastic Transformer + VAE world model.
Latent Diffusion (CVPR, 2022) — Diffusion in latent space; high-quality decoding.
TransDreamer (arXiv, 2022) — Transformer-XL replacing RSSM.

L2: Simulator

Systems that compose one-step operators into multi-step rollouts that satisfy governing laws.
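
A minimal sketch of this composition, assuming a toy 1-D world whose only governing law is that position stays within walls at 0 and 10. The names `step` and `rollout` and the wall constraint are hypothetical and chosen only to show the idea.

```python
from typing import Callable, List, Sequence


def step(z: float, action: float) -> float:
    # One-step operator: move, then enforce the constraint 0 <= z <= 10
    return min(10.0, max(0.0, z + action))


def rollout(step_fn: Callable[[float, float], float],
            z0: float, actions: Sequence[float]) -> List[float]:
    # Compose the one-step operator over an action sequence
    states = [z0]
    for a in actions:
        states.append(step_fn(states[-1], a))
    return states


traj = rollout(step, 8.0, [1.0, 1.0, 1.0, -4.0])
alt = rollout(step, 8.0, [1.0, -1.0, 1.0, -4.0])  # intervene on the second action
print(traj)
print(alt)
```

The first trajectory saturates at the wall instead of passing through it (constraint consistency). Changing a single action alters every later state (intervention sensitivity).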

Physical World

HWM (arXiv, 2026) — Hierarchical latent world model + multi-scale planning.
BridgeV2W (arXiv, 2026) — Action-conditioned embodied video generation.
Yume (arXiv, 2025) — Video diffusion interactive world generation.
RoboScape (arXiv, 2025) — Physics-informed robotic video world model.
PIN-WM (arXiv, 2025) — Differentiable rigid-body physics + 3DGS.
GAIA-2 (arXiv, 2025) — Latent diffusion multi-view autonomous driving (AD) generation.
Aether (arXiv, 2025) — CogVideoX geometry-aware fine-tune.
Cosmos (arXiv, 2025) — NVIDIA autoregressive + diffusion hybrid.
LWM (ICLR, 2025) — RingAttention long-context LLM world model.
DreamerV3 (Nature, 2025) — RSSM + symlog loss for generalist long-horizon rollout.
DreMa (arXiv, 2024) — Compositional 3DGS digital twins for manipulation.
Vista (NeurIPS, 2024) — Diffusion driving world model.
iVideoGPT (NeurIPS, 2024) — Transformer + VQ-VAE interactive prediction.
DIAMOND (NeurIPS, 2024) — U-Net diffusion as Atari simulator.
Sora (OpenAI, 2024) — DiT video diffusion world simulator.
VideoPoet (ICML, 2024) — LLM multimodal video tokenizer.
Genie (ICML, 2024) — Latent action discovery; generative interactive environment.
OccWorld (arXiv, 2024) — GPT 3D occupancy prediction for AD.
Copilot4D (ICLR, 2024) — VQ-VAE + discrete point diffusion.
DriveDreamer (ECCV, 2024) — Diffusion AD generation.
Lumiere (SIGGRAPH, 2024) — Space-time U-Net diffusion.
GAIA-1 (arXiv, 2023) — Transformer video generation for AD.
DayDreamer (CoRL, 2023) — RSSM on real-world robots.
Diffuser (ICML, 2022) — Diffusion trajectory planning.
DreamingV2 (arXiv, 2022) — DreamerV2 + reconstruction-free objective.
DreamerPro (ICML, 2022) — RSSM + prototypical representations.
PathDreamer (ICCV, 2021) — Autoregressive visual world model for vision-and-language navigation (VLN).
Plan2Explore (ICML, 2020) — Dreamer + self-supervised exploration.
MuZero (Nature, 2020) — Value-aligned dynamics with MCTS for long-horizon planning.

Digital World

Code2World (arXiv, 2026) — VLM code rendering as environment.
RWML (arXiv, 2026) — LLM + RL sim-to-real alignment.
gWorld (arXiv, 2026) — VLM code rendering for web simulation.
WebWorld (arXiv, 2026) — Fine-tuned VLM web simulator.
MobileDreamer (arXiv, 2026) — LLM GUI sketch prediction.
Word2World (arXiv, 2025) — LLM text-based world model evaluation.
NeuralOS (arXiv, 2025) — RNN + pixel diffusion for desktop GUI.
WebSynthesis (arXiv, 2025) — LLM + MCTS trajectory synthesis.
GameCraft (arXiv, 2025) — Diffusion game video generation.
GameFactory (ICCV, 2025) — Action-controlled interactive game video generation.
WebDreamer (TMLR, 2025) — LLM web state simulation + tree search.
WMA (ICLR, 2025) — LLM web transition prediction.
GameGen-X (ICLR, 2025) — Interactive open-world game video world model.
GameNGen (ICLR, 2025) — U-Net diffusion runs DOOM at 20 FPS.
CodeWM (arXiv, 2024) — LLM + MCTS code world model generation.
WorldCoder (NeurIPS, 2024) — LLM incremental code synthesis world model.
GameGAN (CVPR, 2020) — GAN neural game engine from gameplay videos.

Social World

PolicySim (arXiv, 2026) — LLM platform policy sandbox.
AIvilization (arXiv, 2026) — Large-scale sandbox economy simulation.
MASim (arXiv, 2025) — Multilingual agent social simulation.
SWM-AP (arXiv, 2025) — Social world model for mechanism design.
OASIS (arXiv, 2024) — 1M-agent social simulation at scale.
Project Sid (arXiv, 2024) — 1000 LLM agents with emergent civilization.
Werewolf (arXiv, 2024) — LLM + RL strategic deception.
Sotopia (ICLR, 2024) — LLM social evaluation framework.
AvalonBench (NeurIPS, 2023) — LLM deductive social reasoning.
Generative Agents (UIST, 2023) — LLM reflective memory stream in Smallville.
CICERO (Science, 2022) — LLM + strategic planning for human-level Diplomacy.
Social Simulacra (UIST, 2022) — GPT prompt-chain community simulation.
Deal or No Deal (EMNLP, 2017) — RNN + RL self-play negotiation.

Scientific World

Lingshu-Cell (arXiv, 2026) — Masked discrete diffusion cellular world model.
Aurora (arXiv, 2025) — 3D Swin Earth system foundation model.
GenCast (Nature, 2025) — Spherical ensemble diffusion forecasting.
NeuralGCM (Nature, 2024) — Hybrid physics-NN general circulation model.
BAX (npj Computational Materials, 2024) — Bayesian algorithm execution for targeted materials discovery.
GraphCast (Science, 2023) — GNN autoregressive weather in under 1 minute.
ClimaX (ICML, 2023) — ViT climate foundation model.
Pangu-Weather (Nature, 2023) — 3D Earth Transformer weather forecasting.
FNO (ICLR, 2021) — Fourier Neural Operator; 1000x speedup for PDEs.
GNS (ICML, 2020) — Graph Network Simulator; learned particle dynamics.
ChemBO (AISTATS, 2020) — Bayesian optimization for synthesizable small molecules.
P3BO (ICML, 2020) — Population-based black-box optimization for biological sequence design.