Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md

GigaWorld-Policy

GigaWorld-Policy: An Efficient Action-Centered World-Action Model

arXiv Website

📖 Overview

World-Action Models (WAM) initialized from pre-trained video generation backbones have demonstrated remarkable potential for robot policy learning. GigaWorld-Policy is an action-centered WAM that learns 2D pixel-action dynamics while enabling efficient action decoding, with optional video generation.

Key Features

  • 🚀 9x faster inference compared to Motus (leading WAM baseline)
  • 📈 7% higher task success rate than Motus
  • 💪 95% improvement over pi-0.5 on RoboTwin 2.0

🛠️ Installation

1. Create Conda Environment

conda create -n gigaworld-policy python==3.11 conda activate gigaworld-policy

2. Install Dependencies

pip install ./third_party/giga-train pip install ./third_party/giga-models pip install ./third_party/giga-datasets

📊 Data Preprocessing

Before training, you need to compute the normalization statistics and pre-compute the T5 text embeddings for your dataset.

1. Compute Normalization Statistics

This will generate a norm_stats_delta.json file which is required by the policy.

python -m scripts.compute_norm_stats \ --data_paths "/path/to/dataset_dir" \ --output_path "/path/to/norm_stats_delta.json" \ --embodiment_id {embodiment-id} \ --delta-mask {delta-mask} \ --sample-rate 1.0 \ --action-chunk 48 \
  • --embodiment_id: Check compute_norm_stats.py for the mapping from robot type to ID.
  • --delta_mask: A boolean mask indicating which action dimensions are deltas (True) vs. absolute values (False).

2. Compute T5 Embeddings

This will pre-compute and save the T5 text embeddings for the language instructions in your dataset.

python -m scripts.compute_t5_embedding \ --repo_id "/path/to/dataset_dir" \ --root "/path/to/dataset_dir" \ --wan_path "/path/to/Wan2.2-TI2V-5B" \ --device "cuda" \ --text_len 512 \ --t5_folder_name "t5_embedding"

⚙️ Configuration

After completing the data preprocessing steps, modify the config file world_action_model/configs/example.py to point to the generated files and your model weights:

ParameterDescription
models.pretrainedPath to your pretrained model weights
transform.norm_pathPath to the generated norm_stats_delta.json
data_dirPath to your dataset

🚀 Training

Once the data is preprocessed and the configuration is set, you can start training:

python -m scripts.train --config world_action_model.configs.example.config

🚀 Inference

We provide an inference server and a simple open-loop evaluation client. Open-loop here means we sample observations (images/state) from an offline dataset and run inference, without executing actions in a real environment to collect the next observations.

1. Start Server

python -m scripts.inference_server \ --model_id "/path/to/huggingface_model_dir_or_id" \ --transformer_path "/path/to/transformer_checkpoint_dir" \ --stats_path "/path/to/norm_stats_delta.json" \ --t5_embedding_pkl "/path/to/t5_embedding.pt"

Optionally, add --return_images to enable video visualization during inference (videos will be saved under --vis_dir):

python -m scripts.inference_server \ --model_id "/path/to/huggingface_model_dir_or_id" \ --transformer_path "/path/to/transformer_checkpoint_dir" \ --stats_path "/path/to/norm_stats_delta.json" \ --t5_embedding_pkl "/path/to/t5_embedding.pt" \ --return_images \ --vis_dir "./vis"

2. Run Open-loop Client

python -m scripts.inference_client \ --dataset_paths "/path/to/dataset_dir" \ --save_dir "./vis"

📅 Roadmap

ComponentStatus
Inference Code
Training Code
Pre-trained Weights🔲

📚 Citation

@article{ye2026gigaworld, title={GigaWorld-Policy: An Efficient Action-Centered World-Action Model}, author={Ye, Angen and Wang, Boyuan and Ni, Chaojun and Huang, Guan and Zhao, Guosheng and Li, Hao and Li, Hengtao and Li, Jie and Lv, Jindi and Liu, Jingyu and Cao, Min and Li, Peng and Deng, Qiuping and Mei, Wenjun and Wang, Xiaofeng and Chen, Xinze and Zhou, Xinyu and Wang, Yang and Chang, Yifan and Li, Yifan and Zhou, Yukun and Ye, Yun and Liu, Zhichao and Zhu, Zheng}, journal={arXiv preprint arXiv:2603.17240}, year={2026} }

关于 About

GigaWorld-Policy: An Efficient Action-Centered World–Action Model

语言 Languages

Python100.0%

提交活跃度 Commit Activity

代码提交热力图
过去 52 周的开发活跃度
3
Total Commits
峰值: 3次/周
Less
More

核心贡献者 Contributors