AMP_mjlab

Deployment integration code is in ccrpRepo/wbc_fsm, under MJAmp State.

G1 AMP motion control project built on top of mjlab + rsl_rl.

Key features of this repository:

A single policy learns both locomotion (walk/run) and recovery (fall-and-get-up)
AMP discriminator regularizes motion style and priors
Training and deployment pipelines are consistent, with direct ONNX policy export support

Core Idea

Instead of training separate policies for locomotion and recovery and switching between them, this project learns both capabilities in one unified policy.

Implementation highlights:

Motion data split:
- Walk/Run data: src/assets/motions/g1/amp/WalkandRun
- Recovery data: src/assets/motions/g1/amp/Recovery
Delayed termination/reset:
- A subset of environments does not reset immediately after termination
- These environments receive a recovery window and reset states sampled from recovery clips
Unified AMP training:
- One actor-critic + One AMP discriminator
- Velocity tracking, perturbation robustness, and recovery are learned together

This reduces discontinuities caused by policy switching and yields more consistent behavior.

Requirements

Linux
Python 3.11 (recommended)
Working MuJoCo and GPU driver setup

Quick Start

1. Install

conda activate mjlab
cd AMP_mjlab
python -m pip install -e .

2. Apply mjlab Patch (Optional)

If you do not apply this patch, remove history_ordering configuration from the code.

What this patch does:

It adds an option for how observation history is flattened: by time (time) or by term (term).
Default mjlab behavior supports only term ordering.

Patch file:

mjlab_patch/mjlab/managers/observation_manager.py

Example command:

cp mjlab_patch/mjlab/managers/observation_manager.py \
  /home/crp/miniconda3/envs/mjlab/lib/python3.11/site-packages/mjlab/managers/observation_manager.py

3. List Available Tasks

python scripts/list_envs.py --keyword AMP

Main tasks:

Unitree-G1-AMP-Rough
Unitree-G1-AMP-Flat

Training

python scripts/train.py Unitree-G1-AMP-Flat --env.scene.num-envs=4096

Logs are saved by default to:

logs/rsl_rl/g1_amp_locomotion/<time_stamp_run>/

Training Curve Note (Important)

Around 2w iterations (about 20k), the policy often suddenly learns fall-recovery behavior.
As a result, multiple metrics in logs may show abrupt jumps. This is expected and not necessarily a training failure.

Training log transition example

Evaluation and Visualization

Replay with a trained checkpoint:

python scripts/play.py Unitree-G1-AMP-Rough \
  --checkpoint-file logs/rsl_rl/g1_amp_locomotion/<run_dir>/model_<iter>.pt

Note: ONNX export is enabled by default in both training and play workflows.

Motion Data Preparation

CSV-to-NPZ conversion script:

python scripts/csv_to_npz.py --help

Recommended data layout:

Raw CSV: motion_data_csv/amp
Converted NPZ: src/assets/motions/g1/amp/WalkandRun and src/assets/motions/g1/amp/Recovery

If valid NPZ files exist in these folders, training config loads them automatically.

Repository Structure

src/tasks/amp_loco: AMP locomotion/recovery task implementation
src/tasks/amp_loco/config/g1: G1 task registration, env configs, RL configs
src/tasks/amp_loco/mdp: rewards, observations, events, termination logic
scripts/train.py: training entry point
scripts/play.py: playback entry point
scripts/csv_to_npz.py: motion data conversion tool
mjlab_patch: required local patch for mjlab

Highlights

One policy unifies walk/run and recovery skills
AMP + velocity objective jointly optimize style and task performance
Delayed reset with recovery sampling explicitly improves recovery ability
End-to-end pipeline supports ONNX export for deployment

Acknowledgements

Thanks to unitreerobotics/unitree_rl_mjlab for open-sourcing their work and inspiration.
Thanks to Open-X-Humanoid/TienKung-Lab; the rsl_rl AMP part in this project references their implementation.