Unitree RL Mjlab
✳️ Overview
Unitree RL Mjlab is a reinforcement learning project built on mjlab, using MuJoCo as its physics simulation backend. It currently supports the Unitree Go2, A2, As2, G1, R1, H1_2, and H2.
Mjlab combines Isaac Lab's proven API with best-in-class MuJoCo physics to provide lightweight, modular abstractions for RL robotics research and sim-to-real deployment.
| MuJoCo | Physical |
|---|---|
📦 Installation and Configuration
Please refer to setup.md for installation and configuration steps.
🔁 Process Overview
The basic workflow for using reinforcement learning to achieve motion control is:
Train → Play → Sim2Real
- Train: The agent interacts with the MuJoCo simulation and optimizes policies through reward maximization.
- Play: Replay trained policies to verify expected behavior.
- Sim2Real: Deploy trained policies to physical Unitree robots for real-world execution.
🛠️ Usage Guide
1. Velocity Tracking Training
Run the following command to train a velocity tracking policy:
python scripts/train.py Unitree-G1-Flat --env.scene.num-envs=4096
Multi-GPU Training: Scale to multiple GPUs using --gpu-ids:
python scripts/train.py Unitree-G1-Flat \
    --gpu-ids 0 1 \
    --env.scene.num-envs=4096
- The first argument (e.g., Unitree-G1-Flat) specifies the training task.
Available velocity tracking tasks:
- Unitree-Go2-Flat
- Unitree-G1-Flat
- Unitree-G1-23Dof-Flat
- Unitree-H1_2-Flat
- Unitree-A2-Flat
- Unitree-R1-Flat
> [!NOTE]
> For more details, refer to the mjlab documentation.
2. Motion Imitation Training
Train a Unitree G1 to mimic reference motion sequences.
2.1 Prepare Motion Files
Place CSV motion files under src/assets/motions/g1/ and convert them to NPZ format:
python scripts/csv_to_npz.py \
    --input-file src/assets/motions/g1/dance1_subject2.csv \
    --output-name dance1_subject2.npz \
    --input-fps 30 \
    --output-fps 50 \
    --robot g1  # g1 or g1_23dof
The resulting NPZ files are stored under src/assets/motions/g1/.
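The 30 fps → 50 fps conversion amounts to resampling each motion channel in time. The following is a minimal illustrative sketch of that idea (linear interpolation with NumPy), not the project's actual csv_to_npz.py implementation:

```python
import numpy as np

def resample(frames: np.ndarray, in_fps: float, out_fps: float) -> np.ndarray:
    """Linearly resample motion frames of shape (T, D) from in_fps to out_fps."""
    t_in = np.arange(len(frames)) / in_fps          # timestamps of input frames
    t_out = np.arange(0.0, t_in[-1] + 1e-9, 1.0 / out_fps)  # output timestamps
    # Interpolate each channel independently onto the output timeline
    return np.stack(
        [np.interp(t_out, t_in, frames[:, d]) for d in range(frames.shape[1])],
        axis=1,
    )

# A 2-second clip at 30 fps (61 frames) becomes 101 frames at 50 fps
clip = np.linspace(0.0, 1.0, 61)[:, None]
print(resample(clip, 30.0, 50.0).shape)  # (101, 1)
```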
2.2 Training
After generating the NPZ file, launch imitation training:
python scripts/train.py Unitree-G1-Tracking-No-State-Estimation --motion_file=src/assets/motions/g1/dance1_subject2.npz --env.scene.num-envs=4096
Available tasks:
- Unitree-G1-Tracking-No-State-Estimation
- Unitree-G1-23Dof-Tracking-No-State-Estimation
> [!NOTE]
> For detailed motion imitation instructions, refer to the BeyondMimic documentation.
⚙️ Parameter Description
- `--env.scene`: simulation scene configuration (e.g., num_envs, dt, ground type, gravity, disturbances)
- `--env.observations`: observation space configuration (e.g., joint state, IMU, commands, etc.)
- `--env.rewards`: reward terms used for policy optimization
- `--env.commands`: task commands (e.g., velocity, pose, or motion targets)
- `--env.terminations`: termination conditions for each episode
- `--agent.seed`: random seed for reproducibility
- `--agent.resume`: resume from the last saved checkpoint when enabled
- `--agent.policy`: policy network architecture configuration
- `--agent.algorithm`: reinforcement learning algorithm configuration (PPO, hyperparameters, etc.)
Training results are stored at: `logs/rsl_rl/<robot>_(velocity | tracking)/<date_time>/model_<iteration>.pt`
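Since the `<date_time>` directory names sort chronologically, the latest checkpoint can be located programmatically. A small sketch under that layout assumption (the helper name is ours, not part of the project):

```python
from pathlib import Path

def latest_checkpoint(log_root: str) -> Path:
    """Return the highest-numbered checkpoint of the most recent run.

    Assumes the layout <log_root>/<date_time>/model_<iteration>.pt,
    where <date_time> directory names sort chronologically.
    """
    run_dir = sorted(p for p in Path(log_root).iterdir() if p.is_dir())[-1]
    # model_<iteration>.pt -> pick the largest iteration number
    return max(run_dir.glob("model_*.pt"),
               key=lambda p: int(p.stem.split("_")[1]))

# Example: latest_checkpoint("logs/rsl_rl/g1_velocity")
```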
3. Simulation Validation
To visualize policy behavior in MuJoCo:
Velocity tracking:
python scripts/play.py Unitree-G1-Flat --checkpoint_file=logs/rsl_rl/g1_velocity/2026-xx-xx_xx-xx-xx/model_xx.pt
Motion imitation:
python scripts/play.py Unitree-G1-Tracking-No-State-Estimation --motion_file=src/assets/motions/g1/dance1_subject2.npz --checkpoint_file=logs/rsl_rl/g1_tracking/2026-xx-xx_xx-xx-xx/model_xx.pt
> [!NOTE]
> During training, `policy.onnx` and `policy.onnx.data` are also exported for deployment to physical robots.
Visualization:
| Go2 | G1 | H1_2 | G1_mimic |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
4. Real Deployment
Before deployment, install the required communication tools:
4.1 Power On the Robot
Start the robot in a suspended state and wait until it enters zero-torque mode.
4.2 Enable Debug Mode
While in zero-torque mode, press L2 + R2 on the controller. The robot will enter debug mode with joint damping enabled.
4.3 Connect to the Robot
Connect your PC to the robot via Ethernet. Configure the network as:
- Address: `192.168.123.222`
- Netmask: `255.255.255.0`

Use `ifconfig` to determine the Ethernet interface name for deployment.
4.4 Compilation
Example: Unitree G1 velocity control.
Place `policy.onnx` and `policy.onnx.data` into `deploy/robots/g1/config/policy/velocity/v0/exported`.
Then compile:
cd deploy/robots/g1
mkdir build && cd build
cmake .. && make
4.5 Deployment
4.5.1 Simulation Deployment
Before deploying on the real robot, it is recommended to validate the policy in a unitree_mujoco simulation first, so that abnormal behaviors are caught before they reach the physical robot. unitree_mujoco is already integrated into this framework.
Build unitree_mujoco:
cd simulate
mkdir build && cd build
cmake .. && make -j8
Launch the simulator (note that a gamepad must be connected):
./simulate/build/unitree_mujoco
You can select the corresponding robot in `simulate/config`.
Launch the simulation control program:
cd deploy/robots/g1/build
./g1_ctrl --network=lo
4.5.2 Real-Robot Deployment
Launch the control program on the real robot:
cd deploy/robots/g1/build
./g1_ctrl --network=enp5s0
Arguments:
- `--network`: the network interface used to connect to the robot. Use `lo` for simulation deployment and `enp5s0` for the real robot (you can check the interface name with the `ifconfig` command).
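As an alternative to parsing `ifconfig` output, the available interface names can be read directly from sysfs (a Linux-only sketch):

```python
import os

# On Linux, /sys/class/net contains one entry per network interface;
# 'lo' is the loopback used for simulation deployment.
print(sorted(os.listdir("/sys/class/net")))
```

Pick the entry corresponding to your wired connection and pass it as `--network`.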
Deployment Results:
| Go2 | G1 | H1_2 | G1_mimic |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
🎉 Acknowledgements
This project would not be possible without the contributions of the following repositories:
- mjlab: training and execution framework
- whole_body_tracking: versatile humanoid motion tracking framework
- rsl_rl: reinforcement learning algorithm implementation
- mujoco_warp: GPU-accelerated MuJoCo simulation backend
- mujoco: high-fidelity rigid-body physics engine







