Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control (NMR)

News

2026.03.24: Release NMR paper and website.
2026.03.26: Release HuggingFace live demo.
2026.04: Release deployable inference code and checkpoint.

TODOs

2026.03.24: Release NMR paper and website.
2026.03.26: Release HuggingFace live demo: https://huggingface.co/spaces/RayZhao/NMR
Release deployable inference code.
Release CEPR dataset (SMPL and robot).
Release training code.

Quick Start

1. Install Dependencies

We recommend using conda:

conda create -n nmr python=3.10
conda activate nmr

Install PyTorch (adjust CUDA version as needed):

pip install torch --index-url https://download.pytorch.org/whl/cu118

Install remaining dependencies:

pip install -r requirements.txt

2. Run the Gradio Web Demo

python app.py

On first run, the model checkpoint (~518 MB) and SMPL-X body model (~104 MB) will be automatically downloaded from HuggingFace Hub. Subsequent runs load from cache.

Upload any AMASS .npz file (or use the provided examples in examples/) to get:

Interactive 3D skeleton animation
Downloadable bmimic .npz result file

3. Command-line Inference

python inference.py --src examples/sample_motion.npz --output-dir output/

Batch processing (a directory of NPZ/PKL files):

python inference.py --src /path/to/motions/ --output-dir output/

Disable low-pass filter (raw network output):

python inference.py --src examples/sample_motion.npz --output-dir output/ --no-filter

Input formats

Format	Fields	Coordinate
AMASS `.npz`	`trans`, `root_orient`, `pose_body`	Z-up (auto-converted)
Standard `.npz`	`transl`, `global_orient`, `body_pose`	Y-up

High frame-rate sequences (>30 FPS) are automatically downsampled to 30 FPS.

Output format

A bmimic .npz file at 50 FPS:

{
    'fps':            np.ndarray (1,),          # 50
    'joint_pos':      np.ndarray (T, 29),       # joint angles [rad]
    'joint_vel':      np.ndarray (T, 29),       # joint velocities [rad/s]
    'body_pos_w':     np.ndarray (T, 30, 3),    # body positions in world frame [m]
    'body_quat_w':    np.ndarray (T, 30, 4),    # body orientations wxyz in world frame
    'body_lin_vel_w': np.ndarray (T, 30, 3),    # body linear velocities [m/s]
    'body_ang_vel_w': np.ndarray (T, 30, 3),    # body angular velocities [rad/s]
}

Model Architecture

NMR uses a two-stage pipeline:

SMPL-X motion (T, 140)
        ↓
   SMPL-X VQ-VAE Encoder
        ↓ (T/4, 512)
   LLaMA Transformer (forward, non-autoregressive)
        ↓ (T/4, 512)
   G1 VQ-VAE Decoder
        ↓
G1 robot motion (T, 217)
        ↓
post-processing (Butterworth low-pass filter)
        ↓
{dof (T,29), root_trans (T,3), root_rot_quat (T,4)}

Stage 1 — VQ-VAE Tokenizer: Encodes SMPL-X human motion into a compact latent space using FSQ quantization (codebook size 1920, temporal downsampling ×4).

Stage 2 — Transformer: A 70M-parameter LLaMA-style model that maps human motion embeddings to G1 robot motion embeddings in a one-to-one forward pass (non-autoregressive).

For full architecture details, see the paper.

Checkpoint

Model weights are hosted on HuggingFace Hub at RayZhao/NMR and will be downloaded automatically on first use.

If you prefer to download manually:

huggingface-cli download RayZhao/NMR weights/epoch_30.pth --local-dir .
huggingface-cli download RayZhao/NMR assets/SMPLX_NEUTRAL.npz --local-dir .

Citation


@article{zhao2026make,
  title={Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control},
  author={Zhao, Qingrui and Yang, Kaiyue and Wang, Xiyu and Zhao, Shiqi and Lu, Yi and Zhang, Xinfang and Yin, Wei and Shen, Qiu and Long, Xiao-Xiao and Cao, Xun},
  journal={arXiv preprint arXiv:2603.22201},
  year={2026}
}