GEM: A Generalist Model for Human Motion
Monocular whole-body 3D human pose estimation using the SOMA body model
2D Keypoint Overlay | In-Camera Mesh | Global Mesh | Retargeted G1 Motion
📰 News
- [2025-06] 📢 Added ONNX/TensorRT accelerated demo (demo_soma_onnx.py) — see Demo docs
- [2025-06] 📢 Added humanoid robot retargeting to Unitree G1 (--retarget)
- [2025-05] 📢 The GEM codebase is released!
🔎 Overview
GEM is a video-based 3D human pose estimation model developed by NVIDIA. It recovers full-body 77-joint motion — including body, hands, and face — from monocular video using the SOMA parametric body model. The pipeline handles dynamic cameras and recovers global motion trajectories. GEM includes a bundled 2D pose estimation model that detects 77 SOMA keypoints, making the system fully self-contained. Licensed under Apache 2.0 for commercial use.
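The bundled detector produces 77 SOMA keypoints per frame. As a minimal illustration of working with such output, the sketch below filters keypoints by confidence; the (77, 3) layout of [x, y, confidence] rows is an assumption for illustration only — the actual output format is documented in the Demo docs.

```python
import numpy as np

# Illustrative only: assume the 2D detector yields a (77, 3) array of
# [x, y, confidence] rows per frame (assumed layout, not GEM's documented API).
keypoints = np.random.default_rng(0).uniform(size=(77, 3))

def visible(kps, conf_thresh=0.5):
    """Keep only the keypoints whose confidence clears the threshold."""
    return kps[kps[:, 2] >= conf_thresh]

print(visible(keypoints).shape)  # (n_visible, 3) — rows with confidence >= 0.5
```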
✨ Key Features
- 77-joint SOMA body model — full body, hands, and face articulation
- Bundled 2D keypoint detector — a 2D pose estimator trained on the SOMA 77-joint skeleton, so no external detector is required
- Camera-space motion recovery — estimates human motion in the camera coordinate frame from monocular video
- World-space motion recovery — recovers global, world-frame motion trajectories even when the camera itself is moving
- Humanoid robot retargeting — retarget recovered motion to Unitree G1 robot via SOMA Retargeter
- Apache 2.0 licensed — commercially usable, trained on NVIDIA-owned data only
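The difference between camera-space and world-space recovery above comes down to a rigid transform: given per-frame camera extrinsics, camera-space joints map into world coordinates. A minimal sketch (the function name and data are illustrative, not GEM's API):

```python
import numpy as np

def camera_to_world(joints_cam, R_wc, t_wc):
    """Map camera-space 3D joints (J, 3) into world space.

    R_wc (3, 3) and t_wc (3,) are the camera's rotation and translation in
    the world frame, i.e. x_world = R_wc @ x_cam + t_wc. Illustrative only —
    GEM's actual output format is described in the Demo docs.
    """
    return joints_cam @ R_wc.T + t_wc

# Example: a camera sitting at world position (0, 0, 2), axis-aligned.
joints_cam = np.array([[0.0, 0.0, 1.0]])        # one joint 1 m in front of the camera
R_wc = np.eye(3)
t_wc = np.array([0.0, 0.0, 2.0])
print(camera_to_world(joints_cam, R_wc, t_wc))  # [[0. 0. 3.]]
```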
Research Version: Multi-Modal Conditioning
Looking for multi-modal motion generation (text, audio, music conditioning)? Check out GEM-SMPL, our research model using the SMPL body model that supports both motion estimation and generation from diverse input modalities. Presented at ICCV 2025 (Highlight).
🚀 Quick Start
```bash
# 1. Clone
git clone --recursive https://github.com/NVlabs/GEM-X.git && cd GEM-X

# 2. Setup environment
pip install uv && uv venv .venv --python 3.12 && source .venv/bin/activate
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
uv pip install -e third_party/soma && cd third_party/soma && git lfs pull && cd ../..
bash scripts/install_env.sh

# 3. Run demo
python scripts/demo/demo_soma.py --video path/to/video.mp4 --ckpt inputs/pretrained/gem_soma.ckpt

# 4. (Optional) Retarget to G1 robot
uv pip install -e third_party/soma-retargeter
python scripts/demo/demo_soma.py --video path/to/video.mp4 --retarget
```
See docs/INSTALL.md for detailed installation instructions.
Mac users: For real-time webcam demos on Apple Silicon, see docs/INSTALL_MACOS.md instead.
📚 Documentation
| Document | Description |
|---|---|
| Installation | Prerequisites, step-by-step setup, Docker, troubleshooting |
| Installation (macOS) | Apple Silicon setup for the real-time webcam demo |
| Demo | Full 3D pipeline, ONNX/TRT accelerated demo, 2D keypoint-only demo, output formats |
| Training & Evaluation | Dataset preparation, training commands, config system |
| Model Overview | Architecture, SOMA body model, bundled 2D pose model |
| Related Projects | GENMO, SOMA, ecosystem cross-references |
📦 Pretrained Models
| Model | Body Model | Joints | Download |
|---|---|---|---|
| GEM (SOMA) | SOMA | 77 (body + hands + face) | gem_soma.ckpt |
Place checkpoints under inputs/pretrained/ or pass the path via --ckpt. The demo scripts will automatically download the checkpoint from HuggingFace if --ckpt is not provided.
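The lookup order described above — an explicit --ckpt path first, then the default location, falling back to a download — might be sketched as follows. resolve_ckpt is a hypothetical helper for illustration, not the repo's actual code; the real demo script handles the HuggingFace download itself.

```python
from pathlib import Path

# Default location from the table above.
DEFAULT_CKPT = Path("inputs/pretrained/gem_soma.ckpt")

def resolve_ckpt(ckpt_arg=None):
    """Return (path, needs_download) for the checkpoint to load.

    Hypothetical sketch: prefer an explicit --ckpt argument, then the
    default local path; otherwise signal that a download is needed.
    """
    if ckpt_arg is not None:
        return Path(ckpt_arg), False   # user-supplied path, no download
    if DEFAULT_CKPT.exists():
        return DEFAULT_CKPT, False     # found locally
    return DEFAULT_CKPT, True          # caller should fetch it first

path, needs_download = resolve_ckpt("my_models/gem_soma.ckpt")
print(path, needs_download)  # my_models/gem_soma.ckpt False
```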
🤝 Related Humanoid Work at NVIDIA
GEM is part of a broader effort to produce humanoid motion data for robotics, physical AI, and other applications. See Related Projects for cross-references to these related works.
📖 Citation
```bibtex
@inproceedings{genmo2025,
  title     = {GENMO: A GENeralist Model for Human MOtion},
  author    = {Li, Jiefeng and Cao, Jinkun and Zhang, Haotian and Rempe, Davis and Kautz, Jan and Iqbal, Umar and Yuan, Ye},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```
📄 License
This project is released under Apache 2.0. This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use. See ATTRIBUTIONS.md for specifics.
GOVERNING TERMS:
Use of the source code is governed by the Apache License, Version 2.0. Use of the associated model is governed by the NVIDIA Open Model License Agreement.