
GEM: A Generalist Model for Human Motion

Monocular whole-body 3D human pose estimation using the SOMA body model


2D Keypoint Overlay  |  In-Camera Mesh  |  Global Mesh  |  Retargeted G1 Motion

📰 News

  • [2025-06] 📢 Added ONNX/TensorRT accelerated demo (demo_soma_onnx.py) — see Demo docs
  • [2025-06] 📢 Added humanoid robot retargeting to Unitree G1 (--retarget)
  • [2025-05] 📢 The GEM codebase is released!

🔎 Overview

GEM is a video-based 3D human pose estimation model developed by NVIDIA. It recovers full-body 77-joint motion — including body, hands, and face — from monocular video using the SOMA parametric body model. The pipeline handles dynamic cameras and recovers global motion trajectories. GEM includes a bundled 2D pose estimation model that detects the 77 SOMA keypoints, making the system fully self-contained. The code is licensed under Apache 2.0 and is commercially usable.

✨ Key Features

  • 77-joint SOMA body model — full body, hand, and face articulation
  • Bundled 2D keypoint detector — a 2D pose estimator trained for the 77-joint SOMA skeleton
  • Camera-space motion recovery — estimates human motion in the camera coordinate frame from dynamic monocular video
  • World-space motion recovery — recovers global human motion trajectories in world coordinates from the same input
  • Humanoid robot retargeting — retarget recovered motion to the Unitree G1 robot via SOMA Retargeter
  • Apache 2.0 licensed — commercially usable, trained on NVIDIA-owned data only
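The camera-space and world-space outputs listed above are related by the per-frame camera pose. A minimal stdlib-only sketch of that generic rigid-transform relationship (this illustrates the coordinate-frame distinction, not GEM's actual API):

```python
# Sketch: lifting camera-space joints to world space, assuming
# world = R_wc @ X_cam + t_wc with (R_wc, t_wc) the camera-to-world
# pose for a given frame. Hypothetical helper names.

def mat_vec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def camera_to_world(joints_cam, R_wc, t_wc):
    """Transform a list of [x, y, z] joints from camera to world space."""
    return [[a + b for a, b in zip(mat_vec(R_wc, j), t_wc)]
            for j in joints_cam]

# Identity rotation, camera translated 2 m along world z:
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 2.0]
print(camera_to_world([[0.1, 0.2, 3.0]], R, t))  # → [[0.1, 0.2, 5.0]]
```

A camera-only method stops at `joints_cam`; recovering `(R_wc, t_wc)` per frame under a dynamic camera is what makes world-space trajectory estimation the harder problem.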

Research Version: Multi-Modal Conditioning

Looking for multi-modal motion generation (text, audio, music conditioning)? Check out GEM-SMPL, our research model using the SMPL body model that supports both motion estimation and generation from diverse input modalities. Presented at ICCV 2025 (Highlight).

🚀 Quick Start

```bash
# 1. Clone
git clone --recursive https://github.com/NVlabs/GEM-X.git && cd GEM-X

# 2. Setup environment
pip install uv && uv venv .venv --python 3.12 && source .venv/bin/activate
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
uv pip install -e third_party/soma && cd third_party/soma && git lfs pull && cd ../..
bash scripts/install_env.sh

# 3. Run demo
python scripts/demo/demo_soma.py --video path/to/video.mp4 --ckpt inputs/pretrained/gem_soma.ckpt

# 4. (Optional) Retarget to G1 robot
uv pip install -e third_party/soma-retargeter
python scripts/demo/demo_soma.py --video path/to/video.mp4 --retarget
```

See docs/INSTALL.md for detailed installation instructions.

Mac users: For real-time webcam demos on Apple Silicon, see docs/INSTALL_MACOS.md instead.

📚 Documentation

| Document | Description |
| --- | --- |
| Installation | Prerequisites, step-by-step setup, Docker, troubleshooting |
| Installation (macOS) | Apple Silicon setup for the real-time webcam demo |
| Demo | Full 3D pipeline, ONNX/TRT accelerated demo, 2D keypoint-only demo, output formats |
| Training & Evaluation | Dataset preparation, training commands, config system |
| Model Overview | Architecture, SOMA body model, bundled 2D pose model |
| Related Projects | GENMO, SOMA, ecosystem cross-references |

📦 Pretrained Models

| Model | Body Model | Joints | Download |
| --- | --- | --- | --- |
| GEM (SOMA) | SOMA | 77 (body + hands + face) | gem_soma.ckpt |

Place checkpoints under inputs/pretrained/ or pass the path via --ckpt. The demo scripts will automatically download the checkpoint from HuggingFace if --ckpt is not provided.
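The fallback order described above (explicit `--ckpt`, then the default location, then a HuggingFace download) could be sketched as follows. This is a hypothetical helper for illustration — the actual demo script's internals and the HuggingFace repo id are assumptions:

```python
# Sketch of checkpoint resolution with a HuggingFace fallback.
# DEFAULT_CKPT matches the path documented above; the repo_id is a
# placeholder, not the project's actual HuggingFace repository.
import os

DEFAULT_CKPT = "inputs/pretrained/gem_soma.ckpt"

def resolve_checkpoint(ckpt_arg=None, repo_id="nvidia/GEM"):
    """Return a local checkpoint path, downloading it if not found."""
    path = ckpt_arg or DEFAULT_CKPT
    if os.path.exists(path):
        return path
    # Lazy import so offline runs work when the file is already present.
    from huggingface_hub import hf_hub_download  # assumed dependency
    return hf_hub_download(repo_id=repo_id, filename=os.path.basename(path))
```

The lazy import keeps `huggingface_hub` optional for users who place the checkpoint manually.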

🤝 Related Humanoid Work at NVIDIA

GEM is part of a larger effort to enable humanoid motion data for robotics, physical AI, and other applications.

Check out these related works:

📖 Citation

```bibtex
@inproceedings{genmo2025,
  title     = {GENMO: A GENeralist Model for Human MOtion},
  author    = {Li, Jiefeng and Cao, Jinkun and Zhang, Haotian and Rempe, Davis and Kautz, Jan and Iqbal, Umar and Yuan, Ye},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```

📄 License

This project is released under the Apache License 2.0. It also downloads and installs additional third-party open source software; review the license terms of those projects before use. See ATTRIBUTIONS.md for specifics.

GOVERNING TERMS:

Use of the source code is governed by the Apache License, Version 2.0. Use of the associated model is governed by the NVIDIA Open Model License Agreement.
