UMI-3D SLAM and Data Processing

🌐 UMI-3D Project Homepage

  • 🔧 UMI-3D Hardware: Hardware design, BOM, CAD, 3D-print parts
  • 🛰️ UMI-3D SLAM Pipeline: SLAM, synchronization, calibration, and data processing
  • 🤖 UMI-3D Policy: Policy training, deployment, inference

📦 Dataset & Models

Overview

UMI-3D provides a complete end-to-end pipeline, transforming raw rosbag recordings into training-ready datasets for embodied manipulation learning:

Collected rosbag Files
  ↓
Calibration  
  ↓
SLAM
  ↓
Aligned Demos
  ↓
Dataset Pipeline
  ↓
Zarr Dataset (for policy training, e.g. Diffusion Policy)

0. Complete Data Collection

To build the UMI-3D data collection system, please follow the hardware assembly and sensor setup instructions in:

👉 https://github.com/Physical-Intelligence-Laboratory/UMI-3D-Hardware

UMI-3D collects two types of data:

  1. Demonstration data
    Human-guided manipulation trajectories captured during task execution.

  2. Gripper calibration data
    Recordings in which the gripper is slowly opened and closed for approximately 5 cycles, used to estimate the gripper's motion range.

All data are recorded as rosbag files, including:

  • LiDAR point clouds
  • IMU measurements
  • Camera images

These recordings serve as the raw input for the full UMI-3D data processing pipeline.

1. Sensor Calibration

1.1 Fisheye Camera Intrinsic Calibration

Step 1 — Collect calibration images

  • Use a checkerboard (6 × 9 inner corners, square size = 0.10 m, configurable in script)
  • Capture ≥ 100 images with different positions (center / edges / corners), orientations, and distances
  • Save all images to fisheye_intrinsics/images/

Step 2 — Run calibration

cd fisheye_intrinsics
python3 calibrate_fisheye_intrinsics.py \
  --image_glob "images/*.png" \
  --checkerboard_cols 6 \
  --checkerboard_rows 9 \
  --square_size 0.10 \
  --output_dir calib_output

Calibration results will be saved to: fisheye_intrinsics/calib_output/
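For intuition about what the checkerboard flags describe, the sketch below builds the planar 3D corner grid that a checkerboard calibration typically pairs with the corners detected in each image. It is an illustration only, not the contents of calibrate_fisheye_intrinsics.py; the helper name checkerboard_object_points and the row-by-row corner ordering are assumptions.

```python
def checkerboard_object_points(cols, rows, square_size):
    """Return (x, y, z) inner-corner coordinates in metres on the z = 0 plane.

    Corners are listed row by row; this ordering is an assumption made for
    illustration and may differ from the ordering used by the actual script.
    """
    return [
        (c * square_size, r * square_size, 0.0)
        for r in range(rows)
        for c in range(cols)
    ]


# Values taken from the command above: 6 x 9 inner corners, 0.10 m squares.
points = checkerboard_object_points(cols=6, rows=9, square_size=0.10)
print(len(points))  # 54 corners per image
```

If you print a board with a different square size, the same value must be passed to --square_size, otherwise the metric scale of the calibration target will be wrong.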

1.2 LiDAR–Camera Extrinsic Calibration

This step estimates the rigid transformation between the Livox MID-360 LiDAR and the fisheye camera.


Step 1 — Prepare calibration data

  • Record a static rosbag containing:

    • Livox point cloud (/livox/lidar)
    • Camera images
  • Place the rosbag into: livox2cam_calibration/src/calib_data/

  • Enter the previously calibrated camera intrinsics in: livox2cam_calibration/src/config/qr_params.yaml

  • Calibration board files are provided here: Calibration Board Files


Step 2 — Build and Run

Prerequisites:

  • Ubuntu 20.04, ROS Noetic
  • PCL ≥ 1.8
  • OpenCV ≥ 4.0
conda deactivate

# Build
cd livox2cam_calibration
catkin_make

# Run Calibration
source devel/setup.bash
roslaunch livox2cam_calibration calib.launch
  • Output: Extrinsic transformation between LiDAR and camera (rotation + translation)
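To make the role of the extrinsic concrete, here is a minimal sketch of applying a rotation and translation to a single LiDAR point. It assumes the result maps LiDAR coordinates into the camera frame (p_cam = R · p_lidar + t); check the direction convention of the actual calibration output before using it. The function name and the numeric values are placeholders, not real calibration results.

```python
def lidar_to_camera(point, rotation, translation):
    """Apply p_cam = R @ p_lidar + t.

    rotation is a 3x3 row-major nested list, translation a 3-vector.
    The LiDAR-to-camera direction is an assumption for this sketch.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Placeholder extrinsics for illustration only.
R = [
    [0.0, -1.0, 0.0],
    [0.0, 0.0, -1.0],
    [1.0, 0.0, 0.0],
]
t = [0.05, 0.0, -0.02]

p_cam = lidar_to_camera((2.0, 0.5, 0.1), R, t)
print(p_cam)
```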

2. UMI-3D SLAM

This module performs LiDAR–inertial SLAM to estimate the camera trajectory and reconstruct the environment.


Step 1 — Configure extrinsics

Enter the calibrated LiDAR–camera extrinsic parameters in: umi_3d_slam_ws/src/umi_3d_slam/config/mid360_180.yaml


Step 2 — Install dependencies

  • Environment: Ubuntu 20.04, ROS Noetic

  • Libraries: PCL ≥ 1.8, Eigen ≥ 3.3.4, OpenCV ≥ 4.2

  • Install Sophus:

    git clone https://github.com/bitcat-tech/Sophus
    cd Sophus
    mkdir build && cd build
    cmake ..
    make
    sudo make install

Step 3 — Build the SLAM system

conda deactivate
cd umi_3d_slam_ws
catkin_make

Step 4 — Run SLAM Demo

source devel/setup.bash

# Start SLAM
roslaunch umi_3d_slam mapping_mid360_180.launch rviz:=true

# Play rosbag
rosbag play YOUR_DEMO.bag
  • Output: Estimated camera trajectory saved in umi_3d_slam_ws/src/umi_3d_slam/output/camera_trajectory.csv

Note: Ensure proper time synchronization between LiDAR and camera.

3. Data Processing for Training

3.1 Rosbag Preprocessing

This stage converts raw rosbag recordings into time-aligned multi-modal data, and prepares them for SLAM and dataset generation.

The pipeline consists of two main steps:

Raw rosbags
   ↓
auto_bag_to_mp4_aligned.py   (alignment + video export)
   ↓
aligned_bags/
   ├── demos/
   ├── 000000.bag ...
   ↓
auto_umi_3d_slam.sh         (trajectory estimation)
   ↓
Final demos with trajectory

Step 1 — Prepare Raw Rosbags

Place all raw rosbags into a single directory:

/path/to/your/rosbags/
    ├── 2026-03-30-13-33-14.bag
    ├── 2026-03-30-13-33-37.bag
    ├── ...
    ├── 20xx-xx-xx-xx-xx-xx.bag
    ├── gripper_calibration*.bag

Step 2 — Multi-modal Alignment and Video Export

Run the preprocessing script:

conda deactivate
python3 scripts_slam_pipeline/auto_bag_to_mp4_aligned.py \
  --dir /path/to/your/rosbags \
  --align \
  --organize_each \
  --start_idx 0 \
  --id_width 6 \
  --use_header_stamp \
  --gate 0.02 \
  --no_symlink

🔍 What this script does
  • Synchronizes:
    • LiDAR (Livox)
    • Camera images
    • IMU
  • Uses timestamp gating (--gate 0.02) for alignment
  • Re-indexes all demos into consistent IDs
  • Converts image streams into MP4 videos
  • Outputs per-frame timestamps
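Timestamp gating can be pictured as nearest-neighbour matching with a tolerance. The sketch below is a generic illustration of that idea, assuming the gate is a maximum allowed time difference in seconds; it is not the script's actual implementation, and match_with_gate is a hypothetical helper.

```python
from bisect import bisect_left


def match_with_gate(reference_stamps, candidate_stamps, gate=0.02):
    """For each reference stamp, return the index of the nearest candidate
    stamp within `gate` seconds, or None if no candidate is close enough.

    candidate_stamps must be sorted in ascending order.
    """
    matches = []
    for stamp in reference_stamps:
        pos = bisect_left(candidate_stamps, stamp)
        best = None
        # Only the neighbours on either side of the insertion point can be nearest.
        for idx in (pos - 1, pos):
            if 0 <= idx < len(candidate_stamps):
                if best is None or (
                    abs(candidate_stamps[idx] - stamp)
                    < abs(candidate_stamps[best] - stamp)
                ):
                    best = idx
        if best is not None and abs(candidate_stamps[best] - stamp) <= gate:
            matches.append(best)
        else:
            matches.append(None)
    return matches


camera = [0.000, 0.100, 0.200, 0.300]
lidar = [0.005, 0.098, 0.250]
print(match_with_gate(camera, lidar))  # [0, 1, None, None]
```

A tighter gate rejects more frames but guarantees closer alignment; a looser gate keeps more frames at the cost of larger time offsets between modalities.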

📂 Output structure
aligned_bags/
├── demos/
│   ├── demo_000000_000000/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   └── source.txt
│   ├── demo_000001_000001/
│   │   ├── ...
│
├── 000000.bag
├── 000001.bag
├── ...

Each demo folder corresponds to one aligned sequence.
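The zero-padded names above follow from --start_idx and --id_width. As a rough illustration, the sketch below assigns IDs to demo bags in filename order. It assumes that filename order is used (for the timestamped names shown earlier this is chronological) and that gripper calibration bags are excluded from the numbering; the real script may order or filter differently. plan_reindex is a hypothetical helper.

```python
from fnmatch import fnmatch


def plan_reindex(bag_names, start_idx=0, id_width=6):
    """Map each demo bag name to a zero-padded indexed bag name.

    Assumption for this sketch: gripper calibration bags are skipped and
    the remaining bags are numbered in sorted filename order.
    """
    demo_bags = sorted(
        name for name in bag_names
        if not fnmatch(name, "gripper_calibration*.bag")
    )
    return {
        name: f"{start_idx + offset:0{id_width}d}.bag"
        for offset, name in enumerate(demo_bags)
    }


plan = plan_reindex([
    "2026-03-30-13-33-37.bag",
    "2026-03-30-13-33-14.bag",
    "gripper_calibration_01.bag",  # hypothetical file name
])
for source, target in plan.items():
    print(source, "->", target)
```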


Step 3 — Run SLAM for Trajectory Estimation

Run batch SLAM processing:

conda deactivate
bash scripts_slam_pipeline/auto_umi_3d_slam.sh \
  --bag_dir /path/to/your/rosbags/aligned_bags \
  --start 0 \
  --end YOUR_BAG_NUMBER

🔍 What this script does

Based on the implementation:

  • Iterates over each indexed bag (000000.bag, 000001.bag, ...)
  • For each bag:
    1. Launches UMI-3D SLAM system
    2. Plays rosbag
    3. Waits for trajectory output
    4. Moves result to corresponding demo folder:
      demos/demo_xxxxxx_xxxxxx/camera_trajectory.csv
      
    5. Optionally deletes processed bag to save disk space
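The "waits for trajectory output" step can be approximated by polling for the output file. How the shell script actually waits is not described here, so the following is only one common approach; wait_for_file, its timeout values, and the example path are assumptions.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout_s=60.0, poll_s=0.5):
    """Return True once `path` exists and is non-empty, or False if
    `timeout_s` seconds pass first. The file is checked at least once."""
    path = Path(path)
    deadline = time.monotonic() + timeout_s
    while True:
        if path.is_file() and path.stat().st_size > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Hypothetical relative path; adjust to your workspace layout.
ready = wait_for_file("output/camera_trajectory.csv", timeout_s=1.0, poll_s=0.2)
print("trajectory ready" if ready else "timed out")
```

Checking for a non-empty file rather than mere existence reduces the chance of moving a trajectory that SLAM has created but not yet written.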

📂 Final Output
aligned_bags/
├── demos/
│   ├── demo_000000_000000/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   ├── camera_trajectory.csv   ← SLAM output
│   │   └── source.txt
│
├── ...

3.2 UMI-format Training Data Packaging

This stage converts the preprocessed aligned demos into a UMI-format replay buffer for policy training.

The full pipeline is wrapped by:

python run_dataset_pipeline.py \
  --session_dir /path/to/aligned_bags \
  --output /path/to/aligned_bags/DATASET_NAME.zarr.zip

The pipeline runs four stages in order:

aligned_bags/
└── demos/
    ├── demo_xxxxxx_xxxxxx/
    │   ├── raw_video.mp4
    │   ├── raw_video_timestamps.csv
    │   ├── camera_trajectory.csv
    │   └── source.txt
    ├── gripper_calibration*/
    │   ├── raw_video.mp4
    │   ├── raw_video_timestamps.csv
    │   └── tag_detection.pkl
   ↓
00_detect_aruco.py
   ↓
01_run_calibrations.py
   ↓
02_generate_dataset_plan.py
   ↓
03_generate_replay_buffer.py
   ↓
DATASET_NAME.zarr.zip

Step 1 — Install environment

System dependencies

sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf

Conda environment

We recommend using Miniforge instead of the standard Anaconda distribution.

mamba env create -f conda_environment.yaml
conda activate umi

Step 2 — Prepare inputs

Before running the dataset pipeline, make sure your session directory already contains:

  • demos/demo_*/raw_video.mp4
  • demos/demo_*/raw_video_timestamps.csv
  • demos/demo_*/camera_trajectory.csv
  • demos/gripper_calibration*/raw_video.mp4

You also need:

  • camera intrinsics: example/calibration/fisheye.json
  • ArUco configuration: example/calibration/aruco_config.yaml

If needed, you can override them with:

--camera_intrinsics /path/to/custom_fisheye.json \
--aruco_config /path/to/custom_aruco_config.yaml
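Before launching the pipeline, it can help to confirm that the session directory matches the layout listed above. The sketch below checks only the files named in this step; find_missing_inputs is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path

# File names taken from the input list in this step.
REQUIRED_DEMO_FILES = (
    "raw_video.mp4",
    "raw_video_timestamps.csv",
    "camera_trajectory.csv",
)


def find_missing_inputs(session_dir):
    """Return a list of expected inputs (relative to session_dir) that are absent."""
    session_dir = Path(session_dir)
    demos_dir = session_dir / "demos"
    if not demos_dir.is_dir():
        return ["demos/"]

    missing = []
    demo_dirs = sorted(p for p in demos_dir.glob("demo_*") if p.is_dir())
    if not demo_dirs:
        missing.append("demos/demo_*")
    for demo in demo_dirs:
        for name in REQUIRED_DEMO_FILES:
            if not (demo / name).is_file():
                missing.append((demo / name).relative_to(session_dir).as_posix())

    # At least one gripper calibration video is expected.
    if not list(demos_dir.glob("gripper_calibration*/raw_video.mp4")):
        missing.append("demos/gripper_calibration*/raw_video.mp4")
    return missing


for item in find_missing_inputs("/path/to/aligned_bags"):
    print("missing:", item)
```

A demo that lacks camera_trajectory.csv usually means the SLAM step did not produce output for that bag, so rerunning Section 3.1 Step 3 for that index is the natural next check.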

Step 3 — Run the full dataset pipeline

conda activate umi
python run_dataset_pipeline.py \
  --session_dir /path/to/aligned_bags \
  --output /path/to/aligned_bags/DATASET_NAME.zarr.zip

Output Summary

After the full pipeline finishes, the main outputs are:

aligned_bags/
├── demos/
│   ├── demo_000000_000000/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   ├── camera_trajectory.csv
│   │   ├── tag_detection.pkl
│   │   └── source.txt
│   ├── ...
│   ├── gripper_calibration*/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   ├── tag_detection.pkl
│   │   └── gripper_range.json
├── dataset_plan.pkl
└── DATASET_NAME.zarr.zip

Note: This version is currently designed for the single-gripper UMI-3D setup, where camera_idx is fixed to 0.
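For a quick sanity check of the result without loading the training stack, the archive can be listed with the Python standard library. This sketch assumes DATASET_NAME.zarr.zip is an ordinary ZIP archive, as its extension suggests; list_archive_entries is a hypothetical helper, and the entry names inside the archive are not documented here.

```python
import zipfile


def list_archive_entries(zip_path, limit=20):
    """Return up to `limit` (name, uncompressed size in bytes) pairs from a ZIP file."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()[:limit]
        ]


# Usage (path is a placeholder):
# for name, size in list_archive_entries("/path/to/aligned_bags/DATASET_NAME.zarr.zip"):
#     print(f"{size:>12d}  {name}")
```

An empty listing, or an error when opening the file, suggests the final pipeline stage did not complete.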

4. Next Step: Policy Training and Deployment

After obtaining the final dataset:

DATASET_NAME.zarr.zip

you can proceed to policy training and real-world deployment using the UMI-3D Policy framework:

👉 https://github.com/Physical-Intelligence-Laboratory/UMI-3D-Policy

The UMI-3D-Policy repository provides:

  • Diffusion policy training
  • Real-world deployment on robotic platforms

Citation

If you find this work useful for your research, please consider citing:

@misc{wang2026umi3dextendinguniversalmanipulation,
  title={UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception},
  author={Ziming Wang},
  year={2026},
  eprint={2604.14089},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.14089}
}

Acknowledgements

This project builds upon a number of outstanding open-source works in LiDAR SLAM, calibration, and embodied perception, including: UMI, VoxelMap, FAST-LIVO2, FAST-LIO, IKFoM, velo2cam_calibration, FAST-Calib. We sincerely thank the authors and contributors of these projects for their pioneering work and valuable contributions to the community, which have greatly inspired and enabled the development of UMI-3D.

About

Part of the UMI-3D project: https://umi-3d.github.io/
