# UMI-3D SLAM and Data Processing
🌐 UMI-3D Project Homepage

| 🔧 UMI-3D Hardware | 🛰️ UMI-3D SLAM Pipeline | 🤖 UMI-3D Policy | 📦 Dataset & Models |
| --- | --- | --- | --- |
| Hardware design, BOM, CAD, 3D-printed parts | SLAM, synchronization, calibration, and data processing | Policy training, deployment, inference | |
## Overview
UMI-3D provides a complete end-to-end pipeline that transforms raw rosbag recordings into training-ready datasets for embodied manipulation learning:
```
Collected rosbag Files
         ↓
    Calibration
         ↓
       SLAM
         ↓
   Aligned Demos
         ↓
  Dataset Pipeline
         ↓
Zarr Dataset (for policy training, e.g. Diffusion Policy)
```
## 0. Complete Data Collection
To build the UMI-3D data collection system, please follow the hardware assembly and sensor setup instructions in:
👉 https://github.com/Physical-Intelligence-Laboratory/UMI-3D-Hardware
UMI-3D collects two types of data:
- **Demonstration data**: human-guided manipulation trajectories captured during task execution.
- **Gripper calibration data**: slowly open and close the gripper for approximately 5 cycles to estimate the gripper motion range.
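The calibration recording is later reduced to a usable motion range. As a rough illustration of the idea (not the project's actual script; the `widths` signal, the trimming fraction, and all values below are assumptions), the range can be estimated from the observed jaw-width signal after rejecting outliers:

```python
import math

def estimate_gripper_range(widths, trim=0.01):
    """Return (min_width, max_width) after discarding the `trim` fraction
    of samples at each extreme, so spurious readings don't skew the range."""
    s = sorted(widths)
    k = int(len(s) * trim)
    trimmed = s[k:len(s) - k] if k > 0 else s
    return trimmed[0], trimmed[-1]

# Synthetic 5-cycle open/close signal (0.0 m to 0.08 m) plus two outliers.
widths = [0.04 + 0.04 * math.sin(2 * math.pi * 5 * t / 500) for t in range(500)]
widths += [-0.5, 1.0]  # spurious sensor readings
lo, hi = estimate_gripper_range(widths)
print(lo, hi)  # roughly 0.0 and 0.08
```

The trimming step is what makes the ~5 slow cycles useful: the true extremes are visited repeatedly, so dropping a few extreme samples removes glitches without losing the range.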
All data are recorded as rosbag files, including:
- LiDAR point clouds
- IMU measurements
- Camera images
These recordings serve as the raw input for the full UMI-3D data processing pipeline.
## 1. Sensor Calibration
### 1.1 Fisheye Camera Intrinsic Calibration
**Step 1 — Collect calibration images**

- Use a checkerboard (6 × 9 inner corners, square size = 0.10 m, configurable in the script)
- Capture ≥ 100 images with different positions (center / edges / corners), orientations, and distances
- Save all images to `fisheye_intrinsics/images/`
**Step 2 — Run calibration**

```bash
cd fisheye_intrinsics
python3 calibrate_fisheye_intrinsics.py \
    --image_glob "images/*.png" \
    --checkerboard_cols 6 \
    --checkerboard_rows 9 \
    --square_size 0.10 \
    --output_dir calib_output
```
Calibration results will be saved to `fisheye_intrinsics/calib_output/`.
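For intuition about what the calibrated intrinsics encode: OpenCV's fisheye module uses the Kannala–Brandt equidistant model. Below is a minimal numpy sketch of the forward projection under that model; the `K` and `D` values are made-up placeholders, not calibration results:

```python
import numpy as np

def project_fisheye(X, K, D):
    """Project a 3D point X=(x, y, z) in the camera frame to pixel (u, v)
    using the Kannala-Brandt fisheye model (the parameterization estimated
    by OpenCV's cv2.fisheye calibration)."""
    x, y, z = X
    a, b = x / z, y / z                  # pinhole-normalized coordinates
    r = np.hypot(a, b)
    theta = np.arctan(r)                 # angle from the optical axis
    k1, k2, k3, k4 = D
    theta_d = theta * (1 + k1*theta**2 + k2*theta**4 + k3*theta**6 + k4*theta**8)
    scale = theta_d / r if r > 1e-12 else 1.0
    xd, yd = a * scale, b * scale        # distorted normalized coordinates
    u = K[0, 0] * xd + K[0, 2]
    v = K[1, 1] * yd + K[1, 2]
    return u, v

# Placeholder intrinsics (illustrative only, not calibrated values).
K = np.array([[400.0, 0.0, 640.0],
              [0.0, 400.0, 360.0],
              [0.0, 0.0, 1.0]])
D = (0.05, -0.01, 0.0, 0.0)
print(project_fisheye((0.0, 0.0, 1.0), K, D))  # optical-axis point -> principal point
```

The calibration script's job is to recover `K` and the four distortion coefficients `D` from the checkerboard detections.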
### 1.2 LiDAR–Camera Extrinsic Calibration
This step estimates the rigid transformation between the Livox MID-360 LiDAR and the fisheye camera.
**Step 1 — Prepare calibration data**

- Record a static rosbag containing:
  - Livox point cloud (`/livox/lidar`)
  - Camera images
- Place the rosbag into `livox2cam_calibration/src/calib_data/`
- Fill in the previously calibrated camera intrinsics in `livox2cam_calibration/src/config/qr_params.yaml`
- Calibration board files are provided here: Calibration Board Files
**Step 2 — Build and Run**

Prerequisites:

- Ubuntu 20.04, ROS Noetic
- PCL ≥ 1.8
- OpenCV ≥ 4.0

```bash
conda deactivate

# Build
cd livox2cam_calibration
catkin_make

# Run calibration
source devel/setup.bash
roslaunch livox2cam_calibration calib.launch
```

- Output: the extrinsic transformation between the LiDAR and camera (rotation + translation)
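The rotation and translation output here are typically applied as a rigid transform mapping LiDAR points into the camera frame. A minimal sketch of that usage (the `R` and `t` values below are placeholders, not calibration output):

```python
import numpy as np

def lidar_to_camera(points_lidar, R, t):
    """Transform an (N, 3) array of LiDAR points into the camera frame:
    p_cam = R @ p_lidar + t, where (R, t) are the calibrated extrinsics."""
    return points_lidar @ R.T + t

# Placeholder extrinsics: a 90-degree rotation about z plus a small offset.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.01, -0.02, 0.05])

pts = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0]])
print(lidar_to_camera(pts, R, t))
```

Transformed points can then be projected through the camera intrinsics from Section 1.1 to colorize or validate the point cloud against images.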
## 2. UMI-3D SLAM
This module performs LiDAR–inertial SLAM to estimate the camera trajectory and reconstruct the environment.
**Step 1 — Configure extrinsics**

Fill the calibrated LiDAR–camera extrinsic parameters into `umi_3d_slam_ws/src/umi_3d_slam/config/mid360_180.yaml`.
**Step 2 — Install dependencies**

- Environment: Ubuntu 20.04, ROS Noetic
- Libraries: PCL ≥ 1.8, Eigen ≥ 3.3.4, OpenCV ≥ 4.2
- Install Sophus:

```bash
git clone https://github.com/bitcat-tech/Sophus
cd Sophus
mkdir build && cd build
cmake ..
make
sudo make install
```
**Step 3 — Build the SLAM system**

```bash
conda deactivate
cd umi_3d_slam_ws
catkin_make
```
**Step 4 — Run SLAM Demo**

```bash
source devel/setup.bash

# Start SLAM
roslaunch umi_3d_slam mapping_mid360_180.launch rviz:=true

# Play rosbag
rosbag play YOUR_DEMO.bag
```

- Output: the estimated camera trajectory is saved to `umi_3d_slam_ws/src/umi_3d_slam/output/camera_trajectory.csv`
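Downstream steps consume this trajectory file. As a hedged sketch of loading it, assuming each row is `timestamp, x, y, z, qx, qy, qz, qw` with no header (check your SLAM output for the actual column order), the quaternions can be turned into rotation matrices like this:

```python
import csv
import io
import numpy as np

def quat_to_rot(qx, qy, qz, qw):
    """Convert a quaternion (x, y, z, w) to a 3x3 rotation matrix,
    normalizing first so slightly non-unit quaternions are tolerated."""
    q = np.array([qx, qy, qz, qw], dtype=float)
    q /= np.linalg.norm(q)
    x, y, z, w = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

def load_trajectory(f):
    """Yield (timestamp, position 3-vector, 3x3 rotation) per CSV row.
    Assumes the column order timestamp,x,y,z,qx,qy,qz,qw and no header."""
    for row in csv.reader(f):
        ts, x, y, z, qx, qy, qz, qw = map(float, row)
        yield ts, np.array([x, y, z]), quat_to_rot(qx, qy, qz, qw)

# Tiny in-memory example: identity rotation at the origin.
sample = io.StringIO("0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n")
ts, p, R = next(load_trajectory(sample))
print(R)  # identity matrix
```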
Note: Ensure proper time synchronization between LiDAR and camera.
## 3. Data Processing for Training
### 3.1 Rosbag Preprocessing
This stage converts raw rosbag recordings into time-aligned multi-modal data, and prepares them for SLAM and dataset generation.
The pipeline consists of two main steps:
```
Raw rosbags
     ↓
auto_bag_to_mp4_aligned.py   (alignment + video export)
     ↓
aligned_bags/
├── demos/
├── 000000.bag ...
     ↓
auto_umi_3d_slam.sh          (trajectory estimation)
     ↓
Final demos with trajectory
```
**Step 1 — Prepare Raw Rosbags**

Place all raw rosbags into a single directory:

```
/path/to/your/rosbags/
├── 2026-03-30-13-33-14.bag
├── 2026-03-30-13-33-37.bag
├── ...
├── 20xx-xx-xx-xx-xx-xx.bag
└── gripper_calibration*.bag
```
**Step 2 — Multi-modal Alignment and Video Export**

Run the preprocessing script:

```bash
conda deactivate
python3 scripts_slam_pipeline/auto_bag_to_mp4_aligned.py \
    --dir /path/to/your/rosbags \
    --align \
    --organize_each \
    --start_idx 0 \
    --id_width 6 \
    --use_header_stamp \
    --gate 0.02 \
    --no_symlink
```
🔍 What this script does
- Synchronizes:
  - LiDAR (Livox)
  - Camera images
  - IMU
- Uses timestamp gating (`--gate 0.02`, a 20 ms tolerance) for alignment
- Re-indexes all demos into consistent IDs
- Converts image streams into MP4 videos
- Outputs per-frame timestamps
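The gating step can be pictured as nearest-neighbor timestamp matching with a rejection threshold. This simplified sketch illustrates the idea only (it is not the script's actual internals; the function and variable names are illustrative):

```python
import bisect

def gate_match(ref_stamps, other_stamps, gate=0.02):
    """For each reference timestamp, find the nearest timestamp in
    `other_stamps` and keep the pair only if the gap is within `gate`
    seconds. Both lists must be sorted in ascending order."""
    pairs = []
    for t in ref_stamps:
        i = bisect.bisect_left(other_stamps, t)
        # Only the immediate neighbors can be the nearest match.
        candidates = other_stamps[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda s: abs(s - t))
        if abs(best - t) <= gate:
            pairs.append((t, best))
    return pairs

cam = [0.00, 0.10, 0.20, 0.30]
imu = [0.005, 0.11, 0.26]       # 0.26 is 0.06 s from 0.20 -> rejected
print(gate_match(cam, imu))     # [(0.0, 0.005), (0.1, 0.11)]
```

Frames without a partner inside the gate are dropped, which is why a tight gate trades some data for cleaner alignment.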
📂 Output structure
```
aligned_bags/
├── demos/
│   ├── demo_000000_000000/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   └── source.txt
│   ├── demo_000001_000001/
│   │   ├── ...
│
├── 000000.bag
├── 000001.bag
├── ...
```
Each demo folder corresponds to one aligned sequence.
**Step 3 — Run SLAM for Trajectory Estimation**

Run batch SLAM processing:

```bash
conda deactivate
bash scripts_slam_pipeline/auto_umi_3d_slam.sh \
    --bag_dir /path/to/your/rosbags/aligned_bags \
    --start 0 \
    --end YOUR_BAG_NUMBER
```
🔍 What this script does
Based on the script's implementation:

- Iterates over each indexed bag (`000000.bag`, `000001.bag`, ...)
- For each bag:
  - Launches the UMI-3D SLAM system
  - Plays the rosbag
  - Waits for the trajectory output
  - Moves the result to the corresponding demo folder: `demos/demo_xxxxxx_xxxxxx/camera_trajectory.csv`
  - Optionally deletes the processed bag to save disk space
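The bookkeeping step (matching an indexed bag to its demo folder) can be sketched as a small pure function; this is illustrative pseudocode for the idea, not the shell script's actual logic, and all names are assumptions:

```python
import tempfile
from pathlib import Path

def trajectory_destination(bag_path, demos_dir):
    """Map an indexed bag (e.g. 000001.bag) to the demo folder whose name
    ends with the same index; camera_trajectory.csv is moved there."""
    idx = Path(bag_path).stem  # e.g. "000001"
    matches = [d for d in Path(demos_dir).glob("demo_*")
               if d.is_dir() and d.name.endswith(f"_{idx}")]
    if len(matches) != 1:
        raise FileNotFoundError(f"no unique demo folder for index {idx}")
    return matches[0] / "camera_trajectory.csv"

# Toy layout mirroring aligned_bags/demos/.
demos = Path(tempfile.mkdtemp())
(demos / "demo_000000_000000").mkdir()
(demos / "demo_000001_000001").mkdir()
dest = trajectory_destination("000001.bag", demos)
print(dest.parent.name)  # demo_000001_000001
```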
📂 Final Output
```
aligned_bags/
├── demos/
│   ├── demo_000000_000000/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   ├── camera_trajectory.csv   ← SLAM output
│   │   └── source.txt
│
├── ...
```
### 3.2 UMI-format Training Data Packaging
This stage converts the preprocessed aligned demos into a UMI-format replay buffer for policy training.
The full pipeline is wrapped by:

```bash
python run_dataset_pipeline.py \
    --session_dir /path/to/aligned_bags \
    --output /path/to/aligned_bags/DATASET_NAME.zarr.zip
```
The pipeline runs four stages in order:
```
aligned_bags/
└── demos/
    ├── demo_xxxxxx_xxxxxx/
    │   ├── raw_video.mp4
    │   ├── raw_video_timestamps.csv
    │   ├── camera_trajectory.csv
    │   └── source.txt
    ├── gripper_calibration*/
    │   ├── raw_video.mp4
    │   ├── raw_video_timestamps.csv
    │   └── tag_detection.pkl
        ↓
00_detect_aruco.py
        ↓
01_run_calibrations.py
        ↓
02_generate_dataset_plan.py
        ↓
03_generate_replay_buffer.py
        ↓
DATASET_NAME.zarr.zip
```
**Step 1 — Install environment**

System dependencies:

```bash
sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
```

Conda environment (we recommend using Miniforge instead of the standard Anaconda distribution):

```bash
mamba env create -f conda_environment.yaml
conda activate umi
```
**Step 2 — Prepare inputs**

Before running the dataset pipeline, make sure your session directory already contains:

- `demos/demo_*/raw_video.mp4`
- `demos/demo_*/raw_video_timestamps.csv`
- `demos/demo_*/camera_trajectory.csv`
- `demos/gripper_calibration*/raw_video.mp4`
You also need:
- Camera intrinsics: `example/calibration/fisheye.json`
- ArUco configuration: `example/calibration/aruco_config.yaml`
If needed, you can override them with:
```bash
--camera_intrinsics /path/to/custom_fisheye.json \
--aruco_config /path/to/custom_aruco_config.yaml
```
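A quick way to catch missing inputs before launching the pipeline is a small sanity check over the session directory. This standalone sketch (not part of the repo) checks only the per-demo files listed above:

```python
import tempfile
from pathlib import Path

REQUIRED = ["raw_video.mp4", "raw_video_timestamps.csv", "camera_trajectory.csv"]

def check_session(session_dir):
    """Return a list of (folder_name, missing_file) problems. Demo folders
    need all three files; gripper-calibration folders are assumed to need
    only the video and its timestamps."""
    problems = []
    demos = Path(session_dir) / "demos"
    for folder in sorted(demos.glob("demo_*")):
        for name in REQUIRED:
            if not (folder / name).exists():
                problems.append((folder.name, name))
    for folder in sorted(demos.glob("gripper_calibration*")):
        for name in REQUIRED[:2]:
            if not (folder / name).exists():
                problems.append((folder.name, name))
    return problems

# Example: build a toy session in a temp dir and check it.
root = Path(tempfile.mkdtemp())
d = root / "demos" / "demo_000000_000000"
d.mkdir(parents=True)
(d / "raw_video.mp4").touch()
(d / "raw_video_timestamps.csv").touch()
# camera_trajectory.csv intentionally missing
print(check_session(root))  # [('demo_000000_000000', 'camera_trajectory.csv')]
```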
**Step 3 — Run the full dataset pipeline**

```bash
conda activate umi
python run_dataset_pipeline.py \
    --session_dir /path/to/aligned_bags \
    --output /path/to/aligned_bags/DATASET_NAME.zarr.zip
```
**Output Summary**
After the full pipeline finishes, the main outputs are:
```
aligned_bags/
├── demos/
│   ├── demo_000000_000000/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   ├── camera_trajectory.csv
│   │   ├── tag_detection.pkl
│   │   └── source.txt
│   ├── ...
│   ├── gripper_calibration*/
│   │   ├── raw_video.mp4
│   │   ├── raw_video_timestamps.csv
│   │   ├── tag_detection.pkl
│   │   └── gripper_range.json
│   ├── dataset_plan.pkl
└── DATASET_NAME.zarr.zip
```
Note: This version is currently designed for the single-gripper UMI-3D setup, where `camera_idx` is fixed to 0.
## 4. Next Step: Policy Training and Deployment
After obtaining the final dataset (`DATASET_NAME.zarr.zip`), you can proceed to policy training and real-world deployment using the UMI-3D Policy framework:
👉 https://github.com/Physical-Intelligence-Laboratory/UMI-3D-Policy
This repository provides:
- Diffusion policy training
- Real-world deployment on robotic platforms
## Citation
If you find this work useful for your research, please consider citing:
```bibtex
@misc{wang2026umi3dextendinguniversalmanipulation,
      title={UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception},
      author={Ziming Wang},
      year={2026},
      eprint={2604.14089},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.14089}
}
```
## Acknowledgements
This project builds upon a number of outstanding open-source works in LiDAR SLAM, calibration, and embodied perception, including: UMI, VoxelMap, FAST-LIVO2, FAST-LIO, IKFoM, velo2cam_calibration, FAST-Calib. We sincerely thank the authors and contributors of these projects for their pioneering work and valuable contributions to the community, which have greatly inspired and enabled the development of UMI-3D.