
EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Paper | Project Page | Video | Chinese Explainer

Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

In this work, we present ESAM, an efficient framework that leverages vision foundation models for online, real-time, fine-grained, generalized and open-vocabulary 3D instance segmentation.

News

  • [2025/4/03]: Custom datasets are now supported! Users can run EmbodiedSAM on their own data by following the instructions here.
  • [2025/2/11]: EmbodiedSAM was selected for an oral presentation at ICLR 2025!
  • [2025/1/23]: EmbodiedSAM was accepted to ICLR 2025 with a top 2% rating!
  • [2024/8/22]: Code and demo released.

Demo

Real-world:

demo

Bedroom:

Office:

The demo files are fairly large, so they may take a moment to load. See the project page for more complete demos and a detailed introduction.

Method

Method Pipeline: overview

Getting Started

For environment setup and dataset preparation, please follow:

For training and evaluation, please follow:

For visualization on the provided datasets or your own data, please follow:

Main Results

We provide checkpoints for quick reproduction of the results reported in the paper. In addition to Tsinghua Cloud, the checkpoints and processed data are also available on HuggingFace. Click here for more details.

Class-agnostic 3D instance segmentation results on ScanNet200 dataset:

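| Method | Type | VFM | AP | AP@50 | AP@25 | Speed (ms) | Downloads |
|---|---|---|---|---|---|---|---|
| SAMPro3D | Offline | SAM | 18.0 | 32.8 | 56.1 | -- | -- |
| SAI3D | Offline | SemanticSAM | 30.8 | 50.5 | 70.6 | -- | -- |
| SAM3D | Online | SAM | 20.6 | 35.7 | 55.5 | 1369+1518 | -- |
| ESAM | Online | SAM | 42.2 | 63.7 | 79.6 | 1369+80 | model |
| ESAM-E | Online | FastSAM | 43.4 | 65.4 | 80.9 | 20+80 | model |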
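The Speed (ms) entries above are written as two terms joined by `+`. As a rough illustration of how to read them, the sketch below sums the terms into a per-frame latency and converts it to frames per second. It assumes the first term is the time spent in the VFM and the second the time spent in the rest of the pipeline, which this README does not state explicitly; the helper names are hypothetical.

```python
# Speed entries copied from the ScanNet200 table above (milliseconds per frame).
# Assumption: each entry is "<VFM time>+<time of the remaining pipeline>".
SPEED_MS = {
    "SAM3D": "1369+1518",
    "ESAM": "1369+80",
    "ESAM-E": "20+80",
}


def total_latency_ms(entry: str) -> float:
    """Sum the '+'-separated terms of a Speed (ms) entry."""
    return sum(float(term) for term in entry.split("+"))


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


for method, entry in SPEED_MS.items():
    latency = total_latency_ms(entry)
    print(f"{method}: {latency:.0f} ms/frame, {frames_per_second(latency):.2f} FPS")
```

Under this reading, ESAM-E takes about 100 ms per frame, which is consistent with the 10 FPS listed for ESAM-E in the ScanNet table further down.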

Dataset transfer results from ScanNet200 to SceneNN and 3RScan:

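| Method | Type | ScanNet200→SceneNN AP | ScanNet200→SceneNN AP@50 | ScanNet200→SceneNN AP@25 | ScanNet200→3RScan AP | ScanNet200→3RScan AP@50 | ScanNet200→3RScan AP@25 |
|---|---|---|---|---|---|---|---|
| SAMPro3D | Offline | 12.6 | 25.8 | 53.2 | 3.9 | 8.0 | 21.0 |
| SAI3D | Offline | 18.6 | 34.7 | 65.7 | 5.4 | 11.8 | 27.4 |
| SAM3D | Online | 15.1 | 30.0 | 51.8 | 6.2 | 13.0 | 33.9 |
| ESAM | Online | 28.8 | 52.2 | 69.3 | 14.1 | 31.2 | 59.6 |
| ESAM-E | Online | 28.6 | 50.4 | 71.0 | 13.9 | 29.4 | 58.8 |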
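The AP@50 and AP@25 columns in these tables are not defined in this README. The sketch below assumes the usual convention for 3D instance segmentation benchmarks, where a predicted instance counts as a match when its intersection-over-union (IoU) with a ground-truth instance reaches 0.5 or 0.25 respectively. It only illustrates the IoU test on toy masks; a full evaluation also ranks predictions by confidence and averages over instances, which is omitted here. The function name and the point indices are hypothetical.

```python
from __future__ import annotations


def mask_iou(pred: set[int], gt: set[int]) -> float:
    """IoU of two instance masks given as sets of point indices."""
    union = pred | gt
    if not union:
        return 0.0
    return len(pred & gt) / len(union)


# Toy masks (hypothetical point indices, not taken from any dataset).
pred = set(range(0, 8))   # points 0..7
gt = set(range(4, 14))    # points 4..13

iou = mask_iou(pred, gt)  # 4 shared points out of 14 points in the union
print(f"IoU = {iou:.3f}")
print("match at IoU 0.25:", iou >= 0.25)
print("match at IoU 0.50:", iou >= 0.50)
```

In this toy case the prediction would count toward AP@25 but not toward AP@50, which is why AP@25 is always at least as high as AP@50 in the tables.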

3D instance segmentation results on ScanNet dataset:

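| Method | Type | ScanNet AP | ScanNet AP@50 | ScanNet AP@25 | SceneNN AP | SceneNN AP@50 | SceneNN AP@25 | FPS | Download |
|---|---|---|---|---|---|---|---|---|---|
| TD3D | offline | 46.2 | 71.1 | 81.3 | -- | -- | -- | -- | -- |
| Oneformer3D | offline | 59.3 | 78.8 | 86.7 | -- | -- | -- | -- | -- |
| INS-Conv | online | -- | 57.4 | -- | -- | -- | -- | -- | -- |
| TD3D-MA | online | 39.0 | 60.5 | 71.3 | 26.0 | 42.8 | 59.2 | 3.5 | -- |
| ESAM-E | online | 41.6 | 60.1 | 75.6 | 27.5 | 48.7 | 64.6 | 10 | model |
| ESAM-E+FF | online | 42.6 | 61.9 | 77.1 | 33.3 | 53.6 | 62.5 | 9.8 | model |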

Open-Vocabulary 3D instance segmentation results on ScanNet200 dataset:

| Method | AP | AP@50 | AP@25 |
|---|---|---|---|
| SAI3D | 9.6 | 14.7 | 19.0 |
| ESAM | 13.7 | 19.2 | 23.9 |

TODO List

  • Release code and checkpoints.
  • Release the demo code to directly run ESAM on streaming RGB-D video.

Contributors

Both students below contributed equally and the order is determined by random draw.

Both are advised by Jiwen Lu.

Acknowledgement

We are grateful for the flexible codebases of Oneformer3D and Online3D, as well as the valuable datasets provided by ScanNet, SceneNN and 3RScan.

Citation

@article{xu2024esam, 
      title={EmbodiedSAM: Online Segment Any 3D Thing in Real Time}, 
      author={Xiuwei Xu and Huangxing Chen and Linqing Zhao and Ziwei Wang and Jie Zhou and Jiwen Lu},
      journal={arXiv preprint arXiv:2408.11811},
      year={2024}
}

