Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

This software project accompanies the research paper: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second, Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun.

We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image.

The model in this repository is a reference implementation, which has been re-trained. Its performance is close to the model reported in the paper but does not match it exactly.

Getting Started

We recommend setting up a virtual environment. Using e.g. miniconda, the depth_pro package can be installed via:

conda create -n depth-pro -y python=3.9
conda activate depth-pro
pip install -e .

To download the pretrained checkpoints, run the following command:

source get_pretrained_models.sh # Files will be downloaded to `checkpoints` directory.

Running from the command line

We provide a helper script to directly run the model on a single image:

# Run prediction on a single image:
depth-pro-run -i ./data/example.jpg
# Run `depth-pro-run -h` for available options.

Running from Python

from PIL import Image
import depth_pro

# Path to the input image (here, the example image used in the command-line section).
image_path = "./data/example.jpg"

# Load model and preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load and preprocess an image.
image, _, f_px = depth_pro.load_rgb(image_path)
image = transform(image)

# Run inference.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]  # Depth in [m].
focallength_px = prediction["focallength_px"]  # Focal length in pixels.
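Because the predicted depth is metric and the focal length is given in pixels, the two outputs together are enough to recover camera-space geometry. The following is a minimal sketch of that conversion, not part of the `depth_pro` package: the helper names `horizontal_fov_deg` and `backproject_pixel` are hypothetical, and the sketch assumes a pinhole camera with square pixels, a principal point at the image center, and depth measured along the optical axis.

```python
import math


def horizontal_fov_deg(focallength_px: float, image_width_px: int) -> float:
    """Horizontal field of view (degrees) implied by a focal length in pixels."""
    return math.degrees(2.0 * math.atan(image_width_px / (2.0 * focallength_px)))


def backproject_pixel(u, v, depth_m, focallength_px, width, height):
    """Map pixel (u, v) with metric depth to a camera-space point (x, y, z) in meters.

    Assumption: pinhole model, square pixels, principal point at the image center.
    """
    cx, cy = width / 2.0, height / 2.0
    x = (u - cx) * depth_m / focallength_px
    y = (v - cy) * depth_m / focallength_px
    return x, y, depth_m
```

With a real prediction, `depth_m` would be read from `depth` at row `v`, column `u`, and `focallength_px` would be converted to a Python float first.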

Evaluation (boundary metrics)

Our boundary metrics can be found under eval/boundary_metrics.py and used as follows:

# For a depth-based dataset:
boundary_f1 = SI_boundary_F1(predicted_depth, target_depth)

# For a mask-based dataset (image matting / segmentation):
boundary_recall = SI_boundary_Recall(predicted_depth, target_mask)
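To give some intuition for what a boundary score on depth maps can look like, here is a toy sketch. It is not the code in eval/boundary_metrics.py, and it assumes (as the `SI_` prefix suggests, but the text above does not state) that the metric is scale-invariant. The functions `ratio_edges` and `toy_boundary_f1` and the threshold value are hypothetical. Edges are detected from the ratio of neighboring depths, so multiplying a depth map by a constant leaves the score unchanged.

```python
def ratio_edges(depth, threshold=1.25):
    """Return horizontal neighbor pairs (row, col) whose depth ratio exceeds threshold.

    `depth` is a list of rows of strictly positive depth values.
    """
    edges = set()
    for r, row in enumerate(depth):
        for c in range(len(row) - 1):
            near, far = sorted((row[c], row[c + 1]))
            if far / near > threshold:
                edges.add((r, c))
    return edges


def toy_boundary_f1(pred, target, threshold=1.25):
    """F1 score between ratio-based edges of a predicted and a target depth map."""
    pred_edges = ratio_edges(pred, threshold)
    target_edges = ratio_edges(target, threshold)
    hits = len(pred_edges & target_edges)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_edges)
    recall = hits / len(target_edges)
    return 2 * precision * recall / (precision + recall)
```

In this toy version, a prediction with the correct edges but the wrong global scale still scores perfectly, while a blurred prediction that spreads one depth step over several pixels loses precision.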

Citation

If you find our work useful, please cite the following paper:

@inproceedings{Bochkovskii2024:arxiv,
  author     = {Aleksei Bochkovskii and Ama\"{e}l Delaunoy and Hugo Germain and Marcel Santos and Yichao Zhou and Stephan R. Richter and Vladlen Koltun},
  title      = {Depth Pro: Sharp Monocular Metric Depth in Less Than a Second},
  booktitle  = {International Conference on Learning Representations},
  year       = {2025},
  url        = {https://arxiv.org/abs/2410.02073},
}

License

This sample code is released under the LICENSE terms.

The model weights are released under the LICENSE terms.

Acknowledgements

Our codebase is built using multiple open-source contributions; please see Acknowledgements for more details.

Please check the paper for a complete list of references and datasets used in this work.
