Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md

MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing

arXiv Hugging Face dataset License MIT

If you like our project, please give us a star ⭐ on GitHub for the latest update.

📖 English  |  简体中文

image

✨ Introduction

MinerU-Popo is a lightweight and universal framework for POst-Processing OCR outputs, bridging the gap between page-level OCR parsing and document-level semantic structure. It constructs document tree structures with a 4B post-processing model that performs four subtasks: table truncation analysis, text truncation analysis, title hierarchy analysis, and image-text association analysis. We handle the challenges of cross-page geometric discontinuity, redundant document parsing, and scalability to long documents via:

  • Task-Oriented Data Engine: Generate representative training data and simplify the task-specific input.
  • Dynamic Chunking and Synchronization: Process long document by dynamic chunks and reduce deviations across chunks to preserve global consistency.
  • Document Enrichment: Structurally construct a tree, semantically generate summaries and split long-section nodes.

image

📊 Performance

Better Hierarchy (TEDS) after Post-Processing

Basic OCRBeforeAfter
MinerU53.790.6
MonkeyOCR48.987.4
Dolphin60.483.5
PaddleOCR59.382.6
GLM-OCR53.581.8

Advantages Compared to Directly Using Pre-trained Model

ModelTEDSDoc/s
MinerU-Popo90.60.37
Qwen3-VL-2B21.20.22
Qwen3-VL-4B56.50.20
Qwen3-VL-8B65.90.16
Qwen3-VL-32B78.00.04

Benefits for Downstream Retrieval and Analysis (Acc on ViDoRe V3)

MethodC.S.Fin.H.R.Ind.Phar.
MinerU-Popo84.449.566.858.771.6
Raw RAG82.348.763.260.464.4
Visual RAG80.758.464.859.767.6

⚙️ Setup

Prepare Environment

Install from Source

conda create -n popo python=3.10
conda activate popo
pip install -r requirements.txt

Install from Docker Image

 docker run -it --rm --gpus=all --ipc=host --network=host dockerrr8277/mineru-popo-vllm:latest

Download Model

Download the MinerU-Popo post-processing model:

hf download DreamEternal/MinerU-Popo --local-dir models/Mineru-Popo

Model Configuration

In the Configuration, for transformer inference, edit the environment POPO_MODEL_PATH. For vllm inference, edit the url and key in function popo_generate.

For enrichment and question answering, further edit the url and key in qwen_generate and gpt_generate.

💻 Usage

The post-processing pipeline takes page-level parsing results from OCR/layout systems, normalizes them into a unified schema, runs MinerU-Popo inference, and finally builds document trees.

Step 1: Prepare OCR/Layout Outputs

Run your preferred page-level parser first, such as MinerU, MonkeyOCR, Dolphin, PaddleOCR-VL, or GLM-OCR. Place each model's output under:

post-process/<model_name>/

For example:

post-process/mineru/
post-process/monkeyocr/
post-process/PaddleOCR-VL-1.5/
post-process/dolphin/
post-process/glm-ocr/

Step 2: Normalize Labels

Convert raw model-specific labels and bounding boxes into the unified MinerU-Popo input format:

bash scripts/run_label_normalization.sh

The normalized outputs are written to:

outputs/label_normalization/<model_name>/

Step 3: Run MinerU-Popo Inference

Run MinerU-Popo on the normalized labels:

bash scripts/run_inference.sh

The inference outputs are written to:

outputs/inference/<model_name>/

Step 4: Build Document Trees

Build structured document trees from the inference outputs:

bash scripts/build_tree.sh

The final tree outputs and text previews are written to:

outputs/build_tree/<model_name>/
outputs/build_tree_txt/<model_name>/

Example tree outputs are provided in:

output_cases/

🙏 Acknowledgements

  • MinerU and other OCR system (MonkeyOCR, Dolphin, PaddleOCR, GLM-OCR) for page-level parsing.
  • ViDoRe V3 and MMDA as benchmarks.

📚 Citation

If you find this project useful for your research, please consider giving us a star and citing our paper:

@article{xu2026mineru,
  title={MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing},
  author={Xu, Bangrui and Miao, Ziyang and Zhou, Xuanhe and Lin, Yiming and Tang, Zirui and Zhao, Xiaomeng and Wu, Fan and Tan, Cheng and Wang, Bin and He, Conghui},
  journal={arXiv preprint arXiv:2605.24973},
  year={2026}
}

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

关于 About

No description, website, or topics provided.

语言 Languages

Python97.2%
Shell2.8%

提交活跃度 Commit Activity

代码提交热力图
过去 52 周的开发活跃度
10
Total Commits
峰值: 5次/周
Less
More

核心贡献者 Contributors