
💻 HomePage   |   🤗 Model   |   🤖 Demo

LogicsDocBench results


OmniDocBench-v1.5 results

Updates

  • [2026/03/09] We release Logics-Parsing-Omni. For more details, please check our Technical Report.
  • [2026/02/13] 🚀🚀🚀🚀🚀 We release the Logics-Parsing-v2 Model.
  • [2025/09/25] 🚀🚀🚀 We release the Logics-Parsing Model.

Introduction

Logics-Parsing-v2 is an advanced evolution of the previously released Logics-Parsing (v1). It inherits all the core capabilities of the v1 model while handling complex documents more robustly. It also extends support to Parsing-2.0 scenarios, enabling structured parsing of music sheets, flowcharts, and code/pseudocode blocks.

LogicsDocBench overview

Key Features

v1
  • Effortless End-to-End Processing

    • Our single-model architecture eliminates the need for complex, multi-stage pipelines. Deployment and inference are straightforward, going directly from a document image to structured output.
    • It demonstrates exceptional performance on documents with challenging layouts.
  • Advanced Content Recognition

    • It accurately recognizes and structures difficult content, including intricate scientific formulas.
    • Chemical structures are intelligently identified and can be represented in the standard SMILES format.
  • Rich, Structured HTML Output

    • The model generates a clean HTML representation of the document, preserving its logical structure.
    • Each content block (e.g., paragraph, table, figure, formula) is tagged with its category, bounding box coordinates, and OCR text.
    • It automatically identifies and filters out irrelevant elements like headers and footers, focusing only on the core content.
  • State-of-the-Art Performance

    • Logics-Parsing achieves the best performance on our in-house benchmark, which is specifically designed to comprehensively evaluate a model’s parsing capability on complex-layout documents and STEM content.
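To make the HTML output described above more concrete, a parsed page might look roughly like the sketch below. This is a purely illustrative mock-up: the actual tag names and attributes are defined by the model's output format, not by this snippet. The SMILES string shown is the standard representation of aspirin, used here only as a generic example of a chemistry block.

```html
<!-- Hypothetical sketch of a parsed page; tags and attributes are illustrative only. -->
<div class="page">
  <p data-category="paragraph" data-bbox="52,88,540,132">
    Aspirin inhibits cyclooxygenase enzymes ...
  </p>
  <div data-category="chemistry" data-bbox="60,150,300,290">
    CC(=O)Oc1ccccc1C(=O)O  <!-- chemical structure exported as SMILES -->
  </div>
</div>
```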

v2

  • Effortless End-to-End Processing

    • End-to-end recognition and parsing for various kinds of document elements within a single model.
    • Handles complex-layout, text-dense documents such as newspapers and magazines with exceptional precision and ease.
  • Advanced Content Recognition

    • Smaller in size, greater in performance, delivering more accurate and structured parsing of tables and scientific formulas.
    • Introducing Parsing-2.0: natively supports parsing of diverse structured content, including flowcharts, music sheets and pseudocode blocks.
  • Rich, Structured HTML Output

    • Transforms documents into concise HTML -- capturing not just content, but also element types, spatial layouts, and semantic hierarchy.
    • More scientific and intuitive formats for structured elements -- such as Mermaid for flowcharts and ABC notation for musical scores.
  • State-of-the-Art Performance

    • SOTA across the board: Logics-Parsing-v2 sets top records on both our in-house benchmark (overall score: 82.16) and the renowned public benchmark OmniDocBench-v1.5 (overall score: 93.23).
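Mermaid and ABC notation are both established plain-text formats, which keeps the parsed output human-readable and diffable. The snippets below are generic, hand-written examples of each format (not actual model output):

```mermaid
flowchart TD
    A[Read input] --> B{Valid?}
    B -->|yes| C[Process]
    B -->|no| D[Reject]
```

```
X:1
T:Scale example
M:4/4
L:1/4
K:C
C D E F | G A B c |
```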

Benchmark

v1

Existing document-parsing benchmarks often provide limited coverage of complex layouts and STEM content. To address this, we constructed an in-house benchmark comprising 1,078 page-level images across nine major categories and over twenty sub-categories. Our model achieves the best performance on this benchmark.
| Model Type | Method | Overall Edit↓ (EN / ZH) | Text Edit↓ (EN / ZH) | Formula Edit↓ (EN / ZH) | Table TEDS↑ (EN / ZH) | Table Edit↓ (EN / ZH) | Read Order Edit↓ (EN / ZH) | Chemistry Edit↓ | HandWriting Edit↓ |
|---|---|---|---|---|---|---|---|---|---|
| Pipeline Tools | doc2x | 0.209 / 0.188 | 0.128 / 0.194 | 0.377 / 0.321 | 81.1 / 85.3 | 0.148 / 0.115 | 0.146 / 0.122 | 1.0 | 0.307 |
| Pipeline Tools | Textin | 0.153 / 0.158 | 0.132 / 0.190 | 0.185 / 0.223 | 76.7 / 86.3 | 0.176 / 0.113 | 0.118 / 0.104 | 1.0 | 0.344 |
| Pipeline Tools | mathpix* | 0.128 / 0.146 | 0.128 / 0.152 | 0.06 / 0.142 | 86.2 / 86.6 | 0.120 / 0.127 | 0.204 / 0.164 | 0.552 | 0.263 |
| Pipeline Tools | PP_StructureV3 | 0.220 / 0.226 | 0.172 / 0.29 | 0.272 / 0.276 | 66 / 71.5 | 0.237 / 0.193 | 0.201 / 0.143 | 1.0 | 0.382 |
| Pipeline Tools | Mineru2 | 0.212 / 0.245 | 0.134 / 0.195 | 0.280 / 0.407 | 67.5 / 71.8 | 0.228 / 0.203 | 0.205 / 0.177 | 1.0 | 0.387 |
| Pipeline Tools | Marker | 0.324 / 0.409 | 0.188 / 0.289 | 0.285 / 0.383 | 65.5 / 50.4 | 0.593 / 0.702 | 0.23 / 0.262 | 1.0 | 0.50 |
| Pipeline Tools | Pix2text | 0.447 / 0.547 | 0.485 / 0.577 | 0.312 / 0.465 | 64.7 / 63.0 | 0.566 / 0.613 | 0.424 / 0.534 | 1.0 | 0.95 |
| Expert VLMs | Dolphin | 0.208 / 0.256 | 0.149 / 0.189 | 0.334 / 0.346 | 72.9 / 60.1 | 0.192 / 0.35 | 0.160 / 0.139 | 0.984 | 0.433 |
| Expert VLMs | dots.ocr | 0.186 / 0.198 | 0.115 / 0.169 | 0.291 / 0.358 | 79.5 / 82.5 | 0.172 / 0.141 | 0.165 / 0.123 | 1.0 | 0.255 |
| Expert VLMs | MonkeyOcr | 0.193 / 0.259 | 0.127 / 0.236 | 0.262 / 0.325 | 78.4 / 74.7 | 0.186 / 0.294 | 0.197 / 0.180 | 1.0 | 0.623 |
| Expert VLMs | OCRFlux | 0.252 / 0.254 | 0.134 / 0.195 | 0.326 / 0.405 | 58.3 / 70.2 | 0.358 / 0.260 | 0.191 / 0.156 | 1.0 | 0.284 |
| Expert VLMs | Gotocr | 0.247 / 0.249 | 0.181 / 0.213 | 0.231 / 0.318 | 59.5 / 74.7 | 0.38 / 0.299 | 0.195 / 0.164 | 0.969 | 0.446 |
| Expert VLMs | Olmocr | 0.341 / 0.382 | 0.125 / 0.205 | 0.719 / 0.766 | 57.1 / 56.6 | 0.327 / 0.389 | 0.191 / 0.169 | 1.0 | 0.294 |
| Expert VLMs | SmolDocling | 0.657 / 0.895 | 0.486 / 0.932 | 0.859 / 0.972 | 18.5 / 1.5 | 0.86 / 0.98 | 0.413 / 0.695 | 1.0 | 0.927 |
| Expert VLMs | Logics-Parsing | 0.124 / 0.145 | 0.089 / 0.139 | 0.106 / 0.165 | 76.6 / 79.5 | 0.165 / 0.166 | 0.136 / 0.113 | 0.519 | 0.252 |
| General VLMs | Qwen2VL-72B | 0.298 / 0.342 | 0.142 / 0.244 | 0.431 / 0.363 | 64.2 / 55.5 | 0.425 / 0.581 | 0.193 / 0.182 | 0.792 | 0.359 |
| General VLMs | Qwen2.5VL-72B | 0.233 / 0.263 | 0.162 / 0.24 | 0.251 / 0.257 | 69.6 / 67 | 0.313 / 0.353 | 0.205 / 0.204 | 0.597 | 0.349 |
| General VLMs | Doubao-1.6 | 0.188 / 0.248 | 0.129 / 0.219 | 0.273 / 0.336 | 74.9 / 69.7 | 0.180 / 0.288 | 0.171 / 0.148 | 0.601 | 0.317 |
| General VLMs | GPT-5 | 0.242 / 0.373 | 0.119 / 0.36 | 0.398 / 0.456 | 67.9 / 55.8 | 0.26 / 0.397 | 0.191 / 0.28 | 0.88 | 0.46 |
| General VLMs | Gemini2.5 pro | 0.185 / 0.20 | 0.115 / 0.155 | 0.288 / 0.326 | 82.6 / 80.3 | 0.154 / 0.182 | 0.181 / 0.136 | 0.535 | 0.26 |
* Tested on the v3/PDF Conversion API (August 2025 deployment).

Comparisons on LogicsDocBench

We introduce LogicsDocBench, a new comprehensive evaluation benchmark comprising 900 carefully selected PDF pages, covering both traditional Parsing-1.0 document tasks and the newly introduced Parsing-2.0 scenarios. This benchmark is designed to better assess a model's ability to parse complex and diverse real-world documents. The dataset is organized into three core document subsets:

  • STEM Documents (218 pages):

    Focuses on high-difficulty academic and educational content, spanning over ten domains including physics, mathematics, engineering, and interdisciplinary sciences. This subset evaluates deep understanding of mathematical formulas, technical terminology, and structured knowledge representation.

  • Complex Layouts (459 pages):

    Includes challenging real-world layouts such as multi-column text, cross-page tables, vertical writing, and mixed text-image arrangements. This subset comprehensively evaluates a model’s layout-analysis abilities.

  • Parsing-2.0 Content (223 pages):

    Targets modern digital and semi-structured content that poses significant challenges for traditional OCR systems, including:

    • Chemical molecular formulas
    • Music sheets
    • Code and pseudocode blocks
    • Flowcharts and mind maps

For Parsing-1.0 tasks, we adopt the same evaluation protocols as OmniDocBench-v1.5 to ensure fairness and consistency across benchmarks. For Parsing-2.0, we report fine-grained results using edit distance for each subcategory, and compute an overall score as follows:

$$\small \text{Overall} = \frac{Parsing1.0^{Overall} \times 3 + (1-{Chemistry}^{Edit})\times 100 + (1-{Code}^{Edit})\times 100 + (1-{Chart}^{Edit})\times 100 + (1-{Music}^{Edit})\times 100}{7}$$
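As a rough illustration of how this score combines per-category edit distances, the sketch below mirrors the weights in the formula above: the Parsing-1.0 overall score counts three times, and each Parsing-2.0 category contributes `100 × (1 − edit distance)`. The function and variable names are our own; this is not the official evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_edit(pred: str, gt: str) -> float:
    """Edit distance normalized to [0, 1] by the longer string."""
    if not pred and not gt:
        return 0.0
    return levenshtein(pred, gt) / max(len(pred), len(gt))

def overall_score(parsing10_overall: float, chemistry_edit: float,
                  code_edit: float, chart_edit: float,
                  music_edit: float) -> float:
    """Weighted mean from the formula above (Parsing-1.0 weighted 3x)."""
    return (parsing10_overall * 3
            + (1 - chemistry_edit) * 100
            + (1 - code_edit) * 100
            + (1 - chart_edit) * 100
            + (1 - music_edit) * 100) / 7
```

For example, a model with a perfect Parsing-1.0 score of 100 and zero edit distance in every Parsing-2.0 category would reach the maximum overall score of 100.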

A comprehensive evaluation of document parsing on LogicsDocBench is presented below:

The histogram below gives a more intuitive view of the advantages of our Logics-Parsing-v2 model in both Parsing-1.0 and Parsing-2.0 scenarios.


Comparisons on OmniDocBench-v1.5

We also provide the experimental results of our newly proposed Logics-Parsing-v2 model on the widely recognized open-source benchmark OmniDocBench-v1.5. As shown in the table below, Logics-Parsing-v2 achieves highly competitive performance.

* The model results in the table are sourced from the official OmniDocBench website.

Quick Start

v1

1. Installation

```shell
conda create -n logis-parsing python=3.10
conda activate logis-parsing
pip install -r requirement.txt
```

2. Download Model Weights

```shell
# Download our model from ModelScope.
pip install modelscope
python download_model.py -t modelscope

# Download our model from Hugging Face.
pip install huggingface_hub
python download_model.py -t huggingface
```

3. Inference

```shell
python3 inference.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL
```

v2

1. Installation

```shell
conda create -n logis-parsing-v2 python=3.10
conda activate logis-parsing-v2
pip install -r requirements.txt
```

2. Download Model Weights

```shell
# Download our model from ModelScope.
pip install modelscope
python download_model_v2.py -t modelscope

# Download our model from Hugging Face.
pip install huggingface_hub
python download_model_v2.py -t huggingface
```

3. Inference

```shell
python3 inference_v2.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL
```

Showcases

Acknowledgments

We would like to acknowledge the following open-source projects that provided inspiration and reference for this work:
