Repositories
High-performance web crawling engine with bindings for 11 languages
Rust
crawling, csharp, elixir, ffi, golang, java, mcp, php, python, ruby, rust, typescript, wasm, web-crawler, web-scraping
High-performance, CommonMark-compliant HTML-to-Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.
HTML
hocr, html, html-converter, markdown, markdown-converter, rag, text-extraction, text-processing
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
Rust
bun, csharp, document-intelligence, elixir, ffi, golang, java, metadata-extraction, node, pdf-extraction, pdfium, php, python, rag, ruby, rust, table-extraction, tesseract, text-extraction, wasm