Pipeline Backend
The Pipeline backend is a traditional computer vision pipeline composed of sequential expert models. It is the default backend and offers the best throughput for batch processing on GPU hardware.
Processing Stages
Pre-processing
- Orientation Correction: Detects and corrects page rotation anomalies before parsing.
- Language Detection: Identifies the document language via `fast-langdetect` to select appropriate OCR models.
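
The step from a detected language to an OCR model can be sketched as a simple lookup with a fallback. This is a minimal illustration only: the model names, the `select_ocr_model` helper, and the confidence threshold are assumptions made for the example, not the backend's actual identifiers or logic. The language code and confidence are assumed to come from the language detector.

```python
# Hypothetical mapping from ISO 639-1 language codes to OCR model names.
# These names are placeholders, not the backend's real model identifiers.
OCR_MODELS = {
    "en": "ocr-latin",
    "fr": "ocr-latin",
    "zh": "ocr-chinese",
    "ja": "ocr-japanese",
}
DEFAULT_OCR_MODEL = "ocr-multilingual"


def select_ocr_model(lang_code: str, confidence: float,
                     min_confidence: float = 0.5) -> str:
    """Pick an OCR model for a detected language, falling back when unsure."""
    # A low-confidence detection is treated as "unknown language".
    if confidence < min_confidence:
        return DEFAULT_OCR_MODEL
    # Unmapped languages also fall back to the multilingual model.
    return OCR_MODELS.get(lang_code.lower(), DEFAULT_OCR_MODEL)
```

Falling back to a multilingual model on low confidence is one plausible design choice; short or mixed-language pages are where a detector is most likely to be wrong.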
Layout Analysis
- Page Layout Detection: Model-based classification of page regions (text, title, image, table, formula).
- Block Segmentation: Divides detected regions into individual content blocks.
- Reading Order Reconstruction: Reassembles blocks in logical reading sequence, handling multi-column and complex layouts.
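
To make reading order reconstruction concrete, here is a minimal sketch of one simple heuristic: group blocks into columns by horizontal overlap, then read each column top to bottom. It is not the backend's actual algorithm, and the `Block` type and its fields are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward)
    x1: float  # right edge


def reading_order(blocks: list[Block]) -> list[Block]:
    """Order blocks column by column, top to bottom within each column."""
    columns: list[list[Block]] = []
    # Visit blocks left to right so columns are created in reading order.
    for block in sorted(blocks, key=lambda b: b.x0):
        for column in columns:
            # A block joins a column when it overlaps that column's x-range.
            col_x0 = min(b.x0 for b in column)
            col_x1 = max(b.x1 for b in column)
            if block.x0 < col_x1 and block.x1 > col_x0:
                column.append(block)
                break
        else:
            # No overlapping column found: start a new one.
            columns.append([block])

    ordered: list[Block] = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered
```

This heuristic breaks down when a full-width element (a title or a wide figure) spans several columns, because it merges them into one. Handling such layouts is the reason a dedicated reconstruction stage is needed.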
OCR Engine
Performs text recognition across detected text regions with multi-language support. The OCR model is selected based on the detected document language.
Formula Recognition
| Subsystem | Function |
|---|---|
| MFD (Math Formula Detection) | Identifies regions containing mathematical expressions |
| MFR (Math Formula Recognition) | Converts detected formula images to LaTeX notation |
Table Recognition
| Type | Method |
|---|---|
| Wired Tables | Grid line detection and cell extraction |
| Wireless Tables | Content-spacing analysis for boundary detection |
| OTSL Format | Optimized Table Structure Language representation for downstream processing |
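
As an illustration of how an OTSL token sequence encodes table structure, the sketch below decodes tokens into cells with row and column spans. It assumes the token vocabulary from the OTSL literature: `C` (cell), `L` (merge with the cell to the left), `U` (merge with the cell above), `X` (interior of a two-dimensional merge), and `NL` (end of row). The backend's actual token names may differ, and `otsl_to_cells` is a hypothetical helper.

```python
def otsl_to_cells(tokens: list[str]) -> list[dict]:
    """Decode an OTSL token stream into cells with rowspan and colspan."""
    # Split the flat token stream into rows at each "NL" token.
    rows: list[list[str]] = []
    current: list[str] = []
    for tok in tokens:
        if tok == "NL":
            rows.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        rows.append(current)

    cells = []
    for r, row in enumerate(rows):
        for c, tok in enumerate(row):
            if tok != "C":
                continue  # "L", "U" and "X" only extend an existing cell
            # Count consecutive "L" tokens to the right for the column span.
            colspan = 1
            while c + colspan < len(row) and row[c + colspan] == "L":
                colspan += 1
            # Count consecutive "U" tokens below for the row span.
            rowspan = 1
            while (r + rowspan < len(rows)
                   and c < len(rows[r + rowspan])
                   and rows[r + rowspan][c] == "U"):
                rowspan += 1
            cells.append({"row": r, "col": c,
                          "rowspan": rowspan, "colspan": colspan})
    return cells
```

For example, `["C", "L", "C", "NL", "C", "C", "U", "NL"]` describes a 2×3 grid in which the top-left cell spans two columns and the top-right cell spans two rows.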
middle_json Assembly
All model outputs are merged into a normalized `middle_json` intermediate representation, which is then passed to the Markdown generator.
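
The merge step can be pictured as attaching each model's output to the layout block it belongs to. The sketch below is illustrative only: the field names (`id`, `type`, `bbox`, `text`, `latex`, `table`) and the `assemble_page` helper are assumptions for the example and do not reflect the real `middle_json` schema.

```python
def assemble_page(page_idx: int, blocks: list[dict], ocr: dict,
                  formulas: dict, tables: dict) -> dict:
    """Merge per-model outputs into one page record (hypothetical schema)."""
    merged = []
    for block in blocks:  # blocks are assumed to be in reading order already
        entry = {"id": block["id"], "type": block["type"],
                 "bbox": block["bbox"]}
        if block["type"] == "formula":
            # LaTeX produced by the formula recognizer, keyed by block id.
            entry["latex"] = formulas.get(block["id"], "")
        elif block["type"] == "table":
            # Table structure produced by the table recognizer.
            entry["table"] = tables.get(block["id"], [])
        elif block["type"] in ("text", "title"):
            # Recognized text from the OCR engine.
            entry["text"] = ocr.get(block["id"], "")
        # Other types (for example images) keep only their bounding box.
        merged.append(entry)
    return {"page_idx": page_idx, "blocks": merged}
```

Keying every model output by block id keeps the stages independent: each model can run on its own regions, and the assembly step only joins the results.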