Output Formats
middle_json
The middle_json is the normalized intermediate
representation that sits between model inference and final output. All
backends produce this format, and all output generators consume it.
Structure
{
  "page_num": 1,
  "page_size": {"width": 595.28, "height": 841.89},
  "layout": [
    {
      "type": "text" | "title" | "image" | "table" | "formula",
      "bbox": [x0, y0, x1, y1],
      "text": "Extracted content...",
      "formula_latex": "\\int_{a}^{b} f(x)\\,dx",
      "table_data": {"rows": [...], "cols": [...]},
      "image_path": "page_1_img_0.png"
    }
  ],
  "metadata": {
    "language": "en",
    "backend": "pipeline",
    "processing_time_ms": 2340
  }
}
Markdown Output
Standard Mode
- Headings for detected titles (#, ##, etc.)
- Text blocks in reading order
- LaTeX formulas in $$ blocks
- HTML tables for structured data
- Image references with alt text
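To make the mapping from layout blocks to Markdown concrete, here is a minimal sketch of how one middle_json page could be rendered in standard mode. It is not the project's actual generator. The function names (`table_to_html`, `block_to_markdown`, `page_to_markdown`) are hypothetical, the shape of `table_data` is an assumption (a list of header strings in `cols` and a list of cell lists in `rows`), every title is rendered as a level-1 heading because the structure above carries no heading level, and the `layout` list is assumed to already be in reading order.

```python
from html import escape


def table_to_html(table_data):
    """Render table_data as an HTML table.

    Assumption: "cols" is a list of header strings and "rows" is a list of
    rows, each a list of cell values. The real shape may differ.
    """
    header = "".join(f"<th>{escape(str(c))}</th>" for c in table_data.get("cols", []))
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in table_data.get("rows", [])
    )
    return f"<table><tr>{header}</tr>{body}</table>"


def block_to_markdown(block):
    """Convert a single layout block to a Markdown fragment."""
    kind = block["type"]
    if kind == "title":
        # Heading level is not part of the block, so level 1 is assumed.
        return f"# {block['text'].strip()}"
    if kind == "text":
        return block["text"].strip()
    if kind == "formula":
        return f"$$\n{block['formula_latex']}\n$$"
    if kind == "table":
        return table_to_html(block["table_data"])
    if kind == "image":
        # Fall back to a generic alt text when the block has no text.
        alt = block.get("text", "").strip() or "image"
        return f"![{alt}]({block['image_path']})"
    raise ValueError(f"unknown block type: {kind!r}")


def page_to_markdown(page):
    """Join all blocks of a page, assuming layout is in reading order."""
    return "\n\n".join(block_to_markdown(b) for b in page["layout"])
```

Calling `page_to_markdown(page)` on a parsed middle_json page returns a single Markdown string with blocks separated by blank lines.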
NLP-Optimized Mode
- Continuous text without page breaks
- Merged truncated paragraphs
- Stripped headers, footers, and watermarks (Data Rinsing™)
- Removed page numbers and navigation artifacts
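The cleanup steps above can be approximated with a simple heuristic. The sketch below is an illustration only, not the tool's real algorithm: `merge_paragraphs`, the page-number pattern, and the "ends with sentence punctuation" rule are all assumptions chosen for brevity. It takes text blocks in reading order, drops blocks that look like bare page numbers, and joins a paragraph to the next one when it appears to have been cut off at a page break.

```python
import re

# Heuristic: a block that is only "3", "Page 3", or "3 / 10" is a page number.
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s*(/|of)\s*\d+)?\s*$", re.IGNORECASE)

# Heuristic: a paragraph that ends with one of these is considered complete.
SENTENCE_END = (".", "!", "?", ":", '"', "\u201d")


def merge_paragraphs(paragraphs):
    """Merge truncated paragraphs and drop page-number blocks.

    `paragraphs` is a list of text-block strings in reading order.
    """
    merged = []
    for para in paragraphs:
        text = para.strip()
        if not text or PAGE_NUMBER.match(text):
            continue  # skip empty blocks and page numbers
        if merged and not merged[-1].endswith(SENTENCE_END):
            prev = merged[-1]
            if prev.endswith("-"):
                # Word hyphenated across the break: join without a space.
                merged[-1] = prev[:-1] + text
            else:
                merged[-1] = prev + " " + text
        else:
            merged.append(text)
    return merged
```

This rule is deliberately naive. It would also merge a heading with the paragraph after it, so it should only be fed blocks of type `text`, and the hyphen handling will wrongly join genuinely hyphenated compounds that happen to fall at a break.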
Content List
A flat list of all detected content blocks with their types, bounding boxes, and extracted text, used for downstream programmatic processing.
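As a rough illustration of how such a list relates to middle_json, the sketch below flattens a sequence of parsed pages into one list of dictionaries. The function name `build_content_list` and the exact keys of each entry are assumptions made for this example; the actual content list schema may differ.

```python
def build_content_list(pages):
    """Flatten middle_json pages into one list of content entries.

    `pages` is an iterable of parsed middle_json page dicts. The entry keys
    below are illustrative, not a documented schema.
    """
    content = []
    for page in pages:
        for block in page["layout"]:
            content.append(
                {
                    "page_num": page["page_num"],
                    "type": block["type"],
                    "bbox": block["bbox"],
                    # Image and table blocks may carry no text.
                    "text": block.get("text", ""),
                }
            )
    return content
```

A downstream consumer can then filter by type, for example `[c for c in build_content_list(pages) if c["type"] == "table"]` to collect every table with its page number and bounding box.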