VECTOR FEED

VLM & Hybrid Backends

VLM Backend

The VLM backend uses a single Vision-Language Model, Qwen2-VL (7B or 72B parameters), to parse documents end to end. The same model handles layout detection, text recognition, formula extraction, and table reconstruction, with no separate per-task models.

Supported Inference Engines

Engine	Install Extra	Best For
`vllm-engine`	`vlm-vllm`	Production, high-throughput
`lmdeploy-engine`	`vlm-lmdeploy`	Deployment-optimized serving
`transformers-engine`	`vlm-transformers`	Development, debugging

Usage

vector-feed -p document.pdf --backend vlm \
  --vlm-model Qwen/Qwen2-VL-7B-Instruct \
  --vlm-engine vllm-engine
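
For scripted runs, the same invocation can be assembled in Python. The following is a minimal sketch and not part of Vector Feed itself: `build_vlm_command` and `ENGINE_EXTRAS` are hypothetical names, only the flags shown in the command above are used, and the `vector-feed` executable is assumed to be on the PATH.

```python
import shlex

# Engine name -> install extra, copied from the Supported Inference Engines table.
ENGINE_EXTRAS = {
    "vllm-engine": "vlm-vllm",
    "lmdeploy-engine": "vlm-lmdeploy",
    "transformers-engine": "vlm-transformers",
}


def build_vlm_command(pdf_path, model="Qwen/Qwen2-VL-7B-Instruct", engine="vllm-engine"):
    """Return the argument list for a VLM-backend run (hypothetical helper)."""
    if engine not in ENGINE_EXTRAS:
        raise ValueError(
            f"unsupported engine {engine!r}; expected one of {sorted(ENGINE_EXTRAS)}"
        )
    return [
        "vector-feed",
        "-p", pdf_path,
        "--backend", "vlm",
        "--vlm-model", model,
        "--vlm-engine", engine,
    ]


if __name__ == "__main__":
    cmd = build_vlm_command("document.pdf")
    # Print the shell-quoted command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Validating the engine name before launching gives an immediate error for a typo, rather than a failure after the process has started.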

Hybrid Backend

The Hybrid backend combines a coarse VLM layout pass with refinement by expert models, in two stages:

1. VLM pass: Identifies page layout, reading order, and content types at a high level.
2. Expert refinement: Pipeline models (table structure, formula verification, OCR) refine specific regions identified by the VLM.

This balances the contextual awareness of the VLM with the precision of specialized models.
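
The two-stage flow can be illustrated with a small dispatch loop. This is a sketch under stated assumptions, not the backend's actual code: the `Region` structure, the `EXPERTS` registry, and the stand-in refinement functions are all hypothetical, and real expert models would replace the trivial string operations used here.

```python
import dataclasses
from typing import Callable, Dict, List


@dataclasses.dataclass(frozen=True)
class Region:
    """One block found by the coarse VLM pass (hypothetical structure)."""
    order: int            # position in reading order
    kind: str             # content type, e.g. "text", "table", "formula"
    content: str          # the VLM's draft output for this block
    refined: bool = False


def refine_table(region: Region) -> Region:
    # Stand-in for a table-structure expert model.
    return dataclasses.replace(region, content=region.content.strip(), refined=True)


def refine_formula(region: Region) -> Region:
    # Stand-in for a formula-verification expert model.
    return dataclasses.replace(region, content=region.content.replace(" ", ""), refined=True)


# Content type -> expert. An OCR expert for "text" could be registered the same way.
EXPERTS: Dict[str, Callable[[Region], Region]] = {
    "table": refine_table,
    "formula": refine_formula,
}


def hybrid_parse(coarse_regions: List[Region]) -> List[Region]:
    """Refine regions that have an expert; keep the VLM draft for the rest."""
    result = []
    for region in sorted(coarse_regions, key=lambda r: r.order):
        expert = EXPERTS.get(region.kind)
        result.append(expert(region) if expert else region)
    return result


if __name__ == "__main__":
    page = [
        Region(order=1, kind="formula", content="E = m c^2"),
        Region(order=0, kind="text", content="Mass-energy equivalence:"),
    ]
    for region in hybrid_parse(page):
        print(region.order, region.kind, region.refined, region.content)
```

Keeping the experts in a registry keyed by content type means the VLM pass decides what each region is, and the expert stage only decides how to refine it.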

Model Singleton

Both backends use thread-safe singletons (`ModelSingleton` / `AtomModelSingleton`) so that concurrent requests share already-loaded models instead of loading them again.
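
The general pattern can be sketched as follows. This is an illustration of a thread-safe singleton cache, not the actual `ModelSingleton` or `AtomModelSingleton` code: the class name `ModelCache`, its `get` method, and the loader callback are hypothetical.

```python
import threading
from typing import Any, Callable


class ModelCache:
    """Process-wide, thread-safe cache of loaded models (hypothetical sketch)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: take the lock only while the instance is missing.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._models = {}
                    instance._models_lock = threading.Lock()
                    cls._instance = instance
        return cls._instance

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the model stored under key, calling loader at most once."""
        # Holding the lock during loading is simple but serializes all loads.
        with self._models_lock:
            if key not in self._models:
                self._models[key] = loader()
            return self._models[key]


if __name__ == "__main__":
    calls = []
    results = []

    def fake_loader():
        calls.append(1)       # record each (expensive) load
        return object()       # stand-in for a loaded model

    def worker():
        results.append(ModelCache().get("Qwen/Qwen2-VL-7B-Instruct", fake_loader))

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("loads:", len(calls), "shared:", all(m is results[0] for m in results))
```

A per-key lock would let different models load in parallel; the single lock used here trades that for brevity.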