VECTOR FEED

VLM & Hybrid Backends

VLM Backend

The VLM backend performs end-to-end document parsing with a single vision-language model, Qwen2-VL (7B or 72B parameters). That one model handles layout detection, text recognition, formula extraction, and table reconstruction together.

[Figure: VLM & Hybrid Routing]

Supported Inference Engines

Engine               Install Extra      Best For
vllm-engine          vlm-vllm           Production, high-throughput
lmdeploy-engine      vlm-lmdeploy       Deployment-optimized serving
transformers-engine  vlm-transformers   Development, debugging

Usage

vector-feed -p document.pdf --backend vlm \
  --vlm-model Qwen/Qwen2-VL-7B-Instruct \
  --vlm-engine vllm-engine
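
When the command is launched from a script, it can help to validate the engine name before starting a long run. The following is a minimal sketch, not part of vector-feed itself: the helper build_vlm_command and the ENGINE_EXTRAS mapping are hypothetical names, and only the flags and engine names shown above are used.

```python
import shlex

# Engine names mapped to their install extras, copied from the table above.
ENGINE_EXTRAS = {
    "vllm-engine": "vlm-vllm",
    "lmdeploy-engine": "vlm-lmdeploy",
    "transformers-engine": "vlm-transformers",
}


def build_vlm_command(pdf_path, model="Qwen/Qwen2-VL-7B-Instruct", engine="vllm-engine"):
    """Return the argv list for a VLM-backend run (hypothetical helper)."""
    if engine not in ENGINE_EXTRAS:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(ENGINE_EXTRAS)}"
        )
    return [
        "vector-feed", "-p", pdf_path,
        "--backend", "vlm",
        "--vlm-model", model,
        "--vlm-engine", engine,
    ]


cmd = build_vlm_command("document.pdf")
# Print the shell-quoted command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when the PDF path contains spaces.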

Hybrid Backend

The Hybrid backend combines coarse layout detection by the VLM with refinement by expert models, in two stages:

  1. VLM pass: Identifies page layout, reading order, and content types at a high level.
  2. Expert refinement: Pipeline models (table structure, formula verification, OCR) refine specific regions identified by the VLM.

This balances the contextual awareness of the VLM with the precision of specialized models.
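
The two stages can be pictured as a routing step: the VLM pass yields typed regions in reading order, and each region is handed to the expert registered for its type. The sketch below illustrates that idea only. Region, refine_page, and the toy experts are hypothetical names, not the backend's actual classes, and the real region format is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    kind: str      # content type reported by the coarse VLM pass, e.g. "table"
    order: int     # position in reading order
    content: str   # coarse transcription from the VLM pass


# Hypothetical expert signature: takes a region, returns refined content.
Expert = Callable[[Region], str]


def refine_page(regions: List[Region], experts: Dict[str, Expert]) -> List[Region]:
    """Stage 2: send each region to the expert for its type, if one is registered."""
    refined = []
    for region in sorted(regions, key=lambda r: r.order):
        expert = experts.get(region.kind)
        # Regions with no matching expert keep the VLM's coarse output.
        content = expert(region) if expert else region.content
        refined.append(Region(region.kind, region.order, content))
    return refined


# Toy stand-ins for the table and formula models.
experts = {
    "table": lambda r: r.content.replace(" ", " | "),
    "formula": lambda r: f"${r.content}$",
}

# Stage 1 output (here hard-coded) would come from the VLM pass.
coarse = [
    Region("formula", 1, "E = mc^2"),
    Region("text", 0, "Introduction"),
    Region("table", 2, "a b c"),
]

for region in refine_page(coarse, experts):
    print(region.order, region.kind, region.content)
```

Falling back to the coarse output when no expert matches keeps plain text regions untouched, so only the region types that benefit from a specialized model pay for the second pass.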

Model Singleton

Both backends use thread-safe singletons (ModelSingleton / AtomModelSingleton) to prevent redundant model loading across concurrent requests.