A three-path intelligent document processing pipeline: text-based PDFs go through unstructured / docling ($0.001 / doc), scanned pages go through Tesseract with Claude validating critical fields, and complex layouts (forms, multi-column reports, 10-Ks) route directly to Claude Vision. Every extraction is validated against a Pydantic schema with cross-field rules (subtotal + tax = total, date ranges, NIT/EIN patterns). Confidence per field decides auto-approve vs human review.
Real document corpora are mixed: 60% clean PDFs from ERP exports, 30% scanned forms with handwriting, 10% complex layouts where the standard parsers return garbage. A single approach loses money on either accuracy or cost.
unstructured-only: 0.74 field-F1, 0.8s p95, $0.001/doc. Loses on scanned forms and complex 10-K layouts. Cheap, but it fails on the 40% of the corpus that is scanned or complex.
Vision-only: 0.91 field-F1, 0.88 doc-acc, but 11.4s p95 and $0.082/doc. You can afford it for the 10% hard cases — paying it for the 60% easy ones is throwing money away.
Claude Vision classifier (one cheap call per doc) decides which of the three paths to take. Path 1 for clean PDFs, Path 2 for scanned, Path 3 for hard layouts.
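The routing step can be sketched as follows. This is a minimal illustration, not the project's code: the label strings, `DocClass`, `PATH_FOR_CLASS`, and `route` are hypothetical names, and the classifier is passed in as a plain callable standing in for the Claude Vision call, which is not shown. Falling back to Path 3 on an unrecognized label is an assumption.

```python
from enum import Enum
from typing import Callable, Dict


class DocClass(Enum):
    """Hypothetical labels the classifier is assumed to return."""
    CLEAN_PDF = "clean_pdf"
    SCANNED = "scanned"
    COMPLEX_LAYOUT = "complex_layout"


# Path numbers follow the text: 1 = clean PDFs, 2 = scanned, 3 = hard layouts.
PATH_FOR_CLASS: Dict[DocClass, int] = {
    DocClass.CLEAN_PDF: 1,
    DocClass.SCANNED: 2,
    DocClass.COMPLEX_LAYOUT: 3,
}


def route(doc: bytes, classify: Callable[[bytes], str]) -> int:
    """Return the extraction path (1, 2, or 3) for a document.

    `classify` stands in for the single classifier call per document.
    """
    label = classify(doc)
    try:
        doc_class = DocClass(label)
    except ValueError:
        # Assumption: an unrecognized label goes to the most capable path.
        return 3
    return PATH_FOR_CLASS[doc_class]
```

Passing the classifier as a callable keeps the routing table testable with a stub, for example `route(pdf_bytes, lambda _: "scanned")`.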
Pydantic schemas per document type enforce structure. Invoice schema requires vendor.name · invoice_number · total · line_items[]. Form schema is different. Schema-invalid extractions get retried with the next-tier method.
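A rough sketch of the schema-plus-escalation idea, assuming an invoice schema with the four required fields named above. The `LineItem` fields, the `Extractor` type, and `extract_with_escalation` are hypothetical; the real extractors (unstructured / docling, Tesseract, Claude Vision) are represented only by callables that return a dict.

```python
from typing import Callable, List, Optional, Tuple

from pydantic import BaseModel, ValidationError


class Vendor(BaseModel):
    name: str


class LineItem(BaseModel):
    # Assumption: the source only says line_items[]; these fields are illustrative.
    description: str
    amount: float


class Invoice(BaseModel):
    vendor: Vendor
    invoice_number: str
    total: float
    line_items: List[LineItem]


# An extractor takes raw document bytes and returns a candidate payload.
Extractor = Callable[[bytes], dict]


def extract_with_escalation(
    doc: bytes, tiers: List[Extractor]
) -> Tuple[Optional[Invoice], int]:
    """Try each tier in order; return the first schema-valid invoice.

    Returns (invoice, tier_index), or (None, -1) if every tier fails.
    """
    for index, extractor in enumerate(tiers):
        payload = extractor(doc)
        try:
            return Invoice(**payload), index
        except ValidationError:
            # Schema-invalid output: retry with the next-tier method.
            continue
    return None, -1
```

With the tiers ordered cheapest first, a document only pays for a more expensive method when the cheaper output fails the schema.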
Cross-field validators: subtotal + tax = total, NIT/EIN regex, date sanity, insurance ID format. A field with conf < 0.85 on a critical attribute routes to human review even if everything else passes.
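The checks above might look like the following sketch, written with the standard library only. The field names, the `CRITICAL_FIELDS` set, and the choice to send validator failures to human review are assumptions for illustration; only the 0.85 threshold and the subtotal + tax = total rule come from the text. The EIN pattern is the standard US `NN-NNNNNNN` format; a NIT pattern is omitted.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List

CONF_THRESHOLD = 0.85                    # from the text
CRITICAL_FIELDS = ("tax_id", "total")    # assumption: which fields are critical
EIN_RE = re.compile(r"^\d{2}-\d{7}$")    # US EIN format: NN-NNNNNNN


@dataclass
class ExtractedField:
    value: str
    conf: float


def review_reasons(fields: Dict[str, ExtractedField]) -> List[str]:
    """Collect every reason this extraction should not be auto-approved."""
    reasons: List[str] = []

    # Cross-field rule: subtotal + tax must equal total (exact decimal math).
    subtotal = Decimal(fields["subtotal"].value)
    tax = Decimal(fields["tax"].value)
    total = Decimal(fields["total"].value)
    if subtotal + tax != total:
        reasons.append("subtotal + tax != total")

    # Format rule: tax ID must look like an EIN.
    if not EIN_RE.match(fields["tax_id"].value):
        reasons.append("tax_id is not a valid EIN pattern")

    # Confidence rule: any critical field below threshold forces review.
    for name in CRITICAL_FIELDS:
        if fields[name].conf < CONF_THRESHOLD:
            reasons.append(f"low confidence: {name}")

    return reasons


def decide(fields: Dict[str, ExtractedField]) -> str:
    """Return 'auto_approve' only when no rule raised a reason."""
    return "human_review" if review_reasons(fields) else "auto_approve"
```

Using `Decimal` rather than `float` avoids spurious mismatches on currency amounts such as 0.10 + 0.20.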
Net: 0.89 field-F1 at $0.018/doc, within 0.02 of Vision-only's 0.91 at roughly 4.5× lower cost ($0.082 vs $0.018 per doc).
Walks through the Docker stack (postgres + tesseract), dataset downloads (FUNSD + DocVQA + PubLayNet), the classifier run, three-path extraction over 939 docs, evaluation on FUNSD (field-level) and DocVQA (doc-level), and the Next.js demo launch.
Watch bounding boxes appear over the source page as fields are detected, then stream into the JSON extract panel with per-field confidence bars. Cross-field validators run after extraction completes. Final routing decision shows at the bottom.