Extract text, tables, and structure from any document.
Self-hosted OCR pipeline. PDF, DOCX, or image → clean JSON in roughly 2 seconds. No external APIs. No per-call billing. Call it from anywhere.
Results
Extracted in real time. Switch tabs to explore each layer.
What it extracts
~2s typical · all stages run locally
Powered by pdfplumber, pytesseract, spaCy, and scikit-learn. Handwriting fallback when OCR confidence drops.
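The confidence-based handwriting fallback could look roughly like the sketch below. This is an illustration, not the pipeline's actual code: the function names and the threshold of 60 are hypothetical, and the input is assumed to be the per-word "conf" list that pytesseract.image_to_data returns when called with output_type=pytesseract.Output.DICT.

```python
def mean_confidence(confs):
    """Average word confidence, skipping the -1 entries that mark non-text boxes."""
    # Values may arrive as ints, floats, or numeric strings depending on the version.
    scores = [float(c) for c in confs if float(c) >= 0]
    return sum(scores) / len(scores) if scores else 0.0


def needs_handwriting_fallback(confs, threshold=60.0):
    """Return True when average OCR confidence falls below the threshold.

    The threshold is an assumed value for illustration only. A page with
    no recognized words scores 0.0 and therefore triggers the fallback.
    """
    return mean_confidence(confs) < threshold
```

A caller would run standard OCR first, pass the resulting confidence list to needs_handwriting_fallback, and only invoke the slower handwriting model when it returns True.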
Use it from anywhere
No UI required. Same pipeline, callable from any HTTP client. Upload returns a task_id; poll /api/status/<id> and fetch /api/results/<id> when done.
# Upload: returns task_id
curl -F "file=@invoice.pdf" \
     -F "htr_mode=auto" \
     http://document-intelligence-pipeline-production.up.railway.app/api/process-document
# → {"task_id":"abc-123","status":"queued"}

# Poll status
curl http://document-intelligence-pipeline-production.up.railway.app/api/status/abc-123
# → {"task_id":"abc-123","status":"completed","progress":100}

# Fetch results
curl http://document-intelligence-pipeline-production.up.railway.app/api/results/abc-123 > result.json
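The same upload, poll, fetch flow can be scripted. Below is a minimal Python sketch using only the standard library. The endpoints, the file and htr_mode form fields, and the "completed" status come from the curl calls above. The helper names, the polling interval, the timeout, and the "failed" status value are assumptions made for illustration.

```python
import json
import time
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://document-intelligence-pipeline-production.up.railway.app"


def http_get_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def build_multipart(fields, file_field, filename, payload):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(header + payload + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload(path, htr_mode="auto", base_url=BASE_URL):
    """POST a document to /api/process-document and return its task_id."""
    path = Path(path)
    body, content_type = build_multipart(
        {"htr_mode": htr_mode}, "file", path.name, path.read_bytes()
    )
    req = urllib.request.Request(
        f"{base_url}/api/process-document",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["task_id"]


def wait_for_results(task_id, base_url=BASE_URL, get_json=http_get_json,
                     interval=1.0, timeout=120.0, sleep=time.sleep):
    """Poll /api/status/<id> until completed, then fetch /api/results/<id>."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_json(f"{base_url}/api/status/{task_id}")
        state = status.get("status")
        if state == "completed":
            return get_json(f"{base_url}/api/results/{task_id}")
        if state == "failed":  # assumed value; the page does not document failures
            raise RuntimeError(f"task {task_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout}s")
        sleep(interval)
```

Typical use would be result = wait_for_results(upload("invoice.pdf")), followed by writing result to disk with json.dump. The get_json and sleep parameters are injectable so the polling loop can be exercised without a network connection.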