Live demo · Self-hosted · No API keys

Extract text, tables, and structure from any document.

Self-hosted OCR pipeline. PDF, DOCX, or image → clean JSON in roughly 2 seconds. No external APIs. No per-call billing. Call it from anywhere.

pdf · docx · png · jpg
no API key needed
~2s avg response
runs anywhere Python runs

Drop a file or click to browse

See what the pipeline extracts in ~2 seconds

PDF · DOCX · PNG · JPG · max 10 MB
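The format and size limits above can be checked on the client before uploading, which avoids a wasted round trip. This is a minimal sketch: `check_upload` is a hypothetical helper, not part of the service, and it assumes "10 MB" means 10 × 1024 × 1024 bytes and that only the four listed extensions are accepted (for example, `.jpeg` is treated as unsupported here because the page does not list it).

```python
from pathlib import Path

# Limits copied from the page; the helper itself is hypothetical.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".png", ".jpg"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For a file on disk, `check_upload(path.name, path.stat().st_size)` gives the same check. The server remains the authority on what it accepts.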

Handwriting (HTR) · Max size 10 MB · Timeout 30 s
Auto falls back to handwriting recognition when OCR confidence drops below 65%
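The fallback rule can be read as a small decision function. The sketch below is an illustration only, not the service's code: `choose_engine` is a hypothetical name, the confidence is assumed to be a fraction between 0 and 1, and the meanings of `force` (always use handwriting recognition) and `off` (never use it) are assumptions inferred from the mode names. Whether the real pipeline applies the threshold per page or per document is not stated.

```python
OCR_CONFIDENCE_THRESHOLD = 0.65  # from the page: fall back below 65%


def choose_engine(htr_mode: str, ocr_confidence: float) -> str:
    """Pick "ocr" or "htr" given the HTR mode and an OCR confidence in [0, 1]."""
    if htr_mode == "force":   # assumed: always use handwriting recognition
        return "htr"
    if htr_mode == "off":     # assumed: never use handwriting recognition
        return "ocr"
    if htr_mode == "auto":    # documented: fall back when confidence is low
        return "htr" if ocr_confidence < OCR_CONFIDENCE_THRESHOLD else "ocr"
    raise ValueError(f"unknown htr_mode: {htr_mode!r}")
```

With these assumptions, a confidence of exactly 0.65 stays on OCR, since the page says the fallback happens when confidence drops below 65%.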


What it extracts

~2s typical · all stages run locally

classify → ocr → tables → entities → key-values → layout → summary
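One way to picture the stage chain is as a list of functions applied in order, each taking the document state and returning an enriched copy. This is a sketch under that assumption; the stage names come from the line above, but `run_pipeline`, `make_tracing_stage`, and the dict-based document state are hypothetical and say nothing about how the service is actually built.

```python
from typing import Callable

Stage = Callable[[dict], dict]

# Stage names in the order shown on the page.
STAGE_ORDER = ["classify", "ocr", "tables", "entities",
               "key-values", "layout", "summary"]


def run_pipeline(document: dict, stages: dict[str, Stage]) -> dict:
    """Apply each stage in STAGE_ORDER, feeding each result to the next stage."""
    for name in STAGE_ORDER:
        document = stages[name](document)
    return document


def make_tracing_stage(name: str) -> Stage:
    """Placeholder stage that only records that it ran (stands in for real work)."""
    def stage(document: dict) -> dict:
        return {**document, "trace": document.get("trace", []) + [name]}
    return stage


stages = {name: make_tracing_stage(name) for name in STAGE_ORDER}
result = run_pipeline({"filename": "invoice.pdf"}, stages)
```

After this runs, `result["trace"]` lists the stage names in execution order, which makes the ordering easy to verify before real stages are plugged in.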

Use it from anywhere

No UI required. Same pipeline, callable from any HTTP client. Upload returns a task_id; poll /api/status/<id> and fetch /api/results/<id> when done.

# Upload — returns task_id
curl -F "file=@invoice.pdf" \
     -F "htr_mode=auto" \
     http://document-intelligence-pipeline-production.up.railway.app/api/process-document
# → {"task_id":"abc-123","status":"queued"}

# Poll status
curl http://document-intelligence-pipeline-production.up.railway.app/api/status/abc-123
# → {"task_id":"abc-123","status":"completed","progress":100}

# Fetch results
curl http://document-intelligence-pipeline-production.up.railway.app/api/results/abc-123 > result.json

# PowerShell 7+ — uses included wrapper script
# Download submit-file.ps1 from /static/submit-file.ps1

.\submit-file.ps1 -FilePath .\invoice.pdf -Output result.json

# Options:
#   -HtrMode       auto | force | off  (default: auto)
#   -TesseractLang eng | eng+dan | ...  (default: eng)
#   -ApiUrl        override host
#   -NoPoll        skip status polling, just upload

# Or with raw Invoke-RestMethod:
$form = @{ file = Get-Item .\invoice.pdf; htr_mode = "auto" }
$task = Invoke-RestMethod -Uri "http://document-intelligence-pipeline-production.up.railway.app/api/process-document" `
                          -Method Post -Form $form

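import time

import requests

API = "http://document-intelligence-pipeline-production.up.railway.app/api"

# 1. Upload: the response carries the task_id used by the other endpoints
with open("invoice.pdf", "rb") as f:
    r = requests.post(f"{API}/process-document",
                      files={"file": f},
                      data={"htr_mode": "auto"})
r.raise_for_status()  # stop early on an HTTP error instead of failing on r.json()
task_id = r.json()["task_id"]

# 2. Poll until the task reaches a terminal state
while True:
    status = requests.get(f"{API}/status/{task_id}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(1)

# 3. Fetch results only if processing succeeded
if status["status"] == "failed":
    raise SystemExit(f"processing failed: {status}")
result = requests.get(f"{API}/results/{task_id}").json()
print(result["text"])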
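The polling loop above runs until the server reports a terminal status, so a stuck task would block the caller indefinitely. A bounded variant might look like the sketch below. `wait_for_task` is a hypothetical helper; its 30-second default borrows the "Timeout 30 s" figure from the page, on the assumption that no task should take longer than that. The status fetcher, sleep function, and clock are passed in so the helper can be exercised without a network.

```python
import time
from typing import Callable


def wait_for_task(fetch_status: Callable[[], dict],
                  timeout_s: float = 30.0,
                  interval_s: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll fetch_status() until the task completes, fails, or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status  # terminal state: the caller decides what to do next
        if clock() >= deadline:
            raise TimeoutError(
                f"task still {status.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

With `requests`, the fetcher could be `lambda: requests.get(f"{API}/status/{task_id}", timeout=10).json()`, where the per-request `timeout=10` is an arbitrary illustrative value.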

Download PowerShell wrapper · Full API docs
