Extract data from any document
Try it right now — no signup.
Generate a temporary API key and send up to 10 test requests for free. No registration, no credit card.
curl -X POST https://api.hotdoc.io/v1/process \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@invoice.pdf" \
-F 'prompt=Extract from this invoice:
- invoice_number
- date (ISO 8601)
- vendor_name
- total_amount (number, no currency)
- line_items: description, qty, unit_price, total
Return null for missing fields.' \
-F 'schema={
"invoice_number": "string",
"date": "string",
"vendor_name": "string",
"total_amount": "number",
"line_items": [{
"description": "string",
"qty": "number",
"unit_price": "number",
"total": "number"
}]
}'
From file import to JSON in a single API call.
Hotdoc is an orchestration layer on top of OSS libraries: we handle archives and 30+ document formats, schema-based extraction and enrichment. You send a link to your documents and a prompt with a schema. The rest is on us.
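For callers who prefer Python to curl, the same request might be assembled as in the sketch below. The endpoint, the `Authorization` header, and the `file`, `prompt`, and `schema` form fields are taken from the curl example; the helper names `build_form_fields` and `send_document` are hypothetical, the third-party `requests` library is an assumption, and since the response format is not described here the sketch simply returns the raw response.

```python
import json

# Endpoint taken from the curl example above.
API_URL = "https://api.hotdoc.io/v1/process"


def build_form_fields(prompt: str, schema: dict) -> dict:
    """Return the text form fields; the schema travels as a JSON string."""
    return {"prompt": prompt, "schema": json.dumps(schema)}


def send_document(path: str, api_key: str, prompt: str, schema: dict):
    """Hypothetical helper: POST one file as multipart/form-data."""
    import requests  # third-party; assumed installed (pip install requests)

    with open(path, "rb") as fh:
        return requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": fh},  # same role as -F "file=@invoice.pdf"
            data=build_form_fields(prompt, schema),
            timeout=120,
        )
```

Keeping the field-building step separate from the network call makes it easy to inspect exactly what will be sent before spending one of the free test requests.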
You pay for the orchestrator, reliability (retries, fallbacks, recovery), infrastructure, support, and proven pipelines for the hard cases. No markup on OCR or tokens.
Initial processing
We unpack archives, read 30+ formats natively, and run scans and images through Vision OCR.
Data extraction
We extract fields against your JSON schema. You define the prompt and the model per request.
Data enrichment
Data is enriched on the fly: each new document builds on and refines what's already been extracted.
Structured output
We return JSON, CSV, or MD with no volume limits within a session.
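Because fields are extracted against your schema, a client can check each result before loading it downstream. The sketch below assumes the shorthand used in the curl example, where `"string"` and `"number"` name value types and a one-element list describes every item in an array. That reading of the schema format, the `conforms` helper, and the sample values are illustrative assumptions, not documented Hotdoc behavior. `None` is accepted everywhere because the example prompt asks for null on missing fields.

```python
# Schema copied from the curl example above.
SCHEMA = {
    "invoice_number": "string",
    "date": "string",
    "vendor_name": "string",
    "total_amount": "number",
    "line_items": [
        {"description": "string", "qty": "number",
         "unit_price": "number", "total": "number"}
    ],
}


def conforms(value, schema) -> bool:
    """Check a value against the shorthand schema (assumed semantics)."""
    if value is None:
        return True  # null stands for a missing field
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, list):
        # a non-empty one-element list is a template for every item
        return isinstance(value, list) and all(
            conforms(item, schema[0]) for item in value
        )
    if isinstance(schema, dict):
        # absent keys are treated like null fields
        return isinstance(value, dict) and all(
            conforms(value.get(key), sub) for key, sub in schema.items()
        )
    return False


# Made-up sample result, for illustration only.
sample = {
    "invoice_number": "INV-001",
    "date": "2024-01-31",
    "vendor_name": None,
    "total_amount": 120.5,
    "line_items": [
        {"description": "Widget", "qty": 2, "unit_price": 60.25, "total": 120.5}
    ],
}
```

With these definitions, `conforms(sample, SCHEMA)` is true, while the same record with `total_amount` set to the string `"120.50"` fails the check.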
Need classification, chunking, or a custom flow?
Contact us
Not OSS.
Not an enterprise API.
Something else.
| | OSS | Funded API | Hotdoc |
|---|---|---|---|
| Full processing flow | ✗ | ✓ | ✓ |
| Zero-ops (nothing to deploy) | ✗ | ✓ | ✓ |
| Control over the tools used | ✓ | ✗ | ✓ |
| BYOK | ✓ | ✗ | ✓ |
| No token markup | ✓ | ✗ | ✓ |
| Support | ✗ | ✓ | ✓ |
| Integration speed | weeks | days | hours |
Between "build it yourself" and "overpay for tokens" there's a third option: a ready-made tool with zero markup.
Hotdoc vs OSS (Docling, Marker, etc.): OSS tools give you parsing, but no schema extraction, enrichment, or orchestration. You deploy and run the GPUs yourself, and costs can top $1,000 before you see a single result. We provide ready infrastructure and proven pipelines, so there is no document plumbing on your side.
Hotdoc vs funded APIs (Reducto, Unstructured, etc.): Most products monetize through token markup, so the more you spend, the more commission you pay. We charge a flat fee for the tooling; you use your own API keys and never overpay for tokens.
Architecture and processing methods
A deterministic orchestrator with surgical AI. Every step is controlled, every error is handled, every result is reproducible.
BYOK: your key is used at runtime and never stored on Hotdoc's side.
Popular use cases
Common tasks that work out of the box. For anything custom — we'll set up a flow.
Invoice Extraction API
Invoices in any format → structured JSON against your schema. Batches from email or S3 in a single call.
Resume / CV Parser API
A stream of CVs in any language → unified candidate cards, scored against your criteria.
Bank Statement Extraction
Statements from any bank → a single transaction format. Works with PDF and Excel exports.
Batch / Archive Processing
A ZIP with hundreds of mixed-format files — one request, one JSON. Per-file status in every batch.
Something non-standard? We'll set up a flow.
Contact us
Need high volume, self-hosted, or dedicated infrastructure?
What's included
Tokens aren't expensive on their own — they're made expensive by whoever stands between you and the provider.
Funded companies take your files, run them through Claude or GPT, and bill you $10+ per thousand pages. That's where their margin lives. The bigger your volume, the more you overpay for tokens.
We decided to do it differently: you plug in your key, pay the provider directly, and pay us a flat fee for a ready-made tool. Not because we don't want to earn more — but because we think this is simply the right way to do it.
Documents are never stored
Processed at runtime, not retained after processing (by default).
Encryption at every level
TLS in transit, encryption at rest. Access by API key only.
BYOK — your keys stay yours
Your provider key is used at runtime and never stored on our side.
On-prem / VPC on request
Full isolation for Enterprise and self-hosted deployments.
FAQ. Common questions
Still have questions? Email hello@hotdoc.io — we reply fast.