zero markup · API-first · BYOK · flat $30/mo (was $50)

Extract data from any document

We unpack archives, read 30+ formats, and extract fields against your JSON schema. No infrastructure to deploy, no token markup to pay.

Incoming invoices · Price lists · Order forms · Tender documents · Resumes · Document reconciliation · Client onboarding · Shipping documents · Inbound requests · Product data sheets · Document recognition

Try it right now — no signup.

Generate a temporary API key and send up to 10 test requests for free. No registration, no credit card.

Questions? contact us
curl -X POST https://api.hotdoc.io/v1/process \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'prompt=Extract from this invoice:
- invoice_number
- date (ISO 8601)
- vendor_name
- total_amount (number, no currency)
- line_items: description, qty, unit_price, total

Return null for missing fields.' \
  -F 'schema={
    "invoice_number": "string",
    "date": "string",
    "vendor_name": "string",
    "total_amount": "number",
    "line_items": [{
      "description": "string",
      "qty": "number",
      "unit_price": "number",
      "total": "number"
    }]
  }'
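
The same request can be sketched in Python with only the standard library. The endpoint, the form field names (file, prompt, schema), and the Bearer header come from the curl example above; the helper names are illustrative, and the response format is not specified here, so the sketch simply returns the raw body.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.hotdoc.io/v1/process"  # endpoint from the curl example

PROMPT = (
    "Extract from this invoice:\n"
    "- invoice_number\n"
    "- date (ISO 8601)\n"
    "- vendor_name\n"
    "- total_amount (number, no currency)\n"
    "- line_items: description, qty, unit_price, total\n"
    "\n"
    "Return null for missing fields."
)

SCHEMA = {
    "invoice_number": "string",
    "date": "string",
    "vendor_name": "string",
    "total_amount": "number",
    "line_items": [
        {"description": "string", "qty": "number",
         "unit_price": "number", "total": "number"}
    ],
}


def encode_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, file_name, file_bytes):
    """Build (but do not send) the POST request shown in the curl example."""
    fields = {"prompt": PROMPT, "schema": json.dumps(SCHEMA)}
    body, content_type = encode_multipart(fields, "file", file_name, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

To use it, read invoice.pdf in binary mode, pass the bytes to build_request together with your key, and hand the result to send.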

From file import to JSON in a single API call.

Hotdoc is an orchestration layer on top of OSS libraries: we handle archives and 30+ document formats, schema-based extraction and enrichment. You send a link to your documents and a prompt with a schema. The rest is on us.

You pay for the orchestrator, reliability (retries, fallbacks, recovery), infrastructure, support, and proven pipelines for the hard cases. No markup on OCR or tokens.

01

Initial processing

We unpack archives, read 30+ formats natively, and run scans and images through Vision OCR.

ZIP / RAR · 30+ formats · Vision OCR

02

Data extraction

We extract fields against your JSON schema. You define the prompt and the model per request.
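
A client-side check of the returned data against the schema can be sketched as follows. This assumes the response mirrors the shorthand schema from the request example ("string" and "number" type hints, nested objects, one-element lists describing array items) and that missing fields come back as null, as the sample prompt asks. The function name is illustrative and not part of any Hotdoc SDK.

```python
def check_against_schema(value, schema, path="$"):
    """Return a list of mismatches between a value and a shorthand schema."""
    if value is None:
        return []  # the sample prompt asks for null when a field is missing
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        problems = []
        for key, sub_schema in schema.items():
            if key not in value:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(
                    check_against_schema(value[key], sub_schema, f"{path}.{key}")
                )
        return problems
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        problems = []
        for index, item in enumerate(value):
            problems.extend(
                check_against_schema(item, schema[0], f"{path}[{index}]")
            )
        return problems
    if schema == "string":
        ok = isinstance(value, str)
    elif schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    else:
        return [f"{path}: unknown type hint {schema!r}"]
    return [] if ok else [f"{path}: expected {schema}"]
```

An empty list means the payload matches; otherwise each entry names the offending path, which is useful when tuning the prompt or switching models.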

JSON schema · BYOK

03

Data enrichment

Data is enriched on the fly: each new document builds on and refines what's already been extracted.
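
One way to picture this kind of enrichment is a fold over per-document results, where each new document fills gaps and refines earlier values. The merge rules below (null never overwrites known data, lists are extended without duplicates, newer scalar values win) are assumptions chosen for illustration, not a description of Hotdoc's internal logic.

```python
from functools import reduce


def enrich(accumulated, new_fields):
    """Merge fields from a newly processed document into the running result."""
    merged = dict(accumulated)
    for key, value in new_fields.items():
        if value is None:
            continue  # a missing field never erases data we already have
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # extend lists, skipping items that are already present
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
        else:
            merged[key] = value  # the newer document refines the earlier value
    return merged


def enrich_all(documents):
    """Fold a sequence of per-document results into one combined record."""
    return reduce(enrich, documents, {})
```

For example, an order form that lacks a total and an invoice that lacks a vendor name would combine into a single record carrying both.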

04

Structured output

We return JSON, CSV, or MD with no volume limits within a session.

JSON · CSV · MD

Need classification, chunking, or a custom flow?

Contact us

Not OSS.
Not an enterprise API.
Something else.

OSS · Funded API · hotdoc
Full processing flow
Zero-ops (nothing to deploy)
Control over the tools used
BYOK
No token markup
Support
Integration speed: weeks (OSS) · days (Funded API) · hours (hotdoc)

Between "build it yourself" and "overpay for tokens" there's a third option: a ready-made tool with zero markup.

Hotdoc vs OSS (Docling, Marker, etc.): OSS tools give you parsing, but no schema extraction, enrichment, or orchestration. You deploy and run the GPUs yourself, and costs can top $1,000 before you see a single result. We provide ready infrastructure and proven pipelines, so there is no document plumbing on your side.

Hotdoc vs funded APIs (Reducto, Unstructured, etc.): Most products monetize through token markup: the more you spend, the more commission you pay. We charge a flat fee for the tooling; you use your own API keys and never overpay for tokens.

Architecture and processing methods

A deterministic orchestrator with surgical AI. Every step is controlled, every error is handled, every result is reproducible.

BYOK: your key is used at runtime and never stored on Hotdoc's side.

Multi-LLM orchestration: OpenRouter + direct integrations, auto-fallback between models
Vision recognition: beats classic OCR on any layout, in any language
Cross-document context: data from all files enriches the combined result
Multi-step pipeline: sequential steps with error handling and state memory
Fault-tolerant execution: retries, checkpoints, session recovery
Key-level BYOK: your key, the provider's real price, zero markup

OpenRouter · Mistral · OpenAI · Anthropic · xAI · Deepseek · Xiaomi · Hybrid AI Architecture · Vision Extraction · Stateful Orchestration · Multi-step Pipelines · Fault-tolerant Execution
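
The retry and auto-fallback behaviour listed above can be illustrated with a small generic sketch: try each model in order, retry transient failures with exponential backoff, and move on to the next model when retries run out. The function, its parameters, and the model names in the usage note are hypothetical; this shows the general pattern, not Hotdoc's orchestrator.

```python
import time


def call_with_fallback(models, call, retries=2, base_delay=0.5, sleep=time.sleep):
    """Try each model in order, retrying failures with exponential backoff.

    `call` is any function that takes a model name and returns a result
    or raises an exception. Returns (model_used, result).
    """
    errors = []
    for model in models:
        for attempt in range(retries + 1):
            try:
                return model, call(model)
            except Exception as exc:  # broad on purpose: any failure triggers retry
                errors.append((model, attempt, repr(exc)))
                if attempt < retries:
                    sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"all models failed: {errors}")
```

A caller would pass an ordered list such as ["primary-model", "backup-model"] and a function that performs the actual provider request; the returned model name records which one finally answered.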

Popular use cases

Common tasks that work out of the box. For anything custom — we'll set up a flow.

invoice-extraction

Invoice Extraction API

Invoices in any format → structured JSON against your schema. Batches from email or S3 in a single call.

input → PDF / scan / ZIP
output → vendor, amount, line_items, date, etc.

resume-parser

Resume / CV Parser API

A stream of CVs in any language → unified candidate cards, scored against your criteria.

input → DOCX / PDF / HTML
output → personal data, skills, experience, contact, score, etc.

bank-statement

Bank Statement Extraction

Statements from any bank → a single transaction format. Works with PDF and Excel exports.

input → PDF / XLSX
output → date, description, amount, balance

batch-processing

Batch / Archive Processing

A ZIP with hundreds of mixed-format files — one request, one JSON. Per-file status in every batch.

input → ZIP / RAR (30+ formats inside)
output → results[], errors[], stats
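
Handling a batch response might look like the sketch below. Only the top-level keys (results, errors, stats) come from the card above; the contents of each entry are not specified here, so the code treats them as opaque, and it assumes every file lands in exactly one of results or errors. The "file" key in the usage note is hypothetical.

```python
def split_batch(response):
    """Pull the three top-level parts out of a batch response dict."""
    results = response.get("results", [])
    errors = response.get("errors", [])
    stats = response.get("stats", {})
    return results, errors, stats


def report(response):
    """Summarize how many files were extracted and how many failed."""
    results, errors, _stats = split_batch(response)
    total = len(results) + len(errors)
    return f"{len(results)}/{total} files extracted, {len(errors)} failed"
```

With a response holding two entries in results (say {"file": "a.pdf"} and {"file": "b.docx"}) and one in errors, report returns "2/3 files extracted, 1 failed", which is enough to decide whether to resubmit the failed files.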

Something non-standard? We'll set up a flow.

contact us
$30 per month (was $50, −40%) · full functionality

Need high volume, self-hosted, or dedicated infrastructure?

email hello@hotdoc.io

What's included

Tool & orchestrator ✓ included
archives & 30+ formats ✓ included
multi-file processing ✓ included
fault-tolerant execution ✓ included
ongoing development ✓ included
email support ✓ included
Vision OCR & AI → BYOK, no fees
BYOK: plug in your own key for Mistral, OpenAI, Anthropic, or any OpenRouter provider. You pay the provider directly for tokens and OCR at their price — no markup, no hidden fees.

Tokens aren't expensive on their own; they're made expensive by whoever stands between you and the provider.

Funded companies take your files, run them through Claude or GPT, and bill you $10+ per thousand pages. That's where their margin lives. The bigger your volume, the more you overpay for tokens.

We decided to do it differently: you plug in your key, pay the provider directly, and pay us a flat fee for a ready-made tool. Not because we don't want to earn more, but because we think this is simply the right way to do it.
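
The arithmetic behind this argument can be sketched in a few lines. The $10 per thousand pages and the $30 flat fee come from this page; the provider's own cost per thousand pages depends on the model and the documents, so it is left as a parameter, and the $2 figure in the usage note is purely hypothetical.

```python
def markup_cost(pages, price_per_1k=10.0):
    """Monthly cost with a per-page-priced API ($10 per 1,000 pages here)."""
    return pages / 1000 * price_per_1k


def byok_cost(pages, provider_per_1k, flat_fee=30.0):
    """Monthly cost with a flat fee plus provider charges paid directly."""
    return flat_fee + pages / 1000 * provider_per_1k


def break_even_pages(provider_per_1k, price_per_1k=10.0, flat_fee=30.0):
    """Pages per month above which the flat-fee model is cheaper.

    Returns None if the provider cost is not below the per-page price.
    """
    margin = price_per_1k - provider_per_1k
    if margin <= 0:
        return None
    return flat_fee / margin * 1000
```

Assuming a provider cost of $2 per thousand pages, break_even_pages(2.0) gives 3,750 pages a month; at 20,000 pages the comparison is markup_cost(20000) = $200 against byok_cost(20000, 2.0) = $70.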

Documents are never stored

Processed at runtime, not retained after processing (by default).

Encryption at every level

TLS in transit, encryption at rest. Access by API key only.

BYOK — your keys stay yours

Your provider key is used at runtime and never stored on our side.

On-prem / VPC on request

Full isolation for Enterprise and self-host.

FAQ. Common questions

Still have questions? Email hello@hotdoc.io — we reply fast.

What is Hotdoc?
A Document Processing API. You send a file (or an archive) and a prompt with a JSON schema of fields. The rest is on us: unpacking, conversion, OCR, schema-based AI extraction, enrichment, structured response. For non-standard cases we build custom flows.

What is BYOK?
Bring Your Own Key. You plug in your own API key from Mistral, OpenAI, Anthropic, or any OpenRouter provider. You pay the provider directly for tokens and OCR at their price. We don't stand between you and the provider, and we add no markup: it's a principle, not a pricing gimmick.

What am I paying for?
For the tool itself: the orchestrator, reliability (retries, fallbacks, session recovery), 30+ formats, batches, ongoing development, and support. Tokens and OCR are paid directly to the provider through your key.

How accurate is the extraction?
Accuracy depends on the quality of the source files, schema complexity, your prompt, and the models you use. At the flow level we squeeze out as much accuracy as we can and point you to the right models. The best way to judge is to run your own files with a test key, no signup. Need help with setup? Email hello@hotdoc.io

Can I try it for free?
Yes. Generate a temporary API key right on this page and send up to 10 test requests for free. No registration, no credit card.

Do I need a contract or a sales call?
No. Self-serve: sign up, pay the subscription, cancel anytime. No contracts, no sales calls, no invoicing.

Do you store my documents?
No. Documents are processed at runtime and not retained after processing (by default). With BYOK, your provider API key is never stored on our side either.

Which formats are supported?
30+ formats: PDF, DOCX, XLSX, RTF, HTML, MSG, PNG, JPG, scans, ZIP/RAR archives, and more. Missing a format? Email hello@hotdoc.io and we'll add it on request.

Are there volume limits?
There's a soft cap to protect the service. It's capacity control, not monetization, and most workloads never notice it. Need high volume? Email hello@hotdoc.io and we'll figure it out.

Is there a one-time license?
Yes, there's a one-time license. Interested? Email hello@hotdoc.io.

Got a question or an idea?