TL;DR
Drop any alternatives doc (sub docs, K-1s, PCAPs, capital calls) into the inbox or upload it via the API. Document Intelligence classifies the doc type, extracts structured fields with confidence scores, validates them against your CRM, and pushes the result downstream. Median round-trip is 14 seconds. A human review queue handles the <3% of cases where the model isn't confident.
Throughput & accuracy
From the last 30 days of production traffic across all customers (anonymized).
Request path · what runs at each step
The same five steps run for every doc type; only the field schema differs by class.
01 Ingest
A doc arrives via one of three channels: the email inbox (docs@your-firm.aqua.so), API upload (POST /v1/docs), or a watched SFTP folder. Files can be up to 200 MB; accepted formats are PDF, DOCX, images, and scanned documents.
Storage is S3-backed. Uploads are idempotent, keyed on the file's SHA-256 hash, and duplicates are detected within a 30-day window.
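A minimal sketch of how hash-keyed idempotency with a 30-day duplicate window can work. The `DedupIndex` class, its in-memory dict (standing in for whatever store backs the real service), and the choice to measure the window from the first time a hash was seen are all assumptions for illustration, not Aqua's implementation.

```python
import hashlib
from datetime import datetime, timedelta, timezone

DEDUP_WINDOW = timedelta(days=30)  # window length stated on this page


class DedupIndex:
    """Hypothetical in-memory stand-in for a hash -> (doc_id, first_seen) store."""

    def __init__(self) -> None:
        self._seen: dict[str, tuple[str, datetime]] = {}

    def ingest(self, content: bytes, new_doc_id: str, now: datetime) -> tuple[str, bool]:
        """Return (doc_id, is_duplicate) for an uploaded file."""
        digest = hashlib.sha256(content).hexdigest()
        hit = self._seen.get(digest)
        if hit is not None and now - hit[1] <= DEDUP_WINDOW:
            # Same bytes seen inside the window: hand back the original doc id.
            return hit[0], True
        # New file, or the earlier copy has aged out of the window (assumption:
        # the window is measured from first sighting, not refreshed on repeats).
        self._seen[digest] = (new_doc_id, now)
        return new_doc_id, False


idx = DedupIndex()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(idx.ingest(b"%PDF-1.7 ...", "doc_1", t0))                       # ('doc_1', False)
print(idx.ingest(b"%PDF-1.7 ...", "doc_2", t0 + timedelta(days=3)))   # ('doc_1', True)
print(idx.ingest(b"%PDF-1.7 ...", "doc_3", t0 + timedelta(days=45)))  # ('doc_3', False)
```

Re-uploading identical bytes returns the original doc id, so a client that retries an upload after a timeout doesn't create a second doc.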
02 Classify
A first-pass classifier assigns the doc one of seven classes: sub_doc, capital_call, distribution_notice, k1, pcap, statement, or other. It also returns a confidence score; docs scoring below 0.85 are routed to human review.
Classes are versioned; we pin a class taxonomy per customer so downstream systems don't break when new classes are added.
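The routing rule can be sketched as below. The 0.85 threshold and the class names come from this page; the `TAXONOMIES` table, the `v1` version label, the hypothetical newer class `side_letter`, and the fallback of mapping unpinned classes to `other` are assumptions about how taxonomy pinning might behave.

```python
from dataclasses import dataclass

CLASSIFY_THRESHOLD = 0.85  # below this, the doc goes to human review

# Hypothetical pinned taxonomy; version names and contents are made up.
TAXONOMIES = {
    "v1": {"sub_doc", "capital_call", "distribution_notice", "k1", "pcap", "statement", "other"},
}


@dataclass
class Classification:
    doc_class: str
    confidence: float


def route(result: Classification, pinned: set[str]) -> tuple[str, str]:
    """Return (class as seen by this customer, next pipeline step)."""
    # Assumption: a class outside the customer's pinned taxonomy is reported
    # as "other" so downstream systems never see a label they don't know.
    doc_class = result.doc_class if result.doc_class in pinned else "other"
    if result.confidence < CLASSIFY_THRESHOLD:
        return doc_class, "human_review"
    return doc_class, "extract"


v1 = TAXONOMIES["v1"]
print(route(Classification("k1", 0.97), v1))            # ('k1', 'extract')
print(route(Classification("capital_call", 0.62), v1))  # ('capital_call', 'human_review')
print(route(Classification("side_letter", 0.91), v1))   # ('other', 'extract')
```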
03 Extract
A class-specific extractor pulls the structured fields. For a K-1, that means GP, LP, and EIN; the partner's share % of capital, profit, and loss; box-by-box numeric values; and footnotes, which are flagged for review.
Output is JSON conforming to the class schema, with a confidence score on every field. Fields scoring below 0.7 go to the review queue even if the doc as a whole passed.
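Per-field routing is independent of the doc-level check. This sketch assumes a hypothetical JSON shape (`fields` mapping to `{value, confidence}`) and made-up K-1 field names; only the 0.7 threshold comes from this page.

```python
FIELD_THRESHOLD = 0.7  # per-field floor stated on this page

# Hypothetical output shape; key names are assumptions, not the real schema.
extracted = {
    "doc_class": "k1",
    "fields": {
        "partner_ein": {"value": "12-3456789", "confidence": 0.98},
        "profit_share_pct": {"value": 2.5, "confidence": 0.91},
        "box_1_ordinary_income": {"value": 18250.00, "confidence": 0.64},
    },
}


def fields_needing_review(doc: dict, threshold: float = FIELD_THRESHOLD) -> list[str]:
    """Names of fields whose confidence is below the threshold, in document order."""
    return [
        name
        for name, field in doc["fields"].items()
        if field["confidence"] < threshold
    ]


print(fields_needing_review(extracted))  # ['box_1_ordinary_income']
```

Here the doc would proceed, but `box_1_ordinary_income` alone would land in the review queue.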
04 Validate
Extracted values are cross-referenced against your CRM: investor name → CRM contact, fund name → CRM account, amounts → expected ranges. Anomalies are flagged (e.g., a capital call amount more than 2σ from historical amounts).
Validation rules are configurable per customer. Default rules ship with Aqua; custom rules are added during onboarding.
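Validation rules can be modeled as small functions that return a flag or nothing, which turns per-customer configuration into choosing a rule list. This is a sketch, not Aqua's rule engine: the `Rule` type, the `capital_call_2sigma` and `investor_in_crm` helpers, and the doc keys are hypothetical, and we read ">2σ of historical" as more than two sample standard deviations from the mean of past call amounts.

```python
import statistics
from typing import Callable, Optional

# A rule inspects an extracted doc and returns a flag message, or None if it passes.
Rule = Callable[[dict], Optional[str]]


def capital_call_2sigma(history: list[float]) -> Rule:
    """Flag a capital call more than 2 standard deviations from the historical mean."""
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)  # sample std dev; needs at least 2 data points

    def rule(doc: dict) -> Optional[str]:
        amount = doc["amount"]
        if abs(amount - mean) > 2 * sigma:
            return f"capital call {amount:,.0f} is >2σ from mean {mean:,.0f}"
        return None

    return rule


def investor_in_crm(known_contacts: set[str]) -> Rule:
    """Flag an investor name with no matching CRM contact."""

    def rule(doc: dict) -> Optional[str]:
        name = doc["investor_name"]
        if name not in known_contacts:
            return f"investor {name!r} not found in CRM"
        return None

    return rule


def validate(doc: dict, rules: list[Rule]) -> list[str]:
    """Run every configured rule and collect anomaly flags."""
    flags = []
    for rule in rules:
        flag = rule(doc)
        if flag is not None:
            flags.append(flag)
    return flags


# Per-customer configuration: default rules plus any custom ones.
history = [250_000, 300_000, 275_000, 310_000, 265_000]
rules = [capital_call_2sigma(history), investor_in_crm({"Acme Pension Fund"})]
print(validate({"amount": 900_000, "investor_name": "Acme Pension Fund"}, rules))
```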
05 Push downstream
The extracted JSON is pushed to your CRM (Salesforce or Wealthbox), your fund admin (SS&C or Citco), and the Aqua reporting database. Webhooks fire for downstream listeners. The doc and its extracted JSON are archived for 7 years.
Failed pushes are retried with exponential backoff (up to 6 attempts). Persistent failures alert your CSM and your ops lead.
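A sketch of retrying a downstream push with exponential backoff, capped at the 6 attempts stated above. The base delay, per-wait cap, and full jitter are assumptions, since the page doesn't specify them; the injectable `sleep` parameter exists only to make the function testable.

```python
import random
import time
from typing import Callable

MAX_ATTEMPTS = 6  # cap stated on this page


def push_with_backoff(
    push: Callable[[], None],
    base_delay: float = 1.0,  # assumed; the page doesn't give a base delay
    max_delay: float = 60.0,  # assumed cap on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call push() until it succeeds; return the attempt number that succeeded.

    Re-raises the last exception after MAX_ATTEMPTS failures, which is the
    point where the real pipeline would alert the CSM and ops lead.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            push()
            return attempt
        except Exception:
            if attempt == MAX_ATTEMPTS:
                raise
            # Full jitter: wait a random time up to base * 2^(attempt - 1).
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


calls = {"n": 0}


def flaky_push() -> None:
    """Hypothetical push that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("fund admin endpoint unavailable")


print(push_with_backoff(flaky_push, sleep=lambda s: None))  # 3
```

Jitter spreads retries out so that many docs failing against the same endpoint don't all retry in lockstep.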
API examples
Three ways to upload a doc and poll for the structured result: cURL, the Python client, and the Node.js client. All return the same JSON shape.
```shell
# Upload a doc
curl -X POST https://api.aqua.so/v1/docs \
  -H "Authorization: Bearer $AQUA_API_KEY" \
  -F "file=@subdoc.pdf" \
  -F "webhook=https://your.app/aqua-hook"
# Response: { "id": "doc_abc123", "status": "processing" }

# Poll for the structured result
curl https://api.aqua.so/v1/docs/doc_abc123 \
  -H "Authorization: Bearer $AQUA_API_KEY"
```
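```python
import os

import aqua

client = aqua.Client(api_key=os.environ["AQUA_API_KEY"])

# Upload; the webhook fires when the result is ready
with open("subdoc.pdf", "rb") as f:
    doc = client.docs.create(
        file=f,
        webhook="https://your.app/aqua-hook",
    )

# ...or wait synchronously for the structured result
result = client.docs.retrieve(doc.id).wait()
print(result.extracted)
```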
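```javascript
// ES module (uses top-level await)
import fs from "node:fs";
import Aqua from "aqua";

const client = new Aqua({ apiKey: process.env.AQUA_API_KEY });

// Upload; the webhook fires when the result is ready
const doc = await client.docs.create({
  file: fs.createReadStream("subdoc.pdf"),
  webhook: "https://your.app/aqua-hook",
});

// ...or wait for the structured result
const result = await client.docs.retrieve(doc.id).wait();
console.log(result.extracted);
```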
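If you call the REST endpoint directly instead of using a client library, polling looks roughly like the sketch below (standard library only). It assumes the GET response carries the same `status` field shown in the upload response and treats any status other than `"processing"` as final; the interval and timeout defaults are arbitrary, and the actual terminal status values should be checked against the API reference.

```python
import json
import os
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.aqua.so/v1"


def fetch_doc(doc_id: str) -> dict:
    """GET /v1/docs/{id} with a bearer token, as in the cURL example."""
    req = urllib.request.Request(
        f"{API_BASE}/docs/{doc_id}",
        headers={"Authorization": f"Bearer {os.environ['AQUA_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(
    doc_id: str,
    fetch: Callable[[str], dict] = fetch_doc,
    interval: float = 2.0,   # assumed polling interval, in seconds
    timeout: float = 120.0,  # assumed overall budget, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is no longer "processing" or the timeout elapses."""
    waited = 0.0
    while True:
        doc = fetch(doc_id)
        # Assumption: any status other than "processing" is terminal.
        if doc.get("status") != "processing":
            return doc
        if waited >= timeout:
            raise TimeoutError(f"{doc_id} still processing after {timeout:.0f}s")
        sleep(interval)
        waited += interval
```

With `AQUA_API_KEY` set, `wait_for_result("doc_abc123")` polls the live endpoint; passing a fake `fetch` lets you exercise the loop offline. For production traffic the webhook avoids polling entirely.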
FAQ
What about scanned PDFs?
Yes. If a PDF has no extractable text, OCR runs during step 01 (Ingest). This adds ~3 seconds to median latency, and confidence scores reflect OCR quality.
How do you handle handwritten signatures or notes?
They're detected and flagged but not extracted as structured data. Relevant pages carry a signature_present flag so your downstream system can route the doc for human signature verification.
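Downstream routing on the `signature_present` flag could be as simple as the sketch below. The payload shape (a `pages` list with `number` and `signature_present` keys) and the returned routing dict are hypothetical.

```python
def route_for_signatures(doc: dict) -> dict:
    """Collect pages flagged signature_present and decide whether a human must verify."""
    # Hypothetical payload shape; pages without the flag count as unsigned.
    pages = [p["number"] for p in doc.get("pages", []) if p.get("signature_present")]
    return {"doc_id": doc["id"], "needs_signature_check": bool(pages), "pages": pages}


doc = {
    "id": "doc_abc123",
    "pages": [
        {"number": 1, "signature_present": False},
        {"number": 4, "signature_present": True},
    ],
}
print(route_for_signatures(doc))
# {'doc_id': 'doc_abc123', 'needs_signature_check': True, 'pages': [4]}
```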
What's the cost model?
Per-doc pricing with volume tiers. K-1s, PCAPs, and sub docs are priced higher than statements because they contain more structured fields. Talk to your AE for current pricing.
Can we train it on our specific custom docs?
Yes. Custom doc classes are added during onboarding. A new class typically needs ~50 example docs to reach 95% accuracy.
What about data residency & compliance?
SOC 2 Type II. Optional US, EU, or US-only data residency. Docs encrypted at rest (AES-256) and in transit (TLS 1.3). 7-year retention default, configurable to your needs.