The seven-stage pipeline that turns inbound investor documents into versioned, queryable ratesheets — and the operational disciplines that keep it boring.
The 7am ratesheet ritual — log into vendor portals, download PDFs, copy cells into the LOS — is the single most expensive recurring task in small and mid-size lender ops. Manual entry has a non-trivial error rate, no audit trail outside the inbox, and no resilience: if the person who runs it is sick, prices don't update.
Automation isn't about replacing humans. It's about moving them up the value chain — from data entry to QC and exception handling. The goal of a ratesheet automation pipeline is to take 90% of the morning's work off humans while making the remaining 10% (the unusual cases) more visible and auditable.
Investors deliver ratesheets through four channels. A serious pipeline handles all four with the same downstream stages:
Sources deliver in three format families: native Excel (XLSX), PDF (text-based or image-based), and web data (scraped HTML or direct JSON). The conversion stage normalizes every input into the same shape: { sheets: [{ name, rows: [[cell, cell, ...]] }] }.
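Here is a minimal sketch of that target shape in Python. It uses a CSV upload as input because CSV needs only the standard library. The helper normalize_csv and the Sheet/Workbook type names are hypothetical. XLSX and PDF converters would need third-party parsers, which are not shown.

```python
import csv
import io
from typing import TypedDict


class Sheet(TypedDict):
    name: str
    rows: list[list[str]]


class Workbook(TypedDict):
    sheets: list[Sheet]


def normalize_csv(raw: bytes, sheet_name: str = "Sheet1") -> Workbook:
    """Convert a CSV upload into the common { sheets: [{ name, rows }] } shape."""
    # utf-8-sig drops the byte-order mark that some spreadsheet exports prepend.
    text = raw.decode("utf-8-sig")
    reader = csv.reader(io.StringIO(text, newline=""))
    # Blank rows are kept (as []) because later stages may treat them as table boundaries.
    rows = [[cell.strip() for cell in row] for row in reader]
    return {"sheets": [{"name": sheet_name, "rows": rows}]}
```

Every converter, whatever its input, would return this same structure, so extraction never needs to know which format a document arrived in.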
Critically: the conversion stage is content-addressed. Every raw document is hashed (SHA-256), stored in object storage by hash, and deduplicated. The same investor sending the same sheet twice in a morning is a no-op after the first insert.
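The dedupe property can be sketched with an in-memory dictionary standing in for the object-storage bucket. ContentAddressedStore and its put/get methods are illustrative names, not a real storage API.

```python
import hashlib


class ContentAddressedStore:
    """In-memory stand-in for an object-storage bucket keyed by SHA-256 (illustrative only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, raw: bytes) -> tuple[str, bool]:
        """Store raw bytes under their hash; return (digest, was_new)."""
        digest = hashlib.sha256(raw).hexdigest()
        if digest in self._objects:
            # Identical bytes were already stored: nothing to write or re-ingest.
            return digest, False
        self._objects[digest] = raw
        return digest, True

    def get(self, digest: str) -> bytes:
        return self._objects[digest]
```

Against a real bucket, the "already present" check would be an existence check or conditional write on the hash key. Because the key is derived from the bytes themselves, a repeat delivery always maps to the same object.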
Conversion produces undifferentiated rows. Extraction segments those rows into tables and classifies each table by purpose: a rate grid, an LLPA matrix, a program-eligibility block, a notes section, a margin schedule. Each classification path feeds different downstream normalization logic.
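Segmentation heuristics vary. For illustration only, this sketch assumes that fully blank rows separate tables. segment_tables is a hypothetical helper that operates on the rows of one normalized sheet.

```python
def segment_tables(rows: list[list[str]]) -> list[list[list[str]]]:
    """Split a sheet's rows into tables wherever a fully blank row appears (assumed heuristic)."""
    tables: list[list[list[str]]] = []
    current: list[list[str]] = []
    for row in rows:
        if any(cell.strip() for cell in row):
            current.append(row)
        elif current:
            # A blank row closes the table in progress; consecutive blanks are ignored.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables
```

Side-by-side tables or missing separator rows defeat this rule, so treat it as one signal among several.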
Classification is heuristic: row-density patterns, header-row recognition (presence of expected fields like "rate," "price," "LTV," "FICO"), and column-type inference. Any table classified with confidence below the threshold raises a QC item, which has to be resolved before the ratesheet can activate.
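Here is one way header-row recognition with a confidence gate might look. The keyword sets, the 0.75 threshold, and the names classify_table and Classification are all assumptions. Row-density patterns and column-type inference are omitted to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical keyword sets per table class; real deployments would tune these.
EXPECTED_HEADERS: dict[str, set[str]] = {
    "rate_grid": {"rate", "price"},
    "llpa_matrix": {"ltv", "fico"},
}
CONFIDENCE_THRESHOLD = 0.75  # assumed value, not taken from the source


@dataclass
class Classification:
    label: str
    confidence: float
    needs_qc: bool


def classify_table(rows: list[list[str]], scan_rows: int = 5) -> Classification:
    """Pick the class whose expected header fields best match one of the first rows."""
    best_label, best_score = "unknown", 0.0
    for row in rows[:scan_rows]:
        # Lowercase and tokenize cells so "FICO Score" contributes the token "fico".
        tokens = {tok for cell in row for tok in cell.lower().replace("/", " ").split()}
        for label, expected in EXPECTED_HEADERS.items():
            # Confidence = share of the class's expected fields found in this row.
            score = len(expected & tokens) / len(expected)
            if score > best_score:
                best_label, best_score = label, score
    # Below-threshold results become QC items instead of silently passing.
    return Classification(best_label, best_score, best_score < CONFIDENCE_THRESHOLD)
```

In this sketch, a notes block that happens to mention "rate" scores 0.5 as a rate grid and lands in QC rather than being trusted.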
The most consequential design decision in the pipeline: how do you normalize an investor's column headers (which vary by investor and change occasionally) into the engine's canonical fields?
A naive approach hardcodes per-investor mappings. This is brittle: a column reorder or a header rename breaks the parser, and adding a new investor requires code.
A serious approach uses a three-tier resolver:
The downstream property is that the system learns. New investors cost an AI call once; subsequent sheets from that investor match the stored template. Vendor changes (column reorder, header rename) are absorbed by re-running the resolver and writing a new template. The pipeline keeps running.
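The template-first behavior can be sketched as a cache keyed by investor and an order-sensitive header fingerprint, with the AI call abstracted as an injected fallback function. This sketch does not reproduce the three tiers. It collapses them into two outcomes: a stored-template hit or a fallback call. TemplateResolver, header_fingerprint, and the canonical field names are hypothetical.

```python
import hashlib
from typing import Callable

# Column index -> canonical engine field, e.g. {0: "note_rate", 1: "price_30d"} (hypothetical names).
ColumnMap = dict[int, str]


def header_fingerprint(headers: list[str]) -> str:
    """Order-sensitive hash of normalized headers: a reorder or rename yields a new key."""
    joined = "\x1f".join(h.strip().lower() for h in headers)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class TemplateResolver:
    """Stored-template lookup with an injected fallback standing in for the AI call."""

    def __init__(self, fallback: Callable[[list[str]], ColumnMap]) -> None:
        self._templates: dict[tuple[str, str], ColumnMap] = {}
        self._fallback = fallback
        self.fallback_calls = 0

    def resolve(self, investor_id: str, headers: list[str]) -> ColumnMap:
        key = (investor_id, header_fingerprint(headers))
        cached = self._templates.get(key)
        if cached is not None:
            return cached  # known layout: no AI call
        self.fallback_calls += 1
        mapping = self._fallback(headers)
        self._templates[key] = mapping  # store as a new template for this layout
        return mapping
```

Because the fingerprint is order-sensitive, a column reorder produces a cache miss, one fallback call, and a new stored template, which mirrors the absorb-and-continue behavior described above. A production version would also route low-confidence fallback output through QC before trusting it.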
A ratesheet is not a configuration file you overwrite. It is a versioned data object with three states: DRAFT, ACTIVE, and SUPERSEDED.
Activation is an explicit human action by default — an operator approves the DRAFT through the QC dashboard. The transition is atomic: previous ACTIVE flips to SUPERSEDED, new ACTIVE goes live, and an event fires for cache invalidation.
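Rollback is a single API call. Rolling back to a prior version flips the states again: the current ACTIVE version becomes SUPERSEDED, the chosen prior version becomes ACTIVE, and the same cache-invalidation event fires. The pricing engine reflects the rollback within the event-driven cache invalidation window, typically under a second.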
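The activation and rollback rules can be sketched as a small state machine. RatesheetVersions, the on_change callback, and integer version numbers are illustrative assumptions. A threading.Lock only provides in-process atomicity. A real deployment would perform the flip in a single database transaction and publish the cache-invalidation event from there.

```python
import threading
from enum import Enum
from typing import Callable


class State(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    SUPERSEDED = "SUPERSEDED"


class RatesheetVersions:
    """Version history for one investor's ratesheet (illustrative sketch)."""

    def __init__(self, investor_id: str, on_change: Callable[[str, int], None]) -> None:
        self.investor_id = investor_id
        self._states: dict[int, State] = {}
        self._lock = threading.Lock()
        self._on_change = on_change  # e.g. publishes a cache-invalidation event

    def add_draft(self, version: int) -> None:
        with self._lock:
            if version in self._states:
                raise ValueError(f"version {version} already exists")
            self._states[version] = State.DRAFT

    def state(self, version: int) -> State:
        with self._lock:
            return self._states[version]

    def activate(self, version: int) -> None:
        """Operator approval: DRAFT -> ACTIVE, previous ACTIVE -> SUPERSEDED."""
        self._swap(version, required=State.DRAFT)

    def rollback(self, version: int) -> None:
        """Re-activate a prior SUPERSEDED version; the current ACTIVE is superseded."""
        self._swap(version, required=State.SUPERSEDED)

    def _swap(self, version: int, required: State) -> None:
        with self._lock:
            if self._states.get(version) is not required:
                raise ValueError(f"version {version} is not {required.value}")
            # Both writes happen under the lock that state() also takes,
            # so callers never observe a half-finished flip.
            for v in [v for v, s in self._states.items() if s is State.ACTIVE]:
                self._states[v] = State.SUPERSEDED
            self._states[version] = State.ACTIVE
        # Notify after releasing the lock so a slow subscriber cannot block writers.
        self._on_change(self.investor_id, version)
```

Approval and rollback share one code path and differ only in which prior state they accept. That shared path keeps the invariant of at most one ACTIVE version in a single place.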
QC is what makes automation viable in production. The dashboard shows, per DRAFT ratesheet:
Real pipelines fail in predictable ways:
A healthy ingestion pipeline reports on:
A team that watches these metrics knows when to add a vendor template, when to retrain the OCR for a problematic source, and when to escalate to the investor about a delivery problem.
Cluster
Deep dives that pick up where the pillar leaves off.
Migrating from manual ratesheet entry
A two-week playbook from copy-paste to automated ingestion.
Ratesheet ingestion feature page
Capabilities, specs, and the integration story.
Glossary: OCR
What OCR is and how the pipeline uses it.
Glossary: ratesheet
What an investor ratesheet contains and how it's used downstream.
Cluster: Email-in ratesheet ingestion in production (on the roadmap)
What breaks, what to monitor, the operational runbook.
Cluster: The seven failure modes of OCR'd ratesheets (on the roadmap)
Anatomy of each failure and the QC gate that catches it.
Frequently asked
Supported formats are XLSX (Excel), text-based PDFs, image-based PDFs (via OCR), HTML scrapes, and direct JSON. Manual upload supports any of those plus CSV.
For an investor with a stored template, ingestion is instant. For a new investor with a typical layout, the first sheet costs an AI call; subsequent sheets match the template directly. Operationally, most teams reach steady-state ingestion across their full investor pack within two weeks.
Auto-activation without human approval is possible, but we recommend against it for new deployments. The QC step is what catches subtle mapping errors and unusual investor changes. Once you trust the pipeline on a per-investor basis, you can configure auto-activation for that investor specifically; most teams keep human approval as the default.
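A per-investor opt-in, with human approval as the default, reduces to a small predicate. ActivationPolicy is hypothetical. Treating any open QC item as a veto is an assumption, chosen to be consistent with QC items blocking activation.

```python
from dataclasses import dataclass, field


@dataclass
class ActivationPolicy:
    """Hypothetical per-investor policy; human approval is the default for everyone."""

    auto_activate_investors: set[str] = field(default_factory=set)

    def should_auto_activate(self, investor_id: str, open_qc_items: int) -> bool:
        # Opt-in is per investor, and any open QC item still forces human review.
        return investor_id in self.auto_activate_investors and open_qc_items == 0
```

An empty policy auto-activates nothing, so new deployments start with every DRAFT waiting for an operator.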
Portal automation is a contractual relationship — we use credentials you provide and respect the portal's terms. We don't bypass CAPTCHAs or anti-automation defenses. When a portal breaks automation, we surface the failure explicitly so you can fall back to manual download.
Raw documents are retained: content-addressed by SHA-256, encrypted at rest, with bucket versioning and configurable lifecycle expiration. Default retention is 90 days after a ratesheet is superseded; retention is configurable on the Business and Enterprise plans.
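The default retention window is simple date arithmetic. retention_expiry is a hypothetical helper. The rule that only superseded versions are on the expiration clock is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_RETENTION_DAYS = 90  # default stated above; configurable on some plans


def retention_expiry(
    superseded_at: Optional[datetime], retention_days: int = DEFAULT_RETENTION_DAYS
) -> Optional[datetime]:
    """Return when a raw document becomes eligible for expiration, or None if not yet superseded."""
    if superseded_at is None:
        # Assumption: versions that were never superseded are not expired by this rule.
        return None
    if superseded_at.tzinfo is None:
        # Require timezone-aware timestamps so expiry comparisons are unambiguous.
        raise ValueError("superseded_at must be timezone-aware")
    return superseded_at + timedelta(days=retention_days)
```

For example, a sheet superseded at midnight UTC on 1 January 2024 becomes eligible for expiration at midnight UTC on 31 March 2024 (2024 is a leap year).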
Other pillars
Loan-level pricing
How modern mortgage pricing engines turn ratesheets, eligibility predicates, and adjustment rules into an explainable price.
Rate lock management
Lifecycle, sell-side, lock-desk policy, and the operational discipline that keeps lock-day surprises rare.
Compliance & audit
ECOA, HMDA, TRID, and the audit-chain disciplines that make compliance a query rather than a quarterly fire drill.
Secondary marketing
Bid-tape execution, hedging inputs, pull-through analysis, and the event-stream architecture that ties them together.
Ready to see it on your data?
Spin up a sandbox or talk to us about a guided demo. Everything in this guide is wired into the platform — not aspirational.