Email-in (IMAP)
Dedicated mailbox per environment; investor distributions land directly. The pipeline polls, fingerprints, and routes attachments through the same conversion stages as portal-sourced sheets.
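As a rough illustration of the fingerprinting step, the sketch below hashes each attachment of an already-fetched message so that a repeated distribution can be skipped. The function names, the SHA-256 choice, the in-memory `seen` set, and the sample addresses are assumptions for illustration, not the product's actual implementation. Fetching the raw message over IMAP (for example with Python's `imaplib`) is left out.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_fingerprints(raw_message: bytes) -> list[tuple[str, str]]:
    """Return (filename, SHA-256 hex digest) for every attachment in a raw message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    results = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        name = part.get_filename() or "unnamed"
        results.append((name, hashlib.sha256(payload).hexdigest()))
    return results


def new_attachments(raw_message: bytes, seen: set[str]) -> list[str]:
    """Return filenames whose content hash is not in `seen`, and record them."""
    fresh = []
    for name, digest in attachment_fingerprints(raw_message):
        if digest not in seen:
            seen.add(digest)
            fresh.append(name)
    return fresh


def build_sample_message() -> bytes:
    """Build a hypothetical investor email with one attachment (demo data only)."""
    msg = EmailMessage()
    msg["From"] = "rates@investor.example"
    msg["To"] = "ratesheets@originator.example"
    msg["Subject"] = "Daily ratesheet"
    msg.set_content("Today's rates attached.")
    msg.add_attachment(
        b"%PDF-1.4 sample bytes",
        maintype="application",
        subtype="pdf",
        filename="ratesheet.pdf",
    )
    return msg.as_bytes()
```

Hashing the attachment bytes rather than the filename means a resent sheet is recognized even when the investor renames the file.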
Email-in, portal automation, OCR, learning templates, versioned activation. The full pipeline that replaces manual ratesheet wrangling — and what to look for when you evaluate one.
Headless-browser scripts log into vendor portals on a schedule, download the latest ratesheet, and feed it through ingestion. Credentials are encrypted at rest with online master-key rotation.
Image-only PDFs are rendered with PDFBox and read with Tess4J. Table-detection algorithms isolate the rate grid first, then OCR runs cell by cell, which is far more accurate on dense rate grids than full-document OCR.
DRAFT → ACTIVE → SUPERSEDED transitions with audit timestamps and content hashes. Rollback is a single API call; historical replay reads the version that was ACTIVE at any prior moment.
Most originators run a daily ritual: an ops person logs into eight to twenty wholesale-lender portals, downloads each morning's ratesheet, opens it in Excel, and either re-types the relevant rows into the LOS by hand or pastes them into a master spreadsheet. The same person also fields the email distributions that arrive throughout the day. Every error is a pricing error; the audit trail lives in the inbox.
At a typical correspondent originating 3,000 loans per year with eight active investors, that's roughly 8 investors × 0.5 hours/day × 250 business days × $75/hr loaded rate ≈ $75,000/year of pure ratesheet wrangling — before counting the cost of mistakes that propagate to live pricing. Automation removes 80–90% of that cost on day one and improves quality monotonically as the template engine learns.
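The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function and parameter names are illustrative, and the inputs are simply the figures quoted in the paragraph, so substitute your own investor count and loaded rate.

```python
def annual_ratesheet_cost(
    investors: int,
    hours_per_investor_per_day: float,
    business_days: int,
    loaded_rate: float,
) -> float:
    """Yearly cost of manual ratesheet handling, in dollars."""
    return investors * hours_per_investor_per_day * business_days * loaded_rate


def savings_range(baseline: float, low_pct: int = 80, high_pct: int = 90) -> tuple[float, float]:
    """Dollar savings if automation removes between low_pct and high_pct of the baseline."""
    return baseline * low_pct / 100, baseline * high_pct / 100


# Figures from the paragraph: 8 investors, 0.5 h/day each, 250 days, $75/hr.
baseline = annual_ratesheet_cost(8, 0.5, 250, 75.0)   # 75000.0
low, high = savings_range(baseline)                   # 60000.0, 67500.0
```

Because the cost is linear in investor count, doubling the number of active investors doubles the baseline under these assumptions.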
Frequently asked
Software that ingests investor and wholesale-lender ratesheets — usually delivered as email attachments, downloaded from portals, or scraped from public web pages — and converts them into a structured, queryable form an originator's pricing engine can run against. The whole pipeline replaces the manual download → Excel reformat → LOS-import workflow most teams still run by hand.
Manual ratesheet handling is a daily, error-prone, multi-hour task that scales linearly with the number of investors. Errors propagate to live pricing and frequently surface only at lock-day. Automation reduces the daily ops cost, eliminates an entire class of pricing errors, and produces a versioned audit trail of every ratesheet that hit your engine.
Email IMAP polling for the inboxes investors send to, headless-browser portal automation for vendors that gate ratesheets behind a login, web/API scraping for public ratesheets, and direct file upload as the universal fallback. PDFs, Excel, and image-only PDFs (OCR) all need first-class support — investors deliver in all three.
A learning template engine: the first time a new ratesheet shape appears, an operator confirms the column-to-field mapping; the system stores that as a template keyed by document fingerprint. Subsequent sheets from the same investor match the template with no human intervention. Best-in-class systems blend template lookup, AI-assisted mapping, and regex fallback so onboarding gets faster every week.
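One way to picture the template lookup is sketched below. Everything here is a hypothetical simplification: the fingerprint is just a hash of the investor name plus the normalized header row, the regex patterns are invented examples, and the AI-assisted mapping step is omitted entirely.

```python
import hashlib
import re

# Hypothetical regex fallback: suggests a canonical field from header text.
FALLBACK_PATTERNS = {
    "rate": re.compile(r"\brate\b", re.I),
    "price": re.compile(r"\bprice\b", re.I),
    "lock_days": re.compile(r"\b(lock|days)\b", re.I),
}


def shape_fingerprint(investor: str, headers: list[str]) -> str:
    """Hash investor + normalized headers so case and spacing changes still match."""
    normalized = "|".join(" ".join(h.lower().split()) for h in headers)
    return hashlib.sha256(f"{investor}::{normalized}".encode()).hexdigest()


def regex_suggest(headers: list[str]) -> dict[str, str]:
    """Suggest header -> field using the first fallback pattern that matches."""
    suggestion = {}
    for header in headers:
        for field_name, pattern in FALLBACK_PATTERNS.items():
            if pattern.search(header):
                suggestion[header] = field_name
                break
    return suggestion


class TemplateStore:
    """In-memory stand-in for a template store keyed by document fingerprint."""

    def __init__(self) -> None:
        self._templates: dict[str, dict[str, str]] = {}

    def confirm(self, investor: str, headers: list[str], mapping: dict[str, str]) -> None:
        """Record an operator-confirmed mapping for this sheet shape."""
        self._templates[shape_fingerprint(investor, headers)] = dict(mapping)

    def resolve(self, investor: str, headers: list[str]) -> tuple[str, dict[str, str]]:
        """Return ('template', mapping) on a hit, else ('needs_review', suggestion)."""
        hit = self._templates.get(shape_fingerprint(investor, headers))
        if hit is not None:
            return "template", dict(hit)
        return "needs_review", regex_suggest(headers)
```

In this sketch a known shape resolves with no human involved, while an unknown shape comes back flagged for review with a regex-based suggestion the operator can confirm or correct.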
The pipeline detects the format mismatch on the first arrival, routes the ratesheet to a human-review queue, and learns the new template once an operator confirms. The active version of the ratesheet remains the previous one until the new ingestion is reviewed and activated — no surprise overwrites.
For the subset of investors who deliver image-only PDFs (more common than vendors admit), yes. The trick is not to OCR the whole document — extract the rate grids with table-detection algorithms (PDFBox / Tess4J / Camelot equivalents), then OCR the cells. Quality is high enough for production when paired with a confidence-threshold review queue.
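The confidence-threshold review queue mentioned above might look roughly like this. The `OcrCell` type, the 0.0 to 1.0 confidence scale, and the 0.90 threshold are assumptions for illustration; a real OCR engine may report confidence on a different scale, and the right threshold depends on your data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrCell:
    row: int
    col: int
    text: str
    confidence: float  # assumed normalized to 0.0-1.0


def split_by_confidence(
    cells: list[OcrCell], threshold: float = 0.90
) -> tuple[list[OcrCell], list[OcrCell]]:
    """Split OCR'd cells into (accepted, needs_review) by confidence."""
    accepted = [c for c in cells if c.confidence >= threshold]
    needs_review = [c for c in cells if c.confidence < threshold]
    return accepted, needs_review


def sheet_needs_review(cells: list[OcrCell], threshold: float = 0.90) -> bool:
    """A sheet goes to the review queue if any cell falls below the threshold."""
    _, needs_review = split_by_confidence(cells, threshold)
    return bool(needs_review)
```

Routing the whole sheet to review when any single cell is doubtful is the conservative choice; a looser policy could auto-accept the sheet and queue only the low-confidence cells.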
Ratesheets transition through DRAFT → ACTIVE → SUPERSEDED states. Each transition is audited with timestamp, actor, and a content hash. Pricing always reads the ACTIVE version unless the caller passes an as-of timestamp for historical replay. Rollback is a state transition, not a data restore.
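The lifecycle described above can be sketched as a small state machine with an append-only audit log. The class and method names are hypothetical, and the sketch assumes that rollback means re-activating an earlier version and that audit events are recorded in chronological order.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Version:
    version_id: int
    content_hash: str
    state: str = "DRAFT"


@dataclass(frozen=True)
class AuditEvent:
    at: datetime
    actor: str
    version_id: int
    to_state: str
    content_hash: str


class RatesheetHistory:
    def __init__(self) -> None:
        self.versions: dict[int, Version] = {}
        self.audit: list[AuditEvent] = []

    def _record(self, v: Version, state: str, actor: str, at: datetime) -> None:
        v.state = state
        self.audit.append(AuditEvent(at, actor, v.version_id, state, v.content_hash))

    def add_draft(self, content: bytes, actor: str, at: datetime) -> int:
        """Store a new version in DRAFT; pricing does not see it yet."""
        vid = len(self.versions) + 1
        v = Version(vid, hashlib.sha256(content).hexdigest())
        self.versions[vid] = v
        self._record(v, "DRAFT", actor, at)
        return vid

    def activate(self, version_id: int, actor: str, at: datetime) -> None:
        """Supersede the current ACTIVE version, then activate the target.

        Calling this with an older version id acts as a rollback.
        """
        for v in self.versions.values():
            if v.state == "ACTIVE" and v.version_id != version_id:
                self._record(v, "SUPERSEDED", actor, at)
        self._record(self.versions[version_id], "ACTIVE", actor, at)

    def active_as_of(self, when: Optional[datetime] = None) -> Optional[int]:
        """Replay the audit log to find the version ACTIVE at `when` (latest if None)."""
        current: Optional[int] = None
        for event in self.audit:
            if when is not None and event.at > when:
                break
            if event.to_state == "ACTIVE":
                current = event.version_id
            elif event.to_state == "SUPERSEDED" and current == event.version_id:
                current = None
        return current
```

Because historical replay reads only the audit log, a rollback never rewrites past answers: a query with an as-of timestamp before the rollback still returns the version that was ACTIVE at that moment.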
Where to next
Feature: ratesheet ingestion
The how-it-works for the RateStack pipeline specifically — sources, conversion, mapping, versioning.
Pillar: ratesheet automation
Long-form guide that goes deeper on email-in, OCR, learning templates, and operator workflows.
ROI calculator
Estimate the dollar value of automation against your specific investor count and ops loaded rate.
Category: pricing engines
The system that runs against the ratesheets you ingest. Often bought together.
Vendor comparisons
Side-by-side comparisons against Optimal Blue, Polly, Lender Price, ICE, and Mortech.
See it on your data
We'll wire up your messiest investor and onboard a real template live in the demo.
Stop downloading PDFs at 7 a.m.
We'll wire up email-in or portal scrape, run an actual ratesheet through the pipeline, and show you the QC queue, the mapping editor, and the audit history live on your data.