Engineering an explainable pricing engine

Why we built the engine stateless, what the rule trace looks like under the hood, and the trade-offs we made along the way.

By RateStack Team · 9 min read

The pricing-service is a stateless Spring Boot app that, on each request, loads the active ratesheet (Caffeine 60s → Redis → MySQL), evaluates the rule chain, and returns the trace as a side-effect of doing the math. Stateless means horizontal scaling is trivial — no sticky sessions, no in-memory state to migrate, no leader election. The three-tier cache covers latency.
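The lookup order described above can be sketched as a read-through chain. This is a minimal illustration, not the service's actual code: `TieredRatesheetLoader` is a hypothetical name, a plain map with a 60-second expiry stands in for Caffeine, a second map stands in for Redis, a function stands in for the MySQL query, and the ratesheet is reduced to a `String`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Read-through lookup across three tiers: local (60s TTL), shared cache, database. */
final class TieredRatesheetLoader {
    /** A locally cached value plus the time after which it must be refetched. */
    private record Entry(String ratesheet, long expiresAtMillis) {}

    private static final long LOCAL_TTL_MILLIS = 60_000; // the 60s local TTL

    private final Map<String, Entry> local = new ConcurrentHashMap<>(); // stand-in for Caffeine
    private final Map<String, String> shared;                           // stand-in for Redis
    private final Function<String, String> database;                    // stand-in for a MySQL query

    TieredRatesheetLoader(Map<String, String> shared, Function<String, String> database) {
        this.shared = shared;
        this.database = database;
    }

    /** Returns the active ratesheet, consulting the cheapest tier that has it. */
    String load(String investorId, long nowMillis) {
        Entry hit = local.get(investorId);
        if (hit != null && nowMillis < hit.expiresAtMillis()) {
            return hit.ratesheet(); // tier 1: fresh local copy
        }
        String value = shared.get(investorId); // tier 2: shared cache
        if (value == null) {
            value = database.apply(investorId); // tier 3: source of truth
            if (value == null) {
                throw new IllegalStateException("no active ratesheet for " + investorId);
            }
            shared.put(investorId, value);
        }
        local.put(investorId, new Entry(value, nowMillis + LOCAL_TTL_MILLIS));
        return value;
    }
}
```

Passing the clock in as `nowMillis` keeps the sketch deterministic to test. The local tier is disposable: a fresh instance starts empty and refills from the shared tier, so there is nothing to migrate when instances come and go.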

The engine's data model is the interesting part. Programs are rows. Eligibility predicates are rows. Adjustment rules are rows. Combine strategies are an enum. The pricing call is a deterministic walk over those rows. Adding an investor is configuration; adding a new combine strategy is the only thing that requires code.
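To make "predicates are rows" concrete, here is a small sketch of eligibility evaluated from data rather than code. All names, columns, and the operator set (`Program`, `EligibilityPredicate`, `Op`, the `fico` and `ltv` fields) are assumptions chosen for illustration; the actual schema is not shown in this post.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

/** Comparison operators a predicate row may reference (hypothetical set). */
enum Op {
    GTE, LTE, EQ;

    boolean test(BigDecimal actual, BigDecimal operand) {
        int c = actual.compareTo(operand);
        return switch (this) {
            case GTE -> c >= 0;
            case LTE -> c <= 0;
            case EQ -> c == 0;
        };
    }
}

/** One row of a hypothetical program table. */
record Program(long id, String investor, String name) {}

/** One row of a hypothetical eligibility table: "field op operand", scoped to a program. */
record EligibilityPredicate(long programId, String field, Op op, BigDecimal operand) {}

final class Eligibility {
    private Eligibility() {}

    /** Returns the programs whose predicate rows all hold for the scenario, in input order. */
    static List<Program> eligible(List<Program> programs,
                                  List<EligibilityPredicate> predicates,
                                  Map<String, BigDecimal> scenario) {
        return programs.stream()
                .filter(p -> predicates.stream()
                        .filter(e -> e.programId() == p.id())
                        // a field missing from the scenario fails the predicate
                        .allMatch(e -> scenario.containsKey(e.field())
                                && e.op().test(scenario.get(e.field()), e.operand())))
                .toList();
    }
}
```

In this shape, onboarding another investor means inserting more `Program` and `EligibilityPredicate` rows. The evaluator itself does not change.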

The trace

The trace is a List<TraceLine> built up in lockstep with the price computation. Every rule that fires writes a line with its id, its description, the condition it matched (rendered for humans), the combine strategy it used, and its price contribution. The cumulative running price is computed inline so the consumer doesn't have to recompute it.
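A minimal sketch of that lockstep loop follows. Only `List<TraceLine>` and the listed fields come from the description above; everything else is an assumption made for illustration: the `Rule`, `Quote`, and `PricingEngine` names, the two `CombineStrategy` constants, and the use of an in-code `Predicate` where the real engine reads conditions from rows.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** How a rule's value is folded into the running price (constants are hypothetical). */
enum CombineStrategy {
    ADD {
        @Override BigDecimal apply(BigDecimal running, BigDecimal value) { return running.add(value); }
    },
    CAP {
        @Override BigDecimal apply(BigDecimal running, BigDecimal value) { return running.min(value); }
    };

    abstract BigDecimal apply(BigDecimal running, BigDecimal value);
}

/** A simplified rule: the condition is a lambda here purely to keep the sketch short. */
record Rule(String id, String description, String conditionText,
            Predicate<Map<String, BigDecimal>> condition,
            CombineStrategy combine, BigDecimal value) {}

record TraceLine(String ruleId, String description, String conditionMatched,
                 CombineStrategy combine, BigDecimal contribution, BigDecimal runningPrice) {}

record Quote(BigDecimal price, List<TraceLine> trace) {}

final class PricingEngine {
    private PricingEngine() {}

    /** Walks the rules in order; each rule that fires moves the price and writes one line. */
    static Quote price(BigDecimal basePrice, List<Rule> rules, Map<String, BigDecimal> scenario) {
        BigDecimal running = basePrice;
        List<TraceLine> trace = new ArrayList<>();
        for (Rule rule : rules) {
            if (!rule.condition().test(scenario)) {
                continue; // rules that do not fire leave no line
            }
            BigDecimal next = rule.combine().apply(running, rule.value());
            // contribution is the delta this rule caused, whatever the strategy
            trace.add(new TraceLine(rule.id(), rule.description(), rule.conditionText(),
                    rule.combine(), next.subtract(running), next));
            running = next;
        }
        return new Quote(running, List.copyOf(trace));
    }
}
```

Because the price and the trace line come from the same `next` value inside the same loop iteration, the last line's running price is the quoted price by construction.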

Crucially, the trace is not a separate "explain" pass. The engine builds it as it computes — the "explain" mode is just the regular pricing mode with the trace included in the response. There is no divergence between the explained quote and the regular quote.

Trade-offs

We accept some latency cost for the trace: a few percent in our benchmarks. Worth it. We accept some response-size cost: typical quotes carry 8-20 trace lines, ~3 KB after gzip. Worth it. We accept a large row count for the rule definitions: currently around 100k rows for a typical correspondent investor pack. Worth it.

What we don't accept: in-memory rule caches that diverge from the database, JIT-compiled rule fragments that bypass the trace path, or any pattern that produces a price without a corresponding trace line. Those would all be faster. They'd also undermine the contract.
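One way to picture that contract is as a check a test suite could run against any quote: starting from the base price, the trace alone must account for the final price. The sketch below is an illustration under assumptions, not a description of our test suite; `PricedLine` and `TraceContract` are hypothetical names, and the record keeps only the two numeric fields of a trace line.

```java
import java.math.BigDecimal;
import java.util.List;

/** Only the numeric fields of a trace line; a hypothetical, pared-down shape. */
record PricedLine(BigDecimal contribution, BigDecimal runningPrice) {}

final class TraceContract {
    private TraceContract() {}

    /** True when the trace alone accounts for every move from basePrice to finalPrice. */
    static boolean accountsFor(BigDecimal basePrice, List<PricedLine> trace, BigDecimal finalPrice) {
        BigDecimal running = basePrice;
        for (PricedLine line : trace) {
            running = running.add(line.contribution());
            if (running.compareTo(line.runningPrice()) != 0) {
                return false; // a line's running price disagrees with the sum so far
            }
        }
        return running.compareTo(finalPrice) == 0; // no untraced adjustment remains
    }
}
```

A fast path that adjusted the price without writing a line would fail this check, since the final comparison would no longer balance.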

What surprised us

We expected operators to use the trace mostly for compliance. They use it more for self-service debugging. When a quote "feels off," the first thing the operator does is open the drill-down and look at the rules. Compliance is the formal use case; debugging is the daily use case. Building for compliance gave us debugging for free.
