Reconciliation as Eventual Consistency

Reconciliation reframed as a distributed-systems problem: your ledger and the bank are two replicas that drift. Detecting divergence, classifying it by structure and age, and converging — automatically where safe, with a human where not.

Introduction

Your system thinks a user has 1,200,000 VND. The bank thinks they have 1,150,000. Who's right? The uncomfortable answer that every fintech engineer eventually internalises: the bank is, always, and your job is not to be right but to detect when you've drifted and converge back.

This is reconciliation, and it's widely misunderstood as a boring back-office matching task — tick off the rows, flag the mismatches, done. That framing misses what's actually going on. Reconciliation is a distributed-systems problem: you and the bank are two independent replicas of the same financial reality, communicating over lossy channels (webhooks, batch files, APIs), and the question "do our records agree?" is the question "have my replica and the source-of-truth replica converged?" Seen that way, reconciliation isn't matching — it's eventual consistency with money on the line.

Having built reconciliation for a payments system, I want to reframe it properly: why your ledger and the bank inevitably diverge, why that's expected rather than a bug, how to build an engine that detects and classifies the drift, and how to converge — automatically where safe, with a human where not.

Reconciliation Is Not Matching — It's Convergence

The matching framing assumes the two systems should always agree and a mismatch is an error. But in a distributed system, temporary divergence is the normal, healthy state. A transfer happens at the bank at 14:00:00. The webhook telling you about it arrives at 14:00:03 — or 14:05, or after a retry at 14:30, or, if the webhook was dropped, never (until reconciliation finds it). Between the event and your learning of it, your replica is correctly behind. You are not wrong; you are eventually consistent.

So the goal isn't "always match." It's:

Given two replicas that drift, detect the divergence, classify why, and converge — bringing your ledger into agreement with the source of truth.

This is the same shape as any eventual-consistency system (DNS, replicated databases, CRDTs): independent copies, asynchronous propagation, a convergence process that reconciles them. Reconciliation is that convergence process, and money is just the payload.

The Sources of Drift

Before building the engine, name why the replicas diverge. Each cause needs different handling:

Timing (benign). The bank has an event you haven't received yet. Given more time, it arrives and you converge on your own. Reconciliation should not panic over these — only over divergence that persists.
Lost events. A webhook was dropped and never retried successfully. The bank has it; you never will, unless reconciliation surfaces it. This is the main thing recon recovers.
Your bugs. You recorded a credit that never happened, or recorded the wrong amount (a currency-unit error, a parsing slip). Your replica is genuinely wrong.
Their corrections. Banks reverse, adjust, and post fees you didn't initiate. Real changes on their side you must absorb.
Fraud / unexpected events. Money moved that no part of your system initiated. The scariest class — recon is often how you first learn of it.

A mature reconciliation engine doesn't just find discrepancies; it tells you which kind each one is, because the response to "timing" (wait) is the opposite of the response to "fraud" (alarm).
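One way to make "which kind is it?" a first-class output is to give every drift cause a name and a default response. The sketch below is only an illustration: the identifiers (DriftCause, DriftResponse, responseFor) are hypothetical labels for the five causes listed above, not part of any real library.

```typescript
// Hypothetical names: the article lists these causes only in prose.
type DriftCause =
  | 'TIMING'
  | 'LOST_EVENT'
  | 'OUR_BUG'
  | 'BANK_CORRECTION'
  | 'UNEXPECTED_EVENT'

type DriftResponse = 'WAIT' | 'RECOVER' | 'FIX_LEDGER' | 'ABSORB' | 'ALARM'

// Record<DriftCause, ...> makes the compiler reject a cause that has no response.
const responseFor: Record<DriftCause, DriftResponse> = {
  TIMING: 'WAIT',              // benign: the event is still in flight
  LOST_EVENT: 'RECOVER',       // the bank has it and we never will unless we ingest it
  OUR_BUG: 'FIX_LEDGER',       // our replica is genuinely wrong
  BANK_CORRECTION: 'ABSORB',   // reversals, adjustments, fees
  UNEXPECTED_EVENT: 'ALARM',   // nothing in our system initiated this
}
```

Keeping the mapping in one typed table means adding a sixth cause later fails to compile until someone decides how to respond to it.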

Building a Reconciliation Engine

The core loop: pull the source of truth (the bank statement for a period), pull your ledger for the same period, and compare them by a shared key — the bank's transaction ID, which you stored when you ingested each event.

async function reconcile(period: DateRange) {
  const statement = await bankApi.getStatement(period) // source of truth
  const ledger = await db.ledgerEntry.findMany({ where: { period } })

  const byBankId = new Map(ledger.map(e => [e.bankTransactionId, e]))
  const discrepancies: Discrepancy[] = []

  // 1. Every bank line should have a matching ledger entry
  for (const line of statement.lines) {
    const entry = byBankId.get(line.transactionId)
    if (!entry) {
      discrepancies.push({ type: 'MISSING_IN_LEDGER', line }) // we never recorded it
    } else if (entry.amount !== line.amount) {
      // Strict equality assumes both amounts are integers in the same unit
      discrepancies.push({ type: 'AMOUNT_MISMATCH', line, entry })
    }
    byBankId.delete(line.transactionId)
  }

  // 2. Anything left in our ledger has no bank counterpart
  for (const orphan of byBankId.values()) {
    discrepancies.push({ type: 'MISSING_AT_BANK', entry: orphan })
  }

  return discrepancies
}

Two design properties matter as much as the comparison itself:

Idempotent and resumable. A reconciliation run over a busy day processes a large volume of entries; it must be safe to re-run (the same inputs yield the same discrepancies) and able to resume if interrupted. This is the same idempotency discipline the rest of the money pipeline relies on.
Run continuously, not once. Because timing divergence resolves itself, a discrepancy that's 30 seconds old is probably benign; one that's 24 hours old is real. Re-running reconciliation lets benign drift age out and surfaces only what persists.
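A minimal sketch of how those two properties can fit together: key each finding deterministically so a re-run updates rather than duplicates it, and remember when it was first seen so later runs can tell fresh drift from persistent drift. FindingStore and its methods are hypothetical, the Map stands in for a database table, and the sketch assumes every run re-examines the same period (so a finding that is no longer reported has converged).

```typescript
type DiscrepancyType = 'MISSING_IN_LEDGER' | 'MISSING_AT_BANK' | 'AMOUNT_MISMATCH'

interface Finding { type: DiscrepancyType; transactionId: string }
interface TrackedFinding extends Finding { firstSeenAt: number; lastSeenAt: number }

// In-memory stand-in for a table with a unique key on (type, transactionId).
class FindingStore {
  private open = new Map<string, TrackedFinding>()

  private static key(f: Finding): string {
    return `${f.type}:${f.transactionId}`
  }

  // Apply one run's findings. Re-applying the same run only refreshes lastSeenAt.
  applyRun(findings: Finding[], now: number): void {
    const seen = new Set<string>()
    for (const f of findings) {
      const k = FindingStore.key(f)
      seen.add(k)
      const existing = this.open.get(k)
      if (existing) {
        existing.lastSeenAt = now // keep the original firstSeenAt
      } else {
        this.open.set(k, { ...f, firstSeenAt: now, lastSeenAt: now })
      }
    }
    // Anything not reported this run has converged on its own: drop it.
    for (const k of Array.from(this.open.keys())) {
      if (!seen.has(k)) this.open.delete(k)
    }
  }

  // Findings that have survived for at least minAgeMs.
  persistent(now: number, minAgeMs: number): TrackedFinding[] {
    return Array.from(this.open.values()).filter(f => now - f.firstSeenAt >= minAgeMs)
  }
}
```

With something like this, alerting reads from persistent(...) rather than from the raw output of a single run, so benign timing drift never reaches a human.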

Classifying Discrepancies

The three structural outcomes map directly to the drift causes, and each implies an action:

Discrepancy	Meaning	Typical cause	Action
Missing in ledger	Bank has it, you don't	Lost/late webhook	Ingest it (recover); if persistent, it's a real lost event
Missing at bank	You have it, bank doesn't	Your bug, or a not-yet-settled entry	Investigate hard — you may have invented money
Amount mismatch	Same txn, different number	Currency-unit / parsing bug	Almost always your side; fix and correct

The art is layering time on top of structure. A "missing in ledger" that's seconds old is the benign timing case: wait. The same discrepancy still present after your retry/redelivery window has elapsed is a genuine lost event: recover it. Encoding that age threshold is what separates a noisy reconciliation system (screaming about every in-flight transfer) from a useful one (alerting only on true, settled divergence).
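Layering age on top of structure can be sketched as a small pure function. Everything named here is an assumption for illustration: the Verdict labels, the firstSeenAt field, and especially the 30-minute REDELIVERY_WINDOW_MS, which should be whatever your bank's retry behaviour actually is.

```typescript
type DiscrepancyType = 'MISSING_IN_LEDGER' | 'MISSING_AT_BANK' | 'AMOUNT_MISMATCH'
type Verdict = 'WAIT' | 'RECOVER' | 'INVESTIGATE' | 'FIX'

interface AgedDiscrepancy {
  type: DiscrepancyType
  firstSeenAt: Date // when reconciliation first noticed it (assumed field)
}

// Assumed value: set this to your real webhook retry/redelivery window.
const REDELIVERY_WINDOW_MS = 30 * 60 * 1000

function classify(
  d: AgedDiscrepancy,
  now: Date,
  windowMs: number = REDELIVERY_WINDOW_MS,
): Verdict {
  const ageMs = now.getTime() - d.firstSeenAt.getTime()
  switch (d.type) {
    case 'MISSING_IN_LEDGER':
      // Young: probably an in-flight webhook. Old: a genuinely lost event.
      return ageMs < windowMs ? 'WAIT' : 'RECOVER'
    case 'MISSING_AT_BANK':
      // Never waited out silently in this sketch: we may have invented money.
      return 'INVESTIGATE'
    case 'AMOUNT_MISMATCH':
      // Almost always our side, regardless of age.
      return 'FIX'
  }
}
```

Only the "missing in ledger" row is age-sensitive here because that is the case the timing argument covers; whether a young "missing at bank" also deserves a grace period for unsettled entries is a judgment call for your own system.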

Converging: Auto-Heal vs Escalate

Detection is half the job; convergence is the other half. Split discrepancies by how safe automatic resolution is:

Auto-heal the safe, well-understood cases. A "missing in ledger" that exactly matches a known bank transaction format can be ingested automatically — it's the lost-webhook recovery path, and re-ingesting through your normal idempotent pipeline is safe. The system converges itself.
Escalate the dangerous cases to a human. "Missing at bank" (you may have credited phantom money) and anything resembling fraud must go to a person with an audit trail. Never let software silently invent or destroy money to "fix" a mismatch — that's how a reconciliation bug becomes a financial loss.

The guiding principle: automation handles the cases where the correct convergence is unambiguous; humans handle the cases where converging the wrong way would move real money incorrectly.
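The split can be expressed as a pure planning step that decides what to do without doing it, which keeps the risky part easy to test and to audit. This is a sketch under assumptions: the Plan shape and planConvergence are hypothetical, the discrepancy is assumed to have already outlived the age threshold, and routing amount mismatches to a human is a conservative choice on my part rather than a rule.

```typescript
interface BankLine { transactionId: string; amount: number }

// Simplified discrepancy shapes; entryId stands in for the full ledger entry.
type Discrepancy =
  | { type: 'MISSING_IN_LEDGER'; line: BankLine }
  | { type: 'MISSING_AT_BANK'; entryId: string }
  | { type: 'AMOUNT_MISMATCH'; line: BankLine; entryId: string }

type Plan =
  | { action: 'AUTO_INGEST'; line: BankLine }
  | { action: 'ESCALATE'; reason: string }

// Decide how to converge. No side effects: the caller executes the plan.
function planConvergence(d: Discrepancy): Plan {
  switch (d.type) {
    case 'MISSING_IN_LEDGER':
      // Lost-webhook recovery: replay through the normal idempotent pipeline.
      return { action: 'AUTO_INGEST', line: d.line }
    case 'MISSING_AT_BANK':
      // Possible phantom credit: software must not delete or reverse it.
      return { action: 'ESCALATE', reason: 'ledger entry has no bank counterpart' }
    case 'AMOUNT_MISMATCH':
      // Conservative assumption: a human confirms the correction.
      return { action: 'ESCALATE', reason: 'amounts differ for the same transaction' }
  }
}
```

Whatever executes the plan should record both the plan and its outcome, so every convergence, automatic or manual, leaves a trace.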

The Eventual-Consistency Mindset

Reframing reconciliation as eventual consistency changes how you build the whole system:

You stop treating divergence as failure. It's the expected steady state of two asynchronously-communicating replicas. Failure is persistent divergence, not divergence itself.
The bank is authoritative; your ledger is a replica to keep honest. When they disagree, you converge toward the bank, not the other way around.
Real-time ingestion and reconciliation are complementary, not redundant. Webhooks give you low latency; reconciliation gives you a correctness backstop that catches everything the real-time path drops. You need both — fast and eventually correct.
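The "complementary, not redundant" point is easiest to see when both paths call the same idempotent ingest, keyed on the bank's transaction ID. The class below is a toy in-memory stand-in (InMemoryLedger and its methods are hypothetical), meant only to show that replaying an already-seen event is a no-op while a dropped one gets recovered.

```typescript
interface BankEvent { transactionId: string; amount: number }
type Source = 'WEBHOOK' | 'RECONCILIATION'

class InMemoryLedger {
  private entries = new Map<string, BankEvent & { source: Source }>()

  // Returns true if the event was recorded, false if it was already known.
  ingest(event: BankEvent, source: Source): boolean {
    if (this.entries.has(event.transactionId)) return false // idempotent replay
    this.entries.set(event.transactionId, { ...event, source })
    return true
  }

  balance(): number {
    let total = 0
    this.entries.forEach(e => { total += e.amount })
    return total
  }
}

const ledger = new InMemoryLedger()

// Fast path: the webhook for tx-1 arrives; the one for tx-2 is dropped.
ledger.ingest({ transactionId: 'tx-1', amount: 50000 }, 'WEBHOOK')

// Backstop: reconciliation replays the whole statement through the same call.
const replayedKnown = ledger.ingest({ transactionId: 'tx-1', amount: 50000 }, 'RECONCILIATION')
const recoveredLost = ledger.ingest({ transactionId: 'tx-2', amount: 20000 }, 'RECONCILIATION')
```

Here replayedKnown is false and recoveredLost is true: the webhook path supplied the latency, and the reconciliation path supplied the entry the webhook path lost, without double-counting the one it didn't.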

Pitfalls

Treating every real-time mismatch as an incident. Without a time threshold, in-flight transfers look like errors and you drown in false alarms.
Auto-correcting "missing at bank." Deleting or reversing a ledger entry because the bank "doesn't have it" may erase a legitimate not-yet-settled transaction. Investigate, don't auto-heal.
Reconciling on a key that isn't stable. If you match on amount+timestamp instead of the bank's transaction ID, coincidental collisions and rounding will wreck you. Always reconcile on the stable, unique identifier.
Running reconciliation only at end-of-day. Daily batch was the old world; continuous reconciliation catches drift while it's still cheap to fix.
No audit trail on convergence. Every auto-heal and every manual correction must be recorded — reconciliation that quietly changes balances is itself unauditable.
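The unstable-key pitfall is worth seeing concretely. In this small illustration (the data and the indexBy helper are made up), two distinct transfers of the same amount post in the same second; indexing by amount+timestamp silently merges them, while indexing by transaction ID keeps both.

```typescript
interface Line { transactionId: string; amount: number; postedAt: string }

// Two different transfers that happen to share an amount and a timestamp.
const lines: Line[] = [
  { transactionId: 'tx-1', amount: 50000, postedAt: '2024-01-01T14:00:00Z' },
  { transactionId: 'tx-2', amount: 50000, postedAt: '2024-01-01T14:00:00Z' },
]

function indexBy(items: Line[], keyOf: (l: Line) => string): Map<string, Line> {
  const index = new Map<string, Line>()
  for (const item of items) index.set(keyOf(item), item) // later items overwrite earlier ones
  return index
}

const byComposite = indexBy(lines, l => `${l.amount}@${l.postedAt}`)
const byId = indexBy(lines, l => l.transactionId)

console.log(byComposite.size) // 1: tx-2 overwrote tx-1, so one transfer vanished
console.log(byId.size)        // 2: both transfers survive
```

A reconciliation run built on the composite key would report these two replicas as matching even if your ledger only recorded one of the transfers.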

Conclusion

Reconciliation looks like bookkeeping but it's really distributed-systems convergence wearing an accountant's coat. You and the bank are two replicas of one financial truth, drifting apart constantly as events propagate asynchronously and occasionally get lost. The engine's job is to detect that drift, classify it by structure and age, converge the safe cases automatically, and escalate the dangerous ones — all while treating temporary divergence as normal and only persistent divergence as a problem.

Internalising this reframes the whole money pipeline: idempotent processing keeps ingestion clean, the outbox pattern keeps events from being lost internally, careful concurrency control keeps the ledger self-consistent, and reconciliation is the final safety net proving your replica still agrees with reality. Build all four and you can say the thing every fintech system must be able to say: the books are correct, and here's the proof.