Meridian Bank Formal Models Lab · v1.0
Tab 00 · The Substrate

The fragmentation problem, made concrete

Three source databases. Three customers spread across them. No single source of truth. Every formal model below addresses one slice of this problem.

Why three databases?

Most tier-1 banks live with this. Core banking owns accounts. Cards runs on its own legacy system, often acquired. Wealth Management runs on a third platform with a different customer model. The same human appears in all three, under slightly different identities. The knowledge graph's job is to know they are the same person, to know what each system says about them, and to know which assertion came from where.

Source DBs: 3
Total Rows: 9
Distinct Humans: 3
Identity Variations: 7
What the lab demonstrates
  • Tab 01 · Fellegi-Sunter (1969). Probabilistic record linkage. Sarah Chen's three records are compared field by field: m-probabilities, u-probabilities, log-likelihood ratios. The match decision emerges from the math.
  • Tab 02 · RIGOR (2025). Retrieval-augmented iterative ontology generation. An LLM-driven process builds OWL axioms from the three schemas, table by table, with provenance tags.
  • Tab 03 · Provenance Semirings (Green, Karvounarakis, Tannen, 2007). Marcus's transaction propagates through a join, and a polynomial annotation tracks every source tuple that contributed.
  • Tab 04 · The Unified KG. Everything composed: entities resolved, ontology applied, provenance attached, in one interactive graph.
The thesis. Modern data catalogs and enterprise search tools (Actian, Atlan, Glean) implement these academic models, often without naming them. This lab exposes the foundations explicitly, so you can evaluate vendor claims from first principles.
Tab 01 · Identity Resolution

Fellegi-Sunter on Sarah Chen

Three records. Same human? The 1969 model says: compute m and u probabilities per field, sum the log-likelihood ratios, and compare the total against two thresholds. Watch it happen.

Fellegi-Sunter (1969) · A Theory for Record Linkage · JASA · Foundational

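For each comparable field, compute m (P[agree | match]) and u (P[agree | non-match]). A field that agrees contributes log2(m/u) to the match weight; a field that disagrees contributes log2((1 − m)/(1 − u)). Compare the summed weight to an upper and a lower threshold: above the upper → MATCH, below the lower → NON-MATCH, in between → POSSIBLE (sent for clerical review).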
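A minimal Python sketch of the classical computation. The field names, m/u values, and both thresholds are hypothetical placeholders, not the values the lab uses for Sarah Chen; what the sketch shows is the arithmetic: agreeing fields add log2(m/u), disagreeing fields add log2((1 − m)/(1 − u)), and the total is classified against the two thresholds.

```python
import math

# Hypothetical per-field parameters: (m, u) = (P[agree | match], P[agree | non-match]).
# Illustrative values only, NOT the lab's numbers for Sarah Chen.
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "date_of_birth": (0.98, 0.001),
    "postcode": (0.85, 0.02),
}

UPPER = 10.0  # hypothetical upper threshold, in bits
LOWER = 0.0   # hypothetical lower threshold, in bits


def field_weight(m: float, u: float, agrees: bool) -> float:
    """log2 likelihood ratio contributed by one field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(agreement: dict[str, bool]) -> float:
    """Sum the field weights over a comparison vector."""
    return sum(field_weight(*FIELD_PARAMS[f], agrees) for f, agrees in agreement.items())


def classify(weight: float, upper: float = UPPER, lower: float = LOWER) -> str:
    if weight >= upper:
        return "MATCH"
    if weight <= lower:
        return "NON-MATCH"
    return "POSSIBLE"  # between the thresholds: clerical review


# Hypothetical record pair: every field agrees except postcode.
pair = {"surname": True, "first_name": True, "date_of_birth": True, "postcode": False}
w = match_weight(pair)
print(f"{w:.2f} bits -> {classify(w)}")
```

The postcode disagreement pulls the total down but cannot outweigh three strong agreements; that asymmetry is exactly what the (1 − m)/(1 − u) term encodes.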

Candidates under comparison: CORE.C001 · CARDS.CD-0451 · WEALTH.WM-7821
The modern stack: Splink + Ditto · Linacre 2020 · Megagon 2020 · Production

Modern implementations: Splink applies Fellegi-Sunter at scale, estimating m and u by expectation-maximisation (no labels needed). Ditto fine-tunes a pre-trained transformer (BERT and similar) on labelled record pairs. Both produce a pairwise match score; clustering the matched pairs then yields a canonical entity.
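The EM idea behind Splink can be sketched in plain Python for binary agree/disagree comparisons. This is a simplified illustration under the usual conditional-independence assumption, not Splink's code or API (Splink adds multi-level comparisons, blocking, and more), and the comparison vectors are hypothetical. The E-step computes each pair's posterior match probability from the current m, u, and match prior (lam); the M-step re-estimates m and u as probability-weighted agreement rates.

```python
from math import prod

# Hypothetical comparison vectors over three fields (1 = agree, 0 = disagree).
# No pair carries a match label: EM has to infer the two classes.
VECTORS = ([(1, 1, 1)] * 10 + [(1, 1, 0)] * 3 + [(0, 0, 0)] * 60
           + [(1, 0, 0)] * 8 + [(0, 1, 0)] * 5 + [(0, 0, 1)] * 4)

EPS = 1e-6  # keeps every probability strictly inside (0, 1)


def clamp(p: float) -> float:
    return min(max(p, EPS), 1 - EPS)


def likelihood(vec, probs) -> float:
    """P(vec | class), assuming fields agree independently given the class."""
    return prod(p if agree else 1 - p for agree, p in zip(vec, probs))


def posterior(vec, m, u, lam) -> float:
    """P(match | vec) by Bayes' rule."""
    pm = lam * likelihood(vec, m)
    return pm / (pm + (1 - lam) * likelihood(vec, u))


def em(vectors, m, u, lam, iterations=50):
    """Estimate m, u and the match prior lam by expectation-maximisation."""
    n, k = len(vectors), len(m)
    for _ in range(iterations):
        # E-step: soft match assignment for every pair.
        g = [posterior(vec, m, u, lam) for vec in vectors]
        total_match = sum(g)
        total_non = n - total_match
        # M-step: weighted agreement rate per field, plus the new prior.
        m = [clamp(sum(gi * v[j] for gi, v in zip(g, vectors)) / total_match) for j in range(k)]
        u = [clamp(sum((1 - gi) * v[j] for gi, v in zip(g, vectors)) / total_non) for j in range(k)]
        lam = total_match / n
    return m, u, lam


m, u, lam = em(VECTORS, m=[0.9] * 3, u=[0.1] * 3, lam=0.1)
print("m =", [round(x, 3) for x in m])
print("u =", [round(x, 3) for x in u])
print("P(match | all agree) =", round(posterior((1, 1, 1), m, u, lam), 3))
```

The estimated m and u can then feed the same log2 weights as the classical model; EM replaces hand-set parameters, not the scoring rule.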

Tab 02 · Ontology Generation

RIGOR — schemas to OWL, iteratively

The 2025 model: an LLM iterates table by table, retrieves from domain ontologies (FIBO, schema.org), and builds an OWL 2 DL ontology with provenance-tagged delta fragments. Watch it work.

RIGOR (Nayyeri et al., 2025) · Retrieval-Augmented Iterative Generation of RDB Ontologies · Frontier

For each table: retrieve its schema, relevant domain-ontology terms, and the growing core ontology → prompt the Gen-LLM → produce a delta-ontology fragment → the Judge-LLM validates it → merge into the core. Tables are visited in an order that follows foreign-key constraints.
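To make the loop concrete, here is a Python sketch of the control flow only. The four-table schema, the naming rule, and the stand-in functions (retrieve_context, generate_delta, judge) are hypothetical: plain string templates replace the Gen-LLM and Judge-LLM, and none of this is RIGOR's published code. It does show the ordering (graphlib.TopologicalSorter visits referenced tables first, so a fragment only points at classes already in the core), the judge gate, and a provenance tag recording which table produced each axiom.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FOREIGN_KEYS: dict[str, set[str]] = {
    "customers": set(),
    "accounts": {"customers"},
    "cards": {"customers", "accounts"},
    "transactions": {"accounts"},
}


def to_class(table: str) -> str:
    """Naive naming rule (an assumption): 'accounts' -> 'Account'."""
    return table.removesuffix("s").capitalize()


def retrieve_context(table: str, fks: dict[str, set[str]], core: set[str]) -> dict:
    """Stand-in for retrieval. A real system would also pull FIBO / schema.org terms."""
    return {"table": table, "refs": fks[table], "core": frozenset(core)}


def generate_delta(ctx: dict) -> set[str]:
    """Stand-in for the Gen-LLM: one class, plus one restriction per foreign key."""
    cls = to_class(ctx["table"])
    delta = {f"Declaration(Class(:{cls}))"}
    for ref in sorted(ctx["refs"]):
        target = to_class(ref)
        delta.add(f"SubClassOf(:{cls} ObjectSomeValuesFrom(:refersTo{target} :{target}))")
    return delta


def judge(ctx: dict) -> bool:
    """Stand-in for the Judge-LLM: reject fragments that point at undeclared classes."""
    return all(f"Declaration(Class(:{to_class(r)}))" in ctx["core"] for r in ctx["refs"])


def build_ontology(fks: dict[str, set[str]]) -> tuple[set[str], dict[str, str]]:
    core: set[str] = set()
    provenance: dict[str, str] = {}  # axiom -> table whose iteration produced it
    for table in TopologicalSorter(fks).static_order():  # referenced tables come first
        ctx = retrieve_context(table, fks, core)
        delta = generate_delta(ctx)
        if not judge(ctx):
            raise ValueError(f"fragment for {table!r} rejected")
        for axiom in delta - core:
            provenance[axiom] = table
        core |= delta  # merge the delta fragment into the core ontology
    return core, provenance


core, provenance = build_ontology(FOREIGN_KEYS)
for axiom in sorted(core):
    print(f"{provenance[axiom]:>12}  {axiom}")
```

Following foreign keys is what lets a cheap structural check stand in for the judge here: by the time "cards" is processed, Customer and Account are already declared.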

Lineage: R2RML (2012) → RML (2014) → RIGOR (2025) · Evolution

R2RML (W3C, 2012): a declarative relational-to-RDF mapping language. You write the mapping. RML (Ghent, 2014): extends R2RML to CSV, JSON, and XML sources. RIGOR (2025): the LLM writes the mapping and the ontology, iteratively. The end state is the same (OWL ontology plus RDF instances), with far less human effort.

    # R2RML fragment (what RIGOR generates automatically)
    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    # (fibo: prefix declaration omitted)

    <#CustomerMap>
        rr:logicalTable [ rr:tableName "CORE_BANKING.customers" ] ;
        rr:subjectMap [
            rr:template "http://meridian.bank/customer/{customer_id}" ;
            rr:class fibo:Customer ;
        ] ;
        rr:predicateObjectMap [
            rr:predicate fibo:hasFullName ;
            rr:objectMap [ rr:column "customer_name" ]
        ] .
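What an R2RML processor does with a mapping like this can be sketched in a few lines of Python. This is a hypothetical toy, not an R2RML engine: the mapping is a plain dict mirroring the Turtle above, the two rows are invented placeholders rather than the lab's data, and the percent-encoding only approximates R2RML's IRI-safe rule. It does follow two behaviours of the specification: {column} placeholders in rr:template are filled from the row, and a NULL column value produces no triple.

```python
import re
from urllib.parse import quote

# Hypothetical mapping mirroring <#CustomerMap>: subject template, class, predicate -> column.
MAPPING = {
    "template": "http://meridian.bank/customer/{customer_id}",
    "class": "fibo:Customer",
    "predicate_columns": {"fibo:hasFullName": "customer_name"},
}

# Hypothetical rows standing in for CORE_BANKING.customers (not the lab's data).
ROWS = [
    {"customer_id": "X100", "customer_name": "Example Customer"},
    {"customer_id": "X 101", "customer_name": None},  # NULL name, space in the key
]


def expand_template(template: str, row: dict) -> str:
    """Fill {column} placeholders, percent-encoding values (approximates IRI-safe)."""
    return re.sub(r"\{(\w+)\}", lambda mt: quote(str(row[mt.group(1)]), safe=""), template)


def map_rows(rows: list[dict], mapping: dict) -> list[tuple]:
    triples = []
    for row in rows:
        subject = expand_template(mapping["template"], row)
        triples.append((subject, "rdf:type", mapping["class"]))
        for predicate, column in mapping["predicate_columns"].items():
            if row.get(column) is not None:  # NULL column -> no triple
                triples.append((subject, predicate, row[column]))
    return triples


for triple in map_rows(ROWS, MAPPING):
    print(triple)
```

In the R2RML world a human writes MAPPING; in RIGOR's framing the LLM proposes both the mapping and the classes it points at.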
Tab 03 · Provenance & Lineage

Marcus Aldridge's transaction → BCBS report

Green-Karvounarakis-Tannen (PODS 2007) — track which source tuples contributed to every derived fact, as a polynomial over a semiring. Watch the algebra propagate.

Provenance Semirings (PODS 2007) · Green · Karvounarakis · Tannen · University of Pennsylvania · Algebraic

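Annotate each source tuple with a variable. Join (⊗) multiplies annotations. Union (⊕) adds them, and so does projection when several input tuples collapse into the same output tuple. The result is a polynomial that captures HOW each output tuple was derived, and evaluating that same polynomial in different semirings yields trust, probability, or multiplicity (for example, natural numbers give bag multiplicity and Booleans give trust).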
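A minimal Python sketch of the algebra, assuming polynomials in ℕ[X] stored as a Counter from monomials (sorted tuples of variables) to coefficients. The relations, identifiers (TX-1, ACC-9), and variables t1, c, w are hypothetical: one transaction tuple, and the same account-owner fact arriving from two sources. Join multiplies annotations, union and projection add them, and evaluate maps the finished polynomial into any commutative semiring, here counting (how many derivations) and Boolean (still derivable if one source is distrusted?).

```python
import operator
from collections import Counter
from itertools import product


def var(name: str) -> Counter:
    return Counter({(name,): 1})


def add(p: Counter, q: Counter) -> Counter:
    return p + q


def mul(p: Counter, q: Counter) -> Counter:
    out = Counter()
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


def union(r: dict, s: dict) -> dict:
    out = {}
    for tup, ann in list(r.items()) + list(s.items()):
        out[tup] = add(out.get(tup, Counter()), ann)
    return out


def join(r: dict, s: dict, r_key: int, s_key: int) -> dict:
    """Equi-join on r[r_key] == s[s_key]; annotations multiply."""
    out = {}
    for (rt, ra), (st, sa) in product(r.items(), s.items()):
        if rt[r_key] == st[s_key]:
            out[rt + st] = add(out.get(rt + st, Counter()), mul(ra, sa))
    return out


def project(r: dict, cols: list[int]) -> dict:
    """Tuples collapsing to the same output have their annotations added."""
    out = {}
    for tup, ann in r.items():
        key = tuple(tup[i] for i in cols)
        out[key] = add(out.get(key, Counter()), ann)
    return out


def evaluate(poly: Counter, valuation: dict, zero, one, plus, times):
    """Map an N[X] polynomial into another commutative semiring."""
    result = zero
    for monomial, coeff in poly.items():
        term = one
        for v in monomial:
            term = times(term, valuation[v])
        for _ in range(coeff):  # coefficient n = the term added n times
            result = plus(result, term)
    return result


def show(poly: Counter) -> str:
    return " + ".join((f"{c}·" if c > 1 else "") + "·".join(m) for m, c in sorted(poly.items()))


transactions = {("TX-1", "ACC-9", 500): var("t1")}
owners = union({("ACC-9", "Marcus Aldridge"): var("c")},   # core banking copy
               {("ACC-9", "Marcus Aldridge"): var("w")})   # wealth copy
report = project(join(transactions, owners, 1, 0), [4, 2])
poly = report[("Marcus Aldridge", 500)]
print(show(poly))
print(evaluate(poly, {"t1": 1, "c": 1, "w": 1}, 0, 1, operator.add, operator.mul))
```

The polynomial is computed once; switching the valuation and the two operations answers a different question about the same report row without rerunning the query.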

W3C PROV-DM (2013) · Entity · Activity · Agent · Standard

The semiring polynomial is the algebraic layer. W3C PROV is the standard representation: three node types (Entity, Activity, Agent) and seven core relations (wasGeneratedBy, used, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf). Many lineage tools emit events that can be mapped onto this model.
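A sketch, in Python, of emitting PROV statements for one derivation step such as a join. The identifiers (ex:joinRun1, ex:etlService, and the ex: entities) are hypothetical, and the output is PROV-N-style relation lines only; a complete PROV-N document would also need prefix declarations and entity, activity, and agent records. The directions follow PROV-DM: an entity wasGeneratedBy an activity, an activity used an entity, a derived entity wasDerivedFrom its source, and an entity wasAttributedTo an agent.

```python
Statement = tuple[str, str, str]  # (relation, subject, object)


def record_activity(activity: str, agent: str,
                    inputs: list[str], outputs: list[str]) -> list[Statement]:
    """PROV-style statements for one activity that turns inputs into outputs."""
    stmts: list[Statement] = [("wasAssociatedWith", activity, agent)]
    stmts += [("used", activity, e) for e in inputs]
    for out in outputs:
        stmts.append(("wasGeneratedBy", out, activity))
        stmts.append(("wasAttributedTo", out, agent))
        stmts += [("wasDerivedFrom", out, e) for e in inputs]
    return stmts


def to_provn(stmts: list[Statement]) -> list[str]:
    return [f"{rel}({s}, {o})" for rel, s, o in stmts]


# Hypothetical join: one transaction plus two copies of the account-owner fact
# produce one row of a regulatory report.
stmts = record_activity(
    "ex:joinRun1", "ex:etlService",
    inputs=["ex:txn1", "ex:acctCore", "ex:acctWealth"],
    outputs=["ex:bcbsRow7"],
)
print("\n".join(to_provn(stmts)))
```

PROV records that these three inputs were used; the semiring polynomial adds how they combined (which inputs were alternatives and which were jointly required).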

Tab 04 · The Synthesis

The Unified Knowledge Graph

Entities resolved (Fellegi-Sunter). Ontology applied (RIGOR). Provenance attached (Semirings + PROV). The graph below composes every model above over the lab's Meridian Bank data.
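Resolving entities needs one step beyond pairwise scoring: turning match decisions into entities. A common approach, sketched here in Python with union-find, takes the transitive closure of MATCH pairs. The record IDs CORE.C001, CARDS.CD-0451, and WEALTH.WM-7821 come from Tab 01; CORE.C002, CARDS.CD-0999, the decision list, and the person: ID scheme are hypothetical. POSSIBLE pairs are deliberately not merged, and each entity keeps the source rows it was built from, which is where provenance attaches.

```python
class UnionFind:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


RECORDS = ["CORE.C001", "CARDS.CD-0451", "WEALTH.WM-7821", "CORE.C002", "CARDS.CD-0999"]

# Hypothetical pairwise decisions, e.g. the output of a Fellegi-Sunter classifier.
DECISIONS = [
    ("CORE.C001", "CARDS.CD-0451", "MATCH"),
    ("CARDS.CD-0451", "WEALTH.WM-7821", "MATCH"),
    ("CORE.C002", "WEALTH.WM-7821", "POSSIBLE"),  # left for review, not merged
    ("CORE.C002", "CARDS.CD-0999", "NON-MATCH"),
]


def resolve(records: list[str], decisions: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    uf = UnionFind()
    for r in records:
        uf.find(r)
    for a, b, label in decisions:
        if label == "MATCH":
            uf.union(a, b)
    clusters: dict[str, list[str]] = {}
    for r in records:
        clusters.setdefault(uf.find(r), []).append(r)
    # Deterministic canonical ID; the member list records which source rows back the entity.
    return {f"person:{min(members)}": sorted(members) for members in clusters.values()}


for entity, rows in resolve(RECORDS, DECISIONS).items():
    print(entity, rows)
```

CORE.C001 and WEALTH.WM-7821 end up in one entity through CARDS.CD-0451 alone; transitive closure buys recall at the risk of chaining records that were never compared directly.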

The composed graph

The graph has four layers: resolved entities, ontology classes, provenance edges, and policy & access. Toggling them shows how each formal model contributes a different facet of the same knowledge graph, and every node carries its provenance and attestations.
What this graph encodes
Source Rows: 9
Resolved Entities: 3
Ontology Axioms: 12
PROV Edges: 15
Policy Rules: 4

Three distinct people. Three formal models. One queryable graph. Every claim in the graph is attributable to a source row, justified by an ontology axiom, and traceable through PROV. That is the academic contract.