The Substrate

Three source databases. One bank.

Every formal model below operates on these exact rows. Fragmentation is the problem.

CORE_BANKING.customers

CARDS_DB.cardholders

WEALTH_DB.clients

The three customers

Sarah Chen — name variations · all 3 DBs · ER case

Marcus Aldridge — high AUM · provenance / lineage case

Priya Raman — new client · 1 DB only · cold-start case

Tab 00 · The Substrate

The fragmentation problem, made concrete

Three source databases. Three customers spread across them. No single source of truth. Every formal model below addresses one slice of this problem.

Why three databases?

Every tier-1 bank lives with this. Core banking owns accounts. Cards is its own legacy system, often acquired. Wealth Management runs on a third platform with a different customer model. The same human appears in all three — under slightly different identities. The job of the knowledge graph is to know they're the same person, to know what each system says about them, and to know which assertion came from where.

Source DBs

Total Rows

Distinct Humans

Identity Variations

What the lab demonstrates

Tab 01 — Fellegi-Sunter (1969). Probabilistic record linkage. Watch Sarah Chen's three records compute m-probabilities, u-probabilities, log-likelihood ratio. See the match decision emerge from the math.
Tab 02 — RIGOR (2025). Retrieval-Augmented Ontology Generation. Watch an LLM-style iterative process build OWL axioms from the three schemas, table by table, with provenance tags.
Tab 03 — Provenance Semirings (Green-Karvounarakis-Tannen, 2007). Marcus's transaction propagates through a join. See the polynomial annotation track every source tuple that contributed.
Tab 04 — The Unified KG. Everything composed. Entities resolved. Ontology applied. Provenance attached. Interactive graph.

The thesis. Modern catalogs (Actian, Atlan, Glean) implement these academic models — often without naming them. This lab exposes the foundations explicitly, so you can reason about vendor claims from first principles.

Tab 01 · Identity Resolution

Fellegi-Sunter on Sarah Chen

Three records. Same human? The 1969 model says: compute m and u probabilities per field, sum the log-likelihood ratios, apply the threshold. Watch it happen.

Fellegi-Sunter (1969) · A Theory for Record Linkage · JASA · Foundational

For each comparable field, estimate m = P[agree | match] and u = P[agree | non-match]. A field that agrees contributes log₂(m/u); a field that disagrees contributes log₂((1−m)/(1−u)). Sum the contributions across fields → match weight. Compare to upper and lower thresholds to classify as MATCH, POSSIBLE, or NON-MATCH.
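The weight calculation can be sketched in a few lines. This is a minimal illustration, not the lab's code: the field names, the m and u values, and the two thresholds are all hypothetical.

```python
from math import log2

# Hypothetical (m, u) per field; illustrative values, not taken from the lab.
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "postcode": (0.90, 0.05),
}

def field_weight(m, u, agrees):
    """Log-likelihood ratio contributed by one field comparison."""
    return log2(m / u) if agrees else log2((1 - m) / (1 - u))

def match_weight(agreement, params=FIELD_PARAMS):
    """Sum the per-field weights for one candidate pair."""
    return sum(field_weight(*params[f], agreement[f]) for f in params)

def classify(weight, lower, upper):
    """Apply the two-threshold decision rule."""
    if weight >= upper:
        return "MATCH"
    if weight <= lower:
        return "NON-MATCH"
    return "POSSIBLE"

# One pair that agrees on surname and postcode but not on date of birth.
pair = {"surname": True, "dob": False, "postcode": True}
decision = classify(match_weight(pair), lower=0.0, upper=10.0)
```

A disagreement on a highly reliable field (high m, low u) subtracts a lot of weight, which is why the mixed pair above lands in the POSSIBLE band under these assumed thresholds.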

CANDIDATES UNDER COMPARISON: CORE.C001 CARDS.CD-0451 WEALTH.WM-7821

The modern stack: Splink + Ditto · Linacre 2020 · Megagon 2020 · Production

Modern implementations: Splink applies Fellegi-Sunter at scale via expectation-maximisation (no labels needed). Ditto uses fine-tuned BERT for transformer-based matching. Both end at the same place — a confidence score and a canonical entity.
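How m and u can be learned without labels is easiest to see in a toy expectation-maximisation loop. The sketch below is plain Python and is not Splink's API; it assumes binary agree/disagree comparisons, conditional independence between fields, and hand-picked starting values. The agreement patterns are synthetic.

```python
def estimate_m_u(patterns, iters=100, eps=1e-6):
    """EM for a two-class mixture over binary agreement vectors.

    Returns (lam, m, u): the estimated match proportion and the per-field
    m and u probabilities. Starting guesses are assumptions.
    """
    n, k = len(patterns), len(patterns[0])
    lam, m, u = 0.1, [0.9] * k, [0.1] * k

    def clamp(x):
        # Keep probabilities away from 0 and 1 so no likelihood collapses.
        return min(max(x, eps), 1 - eps)

    for _ in range(iters):
        # E-step: posterior probability that each pair is a match.
        resp = []
        for g in patterns:
            pm, pu = lam, 1 - lam
            for j, agree in enumerate(g):
                pm *= m[j] if agree else 1 - m[j]
                pu *= u[j] if agree else 1 - u[j]
            resp.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the soft assignments.
        total = sum(resp)
        lam = clamp(total / n)
        m = [clamp(sum(r * g[j] for r, g in zip(resp, patterns)) / total)
             for j in range(k)]
        u = [clamp(sum((1 - r) * g[j] for r, g in zip(resp, patterns)) / (n - total))
             for j in range(k)]
    return lam, m, u

# Synthetic comparison vectors over three fields (1 = agree, 0 = disagree).
patterns = ([(1, 1, 1)] * 18 + [(1, 1, 0)] * 2 +
            [(0, 0, 0)] * 70 + [(0, 0, 1)] * 5 + [(1, 0, 0)] * 5)
lam, m, u = estimate_m_u(patterns)
```

On this sample the loop separates the mostly-agreeing vectors from the mostly-disagreeing ones, so every estimated m ends up above the corresponding u.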

Classical FS · Splink (EM) · Ditto (BERT) · Consensus

Tab 02 · Ontology Generation

RIGOR — schemas to OWL, iteratively

The 2025 model: an LLM iterates table by table, retrieves from domain ontologies (FIBO, schema.org), and builds an OWL 2 DL ontology with provenance-tagged delta fragments. Watch it work.

RIGOR (Nayyeri et al., 2025) · Retrieval-Augmented Iterative Generation of RDB Ontologies · Frontier

For each table: retrieve schema + domain ontology + growing core ontology → prompt Gen-LLM → produce delta-ontology fragment → Judge-LLM validates → merge into core. Iterate following foreign-key constraints.
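The control flow of that loop can be sketched with the LLM calls stubbed out. Everything named here is hypothetical: `retrieve`, `generate`, and `judge` stand in for the retrieval step, the Gen-LLM, and the Judge-LLM, and visiting referenced tables before the tables that reference them is one assumed reading of "following foreign-key constraints".

```python
from graphlib import TopologicalSorter

def rigor_loop(schemas, foreign_keys, retrieve, generate, judge):
    """Iterate over tables, growing a core ontology one delta at a time.

    schemas:      table name -> schema text
    foreign_keys: table name -> set of tables it references
    Returns the core ontology as a list of (axiom, source_table) pairs,
    so every axiom keeps a provenance tag.
    """
    deps = {t: {r for r in foreign_keys.get(t, set()) if r in schemas}
            for t in schemas}
    core = []
    for table in TopologicalSorter(deps).static_order():
        context = retrieve(table, schemas[table], core)   # schema + domain + core
        delta = generate(context)                         # candidate axioms
        accepted = [ax for ax in delta if judge(ax, core)]
        core.extend((ax, table) for ax in accepted)       # merge with provenance
    return core

# Stub components so the sketch runs without any model behind it.
def retrieve(table, schema, core):
    return {"table": table, "schema": schema, "known": [ax for ax, _ in core]}

def generate(context):
    return [f"Class({context['table']})", "Class(Thing)"]

def judge(axiom, core):
    return axiom not in {ax for ax, _ in core}            # reject duplicates

schemas = {"accounts": "accounts(id, customer_id)", "customers": "customers(id, name)"}
core = rigor_loop(schemas, {"accounts": {"customers"}}, retrieve, generate, judge)
```

The two table names are placeholders; the lab's three source tables have no foreign keys between them in the text above.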

RIGOR Pipeline · Live

Growing OWL Ontology · Core

Lineage: R2RML (2012) → RML (2014) → RIGOR (2025) · Evolution

R2RML (W3C 2012): declarative relational-to-RDF mapping language. You write the mapping.
RML (Ghent 2014): extends R2RML to CSV, JSON, XML.
RIGOR (2025): the LLM writes the mapping AND the ontology, iteratively.

Same end-state (OWL ontology + RDF instances), three orders of magnitude less human effort.

# R2RML fragment (what RIGOR generates automatically)
<#CustomerMap>
    rr:logicalTable [ rr:tableName "CORE_BANKING.customers" ] ;
    rr:subjectMap [
        rr:template "http://meridian.bank/customer/{customer_id}" ;
        rr:class fibo:Customer ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate fibo:hasFullName ;
        rr:objectMap [ rr:column "customer_name" ]
    ] .
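What that mapping does to a single row can be sketched by hand. This is not an R2RML processor, just the template expansion and the two triples the fragment describes; the sample row values are hypothetical.

```python
from urllib.parse import quote

TEMPLATE = "http://meridian.bank/customer/{customer_id}"

def map_customer_row(row):
    """Apply the CustomerMap fragment to one row of CORE_BANKING.customers."""
    # Column values placed in an IRI template are percent-encoded.
    subject = TEMPLATE.format(customer_id=quote(str(row["customer_id"]), safe=""))
    return [
        (subject, "rdf:type", "fibo:Customer"),                 # rr:class
        (subject, "fibo:hasFullName", row["customer_name"]),    # rr:column
    ]

# Hypothetical row; the name value is an assumption.
triples = map_customer_row({"customer_id": "C001", "customer_name": "Sarah Chen"})
```

Each row yields one subject IRI, one class assertion, and one literal per mapped column.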

Tab 03 · Provenance & Lineage

Marcus Aldridge's transaction → BCBS report

Green-Karvounarakis-Tannen (PODS 2007) — track which source tuples contributed to every derived fact, as a polynomial over a semiring. Watch the algebra propagate.

Provenance Semirings (PODS 2007) · Green · Karvounarakis · Tannen · University of Pennsylvania · Algebraic

Annotate each source tuple with a variable. Join (⊗) multiplies. Union (⊕) adds. The result is a polynomial that captures HOW each output tuple was derived — and lets you compute trust, probability, multiplicity by evaluating the same polynomial in different semirings.
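A small sketch makes the "same polynomial, different semirings" point concrete. The tuple identifiers (t1, t2, a1) and the trust scores are hypothetical; polynomials are stored as a Counter from monomials to coefficients.

```python
import operator
from collections import Counter
from functools import reduce

def var(name):
    """Annotation for a single source tuple."""
    return Counter({(name,): 1})

def plus(p, q):
    """Union / alternative derivations: add polynomials."""
    return p + q

def times(p, q):
    """Join: multiply polynomials term by term."""
    out = Counter()
    for mp, cp in p.items():
        for mq, cq in q.items():
            out[tuple(sorted(mp + mq))] += cp * cq
    return out

def evaluate(poly, assignment, add, mul, zero, one):
    """Interpret the polynomial in a concrete semiring (add, mul, zero, one)."""
    total = zero
    for monomial, coeff in poly.items():
        term = reduce(mul, (assignment[v] for v in monomial), one)
        for _ in range(coeff):
            total = add(total, term)
    return total

# Output tuple derived two ways: (t1 joined with a1) or (t2 joined with a1).
poly = plus(times(var("t1"), var("a1")), times(var("t2"), var("a1")))

# Counting semiring: how many derivations exist?
count = evaluate(poly, {"t1": 1, "t2": 1, "a1": 1}, operator.add, operator.mul, 0, 1)

# Boolean semiring: does the output survive if t2 is deleted?
survives = evaluate(poly, {"t1": True, "t2": False, "a1": True},
                    operator.or_, operator.and_, False, True)

# Max-times semiring: trust of the best derivation.
trust = evaluate(poly, {"t1": 0.9, "t2": 0.5, "a1": 0.8}, max, operator.mul, 0.0, 1.0)
```

The polynomial is computed once; only the interpretation of ⊕ and ⊗ changes between the three evaluations.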

W3C PROV-DM (2013) · Entity · Activity · Agent · Standard

The semiring polynomial is the algebraic layer. W3C PROV is the standard representation: three node types (Entity, Activity, Agent) and seven core relations (wasGeneratedBy, used, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf). Many lineage tools emit events that can be mapped onto PROV.
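A PROV graph for the report can be sketched as plain triples. The identifiers under `ex:` are hypothetical, only a subset of the PROV relations is used, and the traversal assumes the derivation chain has no cycles.

```python
# Hypothetical PROV statements for a report derived from one transaction row.
PROV_TRIPLES = [
    ("ex:bcbs_report", "prov:wasGeneratedBy", "ex:aggregation_job"),
    ("ex:aggregation_job", "prov:used", "ex:txn_row"),
    ("ex:bcbs_report", "prov:wasDerivedFrom", "ex:txn_row"),
    ("ex:aggregation_job", "prov:wasAssociatedWith", "ex:risk_team"),
    ("ex:bcbs_report", "prov:wasAttributedTo", "ex:risk_team"),
]

def sources_of(entity, triples):
    """Follow prov:wasDerivedFrom back to entities with no further derivation."""
    parents = [o for s, p, o in triples
               if s == entity and p == "prov:wasDerivedFrom"]
    if not parents:
        return {entity}
    found = set()
    for parent in parents:
        found |= sources_of(parent, triples)
    return found

origin = sources_of("ex:bcbs_report", PROV_TRIPLES)
```

Here the Entity nodes are the report and the row, the Activity is the aggregation job, and the Agent is the team responsible for it.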

Tab 04 · The Synthesis

The Unified Knowledge Graph

Entities resolved (Fellegi-Sunter). Ontology applied (RIGOR). Provenance attached (Semirings + PROV). The graph below is the composition of every model above, on Meridian Bank's actual data.

The composed graph · Live · Interactive

Toggle the layers to see how each formal model contributes a different facet of the same knowledge graph. Click any node to see its provenance and attestations.

Resolved Entities

Ontology Classes

Provenance Edges

Policy & Access

What this graph encodes

Source Rows

Resolved Entities

Ontology Axioms

PROV Edges

Policy Rules

Three real people. Three formal models. One queryable graph. Every claim in the graph is attributable to a source row, defended by an ontology axiom, and traceable through PROV. That is the academic contract.