The Substrate

Three banks. One open-banking ecosystem.

Every formal model in this lab operates on these exact rows. The agentic knowledge graph is built from them.

MERIDIAN.core_banking

SQL

MERIDIAN.cards

SQL

MERIDIAN.wealth

SQL

OB_API.consents

JSON

EXT.tpp_providers

API

The three customers

Sarah Chen — across all 3 banks & consented to a budgeting TPP

Marcus Aldridge — high AUM · BCBS-reportable

Priya Raman — KYC pending · single-bank · cold-start

BudgetEye — regulated TPP (PSD2 AISP)

Tab 00 · The Substrate

The open-banking knowledge graph problem

Three banks. One ecosystem. Customers fragmented across systems. Third-party providers consuming via consented APIs. Every formal model below addresses one slice of this — the agentic graph composes them all.

Why this substrate matters

Open banking forces the bank's catalog to externalise. A regulated third-party provider (TPP) accesses customer data under explicit consent, for a stated purpose, and for a defined retention period. The agent doing the access must identify the customer correctly (Fellegi-Sunter), understand the domain semantically (RIGOR / OWL), map the source schemas (R2RML / RML), validate against contracts (ODCS / SHACL), trace provenance for audit (W3C PROV / Semirings), enforce policy (OPA / Zanzibar), and retrieve over the graph (GraphRAG / SPARQL). Each formal model does one job. The lab demonstrates each, then composes them.

Source Systems

3 internal · 2 external

Total Source Rows

Distinct Humans

Identity Variations

Active TPPs

BudgetEye · PSD2 AISP

Consents

What the lab demonstrates · 10 tabs · 45+ models

01 · Identity

Fellegi-Sunter · Splink · Ditto · Jaro-Winkler · Sorted Neighborhood

Resolve Sarah Chen across all 3 banks with probabilistic linkage + blocking + transformer matching.

02 · Ontology

RIGOR · OWL · RDFS · Description Logic · OntoGPT · Layer Cake

Generate OWL ontology from schemas iteratively. Reason: ISA ⊑ SavingsAccount, illiquid disjoint LiquidAsset.

03 · Mapping

R2RML · RML · OBDA · Ontop

Map relational tables to RDF. Rewrite SPARQL to SQL across 3 banks live.

04 · Schema Matching

Cupid · COMA · Valentine · TaBERT

Match balance ↔ accountBalance ↔ availableBalance across 3 different bank schemas.

05 · Contracts

ODCS · SHACL · ShEx · DCAT · ISO 11179

Validate Sarah's consent payload against SHACL shapes. Fail fast. Enforce semantics.

06 · Provenance

W3C PROV · OPM · Why/Where/How · Semirings · OpenLineage

Trace Marcus's transaction → BCBS report as a polynomial. Replay 7 years later.

07 · Rules & Policy

DMN · SBVR · Datalog · OPA · Cedar · RBAC · ABAC · ReBAC

"Can BudgetEye read Sarah's balance?" Eight policy models · one decision · one audit row.

08 · Query & GraphRAG

SPARQL · Cypher · GQL · GraphRAG · TransE · GraphSAGE

Same question, three query languages. Then watch GraphRAG ground an LLM answer.

09 · Unified KG

All models composed

Everything together. Click any node. Toggle layers. The graph carries the contract.

The thesis. Modern catalogs and open-banking platforms implement these models — often without naming them. By naming them explicitly and demonstrating each on the same substrate, this lab makes the academic spine of an agentic open-banking knowledge graph visible.

Tab 01 · Identity Resolution

Resolving Sarah Chen across three banks

Probabilistic record linkage (Fellegi-Sunter 1969) plus blocking, string similarity, and modern transformer matching — composed on real rows.

Fellegi-Sunter 1969

Jaro-Winkler 1989

Levenshtein 1965

Sorted Neighborhood 1995

Splink 2020

Ditto 2020

DeepMatcher 2018

Magellan / ZeroER

Fellegi-Sunter (1969) Theory for Record Linkage · JASA Foundational

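For each comparable field, estimate m = P(agree | match) and u = P(agree | non-match). A field that agrees contributes log₂(m/u); a field that disagrees contributes log₂((1−m)/(1−u)). The sum is the match weight, and two thresholds split it into MATCH / POSSIBLE / NON-MATCH.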
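
The weight calculation can be sketched in a few lines of Python. The field names, the m/u values and both thresholds below are illustrative assumptions, not figures from the lab; in practice m and u are estimated from data.

```python
import math

# Hypothetical m/u probabilities per field (assumed for illustration only).
FIELD_PARAMS = {
    "surname":  {"m": 0.95, "u": 0.01},
    "dob":      {"m": 0.98, "u": 0.003},
    "postcode": {"m": 0.85, "u": 0.02},
}

def match_weight(agreement, params=FIELD_PARAMS):
    """Sum the per-field log2 likelihood ratios for an agreement vector."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = params[field]["m"], params[field]["u"]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

def classify(weight, upper=8.0, lower=0.0):
    """Two-threshold decision; the threshold values are assumptions."""
    if weight >= upper:
        return "MATCH"
    if weight <= lower:
        return "NON-MATCH"
    return "POSSIBLE"

all_agree = {"surname": True, "dob": True, "postcode": True}
dob_only = {"surname": False, "dob": True, "postcode": False}
none_agree = {"surname": False, "dob": False, "postcode": False}

for vector in (all_agree, dob_only, none_agree):
    w = match_weight(vector)
    print(round(w, 2), classify(w))
```

Rare agreements (small u) carry the most weight, which is why a date-of-birth match counts for more than a postcode match in this sketch.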

CANDIDATES UNDER COMPARISON:

CORE.C001 CARDS.CD-0451 WEALTH.WM-7821

String similarity Jaro-Winkler 1989 · Levenshtein 1965

The actual functions that feed FS comparison vectors. Compare names across banks:

A	B	JW	Lev
Sarah Chen	Chen Sarah	0.94	10
Sarah Chen	S. Chen	0.82	4
Sarah Chen	Marcus Aldridge	0.41	12
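
Both columns can be computed with textbook implementations. The sketch below assumes case-sensitive, character-level comparison with the usual Winkler prefix scale of 0.1; scores depend on such details (case folding, token ordering), so the output need not reproduce the table exactly.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or keep
        prev = curr
    return prev[-1]

def jaro(s: str, t: str) -> float:
    """Jaro similarity: matches within a window, penalised by transpositions."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1 - base)

for a, b in [("Sarah Chen", "Chen Sarah"), ("Sarah Chen", "S. Chen")]:
    print(a, "|", b, "|", round(jaro_winkler(a, b), 2), "|", levenshtein(a, b))
```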

Sorted Neighborhood Hernandez-Stolfo 1995

Reduces O(n²) comparisons by sorting on a key (Soundex / dob-year) then comparing only within a sliding window. Makes cross-bank ER tractable.

// Across 3 banks: 7 rows → C(7,2) = 21 comparisons
// With blocking on dob-year:
window(1989): [CORE.C001, CARDS.CD-0451, WEALTH.WM-7821]
window(1972): [CORE.C002, CARDS.CD-0892, WEALTH.WM-7944]
window(1995): [CORE.C003]
// 3+3+0 = 6 pair-comparisons instead of 21 (71% reduction)
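
The pair counts can be checked with a small sketch. It uses the seven record ids and dob-years listed above; the window size of 3 for the sliding-window variant is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# (record id, dob-year) as listed in the windows above.
RECORDS = [
    ("CORE.C001", 1989), ("CARDS.CD-0451", 1989), ("WEALTH.WM-7821", 1989),
    ("CORE.C002", 1972), ("CARDS.CD-0892", 1972), ("WEALTH.WM-7944", 1972),
    ("CORE.C003", 1995),
]

def all_pairs(records):
    """Exhaustive comparison: every unordered pair of records."""
    return list(combinations([rid for rid, _ in records], 2))

def blocked_pairs(records):
    """Exact blocking: compare only records that share the same key."""
    blocks = defaultdict(list)
    for rid, key in records:
        blocks[key].append(rid)
    return [pair for ids in blocks.values() for pair in combinations(ids, 2)]

def sorted_neighbourhood_pairs(records, window=3):
    """Sort on the key, then compare each record with the next window-1 records."""
    ordered = [rid for rid, _ in sorted(records, key=lambda rec: rec[1])]
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:i + window]:
            pairs.append((left, right))
    return pairs

print(len(all_pairs(RECORDS)))                   # exhaustive
print(len(blocked_pairs(RECORDS)))               # exact blocking on dob-year
print(len(sorted_neighbourhood_pairs(RECORDS)))  # sliding window of 3
```

Exact blocking never compares across keys, while the sliding window also compares neighbours whose keys differ slightly. That costs a few more comparisons but tolerates a mistyped year.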

Modern stack · scale + neural Splink 2020 · Ditto 2020 · DeepMatcher 2018 · Magellan / ZeroER Production

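Splink applies FS at scale and estimates m and u with expectation-maximisation, so no labelled pairs are needed. Ditto fine-tunes a pre-trained transformer (BERT family) for matching; DeepMatcher uses earlier RNN- and attention-based networks. Magellan is a supervised entity-matching toolkit, and ZeroER matches with zero labelled examples. All output a confidence score; the catalog reconciles them.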

Classical FS · Splink (EM) · Ditto (BERT) · ZeroER · Consensus (scores are filled in when the comparison runs)

Tab 02 · Ontology & Semantics

From schemas to OWL — and reasoning over it

RIGOR generates the ontology iteratively. OWL 2 DL gives the formal logic. A Description Logic reasoner draws inferences. OntoGPT extracts from text. The layer cake puts it all in order.

RIGOR 2025

OWL / OWL 2 (W3C)

RDFS (W3C 2004)

Description Logic

OntoGPT 2023

Layer Cake (Buitelaar-Cimiano-Magnini 2005)

FIBO / BIAN / ISO 20022

RIGOR (Nayyeri et al., 2025) Retrieval-Augmented Iterative Generation of RDB Ontologies Frontier

For each table: retrieve schema + domain ontology (FIBO/BIAN) + growing core ontology → Gen-LLM produces delta-ontology fragment → Judge-LLM validates → merge. Iterate following foreign-key constraints.

RIGOR Pipeline · Live

Growing OWL Ontology · Core

Description Logic Reasoner SROIQ · the formal logic behind OWL 2 DL Reasoning

Once the ontology exists, a DL reasoner draws inferences. Premises: ISA ⊑ SavingsAccount; SavingsAccount ⊑ Account ⊓ ∃hasWithdrawalPenalty.Penalty; ∃hasWithdrawalPenalty.Penalty ⊓ LiquidAsset ⊑ ⊥. Inference: ISA ⊑ ¬LiquidAsset, i.e. ISA accounts are not liquid assets.

OntoGPT 2023

Extract structured ontology-aligned knowledge from unstructured text. Example: a regulatory policy doc → ontology candidates.

Input doc:
  "All ISAs must have a 90-day withdrawal restriction or pay a penalty of 5% of withdrawn amount."
OntoGPT extraction:
  Concept:  ISA
  Property: hasWithdrawalRestriction   range: Duration ≥ 90 days
  Property: hasEarlyWithdrawalPenalty  range: Percentage = 5%

Ontology Learning Layer Cake Buitelaar, Cimiano & Magnini 2005

The canonical layered pipeline that ontology-learning methods instantiate, from raw text up to axioms:

▲ Axioms (rules)
│ Relations
│ Concept hierarchies
│ Concepts
│ Synonyms
│ Terms
└─ Raw text / data

RDFS · the lightweight base W3C 2004 · classes, subclasses, domain, range

Before full OWL reasoning, RDFS provides the semantic backbone: classes, sub-class relations, property domains and ranges. OWL ontologies serialise as RDF triples and reuse this vocabulary, so an RDFS-only consumer can still read the class hierarchy.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ob: <http://meridian.bank/ontology/> .

ob:Customer     rdfs:subClassOf ob:LegalPerson .
ob:WealthClient rdfs:subClassOf ob:Customer .
ob:hasAccount   rdfs:domain ob:Customer ;
                rdfs:range  ob:Account .
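
What these four statements entail can be shown with a minimal sketch of RDFS type inference in plain Python. The instance triples (`ex:sarah`, `ex:marcus`, `ex:acct1`) are hypothetical, and only the subclass, domain and range rules are covered, not the full RDFS rule set.

```python
from collections import defaultdict

SUBCLASS_OF = {
    ("ob:Customer", "ob:LegalPerson"),
    ("ob:WealthClient", "ob:Customer"),
}
DOMAIN = {"ob:hasAccount": "ob:Customer"}
RANGE = {"ob:hasAccount": "ob:Account"}

def superclasses(cls, subclass_of=SUBCLASS_OF):
    """All transitive superclasses of cls."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found

def infer_types(triples):
    """Apply the domain, range and subclass rules to instance triples."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        if p in DOMAIN:
            types[s].add(DOMAIN[p])   # the subject gets the property's domain
        if p in RANGE:
            types[o].add(RANGE[p])    # the object gets the property's range
    for classes in types.values():
        for cls in list(classes):
            classes |= superclasses(cls)
    return dict(types)

# Hypothetical instance data.
TRIPLES = [
    ("ex:sarah", "ob:hasAccount", "ex:acct1"),
    ("ex:marcus", "rdf:type", "ob:WealthClient"),
]
for node, classes in sorted(infer_types(TRIPLES).items()):
    print(node, sorted(classes))
```

Note that `ex:sarah` is never declared a customer; the domain of `ob:hasAccount` is enough to infer it.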

Tab 03 · Schema → Graph Mapping

From relational tables to a virtual knowledge graph

R2RML and RML declare the mapping from tabular data to RDF. OBDA / Ontop rewrite SPARQL queries into SQL against the live sources — no data is moved. Watch a SPARQL query traverse three banks live.

R2RML (W3C 2012)

RML (Ghent · 2014 · 2024 spec)

OBDA · Calvanese 2007+

Ontop

RDF (W3C 1999)

R2RML mapping fragment W3C 2012 · the standard relational→RDF mapping language Standard

Declarative TriplesMap: a logical source (SQL view), a subject template, predicate-object maps. RIGOR generates these automatically.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix fibo: <https://spec.edmcouncil.org/fibo/> .
@prefix ob: <http://meridian.bank/ontology/> .

<#CustomerMap>
    rr:logicalTable [ rr:tableName "MERIDIAN.core_banking" ] ;
    rr:subjectMap [
        rr:template "http://meridian.bank/customer/{customer_id}" ;
        rr:class fibo:Customer ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate ob:hasFullName ;
        rr:objectMap [ rr:column "name" ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate ob:hasKycStatus ;
        rr:objectMap [ rr:column "kyc" ]
    ] .
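
To make the mapping concrete, here is a rough sketch of what an R2RML processor does with one row. The dictionary layout and the sample row values are assumptions for illustration, and IRI-safe percent-encoding of template values is skipped for brevity.

```python
# The TriplesMap above, restated as plain data (layout is an assumption).
CUSTOMER_MAP = {
    "subject_template": "http://meridian.bank/customer/{customer_id}",
    "class": "fibo:Customer",
    "predicate_object_maps": [
        ("ob:hasFullName", "name"),
        ("ob:hasKycStatus", "kyc"),
    ],
}

def row_to_triples(row, mapping=CUSTOMER_MAP):
    """Generate RDF triples for one relational row."""
    subject = mapping["subject_template"].format(**row)
    triples = [(subject, "rdf:type", mapping["class"])]
    for predicate, column in mapping["predicate_object_maps"]:
        value = row.get(column)
        if value is not None:  # a NULL column produces no triple
            triples.append((subject, predicate, value))
    return triples

# Hypothetical row from MERIDIAN.core_banking.
row = {"customer_id": "C001", "name": "Sarah Chen", "kyc": "VERIFIED"}
for triple in row_to_triples(row):
    print(triple)
```

An OBDA engine such as Ontop does not materialise these triples; it uses the same mapping in reverse to rewrite queries.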

RML · the JSON/CSV/API extension Ghent IDLab · 2014 · v2 spec 2024

Open banking sources are API/JSON-heavy. RML generalises R2RML's logical table to an rml:logicalSource with a source, a reference formulation (for example JSONPath) and an iterator, so the same mapping grammar applies to PSD2 JSON responses.

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ob: <http://meridian.bank/ontology/> .

<#ConsentMap>
    rml:logicalSource [
        rml:source "/ob/v3.1/aisp/consents.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.Data.Consent[*]"
    ] ;
    rr:subjectMap [
        rr:template "http://meridian.bank/consent/{ConsentId}" ;
        rr:class ob:Consent
    ] .

OBDA · Virtual Knowledge Graph Calvanese et al. · Ontop · the bank's data stays put Live

The agent writes SPARQL. Ontop rewrites it to SQL across all three banks. Data is never physically moved.

Agent's SPARQL query

PREFIX ob: <http://meridian.bank/ontology/>
PREFIX fibo: <https://spec.edmcouncil.org/fibo/>

SELECT ?customer ?name ?aum ?cardTier
WHERE {
    ?customer a ob:WealthClient .
    ?customer ob:hasFullName ?name .
    ?customer ob:hasAum ?aum .
    ?customer ob:holdsCard ?card .
    ?card ob:hasCardTier ?cardTier .
    FILTER(?aum > 1000000)
}

Ontop rewriter output

Tab 04 · Schema Matching

When three banks call the same thing three different names

balance · accountBalance · availableBalance — same concept, three labels. Cupid uses linguistic + structural similarity. COMA composes multiple matchers. Valentine benchmarks them. TaBERT does it with transformers.

Cupid · Madhavan-Bernstein-Rahm 2001

COMA / COMA++ 2002-2005

Valentine · TU Delft 2021

TaBERT / TURL / Tabbie 2020-21

LogMap (Oxford)

Three bank schemas · one shared concept Interactive

Each bank labels the customer's spendable funds differently. The matcher must align them to the canonical concept ob:availableBalance.

MERIDIAN.core_banking

field: balance

decimal(13,2) · GBP · updated daily 02:00 UTC

MERIDIAN.cards

field: avail_credit

decimal · credit limit minus outstanding

MERIDIAN.wealth

field: cashBalanceGBP

numeric · settled cash · excludes pending

Why composite matching · COMA Aumueller-Do-Massmann-Rahm 2002-2005

No single matcher works in isolation. COMA runs linguistic (Levenshtein on labels), structural (graph topology of foreign keys), type (data-type compatibility), and instance (sample value overlap) matchers in parallel and combines their scores with a configurable aggregation strategy (for example, a weighted average). The composite score is what the catalog records.
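
A weighted average is one simple way to combine the four matcher outputs. In the sketch below the weights and the per-matcher scores are invented for illustration; COMA itself offers several aggregation strategies and this is not its implementation.

```python
# Hypothetical matcher weights (assumed; they sum to 1.0).
WEIGHTS = {"linguistic": 0.3, "structural": 0.2, "type": 0.2, "instance": 0.3}

def composite_score(scores, weights=WEIGHTS):
    """Weighted average over whichever matchers produced a score."""
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total_weight

# Hypothetical scores for aligning each source field to ob:availableBalance.
CANDIDATES = {
    "core_banking.balance":  {"linguistic": 0.8, "structural": 0.6, "type": 1.0, "instance": 0.7},
    "cards.avail_credit":    {"linguistic": 0.6, "structural": 0.5, "type": 1.0, "instance": 0.4},
    "wealth.cashBalanceGBP": {"linguistic": 0.5, "structural": 0.7, "type": 1.0, "instance": 0.8},
}

for field, scores in CANDIDATES.items():
    print(field, round(composite_score(scores), 2))
```

Normalising by the weights actually present lets the catalog score a field even when one matcher (say, instance overlap) could not run.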

Valentine benchmark · is the match good? TU Delft 2021 · the matcher's report card

Valentine provides labeled benchmarks (TPC-DI, Magellan, OpenData) so you can measure precision/recall/F1 of any matcher. Without it, agentic schema matching is unverifiable. With it, the catalog can record: "Cupid on this corpus: F1 = 0.82. TaBERT: F1 = 0.91. We trust TaBERT above threshold 0.85 with steward review below."
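
The metrics themselves are straightforward to compute once a labelled ground truth exists. This sketch assumes matches are represented as (source field, canonical concept) pairs; the predicted set, including the wrong `wealth.portfolio_value` match, is made up for illustration.

```python
def precision_recall_f1(predicted, truth):
    """Score a set of predicted correspondences against the ground truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

TRUTH = {
    ("core_banking.balance", "ob:availableBalance"),
    ("cards.avail_credit", "ob:availableBalance"),
    ("wealth.cashBalanceGBP", "ob:availableBalance"),
}
# Hypothetical matcher output: two correct matches, one wrong, one missed.
PREDICTED = {
    ("core_banking.balance", "ob:availableBalance"),
    ("cards.avail_credit", "ob:availableBalance"),
    ("wealth.portfolio_value", "ob:availableBalance"),
}

p, r, f1 = precision_recall_f1(PREDICTED, TRUTH)
print(round(p, 2), round(r, 2), round(f1, 2))
```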

Tab 05 · Data Contracts & Semantic Validation

Sarah's consent enters BudgetEye — is it valid?

ODCS captures the contract metadata (schema, SLA, ownership). SHACL validates the actual RDF graph against required shapes. ShEx is the compact alternative. DCAT publishes data products. ISO 11179 registers data elements. Together: enforceable semantic contracts.

ODCS · Open Data Contract Standard

SHACL (W3C 2017)

ShEx 2014+

DCAT v3 (W3C 2024)

Dublin Core

ISO/IEC 11179

ODCS contract for Sarah's consent Open Data Contract Standard · YAML Contract

The TPP cannot consume the data product without an explicit contract. The contract names the schema, SLA, ownership, quality expectations, and access conditions.

version: 1.0.0
kind: DataContract
apiVersion: v3.0.1
id: ob.consent.aisp.v1
info:
  title: Open Banking AISP Consent
  owner: meridian.bank/data-office
  jurisdiction: UK
  regulation: [PSD2, UK_OB_v3.1, FAPI2]
schema:
  - name: consent
    physicalName: OB_API.consents
    properties:
      - name: consent_id
        primary: true
        required: true
      - name: customer_id
        required: true
        fk: Customer
      - name: tpp_id
        required: true
        fk: TPP
      - name: scope
        required: true
        cardinality: 1..n
      - name: expires_at
        required: true
        type: timestamp
      - name: status
        required: true
        enum: [ACTIVE, REVOKED, EXPIRED]
quality:
  - rule: "status==ACTIVE implies expires_at > now()"
  - rule: "every consent_id is unique"
  - rule: "scope MUST be in [accounts:read, balances:read, transactions:read]"
sla:
  availability: 99.95%
  latency_p99_ms: 250
  retention_days: 2557 # 7 years
access:
  classification: CONFIDENTIAL_PII
  policy: ob.policy.consent.tpp_access.v3

SHACL · validating the graph W3C 2017 · shapes constraint language Validator

The contract is the spec. SHACL is the runtime enforcement. Every consent flowing into the KG passes the validator first. Reject early.

SHACL Shape · ob:ConsentShape

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ob: <http://meridian.bank/ontology/> .

ob:ConsentShape a sh:NodeShape ;
    sh:targetClass ob:Consent ;
    sh:property [
        sh:path ob:authorisedBy ;
        sh:class ob:Customer ;
        sh:minCount 1 ;
        sh:maxCount 1
    ] ;
    sh:property [
        sh:path ob:hasScope ;
        sh:in ("accounts:read" "balances:read" "transactions:read") ;
        sh:minCount 1
    ] ;
    sh:property [
        sh:path ob:hasExpiry ;
        sh:datatype xsd:dateTime ;
        sh:minCount 1
    ] .
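
The checks this shape performs can be mirrored in plain Python to show what "reject early" means in practice. This is only a sketch: it represents a consent as a dictionary of property values (an assumption), skips the sh:class check on ob:authorisedBy, and is no substitute for a real SHACL engine.

```python
from datetime import datetime

ALLOWED_SCOPES = {"accounts:read", "balances:read", "transactions:read"}

def validate_consent(consent):
    """Return a list of violations; an empty list means the consent conforms."""
    violations = []

    authorised_by = consent.get("authorisedBy", [])
    if len(authorised_by) != 1:  # sh:minCount 1, sh:maxCount 1
        violations.append("ob:authorisedBy must have exactly one value")

    scopes = consent.get("hasScope", [])
    if not scopes:  # sh:minCount 1
        violations.append("ob:hasScope needs at least one value")
    for scope in scopes:  # sh:in
        if scope not in ALLOWED_SCOPES:
            violations.append(f"ob:hasScope value not allowed: {scope}")

    expiries = consent.get("hasExpiry", [])
    if not expiries:  # sh:minCount 1
        violations.append("ob:hasExpiry needs at least one value")
    for value in expiries:  # sh:datatype xsd:dateTime, approximated by ISO 8601 parsing
        try:
            datetime.fromisoformat(value)
        except (TypeError, ValueError):
            violations.append(f"ob:hasExpiry is not a dateTime: {value}")

    return violations

# Hypothetical test consents.
valid = {"authorisedBy": ["ex:sarah"], "hasScope": ["balances:read"],
         "hasExpiry": ["2026-12-31T23:59:59+00:00"]}
invalid = {"authorisedBy": [], "hasScope": ["payments:write"],
           "hasExpiry": ["next year"]}

print(validate_consent(valid))
print(validate_consent(invalid))
```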

Validation report · live

ShEx · the compact alternative 2014+

Shape Expressions — same job as SHACL, more compact syntax, easier to author by hand. Some practitioners pair the two: ShEx for definition, SHACL for execution.

PREFIX ob: <http://meridian.bank/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ob:ConsentShape {
    ob:authorisedBy @<CustomerShape> ;
    ob:grantedTo    @<TPPShape> ;
    ob:hasScope     [ "accounts:read" "balances:read" "transactions:read" ]+ ;
    ob:hasExpiry    xsd:dateTime ;
    ob:hasStatus    [ "ACTIVE" "REVOKED" "EXPIRED" ]
}

DCAT · publishing the data product W3C · v3 2024

Once the contract is valid, the data product is published via DCAT. Dataset, Distribution, DataService nodes. Combined with Dublin Core for metadata (title, creator, license).

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

:obConsent_v3 a dcat:Dataset ;
    dct:title "Open Banking AISP Consents" ;
    dct:publisher :meridian ;
    dct:license :fapi2_license ;
    dcat:distribution :obConsent_v3_json ;
    dcat:contactPoint :dataoffice ;
    dct:conformsTo :ob.consent.aisp.v1 .

ISO/IEC 11179 · the registry of data elements 1990s+ · the discipline behind any glossary

Every data element in the contract — consent_id, customer_id, scope — is registered with a definition, name, identifier, value domain, and steward. ISO 11179 is what makes a bank's glossary defensible to a regulator.

Tab 06 · Provenance & Lineage

Marcus's transaction → BCBS report — five formal models

The Open Provenance Model gave us the basic vocabulary. W3C PROV made it a standard. Provenance Semirings gave us the algebra of trust. Why/Where/How gave us three formal questions to ask. OpenLineage made it operational.

W3C PROV-DM (2013)

OPM · Moreau 2008

Why-Prov · Cui-Widom 2000

Where-Prov · Buneman 2001

How-Prov · Green 2007

Provenance Semirings · PODS 2007

OpenLineage 2020

Provenance Semirings (PODS 2007) Green · Karvounarakis · Tannen · University of Pennsylvania Algebraic

Annotate each source tuple with a variable. Join (⊗) multiplies. Union (⊕) adds. The result is a polynomial — evaluate it in different semirings to get why-provenance, trust, confidence, multiplicity.
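
A small sketch shows how one polynomial answers several questions. The polynomial t·p·r + t2·p·r is a hypothetical derivation (two alternative transaction tuples, each joined with a position and a risk row), and the annotation values are invented for illustration.

```python
import operator
from functools import reduce

# Sum of products: each inner list is one monomial (one way to derive the fact).
POLY = [["t", "p", "r"], ["t2", "p", "r"]]

def evaluate(poly, valuation, plus, times, zero, one):
    """Evaluate a provenance polynomial in the semiring (plus, times, zero, one)."""
    total = zero
    for monomial in poly:
        product = reduce(times, (valuation[var] for var in monomial), one)
        total = plus(total, product)
    return total

BOOLEAN = dict(plus=operator.or_, times=operator.and_, zero=False, one=True)
COUNTING = dict(plus=operator.add, times=operator.mul, zero=0, one=1)
VITERBI = dict(plus=max, times=operator.mul, zero=0.0, one=1.0)
WHY = dict(plus=lambda a, b: a | b,
           times=lambda a, b: {x | y for x in a for y in b},
           zero=set(), one={frozenset()})

# Is the fact still derivable if tuple t is removed (or distrusted)?
print(evaluate(POLY, {"t": False, "t2": True, "p": True, "r": True}, **BOOLEAN))
# How many derivations, given tuple multiplicities?
print(evaluate(POLY, {"t": 1, "t2": 2, "p": 1, "r": 1}, **COUNTING))
# Confidence of the best derivation.
print(evaluate(POLY, {"t": 0.9, "t2": 0.5, "p": 0.8, "r": 1.0}, **VITERBI))
# Witness sets: which combinations of source tuples suffice?
print(evaluate(POLY, {v: {frozenset({v})} for v in ("t", "t2", "p", "r")}, **WHY))
```

The polynomial is computed once at query time; each later audit question is just a different choice of semiring.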

Why · Where · How — three formal questions Cui-Widom 2000 · Buneman-Khanna-Tan 2001 · Green-Tannen 2007

Every derived fact in the catalog can be queried three ways. Why: which source tuples contributed? Where: which source location did each value come from? How: which derivation steps were applied?

WHY-PROV (witness sets)
> Which tuples produced the BCBS line?
⊢ { txn(T-99041), pos(P-7741), risk(R-CBS-EU), cust(C002) }

WHERE-PROV (locations)
> Where did "EUR Bund" come from?
⊢ TXN_DB.t-99041.instrument @ ingested 2026-03-12T08:14Z

HOW-PROV (derivation)
> How was RWA 625K computed?
⊢ compose(
    join(txn, position, on=client_id, instrument),
    apply(risk_model.lookup(instrument_type)),
    aggregate(group=counterparty_type, op=sum)
  )

W3C PROV-DM · the standard representation 2013 · Entity · Activity · Agent Standard

Provenance semirings provide the algebra. W3C PROV provides the interchange format. Lineage tools such as OpenLineage, Marquez, Atlas and Egeria emit run, job and dataset events that map naturally onto PROV's Activity and Entity.

OPM & OpenLineage · operational Moreau 2008 → W3C 2013 → LFAI 2020

OPM (2008) introduced Artifact-Process-Agent as the three node types. W3C PROV refined this to Entity-Activity-Agent. OpenLineage (2020) made it operational — event-driven emission from every pipeline run. The catalog projects the events into a graph.

// OpenLineage event emitted when risk_aggregate job runs
{
  "eventType": "COMPLETE",
  "eventTime": "2026-04-30T03:00:14Z",
  "run": { "runId": "a8f3..." },
  "job": { "namespace": "meridian.risk", "name": "risk_aggregate" },
  "inputs": [
    { "namespace": "meridian.txn", "name": "transactions" },
    { "namespace": "meridian.pos", "name": "positions" },
    { "namespace": "meridian.risk", "name": "rwa_model" }
  ],
  "outputs": [
    { "namespace": "meridian.report", "name": "bcbs_239_exposure" }
  ]
}

Tab 07 · Rules & Policy

Can BudgetEye read Sarah's balance?

One question. Eight formal models converge on one decision. Business rules from DMN/SBVR/Datalog. Authorisation from RBAC/ABAC/ReBAC. Policy-as-code in OPA/Rego or Cedar. Watch them compose.

DMN 2015

SBVR 2008

Datalog 1977+

RuleML

RBAC 1996

ABAC · NIST 800-162

ReBAC · Zanzibar 2019

OPA / Rego 2018

AWS Cedar 2023

The decision · live Interactive

Configure the request and watch every model produce its verdict. The catalog records the union of all eight decisions plus the final composed allow/deny.

SUBJECT

ACTION · scope

OBJECT

DMN · Decision Model and Notation OMG 2015

Business decisions as tables. Auditable. Owned by business, not engineering. Compiles to executable.

Decision Table · TPP_AccessEligibility
┌─────────────────┬──────────┬──────────┐
│ TPP.regulated   │ TPP.eIDAS│ Result   │
├─────────────────┼──────────┼──────────┤
│ true            │ valid    │ ELIGIBLE │
│ true            │ expired  │ REFUSE   │
│ false           │ –        │ REFUSE   │
└─────────────────┴──────────┴──────────┘
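
The same table can be executed directly as data. This sketch assumes a first-hit policy and a REFUSE fallback when no rule matches; neither is stated in the table, and a DMN engine would normally declare both explicitly.

```python
# Each rule: (conditions, result). A column omitted from the conditions
# corresponds to "–" in the table, i.e. any value.
RULES = [
    ({"regulated": True, "eidas": "valid"}, "ELIGIBLE"),
    ({"regulated": True, "eidas": "expired"}, "REFUSE"),
    ({"regulated": False}, "REFUSE"),
]

def decide(inputs, rules=RULES, default="REFUSE"):
    """Return the result of the first rule whose conditions all hold."""
    for conditions, result in rules:
        if all(inputs.get(column) == value for column, value in conditions.items()):
            return result
    return default  # assumed fallback: refuse when no rule matches

print(decide({"regulated": True, "eidas": "valid"}))
print(decide({"regulated": True, "eidas": "expired"}))
print(decide({"regulated": False, "eidas": "valid"}))
```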

SBVR · Semantics of Business Vocabulary and Rules OMG 2008

Business rules in structured natural language. Bridges business and machine.

// Rule R-OB-001 · expressed in SBVR-SE
It is obligatory that each consent has an expiry date after its creation date.
// Rule R-OB-002
It is prohibited that a TPP accesses an account without an active consent.

The authorisation triad · RBAC / ABAC / ReBAC 1996 · 2014 · 2019

Three philosophies of access decisions, each strongest in a different layer of the open-banking graph.

RBAC · roles

Sandhu et al. 1996

"Anyone with role TPP-AISP can call /balances endpoints." Coarse but auditable.

ABAC · attributes

NIST 800-162 (2014)

"Subject.regulator=FCA AND Resource.classification=PII AND Env.jurisdiction=UK." Composable but verbose.

ReBAC · relationships

Zanzibar 2019

"Sarah granted BudgetEye consent → traverse the consent edge." Native to KGs. The agentic answer.

OPA / Rego · policy-as-code Styra 2018 · Datalog-derived

package ob.tpp.balances

import rego.v1

default allow := false

allow if {
    input.subject.type == "TPP"
    input.subject.regulated == true
    input.action == "balances:read"
    consent := data.consents[_]
    consent.customer == input.object.owner
    consent.tpp == input.subject.id
    "balances:read" in consent.scope
    consent.status == "ACTIVE"
    time.parse_rfc3339_ns(consent.expiry) > time.now_ns()
}
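
For readers who do not use Rego, the rule's logic can be restated in Python. This is a sketch of the same decision, not of how OPA evaluates policies; the consent record, its `CONS-1` id and the fixed clock are hypothetical test data.

```python
from datetime import datetime, timezone

def allow(request, consents, now=None):
    """Default deny; allow only if an active, unexpired, in-scope consent exists."""
    now = now or datetime.now(timezone.utc)
    subject = request["subject"]
    if subject["type"] != "TPP" or subject["regulated"] is not True:
        return False
    if request["action"] != "balances:read":
        return False
    for consent in consents:
        if (consent["customer"] == request["object"]["owner"]
                and consent["tpp"] == subject["id"]
                and "balances:read" in consent["scope"]
                and consent["status"] == "ACTIVE"
                and datetime.fromisoformat(consent["expiry"]) > now):
            return True
    return False

# Hypothetical data for the question "Can BudgetEye read Sarah's balance?"
CONSENTS = [{
    "id": "CONS-1", "customer": "Sarah", "tpp": "BudgetEye",
    "scope": ["accounts:read", "balances:read"], "status": "ACTIVE",
    "expiry": "2027-01-01T00:00:00+00:00",
}]
REQUEST = {
    "subject": {"type": "TPP", "id": "BudgetEye", "regulated": True},
    "action": "balances:read",
    "object": {"owner": "Sarah"},
}
CLOCK = datetime(2026, 5, 1, tzinfo=timezone.utc)

print(allow(REQUEST, CONSENTS, now=CLOCK))
```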

AWS Cedar · formally verified AWS 2023 · provable termination

permit (
    principal in Role::"TPP_AISP",
    action == Action::"ReadBalance",
    resource
)
when {
    resource.owner has consent &&
    resource.owner.consent.tpp == principal &&
    resource.owner.consent.status == "ACTIVE" &&
    resource.owner.consent.scope.contains("balances:read")
};

Datalog · the recursion engine 1977+ · the ancestor of Rego & SpiceDB

OPA's Rego is inspired by Datalog, and Zanzibar-style relationship checks can be written as recursive Datalog rules. The reason: Datalog supports recursion natively, evaluated to a least fixpoint that always terminates on finite data. That is exactly what you need to answer "is X reachable from Y via authorised edges?" in a graph.

// Datalog rules for transitive access
can_read(S, O) :- consent(C, O, S, Scope), active(C), member("balances:read", Scope).
can_read(S, O) :- delegated(S, S2), can_read(S2, O).

?- can_read("BudgetEye", "Sarah").
⊢ yes
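
The two rules can be evaluated bottom-up with a naive fixpoint loop, which is roughly what a Datalog engine does (real engines use semi-naive evaluation and indexing). The consent id `CONS-1` and the delegate `BudgetEyeAnalytics` are hypothetical facts added for illustration.

```python
def can_read_closure(consents, active, delegated, scope="balances:read"):
    """Compute every can_read(S, O) fact derivable from the two rules."""
    # Rule 1: can_read(S, O) :- consent(C, O, S, Scope), active(C), member(scope, Scope).
    facts = {(tpp, owner)
             for cid, owner, tpp, scopes in consents
             if cid in active and scope in scopes}
    # Rule 2: can_read(S, O) :- delegated(S, S2), can_read(S2, O).
    changed = True
    while changed:  # repeat until no new fact appears (the fixpoint)
        changed = False
        for s, s2 in delegated:
            for subject, obj in list(facts):
                if subject == s2 and (s, obj) not in facts:
                    facts.add((s, obj))
                    changed = True
    return facts

CONSENT_FACTS = [("CONS-1", "Sarah", "BudgetEye", {"accounts:read", "balances:read"})]
ACTIVE = {"CONS-1"}
DELEGATED = {("BudgetEyeAnalytics", "BudgetEye")}  # hypothetical delegation

facts = can_read_closure(CONSENT_FACTS, ACTIVE, DELEGATED)
print(("BudgetEye", "Sarah") in facts)
print(("BudgetEyeAnalytics", "Sarah") in facts)
```

Because facts are only ever added and the set of possible (subject, object) pairs is finite, the loop is guaranteed to stop.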

Tab 08 · Query · Reasoning · Retrieval

Same question. Three query languages. Then watch GraphRAG ground an LLM.

SPARQL is W3C's standard for RDF. Cypher rose with Neo4j. ISO GQL standardised the property-graph world in 2024. Then GraphRAG retrieves over the graph for the LLM. KG embeddings (TransE, GraphSAGE) predict missing links and similar customers.

SPARQL 1.1 (W3C 2013)

Cypher

ISO GQL 2024

GraphRAG · MSR 2024

TransE 2013

DistMult / ComplEx / RotatE

Node2Vec · DeepWalk 2014-16

GraphSAGE · GAT · R-GCN 2017-19

Probabilistic Databases

Same question · three languages Standards

"Find all customers who hold a card AND have a wealth portfolio above £1M, with their card tier." One question, expressed three ways:

SPARQL · W3C 2013

SELECT ?c ?name ?aum ?tier
WHERE {
    ?c a ob:Cardholder, ob:WealthClient .
    ?c ob:hasFullName ?name .
    ?c ob:hasAum ?aum .
    ?c ob:holdsCard ?card .
    ?card ob:hasCardTier ?tier .
    FILTER(?aum > 1000000)
}

Cypher · Neo4j

MATCH (c:Customer)
WHERE c:Cardholder AND c:WealthClient AND c.aum > 1000000
MATCH (c)-[:HOLDS_CARD]->(card)
RETURN c.fullName AS name, c.aum AS aum, card.tier AS tier

ISO GQL · 2024

MATCH (c:Customer&Cardholder&WealthClient WHERE c.aum > 1000000)
      -[:holds_card]->(card:Card)
RETURN c.full_name AS name, c.aum, card.tier

The standards story. SPARQL won the RDF/semantic world. Cypher won the property-graph world. ISO GQL (2024) is the convergence — same conceptual model, vendor-neutral. An agentic open-banking platform should emit all three from the same query abstraction, so the catalog can talk to whichever store the bank operates.

GraphRAG · ground the LLM with the graph Microsoft Research 2024 Live

The LLM alone hallucinates. RAG retrieves chunks. GraphRAG retrieves subgraphs — multi-hop, community-aware, with cited edges. Watch a banking question grounded through the KG.

AGENT QUESTION

"Show me all our PLATINUM cardholders who are also Wealth clients with AUM > £1M and have an active TPP consent. What's their typical risk profile?"

KG embeddings · link prediction & similarity TransE 2013 · DistMult · ComplEx · RotatE · GraphSAGE 2017 ML

Embed every entity and relation as a vector. TransE trains h + r ≈ t (head + relation ≈ tail). GraphSAGE samples each node's neighbours and aggregates their features. The catalog can then predict missing edges, similar customers, and suspicious graph patterns (fraud).

TransE example · predicting Sarah's likely TPPs

// Trained embeddings (illustrative)
e(Sarah)    = [0.32, -0.18, 0.71, ...]
e(consents) = [0.05, 0.22, 0.11, ...]

// Predict: who is Sarah likely to consent to?
candidate = argmin_tpp ‖ e(Sarah) + e(consents) - e(tpp) ‖

Candidate TPP	Distance	Verdict
BudgetEye	0.12	already consented
MoneyHub	0.18	likely candidate
Plaid UK	0.21	possible
FraudCo	0.94	very unlikely (anomaly?)
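
The ranking step is a nearest-neighbour search under the TransE score. The sketch below truncates the illustrative embeddings to three dimensions and invents the TPP vectors, so its distances are not the ones in the table above; only the mechanics carry over.

```python
import math

def transe_distance(head, relation, tail):
    """Euclidean norm of head + relation - tail (smaller = more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(candidates,
                  key=lambda name: transe_distance(head, relation, candidates[name]))

E_SARAH = [0.32, -0.18, 0.71]     # first three components from the example above
E_CONSENTS = [0.05, 0.22, 0.11]
TPP_EMBEDDINGS = {                # hypothetical vectors
    "BudgetEye": [0.40, 0.10, 0.90],
    "MoneyHub":  [0.50, 0.00, 0.70],
    "FraudCo":   [-0.30, 0.60, 0.20],
}

for name in rank_tails(E_SARAH, E_CONSENTS, TPP_EMBEDDINGS):
    print(name, round(transe_distance(E_SARAH, E_CONSENTS, TPP_EMBEDDINGS[name]), 3))
```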

GraphSAGE · customer similarity for upsell

// Aggregate neighbour signal
h_Sarah^(k+1) = σ( W · [h_Sarah^(k); MEAN({h_n^(k) ∀ n ∈ N(Sarah)})] )

// Find similar customers
top₅ = argmax_c cos(h_Sarah, h_c)

The bank's marketing agent can now propose: "customers like Sarah upgraded to PLATINUM after 18 months" with cosine similarity as evidence — anchored in the KG, not in a black-box model.

Probabilistic Databases & Knowledge Vault Suciu et al. · Google 2014

Not every fact in the KG is certain. External-bank data, merchant tags, NER-extracted entities come with confidence scores. Probabilistic databases represent every tuple with a probability. Knowledge Vault (Google 2014) was the canonical example: 1.6B facts, each with a confidence.

// Each KG triple carries a probability
(Sarah, employed_by, Acme_Ltd)         p = 0.92  // from KYC docs
(Sarah, employed_by, Beta_Consulting)  p = 0.41  // from a salary credit
(Sarah, spends_at, GreenGrocer)        p = 0.98  // from card txn

Tab 09 · The Synthesis

The Unified Open Banking Knowledge Graph

Entities resolved (Fellegi-Sunter). Ontology applied (RIGOR + OWL + DL). Mapped from schemas (R2RML/RML). Validated against contracts (SHACL). Provenance attached (PROV + Semirings). Policy enforced (OPA + Zanzibar). Queried via SPARQL/Cypher/GQL. Retrieved via GraphRAG. The graph below is the composition of every model above, on Meridian's real rows.

The composed graph · all 45+ models, one canvas Live · Interactive

Toggle the layers to see how each formal model contributes a different facet. Click any node to see its provenance and attestations.

Resolved Entities

Ontology Classes

Provenance & Sources

Consents & TPPs

Policy & Access

What this graph encodes

Source Rows

Resolved Entities

Ontology Axioms

Active Consents

PROV Edges

Policy Rules

Formal Models

45+

Years of Research

since Fellegi-Sunter

Three real customers. One regulated TPP. Forty-five formal models. Every claim in the graph is attributable to a source row, defended by an ontology axiom, validated by a contract, traceable through PROV, governed by a policy, and queryable through three standards. That is the academic contract of an agentic open-banking knowledge graph.

The final thesis. Modern catalogs and the tools around them (Actian, Atlan, Egeria, Glean, Ontop, AuthZed, Open Policy Agent, RIGOR-style generators) implement these academic models, often without naming them. Naming them explicitly, demonstrating each on the same substrate, and composing them in one interactive graph is what makes this lab faithful to the research, to the regulator, and to the agent that has to act.