Meridian Open Banking Lab
Formal Models of the Agentic Knowledge Graph · v2.0
Tab 00 · The Substrate
The open-banking knowledge graph problem
Three banks. One ecosystem. Customers fragmented across systems. Third-party providers consuming via consented APIs. Every formal model below addresses one slice of this — the agentic graph composes them all.
Why this substrate matters
Open banking forces the bank's catalog to externalise. A regulated third-party provider (TPP) accesses customer data under explicit consent, for a stated purpose, for a defined retention period. The agent doing the access must identify the customer correctly (Fellegi-Sunter), understand the domain semantically (RIGOR / OWL), map the source schemas (R2RML / RML), validate against contracts (ODCS / SHACL), trace provenance for audit (W3C PROV / Semirings), enforce policy (OPA / Zanzibar), and retrieve over the graph (GraphRAG / SPARQL). Each formal model does one job. The lab demonstrates each, then composes them.
Same question, three query languages. Then watch GraphRAG ground an LLM answer.
09 · Unified KG
All models composed
Everything together. Click any node. Toggle layers. The graph carries the contract.
The thesis. Modern catalogs and open-banking platforms implement these models — often without naming them. By naming them explicitly and demonstrating each on the same substrate, this lab makes the academic spine of an agentic open-banking knowledge graph visible.
Tab 01 · Identity Resolution
Resolving Sarah Chen across three banks
Probabilistic record linkage (Fellegi-Sunter 1969) plus blocking, string similarity, and modern transformer matching — composed on real rows.
Fellegi-Sunter 1969
Jaro-Winkler 1989
Levenshtein 1965
Sorted Neighborhood 1995
Splink 2020
Ditto 2020
DeepMatcher 2018
Magellan / ZeroER
Fellegi-Sunter (1969) · A Theory for Record Linkage · JASA · Foundational
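For each comparable field, compute m (P[agree | match]) and u (P[agree | non-match]). Sum log2(m/u) over the fields that agree and log2((1-m)/(1-u)) over the fields that disagree to get the match weight. Two thresholds then split the weight into MATCH / POSSIBLE / NON-MATCH.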
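The weight calculation can be sketched in a few lines of Python. The m and u values, the field names, and the two thresholds below are illustrative assumptions, not figures estimated from Meridian's rows.

```python
import math

# Assumed m/u probabilities per field (illustrative, not estimated from data).
FIELD_PARAMS = {
    "surname":       {"m": 0.95, "u": 0.01},
    "date_of_birth": {"m": 0.98, "u": 0.003},
    "postcode":      {"m": 0.85, "u": 0.02},
}

def match_weight(agreements: dict) -> float:
    """Sum log2 likelihood ratios: m/u on agreement, (1-m)/(1-u) on disagreement."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]["m"], FIELD_PARAMS[field]["u"]
        ratio = m / u if agrees else (1 - m) / (1 - u)
        total += math.log2(ratio)
    return total

def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Two (assumed) thresholds split the weight axis into three decisions."""
    if weight >= upper:
        return "MATCH"
    if weight <= lower:
        return "NON-MATCH"
    return "POSSIBLE"

all_agree = match_weight({"surname": True, "date_of_birth": True, "postcode": True})
dob_differs = match_weight({"surname": True, "date_of_birth": False, "postcode": True})
none_agree = match_weight({"surname": False, "date_of_birth": False, "postcode": False})

for label, w in [("all agree", all_agree), ("dob differs", dob_differs), ("none agree", none_agree)]:
    print(f"{label}: weight={w:.2f} -> {classify(w)}")
```

A disagreement on a highly discriminating field such as date of birth pulls the weight down sharply, which is why the second pair lands in the POSSIBLE band rather than MATCH.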
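Splink applies FS at scale and estimates m and u with expectation-maximisation (no labels needed). Ditto fine-tunes pre-trained transformers such as BERT; DeepMatcher uses attention-based neural networks over word embeddings. Magellan is an end-to-end entity-matching toolkit, and ZeroER matches without labelled examples. All output a confidence score; the catalog reconciles them.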
Matcher scores (filled in by the live demo): Classical FS · Splink (EM) · Ditto (BERT) · ZeroER · Consensus
Tab 02 · Ontology & Semantics
From schemas to OWL — and reasoning over it
RIGOR generates the ontology iteratively. OWL 2 DL gives the formal logic. A Description Logic reasoner draws inferences. OntoGPT extracts from text. The layer cake puts it all in order.
RIGOR 2025
OWL / OWL 2 (W3C)
RDFS (W3C 2004)
Description Logic
OntoGPT 2023
Layer Cake (Cimiano-Mädche)
FIBO / BIAN / ISO 20022
RIGOR (Nayyeri et al., 2025) · Retrieval-Augmented Iterative Generation of RDB Ontologies · Frontier
For each table: retrieve schema + domain ontology (FIBO/BIAN) + growing core ontology → Gen-LLM produces delta-ontology fragment → Judge-LLM validates → merge. Iterate following foreign-key constraints.
Once the ontology exists, a DL reasoner draws inferences. Premises: ISA ⊑ SavingsAccount; SavingsAccount ⊑ Account ⊓ ∃hasWithdrawalPenalty.Penalty; ∃hasWithdrawalPenalty.Penalty ⊑ ¬LiquidAsset (anything carrying a withdrawal penalty is disjoint from LiquidAsset). Inference: ISA ⊑ ¬LiquidAsset, so ISA accounts are not liquid assets.
OntoGPT · 2023
Extract structured ontology-aligned knowledge from unstructured text. Example: a regulatory policy doc → ontology candidates.
Input doc: "All ISAs must have a 90-day withdrawal restriction or pay a penalty of 5% of withdrawn amount."
OntoGPT extraction:
Concept: ISA
  Property: hasWithdrawalRestriction
    range: Duration ≥ 90 days
  Property: hasEarlyWithdrawalPenalty
    range: Percentage = 5%
Ontology Learning Layer Cake · Cimiano & Mädche 2005
The canonical 7-layer pipeline that every ontology-learning method instantiates:
▲ Axioms (rules)
│ Relations
│ Concept hierarchies
│ Concepts
│ Synonyms
│ Terms
└─ Raw text / data
RDFS · the lightweight base · W3C 2004 · classes, subclasses, domain, range
Before full OWL reasoning, RDFS provides the semantic backbone: classes, sub-class relations, property domains and ranges. Richer OWL ontologies still serialise to RDF triples and reuse the RDFS vocabulary, which keeps them portable.
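As a rough illustration of what that backbone buys, the sketch below applies four RDFS entailment rules (domain, range, subclass membership, subclass transitivity) to a handful of triples until nothing new is derived. The ob: class and property names and the instance identifiers are hypothetical; a real deployment would use an RDF library and reasoner rather than this naive loop.

```python
# Triples as (subject, predicate, object). All ob: names are hypothetical.
triples = {
    ("ob:ISA", "rdfs:subClassOf", "ob:SavingsAccount"),
    ("ob:SavingsAccount", "rdfs:subClassOf", "ob:Account"),
    ("ob:holdsAccount", "rdfs:domain", "ob:Customer"),
    ("ob:holdsAccount", "rdfs:range", "ob:Account"),
    ("ob:sarah", "ob:holdsAccount", "ob:acct42"),
    ("ob:acct42", "rdf:type", "ob:ISA"),
}

def rdfs_closure(graph):
    """Apply rules rdfs2, rdfs3, rdfs9 and rdfs11 until a fixpoint is reached."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # rdfs11: subClassOf is transitive
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs2: the property's domain types the subject
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs3: the property's range types the object
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_closure(triples)
for triple in sorted(closure - triples):
    print(triple)
```

Nothing in the input says Sarah is a customer; that fact falls out of the domain declaration on the property.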
From relational tables to a virtual knowledge graph
R2RML and RML declare the mapping from tabular data to RDF. OBDA / Ontop rewrite SPARQL queries into SQL against the live sources — no data is moved. Watch a SPARQL query traverse three banks live.
R2RML (W3C 2012)
RML (Ghent · 2014 · 2024 spec)
OBDA · Calvanese 2007+
Ontop
RDF (W3C 1999)
R2RML mapping fragment · W3C 2012 · the standard relational→RDF mapping language · Standard
Declarative TriplesMap: a logical source (SQL view), a subject template, predicate-object maps. RIGOR generates these automatically.
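To make the TriplesMap idea concrete, here is a minimal Python sketch that reduces a mapping to a dict and applies it to one row. It is not an R2RML processor: the table, column, and ob: names are hypothetical, and real R2RML also percent-encodes template values and types its literals, which this sketch skips.

```python
import re

# A TriplesMap reduced to a dict. Table, column and ob: names are hypothetical.
triples_map = {
    "logical_table": "ACCOUNTS",
    "subject_template": "http://example.org/account/{ACCOUNT_ID}",
    "class": "ob:Account",
    "predicate_object_maps": [
        {"predicate": "ob:availableBalance", "column": "BALANCE"},
        {"predicate": "ob:heldBy", "template": "http://example.org/customer/{CUST_ID}"},
    ],
}

def fill(template: str, row: dict) -> str:
    """Replace each {COLUMN} placeholder with the row's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)

def map_row(tmap: dict, row: dict) -> list:
    """Emit one rdf:type triple plus one triple per predicate-object map."""
    subject = fill(tmap["subject_template"], row)
    out = [(subject, "rdf:type", tmap["class"])]
    for pom in tmap["predicate_object_maps"]:
        if "column" in pom:
            obj = row[pom["column"]]          # literal taken from a column
        else:
            obj = fill(pom["template"], row)  # IRI built from a template
        out.append((subject, pom["predicate"], obj))
    return out

rows = [{"ACCOUNT_ID": "A-1", "CUST_ID": "C001", "BALANCE": 1250.40}]
result = [t for row in rows for t in map_row(triples_map, row)]
for triple in result:
    print(triple)
```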
Open banking sources are API/JSON-heavy. RML adds a rml:source with format/iterator/reference so the same mapping grammar applies to PSD2 JSON responses.
Tab 04 · Schema Matching
When three banks call the same thing three different names
balance · accountBalance · availableBalance — same concept, three labels. Cupid uses linguistic + structural similarity. COMA composes multiple matchers. Valentine benchmarks them. TaBERT does it with transformers.
Cupid · Madhavan-Bernstein-Rahm 2001
COMA / COMA++ 2002-2005
Valentine · TU Delft 2021
TaBERT / TURL / Tabbie 2020-21
LogMap (Oxford)
Three bank schemas · one shared concept · Interactive
Each bank labels the customer's spendable funds differently. The matcher must align them to the canonical concept ob:availableBalance.
No single matcher works in isolation. COMA runs linguistic (Levenshtein on labels), structural (graph topology of foreign keys), type (data-type compatibility), and instance (sample value overlap) matchers in parallel and combines their scores with a configurable aggregation such as a weighted average. The composite score is what the catalog records.
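A composite matcher in this spirit might look like the sketch below. It is not COMA itself: the weights are assumed, the structural matcher is left out for brevity, and the column descriptions and sample values are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def name_sim(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

def type_sim(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0

def instance_sim(a, b) -> float:
    """Jaccard overlap of sampled values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Assumed weights; a production matcher would tune or configure these.
WEIGHTS = {"name": 0.4, "type": 0.2, "instance": 0.4}

def composite(col_a: dict, col_b: dict) -> float:
    scores = {
        "name": name_sim(col_a["name"], col_b["name"]),
        "type": type_sim(col_a["type"], col_b["type"]),
        "instance": instance_sim(col_a["samples"], col_b["samples"]),
    }
    return sum(WEIGHTS[k] * v for k, v in scores.items())

bank_a = {"name": "balance", "type": "decimal", "samples": [100.0, 250.5, 980.0]}
bank_b = {"name": "accountBalance", "type": "decimal", "samples": [250.5, 980.0, 40.0]}
other = {"name": "sortCode", "type": "string", "samples": ["20-00-00"]}

print(round(composite(bank_a, bank_b), 3))
print(round(composite(bank_a, other), 3))
```

The label matcher alone scores balance against accountBalance at only 0.5; the type and instance matchers are what lift the pair above an unrelated column.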
Valentine benchmark · is the match good? · TU Delft 2021 · the matcher's report card
Valentine provides labeled benchmarks (TPC-DI, Magellan, OpenData) so you can measure precision/recall/F1 of any matcher. Without it, agentic schema matching is unverifiable. With it, the catalog can record: "Cupid on this corpus: F1 = 0.82. TaBERT: F1 = 0.91. We trust TaBERT above threshold 0.85 with steward review below."
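The report-card arithmetic is simple enough to sketch directly. The column pairs below are hypothetical ground truth and predictions, not results from Valentine or from any matcher named above.

```python
def precision_recall_f1(predicted: set, truth: set):
    """Score a set of predicted column correspondences against ground truth."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical ground-truth correspondences between two bank schemas.
truth = {
    ("balance", "accountBalance"),
    ("cust_id", "customerId"),
    ("sort_code", "sortCode"),
    ("iban", "IBAN"),
}
# Hypothetical matcher output: two correct pairs, one wrong pair.
predicted = {
    ("balance", "accountBalance"),
    ("cust_id", "customerId"),
    ("iban", "sortCode"),
}

p, r, f1 = precision_recall_f1(predicted, truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```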
Tab 05 · Data Contracts & Semantic Validation
Sarah's consent enters BudgetEye — is it valid?
ODCS captures the contract metadata (schema, SLA, ownership). SHACL validates the actual RDF graph against required shapes. ShEx is the compact alternative. DCAT publishes data products. ISO 11179 registers data elements. Together: enforceable semantic contracts.
ODCS · Open Data Contract Standard
SHACL (W3C 2017)
ShEx 2014+
DCAT v3 (W3C 2024)
Dublin Core
ISO/IEC 11179
ODCS contract for Sarah's consent · Open Data Contract Standard · YAML · Contract
The TPP cannot consume the data product without an explicit contract. The contract names the schema, SLA, ownership, quality expectations, and access conditions.
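The kind of check a contract implies can be sketched as a small shape validator in plain Python. This is only in the spirit of SHACL property constraints (required value, datatype, allowed values); it is not SHACL or ODCS syntax, and the expires_on field and the allowed scope values are assumptions made for the example.

```python
from datetime import date

# A hand-rolled "shape": required / type / allowed loosely mirror
# sh:minCount, sh:datatype and sh:in. Field rules are assumptions.
CONSENT_SHAPE = {
    "consent_id":  {"required": True, "type": str},
    "customer_id": {"required": True, "type": str},
    "scope":       {"required": True, "type": str,
                    "allowed": {"balances:read", "transactions:read"}},
    "expires_on":  {"required": True, "type": date},
}

def validate(record: dict, shape: dict) -> list:
    """Return a list of human-readable violations (empty list means conformant)."""
    violations = []
    for field, rule in shape.items():
        if record.get(field) is None:
            if rule.get("required"):
                violations.append(f"{field}: missing required value")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            violations.append(f"{field}: expected {rule['type'].__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            violations.append(f"{field}: {value!r} not in allowed set")
    return violations

good = {"consent_id": "CN-1", "customer_id": "C001",
        "scope": "balances:read", "expires_on": date(2026, 12, 31)}
bad = {"consent_id": "CN-2", "scope": "payments:write", "expires_on": "2026-12-31"}

print(validate(good, CONSENT_SHAPE))
print(validate(bad, CONSENT_SHAPE))
```

Returning every violation instead of stopping at the first mirrors how a validation report is usually consumed: the steward sees the whole list in one pass.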
ShEx · the compact alternative · 2014+
Shape Expressions — same job as SHACL, more compact syntax, easier to author by hand. Some practitioners pair the two: ShEx for definition, SHACL for execution.
Once the contract is valid, the data product is published via DCAT as Dataset, Distribution, and DataService nodes, combined with Dublin Core terms for descriptive metadata (title, creator, license).
ISO/IEC 11179 · the registry of data elements · 1990s+ · the discipline behind any glossary
Every data element in the contract — consent_id, customer_id, scope — is registered with a definition, name, identifier, value domain, and steward. ISO 11179 is what makes a bank's glossary defensible to a regulator.
Tab 06 · Provenance & Lineage
Marcus's transaction → BCBS report — five formal models
The Open Provenance Model gave us the basic vocabulary. W3C PROV made it a standard. Provenance Semirings gave us the algebra of trust. Why/Where/How gave us three formal questions to ask. OpenLineage made it operational.
W3C PROV-DM (2013)
OPM · Moreau 2008
Why-Prov · Cui-Widom 2000
Where-Prov · Buneman 2001
How-Prov · Green 2007
Provenance Semirings · PODS 2007
OpenLineage 2020
Provenance Semirings (PODS 2007) · Green · Karvounarakis · Tannen · University of Pennsylvania · Algebraic
Annotate each source tuple with a variable. Join (⊗) multiplies. Union (⊕) adds. The result is a polynomial — evaluate it in different semirings to get why-provenance, trust, confidence, multiplicity.
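The "one polynomial, many semirings" idea can be sketched as follows. The polynomial stands for an output row with two alternative derivations; the variable names and all valuations are illustrative assumptions, not Meridian data.

```python
from functools import reduce

# Provenance polynomial as a sum (list) of products (tuples) of source-tuple
# variables: txn*pos*risk + txn*alt_pos*risk. Names are illustrative.
polynomial = [("txn", "pos", "risk"), ("txn", "alt_pos", "risk")]

def evaluate(poly, valuation, plus, times, zero, one):
    """Interpret ⊕ as `plus` and ⊗ as `times` in the chosen semiring."""
    total = zero
    for monomial in poly:
        product = reduce(times, (valuation[v] for v in monomial), one)
        total = plus(total, product)
    return total

# Boolean semiring: is the output still derivable if `pos` is deleted?
present = {"txn": True, "pos": False, "alt_pos": True, "risk": True}
derivable = evaluate(polynomial, present,
                     lambda a, b: a or b, lambda a, b: a and b, False, True)

# Counting semiring: multiplicity under bag semantics.
counts = {"txn": 1, "pos": 2, "alt_pos": 1, "risk": 1}
multiplicity = evaluate(polynomial, counts,
                        lambda a, b: a + b, lambda a, b: a * b, 0, 1)

# Viterbi semiring (max, ×): confidence of the best single derivation.
conf = {"txn": 0.9, "pos": 0.5, "alt_pos": 0.8, "risk": 1.0}
best = evaluate(polynomial, conf, max, lambda a, b: a * b, 0.0, 1.0)

# Why-provenance: sets of witness sets (⊕ = union, ⊗ = pairwise union).
witness = {v: frozenset({frozenset({v})}) for m in polynomial for v in m}
why = evaluate(polynomial, witness,
               lambda a, b: a | b,
               lambda a, b: frozenset(x | y for x in a for y in b),
               frozenset(), frozenset({frozenset()}))

print(derivable, multiplicity, round(best, 2), sorted(sorted(w) for w in why))
```

The query is evaluated once to produce the polynomial; every later question about trust, counts, or witnesses is only a re-evaluation with a different pair of operators.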
Why · Where · How: three formal questions · Cui-Widom 2000 · Buneman-Khanna-Tan 2001 · Green-Tannen 2007
Every derived fact in the catalog can be queried three ways. Why: which source tuples contributed? Where: which source location did each value come from? How: which derivation steps were applied?
WHY-PROV (witness sets)
> Which tuples produced the BCBS line?
⊢ { txn(T-99041), pos(P-7741),
risk(R-CBS-EU), cust(C002) }
WHERE-PROV (locations)
> Where did "EUR Bund" come from?
⊢ TXN_DB.t-99041.instrument
@ ingested 2026-03-12T08:14Z
HOW-PROV (derivation)
> How was RWA 625K computed?
⊢ compose(
    join(txn, position, on=client_id, instrument),
    apply(risk_model.lookup(instrument_type)),
    aggregate(group=counterparty_type, op=sum)
  )
W3C PROV-DM · the standard representation · 2013 · Entity · Activity · Agent · Standard
Provenance semirings provide the algebra. W3C PROV provides the interchange format. Lineage tools such as OpenLineage, Marquez, Atlas, and Egeria emit events that can be mapped onto PROV's Entity-Activity-Agent model.
OPM (2008) introduced Artifact-Process-Agent as the three node types. W3C PROV refined this to Entity-Activity-Agent. OpenLineage (2020) made it operational — event-driven emission from every pipeline run. The catalog projects the events into a graph.
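One way to picture "projecting events into a graph" is the sketch below, which turns a single OpenLineage-style run event into PROV-flavoured triples. The event keys (run.runId, producer, inputs, outputs) follow the OpenLineage run-event layout, but the mapping itself is an assumption made for illustration, and every value in the sample event is invented.

```python
def to_prov(event: dict) -> list:
    """Map one run event to (subject, predicate, object) PROV statements."""
    activity = f"run:{event['run']['runId']}"
    statements = [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:wasAssociatedWith", event["producer"]),
    ]
    for ds in event.get("inputs", []):
        entity = f"{ds['namespace']}/{ds['name']}"
        statements.append((activity, "prov:used", entity))
    for ds in event.get("outputs", []):
        entity = f"{ds['namespace']}/{ds['name']}"
        statements.append((entity, "prov:wasGeneratedBy", activity))
    return statements

# Hypothetical event; identifiers and namespaces are made up.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2026-03-12T08:14:00Z",
    "producer": "https://example.org/etl-agent",
    "run": {"runId": "run-001"},
    "job": {"namespace": "meridian", "name": "bcbs_report"},
    "inputs": [{"namespace": "txn_db", "name": "transactions"}],
    "outputs": [{"namespace": "reports", "name": "bcbs"}],
}

prov = to_prov(event)
for statement in prov:
    print(statement)
```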
One question. Eight formal models converge on one decision. Business rules from DMN/SBVR/Datalog. Authorisation from RBAC/ABAC/ReBAC. Policy-as-code in OPA/Rego or Cedar. Watch them compose.
DMN 2015
SBVR 2008
Datalog 1977+
RuleML
RBAC 1996
ABAC · NIST 800-162
ReBAC · Zanzibar 2019
OPA / Rego 2018
AWS Cedar 2023
The decision · live · Interactive
Configure the request and watch every model produce its verdict. The catalog records the union of all eight decisions plus the final composed allow/deny.
SUBJECT
ACTION · scope
OBJECT
DMN · Decision Model and Notation · OMG 2015
Business decisions as tables. Auditable. Owned by business, not engineering. Compiles to executable logic.
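A decision table of this kind can be sketched as data plus a tiny evaluator. The example assumes DMN's FIRST hit policy (the first matching row wins); the input names, the rows, and the outcomes are hypothetical and not taken from the lab's actual table.

```python
# Each row: (conditions that must all hold, outcome). Evaluated top to bottom.
# Input names and rules are hypothetical.
RULES = [
    ({"consent_status": "ACTIVE", "scope_granted": True}, "ALLOW"),
    ({"consent_status": "EXPIRED"}, "DENY"),
    ({}, "DENY"),  # catch-all default row
]

def decide(inputs: dict):
    """Return (rule_number, outcome) so the audit log can cite the row that fired."""
    for number, (conditions, outcome) in enumerate(RULES, start=1):
        if all(inputs.get(key) == expected for key, expected in conditions.items()):
            return number, outcome
    return None, None

print(decide({"consent_status": "ACTIVE", "scope_granted": True}))
print(decide({"consent_status": "EXPIRED", "scope_granted": True}))
print(decide({"consent_status": "ACTIVE", "scope_granted": False}))
```

Returning the row number alongside the verdict is what makes the table auditable: the catalog can record which business rule produced each decision.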
SBVR · Semantics of Business Vocabulary and Business Rules · OMG 2008
Business rules in structured natural language. Bridges business and machine.
// Rule R-OB-001 · expressed in SBVR-SE
It is obligatory that each consent has an expiry date after its creation date.
// Rule R-OB-002
It is prohibited that a TPP accesses an account without an active consent.
AWS Cedar · formally verified · AWS 2023 · provable termination
permit (
  principal in Role::"TPP_AISP",
  action == Action::"ReadBalance",
  resource
)
when {
  resource.owner has consent
  && resource.owner.consent.tpp == principal
  && resource.owner.consent.status == "ACTIVE"
  && resource.owner.consent.scope.contains("balances:read")
};
Datalog · the recursion engine · 1977+ · the ancestor of Rego & SpiceDB
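OPA's Rego is inspired by Datalog, and Zanzibar-style relationship resolvers such as SpiceDB answer a check by recursively expanding relations, which is the same reachability computation Datalog expresses. The reason: Datalog handles recursion naturally by evaluating rules to a least fixpoint, which is exactly what you need to answer "is X reachable from Y via authorised edges?" in a graph.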
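The fixpoint idea fits in a few lines. The sketch below runs naive bottom-up evaluation of the two classic reachability rules; the edge facts and node names are hypothetical, and real engines use semi-naive evaluation and indexing rather than this loop.

```python
# Facts: authorised edges. Node names are hypothetical.
edges = {
    ("tpp:BudgetEye", "consent:CN-1"),
    ("consent:CN-1", "account:A-1"),
    ("account:A-1", "balance:A-1"),
}

def reachable(edge_facts: set) -> set:
    """Naive evaluation of:
         reach(X, Y) :- edge(X, Y).
         reach(X, Z) :- reach(X, Y), edge(Y, Z).
       Repeats the second rule until no new fact appears (least fixpoint)."""
    reach = set(edge_facts)
    while True:
        derived = {(x, z) for (x, y) in reach for (y2, z) in edge_facts if y == y2}
        if derived <= reach:
            return reach
        reach |= derived

reach = reachable(edges)
print(("tpp:BudgetEye", "balance:A-1") in reach)
print(("balance:A-1", "tpp:BudgetEye") in reach)
```

Termination is guaranteed because the set of possible facts is finite and each round can only add to it.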
Same question. Three query languages. Then watch GraphRAG ground an LLM.
SPARQL is W3C's standard for RDF. Cypher rose with Neo4j. ISO GQL standardised the property-graph world in 2024. Then GraphRAG retrieves over the graph for the LLM. KG embeddings (TransE, GraphSAGE) predict missing links and similar customers.
SPARQL 1.1 (W3C 2013)
Cypher
ISO GQL 2024
GraphRAG · MSR 2024
TransE 2013
DistMult / ComplEx / RotatE
Node2Vec · DeepWalk 2014-16
GraphSAGE · GAT · R-GCN 2017-19
Probabilistic Databases
Same question · three languages · Standards
"Find all customers who hold a card AND have a wealth portfolio above £1M, with their card tier." One question, expressed three ways:
Cypher
MATCH (c:Customer)
WHERE c:Cardholder
  AND c:WealthClient
  AND c.aum > 1000000
MATCH (c)-[:HOLDS_CARD]->(card)
RETURN
  c.fullName AS name,
  c.aum AS aum,
  card.tier AS tier
ISO GQL · 2024
MATCH
  (c:Customer&Cardholder&WealthClient
     WHERE c.aum > 1000000)
  -[:holds_card]->
  (card:Card)
RETURN
  c.full_name AS name,
  c.aum,
  card.tier
The standards story. SPARQL won the RDF/semantic world. Cypher won the property-graph world. ISO GQL (2024) is the convergence — same conceptual model, vendor-neutral. An agentic open-banking platform should emit all three from the same query abstraction, so the catalog can talk to whichever store the bank operates.
GraphRAG · ground the LLM with the graph · Microsoft Research 2024 · Live
An LLM on its own can hallucinate. RAG retrieves text chunks. GraphRAG retrieves subgraphs: multi-hop, community-aware, with cited edges. Watch a banking question grounded through the KG.
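The "subgraph with cited edges" part can be sketched as a k-hop expansion around seed entities. This is not the Microsoft Research GraphRAG pipeline (there is no community detection or summarisation here); it only shows multi-hop retrieval with numbered edges an LLM prompt could cite. The graph, relation names, and identifiers are all hypothetical.

```python
from collections import deque

# Hypothetical KG edges as (subject, relation, object).
EDGES = [
    ("cust:C001", "HOLDS_CARD", "card:K-9"),
    ("card:K-9", "HAS_TIER", "tier:PLATINUM"),
    ("cust:C001", "IS_A", "class:WealthClient"),
    ("cust:C001", "GAVE_CONSENT", "consent:CN-1"),
    ("consent:CN-1", "GRANTED_TO", "tpp:BudgetEye"),
    ("cust:C002", "HOLDS_CARD", "card:K-3"),
]

def k_hop_subgraph(seeds, edges, k):
    """Breadth-first expansion: collect every edge within k hops of a seed."""
    frontier = deque((seed, 0) for seed in seeds)
    seen, picked = set(seeds), []
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for edge in edges:
            s, _, o = edge
            if node in (s, o):
                if edge not in picked:
                    picked.append(edge)
                other = o if node == s else s
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return picked

def to_context(edges) -> str:
    """Number each edge so the generated answer can cite it as [n]."""
    return "\n".join(f"[{i}] {s} -{p}-> {o}" for i, (s, p, o) in enumerate(edges, 1))

subgraph = k_hop_subgraph(["cust:C001"], EDGES, k=2)
print(to_context(subgraph))
```

The numbered context block would be placed in the prompt ahead of the question, so each claim in the answer can point back to a specific edge.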
AGENT QUESTION
"Show me all our PLATINUM cardholders who are also Wealth clients with AUM > £1M and have an active TPP consent. What's their typical risk profile?"
KG embeddings · link prediction & similarity · TransE 2013 · DistMult · ComplEx · RotatE · GraphSAGE 2017 · ML
Embed every entity and relation as a vector. TransE trains for h + r ≈ t (head plus relation lands near tail). GraphSAGE samples each node's neighbours and aggregates their features. The catalog can then predict missing edges, similar customers, and suspicious graph patterns (fraud).
TransE example · predicting Sarah's likely TPPs
// Trained embeddings (illustrative)
e(Sarah) = [0.32, -0.18, 0.71, ...]
e(consents) = [0.05, 0.22, 0.11, ...]
// Predict: who is Sarah likely to consent to?
candidate = argmin_tpp ‖ e(Sarah) + e(consents) - e(tpp) ‖
Candidate TPP | Distance | Verdict
BudgetEye     | 0.12     | already consented
MoneyHub      | 0.18     | likely candidate
Plaid UK      | 0.21     | possible
FraudCo       | 0.94     | very unlikely (anomaly?)
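The ranking step can be sketched with toy vectors. The Sarah and consents vectors below are the first three components shown in the illustrative embeddings; the TPP vectors are made up, so the distances printed here will not reproduce the ones in the table.

```python
import math

# Toy 3-d embeddings. TPP vectors are invented for illustration.
entity = {
    "Sarah":     [0.32, -0.18, 0.71],
    "BudgetEye": [0.40, 0.00, 0.80],
    "MoneyHub":  [0.30, 0.15, 0.90],
    "FraudCo":   [-0.50, 0.60, 0.10],
}
relation = {"consents": [0.05, 0.22, 0.11]}

def transe_distance(h, r, t) -> float:
    """L2 norm of (h + r - t); smaller means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tails(head: str, rel: str, candidates: list) -> list:
    scored = [
        (tail, transe_distance(entity[head], relation[rel], entity[tail]))
        for tail in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1])

ranking = rank_tails("Sarah", "consents", ["FraudCo", "MoneyHub", "BudgetEye"])
for tail, dist in ranking:
    print(f"{tail}: {dist:.3f}")
```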
GraphSAGE · customer similarity for upsell
// Aggregate neighbour signal
h_Sarah^(k+1) = σ( W · [ h_Sarah^(k) ; MEAN({ h_n^(k) : n ∈ N(Sarah) }) ] )
// Find similar customers
top5 = the 5 customers c with the highest cos(h_Sarah, h_c)
The bank's marketing agent can now propose: "customers like Sarah upgraded to PLATINUM after 18 months" with cosine similarity as evidence — anchored in the KG, not in a black-box model.
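One mean-aggregator layer followed by a cosine comparison can be sketched in plain Python. Several simplifications are assumed: ReLU stands in for σ, the full neighbourhood is used instead of a sample, the weight matrix is fixed by hand rather than trained, and the feature vectors, the neighbour lists, and the third customer are invented.

```python
import math

def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def sage_layer(h: dict, neighbours: dict, W: list) -> dict:
    """h_v' = ReLU(W · [h_v ; MEAN(h_n for n in N(v))]) for every node v."""
    out = {}
    for v, hv in h.items():
        agg = mean([h[n] for n in neighbours[v]]) if neighbours[v] else [0.0] * len(hv)
        concat = hv + agg
        out[v] = [max(0.0, sum(w * x for w, x in zip(row, concat))) for row in W]
    return out

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Invented 2-d features and neighbourhoods.
h0 = {"Sarah": [1.0, 0.0], "Marcus": [0.9, 0.1], "Customer_C": [0.0, 1.0]}
neighbours = {"Sarah": ["Marcus"], "Marcus": ["Sarah"], "Customer_C": []}
# Hand-set 2x4 weights: adds a node's own features to its neighbourhood mean.
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]

h1 = sage_layer(h0, neighbours, W)
sims = {c: cosine(h1["Sarah"], h1[c]) for c in h1 if c != "Sarah"}
print(sorted(sims.items(), key=lambda kv: kv[1], reverse=True))
```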
Probabilistic Databases & Knowledge Vault · Suciu et al. · Google 2014
Not every fact in the KG is certain. External-bank data, merchant tags, NER-extracted entities come with confidence scores. Probabilistic databases represent every tuple with a probability. Knowledge Vault (Google 2014) was the canonical example: 1.6B facts, each with a confidence.
// Each KG triple carries a probability
(Sarah, employed_by, Acme_Ltd)         p = 0.92  // from KYC docs
(Sarah, employed_by, Beta_Consulting)  p = 0.41  // from a salary credit
(Sarah, spends_at, GreenGrocer)        p = 0.98  // from card txn
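Under the simplest model, tuple independence, query probabilities follow from products of the per-triple values. The sketch below assumes that independence (which may not hold in practice, for example if the two employer facts are mutually exclusive) and reuses the three illustrative triples above.

```python
import math

# Triple -> probability, copied from the illustrative facts above.
facts = {
    ("Sarah", "employed_by", "Acme_Ltd"): 0.92,
    ("Sarah", "employed_by", "Beta_Consulting"): 0.41,
    ("Sarah", "spends_at", "GreenGrocer"): 0.98,
}

def prob_all(triples) -> float:
    """P(every listed triple holds), assuming independent tuples."""
    return math.prod(facts[t] for t in triples)

def prob_any(triples) -> float:
    """P(at least one listed triple holds), assuming independent tuples."""
    return 1 - math.prod(1 - facts[t] for t in triples)

employed_and_shops = prob_all([
    ("Sarah", "employed_by", "Acme_Ltd"),
    ("Sarah", "spends_at", "GreenGrocer"),
])
has_some_employer = prob_any([
    ("Sarah", "employed_by", "Acme_Ltd"),
    ("Sarah", "employed_by", "Beta_Consulting"),
])
print(round(employed_and_shops, 4), round(has_some_employer, 4))
```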
Tab 09 · The Synthesis
The Unified Open Banking Knowledge Graph
Entities resolved (Fellegi-Sunter). Ontology applied (RIGOR + OWL + DL). Mapped from schemas (R2RML/RML). Validated against contracts (SHACL). Provenance attached (PROV + Semirings). Policy enforced (OPA + Zanzibar). Queried via SPARQL/Cypher/GQL. Retrieved via GraphRAG. The graph below is the composition of every model above, on Meridian's real rows.
The composed graph · all 45+ models, one canvas · Live · Interactive
Toggle the layers to see how each formal model contributes a different facet. Click any node to see its provenance and attestations.
Resolved Entities
Ontology Classes
Provenance & Sources
Consents & TPPs
Policy & Access
What this graph encodes
Source Rows: 12
Resolved Entities: 3
Ontology Axioms: 17
Active Consents: 2
PROV Edges: 19
Policy Rules: 8
Formal Models: 45+
Years of Research: 56 (since Fellegi-Sunter)
Three real customers. One regulated TPP. Forty-five formal models. Every claim in the graph is attributable to a source row, defended by an ontology axiom, validated by a contract, traceable through PROV, governed by a policy, and queryable through three standards. That is the academic contract of an agentic open-banking knowledge graph.
The final thesis. Modern open-banking catalogs (Actian, Atlan, Egeria, Glean, Ontop, AuthZed, Open Policy Agent, RIGOR-style generators) implement these academic models — often without naming them. Naming them explicitly, demonstrating each on the same substrate, and composing them in one interactive graph is what makes this lab faithful — to the research, to the regulator, and to the agent that has to act.