The Running Example

One customer. One question. Six ontology models.

Every tab uses the same data. Watch the same question get progressively richer answers as the ontology gets richer.

Sarah Chen

Customer · age 36 · Singapore tax resident

Accounts at Meridian Bank
acct_isa_001 · ISA · £42,000
acct_curr_001 · Current · £8,000
port_wm_001 · Portfolio · £420,000

The agent's question

"Can Sarah cover a £5,000 emergency this week without penalty?"

Progress through the lab

Overview · why ontology at all

RDFS · the grammar lesson

OWL · the contract clauses

Description Logics · the detective

RIGOR · the apprentice librarian

Layer Cake · the staircase

OntoGPT · the policy reader

Tab 00 · Overview

Why ontology, in plain English

A database knows that Sarah has a row called "ISA" worth £42,000. It does not know what an ISA is. An ontology fixes that — and unlocks an agent's ability to reason instead of just look up.

In Plain English

The difference between knowing a field and knowing a meaning

A column in a database is a name with values. An ontology is what makes that name actually mean something to software — and once it means something, software can reason about it.

Without ontology, software sees fields. Sarah’s ISA is a row with the text "ISA" and the number 42000. The software does not know that an ISA is a kind of savings account, that savings accounts have withdrawal penalties, or that withdrawal penalties make money less easy to spend.
An ontology adds the rules of the world. "ISA is a kind of savings account." "Savings accounts have penalties on early withdrawal." "Money with penalties on withdrawal is not freely spendable." Now the software has the missing connections.
Now the agent can answer real questions. Asked "can Sarah cover £5,000 this week without penalty?" the agent does not just sum her accounts. It checks which accounts are actually spendable without penalty. The current account (£8,000) is. The ISA (£42,000) is not. The portfolio (£420,000) takes 3-5 days to liquidate. The answer is yes — £8,000 in the current account covers it.
This lab walks the six models that make this possible. RDFS gives the basic vocabulary. OWL adds rules and constraints. Description Logics powers the reasoning engine. RIGOR generates the ontology from the bank’s existing schemas. The Layer Cake organises the build process. OntoGPT extracts new concepts from regulatory text.
The whole point. An ontology is what lets an agent be helpful rather than literal. Without one, it can only show you the columns. With one, it can answer the question you actually asked.
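The contrast between summing fields and checking meaning can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the `SPENDABLE_WITHOUT_PENALTY` set is a hand-written stand-in (an assumption) for knowledge that an ontology would supply, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: str
    kind: str          # the raw label a database would store
    balance_gbp: int


# Sarah's accounts, taken from the running example.
ACCOUNTS = [
    Account("acct_isa_001", "ISA", 42_000),
    Account("acct_curr_001", "Current", 8_000),
    Account("port_wm_001", "Portfolio", 420_000),
]

# Assumption: a hand-written stand-in for what the ontology would tell us,
# namely which kinds of account are spendable this week without penalty.
SPENDABLE_WITHOUT_PENALTY = {"Current"}


def naive_total(accounts):
    """What a field-only system can do: add up every balance."""
    return sum(a.balance_gbp for a in accounts)


def spendable_now(accounts):
    """What a meaning-aware agent does: count only penalty-free, liquid money."""
    return sum(a.balance_gbp for a in accounts
               if a.kind in SPENDABLE_WITHOUT_PENALTY)


def can_cover(accounts, amount_gbp):
    return spendable_now(accounts) >= amount_gbp


print(naive_total(ACCOUNTS))        # 470000
print(spendable_now(ACCOUNTS))      # 8000
print(can_cover(ACCOUNTS, 5_000))   # True
```

The naive total is technically correct and practically misleading; only the second figure answers the question that was asked.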

The six models · what each one contributes

A cheat-sheet for what you’ll see across the next six tabs.

Tab	Model	Year	The everyday analogy	What it adds to the agent
01	RDFS	2004	The grammar lesson	Vocabulary: things, kinds, relations
02	OWL / OWL 2	2004 / 2009 (2nd ed. 2012)	The contract with clauses	Rules: required, forbidden, mutually exclusive
03	Description Logics	1980s+	The detective	Inference: derives new facts from the rules
04	RIGOR	2025	The apprentice librarian	Automation: builds the ontology from existing data
05	Layer Cake	2005	The staircase	Method: order the build from terms up to axioms
06	OntoGPT	2023	The policy reader	Extraction: pulls concepts from regulatory text

How to read this lab. Each tab opens with its plain-English explainer at the top: a handful of steps, sharp prose, no jargon. Then a working demo on Sarah’s data. Then the formal technical detail for the doctoral or architect reader. The plain-English bit is the primary thing.

Tab 01 · RDFS — Resource Description Framework Schema

The grammar lesson

Before software can reason about Sarah’s ISA, it needs the basic vocabulary: things, what kind of thing they are, and how they relate. RDFS is the minimum a machine needs to start understanding.

In Plain English

RDFS is your old grammar lesson, but for facts

Remember diagramming sentences in school? Subject, verb, object. RDFS turns every fact in the bank’s data into exactly that — a three-part sentence software can read.

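Everything becomes a sentence with three pieces. Subject, predicate, object. "Sarah holds acct_isa_001." "acct_isa_001 is an ISA." "ISA is a kind of SavingsAccount." Three facts, three sentences, three rows in the knowledge base. Note the difference between "is an" (membership of an individual in a class, written rdf:type) and "is a kind of" (one class sitting under another).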
The "is a kind of" relationship is the magic. RDFS gives you one word for it: rdfs:subClassOf. Now you can say ISA is a subclass of SavingsAccount, and SavingsAccount is a subclass of Account. Software follows the chain automatically — if something is true of an Account, it’s true of an ISA too.
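Properties get rules too. "Holds" connects a Customer to an Account, never the other way round. RDFS lets you say so: "the property 'holds' goes from Customer (domain) to Account (range)." Be clear about what this buys you: RDFS treats domain and range as grounds for inference, not as validation. If a triple says an Account holds something, an RDFS reasoner concludes that the Account is also a Customer rather than rejecting the triple. Flagging such mistakes needs something stronger, such as OWL disjointness (Tab 02) or a separate validation step.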
Labels and comments make it human-readable. Every class and property gets a human-friendly name and a description. So the technical term fibo:LegalPerson can have the label "Legal Person" and a description "An entity recognised in law as having rights and obligations." Software still uses the technical name; humans read the friendly one.
Why this is the floor. RDFS gives the agent a way to follow a chain: Sarah → holds → acct_isa_001 → which is an → ISA → which is a → SavingsAccount → which is an → Account. Walking that chain is how the agent finds out that Sarah’s ISA, however the database labels it, is fundamentally an Account. Everything richer (OWL, DL) builds on this floor.
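The chain walk described above can be sketched with plain tuples. This is a minimal illustration using only the Python standard library; the `ex:` prefixed names are illustrative placeholders, not identifiers from a published vocabulary, and a real system would use an RDF store rather than a Python set.

```python
# Each fact is a (subject, predicate, object) triple.
TRIPLES = {
    ("ex:Sarah", "ex:holds", "ex:acct_isa_001"),
    ("ex:acct_isa_001", "rdf:type", "ex:ISA"),
    ("ex:ISA", "rdfs:subClassOf", "ex:SavingsAccount"),
    ("ex:SavingsAccount", "rdfs:subClassOf", "ex:Account"),
}


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def superclasses(cls):
    """Follow rdfs:subClassOf upwards and return every ancestor class."""
    found, frontier = [], [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "rdfs:subClassOf"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


def classes_of(individual):
    """Direct classes of an individual, followed by all inherited ones."""
    result = objects(individual, "rdf:type")
    for cls in list(result):
        for ancestor in superclasses(cls):
            if ancestor not in result:
                result.append(ancestor)
    return result


for account in objects("ex:Sarah", "ex:holds"):
    print(account, "->", classes_of(account))
# ex:acct_isa_001 -> ['ex:ISA', 'ex:SavingsAccount', 'ex:Account']
```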

RDFS triples for Sarah’s accounts · the actual facts

Click ▶ to build the RDFS knowledge base for Sarah, one triple at a time. Every triple is a sentence.

Technical Detail

RDFS as a W3C Recommendation

RDFS (Resource Description Framework Schema) is a W3C Recommendation from 2004 that extends RDF with vocabulary for describing classes and properties. Built on RDF’s subject-predicate-object triple model, RDFS adds the core schema vocabulary needed to declare type hierarchies and property semantics.

Core vocabulary. rdfs:Class declares a class. rdfs:subClassOf declares a subclass relation. rdfs:subPropertyOf declares property subsumption. rdfs:domain and rdfs:range declare the type signature of a property. rdfs:label and rdfs:comment attach human-readable annotations. rdf:type (from RDF itself) asserts membership.

Entailment. RDFS entailment over ground data (no blank nodes) is decidable in polynomial time. A reasoner can derive: if ?x rdf:type ISA and ISA rdfs:subClassOf SavingsAccount, then ?x rdf:type SavingsAccount. In practice this amounts to the transitive closure of subclass and subproperty relationships, plus type inference from rdfs:domain and rdfs:range, which is strictly weaker than OWL DL inference.
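A small forward-chaining loop makes the entailment concrete. The sketch below applies four of the standard RDFS rules (rdfs2, rdfs3, rdfs9, rdfs11) until nothing new appears; it is not a complete RDFS reasoner, it assumes every object is a resource rather than a literal, and the `ex:` names and the domain and range of `ex:holds` are assumptions made for the example.

```python
def rdfs_closure(triples):
    """Apply four RDFS entailment rules until no new triple is derived."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # rdfs11: rdfs:subClassOf is transitive
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # rdfs9: an instance of a class is an instance of its superclasses
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs2: the subject of a property is typed by the property's domain
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs3: the object of a property is typed by the property's range
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= closed:
            return closed
        closed |= new


FACTS = {
    ("ex:Sarah", "ex:holds", "ex:acct_isa_001"),
    ("ex:acct_isa_001", "rdf:type", "ex:ISA"),
    ("ex:ISA", "rdfs:subClassOf", "ex:SavingsAccount"),
    ("ex:SavingsAccount", "rdfs:subClassOf", "ex:Account"),
    ("ex:holds", "rdfs:domain", "ex:Customer"),
    ("ex:holds", "rdfs:range", "ex:Account"),
}

DERIVED = rdfs_closure(FACTS) - FACTS
for triple in sorted(DERIVED):
    print(triple)
```

Note that the domain declaration does not check anything: it is what lets the loop conclude that Sarah is a Customer even though no triple says so.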

Limitations. RDFS cannot express cardinality constraints, disjointness, equivalence, property restrictions, inverse properties, or transitive properties. For those, OWL (Tab 02) is required. RDFS is the minimum semantic layer; richer semantics layer on top without breaking it.

Citation. Brickley D. & Guha R.V. (eds.) (2014). RDF Schema 1.1. W3C Recommendation. Supersedes the 2004 Recommendation, RDF Vocabulary Description Language 1.0: RDF Schema.

Tab 02 · OWL / OWL 2 — Web Ontology Language

The contract with clauses

RDFS gives vocabulary. OWL gives rules. Required clauses, forbidden combinations, mutual exclusions, equivalence — the things a contract spells out so there’s no ambiguity.

In Plain English

OWL is a contract with clauses, written for software

A contract doesn’t just list the parties. It says what’s required, what’s forbidden, who’s the same as whom, and which things are mutually exclusive. OWL gives an ontology those same powers.

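Required clauses (cardinality). "Every Account must have exactly one Owner." Not zero, not two: exactly one. OWL writes this as owl:cardinality 1. A reasoner reports a contradiction when an account has two owners that are known to be different people. Catching an account with no recorded owner takes a closed-world validation step, because OWL treats missing data as unknown rather than false.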
Mutually exclusive clauses (disjointness). "An account is either a SavingsAccount or a CurrentAccount, but not both." OWL writes this as owl:disjointWith. The contract makes that fork in the road formal.
Equivalence clauses. "A WealthClient is exactly a Customer who holds a Portfolio worth more than £100,000." OWL lets you define a class by its properties using owl:equivalentClass. Anyone in the data who matches the definition is automatically classified as a WealthClient — no human curation needed.
Property restrictions. "A SavingsAccount has at least one WithdrawalPenalty." OWL writes this as ∃ hasPenalty.WithdrawalPenalty (read: some value of hasPenalty must be a WithdrawalPenalty). This is the clause that makes Sarah’s ISA recognisably different from her current account.
The pay-off. Once these clauses exist, software can answer richer questions. "Is Sarah’s ISA freely spendable?" No — the OWL contract says it has a withdrawal penalty, and accounts with penalties are by definition not freely spendable. The contract did the work; the agent didn’t need to be told.
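Three of the clauses above can be illustrated as plain checks over Sarah's data. This is deliberately a closed-world sketch in ordinary Python, not OWL semantics and not a reasoner: it treats the dictionaries as complete, which an OWL reasoner would not. The data layout and function names are assumptions for the example; the £100,000 threshold comes from the WealthClient definition above.

```python
CUSTOMERS = {"Sarah"}

# account -> set of recorded owners
OWNERS = {
    "acct_isa_001": {"Sarah"},
    "acct_curr_001": {"Sarah"},
    "port_wm_001": {"Sarah"},
}

# account -> asserted classes
TYPES = {
    "acct_isa_001": {"SavingsAccount"},
    "acct_curr_001": {"CurrentAccount"},
    "port_wm_001": {"Portfolio"},
}

VALUE_GBP = {"acct_isa_001": 42_000, "acct_curr_001": 8_000, "port_wm_001": 420_000}
DISJOINT_PAIRS = [("SavingsAccount", "CurrentAccount")]
WEALTH_THRESHOLD_GBP = 100_000


def cardinality_violations():
    """Accounts that do not have exactly one recorded owner."""
    return sorted(a for a, owners in OWNERS.items() if len(owners) != 1)


def disjointness_violations():
    """Accounts asserted to belong to two classes declared disjoint."""
    return sorted(a for a, classes in TYPES.items()
                  for left, right in DISJOINT_PAIRS
                  if left in classes and right in classes)


def wealth_clients():
    """Customers who hold a Portfolio worth more than the threshold."""
    result = set()
    for account, owners in OWNERS.items():
        if "Portfolio" in TYPES[account] and VALUE_GBP[account] > WEALTH_THRESHOLD_GBP:
            result |= owners & CUSTOMERS
    return sorted(result)


print(cardinality_violations())   # []
print(disjointness_violations())  # []
print(wealth_clients())           # ['Sarah']
```

Nobody labelled Sarah a WealthClient; she falls out of the definition because her portfolio is worth £420,000.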

OWL axioms for Sarah’s ISA · the clauses

Click ▶ to stream the OWL ontology fragment that governs Sarah’s accounts. Each axiom adds one clause to the contract.

Technical Detail

OWL 2 DL semantics and the SROIQ logic

OWL (W3C Recommendation 2004; OWL 2 in 2009, second edition 2012) is built on RDF/RDFS and corresponds to fragments of first-order logic with controlled expressiveness, chosen so that reasoning remains decidable. OWL 2 DL, the most expressive sub-language commonly used, corresponds to the Description Logic SROIQ(D) (Horrocks-Kutz-Sattler 2006).

Class constructors. Intersection (owl:intersectionOf, ⊓), union (owl:unionOf, ⊔), complement (owl:complementOf, ¬), existential restriction (owl:someValuesFrom, ∃R.C), universal restriction (owl:allValuesFrom, ∀R.C), cardinality restrictions (qualified and unqualified).

Axioms. owl:equivalentClass for definitional equivalence; owl:disjointWith for class disjointness; owl:sameAs / owl:differentFrom for individual identity; owl:inverseOf, owl:TransitiveProperty, owl:FunctionalProperty, owl:SymmetricProperty, owl:AsymmetricProperty for property characteristics.

The three OWL 2 profiles. EL (PTIME, optimised for very large class hierarchies; used in SNOMED CT); QL (FO-rewritable, optimised for query answering over data sources; the OBDA foundation); RL (rule-based, polynomial-time reasoning, implementable in standard rule engines). Full OWL 2 DL is N2EXPTIME-complete in the worst case but typically efficient on practical ontologies (≤ 10⁴ axioms).

Citations. W3C OWL Working Group (2012). OWL 2 Web Ontology Language: Document Overview. W3C Recommendation. Horrocks I., Kutz O., Sattler U. (2006). The Even More Irresistible SROIQ. KR. Motik B. et al. (2009). OWL 2 Web Ontology Language: Profiles.

Tab 03 · Description Logics

The detective

The ontology has the rules and the facts. Description Logics is the engine that combines them, like a detective combining clues, to deduce new facts nobody wrote down — including the answer to the agent’s question.

In Plain English

A detective who turns rules and clues into new facts

You have some rules of the world. You have some clues about Sarah. A Description Logic reasoner combines them and tells you things nobody explicitly wrote down — including the answer the agent needs.

Start with the rules of the world (the "TBox"). "An ISA is a kind of SavingsAccount." "Every SavingsAccount has a withdrawal penalty." "Anything with a withdrawal penalty is, by definition, not a LiquidAsset." "Every CurrentAccount is a LiquidAsset." "A Customer who owns a LiquidAsset is LiquidityReady."
Add the actual data (the "ABox"). "Sarah owns acct_isa_001." "acct_isa_001 is an ISA." "Sarah also owns acct_curr_001." "acct_curr_001 is a CurrentAccount."
The detective starts chaining. Step 1: acct_isa_001 is an ISA, so it’s a SavingsAccount. Step 2: SavingsAccounts have withdrawal penalties, so acct_isa_001 has one. Step 3: things with withdrawal penalties are not LiquidAssets, so acct_isa_001 is not a LiquidAsset. Nobody wrote step 3 in the database. The detective deduced it.
The other path. Sarah owns acct_curr_001, which is a LiquidAsset. The rule says a Customer who owns a LiquidAsset is LiquidityReady. Therefore: Sarah is LiquidityReady. Via the current account, not the ISA.
The agent’s question, finally answered. Can Sarah cover £5,000 this week without penalty? Yes — through acct_curr_001 (£8,000, fully liquid). The ISA would technically cover it but with a 5% penalty (£250). The reasoner gave the agent both the answer and the right reason.
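The detective's chain can be imitated with a tiny forward-chaining loop. This is a hand-rolled sketch, not a Description Logic reasoner: negation is simplified to a named class `NotLiquidAsset`, the penalty is folded into a class `HasWithdrawalPenalty`, and all names are assumptions chosen to mirror the steps above.

```python
# TBox, simplified to "every member of the key class is a member of the value class".
SUBCLASS_OF = {
    "ISA": "SavingsAccount",
    "SavingsAccount": "HasWithdrawalPenalty",
    "HasWithdrawalPenalty": "NotLiquidAsset",
    "CurrentAccount": "LiquidAsset",
}

# ABox: what is asserted about individuals.
TYPE_ASSERTIONS = {
    ("Sarah", "Customer"),
    ("acct_isa_001", "ISA"),
    ("acct_curr_001", "CurrentAccount"),
}
OWNS = {("Sarah", "acct_isa_001"), ("Sarah", "acct_curr_001")}


def reason():
    """Derive new type facts until nothing changes; return facts and a trace."""
    types = set(TYPE_ASSERTIONS)
    trace = []
    changed = True
    while changed:
        changed = False
        # Rule 1: propagate class membership along the subclass axioms.
        for individual, cls in sorted(types):
            parent = SUBCLASS_OF.get(cls)
            if parent and (individual, parent) not in types:
                types.add((individual, parent))
                trace.append(f"{individual}: {cls} => {parent}")
                changed = True
        # Rule 2: a Customer who owns a LiquidAsset is LiquidityReady.
        for owner, asset in sorted(OWNS):
            if ((owner, "Customer") in types
                    and (asset, "LiquidAsset") in types
                    and (owner, "LiquidityReady") not in types):
                types.add((owner, "LiquidityReady"))
                trace.append(f"{owner}: owns {asset} (LiquidAsset) => LiquidityReady")
                changed = True
    return types, trace


FACTS, TRACE = reason()
for step in TRACE:
    print(step)

# The cost of using the ISA instead: 5% of the £5,000 withdrawal.
ISA_PENALTY_GBP = 5_000 * 5 // 100
print(ISA_PENALTY_GBP)  # 250
```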

DL reasoner trace · Sarah’s ISA

Click ▶ to watch the reasoner chain through TBox axioms and ABox facts to derive new conclusions. Yellow rows = derived. Green = the final answer.

Technical Detail

SROIQ semantics and tableau-based reasoning

Description Logics (DLs) are decidable fragments of first-order logic developed since the 1980s to support knowledge representation with well-understood reasoning complexity. The DL underlying OWL 2 DL is SROIQ: S stands for the base logic ALC extended with transitive roles, R for complex role inclusions (role hierarchies), O for nominals, I for inverse roles, and Q for qualified number restrictions.

TBox / ABox separation. The TBox (terminological box) contains schema-level axioms — class inclusions, equivalences, and role characteristics. The ABox (assertional box) contains data-level facts — class memberships and role assertions for individuals. Together they form a knowledge base K = ⟨T, A⟩.

Inference services. A DL reasoner provides: consistency checking (is K satisfiable?), subsumption (does C ⊑ D follow from T?), classification (compute the full subsumption hierarchy), instance checking (is a:C entailed by K?), realisation (compute the most specific class for each individual).

Algorithms. Reasoners such as HermiT, Pellet, and FaCT++ implement tableau-style decision procedures for SROIQ (HermiT uses a hypertableau calculus). Reasoning in SROIQ is N2EXPTIME-complete in the worst case but typically efficient on practical ontologies. ELK is different in kind: it is a consequence-based reasoner for the OWL 2 EL profile only, and it classifies biomedical-scale ontologies in polynomial time.

Open-world semantics. Unlike databases, DLs adopt the Open-World Assumption (OWA): the absence of a fact does not imply its negation. OWL also makes no Unique Name Assumption: two names may refer to the same individual unless they are declared different. Together these shape what a reasoner can and cannot conclude, and they make integrity constraints (very different in flavour from SQL constraints) a topic in their own right.
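The practical difference between the two assumptions shows up when a fact is simply missing. The sketch below is an illustration only: the two fact sets are assumptions standing in for what a reasoner has asserted or derived, and nothing has been stated either way about the portfolio.

```python
# Facts known to hold, and facts known not to hold (for example, derived negations).
KNOWN_TRUE = {("acct_curr_001", "LiquidAsset")}
KNOWN_FALSE = {("acct_isa_001", "LiquidAsset")}


def closed_world(individual, cls):
    """Database-style answer: anything not recorded as true is false."""
    return (individual, cls) in KNOWN_TRUE


def open_world(individual, cls):
    """DL-style answer: a missing fact is unknown, not false."""
    if (individual, cls) in KNOWN_TRUE:
        return "yes"
    if (individual, cls) in KNOWN_FALSE:
        return "no"
    return "unknown"


for account in ("acct_curr_001", "acct_isa_001", "port_wm_001"):
    print(account, closed_world(account, "LiquidAsset"), open_world(account, "LiquidAsset"))
# acct_curr_001 True yes
# acct_isa_001 False no
# port_wm_001 False unknown
```

Under the closed-world reading the portfolio is declared illiquid by default; under the open-world reading the agent has to admit it does not know yet.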

Citations. Baader F., Calvanese D., McGuinness D., Nardi D., Patel-Schneider P. (2003, 2nd ed. 2007). The Description Logic Handbook. Cambridge UP. Horrocks I., Kutz O., Sattler U. (2006). The Even More Irresistible SROIQ. KR. Glimm B. et al. (2014). HermiT: An OWL 2 Reasoner. JAR.

Tab 04 · RIGOR — Retrieval-Augmented Iterative Generation of RDB Ontologies

The apprentice librarian

A bank already has hundreds of database tables. RIGOR is what reads them — and the existing finance reference books — to generate the ontology automatically, with a senior librarian checking every page.

In Plain English

An apprentice librarian who reads your tables and writes the ontology

Building an ontology by hand takes months. RIGOR is an LLM-powered process that does it in minutes — by reading the database schemas, consulting the reference books (FIBO, BIAN), and getting every page checked by a senior reviewer before adding it.

Start with nothing. The ontology begins empty. No classes, no properties, no axioms. The apprentice has a stack of database tables to process.
Pick a table. Look at customers. Read its columns: customer_id, name, dob, tax_residency, kyc_status, segment. Each column is a clue to a domain concept.
Consult the reference books. The apprentice has access to FIBO (the Financial Industry Business Ontology) and BIAN (the Banking Industry Architecture Network reference model). They look up "customer" and find fibo:LegalPerson; they look up "kyc_status" and find the regulatory categories.
Write a draft (the Gen-LLM). The apprentice proposes: "Customer is a subclass of fibo:LegalPerson. Customer has a property hasKycStatus whose range is {VERIFIED, PENDING, REJECTED}. Customer has a property hasTaxResidency whose range is country-code."
The senior librarian checks (the Judge-LLM). Is the draft consistent with what’s already in the ontology? Does it cover all the columns? Is it syntactically valid OWL? If yes, merge. If no, send back for revision.
Move to the next table and iterate. Process accounts, then positions, then consents, following the foreign-key relationships so each table adds to what came before. After all tables are processed, you have a complete, validated OWL ontology. Months of architect work in minutes.

RIGOR pipeline · 4 banking tables → full ontology

Click ▶ to watch the iterative loop process four tables, each consulting FIBO/BIAN and validated before merging.

Technical Detail

Two-LLM iteration with retrieval grounding

RIGOR (Retrieval-Augmented Iterative Generation of RDB Ontologies; 2025) combines schema introspection, retrieval over domain ontologies (FIBO, BIAN, ISO 20022), and a two-LLM architecture (Gen + Judge) to produce OWL 2 DL ontologies from relational database schemas with controlled quality.

Topological ordering. Tables are processed in foreign-key dependency order — referenced tables before referencing ones. This ensures that when accounts is processed, the Customer class it references already exists in the growing core ontology.
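Foreign-key dependency order is an ordinary topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the specific foreign-key links between the four tables are assumptions made for illustration, since the lab names the tables but not their exact references.

```python
from graphlib import TopologicalSorter

# table -> tables it references through foreign keys (assumed links).
FOREIGN_KEYS = {
    "customers": set(),
    "accounts": {"customers"},
    "positions": {"accounts"},
    "consents": {"customers"},
}


def processing_order(foreign_keys):
    """Return tables so that every referenced table precedes its referrers."""
    return list(TopologicalSorter(foreign_keys).static_order())


ORDER = processing_order(FOREIGN_KEYS)
print(ORDER)
```

`TopologicalSorter` raises `graphlib.CycleError` if the foreign keys form a cycle, which is a useful early signal that the schema needs manual ordering.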

Retrieval context per table. For table T_i, the retrieval index returns: (a) T_i’s DDL with column types and constraints, (b) DDLs of FK-related tables, (c) embedding-matched fragments from FIBO / BIAN / ISO 20022, (d) the current core ontology O_{i−1}.

Gen-LLM output. A delta-ontology fragment ΔO_i in OWL 2 DL syntax — class declarations, equivalence axioms, property restrictions, cardinality constraints, datatype/object property declarations with domain and range.

Judge-LLM validation. Three checks: (i) syntactic well-formedness (parsed as valid OWL); (ii) logical consistency with O_{i−1} (no contradiction — optionally verified by a classical DL reasoner like HermiT or ELK in the loop); (iii) coverage — does ΔO_i describe all columns of T_i? Failed checks trigger revision requests up to k_max iterations.
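The generate, judge, revise cycle can be sketched as a bounded loop. Everything here is a stand-in: `toy_generate` and `toy_judge` are hypothetical placeholders for the Gen-LLM and Judge-LLM, axioms are plain strings rather than OWL, and only the coverage and conflict checks are imitated. It shows the control flow with a `k_max` cap, not RIGOR's actual code.

```python
def refine(table, columns, core, generate, judge, k_max=3):
    """Ask for drafts until the judge finds no problems, at most k_max times."""
    feedback = None
    for attempt in range(1, k_max + 1):
        draft = generate(table, columns, core, feedback)
        problems = judge(draft, columns, core)
        if not problems:
            return {**core, **draft}, attempt   # merge the accepted delta
        feedback = problems                     # send back for revision
    raise RuntimeError(f"{table}: no acceptable draft after {k_max} attempts")


def toy_generate(table, columns, core, feedback):
    # Stand-in for the Gen-LLM: the first draft forgets the last column,
    # a revision prompted by feedback covers every column.
    covered = columns if feedback else columns[:-1]
    return {c: f"{table}.{c} -> has_{c}" for c in covered}


def toy_judge(draft, columns, core):
    # Stand-in for the Judge-LLM: a coverage check and a conflict check.
    problems = [f"missing column: {c}" for c in columns if c not in draft]
    problems += [f"conflicts with core: {c}"
                 for c, axiom in draft.items()
                 if c in core and core[c] != axiom]
    return problems


COLUMNS = ["customer_id", "name", "dob", "tax_residency", "kyc_status", "segment"]
ONTOLOGY, ATTEMPTS = refine("customers", COLUMNS, {}, toy_generate, toy_judge)
print(ATTEMPTS)          # 2
print(sorted(ONTOLOGY))
```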

Output guarantees. The final ontology O_n is OWL 2 DL compliant, retains provenance metadata for every axiom (source table, retrieval context, iteration index), and is itself queryable via SPARQL.

Citation. Nayyeri M. et al. (2025). RIGOR: Retrieval-Augmented Iterative Generation of RDB Ontologies. Builds on W3C R2RML (2012) for mapping primitives and OWL 2 (2012) for the target language.

Tab 05 · Ontology Learning Layer Cake

The staircase

Building an ontology isn’t one step. As presented here it’s seven, each layer built on the one below. The Layer Cake (Buitelaar, Cimiano & Magnini 2005) is the canonical recipe: start with raw words, end with formal axioms.

In Plain English

A staircase from raw words to enforceable rules

Imagine building a house. You don’t paint the walls before laying the foundation. The Layer Cake says: build an ontology in seven layers, lowest to highest, and skip none of them.

Layer 1 · Raw text and data. The starting point. Open banking regulatory documents, the bank’s database schemas, product descriptions, policy memos. Just text and tables. No structure yet.
Layer 2 · Terms. Pull out every important word. "ISA", "savings account", "withdrawal", "penalty", "AUM", "consent", "TPP". Just the words people use — no organisation.
Layer 3 · Synonyms. Group the words that mean the same thing. "ISA" and "Individual Savings Account." "AUM" and "assets under management." "Customer" and "client" and "account holder." Now the terminology stops fighting itself.
Layer 4 · Concepts. Turn synonym groups into formal concepts. "Customer" becomes the concept Customer. "ISA" becomes the concept ISA. These are the building blocks; they don’t yet have parent or child relationships.
Layer 5 · Concept hierarchies. Arrange the concepts as a family tree. ISA is a kind of SavingsAccount. SavingsAccount is a kind of Account. WealthClient is a kind of Customer. This is the spine of the ontology.
Layer 6 · Relations. Wire concepts together. Customer holds Account. Account has Balance. Customer resides in Country. These are the named connections.
Layer 7 · Axioms. Add the enforceable rules. "Every Account is held by exactly one Customer." "ISA accounts have a withdrawal penalty." "A WealthClient’s AUM exceeds £100,000." Now the ontology can reason, not just describe.

The Layer Cake applied to Sarah’s ISA

Each layer below shows what the ontology looks like at that stage of construction — and what example content lives there for the running ISA case.

L1 · Raw text & data: Schemas · regulatory PDFs · product specs
L2 · Terms: "ISA" · "savings" · "penalty" · "withdrawal" · "AUM"
L3 · Synonyms: "ISA" ≡ "Individual Savings Account" · "AUM" ≡ "assets under management"
L4 · Concepts: ISA · SavingsAccount · Account · Customer · Penalty
L5 · Concept hierarchies: ISA ⊑ SavingsAccount ⊑ Account
L6 · Relations: Customer holds Account · Account hasPenalty Penalty
L7 · Axioms: SavingsAccount ⊑ ∃ hasPenalty.WithdrawalPenalty · ∃ hasPenalty.WithdrawalPenalty ⊑ ¬ LiquidAsset

How the layers relate to the other tabs. Layer 1 is what Tab 04 (RIGOR) and Tab 06 (OntoGPT) consume as input. Layers 2-4 are what RDFS (Tab 01) gives you. Layers 5-6 are what RDFS plus simple OWL covers. Layer 7 is what OWL DL (Tab 02) and the Description Logic reasoner (Tab 03) operate on.

Technical Detail

Buitelaar, Cimiano & Magnini’s ontology learning pipeline

The Ontology Learning Layer Cake (Buitelaar, Cimiano & Magnini, 2005) is the canonical staged model for systematic ontology construction. As presented in this lab, with the input corpus counted as the first layer, it separates the learning task into seven well-defined sub-tasks, each with established techniques and evaluable output.

The seven layers in detail. (L1) Corpus: text/data inputs. (L2) Terms: extracted via TF-IDF, C-value/NC-value, or domain-specific tokenisation. (L3) Synonyms: clustered via distributional similarity (Lin similarity, word embeddings) or curated lexicons (WordNet, FinancialTerms). (L4) Concepts: formed by mapping synonym sets to canonical labels with disambiguation. (L5) Concept hierarchies: built via lexico-syntactic patterns (Hearst patterns) or distributional inclusion. (L6) Relations: extracted via verb-frame parsing or LLM-based triple extraction. (L7) Axioms: induced from data or extracted from formal specification text.
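One of the L5 techniques named above, the Hearst pattern "X such as A, B and C", is simple enough to sketch with a regular expression. This is a deliberately naive illustration: it assumes the hypernym is exactly two words and that the list runs to the end of the sentence, and the example sentence is invented for the demonstration. Production systems work on parsed noun phrases instead.

```python
import re

# "<two-word hypernym> such as <list of hyponyms up to the sentence end>"
SUCH_AS = re.compile(r"(?P<hyper>[A-Za-z]+ [A-Za-z]+) such as (?P<hypos>[^.;]+)")


def hearst_pairs(text):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper")
        # Split the list on commas and on the final "and" / "or".
        for hyponym in re.split(r",\s*|\s+and\s+|\s+or\s+", match.group("hypos")):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


# Invented example sentence, not taken from a real policy document.
SENTENCE = "The bank offers savings accounts such as ISAs, fixed-term deposits and notice accounts."
PAIRS = hearst_pairs(SENTENCE)
for hyponym, hypernym in PAIRS:
    print(f"{hyponym} is a kind of {hypernym}")
```

Each pair is a candidate subclass link for Layer 5, to be checked before it enters the hierarchy.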

Why the order matters. Each layer is a checkpoint with measurable quality. Skipping layers leads to ontologies whose lower-layer disagreements compound into upper-layer contradictions. The Layer Cake makes this failure mode visible at the layer where it occurs.

Modern reinterpretation. Tools like RIGOR (Tab 04) collapse the cake by using LLMs to jump from L1 directly to L7 in one iterative loop — but the conceptual sequence still describes what must internally happen. The Cake stays useful as the discipline of what to evaluate at each stage, even when the implementation is end-to-end.

Citation. Buitelaar P., Cimiano P. & Magnini B. (eds.) (2005). Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications, IOS Press. Subsequent surveys: Wong et al. (2012), Asim et al. (2018).

Tab 06 · OntoGPT

The policy reader

Regulators publish guidance in prose. Product teams write specs in narrative. OntoGPT (2023) reads that prose and pulls out ontology-shaped knowledge — concepts, properties, restrictions — that the agent can use.

In Plain English

An assistant that reads regulatory prose and pulls out the rules

A regulator writes "All ISAs must have a minimum 90-day notice or pay a 5% early-withdrawal penalty." A human can read that. OntoGPT lets software read it too — and turn it into ontology entries that everything else in the lab can consume.

Feed it a regulatory paragraph. A product memo, an FCA guidance note, a BCBS standard, a MAS notice. Anything in normal English (or German, or French) describing how things work.
It picks out the concepts. "ISA" is a concept. "Notice period" is a concept. "Early-withdrawal penalty" is a concept. It distinguishes these from filler words like "must" and "have."
It picks out the properties. "ISA has a notice period." "Notice period has duration 90 days." "ISA has an early-withdrawal penalty." "Penalty has amount 5%."
It picks out the restrictions. The "must" word is important. It signals an axiom: every ISA must satisfy this rule. OntoGPT promotes this into an OWL restriction — cardinality, value range, required type.
The output is ontology-shaped. The result isn’t a summary paragraph — it’s actual OWL axioms ready to merge into the bank’s ontology. The reasoner from Tab 03 can immediately use them. The agent now knows that Sarah’s ISA has a 90-day notice and a 5% penalty, because the regulator said so — and that reasoning trail is traceable.

Extract OWL axioms from a regulatory paragraph

Click ▶ to watch OntoGPT process a real-shaped policy paragraph about ISA withdrawals and emit OWL axioms ready for the ontology.

Technical Detail

LLM-based ontology-aligned information extraction

OntoGPT (2023) is an LLM-based knowledge extraction system that pairs a target ontology schema (in LinkML or OWL) with a prompt-templated extraction loop, producing structured outputs that conform to the target schema. Unlike free-form extraction, OntoGPT’s output is constrained to known classes, properties, and value ranges.

Three-stage pipeline. (i) Schema specification: the user provides a target schema fragment naming the classes to extract and their properties. (ii) Prompted extraction: an LLM (typically GPT-4-class) is prompted with the text and the schema; it emits candidate instances. (iii) Schema validation: outputs are checked against the schema; non-conforming instances are rejected or revised via a second LLM pass.
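Stage (iii), schema validation, is the part that can be shown without an LLM. The sketch below is not OntoGPT's API: a plain dictionary stands in for a LinkML or OWL schema, the candidate instances are hard-coded stand-ins for model output, and the property names are assumptions.

```python
# Target schema: class -> allowed properties and their expected Python types.
SCHEMA = {
    "ISA": {"noticePeriodDays": int, "earlyWithdrawalPenaltyPct": int},
}


def validate(instance):
    """Return a list of problems; an empty list means the instance conforms."""
    cls = instance.get("class")
    if cls not in SCHEMA:
        return [f"unknown class: {cls}"]
    allowed = SCHEMA[cls]
    errors = []
    for key, value in instance.get("properties", {}).items():
        if key not in allowed:
            errors.append(f"unknown property: {key}")
        elif not isinstance(value, allowed[key]):
            errors.append(f"wrong type for {key}: {type(value).__name__}")
    return errors


# Hard-coded stand-ins for what an extraction model might emit.
CANDIDATES = [
    {"class": "ISA", "properties": {"noticePeriodDays": 90, "earlyWithdrawalPenaltyPct": 5}},
    {"class": "ISA", "properties": {"noticePeriod": "ninety days"}},
    {"class": "Mortgage", "properties": {}},
]

ACCEPTED = [c for c in CANDIDATES if not validate(c)]
REJECTED = [(c["class"], validate(c)) for c in CANDIDATES if validate(c)]
print(len(ACCEPTED))  # 1
print(REJECTED)
```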

Provenance. Every extracted instance retains a pointer to the source span — start offset, end offset, source document URI. This makes the extraction auditable: "where did this axiom come from?" has a literal answer.
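A source-span pointer is just a pair of character offsets kept alongside each extracted value. In the sketch below, two regular expressions stand in for the LLM extraction step (an assumption made so the example runs without a model), and the `urn:example:` document URI is a hypothetical placeholder. The sentence is the one quoted earlier in this tab.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    prop: str
    value: int
    start: int        # offset of the first character of the source span
    end: int          # offset just past the last character
    source_uri: str


TEXT = "All ISAs must have a minimum 90-day notice or pay a 5% early-withdrawal penalty."

# Regex stand-ins for the extraction model; property names are assumptions.
RULES = {
    "noticePeriodDays": re.compile(r"(\d+)-day notice"),
    "earlyWithdrawalPenaltyPct": re.compile(r"(\d+)% early-withdrawal penalty"),
}


def extract(text, source_uri):
    """Extract values and remember exactly where in the text each came from."""
    found = []
    for prop, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append(Extraction(prop, int(match.group(1)),
                                    match.start(), match.end(), source_uri))
    return found


EXTRACTIONS = extract(TEXT, "urn:example:isa-policy-memo")
for item in EXTRACTIONS:
    # Slicing the source with the stored offsets recovers the evidence.
    print(item.prop, item.value, repr(TEXT[item.start:item.end]))
```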

Domain track record. Originally developed for biomedical concept extraction (for example against the Mondo disease ontology and Gene Ontology terms). The approach is a plausible fit for other regulated domains such as finance, legal, and clinical guidelines, where the source text is structured prose with stable terminology.

Composition with RIGOR. RIGOR (Tab 04) generates the schema-derived ontology from databases. OntoGPT extracts the text-derived axioms from policy documents. Combined, the bank gets a unified ontology covering both the data and the rules that govern the data — both regulator-traceable.

Citation. Caufield J.H. et al. (2024). Structured prompt interrogation and recursive extraction of semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics 40(3); preprint 2023. The associated tool: OntoGPT, github.com/monarch-initiative/ontogpt.