Google Cloud introduced OKF v0.1 on June 12, 2026 — a Markdown+YAML format for AI knowledge sharing. This infographic situates OKF in the broader semantic web landscape, modelled as an RDF Knowledge Graph.
The Open Knowledge Format is a vendor-neutral, agent- and human-friendly format for representing metadata, context, and curated knowledge for AI systems — built from Markdown files with YAML frontmatter.
A Markdown file with YAML frontmatter. The only mandatory field is type. All other fields (title, description, resource, tags, timestamp) and body sections are producer-defined conventions.
A portable directory hierarchy of ConceptDocuments with index.md entry points per subdirectory. Shippable as a tarball or hosted in a git repository alongside code.
An AI producer that walks data sources (BigQuery datasets) and automatically drafts OKF concept docs — with a second LLM pass to add citations, join paths, and metric definitions.
Converts any OKF bundle into a single self-contained interactive HTML graph file — no backend, no cloud service required. A KnowledgeConsumer that renders without modifying the bundle.
Three governing principles ensure OKF remains minimally opinionated, portable, and LLM-friendly.
Only type is required. Types, additional fields, and body sections are left entirely to the producer. The spec defines the interoperability surface, not the content model.
Knowledge writing is fully decoupled from consumption. Humans, AI enrichers, and export tools can all produce; AI agents, visualizers, and search indexes can all consume independently.
No proprietary accounts, services, SDKs, or cloud providers required. OKF is a file-format spec — it lives anywhere files live.
Example OKF bundle from the blog post — modelled as an okf:KnowledgeBundle in the companion RDF-Turtle instance-data file.
In the RDF knowledge graph, the customer_id column in orders carries an explicit okf:isForeignKeyTo triple pointing to the customers TableDocument — a relationship that OKF can only express in prose.
Both approaches share the same goals. The comparison below shows where OKF has advantages, where RDF/LD/SPARQL goes further, and where they are equivalent.
| Dimension | OKF (Markdown + YAML) | RDF / Linked Data / SPARQL |
|---|---|---|
| Entity Identity | Limitation: relative file paths; break on restructuring, no global uniqueness | Advantage: HTTP IRIs; globally unique, dereferenceable, persistent across organisations |
| Queryability | Limitation: file-system traversal or LLM prompting; no standard query language | Advantage: SPARQL 1.1; SELECT, CONSTRUCT, aggregates, property paths, federation |
| Semantic Fidelity | Limitation: semantics implicit in prose. type is free-form, so software cannot distinguish synonyms | Advantage: explicit rdf:type from OWL ontologies. XSD datatypes. Machine-verifiable |
| Federation | Limitation: not supported; combining bundles requires custom engineering | Advantage: SPARQL SERVICE keyword; span remote endpoints in one query |
| Inference | Limitation: none; LLM reasoning is probabilistic and not reproducible | Advantage: RDFS/OWL reasoners derive new facts deterministically |
| Shared Vocabulary | Limitation: free-form strings; semantic islands across organisations | Advantage: schema.org, PROV-O, SKOS, FOAF; reuse = automatic interoperability |
| Version Control | Parity: Markdown diffs are human-readable in GitHub PRs | Parity: Turtle diffs are also human-readable and semantically interpretable |
| Human Authoring | Advantage: Markdown is a widely adopted human-readable format; zero new tooling | Trade-off: Turtle is readable but has a learning curve; LLM generation reduces the gap |
| LLM Friendliness | Advantage: designed for LLM context windows; Markdown+YAML is well represented in training data | Advantage: frontier LLMs generate valid Turtle reliably; SPARQL from natural language works well |
| Adoption Barrier | Advantage: near-zero; any developer can produce OKF with a text editor | Trade-off: triplestore setup needed; LLM tooling now significantly lowers the practical barrier |
The four principles that turn RDF into the Web of Data — giving every entity a permanent, globally dereferenceable address.
Every entity — a table, metric, person, or concept — must have a globally unique IRI as its identity. This makes entities unambiguous across any system. OKF uses relative file paths, which are bundle-local and non-global.
IRIs should use the http: or https: scheme so that any agent can resolve the identifier over the Web and retrieve a description of the entity.
When an HTTP IRI is looked up, the server should return structured RDF data describing the entity — enabling fully machine-readable discovery of schema, relationships, and provenance.
RDF descriptions should include owl:sameAs and other linking predicates to related entities in external knowledge graphs, so agents can navigate the Web of Data.
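As a rough illustration of the second and third principles, the sketch below builds (but does not send) an HTTP request that asks for an RDF description of an entity through content negotiation. The IRI under example.org and the helper name `rdf_request` are assumptions made for this example; they are not part of OKF or of any published dataset.

```python
import urllib.request


def rdf_request(iri, media_type="text/turtle"):
    """Build a request asking the server for an RDF description of `iri`.

    Hypothetical helper: it rejects identifiers that are not HTTP(S) IRIs,
    such as the bundle-local file paths OKF uses.
    """
    if not iri.startswith(("http://", "https://")):
        raise ValueError("Linked Data identifiers should be HTTP(S) IRIs")
    # The Accept header signals that structured RDF is wanted, not HTML.
    return urllib.request.Request(iri, headers={"Accept": media_type})


# Assumed, illustrative IRI; nothing is actually published at this address.
req = rdf_request("https://example.org/kb/tables/orders")
print(req.full_url, req.get_header("Accept"))

# Dereferencing would be urllib.request.urlopen(req); it is omitted here
# because the example IRI does not resolve to real data.
```

A relative path such as `tables/orders.md` fails the check, which is the practical difference between a bundle-local identifier and one that any agent can look up.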
These queries run against the RDF knowledge graph generated from the OKF blog post. Each demonstrates a capability that OKF's Markdown+YAML format cannot provide natively.
Named graph: okf-instance-data-claude_sonnet_4_6-1.ttl · 76 triples · endpoint: URIBurner SPARQL
Joins okf:OrderRecord and okf:CustomerRecord instances via the shared okf:customerId data property — a real SQL-style JOIN over row-level instance data, not a description of the schema.
Counts distinct customers who placed at least one order in the week of 2026-06-01, computing the WeeklyActiveUsers metric directly from okf:OrderRecord instance data.
Aggregates revenue across all okf:OrderRecord instances per customer, joining to okf:CustomerRecord for names — SPARQL GROUP BY aggregation over actual row data.
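To make the three queries above concrete without a triplestore, here is a minimal pure-Python sketch of what they compute, using triple-pattern matching over a handful of tuples. It is not SPARQL and not the actual instance data: every row is invented, and apart from okf:customerId, okf:OrderRecord, and okf:CustomerRecord the property names (okf:name, okf:orderDate, okf:amount) are assumptions. The week of 2026-06-01 is assumed to run Monday to Sunday.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical row-level data expressed as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:customer1", "rdf:type", "okf:CustomerRecord"),
    ("ex:customer1", "okf:customerId", "c1"),
    ("ex:customer1", "okf:name", "Ada"),
    ("ex:customer2", "rdf:type", "okf:CustomerRecord"),
    ("ex:customer2", "okf:customerId", "c2"),
    ("ex:customer2", "okf:name", "Grace"),
    ("ex:order1", "rdf:type", "okf:OrderRecord"),
    ("ex:order1", "okf:customerId", "c1"),
    ("ex:order1", "okf:orderDate", "2026-06-02"),
    ("ex:order1", "okf:amount", "40.00"),
    ("ex:order2", "rdf:type", "okf:OrderRecord"),
    ("ex:order2", "okf:customerId", "c1"),
    ("ex:order2", "okf:orderDate", "2026-06-05"),
    ("ex:order2", "okf:amount", "10.50"),
    ("ex:order3", "rdf:type", "okf:OrderRecord"),
    ("ex:order3", "okf:customerId", "c2"),
    ("ex:order3", "okf:orderDate", "2026-06-09"),
    ("ex:order3", "okf:amount", "25.00"),
}


def instances(cls):
    """Subjects typed as `cls`, sorted for stable output."""
    return sorted(s for s, p, o in TRIPLES if p == "rdf:type" and o == cls)


def value(subject, predicate):
    """The single object for (subject, predicate); assumes exactly one exists."""
    return next(o for s, p, o in TRIPLES if s == subject and p == predicate)


customer_by_id = {value(c, "okf:customerId"): c
                  for c in instances("okf:CustomerRecord")}
orders = instances("okf:OrderRecord")

# 1. Join orders to customers through the shared okf:customerId value.
joined = [(o, value(customer_by_id[value(o, "okf:customerId")], "okf:name"))
          for o in orders]

# 2. Weekly active users: distinct customers with an order in the week.
week_start, week_end = date(2026, 6, 1), date(2026, 6, 7)
active = {value(o, "okf:customerId") for o in orders
          if week_start <= date.fromisoformat(value(o, "okf:orderDate")) <= week_end}
weekly_active_users = len(active)

# 3. Revenue per customer: group by name and sum the amounts.
revenue = defaultdict(Decimal)
for order, name in joined:
    revenue[name] += Decimal(value(order, "okf:amount"))

print(joined)
print(weekly_active_users)
print(dict(revenue))
```

In SPARQL the same logic would be expressed declaratively with shared variables, COUNT(DISTINCT ...), and GROUP BY; the loops here only mirror that behaviour for illustration.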
Eight capabilities that RDF+LD+SPARQL provide that Markdown+YAML cannot.
HTTP IRIs give every entity a permanent, globally unique address that any agent on the Web can look up. OKF file paths are bundle-local and break on restructuring.
SPARQL provides SELECT, CONSTRUCT, ASK, DESCRIBE, aggregates, and property paths — a complete query algebra. OKF's only query mechanism is LLM prompting or custom parsing code.
A single SPARQL query can span your local graph, DBpedia, Wikidata, and any SPARQL endpoint simultaneously. OKF bundles are isolated islands requiring custom engineering to combine.
OWL and RDFS reasoners derive new facts from existing RDF automatically — e.g., symmetric join relationships, subclass membership, inverse properties. This is reproducible; LLM reasoning over OKF is probabilistic.
Using schema.org, PROV-O, SKOS, and domain ontologies makes your knowledge automatically interoperable with every other publisher using those vocabularies. OKF's free-form type string creates semantic islands.
RDF and Linked Data connect your knowledge graph to billions of existing RDF triples in DBpedia, Wikidata, schema.org-annotated pages, government open data, and scientific datasets.
SHACL validators can verify that a knowledge graph conforms to a shape — every table has a name, every metric has a source table, etc. OKF "validation" means having an LLM read the prose.
PROV-O composes natively with RDF: any entity or named graph can carry prov:wasGeneratedBy, prov:wasAttributedTo, and prov:generatedAtTime statements. OKF relies on a timestamp YAML field.
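The deterministic inference described in the list above can be sketched with a toy forward-chaining loop. This is not an OWL or RDFS reasoner, only two hand-written rules (subclass membership and symmetric properties) applied until nothing new appears. The subclass relation between okf:TableDocument and okf:ConceptDocument and the okf:joinsWith property are assumptions invented for the example.

```python
def infer(triples):
    """Apply two rules to a fixed point and return the enlarged fact set.

    Rule 1: (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
    Rule 2: (p rdf:type owl:SymmetricProperty) and (a p b)  =>  (b p a)
    """
    facts = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        symmetric = {s for s, p, o in facts
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        new = set()
        for s, p, o in facts:
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for sub, sup in subclass if sub == o}
            if p in symmetric:
                new.add((o, p, s))
        if new <= facts:  # nothing new was derived, so the result is stable
            return facts
        facts |= new


# Hypothetical input facts.
DATA = {
    ("okf:TableDocument", "rdfs:subClassOf", "okf:ConceptDocument"),
    ("okf:joinsWith", "rdf:type", "owl:SymmetricProperty"),
    ("ex:orders", "rdf:type", "okf:TableDocument"),
    ("ex:orders", "okf:joinsWith", "ex:customers"),
}

derived = infer(DATA) - DATA
print(sorted(derived))
```

Running the function twice on the same input yields the same set, which is the reproducibility property contrasted with LLM reasoning above.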
OKF and RDF/LD/SPARQL are complementary layers, not competitors.
OKF solves knowledge capture and authoring for teams with zero semantic web expertise. RDF/LD/SPARQL solves knowledge integration, reasoning, and Web-scale federation. A practical architecture is to use OKF-style Markdown as the human authoring surface, followed by an RDF extraction step that converts OKF bundles into queryable knowledge graphs. This combines OKF's ease of production with RDF's power for consumption.
Extract YAML frontmatter from all OKF .md files into structured objects.
Map fields to RDF properties — type → rdf:type; title → schema:name; resource → schema:url.
Mint HTTP IRIs — replace relative paths with dereferenceable identifiers and load into a SPARQL triplestore.
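The mapping and minting steps above can be sketched as a small function that turns one document's parsed frontmatter into N-Triples lines. The base IRI, the class namespace, and the function names are assumptions chosen for illustration; only the field mapping (type to rdf:type, title to schema:name, resource to schema:url) comes from the steps listed here. Loading the output into a triplestore is left out.

```python
from urllib.parse import quote

BASE = "https://example.org/kb/"      # assumed base for minted entity IRIs
OKF_NS = "https://example.org/okf#"   # assumed namespace for type classes
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "https://schema.org/"


def mint_iri(relative_path):
    """Replace a bundle-relative .md path with an HTTP IRI under BASE."""
    path = relative_path.lstrip("/")
    if path.endswith(".md"):
        path = path[:-3]
    return BASE + quote(path)


def literal(value):
    """Quote a string as an N-Triples literal (backslash, quote, newline)."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def to_ntriples(relative_path, meta):
    """Map frontmatter fields to triples: type, title, resource."""
    subject = f"<{mint_iri(relative_path)}>"
    lines = [f"{subject} <{RDF_TYPE}> <{OKF_NS}{quote(meta['type'])}> ."]
    if "title" in meta:
        lines.append(f"{subject} <{SCHEMA}name> {literal(meta['title'])} .")
    if "resource" in meta:
        # Assumes the resource value is already a well-formed absolute IRI.
        lines.append(f"{subject} <{SCHEMA}url> <{meta['resource']}> .")
    return lines


# Hypothetical frontmatter for one document in a bundle.
meta = {"type": "table", "title": "orders",
        "resource": "https://example.com/datasets/sales/orders"}
for line in to_ntriples("/tables/orders.md", meta):
    print(line)
```

Mapping the free-form type string straight onto a class IRI is the simplest possible choice; a real extraction step would more likely look types up in a curated ontology so that synonyms resolve to the same class.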
Three structural layers every OKF document must follow, each linked to its knowledge graph entity.
Every OKF document begins with a YAML block delimited by ---. Required field: type (string, producer-defined). Optional standard fields: title, description, resource (URL), tags (list), timestamp (ISO 8601).
Following the YAML block, the document body is standard GitHub-Flavored Markdown. Typical sections include Schema (table definitions), Joins, Sample Queries, Notes, and Related Documents.
Documents within a bundle reference each other via relative Markdown links (e.g., [customers](/tables/customers.md)). This preserves portability across hosting environments.
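The three layers above can be illustrated with a minimal reader that separates the frontmatter from the body, checks the mandatory type field, and lists the Markdown links to other documents. The sample document and its field values are invented, and the parser handles only flat `key: value` pairs and inline `[a, b]` lists; a real consumer would use a full YAML parser.

```python
import re

# Hypothetical OKF document, for illustration only.
DOC = """---
type: table
title: orders
tags: [sales, core]
---
# orders

Joins to [customers](/tables/customers.md) on customer_id.
"""


def split_frontmatter(text):
    """Return (metadata dict, body) for a document opening with a --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a --- block")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter block is not closed") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Only the inline list form is handled; nested YAML is out of scope.
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    if not meta.get("type"):
        raise ValueError("missing mandatory field: type")
    return meta, "\n".join(lines[end + 1:])


# Matches [label](target.md) links between documents in a bundle.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+\.md)\)")


def markdown_links(body):
    """Return (label, target) pairs for links to other .md documents."""
    return LINK.findall(body)


meta, body = split_frontmatter(DOC)
print(meta)
print(markdown_links(body))
```

Because the links are plain paths, a consumer can resolve them against wherever the bundle happens to live, whether that is an unpacked tarball or a git checkout.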
Common questions about the Open Knowledge Format and its relationship to RDF, Linked Data, and SPARQL.
OKF is a vendor-neutral, minimally opinionated specification for packaging knowledge as Markdown files with YAML frontmatter. Each file is a typed knowledge document — concept, dataset, table, metric, index, or runbook — and the only mandatory field is type. OKF is designed so that LLM producers can generate and maintain bundles without human bookkeeping overhead.
OKF addresses the fragmented context landscape: in most organizations, internal knowledge (table schemas, metric definitions, runbooks, API notices) is scattered across incompatible proprietary systems. Every AI agent builder must re-assemble context from scratch, and knowledge becomes locked behind platform-specific APIs. OKF gives any agent or tool a portable, human-readable, LLM-friendly knowledge starting point.
The only mandatory field is type, specified in the YAML frontmatter. All other fields — title, description, resource, join, columns, formula, etc. — are optional and contextually meaningful based on the declared document type. This minimally opinionated design lowers the barrier to producing OKF-compliant knowledge documents.
OKF defines six core document types: concept (narrative definitions and explanations), dataset (pointers to tabular data sources), table (relational schemas with column definitions and FK relationships), metric (business metric formulas and aggregation logic), index (bundle manifests that group related documents), and runbook (operational procedures and step-by-step guides).
Traditional data catalogs are platform-specific SaaS products with vendor APIs and proprietary storage. OKF documents are plain Markdown files readable by humans, LLMs, and any text-processing tool without licensing, SDKs, or vendor lock-in. The core design principle is Format Not Platform: the specification is a file format, not a product.
Producer/consumer independence means knowledge producers (data engineers, domain experts, LLMs) write OKF Markdown files without knowing how consumers will use them. Consumers (LLMs, BI tools, enrichment agents, SPARQL endpoints) read standard OKF documents without depending on the producer's toolchain. Each side evolves independently, reducing coupling and increasing reuse.
OKF and RDF/SPARQL are complementary layers: OKF is the authoring layer, optimized for human and LLM production via simple Markdown + YAML; RDF/SPARQL is the query and integration layer, enabling Web-scale federation, SPARQL querying, RDFS/OWL reasoning, and global linking via HTTP IRIs. OKF bundles can be extracted to RDF triples by mapping type fields to ontology classes and minting dereferenceable IRIs, combining OKF's ease of production with RDF's power for knowledge consumption.
Key terms defined by or central to the Open Knowledge Format (OKF v0.1) specification, each linked to its knowledge graph entity.
YAML frontmatter: the metadata block at the top of an OKF document, delimited by ---. Contains the mandatory type field and optional fields such as title, description, resource, join, columns, and formula. YAML frontmatter is both human-editable and machine-parseable.