
From Databricks Tables
to a Virtual Knowledge Graph with Virtuoso

A step-by-step guide demonstrating how to unlock machine-computable entity relationships across existing Databricks data without migration, duplication, or platform lock-in, using RDF, SPARQL, R2RML, and Virtuoso's Virtual Database to create an AI Agent-friendly Knowledge Graph deployed on the Semantic Web.

Source: OpenLink Community · 2026-06-09 · Author: danielhm

Core Concepts

AI Agent-Friendly Knowledge Graph

A knowledge graph designed for consumption by AI agents, featuring dereferenceable entity URIs, machine-readable RDF representations, SPARQL endpoints for structured querying, and navigable entity re

Content Negotiation

An HTTP mechanism where the server inspects the client's Accept header to determine the response format, returning an HTML description page for browsers and RDF (Turtle, JSON-LD, RDF/XML) for semanti

Graph Reasoning

The ability to traverse relationships and discover patterns across connected data using ontology-defined semantics rather than implicit foreign keys. In the bakehouse example, foreign keys become navi

Hyperlink-Based Entity Identity

The use of IRIs (Internationalized Resource Identifiers) as globally unique, dereferenceable entity identifiers, enabling entities from different systems (Databricks tables, external reference data,

Loosely Coupled Semantic Layer

A semantic enrichment layer that sits atop existing data platforms without tight coupling, replacing ETL-driven data copying with standards-based virtualization. The knowledge graph is an overlay, no

Semantic Enrichment

The process of augmenting relational data with ontology-defined classes (e.g., :Transaction, :Franchise), properties (e.g., :franchise, :totalPrice), and inferred relationships to produce a machine-re

Zero Data Movement

A defining characteristic of the virtual knowledge graph approach: Databricks tables are attached via ODBC as virtual references, not copied. SPARQL queries are translated to SQL and executed remotel

Virtual Knowledge Graph

A knowledge graph constructed over existing relational data without physical data movement, using virtual database attachment to expose tables, R2RML to define semantic mappings, and SPARQL for graph querying.

Value Proposition

Graph reasoning on existing data

SPARQL queries over Databricks tables without data movement: SQL is generated and executed at query time against the live Databricks warehouse.

Linked Data entity navigation

Every entity (customer, franchise, transaction) receives a dereferenceable HTTP URI with content negotiation: browsers see HTML descriptions, AI agents retrieve machine-readable RDF.

Production-grade infrastructure

ACID compliance through the underlying Databricks warehouse, federated SPARQL for cross-graph queries, and Virtuoso's proven SPARQL-to-SQL query federation.

Semantic enrichment

R2RML mappings add ontology-defined classes, properties, and inferred relationships, transforming implicit foreign keys into navigable semantic links.

Standards-based interoperability

W3C standards (RDF, SPARQL, R2RML) prevent vendor lock-in: the knowledge graph is portable and queryable by any standards-compliant SPARQL client or AI agent.

Zero data movement

Virtuoso's Virtual Database attaches Databricks tables via ODBC as virtual references: no ETL, no duplication, no stale copies.

Step-by-Step Guide
Step 1

Create the ODBC DSN

Configure a Databricks ODBC Data Source Name (DSN): install the Databricks ODBC Driver, obtain the workspace Host, HTTP Path, and Personal Access Token, and populate odbcinst.ini and odbc.ini with the driver and connection details.
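
As a rough illustration of the odbc.ini entry this step produces, the sketch below renders one from Python. The key names (Host, Port, HTTPPath, SSL, ThriftTransport, AuthMech, UID, PWD) follow the commonly documented settings for the Databricks (Simba Spark) ODBC driver with token authentication; the driver path and all placeholder values are assumptions, and the templates in the companion repository may differ.

```python
import configparser
import io


def render_odbc_ini(host: str, http_path: str, token: str,
                    driver_path: str = "/opt/simba/spark/lib/64/libsparkodbc_sb64.so") -> str:
    """Render an odbc.ini section for a DSN named databricks_odbc.

    driver_path is an assumed install location; adjust it to your system.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (configparser lowercases keys by default)
    parser["databricks_odbc"] = {
        "Driver": driver_path,
        "Host": host,
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",  # HTTP transport
        "AuthMech": "3",         # user name and password authentication
        "UID": "token",          # literal string "token" when using a personal access token
        "PWD": token,
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholders only: substitute your own workspace values.
    print(render_odbc_ini("<workspace-host>", "<http-path>", "<personal-access-token>"))
```

The DSN name databricks_odbc matches the name registered in Virtuoso in the next step.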

Step 2

Register the DSN in Virtuoso

Access the External Data Sources Manager in Virtuoso Conductor at /conductor/vdb_conn_dsn.vspx to register the databricks_odbc DSN for use by Virtuoso's Virtual Database engine.

Step 3

Connect to the Data Source

Find databricks_odbc in the External Data Sources list, click Connect, supply the username (the literal string token) and password (your Databricks personal access token), and establish the ODBC connection from Virtuoso to the Databricks SQL warehouse.

Step 4

Clone the Demo Repository

Clone the companion GitHub repository containing ODBC templates, R2RML mapping files, ontology, and setup scripts: git clone https://github.com/danielhmills/databricks-sample-kg.git.

Step 5

Run the Quick Setup Script

Execute quick_setup.sql via isql. The script attaches each Databricks table (ATTACH TABLE ... FROM 'databricks_odbc'), grants SPARQL_SELECT privileges, loads the R2RML mapping and ontology via SPARQL LOAD, generates quad maps via R2RML_MAKE_QM_FROM_G, and configures URL rewrite rules for Linked Data content negotiation.

Step 6

Test the Attached Tables

Verify the virtual attachment by running a SQL SELECT query against the virtual tables (e.g., SELECT TOP 10 * FROM databricks.bakehouse.sales_customers) via isql or the Conductor iSQL UI.
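
One way to script this check is to assemble the isql invocation from Python, as in the hedged sketch below. It assumes Virtuoso listens on the default SQL port 1111, that the binary is on the PATH as isql (some distributions install it as isql-vt), and that the credentials shown are placeholders. The command is only built and printed, not executed.

```python
import shlex


def isql_command(sql: str, host: str = "localhost", port: int = 1111,
                 user: str = "dba", password: str = "<dba-password>") -> list[str]:
    """Build an argv list that asks Virtuoso's isql to run one SQL statement."""
    statement = sql.strip().rstrip(";") + ";"
    # isql takes host:port, user and password positionally, then EXEC=<statement>.
    return ["isql", f"{host}:{port}", user, password, f"EXEC={statement}"]


if __name__ == "__main__":
    argv = isql_command("SELECT TOP 10 * FROM databricks.bakehouse.sales_customers")
    print(shlex.join(argv))
    # To run it for real: subprocess.run(argv, check=True)
```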

Step 7

Verify the Knowledge Graph with SPARQL

Run SPARQL queries against the virtual knowledge graph: test entity type discovery (SELECT * FROM <...> WHERE { ?s a ?o }), cross-table joins (Revenue by Franchise), and CONSTRUCT queries for graph visualization in SPARQLWorks. Confirm dereferenceable entity URIs respond to both browsers and RDF clients via content negotiation.

SPARQL Query Examples

Ready-to-run queries against the bakehouse virtual knowledge graph. SELECT uses text/x-html+tr; CONSTRUCT uses text/x-html-nice-turtle.

PREFIX : <http://www.databricks.com/bakehouse#>

CONSTRUCT
{
  ?transaction a :Transaction;
   :franchise ?franchise;
   :totalPrice ?totalPrice.

  ?franchise :city ?franchiseCity.
}
WHERE
{
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl>
  {
    ?transaction a :Transaction;
     :franchise ?franchise;
     :totalPrice ?totalPrice.

    ?franchise :city ?franchiseCity.
  }
}
LIMIT 100

2. Entity Type Discovery
SELECT *
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?s a ?o
  
  }
}
LIMIT 10

PREFIX : <http://www.databricks.com/bakehouse#>

SELECT
?orderSizeBucket
(COUNT(?transaction) AS ?transactionCount)
(SUM(?totalPrice) AS ?totalRevenue)
(ROUND(AVG(?totalPrice)) AS ?avgValue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?transaction a :Transaction;
     :totalPrice ?totalPrice.
    BIND(
      IF(?totalPrice < 10, "Small (<$10)",
      IF(?totalPrice < 30, "Medium ($10-$30)",
      IF(?totalPrice < 60, "Large ($30-$60)",
      "Enterprise (>$60)")))
      AS ?orderSizeBucket)
  
  }
}
GROUP BY ?orderSizeBucket
ORDER BY DESC(?totalRevenue)

PREFIX : <http://www.databricks.com/bakehouse#>

SELECT
?city
(COUNT(DISTINCT ?transaction) AS ?orderCount)
(SUM(?totalPrice) AS ?revenue)
(ROUND(AVG(?totalPrice)) AS ?avgOrderValue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?transaction a :Transaction;
     :franchise ?franchise;
     :totalPrice ?totalPrice.
    ?franchise :city ?city.
  
  }
}
GROUP BY ?city
ORDER BY DESC(?revenue)

PREFIX : <http://www.databricks.com/bakehouse#>

SELECT
?franchise
?franchiseCity
(SUM(?totalPrice) AS ?revenue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?transaction a :Transaction;
     :franchise ?franchise;
     :totalPrice ?totalPrice.
  
    ?franchise :city ?franchiseCity.
  
  }
}
GROUP BY ?franchise ?franchiseCity
ORDER BY DESC(?revenue)
LIMIT 10

PREFIX : <http://www.databricks.com/bakehouse#>

SELECT
?supplierName
?ingredientName
(COUNT(DISTINCT ?franchise) AS ?franchisesServed)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?supply a :SupplyContract;
     :supplier ?supplier;
     :ingredient ?ingredient;
     :franchise ?franchise.
    ?supplier :supplierName ?supplierName.
    ?ingredient :ingredientName ?ingredientName.
  
  }
}
GROUP BY ?supplierName ?ingredientName
ORDER BY DESC(?franchisesServed)
LIMIT 15

PREFIX : <http://www.databricks.com/bakehouse#>

SELECT
?customerName
(COUNT(DISTINCT ?transaction) AS ?purchaseCount)
(SUM(?totalPrice) AS ?lifetimeValue)
(ROUND(SUM(?totalPrice)/COUNT(DISTINCT ?transaction)) AS ?avgPurchaseValue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?transaction a :Transaction;
     :customer ?customer;
     :totalPrice ?totalPrice.
    ?customer :customerName ?customerName.
  
  }
}
GROUP BY ?customerName
ORDER BY DESC(?lifetimeValue)
LIMIT 10

PREFIX : <http://www.databricks.com/bakehouse#>

SELECT
?franchiseName
?city
(COUNT(?review) AS ?reviewCount)
(ROUND(AVG(?rating) * 10) / 10.0 AS ?avgRating)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
  
    ?review a :Review;
     :franchise ?franchise;
     :rating ?rating.
    ?franchise :city ?city.
    OPTIONAL { ?franchise :franchiseName ?franchiseName. }
  
  }
}
GROUP BY ?franchiseName ?city
HAVING (COUNT(?review) >= 3)
ORDER BY DESC(?avgRating)
LIMIT 10
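
Any of the queries above can also be submitted programmatically over the SPARQL Protocol. The sketch below only builds the GET URL for the entity type discovery query against the endpoint named in this guide; it does not send the request. The `query` parameter is standard SPARQL Protocol, while `format` is a Virtuoso convenience parameter (an Accept header is the standards-based alternative). The default result format chosen here is an assumption for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://linkeddata.uriburner.com/sparql"
GRAPH = ("https://linkeddata.uriburner.com/DAV/demos/daas/"
         "databricks-virtuoso-kg-deepseek_v4pro-1.ttl")

ENTITY_TYPES_QUERY = f"""
SELECT *
WHERE {{
  GRAPH <{GRAPH}> {{ ?s a ?o }}
}}
LIMIT 10
"""


def sparql_get_url(query: str,
                   result_format: str = "application/sparql-results+json") -> str:
    """Return a SPARQL Protocol GET URL carrying the query and requested format."""
    params = {"query": query.strip(), "format": result_format}
    return f"{ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    url = sparql_get_url(ENTITY_TYPES_QUERY)
    print(url)
    # urllib.request.urlopen(url) would perform the actual request.
    # Round-trip check: the encoded query decodes back to the original text.
    decoded = parse_qs(urlsplit(url).query)["query"][0]
    print(decoded == ENTITY_TYPES_QUERY.strip())
```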

FAQ

A Virtual Knowledge Graph (VKG) is a semantic layer constructed over existing relational data without physical data movement. It uses virtual database attachment (ODBC/JDBC) to connect to source systems, R2RML mappings to declare how tables and columns map to RDF classes and properties, and SPARQL-to-SQL query translation to execute graph queries against the live relational data at query time. The data stays in place; the semantics are layered on top.

Traditional ETL-based approaches extract data from the source, transform it to RDF, and load it into a triplestore, creating a copy that must be kept in sync. The virtual approach has no extraction, no transformation pipeline, no load step, and no stale copies. R2RML mappings define the semantic model declaratively; SPARQL queries execute against live data. Changes in Databricks tables are immediately visible in SPARQL results. The trade-off is query latency versus data freshness: the virtual approach prioritizes freshness and simplicity over raw graph traversal speed.

Yes. Virtuoso's Virtual Database supports any ODBC or JDBC data source, including PostgreSQL, MySQL, Oracle, SQL Server, Snowflake, BigQuery, and many others. The same pattern applies: attach tables via ODBC, define R2RML mappings, load the ontology, generate quad maps, and query via SPARQL. The knowledge graph becomes a unified semantic layer spanning multiple heterogeneous data platforms.

SQL knowledge for table attachment and testing; basic understanding of RDF and SPARQL for query writing; familiarity with R2RML for mapping design (the Turtle syntax is straightforward for anyone comfortable with data modelling); and Virtuoso administration basics (Conductor UI or isql command-line). The companion GitHub repository provides templates and working examples; most practitioners can have a working virtual knowledge graph running within an hour.

Databricks provides graph capabilities within its platform, but Virtuoso adds: (1) standards-based interoperability via W3C RDF/SPARQL/R2RML, with no vendor lock-in; (2) Linked Data entity URIs enabling cross-system entity navigation; (3) federated SPARQL across multiple data sources; (4) AI Agent-friendly machine-readable RDF with content negotiation; and (5) a loosely coupled architecture where the knowledge graph is an overlay, not yet another silo.

No. The virtual knowledge graph approach achieves zero data movement. Databricks tables are attached via ODBC as virtual references: they appear in Virtuoso's local catalog but no data is copied. SPARQL queries are translated to SQL at query time and executed remotely against the live Databricks SQL warehouse. This means the knowledge graph always reflects current data without ETL pipelines or stale copies.

R2RML (RDB to RDF Mapping Language) is a W3C recommendation that defines how relational database tables, columns, and foreign keys map to RDF classes, properties, and relationships. It provides a declarative, standards-based bridge between the SQL world and the semantic graph world, with no custom code needed. The mapping is a Turtle file that can be version-controlled, reviewed, and reused across projects.
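
To make the idea of a declarative mapping concrete, here is a toy Python sketch of what an R2RML triples map expresses: a subject IRI template, a class, column-to-property rules, and a foreign key turned into a link to another entity. It is purely conceptual. Virtuoso does not materialize triples this way; it rewrites SPARQL to SQL at query time. The column names and the transaction IRI pattern are assumptions modelled on the franchise IRI quoted elsewhere in this guide.

```python
BAKEHOUSE = "http://www.databricks.com/bakehouse#"
BASE = "http://demo.openlinksw.com/databricks/bakehouse/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping for a transactions table (not the repository's actual R2RML).
TRANSACTION_MAP = {
    "subject_template": BASE + "transaction-{transactionID}#this",
    "class": BAKEHOUSE + "Transaction",
    # column -> predicate, value emitted as a literal
    "literals": {"totalPrice": BAKEHOUSE + "totalPrice"},
    # foreign-key column -> (predicate, IRI template of the referenced entity)
    "references": {
        "franchiseID": (BAKEHOUSE + "franchise", BASE + "franchise-{franchiseID}#this"),
    },
}


def row_to_triples(row: dict, mapping: dict) -> list[tuple]:
    """Apply a mapping to one relational row and return (s, p, o) tuples."""
    subject = mapping["subject_template"].format(**row)
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for column, predicate in mapping["literals"].items():
        triples.append((subject, predicate, row[column]))
    for column, (predicate, template) in mapping["references"].items():
        # The foreign key becomes a navigable link to another entity IRI.
        triples.append((subject, predicate, template.format(**row)))
    return triples


if __name__ == "__main__":
    sample_row = {"transactionID": 1, "franchiseID": 3000046, "totalPrice": 12}
    for triple in row_to_triples(sample_row, TRANSACTION_MAP):
        print(triple)
```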

AI agents consume the virtual knowledge graph through three standard interfaces: (1) SPARQL endpoint: structured graph queries returning typed results with entity IRIs; (2) Linked Data entity URIs: dereferenceable HTTP URIs that return RDF (Turtle, JSON-LD, RDF/XML) when requested with the appropriate Accept header; (3) HTML entity descriptions: human-readable pages with navigable hyperlinks for agentic browsing. No custom API, SDK, or platform-specific integration is required; any standards-compliant SPARQL client or HTTP agent can interact with the graph.

Quad Maps are Virtuoso's internal representation of R2RML mappings: they define how SPARQL graph patterns translate to SQL queries against virtual tables. The R2RML_MAKE_QM_FROM_G function converts declarative R2RML Turtle mappings into optimized Quad Maps that Virtuoso's SPARQL engine uses at query time. This is what makes SPARQL queries executable against remote relational tables without data movement.

Content negotiation is an HTTP mechanism where the server inspects the client's Accept header to determine the response format. For the bakehouse knowledge graph: browsers requesting text/html receive a human-readable HTML description page with navigable links; AI agents and RDF clients requesting application/ld+json or text/turtle receive machine-readable RDF. This single-URI, multi-format approach means the same entity identifier works for both human exploration and automated agent consumption, a cornerstone of Linked Data and the Semantic Web.
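
A minimal sketch of the client side of content negotiation, using only the Python standard library: the same entity URI is requested three times with different Accept headers. The example URI is the franchise identifier quoted in this guide; the requests are constructed but not sent, and whether that host is reachable from your network is not assumed.

```python
from urllib.parse import urldefrag
from urllib.request import Request

ENTITY_URI = "http://demo.openlinksw.com/databricks/bakehouse/franchise-3000046#this"


def negotiated_request(entity_uri: str, media_type: str) -> Request:
    """Build a GET request for an entity URI asking for one representation."""
    # The fragment (#this) identifies the entity and is never sent to the server;
    # only the document URL is dereferenced.
    document_url, _fragment = urldefrag(entity_uri)
    return Request(document_url, headers={"Accept": media_type})


if __name__ == "__main__":
    for media_type in ("text/html", "text/turtle", "application/ld+json"):
        request = negotiated_request(ENTITY_URI, media_type)
        print(request.full_url, "->", request.get_header("Accept"))
        # urllib.request.urlopen(request) would perform the actual fetch.
```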

The virtual knowledge graph approach using ODBC attachment is best suited for batch reasoning, GraphRAG over reference data, or exploratory graph analytics. For millisecond-latency graph traversals or high-frequency transactional writes, a native graph store may be more appropriate. However, for the majority of enterprise AI agent use cases, where agents need to discover entity relationships, traverse connections, and retrieve structured context, the performance characteristics are more than adequate.

You need: (1) a Databricks SQL warehouse (Serverless, Pro, or Classic) to provide the ODBC endpoint; (2) the Databricks ODBC Driver installed on the Virtuoso host; (3) a Personal Access Token for authentication; and (4) the workspace Host and HTTP Path for the SQL warehouse. The public samples.bakehouse dataset provides the example tables; your own Databricks catalogs and schemas work the same way.

A knowledge graph designed for AI agent consumption, featuring dereferenceable entity URIs, machine-readable RDF, SPARQL endpoints, and navigable entity relationships without custom API integration.
An HTTP mechanism where a server returns different representations of the same resource based on the client's Accept header: HTML for browsers, RDF for semantic agents.
A globally unique, dereferenceable HTTP identifier for a knowledge graph entity (e.g., http://demo.openlinksw.com/databricks/bakehouse/franchise-3000046#this). When dereferenced, returns either human-readable HTML or machine-readable RDF depending on the client.
A method of publishing structured data on the web using HTTP URIs, RDF, and content negotiation, enabling entities to be interlinked across systems and consumed by both humans and machines.
A standard API for accessing database management systems, used by Virtuoso to connect to Databricks SQL warehouses for virtual table attachment.
A formal vocabulary defining classes, properties, and relationships within a domain. The bakehouse ontology defines Transaction, Franchise, Customer classes and franchise, totalPrice, city properties.
Virtuoso's internal compiled representation of R2RML mapping rules. The SPARQL engine uses quad maps at query time to translate SPARQL graph patterns into SQL queries against virtual tables.
A W3C recommendation that declaratively defines how relational database tables, columns, primary keys, and foreign keys map to RDF classes, properties, and object relationships.
SPARQL Protocol and RDF Query Language, the W3C standard query language for RDF graphs. In the virtual knowledge graph, SPARQL queries are translated to SQL and executed against the live Databricks warehouse.
An extension of the World Wide Web where data is given well-defined meaning through standards like RDF, SPARQL, and OWL, enabling machines and AI agents to reason about interconnected data across system boundaries.
Virtuoso's capability to attach remote tables from ODBC/JDBC data sources. Tables appear in the local catalog as virtual references without copying data.
A knowledge graph constructed over existing relational data without physical data movement, using virtual database attachment to expose tables, R2RML to define semantic mappings, and SPARQL for graph querying.
