Databricks → Virtual Knowledge Graph with Virtuoso

Core Concepts

Key Concepts

AI Agent-Friendly Knowledge Graph

A knowledge graph designed for consumption by AI agents — featuring dereferenceable entity URIs, machine-readable RDF representations, SPARQL endpoints for structured querying, and navigable entity re

Content Negotiation

An HTTP mechanism where the server inspects the client's Accept header to determine the response format — returning an HTML description page for browsers and RDF (Turtle, JSON-LD, RDF/XML) for semanti

Graph Reasoning

The ability to traverse relationships and discover patterns across connected data using ontology-defined semantics rather than implicit foreign keys. In the bakehouse example, foreign keys become navi

Hyperlink-Based Entity Identity

The use of IRIs (Internationalized Resource Identifiers) as globally unique, dereferenceable entity identifiers — enabling entities from different systems (Databricks tables, external reference data,

Loosely Coupled Semantic Layer

A semantic enrichment layer that sits atop existing data platforms without tight coupling — replacing ETL-driven data copying with standards-based virtualization. The knowledge graph is an overlay, no

Semantic Enrichment

The process of augmenting relational data with ontology-defined classes (e.g., :Transaction, :Franchise), properties (e.g., :franchise, :totalPrice), and inferred relationships to produce a machine-re

Zero Data Movement

A defining characteristic of the virtual knowledge graph approach — Databricks tables are attached via ODBC as virtual references, not copied. SPARQL queries are translated to SQL and executed remotel

Virtual Knowledge Graph

A knowledge graph constructed over existing relational data without physical data movement — using virtual database attachment to expose tables, R2RML to define semantic mappings, and SPARQL for graph

Value Proposition

Capabilities Enabled

🧠

Graph reasoning on existing data

SPARQL queries over Databricks tables without data movement — SQL is generated and executed at query time against the live Databricks warehouse.

🔗

Linked Data entity navigation

Every entity (customer, franchise, transaction) receives a dereferenceable HTTP URI with content negotiation — browsers see HTML descriptions, AI agents retrieve machine-readable RDF.
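As a minimal sketch of the content negotiation described above, the snippet below builds two requests for the same entity URI that differ only in their Accept header. The entity URI is hypothetical (it is not taken from the demo deployment), and the requests are built but not sent.

```python
from urllib.request import Request

# Hypothetical entity URI: substitute one minted by your own deployment.
ENTITY_URI = "https://example.org/bakehouse/franchise/3000046"


def build_entity_request(uri: str, want_rdf: bool) -> Request:
    """Build (but do not send) a GET request for an entity URI.

    An RDF client asks for Turtle; a browser-like client asks for HTML.
    """
    accept = "text/turtle" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept}, method="GET")


agent_request = build_entity_request(ENTITY_URI, want_rdf=True)
browser_request = build_entity_request(ENTITY_URI, want_rdf=False)

print(agent_request.get_header("Accept"))
print(browser_request.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual dereference; the server is then expected to choose the representation from the Accept header.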

🏭

Production-grade infrastructure

ACID guarantees come from the underlying Databricks warehouse; Virtuoso adds federated SPARQL for cross-graph queries and SPARQL-to-SQL translation for queries over the attached tables.

🏷️

Semantic enrichment

R2RML mappings add ontology-defined classes, properties, and inferred relationships — transforming implicit foreign keys into navigable semantic links.
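To illustrate the idea (not Virtuoso's engine, which applies the mapping at query time through quad maps), here is a small sketch of how a template-style mapping could turn one relational row into triples, with the foreign key becoming a link to another entity. The base IRI, the column names (transactionID, franchiseID, totalPrice), and the sample row are assumptions for illustration; only the bakehouse namespace and the :Transaction, :franchise, and :totalPrice terms come from this guide.

```python
from urllib.parse import quote

BASE = "https://example.org/bakehouse/"        # hypothetical entity IRI base
NS = "http://www.databricks.com/bakehouse#"    # ontology namespace used in this guide
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def entity_iri(kind: str, key) -> str:
    """Mint an entity IRI from a key column, percent-encoding the value."""
    return f"{BASE}{kind}/{quote(str(key), safe='')}"


def transaction_triples(row: dict) -> list:
    """Map one transaction row to (subject, predicate, object) triples."""
    subject = entity_iri("transaction", row["transactionID"])
    # The foreign key is emitted as an IRI, so it can be followed like a link.
    franchise = entity_iri("franchise", row["franchiseID"])
    return [
        (subject, RDF_TYPE, NS + "Transaction"),
        (subject, NS + "franchise", franchise),
        (subject, NS + "totalPrice", row["totalPrice"]),
    ]


# Made-up sample row, for illustration only.
row = {"transactionID": 1002961, "franchiseID": 3000046, "totalPrice": 6}
triples = transaction_triples(row)
for triple in triples:
    print(triple)
```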

📡

Standards-based interoperability

W3C standards (RDF, SPARQL, R2RML) prevent vendor lock-in — the knowledge graph is portable and queryable by any standards-compliant SPARQL client or AI agent.

⚡

Zero data movement

Virtuoso's Virtual Database attaches Databricks tables via ODBC as virtual references — no ETL, no duplication, no stale copies.

Step-by-Step Guide

How to Build a Virtual Knowledge Graph from Databricks

Step 1

Create the ODBC DSN

Configure a Databricks ODBC Data Source Name (DSN): install the Databricks ODBC Driver, collect the workspace Host, HTTP Path, and a Personal Access Token, then populate odbcinst.ini with the driver registration and odbc.ini with the connection details.
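The following sketch renders a candidate odbc.ini entry for the databricks_odbc DSN. The key names follow the Databricks (Simba Spark) ODBC driver's documented settings as best understood here, and the driver path is an assumed Linux default; check both against the templates in the companion repository and your installed driver. The host, HTTP path, and token values are placeholders.

```python
import configparser
import io


def render_odbc_ini(host: str, http_path: str, token: str) -> str:
    """Render an odbc.ini fragment for a DSN named databricks_odbc."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep key case instead of lowercasing option names
    cp["ODBC Data Sources"] = {"databricks_odbc": "Databricks ODBC Driver"}
    cp["databricks_odbc"] = {
        # Assumed install location; adjust to where your driver lives.
        "Driver": "/opt/simba/spark/lib/64/libsparkodbc_sb64.so",
        "Host": host,
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",   # HTTP transport
        "AuthMech": "3",          # username/password authentication
        "UID": "token",           # literal string when using a personal access token
        "PWD": token,
    }
    buf = io.StringIO()
    cp.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Placeholder values: replace with your workspace details.
ini_text = render_odbc_ini(
    host="<workspace-host>",
    http_path="<http-path>",
    token="<personal-access-token>",
)
print(ini_text)
```

Generating the file from code keeps the token out of version-controlled templates; writing ini_text to the odbc.ini location used by your ODBC driver manager is left to the deployment.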

Step 2

Register the DSN in Virtuoso

Access the External Data Sources Manager in Virtuoso Conductor at /conductor/vdb_conn_dsn.vspx to register the databricks_odbc DSN for use by Virtuoso's Virtual Database engine.

Step 3

Connect to the Data Source

Find databricks_odbc in the External Data Sources list and click Connect. Supply the username (the literal string token) and the password (your Databricks personal access token) to establish the ODBC connection from Virtuoso to the Databricks SQL warehouse.

Step 4

Clone the Demo Repository

Clone the companion GitHub repository containing ODBC templates, R2RML mapping files, ontology, and setup scripts: git clone https://github.com/danielhmills/databricks-sample-kg.git.

Step 5

Run the Quick Setup Script

Execute quick_setup.sql via isql. The script attaches each Databricks table (ATTACH TABLE ... FROM 'databricks_odbc'), grants SPARQL_SELECT privileges, loads the R2RML mapping and ontology via SPARQL LOAD, generates quad maps via R2RML_MAKE_QM_FROM_G, and configures URL rewrite rules for Linked Data content negotiation.
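If you prefer to drive this step from a script, the sketch below assembles the isql invocation. It assumes Virtuoso's isql takes its host:port, user, password, and script file as positional arguments, and that the server listens on the default port 1111 with the default dba account; the binary may be installed under a different name (for example isql-vt) on some distributions. All of these values are assumptions to adjust for your installation.

```python
import subprocess


def isql_command(script: str,
                 host_port: str = "localhost:1111",
                 user: str = "dba",
                 password: str = "dba",
                 binary: str = "isql") -> list:
    """Assemble the argument list for running a SQL script through Virtuoso isql."""
    return [binary, host_port, user, password, script]


def run_setup(script: str = "quick_setup.sql") -> subprocess.CompletedProcess:
    """Run the setup script and raise if isql exits with a non-zero status."""
    return subprocess.run(isql_command(script), check=True,
                          capture_output=True, text=True)


command = isql_command("quick_setup.sql")
print(" ".join(command))
```

Calling run_setup() from the cloned repository directory would execute the script; it is deliberately not invoked here.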

Step 6

Test the Attached Tables

Verify the virtual attachment by running a SQL SELECT query against the virtual tables (e.g., SELECT TOP 10 * FROM databricks.bakehouse.sales_customers) via isql or the Conductor iSQL UI.

Step 7

Verify the Knowledge Graph with SPARQL

Run SPARQL queries against the virtual knowledge graph — test entity type discovery (SELECT * FROM <...> WHERE { ?s a ?o }), cross-table joins (Revenue by Franchise), and CONSTRUCT queries for graph visualization in SPARQLWorks. Confirm dereferenceable entity URIs respond to both browsers and RDF clients via content negotiation.

SPARQL Query Examples

Ready-to-run queries against the bakehouse virtual knowledge graph. SELECT queries use the text/x-html+tr result format; CONSTRUCT queries use text/x-html-nice-turtle.

1. CONSTRUCT for Graph Visualization (SPARQLWorks)

PREFIX : <http://www.databricks.com/bakehouse#>

CONSTRUCT
{
  ?transaction a :Transaction;
   :franchise ?franchise;
   :totalPrice ?totalPrice.

  ?franchise :city ?franchiseCity.
}
WHERE
{
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl>
  {
    ?transaction a :Transaction;
     :franchise ?franchise;
     :totalPrice ?totalPrice.

    ?franchise :city ?franchiseCity.
  }
}
LIMIT 100

▶ Open at linkeddata.uriburner.com/sparql

2. Entity Type Discovery

SELECT *
WHERE
{
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl>
  {
    ?s a ?o
  }
}
LIMIT 10

▶ Open at linkeddata.uriburner.com/sparql
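The Revenue by Franchise join mentioned in Step 7 is not spelled out above. A possible version, wrapped in a helper that builds a SPARQL Protocol GET URL for the endpoint, might look like the sketch below. The aggregate query is an assumption built from the :Transaction, :franchise, and :totalPrice terms used in the CONSTRUCT example, and the format parameter is a Virtuoso convenience rather than part of the SPARQL Protocol standard. The URL is built but not fetched.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://linkeddata.uriburner.com/sparql"
GRAPH = ("https://linkeddata.uriburner.com/DAV/demos/daas/"
         "databricks-virtuoso-kg-deepseek_v4pro-1.ttl")

# Assumed shape of a "Revenue by Franchise" query: sum totalPrice per franchise.
REVENUE_BY_FRANCHISE = f"""
PREFIX : <http://www.databricks.com/bakehouse#>

SELECT ?franchise (SUM(?totalPrice) AS ?revenue)
WHERE
{{
  GRAPH <{GRAPH}>
  {{
    ?transaction a :Transaction ;
                 :franchise ?franchise ;
                 :totalPrice ?totalPrice .
  }}
}}
GROUP BY ?franchise
ORDER BY DESC(?revenue)
LIMIT 10
"""


def sparql_get_url(query: str,
                   result_format: str = "application/sparql-results+json") -> str:
    """Build a GET URL carrying the query and the desired result format."""
    params = {"query": query.strip(), "format": result_format}
    return ENDPOINT + "?" + urlencode(params)


url = sparql_get_url(REVENUE_BY_FRANCHISE)
print(url)
```

Fetching the URL with any HTTP client should return JSON result bindings; passing text/x-html+tr as result_format requests the HTML table rendering noted for SELECT queries.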