A step-by-step guide demonstrating how to unlock machine-computable entity relationships across existing Databricks data without migration, duplication, or platform lock-in, using RDF, SPARQL, R2RML, and Virtuoso's Virtual Database to create an AI Agent-friendly Knowledge Graph deployed on the Semantic Web.
A knowledge graph designed for consumption by AI agents, featuring dereferenceable entity URIs, machine-readable RDF representations, SPARQL endpoints for structured querying, and navigable entity re
An HTTP mechanism where the server inspects the client's Accept header to determine the response format, returning an HTML description page for browsers and RDF (Turtle, JSON-LD, RDF/XML) for semanti
The ability to traverse relationships and discover patterns across connected data using ontology-defined semantics rather than implicit foreign keys. In the bakehouse example, foreign keys become navi
The use of IRIs (Internationalized Resource Identifiers) as globally unique, dereferenceable entity identifiers, enabling entities from different systems (Databricks tables, external reference data,
A semantic enrichment layer that sits atop existing data platforms without tight coupling, replacing ETL-driven data copying with standards-based virtualization. The knowledge graph is an overlay, no
The process of augmenting relational data with ontology-defined classes (e.g., :Transaction, :Franchise), properties (e.g., :franchise, :totalPrice), and inferred relationships to produce a machine-re
A defining characteristic of the virtual knowledge graph approach: Databricks tables are attached via ODBC as virtual references, not copied. SPARQL queries are translated to SQL and executed remotel
A knowledge graph constructed over existing relational data without physical data movement, using virtual database attachment to expose tables, R2RML to define semantic mappings, and SPARQL for graph
SPARQL queries over Databricks tables without data movement: SQL is generated and executed at query time against the live Databricks warehouse.
Every entity (customer, franchise, transaction) receives a dereferenceable HTTP URI with content negotiation: browsers see HTML descriptions, AI agents retrieve machine-readable RDF.
ACID compliance through the underlying Databricks warehouse, federated SPARQL for cross-graph queries, and Virtuoso's proven SPARQL-to-SQL query federation.
R2RML mappings add ontology-defined classes, properties, and inferred relationships, transforming implicit foreign keys into navigable semantic links.
W3C standards (RDF, SPARQL, R2RML) prevent vendor lock-in: the knowledge graph is portable and queryable by any standards-compliant SPARQL client or AI agent.
Virtuoso's Virtual Database attaches Databricks tables via ODBC as virtual references: no ETL, no duplication, no stale copies.
Configure a Databricks ODBC Data Source Name (DSN): install the Databricks ODBC Driver, obtain the workspace Host, HTTP Path, and Personal Access Token, and populate odbcinst.ini and odbc.ini with the driver and connection details.
Access the External Data Sources Manager in Virtuoso Conductor at /conductor/vdb_conn_dsn.vspx to register the databricks_odbc DSN for use by Virtuoso's Virtual Database engine.
Find databricks_odbc in the External Data Sources list, click Connect, supply the username (token) and password (personal access token), and establish the ODBC connection from Virtuoso to the Databricks SQL warehouse.
Clone the companion GitHub repository containing ODBC templates, R2RML mapping files, ontology, and setup scripts: git clone https://github.com/danielhmills/databricks-sample-kg.git.
Execute quick_setup.sql via isql. The script attaches each Databricks table (ATTACH TABLE ... FROM 'databricks_odbc'), grants SPARQL_SELECT privileges, loads the R2RML mapping and ontology via SPARQL LOAD, generates quad maps via R2RML_MAKE_QM_FROM_G, and configures URL rewrite rules for Linked Data content negotiation.
Verify the virtual attachment by running a SQL SELECT query against the virtual tables (e.g., SELECT TOP 10 * FROM databricks.bakehouse.sales_customers) via isql or the Conductor iSQL UI.
Run SPARQL queries against the virtual knowledge graph: test entity type discovery (SELECT * FROM <...> WHERE { ?s a ?o }), cross-table joins (Revenue by Franchise), and CONSTRUCT queries for graph visualization in SPARQLWorks. Then confirm that dereferenceable entity URIs respond to both browsers and RDF clients via content negotiation.
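The verification step can also be scripted. The following is a minimal Python sketch, not part of the companion repository, that builds a SPARQL Protocol GET request for the entity type discovery query. It assumes the endpoint lives at https://linkeddata.uriburner.com/sparql (inferred from the link target named in this guide) and that JSON results are wanted. The helper name build_sparql_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: endpoint URL inferred from the "linkeddata.uriburner.com/sparql" links in this guide.
ENDPOINT = "https://linkeddata.uriburner.com/sparql"
GRAPH = "https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl"


def build_sparql_request(query, accept="application/sparql-results+json"):
    """Build (but do not send) a SPARQL Protocol GET request.

    The query travels in the standard `query` parameter; the Accept
    header asks the endpoint for the desired result serialization.
    """
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": accept})


# Entity type discovery, as described in the verification step.
TYPE_DISCOVERY = f"SELECT * WHERE {{ GRAPH <{GRAPH}> {{ ?s a ?o }} }} LIMIT 10"

request = build_sparql_request(TYPE_DISCOVERY)
print(request.full_url)
# To execute against the live endpoint:
#   from urllib.request import urlopen
#   body = urlopen(request).read()
```

Building the request separately from sending it keeps the sketch runnable offline; passing the request to urlopen performs the actual query.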
Ready-to-run queries against the bakehouse virtual knowledge graph. SELECT uses text/x-html+tr; CONSTRUCT uses text/x-html-nice-turtle.
# Transactions with their franchise and franchise city, returned as a graph for visualization
PREFIX : <http://www.databricks.com/bakehouse#>
CONSTRUCT
{
  ?transaction a :Transaction;
    :franchise ?franchise;
    :totalPrice ?totalPrice.
  ?franchise :city ?franchiseCity.
}
WHERE
{
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl>
  {
    ?transaction a :Transaction;
      :franchise ?franchise;
      :totalPrice ?totalPrice.
    ?franchise :city ?franchiseCity.
  }
}
LIMIT 100
# Entity type discovery: list entities and their classes
SELECT *
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?s a ?o
  }
}
LIMIT 10
# Transaction count, total revenue, and average value per order-size bucket
PREFIX : <http://www.databricks.com/bakehouse#>
SELECT
  ?orderSizeBucket
  (COUNT(?transaction) AS ?transactionCount)
  (SUM(?totalPrice) AS ?totalRevenue)
  (ROUND(AVG(?totalPrice)) AS ?avgValue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?transaction a :Transaction;
      :totalPrice ?totalPrice.
    BIND(
      IF(?totalPrice < 10, "Small (<$10)",
      IF(?totalPrice < 30, "Medium ($10-$30)",
      IF(?totalPrice < 60, "Large ($30-$60)",
      "Enterprise (>=$60)")))
      AS ?orderSizeBucket)
  }
}
GROUP BY ?orderSizeBucket
ORDER BY DESC(?totalRevenue)
# Order count, revenue, and average order value per franchise city
PREFIX : <http://www.databricks.com/bakehouse#>
SELECT
  ?city
  (COUNT(DISTINCT ?transaction) AS ?orderCount)
  (SUM(?totalPrice) AS ?revenue)
  (ROUND(AVG(?totalPrice)) AS ?avgOrderValue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?transaction a :Transaction;
      :franchise ?franchise;
      :totalPrice ?totalPrice.
    ?franchise :city ?city.
  }
}
GROUP BY ?city
ORDER BY DESC(?revenue)
# Revenue by franchise (top 10); the aggregate is parenthesized as standard SPARQL 1.1 requires
PREFIX : <http://www.databricks.com/bakehouse#>
SELECT
  ?franchise
  ?franchiseCity
  (SUM(?totalPrice) AS ?revenue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?transaction a :Transaction;
      :franchise ?franchise;
      :totalPrice ?totalPrice.
    ?franchise :city ?franchiseCity.
  }
}
GROUP BY ?franchise ?franchiseCity
ORDER BY DESC(?revenue)
LIMIT 10
# Supplier and ingredient pairs ranked by the number of franchises served
PREFIX : <http://www.databricks.com/bakehouse#>
SELECT
  ?supplierName
  ?ingredientName
  (COUNT(DISTINCT ?franchise) AS ?franchisesServed)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?supply a :SupplyContract;
      :supplier ?supplier;
      :ingredient ?ingredient;
      :franchise ?franchise.
    ?supplier :supplierName ?supplierName.
    ?ingredient :ingredientName ?ingredientName.
  }
}
GROUP BY ?supplierName ?ingredientName
ORDER BY DESC(?franchisesServed)
LIMIT 15
# Top 10 customers by lifetime value
PREFIX : <http://www.databricks.com/bakehouse#>
SELECT
  ?customerName
  (COUNT(DISTINCT ?transaction) AS ?purchaseCount)
  (SUM(?totalPrice) AS ?lifetimeValue)
  (ROUND(SUM(?totalPrice)/COUNT(DISTINCT ?transaction)) AS ?avgPurchaseValue)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?transaction a :Transaction;
      :customer ?customer;
      :totalPrice ?totalPrice.
    ?customer :customerName ?customerName.
  }
}
GROUP BY ?customerName
ORDER BY DESC(?lifetimeValue)
LIMIT 10
# Top-rated franchises with at least three reviews; average rating rounded to one decimal place
PREFIX : <http://www.databricks.com/bakehouse#>
SELECT
  ?franchiseName
  ?city
  (COUNT(?review) AS ?reviewCount)
  (ROUND(AVG(?rating)*10)/10 AS ?avgRating)
WHERE {
  GRAPH <https://linkeddata.uriburner.com/DAV/demos/daas/databricks-virtuoso-kg-deepseek_v4pro-1.ttl> {
    ?review a :Review;
      :franchise ?franchise;
      :rating ?rating.
    ?franchise :city ?city.
    OPTIONAL { ?franchise :franchiseName ?franchiseName. }
  }
}
GROUP BY ?franchiseName ?city
HAVING (COUNT(?review) >= 3)
ORDER BY DESC(?avgRating)
LIMIT 10
A Virtual Knowledge Graph (VKG) is a semantic layer constructed over existing relational data without physical data movement. It uses virtual database attachment (ODBC/JDBC) to connect to source systems, R2RML mappings to declare how tables and columns map to RDF classes and properties, and SPARQL-to-SQL query translation to execute graph queries against the live relational data at query time. The data stays in place; the semantics are layered on top.
Traditional ETL-based approaches extract data from the source, transform it to RDF, and load it into a triplestore, creating a copy that must be kept in sync. The virtual approach has no extraction, no transformation pipeline, no load step, and no stale copies. R2RML mappings define the semantic model declaratively; SPARQL queries execute against live data. Changes in Databricks tables are immediately visible in SPARQL results. The trade-off is query latency versus data freshness: the virtual approach prioritizes freshness and simplicity over raw graph traversal speed.
Yes. Virtuoso's Virtual Database supports any ODBC or JDBC data source: PostgreSQL, MySQL, Oracle, SQL Server, Snowflake, BigQuery, and many others. The same pattern applies: attach tables via ODBC, define R2RML mappings, load the ontology, generate quad maps, and query via SPARQL. The knowledge graph becomes a unified semantic layer spanning multiple heterogeneous data platforms.
SQL knowledge for table attachment and testing; basic understanding of RDF and SPARQL for query writing; familiarity with R2RML for mapping design (the Turtle syntax is straightforward for anyone comfortable with data modelling); and Virtuoso administration basics (Conductor UI or isql command-line). The companion GitHub repository provides templates and working examples; most practitioners can have a working virtual knowledge graph running within an hour.
Databricks provides graph capabilities within its platform, but Virtuoso adds: (1) standards-based interoperability via W3C RDF/SPARQL/R2RML, with no vendor lock-in; (2) Linked Data entity URIs enabling cross-system entity navigation; (3) federated SPARQL across multiple data sources; (4) AI Agent-friendly machine-readable RDF with content negotiation; and (5) a loosely coupled architecture where the knowledge graph is an overlay, not yet another silo.
No. The virtual knowledge graph approach achieves zero data movement. Databricks tables are attached via ODBC as virtual references: they appear in Virtuoso's local catalog but no data is copied. SPARQL queries are translated to SQL at query time and executed remotely against the live Databricks SQL warehouse. This means the knowledge graph always reflects current data without ETL pipelines or stale copies.
R2RML (RDB to RDF Mapping Language) is a W3C recommendation that defines how relational database tables, columns, and foreign keys map to RDF classes, properties, and relationships. It provides a declarative, standards-based bridge between the SQL world and the semantic graph world, with no custom code needed. The mapping is a Turtle file that can be version-controlled, reviewed, and reused across projects.
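To make the shape of such a mapping concrete, here is a small Python sketch that renders one R2RML TriplesMap as Turtle. It is illustrative only and is not the mapping shipped in the companion repository: the helper triples_map, the subject IRI base http://example.org/bakehouse/, and the choice of the first_name column for :customerName are all assumptions. The rr: terms (rr:TriplesMap, rr:logicalTable, rr:subjectMap, rr:template, rr:predicateObjectMap, rr:column) come from the W3C R2RML vocabulary.

```python
R2RML_PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix : <http://www.databricks.com/bakehouse#> .\n"
)


def triples_map(name, table, key_column, rdf_class, property_columns,
                iri_base="http://example.org/bakehouse/"):
    """Render one R2RML TriplesMap as a Turtle string.

    property_columns maps an ontology property local name to a source column.
    iri_base is a hypothetical subject IRI prefix, not taken from the guide.
    """
    # rr:template uses {column} placeholders that are filled in per row.
    subject_template = f"{iri_base}{rdf_class.lower()}/{{{key_column}}}"
    statements = [
        f'rr:logicalTable [ rr:tableName "{table}" ]',
        f'rr:subjectMap [ rr:template "{subject_template}" ; rr:class :{rdf_class} ]',
    ]
    for prop, column in property_columns.items():
        statements.append(
            f'rr:predicateObjectMap [ rr:predicate :{prop} ; '
            f'rr:objectMap [ rr:column "{column}" ] ]'
        )
    body = " ;\n    ".join(statements)
    return f"<#{name}> a rr:TriplesMap ;\n    {body} .\n"


mapping = R2RML_PREFIXES + "\n" + triples_map(
    name="CustomerMap",
    table="sales_customers",
    key_column="customerID",
    rdf_class="Customer",
    property_columns={"customerName": "first_name"},  # hypothetical column choice
)
print(mapping)
```

Each row of the table yields one subject IRI from the template, one rdf:type triple for the class, and one triple per mapped column.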
AI agents consume the virtual knowledge graph through three standard interfaces: (1) SPARQL endpoint: structured graph queries returning typed results with entity IRIs; (2) Linked Data entity URIs: dereferenceable HTTP URIs that return RDF (Turtle, JSON-LD, RDF/XML) when requested with the appropriate Accept header; (3) HTML entity descriptions: human-readable pages with navigable hyperlinks for agentic browsing. No custom API, SDK, or platform-specific integration is required; any standards-compliant SPARQL client or HTTP agent can interact with the graph.
Quad Maps are Virtuoso's internal representation of R2RML mappings: they define how SPARQL graph patterns translate to SQL queries against virtual tables. The R2RML_MAKE_QM_FROM_G function converts declarative R2RML Turtle mappings into optimized Quad Maps that Virtuoso's SPARQL engine uses at query time. This is what makes SPARQL queries executable against remote relational tables without data movement.
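The idea of mapping-driven translation can be illustrated with a deliberately tiny Python toy. This is not Virtuoso's translation algorithm and does not use Quad Maps; it only shows how a star-shaped pattern over one mapped class could become a single SQL SELECT. The lookup tables, the function star_pattern_to_sql, and the column names franchiseID and totalPrice are assumptions made for the sketch.

```python
# Hypothetical mapping tables standing in for what an R2RML mapping declares.
CLASS_TO_TABLE = {":Transaction": "databricks.bakehouse.sales_transactions"}
PROPERTY_TO_COLUMN = {":totalPrice": "totalPrice", ":franchise": "franchiseID"}


def star_pattern_to_sql(rdf_class, property_vars):
    """Translate `?s a <class> ; <prop> ?var ; ...` into one SQL SELECT.

    property_vars maps each property to the SPARQL variable name it binds.
    A triple pattern only matches rows where the value exists, hence the
    IS NOT NULL filters.
    """
    table = CLASS_TO_TABLE[rdf_class]
    columns = [(PROPERTY_TO_COLUMN[prop], var) for prop, var in property_vars.items()]
    select_list = ", ".join(f"{col} AS {var}" for col, var in columns)
    not_null = " AND ".join(f"{col} IS NOT NULL" for col, _ in columns)
    return f"SELECT {select_list} FROM {table} WHERE {not_null}"


sql = star_pattern_to_sql(":Transaction", {":franchise": "franchise", ":totalPrice": "totalPrice"})
print(sql)
```

A real engine also has to handle joins across mapped tables, IRI templates, datatypes, and aggregates, which is the work the generated Quad Maps support.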
Content negotiation is an HTTP mechanism where the server inspects the client's Accept header to determine the response format. For the bakehouse knowledge graph: browsers requesting text/html receive a human-readable HTML description page with navigable links; AI agents and RDF clients requesting application/ld+json or text/turtle receive machine-readable RDF. This single-URI, multi-format approach means the same entity identifier works for both human exploration and automated agent consumption, a cornerstone of Linked Data and the Semantic Web.
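The selection logic can be sketched in a few lines of Python. This is a simplified illustration of Accept-header handling, not how Virtuoso implements it (the setup script configures URL rewrite rules for that purpose). The function name negotiate and the list of supported types are assumptions, and unsupported requests fall back to HTML here rather than returning 406 Not Acceptable.

```python
# Assumed set of representations an entity URI can return.
SUPPORTED = ["text/html", "text/turtle", "application/ld+json", "application/rdf+xml"]


def negotiate(accept_header, supported=SUPPORTED, default="text/html"):
    """Pick the supported media type with the highest q-value.

    Ties keep the type listed first in the header; `*/*` counts as a
    request for the default representation.
    """
    best_type, best_q = default, 0.0
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not wanted"
        if media_type == "*/*":
            media_type = default
        if media_type in supported and q > best_q:
            best_type, best_q = media_type, q
    return best_type


# A typical browser header resolves to HTML; an RDF client gets Turtle.
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(negotiate("text/turtle"))
```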
The virtual knowledge graph approach using ODBC attachment is best suited for batch reasoning, GraphRAG over reference data, or exploratory graph analytics. For millisecond-latency graph traversals or high-frequency transactional writes, a native graph store may be more appropriate. However, for the majority of enterprise AI agent use cases, where agents need to discover entity relationships, traverse connections, and retrieve structured context, the performance characteristics are more than adequate.
You need: (1) a Databricks SQL warehouse (Serverless, Pro, or Classic) to provide the ODBC endpoint; (2) the Databricks ODBC Driver installed on the Virtuoso host; (3) a Personal Access Token for authentication; and (4) the workspace Host and HTTP Path for the SQL warehouse. The public samples.bakehouse dataset provides the example tables; your own Databricks catalogs and schemas work the same way.
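These prerequisites end up in the odbc.ini entry for the databricks_odbc DSN. The Python sketch below renders such an entry from the four values listed above. It is an assumption-laden illustration, not the template from the companion repository: the key names (Host, Port, HTTPPath, SSL, ThriftTransport, AuthMech, UID, PWD) follow the commonly documented Databricks ODBC driver settings and should be checked against the driver documentation, and the driver path and sample values are placeholders.

```python
import configparser
import io


def render_odbc_ini(host, http_path, token, driver_path, dsn="databricks_odbc"):
    """Render an odbc.ini section for a Databricks SQL warehouse DSN.

    Key names are assumptions based on commonly documented driver settings;
    verify them against the Databricks ODBC driver guide before use.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep the mixed-case key names as written
    config[dsn] = {
        "Driver": driver_path,
        "Host": host,
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",  # HTTP transport
        "AuthMech": "3",         # user name and password authentication
        "UID": "token",          # literal user name used with a Personal Access Token
        "PWD": token,
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


ini_text = render_odbc_ini(
    host="example.cloud.databricks.com",      # placeholder workspace host
    http_path="/sql/1.0/warehouses/example",  # placeholder HTTP Path
    token="<personal-access-token>",          # placeholder, never commit a real token
    driver_path="/opt/simba/spark/lib/64/libsparkodbc_sb64.so",  # assumed install path
)
print(ini_text)
```

Generating the file from variables keeps the token out of version control; the rendered text can be appended to odbc.ini on the Virtuoso host.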