Anthropic Self-Service Analytics with Claude

Compared against the data-twingler reusable SKILL.md approach — where multi-language querying, Linked Data provenance, and declarative analytics go beyond SQL.

Source: claude.com/blog · June 3, 2026 · Enterprise AI

Overview

Anthropic's Data Science team describes how they enable self-service business analytics using Claude Code, achieving ~95% automation of business queries at ~95% accuracy. Their approach centers on a 4-layer agentic analytics stack designed to address three failure modes that plague LLM-driven data querying.

This infographic presents their approach and compares it with the data-twingler skill from the ai-agent-skills repository — a reusable SKILL.md that any agent environment can load, offering SQL, SPARQL, SPASQL, SPARQL-FED, and GraphQL via natural language, with inherent Linked Data provenance.

Three Failure Modes (Anthropic)

Anthropic identifies these as the root causes of inaccurate analytics responses:

Concept-Entity Ambiguity

With hundreds of viable fields, the agent cannot map a user's question to the correct fields. E.g., "active users" — what actions count? What lookback window?

Data Staleness

Data sources, business definitions, and schemas change constantly. Agent knowledge goes stale, returning subtly wrong answers.

Retrieval Failure

The right information exists in the data model but the agent cannot find it given the vastness of the search space.

Anthropic Layer 1 · Data Foundations · SQL-only

Canonical datasets, enforced standards via tooling/CI/mandate, colocated artifacts (data code + semantic layer + dashboards in one repo), metadata as first-class product. Dimensional modeling and shift-left testing remain essential.

Anthropic Layer 2 · Sources of Truth · Semantic layer mandatory

Semantic layer (compiled metrics, mandatory first path), lineage/transformation graph, distilled query corpus, business context (company knowledge graph). Agents are structurally routed to governed metrics first.
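The "mandatory first path" routing can be pictured with a minimal Python sketch. It is not Anthropic's implementation: the `SEMANTIC_LAYER` registry, the metric name, and the `route` function are hypothetical names introduced only to show the order of precedence (governed metric first, raw SQL only when there is no coverage).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of governed metrics: metric name -> compiled SQL.
SEMANTIC_LAYER = {
    "weekly_active_users": "SELECT week, wau FROM metrics.weekly_active_users",
}


@dataclass
class RoutedQuery:
    sql: str
    source_tier: str  # "semantic layer" or "raw table"


def route(metric: str, raw_sql_fallback: Optional[str] = None) -> RoutedQuery:
    """Use the governed metric when it exists; fall back to raw SQL otherwise."""
    governed = SEMANTIC_LAYER.get(metric)
    if governed is not None:
        return RoutedQuery(governed, "semantic layer")
    if raw_sql_fallback is None:
        raise LookupError(f"no governed metric or fallback for {metric!r}")
    return RoutedQuery(raw_sql_fallback, "raw table")
```

Recording the tier on the routed query is one way to feed a provenance footer later, since the footer needs to know whether the answer came from a governed metric or a raw table.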

Anthropic Layer 3 · Skills · Markdown + CI

Pairwise skills (knowledge router + runbook procedure) with LLM-oriented reference docs. Skill maintenance is colocated with data model changes. Skills are synced across surfaces (Slack, IDE, dashboards, standalone sessions).

Anthropic Layer 4 · Validation · Evals + adversarial

Offline evals (dashboard + long-tail), ablation at PR granularity, online adversarial review, provenance footer, passive monitoring, active correction harvesting. Gate launches per domain at ~90% eval threshold.
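The per-domain launch gate can be sketched as a simple accuracy check. This is an illustration only: the domain names and the `domains_cleared` helper are hypothetical, and the 0.90 default mirrors the "~90% eval threshold" mentioned above rather than any published configuration.

```python
def accuracy(results: list[bool]) -> float:
    """Fraction of eval cases that passed; an empty eval set scores 0.0."""
    return sum(results) / len(results) if results else 0.0


def domains_cleared(eval_results: dict[str, list[bool]],
                    threshold: float = 0.90) -> list[str]:
    """Return the domains whose offline eval accuracy meets the launch gate."""
    return sorted(
        domain
        for domain, results in eval_results.items()
        if accuracy(results) >= threshold
    )
```

Treating an empty eval set as a failing score is a design choice in this sketch: a domain with no evals is held back rather than launched by default.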

data-twingler Stage 1 · Local-First Vector Search · Step 0

Before any endpoint call, scan local RDF directories, extract candidates (schema:Question, schema:DefinedTerm, schema:HowTo, skos:Concept), embed the user's prompt, return cosine-similarity matches above 0.75 threshold. Inverts the traditional remote-first workflow.
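The thresholded similarity step can be sketched in a few lines of Python. The embedding model is out of scope here, so the vectors are assumed to be precomputed; `local_matches` and the candidate labels are hypothetical names, and only the 0.75 cutoff comes from the description above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_matches(prompt_vec: list[float],
                  candidates: dict[str, list[float]],
                  threshold: float = 0.75) -> list[tuple[str, float]]:
    """Return (label, score) pairs scoring above the threshold, best first."""
    scored = [(label, cosine(prompt_vec, vec)) for label, vec in candidates.items()]
    hits = [(label, score) for label, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

An empty result from `local_matches` would be the signal to move on to remote graph discovery.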

data-twingler Stage 2 · KG-Hybrid Graph Discovery · bif:contains + vvec

Two parallel search strategies: keyword (bif:contains) as primary, server-side vector similarity (vvec:cosine_similarity_openai) as fallback. Semantic variant retry — up to 3 rephrasings before escalating to fallback endpoints.
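One way to read the keyword-first, vector-fallback flow with bounded rephrasing is sketched below. The exact ordering inside data-twingler is not spelled out here, so this is an assumption: each phrasing tries keyword search, then vector search, and after the original prompt plus at most three variants the caller escalates. The two search callables are injected placeholders, not real endpoint clients.

```python
from typing import Callable, Optional, Sequence

Search = Callable[[str], list[str]]  # phrase -> matching graph IRIs


def discover_graphs(prompt: str,
                    variants: Sequence[str],
                    keyword_search: Search,
                    vector_search: Search,
                    max_variants: int = 3) -> Optional[list[str]]:
    """Try keyword search, then vector search, for the prompt and each variant."""
    for phrase in [prompt, *variants[:max_variants]]:
        hits = keyword_search(phrase) or vector_search(phrase)
        if hits:
            return hits
    return None  # nothing found: the caller escalates to fallback endpoints
```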

data-twingler Stage 3 · Predefined Template Matching · T1–T8

8 templates matched to trigger phrases: T1 (data space), T2 (KG), T3 (KG+inference), T4 (federated), T5 (HowTo), T6 (Q&A), T7 (DefinedTerm), T8 (entity description). No query executes until template matching is attempted.
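Trigger-phrase routing can be sketched with a small pattern table. Only the T1 and T2 trigger phrases quoted elsewhere in this page are included; the remaining templates are omitted rather than guessed, and the regular expressions and `match_template` helper are illustrative assumptions, not the skill's actual matcher.

```python
import re
from typing import Optional

# Trigger patterns for the two templates whose phrases appear on this page.
TRIGGERS = [
    ("T1", re.compile(r"^explore this data space$", re.IGNORECASE)),
    ("T2", re.compile(r"^explore knowledge graph (?P<G>\S+)$", re.IGNORECASE)),
]


def match_template(prompt: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (template id, captured placeholders) for the first matching trigger."""
    text = prompt.strip()
    for template_id, pattern in TRIGGERS:
        match = pattern.match(text)
        if match:
            return template_id, match.groupdict()
    return None  # no template matched: only now may an ad-hoc query be considered
```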

data-twingler Stage 4 · Multi-Language Execution · SQL + SPARQL + more

6 execution modes: direct curl → URIBurner REST functions → OAuth2 → MCP → chatPromptComplete → OPAL Agent. Covers SQL, SPARQL, SPASQL, SPARQL-FED, and GraphQL. Default endpoint: linkeddata.uriburner.com/sparql.
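The simplest of these modes, a direct HTTP call, can be sketched by building (not sending) a SPARQL Protocol GET request. The `https://` scheme is an assumption, since the endpoint is given above without one, and `sparql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Default endpoint from the text; the https scheme is an assumption.
ENDPOINT = "https://linkeddata.uriburner.com/sparql"


def sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a GET request carrying the query in the standard 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for SPARQL JSON results via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the returned object to `urllib.request.urlopen` would execute the query; the sketch stops short of that so it runs without network access.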

data-twingler Stage 5 · Linked Data Entity Denotation · Inherent provenance

All entity identifiers hyperlinked via linkeddata.uriburner.com/describe/?uri={url_encoded_entity}. Every result carries resolvable URIs to source entities in the KG — provenance is structural, not cosmetic.
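The hyperlinking pattern amounts to percent-encoding the entity IRI into the resolver's `uri` parameter. A minimal sketch follows; the `https://` scheme and the `describe_link` name are assumptions.

```python
from urllib.parse import quote

# Resolver base from the text; the https scheme is an assumption.
DESCRIBE_BASE = "https://linkeddata.uriburner.com/describe/?uri="


def describe_link(entity_iri: str) -> str:
    """Return a resolver URL for an entity IRI, percent-encoding every reserved character."""
    return DESCRIBE_BASE + quote(entity_iri, safe="")
```

For example, `describe_link("http://dbpedia.org/resource/Spike_Lee")` encodes the colon and slashes of the IRI so that it travels as a single query-string value.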

Query Language Support

Anthropic
SQL-only, routed through a governed semantic layer. Raw SQL fallback when semantic layer has no coverage.
data-twingler
SQL + SPARQL + SPASQL + SPARQL-FED + GraphQL. Natural language routed to the appropriate query language automatically.

Skill / Packaging Model

Anthropic
Per-team markdown skill files in code repos with CI maintenance, sync infrastructure, and PR-based governance.
data-twingler
Single reusable SKILL.md loaded by any agent that supports the SKILL.md protocol. Zero build system. Drop-in deployment.

Entity Resolution Strategy

Anthropic
Human-curated canonical datasets + governed semantic layer + tooling enforcement (CI hooks, code review mandates).
data-twingler
Vector search against existing KGs + template-based routing (T1–T8) + Linked Data hyperlinks. Leverages existing KG infrastructure rather than requiring curated semantic layers.

Provenance & Trust

Anthropic
Provenance footer appended to responses: source tier, freshness, owner. "Raw table, freshness unknown" signals caution.
data-twingler
Inherent provenance via hyperlinked entity IRIs. Every result carries resolvable URIs (linkeddata.uriburner.com/describe/) to source entities. Provenance is structural, not cosmetic.
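For contrast with IRI-based provenance, the footer style described for Anthropic can be sketched as a small formatter. The field names, the separator, and the exact wording of the output are assumptions; only the three ingredients (source tier, freshness, owner) come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Provenance:
    source_tier: str                    # e.g. "semantic layer", "raw table"
    owner: str                          # owning team
    fresh_as_of: Optional[date] = None  # None when freshness is unknown


def footer(p: Provenance) -> str:
    """Render a one-line provenance footer for an analytics response."""
    freshness = p.fresh_as_of.isoformat() if p.fresh_as_of else "unknown"
    return f"Source: {p.source_tier} · Freshness: {freshness} · Owner: {p.owner}"
```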

Accuracy Approach

Anthropic
~95% accuracy via offline evals (dashboard + long-tail), adversarial review (+6% accuracy), ablation at PR granularity, active correction harvesting.
data-twingler
Accuracy via template pre-gate (no ad-hoc queries until templates exhausted), local-first grounding in known RDF, semantic variant retries (3 rephrasings), multiple fallback endpoints.

Cross-Platform Reusability

Anthropic
Skills synced to plugin marketplace, cloud-storage blobs, MCP resources. Requires sync infrastructure per surface.
data-twingler
Single SKILL.md loaded by any agent supporting the protocol. No sync infrastructure. Works in Claude Code, OpenCode, Grok CLI, Codex, and any SKILL.md-compatible environment.

Staleness Defense

Anthropic
CI hooks flag model changes missing skill updates, colocated artifacts in single repo, freshness checks, active correction harvesting.
data-twingler
Default endpoint returns live KG data (linkeddata.uriburner.com/sparql). No stale local model problem. Local RDF can be regenerated on demand from source.
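The CI-hook side of the staleness defense can be sketched as a check over the paths changed in a pull request. The `models/` and `skills/` directory names and the `missing_skill_update` helper are hypothetical; a real hook would obtain the path list from its version control system.

```python
def missing_skill_update(changed_paths: list[str],
                         model_prefix: str = "models/",
                         skill_prefix: str = "skills/") -> list[str]:
    """Return changed model files when the same change set touches no skill file."""
    touched_models = [p for p in changed_paths if p.startswith(model_prefix)]
    touched_skills = any(p.startswith(skill_prefix) for p in changed_paths)
    return [] if touched_skills else touched_models
```

A non-empty return value is the condition under which the hook would flag the change for review.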

Explore KG Using SQL: SPASQL

data-twingler's SPASQL (SPARQL inside SQL) enables exploring Knowledge Graphs using familiar SQL SELECT ... FROM (SPARQL ...) AS syntax. This is a capability Anthropic's SQL-only approach cannot match.

SELECT movie
FROM (SPARQL
  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  SELECT ?movie
  WHERE {
    SERVICE <http://dbpedia.org/sparql> {
      ?movie rdf:type dbo:Film ;
             dbo:director dbr:Spike_Lee .
    }
  }
) AS movies


S2 SPASQL: Local KG exploration with SQL
SELECT EntityID, EntityTypeID, kg
FROM (SPARQL
  SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) (?g AS ?kg)
  WHERE { GRAPH ?g { ?s a ?o } }
  GROUP BY ?o ?g
  HAVING (COUNT(*) > 20000)
  ORDER BY ASC(?g) DESC(?count)
) AS kgEntities


⚡ Why this matters

Anthropic's approach requires a governed semantic layer curated by humans to map business concepts to SQL tables. data-twingler's SPASQL lets users explore Virtuoso Knowledge Graphs using standard SQL SELECT syntax — the SPARQL is embedded as a subquery. This means anyone who knows SQL can immediately query KGs without learning SPARQL. The KG itself provides the entity disambiguation (via RDF types and relationships) that Anthropic needs a separate semantic layer to achieve.
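The wrapping pattern itself is mechanical, which a short Python sketch makes concrete. The `spasql` helper is a hypothetical name; it only assembles the `SELECT ... FROM (SPARQL ...) AS alias` text shown in the examples above and does not validate either query language.

```python
def spasql(columns: list[str], sparql: str, alias: str = "kg") -> str:
    """Embed a SPARQL SELECT as a derived table inside a SQL SELECT."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM (SPARQL {sparql.strip()}) AS {alias}"
```

The SQL column names must match the SPARQL projection variables without the leading `?`, as in the `?movie` and `movie` pairing in the first example.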

SPARQL Query Patterns (data-twingler T1–T8)

data-twingler uses 8 predefined templates routed from trigger phrases. No query executes until template matching is attempted.

T1 Data Space Exploration
# Trigger: "Explore this Data Space"
SPARQL
SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) (?g AS ?kg)
WHERE { GRAPH ?g { ?s a ?o } }
GROUP BY ?o ?g
HAVING (COUNT(*) > 20000)
ORDER BY ASC(?g) DESC(?count)
LIMIT 50


T2 Specific KG Exploration
# Trigger: "Explore knowledge graph {G}"
SPARQL
SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID)
WHERE { GRAPH {G} { ?s a ?o } }
GROUP BY ?o
ORDER BY DESC(?count)
LIMIT 50

T6 Q&A with KG context (2-step)
# Trigger: "{Question}" with article/KG context
# Step 1: Index query. Discover questions in the graph.
SPARQL
SELECT DISTINCT ?name
WHERE {
  {
    GRAPH ?g {
      ?article (schema:name|schema:headline|schema:title) ?title ;
               (schema:hasPart|schema:mainEntity|schema:question) ?question .
      ?question a schema:Question ;
                schema:name ?name .
    }
  }
  UNION
  {
    GRAPH ?g1 {
      ?article schema:name|schema:headline ?title ;
               (schema:hasPart/schema:mainEntity) ?question .
      ?question a schema:Question ;
                schema:name ?name .
    }
  }
}

Step 2 retrieves the schema:acceptedAnswer for the matched question.

Key Insight

💡 Complement, Not Replace

Both approaches recognize the same core challenge: mapping a user's natural language question to the correct entity in the data model. The difference lies in how they solve it:

Anthropic: Human-curated governance · data-twingler: KG-native vector + template routing
Anthropic: SQL-only · data-twingler: SQL + SPARQL + SPASQL + GraphQL
Anthropic: CI/PR infrastructure · data-twingler: Drop-in SKILL.md

data-twingler's approach is particularly suited to environments with existing Knowledge Graph infrastructure (Virtuoso, URIBurner), where entity relationships are already structured and resolvable via Linked Data IRIs. Anthropic's approach is designed for organizations that are building analytics capabilities from scratch on SQL warehouses and need a governed semantic layer as the authoritative surface.

data-twingler's multi-language support (SPARQL, SPASQL, SPARQL-FED, and GraphQL in addition to SQL) and its inherent Linked Data provenance (every result hyperlinked to resolver URIs) go beyond what Anthropic's SQL-only, provenance-footer approach provides.

Frequently Asked Questions

1) Concept-Entity Ambiguity: the agent cannot map a user's question to the correct fields. 2) Data Staleness: business definitions change and agent knowledge goes stale. 3) Retrieval Failure: the right information exists but the agent cannot find it in the vast search space. Anthropic addresses these with a 4-layer stack: data foundations → sources of truth → skills → validation.

data-twingler supports SPARQL, SPASQL (SPARQL embedded in SQL), SPARQL-FED (federated queries across endpoints), and GraphQL, in addition to plain SQL. Anthropic's approach is SQL-only, routed through a semantic layer. This means data-twingler can query RDF knowledge graphs directly, federate across multiple SPARQL endpoints, and use GraphQL for API-driven data access.

Anthropic relies on human-curated canonical datasets and a governed semantic layer with tooling enforcement (CI hooks, code review mandates). The semantic layer is the mandatory first path. data-twingler uses local-first vector search against existing Knowledge Graphs, template-based routing (T1–T8), and Linked Data hyperlinks (URIBurner resolver) for entity denotation. The KG provides inherent disambiguation via structured RDF relationships: entities are already connected, typed, and described in the graph.

Glossary

Semantic Layer

A compiled set of metric and dimension definitions that maps business concepts to governed data entities. In Anthropic's stack, agents are structurally required to query the semantic layer first before falling back to raw SQL.

Skill (Claude Code)

A folder of markdown that Claude Code reads on demand, encoding procedural knowledge: which sources to consult in what order, how to navigate ambiguous data, and what a finished analysis looks like.

SKILL.md

A standardized markdown file format for defining reusable AI agent skills. Any agent environment that supports the SKILL.md protocol can load and execute the skill. Used by data-twingler and other skills in the ai-agent-skills repository.

KG-Hybrid Query Modality

A dual-path search approach combining keyword full-text search (bif:contains) and server-side vector similarity (vvec:cosine_similarity_openai) against the same knowledge graph endpoint. Used by data-twingler for Graph IRI Discovery.

Provenance Footer

A metadata footer appended to every analytics response containing source tier (semantic layer vs curated reference vs raw table), data freshness date, and owning team. Used by Anthropic to help consumers judge response trustworthiness.

Linked Data Entity Denotation

The practice of hyperlinking every entity identifier in query results to a resolvable URI via a Linked Data resolver (e.g., linkeddata.uriburner.com/describe/?uri=…). Used by data-twingler for inherent provenance — provenance is structural, not cosmetic.

Knowledge Graph

Entities and relationships in the Anthropic vs. data-twingler analytics landscape.

Legend: Organizations · Concepts · Technologies

Explore Knowledge Graph using SPARQL

Query the RDF knowledge graph via URIBurner's SPARQL endpoint. Select a recipe or write your own query.
