Anthropic Self-Service Analytics with Claude

Overview

Anthropic's Data Science team describes how they enable self-service business analytics using Claude Code, achieving ~95% automation of business queries at ~95% accuracy. Their approach centers on a 4-layer agentic analytics stack designed to address three failure modes that plague LLM-driven data querying.

This infographic presents their approach and compares it with the data-twingler skill from the ai-agent-skills repository — a reusable SKILL.md that any agent environment can load, offering SQL, SPARQL, SPASQL, SPARQL-FED, and GraphQL via natural language, with inherent Linked Data provenance.

Three Failure Modes (Anthropic)

Anthropic identifies these as the root causes of inaccurate analytics responses:

🔀

Concept-Entity Ambiguity

With hundreds of viable fields, the agent cannot map a user's question to the correct fields. E.g., "active users" — what actions count? What lookback window?

⏳

Data Staleness

Data sources, business definitions, and schemas change constantly. Agent knowledge goes stale, returning subtly wrong answers.

🔍

Retrieval Failure

The right information exists in the data model but the agent cannot find it given the vastness of the search space.

Anthropic's 4-Layer Analytics Stack

1. Data Foundations (SQL-only)

Canonical datasets, enforced standards via tooling/CI/mandate, colocated artifacts (data code + semantic layer + dashboards in one repo), metadata as first-class product. Dimensional modeling and shift-left testing remain essential.

2. Sources of Truth (Semantic layer mandatory)

Semantic layer (compiled metrics, mandatory first path), lineage/transformation graph, distilled query corpus, business context (company knowledge graph). Agents are structurally routed to governed metrics first.

3. Skills (Markdown + CI)

Pairwise skills (knowledge router + runbook procedure) with LLM-oriented reference docs. Skill maintenance is colocated with data model changes, and skills are synced across surfaces (Slack, IDE, dashboards, standalone sessions).

4. Validation (Evals + adversarial)

Offline evals (dashboard + long-tail), ablation at PR granularity, online adversarial review, provenance footer, passive monitoring, active correction harvesting. Gate launches per domain at ~90% eval threshold.
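The per-domain launch gate described above can be sketched as a small pass-rate check. This is a minimal illustration, not Anthropic's implementation: the function name, the `(domain, passed)` input shape, and the sample data are hypothetical, and the threshold simply encodes the ~90% figure from the text.

```python
from collections import defaultdict

# Assumption: the "~90% eval threshold" from the text, expressed as a fraction.
LAUNCH_THRESHOLD = 0.90


def domains_ready_for_launch(eval_results, threshold=LAUNCH_THRESHOLD):
    """Return {domain: (pass_rate, ready)} from (domain, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for domain, passed in eval_results:
        totals[domain] += 1
        if passed:
            passes[domain] += 1
    report = {}
    for domain, total in totals.items():
        rate = passes[domain] / total
        report[domain] = (rate, rate >= threshold)
    return report


# Hypothetical eval outcomes: 19 of 20 pass for "revenue", 8 of 10 for "growth".
results = (
    [("revenue", True)] * 19
    + [("revenue", False)]
    + [("growth", True)] * 8
    + [("growth", False)] * 2
)
report = domains_ready_for_launch(results)
```

Under these sample numbers, only the "revenue" domain clears the gate; "growth" would stay behind it until its eval pass rate improves.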

data-twingler Approach

1. Local-First Vector Search (Step 0)

Before any endpoint call, scan local RDF directories, extract candidates (schema:Question, schema:DefinedTerm, schema:HowTo, skos:Concept), embed the user's prompt, return cosine-similarity matches above 0.75 threshold. Inverts the traditional remote-first workflow.
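The thresholded matching in this step can be sketched as follows. This is a hedged illustration only: `toy_embed` is a bag-of-words stand-in for whatever embedding model the skill actually uses, the candidate labels are invented, and only the 0.75 threshold and the RDF types come from the text.

```python
import math
from collections import Counter

SIMILARITY_THRESHOLD = 0.75  # threshold stated in the text


def toy_embed(text):
    """Stand-in embedding (assumption): bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def local_matches(prompt, candidates, threshold=SIMILARITY_THRESHOLD):
    """candidates: (rdf_type, label) pairs. Return hits above threshold, best first."""
    query_vec = toy_embed(prompt)
    scored = [
        (cosine_similarity(query_vec, toy_embed(label)), rdf_type, label)
        for rdf_type, label in candidates
    ]
    hits = [item for item in scored if item[0] > threshold]
    return sorted(hits, key=lambda item: item[0], reverse=True)


# Hypothetical candidates extracted from local RDF files.
candidates = [
    ("schema:Question", "what is a semantic layer"),
    ("schema:DefinedTerm", "provenance footer"),
    ("skos:Concept", "data staleness"),
]
matches = local_matches("what is a semantic layer used for", candidates)
```

Only the `schema:Question` candidate clears the threshold here; if nothing did, the workflow would move on to the remote discovery step.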

2. KG-Hybrid Graph Discovery (bif:contains + vvec)

Two parallel search strategies: keyword (bif:contains) as primary, server-side vector similarity (vvec:cosine_similarity_openai) as fallback. Semantic variant retry — up to 3 rephrasings before escalating to fallback endpoints.
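One way to read the retry behavior above is sketched below. The ordering (keyword first, vector second, per phrasing, per endpoint) is an interpretation of the text, and `rephrase`, `keyword_search`, and `vector_search` are hypothetical callables standing in for the real bif:contains and vvec:cosine_similarity_openai queries.

```python
MAX_REPHRASINGS = 3  # limit stated in the text


def discover_graphs(phrase, rephrase, keyword_search, vector_search, endpoints):
    """Try keyword search, then vector search, for each phrasing on each endpoint.

    rephrase(phrase, n) returns the n-th semantic variant (hypothetical helper).
    keyword_search / vector_search(endpoint, phrase) return a list of graph IRIs.
    """
    for endpoint in endpoints:
        variants = [phrase] + [
            rephrase(phrase, n) for n in range(1, MAX_REPHRASINGS + 1)
        ]
        for variant in variants:
            # Keyword search is primary; vector search runs only if it finds nothing.
            hits = keyword_search(endpoint, variant) or vector_search(endpoint, variant)
            if hits:
                return {"endpoint": endpoint, "phrase": variant, "graphs": hits}
    return None  # every endpoint and phrasing was exhausted


# Stub searchers for illustration: only the second rephrasing matches by keyword.
calls = []


def fake_keyword(endpoint, phrase):
    calls.append(("keyword", endpoint, phrase))
    return ["urn:example:graph"] if phrase == "revenue v2" else []


def fake_vector(endpoint, phrase):
    calls.append(("vector", endpoint, phrase))
    return []


def fake_rephrase(phrase, n):
    return f"{phrase} v{n}"


result = discover_graphs(
    "revenue", fake_rephrase, fake_keyword, fake_vector, ["primary", "fallback"]
)
```

Passing the search functions in as arguments keeps the retry policy separate from the endpoint-specific query syntax, which makes the policy easy to exercise with stubs like these.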

3. Predefined Template Matching (T1–T8)

8 templates matched to trigger phrases: T1 (data space), T2 (KG), T3 (KG+inference), T4 (federated), T5 (HowTo), T6 (Q&A), T7 (DefinedTerm), T8 (entity description). No query executes until template matching is attempted.

4. Multi-Language Execution (SQL + SPARQL + more)

6 execution modes: direct curl → URIBurner REST functions → OAuth2 → MCP → chatPromptComplete → OPAL Agent. Covers SQL, SPARQL, SPASQL, SPARQL-FED, and GraphQL. Default endpoint: linkeddata.uriburner.com/sparql.
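For the direct-request mode, a query can be sent as an HTTP GET with a `query` parameter, as defined by the SPARQL 1.1 Protocol. The sketch below only builds the URL and makes no network call. The `https` scheme and the `format` parameter (a Virtuoso convention for choosing the result serialization) are assumptions, not details confirmed by the text.

```python
from urllib.parse import urlencode

# Default endpoint named in the text; the https scheme is an assumption.
DEFAULT_ENDPOINT = "https://linkeddata.uriburner.com/sparql"


def build_sparql_get_url(
    query,
    endpoint=DEFAULT_ENDPOINT,
    result_format="application/sparql-results+json",
):
    """Build a GET URL carrying the query in the standard `query` parameter."""
    # `format` is assumed to be honored by the endpoint (Virtuoso convention).
    params = urlencode({"query": query, "format": result_format})
    return f"{endpoint}?{params}"


url = build_sparql_get_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
```

The resulting URL can be handed to curl or any HTTP client; the other execution modes wrap the same query text in different transports.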

5. Linked Data Entity Denotation (Inherent provenance)

SELECT movie
FROM (
  SPARQL
  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  SELECT ?movie
  WHERE {
    SERVICE <http://dbpedia.org/sparql> {
      ?movie rdf:type dbo:Film ;
             dbo:director dbr:Spike_Lee .
    }
  }
) AS movies

S2 SPASQL: Local KG exploration with SQL

SELECT EntityID, EntityTypeID, kg
FROM (
  SPARQL
  SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) (?g AS ?kg)
  WHERE { GRAPH ?g { ?s a ?o } }
  GROUP BY ?o ?g
  HAVING (COUNT(*) > 20000)
  ORDER BY ASC(?g) DESC(?count)
) AS kgEntities

⚡ Why this matters

Anthropic's approach requires a governed semantic layer curated by humans to map business concepts to SQL tables. data-twingler's SPASQL lets users explore Virtuoso Knowledge Graphs using standard SQL SELECT syntax — the SPARQL is embedded as a subquery. This means anyone who knows SQL can immediately query KGs without learning SPARQL. The KG itself provides the entity disambiguation (via RDF types and relationships) that Anthropic needs a separate semantic layer to achieve.

SPARQL Query Patterns (data-twingler T1–T8)

data-twingler uses 8 predefined templates routed from trigger phrases. No query executes until template matching is attempted.
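Trigger-phrase routing of this kind can be sketched as an ordered list of patterns. This is an illustration under stated assumptions: the regular expressions and the `match_template` helper are hypothetical, and only the T1 and T2 trigger phrases are taken from this page, so the remaining templates are listed by name without triggers.

```python
import re

# Template names as listed in the text.
TEMPLATE_NAMES = {
    "T1": "data space",
    "T2": "KG",
    "T3": "KG+inference",
    "T4": "federated",
    "T5": "HowTo",
    "T6": "Q&A",
    "T7": "DefinedTerm",
    "T8": "entity description",
}

# Assumption: regex forms of the two trigger phrases shown on this page.
TRIGGERS = [
    ("T1", re.compile(r"^explore this data space$", re.IGNORECASE)),
    ("T2", re.compile(r"^explore knowledge graph\s+(?P<G>\S+)$", re.IGNORECASE)),
]


def match_template(prompt):
    """Return (template_id, captured_parameters), or (None, {}) if nothing matches."""
    text = prompt.strip()
    for template_id, pattern in TRIGGERS:
        found = pattern.match(text)
        if found:
            return template_id, found.groupdict()
    return None, {}


t1 = match_template("Explore this Data Space")
t2 = match_template("Explore knowledge graph <urn:example:g>")
unmatched = match_template("How many films did Spike Lee direct?")
```

A `(None, {})` result signals that template matching was attempted and failed, at which point a free-form query could be considered.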

T1 Data Space Exploration

# Trigger: "Explore this Data Space"
SPARQL
SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID) (?g AS ?kg)
WHERE { GRAPH ?g { ?s a ?o } }
GROUP BY ?o ?g
HAVING (COUNT(*) > 20000)
ORDER BY ASC(?g) DESC(?count)
LIMIT 50

T2 Specific KG Exploration

# Trigger: "Explore knowledge graph {G}"
SPARQL
SELECT (SAMPLE(?s) AS ?EntityID) (COUNT(*) AS ?count) (?o AS ?EntityTypeID)
WHERE { GRAPH {G} { ?s a ?o } }
GROUP BY ?o
ORDER BY DESC(?count)
LIMIT 50

T6 Q&A with KG context (2-step)

# Trigger: "{Question}" with article/KG context
# Step 1: index query to discover questions in the graph
SPARQL
SELECT DISTINCT ?name
WHERE {
  {
    GRAPH ?g {
      ?article (schema:name|schema:headline|schema:title) ?title ;
               (schema:hasPart|schema:mainEntity|schema:question) ?question .
      ?question a schema:Question ;
                schema:name ?name .
    }
  }
  UNION
  {
    GRAPH ?g1 {
      ?article schema:name|schema:headline ?title ;
               (schema:hasPart/schema:mainEntity) ?question .
      ?question a schema:Question ;
                schema:name ?name .
    }
  }
}

Step 2 retrieves the schema:acceptedAnswer for the matched question.

Key Insight

💡 Complement, Not Replace

Both approaches recognize the same core challenge: mapping a user's natural language question to the correct entity in the data model. The difference lies in how they solve it:

Anthropic: Human-curated governance | data-twingler: KG-native vector + template routing
Anthropic: SQL-only | data-twingler: SQL + SPARQL + SPASQL + GraphQL
Anthropic: CI/PR infrastructure | data-twingler: Drop-in SKILL.md

data-twingler's approach is particularly suited to environments with existing Knowledge Graph infrastructure (Virtuoso, URIBurner), where entity relationships are already structured and resolvable via Linked Data IRIs. Anthropic's approach is designed for organizations that are building analytics capabilities from scratch on SQL warehouses and that need a governed semantic layer as the authoritative surface.

data-twingler's multi-language support (SPARQL, SPASQL, SPARQL-FED, and GraphQL in addition to SQL) and its inherent Linked Data provenance (every result hyperlinked to a resolver URI) go beyond what Anthropic's SQL-only, provenance-footer approach provides.

Frequently Asked Questions

What are the three failure modes of analytics agents according to Anthropic?

1) Concept-Entity Ambiguity — the agent cannot map a user's question to the correct fields. 2) Data Staleness — business definitions change, agent knowledge goes stale. 3) Retrieval Failure — the right information exists but the agent cannot find it in the vast search space. Anthropic addresses these with a 4-layer stack: data foundations → sources of truth → skills → validation.

What query languages does data-twingler support that Anthropic's approach does not?

data-twingler supports SPARQL, SPASQL (SPARQL embedded in SQL), SPARQL-FED (federated queries across endpoints), and GraphQL — in addition to plain SQL. Anthropic's approach is SQL-only, routed through a semantic layer. This means data-twingler can query RDF knowledge graphs directly, federate across multiple SPARQL endpoints, and use GraphQL for API-driven data access.

How does entity resolution differ between the two approaches?

Anthropic relies on human-curated canonical datasets and a governed semantic layer with tooling enforcement (CI hooks, code review mandates). The semantic layer is the mandatory first path. data-twingler uses local-first vector search against existing Knowledge Graphs, template-based routing (T1–T8), and Linked Data hyperlinks (URIBurner resolver) for entity denotation. The KG provides inherent disambiguation via structured RDF relationships — entities are already connected, typed, and described in the graph.

Glossary

Semantic Layer

A compiled set of metric and dimension definitions that maps business concepts to governed data entities. In Anthropic's stack, agents are structurally required to query the semantic layer before falling back to raw SQL.

Skill (Claude Code)

A folder of markdown that Claude Code reads on demand, encoding procedural knowledge: which sources to consult in what order, how to navigate ambiguous data, and what a finished analysis looks like.

SKILL.md

A standardized markdown file format for defining reusable AI agent skills. Any agent environment that supports the SKILL.md protocol can load and execute the skill. Used by data-twingler and other skills in the ai-agent-skills repository.

KG-Hybrid Query Modality

A dual-path search approach combining keyword full-text search (bif:contains) and server-side vector similarity (vvec:cosine_similarity_openai) against the same knowledge graph endpoint. Used by data-twingler for Graph IRI Discovery.

Provenance Footer

A metadata footer appended to every analytics response containing source tier (semantic layer vs curated reference vs raw table), data freshness date, and owning team. Used by Anthropic to help consumers judge response trustworthiness.

Linked Data Entity Denotation

The practice of hyperlinking every entity identifier in query results to a resolvable URI via a Linked Data resolver (e.g., linkeddata.uriburner.com/describe/?uri=…). Used by data-twingler for inherent provenance — provenance is structural, not cosmetic.
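The resolver links described in the last glossary entry can be produced by percent-encoding each entity IRI and appending it to the describe prefix. A minimal sketch, assuming an `https` scheme and full percent-encoding of the IRI; `describe_link` is a hypothetical helper name, and the DBpedia IRI is the one used in the federated query earlier on this page.

```python
from urllib.parse import quote

# Resolver prefix from the glossary entry; the https scheme is an assumption.
DESCRIBE_PREFIX = "https://linkeddata.uriburner.com/describe/?uri="


def describe_link(entity_iri):
    """Wrap an entity IRI in a resolver URL, percent-encoding the IRI."""
    return DESCRIBE_PREFIX + quote(entity_iri, safe="")


link = describe_link("http://dbpedia.org/resource/Spike_Lee")
```

Applying such a helper to every identifier in a result set is what makes provenance structural: each value in the output is itself a link to a description of the entity.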

Knowledge Graph

Entities and relationships in the Anthropic vs. data-twingler analytics landscape.

Legend: Organizations, Concepts, Technologies

Explore Knowledge Graph using SPARQL

Query the RDF knowledge graph via URIBurner's SPARQL endpoint. Select a recipe or write your own query.
