Architecture
The Virtuoso Sponger is the Linked Data middleware component of Virtuoso. It generates Linked Data from many kinds of data source and supports a wide range of data representation and serialization formats. The Sponger is transparently integrated into Virtuoso's SPARQL Query Processor, where it delivers URI de-referencing within SPARQL query patterns across disparate data spaces. It also delivers configurable smart HTTP caching services. Optionally, the Virtuoso Content Crawler can use it to periodically populate and replenish data within the native Quad Store.
The Sponger is also a fully fledged service in its own right, directly accessible via SOAP or REST interaction patterns. Besides the SPARQL query processor and the Web Content Crawler, it is integrated into Virtuoso's Smart WebDAV folders functionality. (Smart WebDAV folders are 'sink' folders which act as conduits for populating the Virtuoso Quad Store with structured data extracted from documents placed in them.)
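As a rough illustration of URI de-referencing from within a SPARQL query, the sketch below builds a query that asks the server to fetch a remote document on demand, and wraps it in a REST-style GET URL. It assumes the `get:soft` query pragma as described in Virtuoso's SPARQL documentation (verify the accepted values against your release). The endpoint address, the `format` value and both function names are illustrative assumptions, not part of the text above. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance exposing /sparql on its default HTTP port.
SPARQL_ENDPOINT = "http://localhost:8890/sparql"


def build_sponging_query(source_url: str, limit: int = 10) -> str:
    """Return a SPARQL query that asks the server to de-reference source_url.

    The 'define get:soft' line is a Virtuoso-specific pragma (assumed here);
    it is not part of standard SPARQL.
    """
    if any(ch in source_url for ch in '<>" \n'):
        raise ValueError("source_url is not usable as an IRI reference")
    return (
        'define get:soft "soft"\n'
        f"SELECT ?s ?p ?o FROM <{source_url}>\n"
        f"WHERE {{ ?s ?p ?o }} LIMIT {int(limit)}"
    )


def build_request_url(source_url: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Return a GET URL that submits the sponging query to the endpoint."""
    params = {
        "query": build_sponging_query(source_url),
        "format": "application/sparql-results+json",  # assumed result format
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_sponging_query("http://example.org/page"))
    print(build_request_url("http://example.org/page"))
```

The resulting URL could be passed to any HTTP client. The query is kept as a plain string so that the pragma line, which is the only Sponger-specific part, stays visible.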
As depicted below, OpenLink's broad portfolio of Linked-Data-aware products supports a number of routes for creating or consuming Linked Data. The Sponger provides a key platform for developers to generate quality data meshes from unstructured or semi-structured data sources.
Architecturally, the Sponger comprises a number of cartridges, each targeting a specific kind of data source. Each cartridge is the equivalent of a data access driver.
The Sponger supports two types of cartridge: Extractor cartridges and Meta cartridges. Extractor cartridges handle raw data extraction and transformation, while Meta cartridges handle lookups and joins across other Linked Data spaces and Web 2.0 style APIs. Both cartridge types are themselves composed of data extractor and schema/ontology mapper components.
Cartridges are highly customizable. Custom cartridges can be developed using any language supported by the Virtuoso Server Extensions API, enabling structured Linked Data generation from resource types not available in the default Sponger cartridge collection bundled as part of the Virtuoso Sponger VAD package (rdf_mappers_dav.vad).
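The cartridge structure described above can be modelled in a few lines. The following is a conceptual sketch only, not the Virtuoso Server Extensions API: the `Cartridge` class, the `select_cartridge` function, the URL-pattern dispatch and the sample title cartridge are all hypothetical names and choices made for illustration. It shows a cartridge as a pair of components, a data extractor and a schema/ontology mapper, selected per data source in the way a data access driver would be.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Cartridge:
    """Hypothetical model of a cartridge: an extractor plus an ontology mapper."""
    name: str
    kind: str      # "extractor" or "meta"
    pattern: str   # regex matched against the source URL (an assumption)
    extract: Callable[[str, str], Dict[str, str]]
    map_to_rdf: Callable[[str, Dict[str, str]], List[Triple]]

    def run(self, url: str, content: str) -> List[Triple]:
        # Extract raw fields first, then map them onto a vocabulary.
        return self.map_to_rdf(url, self.extract(url, content))


def select_cartridge(cartridges: List[Cartridge], url: str) -> Optional[Cartridge]:
    """Return the first cartridge whose pattern matches the URL, if any."""
    for cartridge in cartridges:
        if re.search(cartridge.pattern, url):
            return cartridge
    return None


def extract_title(url: str, content: str) -> Dict[str, str]:
    """Data extractor: pull the <title> text out of an HTML document."""
    match = re.search(r"<title>(.*?)</title>", content, re.IGNORECASE | re.DOTALL)
    return {"title": match.group(1).strip()} if match else {}


def map_title(url: str, fields: Dict[str, str]) -> List[Triple]:
    """Ontology mapper: express the extracted title with Dublin Core terms."""
    if "title" not in fields:
        return []
    return [(url, "http://purl.org/dc/terms/title", fields["title"])]


CARTRIDGES = [
    Cartridge("html-title", "extractor", r"\.html?$", extract_title, map_title),
]

if __name__ == "__main__":
    page = "http://example.org/page.html"
    chosen = select_cartridge(CARTRIDGES, page)
    if chosen is not None:
        print(chosen.run(page, "<html><title> Hello </title></html>"))
```

Keeping the extractor and the mapper as separate callables mirrors the split described above: the same extractor could be paired with a different mapper to target another vocabulary.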
Why Is It Important?
Most of the world's data currently resides in non-Linked Data form. The Sponger delivers middleware that accelerates the bootstrapping of the Data Web by unobtrusively generating Linked Data from unstructured sources. This "Swiss army knife" for on-the-fly Linked Data generation provides a bridge between the traditional Document Web and the Linked Data Web ("Data Web").
Sponging data from non-Linked Data sources and converting it to Linked Data exposes the data in a canonical form for querying and inference. It also enables fast and easy construction of Linked Data-driven mesh-ups, as opposed to code-driven Web 2.0 mash-ups.
How It Works
Designed with a customization-friendly plug-in architecture, the Sponger's core functionality is provided by data transformation drivers, known as Sponger Cartridges, that handle entity extraction, representation construction, metadata generation, and the creation of de-referenceable proxy (or wrapper) URIs. Cartridges may be written in Virtuoso Procedure Language, XSLT, PHP, Python, Java, etc.
There are currently two kinds of cartridges: Extractor and Metadata. Extractor cartridges run within the Virtuoso instance, performing initial data retrieval, data object construction, vocabulary/ontology mapping, and proxy URI creation. Metadata (or "Meta") cartridges, on the other hand, use bindings to external processes and/or third-party services to extract entities from content, and to handle additional data retrieval, data object construction, and data transformation.
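The division of labour between the two cartridge kinds can be sketched as a two-stage pipeline: extractor functions run first and produce the initial triples and proxy URI, then meta functions add further triples. This is a simplified model under stated assumptions, not Sponger code. The proxy URI scheme, the `sponge` function, the predicates chosen and the in-memory lookup table (standing in for a third-party entity extraction service) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]
Extractor = Callable[[str, str], List[Triple]]
MetaCartridge = Callable[[str, str, List[Triple]], List[Triple]]

# Hypothetical proxy URI base; the real Sponger scheme may differ.
PROXY_BASE = "http://localhost:8890/proxy-id/"
FOAF_PAGE = "http://xmlns.com/foaf/0.1/page"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Stand-in for an external entity extraction service (assumption).
KNOWN_ENTITIES: Dict[str, str] = {
    "Virtuoso": "http://example.org/entity/Virtuoso",
    "WebDAV": "http://example.org/entity/WebDAV",
}


def proxy_uri(url: str) -> str:
    """Build a wrapper URI naming the entity described by the source document."""
    return PROXY_BASE + quote(url, safe="")


def basic_extractor(url: str, content: str) -> List[Triple]:
    """Extractor stage: create the proxy URI and link it to its source page."""
    return [(proxy_uri(url), FOAF_PAGE, url)]


def entity_lookup(url: str, content: str, triples: List[Triple]) -> List[Triple]:
    """Meta stage: add a subject link for every known entity named in the text."""
    subject = proxy_uri(url)
    return [(subject, DCT_SUBJECT, entity)
            for term, entity in KNOWN_ENTITIES.items() if term in content]


def sponge(url: str, content: str,
           extractors: List[Extractor], metas: List[MetaCartridge]) -> List[Triple]:
    """Run all extractors, then all meta cartridges, and drop duplicate triples."""
    triples: List[Triple] = []
    for extract in extractors:
        triples.extend(extract(url, content))
    for enrich in metas:
        # Each meta cartridge sees a copy of the triples gathered so far.
        triples.extend(enrich(url, content, list(triples)))
    return list(dict.fromkeys(triples))  # de-duplicate, preserving order


if __name__ == "__main__":
    for triple in sponge("http://example.org/doc", "Virtuoso generates Linked Data",
                         [basic_extractor], [entity_lookup]):
        print(triple)
```

Passing the accumulated triples to each meta function reflects the idea that meta cartridges build on what the extractors have already produced. A real deployment would replace the lookup table with a call to an external process or service.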