JSON

data.world supports uploading data in the JSON data format. JSON data can be provided either as .json files using the standard JSON serialization or as .yaml files using the alternative YAML serialization. Besides letting you upload information stored in JSON files, JSON support allows data to be synchronized from web services that return their data as JSON. In this way, data from web services can be queried using either SQL or SPARQL.

Where possible, data.world will attempt to infer a tabular structure from JSON data, and will create triples based on that structure. For JSON data which is fairly regular and shallow (like many log files), mapping to a tabular structure is very handy. But it can become unwieldy for more deeply nested and irregular JSON structures. For those, the tabular JSON mapping is lossy, flattening deeper structures of JSON into simple strings.

For all JSON files (tabular or not), data.world also performs a second, more powerful direct mapping that is better suited to deeply nested and irregular structures. Using the https://json2rdf.data.world/ namespace (abbreviated as j2r:), data.world creates RDF triples from each JSON document as follows:

  • Every JSON document is an instance of j2r:Document (i.e., there is a triple of the form documentIri a j2r:Document, where documentIri is a placeholder for the IRI of the document).

  • The filename of the document is recorded as the dwo:name of the instance (i.e., there is a triple of the form documentIri dwo:name "document name").

  • A triple with the predicate j2r:topLevel points to the top-level element in the document. This triple looks like documentIri j2r:topLevel elementIri.
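
As a rough sketch of how these document-level triples can be used, the query below lists each JSON document together with its top-level element. It relies only on the j2r: terms described above; the variable names are illustrative, and the dwo:name pattern is left out because the IRI for the dwo: prefix is not given here.

```sparql
PREFIX j2r: <https://json2rdf.data.world/>

# List every JSON document and the element its j2r:topLevel triple points to.
SELECT ?document ?topLevel
WHERE {
    ?document a            j2r:Document ;
              j2r:topLevel ?topLevel .
}
```

Because element IRIs are meant for identity only, ?topLevel is most useful as a starting point for further graph patterns rather than as a value to display.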

Elements in the JSON document are represented in RDF as either:

  • j2r:Object instances, which map to JSON objects (maps of key-value pairs within {})

    • Each j2r:Object instance will have a triple elementIri a j2r:Object.

  • RDF Lists, which map to JSON arrays (lists of values within [])

  • RDF Literals, which map to literals as values in arrays or object values.
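
Since JSON arrays become RDF Lists, their members can be reached with the standard SPARQL 1.1 property path for list traversal. The following is a minimal sketch that assumes the top-level element of each document is a JSON array; the variable names are illustrative.

```sparql
PREFIX j2r: <https://json2rdf.data.world/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Walk the RDF List at the top level of each document.
# rdf:rest* follows the list cells; rdf:first picks the member held by each cell.
SELECT ?document ?item
WHERE {
    ?document j2r:topLevel ?list .
    ?list rdf:rest*/rdf:first ?item .
}
```

Each ?item is a literal, a nested list, or the IRI of a j2r:Object. Note that the result rows are not guaranteed to come back in array order.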

Every key in a JSON object gets defined as an rdf:Property. Those properties are defined in a child namespace of the dataset or project namespace, formed by appending json-terms/ to the end of that namespace. The child namespace is commonly abbreviated as j:. This means that for each key in an object, there will be a triple objectIri j:termname keyValue. The key value may be a literal, a list, or the IRI of a contained object.
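
When the keys are known in advance, they can be referenced directly through the j: namespace. The sketch below assumes a dataset whose namespace is https://ddw-doccorp.linked.data.world/d/sparql-json-dataset/ and whose JSON objects carry the keys abbreviation and elevation; substitute the namespace and keys of your own dataset.

```sparql
# Assumed namespace: the dataset namespace with json-terms/ appended.
PREFIX j: <https://ddw-doccorp.linked.data.world/d/sparql-json-dataset/json-terms/>

# Match every JSON object that has both keys and return their values.
SELECT ?abbreviation ?elevation
WHERE {
    ?object j:abbreviation ?abbreviation ;
            j:elevation    ?elevation .
}
```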

There is also a property, j2r:term, defined on each rdf:Property, which points to the string value of the JSON key. That is to say, there is a triple of the form j:termname j2r:term "termname". This triple enables dynamic term discovery for maps with open models.
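
One way to picture dynamic term discovery is a query that returns every key and value of every JSON object without naming any key up front. This is a sketch based only on the j2r: terms described above; the variable names are illustrative.

```sparql
PREFIX j2r: <https://json2rdf.data.world/>

# For each JSON object, return its keys (as plain strings) and their values.
# Joining on j2r:term restricts ?property to properties generated from JSON keys,
# so the rdf:type triple of the object is not returned.
SELECT ?object ?term ?value
WHERE {
    ?object   a         j2r:Object ;
              ?property ?value .
    ?property j2r:term  ?term .
}
ORDER BY ?object ?term
```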

Note

data.world coins IRIs for each JSON element to guarantee uniqueness, but does not otherwise guarantee any particular structure of that IRI. The IRIs should be used for identity only and not parsed for their constituent structure.

The structure of data.world's direct mapping allows JSON documents to be queried idiomatically in SPARQL. All data in the JSON document is available for querying directly, with nothing flattened or lost.

Examples - JSON

Here is an example of a query showing how to access data using triples generated for JSON documents:

PREFIX : <https://ddw-doccorp.linked.data.world/d/sparql-json-dataset/>
PREFIX json2rdf: <https://json2rdf.data.world/>
PREFIX jsonterm: <https://ddw-doccorp.linked.data.world/d/sparql-json-dataset/json-terms/>

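SELECT ?abbreviation ?elevation
{
    # Look up the property generated for the JSON key "abbreviation"
    ?abbrp a             json2rdf:Property.
    ?abbrp json2rdf:term "abbreviation".
    ?q     ?abbrp        ?abbreviation.
    # Look up the property generated for the JSON key "elevation"
    ?elevp a             json2rdf:Property.
    ?elevp json2rdf:term "elevation".
    ?q     ?elevp        ?elevation.
}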

This is what the query looks like when run on data.world:

[Image: native_JSON.png]

Here is an example of a query which shows how to access data using triples generated for tabular JSON documents:

PREFIX : <https://ddw-doccorp.linked.data.world/d/sparql-json-dataset/>

SELECT ?abbreviation ?elevation
{
    ?q a                            :tbl-elevations.
    ?q :col-elevations-abbreviation ?abbreviation.
    ?q :col-elevations-elevation    ?elevation.
}

This is what the query looks like when run on data.world:

[Image: Tabular_JSON.png]