
XML

data.world supports uploading data in XML format. This covers XML files, and it also lets you synchronize data from web services that return XML, so data from those services can be queried using SPARQL.

For all XML files, data.world applies a mapping from XML to RDF. It creates RDF triples from each XML document in the namespace https://xml2rdf.data.world/ (abbreviated as x2r:). Every XML document is an instance of x2r:Document. The filename of the document is recorded as the dwo:name of that instance, where dwo:name is the name predicate of the data.world ontology (full unprefixed form: https://ontology.data.world/v0#name). Each document also has a triple with the predicate x2r:topLevel that points to the top-level element in the document.
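As a minimal sketch of how the document-level triples described above could be queried, the following lists each uploaded XML document with its filename and the tag of its top-level element. It uses only the predicates named in this section. The dwo: prefix is assumed to expand to https://ontology.data.world/v0#, and the variable names are illustrative.

```sparql
PREFIX x2r: <https://xml2rdf.data.world/>
PREFIX dwo: <https://ontology.data.world/v0#>

# List each XML document with its filename and top-level tag.
SELECT ?fileName ?topLevelTag
{
    ?document        a            x2r:Document.
    ?document        dwo:name     ?fileName.
    ?document        x2r:topLevel ?topLevelElement.
    ?topLevelElement x2r:tag      ?topLevelTag.
}
```

A query like this is a quick way to confirm which files in a dataset were parsed as XML before writing more specific patterns.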

Components in the XML document are represented as either:

  • x2r:Element instances - These map to XML elements.

  • x2r:TextNode instances - These map to XML text.

  • Blank nodes - These map to XML attributes.

For each element, data.world generates triples for:

  • The local name of the tag of the element using predicate x2r:tag

  • The namespace of the tag of the element using predicate x2r:xmlns

  • A pointer to the element’s parent using predicate x2r:parent

  • A pointer to the top-level document containing the element using predicate x2r:containedIn

  • RDF lists by tag of child elements contained in the element

  • An RDF list of the child text nodes contained in the element
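The element triples listed above can be explored with a query along these lines, which returns each element's tag together with its parent's tag and the file that contains it. This is a sketch rather than an official recipe. It assumes that an element without a namespace may have no x2r:xmlns triple, which is why that pattern is wrapped in OPTIONAL. The RDF lists of children are not used because this section does not name their predicates.

```sparql
PREFIX x2r: <https://xml2rdf.data.world/>
PREFIX dwo: <https://ontology.data.world/v0#>

# For every element whose parent is also an element, show where it sits.
SELECT ?fileName ?parentTag ?childTag ?childNamespace
{
    ?child    a               x2r:Element.
    ?child    x2r:tag         ?childTag.
    ?child    x2r:parent      ?parent.
    ?parent   x2r:tag         ?parentTag.
    ?child    x2r:containedIn ?document.
    ?document dwo:name        ?fileName.
    # Assumption: the namespace triple may be absent for unqualified tags.
    OPTIONAL { ?child x2r:xmlns ?childNamespace. }
}
```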

For each text node, data.world generates triples for:

  • A pointer to the text node’s parent using predicate x2r:parent

  • A pointer to the top-level document containing the text node using predicate x2r:containedIn

  • The textual content of the text node using predicate x2r:content
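To illustrate the text node triples, the sketch below pairs each piece of text with the tag of the element that encloses it. The FILTER that drops whitespace-only text is an assumption about what a reader typically wants. This section does not say whether such nodes are generated, and the filter can be removed without affecting the rest of the query.

```sparql
PREFIX x2r: <https://xml2rdf.data.world/>

# Show the text held by each element, keyed by the element's tag.
SELECT ?parentTag ?text
{
    ?textNode      a           x2r:TextNode.
    ?textNode      x2r:content ?text.
    ?textNode      x2r:parent  ?parentElement.
    ?parentElement x2r:tag     ?parentTag.
    # Assumption: skip text that contains no non-whitespace character.
    FILTER(REGEX(STR(?text), "\\S"))
}
```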

For each attribute, data.world generates triples for:

  • The local name of the tag of the attribute using predicate x2r:tag

  • The namespace of the tag of the attribute using predicate x2r:xmlns

  • The value of the attribute using predicate x2r:value
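Attributes can be inspected in a similar way. The sketch below selects blank nodes that carry both x2r:tag and x2r:value, which, going by the lists above, should match attributes and not elements or text nodes. This section does not name the predicate that links an element to its attributes, so the query deliberately does not join attributes to their owning element. As before, the OPTIONAL around x2r:xmlns is an assumption.

```sparql
PREFIX x2r: <https://xml2rdf.data.world/>

# List attribute names and values across all XML documents.
SELECT ?attributeName ?attributeNamespace ?attributeValue
{
    ?attribute x2r:tag   ?attributeName.
    ?attribute x2r:value ?attributeValue.
    # Assumption: unqualified attributes may have no namespace triple.
    OPTIONAL { ?attribute x2r:xmlns ?attributeNamespace. }
    # Attributes are represented as blank nodes.
    FILTER(isBlank(?attribute))
}
```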

This mapping gives a uniform model for querying XML documents in SPARQL. All data in the XML model is available for querying directly, although the queries can be verbose because each level of element nesting needs its own x2r:parent and x2r:tag patterns.

Example - XML

The following example uses the data.world-generated triples to retrieve, for each airport element, the text of its abbreviation and passengers child elements:

PREFIX : <https://ddw-doccorp.linked.data.world/d/sparql-xml-dataset/>
PREFIX x2r: <https://xml2rdf.data.world/>

SELECT ?abbreviation ?passengers
{
    # Match every <airport> element.
    ?airportNode         a           x2r:Element.
    ?airportNode         x2r:tag     "airport".
    # Its <abbreviation> child element and that child's text node.
    ?abbreviationElement x2r:parent  ?airportNode.
    ?abbreviationElement x2r:tag     "abbreviation".
    ?abbreviationNode    x2r:parent  ?abbreviationElement.
    ?abbreviationNode    x2r:content ?abbreviation.
    # Its <passengers> child element and that child's text node.
    ?passengersElement   x2r:parent  ?airportNode.
    ?passengersElement   x2r:tag     "passengers".
    ?passengersNode      x2r:parent  ?passengersElement.
    ?passengersNode      x2r:content ?passengers.
}


Here is what the query looks like when run on data.world:

XML_example.png
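The airport query can also be written more compactly with standard SPARQL shorthand. In the sketch below, ";" repeats the subject and "[ ... ]" stands in for an intermediate node that is never selected, so the helper variables for the child elements and text nodes are no longer needed. This is an alternative spelling of the same patterns, not a different data.world feature, and it should return the same rows.

```sparql
PREFIX x2r: <https://xml2rdf.data.world/>

SELECT ?abbreviation ?passengers
{
    ?airportNode a       x2r:Element ;
                 x2r:tag "airport" .
    # A text node whose parent is an <abbreviation> child of the airport.
    [] x2r:content ?abbreviation ;
       x2r:parent  [ x2r:tag "abbreviation" ; x2r:parent ?airportNode ] .
    # A text node whose parent is a <passengers> child of the airport.
    [] x2r:content ?passengers ;
       x2r:parent  [ x2r:tag "passengers" ; x2r:parent ?airportNode ] .
}
```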