
Definitions of common data.world terms

Name

Description

Summary

Administrator

The person in an organization who can manage organization members and access levels, and access all data sets and projects owned by the organization (even private ones).

API

Application Programming Interface

A set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. Additionally, APIs are used when programming graphical user interface (GUI) components.

Bookmarks

You can add a bookmark to any dataset or project that interests you, whether or not it is owned by you or your organization. Search is enabled in your bookmarks section to help you quickly find datasets or projects. If your data project is bookmarked, you can think of it as similar to a "like" on Facebook.

CC BY-NC

Creative Commons Attribution-NonCommercial 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share and adapt your dataset if they give credit to you and do not use your dataset for any commercial purposes.

CC BY-NC-ND

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share only your unmodified dataset if they give credit to you and do not share it for commercial purposes. Users cannot make any additions, transformations or changes to your dataset under this license.

CC BY-NC-SA

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share your dataset only if they (1) give credit to you, (2) do not use your dataset for any commercial purposes, and (3) distribute any additions, transformations or changes to your dataset under this license. We consider this license a viral license: users must share their work on your dataset under this same license, users of the adapted dataset must likewise share their work on it under this license, and so on for any further changes to those modified datasets.

CC BY-ND

Creative Commons Attribution-NoDerivatives 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share your dataset if they give credit to you, but they cannot make any additions, transformations or changes to your dataset under this license.

CC-0

Creative Commons Public Domain Dedication

This license is one of the open Creative Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

CC-BY

Creative Commons Attribution 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

CC-BY-SA

Creative Commons Attribution-ShareAlike 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic since others may decide not to work with your CC-BY-SA licensed dataset if there is a risk that their work on your dataset will need to be shared under this license when they would rather use another license.

CDLA-Permissive-2.0

Community Data License Agreement – Permissive, Version 2.0

This Community Data License Agreement is similar to permissive open source licenses such as the MIT license. It allows users to use, modify and adapt your dataset and the data within it, and to share it. The CDLA-Permissive-2.0 terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data. The 2.0 version is significantly shorter than its predecessor and uses plain language to express the grant of permissions and requirements. The only obligation is to "make available the text of this agreement with the shared Data," including the disclaimer of warranties and liability.

CDLA-Sharing-1.0

Community Data License Agreement – Sharing, Version 1.0

This license is one of the Community Data License Agreement licenses and was designed to embody the principles of "copyleft" in a data license. It allows users to use, modify and adapt your dataset and the data within it, and to share the dataset and data with their changes so long as they do so under the CDLA-Sharing and give credit to you. The CDLA-Sharing terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data.

Classroom

A classroom is a type of organization you can set up in data.world so you and your students can upload datasets, create projects, discuss, and share insights. A classroom includes unlimited private projects and datasets, 1GB per project or dataset, and up to 100 members, so it's a perfect way to collaborate with any group that needs to learn together.

Community

The data.world community includes every person who uses the platform whether enterprise, educational, or individual.

Content contributor

A Content Contributor is a person in an organization who can create and interact with the organization's projects and datasets.

Contributor

A Contributor is a person who is invited to access a dataset or project. Contributor permissions can be set to Discover only, View only, Edit (view and edit), or Manage (view, edit, and manage).

Crowdsourced data

An organization can be configured so that an individual outside the organization can propose that the organization own a dataset created by the individual. Datasets created in this way are called crowdsourced data.

CSV

Comma-Separated Values (CSV) is a plain-text file format used to represent tables. Commas separate the values in each line into columns, and line breaks separate the data into records, or rows.
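As a minimal sketch of how CSV text maps onto rows and columns, the following uses Python's standard `csv` module. The sample text, the column names (`name`, `score`), and the helper `parse_csv` are made up for illustration and are not part of data.world.

```python
import csv
import io

# Hypothetical sample: the first line holds the column names,
# and each following line is one record (row).
raw_text = "name,score\nAda,90\nGrace,85\n"

def parse_csv(text):
    """Return the rows of a CSV string as a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = parse_csv(raw_text)
for row in rows:
    # Every value is read as a string; CSV itself carries no type information.
    print(row["name"], row["score"])
```

Note that the parser returns every value as text, so any type conversion (for example, turning `score` into an integer) is left to the reader of the file.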

Data

Data is just information, and it can take many forms from images to spreadsheets. Data in data.world can be in any file format.

Database

A structured set of data held in a computer, especially one that is accessible in various ways.

Data dictionary

The data dictionary contains all the metadata (data about the data) for the files, tables and columns in a dataset. For all files it contains the names of all the files in the dataset, a place to add descriptions for each file, and the labels for each file. For tabular files it has the column names, the format of the data in each column, and a place to add a description for each column.

Data inspector

When data is ingested into data.world, the Data Inspector evaluates it to rapidly diagnose issues with it. The inspector examines only data uploaded to data.world, not data brought in through a live connection.

Graph database

A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.

Integration

An application or program that connects to data.world in order to transport, manipulate, sync, or share data and analyses of the data.

JSON

JavaScript Object Notation

JSON (pronounced jay-saun) is a language-independent, open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value).
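To make the attribute–value pairs and arrays concrete, here is a small sketch using Python's standard `json` module. The record and its field names are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical data object: attribute-value pairs, one of which holds an array.
record = {
    "title": "Example dataset",
    "tags": ["demo", "sample"],
    "rows": 3,
    "public": True,
}

# Serialize the object to human-readable JSON text...
text = json.dumps(record, sort_keys=True)

# ...and parse the text back into an equivalent object.
restored = json.loads(text)

print(text)
```

Because the text form is language-independent, a program written in another language could parse the same string into its own native data structures.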

license

data.world allows you to specify how you allow data you own to be used by others.

license type

By providing a license, you are setting expectations about how you want your data to be used. You can think of a license as the Terms of Use for your data.

Markup language

A markup language is a computer language that uses tags to define elements within a document. It is human-readable, meaning markup files contain standard words, rather than typical programming syntax. The two most common markup languages are HTML and XML.

Metadata catalog

An organized list containing all the information about your data resources. For example, the source, the type, the location, the owner, the update and creation dates, descriptions of the resource, etc.

Metamap

A graph-based data repository containing the metadata about all public datasets stored in data.world.

ODC-BY

Open Data Commons Attribution License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

ODC-ODbL

Open Data Commons Open Database License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic since others may decide not to work with your ODC-ODbL licensed dataset if there is a risk that their work on your dataset will need to be shared under this license when they would rather use another license.

OKTA

Cloud software that helps companies manage and secure user authentication into modern applications, and helps developers build identity controls into applications, websites, web services and devices. It provides secure identity management with Single Sign-On, Multi-factor Authentication and Lifecycle Management (Provisioning).

PDDL

Open Data Commons Public Domain Dedication and License

This license is one of the Open Data Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

Public API

The public API is used to create an integration or application with data.world. The API can also be used to get data out of data.world.

Public Domain

Public Domain License

The work has been dedicated to the public domain by waiving all rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.

Query

A statement written to retrieve information from a dataset on data.world. Queries can be written in SQL or SPARQL.

RDF

Resource Description Framework

RDF represents information using semantic triples, each of which comprises a subject, a predicate, and an object. Turtle, one syntax for writing RDF, provides a way to group three URIs to make a triple, and provides ways to abbreviate such information, for example by factoring out common portions of URIs.

RDF triple store

An RDF triple store is similar to a graph database and stores information in semantic triples. It is accessed and manipulated using the SPARQL query language.

SAML

Security Assertion Markup Language

An open standard for exchanging authentication and authorization data between parties, in particular, between an identity provider and a service provider. SAML enables Single Sign-On (SSO).

Share-alike license

If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

SSO

Single Sign-on

A property of access control of multiple related, yet independent, software systems. With this property, a user logs in with a single ID and password to gain access to any of several related systems.

Streams

Streams are a type of input (in JSON Lines format, jsonl) that allows you to update and append records to a data file on data.world instead of having to re-upload the entire file when changes need to be made.
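The jsonl (JSON Lines) format mentioned above puts one JSON object on each line, which is what makes appending individual records practical. The sketch below shows only that serialization step; the helper names `to_json_lines` and `from_json_lines` and the sample records are hypothetical, and no data.world API call is made.

```python
import json

def to_json_lines(records):
    """Serialize an iterable of dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)

def from_json_lines(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical records to append to an existing data file.
new_records = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},
]

payload = to_json_lines(new_records)
print(payload, end="")
```

Since each line is a complete record, a new batch can be sent on its own without resending the records that came before it.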

Summary

The summary is one of two documents created with a dataset or project. The summary is where all of the information about the origin of the data, why you created the dataset, further documentation of your work, etc. is found. Use the Summary section to tell your data's story.

Triple

AKA Semantic triples

A triple is a set of three entities that arranges a statement about semantic data in the form of subject–predicate–object expressions. Each item in the triple is expressed as a Web URI.

TTL or Turtle

Terse RDF Triple Language

Terse RDF Triple Language (Turtle) is a syntax and file format for expressing data in the RDF data model. Turtle syntax is similar to that of SPARQL. Turtle provides a way to group three URIs to make a triple, and provides ways to abbreviate such information, for example by factoring out common portions of URIs.
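The idea of factoring out common portions of URIs can be sketched in a few lines of Python. This is not a Turtle parser; it only illustrates how a prefix stands in for the shared start of several URIs. The prefix `ex`, the namespace `http://example.org/`, and the helper `expand` are assumptions made up for this example.

```python
# Hypothetical prefix table: "ex" abbreviates the shared start of each URI.
PREFIXES = {"ex": "http://example.org/"}

def expand(term, prefixes):
    """Expand a prefixed name such as 'ex:alice' into a full URI.

    Terms whose prefix is not in the table are returned unchanged.
    """
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term

# A triple written with abbreviated names: subject, predicate, object.
short_triple = ("ex:alice", "ex:knows", "ex:bob")

# The same triple with every name expanded to a full URI.
triple = tuple(expand(term, PREFIXES) for term in short_triple)
print(triple)
```

The abbreviated and expanded forms describe the same statement; the prefix only saves repeating the common part of each URI.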