
Definitions of common data.world terms

Name

Description

Summary

Administrator

The person in an organization who can manage organization members and access levels, and who can access all datasets and projects owned by the organization (even private ones).

API

Application Programming Interface

A set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. Additionally, APIs are used when programming graphical user interface (GUI) components.

Article

Documentation on data.world is divided into four types. One of those types is the article, which gives instructions for a specific task or feature and is not hands-on.

Best practices

Best practices are a type of documentation that is instructional, not hands-on, and recommends a specific way of doing something.

Bookmarks

You can add a bookmark to any dataset or project that interests you, whether or not it is owned by you or your organization. Search is enabled in your bookmarks section to help you quickly find datasets or projects. If your data project is bookmarked, you can think of it as similar to a "like" on Facebook.

Business glossary

A list of terms defined as they are used in your specific business environment.

Catalog

A catalog is an organized list of information.

CC BY-NC

Creative Commons Attribution-NonCommercial 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share and adapt your dataset if they give credit to you and do not use your dataset for any commercial purposes.

CC BY-NC-ND

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share only your unmodified dataset if they give credit to you and do not share it for commercial purposes. Users cannot make any additions, transformations or changes to your dataset under this license.

CC BY-NC-SA

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

This license is one of the most restrictive Creative Commons licenses. Users can share your dataset only if they (1) give credit to you, (2) do not use your dataset for any commercial purposes, and (3) distribute any additions, transformations or changes to your dataset under this license. We consider this license a viral license since users will need to share their work on your dataset under this same license and any users of the adapted dataset would likewise need to share their work on the adapted dataset under this license and so on for any other changes to those modified datasets.

CC BY-ND

Creative Commons Attribution-NoDerivatives 4.0 International

This license is one of the more restrictive Creative Commons licenses. Users can share your dataset if they give credit to you, but they cannot make any additions, transformations or changes to your dataset under this license.

CC-0

Creative Commons Public Domain Dedication

This license is one of the open Creative Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

CC-BY

Creative Commons Attribution 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

CC-BY-SA

Creative Commons Attribution-ShareAlike 4.0 International

This license is one of the open Creative Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic since others may decide not to work with your CC-BY-SA licensed dataset if there is a risk that by doing so their work on your dataset will need to be shared under this license when they would rather use another license.

CDLA-Permissive-2.0

Community Data License Agreement – Permissive, Version 2.0

This Community Data License Agreement is similar to permissive open source licenses such as the MIT license. It allows users to use, modify and adapt your dataset and the data within it, and to share it. The CDLA-Permissive-2.0 terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data. The 2.0 version is significantly shorter and uses plain language to express the grant of permissions and requirements. The only obligation is to "make available the text of this agreement with the shared Data," including the disclaimer of warranties and liability.

CDLA-Sharing-1.0

Community Data License Agreement – Sharing, Version 1.0

This license is one of the Community Data License Agreement licenses and was designed to embody the principles of "copyleft" in a data license. It allows users to use, modify and adapt your dataset and the data within it, and to share the dataset and data with their changes so long as they do so under the CDLA-Sharing and give credit to you. The CDLA-Sharing terms explicitly do not impose any obligations or restrictions on results obtained from users’ computational use of the data.

Classroom

A classroom is a type of organization you can set up in data.world so you and your students can upload datasets, create projects, discuss, and share insights. A classroom includes unlimited private projects and datasets, 1GB per project or dataset, and up to 100 members, so it's a perfect way to collaborate with any group that needs to learn together.

Columns

Data in tabular format is arranged into rows and columns. Columns represent data of the same type across all the records.

Community

The data.world community includes every person who uses the platform whether enterprise, educational, or individual.

Content contributor

A Content Contributor is a person in an organization who can create and interact with the organization's projects and datasets.

Contributor

A Contributor is a person who is invited to access a dataset or project. Contributor permissions can be set to Discover only, View only, Edit (view and edit), or Manage (view, edit, and manage).

Created and Updated Date

Created and updated are two search operators that can be used to find datasets, projects, insights, users and organizations based on the date they were added or last updated. Timestamps are set in UTC, not your local time, so depending on where you are, results might be a day off from your local date.
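The one-day shift between UTC and local dates can be illustrated with a minimal Python sketch. The timestamp and the fixed UTC-6 offset below are made-up values for illustration, not anything taken from data.world.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timestamp: a resource updated at 02:30 UTC on 15 March 2021.
updated_utc = datetime(2021, 3, 15, 2, 30, tzinfo=timezone.utc)

# Assumed local zone for illustration: a fixed UTC-6 offset.
local_zone = timezone(timedelta(hours=-6))
updated_local = updated_utc.astimezone(local_zone)

print(updated_utc.date())    # 2021-03-15
print(updated_local.date())  # 2021-03-14
```

The same instant falls on 15 March in UTC but on 14 March for a user six hours behind UTC, which is why a date-based search can appear to be off by a day.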

Creator

The creator of a dataset or project is the individual who creates it. The creator can be different from the owner (see owner for more details). The distinction between owner and creator is important for organizations as the owner manages a resource with the same privileges as the creator, but owners can be changed (as personnel changes) while creator is a static entry.

Crowdsourced data

An organization can be configured so that an individual outside the organization can propose that the organization own a dataset created by the individual. Datasets created in this way are called crowdsourced data.

CSV

Comma-Separated Values (CSV) is a plain-text file format for tabular data. Commas separate the values in each line into columns, and line breaks separate the records (rows).
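As a small illustration of how commas and line breaks map to columns and rows, the following Python sketch parses a made-up CSV string with the standard library `csv` module. The column names and values are hypothetical.

```python
import csv
import io

# Hypothetical CSV text: a header line followed by two records.
csv_text = "id,color\n1,red\n2,blue\n"

# csv.DictReader treats the first line as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["id"], row["color"])
```

Each line becomes one record, and each comma-separated value is filed under the column named in the header. Note that the reader returns every value as text; converting types is a separate step.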

Data

Data is just information, and it can take many forms from images to spreadsheets. Data in data.world can be in any file format.

Database

A structured set of data held in a computer, especially one that is accessible in various ways.

Data dictionary

The data dictionary contains all the metadata (data about the data) for the files, tables and columns in a dataset. For all files, it contains the file names, a place to add a description for each file, and the labels for each file. For tabular files, it also contains the column names, the format of the data in each column, and a place to add a description for each column.

Data inspector

When data is ingested into data.world, the Data Inspector evaluates it to rapidly diagnose issues with it. The inspector does not examine data brought in through a live connection, only data uploaded to data.world.

Data sources

A data source is any place you can get data from, including databases, local files, cloud-based files, real-time sources like log files, SaaS data, URLs, and corporate networks.

Dataset

Datasets are where all data is stored and documented for later sharing and use in projects. A dataset is the basic repository for data files and associated metadata, documentation, scripts, and any other supporting resources that should be stored alongside the data.

Description fields

Datasets, projects, all the files in each, and all the columns in any structured data files have description fields associated with them. Descriptions are very short and serve as a quick reference for the item they describe.

FAQ

Frequently Asked Questions

A document format consisting of questions and answers.

Glossary

A glossary is an alphabetical list of terms or words found in or relating to a specific subject with explanations; a brief dictionary.

Graph database

A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.

Insights

Findings, conclusions, and interesting points for discussion about a project are stored as insights in the project.

Integration

An application or program that connects to data.world in order to transport, manipulate, sync, or share data and analyses of the data.

JSON

JavaScript Object Notation

JSON (pronounced jay-saun) is a language-independent, open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value).
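To make the attribute–value pairs and arrays concrete, here is a minimal Python sketch using the standard library `json` module. The record content is invented for illustration.

```python
import json

# Hypothetical record: attribute-value pairs plus an array.
text = '{"title": "Sample dataset", "tags": ["demo", "test"], "rows": 3}'

record = json.loads(text)   # parse JSON text into a Python dict
print(record["tags"][0])    # demo

# Serialize back to text; sort_keys makes the output order predictable.
round_trip = json.dumps(record, sort_keys=True)
print(round_trip)
```

Because the format is language-independent, the same text could be read by a JSON parser in any other language.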

Resources

Your resources are the datasets and projects owned by you or your organization(s).

License

data.world allows you to specify how you allow data you own to be used by others.

License type

By providing a license, you are setting expectations about how you want your data to be used. You can think of a license as the Terms of Use for your data.

Markup language

A markup language is a computer language that uses tags to define elements within a document. It is human-readable, meaning markup files contain standard words, rather than typical programming syntax. The two most common markup languages are HTML and XML.

Metadata

According to the National Information Standards Organization (NISO), metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

Metadata catalog

An organized list containing all the information about your data resources. For example, the source, the type, the location, the owner, the update and creation dates, descriptions of the resource, etc.

Metamap

A graph-based data repository containing the metadata about all public datasets stored in data.world.

ODC-BY

Open Data Commons Attribution License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you.

ODC-ODbL

Open Data Commons Open Database License

This license is one of the Open Data Commons licenses and allows users to share and adapt your dataset so long as they give credit to you and distribute any additions, transformations or changes to your dataset under this license. We consider this license (a.k.a. a viral license) problematic since others may decide not to work with your ODC-ODbL licensed dataset if there is a risk that by doing so their work on your dataset will need to be shared under this license when they would rather use another license.

OKTA

Cloud software that helps companies manage and secure user authentication into modern applications, and helps developers build identity controls into applications, websites, web services and devices. It provides secure identity management with Single Sign-On, Multi-factor Authentication and Lifecycle Management (Provisioning).

Organization

A group on data.world that you belong to which determines what data resources you can see and edit.

Owner

When a dataset or project is created the person creating it is the creator, but the owner can be designated as either the person who created it, one of the organizations in which the creator is a member, or an organization that accepts ownership proposals. The owner has all the same permissions for management and editing of the dataset or project that the creator has.

PDDL

Open Data Commons Public Domain Dedication and License

This license is one of the Open Data Commons licenses and is like a public domain dedication. It allows you, as a dataset owner, to use a license mechanism to surrender your rights in a dataset when you might not otherwise be able to dedicate your dataset to the public domain under applicable law.

Platform

The data.world application is also referred to as the platform.

Project

Projects are where all querying, analysis and discussion of data takes place in data.world. Data in different datasets can be used for many different projects, but each project contains all and only the data that is relevant for that project. The information in a project can come from datasets, files attached directly to the project, insights written by the project's team members about the data and the project, and discussions about the project.

Public API

The public API is used to create an integration or application with data.world. The API can also be used to get data out of data.world.

Public Domain

Public Domain License

The work has been dedicated to the public domain by waiving all rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.

Query

A statement written to retrieve information from a dataset on data.world. Queries can be written in SQL or SPARQL.

Quick start guide

A quick start guide is a short, hands-on type of documentation derived from tutorials and designed to quickly get users comfortable with basic use of the data.world platform.

RDF

Resource Description Framework

RDF represents information using semantic triples, which comprise a subject, predicate, and object. Turtle, one syntax for writing RDF, provides a way to group three URIs to make a triple, and provides ways to abbreviate such information, for example by factoring out common portions of URIs.

RDF triple store

An RDF triple store is similar to a graph database and stores information in semantic triples. It is accessed and manipulated using the SPARQL query language.
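The idea of storing and retrieving triples can be sketched in plain Python. This is only an in-memory illustration, not SPARQL and not how any real triple store is implemented; the `example.org` URIs and the `match` helper are hypothetical.

```python
# Hypothetical triples (subject, predicate, object) using example.org URIs.
triples = {
    ("http://example.org/alice", "http://example.org/owns", "http://example.org/dataset1"),
    ("http://example.org/alice", "http://example.org/owns", "http://example.org/dataset2"),
    ("http://example.org/bob", "http://example.org/owns", "http://example.org/dataset3"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

owned_by_alice = match(triples, subject="http://example.org/alice")
print(len(owned_by_alice))  # 2
```

A SPARQL query expresses the same kind of pattern, with variables in place of the wildcards.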

Reference

A type of documentation that includes tables, lists, glossaries, appendices, etc. It is informational, not instructional, in format and is not hands-on.

SAML

Security Assertion Markup Language

An open standard for exchanging authentication and authorization data between parties, in particular, between an identity provider and a service provider. SAML enables Single Sign-On (SSO).

Share-alike license

If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

SPARQL

SPARQL Protocol and RDF Query Language

Pronounced "sparkle", SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in RDF format.

SQL

Structured Query Language

SQL is a language used to access and manipulate relational database management systems.
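A short, self-contained sketch of what a SQL statement looks like in practice follows. It uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table and values are made up, and SQLite's dialect is not necessarily the same as the SQL dialect used on data.world.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 25), ("north", 5)],
)

# Aggregate the amounts per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 15), ('south', 25)]
conn.close()
```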

SSO

Single Sign-On

A property of access control for multiple related, yet independent, software systems. With this property, a user logs in with a single ID and password to gain access to any of several related systems.

Streams

Streams are a type of input (jsonl) that allows you to update and append records to a data file on data.world instead of having to re-upload the entire file when changes need to be made.
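The jsonl (JSON Lines) format mentioned above puts one JSON object on each line. The sketch below only builds and re-reads such a payload with the Python standard library; the records are hypothetical, and how the payload is actually sent to data.world is outside the scope of this example.

```python
import json

# Hypothetical records to append to a data file.
records = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "updated"},
]

# JSON Lines: one JSON object per line, each terminated by a newline.
payload = "".join(json.dumps(r) + "\n" for r in records)
print(payload, end="")

# Reading it back is a matter of parsing each line separately.
parsed = [json.loads(line) for line in payload.splitlines()]
```

Because every line is a complete record, new records can be appended without rewriting the lines that are already there.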

Summary

The summary is one of two documents created with a dataset or project. The summary is where all of the information about the origin of the data, why you created the dataset, further documentation of your work, etc. is found. Use the Summary section to tell your data's story.

Tag

Tags can be used to organize and group your dataset or project by topic, category, source, department, or team. They can be searched for explicitly with the tag search operator, and can also help to filter down more generic search results.

Team

A team is a group of people working on a project. A team could be an organization or a subset of an organization.

Title

The name of the dataset or project. Titles are accessible via search.

Triple

AKA Semantic triples

A triple is a set of three entities that arranges a statement about semantic data in the form of subject–predicate–object expressions. Each item in the triple is expressed as a Web URI.

TTL or Turtle

Terse RDF Triple Language

Terse RDF Triple Language (Turtle) is a syntax and file format for expressing data in the RDF data model. Turtle syntax is similar to that of SPARQL. Turtle provides a way to group three URIs to make a triple, and provides ways to abbreviate such information, for example by factoring out common portions of URIs.
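The abbreviation by factoring out common portions of URIs can be sketched as follows. The Python code writes one triple in full form and again with a prefix; the `ex` prefix, the `example.org` namespace, and the `abbreviate` helper are hypothetical, and the helper only handles simple local names rather than the full Turtle grammar.

```python
# Hypothetical prefix and namespace shared by all three URIs.
PREFIX = "ex"
NAMESPACE = "http://example.org/"

triple = (
    "http://example.org/dataset1",
    "http://example.org/hasOwner",
    "http://example.org/alice",
)

def abbreviate(uri: str) -> str:
    """Return prefix:local if the URI starts with the namespace, else <uri>."""
    if uri.startswith(NAMESPACE):
        return f"{PREFIX}:{uri[len(NAMESPACE):]}"
    return f"<{uri}>"

# Full form: each URI written out inside angle brackets.
full_form = " ".join(f"<{u}>" for u in triple) + " ."

# Abbreviated form: declare the prefix once, then use short names.
short_form = (
    f"@prefix {PREFIX}: <{NAMESPACE}> .\n"
    + " ".join(abbreviate(u) for u in triple)
    + " ."
)

print(full_form)
print(short_form)
```

Both forms describe the same triple; the second declares the shared namespace once and then refers to it by prefix.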

Tutorial

One of our four types of documentation is a tutorial. Tutorials are instructional, in depth, and hands-on. A variation on the tutorial is a quick start which is a shorter, derivative version of a tutorial.

URI

Uniform Resource Identifier

A string of characters that unambiguously identifies a particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules but also maintain extensibility through a separately defined hierarchical naming scheme (e.g. http://). The most common form of URI is the Uniform Resource Locator (URL), frequently referred to informally as a web address.
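The syntax rules can be seen by splitting a URL into its components. This minimal Python sketch uses the standard library `urllib.parse.urlsplit`; the address itself is a made-up `example.org` URL.

```python
from urllib.parse import urlsplit

# Hypothetical web address used only to show the parts of a URI.
parts = urlsplit("https://example.org/datasets/demo?format=csv#columns")

print(parts.scheme)    # https
print(parts.netloc)    # example.org
print(parts.path)      # /datasets/demo
print(parts.query)     # format=csv
print(parts.fragment)  # columns
```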

White paper

A high-level but very technical document. It is informational, not instructional, in format and is not hands-on.