Data dictionary

The data dictionary contains all the metadata (data about the data) for the files, tables and columns in a dataset. For all files it contains:

  • The names of all the files in the dataset

  • A place to add descriptions for each file

  • The labels for each file

and for tabular files it has:

  • The column names

  • The format of the data in each column

  • A place to add a description for each column
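The contents listed above can be pictured as a small data structure. The following is a minimal sketch only, not data.world's actual schema: the class names `FileEntry` and `ColumnEntry`, the field names, and the sample files are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnEntry:
    """Metadata for one column of a tabular file (hypothetical shape)."""
    name: str
    data_format: str        # e.g. "string", "integer", "date"
    description: str = ""   # optional, added by whoever edits the entry


@dataclass
class FileEntry:
    """Metadata kept for every file, tabular or not (hypothetical shape)."""
    name: str
    description: str = ""
    labels: List[str] = field(default_factory=list)
    # Only tabular files carry column entries; other files leave this empty.
    columns: List[ColumnEntry] = field(default_factory=list)

    @property
    def is_tabular(self) -> bool:
        return bool(self.columns)


# Illustrative dataset with one non-tabular and one tabular file.
data_dictionary = [
    FileEntry(name="readme.pdf", labels=["documentation"]),
    FileEntry(
        name="sales.csv",
        description="Monthly sales by region",
        labels=["raw data"],
        columns=[
            ColumnEntry("region", "string", "Sales region name"),
            ColumnEntry("amount", "decimal"),
        ],
    ),
]

tabular_names = [entry.name for entry in data_dictionary if entry.is_tabular]
```

Here every file gets a name, a description, and labels, while column names, formats, and descriptions exist only for the tabular file.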

You can get to the data dictionary either from the Overview tab (right below the Summary) or from the Documents section in the left pane of the workspace.

Data dictionary entries are edited separately for each file by selecting the Edit link next to the filename in the data dictionary document. Every file, no matter what type, has a data dictionary entry that contains its file metadata.

Tabular files also have two optional sections in their file metadata: Advanced settings and CSV settings.

The Authentication setting allows you to specify password, token, or OAuth parameters if the source URL requires authentication. The Headers setting lets you specify options that modify the response from the URL, e.g., to specify a file content type. The Post body setting enables you to switch the request method from GET to POST if the source URL requires it.
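To make the three settings concrete, here is a rough sketch of how they could map onto an HTTP request, using only Python's standard library. It is not how data.world performs the fetch. The function `build_request`, its parameters, the example URLs, the use of `Bearer` and `Basic` authorization schemes, and the JSON encoding of the POST body are all assumptions made for illustration.

```python
import base64
import json
import urllib.request
from typing import Dict, Optional


def build_request(url: str,
                  token: Optional[str] = None,
                  username: Optional[str] = None,
                  password: Optional[str] = None,
                  headers: Optional[Dict[str, str]] = None,
                  post_body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request reflecting the three settings."""
    all_headers = dict(headers or {})          # "Headers" setting

    # "Authentication" setting: token or username/password (assumed schemes).
    if token:
        all_headers["Authorization"] = f"Bearer {token}"
    elif username is not None and password is not None:
        raw = f"{username}:{password}".encode("utf-8")
        all_headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")

    # "Post body" setting: supplying a body switches the method from GET to POST.
    data = None
    method = "GET"
    if post_body is not None:
        data = json.dumps(post_body).encode("utf-8")
        all_headers.setdefault("Content-Type", "application/json")
        method = "POST"

    return urllib.request.Request(url, data=data, headers=all_headers, method=method)


# Hypothetical source URLs; nothing is sent over the network here.
get_request = build_request("https://example.com/data.csv",
                            token="abc", headers={"Accept": "text/csv"})
post_request = build_request("https://example.com/export", post_body={"q": 1})
```

The request stays a plain GET unless a body is supplied, which mirrors the description of the Post body setting above.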

The CSV settings section controls how your comma-separated values (CSV) files are handled. To access it, select Show to the right of the section.
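The individual CSV options are not listed here, so the sketch below only illustrates the kind of choices such settings typically cover. The parameters `delimiter`, `quotechar`, and `has_header`, the function `parse_csv`, and the sample text are assumptions for illustration, not the names of actual data.world settings.

```python
import csv
import io
from typing import List, Tuple


def parse_csv(text: str,
              delimiter: str = ",",
              quotechar: str = '"',
              has_header: bool = True) -> Tuple[List[str], List[List[str]]]:
    """Parse CSV text with configurable handling; returns (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    # Without a header row, generate placeholder column names.
    header = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return header, rows


# A semicolon-delimited file that quotes values with single quotes.
sample = "id;name\n1;'Smith; John'\n"
header, rows = parse_csv(sample, delimiter=";", quotechar="'")
```

With the default comma delimiter the same text would be read as a single column, which is why per-file handling options matter for files that deviate from the usual CSV conventions.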

Tabular files also have a tab for column metadata in their data dictionary, where you can rename the columns, change their format, and add descriptions for them.

Changing column names and adding descriptions is a great way to avoid the ambiguity that comes from having multiple columns with the same name. It also makes obscure column names understandable.

Changes to column names, descriptions, and data types propagate throughout data.world to every project that references the dataset, and the changes remain even if the data is updated from an external source.
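One way to picture why edits survive a refresh is to treat them as overrides stored separately from the data and reapplied on every sync. This is a conceptual sketch only and makes no claim about how data.world implements it; `column_overrides`, `apply_overrides`, and the column names are hypothetical.

```python
from typing import Dict, List

# Hypothetical stored edits, keyed by the column name found in the source file.
column_overrides: Dict[str, Dict[str, str]] = {
    "nm": {"name": "customer_name", "description": "Full name of the customer"},
    "dt": {"name": "order_date", "description": "Date the order was placed"},
}


def apply_overrides(source_columns: List[str],
                    overrides: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine freshly synced column names with the stored edits."""
    result = []
    for original in source_columns:
        edit = overrides.get(original, {})
        result.append({
            "source_name": original,
            "name": edit.get("name", original),        # fall back to the source name
            "description": edit.get("description", ""),
        })
    return result


# Two syncs of the same source produce the same edited view of the columns.
first_sync = apply_overrides(["nm", "dt", "total"], column_overrides)
second_sync = apply_overrides(["nm", "dt", "total"], column_overrides)
display_names = [column["name"] for column in second_sync]
```

Because the edits live outside the synced file, replacing the file's contents does not discard them.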