
Match and extend your data

After you upload data to data.world, we analyze each field's data type and format and try to match it to the data types that data.world recognizes. If a field contains a common, general type of information, we suggest related information that you can use to enhance your data and research. A potential match is indicated by an interactive green triangle in the corner of that field:

360007578694-mceclip5.png

Clicking the corner opens a menu indicating that potential matches exist for this column of data.

state_data.png

Match suggestions modal

Clicking the highlighted menu item presents a list of potential matches for this column. In many cases the list contains only a single item, although multiple matches are possible, as shown below.

360007578734-mceclip7.png

Related columns

Adding a matched column will sometimes make additional column choices available. In the example above, including the "us_state" matched column presents a number of other "Related" columns. Related columns can also be added to your table to provide additional levels of aggregation. They generally represent containing entities: in the case of states, the Census Divisions and Regions that contain each state:

360007578754-mceclip8.png
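To make the idea of containing entities concrete, here is a minimal sketch of how a state rolls up into a Census Division and a Census Region. The dictionaries and the `related_columns` helper are hypothetical names used only for illustration; they are not part of data.world, which adds these columns for you through the interface.

```python
# Hypothetical lookup tables for illustration only; data.world supplies
# this information through its matched "Related" columns.
DIVISION_OF_STATE = {
    "California": "Pacific",
    "Oregon": "Pacific",
    "Texas": "West South Central",
}
REGION_OF_DIVISION = {
    "Pacific": "West",
    "West South Central": "South",
}


def related_columns(state):
    """Return the containing Census Division and Region for a state."""
    division = DIVISION_OF_STATE[state]
    region = REGION_OF_DIVISION[division]
    return {"census_division": division, "census_region": region}


oregon = related_columns("Oregon")
print(oregon)
```

Each level contains the one below it, which is what makes the related columns useful for grouping rows at a coarser level than the original column.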

Clicking "Add related column" will add this information to your table. Also note that columns can be renamed after adding them to your table using the pencil icon.

360007635693-mceclip9.png

Updated table

When you click "Done", the file is reprocessed and the new columns are inserted. This can take a few moments depending on the size of the table, although it is generally quick.

360007578814-mceclip10.png

The reprocessed table now contains the selected columns along with all of the original data columns. Note that if a new version of the original file is uploaded (with additional rows of data, or additional columns), the file will automatically be matched using the same algorithm.

360007635713-mceclip11.png

Informational match popup

Now that we have our match columns, we can gain some additional context about these rows. Clicking on the blue "bubbles" will bring up a small popup with some additional information.

360007578834-mceclip12.png

While these pop-ups only provide some basic information today, we will be working to flesh these out over time. Our ultimate goal is for these to contain complete descriptions, maps, flags, population counts, etc.

Removing matched columns

Match columns may be removed at any time from the column header menu:

360007659674-mceclip0.png

Query aggregation (advanced)

By including the containing information in this table, we now have additional data that we can aggregate across. Let's jump in and see how many states our data has in each of the various Census Divisions. First, choose "Query this file".

360007635733-mceclip14.png

A simple SQL aggregation query across this column tells us that we have two states in the Pacific Census Division.

360007635433-mceclip3.png
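For readers who want to try this kind of aggregation outside the data.world query editor, the following is a minimal sketch using Python's built-in sqlite3 module. The table name `state_data`, the column name `census_division`, and the sample rows are assumptions made for illustration, and SQLite stands in for data.world's own SQL engine, so the syntax on data.world may differ slightly.

```python
import sqlite3

# Hypothetical sample rows: (state, matched Census Division).
rows = [
    ("California", "Pacific"),
    ("Oregon", "Pacific"),
    ("Texas", "West South Central"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state_data (state TEXT, census_division TEXT)")
conn.executemany("INSERT INTO state_data VALUES (?, ?)", rows)

# Count the states in each Census Division.
query = """
    SELECT census_division, COUNT(*) AS state_count
    FROM state_data
    GROUP BY census_division
    ORDER BY census_division
"""
division_counts = dict(conn.execute(query).fetchall())
conn.close()

print(division_counts)
```

With these sample rows, the Pacific division is counted twice because two of the three states belong to it.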

Query join (advanced)

If you have two tables, both of which have been matched to the same class, then you can easily join those tables using SQL. In the following example, two tables which have been matched to states will join on the entity by default.

360007635453-mceclip4.png
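The sketch below illustrates the idea of joining two tables through a shared matched entity, again using Python's built-in sqlite3 module. The table names, column names, and the `US-CA` style identifiers are hypothetical. Note that SQLite needs an explicit `ON` condition; the sketch spells out what data.world's default entity join is assumed to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table stores full state names, the other abbreviations.
    -- The matched us_state column (hypothetical identifiers) is the shared key.
    CREATE TABLE capitals (state_name TEXT, us_state TEXT, capital TEXT);
    CREATE TABLE divisions (state_abbr TEXT, us_state TEXT, census_division TEXT);

    INSERT INTO capitals VALUES
        ('California', 'US-CA', 'Sacramento'),
        ('Oregon', 'US-OR', 'Salem');
    INSERT INTO divisions VALUES
        ('CA', 'US-CA', 'Pacific'),
        ('OR', 'US-OR', 'Pacific'),
        ('TX', 'US-TX', 'West South Central');
""")

# Join on the matched entity rather than on the raw, differently formatted columns.
joined = conn.execute("""
    SELECT c.state_name, c.capital, d.census_division
    FROM capitals AS c
    JOIN divisions AS d ON c.us_state = d.us_state
    ORDER BY c.state_name
""").fetchall()
conn.close()

for row in joined:
    print(row)
```

Because the join uses the matched entity, it works even though one table spells out state names and the other uses abbreviations. Texas is dropped from the result because it has no row in the hypothetical `capitals` table.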

Match logic

Matches are discovered based on the content of the column, not the name of the column.

Example: If a column contains values such as 78703, 78731, 00501, and 24151, then it will be recognized as a Zip Code, even if the column is not named zipcode or postalcode.
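As a rough illustration of content-based matching, the sketch below flags a column as a likely Zip Code column when most of its values look like five-digit codes, regardless of the column name. This is not data.world's actual matching algorithm; the function name, the regular expression, and the 90% threshold are assumptions chosen for the example.

```python
import re

# Assumption: a Zip Code is exactly five digits (leading zeros allowed).
ZIP_PATTERN = re.compile(r"\d{5}")


def looks_like_zip_column(values, threshold=0.9):
    """Return True when at least `threshold` of the non-empty values are five digits."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if ZIP_PATTERN.fullmatch(v))
    return hits / len(non_empty) >= threshold


# Hypothetical columns: the name is deliberately misleading in both cases.
columns = {
    "mystery_field": ["78703", "78731", "00501", "24151"],
    "zipcode": ["Austin", "Boston", "Denver"],
}
detected = {name: looks_like_zip_column(vals) for name, vals in columns.items()}
print(detected)
```

The column named `mystery_field` is detected because of its contents, while the column named `zipcode` is not, which mirrors the rule that the name of a column plays no part in the match.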

Available matches:

You can find the full up-to-date list of matches in the ddw/ontology-v0 dataset. Specifically, this query gives a full accounting of available matches.