
Verifying your data with data inspectors

When you ingest a tabular data file on data.world, it is run through a series of inspections that validate both the structure and the content of the data in the file. If issues are found, the file is flagged with a warning. Warnings are indicated by either a yellow triangle or a red circle, depending on the severity. The warning flag can be found on the dataset overview page under the name of the file:

Screen_Shot_2018-12-30_at_4.25.37_PM.png

or, in the dataset or project workspace for the file, under Inspections in the About this file section:

Screen_Shot_2018-12-30_at_5.39.17_PM.png

The number of warnings is listed to the right of the flag. Yellow triangles are by far the most common. They alert you to potential problems with the data that might affect your ability to query it, or warn you that sensitive data (social security numbers, phone numbers, email addresses, etc.) was detected.
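If you want a rough idea of whether a file is likely to trigger a sensitive-data warning before you upload it, a local pre-check along the following lines can help. This is only a sketch: the patterns, the function name scan_for_sensitive_values, and the sample data are illustrative assumptions, and they are not the rules the data.world inspectors actually apply.

```python
import csv
import io
import re

# Deliberately simple, illustrative patterns (assumptions, not data.world's rules).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_sensitive_values(reader):
    """Return (row_number, column, kind) for every cell that matches a pattern."""
    findings = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # skip missing cells and overflow columns
            for kind, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.append((row_number, column, kind))
    return findings

# Hypothetical sample data standing in for a real file.
sample = io.StringIO(
    "name,contact,notes\n"
    "Ann,ann@example.com,none\n"
    "Bob,555-0100,SSN 123-45-6789 on file\n"
)
findings = scan_for_sensitive_values(csv.DictReader(sample))
print(findings)
```

To run it against a real file, pass csv.DictReader(open("yourfile.csv", newline="")) instead of the sample. The row and column in each result give you a starting point for deciding whether to remove or mask the value before upload.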

Very occasionally you will get a red flag, which indicates that there was an error on ingest and that data from the original file was lost. Possible reasons for the loss of data include:

  • The original file was corrupt.

  • There was a data type mismatch between the data type identified for the column and the data stored in it.

  • Data that you chose to connect to a specified linked data class had values that didn't match the linked data.

For a list of all the inspection warnings and errors, see the article Data Inspectors.

Whether you get a yellow warning or a red error, you have the option to correct it or ignore it. If you get yellow warnings, click on the flag to open the warning dialog box and view the warning types and locations. The dialog groups the warnings by type so you can review them one kind at a time. Each type of warning is labeled with what kind of issue it is, how many were found, and the location of each. Some flags are for issues you already know about and don't wish to fix. You can simply dismiss those warnings:

Screen_Shot_2018-12-30_at_4.30.22_PM.png

Note: Once you have dismissed a set of warnings, it will not show up for the file again, even if you delete and reimport the file or update it. The ONLY way to get the full list of warnings back is to delete the file and ingest it again with a different name.

If you wish to correct the issues with files that were originally added to data.world by a direct add, you can:

  1. download the file from data.world

  2. make the corrections (the locations in the warnings will help you find them)

  3. re-upload the file using the same name. Doing so overwrites the file on data.world; changing the name would create a new file instead.

For files that are synchronized from external services (such as cloud storage services), you will need to:

  1. update the file in the source system

  2. either select the Sync now button from the details window:

    verifying-your-data-1.png

    or

    from the workspace, choose the Sync now button on the right sidebar:

    verifying-your-data-2.png

Sometimes changes that you make to the data dictionary will trigger inspection errors. In the example below, one of the columns in the file being ingested holds the ages of shark-attack victims. Some of the values in the column are "20's", "30 or 40", etc. If I wanted to restrict the imported data to integers only so I could use arithmetic functions on it, I could go into the data dictionary for the file after import and change the column type from string to integer. Doing this would immediately raise a red flag in the inspections, because some of the data would be left out on re-ingest due to a data type mismatch:

Screen_Shot_2018-12-30_at_6.11.05_PM.png
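To see ahead of time which values would be lost by such a type change, you could check the column locally before editing the data dictionary. The sketch below is one way to do that, under stated assumptions: the column name Age, the function name find_non_integer_values, and the sample rows are hypothetical, and Python's int() is only an approximation of whatever integer parsing data.world applies on ingest.

```python
import csv
import io

def find_non_integer_values(rows, column):
    """Return (row_number, value) pairs whose value in `column` is not a plain integer."""
    problems = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        value = (row.get(column) or "").strip()
        if value == "":
            continue  # treat empty cells as missing, not as mismatches
        try:
            int(value)
        except ValueError:
            problems.append((row_number, value))
    return problems

# Hypothetical sample standing in for the shark-attack file.
sample = io.StringIO(
    "Case,Age\n"
    "1,25\n"
    "2,20's\n"
    "3,30 or 40\n"
    "4,\n"
    "5,41\n"
)
mismatches = find_non_integer_values(csv.DictReader(sample), "Age")
print(mismatches)
```

Each reported row is a value that would not fit an integer column. You can then decide whether to clean those values in the source file or leave the column as a string.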
