Verifying your data with data inspectors

When you ingest a tabular data file on data.world, it is run through a series of inspections that validate both the structure and the content of the data in the file. If issues are found, the file is flagged so that users can take action to fix them.

The warning flag can be found on the Dataset details page under the name of the file:

Screen_Shot_2018-12-30_at_4.25.37_PM.png

The indicator also appears next to Inspections in the About this file section of the dataset or project workspace for the file:

Screen_Shot_2018-12-30_at_5.39.17_PM.png

Types of indicators

Issues are flagged with one of two indicators:

  • Warnings, indicated by a yellow triangle: By far the most common, yellow triangles alert you to potential problems with the data that might affect your ability to query it, or warn you that sensitive data (social security numbers, phone numbers, email addresses, etc.) was detected.

  • Severe errors, indicated by a red circle: Very occasionally you will get a red flag, which indicates that there was an error on ingest and data from the original file was lost. Possible reasons for the loss of data include:

    • The original file is corrupt.

    • There was a data type mismatch between the data type identified for the column and the data stored in it.

    • Data that you chose to connect to a specified linked data class had values that didn't match the linked data.

For a list of warnings and errors, see Data Inspections.
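If you want a rough idea of whether a file is likely to trigger the sensitive-data warning before you upload it, you can scan it locally. The sketch below is a minimal illustration, not data.world's actual inspection logic: the function name `scan_for_sensitive_data` and both regular expressions are assumptions made for this example, and they cover only US-style social security numbers and email addresses.

```python
import csv
import io
import re

# Illustrative patterns only. These are assumptions for this sketch,
# not the rules that data.world's inspectors apply.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def scan_for_sensitive_data(csv_text):
    """Return {column_name: set of pattern labels found in that column}."""
    findings = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, value in row.items():
            # Skip missing cells and overflow fields, which are not plain strings.
            if not isinstance(value, str):
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(label)
    return findings


# Made-up sample data for demonstration.
sample = "name,contact,tax_id\nAda,ada@example.com,123-45-6789\nBob,n/a,n/a\n"
print(scan_for_sensitive_data(sample))
```

Columns that show up in the result are candidates for removal or masking before upload. A clean result here does not guarantee that the file will pass inspection.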

Reviewing warnings and errors

  1. When you click the warning or error indicator, a window opens that lists all the issues captured by the data inspector. Review the issues and either fix them or click Dismiss if you do not wish to be notified about them.

    Important

    Once a set of warnings is dismissed, the warnings will not show up for the file again, even if you update the file or delete and reimport it. The ONLY way to get the full list of warnings back is to ingest the file again under a different name.

  2. To correct issues in a file that was originally added to data.world by a direct add, download the file, make the corrections, and upload the file again.

  3. For files that are synchronized from external services (such as cloud storage services), you will need to update the file in the source system and click the Sync now button to sync the changes.

Sometimes changes made to the data dictionary can cause errors in the data. For example, if you accidentally convert a String column to Integer after ingesting a file, the inspections immediately show a red flag, because some of the data would be left out on re-ingest due to a data type mismatch:
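To see why a String-to-Integer change can lose data, it helps to check which values in the column would fail an integer conversion. The following is a minimal sketch under stated assumptions: `find_integer_mismatches` is a hypothetical helper written for this example, and Python's `int()` stands in for whatever parsing data.world performs on re-ingest, which may differ in its details.

```python
def find_integer_mismatches(values):
    """Return the (row_index, value) pairs that cannot be parsed as integers."""
    mismatches = []
    for index, value in enumerate(values):
        try:
            int(value.strip())
        except ValueError:
            # This value would not survive a String -> Integer conversion.
            mismatches.append((index, value))
    return mismatches


# Made-up column values; the last entry ends in the letter O, not a zero.
zip_codes = ["30301", "02139", "N/A", "9021O"]
print(find_integer_mismatches(zip_codes))
```

Running a check like this on a downloaded copy of the file before you change a column's type shows which rows would be affected. Note that values with leading zeros, such as `02139`, parse successfully but lose the leading zero as integers, which is another reason to keep such columns as String.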