
Evaluating data from search results

If you're looking for search syntax and terminology, see our article on advanced search.

This article explains how to evaluate the results you get from a search to decide whether they are appropriate for use in your project. For answers to general search questions, or to learn how to use our search bar, see the article on finding data resources. To learn how to restrict your search results using filters or facets, see the article on filtering search results.

Finding data is only the first step to using it. With so much information available, it can be difficult to tell good data from bad. Some of the criteria you might use to decide whether the data you find is appropriate for your purpose are its:

  • Certification (status)

  • Format

  • Provenance (source)

  • Recency

  • For datasets: Popularity (frequency of use) and content

As discussed in the article on finding data resources, every resource card shows the format of the resource in the upper left corner of the card. Dataset and project resource cards also show the provenance, recency, and popularity of the data, along with its certification if one exists. For a detailed explanation of everything shown on a resource card, see that article.
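If you have copied the details from several resource cards into a script or notebook, the criteria above can be combined into a simple ordering. The following is only a sketch: `ResourceCard`, its field names, and the ranking of statuses are hypothetical choices made for illustration, not part of data.world or its API, and you should adjust the priorities to suit your project.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumed preference order for the default statuses (lower is better).
# The article does not rank the statuses; this ordering is illustrative.
STATUS_RANK = {"Approved": 0, "Pending": 1, "Warning": 3, "Deprecated": 4, "Rejected": 5}
UNCERTIFIED_RANK = 2  # assumption: no status sits between Pending and Warning


@dataclass
class ResourceCard:
    """Hypothetical record of what you read off a resource card."""
    name: str
    resource_format: str
    owner: str
    last_updated: date
    status: Optional[str] = None  # None when the resource is not certified
    project_count: int = 0        # popularity: projects using the dataset


def sort_key(card: ResourceCard):
    """Order by status first, then most recent update, then popularity."""
    status_rank = STATUS_RANK.get(card.status, UNCERTIFIED_RANK)
    return (status_rank, -card.last_updated.toordinal(), -card.project_count)


def rank_cards(cards: List[ResourceCard]) -> List[ResourceCard]:
    """Return the cards with the most promising candidates first."""
    return sorted(cards, key=sort_key)
```

Putting certification first reflects the idea that a status assigned by your organization is the strongest signal; recency and popularity only break ties between resources with the same status.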

Certification

If your organization certifies its data by assigning a status, the resource card shows the status of the resource. The default statuses are Pending, Approved, Deprecated, Rejected, and Warning, and each is flagged with an icon on the search result:

Image_2019-10-08_at_10.30.27_AM.png

Further information on the certification is provided in the summary section of the data resource overview:

Image_2019-10-08_at_10.32.43_AM.png

Certifications are also customizable.

Format

A list of data resource names alone is not much help in choosing appropriate resources when there are multiple screens of results to sift through. When you are looking for tabular data or datasets, for example, it's handy both to be able to see the format of the data resource at a glance:

Image_2019-10-08_at_10.50.33_AM.png

and also to be able to restrict your search so you only see certain types of resources. See the article on filtering search results for more information about restricting your search results. If you do not recognize the format of a data resource by its icon, there is a guide to all of the icon types used by data.world.

Provenance

If a resource doesn't have a certification flag, another way to judge its quality is to look at its provenance. If the source is an official agency, or a person or organization known to be credible, it's a good bet the data is sound. Provenance in data.world has two components: the creator of the resource and its owner. Both are indicated on the results card:

Image_2019-10-08_at_11.55.23_AM.png

Recency

For data from a continually changing source (e.g., a company's sales records in a dataset), knowing when the data was last updated can also help you decide between different representations of the same data. In this example, both datasets are from California Health and Human Services and both are concerned with tobacco sold to minors, but the first was updated 24 days ago and the second was last updated three years ago:

Image_2019-10-08_at_12.10.49_PM.png
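If you note the last-updated dates of competing resources, the comparison can be made explicit. This is a minimal sketch using only the Python standard library; the function names, the dataset labels, and the dates are made up for illustration and do not come from data.world.

```python
from datetime import date, timedelta
from typing import Dict


def days_since_update(last_updated: date, today: date) -> int:
    """Return the whole number of days between last_updated and today."""
    return (today - last_updated).days


def freshest(last_updated_by_name: Dict[str, date], today: date) -> str:
    """Return the name of the resource that was updated most recently."""
    return min(
        last_updated_by_name,
        key=lambda name: days_since_update(last_updated_by_name[name], today),
    )


# Illustrative values only: two versions of the same data, one updated
# 24 days before an arbitrary "today" and one updated three years before.
today = date(2019, 10, 8)
candidates = {
    "recent version": today - timedelta(days=24),
    "older version": date(2016, 10, 8),
}
best = freshest(candidates, today)
```

Recency matters most for sources that change continually; for a static historical dataset, an old update date is not necessarily a warning sign.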

Popularity and content

For datasets, there are two additional indicators of the potential value of the resource: frequency of use and content. Frequency tells you how many projects use the dataset, and content shows you the number of files and tables it contains:

Image_2019-10-08_at_12.21.16_PM.png