
About data profiling

What is data profiling?

Data profiling is the process of creating high-level summaries of data content and quality. These summaries support decision-making and help you understand and trust your data.

Once you have profiled your data, you can perform data quality checks.

  • The summary statistics generated via profiling help you understand and trust your data by providing a quick look at the data.

  • By viewing the summary statistics, you can quickly see if your data has quality issues like incomplete data, data with an incorrect format (for example, a phone number with too few digits), or data outside of a normal range (for example, a maximum value that is too high or a minimum value that is too low).

  • Profiling is especially important for larger datasets where visual spot-checks are difficult – profiling helps automate your data quality routine, leading to more trust and better decision-making.
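As a rough illustration of the kinds of quality issues listed above, the following Python sketch flags missing values, phone numbers with too few digits, and out-of-range values in a small in-memory sample. It is not how data.world performs its checks; the sample rows, the find_quality_issues helper, the 10-digit minimum, and the 0 to 120 age range are all assumptions made for this example.

```python
import re

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"phone": "555-867-5309", "age": 34},
    {"phone": "555-12", "age": 29},
    {"phone": None, "age": 412},
]


def find_quality_issues(rows, min_phone_digits=10, age_range=(0, 120)):
    """Return a list of (row_index, column, issue) tuples.

    The thresholds are illustrative assumptions, not data.world defaults.
    """
    issues = []
    low, high = age_range
    for i, row in enumerate(rows):
        phone = row.get("phone")
        if phone is None:
            # Incomplete data: the value is missing entirely.
            issues.append((i, "phone", "missing"))
        elif len(re.sub(r"\D", "", phone)) < min_phone_digits:
            # Incorrect format: strip non-digits, then count what is left.
            issues.append((i, "phone", "too few digits"))

        age = row.get("age")
        if age is None:
            issues.append((i, "age", "missing"))
        elif not low <= age <= high:
            # Out of range: the value falls outside the expected bounds.
            issues.append((i, "age", "out of range"))
    return issues


issues = find_quality_issues(rows)
for issue in issues:
    print(issue)
```

On a large dataset, checks like these are only practical when automated, which is the gap profiling is meant to fill.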

How do I set up data profiling in data.world?

There are two options for setting up data profiling:

For both features, the system creates summary statistics about the data, such as the mean, minimum, and maximum values and null counts.
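To make these summary statistics concrete, here is a minimal Python sketch that computes them for a single numeric column. It only illustrates the idea and does not represent data.world's implementation; the profile_column helper, its output keys, and the sample values are assumptions made for this example.

```python
from statistics import mean


def profile_column(values):
    """Summarize one numeric column; None is treated as a null."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        # The statistics are undefined when every value is null.
        "mean": mean(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical column with two missing values.
profile = profile_column([12.5, None, 7.5, 40.0, None])
print(profile)
```

Nulls are counted separately and excluded from the mean, minimum, and maximum, so a column with many missing values shows up as a high null count instead of skewing the other statistics.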