Dataset best practices

  • Create a dataset to house data, metadata, and supporting documentation when the data could be used in many different analysis projects. Create a project to house all of the work that goes into an analysis, and link in the datasets that support your work rather than duplicating them (see the article on datasets vs. projects for more information).

  • When uploading tabular data, we recommend CSV over Excel (xlsx), because we can support larger CSV file sizes for querying.

  • Remove any headers, footers, or notes from the data file, leaving only a single row of column headers above the data. Include any removed content in the dataset summary, or upload it as a separate notes file within the dataset. Keeping the data file basic (machine readable rather than human readable) helps data.world import and analyze the data as accurately as possible.

  • Tag and document your data so that others will better understand and use the data.

  • Use the data inspector to verify that your data was imported correctly and to get a view of the data’s quality.

  • Ensure your dataset is within the data.world size limits. You can also upload a zip file and extract it after upload, if the application offers that option. If you need data.world to support a larger dataset size for queries, please contact us and we'll do our best to accommodate you.

  • Files within a dataset are displayed alphabetically, so if the files in your dataset should be displayed in a particular order, name them accordingly (01_*.xls, 02_*.pdf, etc.) or use the summary to take others through your data and analysis.

  • Search first to see if the same dataset has already been uploaded, and if so, consider collaborating or linking directly to that dataset rather than uploading a duplicate. There’s nothing wrong with uploading your own copy, but sharing through collaboration or direct linking will keep that data’s ‘story’ in one place.
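
Two of the practices above (keeping a data file to a single header row plus data, and naming files so they sort in the intended order) can be prepared locally before upload. The following is a minimal Python sketch using only the standard library. The function names `split_data_and_notes` and `numbered_name` are hypothetical helpers for illustration and are not part of data.world. The sketch assumes you already know which line holds the column headers and how many trailing lines are notes, and that no quoted field contains a line break.

```python
import csv
import io


def split_data_and_notes(raw_text, header_row_index, footer_line_count=0):
    """Split an exported text file into clean CSV text and leftover notes.

    Hypothetical helper. Assumptions: header_row_index is the 0-based line
    index of the single column-header row, footer_line_count is the number
    of trailing lines that are notes rather than data, and no quoted field
    spans more than one line.
    """
    lines = raw_text.splitlines()
    end = len(lines) - footer_line_count

    # Everything before the header row and after the data is treated as notes.
    notes = lines[:header_row_index] + lines[end:]
    table_lines = lines[header_row_index:end]

    # Parse and re-serialize with the csv module, dropping fully blank rows.
    rows = [
        row for row in csv.reader(table_lines)
        if any(cell.strip() for cell in row)
    ]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)

    notes_text = "\n".join(line for line in notes if line.strip())
    return out.getvalue(), notes_text


def numbered_name(position, filename):
    """Prefix a file name with a zero-padded number so it sorts in order."""
    return f"{position:02d}_{filename}"
```

For example, calling `split_data_and_notes(text, header_row_index=3, footer_line_count=2)` on a report with three lines of preamble and two trailing note lines returns the CSV text to upload as the data file and the removed notes, which can go in the dataset summary or a separate notes file.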