
How to get your data into data.world

There are several ways to get your data or metadata into data.world and there is no one right choice. There are benefits to each method, and which you choose will depend on several factors:

  • What format is your data in?

  • Where is your data currently located?

  • What is the size of your data?

  • How often does it update?

  • What are you going to do with it?

Databases or data warehouses

If you work with a JDBC-compatible database or warehouse, you can use our API to directly sync your data into a dataset. Below is a view of the databases supported by our API, and a current list can be found on our database integrations page.

[Screenshot: databases supported by our API]

If you wish to leave your data at rest in your existing database or data warehouse, whether it’s on-premises or in the cloud, our Enterprise Tier supports virtualized access to that data source.

Local files

The best way to get files from your computer onto data.world is to upload them directly into a dataset. Files uploaded from a computer cannot be synced or updated automatically, but you can manually upload a new version to your dataset whenever needed, replacing the previous one. When a new version is uploaded, the older version remains available for auditing and versioning. More about uploading data from local files and versioning can be found in our article Adding data files.
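If you would rather script a manual upload than use the web interface, the sketch below builds (but does not send) an upload request using only Python's standard library. The base URL, the /uploads/{owner}/{id}/files/{name} path, and the bearer-token header are assumptions based on the public data.world REST API and should be checked against the current API reference. The function name and the example owner, dataset, and token values are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Assumed base URL of the data.world REST API; verify before use.
API_BASE = "https://api.data.world/v0"


def build_upload_request(owner, dataset_id, file_name, payload, token):
    """Build a PUT request that would upload `payload` as `file_name`.

    The endpoint path is an assumption, not a confirmed contract.
    """
    # Percent-encode each path segment so spaces and slashes are safe.
    url = "{}/uploads/{}/{}/files/{}".format(
        API_BASE,
        quote(owner, safe=""),
        quote(dataset_id, safe=""),
        quote(file_name, safe=""),
    )
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": "Bearer {}".format(token),
            "Content-Type": "application/octet-stream",
        },
    )


# Hypothetical values for illustration only; nothing is sent here.
request = build_upload_request(
    "alice", "sales", "q1 data.csv", b"a,b\n1,2\n", "TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the returned request to urllib.request.urlopen would perform the upload. Uploading under the same file name is how you would push a new version, as described above.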

Cloud-based storage

Documents that are stored in cloud-based storage services (e.g., in Google Drive, Box, Dropbox or Amazon S3) can be easily added to data.world with one of our integrations and set to sync so that they update automatically:

[Screenshot: cloud-based storage integrations]

As with manual updates, earlier versions of automatically updated files are kept for reference. More information about adding cloud-based files can also be found in the article Adding data files.

Excel spreadsheets

For Excel spreadsheets, data.world has created a dedicated add-in that's available on AppSource or from within Excel. The add-in allows you to work with your data in Excel while at the same time sharing it in a dataset with others who may not have or use Excel:

[Screenshot: the data.world add-in for Excel]

See our Excel integration page for more information. If you prefer, you can also upload your Excel spreadsheet into a dataset as you would any other file type, or put it in a cloud service like Google Drive, Box or Dropbox and add it to the dataset from there so that it syncs automatically. Versions of Excel files that are uploaded or synced are also kept for future reference.

Data from real-time sources via streaming

You might have data that updates in real time that you would like to put on data.world, such as log files, test metrics or tracking data. The best way to integrate this data into a dataset is to use data.world's streaming API. Unlike the methods previously mentioned, which pull data from the source on a regular schedule, the streaming API lets data be pushed into a dataset whenever the original data changes. Because it's triggered by data events rather than by a fixed schedule, the streaming API is the best way to manage real-time data. You can read more about streaming in our API Quickstart guide.
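The push model can be sketched as follows. This minimal example uses only Python's standard library: it encodes a batch of events as JSON lines and builds (but does not send) a POST request for a stream. The /streams/{owner}/{id}/{stream_id} path and the application/json-l content type are assumptions based on the public data.world REST API, so confirm them in the API reference. The function names, the stream name, and the sample events are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL of the data.world REST API; verify before use.
API_BASE = "https://api.data.world/v0"


def encode_json_lines(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    lines = (json.dumps(record, sort_keys=True) for record in records)
    return "\n".join(lines).encode("utf-8")


def build_stream_request(owner, dataset_id, stream_id, records, token):
    """Build a POST request that would append `records` to a stream.

    The endpoint path and content type are assumptions.
    """
    url = "{}/streams/{}/{}/{}".format(
        API_BASE,
        quote(owner, safe=""),
        quote(dataset_id, safe=""),
        quote(stream_id, safe=""),
    )
    return urllib.request.Request(
        url,
        data=encode_json_lines(records),
        method="POST",
        headers={
            "Authorization": "Bearer {}".format(token),
            "Content-Type": "application/json-l",
        },
    )


# Hypothetical log events; in practice this would be called from whatever
# code observes the change in the source data.
events = [
    {"level": "info", "msg": "job started"},
    {"level": "error", "msg": "job failed"},
]
request = build_stream_request("alice", "logs", "app-events", events, "TOKEN")
print(request.data.decode("utf-8"))
```

The request is built only when there are new events to send, which is what distinguishes this approach from a scheduled pull.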

For those less comfortable with working directly with an API, data.world also integrates with several superconnectors like IFTTT, KNOTS, Singer or Stitch. While easier to use, they are less flexible and versatile than our own streaming API. You can see the full list on our superconnector integrations page.

[Screenshot: superconnector integrations]

Data via a URL or RESTful API

Another common source of data is a URL or RESTful API available on the internet. For example, if you have a Google Sheets doc, you can add it to a data.world dataset. As long as the data is reachable at a URL, you can sync it to data.world with the option to add from a URL, even if it's on a password-protected site. Detailed instructions for adding and syncing data from a URL can be found in the article Adding files from a URL. If you do not own the web data that you'd like to bring into data.world, you can find out more about licensing in the article Licensing and data you found.
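For readers who prefer to script this step, here is a minimal sketch that builds (but does not send) a request asking data.world to fetch a file from a URL. It uses only Python's standard library. The /datasets/{owner}/{id}/files path and the JSON body shape (a "files" list whose entries have a "name" and a "source" with a "url") are assumptions based on the public data.world REST API and should be verified against the API reference. The function name and all example values are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL of the data.world REST API; verify before use.
API_BASE = "https://api.data.world/v0"


def build_add_from_url_request(owner, dataset_id, file_name, source_url, token):
    """Build a POST request that would register `source_url` as a file.

    The endpoint path and body layout are assumptions.
    """
    url = "{}/datasets/{}/{}/files".format(
        API_BASE,
        quote(owner, safe=""),
        quote(dataset_id, safe=""),
    )
    body = {"files": [{"name": file_name, "source": {"url": source_url}}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer {}".format(token),
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only; nothing is sent here.
request = build_add_from_url_request(
    "alice", "sales", "prices.csv", "https://example.com/prices.csv", "TOKEN"
)
print(request.full_url)
```

This sketch does not cover credentials for password-protected sources. See the article Adding files from a URL for how those are supplied.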

If you have data behind an API that you'd like to put on data.world (e.g., data from Salesforce, Facebook Ads or Google Ads), the best way to get it into a dataset is to use one of the superconnectors shown above. For more information, see our sales and marketing app integrations.

On-premises data

In addition to data that is available to data.world via cloud sources or APIs, some data that you want to make accessible on data.world might be available only on your corporate network or behind a firewall. For customers who need to catalog data behind a firewall, we offer our Virtual Data Connector as an appliance that is hosted at your site and communicates with data.world via a secure bridge protocol. This option is available to our Enterprise Tier customers. If you have this need, please contact our sales team at sales@data.world to discuss your options.