
Adding files to a dataset

Common sources of data for a dataset include:

  • Your local drive

  • The cloud

  • Live data connections and data extracts

  • URLs

Adding data files from a local drive

The easiest way to add files from a local drive is to drag and drop them onto the Add data box. Drag and drop lets you add multiple files to a dataset at once:

[Screenshot: dragging multiple files onto the Add data box]

You can also add files by selecting the Add data button or by clicking anywhere in the Add data box. The Add data button opens the Add data window, which lists all the ways you can add data to a dataset. In addition to the Upload from computer option, you can still drag and drop files into your dataset. Upload from computer works like drag and drop, but it lets you select only one file at a time, so drag and drop is much more efficient for adding multiple files from a local drive.

Adding data files from the cloud

To add data files from the cloud, you first need to configure your cloud service account to allow access by data.world. To configure a service, go to https://data.world/integrations/categories/import, or click the Add data button on the dataset overview page and follow the link to the integrations page:

[Screenshot: the Add data window with the link to the integrations page]

Once you have configured a cloud drive, it remains connected for further use. There is no limit to the number of cloud drives you can have configured.

You can select and add multiple cloud-based files at a time, and files sourced from the cloud can also be set to sync regularly, so your dataset always has the most up-to-date version of each file.

[Screenshot: selecting cloud files and their sync options]

Sync options can be changed at any time from the overview page of the dataset:

[Screenshot: sync options on the dataset overview page]

If you update a file manually (by deleting and re-uploading it) or a sync updates it automatically, all previous versions of the file are preserved in data.world and can be downloaded at any time. This keeps your data available for auditing, accountability, and versioning. To access past versions of your data, go to the Activity tab on the dataset, select Versions, and then click the three-dot menu to the right of the version you want to recover:

[Screenshot: the Versions list on the Activity tab, with the three-dot menu]

If you'd like to link data through a URL, see the article Adding files from a URL.

Live data connections and data extracts

When you select Add data on a dataset, you'll be taken to a screen with a variety of options, including your virtual connections listed under MY DATA SOURCES:

[Screenshot: the Add data screen showing a virtual connection under MY DATA SOURCES]

When you select your connection name, you can choose to create the dataset with either a live connection or a data extract:

[Screenshot: choosing between a live connection and a data extract, with the extract option greyed out]

Data extracts are not available for all data sources. If the option is not available for your data source, it is greyed out, as shown above.

The main differences between a live table and a data extract are as follows:

With a live table:

  • Data continues to live at its source and will not be ingested into data.world.

  • Any queries executed against this dataset will be translated and executed in the data source.

  • Users may select tables to pull into the dataset, but cannot specify a SQL query.

With a data extract:

  • Data will be pulled into data.world and processed into our internal representation.

  • You can set it to refresh from the source at specific intervals.

  • Users can select tables to pull into the dataset or specify a SQL query whose results should be pulled into the dataset.
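The practical difference between the two modes is when the source is read. As a rough analogy only (this is not how data.world is implemented), the sketch below uses two in-memory SQLite databases: a live query reads the source every time it runs, while an extract copies query results once and does not see later changes until it is taken again. The `orders` table, its rows, and the helper names `live_query` and `take_extract` are hypothetical.

```python
import sqlite3

# Stand-in for an external data source (hypothetical "orders" table).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

def live_query(sql):
    """Live table analogy: every query runs against the source at query time."""
    return source.execute(sql).fetchall()

def take_extract(sql):
    """Extract analogy: copy the query's results into a separate local store."""
    local = sqlite3.connect(":memory:")
    rows = source.execute(sql).fetchall()
    local.execute("CREATE TABLE extract (id INTEGER, amount REAL)")
    local.executemany("INSERT INTO extract VALUES (?, ?)", rows)
    return local

# An extract may be defined by a SQL query, not just a table name.
extract = take_extract("SELECT id, amount FROM orders WHERE amount > 5")

# The source changes after the extract was taken.
source.execute("INSERT INTO orders VALUES (3, 99.0)")

live_count = live_query("SELECT COUNT(*) FROM orders")[0][0]
extract_count = extract.execute("SELECT COUNT(*) FROM extract").fetchone()[0]
print(live_count, extract_count)  # 3 2
```

Calling `take_extract` again on a schedule corresponds, loosely, to setting an extract to refresh at specific intervals.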

After choosing live or extract, you might be prompted to select a database, then a schema, and then tables, or you might be shown a list of tables directly. The options you see depend on the data source.

When you reach table selection, you can add one table or several at the same time. To add all of the tables in the list, select Name at the top of the list:

[Screenshot: the table selection list]

Select Import ... tables. When the tables have been linked, you will receive a confirmation along with a reminder of whether the data comes from a live connection or was brought in with a data extract:

[Screenshot: confirmation that the tables were added]

Finally, you will get a confirmation that your dataset has been created. When you close that window, you'll be taken to your dataset overview page, where you can document the dataset and edit its metadata:

[Screenshot: the dataset overview page]

Adding files from a URL

Have data on another site that you'd like to import to data.world for easy sharing, collaboration, and querying? No problem! As long as you have a direct URL to one of our supported file types and permission to access it, data.world can import it easily. This is a great solution for importing data from the web, data portals, cloud storage apps, GitHub, and API endpoints. Even better, if the files change at the source, data.world can update them automatically.

To add data from a URL, click the Add data button on the dataset overview page and select the Sync from URL option:

[Screenshot: the Sync from URL option in the Add data window]

If the URL does not require authentication, all you need to do is enter the shareable link (provided by the data source) in the source URL field and select Continue:

[Screenshot: entering a source URL]
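Some cloud storage apps hand out shareable links that open an HTML preview page rather than the file itself, which is not the direct URL data.world needs. One quick way to check a link before pasting it in is to send a HEAD request and look at the Content-Type. The sketch below is a heuristic under that assumption, not a check data.world performs; `probe` and `looks_like_direct_file` are hypothetical helpers. Some servers reject HEAD requests (for example with 405), in which case an ordinary GET gives the same header.

```python
import urllib.request

def probe(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request and return (status, Content-Type) without downloading the body."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.headers.get("Content-Type", "")

def looks_like_direct_file(content_type: str) -> bool:
    """Heuristic: a preview or landing page is usually HTML; a direct file link usually is not."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type not in ("", "text/html")

# Example (needs network access):
# status, ctype = probe("https://example.com/data.csv")
# print(status, looks_like_direct_file(ctype))
```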

For sources that require permission to access the data, first paste the URL, then choose authentication and select the appropriate option from the dropdown menu: OAuth, Token, or Username and password (also known as Basic):

[Screenshot: authentication options for a URL source]
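When you choose Token or Username and password, data.world sends the credentials for you. For reference, the sketch below shows how these two schemes are commonly encoded as an HTTP `Authorization` header: Basic as defined in RFC 7617, and a token using the Bearer scheme from RFC 6750. It is an assumption that a given source expects the Bearer prefix for tokens; some APIs use a different prefix or a custom header, so check the source's documentation. OAuth is left out because it involves an authorization flow to obtain the token rather than a fixed header. The helper names are hypothetical.

```python
import base64

def basic_auth_header(username: str, password: str) -> dict:
    """HTTP Basic: base64-encode "username:password" and prefix it with "Basic "."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

def bearer_token_header(token: str) -> dict:
    """Token auth in the common Bearer form (assumed; some APIs differ)."""
    return {"Authorization": f"Bearer {token}"}

print(basic_auth_header("Aladdin", "open sesame"))
# {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```

Base64 is an encoding, not encryption: anyone who can read the header can recover the password, which is one reason to keep these values in the Authentication setting rather than in headers that collaborators can see.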

Headers and a POST body are used to make API calls to sites that support REST APIs. See the support documentation for those sites for the required values. Although you can include authentication information in headers, we strongly recommend using the Authentication setting instead, because collaborators on the dataset or project can see the information in headers (including logins and passwords), but not Authentication values.
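For a REST API source, the Headers and POST body fields correspond to parts of an ordinary HTTP POST request. The sketch below builds such a request with Python's standard library to show where each value goes; the endpoint URL, header values, and JSON fields are hypothetical placeholders, not values any particular API requires. It deliberately contains no credentials, in line with the recommendation to keep those in the Authentication setting.

```python
import json
import urllib.request

# Hypothetical endpoint and payload; use the values the API's own docs require.
url = "https://api.example.com/v1/reports"
body = json.dumps({"start_date": "2024-01-01", "format": "csv"}).encode("utf-8")

request = urllib.request.Request(
    url,
    data=body,  # the POST body
    headers={
        "Content-Type": "application/json",  # non-secret headers only
        "Accept": "text/csv",
    },
    method="POST",
)

# Inspect what would be sent; no network call is made here.
print(request.get_method(), request.full_url)  # POST https://api.example.com/v1/reports
# To actually send it: urllib.request.urlopen(request)
```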

When you have finished entering the required information for your URL, click Continue and you'll be prompted to name the data file as it will appear on data.world. Choose carefully, as this name cannot be edited later; changing it would require deleting the connection and creating a new one with the new name.

If data.world encounters an issue with the source URL, we will display an error requesting you to verify the link and the settings. Hovering over the ? next to the error message will show you the exact error returned:

[Screenshot: a source URL error message with the ? tooltip]

Clicking Edit will bring up the same dialog you used to enter the initial parameters so you can make changes:

[Screenshot: editing the URL source settings]

Note that for an authentication failure, the error returned is sometimes 404 Not Found rather than 403 Forbidden, even though the URL is correct. This is a security feature of some APIs: answering 404 avoids revealing whether a protected resource exists.
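When a URL source fails, the status code usually points to the cause. The sketch below is illustrative rather than data.world's own logic: it fetches a URL's status with Python's standard library and maps common codes to a next step. The 404 branch reflects the behavior described above, where an authentication failure can surface as 404. The helper names and messages are hypothetical.

```python
import urllib.error
import urllib.request

def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a GET on url, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code the server actually returned.
        return err.code

def explain_status(status: int) -> str:
    """Map a status code to a likely next step (illustrative only)."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "credentials missing or rejected: check the Authentication setting"
    if status == 403:
        return "forbidden: check that the account has permission to read this resource"
    if status == 404:
        # Some APIs return 404 instead of 403 so they don't reveal that the resource exists.
        return "not found: verify the URL, then the credentials if the URL is correct"
    return f"unexpected status {status}: check the URL and settings"

# Example (needs network access):
# print(explain_status(fetch_status("https://example.com/data.csv")))
```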

Once your files are added, configure them to update regularly through the automated sync settings.