Publish & update files from S3
Using S3 as your repository, but still want to publish data on data.world for easy sharing, collaboration, visualization, and querying? No problem!
To import a file from an Amazon S3 bucket, use its link value if the file is public. If the file is private, generate a presigned URL with the AWS Command Line Interface (CLI). The steps are below:
Create your dataset container on data.world:
- Log in to https://data.world/.
- Click '+ Add datasets' at the top of any page.
- Name your dataset, select Open or Private, and click 'Create dataset'. You don't need to add files through the interface; we'll add them in the next step.
Add files to your dataset:
Add files to your dataset by passing each file's public or presigned S3 URL in the following command:
curl https://api.data.world/v0/datasets/<username>/<datasetName>/files \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <MY-API-TOKEN>' \
--data-binary '{"files": [
{ "name": "<fileName1>", "source": {"url": "<sourceURL1>" }},
{ "name": "<fileName2>", "source": {"url": "<sourceURL2>" }}
] }'
Where:
- username is the dataset owner's username. If you are not the owner, you must have permission to modify the dataset.
- datasetName is the ID of the dataset, which can be found in the URL path of the dataset.
- MY-API-TOKEN can be found under your profile settings within data.world, or by going to https://data.world/settings/advanced.
- fileName is what you'd like to name the file, and should include the file extension.
- sourceURL is the link for a public file, or a presigned URL for a private file (see the tips below).

Tip 1: URL for Public File
To make a file public on S3, navigate to it in the S3 console, right-click it, and select Make Public. Then open the file's Properties: its Link value is the URL to use when adding the file to data.world.
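If you'd rather not copy the link by hand, a public object's link generally follows S3's virtual-hosted-style format, https://<bucket>.s3.<region>.amazonaws.com/<key>. The bash sketch below builds such a URL. The function names s3_encode_key and s3_public_url are hypothetical, and the sketch assumes an ASCII object key. If its output ever differs from the Link value shown in the console, use the console's value.

```shell
#!/usr/bin/env bash
# Sketch only: build a virtual-hosted-style URL for an object that is already public.
# Hypothetical helper names; assumes the object key is plain ASCII.

# Percent-encode every byte except unreserved characters and '/'.
s3_encode_key() {
  local key=$1 out='' c i
  for (( i = 0; i < ${#key}; i++ )); do
    c=${key:i:1}
    case $c in
      [[:alnum:]._~/-]) out+=$c ;;
      # "'$c" makes printf use the character's numeric code, e.g. space -> %20.
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

# Usage: s3_public_url <bucket> <region> <key>
s3_public_url() {
  printf 'https://%s.s3.%s.amazonaws.com/%s' "$1" "$2" "$(s3_encode_key "$3")"
}
```

For example, s3_public_url mybucket us-east-1 'data/my file.csv' prints https://mybucket.s3.us-east-1.amazonaws.com/data/my%20file.csv, which can be used as sourceURL.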
Tip 2: URL for Private File
To generate a presigned URL for a private file on S3, you will first need to install and configure the AWS CLI. Once that is in place, use the following command to generate the URL for each file:
aws s3 presign <S3URI> --expires-in <expireTimeInSeconds>
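Where:
- S3URI is in the format s3://mybucket/myfile.
- expireTimeInSeconds is how long the presigned URL stays valid, in seconds (the AWS CLI default is 3600).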
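The presign step and the upload step can be combined. The bash sketch below presigns each private object with aws s3 presign and then sends a single request to the same files endpoint used in the curl command above. The function names and the DW_OWNER, DW_DATASET, and DW_TOKEN environment variables are hypothetical. It assumes the AWS CLI is already installed and configured, and that file names and URLs contain no control characters (only backslashes and double quotes are escaped).

```shell
#!/usr/bin/env bash
# Sketch only: presign private S3 objects and add them to a data.world dataset.
# Hypothetical inputs: DW_OWNER (username), DW_DATASET (datasetName), DW_TOKEN (API token).

# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the {"files": [...]} body from alternating <fileName> <sourceURL> arguments.
build_files_payload() {
  local body='{"files": [' sep='' name url
  while (( $# >= 2 )); do
    name=$(json_escape "$1"); url=$(json_escape "$2"); shift 2
    body+="$sep{\"name\": \"$name\", \"source\": {\"url\": \"$url\"}}"
    sep=', '
  done
  printf '%s]}' "$body"
}

# Usage: publish_from_s3 <expireTimeInSeconds> <fileName> <S3URI> [<fileName> <S3URI> ...]
publish_from_s3() {
  local expires=$1 name uri url
  shift
  local args=()
  if (( $# < 2 || $# % 2 != 0 )); then
    echo 'usage: publish_from_s3 <expireTimeInSeconds> <fileName> <S3URI> ...' >&2
    return 1
  fi
  while (( $# >= 2 )); do
    name=$1 uri=$2
    shift 2
    # aws s3 presign prints the presigned URL on stdout.
    url=$(aws s3 presign "$uri" --expires-in "$expires") || return 1
    args+=("$name" "$url")
  done
  # --data-binary @- makes curl read the JSON body from stdin.
  build_files_payload "${args[@]}" |
    curl -sS "https://api.data.world/v0/datasets/${DW_OWNER}/${DW_DATASET}/files" \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer ${DW_TOKEN}" \
      --data-binary @-
}
```

For example, with the three variables exported, publish_from_s3 3600 sales.csv s3://mybucket/sales.csv adds one file. Choose an expiry long enough for data.world to fetch the file after the request is sent.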