Docs portal

3rd party tools

Excel and data.world

One of the most powerful ways to work with data on data.world is to do it without ever leaving your favorite application. There are already many applications which have data.world integrations built for them, and more are constantly being added. For a comprehensive list of the current integrations available see our Integrations page. This article covers using our integration for one of the most common data analysis tools: Microsoft Excel. This article was written on a Mac using Microsoft Excel 2016 Desktop for Mac so the screens shown may vary from yours if you are using Excel for Windows.

Installing the data.world add-in to Excel

To start using Excel with data.world, you need to install the add-in in Excel and then enable the integration in data.world. You can get the add-in on the Microsoft app store page. On the app store page, enter data.world into the search bar and click Add:

Screen_Shot_2018-07-23_at_10.02.08_AM.png

Once the add-in is installed you'll have three new items on the right side of the home tab on your Excel menu bar:

Screen_Shot_2018-07-23_at_10.13.11_AM.png

When you have the add-in installed, go to the Excel integration page on data.world and select Enable integration:

Screen_Shot_2019-12-16_at_8.01.23_PM.png
Authentication

To use Excel with data.world you'll need to either sign in to your data.world account or get an access code/API token from within data.world. Here are instructions for both.*

To use your data.world account, select one of the data.world icons from the top right of your Excel screen. When the login page appears, select Sign In, and enter your data.world credentials. To use the access code or an API token, select access code on the login page:

Screen_Shot_2019-12-16_at_5.36.47_PM.png

You can get an access code from the Excel integration page under the Manage tab by selecting Get token:

Screen_Shot_2019-12-16_at_9.44.08_PM.png

Copy the token and paste it into the window linked from access code on the login page.

Uploading Data From Excel to data.world

To upload data from Excel to data.world, select the Sync Data button on the right side of the Home tab of the Excel menu bar. You can either upload a selection from the current sheet or the entire current sheet. Because the worksheets in an Excel file are stored as individual files on data.world, if your Excel file has multiple worksheets you will need to upload them one at a time:

Screen_Shot_2019-12-16_at_10.38.27_PM.png

When you upload directly from Excel you have the option to add the file to a new dataset, to an existing dataset, or to an existing project:

Screen_Shot_2019-12-16_at_10.06.55_PM.png

Note: If you choose to add the spreadsheet to a project it will not be available for use in any other projects or datasets.

If you upload to a new dataset you'll also be prompted to set the permissions for it. The options are private and public; you cannot set permissions at the organization level from within Excel. However, you can adjust the dataset's permissions from within data.world after the dataset has been created:

Screen_Shot_2019-12-16_at_10.52.07_PM.png
Updating data.world data from within Excel

After you have uploaded a spreadsheet to data.world you can continue to work on it in Excel and then sync your changes when you're done. To update an existing dataset select Sync data, and click the upload button to the right of the dataset name:

Screen_Shot_2019-12-17_at_9.56.54_AM.png
Importing data from data.world into Excel

If you're working in Excel and you want to import a file from data.world into your current workbook, select Import data from the Excel menu bar and choose either a new dataset or one you have previously worked with in Excel:

Screen_Shot_2019-12-17_at_11.08.02_AM.png

If you choose to import a new item you'll be prompted to choose the source dataset or project from all of the datasets and projects you explicitly have permissions to (you do not get a list of all the datasets and projects on data.world--just the ones you own, that are owned by an organization you are in, or which have been shared with you). Next you choose whether to import a query or a table (a table is any tabular file in the dataset or project you chose), and if you want to import it to a new tab in your workbook or to the current sheet:

Screen_Shot_2019-12-17_at_11.15.33_AM.png

Note: if you import into the current sheet, all the data in the sheet will be overwritten.

After working with the data in Excel you can sync it back up to data.world.

Publishing insights from Excel to data.world

While data.world has a built-in Chart builder application and an integration with Tableau, if you prefer to work on your visual data analysis in Excel you can still upload your charts as insights to projects on data.world. To upload a chart from Excel, choose the Publish insights menu item and select your chart:

Screen_Shot_2019-12-17_at_12.21.29_PM.png

You'll be prompted to select an existing project or to create a new one, and if you choose New project you'll set the permissions for it to public or private. After uploading, your chart is available on the insights tab of the project along with any comments you added:

Screen_Shot_2019-12-17_at_12.28.00_PM.png

*The only time it matters which you use is if your organization uses SAML for authentication. SAML users must use the access code/API token.

Tableau Web Data Connector

Easily pull data directly into Tableau to create your visualizations using data.world's web data connector. By using the connector as your data source, Tableau can even keep the data in sync with data.world to ensure your visualizations are using the most accurate data.

Enable the Tableau integration in data.world

If you're connecting to Tableau and data.world for the first time:

  • Click the Integrations link at the bottom of the left sidebar in data.world

    integrations01.png
  • Filter or scroll down to find the Tableau tile and click on it

  • Click the Enable Integration button at the top of the Tableau integration page

Alternatively, you can enable the integration by following this link to go straight to the Tableau integration page

Add a new Web Data Connector data source in Tableau

To use the connector from Tableau or Tableau Public v.10+, select the Web Data Connector option under Connect > To a Server within your Tableau application.

In the window that opens, we're going to enter the web data connector URL.

The URL you enter here will depend on what type of import you would like to make. You have two options:

  1. Import all tables from a dataset or project

  2. Import a single table, file, or query result

Import an entire dataset
  1. Enter tableau.data.world as the Web Data Connector URL.

  2. Press Enter

  3. Login with your data.world credentials if prompted

  4. Enter the data.world dataset URL or click Browse to scroll through your datasets or search the data.world community (note that search results are limited to 30 in the Web Data Connector)

  5. Click Select next to the dataset you'd like to import

  6. Click Connect to confirm your selection

  7. Depending on the size of the dataset, this import can take from a few seconds to a few minutes

Import a single file, table, or query result

Go to data.world and navigate to the file or query you would like to open in Tableau. To the right of the Download link, click Open with Tableau (note: if you have multiple integrations enabled, you may have to click the dropdown menu here to choose Tableau)

tableau01.png

Copy the URL from the new window that opens:

tableau04.png

Back in Tableau, paste the connector URL into the input box. Note that when you enter the URL, the input box disappears and the URL moves to the URL field, so you just need to press Enter on your keyboard to load the next page.

tableau05.png

By using the connector URL from the Open with Tableau window, the data.world web data connector will be pre-populated with your dataset URL or query details:

115018236988-mceclip5.png

If you'd like to bring in another dataset, you can go to Data > New Data Source and repeat the steps above.

Posting a Tableau visualization on data.world

Once you've created and published your visualization to Tableau Public, use the published URL to share it back on data.world as an Insight on a Data Project, as a discussion post, or within your dataset or Data Project summary using the markdown syntax @(https://public.tableau.com/shared/url) :

115003214833-mceclip0.png
Adding data.world to the Tableau Server safe list

In order to publish a Web Data Connector to Tableau Server to allow for data refreshes, you will need to add data.world to the safe list/allowlist. For general information on this, please see Tableau's documentation.

For Tableau Server 2018.1 and above, run the following commands:

tsm data-access web-data-connectors add --name 'DW WDC' --url https://tableau.data.world:443 --secondary https://api.data.world/*,https://query.data.world/*,https://data.world/oauth/*
tsm data-access web-data-connectors allow
tsm restart

As always, contact us if you have any feedback or run into any unexpected issues with the connector.

Python SDK

Data.world has developed an open source Python library for working with data.world datasets.

This library makes it easy for data.world users to pull and work with data stored on data.world. Additionally, the library provides convenient wrappers for data.world APIs, allowing users to create and update datasets, add and modify files, and possibly implement entire apps on top of data.world.

Installation

You can install the data.world Python library using pip directly from PyPI:

pip install datadotworld

Optionally, you can install the library including pandas support:

pip install datadotworld[pandas]

If you use conda to manage your Python distribution, you can install from the community-maintained conda-forge channel (https://conda-forge.github.io/):

conda install -c conda-forge datadotworld-py
Configuration

This library requires a data.world API authentication token to work.

Your authentication token can be obtained on data.world once you enable Python under Integrations > Python

To configure the library, run the following command:

dw configure

Alternatively, tokens can be provided via the DW_AUTH_TOKEN environment variable. On macOS or Unix machines, run (replacing <YOUR_TOKEN> below with the token obtained earlier):

export DW_AUTH_TOKEN=<YOUR_TOKEN>
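Before calling the library from a script, it can help to confirm that the token is visible to the Python process. The following is a minimal sketch using only the standard library; the helper name token_is_configured is hypothetical and is not part of datadotworld:

```python
import os


def token_is_configured(environ=os.environ):
    """Return True if DW_AUTH_TOKEN is set to a non-empty value.

    Hypothetical helper for illustration; not part of datadotworld.
    """
    return bool(environ.get("DW_AUTH_TOKEN", "").strip())


if token_is_configured():
    print("DW_AUTH_TOKEN is set")
else:
    print("DW_AUTH_TOKEN is missing; run `dw configure` or export the variable")
```

Passing a plain dict as environ makes the check easy to exercise without changing your real environment.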
Examples

The load_dataset() function facilitates maintaining copies of datasets on the local filesystem. It will download a given dataset's datapackage and store it under ~/.dw/cache. When used subsequently, load_dataset() will use the copy stored on disk and will work offline, unless it's called with force_update=True or auto_update=True. force_update=True will overwrite your local copy unconditionally. auto_update=True will only overwrite your local copy if a newer version of the dataset is available on data.world.
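The caching rules described above can be summarized as a small decision function. This is only a model of the documented behavior, not the library's actual code; the function name needs_download and its has_local_copy and remote_is_newer arguments are assumptions made for illustration:

```python
def needs_download(has_local_copy, remote_is_newer,
                   force_update=False, auto_update=False):
    """Model of the documented caching rules (not the library's code)."""
    if not has_local_copy:
        return True             # nothing cached yet, so a download is required
    if force_update:
        return True             # overwrite the local copy unconditionally
    if auto_update:
        return remote_is_newer  # overwrite only when a newer version exists
    return False                # reuse the cached copy; works offline


print(needs_download(True, True))
print(needs_download(True, True, auto_update=True))
print(needs_download(True, False, force_update=True))
```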

Once loaded, a dataset (data and metadata) can be conveniently accessed via the object returned by load_dataset().

Start by importing the datadotworld module:

import datadotworld as dw

Then, invoke the load_dataset() function, to download a dataset and work with it locally. For example:

intro_dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset')

Dataset objects allow access to data via three different properties: raw_data, tables and dataframes. Each of these properties is a mapping (dict) whose values are of type bytes, list and pandas.DataFrame, respectively. Values are lazy loaded and cached once loaded. Their keys are the names of the files contained in the dataset.

For example:

>>> intro_dataset.dataframes
LazyLoadedDict({
    'changelog': LazyLoadedValue(<pandas.DataFrame>),
    'datadotworldbballstats': LazyLoadedValue(<pandas.DataFrame>),
    'datadotworldbballteam': LazyLoadedValue(<pandas.DataFrame>)})

IMPORTANT: Not all files in a dataset are tabular, therefore some will be exposed via raw_data only.

Tables are lists of rows, each represented by a mapping (dict) of column names to their respective values.

For example:

>>> stats_table = intro_dataset.tables['datadotworldbballstats']
>>> stats_table[0]
OrderedDict([('Name', 'Jon'),
             ('PointsPerGame', Decimal('20.4')),
             ('AssistsPerGame', Decimal('1.3'))])
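Because each row is an ordinary mapping, standard Python is enough to work with a table. The sketch below uses a hand-built stand-in for the table so it runs offline; the first row mirrors the example above and the second row is made up for illustration:

```python
from collections import OrderedDict
from decimal import Decimal

# Stand-in for intro_dataset.tables['datadotworldbballstats'].
# The first row mirrors the documented example; the second is invented.
stats_table = [
    OrderedDict([("Name", "Jon"),
                 ("PointsPerGame", Decimal("20.4")),
                 ("AssistsPerGame", Decimal("1.3"))]),
    OrderedDict([("Name", "Sample Player"),
                 ("PointsPerGame", Decimal("10.0")),
                 ("AssistsPerGame", Decimal("2.5"))]),
]

# Pull one column out of the rows, then aggregate another.
names = [row["Name"] for row in stats_table]
average_points = (sum(row["PointsPerGame"] for row in stats_table)
                  / len(stats_table))

print(names)
print(average_points)
```

Numeric values arrive as Decimal, so arithmetic stays exact; convert with float() only if a downstream tool requires it.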

You can also review the metadata associated with a file or the entire dataset, using the describe function. For example:

>>> intro_dataset.describe()
{'homepage': 'https://data.world/jonloyens/an-intro-to-dataworld-dataset',
 'name': 'jonloyens_an-intro-to-dataworld-dataset',
 'resources': [{'format': 'csv',
   'name': 'changelog',
   'path': 'data/ChangeLog.csv'},
  {'format': 'csv',
   'name': 'datadotworldbballstats',
   'path': 'data/DataDotWorldBBallStats.csv'},
  {'format': 'csv',
   'name': 'datadotworldbballteam',
   'path': 'data/DataDotWorldBBallTeam.csv'}]}
>>> intro_dataset.describe('datadotworldbballstats')
{'format': 'csv',
 'name': 'datadotworldbballstats',
 'path': 'data/DataDotWorldBBallStats.csv',
 'schema': {'fields': [{'name': 'Name', 'title': 'Name', 'type': 'string'},
                       {'name': 'PointsPerGame',
                        'title': 'PointsPerGame',
                        'type': 'number'},
                       {'name': 'AssistsPerGame',
                        'title': 'AssistsPerGame',
                        'type': 'number'}]}}
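The structure returned by describe() is a plain dict, so it can be inspected like any other. As a small sketch, the following copies the resource description shown above into a literal and derives the numeric columns from its schema:

```python
# Copied from the describe('datadotworldbballstats') output shown above.
resource = {
    "format": "csv",
    "name": "datadotworldbballstats",
    "path": "data/DataDotWorldBBallStats.csv",
    "schema": {"fields": [
        {"name": "Name", "title": "Name", "type": "string"},
        {"name": "PointsPerGame", "title": "PointsPerGame", "type": "number"},
        {"name": "AssistsPerGame", "title": "AssistsPerGame", "type": "number"},
    ]},
}

# Map each column name to its declared type, then keep the numeric ones.
column_types = {f["name"]: f["type"] for f in resource["schema"]["fields"]}
numeric_columns = [n for n, t in column_types.items() if t == "number"]

print(numeric_columns)
```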
Standalone functions
load_dataset(dataset_key, force_update=False, auto_update=False)

Load a dataset from the local filesystem, downloading it from data.world first, if necessary.

This function returns an object of type LocalDataset. The object allows access to metadata via its describe() method and to all the data via three properties, raw_data, tables and dataframes, all of which are mappings (dict-like structures).

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id or of a url

  • force_update (bool) – Flag, indicating if a new copy of the dataset should be downloaded replacing any previously downloaded copy (Default value = False)

  • auto_update (bool) – Flag, indicating that the dataset should be updated to the latest version if a newer one is available (Default value = False)

Returns

The object representing the dataset

Return type

LocalDataset

Raises

RestApiError – If a server error occurs
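Since dataset_key accepts either owner/id or a URL, it can be useful to normalize keys in your own code. The helper below is a hypothetical sketch, not the library's parser, and it assumes dataset URLs have the shape https://data.world/owner/id, as in the homepage value shown earlier:

```python
from urllib.parse import urlparse


def split_dataset_key(dataset_key):
    """Return (owner, dataset_id) from 'owner/id' or a data.world URL.

    Hypothetical helper; assumes the URL path starts with /owner/id.
    """
    if "://" in dataset_key:
        path = urlparse(dataset_key).path
    else:
        path = dataset_key
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected owner/id, got %r" % dataset_key)
    return parts[0], parts[1]


print(split_dataset_key("jonloyens/an-intro-to-dataworld-dataset"))
print(split_dataset_key(
    "https://data.world/jonloyens/an-intro-to-dataworld-dataset"))
```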

open_remote_file(dataset_key, file_name, mode='w', **kwargs)

Open a remote file object that can be used to write to or read from a file in a data.world dataset

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • file_name (str) – The name of the file to open

  • mode (str, optional) – the mode for the file - must be ‘w’, ‘wb’, ‘r’, or ‘rb’ - indicating read/write (‘r’/’w’) and optionally “binary” handling of the file data. (Default value = ‘w’)

  • chunk_size (int, optional) – size of chunked bytes to return when reading streamed bytes in ‘rb’ mode

  • decode_unicode (bool, optional) – whether to decode textual responses as unicode when returning streamed lines in ‘r’ mode

  • **kwargs

Examples

>>> import datadotworld as dw
>>>
>>> # write a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt') as w:
...     w.write("this is a test.")
>>>
>>> # write a jsonlines file
>>> import json
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.jsonl') as w:
...     json.dump({'foo': 42, 'bar': "A"}, w)
...     w.write("\n")
...     json.dump({'foo': 13, 'bar': "B"}, w)
...     w.write("\n")
>>>
>>> # write a csv file
>>> import csv
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv') as w:
...     csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
...     csvw.writeheader()
...     csvw.writerow({'foo': 42, 'bar': "A"})
...     csvw.writerow({'foo': 13, 'bar': "B"})
>>>
>>> # write a pandas dataframe as a csv file
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
>>> with dw.open_remote_file('username/test-dataset',
...                          'dataframe.csv') as w:
...     df.to_csv(w, index=False)
>>>
>>> # write a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='wb') as w:
...     w.write(bytes([100, 97, 116, 97, 46, 119, 111, 114, 108, 100]))
>>>
>>> # read a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='r') as r:
...     print(r.read())
>>>
>>> # read a csv file (columns match the csv file written above)
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv', mode='r') as r:
...     csvr = csv.DictReader(r)
...     for row in csvr:
...         print(row['foo'], row['bar'])
>>>
>>> # read a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test', mode='rb') as r:
...     data = r.read()
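To try the CSV-writing pattern without touching a real dataset, an in-memory text buffer can stand in for the remote file. This is a sketch under the assumption that the object returned in text mode only needs a write() method for csv.DictWriter to work; io.StringIO is the stand-in here and is not what the library returns:

```python
import csv
import io

# io.StringIO stands in for the object returned by dw.open_remote_file(...)
# in text mode; this sketch only assumes it exposes write().
with io.StringIO() as w:
    csvw = csv.DictWriter(w, fieldnames=["foo", "bar"])
    csvw.writeheader()
    csvw.writerow({"foo": 42, "bar": "A"})
    csvw.writerow({"foo": 13, "bar": "B"})
    payload = w.getvalue()  # capture the text before the buffer closes

# Read the captured text back to confirm the round trip.
rows = list(csv.DictReader(io.StringIO(payload)))
print(rows[0]["foo"], rows[0]["bar"])
```

Values read back through csv.DictReader are strings, so '42' comes back rather than the integer 42.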
query(dataset_key, query, query_type='sql', parameters=None)

Query an existing dataset

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id or of a url

  • query (str) – SQL or SPARQL query

  • query_type ({'sql', 'sparql'}, optional) – The type of the query. Must be either ‘sql’ or ‘sparql’. (Default value = “sql”)

  • parameters (query parameters, optional) – parameters to the query - if SPARQL query, this should be a dict containing named parameters, if SQL query,then this should be a list containing positional parameters. Boolean values will be converted to xsd:boolean, Integer values to xsd:integer, and other Numeric values to xsd:decimal. Anything else is treated as a String literal (Default value = None)

Returns

Object containing the results of the query

Return type

QueryResults

Raises

RuntimeError – If a server error occurs
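The parameter conversion rules above can be illustrated with a small classifier. This is a sketch of the documented mapping only, not the library's implementation; the function name xsd_type_for and the "string literal" label are assumptions:

```python
from decimal import Decimal
from numbers import Number


def xsd_type_for(value):
    """Return the documented target type for a query parameter value."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "xsd:boolean"
    if isinstance(value, int):
        return "xsd:integer"
    if isinstance(value, Number):  # float, Decimal, and other numeric types
        return "xsd:decimal"
    return "string literal"        # anything else is treated as a string


for sample in (True, 3, 2.5, Decimal("1.0"), "Jon"):
    print(repr(sample), "->", xsd_type_for(sample))
```

The ordering matters in Python: testing for int before bool would classify True as an integer.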

API Client Methods

The following functions are all methods of the API client object returned by datadotworld.api_client()

add_files_via_url(dataset_key, files={})

Add or update dataset files linked to source URLs

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • files (dict) – Dict mapping each file name to a dict of file metadata: the source URL to add or update, plus an optional description and optional labels (Default value = {})

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> url = 'http://www.acme.inc/example.csv'
>>> api_client = dw.api_client()
>>> api_client.add_files_via_url(
...     'username/test-dataset',
...     {'example.csv': {
...         'url': url,
...         'labels': ['raw data'],
...         'description': 'file description'}})
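When adding several files at once, a small builder keeps the files dict consistent. The helper below is a hypothetical sketch (file_entry is not part of datadotworld), and the second URL is invented for illustration; the key names url, description and labels follow the example above:

```python
def file_entry(url, description=None, labels=None):
    """Build the metadata dict for one file; only 'url' is always included."""
    entry = {"url": url}
    if description is not None:
        entry["description"] = description
    if labels:
        entry["labels"] = list(labels)
    return entry


files = {
    "example.csv": file_entry("http://www.acme.inc/example.csv",
                              description="file description",
                              labels=["raw data"]),
    # Hypothetical second file with no optional metadata.
    "other.csv": file_entry("http://www.acme.inc/other.csv"),
}
print(files)
```

The resulting dict has the same shape as the second argument passed to add_files_via_url in the example above.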
add_linked_dataset(project_key, dataset_key)

Link project to an existing dataset

This method links a dataset to a project

Parameters
  • project_key (str) – Project identifier, in the form of owner/id

  • dataset_key – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> linked_dataset = api_client.add_linked_dataset(
...     'username/test-project',
...     'username/test-dataset')
append_records(dataset_key, stream_id, body)

Append records to a stream.

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • stream_id (str) – Stream unique identifier.

  • body (obj) – Object body

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.append_records('username/test-dataset', 'streamId',
...                           {'content': 'content'})
create_dataset(owner_id, **kwargs)

Create a new dataset

Parameters
  • owner_id (str) – Username of the owner of the new dataset

  • title (str) – Dataset title (will be used to generate dataset id on creation)

  • description (str, optional) – Dataset description

  • summary (str, optional) – Dataset summary markdown

  • tags (list, optional) – Dataset tags

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Dataset license

  • visibility ({'OPEN', 'PRIVATE'}) – Dataset visibility

  • files (dict, optional) – Dict mapping each file name to a dict with the source URL and, optionally, a description and labels

Returns

Newly created dataset key

Return type

str

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> url = 'http://www.acme.inc/example.csv'
>>> api_client.create_dataset(
...     'username', title='Test dataset', visibility='PRIVATE',
...     license='Public Domain',
...     files={'dataset.csv': {'url': url}})
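Because license and visibility accept only the listed values, checking them locally before calling create_dataset can catch typos early. The following is a sketch with a hypothetical helper, check_dataset_options; treating title as required is an assumption based on it not being marked optional above:

```python
# Allowed values copied from the create_dataset parameter list.
LICENSES = {"Public Domain", "PDDL", "CC-0", "CC-BY", "ODC-BY", "CC-BY-SA",
            "ODC-ODbL", "CC BY-NC", "CC BY-NC-SA", "Other"}
VISIBILITIES = {"OPEN", "PRIVATE"}


def check_dataset_options(title, license=None, visibility=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if not title:
        problems.append("title is required")
    if license is not None and license not in LICENSES:
        problems.append("unknown license: %r" % license)
    if visibility is not None and visibility not in VISIBILITIES:
        problems.append("unknown visibility: %r" % visibility)
    return problems


print(check_dataset_options("Test dataset", license="Public Domain",
                            visibility="PRIVATE"))
print(check_dataset_options("Test dataset", visibility="private"))
```

The values are case-sensitive in this sketch, so 'private' is reported as a problem while 'PRIVATE' passes.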
create_insight(project_key, **kwargs)

Create a new insight

Parameters
  • project_key (str) – Project identifier, in the form of

  • title (str) – Insight title

  • description (str, optional) – Insight description.

  • image_url (str) – If image-based, the URL of the image

  • embed_url (str) – If embed-based, the embeddable URL

  • source_link (str, optional) – Permalink to source code or platform this insight was generated with. Allows others to replicate the steps originally used to produce the insight.

  • data_source_links (array) – One or more permalinks to the data sources used to generate this insight. Allows others to access the data originally used to produce the insight.

Returns

Insight with message and uri object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.create_insight(
...     'projectOwner/projectid', title='Test insight',
...     image_url='url')
create_project(owner_id, **kwargs)

Create a new project

Parameters
  • owner_id (str) – Username of the creator of a project.

  • title (str) – Project title (will be used to generate project id on creation)

  • objective (str, optional) – Short project objective.

  • summary (str, optional) – Long-form project summary.

  • tags (list, optional) – Project tags (letters, numbers, and spaces)

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Project license

  • visibility ({'OPEN', 'PRIVATE'}) – Project visibility

  • files (dict, optional) – Dict mapping each file name to a dict with the source URL and, optionally, a description and labels

  • linked_datasets (list of object, optional) – Initial set of linked datasets.

Returns

Newly created project key

Return type

str

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.create_project(
...     'username', title='project testing',
...     visibility='PRIVATE',
...     linked_datasets=[{'owner': 'someuser',
...                       'id': 'somedataset'}])
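The linked_datasets argument in the example uses dicts with owner and id keys, while most other methods take owner/id strings. A small converter, sketched here as the hypothetical helper to_linked_dataset, bridges the two forms:

```python
def to_linked_dataset(dataset_key):
    """Convert 'owner/id' into the {'owner': ..., 'id': ...} form."""
    owner, _, dataset_id = dataset_key.partition("/")
    if not owner or not dataset_id:
        raise ValueError("expected owner/id, got %r" % dataset_key)
    return {"owner": owner, "id": dataset_id}


linked = [to_linked_dataset(key) for key in ["someuser/somedataset"]]
print(linked)
```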
delete_dataset(dataset_key)

Deletes a dataset and all associated data

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.delete_dataset(
...     'username/dataset')
delete_files(dataset_key, names)

Delete dataset file(s)

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • names (list of str) – The list of names for files to be deleted

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.delete_files(
...     'username/test-dataset', ['example.csv'])
delete_insight(project_key, insight_id)

Delete an existing insight.

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectId

  • insight_id (str) – Insight unique id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> del_insight = api_client.delete_insight(
...     'username/project', 'insightid')
delete_project(project_key)

Deletes a project and all associated data

Parameters

project_key (str) – Project identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.delete_project(
...     'username/test-project')
download_datapackage(dataset_key, dest_dir)

Download and unzip a dataset’s datapackage

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • dest_dir (str or path) – Directory under which datapackage should be saved

Returns

Location of the datapackage descriptor (datapackage.json) in the local filesystem

Return type

path

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> datapackage_descriptor = api_client.download_datapackage(
...     'jonloyens/an-intro-to-dataworld-dataset',
...     '/tmp/test')
>>> datapackage_descriptor
'/tmp/test/datapackage.json'
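Once the descriptor path is known, the standard json module is enough to list the packaged resources. The sketch below writes a small descriptor to a temporary directory so it runs offline; with the library, the path would come from download_datapackage(). It assumes resource paths are relative to the descriptor's directory, and the descriptor content is a trimmed copy of the describe() output shown earlier:

```python
import json
import os
import tempfile

# Trimmed stand-in descriptor, based on the describe() output shown earlier.
descriptor = {
    "name": "jonloyens_an-intro-to-dataworld-dataset",
    "resources": [
        {"format": "csv", "name": "changelog",
         "path": "data/ChangeLog.csv"},
        {"format": "csv", "name": "datadotworldbballstats",
         "path": "data/DataDotWorldBBallStats.csv"},
    ],
}

# Write it to a temp directory in place of a real download.
tmp_dir = tempfile.mkdtemp()
descriptor_path = os.path.join(tmp_dir, "datapackage.json")
with open(descriptor_path, "w") as f:
    json.dump(descriptor, f)

# Load the descriptor and resolve each resource path against its directory.
with open(descriptor_path) as f:
    package = json.load(f)

base_dir = os.path.dirname(descriptor_path)
resource_paths = {r["name"]: os.path.join(base_dir, r["path"])
                  for r in package["resources"]}

for name, path in resource_paths.items():
    print(name, "->", path)
```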
download_dataset(dataset_key)

Return a .zip containing all files within the dataset as uploaded.

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Returns

.zip file containing the files within the dataset

Return type

file object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.download_dataset(
...     'username/test-dataset')
download_file(dataset_key, file)

Return a file within the dataset as uploaded.

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • file (str) – File path to be returned

Returns

The file as it was uploaded

Return type

file object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.download_file('username/test-dataset',
...                          '/my/local/example.csv')
fetch_contributing_datasets(**kwargs)

Fetch datasets that the authenticated user has access to

Parameters
  • limit (str, optional) – Maximum number of items to include in a page of results

  • next (str, optional) – Token from previous result page (to be used when requesting a subsequent page)

  • sort (str, optional) – Property name to sort

Returns

Authenticated user dataset

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_dataset = api_client.fetch_contributing_datasets()
>>> user_dataset
{'count': 0, 'records': [], 'next_page_token': None}
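Results come back one page at a time, with a next_page_token in the response and a next parameter on the request. A generic loop over pages might look like the sketch below. It assumes the token from one page is passed back as the next keyword argument, and it uses a fake two-page fetcher (fake_fetch, invented here) in place of a real client method so it runs offline:

```python
def iter_records(fetch_page, limit=None):
    """Yield records from every page returned by fetch_page(**kwargs)."""
    kwargs = {}
    if limit is not None:
        kwargs["limit"] = limit
    while True:
        page = fetch_page(**kwargs)
        for record in page["records"]:
            yield record
        token = page.get("next_page_token")
        if not token:
            break                 # no further pages
        kwargs["next"] = token    # request the following page


# Fake endpoint standing in for api_client.fetch_contributing_datasets.
def fake_fetch(limit=None, next=None):
    if next is None:
        return {"count": 2, "records": [{"id": "a"}],
                "next_page_token": "t1"}
    return {"count": 2, "records": [{"id": "b"}], "next_page_token": None}


ids = [record["id"] for record in iter_records(fake_fetch, limit=1)]
print(ids)
```

With a real client, fetch_page would be a bound method such as api_client.fetch_contributing_datasets.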
fetch_contributing_projects(**kwargs)

Fetch projects that the currently authenticated user has access to

Returns

Authenticated user projects

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_projects = api_client.fetch_contributing_projects()
>>> user_projects
{'count': 0, 'records': [], 'next_page_token': None}
fetch_datasets(**kwargs)

Fetch authenticated user owned datasets

Parameters
  • limit (str, optional) – Maximum number of items to include in a page of results

  • next (str, optional) – Token from previous result page (to be used when requesting a subsequent page)

  • sort (str, optional) – Property name to sort

Returns

Dataset definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_owned_dataset = api_client.fetch_datasets()
fetch_liked_datasets(**kwargs)

Fetch datasets that authenticated user likes

Parameters
  • limit (str, optional) – Maximum number of items to include in a page of results

  • next (str, optional) – Token from previous result page (to be used when requesting a subsequent page)

  • sort (str, optional) – Property name to sort

Returns

Dataset definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_liked_dataset = api_client.fetch_liked_datasets()
fetch_liked_projects(**kwargs)

Fetch projects that the currently authenticated user likes

Returns

Authenticated user projects

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_liked_projects = api_client.fetch_liked_projects()
fetch_projects(**kwargs)

Fetch projects that the currently authenticated user owns

Returns

Authenticated user projects

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_projects = api_client.fetch_projects()
get_dataset(dataset_key)

Retrieve an existing dataset definition

This method retrieves metadata about an existing dataset

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Returns

Dataset definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> intro_dataset = api_client.get_dataset(
...     'jonloyens/an-intro-to-dataworld-dataset')
>>> intro_dataset['title']
'An Intro to data.world Dataset'
get_insight(project_key, insight_id, **kwargs)

Retrieve an insight

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • insight_id (str) – Insight unique identifier.

Returns

Insight definition, with all attributes

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> insight = api_client.get_insight(
...     'jonloyens/'
...     'an-example-project-that-shows-what-to-put-in-data-world',
...     'c2538b0c-c200-474c-9631-5ff4f13026eb')
>>> insight['title']
'Coast Guard Lives Saved by Fiscal Year'
get_insights_for_project(project_key, **kwargs)

Get insights for a project.

Parameters

project_key (str) – Project identifier, in the form of projectOwner/projectid

Returns

Insight results

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> insights = api_client.get_insights_for_project(
...     'jonloyens/'
...     'an-example-project-that-shows-what-to-put-in-data-world'
... )
get_project(project_key)

Retrieve an existing project

This method retrieves metadata about an existing project

Parameters

project_key (str) – Project identifier, in the form of owner/id

Returns

Project definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> intro_project = api_client.get_project(
...     'jonloyens/'
...     'an-example-project-that-shows-what-to-put-in-data-world'
... )
>>> intro_project['title']
'An Example Project that Shows What To Put in data.world'
get_user_data()

Retrieve data for authenticated user

Returns

User data, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_data = api_client.get_user_data()
>>> user_data['display_name']
'Name User'
remove_linked_dataset(project_key, dataset_key)

Unlink dataset

This method unlinks a dataset from a project

Parameters
  • project_key (str) – Project identifier, in the form of owner/id

  • dataset_key – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.remove_linked_dataset(
...     'username/test-project',
...     'username/test-dataset')
replace_dataset(dataset_key, **kwargs)

Replace an existing dataset

This method will completely overwrite an existing dataset.

Parameters
  • description (str, optional) – Dataset description

  • summary (str, optional) – Dataset summary markdown

  • tags (list, optional) – Dataset tags

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Dataset license

  • visibility ({'OPEN', 'PRIVATE'}) – Dataset visibility

  • files (dict, optional) – File names and source URLs to add or update

  • dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.replace_dataset(
...     'username/test-dataset',
...     visibility='PRIVATE', license='Public Domain',
...     description='A better description')
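Because replace_dataset overwrites the whole dataset, it can help to check keyword arguments locally before sending them. The following is a minimal sketch, not part of the SDK: check_dataset_kwargs is a hypothetical helper, and the allowed values are copied from the parameter list above.

```python
# Allowed values copied from the parameter list above.
ALLOWED_LICENSES = {
    'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY',
    'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other',
}
ALLOWED_VISIBILITY = {'OPEN', 'PRIVATE'}


def check_dataset_kwargs(**kwargs):
    """Hypothetical helper: raise ValueError on values the docs do not list."""
    license_name = kwargs.get('license')
    if license_name is not None and license_name not in ALLOWED_LICENSES:
        raise ValueError('Unknown license: {!r}'.format(license_name))
    visibility = kwargs.get('visibility')
    if visibility is not None and visibility not in ALLOWED_VISIBILITY:
        raise ValueError('Unknown visibility: {!r}'.format(visibility))
    return kwargs


params = check_dataset_kwargs(
    visibility='PRIVATE',
    license='Public Domain',
    description='A better description',
)
# params can then be passed on, e.g.
# api_client.replace_dataset('username/test-dataset', **params)
```

The check is case-sensitive, so a value such as 'private' is rejected locally instead of being sent to the server.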
replace_insight(project_key, insight_id, **kwargs)

Replace an insight.

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • insight_id (str) – Insight unique identifier.

  • title (str) – Insight title

  • description (str, optional) – Insight description.

  • image_url (str) – If image-based, the URL of the image

  • embed_url (str) – If embed-based, the embeddable URL

  • source_link (str, optional) – Permalink to source code or platform this insight was generated with. Allows others to replicate the steps originally used to produce the insight.

  • data_source_links (array) – One or more permalinks to the data sources used to generate this insight. Allows others to access the data originally used to produce the insight.

Returns

message object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.replace_insight(
...     'projectOwner/projectid',
...     '1230-9324-3424242442',
...     embed_url='url',
...     title='Test insight')
replace_project(project_key, **kwargs)

Replace an existing Project

Create a project with a given id or completely rewrite the project, including any previously added files or linked datasets, if one already exists with the given id.

Parameters
  • project_key (str) – Username and unique identifier of the creator of a project in the form of owner/id.

  • title (str) – Project title

  • objective (str, optional) – Short project objective.

  • summary (str, optional) – Long-form project summary.

  • tags (list, optional) – Project tags. Letters, numbers, and spaces

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Project license

  • visibility ({'OPEN', 'PRIVATE'}) – Project visibility

  • files (dict, optional) – File names as keys, with source URLs, descriptions, and labels as properties. Descriptions and labels are optional.

  • linked_datasets (list of object, optional) – Initial set of linked datasets.

Returns

project object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.replace_project(
...     'username/test-project',
...     visibility='PRIVATE',
...     objective='A better objective',
...     title='Replace project')
sparql(dataset_key, query, desired_mimetype='application/sparql-results+json', **kwargs)

Executes SPARQL queries against a dataset via POST

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • query (str) – SPARQL query

Returns

file object that can be used in file parsers and data handling modules.

Return type

file object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.sparql('username/test-dataset',
...     query)
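The returned file object can be handed to a standard parser. The sketch below assumes the default application/sparql-results+json mimetype, whose layout is defined by the W3C SPARQL 1.1 Query Results JSON Format. An in-memory io.BytesIO with made-up sample data stands in for the real response, and sparql_rows is a hypothetical helper, not an SDK function.

```python
import io
import json

# Stand-in for the file object returned by api_client.sparql(...).
# The sample values are invented for illustration.
sample_response = io.BytesIO(b'''
{
  "head": {"vars": ["name", "age"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Ada"},
     "age": {"type": "literal", "value": "36"}},
    {"name": {"type": "literal", "value": "Grace"}}
  ]}
}
''')


def sparql_rows(fileobj):
    """Return one dict per solution; unbound variables map to None."""
    payload = json.load(fileobj)
    variables = payload['head']['vars']
    rows = []
    for binding in payload['results']['bindings']:
        rows.append({
            var: binding[var]['value'] if var in binding else None
            for var in variables
        })
    return rows


rows = sparql_rows(sample_response)
```

Values are left as strings here; converting them based on each binding's datatype is a separate step.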
sql(dataset_key, query, desired_mimetype='application/json', **kwargs)

Executes SQL queries against a dataset via POST

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • query (str) – SQL query

  • include_table_schema (bool) – Flag indicating whether to include the table schema in the response

Returns

file object that can be used in file parsers and data handling modules.

Return type

file-like object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.sql('username/test-dataset', 'query')
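As with sparql, the result of sql can be read like any other file. This is a minimal sketch that assumes the default application/json response is a JSON array with one object per row. That layout, the column names, and the sample values are assumptions for illustration, and an in-memory io.BytesIO stands in for the real response.

```python
import io
import json

# Stand-in for the file-like object returned by api_client.sql(...).
# The row-per-object layout below is an assumption for illustration.
sample_response = io.BytesIO(
    b'[{"city": "Austin", "population": 950000},'
    b' {"city": "Dallas", "population": 1300000}]'
)

# json.load accepts any object with a read() method.
rows = json.load(sample_response)
largest = max(rows, key=lambda row: row['population'])
```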
sync_files(dataset_key)

Trigger synchronization process to update all dataset files linked to source URLs.

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.sync_files('username/test-dataset')
update_dataset(dataset_key, **kwargs)

Update an existing dataset

Parameters
  • description (str, optional) – Dataset description

  • summary (str, optional) – Dataset summary markdown

  • tags (list, optional) – Dataset tags

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Dataset license

  • visibility ({'OPEN', 'PRIVATE'}, optional) – Dataset visibility

  • files (dict, optional) – File names and source URLs to add or update

  • dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.update_dataset(
...     'username/test-dataset',
...     tags=['demo', 'datadotworld'])
update_insight(project_key, insight_id, **kwargs)

Update an insight.

Note that only elements included in the request will be updated. All omitted elements will remain untouched.

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • insight_id (str) – Insight unique identifier.

  • title (str) – Insight title

  • description (str, optional) – Insight description.

  • image_url (str) – If image-based, the URL of the image

  • embed_url (str) – If embed-based, the embeddable URL

  • source_link (str, optional) – Permalink to source code or platform this insight was generated with. Allows others to replicate the steps originally used to produce the insight.

  • data_source_links (array) – One or more permalinks to the data sources used to generate this insight. Allows others to access the data originally used to produce the insight.

Returns

message object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.update_insight(
...     'username/test-project', 'insightid',
...     title='demo datadotworld')
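The difference between updating and replacing can be pictured with plain dictionaries. This is only a local model of the semantics described above, not the server's implementation; apply_update and apply_replace are hypothetical names.

```python
def apply_update(current, **changes):
    """Partial update: keep every existing field, overwrite only those given."""
    updated = dict(current)
    updated.update(changes)
    return updated


def apply_replace(current, **new_values):
    """Full replace: the result contains only the supplied fields."""
    return dict(new_values)


insight = {
    'title': 'Test insight',
    'description': 'Original description',
    'embed_url': 'url',
}

updated = apply_update(insight, title='Demo insight')
replaced = apply_replace(insight, title='Demo insight', embed_url='url')
```

In this model, updated keeps the original description, while replaced no longer has one because it was not supplied again.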
update_project(project_key, **kwargs)

Update an existing project

Parameters
  • project_key (str) – Username and unique identifier of the creator of a project in the form of owner/id.

  • title (str) – Project title

  • objective (str, optional) – Short project objective.

  • summary (str, optional) – Long-form project summary.

  • tags (list, optional) – Project tags. Letters, numbers, and spaces

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Project license

  • visibility ({'OPEN', 'PRIVATE'}) – Project visibility

  • files (dict, optional) – File names as keys, with source URLs, descriptions, and labels as properties. Descriptions and labels are optional.

  • linked_datasets (list of object, optional) – Initial set of linked datasets.

Returns

message object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.update_project(
...     'username/test-project',
...     tags=['demo', 'datadotworld'])
upload_files(dataset_key, files, files_metadata={}, **kwargs)

Upload one or more dataset files

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • files (list of str) – The list of names/paths for files stored in the local filesystem

  • expand_archives (bool, optional) – Whether archive files should be expanded upon upload

  • files_metadata (dict, optional) – Dict that maps each file name to a dict containing the file description, labels, and source URLs to add or update

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.upload_files(
...     'username/test-dataset',
...     ['/my/local/example.csv'])
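The files_metadata argument is keyed by file name rather than by full path. The sketch below shows one way to assemble it from a list of local paths. build_files_metadata is a hypothetical helper, and the 'description' and 'labels' key names and the sample values are assumptions based on the parameter description above.

```python
import os


def build_files_metadata(paths, descriptions=None, labels=None):
    """Hypothetical helper: map each file name to its optional metadata."""
    descriptions = descriptions or {}
    labels = labels or {}
    metadata = {}
    for path in paths:
        name = os.path.basename(path)
        entry = {}
        if name in descriptions:
            entry['description'] = descriptions[name]
        if name in labels:
            entry['labels'] = labels[name]
        if entry:  # skip files that have no metadata at all
            metadata[name] = entry
    return metadata


files = ['/my/local/example.csv', '/my/local/notes.txt']
files_metadata = build_files_metadata(
    files,
    descriptions={'example.csv': 'Monthly sales export'},
    labels={'example.csv': ['raw data']},
)
# The result could then be passed as
# api_client.upload_files('username/test-dataset', files,
#                         files_metadata=files_metadata)
```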

R SDK

Check out our GitHub repository below to get an R client for querying data.world datasets:

https://github.com/datadotworld/data.world-r

We'll keep adding functionality over time. Please send any feedback to help@data.world!

Using data.world with Slack

Slack is a powerful communication tool for teams, and pairing it with data.world gives you more control over your information flow. Integrating data.world with Slack enables you to get real-time updates in Slack whenever changes are made to data.world accounts, datasets, and projects, keeping you and your team constantly in the loop.

With Slack and data.world you can:

  • Easily subscribe to projects, datasets, and accounts to receive notifications when updates are made.

  • Share rich messages when you link to data.world so your teammates won’t be left in the dark.

  • Preview query details and easily link to a specific query with one click.

  • Quickly view your subscriptions, unsubscribe, and explore further through inline commands.

In the next sections we'll show you how to do all of these things and more!

data.world's Slack integration is not created by, affiliated with, or supported by Slack Technologies, Inc.

Installation

To install the data.world Slack integration, select the Add to Slack button at the top of the Slack Integrations page:

slack-integrations-page-1.png

You'll be prompted to either enter your Slack workspace URL (you must be an administrator for the workspace to install the app) or to create a new workspace:

Screen_Shot_2018-09-13_at_2.51.31_PM.png

After you enter your URL and select Continue, you'll be taken to the final setup screen, which tells you how data.world will interact with your workspace:

Screen_Shot_2018-09-13_at_2.57.19_PM.png

Once the data.world app has been installed on Slack it will show up on the bottom left of your Slack workspace in the Apps section. The first message from the app will contain a welcome message with instructions on how to use the app and where to get help:

Screen_Shot_2018-09-13_at_3.23.32_PM.png
Subscription

In the example above we have a channel called crm for our customer relationship management team. If they want to get updates whenever changes are made to the crm datasets, we need to set up their channel as follows:

  • Add the data.world app to the channel - this needs to be done for every channel in the workspace that wants to use the app.

  • Subscribe the channel to the projects, datasets, and accounts for which the members should get updates.

To add the data.world app to the channel, select the channel from the list in the left sidebar and enter the command /invite @data.world:

Screen_Shot_2018-09-13_at_3.24.00_PM.png

Once the app has been added to the channel you can set up subscriptions to projects, datasets, and accounts. To subscribe the crm channel to CRM Project, enter the command /data.world subscribe https://data.world/siyeh/crm-project into the command line of the channel:

Screen_Shot_2018-09-13_at_3.37.45_PM.png
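When several projects or datasets need subscribing, the commands can be generated from their owner/id keys and pasted into the channel one at a time. This is a small sketch under that assumption. subscribe_command is a hypothetical helper that only formats text; it does not talk to Slack, and it does not cover account URLs.

```python
BASE_URL = 'https://data.world/'


def subscribe_command(resource_key):
    """Build the slash command for a key in the form owner/id."""
    owner, _, resource_id = resource_key.partition('/')
    if not owner or not resource_id or '/' in resource_id:
        raise ValueError('Expected owner/id, got {!r}'.format(resource_key))
    return '/data.world subscribe {}{}/{}'.format(BASE_URL, owner, resource_id)


commands = [subscribe_command(key) for key in ['siyeh/crm-project']]
```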
Notification

After a channel has been subscribed, whenever an update is made to the subscribed item (project, dataset, or account) everyone in the channel will automatically get a notification of the update with a link to it:

Screen_Shot_2018-09-13_at_4.00.23_PM.png

In the example above the file can be commented on in the Slack channel or on data.world by selecting the Discuss button at the bottom of the message:

Screen_Shot_2018-09-13_at_4.07.08_PM.png
Rich messages

With the data.world app for Slack installed, whenever you reference something on data.world in your channel you get a rich text preview with embedded links and, for a project, dataset, or account, the option of subscribing to the item from the message:

Screen_Shot_2018-09-13_at_4.47.48_PM.png
Preview queries

Another powerful feature of the data.world app is query previews: paste the URL of a data.world query into Slack and it is rendered as a preview with links, so others can open the query directly without hunting around the project or dataset to find it:

Screen_Shot_2018-09-13_at_4.56.19_PM.png
View and manage subscriptions

Managing your subscriptions and getting help for commands is easy too! To see what a particular channel is subscribed to, enter the command /data.world list in the channel. You will get back a list of all the projects, datasets, organizations, and users the channel is subscribed to, with links to each of them and the name of the person who subscribed to them:

Screen_Shot_2018-09-13_at_5.08.53_PM.png

The correct command format for subscribing is stored in Slack in the first message from the data.world app, so you'll always have it at hand. The syntax for all commands can also be accessed by typing /data.world help:

Screen_Shot_2018-09-13_at_5.20.46_PM.png