Integrations

By connecting your data.world datasets and projects to other applications and programs, you unlock the ability to transport, manipulate, sync, and share your data and analyses with a few simple steps. Download our integrations here. Additional documentation below.

Algorithmia

With data.world's Algorithmia integration, you can build intelligent apps in your favorite coding language.

Configure Algorithmia with your data.world token.

Using the configure algorithm, you can save your API token for later use on Algorithmia:

  1. Create an account and log in to Algorithmia.

  2. Navigate to the data.world organization on Algorithmia.

  3. Choose the Configure algorithm.

  4. Enter your API token and click Run Example.

  5. Proceed to the query of your choice.

  6. Once you're satisfied with the results of your algorithm, scroll down and choose your desired language, copy the code, and run in your program of choice.

What next?

Here are a few things you can do with Algorithmia and data.world:

  • Create data collections to easily store data on Algorithmia and export it to other sources (like data.world or your local machine).

  • Connect other data sources like AWS S3 or Dropbox to import data and apply algorithms.

  • Combine other algorithms with those available from data.world to build complex apps like in this example from Algorithmia.

Athena

AWS Glue

Azure Synapse

BigQuery

Introduction

data.world connects to BigQuery via a Service Account associated with your project. Once you create a service account, create a key and download the associated JSON key file. The key file and your service account email are required to configure a connection from data.world. Your account will need the roles BigQuery Data Viewer and BigQuery User in order to function properly. For additional information, see the Google Cloud Platform documentation on predefined roles and permissions.
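
If you manage your project with the gcloud CLI, the service account setup described above can be sketched as follows. The project ID (MY_PROJECT) and service account name (dw-connector) are placeholders; substitute your own values.

```shell
# create a service account for the data.world connection (name is illustrative)
gcloud iam service-accounts create dw-connector --project=MY_PROJECT

# grant the two roles the connection needs
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:dw-connector@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:dw-connector@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"

# create and download the JSON key file to upload to data.world
gcloud iam service-accounts keys create key.json \
  --iam-account=dw-connector@MY_PROJECT.iam.gserviceaccount.com
```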

BigQuery Data Viewer

roles/bigquery.dataViewer

When applied to a dataset, dataViewer provides permissions to:

  • Read the dataset's metadata and to list tables in the dataset.

  • Read data and metadata from the dataset's tables.

When applied at the project or organization level, this role can also enumerate all datasets in the project. Additional roles, however, are necessary to allow the running of jobs.

Permissions

bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.models.getData
bigquery.models.getMetadata
bigquery.models.list
bigquery.routines.get
bigquery.routines.list
bigquery.tables.export
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
resourcemanager.projects.get
resourcemanager.projects.list

BigQuery User

roles/bigquery.user

Provides permissions to run jobs, including queries, within the project. The user role can enumerate their own jobs, cancel their own jobs, and enumerate datasets within a project. Additionally, it allows the creation of new datasets within the project; the creator is granted the bigquery.dataOwner role for these new datasets.

Permissions

bigquery.bireservations.get
bigquery.capacityCommitments.get
bigquery.capacityCommitments.list
bigquery.config.get
bigquery.datasets.create
bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.jobs.create
bigquery.jobs.list
bigquery.models.list
bigquery.readsessions.*
bigquery.reservationAssignments.list
bigquery.reservationAssignments.search
bigquery.reservations.get
bigquery.reservations.list
bigquery.routines.list
bigquery.savedqueries.get
bigquery.savedqueries.list
bigquery.tables.list
bigquery.transfers.get
resourcemanager.projects.get
resourcemanager.projects.list

Box.com

With data.world's Box.com integration, you can transfer and sync files to data.world in seconds.

Upload files from Box.com.

Transferring data from Box.com to data.world is easy through the Add Data menu:

  1. From the Add Data menu, choose New File.

  2. Choose Add new Box.com from the resulting modal. You may need to choose the All tab to view the Box.com option.

  3. Authenticate your desired Box.com account, then select your account from the New File modal. You may need to choose New File from the Add Data menu again after authenticating your Box.com account.

  4. Choose file(s) from the resulting list to upload.

What next?

Here are a few things you can do with Box.com and data.world:

  • Keep your synced files up to date by going to the Settings section of your dataset or project and editing the Automatic Sync Options.

  • Add your team's Box.com files to data.world to further explore your data and collaborate with your team.

Canvas

Setting up a classroom to work collaboratively on data.world can be done in a few simple steps:

1.) Go to https://data.world/create-organization?plan=team-classroom to create your classroom as an organization. This will ensure that you can invite all your students and everyone in the org can upload and share data.

2.) As part of the setup, you will be able to invite your class via a link (rather than manually entering email addresses).

To set this up manually (for already-created orgs), follow these steps:

  • Go to the Settings > Preferences of the org and check “Allow users to request to join this organization”

  • Then go to the People tab and click to copy the link and share it with your class. (This link will prompt them to create an account if they don't already have one, and will automatically request an invite to the org - you will need to accept the requests.)

3.) From there, you or the students can upload datasets, create projects, and discuss and share insights.

Check out our Getting Started page for more details on diving into the platform.

We also have a new Canvas integration link!

Install the data.world LTI in your LMS

After navigating to the LMS course in which you want to install the data.world LTI, access the course settings menu.

Within the course settings menu, access the course "Apps" menu.

Select the Add App button (i.e., "+ App") from within the course apps submenu. This will cause an application configuration modal to pop up.


Chart Builder

One of the most powerful ways to share results of your data analysis is through visualizations. On data.world you have access to integrations for many third-party tools and also to Chart Builder, a visual editor for Vega-Lite built specifically for data.world. It is the perfect tool for those looking to create a simple visualization that is lightweight and easy to embed.

Visualizations in Chart Builder can be made from either a file or a query. They can be saved in various formats, shared, or embedded in the various Simple Editor windows.

Using Chart Builder with files

You can create visualizations of tabular data files in either projects or datasets without ever writing a query. To follow along with this example, open the dataset Shark attack data updated daily. To create a visualization of all the data in the file, select the View icon to the right of the file name on the overview page:

You'll be taken to the data workspace and a view of all the data in the file. Next, click the dropdown arrow next to Open in app and choose Open with Chart Builder:

If you're using Chart Builder for the first time, you'll be directed to a page requesting authorization. After authorizing, you'll be redirected to the Chart Builder workspace.

Chart Builder comes with two options for creating and modifying charts: a Visual Builder and a Vega-Lite Editor. The easiest way to use it is to create your initial chart on the Visual Builder tab and then switch over to the Vega-Lite Editor to make any changes outside the scope of the Visual Builder. See our article on using Vega-Lite or the Vega-Lite website for more information.

To create a quick bar chart of the number of people in the dataset who have been fatally attacked by sharks, select the field fatal_y_n from the dropdown list for the X axis, and COUNT(*) from the bottom of the list on the Y axis. Like magic, our chart appears on the right side of the screen:

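Under the hood, the Visual Builder generates a Vega-Lite specification. A minimal sketch of the bar chart above looks something like this (the data reference, which Chart Builder fills in for you, is omitted):

```json
{
  "mark": "bar",
  "encoding": {
    "x": {"field": "fatal_y_n", "type": "nominal"},
    "y": {"aggregate": "count", "type": "quantitative"}
  }
}
```
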
Using Chart Builder with queries

Chart Builder can also be used on queries. Using a query as the basis for your chart enables you to:

  • Clean up data (e.g., remove NULL values)

  • Filter out data (e.g., specify a time period, a specific value, an aggregation, etc.)

  • Change your data structure so that it can be charted
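
For example, a query along these lines would clean and filter the shark attack data before charting. This is only a sketch; the table name is hypothetical.

```sql
SELECT year, country, fatal_y_n
FROM shark_attacks               -- hypothetical table name
WHERE fatal_y_n IN ('Y', 'N')    -- drops NULLs and non-binary entries
  AND year >= 1960               -- limits the time period
```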

In the shark attack dataset referenced above there is a saved query called Query for analysis by year or country. This query has been written to exclude NULL values and remove non-binary entries on specific fields. It can be used to create a number of different charts. Click the dropdown arrow next to Open in app and choose Open with Chart Builder:

Formatting options for visualizations

Continuing with the example above, to make a chart with circles for marks that compares the number of attacks on men vs. on women across the years, select Circle from the Marks dropdown and year from the X axis dropdown:

If the axis doesn't display the way you want it to, you can override the default format for the Type under the Options dropdown. In this case the year was read as a number because of the underlying data type and the type was set to Quantitative when Ordinal was the right choice:

Set the Y axis to COUNT(*) and Color to sex:

If your chart is not appropriately sized for viewing you can manually set the chart size. A size that shows the data of this chart to best effect is a width of 950 and a height of 730:

Another handy thing you can do in the options section is order the results in your graph. You may have ordered them already in your query results, but that order does not carry over to the graph. For this example we'll use the saved query Countries with >10 unprovoked attacks since 1960 with mortality data. After running the query, click Chart, set the X axis to country and the Y axis to # attacks:

To sort by the country with the most attacks, select Options on the X axis, choose Descending for the sort, and y - # attacks for the field:

To add information on whether the attacks were fatal or not, select fatal_y_n next to color.

If your results do not seem to display as they should, check to make sure the field you are sorting on is not being improperly aggregated. Being able to switch over to the Vega-Lite Editor is very handy for identifying this kind of configuration issue. Looking in the Vega-Lite Editor at the area dealing with "sort", the operation is set to "average":

In this case "sum" is the correct option, and upon replacing "average" with "sum" the visualization displays properly:

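The corrected fragment in the Vega-Lite Editor would look something like this, with the field name taken from the query above:

```json
"sort": {
  "field": "# attacks",
  "op": "sum",
  "order": "descending"
}
```
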
NOTE: Once you have made edits in the Vega-Lite Editor you can no longer make any changes in the Visual Builder so save your Vega-Lite Editor changes for when you are finished building the chart with the Visual Builder.

Saving and sharing visualizations

Chart Builder visualizations on data.world can be saved in a number of JSON, image, and HTML formats shown under the Download button:

There are also a variety of options for sharing your visualization on data.world:

Selecting Share > Insight lets you add your visualization to any project on data.world for which you have permission. To share the insight you are prompted to choose a project where you will share it, to give it a title, and optionally to add comments. The final option (selected by default) is to save the visualization as a Vega-Lite source file on the project.

Share > File lets you add the visualization to any dataset or project for which you have permission. With Share > Markdown Embed (Comment) you can embed your chart in any place which uses Markdown (e.g., insights, comments, summaries). By default the embedded chart is a static rendering of the data from when the visualization was created. However, using the Vega-Lite Editor you can create a 'live' chart that updates as the data on which it's based updates. The shark attack dataset is an example of continually-updating data.

To make a chart 'live', go to the Vega-Lite Editor and scroll down to the section referencing the "data" parameters:

Under the "data" element, replace "source" with "url", add a hardcoded URL for the query that drives the visualization (you can get this in the workspace while viewing the query), and add a "format" element with the type "csv":

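
The edited "data" element would look something like this (the query URL is elided here; use the one copied from your workspace):

```json
"data": {
  "url": "https://query.data.world/s/...",
  "format": {"type": "csv"}
}
```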

Then when you select Share > Markdown Embed (Comment), you'll get Markdown text for a live version of the visualization that you can copy and paste into Insights, Comments, and Summaries on data.world:

Here is an example of the live visualization above used as an insight on a project that uses the dataset:

Finally, if you want to share a link to the Chart Builder screen for the visualization so someone else can edit and run it, you can do so with the Share > URL option:

Troubleshooting
Error loading data

An expired token can cause an "Error loading data." message when opening Chart Builder. To remedy this:

  • Click on your account avatar on the top right corner of data.world and go to 'Your integrations'

  • Select the Chart Builder tile

  • On the Chart Builder page, select the Manage tab

  • Click the Revoke button and disconnect the Chart Builder integration

  • Click the Enable integration button and authorize access

Re-launching Chart Builder will now allow it to fetch the data successfully.

Blank chart

When using the Vega-Lite editor to modify the Chart Builder output, many errors cause a blank chart to display. Troubleshooting must be manually carried out in this case - the Vega-Lite editor does not include any error identification functionality.

Using multiple columns of data in Chart Builder visualizations

Chart Builder is a quick and easy tool for creating visualizations of data on the fly, but there is one thing that isn't easy to do with it: include data from more than one column in your graph. This limitation can be a real problem if, say, you want to look at both the high and low temperatures on the days when Bigfoot was sighted. Or if you want to have a graph with gender, attack type, and fatality in shark attacks so you can see if there is any correlation between them. Though you can easily run the queries to display the data, there is no obvious way to render it all at the same time in Chart Builder. However, though it is a bit tricky and requires the use of UNPIVOT, you can build visualizations in Chart Builder that include data from more than one column in a query.

In this article we'll use the Project Monsters Among Us to show how to include two related fields in a visualization, and Analysis of shark attacks by region and species to do a little fancier combination of multiple columns of unrelated data into one visualization.

How to show data from two related fields in a Chart Builder visualization

There is a query called High and low temperatures on the dates of Bigfoot sightings in the Monsters project that returns a simple table with three columns:

Click on the Chart icon above the results to build a quick visualization from the query results. Set the X axis to date and the Y axis to temperature_low, and you have a visualization - but where do you put temperature_high?

Looking at the data, the solution is to put both the high temp and the low temp values in the same column, call it temperature, and add another column called temp_type that indicates whether the temp shown is a high or a low for the day. Fortunately, this kind of data reorganization, where columns get collapsed into rows, is exactly what the SQL UNPIVOT command does. Here is the original query rewritten to use UNPIVOT to collapse the high and low temp columns into one column, and the resulting table:

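
The rewritten query would follow this general shape - a sketch using standard UNPIVOT syntax, with a hypothetical table name:

```sql
SELECT date, temp_type, temperature
FROM bigfoot_sightings       -- hypothetical table name
UNPIVOT (
    temperature FOR temp_type IN (temperature_high, temperature_low)
)
ORDER BY date
```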

Select Chart to use Chart Builder on the results of the query, set the marks to Circle, the X axis to date, the Y axis to temperature, and the color to temp_type, then resize the chart to 640 x 700 and you'll have this visualization:

Combining multiple columns of unrelated data into one visualization

In this example we have a query in the project Analysis of shark attacks by region and species that returns dates, type of attack, gender of the victim, and whether the attack was fatal or not:

To get a quick visualization of it select Chart, set Circle for marks and year for the X axis (you might have to open the options and set the type to Ordinal), COUNT(*) for the Y axis, and Gender for the color. Once it's been resized you get this chart:

As in the last example, there's no way to include attack-type or fatality data. However, a redo of the original query with UNPIVOT combines all the data into one column ready for Chart Builder:

Note: Even though there is a warning that only the first 10,000 rows of the results are displayed, when we chart the query with Chart Builder, all the data is used in the visualization.

The chart from the query is built the same as before. Set Circle for marks and year for the X axis (you might have to open the options and set the type to Ordinal), and COUNT(*) for the Y axis. Set Type for the color. Once it's been resized you get this chart:

If you want to try unpivoting some queries on your own and charting them, there are a couple more--Provocation and gender in shark attacks, and Provocation and fatality in shark attacks--saved on the project that you can use.

Additional information about UNPIVOT can be found in our SQL documentation.

Using the Vega Lite editor in Chart Builder

Chart Builder uses Vega-Lite, which provides a JSON syntax for creating and styling visualizations. While the Visual Builder interface within Chart Builder on data.world allows one to quickly generate a simple chart, using the Vega Lite editor allows extensive customization of the appearance of the chart.

One important note - once you modify a chart using the Vega Lite editor, the Visual Builder will no longer be accessible. Customize the chart as much as possible in the Visual Builder first before switching to the Vega Lite editor for fine-tuning to keep yourself from needing to do extra work.

This article assumes that you have already enabled the Chart Builder integration on your data.world account. If you have not already, you can enable it from the integrations page while logged into your account.

For a primer on using Chart Builder, please see the article on data visualization with Chart Builder.

Getting started

As an example, I've created a project based on a dataset from the US Department of Energy regarding types of energy production throughout the US.

  1. Open up the following query saved to that project: Top 10 states by residential solar energy production

  2. You can then open these query results in Chart Builder using the dropdown menu above the results pane:

  3. Title the chart "Rooftop photovoltaic energy production by US state (top 10)" by clicking on the text that says Untitled chart above.

  4. To the left of the chart, configure the X axis to use the state field and the Y axis to use the gwh field.

  5. Click on the Options dropdown for the X axis, and under the Sort section, choose Descending by y - gwh.

You will then see the same chart as below:

That's a good start, but it's a bit bland and would be difficult to read if projected onto a screen across a conference room. Let's get to work!

Styling the chart title

By default, Chart Builder adds the title automatically, but does not provide any graphical way to style it. Click on the Vega Lite editor on the top left and you'll see the following entry near the top:

"title": "Rooftop photovoltaic energy production by US state (top 10)"

To transform the title into a field that we can customize, make it into a JSON object by adding curly braces and adding the text property:

"title": {"text": "Rooftop photovoltaic energy production by US state (top 10)" }

The title will look the same as before, but we've laid the foundation for further styling by turning it into a JSON object.

Alignment

The anchor attribute determines the horizontal alignment of the title. Options include:

  • start

  • middle

  • end

In our example, let's align the title on the left side of the chart:

"title": {
   "text": "Rooftop photovoltaic energy production by US state (top 10)",
   "anchor": "start"
 }
Font

We'll use the following attributes to style the font used for the title:

  • font - the name of the font

  • fontSize - the size of the font in pixels

  • color - color of the font, given in a CSS-compatible hex code or color name

These attributes are great for matching the chart to an organization's own branding guidelines. Make the following update to give the title that authentic data.world feel:

"title": {
   "text": "Rooftop photovoltaic energy production by US state (top 10)",
   "anchor": "start",
   "font": "Lato",
   "fontSize": 24,
   "color": "#355D8A"
 }
Offset

Our title is looking much better now, but there isn't much space between it and the chart beneath it. Give it some breathing room by adding the offset attribute. The offset value is the number of pixels between the title and the edge of the chart.

"title": {
   "text": "Rooftop photovoltaic energy production by US state (top 10)",
   "anchor": "start",
   "font": "Lato",
   "fontSize": 24,
   "color": "#355D8A",
   "offset": 40
 }

And here's the result so far:

Styling the axis labels

To modify the axis labels (in this case, the names on the x-axis and numbers on the y-axis), we'll add some additional attributes within the config object already present in our Vega Lite editor. By default, the config object will look like:

"config": {
   "background": "#ffffff",
   "padding": 20
}

Within that config object, add a new object called axis to modify the labels on both the X and Y axes at the same time. That object will accept a number of attributes; we'll use the following:

  • labelFontSize - label font size in pixels

  • labelFont - label font name

  • labelColor - color of the label font, given in a CSS-compatible hex code or color name

Our config object now looks like this:

"config": {
   "background": "#ffffff",
   "padding": 20,
   "axis": {
     "labelFontSize": 20,
     "labelFont": "Lato",
     "labelColor": "#6290C3"
   }
 }

We modified labels for both axes above, but we can also style axes singly as well. The state names under the X axis are difficult to read with their current orientation, so change that by adding an axisX object within the config object.

Use the labelAngle attribute to control the angle of those labels, providing the number of degrees to rotate them.

"config": {
   "background": "#ffffff",
   "padding": 20,
   "axis": {
     "labelFontSize": 20,
     "labelFont": "Lato",
     "labelColor": "#6290C3",
     "titleFontSize": 24,
     "titleColor": "#333D49",
     "titleFont": "Lato"
   },
   "axisX": {
     "labelAngle": 40
   }
 }
Styling the axis titles

Our labels are much more readable now, but we need to update those state and gwh axis titles as well. For those, specify the following attributes within the config>axis object:

  • titleFont - axis title font name

  • titleFontSize - axis title font size in pixels

  • titleColor - color of the axis title font, given in a CSS-compatible hex code or color name

"config": {
 "background": "#ffffff",
 "padding": 20,
 "axis": {
   "labelFontSize": 20,
   "labelFont": "Lato",
   "labelColor": "#6290C3",
   "titleFontSize": 24,
   "titleColor": "#333D49",
   "titleFont": "Lato"
   }
}

Finally, let's edit the title text for each axis (i.e., state and gwh). Each axis title uses the column name from our query results by default. In our case, those titles aren't so bad, but in many cases they'll be difficult to read and full of underscores and abbreviations.

Navigate to the encoding object within the Vega Lite editor. It will have a number of nested objects below it already. Find the nested object encoding>x. Add an attribute called title and provide the desired name for that attribute - in this case, give it the value "State Name".

Also add a title attribute under encoding>y and provide the value "Gigawatt hours (GWh)".

Here's our new encoding object:

"encoding": {
  "x": {
    "field": "state",
    "title": "State Name",
    "type": "nominal",
    "sort": {
      "field": "gwh",
      "op": "sum",
      "order": "descending"
    },
    "scale": {
      "type": "linear",
      "zero": true
    }
  },
  "y": {
    "field": "gwh",
    "title": "Gigawatt hours (GWh)",
    "type": "quantitative",
    "scale": {
      "type": "linear",
      "zero": true
    }
  }
}

And our final chart:

This tutorial provided some basic styling functionality by accessing the Vega Lite editor in Chart Builder directly, but it can do so much more. Check out the official Vega Lite examples, tutorials, and full documentation for an in-depth look.

ckanext-datadotworld

With this extension enabled, an additional data.world tab is added to the manage view for organizations. Within the data.world tab, organization admins can specify synchronization options that will apply for that organization.

Supported versions

CKAN version 2.4 or greater (including 2.7).

All versions support the celery backend, but version 2.7 will use RQ. No changes are required to use the new backend - just start it using:

paster --plugin=ckan jobs worker -c /config.ini

instead of:

paster --plugin=ckan celeryd run -c /config.ini

Details at http://docs.ckan.org/en/latest/maintaining/background-tasks.html

Installation

To install ckanext-datadotworld:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate
  2. If you already have an older version of this extension, remove it first:

    pip uninstall -y ckanext-datadotworld

    Install the ckanext-datadotworld Python package into your virtual environment:

    pip install git+https://github.com/datadotworld/ckanext-datadotworld
  3. Add datadotworld to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/production.ini).

  4. Create DB tables:

    paster --plugin=ckanext-datadotworld datadotworld init -c /config.ini
    paster --plugin=ckanext-datadotworld datadotworld upgrade -c /config.ini
  5. Start the celery daemon, either with supervisor or using paster:

    paster --plugin=ckan celeryd run -c /config.ini
Config Settings

Attempts to push failed datasets can be scheduled by adding the following line to cron:

* 8 * * * paster --plugin=ckanext-datadotworld datadotworld push_failed -c /config.ini

A similar solution enables synchronization of remote (i.e., not uploaded) resources with data.world:

* 8 * * * paster --plugin=ckanext-datadotworld datadotworld sync_resources -c /config.ini

Delay option

There is a 1 second delay configured by default. This delay period can be controlled by modifying the "ckan.datadotworld.request_delay" configuration variable within the CKAN ini file.

For example:

ckan.datadotworld.request_delay = 1

To ensure that the delay will work correctly, you also need to configure Celery to work in single thread mode. To do this, add the following flag to the Celery start command:

--concurrency=1

Details at http://celery.readthedocs.io/en/latest/userguide/workers.html#concurrency.

Template snippets

To add the data.world banner to a dataset page (currently it sits at the top of the package_resources block), add the following snippet to your template, where the datadotworld_extras variable contains the object (model) with the currently viewed package's data.world extras and org_id is the owner organization of the viewed package:

{% snippet 'snippets/datadotworld/banner.html', org_id=pkg.owner_org, datadotworld_extras=c.pkg.datadotworld_extras %}

A sidebar label may be added by placing the following snippet in your template (org_id is the ID of the viewed organization):

{% snippet 'snippets/datadotworld/label.html', org_id=organization.id %}
Development Installation

To install ckanext-datadotworld for development, activate your CKAN virtualenv and do the following:

git clone https://github.com/datadotworld/ckanext-datadotworld.git
cd ckanext-datadotworld
python setup.py develop
paster datadotworld init -c /config.ini
Running the Tests

Make sure you follow the CKAN testing guide (http://docs.ckan.org/en/latest/contributing/test.html). To run the tests, do the following:

nosetests --ckan --nologcapture --with-pylons=test.ini

To run the tests and produce a coverage report, first make sure you have coverage installed in your virtualenv (pip install coverage) then run:

nosetests --ckan --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.datadotworld --cover-inclusive --cover-erase --cover-tests

DB2

dbt

Denodo

Domo

Dremio

Dropbox

With data.world's Dropbox integration, you can transfer and sync files to data.world in seconds.

Upload files from Dropbox.

Transferring data from Dropbox to data.world is easy through the Add Data menu:

  1. From the Add Data menu, choose New File.

  2. Choose Add new Dropbox from the resulting modal. You may need to choose the All tab to view the Dropbox option.

  3. Authenticate your desired Dropbox account, then select your account from the New File modal.

    You may need to choose New File from the Add Data menu again after authenticating your Dropbox account.

  4. Choose file(s) from the resulting list to upload. Use checkboxes to upload multiple files.

What next?

Here are a few things you can do with Dropbox and data.world:

  • Keep your synced files up to date by going to the Settings section of your dataset or project and editing the Automatic Sync Options.

  • Add your team's Dropbox files to data.world to further explore your data and collaborate with your team.

dw-JDBC

dw-jdbc is a JDBC driver for connecting to datasets hosted on data.world. It can be used to provide read-only access to any dataset provided by data.world from any JVM language. dw-jdbc supports query access both in dwSQL (data.world's SQL dialect) and in SPARQL 1.1, the native query language for semantic web data sources.

JDBC URLs

JDBC connects to a data source based on a provided JDBC URL. data.world JDBC URLs have the form

jdbc:data:world:[language]:[user id]:[dataset id]

where:

  • [language] is either sql or sparql

  • [user id] is the data.world id of the dataset owner

  • [dataset id] is the data.world id of the dataset

You can extract these ids from the dataset home page url: https://data.world/[user id]/[dataset id].
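For illustration, here is a small Python sketch (not part of the driver) that assembles a JDBC URL from a dataset page URL; the helper name jdbc_url is our own:

```python
# Illustrative helper (not part of dw-jdbc): build a JDBC URL
# from a data.world dataset page URL.
from urllib.parse import urlparse

def jdbc_url(dataset_url, language="sql"):
    # dataset pages look like https://data.world/[user id]/[dataset id]
    user_id, dataset_id = urlparse(dataset_url).path.strip("/").split("/")[:2]
    return "jdbc:data:world:{}:{}:{}".format(language, user_id, dataset_id)

print(jdbc_url("https://data.world/dave/lahman-sabremetrics-dataset"))
# jdbc:data:world:sql:dave:lahman-sabremetrics-dataset
```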

Sample code (Java 8)
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;


final String QUERY = "select * from HallOfFame where playerID = ? order by yearid, playerID limit 10";
final String URL = "jdbc:data:world:sql:dave:lahman-sabremetrics-dataset";


try (final Connection connection =    // get a connection to the database, which will automatically be closed when done
         DriverManager.getConnection(URL, "<your user name>", "<your API token>");
     final PreparedStatement statement = // prepare the query; the statement is also closed automatically
         connection.prepareStatement(QUERY)) {
    statement.setString(1, "alexape01"); //bind a query parameter
    try (final ResultSet resultSet = statement.executeQuery()) { //execute the query
        ResultSetMetaData rsmd = resultSet.getMetaData();  //print out the column headers
        int columnsNumber = rsmd.getColumnCount();
        for (int i = 1; i <= columnsNumber; i++) {
            if (i > 1) System.out.print(",  ");
            System.out.print(rsmd.getColumnName(i));
        }
        System.out.println("");
        while (resultSet.next()) { //loop through the query results
            for (int i = 1; i <= columnsNumber; i++) { //print out each column value
                if (i > 1) System.out.print(",  ");
                String columnValue = resultSet.getString(i);
                System.out.print(columnValue);
            }
            System.out.println("");

            // Note: when calling ResultSet.getObject() prefer the version that takes an explicit Class argument:
            // Integer n = resultSet.getObject(param, Integer.class);
        }
    }
}
Using dw-jdbc in your project

If using Maven, you can use dw-jdbc by just including the following in your pom.xml file:

<dependency>
    <groupId>world.data</groupId>
    <artifactId>dw-jdbc</artifactId>
    <version>0.4.1</version>
</dependency>

See this link at Maven Central to find the latest version number for the JDBC driver.

For some database tools it's easier to install the JDBC driver if it's a single jar. For this reason we also provide dw-jdbc bundled with all its dependencies under the following:

<dependency>
    <groupId>world.data</groupId>
    <artifactId>dw-jdbc</artifactId>
    <classifier>shaded</classifier>
    <version>0.4.1</version>
</dependency>
Finding your Token
  1. Visit https://data.world

  2. Visit your user settings, and click the Advanced tab.

  3. Copy your token.

Features
  • JDBC 4.2

  • The driver only supports read-only queries. It does not support INSERT/UPDATE/DELETE, DDL, or transactions.

  • Queries can be written in SPARQL 1.1 or in the SQL dialect described at https://docs.data.world/tutorials/dwsql/.

  • [SQL-only] Table and column metadata via java.sql.DatabaseMetaData.

  • [SQL-only] Support for positional parameters via java.sql.PreparedStatement.

  • [SPARQL-only] Support for named parameters via java.sql.CallableStatement.

    • For example, CallableStatement.setString("name", "value") will bind the string value to ?name within the query.

  • The DataWorldStatement.setJdbcCompatibilityLevel(JdbcCompatibility) method can be used to adjust how the JDBC driver maps query results to Java objects in java.sql.ResultSetMetaData. This is particularly relevant to SPARQL queries where result types in a column can vary from row to row.

    • JdbcCompatibility.LOW - No assumptions are made about types. ResultSetMetaData.getColumnType() returns java.sql.Types.OTHER and ResultSet.getObject() returns world.data.jdbc.model.Node.

    • JdbcCompatibility.MEDIUM - [SPARQL default] All columns are typed as string. ResultSetMetaData.getColumnType() returns java.sql.Types.NVARCHAR and ResultSet.getObject() returns java.lang.String.

    • JdbcCompatibility.HIGH - [SQL default] Columns are typed based on the underlying data, either using table metadata (SQL) or by inspecting the first row of the response (SPARQL).

Excel

One of the most powerful ways to work with data on data.world is to do it without ever leaving your favorite application. There are already many applications which have data.world integrations built for them, and more are constantly being added. For a comprehensive list of the current integrations available see our Integrations page. This article covers using our integration for one of the most common data analysis tools: Microsoft Excel. This article was written on a Mac using Microsoft Excel 2016 Desktop for Mac so the screens shown may vary from yours if you are using Excel for Windows.

Installing the data.world add-in to Excel

To start using Excel with data.world, install the add-in in Excel and then enable the integration in data.world. You can get the add-in from the Microsoft app store: enter data.world into the search bar and click Add:

Screen_Shot_2018-07-23_at_10.02.08_AM.png

Once the add-in is installed you'll have three new items on the right side of the home tab on your Excel menu bar:

Screen_Shot_2018-07-23_at_10.13.11_AM.png

When you have the add-in installed, go to the Excel integration page on data.world and select Enable integration:

Screen_Shot_2019-12-16_at_8.01.23_PM.png
Authentication

To use Excel with data.world you'll need to either sign-in to your data.world account or get an access code/API token from within data.world. Here are instructions for both.*

To use your data.world account, select one of the data.world icons from the top right of your Excel screen. When the login page appears, select Sign In, and enter your data.world credentials. To use the access code or an API token, select access code on the login page:

Screen_Shot_2019-12-16_at_5.36.47_PM.png

You can get an access code from the Excel integration page under the Manage tab by selecting Get token:

Screen_Shot_2019-12-16_at_9.44.08_PM.png

Copy the token and paste it into the window linked from access code on the login page.

Uploading Data From Excel to data.world

To upload data from Excel to data.world, select the Sync Data button on the right side of the Home tab of the Excel menu bar. You can either upload a selection from the current sheet or the entire current sheet. Because the worksheets in an Excel file are stored as individual files on data.world, if your Excel file has multiple worksheets you will need to upload them one at a time:

Screen_Shot_2019-12-16_at_10.38.27_PM.png

When you upload directly from Excel you have the option to add the file to a new dataset, to an existing dataset, or to an existing project:

Screen_Shot_2019-12-16_at_10.06.55_PM.png

Note: If you choose to add the spreadsheet to a project it will not be available for use in any other projects or datasets.

If you upload to a new dataset you'll also be prompted to set the permissions for it. The options are private and public--you cannot set permissions at the organization level from within Excel. However you can adjust permissions to the dataset from within data.world after the dataset has been created:

Screen_Shot_2019-12-16_at_10.52.07_PM.png
Updating data.world data from within Excel

After you have uploaded a spreadsheet to data.world you can continue to work on it in Excel and then sync your changes when you're done. To update an existing dataset select Sync data, and click the upload button to the right of the dataset name:

Screen_Shot_2019-12-17_at_9.56.54_AM.png
Importing data from data.world into Excel

If you're working in Excel and you want to import a file from data.world into your current workbook, select Import data from the Excel menu bar and choose either a new dataset or one you have previously worked with in Excel:

Screen_Shot_2019-12-17_at_11.08.02_AM.png

If you choose to import a new item you'll be prompted to choose the source dataset or project from all of the datasets and projects you explicitly have permissions to (you do not get a list of all the datasets and projects on data.world--just the ones you own, that are owned by an organization you are in, or which have been shared with you). Next you choose whether to import a query or a table (a table is any tabular file in the dataset or project you chose), and if you want to import it to a new tab in your workbook or to the current sheet:

Screen_Shot_2019-12-17_at_11.15.33_AM.png

Note: if you import into the current sheet, all the data in the sheet will be overwritten.

After working with the data in Excel you can sync it back up to data.world.

Publishing insights from Excel to data.world

While data.world has a built-in Chart builder application and an integration with Tableau, if you prefer to work on your visual data analysis in Excel you can still upload your charts as insights to projects on data.world. To upload a chart from Excel choose the Publish insights menu item and select your chart:

Screen_Shot_2019-12-17_at_12.21.29_PM.png

You'll be prompted to select an existing project or to create a new one, and if you choose New project you'll set the permissions for it to public or private. After uploading, your chart is available on the insights tab of the project along with any comments you added:

Screen_Shot_2019-12-17_at_12.28.00_PM.png

*The only time it matters which you use is if your organization uses SAML for authentication. SAML users must use the access code/API token.

Generic JDBC

Go-API

This package makes it easy to use data.world's REST API with Go.

Users can:

  • Create and update datasets, projects, metadata, and files

  • Query datasets using SQL and SPARQL

  • Download files and entire datasets

Installation
go get github.com/datadotworld/dwapi-go/dwapi
Usage

The full package documentation is available at https://godoc.org/github.com/datadotworld/dwapi-go/dwapi.

You can also check out the API documentation at https://apidocs.data.world/api for specifics on the endpoints.

package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/datadotworld/dwapi-go/dwapi"
)

func main() {
    // new client
    token := os.Getenv("DW_AUTH_TOKEN")
    dw := dwapi.NewClient(token)

    // get info on the current user
    user, err := dw.User.Self()
    if err != nil {
        fmt.Fprintln(os.Stderr, "User.Self() returned an error:", err)
        os.Exit(1)
    }
    fmt.Println("Name:", user.DisplayName)
    fmt.Println("Creation Date:", user.Created)
    fmt.Println("-----")
    /* output:
    Name: My Display Name
    Creation Date: 2016-07-13T23:38:44.026Z
    -----
    */

    // create a new dataset
    owner := "my-username"
    request := dwapi.DatasetCreateRequest{
        Title:       "My Awesome Dataset",
        Description: "A short description",
        Summary:     "A long description",
        Tags:        []string{"first", "puppies and kittens"},
        License:     "PDDL",
        Visibility:  "PRIVATE",
    }
    createResp, err := dw.Dataset.Create(owner, &request)
    if err != nil {
        fmt.Fprintln(os.Stderr, "Dataset.Create() returned an error:", err)
        os.Exit(1)
    }
    fmt.Println("Create Response Message:", createResp.Message)
    fmt.Println("Dataset URI:", createResp.URI)
    fmt.Println("-----")
    /* output:
    Create Response Message: Dataset created successfully.
    Dataset URI: https://data.world/my-username/my-awesome-dataset
    -----
    */

    // retrieve the metadata for a dataset
    pattern := regexp.MustCompile(`https://data.world/(?:.*)/(.*)`)
    datasetid := pattern.FindStringSubmatch(createResp.URI)[1]

    retrieveResp, err := dw.Dataset.Retrieve(owner, datasetid)
    if err != nil {
        fmt.Fprintln(os.Stderr, "Dataset.Retrieve() returned an error:", err)
        os.Exit(1)
    }
    fmt.Println("Title:", retrieveResp.Title)
    fmt.Println("Description:", retrieveResp.Description)
    fmt.Println("Access Level:", retrieveResp.AccessLevel)
    fmt.Println("Creation Date:", retrieveResp.Created)
    fmt.Println("Last Updated Date:", retrieveResp.Updated)
    fmt.Println("Dataset Status:", retrieveResp.Status)
    fmt.Println("-----")
    /* output:
    Title: My Awesome Dataset
    Description: A short description
    Access Level: ADMIN
    Creation Date: 2018-11-21T17:32:40.057Z
    Last Updated Date: 2018-11-21T17:32:40.057Z
    Dataset Status: NEW
    -----
    */

    // upload a file
    s := []string{"first_name,last_name", "Abe,Marcos", "Abby,Johnson"}
    sj := strings.Join(s, "\n")
    testFilePath := filepath.Join(os.TempDir(), "test-file.csv")
    if err = ioutil.WriteFile(testFilePath, []byte(sj), 0644); err != nil {
        fmt.Fprintln(os.Stderr, "Dataset.UploadFile() returned an error while creating a file:", err)
        os.Exit(1)
    }
    defer os.Remove(testFilePath)

    uploadResp, err := dw.Dataset.UploadFile(owner, datasetid, "test-file.csv", testFilePath, false)
    if err != nil {
        fmt.Fprintln(os.Stderr, "Dataset.UploadFile() returned an error:", err)
        os.Exit(1)
    }
    fmt.Println("Upload Response Message:", uploadResp.Message)
    fmt.Println("-----")
    /* output:
    Upload Response Message: File uploaded.
    -----
    */

    // delete a dataset
    deleteResp, err := dw.Dataset.Delete(owner, datasetid)
    if err != nil {
        fmt.Fprintln(os.Stderr, "Dataset.Delete() returned an error:", err)
        os.Exit(1)
    }
    fmt.Println("Delete Response Message:", deleteResp.Message)
    fmt.Println("-----")
    /* output:
    Delete Response Message: Dataset has been successfully deleted.
    -----
    */
}
Changing the hostname

The API calls are made to https://api.data.world by default, but the URL can be changed by setting the DW_API_HOST environment variable.

For customers in a single-tenant environment, you can also use the DW_ENVIRONMENT variable to alter the default URL. For example, setting it to customer alters the URL to https://api.customer.data.world.

Additionally, the hostname can also be changed by explicitly setting the BaseURL property of the client, i.e.:

dw = dwapi.NewClient("token")
dw.BaseURL = "http://localhost:1010/v0"

Note that if you go down this path, the URL must also include the API stage (e.g., /v0).

Google Data Studio

With Google Data Studio, you can create informative and shareable dashboards and reports from your data.

Open a dataset in Google Data Studio.

You can easily open a dataset or query in Google Data Studio:

  1. From a dataset or query, use the

    GDS_1.png

    option and select Google Data Studio in the resulting modal. You can also create a new data source by clicking here and entering the desired dataset or project URL. Click Authorize to authorize the connector and follow any instructions to authorize your account.

  2. GDS_2.png

    You may also need to authorize the dataset/project separately. If so, follow any instructions to authorize data.

  3. Add a query to choose the data you'd like to visualize.

    GDS_3.png

    Example: Using https://data.world/typhon/new-york-times-bestsellers-from-2011-to-2018

    SELECT * FROM books_uniq_weeks

  4. Click Connect in the top right of the page to connect your data.

Refine your data and begin creating reports

Google Data Studio gives you the option to refine your data by omitting or adding fields, changing field type and aggregation, and adding descriptions.

  1. View all fields included in your dataset and make any desired changes.

    GDS_4.png
  2. Click Create a Report, and use the Insert menu to begin creating your report!

    GDS_5.png
What next?

Here are a few things you can do with Google Data Studio and data.world:

  • Create a report using multiple chart types, text, and images to show comprehensive insights.

  • Share reports with your team to help them quickly & easily understand your data.

  • Add multiple data sources to your report using the Resource menu and choosing Manage added data sources.

Google Drive

With data.world's Google Drive integration, you can transfer and sync files from Drive to data.world in seconds.

Upload files from Google Drive.

Transferring data from Google Drive to data.world is easy through the Add Data menu:

drive_add-new.png
drive_choose-file.png
  1. From the Add Data menu, choose New File.

  2. Choose Add new Google Drive from the resulting modal. You may need to choose the All tab to view the Google Drive option.

  3. Authenticate your desired Google Drive account, then select your account from the New File modal. You may need to choose New File from the Add Data menu again after authenticating your Google Drive account.

    drive_choose-drive.png
  4. Choose file(s) from the resulting list to upload.

What next?

Here are a few things you can do with Google Drive and data.world:

  • Keep your synced files up to date by going to the Settings section of your dataset or project and editing the Automatic Sync Options.

  • Add your team's Drive files to data.world to further explore your data and collaborate with your team.

Hive

IFTTT

Use easy automations to seamlessly collect and update data from multiple sources.

Connect data.world.

Logging in and connecting data.world allows you to quickly enable applets from the data.world gallery.

ifttt_connect.png
  1. Login to IFTTT or create an account if you don’t already have one.

  2. Navigate to the data.world gallery on IFTTT and select Connect.

Install applets.

IFTTT's data.world gallery offers automations ranging from auto-syncing data to storing favorite tweets in a dataset, all designed to make your life easier.

ifttt_syncdata.png
On each applet, the first option in the modal will allow you to receive notifications each time an applet runs.
ifttt_enable-applet.gif
  1. Scroll through the data.world gallery to find an applet to install.

  2. Click on the applet, then select Turn On.

  3. Configure all necessary fields in your applet, then choose Save.

What next?

Here are a few things you can do with IFTTT and data.world:

Or check out one of the many other applets available from data.world.

Infor ION

Jupyter

Write, explore, and share your code with Jupyter's visual interface for Python and other coding languages.

Choose your preferred host.

You can choose to run the Jupyter app on either a personal server or Heroku.

Heroku offers a free, cloud-based alternative to a personal server.

Jupyter_1.png
Choose an app (or create a new app) with Heroku.

You can choose to work within a previously created version of Jupyter Notebook or create a new instance.

The following setup instructions use Heroku; see below for information on configuring a personal server.

  1. Choose Heroku from the host options in the Choose a host panel.

  2. Choose a previously created app from your list, or click Create New to create a new iteration of Jupyter.

Jupyter_2.png
  • If creating a new app, you will need to create a password and enter any required Python packages.

Jupyter_3.png
Configure & use a personal server.

Using a personal server is an easy way to run applications, allowing you full control over setup and configuration.

The following setup instructions use a personal server; skip this section if you are using Heroku.

  1. Choose Personal Server from the host options in the Choose a host panel.

  2. [Locally]: Create your server

    • Install: Use pip (Python 2.7 or 3.4+)

      pip install dwcontents

    • Configure: Create or update ~/.jupyter/jupyter_notebook_config.py with settings:

      import dwcontents
      import os

      c = get_config()
      c.NotebookApp.contents_manager_class = dwcontents.DwContents
      c.DwContents.dw_auth_token = 'YOUR_API_TOKEN'

      You can get your API token in your data.world advanced settings.

    • Run: Assuming jupyter and jupyterlab (optional) have already been installed, start as normal, for example:

      $ jupyter lab

  3. [On data.world]: Click Continue, enter your server's URL, and open your notebook.

    Jupyter_4.png

    If you'd like to use the same server each time you open Jupyter, remember to click the optional checkbox.

Use the Jupyter workspace.

Use Jupyter to create new notebooks or open & edit previous notebooks.

  1. Create a new notebook or open any previous notebook (.ipynb files).

    Jupyter_5.png
  2. Start writing code!

What next?

Here are a few things you can do with Jupyter and data.world:

  • Load data directly from data.world into your notebook with a project ID, using import datadotworld and var = dw.load_dataset('project-id'). Example:

    import datadotworld as dw
    ds = dw.load_dataset('garyhoov/us-food-imports-and-exports')

  • Save tables locally as CSVs, using table_name.to_csv('csvname.csv', index=True), which will allow you to upload directly to data.world through Jupyter. Example:

    foodspivot.to_csv('foods.csv', index=True)

  • Upload CSVs directly to your data.world project, all within your notebook file, using the data.world API. Make sure you've imported the library using import datadotworld. Example:

    import datadotworld as dw
    client = dw.api_client()
    client.upload_files('sarakbarr/test-projects', 'foods.csv')
    client.upload_files('sarakbarr/test-projects', 'foods_chart.png')

  • Share documents containing live code, text, visualizations, and more.

  • Manipulate data and use statistical modeling.

KNIME

With data.world nodes for KNIME, you can automate data processing and analysis pipelines and build smart workflows with data.world data.

Enable data.world nodes in KNIME extensions:

Using the available data.world nodes from KNIME's community extensions, you can connect with your data.world account:

  1. Open KNIME's Preferences from the File menu.

    KNIME_1.png
  2. From the Install/Update menu, choose Available Software Sites, then select Stable Community Contributions and click OK.

    KNIME_2.png
  3. Choose Install KNIME Extensions from the File menu.

    KNIME_3.png
  4. Search for data.world in the resulting modal and check all that apply, then click Next and complete configuration.

    KNIME_4.png
  5. Expand the KNIME menu from the Preferences pane and choose data.world, then enter your data.world username and API token.

    KNIME_5.png

    You can access your API token from data.world's KNIME integration page.

Find and use data.world Community nodes.

You can find all relevant data.world nodes under Community Nodes in the KNIME Node Repository:

  1. Expand Community Nodes from the Node Repository sidebar, then expand the options for data.world.

    KNIME_6.png
  2. Now you can use the data.world nodes to:

    • Read Tables from a specified data.world dataset.

    • Upload files to a data.world dataset (you can batch upload any files supported by data.world)

    • Upload a table to a data.world dataset.

What next?

Here are a few things you can do with KNIME and data.world:

  • Use drag & drop workflows to create complex data pipelines that you can easily sync and update.

  • Connect a variety of sources with either dedicated nodes or a JDBC connection, allowing you to source data from a large number of tools and applications.

  • Take advantage of KNIME's built-in nodes to read & write data, utilize statistical tools, chart your data, and more.

Looker

Manta

MS SQL Server

My SQL

Open API

Oracle

Plotly

Plotly allows you to easily query your data.world data and create a variety of modern visualizations to highlight your findings.

Open a dataset file in Plotly.

You can easily open a file in Plotly from your data.world dataset:

plotly_open.png
plotly_open-plotly.png
  1. From a file, use the

    option and select Plotly in the resulting modal.

  2. Choose the Open in Plot.ly option from the list.

  3. Preview your data and build a chart.

    plotly_preview.png
  4. Refine options and save or share your visualization!

Use Plotly's Falcon SQL Client.

If you're looking for more flexibility, you can use Plotly's Falcon SQL Client to query data next to inline visualizations:

plotly_falcon-dw.png
  1. Download the Falcon SQL Client.

  2. Select data.world (look for Sparkle the Owl!) and enter your dataset ID and your API token.

  3. In the Query tab, review your data and add a query.

    plotly_falcon-preview.png
  4. Refine options and start building visualizations!

What next?

Here are a few things you can do with Plotly and data.world:

  • To embed as an Insight: Create a new insight and use your chart's shareable link.

  • To embed in Markdown: Use the @(shareable_link) notation.

  • Use Plotly's Falcon SQL Client to:

  • Embed your Plotly visualizations in data.world anywhere Markdown is supported (like in comments and summaries).

Power BI

Microsoft Power BI gives you a powerful platform to analyze, manipulate, and visualize your data.

Connect data.world to Microsoft Power BI.

Using the data.world connector from Power BI desktop, you can easily retrieve data from data.world:

Please note that Microsoft Power BI is currently available only on Windows. You'll need to be running a Windows OS or a virtual machine on your Mac OS.
powerbi_connect.png
  1. Download Power BI Desktop.

  2. Open Power BI and choose Get Data, then select data.world and connect.

  3. Configure your data: enter the Owner, Dataset ID, and any desired queries, then click OK. Example: Using https://data.world/associatedpress/foreign-exchange-rates.

    • Owner: associatedpress

    • Dataset ID: foreign-exchange-rates

    • Query: SELECT * FROM quarterly

    powerbi_get-data.png
  4. Log in with either your API token (from your data.world advanced settings) or using OAuth sign-in.

Preview and refine your data.

Preview your data and use the query editor to apply any desired changes before loading data into Microsoft Power BI.

powerbi_edit-data.png
powerbi_chart.png
  1. When your data preview appears, click Edit to make any changes to your query. You can now make changes including removing rows & columns, merging queries, grouping data, and more.

  2. When you're satisfied with your query, select Close & Apply from the top left corner of your window.

  3. Start creating reports!

What next?

Here are a few things you can do with Microsoft Power BI and data.world:

  • Publish your report to the Power BI community, embed it into your website, or share it back to data.world using the Power BI report URL.

  • Configure a scheduled refresh for data.world queries so they're always up to date.

  • Customize your queries to make them ideal for analytics, including pivoting or un-pivoting the data, adding date calculations, and more.

Postgres Proxy

Postgres Proxy enables BI and data science tools that use PostgreSQL to connect directly to datasets and projects on data.world by making the resources look like PostgreSQL databases. Power BI "Direct Query", e.g., can be used with PostgreSQL databases and now also with data.world datasets and projects--with no additional integration required. Postgres Proxy combined with data.world's federated query capability allows you to create even more powerful analyses from a wide variety of data sources. This functionality is currently in beta release.

Configuring Postgres Proxy on data.world

To use Postgres Proxy on data.world, you first need to create a PostgreSQL connection in your BI tool for the dataset or project you want to analyze. Open your BI tool and create the connection with the following parameter values:

host: postgres.data.world

port: 5432

db: agentid/datasetid

user: {your data.world user id}

pass: {read/write token} 

Note

For single tenant customers, set host to postgres.{site}.data.world

You can find the agentid/datasetid in the URL for the resource:

agentid_datasetid.png

The read/write token is located in the Advanced settings in your profile. More information about your API tokens can be found here.
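These parameters map onto an ordinary libpq-style connection string, which clients such as psycopg2 accept. The following Python sketch only assembles the string; every value in it is a placeholder:

```python
# Sketch only: assemble the Postgres Proxy parameters into a
# libpq-style connection string. Every value below is a placeholder.
params = {
    "host": "postgres.data.world",
    "port": "5432",
    "dbname": "agentid/datasetid",      # from the resource URL
    "user": "your-data-world-user-id",
    "password": "your-read-write-token",
}

dsn = " ".join("{}={}".format(key, value) for key, value in params.items())
print(dsn)
# a client such as psycopg2 could then use: psycopg2.connect(dsn)
```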

PostgreSQL

Presto

Python

data.world has developed an open-source Python library for working with data.world datasets.

This library makes it easy for data.world users to pull and work with data stored on data.world. Additionally, the library provides convenient wrappers for data.world APIs, allowing users to create and update datasets, add and modify files, and even implement entire apps on top of data.world.

Installation

You can install the data.world Python library using pip directly from PyPI:

pip install datadotworld

Optionally, you can install the library including pandas support:

pip install datadotworld[pandas]

If you use conda to manage your Python distribution, you can install from the community-maintained conda-forge channel (https://conda-forge.github.io/):

conda install -c conda-forge datadotworld-py
Configuration

This library requires a data.world API authentication token to work.

Your authentication token can be obtained on data.world once you enable Python under Integrations > Python.

To configure the library, run the following command:

dw configure

Alternatively, tokens can be provided via the DW_AUTH_TOKEN environment variable. On macOS or Unix machines, run the following, replacing <YOUR_TOKEN> with the token obtained earlier:

export DW_AUTH_TOKEN=<YOUR_TOKEN>
Defining the SDK host

If you want to connect to the public data.world API server, use the default settings. However, if you want to connect to a private data.world environment (e.g., your_org.data.world), you can use environment variables to define the host:

from os import environ


def create_url(subdomain, environment):
    if environment:
        subdomain = subdomain + '.' + environment

    return 'https://{}.data.world'.format(subdomain)

DW_ENVIRONMENT = environ.get('DW_ENVIRONMENT', '')
API_HOST = environ.get('DW_API_HOST', create_url('api', DW_ENVIRONMENT))
DOWNLOAD_HOST = environ.get(
    'DW_DOWNLOAD_HOST', create_url('download', DW_ENVIRONMENT))
QUERY_HOST = environ.get('DW_QUERY_HOST', create_url('query', DW_ENVIRONMENT))
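As a quick check of the helper above (reproduced here so the snippet is self-contained), an empty environment yields the public hosts, while a single-tenant value such as your_org prefixes them:

```python
# The create_url helper shown above, reproduced for a quick check.
def create_url(subdomain, environment):
    if environment:
        subdomain = subdomain + '.' + environment
    return 'https://{}.data.world'.format(subdomain)

print(create_url('api', ''))          # https://api.data.world
print(create_url('api', 'your_org'))  # https://api.your_org.data.world
```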
Examples

The load_dataset() function facilitates maintaining copies of datasets on the local filesystem. It will download a given dataset's datapackage and store it under ~/.dw/cache. When used subsequently, load_dataset() will use the copy stored on disk and will work offline, unless it's called with force_update=True or auto_update=True. force_update=True will overwrite your local copy unconditionally. auto_update=True will only overwrite your local copy if a newer version of the dataset is available on data.world.

Once loaded, a dataset (data and metadata) can be conveniently accessed via the object returned by load_dataset().

Start by importing the datadotworld module:

import datadotworld as dw

Then, invoke the load_dataset() function, to download a dataset and work with it locally. For example:

intro_dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset')

Dataset objects allow access to data via three different properties raw_data, tables and dataframes. Each of these properties is a mapping (dict) whose values are of type bytes, list and pandas.DataFrame, respectively. Values are lazy loaded and cached once loaded. Their keys are the names of the files contained in the dataset.

For example:

>>> intro_dataset.dataframes
LazyLoadedDict({
    'changelog': LazyLoadedValue(<pandas.DataFrame>),
    'datadotworldbballstats': LazyLoadedValue(<pandas.DataFrame>),
    'datadotworldbballteam': LazyLoadedValue(<pandas.DataFrame>)})

IMPORTANT: Not all files in a dataset are tabular, therefore some will be exposed via raw_data only.

Tables are lists of rows, each represented by a mapping (dict) of column names to their respective values.

For example:

>>> stats_table = intro_dataset.tables['datadotworldbballstats']
>>> stats_table[0]
OrderedDict([('Name', 'Jon'),
             ('PointsPerGame', Decimal('20.4')),
             ('AssistsPerGame', Decimal('1.3'))])
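Because a table is just a list of dict-like rows, ordinary Python list and dict operations apply to it directly. A small sketch using hypothetical rows shaped like the output above (not the actual dataset):

```python
from decimal import Decimal

# Hypothetical rows, shaped like the OrderedDicts returned for a table.
rows = [
    {'Name': 'Jon', 'PointsPerGame': Decimal('20.4')},
    {'Name': 'Ann', 'PointsPerGame': Decimal('18.1')},
]

# Standard operations work: find the row with the highest PointsPerGame.
top = max(rows, key=lambda r: r['PointsPerGame'])
print(top['Name'])  # Jon
```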

You can also review the metadata associated with a file or the entire dataset, using the describe function. For example:

>>> intro_dataset.describe()
{'homepage': 'https://data.world/jonloyens/an-intro-to-dataworld-dataset',
 'name': 'jonloyens_an-intro-to-dataworld-dataset',
 'resources': [{'format': 'csv',
   'name': 'changelog',
   'path': 'data/ChangeLog.csv'},
  {'format': 'csv',
   'name': 'datadotworldbballstats',
   'path': 'data/DataDotWorldBBallStats.csv'},
  {'format': 'csv',
   'name': 'datadotworldbballteam',
   'path': 'data/DataDotWorldBBallTeam.csv'}]}
>>> intro_dataset.describe('datadotworldbballstats')
{'format': 'csv',
 'name': 'datadotworldbballstats',
 'path': 'data/DataDotWorldBBallStats.csv',
 'schema': {'fields': [{'name': 'Name', 'title': 'Name', 'type': 'string'},
                       {'name': 'PointsPerGame',
                        'title': 'PointsPerGame',
                        'type': 'number'},
                       {'name': 'AssistsPerGame',
                        'title': 'AssistsPerGame',
                        'type': 'number'}]}}
Standalone functions
load_dataset(dataset_key, force_update=False, auto_update=False)

Load a dataset from the local filesystem, downloading it from data.world first, if necessary.

This function returns an object of type LocalDataset. The object allows access to metadata via its describe() method and to all the data via three properties, raw_data, tables and dataframes, all of which are mappings (dict-like structures).

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id or of a url

  • force_update (bool) – Flag, indicating if a new copy of the dataset should be downloaded replacing any previously downloaded copy (Default value = False)

  • auto_update (bool) – Flag, indicating that the local copy should be updated to the latest version if a newer one is available on data.world (Default value = False)

Returns

The object representing the dataset

Return type

LocalDataset

Raises

RestApiError – If a server error occurs

open_remote_file(dataset_key, file_name, mode='w', **kwargs)

Open a remote file object that can be used to write to or read from a file in a data.world dataset

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • file_name (str) – The name of the file to open

  • mode (str, optional) – the mode for the file - must be ‘w’, ‘wb’, ‘r’, or ‘rb’ - indicating read/write (‘r’/’w’) and optionally “binary” handling of the file data. (Default value = ‘w’)

  • chunk_size (int, optional) – size of chunked bytes to return when reading streamed bytes in ‘rb’ mode

  • decode_unicode (bool, optional) – whether to decode textual responses as unicode when returning streamed lines in ‘r’ mode

  • **kwargs

Examples

>>> import datadotworld as dw
>>>
>>> # write a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt') as w:
...     w.write("this is a test.")
>>>
>>> # write a jsonlines file
>>> import json
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.jsonl') as w:
...     json.dump({'foo': 42, 'bar': "A"}, w)
...     w.write("\n")
...     json.dump({'foo': 13, 'bar': "B"}, w)
...     w.write("\n")
>>>
>>> # write a csv file
>>> import csv
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv') as w:
...     csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
...     csvw.writeheader()
...     csvw.writerow({'foo': 42, 'bar': "A"})
...     csvw.writerow({'foo': 13, 'bar': "B"})
>>>
>>> # write a pandas dataframe as a csv file
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
>>> with dw.open_remote_file('username/test-dataset',
...                          'dataframe.csv') as w:
...     df.to_csv(w, index=False)
>>>
>>> # write a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='wb') as w:
...     w.write(bytes([100, 97, 116, 97, 46, 119, 111, 114, 108, 100]))
>>>
>>> # read a text file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.txt', mode='r') as r:
...     print(r.read())
>>>
>>> # read a csv file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test.csv', mode='r') as r:
...     csvr = csv.DictReader(r)
...     for row in csvr:
...         print(row['column a'], row['column b'])
>>>
>>> # read a binary file
>>> with dw.open_remote_file('username/test-dataset',
...                          'test', mode='rb') as r:
...     bytes = r.read()
query(dataset_key, query, query_type='sql', parameters=None)

Query an existing dataset

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id or of a url

  • query (str) – SQL or SPARQL query

  • query_type ({'sql', 'sparql'}, optional) – The type of the query. Must be either ‘sql’ or ‘sparql’. (Default value = “sql”)

  • parameters (query parameters, optional) – Parameters to the query. For a SPARQL query, this should be a dict containing named parameters; for a SQL query, this should be a list containing positional parameters. Boolean values will be converted to xsd:boolean, integer values to xsd:integer, and other numeric values to xsd:decimal. Anything else is treated as a string literal (Default value = None)

Returns

Object containing the results of the query

Return type

QueryResults

Raises

RuntimeError – If a server error occurs
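The literal conversion rules described for the parameters argument can be sketched as follows. literal_type is an illustrative helper written for this example, not part of the datadotworld library:

```python
from decimal import Decimal

def literal_type(value):
    """Illustrative mapping of Python values to the documented literal types."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return 'xsd:boolean'
    if isinstance(value, int):
        return 'xsd:integer'
    if isinstance(value, (float, Decimal)):
        return 'xsd:decimal'
    return 'string literal'

print(literal_type(True))   # xsd:boolean
print(literal_type(42))     # xsd:integer
print(literal_type(2.5))    # xsd:decimal
print(literal_type('abc'))  # string literal
```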

API Client Methods

The following functions are all methods of the API client object returned by datadotworld.api_client()

add_files_via_url(dataset_key, files={})

Add or update dataset files linked to source URLs

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • files (dict) – Dict mapping file names to file metadata: the source URL to add or update, plus optional description and labels (Default value = {})

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> url = 'http://www.acme.inc/example.csv'
>>> api_client = dw.api_client()
>>> api_client.add_files_via_url(
...     'username/test-dataset',
...     {'example.csv': {
...         'url': url,
...         'labels': ['raw data'],
...         'description': 'file description'}})
add_linked_dataset(project_key, dataset_key)

Link project to an existing dataset

This method links a dataset to project

Parameters
  • project_key (str) – Project identifier, in the form of owner/id

  • dataset_key – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> linked_dataset = api_client.add_linked_dataset(
...     'username/test-project',
...     'username/test-dataset')
append_records(dataset_key, stream_id, body)

Append records to a stream.

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • stream_id (str) – Stream unique identifier.

  • body (obj) – Object body

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.append_records('username/test-dataset', 'streamId',
...     {'content': 'content'})
create_dataset(owner_id, **kwargs)

Create a new dataset

Parameters
  • owner_id (str) – Username of the owner of the new dataset

  • title (str) – Dataset title (will be used to generate dataset id on creation)

  • description (str, optional) – Dataset description

  • summary (str, optional) – Dataset summary markdown

  • tags (list, optional) – Dataset tags

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Dataset license

  • visibility ({'OPEN', 'PRIVATE'}) – Dataset visibility

  • files (dict, optional) – Dict mapping file names to source URLs, with optional description and labels properties

Returns

Newly created dataset key

Return type

str

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> url = 'http://www.acme.inc/example.csv'
>>> api_client.create_dataset(
...     'username', title='Test dataset', visibility='PRIVATE',
...     license='Public Domain',
...     files={'dataset.csv': {'url': url}})
create_insight(project_key, **kwargs)

Create a new insight

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • title (str) – Insight title

  • description (str, optional) – Insight description.

  • image_url (str) – If image-based, the URL of the image

  • embed_url (str) – If embed-based, the embeddable URL

  • source_link (str, optional) – Permalink to source code or platform this insight was generated with. Allows others to replicate the steps originally used to produce the insight.

  • data_source_links (array) – One or more permalinks to the data sources used to generate this insight. Allows others to access the data originally used to produce the insight.

Returns

Insight with message and uri object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.create_insight(
...     'projectOwner/projectid', title='Test insight',
...     image_url='url')
create_project(owner_id, **kwargs)

Create a new project

Parameters
  • owner_id (str) – Username of the creator of a project.

  • title (str) – Project title (will be used to generate project id on creation)

  • objective (str, optional) – Short project objective.

  • summary (str, optional) – Long-form project summary.

  • tags (list, optional) – Project tags (letters, numbers, and spaces)

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Project license

  • visibility ({'OPEN', 'PRIVATE'}) – Project visibility

  • files (dict, optional) – Dict mapping file names to source URLs, with optional description and labels properties

  • linked_datasets (list of object, optional) – Initial set of linked datasets.

Returns

Newly created project key

Return type

str

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.create_project(
...     'username', title='project testing',
...     visibility='PRIVATE',
...     linked_datasets=[{'owner': 'someuser',
...                       'id': 'somedataset'}])
delete_dataset(dataset_key)

Deletes a dataset and all associated data

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.delete_dataset(
...     'username/dataset')
delete_files(dataset_key, names)

Delete dataset file(s)

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • names (list of str) – The list of names for files to be deleted

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.delete_files(
...     'username/test-dataset', ['example.csv'])
delete_insight(project_key, insight_id)

Delete an existing insight.

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectId

  • insight_id (str) – Insight unique id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> del_insight = api_client.delete_insight(
...     'username/project', 'insightid')
delete_project(project_key)

Deletes a project and all associated data

Parameters

project_key (str) – Project identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.delete_project(
...     'username/test-project')
download_datapackage(dataset_key, dest_dir)

Download and unzip a dataset’s datapackage

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • dest_dir (str or path) – Directory under which datapackage should be saved

Returns

Location of the datapackage descriptor (datapackage.json) in the local filesystem

Return type

path

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> datapackage_descriptor = api_client.download_datapackage(
...     'jonloyens/an-intro-to-dataworld-dataset',
...     '/tmp/test')
>>> datapackage_descriptor
'/tmp/test/datapackage.json'
download_dataset(dataset_key)

Return a .zip containing all files within the dataset as uploaded.

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Returns

.zip file containing all files within the dataset

Return type

file object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.download_dataset(
...     'username/test-dataset')
download_file(dataset_key, file)

Return a file within the dataset as uploaded.

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • file (str) – File path to be returned

Returns

file in which the data was uploaded

Return type

file object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.download_file('username/test-dataset',
...     '/my/local/example.csv')
fetch_contributing_datasets(**kwargs)

Fetch datasets that the authenticated user has access to

Parameters
  • limit (str, optional) – Maximum number of items to include in a page of results

  • next (str, optional) – Token from previous result page (to be used when requesting a subsequent page)

  • sort (str, optional) – Property name to sort

Returns

Authenticated user dataset

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_dataset = api_client.fetch_contributing_datasets()
{'count': 0, 'records': [], 'next_page_token': None}
fetch_contributing_projects(**kwargs)

Fetch projects that the currently authenticated user has access to

Returns

Authenticated user projects

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_projects = api_client.fetch_contributing_projects()
{'count': 0, 'records': [], 'next_page_token': None}
fetch_datasets(**kwargs)

Fetch authenticated user owned datasets

Parameters
  • limit (str, optional) – Maximum number of items to include in a page of results

  • next (str, optional) – Token from previous result page (to be used when requesting a subsequent page)

  • sort (str, optional) – Property name to sort

Returns

Dataset definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_owned_dataset = api_client.fetch_datasets()
fetch_liked_datasets(**kwargs)

Fetch datasets that authenticated user likes

Parameters
  • limit (str, optional) – Maximum number of items to include in a page of results

  • next (str, optional) – Token from previous result page (to be used when requesting a subsequent page)

  • sort (str, optional) – Property name to sort

Returns

Dataset definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_liked_dataset = api_client.fetch_liked_datasets()
fetch_liked_projects(**kwargs)

Fetch projects that the currently authenticated user likes

Returns

Authenticated user projects

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_liked_projects = api_client.fetch_liked_projects()
fetch_projects(**kwargs)

Fetch projects that the currently authenticated user owns

Returns

Authenticated user projects

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_projects = api_client.fetch_projects()
get_dataset(dataset_key)

Retrieve an existing dataset definition

This method retrieves metadata about an existing dataset

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Returns

Dataset definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> intro_dataset = api_client.get_dataset(
...     'jonloyens/an-intro-to-dataworld-dataset')
>>> intro_dataset['title']
'An Intro to data.world Dataset'
get_insight(project_key, insight_id, **kwargs)

Retrieve an insight

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • insight_id (str) – Insight unique identifier.

Returns

Insight definition, with all attributes

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> insight = api_client.get_insight(
...     'jonloyens/'
...     'an-example-project-that-shows-what-to-put-in-data-world',
...     'c2538b0c-c200-474c-9631-5ff4f13026eb')
>>> insight['title']
'Coast Guard Lives Saved by Fiscal Year'
get_insights_for_project(project_key, **kwargs)

Get insights for a project.

Parameters

project_key (str) – Project identifier, in the form of projectOwner/projectid

Returns

Insight results

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> insights = api_client.get_insights_for_project(
...     'jonloyens/'
...     'an-example-project-that-shows-what-to-put-in-data-world')
get_project(project_key)

Retrieve an existing project

This method retrieves metadata about an existing project

Parameters

project_key (str) – Project identifier, in the form of owner/id

Returns

Project definition, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> intro_project = api_client.get_project(
...     'jonloyens/'
...     'an-example-project-that-shows-what-to-put-in-data-world')
>>> intro_project['title']
'An Example Project that Shows What To Put in data.world'
get_user_data()

Retrieve data for authenticated user

Returns

User data, with all attributes

Return type

dict

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> user_data = api_client.get_user_data()
>>> user_data['display_name']
'Name User'
remove_linked_dataset(project_key, dataset_key)

Unlink dataset

This method unlinks a dataset from a project

Parameters
  • project_key (str) – Project identifier, in the form of owner/id

  • dataset_key – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.remove_linked_dataset(
...     'username/test-project',
...     'username/test-dataset')
replace_dataset(dataset_key, **kwargs)

Replace an existing dataset

This method will completely overwrite an existing dataset.

Parameters
  • description (str, optional) – Dataset description

  • summary (str, optional) – Dataset summary markdown

  • tags (list, optional) – Dataset tags

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Dataset license

  • visibility ({'OPEN', 'PRIVATE'}) – Dataset visibility

  • files (dict, optional) – File names and source URLs to add or update

  • dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.replace_dataset(
...     'username/test-dataset',
...     visibility='PRIVATE', license='Public Domain',
...     description='A better description')
replace_insight(project_key, insight_id, **kwargs)

Replace an insight.

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • insight_id (str) – Insight unique identifier.

  • title (str) – Insight title

  • description (str, optional) – Insight description.

  • image_url (str) – If image-based, the URL of the image

  • embed_url (str) – If embed-based, the embeddable URL

  • source_link (str, optional) – Permalink to source code or platform this insight was generated with. Allows others to replicate the steps originally used to produce the insight.

  • data_source_links (array) – One or more permalinks to the data sources used to generate this insight. Allows others to access the data originally used to produce the insight.

Returns

message object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.replace_insight(
...     'projectOwner/projectid',
...     '1230-9324-3424242442',
...     embed_url='url',
...     title='Test insight')
replace_project(project_key, **kwargs)

Replace an existing Project

Create a project with a given id or completely rewrite the project, including any previously added files or linked datasets, if one already exists with the given id.

Parameters
  • project_key (str) – Username and unique identifier of the creator of a project in the form of owner/id.

  • title (str) – Project title

  • objective (str, optional) – Short project objective.

  • summary (str, optional) – Long-form project summary.

  • tags (list, optional) – Project tags (letters, numbers, and spaces)

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Project license

  • visibility ({'OPEN', 'PRIVATE'}) – Project visibility

  • files (dict, optional) – Dict mapping file names to source URLs, with optional description and labels properties

  • linked_datasets (list of object, optional) – Initial set of linked datasets.

Returns

project object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.replace_project(
...     'username/test-project',
...     visibility='PRIVATE',
...     objective='A better objective',
...     title='Replace project')
sparql(dataset_key, query, desired_mimetype='application/sparql-results+json', **kwargs)

Executes SPARQL queries against a dataset via POST

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • query (str) – SPARQL query

Returns

file object that can be used in file parsers and data handling modules.

Return type

file object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.sparql('username/test-dataset',
...     query)
sql(dataset_key, query, desired_mimetype='application/json', **kwargs)

Executes SQL queries against a dataset via POST

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • query (str) – SQL query

  • include_table_schema (bool) – Flag indicating whether to include the table schema in the response

Returns

file object that can be used in file parsers and data handling modules.

Return type

file-like object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.sql('username/test-dataset', 'query')
sync_files(dataset_key)

Trigger synchronization process to update all dataset files linked to source URLs.

Parameters

dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.sync_files('username/test-dataset')
update_dataset(dataset_key, **kwargs)

Update an existing dataset

Parameters
  • description (str, optional) – Dataset description

  • summary (str, optional) – Dataset summary markdown

  • tags (list, optional) – Dataset tags

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Dataset license

  • visibility ({'OPEN', 'PRIVATE'}, optional) – Dataset visibility

  • files (dict, optional) – File names and source URLs to add or update

  • dataset_key (str) – Dataset identifier, in the form of owner/id

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.update_dataset(
...     'username/test-dataset',
...     tags=['demo', 'datadotworld'])
update_insight(project_key, insight_id, **kwargs)

Update an insight.

Note that only elements included in the request will be updated. All omitted elements will remain untouched.

Parameters
  • project_key (str) – Project identifier, in the form of projectOwner/projectid

  • insight_id (str) – Insight unique identifier.

  • title (str) – Insight title

  • description (str, optional) – Insight description.

  • image_url (str) – If image-based, the URL of the image

  • embed_url (str) – If embed-based, the embeddable URL

  • source_link (str, optional) – Permalink to source code or platform this insight was generated with. Allows others to replicate the steps originally used to produce the insight.

  • data_source_links (array) – One or more permalinks to the data sources used to generate this insight. Allows others to access the data originally used to produce the insight.

Returns

message object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.update_insight(
...     'username/test-project', 'insightid',
...     title='demo datadotworld')
update_project(project_key, **kwargs)

Update an existing project

Parameters
  • project_key (str) – Username and unique identifier of the creator of a project in the form of owner/id.

  • title (str) – Project title

  • objective (str, optional) – Short project objective.

  • summary (str, optional) – Long-form project summary.

  • tags (list, optional) – Project tags (letters, numbers, and spaces)

  • license ({'Public Domain', 'PDDL', 'CC-0', 'CC-BY', 'ODC-BY', 'CC-BY-SA', 'ODC-ODbL', 'CC BY-NC', 'CC BY-NC-SA', 'Other'}) – Project license

  • visibility ({'OPEN', 'PRIVATE'}) – Project visibility

  • files (dict, optional) – Dict mapping file names to source URLs, with optional description and labels properties

  • linked_datasets (list of object, optional) – Initial set of linked datasets.

Returns

message object

Return type

object

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.update_project(
...     'username/test-project',
...     tags=['demo', 'datadotworld'])
upload_files(dataset_key, files, files_metadata={}, **kwargs)

Upload one or more dataset files

Parameters
  • dataset_key (str) – Dataset identifier, in the form of owner/id

  • files (list of str) – The list of names/paths for files stored in the local filesystem

  • expand_archives (bool, optional) – Flag indicating whether archive files should be expanded upon upload

  • files_metadata (dict, optional) – Dict mapping file names to file metadata: description, labels, and source URLs to add or update

Raises

RestApiException – If a server error occurs

Examples

>>> import datadotworld as dw
>>> api_client = dw.api_client()
>>> api_client.upload_files(
...     'username/test-dataset',
...     ['/my/local/example.csv'])

R DBI

The dwDBI package provides:

  1. A light DBI wrapper around the data.world API package. The benefit of this is that you can write SQL queries in RMarkdown chunks and evaluate them.

  2. Contracts that let you browse data.world datasets with the RStudio Connections panel.

Querying data.world dataset in RMarkdown Notebooks

First import the package.

library('dwDBI')

Make sure that you've configured your data.world API key.

dwapi::configure('YOUR API KEY HERE')

To run a SQL query, connect to a data.world dataset with the dw_connect() function.

sql101_conn <- dw_connect('ryantuck/sql-101-training')

In RStudio, you can write and run SQL code chunks by specifying a connection option.

```{sql, connection=sql101_conn}
... your query here ...
```

Running the SQL code chunk returns a data frame.

select *
from customers
order by `last`

| id | first     | last      |
|----|-----------|-----------|
| 14 | Margaret  | Atwood    |
| 2  | Jane      | Austen    |
| 12 | Charlotte | Brontë    |
| 20 | Emily     | Brontë    |
| 0  | Ernest    | Hemingway |
| 17 | Victor    | Hugo      |
| 10 | Franz     | Kafka     |
| 8  | Jack      | Kerouac   |
| 16 | Harper    | Lee       |
| 6  | Vladimir  | Nabokov   |

RStudio Connections

You can also explore the tables in data.world datasets in the RStudio Connections pane. Just use the "New Connection" button in the pane, select the "Data.World" connection type, and enter the name of the dataset you want to view.

R and R Studio

The data.world R package makes it easy to query datasets and use data.world's REST API.

Install and configure the data.world R package.

Using an R console (like R Studio), you can use the data.world R package to run queries, import datasets, upload data, and more.

Install the data.world package for R:

devtools::install_github("datadotworld/data.world-r", build_vignettes = TRUE)

Note

You will need to have the devtools package for R installed to run the previous command. If it is not already installed, you can install it from CRAN using the command install.packages("devtools")

R_1.png

Configure the package with your API token.

saved_cfg <- data.world::save_config("YOUR API TOKEN")
data.world::set_config(saved_cfg)

R_2.png

View the Quickstart vignette to view more information on the data.world package.

library(data.world)
vignette("quickstart", package = "data.world")

R_3.png
Import your first query.

Using the data.world package, import your first query:

  1. Choose your dataset and retrieve your SQL or SPARQL query.

    Note

    The below query includes specific query parameters. You can also simplify your query by using SELECT * FROM DataDotWorldBBallTeam

  2. intro_ds <- "https://data.world/jonloyens/an-intro-to-dataworld-dataset"
    
    sample_query <- data.world::qry_sql(paste0(
    "SELECT t.Name, t.Height, s.AssistsPerGame ",
    "FROM DataDotWorldBBallTeam as t ",
    "JOIN DataDotWorldBBallStats as s ON t.Name = s.Name ",
    "ORDER BY s.AssistsPerGame DESC"))
    
    data.world::query(sample_query, dataset = intro_ds)
    R_4.png
  3. Use the available functions to begin modeling your data.

What next?

Here are some things you can do with R and data.world:

  • Use the Addins menu to quickly add a new Insight to data.world from R Studio.

  • Use the included dwapi package to update datasets, upload files, list all tables in a project, and more.

Redshift

Reltio

Simple Editor

Intro to the Simple Data Documentation Editor

The Simple Data Documentation Editor (Simple Editor for short) is data.world's built-in WYSIWYG editor for project and dataset summaries, posts, and insights. The Simple Editor generates powerful data-enabled markdown and allows you to create data-rich documentation for your datasets and projects faster than ever before. Now you can:

  • Use drag and drop or autofill to embed or link resources in various formats from many different sources

  • Benefit from easy-to-use styling or standard Markdown

  • Insert a variety of content blocks

  • Use Markdown shortcuts while in the Simple Editor

  • Switch back and forth between the Simple Editor and Markdown

Accessing the command menu

When you place your cursor on a new line in a Simple Editor field, a list of shortcuts for basic embed and format options appears:

Screen_Shot_2019-04-16_at_11.40.17_AM.png

If your cursor is not at the beginning of the line, your content will be linked instead of embedded. More information about embedding vs. linking can be found in the article Embedding resources with the Simple Editor.

The forward slash (/) brings up the command menu which is covered in detail in the article Using the Simple Editor command menu.

Embedding links to other users and organizations

The at sign ( @ ) brings up a list of users and organizations with whom you've recently interacted so you can @mention them. If the person or org you want to tag is not in the list, you can begin typing the name for a list of options:

Screen_Shot_2019-04-15_at_4.56.30_PM.png

The @mention takes readers to the profile page of the user or organization who was tagged.

Formatting options and shortcuts

The last item on the initial prompt of commands for the Simple Editor is highlight text. It serves as a reminder that when you are in the Simple Editor you can highlight a section of text to get a menu of the formatting options built into the editor:

Simple_editor_highlight_menu.png

The options on the Highlight menu are Bold, Italics, code, header types 1-3, and links. In addition to formatting with the highlight text menu you can also use the basic keyboard shortcuts like ⌘+b for bold and ⌘+i for italics (Ctrl+b and Ctrl+i on Windows). The standard Markdown shortcuts (# Header1, ## Header2, **bold**, _italics_, etc.) can also be used in the Simple Editor window. For more information on Markdown styling see our Markdown syntax reference.

Switching between the Simple Editor and Markdown

Though the simplicity of the WYSIWYG interface of the Simple Editor will appeal to most users, those who prefer to use Markdown can switch to Markdown by selecting the Switch to Markdown link at the top of the edit window:

Screen_Shot_2019-05-06_at_5.40.12_PM.png

It is also possible to do some editing in the Simple Editor and continue editing in Markdown or vice versa. Both versions are maintained simultaneously:

Screen_Shot_2019-05-06_at_5.42.58_PM.png

More information about how the Simple Editor works can be found in the article Using the Simple Editor command menu.

Using the Simple Editor command menu

In the article Intro to the Simple Data Documentation Editor we introduced data.world's Simple Editor that allows you to quickly and easily add different kinds of files from various sources into project and dataset summaries, posts, and insights.

The command menu (accessed by typing forward slash ( / ) at the beginning of a line in the editor window) contains a list of items you can embed including recent files, queries, insights, and various content block options. If your cursor is not at the beginning of a line when you type the forward slash, your content will be linked, not embedded. More information about embedding vs. linking can be found in the article Embedding resources with the Simple Editor.

Recently updated resources

The first several items on the command menu are files, queries, insights, and other resources specific to your current dataset or project that you have recently modified:

Screen_Shot_2019-05-06_at_5.58.53_PM.png

If the item you wish to embed is not in the recent items list you can begin typing its name and autocomplete will take over:

Screen_Shot_2019-05-06_at_6.26.00_PM.png
Checklists

Creating a checklist is as easy as selecting the item on the menu and then entering all your list items separated by a return. Checklists format as items in a list and the boxes can be checked during the edit process. Once the summary or insight has been saved, the state of the checkboxes is set and can be changed by opening and editing the summary or insight again:

Screen_Shot_2019-03-29_at_4.32.37_PM.png
Code blocks

Selecting < > Code from the dropdown inserts a formatted code block where you can type or paste a code snippet:

Screen_Shot_2019-03-29_at_4.38.36_PM.png
Embed Content

The </> Embed Content menu item inserts the formatting required for a URL, so all you need to do is paste the URL into it; the content will be embedded once the document is saved:

Screen_Shot_2019-04-16_at_11.37.15_AM.png

When adding embedded content, the embed link must be the only text on its line. You cannot, for instance, embed content in the middle of a sentence.

More information on embedding and linking content can be found in the article Embedding resources with the Simple Editor.

Embedding resources with the Simple Editor

With data.world's Simple Editor you can quickly and easily add different kinds of files from various sources into project and dataset summaries, posts, and insights. The articles Intro to the Simple Data Documentation Editor and Using the Simple Editor command menu describe the editor and the features of the command menu.

Using drag and drop to embed resources

In addition to using the command menu, you can also embed resources into a Simple Editor window with drag and drop. Drag and drop works with resources located in the current project or dataset, on your local drive, on a networked or cloud drive, or on a web page. To embed a resource by dragging and dropping, drag it to the beginning of a new line in the Simple Editor window. Resources placed in-line render as links rather than embedded content.

Various sources for embedded resources

The command menu is useful for embedding resources from the current dataset or project into the editor window, but resources in connected datasets or located outside of data.world must be embedded with drag and drop. Sources for resources that can be embedded include:

  • The current project or dataset

  • Desktop or cloud storage

  • From a URL

The current project or dataset

While recently modified files in the current project will show up in the command menu, files in linked datasets do not. Drag and drop works for both the current project or dataset files and also for the connected dataset files. To include any resource from the left sidebar of the workspace, select it and drag it to the beginning of a new line in the editor window:

Screen_Shot_2019-05-07_at_11.38.41_AM.png

The files in your project or dataset that can be viewed in the workspace typically render in full or as a preview in the Simple Editor window. PDFs are the current exception to this rule and can only be embedded as links. Tabular files are displayed as a preview with a link to the full file. Text files (.txt, but not rich-text .rtf files) and most image files are also previewed in the editor. See the article Supported file types for detailed information on the rendering of specific file types.

Queries and insights are both rendered as previews in the Simple Editor. Notice that for queries there is also an option to run the query in the bottom right corner of the window:

Screen_Shot_2019-03-29_at_3.20.56_PM.png

If you select the Run query option you'll be taken to a new tab in the workspace which will display the full results of the query (or the maximum number allowed for larger tables):

Screen_Shot_2019-03-29_at_3.24.56_PM.png

To get back to the summary window click on its tab at the top of the workspace:

Screen_Shot_2019-03-29_at_3.37.07_PM.png

Parameterized queries can also be embedded into the Simple Editor where they will be executable by the reader. The query is previewed in the editor window with all the parameter fields showing the default values. The reader can replace the defaults with other values and interactively run the query without ever leaving the summary, post, or insight. Running the query returns a preview of the first five rows of the query results. Clicking See all runs the query in a workspace and returns all the results from the parameters set by the reader (up to 10,000):

Screen_Shot_2019-05-06_at_12.49.37_PM.png

Desktop or cloud storage

Embedding files from your desktop or a cloud storage service like iCloud, Dropbox, or Box is easy with the editor's drag and drop interface. Simply choose the file and drag it to the beginning of a line in the Simple Editor window. Non-image files will be added to the dataset or project:

drag_add__file.jpg

You will then need to embed the uploaded file into the editor window either with the command menu or by dragging and dropping it. If the file you choose is a supported image file, the image will be added to the editor window without being added to the dataset or project as a new file.

From a URL

Another source for embeddable content is a webpage. URLs can be embedded at the beginning of a line in a Simple Editor window, and they will show up as either a preview of the webpage or as a link to it, depending on how the source page was configured by its author:

Screen_Shot_2019-05-06_at_1.42.33_PM.png

Singer

A Singer target that writes data to data.world

How to use it

target-datadotworld works together with any other Singer Tap to store data extracted from sources like Salesforce, HubSpot, Marketo, MySQL, and more on data.world.

Install and Run

First, make sure Python 3.6 is installed on your system.

target-datadotworld can be run with any Singer Tap, but as an example we'll use tap-fixerio, which pulls currency exchange rate data.

These commands will install tap-fixerio and target-datadotworld with pip, and then run them together, piping the output of tap-fixerio to target-datadotworld:

$ pip install target-datadotworld tap-fixerio
$ tap-fixerio | target-datadotworld -c config.json
INFO Replicating the latest exchange rate data from fixer.io
INFO Tap exiting normally

The data will be written to the dataset specified in config.json, in this case under a stream named exchange-rates.

If you're using a different Tap, substitute tap-fixerio in the final command above with the command used to run your Tap.

Configuration

target-datadotworld requires a configuration file that stores your data.world API token and dataset information.

The following attributes are required:

  • api_token: Your data.world API token

  • dataset_id: The ID of the dataset where the data is to be stored. Must only contain lowercase letters, numbers, and dashes.

Additionally, the following optional attributes can be provided.

  • dataset_owner: If not the same as the owner of the API token (e.g. if the dataset is to be accessed/created under an organization account, as opposed to the user's own)

Example:

{
    "api_token": "your_token",
    "dataset_id": "fixerio-data",
    "dataset_owner": "my-company"
}
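Before running the target, the constraints above can be checked with a short script. This is only a sketch, not part of target-datadotworld; the field names match the attributes listed above, and the dataset_id check enforces the lowercase-letters, numbers, and dashes rule:

```python
import json
import re

# dataset_id must contain only lowercase letters, numbers, and dashes
DATASET_ID_PATTERN = re.compile(r"^[a-z0-9-]+$")

def validate_config(config: dict) -> list:
    """Return a list of problems found in a target-datadotworld config."""
    problems = []
    if not config.get("api_token"):
        problems.append("api_token is required")
    dataset_id = config.get("dataset_id", "")
    if not dataset_id:
        problems.append("dataset_id is required")
    elif not DATASET_ID_PATTERN.match(dataset_id):
        problems.append(
            "dataset_id may only contain lowercase letters, numbers, and dashes"
        )
    return problems

config = json.loads("""
{
    "api_token": "your_token",
    "dataset_id": "fixerio-data",
    "dataset_owner": "my-company"
}
""")
print(validate_config(config))  # [] -> configuration looks valid
```

An empty list means the required attributes are present and the dataset_id is well formed; anything else is a description of what to fix before launching the pipeline.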

Slack

Slack is a powerful communication tool for teams, and when paired with data.world it dramatically extends your control over your information flow. Integrating data.world with Slack enables you to get real-time updates in Slack whenever changes are made to data.world accounts, datasets, and projects--keeping you and your team constantly in the loop.

With Slack and data.world you can:

  • Easily subscribe to projects, datasets, and accounts to receive notifications when updates are made.

  • Share rich messages when you link to data.world so your teammates won’t be left in the dark.

  • Preview query details and easily link to a specific query with one click.

  • Quickly view your subscriptions, unsubscribe, and explore further through inline commands.

In the next sections we'll show you how to do all of these things and more!

data.world's Slack integration is not created by, affiliated with, or supported by Slack Technologies, Inc.

Installation

To install the data.world Slack integration, select the Add to Slack button at the top of the Slack Integrations page:

slack-integrations-page-1.png

You'll be prompted to either enter your Slack workspace URL (you must be an administrator for the workspace to install the app) or to create a new workspace:

Screen_Shot_2018-09-13_at_2.51.31_PM.png

After you enter your URL and select Continue you'll be taken to the final set-up screen that tells you how data.world will interact with your workspace:

Screen_Shot_2018-09-13_at_2.57.19_PM.png

Once the data.world app has been installed on Slack it will show up on the bottom left of your Slack workspace in the Apps section. The first message from the app will contain a welcome message with instructions on how to use the app and where to get help:

Screen_Shot_2018-09-13_at_3.23.32_PM.png
Subscription

In the example above we have a channel called crm for our customer relationship management team. If they want to get updates whenever changes are made to the crm datasets, we need to set up their channel as follows:

  • Add the data.world app to the channel - this needs to be done for every channel in the workspace that wants to use the app.

  • Subscribe the channel to the projects, datasets, and accounts for which the members should get updates.

To add the data.world app to the channel, select the channel from the list in the left sidebar and enter the command /invite @data.world:

Screen_Shot_2018-09-13_at_3.24.00_PM.png

Once the app has been added to the channel you can set up subscriptions to projects, datasets, and accounts. To subscribe the crm channel to CRM Project, enter the command /data.world subscribe https://data.world/siyeh/crm-project into the command line of the channel:

Screen_Shot_2018-09-13_at_3.37.45_PM.png
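The subscribe command takes the full data.world URL of the resource. As a purely illustrative sketch (this helper is not part of the integration), here is how such a URL breaks down into an owner account and a resource id:

```python
from urllib.parse import urlparse

def parse_subscription_url(url: str):
    """Split a data.world URL like https://data.world/siyeh/crm-project
    into (owner, resource_id). Illustrative only."""
    parts = urlparse(url)
    if parts.netloc != "data.world":
        raise ValueError("not a data.world URL")
    owner, _, resource = parts.path.strip("/").partition("/")
    if not owner or not resource:
        raise ValueError("expected https://data.world/<owner>/<resource>")
    return owner, resource

print(parse_subscription_url("https://data.world/siyeh/crm-project"))
# ('siyeh', 'crm-project')
```

In the example command above, siyeh is the account that owns the resource and crm-project is the project being subscribed to.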
Notification

After a channel has been subscribed, whenever an update is made to the subscribed item (project, dataset, or account) everyone in the channel will automatically get a notification of the update with a link to it:

Screen_Shot_2018-09-13_at_4.00.23_PM.png

In the example above the file can be commented on in the Slack channel or on data.world by selecting the Discuss button at the bottom of the message:

Screen_Shot_2018-09-13_at_4.07.08_PM.png
Rich messages

With the data.world app for Slack installed whenever you reference something on data.world in your channel you get a rich text preview with embedded links and the option of subscribing to the item (if a project, dataset, or account) from the message:

Screen_Shot_2018-09-13_at_4.47.48_PM.png
Preview queries

Another powerful feature of the data.world app is that you can paste the URL of a data.world query into Slack and it is rendered as a preview with links, so others can access the query directly without having to hunt through the project or dataset to find it:

Screen_Shot_2018-09-13_at_4.56.19_PM.png
View and manage subscriptions

Managing your subscriptions and getting help for commands is easy too! To see what a particular channel is subscribed to, enter the command /data.world list into the channel dialog and you will get back a list of all the projects, datasets, organizations, and users the channel is subscribed to with links to each of them and the name of the person who subscribed to them:

Screen_Shot_2018-09-13_at_5.08.53_PM.png

The correct command format for subscribing is stored in Slack in the first message from the data.world app, so you'll always have it readily at hand. The syntax for all commands can also be accessed by typing /data.world help:

Screen_Shot_2018-09-13_at_5.20.46_PM.png

Snowflake

SQL Anywhere

Stitch

Connect to all your data sources and replicate your data on data.world with Stitch's best-in-class ETL service.

Choose your data source.

With Stitch's interface, you can easily choose data that you'll replicate to data.world:

stitch_add-integration.png
  1. Create a free Stitch account and log in.

  2. Click Add Integration and choose your desired data source.

Connect data.world.

After choosing your data source, you can quickly choose and configure data.world as a destination for your data.

stitch_data-world.png
stitch_auth-dw.png
documentation
  1. Select data.world as the destination for your data.

  2. Sign in with your data.world credentials.

  3. Start replicating data!

What next?

Here are a few things you can do with Stitch and data.world:

  • Choose multiple data sources to replicate all your data to data.world.

  • Customize the replication frequency for each data source, from every 30 minutes to every day.

  • Add team members to your account so they can view and collaborate.

Tableau Desktop Connector

Easily pull data directly into Tableau to create your visualizations using data.world's Tableau Desktop connector. By using the connector as your data source, Tableau can even keep the data in sync with data.world to ensure your visualizations are using the most accurate data.

Enable Tableau Desktop or Tableau Public on data.world

If you're connecting Tableau Desktop or Tableau Public to data.world for the first time:

  • Click the Integrations link at the bottom of the left sidebar in data.world

    integrations01.png
  • Filter or scroll down to find the Tableau Desktop/Public tile and click on it

  • Click the Enable Integration button at the top of the Tableau Desktop/Public integration page

Alternatively, you can enable the integration by following this link to go straight to the Tableau Desktop/Public integration page.

Add a new data source in Tableau

To use the connector from Tableau Desktop or Tableau Public v.10+, select the Web Data Connector option under Connect > To a Server within your Tableau application.

In the window that opens, we're going to enter the web data connector URL.

The URL you enter here will depend on what type of import you would like to make. You have two options:

  1. Import all tables from a dataset or project

  2. Import a single table, file, or query result

Import an entire dataset
  1. Enter tableau.data.world as the Web Data Connector URL.

  2. Press Enter

  3. Login with your data.world credentials if prompted

  4. Enter the data.world dataset URL or click Browse to scroll through your datasets or search the data.world community (note that search results are limited to 30 in the Web Data Connector)

  5. Click Select next to the dataset you'd like to import

  6. Click Connect to confirm your selection

  7. Depending on the size of the dataset, this import can take from a few seconds to a few minutes

Import a single file, table, or query result

Go to data.world and navigate to the file or query you would like to open in Tableau. To the right of the Download link, click Open with Tableau (note: if you have multiple integrations enabled, you may have to click the dropdown menu here to choose Tableau).

tableau01.png

Copy the URL from the new window that opens:

tableau04.png

Returning to Tableau, paste the connector URL into the input box. Note that when you enter the URL, the input box disappears and the URL moves to the URL field, so you'll just need to press Enter on your keyboard to load the next page.

tableau05.png

By using the connector URL from the Open with Tableau window, the data.world web data connector will be pre-populated with your dataset URL or query details:

115018236988-mceclip5.png

If you'd like to bring in another dataset, you can go to Data > New Data Source and repeat the steps above.

Posting a Tableau visualization on data.world

Once you've created and published your visualization to Tableau Public, use the published URL to share it back on data.world as an Insight in a Data Project, as a discussion post, or within your dataset or Data Project summary using the Markdown syntax @(https://public.tableau.com/shared/url):

115003214833-mceclip0.png
Adding data.world to the Tableau Server safe list

In order to publish a Web Data Connector to Tableau Server to allow for data refreshes, you will need to add data.world to the safe list/allowlist. For general information on this, please see Tableau's documentation.

For Tableau Server 2018.1 and above, run the following commands:

tsm data-access web-data-connectors add --name 'DW WDC' --url https://tableau.data.world:443 --secondary https://api.data.world/*,https://query.data.world/*,https://data.world/oauth/*
tsm data-access web-data-connectors allow
tsm restart

As always, contact us if you have any feedback or run into any unexpected issues with the connector.

Tableau Server

Tableau

When the DWCC is run against Tableau Server, the following metadata is collected:

  • Workbook name

  • Dashboard name

  • Dashboard title

  • Project a dashboard is in

  • Non-dashboard views

  • Number of dashboard views

  • Tags for objects that have them

  • Relationships between views/dashboards and workbooks

  • Number of dashboard favorites

Vertica