Live data connection

When you connect a database to data.world using the Connection manager or an integration from the Integrations gallery, your data continues to live at its source location and is not stored in data.world. This configuration is frequently referred to as data virtualization.

The Connection manager is the best way to create a virtual connection that will be owned by an organization. If you need a connection that you will own personally, you will need to create it from the Integrations gallery. See Connection permissions for more information about choosing an owner for a connection.

One benefit of data virtualization is that you can view and query data on data.world that would otherwise exceed its dataset size limits. It also ensures that you always have access to your most current data, without having to schedule synchronizations or wait for the processing time needed to import or refresh the data.

When you query a live table using data.world, our system will translate your query from our native SQL dialect into the SQL dialect of the target system. That system will then execute the query on its own hardware and return the results to data.world for display. Another benefit of virtualization is that it makes managing permissions and access to the data easier.

Please be aware that cloud database providers frequently charge either by the amount of time that queries run on their systems or by the total amount of data scanned during the query. If this describes your database service, then executing queries against live tables in data.world will also incur charges on those systems.
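
To make the cost point concrete, here is a minimal sketch of the arithmetic behind scan-based billing. The function name, the per-terabyte rate, and the choice of a binary terabyte are all hypothetical placeholders, not any provider's actual pricing; check your provider's pricing page for real figures.

```python
# Assumption: the provider bills per binary terabyte scanned.
BYTES_PER_TB = 1024 ** 4


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Estimate the charge for one query billed by the amount of data scanned."""
    if bytes_scanned < 0 or price_per_tb < 0:
        raise ValueError("bytes_scanned and price_per_tb must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # Hypothetical query: half a terabyte scanned at a made-up rate of 5.00 per TB.
    print(estimate_scan_cost(BYTES_PER_TB // 2, price_per_tb=5.00))
```

Because every query against a live table runs on the source system, an estimate like this applies to each query issued from data.world, not only to queries run directly against the database.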

Create a dedicated IAM role for Athena connections

To configure a virtual connection to Athena you will need to create a dedicated IAM role in your Amazon Web Services (AWS) console and enter the AWS Amazon Resource Name (ARN) for it in the Add a new connection dialog. To create the role, however, you will need to first get the AWS External ID from the bottom of the connection dialog. Follow the steps below to create the AWS role and the connection to Athena.

  1. Open the configuration screen as described above.

  2. Copy the External ID and do not close the dialog.

    Warning

    Leave the Add a new connection dialog open while you go to the AWS console and create the role needed for the connection. A new external ID is generated every time you open the dialog to create a new connection, so if you close it, the ID you copied will no longer match.

  3. Go to the AWS console and select Create role.

    AWS_screen_1.png
  4. Use the following parameters for the role:

    • Select type of trusted entity - Another AWS account

    • Account ID - 465428570792

    • Require external ID - checked

    • External ID - The value copied from the Add a new connection dialog in data.world

  5. Select Next: Permissions.

    AWS_screen_2.png
  6. Use the search bar to find the following two policies and add them:

    • AmazonAthenaFullAccess

    • AmazonS3FullAccess

    Note

    You can be more fine-grained about exactly which buckets you allow data.world to access. data.world needs write access only on the S3 output bucket location configured earlier. For the buckets that back your tables, the minimum permissions required to query the data are sufficient.

  7. Select Next: Tags and add any tags you would like.

  8. Select Next: Review.

    AWS_Screen_3.png
  9. Name the role, write a description, verify that the two policies shown above are present, and select Create role.

  10. Find the role you have just created:

    AEWS_find_role.png
  11. Copy its ARN, and paste the ARN into the dialog window you left open for adding a new Athena connection.

    AWS_role_permissions.png
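
For reference, the role parameters in step 4 correspond to an IAM trust policy, and the note in step 6 corresponds to a narrower S3 permissions policy. The sketch below builds both as JSON documents. The function names, bucket names, and the exact list of S3 actions are illustrative assumptions, not values published by data.world; verify the actions against the AWS documentation for Athena before using them.

```python
import json
from typing import List

# Account ID taken from the role parameters in step 4.
DATA_WORLD_ACCOUNT_ID = "465428570792"


def build_trust_policy(external_id: str) -> dict:
    """Trust policy: the listed account may assume the role only with this external ID."""
    if not external_id:
        raise ValueError("external_id must be the value copied from the dialog")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATA_WORLD_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def _bucket_resources(bucket: str) -> List[str]:
    """ARNs for a bucket and for every object inside it."""
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]


def build_scoped_s3_policy(output_bucket: str, table_buckets: List[str]) -> dict:
    """Narrower alternative to AmazonS3FullAccess (action lists are an assumption)."""
    read_actions = ["s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket"]
    write_actions = read_actions + ["s3:PutObject", "s3:AbortMultipartUpload"]
    statements = [
        {
            "Sid": "QueryResultsReadWrite",
            "Effect": "Allow",
            "Action": write_actions,
            "Resource": _bucket_resources(output_bucket),
        }
    ]
    if table_buckets:
        table_resources = [arn for b in table_buckets for arn in _bucket_resources(b)]
        statements.append(
            {
                "Sid": "TableDataReadOnly",
                "Effect": "Allow",
                "Action": read_actions,
                "Resource": table_resources,
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder values: substitute the external ID from the dialog and your own buckets.
    print(json.dumps(build_trust_policy("EXAMPLE-EXTERNAL-ID"), indent=2))
    print(json.dumps(build_scoped_s3_policy("example-athena-results", ["example-table-data"]), indent=2))
```

Selecting Another AWS account with Require external ID checked in the console produces a trust policy of this general shape, so the printed document can be compared with the Trust relationships tab of the role you created.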

Test configuration

Enter all the parameters into the configuration window and select Test configuration to make sure it works. If it does, select Configure to save it. You can now use this connection any time you add data.

Add_data.png
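
Before pasting the role ARN into the configuration window, it can help to confirm that the copied value is well formed. The following is a minimal sketch that checks a string against the general shape of an IAM role ARN. The function name and the example ARN are hypothetical, and the check covers only the format; it does not confirm that the role exists or that data.world can assume it.

```python
import re

# General shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<path and name>
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)"
)


def parse_role_arn(arn: str) -> dict:
    """Split a role ARN into its parts, or raise ValueError if it is malformed."""
    match = ROLE_ARN_PATTERN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not a well-formed IAM role ARN: {arn!r}")
    partition, account_id, role = match.groups()
    return {"partition": partition, "account_id": account_id, "role": role}


if __name__ == "__main__":
    # Placeholder ARN using a documentation-style account number and a made-up role name.
    print(parse_role_arn("arn:aws:iam::123456789012:role/example-athena-role"))
```

A check like this catches common copy errors, such as pasting the role name instead of the full ARN, before you select Test configuration.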