
Configurations for virtual connections

Virtual connections to your data can be created through the Connection manager or, for database connectors, from the Integrations gallery. The Connection manager is the best way to create a connection that will be owned by an organization. If you need a connection that you own personally, create it from the Integrations gallery. See the article on Connection permissions for more information about choosing an owner for a connection.

Athena virtual data connection
  • S3 Output Bucket Location - The Amazon S3 bucket where query results should be stored. The location should start with s3://. For example, to store results in a folder named "test-folder-1" inside an S3 bucket named "query-results-bucket", you would set the location to s3://query-results-bucket/test-folder-1

  • Workgroup - If your Athena instance is configured with multiple workgroups, you can assign your connection to a workgroup here.

  • AWS ARN - The Amazon Resource Name (ARN) of a dedicated Identity and Access Management (IAM) role created specifically for data.world. This role must be created before you can configure a connection to Athena. See Create a dedicated IAM role for Athena connections for more information.

  • AWS external id - Provided in the "Add a new Athena connection" dialog.
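To illustrate the S3 Output Bucket Location format described above, the following Python sketch splits a location into its bucket and folder parts and rejects values that do not start with s3://. The function name parse_s3_output_location is hypothetical and is not part of data.world or AWS tooling. The check covers only the format, not whether the bucket exists.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_output_location(location: str) -> Tuple[str, str]:
    """Split an S3 output location into (bucket, folder prefix).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(location)
    # The location must use the s3:// scheme and name a bucket.
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"location must start with s3://<bucket>: {location!r}")
    # Strip slashes so "/test-folder-1/" and "/test-folder-1" compare equal.
    return parsed.netloc, parsed.path.strip("/")


if __name__ == "__main__":
    bucket, prefix = parse_s3_output_location(
        "s3://query-results-bucket/test-folder-1"
    )
    print(bucket, prefix)
```

Running the sketch on the example location from the list above yields the bucket "query-results-bucket" and the folder "test-folder-1".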

Note

Before configuring a virtual connection to Athena, you need to set up an IAM role in the AWS console.

Athena_configuration_IG.png
Create a dedicated IAM role for Athena connections

To configure a virtual connection to Athena, you need to create a dedicated IAM role in your Amazon Web Services (AWS) console and enter its Amazon Resource Name (ARN) in the Add a new connection dialog. Creating the role, however, requires the AWS External ID shown at the bottom of that dialog, so you must open the dialog first. Follow the steps below to create the AWS role and the connection to Athena.

  1. Open the configuration screen as described above.

  2. Copy the External ID and do not close the dialog.

    Warning

    Leave the Add a new connection dialog open while you create the role in the AWS console. A new external ID is generated every time the dialog is opened, so an external ID copied from a dialog that has since been closed will no longer match.

  3. Go to the AWS console and select Create role.

    AWS_screen_1.png
  4. Use the following parameters for the role:

    • Select type of trusted entity - Another AWS account

    • Account ID - 465428570792

    • Require external ID - checked

    • External ID - The value copied from the Add a new connection dialog in data.world

  5. Select Next: Permissions.

    AWS_screen_2.png
  6. Use the search bar to find the following two policies and add them:

    • AmazonAthenaFullAccess

    • AmazonS3FullAccess

    Note

    You can restrict access to specific buckets instead of granting full S3 access. data.world needs write access only on the S3 output bucket location configured earlier. For the buckets that back your tables, the minimum permissions required to query the data are sufficient.

  7. Select Next: Tags and add any tags you would like.

  8. Select Next: Review.

    AWS_Screen_3.png
  9. Name the role, write a description, verify that the two policies shown above are present, and select Create role.

  10. Find the role you have just created:

    AEWS_find_role.png
  11. Copy its ARN, and paste the ARN into the dialog window you left open for adding a new Athena connection.

    AWS_role_permissions.png
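The trusted-entity parameters in step 4 correspond to a trust policy attached to the role. The Python sketch below builds a policy document of the general shape AWS uses for cross-account roles that require an external ID. It assumes the standard IAM trust policy format. The function name build_trust_policy and the external ID value shown are placeholders, so use the value copied from your own dialog.

```python
import json

# Account ID listed in step 4 of the procedure above.
DATA_WORLD_ACCOUNT_ID = "465428570792"


def build_trust_policy(external_id: str) -> dict:
    """Return a trust policy allowing the account above to assume the role.

    Hypothetical helper; the AWS console produces an equivalent document
    when "Another AWS account" and "Require external ID" are selected.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{DATA_WORLD_ACCOUNT_ID}:root"
                },
                "Action": "sts:AssumeRole",
                # The role can only be assumed when this external ID is supplied.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # "EXAMPLE-EXTERNAL-ID" is a placeholder, not a real external ID.
    print(json.dumps(build_trust_policy("EXAMPLE-EXTERNAL-ID"), indent=2))
```

Comparing this output with the Trust relationships tab of the role you created is one way to confirm that the account ID and external ID were entered as intended.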
Azure Synapse virtual data connection
Azure_Synapse_config_IG.png
BigQuery virtual data connection
  • Project ID - The unique identifier for your BigQuery project

  • Service account username - A Google account that is associated with your Google project, as opposed to a specific user

  • Service account key file - Provides the authentication information used in the connection configuration. This file will be uploaded when you enter the other configuration information into the dialog
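Before uploading a key file, it can help to confirm that it is a service account key and to see which project and account it belongs to. The Python sketch below assumes the key file is in the JSON format that Google Cloud issues for service accounts, with type, project_id, client_email, and private_key fields. The function name summarize_service_account_key is hypothetical, and whether these values must match what you enter in the dialog is an assumption to verify against your own setup.

```python
import json

# Fields assumed to be present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def summarize_service_account_key(key_text: str) -> dict:
    """Validate key file text and return its project ID and account email.

    Hypothetical helper for illustration; it never returns the private key.
    """
    key = json.loads(key_text)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return {
        "project_id": key["project_id"],
        "client_email": key["client_email"],
    }


if __name__ == "__main__":
    # Made-up sample content; a real key file is downloaded from Google Cloud.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "example-project",
        "client_email": "reader@example-project.iam.gserviceaccount.com",
        "private_key": "PLACEHOLDER",
    })
    print(summarize_service_account_key(sample))
```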

BigQuery_config_IG.png
MS SQL Server virtual data connection
SQL_Server_config_IG.png
MySQL virtual data connection
My_SQL_config_IG.png
Oracle Database virtual data connection
Oracle_db_config_IG.png
PostgreSQL virtual data connection
PostgreSQL_config_IG.png
Redshift virtual data connection
Redshift_config_IG.png
Snowflake virtual data connection
Snowflake roles, warehouses, and privileges

The Snowflake user specified in the connection must have a default Warehouse set in Snowflake.

All queries run against Snowflake with this connection will use this Warehouse for their compute power.

If the Snowflake user specified in the connection does not have a default Role set in Snowflake, the connection will use the PUBLIC role, which may limit privileges to access and query data.

If a default Role is set for the Snowflake user, that Role will be used by the data.world connection.

In order to create a virtualized connection to a table or view, the user must have USAGE privileges on the database and schema and SELECT privileges on the table or view.

If a default database is specified in the data.world connection modal, it must be specified using all UPPER CASE letters.
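The requirements above can be sketched as a set of Snowflake SQL statements that an administrator might run. The Python helper below only generates the statement text and does not connect to Snowflake. The function name snowflake_setup_statements and every user, role, warehouse, and object name in the usage example are hypothetical placeholders. The database name is upper-cased to mirror the upper-case requirement for the data.world connection modal.

```python
import re

# Accept only simple unquoted identifiers; quoted identifiers are out of scope.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def snowflake_setup_statements(user, role, warehouse, database, schema,
                               name, object_type="TABLE"):
    """Return SQL text covering the defaults and privileges described above.

    Hypothetical helper for illustration; review the SQL before running it.
    """
    if object_type not in ("TABLE", "VIEW"):
        raise ValueError("object_type must be 'TABLE' or 'VIEW'")
    for value in (user, role, warehouse, database, schema, name):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"not a simple identifier: {value!r}")
    db = database.upper()  # matches the upper-case form expected by the modal
    return [
        f"ALTER USER {user} SET DEFAULT_WAREHOUSE = {warehouse};",
        f"ALTER USER {user} SET DEFAULT_ROLE = {role};",
        f"GRANT USAGE ON DATABASE {db} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {db}.{schema} TO ROLE {role};",
        f"GRANT SELECT ON {object_type} {db}.{schema}.{name} TO ROLE {role};",
    ]


if __name__ == "__main__":
    # All names below are placeholders.
    for statement in snowflake_setup_statements(
        "DW_USER", "DW_ROLE", "DW_WH", "analytics", "SALES", "ORDERS"
    ):
        print(statement)
```

Generating the statements as text keeps the sketch free of any driver dependency; the output can be pasted into a Snowflake worksheet after review.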

Snowflake_config_IG.png