Preparing to run the Snowflake collector

Setting up prerequisites for running the collector

Make sure that the machine from which you run the collector meets the following hardware and software requirements.

Table 1.

Item	Requirement
Hardware (for on-premises runs only)	Note: These specifications assume that one collector process runs at a time. If you run multiple collectors at the same time, adjust the hardware accordingly.
RAM	8 GB
CPU	2 GHz processor
Software (for on-premises runs only)	Docker or Java Runtime Environment
Docker	Install Docker from the Docker website.
Java Runtime Environment	OpenJDK 17 is supported.
data.world-specific objects (for both cloud and on-premises runs)
Dataset	You must have a ddw-catalogs dataset set up to hold your catalog files after you run the collector. If you are using Catalog Toolkit, follow the Catalog Toolkit instructions to prepare the datasets for collectors.
Network connection
Allowlist IPs and domains	Follow the network configuration instructions to allowlist the required IPs and domains, and use the network connection tools to check connectivity before running the collector.

Setting up authentication for cataloging Snowflake

The collector supports username and key-pair authentication. For details, see the Snowflake documentation.

Important

Support for Snowflake username and password authentication will be discontinued soon. If you use this authentication method, we recommend switching to username and key-pair authentication. For details, see the related field notice.
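If an existing collector user currently authenticates with a password, the following is a minimal sketch of registering a public key for that user. It assumes you have already generated an RSA key pair as described in the Snowflake documentation and that you run the statements with a role allowed to alter the user, such as SECURITYADMIN. `<collector_user>` and `<rsa_public_key>` are placeholders, not names defined by the collector.

```
-- Placeholder: replace <collector_user> with the user the collector currently uses.
-- Paste the public key without its PEM header and footer lines.
ALTER USER <collector_user> SET RSA_PUBLIC_KEY = '<rsa_public_key>';

-- Confirm the key was registered: the RSA_PUBLIC_KEY_FP property should now have a value.
DESC USER <collector_user>;
```

After the key is registered, update the collector configuration to authenticate with the matching private key instead of the password.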

Snowflake determines authorization based on roles, so you must specify a role when the collector connects to Snowflake. The ACCOUNTADMIN role provides access to the full range of available metadata. Other roles can collect basic metadata, but they do not have access to extended metadata and lineage information by default.

If you choose to use a role other than ACCOUNTADMIN, grant that role the specific permissions outlined in this section so that it can harvest extended metadata and lineage.

Additionally, some collector features run queries in Snowflake, which requires a warehouse. If the user whose credentials you pass to the collector has a default warehouse, you do not need to specify a warehouse when running the collector. However, if the user does not have a default warehouse, or if you want to use a warehouse other than the user’s default, specify it with the Snowflake warehouse (--warehouse) configuration option.
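To decide whether you need the --warehouse option, you can check the collector user's default warehouse and, optionally, set one. This sketch assumes the collector user is named DDW_ACCOUNT, as in the user-creation step later in this section, and that you run it with a role that can view and alter that user, such as SECURITYADMIN. The role used by the collector still needs USAGE on whichever warehouse is used.

```
-- Check whether the collector user already has a default warehouse
-- (see the default_warehouse column in the output).
SHOW USERS LIKE 'DDW_ACCOUNT';

-- Optional: set a default warehouse so that --warehouse does not need to be passed.
ALTER USER DDW_ACCOUNT SET DEFAULT_WAREHOUSE = '<warehouse_name>';
```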

To set permissions:

Create a new role and grant it access to the warehouse that the collector uses.

```
CREATE OR REPLACE ROLE DDW_ACCOUNT_ROLE;
GRANT OPERATE, USAGE ON WAREHOUSE <warehouse_name> TO ROLE DDW_ACCOUNT_ROLE;
```

Grant permissions that allow the collector to harvest metadata for databases, schemas, tables, and views.

```
-- Replace <database_name> with the database to catalog; repeat for each database.
GRANT USAGE ON DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT SELECT ON ALL TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT SELECT ON ALL EXTERNAL TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT SELECT ON ALL VIEWS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
```
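The grants above cover tables created later, but not views or external tables created later. If you also want those objects to be readable by the role automatically, the following is a sketch of the additional future grants. This is an assumption about your needs, not a documented collector requirement. Note that in Snowflake, schema-level future grants take precedence over database-level future grants for the same object type.

```
-- Optional: also cover views and external tables created in the future.
GRANT SELECT ON FUTURE VIEWS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT SELECT ON FUTURE EXTERNAL TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
```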

If you do not plan to use the Enable Sample String Values collection and Enable column statistics collection options when harvesting metadata, you can grant the REFERENCES privilege instead of the SELECT privilege.

```
-- Replace <database_name> with the database to catalog; repeat for each database.
GRANT USAGE ON DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON ALL TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON ALL EXTERNAL TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON ALL VIEWS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON FUTURE TABLES IN DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
```
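If you only want the role to see one schema rather than a whole database, Snowflake also accepts schema-level versions of the same grants. The sketch below assumes that the collector harvests only what the role can see; `<schema_name>` is a placeholder, and you can substitute SELECT for REFERENCES if you need sample values or column statistics.

```
-- Hypothetical: limit the role to a single schema instead of an entire database.
GRANT USAGE ON DATABASE <database_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON ALL EXTERNAL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON ALL VIEWS IN SCHEMA <database_name>.<schema_name> TO ROLE DDW_ACCOUNT_ROLE;
GRANT REFERENCES ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE DDW_ACCOUNT_ROLE;
```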

Grant permissions that allow the collector to harvest object-dependency-based lineage, user-defined functions, data metric functions, stored procedures, tags and tag values, row access policies, masking policies, and table usage.
There are two options for assigning these permissions.
Important
You need the ACCOUNTADMIN role to grant these permissions.
1. Option 1: Use the ACCOUNTADMIN role to grant broad imported privileges on the shared SNOWFLAKE database.
```
USE ROLE ACCOUNTADMIN;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE DDW_ACCOUNT_ROLE;
```
2. Option 2: Grant specific SNOWFLAKE database roles for more targeted permissions.
```
USE ROLE ACCOUNTADMIN;
USE DATABASE SNOWFLAKE;
GRANT DATABASE ROLE OBJECT_VIEWER TO ROLE DDW_ACCOUNT_ROLE;
GRANT DATABASE ROLE GOVERNANCE_VIEWER TO ROLE DDW_ACCOUNT_ROLE;
```
  Important
  For more information about these database roles, see the Snowflake Documentation.
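To check that whichever option you chose took effect, you can query a few SNOWFLAKE.ACCOUNT_USAGE views while using the collector's role. This sketch assumes you run it as a user that has been granted DDW_ACCOUNT_ROLE (for example, the collector user created later in this section). The view-to-role mapping in the comments reflects our reading of the Snowflake documentation; if a query fails with an authorization error, the corresponding grant is likely missing.

```
-- Run as a user that has been granted DDW_ACCOUNT_ROLE.
USE ROLE DDW_ACCOUNT_ROLE;
USE WAREHOUSE <warehouse_name>;

-- Object dependencies (lineage): expected to be readable with
-- IMPORTED PRIVILEGES (Option 1) or OBJECT_VIEWER (Option 2).
SELECT COUNT(*) FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES;

-- Tag and policy references: expected to be readable with
-- IMPORTED PRIVILEGES (Option 1) or GOVERNANCE_VIEWER (Option 2).
SELECT COUNT(*) FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES;
SELECT COUNT(*) FROM SNOWFLAKE.ACCOUNT_USAGE.POLICY_REFERENCES;
```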
Grant permissions for Streamlit apps. These permissions are only required if you want to harvest metadata from Streamlit apps.
```
GRANT USAGE ON STREAMLIT <streamlit_app_name> TO ROLE DDW_ACCOUNT_ROLE;
```
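If you do not know the names of the Streamlit apps, you can list them first and then grant usage with a fully qualified name. This is a sketch: run the SHOW command with a role that can see the apps, and treat `<schema_name>` as a placeholder for the schema that contains each app.

```
-- List Streamlit apps in a database to find their names.
SHOW STREAMLITS IN DATABASE <database_name>;

-- Grant usage using the fully qualified app name.
GRANT USAGE ON STREAMLIT <database_name>.<schema_name>.<streamlit_app_name> TO ROLE DDW_ACCOUNT_ROLE;
```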
Create a dedicated Snowflake user that the collector uses to authenticate to Snowflake. To configure key-pair authentication for this user, you need at least the SECURITYADMIN role to alter the user settings. For details, see the Snowflake documentation.
```
-- Paste the public key without its -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- lines.
CREATE USER DDW_ACCOUNT
RSA_PUBLIC_KEY = '<rsa_public_key>'
TYPE = SERVICE
DEFAULT_ROLE = DDW_ACCOUNT_ROLE
DEFAULT_WAREHOUSE = '<warehouse_name>'
DISPLAY_NAME = 'data.world';
GRANT ROLE DDW_ACCOUNT_ROLE TO USER DDW_ACCOUNT;
```
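To confirm that the role was granted to the new user, you can list the user's role grants. This assumes the user and role names used above.

```
-- DDW_ACCOUNT_ROLE should appear in the output.
SHOW GRANTS TO USER DDW_ACCOUNT;
```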

Grant the role access to the warehouse, if you have not already done so.

```
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE DDW_ACCOUNT_ROLE;
```

Run the following query to verify the appropriate permissions are granted.
```
SHOW GRANTS TO ROLE DDW_ACCOUNT_ROLE;
```
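SHOW GRANTS TO ROLE lists privileges on existing objects, but future grants are reported separately in Snowflake. If you used the FUTURE grants above, you can check them, and see which users hold the role, with the following sketch (role name as used above).

```
-- Future grants (for example, FUTURE TABLES) are listed here, not by SHOW GRANTS TO ROLE.
SHOW FUTURE GRANTS TO ROLE DDW_ACCOUNT_ROLE;

-- Users and roles that have been granted DDW_ACCOUNT_ROLE.
SHOW GRANTS OF ROLE DDW_ACCOUNT_ROLE;
```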
