Preparing to run the Snowflake collector
Setting up pre-requisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) Note: The following specs are based upon running one collector process at a time. Please adjust the hardware if you are running multiple collectors at the same time. | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premise runs only) | |
Docker | Click here to get Docker. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Setting up authentication for cataloging Snowflake
The collector supports the following authentication method:
Username and Key-pair authentication. For details, see the Snowflake documentation.
Important
Support for Snowflake username and password authentication will be discontinued soon. If you use this method of authentication, we advise you to switch to username and key-pair authentication. For details, see this field notice.
We recommend you create a dedicated Snowflake user for running the collector. You will need specific permissions to create this new user.
You need the USERADMIN role or higher to create a new user. See the Snowflake user creation documentation. If you plan to modify the settings of an existing user, you need the OWNERSHIP privilege on that user. See the Snowflake Admin User Management documentation.
Additionally, to set up key-pair authentication for this user, you need at least the SECURITYADMIN role to alter the user settings. See the Snowflake documentation.
To set permissions:
In the following query, replace <warehouse_name>, <database_name>, <streamlit_app_name>, and <rsa_public_key>.
Note
The following query grants permissions for all Snowflake schemas, tables, external tables, and views to the DDW_ACCOUNT_ROLE role. You can narrow the scope of the grant statements to specific objects, which limits the metadata cataloged from Snowflake.
The collector requires permissions on the snowflake.account_usage schema to harvest table usage, object dependency based lineage, functions and stored procedures, tags, row access policies, masking policies, creation date and time, and modification date and time.
create or replace role DDW_ACCOUNT_ROLE;
grant operate, usage on warehouse <warehouse_name> to role DDW_ACCOUNT_ROLE;
grant usage on database <database_name> to role DDW_ACCOUNT_ROLE;
grant usage on all schemas in database <database_name> to role DDW_ACCOUNT_ROLE;
grant select on all tables in database <database_name> to role DDW_ACCOUNT_ROLE;
grant select on all external tables in database <database_name> to role DDW_ACCOUNT_ROLE;
grant select on all views in database <database_name> to role DDW_ACCOUNT_ROLE;
grant usage on future schemas in database <database_name> to role DDW_ACCOUNT_ROLE;
grant select on future tables in database <database_name> to role DDW_ACCOUNT_ROLE;

// These permissions are only required if you want to harvest metadata from Streamlit apps.
grant usage on streamlit <streamlit_app_name> to role DDW_ACCOUNT_ROLE;

// Set up key-pair authentication. Follow the steps at https://docs.snowflake.com/en/user-guide/key-pair-auth
create user DDW_ACCOUNT
  rsa_public_key = '<rsa_public_key>'
  type = SERVICE
  default_role = DDW_ACCOUNT_ROLE
  default_warehouse = '<warehouse_name>'
  display_name = 'data.world';

// Setting default_role does not grant the role, so grant it to the user explicitly.
grant role DDW_ACCOUNT_ROLE to user DDW_ACCOUNT;

// These permissions are required to harvest table usage, object dependency based lineage,
// functions and stored procedures, tags, row access policies, and masking policies.
use role ACCOUNTADMIN;
grant imported privileges on database snowflake to role DDW_ACCOUNT_ROLE;

// You can check the grants on the role with this query.
show grants to role DDW_ACCOUNT_ROLE;
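As the note above mentions, the grants can be narrowed so that the collector only sees specific objects. The following is a minimal sketch of a schema-scoped variant, not a definitive configuration. The `<schema_name>` placeholder is an assumption introduced here for illustration. These statements would stand in for the database-wide `all schemas`, `all tables`, `all views`, and `future` grants in the query above.

```sql
// Assumption: <schema_name> is a hypothetical placeholder for the single schema you want cataloged.
// Usage on the parent database is still needed to reach the schema.
grant usage on database <database_name> to role DDW_ACCOUNT_ROLE;
grant usage on schema <database_name>.<schema_name> to role DDW_ACCOUNT_ROLE;

// Limit read access to the objects in that one schema.
grant select on all tables in schema <database_name>.<schema_name> to role DDW_ACCOUNT_ROLE;
grant select on all external tables in schema <database_name>.<schema_name> to role DDW_ACCOUNT_ROLE;
grant select on all views in schema <database_name>.<schema_name> to role DDW_ACCOUNT_ROLE;

// Optional: cover tables created in that schema later.
grant select on future tables in schema <database_name>.<schema_name> to role DDW_ACCOUNT_ROLE;
```

Repeat the schema-level statements for each additional schema. Objects outside the granted schemas should then not appear in the harvested metadata.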
Run all the queries.
Test the collector using the DDW_ACCOUNT user and the key-pair file you created.
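Before testing the collector, it can help to confirm the setup from a Snowflake worksheet. The queries below are a sketch of one way to do that, assuming the DDW_ACCOUNT user and DDW_ACCOUNT_ROLE role were created as described above. The first group is run by the administrator and the second group while connected as DDW_ACCOUNT.

```sql
// Run as the administrator who created the user.
// The RSA_PUBLIC_KEY_FP property in the output should show a fingerprint once the public key is set.
desc user DDW_ACCOUNT;

// Confirm that DDW_ACCOUNT_ROLE has been granted to the user.
show grants to user DDW_ACCOUNT;

// Run while connected as DDW_ACCOUNT with the key pair.
// Confirms which user and role the session is using.
select current_user(), current_role();

// Confirms that the imported privileges on the snowflake database are in effect.
select table_catalog, table_schema, table_name, created, last_altered
from snowflake.account_usage.tables
where deleted is null
limit 5;
```

If the last query fails with an authorization error, the `grant imported privileges on database snowflake` statement most likely did not run under a role that is allowed to issue it.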