About the Snowflake collector
Use this collector to directly harvest metadata from Snowflake data cloud including tabular objects, objects from Horizon (policies and tags), Snowflake data quality as well as functions and stored procedures from Snowpark. The collector allows you to harvest column statistics and table query counts. Additionally, the Snowflake collector can harvest metadata for Streamlit applications.
Important
The Snowflake collector can be run in the Cloud or on-premise using Docker or Jar files.
Note
The latest version of the Collector is 2.246. To view the release notes for this version and all previous versions, please go here.
What is cataloged
The collector catalogs the following information.
Note
The collector harvests all versions of overloaded functions and stored procedures. Each version has its own title/name in the catalog, but a distinct identifier.
Object | Information cataloged |
---|---|
Columns | Name, Description (Snowflake comments), Data Type, Is Nullable, Default Value, Key type (Primary, foreign) |
Table | Name, Description (Snowflake comments), Primary key, Schema, External URL (Snowsight) Extended metadata: Is Iceberg, Is dynamic, Is transient, Created Date, Last Modified Date, Last Updated By (Snowflake LAST_DDL_BY), Owner (Snowflake role that owns table), Definition Altered Date, Table Size |
Views | Name, Description (Snowflake comments), Schema, External URL (Snowsight), Definition Altered Date, Last Modified By (Snowflake LAST_DDL_BY), Is Iceberg, Is dynamic, Is temporary, Created Date, Last Modified Date, Owner (Snowflake role that owns table) |
Materialized View | Name, Description (Snowflake comments), Schema, External URL (Snowsight), Definition Altered Date, Last Modified By (Snowflake LAST_DDL_BY), Is Iceberg, Is dynamic, Is temporary, Created Date, Last Modified Date, Owner (Snowflake role that owns table) |
Schema | Identifier, Name Extended metadata: Description (Snowflake comments), Created date, Updated |
Database | Type, Name, Identifier, Server, Port, Environment, JDBC URL Extended metadata: Description (Snowflake comments) |
User-defined functions (this includes User-defined functions created in Snowpark) | Name Extended metadata: Function Definition, Description (Snowflake comments), Created Date, Owner (Snowflake Role), Snowflake Function Type (Return Type), Function Signature, Function Language, Packages, Runtime version, Updated |
Stored Procedures (this includes Stored Procedures created in Snowpark) | Name Extended metadata: Function Definition, Owner (Snowflake Role), Description (Snowflake comments), Return Type, Function Language, Packages, Created Date, Function signature, Snowflake Function Type, Runtime Version, Updated, Stored Procedure Type |
Cataloging Snowflake Tags and Policies
Following additional information is cataloged when you run the collector with the --policy-collection and --tag-collection parameters.
Object | Information cataloged |
---|---|
Columns | Snowflake masking policies |
Snowflake Tags | Name, Allowed Values, Description (Snowflake comments) |
Snowflake Tag Values | Name, Value |
Snowflake Masking Policies | Name, Description (Snowflake comments), Date created, Snowflake tags, Owner, Type, Policy body, Schema, Database |
Snowflake Row access policies | Name, Description (Snowflake comments), Date created, Snowflake tags, Owner, Type, Policy body, Schema, Database |
Cataloging Snowflake table query counts
Following additional information is cataloged when you run the collector with the --table-usage-collection and --table-usage-lookback-days parameters.
Object | Information cataloged |
---|---|
Table | Query Count |
Cataloging Snowflake Data Metric Functions
You can harvest system data metric functions that capture data quality metrics. Following additional information is cataloged when you run the collector with the --data-metric-function-collection parameter.
Object | Information cataloged |
---|---|
Data Metric Function |
|
Table |
|
Cataloging Snowflake Streamlit Applications
Following additional information is cataloged when you run the collector with the --streamlit-app-collection parameter.
Object | Information cataloged |
---|---|
Snowflake Streamlit App |
|
Relationships between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Snowflake Row access policies | Tables |
Snowflake Masking Policies | Columns, Tag value |
Table | Columns, Tag value |
Snowflake Tags | Tag value |
Columns | Table, Tag Value |
View | Schema that contains Views, Columns that are part of Views |
Materialized View | Schema that contains Materialized Views, Columns that are part of Materialized Views |
Schema | Database that contains Schema, Table that is part of Schema, View that is part of Schema, Materialized View that is part of Schema, Streamlit app that is part of schema |
Streamlit App | Schema that contains Streamlit App |
Database | Schema contained in Database |
Data Metric Function | Table or View analyzed by this Snowflake Data Metric Function |
Table | Data Metric Function that analyzes Table |
View | Data Metric Function that analyzes View |
Lineage for Snowflake
The following lineage information is collected by the Snowflake collector. Note that any lineage for SQL Statements defined via variable statements are not supported
Important things to note:
Any lineage for SQL Statements defined via variable statements are not supported.
The collector traces these relationships from a View’s columns to ultimate source Table columns across SQL expressions and subqueries.
Object | Lineage available |
---|---|
Column in View | The collector identifies the associated column in an upstream view or table:
|
User-defined function | A Function and:
A View and:
|
Stored Procedure | The collector identifies:
Note: The following stored procedures are not supported:
|
Important details about snowflake tags and policies
The Snowflake collector harvests Snowflake object tags, Snowflake tag-based masking policies, Snowflake masking policies, and Snowflake row access policies. This information enhances the data governance experience to discover all tags and policies applied across their Snowflake tables and columns in a data catalog.
The collector harvests tags and tag-values (tag references) that meet one of the following criteria:
1. The tag is not linked to any object and resides in a database and schema specified in the collector run command.
The tag is associated with a database object, and the corresponding database is specified in the collector run command.
The tag is linked to a non-database object, and that object resides in a database and schema specified in the run command.
Important
The collector does not harvest system tags defined in SNOWFLAKE.CORE.
In the following example you can see how a Snowflake Tag-Based Masking Policy applied to sensitive data columns: routing numbers, bank name, and bank account number. In this view, you can also see the associated tag (Classification:confidential), and technical details about the policy, like the Policy Body which explains how the Policy works.
Feature | Details |
---|---|
Snowflake object tags | Tags enable data stewards to track sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach. For details, see Snowflake documentation. |
Snowflake masking policies | Masking policies define what data is masked and not shown to users allowing authorized users to access sensitive data at query runtime. For details, see the Snowflake documentation. |
Snowflake tag based masking policies | When a masking policy is applied to a Snowflake tag, the columns with the same tag are automatically protected based on the conditions set in the masking policy. For details, see the Snowflake documentation. |
Snowflake row access policies | Access policies are applied to rows of data, that is, who can see the rows of data, and masking policies define what data is masked and not shown to users. For details about this see the Snowflake documentation. |
Table query count | For each table, the Snowflake collector computes percentile popular and a query count based on the user access history table in Snowflake. This information can then be used to assign popularity ranking to Snowflake tables in data.world. For details about this see the Snowflake documentation. |
When this data is harvested and is displayed in data.world, you see the following information in your catalog: Snowflake masking policies, Snowflake row access policies, Snowflake tag values, and Snowflake tags.
When you browse to a table harvested from Snowflake, you will notice that the appropriate policies are applied to the tables. In the following example, the policy and tag values are applied to the Order table.
Users can click through to the tag values and policies to see their details.
Note
Note that users can also use the Search feature in the application to locate all these resources easily.
Authentication supported
The collector supports the following authentication method:
Username and Key-pair authentication. For details, see the Snowflake documentation.
Warning
Snowflake is transitioning away from username & password authentication soon. If you used this method of authentication, we advise that you update to key-pair authentication. For details, see this field notice.