Skip to main content

About the Snowflake collector

Use this collector to directly harvest metadata from Snowflake data cloud including tabular objects, objects from Horizon (policies and tags), Snowflake data quality as well as functions and stored procedures. The collector allows you to harvest column statistics and table query counts. Additionally, the Snowflake collector can harvest metadata for Streamlit applications.

Important

The Snowflake collector can be run in the Cloud or on-premise using Docker or Jar files.

Note

The latest version of the Collector is 2.247. To view the release notes for this version and all previous versions, please go here.

What is cataloged

The collector catalogs the following information.

Note

The collector harvests all versions of overloaded functions and stored procedures. Each version has its own title/name in the catalog, but a distinct identifier.

Table 1.

Object

Information cataloged

Column

Name, Description (Snowflake comments), Data Type, Is Nullable, Default Value, Key type (Primary, foreign)

Table

Name, Description (Snowflake comments), Primary key, Schema, External URL (Snowsight)

Extended metadata: Is Iceberg, Is dynamic, Is transient, Created Date, Last Modified Date, Last Updated By (Snowflake LAST_DDL_BY), Owner (Snowflake role that owns table), Definition Altered Date, Table Size

View

Name, Description (Snowflake comments), Schema, External URL (Snowsight), Definition Altered Date, Last Modified By (Snowflake LAST_DDL_BY), Is Iceberg, Is dynamic, Is temporary, Created Date, Last Modified Date, Owner (Snowflake role that owns table)

Materialized View

Name, Description (Snowflake comments), Schema, External URL (Snowsight), Definition Altered Date, Last Modified By (Snowflake LAST_DDL_BY), Is Iceberg, Is dynamic, Is temporary, Created Date, Last Modified Date, Owner (Snowflake role that owns table)

Schema

Identifier, Name

Extended metadata: Description (Snowflake comments), Created date, Updated

Database

Type, Name, Identifier, Server, Port, Environment, JDBC URL

Extended metadata: Description (Snowflake comments)

User-defined function

(this includes User-defined functions created in Snowpark)

Name

Extended metadata: Function Definition, Description (Snowflake comments), Created Date, Owner (Snowflake Role), Snowflake Function Type (Return Type), Function Signature, Function Language, Packages, Runtime version, Updated

Stored Procedure

(this includes Stored Procedures created in Snowpark)

Name

Extended metadata: Function Definition, Owner (Snowflake Role), Description (Snowflake comments), Return Type, Function Language, Packages, Created Date, Function signature, Snowflake Function Type, Runtime Version, Updated, Stored Procedure Type



Cataloging Snowflake Tags and Policies

Following additional information is cataloged when you run the collector with the --policy-collection and --tag-collection parameters.

Table 3.

Object

Information cataloged

Columns

Snowflake masking policies

Snowflake Tags

Name, Allowed Values, Description (Snowflake comments)

Snowflake Tag Values

Name, Value

Snowflake Masking Policies

Name, Description (Snowflake comments), Date created, Snowflake tags, Owner, Type, Policy body, Schema, Database

Snowflake Row access policies

Name, Description (Snowflake comments), Date created, Snowflake tags, Owner, Type, Policy body, Schema, Database



Cataloging Snowflake table query counts

Following additional information is cataloged when you run the collector with the --table-usage-collection and --table-usage-lookback-days parameters.

Table 4.

Object

Information cataloged

Table

Query Count



Cataloging Snowflake Data Metric Functions

You can harvest system data metric functions that capture data quality metrics. Following additional information is cataloged when you run the collector with the --data-metric-function-collection parameter.

Table 5.

Object

Information cataloged

Data Metric Function

  • Identifier: Name of the function

  • Title: Name of the function

  • Description

  • Body Definition of the function

Table

  • Schedule: cron expressions

  • Data metric function observations: function name, column arguments, most recent metric time, value



Cataloging Snowflake Streamlit Applications

Following additional information is cataloged when you run the collector with the --streamlit-app-collection parameter.

Table 6.

Object

Information cataloged

Snowflake Streamlit App

  • Identifier: App Name

  • Title: App Title

  • Description: App comments

  • Main file

  • Root location

  • Date Created

  • Owner



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 7.

Resource page

Relationship

Snowflake Row access policies

Tables

Snowflake Masking Policies

Columns, Tag value

Table

Columns, Tag value

Snowflake Tags

Tag value

Columns

Table, Tag Value

View

Schema that contains Views, Columns that are part of Views

Materialized View

Schema that contains Materialized Views, Columns that are part of Materialized Views

Schema

Database that contains Schema, Table that is part of Schema, View that is part of Schema, Materialized View that is part of Schema, Streamlit app that is part of schema

Streamlit App

Schema that contains Streamlit App

Database

Schema contained in Database

Data Metric Function

Table or View analyzed by this Snowflake Data Metric Function

Table

Data Metric Function that analyzes Table

View

Data Metric Function that analyzes View



Lineage for Snowflake

The following lineage information is collected by the Snowflake collector. Note that any lineage for SQL Statements defined via variable statements are not supported

Important things to note:

  • Any lineage for SQL Statements defined via variable statements are not supported.

  • The collector traces these relationships from a View’s columns to ultimate source Table columns across SQL expressions and subqueries.

Table 8.

Object

Lineage available

Column in View

The collector identifies the associated column in an upstream view or table:

  • Where the data is sourced from

  • That sort the rows via ORDER BY

  • That filter the rows via WHERE/HAVING

  • That aggregate the rows via GROUP BY

User-defined function

A Function and:

  • Views referenced in Function

  • Tables referenced in Function

  • Functions referenced in Function

A View and:

  • Function referencing View

Stored Procedure

The collector identifies:

  • The associated column in an upstream view or table

    • Where the data is sourced from

    • That sort the rows via ORDER BY

    • That filter the rows via WHERE/HAVING

    • That aggregate the rows via GROUP BY

  • The downstream table that has its data updated by this stored procedure.

Note: The following stored procedures are not supported:

  • Stored procedures with multitable inserts. However, multiple separate insert statements that insert into one table are supported.

  • Stored procedure with multiple SELECT and INSERT statements not separated by a semicolon delimiter.

  • Stored procedures used to create tables or any transient tables created during the execution scope of a stored procedure.



Important details about Snowflake tags

The Snowflake collector harvests Snowflake tags allowing you to discover tags in their Snowflake instance.

  • When the collector is run with Collect Snowflake tag information (--tag-collection) option enabled, the collector harvests tags and tag-values (tag references) that meet one of the following criteria

    • The tag is not linked to any object and resides in a database and schema specified in the collector run command.

    • The tag is linked with a database object, and the corresponding database is specified in the collector run command.

    • The tag is linked to a non-database object, and that object resides in a database and schema specified in the run command.

      Important

      The collector does not harvest system tags defined in SNOWFLAKE.CORE.

  • If the collector is run with Collect all Snowflake tag information (--tag-collection-all) option, the collector does not filter based on whether the object of the tag is in one of the databases specified to be harvested.

Important details about Snowflake policies

The Snowflake Collector harvests Snowflake policies allowing you to discover policies in your Snowflake instance.

  • When the collector is run with Collect Snowflake policy information --policy-collection option, the collector harvests masking policies and row access policies that meet one of the following criteria:

    • The policy is not linked to any object and resides in a database and schema specified in the collector run command.

    • The policy is associated with a database object, and the corresponding database is specified in the collector run command.

    • The policy is linked to a non-database object, and that object resides in a database and schema specified in the run command.

  • If the collector is run with the Collect all Snowflake policy information (--policy-collection-all) option, the collector does not filter based on whether the object of the policy is in one of the databases specified to be harvested.

Sample view of Snowflake tag and policies in data.world

In the following example you can see how a Snowflake Tag-Based Masking Policy applied to sensitive data columns: routing numbers, bank name, and bank account number. In this view, you can also see the associated tag (Classification:confidential), and technical details about the policy, like the Policy Body which explains how the Policy works.

policy_info.png
Table 9.

Feature

Details

Snowflake object tags

Tags enable data stewards to track sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach. For details, see Snowflake documentation.

Snowflake masking policies

Masking policies define what data is masked and not shown to users allowing authorized users to access sensitive data at query runtime. For details, see the Snowflake documentation.

Snowflake tag based masking policies

When a masking policy is applied to a Snowflake tag, the columns with the same tag are automatically protected based on the conditions set in the masking policy. For details, see the Snowflake documentation.

Snowflake row access policies

Access policies are applied to rows of data, that is, who can see the rows of data, and masking policies define what data is masked and not shown to users. For details about this see the Snowflake documentation.

Table query count

For each table, the Snowflake collector computes percentile popular and a query count based on the user access history table in Snowflake. This information can then be used to assign popularity ranking to Snowflake tables in data.world. For details about this see the Snowflake documentation.



When this data is harvested and is displayed in data.world, you see the following information in your catalog: Snowflake masking policies, Snowflake row access policies, Snowflake tag values, and Snowflake tags.

snowflake_harvested_info.png

When you browse to a table harvested from Snowflake, you will notice that the appropriate policies are applied to the tables. In the following example, the policy and tag values are applied to the Order table.

snowflake_harvested_info02.png

Users can click through to the tag values and policies to see their details.

Note

Note that users can also use the Search feature in the application to locate all these resources easily.

snowflake_harvested_info03.png

Authentication supported

The collector supports the following authentication method:

  • Username and Key-pair authentication. For details, see the Snowflake documentation.

    Warning

    Snowflake is transitioning away from username & password authentication soon. If you used this method of authentication, we advise that you update to key-pair authentication. For details, see this field notice.