Skip to main content

About the Snowflake collector

Use this collector to directly harvest metadata from Snowflake data cloud including tabular objects, objects from Horizon (policies and tags), Snowflake data quality as well as functions and stored procedures from Snowpark. The collector allows you to harvest column statistics and table query counts. Additionally, the Snowflake collector can harvest metadata for Streamlit applications.

Important

The Snowflake collector can be run in the Cloud or on-premise using Docker or Jar files.

Note

The latest version of the Collector is 2.200. To view the release notes for this version and all previous versions, please go here.

What is cataloged

The collector catalogs the following information.

Note

The collector harvests all versions of overloaded functions and stored procedures. Each version has its own title/name in the catalog, but a distinct identifier.

Table 1.

Object

Information cataloged

Columns

Name, Description (Snowflake comments), Data Type, Is Nullable, Default Value, Key type (Primary, foreign)

Table

Name, Description (Snowflake comments), Primary key, Schema, External URL (Snowsight)

Views

Name, Description (Snowflake comments), SQL definition, External URL (Snowsight)

Materialized View

Name, Description (Snowflake comments)

Schema

Identifier, Name, Description (Snowflake comments)

Database

Type, Name, Identifier, Server, Port, Environment, JDBC URL, Description (Snowflake comments)

User-defined functions

(this includes User-defined functions created in Snowpark)

Name, Description (Snowflake comments), Owner, Function Type, Signature, Definition, Return Type, Language, Packages

Stored Procedures

(this includes Stored Procedures created in Snowpark)

Name, Description, Owner, Signature, Stored Procedure Type, Definition, Return Type, Language, Packages



Cataloging Snowflake Tags and Policies

Following additional information is cataloged when you run the collector with the --policy-collection and --tag-collection parameters.

Table 3.

Object

Information cataloged

Columns

Snowflake masking policies

Snowflake Tags

Name, Allowed Values, Description (Snowflake comments)

Snowflake Tag Values

Name, Value

Snowflake Masking Policies

Name, Description (Snowflake comments), Date created, Snowflake tags, Owner, Type, Policy body, Schema, Database

Snowflake Row access policies

Name, Description (Snowflake comments), Date created, Snowflake tags, Owner, Type, Policy body, Schema, Database



Cataloging Snowflake table query counts

Following additional information is cataloged when you run the collector with the --table-usage-collection and --table-usage-lookback-days parameters.

Table 4.

Object

Information cataloged

Table

Query Count



Cataloging Snowflake Data Metric Functions

Following additional information is cataloged when you run the collector with the --data-metric-function-collection parameter.

Table 5.

Object

Information cataloged

Data Metric Function

  • Identifier: Name of the function

  • Title: Name of the function

  • Description

  • Body Definition of the function

Table

  • Schedule: cron expressions

  • Data metric function observations: function name, column arguments, most recent metric time, value



Cataloging Snowflake Stremalit Applications

Following additional information is cataloged when you run the collector with the --streamlit-app-collection parameter.

Table 6.

Object

Information cataloged

Snowflake Stremalit App

  • Identifier: App Name

  • Title: App Title

  • Description: App comments

  • Main file

  • Root location

  • Date Created

  • Owner



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 7.

Resource page

Relationship

Snowflake Row access policies

Tables

Snowflake Masking Policies

Columns, Tag value

Table

Columns, Tag value

Snowflake Tags

Tag value

Columns

Table, Tag Value

View

Schema that contains Views, Columns that are part of Views

Materialized View

Schema that contains Materialized Views, Columns that are part of Materialized Views

Schema

Database that contains Schema, Table that is part of Schema, View that is part of Schema, Materialized View that is part of Schema, Streamlit app that is part of schema

Streamlit App

Schema that contains Streamlit App

Database

Schema contained in Database

Data Metric Function

Table or View analyzed by this Snowflake Data Metric Function

Table

Data Metric Function that analyzes Table

View

Data Metric Function that analyzes View



Lineage for Snowflake

The following lineage information is collected by the Snowflake collector.

Note

Note that the collector traces these relationships from a View’s columns to ultimate source Table columns across SQL expressions and subqueries.

Table 8.

Object

Lineage available

Column in View

The collector identifies the associated column in an upstream view or table:

  • Where the data is sourced from

  • That sort the rows via ORDER BY

  • That filter the rows via WHERE/HAVING

  • That aggregate the rows via GROUP BY

User-defined function

A Function and:

  • Views referenced in Function

  • Tables referenced in Function

  • Functions referenced in Function

A View and:

  • Function referencing View

Stored Procedure

The collector identifies:

  • The associated column in an upstream view or table

    • Where the data is sourced from

    • That sort the rows via ORDER BY

    • That filter the rows via WHERE/HAVING

    • That aggregate the rows via GROUP BY

  • The downstream table that has its data updated by this stored procedure.



Important details about snowflake tag and policies

The Snowflake collector harvests Snowflake object tags, Snowflake tag-based masking policies, Snowflake masking policies, and Snowflake row access policies. This information enhances the data governance experience to discover all tags and policies applied across their Snowflake tables and columns in a data catalog

In the following example you can see how a Snowflake Tag-Based Masking Policy applied to sensitive data columns: routing numbers, bank name, and bank account number. In this view, you can also see the associated tag (Classification:confidential), and technical details about the policy, like the Policy Body which explains how the Policy works.

policy_info.png
Table 9.

Feature

Details

Snowflake object tags

Tags enable data stewards to track sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach. For details, see Snowflake documentation.

Snowflake masking policies

Masking policies define what data is masked and not shown to users allowing authorized users to access sensitive data at query runtime. For details, see the Snowflake documentation.

Snowflake tag based masking policies

When a masking policy is applied to a Snowflake tag, the columns with the same tag are automatically protected based on the conditions set in the masking policy. For details, see the Snowflake documentation.

Snowflake row access policies

Access policies are applied to rows of data, that is, who can see the rows of data, and masking policies define what data is masked and not shown to users. For details about this see the Snowflake documentation.

Table query count

For each table, the Snowflake collector computes percentile popular and a query count based on the user access history table in Snowflake. This information can then be used to assign popularity ranking to Snowflake tables in data.world. For details about this see the Snowflake documentation.



When this data is harvested and is displayed in data.world, you see the following information in your catalog: Snowflake masking policies, Snowflake row access policies, Snowflake tag values, and Snowflake tags.

snowflake_harvested_info.png

When you browse to a table harvested from Snowflake, you will notice that the appropriate policies are applied to the tables. In the following example, the policy and tag values are applied to the Order table.

snowflake_harvested_info02.png

Users can click through to the tag values and policies to see their details.

Note

Note that users can also use the Search feature in the application to locate all these resources easily.

snowflake_harvested_info03.png

Authentication supported

The collector supports the following authentication methods: