About the Redshift collector

Use this collector to harvest metadata for Redshift tables and columns across your enterprise systems and make it searchable and discoverable in data.world.

Important

The Redshift collector can be run in the cloud or on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.294. Release notes for this version and all previous versions are available on the collector release notes page.

What is cataloged

The collector catalogs the following information.

Note

The collector harvests all versions of overloaded functions and stored procedures. Each version has the same title/name in the catalog, but a distinct identifier.

Table 1.

Object	Information cataloged
Columns	Name, description, JDBC type, column type, is nullable, default value, key type (primary, foreign), column size, column index
Table	Name, description, primary key, schema
Views	Name, description, SQL definition
Schema	Identifier, name
Database	Type, name, identifier, server, port, environment, JDBC URL
Function	Name, description, function type
Stored Procedure	Name, description, stored procedure type

Profiling and sampling specific information

If you include the profiling and sampling specific parameters while running the collector, the following additional information is harvested for Columns.

Important

The user/role must have read access to data to be able to harvest profiling information (column statistics).

Table 2.

Object	Information cataloged
Column	Average Length (sample), Average Value (sample), Data Distribution, Distinct Values, Estimated Distinct Values, Estimated Non-null Values, Maximum Length (sample), Maximum Value (sample; sorted numerically or alphabetically, z-a), Minimum Length (sample), Minimum Value (sample; sorted numerically or alphabetically, a-z), Non-null Values (sample), Sample String Values (first 5 items in a column)
Table	Row Count, Sample Count (target sample size)
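
To make the column statistics above more concrete, the following is a minimal Python sketch of how such sample-based values could be computed for one column. It is only an illustration of what the metric names mean, not the collector's implementation: the function name profile_column, the dictionary keys, and the choice to measure length on the string form of each value are all assumptions.

```python
from collections import Counter
from statistics import mean


def profile_column(sample):
    """Compute a few sample-based statistics for one column's values.

    Hypothetical helper for illustration only; None stands in for SQL NULL.
    """
    non_null = [v for v in sample if v is not None]
    as_text = [str(v) for v in non_null]
    lengths = [len(s) for s in as_text]
    return {
        "non_null_values": len(non_null),
        "distinct_values": len(set(non_null)),
        "minimum_value": min(non_null) if non_null else None,
        "maximum_value": max(non_null) if non_null else None,
        "average_length": mean(lengths) if lengths else None,
        "minimum_length": min(lengths) if lengths else None,
        "maximum_length": max(lengths) if lengths else None,
        # First 5 non-null values, rendered as strings.
        "sample_string_values": as_text[:5],
        # Count of occurrences per value in the sample.
        "data_distribution": dict(Counter(non_null)),
    }


stats = profile_column(["ann", "bob", None, "carol", "bob"])
```

For this five-row sample, stats reports 4 non-null values, 3 distinct values, a minimum value of "ann", and a maximum value of "carol".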

Sensitive data classification specific information

If you include the Sensitive data classification parameters while running the collector, the following information is scanned.

Table 3.

Object	Data scanned
Column	Classification Types: Address, Bank Account Number, Bank Routing Number, Blood Type, Credit Card, Expiration Date, Credit Card Number, Credit Card Verification Code, Date, Drug, Email Address, Healthcare General, Healthcare Identification Number, Identification Number, Injury, IP Address, Medical Condition, Medical Process, Numerical PII, Occupation, Organization, Origin, Passport Number, Password, Person Name, Phone Number, Physical Attribute, Political Affiliation, Religion, Social Security Number, Time, URL, User Name, Zodiac Sign, General, None (this is a subset of the entity types supported by private.ai; for a description of these entity types, see the Private AI documentation); Observation Count; Average Score
Sensitive Data Classification Provider	Name; Version (note: currently only private-ai is supported)

Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 4.

Resource page	Relationship
Table	Columns
Columns	Table
Schema	Database that contains Schema, Table that is part of Schema
Database	Schema contained in Database

Lineage for Redshift

The collector identifies, for every column in a View, the column(s) in other tables or views from which that view’s column selects (sources) its data.

The collector traces these relationships from a View’s columns to ultimate source Table columns across SQL expressions and subqueries.

Additionally, the collector establishes relationships between a View and any columns in source Tables that sort the rows in the View (via SQL ORDER BY), filter the rows in the View (via SQL WHERE and HAVING clauses), and aggregate the rows in the View (via SQL GROUP BY).
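
The idea of tracing a view's column back to its ultimate source table columns can be sketched as a walk over direct column-to-column relationships. The Python below is a minimal illustration under stated assumptions: the view and table names, the DIRECT_SOURCES mapping, and the function ultimate_sources are all hypothetical, and the collector's actual SQL analysis is not shown here.

```python
# Hypothetical direct lineage: each view column maps to the columns
# it selects from. Columns with no entry are treated as table columns.
DIRECT_SOURCES = {
    "v_top.total": {"v_orders.amount"},
    "v_orders.amount": {"orders.price", "orders.qty"},
    "v_orders.customer": {"customers.name"},
}


def ultimate_sources(column, direct_sources):
    """Follow direct sources until reaching columns with no recorded source."""
    found = set()
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting (and against cycles)
        seen.add(current)
        parents = direct_sources.get(current)
        if parents:
            stack.extend(parents)
        else:
            found.add(current)  # no further sources: an ultimate source
    return found


top_total_sources = ultimate_sources("v_top.total", DIRECT_SOURCES)
```

Here v_top.total selects from v_orders.amount, which in turn is computed from orders.price and orders.qty, so top_total_sources contains those two table columns.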

Note

The JDBC collectors currently do not establish view-to-table relationships. These can be derived transitively (e.g., in SPARQL) from the column-level relationships, since each column is associated with one and only one table or view. Also, lineage for SQL statements defined via variable statements is not supported.
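
The transitive derivation mentioned in this note can be sketched as follows. The note suggests SPARQL; this example uses plain Python instead, purely to show the logic. The edge list, the "parent.column" naming scheme, and the helper names are assumptions made for the illustration and do not reflect how the catalog stores identifiers.

```python
from collections import defaultdict

# Hypothetical column-level lineage edges: (view column, source column).
COLUMN_EDGES = [
    ("sales_view.total", "orders.price"),
    ("sales_view.total", "orders.qty"),
    ("sales_view.buyer", "customers.name"),
]


def parent_of(column):
    """Return the table or view that owns a column named 'parent.column'."""
    return column.rsplit(".", 1)[0]


def view_to_table(edges):
    """Lift column-level edges to relationships between their owners.

    This works because each column belongs to exactly one table or view.
    """
    result = defaultdict(set)
    for target_column, source_column in edges:
        result[parent_of(target_column)].add(parent_of(source_column))
    return dict(result)


relationships = view_to_table(COLUMN_EDGES)
```

With these edges, relationships maps sales_view to the set containing orders and customers.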

Authentication supported

The collector supports username/password authentication to Redshift.
