
About the Redshift collector

Use this collector to harvest metadata for Redshift tables and columns across enterprise systems and make it searchable and discoverable in data.world.

Important

The Redshift collector can be run in the Cloud or on-premises using Docker or a JAR file.

Note

The latest version of the collector is 2.315. Release notes for this version and all previous versions are available on the collector release notes page.

What is cataloged

The collector catalogs the following information.

Note

The collector harvests all versions of overloaded functions and stored procedures. Every version appears in the catalog under the shared function or procedure name, but each has a distinct identifier.
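In Redshift, overloaded functions and stored procedures share a name and are told apart by their argument types. The minimal sketch below illustrates why the versions can share a catalog title yet have distinct identifiers: it keys each version on schema, name, and argument types. The `Routine` class, the identifier format, and the `f_round` examples are hypothetical illustrations, not data.world's actual identifier scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routine:
    """One version of a (possibly overloaded) function or stored procedure."""
    schema: str
    name: str
    arg_types: tuple[str, ...]

    @property
    def title(self) -> str:
        # Display title: every overload shares the routine's plain name.
        return self.name

    @property
    def identifier(self) -> str:
        # Hypothetical identifier format: schema, name, and argument types
        # together distinguish one overload from another.
        return f"{self.schema}.{self.name}({', '.join(self.arg_types)})"


# Two overloads of the same (made-up) function.
overloads = [
    Routine("public", "f_round", ("float8",)),
    Routine("public", "f_round", ("float8", "int4")),
]

for r in overloads:
    print(r.title, "->", r.identifier)
```

Both entries print the same title, `f_round`, but different identifiers, which mirrors the behavior described in the note.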

Table 1.

| Object | Information cataloged |
|---|---|
| Columns | Name, description, JDBC type, column type, is nullable, default value, key type (primary, foreign), column size, column index |
| Table | Name, description, primary key, schema |
| Views | Name, description, SQL definition |
| Schema | Identifier, name |
| Database | Type, name, identifier, server, port, environment, JDBC URL |
| Function | Name, description, function type |
| Stored Procedure | Name, description, stored procedure type |
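The Database entry records a server, port, and JDBC URL. For reference, the Redshift JDBC driver uses URLs of the form `jdbc:redshift://<endpoint>:<port>/<database>`, and 5439 is Redshift's default port. The sketch below assembles such a URL from its parts and splits it back apart. It is an illustration only, not the collector's own code; the function names and the example cluster endpoint are placeholders.

```python
from urllib.parse import urlsplit

REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default listening port


def redshift_jdbc_url(server: str, database: str, port: int = REDSHIFT_DEFAULT_PORT) -> str:
    """Build a Redshift JDBC URL of the form jdbc:redshift://host:port/database."""
    return f"jdbc:redshift://{server}:{port}/{database}"


def parse_redshift_jdbc_url(url: str) -> tuple[str, int, str]:
    """Split a Redshift JDBC URL back into (server, port, database)."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError("not a JDBC URL")
    # What remains, redshift://host:port/db, parses like an ordinary URL.
    parts = urlsplit(url[len(prefix):])
    if parts.scheme != "redshift":
        raise ValueError("not a Redshift JDBC URL")
    return parts.hostname, parts.port or REDSHIFT_DEFAULT_PORT, parts.path.lstrip("/")


# Placeholder cluster endpoint and database name.
url = redshift_jdbc_url("examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", "dev")
print(url)
```

Any query-string options appended to a real URL (for example, driver settings after `?`) are ignored by this parser, since `urlsplit` keeps them out of the path.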



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

| Resource page | Relationship |
|---|---|
| Table | Columns |
| Columns | Table |
| Schema | Database that contains the schema; tables that are part of the schema |
| Database | Schemas contained in the database |
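The relationships above form a containment chain (database, schema, table, column) in which each page links both down to what it contains and up to its container. The minimal sketch below records each containment edge once and derives both directions from it. The resource names are made-up examples, and the data structures are an illustration, not how data.world stores relationships.

```python
from collections import defaultdict

# Containment edges mirroring the default relationships table:
# a database contains schemas, a schema contains tables, a table has columns.
# All resource names are hypothetical examples.
CONTAINS = [
    ("database:dev", "schema:public"),
    ("schema:public", "table:public.orders"),
    ("table:public.orders", "column:public.orders.order_id"),
    ("table:public.orders", "column:public.orders.amount"),
]

children: dict[str, list[str]] = defaultdict(list)
parent: dict[str, str] = {}
for container, member in CONTAINS:
    children[container].append(member)  # e.g. a Table page lists its Columns
    parent[member] = container           # e.g. a Column page links back to its Table


def ancestors(resource: str) -> list[str]:
    """Walk up the containment chain, e.g. column -> table -> schema -> database."""
    chain = []
    while resource in parent:
        resource = parent[resource]
        chain.append(resource)
    return chain


print(children["table:public.orders"])
print(ancestors("column:public.orders.amount"))
```

Recording each edge once and deriving the reverse link keeps the two directions from drifting out of sync.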



Lineage for Redshift

The Redshift collector collects the following lineage information.

Table 3.

Object

Lineage available

View (column-level)

The collector traces data flow from view columns to source table columns across SQL expressions and subqueries, producing View → Table column relationships.

The collector captures lineage when views:

  • Sort rows (ORDER BY)

  • Filter rows (WHERE, HAVING)

  • Aggregate rows (GROUP BY)

Stored procedure

The collector identifies:

  • The associated columns in upstream views or tables that:

    • Supply the data

    • Sort the rows via ORDER BY

    • Filter the rows via WHERE/HAVING

    • Aggregate the rows via GROUP BY

  • The downstream table that the stored procedure updates.

The following stored procedures are not supported:

  • Stored procedures with multi-table inserts. Multiple separate INSERT statements, each inserting into a single table, are supported.

  • Stored procedures containing multiple SELECT and INSERT statements that are not separated by semicolons.
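Column-level lineage can be pictured as a graph whose edges run from each view column to the upstream columns it reads, possibly through other views. The minimal sketch below hand-codes such edges for two hypothetical views (`v_paid` and `v_total` over an `orders` table, all made up) and walks them back to base-table columns. The collector derives real edges by parsing view SQL; this sketch does not parse SQL and only illustrates the traversal.

```python
# Hypothetical lineage edges: each view column maps to the upstream columns
# it is computed from. A view may read from another view.
UPSTREAM: dict[str, set[str]] = {
    # CREATE VIEW v_paid AS SELECT order_id, amount FROM orders WHERE status = 'paid'
    "v_paid.order_id": {"orders.order_id"},
    "v_paid.amount": {"orders.amount"},
    # CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM v_paid
    "v_total.total": {"v_paid.amount"},
}


def source_columns(column: str) -> set[str]:
    """Return the base-table columns that ultimately feed `column`."""
    upstream = UPSTREAM.get(column)
    if not upstream:
        # No recorded inputs: treat it as a base-table column.
        return {column}
    result: set[str] = set()
    for col in upstream:
        # Recurse so view-on-view chains resolve to their base tables.
        result |= source_columns(col)
    return result


print(source_columns("v_total.total"))
```

Because `v_total.total` reads `v_paid.amount`, which in turn reads `orders.amount`, the traversal resolves the aggregate column to `orders.amount`. Filter or grouping columns (such as `orders.status` above) could be added as additional, separately labeled edges if needed.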



Authentication supported

  • The collector supports username/password authentication to Redshift.