About the PostgreSQL collector
Use this collector to harvest metadata for PostgreSQL tables and columns across the enterprise systems and make it searchable and discoverable in data.world. It is suitable for both self-hosted PostgreSQL instances and managed PostgreSQL instances, such as those hosted on AWS RDS.
Important
The PostgreSQL collector can be run in the Cloud or on-premise using Docker or Jar files.
Note
The latest version of the Collector is 2.256. To view the release notes for this version and all previous versions, please go here.
What is cataloged
The collector catalogs the following information.
Note
The collector harvests all versions of overloaded functions and stored procedures. Each version has its own title/name in the catalog, but a distinct identifier.
Object | Information cataloged |
---|---|
Columns | Name, Description, JDBC type, Column Type, Is Nullable, Default Value, Key type (Primary, foreign), column size, column index |
Table | Name, description, primary key, schema |
Views | Name, description, SQL definition |
Materialized View | Name, description, SQL definition |
Schema | Identifier, Name |
Database | Type, name, identifier, server, port, environment, JDBC URL |
Function | Name, Description, Function Type |
Stored Procedure | Name, Description, Stored Procedure Type |
Relationship between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Table | Columns |
Columns | Table |
Schema | Database that contains Schema, Table that is part of Schema, View that is part of Schema, Materialized View that is part of Schema |
View | Schema that contains Views, Columns that are part of Views |
Materialized View | Schema that contains Materialized Views, Columns that are part of Materialized Views |
Lineage for PostgreSQL
The collector identifies, for every column in a View/Materialized View, the columns in other tables or views from which that view’s column selects (sources) its data.
The collector traces these relationships from a View’s columns to ultimate source Table columns across SQL expressions and subqueries.
Additionally, the collector establishes relationships between a View and any columns in source Tables that sort the rows in the View (via SQL ORDER BY)
, filter the rows in the View (via SQL WHERE
and HAVING
clauses), and aggregate the rows in the View (via SQL GROUP BY
).
Note
Note that the collector currently does not establish view-to-table relationships in the JDBC collectors. This can be done transitively (e.g., in SPARQL) by noting the column-level relationships, since each column is associated with one and only one table or view. Also, any lineage for SQL Statements defined via variable statements are not supported.
Authentication supported
The collector supports username/password authentication to PostgreSQL.