
Catalog collector release notes


Published versions of the collectors are available as Docker images here.

Release version 2.128

Details about the release


This release was for internal improvements and has no customer-impacting changes.

Table 1.



Release version


Release date

22 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 70328cef174353aaee098316df6324d6c7777805d0e8dca25210976c4083b979

  • amd64: 4568d255e7c9a36ab82ac980ec915decc7beddebb1e3f124a8e5f18bd3515c27
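One way to use the published ID is to compare it with the ID that Docker reports for the image you pulled. The sketch below assumes that the published value is the hexadecimal part of the sha256: image ID printed by docker image inspect. The helper names and the image reference in the usage comment are hypothetical placeholders, not names taken from this page.

```python
import subprocess


def normalize_image_id(value: str) -> str:
    """Strip whitespace and an optional 'sha256:' prefix, and lowercase the hex."""
    value = value.strip().lower()
    prefix = "sha256:"
    return value[len(prefix):] if value.startswith(prefix) else value


def ids_match(published: str, local: str) -> bool:
    """Return True when the published ID equals the locally reported ID."""
    return normalize_image_id(published) == normalize_image_id(local)


def local_image_id(image_ref: str) -> str:
    """Ask the Docker CLI for the ID of an image that has already been pulled."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Id}}", image_ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Usage (requires Docker; "example/collector:2.128" is a placeholder reference):
#   published = "4568d255e7c9a36ab82ac980ec915decc7beddebb1e3f124a8e5f18bd3515c27"
#   print(ids_match(published, local_image_id("example/collector:2.128")))
```

Pick the published ID that matches the architecture of the image you pulled (arm64 or amd64).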

Release version 2.127

Details about the release

Table 2.



Release version


Release date

21 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 1a2686c8317986cea74500dd1018152614f08937cbbbd4232ff675cc4e3a2100

  • amd64: 3f1a3f175fcfafa3d7dae390850eb7fa2aa9393c9f1196316bce4c80e64b910b

New features and changes

  • The dbt Cloud collector is now available. Detailed documentation about the collector is available here.

Release version 2.126

Details about the release

Table 3.



Release version


Release date

17 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: 44d78be268b948226cad5fc41310202e34e30c5313f026380a00554e135ddb27

amd64: f9033e2060fea8f22beb599296800a4b4494592878e0048dd2de59bf7a308321

New features and changes

  • Snowflake collector: The collector now supports profiling for columns with values stored in scientific notation.

Bug fixes

  • Power BI collector: Fixed an issue with tabular files so that invalid paths and HTTP paths are handled properly.

Release version 2.125

Details about the release

Table 4.



Release version


Release date

9 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: 37c33410f1e3162b11e0885600f860d3fe9a41790faeed62cf791b4289797703

amd64: 64e38233b4c47fac90e2d68eafa948a4c46fbf7f3504968e28244857298b2a46

Bug fixes

  • Fivetran collector: Updated destination identifiers to match the case for currently supported database types. Specifically, this resolves the duplicate Snowflake resource pages issue.

  • Snowflake collector: Fixed an issue that was causing duplicate Snowflake tag-value pairs.

  • Tableau collector: Updated project filtering to ensure that the collector harvests calculated fields that are referenced in a sheet but were not created in that sheet.

Release version 2.124

Details about the release

Table 5.



Release version


Release date

21 February, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: b756d2f91373067746af00d951e117fabcd930d65df2dcb27706ee05689f495c

amd64: 165071142006ba509759d1e5d7fa49a57e9b09ff9d1ce665bf41a6683685d27b

New features and changes

  • Amazon S3 Collector: The new Amazon S3 collector harvests metadata about buckets and objects, including the Region, Version State, Size, Last Modified Date, ACL Owner, Grantee, and Grant Permission, among others. See all the details about this collector in this documentation.

  • BigQuery collector enhancements:

    • You can now harvest column-level lineage between views and tables, as well as more metadata about datasets, projects, tables, and views.

    • The collector now provides a test-run option that validates that the collector can authenticate to the specified source system. To use it, add the --dry-run parameter when running the collector. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.
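The control flow of a dry run can be pictured as follows. This is an illustrative sketch only, not the collector's implementation: run_collector, connect, and harvest are hypothetical names introduced for the example.

```python
from typing import Callable


def run_collector(connect: Callable[[], None],
                  harvest: Callable[[], int],
                  dry_run: bool = False) -> str:
    """Check the connection, then harvest unless dry_run is set."""
    try:
        connect()  # expected to raise if the connection parameters are wrong
    except Exception as exc:
        return f"connection failed: {exc}"
    if dry_run:
        # A dry run stops here: connectivity is confirmed, nothing is harvested.
        return "connection succeeded (dry run, no metadata harvested)"
    return f"harvested {harvest()} resources"
```

With dry_run=True the harvest callable is never invoked, which mirrors the "checks the connection and reports success or failure" behavior described for --dry-run.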

Bug fixes

  • Postgres, Snowflake, Redshift, Microsoft SQL Server collectors: When parsing view definition SQL to harvest column-level lineage, the collectors now correctly parse SQL in which tables are fully qualified in the FROM clause but not in the SELECT clause.

  • Power BI collector: The Power BI collector has changed the URL used as the dwec:externalUrl property from Power BI's embedUrl to Power BI's webUrl, which now allows the user to open the Report, Dashboard, or Dataset in a browser. Additionally, the collector now harvests the embedUrl from Power BI as a separate property, kos:embedUrl.

  • Snowflake Collector: The collector now handles the scenario in which the Snowflake JDBC driver does not provide valid default values for certain database columns.

Release version 2.123

Details about this release

Table 6.



Release version


Release date

13 February, 2023

Docker image ID

  • arm64: 7e2738ad5f2dae819332ef2f17a1cc34adaa0e3af167bdfa1fd6fedd36520871

  • amd64: ac00e4820508b612a7dbceb865112ad5d9115c258e02842991b0850dc8b4ea89

New features and changes

  • Postgres, Snowflake, Redshift, Microsoft SQL Server collectors: Enhancements have been made to parsing of view definition SQL to harvest column-level lineage. We now support joins on named subqueries and correctly handle quoted identifiers.

Bug fixes

  • Snowflake collector: Fixed an issue in which the sampling queries used to calculate the column statistics were failing.

Release version 2.122

Details about this release

Table 7.



Release version


Release date

10 February, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: b50f49f324e6df07864a39b904609b007105eb0378daee89fda1e255fd07a075

  • amd64: 87e2c4d932f4c584c42b87d2842b81cc3e54a7a42a18af5dd400340a4eae62e5

New features and changes

  • Manta collector: 

    • The collector now supports Manta version r38.1.

    • The collector now also supports token-based authentication.

  • JDBC collectors: The description of the --jdbc-property for JDBC collectors is updated for clarity.

  • The following additional collectors now provide a test-run option that validates that the collector can authenticate to the specified source system. To use it, add the --dry-run parameter when running the collector. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.

    • Tableau collector

  • YAML configuration files used to configure collectors can now interpolate system environment variables and Java system properties. For details about using this feature, see this documentation.
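To illustrate what interpolation means here, the sketch below substitutes placeholders in configuration text before it is parsed. The actual placeholder syntax used by the collectors is described in the linked documentation and is not reproduced on this page. The ${NAME} form, the lookup order (environment variables first, then properties), and the interpolate function are all assumptions made for the example.

```python
import os
import re

# Assumed placeholder form: ${NAME}, where NAME may contain dots (as Java
# system property names such as user.home do).
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def interpolate(text, env=None, properties=None):
    """Replace ${NAME} placeholders using env first, then properties."""
    env = os.environ if env is None else env
    properties = {} if properties is None else properties

    def replace(match):
        name = match.group(1)
        if name in env:
            return env[name]
        if name in properties:
            return properties[name]
        # Failing loudly avoids silently writing an empty credential or path.
        raise KeyError(f"no value for placeholder {name!r}")

    return _PLACEHOLDER.sub(replace, text)
```

For example, interpolate("user: ${DB_USER}", env={"DB_USER": "alice"}) returns "user: alice". Keeping secrets in the environment rather than in the YAML file is the usual motivation for this kind of feature.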

Release version 2.121

Details about this release

Table 8.



Release version


Release date

2 February, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • amd64: a730cf46ed312e871e5d54c6fc89ab1cd050a8c3ed7f0a0155bcae7c1aab7ac3

  • arm64: 894f34263dd50b1290d5cbbd63849830150815c10a4969ab57c58ed625466ab1

New features and changes

  • Tableau collector: Improved detection of underlying database type when a Tableau data source uses ODBC.

Bug fixes

  • dbt collector: Fixed an issue in the dbt collector that caused coining of IRIs that were inconsistent with IRIs coined by the Snowflake collector, which prevented the linking of database objects between dbt and Snowflake in the catalog. The application now ensures consistency of database object IRIs created by the dbt and Snowflake collectors.

Release version 2.120

Details about this release

Table 9.



Release version


Release date

27 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • amd64: 78e999b0bcd7493ab9c0c48680483689dd7272efe84d006df74e5f33a17c700d

  • arm64: d26d71eb535ef56bb68cb411292865967b5399a8c49bedb1f5d0e8cda9b8173c

New features and changes

  • BigQuery collector: The collector now harvests catalog resources representing BigQuery datasets and their associated metadata.

  • dbt collector:

    • The collector now supports key pair authentication to Snowflake, allowing users to use a private-public key pair for authenticating to Snowflake.

    • The collector now has improved detection of target database type information when that information is missing in the profiles.yml file.

    • Users can now use the new --snowflake-account CLI parameter to override Snowflake account information from the command line.

    • The help text for the --snowflake-role, --snowflake-warehouse, and --snowflake-application parameters now includes examples and case-sensitivity information.

  • Snowflake collector:

    • The collector now supports key pair authentication, allowing users to use a private-public key pair for authenticating to Snowflake.

    • Enhancements to parsing of the Snowflake SQL dialect when harvesting column-level lineage allow statements with copy grants to be parsed.

  • Tableau collector: The help text now includes examples for the --tableau-project and --tableau-exclude parameters.

  • Power BI collector: The help text now includes examples for the --include-workspace and --exclude-workspace parameters.

Release version 2.119

Details about this release

Table 10.



Release version


Release date

20 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 5a8f9e24ebe05dc027caf74075cac4ce51667271da30935640fc3c9471578445

arm64: af0b4528e0ee097d29d286c29c803db185f616babdb2a867b6228e77efaf1cd5

New features and changes

  • A new Further Help section is added to the collector help that is displayed with the -H or --help parameter. It guides users to the collector help available on the documentation site.

  • The collectors now emit a globally unique IRI to track collector runs.

Bug fixes

  • Snowflake collector: Column statistics now support the Number data type.

Release version 2.118

Details about this release

Table 11.



Release version


Release date

18 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: aca6f9202192bac23a5579f88eb155576e46425ad6b901c3febfbab32ff4158a

  • amd64: 56cbb748ada10f006d41a841034a0a6ba7211085c9682aa32331682ea92d20b0


Bug fixes

  • Snowflake collector: Column statistics now support columns with spaces in their names.

  • Tableau collector: Versions 2.113 through 2.117 of the Tableau collector had an issue that prevented them from parsing GraphQL queries. If you are using any version from 2.113 through 2.117, you must upgrade to 2.118 to use the Tableau collector successfully.

Release version 2.117

Details about this release


This release was for internal improvements and has no customer-impacting changes.

Table 12.



Release version


Release date

12 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 429a55a11d4bcd15647d1316d9debd9ead4b4ab5c0b9146894d07c39aa814290

  • amd64: 481dd2da6de71525248eba186feeeafcc73cc956ade0a196a4e8b0c2424e74b9

Release version 2.116

Details about this release

Table 13.



Release version


Release date

10 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 011ebeaf6000b1fdc47f1d3f8cb8a7655cbbe3528b844abe2a2cd9bd9fddc0fe

  • amd64: 192a2b94b6e58016c8e5f7ae871480e6e38fb74214597640f8b862a245d5c629

New features and changes

  • Power BI Collector: The following alternate options are added for some of the command line parameters:

    • For --include-user-workspace, the alternate parameter is --user-workspace-include.

    • For --include-workspace, the alternate parameter is --workspace-include.

    • For --exclude-workspace, the alternate parameter is --workspace-exclude.

  • BigQuery collector:

    • The collector now harvests additional metadata from projects, datasets, views, and tables available in BigQuery.

    • Column-level lineage added between Views and Tables.

Bug fixes

  • Snowflake collector: Fixed an issue with parsing SQL statements that contained copy grants in views. This improves the column-level lineage harvested by the collector.

Release version 2.115

Details about this release

Table 14.



Release version


Release date

10 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 693e1661d3ae178c0d5a2bca8e40f406928e91d34b4c1c749f1cce31bf720592

  • amd64: 060e0268ecdfb3d9c7382cce30192c334c1240831edf21681887bc8ee29a33c4

New features and changes

  • The documentation of the --jdbc-property parameter for database collectors is improved to explain how users can specify multiple properties. This change applies to the 19 collectors that include this parameter.

  • A new resource dwec:Source is added to the catalog emitted from database collectors. It is a mechanism that allows users to render specified resource properties as read-only in the catalog UI.

  • Power BI collector: The collector now has enhanced parsing of Power BI transformation expressions. As a result of this change, more column-level lineage information is harvested from Power BI.

  • Snowflake collector: The collector now harvests table usage count information.

  • dbt collector: User-defined database attributes are now enabled for the dbt collector to mitigate a missing or incomplete profiles YAML file when cataloging database objects referenced by dbt.

Release version 2.114

Details about this release

Table 15.



Release version


Release date

22 December, 2022

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 7202f9ae528a73e8ff7e6a29c36a22c8202680e733000886e760b0a3864b692a

  • amd64: 321d046a526f04b47fee389c3e48222b0b6b6c0d940ff66b938be92d85f59b0

New collectors

  • The Grafana collector is now available as a private beta release for select customers. Please get in touch if you are interested in using this collector.

New features and changes

  • The following additional collectors now provide a test-run option that validates that the collector can authenticate to the specified source system. To use it, add the --dry-run parameter when running the collector. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.

    • Power BI

  • Catalog graphs (.ttl files) that are automatically uploaded to the platform with -u / --upload are now compressed, enabling larger graphs to be uploaded.

  • Power BI collector: The collector now provides a new option --disable-expression-lineage to skip parsing lineage from the source expressions.

  • Snowflake collector: The collector has a new ability to harvest table usage and query count. This functionality is enabled by passing --table-usage-collection. It calculates, for each table in the database being harvested, the percentage of tables in the database that have been queried no fewer times than the subject table. The time period over which this analysis is performed is controlled with option --table-usage-lookback-days (that is, the number of days prior to the time when the collector is being run during which queries of each table are tallied), which defaults to a value of 7.
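The table usage figure described for the Snowflake collector can be worked through with a small example. The sketch below is not the collector's code; it assumes that the subject table counts toward its own percentage and that the query counts have already been tallied over the lookback window. The function name and the sample counts are hypothetical.

```python
def usage_percentages(query_counts: dict[str, int]) -> dict[str, float]:
    """For each table, the percentage of tables queried at least as often."""
    total = len(query_counts)
    return {
        table: 100.0 * sum(1 for other in query_counts.values() if other >= count) / total
        for table, count in query_counts.items()
    }


# Query counts per table over the lookback window (illustrative numbers).
counts = {"orders": 40, "customers": 10, "items": 10, "audit_log": 0}
print(usage_percentages(counts))
# {'orders': 25.0, 'customers': 75.0, 'items': 75.0, 'audit_log': 100.0}
```

Under this definition a lower percentage means a more heavily used table: only 25% of tables are queried as often as orders, while every table is queried at least as often as audit_log.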

Bug fixes

  • Snowflake collector: Fixed an issue with SQL parsing in Snowflake for windowed aggregate functions.

  • Power BI collector: Fixed an issue with the Power BI expression parsing related to joins in source expressions.

  • Looker collector: Fixed an issue in the Looker collector that caused an abnormal termination of the collector run with certain Looker views.

Release version 2.113

Details about this release

Table 16.



Release version


Release date

12 December, 2022

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 0517b905198728ba73bc59304ce06dd60ac99c3f7a25ad84569b94bef41eb1c2

  • amd64: b52b36ccf20c00fc0bb16b6abcb01496d55c1f64a8425a23a24f9473de54c9e3

New features and changes

  • The following collectors now provide a test-run option that validates that the collector can authenticate to the specified source system. To use it, add the --dry-run parameter when running the collector. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.

    • Databricks

    • Db2

    • Denodo

    • Dremio

    • Generic JDBC

    • Hive

    • Infor Ion

    • MySQL

    • Oracle

    • Presto

    • Salesforce

    • SQL Anywhere

    • Vertica

  • Power BI collector: Updated the Power BI collector to harvest metadata for Dataflows.

  • Databricks collector: Updated the Databricks collector driver version to 2.6.32. Drivers are available here.

  • dbt collector: Updated the dbt collector to harvest meta values (as key-value pairs) for dbt resources.

Release version 2.112

Details about this release

Table 17.



Release version


Release date

6 December, 2022

Docker image ID (use this to verify the integrity of the Docker image.)

  • arm64: 0921bdcf30a1e28f7a1d5094ff806537bfa023af93d2904bce6c9624e8cde3cf

  • amd64: 80e973a297d89d73d1a3c62d319d11baa23f45bc804e01050868827c60c2ad64

New features and changes

  • Added the following options for the Snowflake, Redshift, PostgreSQL, and MS SQL Server collectors:

    • --dry-run: If specified, the collector does not actually harvest any metadata, but just checks the database connection parameters provided by the user and reports success or failure at connecting.

    • --enable-column-statistics: to enable harvesting of column statistics (i.e., data profiling)

    • --sample-string-values: to enable harvesting of sample values and histograms for columns containing string data

    • --target-sample-size: to control the number of rows sampled for computation of column statistics and string-value histograms

Release version 2.111

Details about this release

Table 18.



Release date

29 November, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)


  • arm64: 4d4cd1fde0816ae5209b72f92f87c798da83dba5b2f155e3614bc89c68f39b71

  • amd64: bdd7daa56b2a62864c59b8c8958100e12080e61dc8f18559555840bef58c8079

New features and changes

  • Images are now produced for the arm64 architecture (in addition to amd64). The addition of arm64 means that dwcc images run seamlessly on M1 Macs. As a result of this change, from this release onward two hashes are available per release.

Release version 2.110

Details about this release


This release was for internal improvements and has no customer-impacting changes.

Table 19.



Release date

12 November, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 98583ecda023782df1e08a0f2347a536e239186dcca3936d16c67ae1f6aad0f6

Release version 2.109

Details about this release

Table 20.



Release date

10 November, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 6602f313506e5eb3ea74c296994f7e4d7bd56845c6f2b35e6d1d4cde5f402832

New features and changes

  • Snowflake collector: Snowflake policy fully-qualified names are now written to the title property instead of the description property.

Release version 2.108

Details about this release

Table 21.



Release date

8 November, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: b81a221abff982a356a21c4430f80da1e4459f0c3ead2d4f4f51a8f1d45c5604

New features and changes

  • The --post-process-sparql CLI option is now available for all other collectors (this feature was previously made available for some collectors in release version 2.107). This option allows the user to pass in a SPARQL query to post-process the catalog graph created by the collector prior to it being written to the filesystem and/or uploaded to the API.

  • BigQuery collector: The option to use the credential file for BigQuery no longer allows use of -c. It must be specified with --credentialFile.

Release version 2.107

Details about this release

Table 22.



Release date

3 November, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 4a33db022e92488914d1f088d4041a31d76883706ae13a88bf1a0e8aa67eaa51

New features and changes

  • Added the --post-process-sparql CLI option to some collectors. This option allows the user to pass in a SPARQL query to post-process the catalog graph created by the collector prior to it being written to the filesystem or uploaded to the API.

Bug fixes

  • Fixed an issue where MS SQL Server database objects referenced from Power BI did not always link to those objects harvested by the SQL Server collector due to mismatched IRIs.

Release version 2.106

Details about this release

Table 23.



Release date

30 October, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 0ac22da04737fbcaaac0da9d076eaf92e3cdd870c85544dd16a556f54a8900a8

New features and changes

  • Google BigQuery collector: The collector is updated to coin IRIs for database objects that align with IRIs coined by other collectors.

  • dbt collector: The collector now writes catalog records for each Snowflake tag and policy.

Bug fixes

  • Fixed a defect in which boolean properties that appeared in the global_options section of a dwcc configuration file were not properly recognized.

Release version 2.105

Details about this release

Table 24.



Release date

28 October, 2022

Release version


Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 427443cadbc21a3e26f095a4c054f6193bf3c2b96d257cdb22b57abd061bad68

New features and changes

  • Power BI collector: The collector now supports the ability to include specific workspaces for cataloging via the parameter --include-workspace. The collector continues to allow exclusion of specific workspaces with --exclude-workspace. Use of --include-workspace takes precedence.

  • dbt collector: 

    • The collector now supports harvesting of dbt projects/artifacts that specify Snowflake as the target database.

    • The collector now correctly coins database object (e.g., database, schema, table, column) IRIs that align with IRIs coined by the JDBC collectors. Previously, if the case used for identifiers in dbt artifacts did not match the target database’s default collation, the IRIs would not align (they do now).
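For the Power BI workspace options in this release, the note that --include-workspace takes precedence can be read in more than one way. The sketch below shows one possible reading, stated as an assumption rather than as the collector's documented behavior: when any workspace is included explicitly, only the included workspaces are cataloged and the exclusion list is ignored. The function and workspace names are hypothetical.

```python
def select_workspaces(available, include=(), exclude=()):
    """Return the workspaces to catalog, letting `include` win over `exclude`."""
    if include:
        # Assumed reading of "takes precedence": an explicit include list
        # fully determines the selection, even for names also in `exclude`.
        wanted = set(include)
        return [name for name in available if name in wanted]
    unwanted = set(exclude)
    return [name for name in available if name not in unwanted]


workspaces = ["Finance", "Sales", "Sandbox"]
print(select_workspaces(workspaces, exclude=["Sandbox"]))
# ['Finance', 'Sales']
print(select_workspaces(workspaces, include=["Sales", "Sandbox"], exclude=["Sandbox"]))
# ['Sales', 'Sandbox']
```

If in doubt about how the two options interact, a --dry-run style check or a small test harvest is a safer guide than this sketch.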

Previous versions


Collector v2.104

hash: 0b01f8c379e52f3167577a6fd1e5ad2f8d2f3d73871797ae3859b79f83bf5c29

  • Updated the Monte Carlo collector to add a --bigquery-credentials-file option, standardizing the name with the dbt collector's --bigquery-credentials-file option. The --big-query-credentialFile option still exists in the Monte Carlo collector; the new option is an alias for it.

  • The Snowflake collector now harvests Tags, Masking Policies, and Row Access Policies, and associates these resources with the database objects to which they apply. Two new CLI options enable this: --tag-collection and --policy-collection.

10-14-22 Collector v2.103

hash: 0f4f021c4c8fc17c7f47618ef9942e255327eef0d4c749e453a06e1d0e96760b

  • Updated SQL Server collector to harvest intra-database lineage from views.

  • Updated the log messages for missing files that are not required for the dbt collector to run.

  • Added table name to warning messages in Tableau collector in addition to the table ID.

  • Added parent-child relationship between projects in Tableau collector when the parent project is not included in the filtered projects.

  • Added pagination for certain queries in Tableau to prevent the result hitting the max node limit.

  • Updated automatic catalog upload functionality to accommodate large catalog graphs.

9-30-22 Collector v2.102

hash: 4dd8a1bdc776f0e8eb352954298842867c7873224d658f2abd8faefe31c40a76

  • Updated the Tableau collector to accommodate changes in the Tableau Metadata API that were preventing detection of lineage relationships between Tableau fields and underlying database objects.

  • Updated the AWS Glue collector to handle an error with jobs that have a space or other invalid characters in their paths.

  • Updated the Databricks collector to include the UserAgentEntry property in the JDBC connection.

  • The collectors will now emit the Collector version to the logs.

  • Added a fix to SQL parsing for window aggregate functions (e.g., SUM(X) OVER (PARTITION BY Y ORDER BY Z..)).

9-19-22 Collector v2.101

hash: 2be46a6268e34acceedd5b80412787d10f732ad1a2f1ceb83c6d5ce2fe819457

  • Added a filtering feature to filter Tableau fields by project in Tableau collector.

  • Fixed an intermittent authentication issue associated with harvesting metadata from a single site with Tableau collector.

  • Added a log message for missing job script in AWS Glue collector.

  • Enhanced harvesting of column-level lineage from database views, including handling SQL SELECT statements missing a FROM clause, and updated list of Snowflake keywords passed to functions.

9-9-22 Collector v2.100

hash: 946a0c51c091e74d6043dea1450a1ac818546b040e702e91526da185297a2858

  • Fixed the Fivetran collector so that it doesn't produce "blank" nodes (no id or name)

  • Added a change to use log_level rather than log-level.

9-2-22 Collector v2.99

hash: 68e4c4d6a6b40cb91a8e574a1f106c9c20ba2f1156a93f5f871b0284e975a766

  • Fixed an issue in Power BI with the new metadata API calls.

9-1-22 Collector v2.98

hash: 228090b0af31681952b7ccd5abef9beaf070450692501fef527bff8ca32280cb

  • Added harvesting of column-level lineage in the dbt collector, for dbt projects that target one of the collectors for which intra-database lineage is supported (i.e., Snowflake, Redshift, and PostgreSQL).

8-29-22 Collector v2.97

hash: 0a2134ef29a057b0c003ab353c14f034e65702b298a450f01fafcea5b6e8c1ea

  • PostgreSQL can now be cataloged using either catalog-postgres or catalog-postgresql for the command.

  • Microsoft SQL Server collector now harvests SQL Server extended properties for databases, schemas, tables, and columns.

  • Column-level lineage harvesting in the Snowflake, Redshift, and PostgreSQL collectors now properly harvests lineage from views whose SQL statements include comments starting with “--”, and also from statements with inline subselects.

  • The dbt collector harvests process (activity) and model (agent) metadata using PROV-O qualified derivations.

8-17-22 Collector v2.96 (no 2.95 release)

hash: a09e365296d57965385563569c6c58a6f706da1a4c1c6d711141aabd316d8629

  • Tableau collector now supports multiple --tableau-project options, allowing the user to include multiple projects in the same collector run.

  • Tableau collector no longer associates Custom SQL Table resources as part of a database.

  • The Collector no longer includes a bundled JDBC driver for Salesforce. Please get in touch for assistance in obtaining an appropriate JDBC driver.

8-15-22 Collector v2.94

hash: ac21b2f728b79e3dff38c2a395a81c0b0b1558979b8385747c6d00b76e1d6724

  • Enhancements to the Tableau Collector for project filtering and additional logging

  • PostgreSQL collector: Added an additional triple for table-to-database linking.

8-2-22 Collector v2.93

hash: afdecd160fd38e3db565cb14db3805fed05fa86b5c3a70662d0c8f0b0d10799f

  • Includes some internal dependency updates.

  • Enhancements to the dbt collector to validate the input profile.yaml file.

7-29-22 Collector v2.92

hash: 9b87934376246cd3926bfe413d36f2a7a0f2e7d848d7f2e68380d6035fe276f6

  • Enhancement to the Tableau collector to add a retry if a GraphQL query fails.

  • Enhancement to the collectors to add a check at the beginning of a collector run that ensures the output directory exists (if the -o/--output option is used). If the directory doesn't exist, the collector logs an error and stops.
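The output directory check amounts to a fail-fast guard at the start of a run. A minimal sketch of that pattern, with a hypothetical function name and logger name and no claim to match the collector's actual code:

```python
import logging
import os

logger = logging.getLogger("collector")


def ensure_output_dir(path: str) -> None:
    """Stop immediately if the requested output directory is missing."""
    if not os.path.isdir(path):
        logger.error("Output directory does not exist: %s", path)
        # Exiting before any harvesting avoids doing the work and then
        # failing at the moment the results are written.
        raise SystemExit(1)
```

Checking up front matters most for long harvests, where a missing directory would otherwise be discovered only after the source system has been fully scanned.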

7-21-22 Collector v2.91

hash: 708b34b19b2695d14d5a74a8281d5365cb659c4eeeebb791b3ebf2aa2e4d6686

  • Enhancement to the Tableau collector to reauthenticate if the Tableau API reports failed authorization during the collector run.

  • Released the new Fivetran collector (catalog-fivetran)

7-12-22 Collector v2.90

hash: bf94b0431b5a99dd95f485a8a48f202ea138f103b54873b4440b5080d86d529a

  • Added the parameter --include-information-schema for Snowflake and SQL Server collectors; we no longer catalog the information schema in these collectors when --all-schemas is specified, unless the user also specifies --include-information-schema.

  • Improved handling of manifest json structures with some nulls in the dbt collector.

  • Added reporting on user access issues during parsing/resolution for Snowflake collector.

7-9-22 Collector v2.89

hash: 848e38708b832c703652dc45d148e471a2341bce7f6ec159c2471f287a8d3620

  • Updated the Tableau collector to print a clear log message when authentication expires during a collector run.

  • Updated the Tableau collector to allow optimized serialization of API requests under JDK 17.

7-1-22 Collector v2.88

hash: 8534190cb3f0f93bd2a326abd54086e89eb38ece8180bf0487486dc66242d6c8

  • Significant updates to the Power BI collector. As of this release, the Power BI collector outputs different classes than the previous version. The collector now emits information about where it is sourcing its data.

  • Internal developer and testing improvements

  • The MANTA collector is now more specific about the concepts that it emits about Informatica PC.

6-24-22 Collector v2.87

hash: 7860e33213ba90783851cd7f7e6529ee99a5f261ae086d3a7038938c6f290ae6

  • The information schema collector now explicitly supports Oracle.

  • We have added enhancements to the dbt collector to harvest dbt snapshots and sources.

6-22-22 Collector v2.86

hash: 6fdae2dd70896e402ca648701bcd48210a8fd5979230c958b0dc06030bd7b1ec

  • For collectors that take API endpoint URLs, the Collector now adds a trailing slash to the URL if one is needed and was not specified by the user.

  • A new command-line option, --warehouse, is available for the Snowflake collector. It allows the user to specify which warehouse to use to connect to Snowflake.
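The trailing-slash handling for API endpoint URLs in this release can be sketched in a few lines. The function name and the example URL are hypothetical. The urljoin comparison shows one common reason a missing slash causes trouble, which may or may not be the reason behind the collector change.

```python
from urllib.parse import urljoin


def with_trailing_slash(url: str) -> str:
    """Append '/' only when the URL does not already end with one."""
    return url if url.endswith("/") else url + "/"


base = "https://example.com/api/v0"
# Without the slash, relative resolution replaces the last path segment.
print(urljoin(base, "catalogs"))                       # https://example.com/api/catalogs
print(urljoin(with_trailing_slash(base), "catalogs"))  # https://example.com/api/v0/catalogs
```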

6-18-22 Collector v2.85

hash: aaa6e55bf19af7ef37f1ab80ad28522af77a6ff286ef616085d92ab51f7d7899

  • Added a new collector for dbt. The legacy collector is still available.

  • Fixed an issue with auto-uploaded log files, in which not all log messages were being written out.

6-15-22 Collector v2.84

hash: 2e128cd3c89ffc8c35fbad12f6ee4ba7e6e5cdf9bfcf991fac78e8033d5d17d0

  • Looker collector now emits resources for Looker Views and relates the Looker Dimensions and Looker Measures that are configured within those Views.

  • Improved handling of unexpected database types encountered when cataloging Tableau.

6-3-22 Collector v2.83

hash: 9487027423a076231cec76f5679f044493a3d75032882c4ca0e5cf1c0304e6cf

  • Further improvements in handling of SQL ORDER BY, GROUP BY, WHERE, and HAVING clauses when harvesting intra-database lineage from database views.

6-2-22 Collector v2.82

hash: 22459e3d3a2a38f448d4e56137ed4ecd05170767b5a682dd4870135cceff23c2

  • Corrected coining of IRIs in catalog graphs emitted by the Tableau collector.

  • Improved logging in the Tableau collector to detect unknown linked database types.

  • Improved harvesting of lineage between database views and referenced columns, including support for columns in SQL ORDER BY, HAVING, GROUP BY, and WHERE clauses, and parsing of a wider range of column expressions.

Collector v2.81 - INTERNAL RELEASE

5-24-22 Collector v2.80

hash: be5a85c754d54328accabe332dec55ce507baddbe68d2fe9e29a211e9ea1420f

  • With this release, the Collector now requires Java 17. If you run the collector from within Docker this change will not affect you. If you run the Collector from a .jar file, you will need to upgrade your JRE to 17 to run DWCv2.80 and greater.

  • Added the --disable-lineage-collection parameter, which lets users turn off lineage cataloging for PostgreSQL, Redshift, MS SQL Server, and Snowflake.

5-13-22 the Collectorv2.79

Hash: 5b548c82b96ad5e5dbd4770adff205c9d07cac3c5f949882d7d9381240366ddb

  • The Manta collector can now accept OAuth tokens for MANTA authentication (for harvesting metadata from manta version R35 and above)

  • Released a new collector, powerbigov, that connects to the government Power BI API URLs and allows only tenantid authentication, not username/password.

5-11-22 the Collectorv2.78

Hash: 71edd8ff7a4c3ed8a91eaf36d59c8e2745b7a76f8666b5750cbee8205021c9c6

  • Added some small Tableau collector enhancements.

  • New PowerBiGov collector with specific endpoints for .gov customers. This collector does not accept a username or password.

  • For PowerBI, a new way to authenticate is available. A user can now enter a tenant ID with a client id and a client secret to authenticate, in addition to using a username and password.

  • For both PowerBi and PowerBiGov, when using the tenantid, secret and client id authentication method, this collector no longer emits information about PowerBI Apps.

4-27-22 the Collectorv2.77

hash: 4bed848791cfa9e46c9db4a78c7a593bb1c986900dc6fcfcd4255ddce1528579

  • Fixes an issue with the Snowflake collector that prevented the bundled jdbc driver from being found. Any users working with the Collector 2.76 should update.

4-22-22 the Collectorv2.76

hash: 30e60a4434ee64d2981b40eb2dc92506da3d367eab22bc0bca0c61bdd44a3f02

  • The Snowflake collector harvests some intra-database lineage information from database views.

  • Improved the host mapping in the Manta collector.

4-7-22 the Collectorv2.75

hash: 1a59dbb3ff8679fb6ee22eadaeb04ccdb28c5660be029e78fbc96403ae33096f

  • The Manta collector now emits resources for file sources and targets and their directory structure. It also emits sources and targets as files.

4-1-22 the Collectorv2.74

hash: 219428f6a72be91205408d5cb3f8cc8b27e1a9a4df0208e4cacb8fbaa1352f90

  • The Tableau collector now emits “column-level lineage”.

  • Improved styling of the Collector command-line errors

  • Updated command-line options for Datakin and Marquez.

3-16-22 dbt collector v.05

This version adds a third command-line argument to specify an output file name.

3-8-22 the Collectorv2.73

hash: 119daf987dcfad25db599e1c1affedf17a35ff2aa002d0618d642eb309cebaaf

  • Permalinks to Looker explores included via externalUrl

  • Improvements to datakin/marquez collectors

  • Tableau collector now emits resources for Tableau Projects, allowing us to establish full relationships between projects and the workbooks and views that they contain

  • Monte Carlo data collector now emits data quality information using enhanced dwec ontology concepts

  • Looker collector now emits descriptions for measures and dimensions

  • MANTA collector now emits Snowflake resources found in MANTA scans

3-1-22 the Collectorv2.72

Hash: 62d156aca58ec92513e8d6490f00fd10ee52dfb7a65f71c20c6a988c938dfddd

  • [BUGFIX] Invalid prefix when using --base option

  • Update the Collector transform to add catalog events to specific collectors

  • Added a Snowflake Sensitive Data Discovery collector

  • Sync CLI options between collector types

  • Validation of CLI options for the Collector

  • Improvements to the Collector CLI

  • Update the MonteCarlo Collector to use the new Data Quality Ontology

2-17-22 the Collectorv2.71

Digest: 03fc3df90ae63896d62ea22e00688f42cacf5b76d0f47691c06c104736680b2a

  • Bug fix for Marquez collector

  • Bug fix for Manta collector

2-9-22 the Collectorv2.70

Digest: 06bb747c4d7705c1e44664de7854158d87468316bab549ec5604b0a075380c69

  • Preview images for Tableau assets are now harvested much more efficiently, and the resulting image data in the catalog graph are much smaller, reducing catalog harvest run time and enabling image objects to remain within platform constraints during ingest.

  • Fix for unexpected column type errors in BigQuery collector

2-8-22 the Collectorv2.69

Digest: 5ab9b97d5f8f4568613438a9e52b0bdc12974f8d6edd0dab374a281c4982c737

  • Created new collectors for Marquez and Datakin

  • Added schema information to the Tableau collector outputs

2-4-22 the Collectorv2.68

Digest: 23674ee02a6b725d5f9a453615dc507286da2ee606dca83c386472f3aa36d118

  • The Tableau collector now accepts Tableau “Personal Access Tokens” for authentication, via new cli options --tableau-pat-name and --tableau-pat-secret.

  • Fixed an issue with mis-identification of views as tables in BigQuery.

2-2-22 the Collectorv2.67

Digest: 032867c9c52c8d46dc0b90a61a128be65ecec1440bb0adccb8b0d1b249b4e351

  • Fixed an issue with server name identification in Manta.

1-26-22 the Collectorv2.66

Digest: fa9ae2eb3d68375a3ff01ac7bde98fd36f372b84dce0d411444146ea9566b47b

  • With this release the Athena collector is no longer a JDBC collector--we harvest metadata by accessing the Athena API directly, rather than going through a JDBC driver. This means that it is no longer necessary to provide a JDBC driver when running the collector.

1-10-22 the Collectorv2.65

Digest: ed08cdd21a374c30456de0989076f5180bc4187ca998358b051807e521fd44e6

  • This release adds a new option for the MANTA collector, --manta-max-parallel-scenarios. Specifying this option and passing an integer value will configure the MANTA API to export the specified number of scenarios in the MANTA graph in parallel. The default value is 4; adjusting this up or down can improve performance.

1-5-22 DWCv2.64

Digest: 45b72798b0602885790388331a75db1f4286b15bf57b21f30f416eda79041571 

This release upgrades the Collector's dependency on the Apache Jena RDF library to version 4.2.0, which addresses a security vulnerability.

12-23-21 the Collectorv2.63

Digest: sha256:eb4208c914269c793a5e2143d59a9982e7b087c5da1c17dd075e02a326e64a3e

  • The Athena JDBC driver is no longer bundled with the Collector, because we have discovered that the Athena driver itself depends on a vulnerable version of log4j. Customers that use the Athena collector will now need to supply their own driver and put it in the JDBC driver directory (as is done with other collectors for which we don’t distribute a driver).

12-15-21 the Collectorv2.62

Digest: sha256:2cd579e09f4eee94e141e8cf7e4e40e9a9b8803029df1be7112d67d62ef33b9e

  • The Oracle collector now supports connecting to the database via SID (instance ID) or Service Name. Service Name is the default. If a connection via SID is desired, pass the SID as the value of the -d/ --database option and add the --oracle-sid-mode option (flag).
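For readers unfamiliar with the distinction, the two connection modes correspond to two forms of the Oracle thin-driver JDBC URL. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it is an assumption that the Collector builds its connection string this way internally.

```python
def oracle_jdbc_url(host: str, port: int, database: str, sid_mode: bool = False) -> str:
    """Build an Oracle thin-driver JDBC URL.

    Hypothetical helper for illustration; not the Collector's actual code.
    """
    if sid_mode:
        # SID form: host, port, and SID are separated by colons.
        return f"jdbc:oracle:thin:@{host}:{port}:{database}"
    # Service Name form (the default): leading "//" and a slash before the name.
    return f"jdbc:oracle:thin:@//{host}:{port}/{database}"


# Example host and database names are placeholders.
print(oracle_jdbc_url("db.example.com", 1521, "ORCLPDB1"))
print(oracle_jdbc_url("db.example.com", 1521, "ORCL", sid_mode=True))
```

The only difference between the modes is how the value passed as the database is interpreted, which is why a separate flag is needed to select SID mode.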

12-13-21 the Collectorv2.61

Digest: sha256:bd0ba96208d714ecef4131867cf5d16372be0a33f416c1d6bd01f132c8517323

  • The information schema collector has been modified so that the files table_constraints.csv and constraint_column_usage are now optional, not required.

12-10-21 the Collectorv2.60

Digest: sha256:7fd825bfe7d2f99c9a1298ad26bc1934c9657cc7c5868dd093844344d18fc7b7

  • Updated the BigQuery collector to support current Google Cloud API enhancements.

  • Added a new Information Schema Collector. This collector runs via the catalog-information-schema command and, rather than connecting to a database, catalogs four CSV files that are provided to it via the --csv-file-directory parameter. This collector is an option for customers with tricky DB setups that do not allow them to authenticate or establish connections to their DB via our normal collectors.

12-2-21 the Collectorv2.59

Digest: sha256:051f76748be1c6cf2c7557600dde71a39e1b822c9e49120881ce938f1c8c2b80

  • Verified the Manta collector works with MANTA R34.

  • Released the config file command.

  • Modified the Tableau collector to remove schema and database names from table names.

  • Updated the BigQuery collector to support cataloging all datasets in a project at once by default, and to be able to use cli options to select specific datasets in a project as well. With this last change, the --dataset param is no longer required. The help text has been updated with new messaging to reflect these changes.

11-10-21 the Collectorv2.58

Digest: sha256:82ebc1cec46f70de000aa94695359bd28d65c2782afc362c9ce14fadc04eae07

  • Added a new collector for Hive (as an alternative to catalog-hive) that uses only the Hive metastore--it does not connect to the Hive server directly. 

  • The PowerBI collector now harvests workspaces and identifies other assets as being in workspaces

  • The Collector now emits “catalog events” into the catalog graph. These capture details about the cataloging process itself, including selected configuration options with which the Collector was run, and summary statistics about the catalog. The ingest process will soon extract this information from catalogs at ingest time and send it to segment for downstream analysis.

11-1-21 the Collectorv2.57

Digest: sha256:606f7cfbe60bf56b4c2ecd5fb3902d4de621e31ae76ad78e68c56c788f81e5e6

  • Fixed an issue in the Tableau collector in which Custom SQL Table objects without an associated database were not handled correctly.

10-27-21 the Collectorv2.56

Digest: sha256:335f7e110a9506d95dff05971492e6509fb8537e74f9275d04dcf9e2427df0f0

  • Added new cli options to the Salesforce collector so that it can handle sandbox environments and custom login domains that customers might have.

10-25-21 the Collectorv2.55

Digest: sha256:c60ae69edc88b8801be833d578ef5dca73b6302646be9b30d31ccdfd7444288a

  • This release updates the BigQuery collector to handle fields in BigQuery tables for which the BigQuery API returns null type.

10-5-21 the Collectorv2.53

Digest: sha256:59c960d525e66e77d08dd34fd58c9b5027334a4bd2271f1f059370ae006a4b0b

  • Enhancements to the MANTA collector to harvest additional lineage information from MANTA scans (lineage from Informatica PowerCenter in particular)

  • Tableau collector enhancement to provide a better warning to the user when an obsolete version of the Tableau API is specified

9-29-21 the Collectorv2.52

Digest: sha256:915e4e91841001f80a84a65fcd76350b9a1d53f4e31678bb0e628d32beab94a1

  • Fixed an issue with the handling of certain fields and database information when the Tableau collector was run with a non-admin credential.

9-28-21 the Collectorv2.51 (internal)

Digest: sha256:261c5bf33b2ae38cbda35a346fcb37c56bbf8ebfb773f328deb9140efba1c8bf

  • Fixed an issue with the Tableau collector to handle views/workbooks that exist outside of a project.

9-28-21 the Collectorv2.50 (internal)

Digest: sha256:b407c629247f36afac3869eb8320464fce8caeb2865dd79811882b54ef94d1b5

  • Fixed an issue with the Tableau collector to handle workbooks that exist outside of projects.

9-24-21 the Collectorv2.49

Digest: sha256:397e78867f41aaa393ff69f42b0fa524fdcad662ddd027925cf27f80497b24ce 

  • Added a collector for Salesforce (catalog-salesforce)

  • Fixed an IRI mismatch issue for the Tableau collector when running on Tableau instances with a Snowflake datasource.

9-18-21 the Collectorv2.48

Digest: sha256:c36755489b6235408aa4e639e6e184cab027a32a34e3b8ca369c3c6b3c4bff96

  • Made internal improvements to the Tableau collector to enable more efficient querying of the Tableau Metadata API.

  • Fixed an issue in the Manta collector in which certain missing data in the MANTA lineage graph caused an exception.

9-10-21 the Collectorv2.47

Digest: sha256:219edfa247929e15d7c4e2be99ef890b2487c398abc1a23b2f85b3de11812be3

  • Fixed an issue in the Reltio collector that occurred when a Reltio configuration was missing certain objects.

  • Added a collector for Databricks (catalog-databricks)

9-8-21 the Collectorv2.46

Digest: sha256:e48cba45b457e076714d94d3a83d1164cb892864213732b3b2b334c041ff178a

  • Fixed an issue with creation of resource IRIs by certain collectors when the user chooses version 1 minting

  • Updated BigQuery collector to enable integration with platform / connection manager

  • Fixed an issue with the MANTA collector in which certain large MANTA scans caused a numeric overflow during json de-serialization

  • Updated Reltio collector to include information about survivorship groups in the emitted catalog

8-24-21 the Collectorv2.45

Digest: sha256:77f4c784b1d0166cf3bb87903696528f712fbe6aee1d4cb7e60097a0f494c7de

  • This release fixed an issue with JDBC drivers not being loaded by the Athena collector.

  • Added a collector for Reltio configurations (catalog-reltio).

the Collectorv2.44

Digest: sha256:47c1bb38b88c25801adf1f765e23c63637d15a60ae11fca8d63b53a8cd4755b2

  • Fixes an issue with URLs for sheets and dashboards that exist in Tableau Online or in Tableau Server within a site other than the default site.

the Collectorv2.43



  • Additional datetime fields added for Looker objects and typed as xsd:dateTime.

  • Fixed an issue caused by an undocumented change in Tableau Online’s REST API when using the Tableau collector to harvest metadata from Tableau Online.

the Collectorv2.42

Digest: sha256:e6bc353ea4b2ec3486b54d4e9280856d328d93f5d406e367c0c50303cde93704

  • The generic jdbc collector harvests database name when cataloging Intersystems Cache databases

  • Running the Snowflake collector with the -A / --all-schemas option harvests metadata from all available schemas, as with other collectors

the Collectorv2.41

Digest: sha256:bb79aa8afd19bf35b4b7e75840c21598702ec1d74b5f8640cc72a6758a3a0bc9

  • Fixed an issue with permalinks to objects in the MANTA collector.

the Collectorv2.40

Digest: sha256:44dd710a49a1500863f49e2f2e4ef261a45cdc6c7354702fe8e764210c27293b

  • Added support for Looker folders and additional attributes to the Looker metadata collector.

  • Added the ability to preview images to the Tableau metadata collector.

the Collectorv2.39

Digest: sha256:992671530f7483bfeb8a2aab52880a524b7df79caf427b373bd825115d71f4dc

  • Fixed an issue with the handling of certain special characters in catalog resource IRIs.

  • The --schema option for JDBC collectors can now be specified multiple times to enable the cataloging of multiple schemas in a single catalog.
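As a rough illustration of what a repeatable option looks like from the caller's side, the sketch below uses Python's standard argparse module. It is not the Collector's own option parser, and the program name and schema names are placeholders; only the --schema option name comes from the note above.

```python
import argparse
from typing import List


def parse_schemas(argv: List[str]) -> List[str]:
    """Collect every --schema occurrence into a list (illustration only)."""
    parser = argparse.ArgumentParser(prog="example-collector")  # hypothetical name
    # action="append" gathers each occurrence of the option instead of
    # letting a later value overwrite an earlier one.
    parser.add_argument("--schema", action="append", default=[],
                        help="schema to catalog; may be given more than once")
    return parser.parse_args(argv).schema


print(parse_schemas(["--schema", "sales", "--schema", "finance"]))
```

Each occurrence of the option contributes one schema, and all of them end up in the same run.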

the Collectorv2.38

Internal release

the Collectorv2.37

Digest: sha256:6a84217fa33df75d67ce51c486a90a802a8313a3432835abb55fffb5f1d3afc7

  • Updated Tableau collector to paginate additional graphql queries to avoid hitting Tableau Metadata API limits.

  • Updated the Hive2 collector to capture table-level metadata from the hive metastore

  • Updated the Tableau collector to allow the user to exclude specified Tableau objects from the catalog

the Collectorv2.36 

Digest: sha256:8dd9793f3b0e74adcd7e7bc153f06b8c3098470217fb07af4336dde611269671

  • Improvements to error messages produced when using a config-file to run the Collector

  • We disallow running catalog-postgres and catalog-redshift in the same config file as the two collectors use incompatible JDBC drivers

  • Improved error handling throughout the Collector

  • Improvements in representation of Tableau data source names in tableau catalogs

  • Improvements to the MANTA collector

the Collector v2.35 Changes in this release:

  • Upgrade of Denodo collector to Denodo 8

  • Handle edge case of very large field values embedded in manta’s exported artifacts

  • Support for sites

  • Handle edge case of stored procedure columns in manta

the Collector v2.34 This release includes:

  • Enhancements to domo collector output

  • Testing improvements

  • A minor tableau collector enhancement

  • Fix for an issue in the tableau collector in which column fields were sometimes not properly identifying the Tableau Table from which they sourced their data

  • Improvement to the presentation of domo catalogs in the platform UI.

  • Changes to the dockerhub repository where we house images containing non-released versions of the Collector. Previously we were calling these “beta” releases; we now call them “release candidates”. The new repository is datadotworld/dwcc-rc and the image tags are x.y-rc-z where x.y is the next expected Collector release, and z is an increment.
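If you script against release-candidate images, the x.y-rc-z tag format described above can be parsed as follows. This is a minimal sketch: the helper and type names are hypothetical, and the tag values in the example are made up for illustration.

```python
import re
from typing import NamedTuple


class RcTag(NamedTuple):
    major: int      # x in x.y-rc-z
    minor: int      # y in x.y-rc-z
    increment: int  # z in x.y-rc-z


_RC_TAG = re.compile(r"^(\d+)\.(\d+)-rc-(\d+)$")


def parse_rc_tag(tag: str) -> RcTag:
    """Split a release-candidate tag into numeric parts."""
    match = _RC_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {tag!r}")
    return RcTag(*(int(part) for part in match.groups()))


# Numeric parts sort correctly, unlike the raw strings ("rc-10" vs "rc-2").
tags = ["2.35-rc-10", "2.35-rc-2"]  # made-up example tags
print(sorted(tags, key=parse_rc_tag))
```

Parsing into integers matters because plain string sorting would place an increment of 10 before an increment of 2.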

the Collector v2.33 Adds support for harvesting intra-database lineage from manta scans, and accommodates changes in MANTA R32 (aka 1.32). We no longer support MANTA versions earlier than MANTA R32.

the Collector v2.32 This release adds in collector support for Vertica db.

the Collector v2.31 Issued fix to ensure alignment of identifiers for databases referenced by Tableau and Looker collectors.

the Collector v2.30 Installed a config file-driven configuration (as a hidden feature for now). Issued a fix for handling empty powerbi objects returned by the API

the Collector v2.29 The catalog collector now supports Tableau Online! Additionally there was a bugfix for PowerBi.

the Collector v2.28 Bugfix release

the Collector v 2.27 Added the optional CLI option tableau-graphql-page-size to the Tableau collector which allows the user to set a number of objects to be included in each page of paginated queries.

the Collector v2.26 Updated the PowerBi collector so that if a report is unavailable via the API it will be logged, and cataloging will continue on the rest of the repository.

the Collector v2.25 This release includes better and more user-friendly error handling and reporting. We have also added an enhanced collection of Tableau metadata via the Tableau Metadata API (graphql endpoint). New metadata includes data sources, databases, fields, metrics, and many more inter-object relationships.

the Collector v2.24 The Collector is now distributed via Dockerhub. Additionally, there are changes to the Tableau and PowerBI collectors, the ability to change the level of error messages written to the console and log file, and a new subcommand to display the Collector license text.

For Tableau:

  • The Tableau collector now emits RDF in which the object of `dct:creator` is a `dwec:Agent` instead of a string literal. This means we write additional details about the Tableau account that created the dashboard, via properties of the `dwec:Agent` resource. These details include: account name, account “full name”, and account email address (if they are populated in Tableau).

For PowerBI:

  • The PowerBI collector writes resources representing powerbi “data sources” that are now of a PowerBI-specific class, rather than `dwec:DataArtifact`.

Logging changes:

  • It is now possible for users to set the level (severity) of log messages written to the console and log file. By default, we write “info” level messages; users can choose to write only errors (level=“ERROR”), errors+warnings (level=“WARN”), or all messages including debug trace (level=“DEBUG”). This is useful if we want to have customers run the Collector with debug logging turned on, for troubleshooting problems etc.
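The threshold behavior described above can be pictured with Python's standard logging severities, which follow the same ordering. This is only an analogy under that assumption; it does not show the Collector's logging code or how the level is set on the command line.

```python
import logging
from typing import List

# Map the level names used in the note to Python's numeric severities.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
}


def messages_written(threshold: str = "INFO") -> List[str]:
    """Return the severities that pass the given threshold."""
    cutoff = LEVELS[threshold]
    return [name for name, severity in LEVELS.items() if severity >= cutoff]


for level in ("ERROR", "WARN", "INFO", "DEBUG"):
    print(level, "->", messages_written(level))
```

Lowering the threshold to DEBUG includes everything, which is why it is the setting to use when troubleshooting.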

Display the Collector license information:

  • License information for the Collector is now available as a subcommand of the Collector. To get all licensing information, run the command docker run -it --rm datadotworld/dwcc:X.XX display-license where X.XX is a version of the Collector greater than or equal to 2.24.
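When checking the "greater than or equal to 2.24" condition in a script, compare versions as pairs of integers rather than as decimals, because 2.9 is an earlier release than 2.24. The helper names below are hypothetical; the command string is the one given above.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn 'major.minor' into a tuple of integers, e.g. '2.24' -> (2, 24)."""
    major, minor = version.split(".")
    return (int(major), int(minor))


def supports_display_license(version: str) -> bool:
    # Tuple comparison treats 2.9 as older than 2.24; float comparison would not.
    return parse_version(version) >= (2, 24)


def display_license_command(version: str) -> str:
    """Build the documented docker command for a given Collector version."""
    if not supports_display_license(version):
        raise ValueError(f"display-license requires 2.24 or later, got {version}")
    return f"docker run -it --rm datadotworld/dwcc:{version} display-license"


print(supports_display_license("2.9"))
print(display_license_command("2.24"))
```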

the Collector v2.23 Internal release

the Collector v2.22 Internal release

the Collector v2.21 Fixed some timeout issues with the Looker collector when fetching images from the Looker API. Fixed an issue with cataloging reports and dashboards based on user workspace permissions in PowerBi.

the Collector v2.20 With this release, our Tableau collector now supports cataloging of workbooks and non-dashboard views as well as harvesting tags on workbooks and views. Fixed an issue in the Looker collector where preview images returned from the Looker API were missing.

the Collector v2.19 Includes a clean-up of the embedded help commands for several collectors and:

  • Fixes an issue with the Tableau Server collector when cataloging multi-site server instances.

  • Adds --tableau-site parameter to enable user to restrict cataloging to a single site (not required, by default all sites in the instance are scanned). Value provided to --tableau-site can be a site ID or name.

the Collector v2.18 The tableau collector now has a flag option --tableau-skip-images which skips the harvesting of preview images for views. Usage is like this:

... catalog-tableau --tableau-api-base-url= --tableau-username=admin --tableau-password=password -a sc-test3 -n tableau-test --tableau-skip-images

the Collector v2.17 Adds a collector for Presto

the Collector v2.16 This release:

  • Adds the parameter --all-databases to the Athena collector so that it can catalog all the databases accessible from the logged-in account.

  • Fixes some issues with datatypes for dwec:externalUrl predicates.

the Collector v2.15 This release contains the following:

  • The Tableau collector formerly had a CLI parameter --tableau-project-id which could be used to catalog only assets in the project with the specified ID. The parameter is now --tableau-project and takes either a project ID or project name

  • Update to the MANTA collector to accommodate a minor change in the MANTA API with v 1.31. Customers who have updated their MANTA instance to v 1.31+ will want to use the Collector 2.15+.

  • The Looker collector now works for non-admin Looker users; however, when the Collector is run by a non-admin, the emitted catalog will not contain any information about databases used by Looker analysis assets (access to database information in Looker requires admin permissions).

  • All JDBC collectors now populate two new properties for dwec:DatabaseColumn: dwec:columnDefaultValue and dwec:columnIsNullable, which contain the default value for that column in newly inserted rows and whether the column can be null, respectively. (Note that only some databases and drivers provide this metadata; we put it in the catalog if it is there.)

the Collector v2.14 Adds a collector for Looker. Minor update to the script that includes available versions in the error message if you don’t supply a version.

the Collector v2.13 Adds cli params so that it is now possible to pass arbitrary driver properties through to the connection.

the Collector v2.12 Adds a metadata collector for SAP (formerly Sybase) SQL Anywhere.

the Collector v2.11 Improves the Dremio collector’s handling of data sources nested within multiple layers of folders, and fixed a minor issue with the Dremio collector’s harvesting of lineage metadata from the Dremio graph API.

the Collector v2.10 Adds a collector for Domo and JDBC database collectors can now catalog all schemas in the database at once (default remains to catalog only user's default schema).

the Collector v2.9 Adds Tableau Server collector and extended the OpenAPI collector to include a few additional schema property metadata properties.

the Collector v2.8 Adds Infor ION data lake collector. Optimized collection of JDBC metadata (performance improvement).

the Collector v2.7 Adds a collector for PowerBI.

the Collector v2.6 Adds the Manta collector.

the Collector v2.5 Upgrades the Java runtime.

the Collector v2.4 Extends handling of OpenAPI collector parameters and responses.

the Collector v2.3 Adds support for OpenAPI (fka Swagger) collector.

the Collector v2.2 A refactoring release.

the Collector v2.1 Fixes an issue with the Denodo cataloger jdbc url port.

the Collector v2.0 We now use v2 URIs as the official locator IDs for metadata resources. This is a breaking change (for structural, intentional reasons) which is not backwards compatible with v1 URIs. For more information see the article on the Collector v2.X.

the Collector v 1.20 Addresses some memory issues and open-cursor leaks.

the Collector v.1.19 Adds statements to the catalog graph indicating that the catalog was generated by the Collector (with a version). We also added the ability to write database schema objects to the catalog graph.

the Collector v1.18, Allows you to specify alternate organization permissions and upload locations when performing an automatic upload of the metadata.

the Collector v.1.16 and the Collector v.1.17 Address issues with the SQL Server cataloger.

the Collector v.1.15 Adds Dremio support with optional Catalog API lineage fetching.

the Collector v1.14 Enables you to change the amount of memory that gets allocated to the Collector docker process. See our article on allocating additional memory to Docker for more information.

the Collector v.1.13 Adds support for Microsoft SQL Server and enables the JVM to use available memory in the container (useful for creating large catalogs). Additionally, we improved data type recognition in the AWS Glue cataloger.

As of the Collector v1.12 we can support not only Glue ETL jobs, but also Glue Data Catalog tables and columns.

With the Collector v.1.11 you can:

  • Upload generated catalogs via the --upload / -U command-line parameters

  • Upload the Collector log when uploading generated catalogs with --upload

  • Fetch an organization's current catalog with the fetch-catalog command

In the Collector v1.10 we added support for AWS Glue and AWS Athena including cataloging ETL jobs associated with an AWS account. There is no need to mount in a jdbc drivers directory as the Glue cataloger uses the Glue API, not JDBC.

dwc v.1.9 is a bug cleanup release.

It is now possible with the Collector v.1.8 to use jdbc drivers on the classpath as well as those found in a user-specified JDBC Driver Directory (drivers in the directory have higher precedence than classpath drivers).

the Collector v.1.7 is a bug-fix release

the Collector v.1.6 adds support for arbitrary jdbc data sources and the ability to build one-off docker images for testing, demos, etc.

With the Collector v.1.5 we add support for Oracle.

In the Collector v.1.4 we add support for Google BigQuery.

the Collector v.1.3 brings much new functionality including:

  • Support for Denodo and Snowflake

  • Compatibility of JDBC catalogs with tables imported through integrations

  • Ability to differentiate source information for databases cataloged from localhost

  • Cataloging of REMARKS fields into dct:description

With the Collector v.1.2 we support Redshift databases.

the Collector v.1.1 contains documentation clarification and expansion for the documents to streamline tags on customer docker hosts.

The initial release of the Collector v.1.0 provides support for metadata catalog extraction for DB2, Hive, MySQL, Postgres.