Catalog collector release notes

Important

Published versions of collectors are available as a docker image and a JAR file.

Release version 2.222

Details about the release

Table 1.

Item

Details

Release version

2.222

Release date

July 23, 2024

Docker image ID

Jar file



New features and changes

  • Power BI Service and Power BI Gov collectors: The collectors now support Denodo sources in Power BI column-level lineage parsing.

  • Denodo collector: The collector now harvests column-level lineage.

  • SQL Server Integration Services (SSIS) collector: Added a new --jdbc-property parameter. This allows you to provide authentication details for the NTLM authentication type.

  • dbt Core and dbt Cloud collectors: The collectors now harvest model columns from catalog.json and manifest.json database objects.

Bug fixes

  • Power BI collector: The collector now properly handles scenarios where renaming columns in Power BI resulted in duplicate columns in source tables.

  • Azure Data Factory collector: The collector now properly performs Date transformation when the time zone is not available as ZoneID.

  • Azure Data Lake Storage Gen2 collector:

    • Updated the collector to remove redundant permission-related relationships.

    • Fixed an issue with IRI creation for collector resources by using the correct terms.

Release version 2.221

Details about the release

Table 2.

Item

Details

Release version

2.221

Release date

July 15, 2024

Docker image ID

Jar file



New features and changes

  • Power BI Gov collector: The collector now harvests preview images for Power BI reports. Add the new parameter --image-collection to your command/YAML file to use this new feature.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Fixed an issue where placing a comment directly after a keyword without a space was sometimes causing parsing issues.

    • Fixed an issue with parsing CREATE VIEW statements where parentheses were being incorrectly removed during the SQL pre-processing.

    • Proper error messages are now logged when users run the collectors with the --dry-run option without specifying a database or while specifying multiple databases.

  • Snowflake collector: Resolved an issue where the collector was cataloging an incorrect database when the user had a default namespace set in Snowflake.

  • Databricks collector: Fixed an issue where uploads of collector output files were failing due to spaces in IRIs.

  • QuickSight Collector: Fixed an AwsAccountId null error while listing resources using pagination, which was causing issues in cataloging all the specified resources.

  • Azure Data Factory collector:

    • Resolved an issue with truncated paginated results.

    • Fixed an issue with the title of global parameters by correctly using the parameter name.
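One of the fixes above concerns a comment placed directly after a keyword without a space. As a rough illustration of that class of problem (not the collectors' actual pre-processing, which is not described here), the sketch below replaces each SQL comment with a single space so that the tokens on either side of the comment do not fuse together. The function name and the regular expression are assumptions made for this example, and comments inside string literals are not handled.

```python
import re

# Matches /* block */ comments and -- line comments.
# Simplification: comment markers inside string literals are not detected.
_COMMENT = re.compile(r"/\*.*?\*/|--[^\n]*", re.DOTALL)


def strip_comments(sql: str) -> str:
    """Replace each comment with a space so adjacent tokens stay separated."""
    without_comments = _COMMENT.sub(" ", sql)
    # Collapse the extra whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", without_comments).strip()


print(strip_comments("SELECT/* pick id */id FROM/* src */orders"))
# SELECT id FROM orders
```

Replacing the comment with an empty string instead of a space would produce SELECTid, which is the kind of parsing failure this sketch is meant to show.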

Release version 2.220

Details about the release

Table 3.

Item

Details

Release version

2.220

Release date

July 10, 2024

Docker image ID

Jar file



New features and changes

  • Oracle collector: Added support for lineage when the SELECT statement contains synonyms. This enhancement fixes lineage tracking between Oracle and Power BI when synonyms are used.

  • Power BI collector: The collector now harvests preview images for Power BI reports. Add the new parameter --image-collection to your command/YAML file to use this new feature.

Bug fixes

  • Power BI and Power BI Gov collectors:

    • Fixed an issue with parameter value replacement in expressions when the parameter contains a $ symbol.

    • Fixed an issue where Power BI reports failed to process when the page name is null.

Release version 2.219

Details about the release

Table 4.

Item

Details

Release version

2.219

Release date

July 8, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:a9f524581a769ade5b7c01a887a7dbd90510a94ce6a37e115fc7a6a7e40f0557

    • amd64: sha256:f10deae54bbf33be7a40352ae048dd74d3981698dabceae748d513250dcf800a

Jar file



New features and changes

  • All collectors: Each catalog resource in the catalog output file now contains information about the collector that harvested the resource. This information is available only in the catalog file and can be used in SPARQL automations.

Bug fixes

  • Power BI and Power BI Gov collectors: The collectors now properly handle scenarios when they run into API request limits. A new parameter Disable max requests wait (--disable-max-requests-wait) is added for handling these scenarios.

  • Azure Data Lake Storage Gen2 collector: Resolved an issue where certain ACL information missing in the Azure Data Lake Storage API response caused errors in the collector.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Harvesting of column-level lineage from views now supports view definitions containing unaliased subselects.
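The release note for --disable-max-requests-wait does not spell out the parameter's exact behavior. The sketch below assumes that a client normally waits and retries when an API request limit is hit, and that disabling the wait makes it fail immediately instead. RateLimitError, call_with_limit_handling, and the retry count are hypothetical names and values chosen for illustration, not part of the collectors.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when an API reports that the request limit was hit."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_limit_handling(request, disable_wait=False, max_attempts=3, sleep=time.sleep):
    """Call request(); on a rate limit, either wait and retry or fail fast."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimitError as err:
            # Assumed semantics: with waiting disabled, surface the error at once.
            if disable_wait or attempt == max_attempts:
                raise
            sleep(err.retry_after)
```

Passing the sleep function as an argument keeps the sketch easy to exercise without real delays.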

Release version 2.218

Details about the release

Table 5.

Item

Details

Release version

2.218

Release date

July 1, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:93c8ebbd04553ad2123d944efca841dccc44b06dacdd0bd9683bbd7fe41c282e

    • amd64: sha256:d078c72ae238ab02d0a1e8cc003509eba463ed78f97cf2835073ee586d9fc741

Jar file



New features and changes

  • Power BI and Power BI Gov collectors: The collectors now support parsing data source expressions for Power BI tables where the source connection information is defined as a parameter. This means that if Power BI users specify data source connection information in a parameter and use that parameter in place of the source in the expression, the collectors will correctly parse and resolve the expression/lineage.

  • Oracle collector: The collector now harvests from DBA_ views if the credential used to execute the collector lacks permissions for information schema views.

  • dbt Core collector: The collector now harvests database objects and intra-database lineage from dbt projects and artifacts that use Azure Synapse as a backend.

  • All collectors: Collectors now verify that the user-requested upload location exists with proper permissions before execution and issue a warning if a problem exists.

  • Databricks collector: The collector no longer supports Databricks-managed password authentication. If you used this method of authentication, you must change the authentication to personal access token. For details, see "Preparing Databricks for collectors".
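The Power BI item above describes data source expressions in which the connection information is supplied through parameters. As a minimal sketch of the idea, assuming parameters are referenced as plain identifiers, the function below substitutes parameter names with their quoted values before the expression is analyzed. The function name, the sample expression, and the parameter values are illustrative assumptions; this is not the collectors' parser, and quoted identifiers such as #"Server Name" are not handled.

```python
import re

# Either a double-quoted string (left untouched) or a bare identifier.
_TOKEN = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*')


def resolve_parameters(expression: str, parameters: dict[str, str]) -> str:
    """Replace bare parameter identifiers with quoted literal values."""

    def substitute(match: re.Match) -> str:
        token = match.group(0)
        if token in parameters:
            return '"' + parameters[token] + '"'
        return token  # string literals and other identifiers are unchanged

    return _TOKEN.sub(substitute, expression)


params = {"ServerName": "sql01.example.com", "DatabaseName": "sales"}
print(resolve_parameters("Sql.Database(ServerName, DatabaseName)", params))
# Sql.Database("sql01.example.com", "sales")
```

Using a function as the replacement keeps the substituted value literal, so special characters in a parameter value are inserted as-is.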

Bug fixes

  • SQL Server collector:

    • Fixed an issue where large values for column statistics produced an arithmetic overflow.

    • Resolved a problem where view definitions that include the TOP() expression were not properly handled when harvesting column-level lineage for views.

  • Power BI and Power BI Gov collectors: Fixed an issue where logging operations were causing an exception if certain Power BI objects were null.

  • Tableau collector: Fixed an issue where certain Tableau projects were not fully cataloged.

Release version 2.216

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 6.

Item

Details

Release version

2.216

Release date

June 26, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:52406bc14a26061e473987b281f3c354592b811af99c7fc8cf86833cffc9e819

    • amd64: sha256:52a347688c2e68806bf82a284e5a0ee8c9dfbda5f970f24ab69037e8025e3e8f

Jar file



Release version 2.215

Details about the release

Table 7.

Item

Details

Release version

2.215

Release date

June 26, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:e8bd4f7ba78527cea800acb3a03a1440b283f1fdd81097ad19a7a8b9a5362d9c

    • amd64: sha256:9018f2d0a0b734e5bbd858e1ba5a00b88b56982da6b9c0f690f6fff901016630

Jar file



New features and changes

  • Power BI Gov Collector:

    • The collector now supports harvesting of all workspaces and apps using the --all-workspaces-and-apps parameter.

    • Added the ability to disable lineage harvesting using the --disable-expression-lineage parameter.

Release version 2.214

Important

This release was for internal improvements and has no customer-impacting changes.

Details about the release

Table 8.

Item

Details

Release version

2.214

Release date

June 25, 2024

Docker image ID

Jar file



Release version 2.213

Details about the release

Table 9.

Item

Details

Release version

2.213

Release date

June 25, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:5c57df9b1285ce0f663955d0f90ad8c7c6e90acc110538473065d418a31ad2c9

    • amd64: sha256:20bf229751e3657c960ecc68a484b2df5be094215bc6e6163f206c8dcb9f0dba

Jar file



New features and changes

  • Azure Data Factory collector: The collector now harvests expressions for table names, schema names, and file names.

  • A new collector for SQL Server Integration Services (SSIS) is now available in private preview. If you would like access to this collector, please contact your Customer Success Director.

Bug fixes

  • Power BI and Power BI Gov collectors: The collectors now correctly harvest lineage for column types.

Release version 2.212

Details about the release

Table 10.

Item

Details

Release version

2.212

Release date

June 21, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:7880afc402b3db8f45a31bb90f565d417feb0c2fbfcc78715854da24c594cfc6

    • amd64: sha256:6218bc7e03a1d5b3195d64625ee4525f7739d59bbcb920afb95d719e8b90fe6a

Jar file



New features and changes

  • Snowflake collector: The collector now harvests metadata for functions and stored procedures from the snowflake.account_usage views when the metadata is unavailable from the information_schema of the database.

  • Power BI and Power BI Gov collectors now catalog:

    • Dataset table expression

    • Description for the workspace, app, and dataset

Bug fixes

  • Azure Data Factory collector: Fixed an issue with datetime parse errors while harvesting triggers.

Release version 2.211

Details about the release

Table 11.

Item

Details

Release version

2.211

Release date

June 15, 2024

Docker image ID

Jar file



New features and changes

  • Power BI and Power BI Gov collectors: The collectors now support lineage for Oracle database objects.

Bug fixes

  • Power BI and Power BI Gov collectors: Resolved an issue with collecting child resources for apps when using service principal authentication.

  • Snowflake and Oracle collectors: Fixed an issue where function lineage was harvested even when users enabled the Disable lineage collection (--disable-lineage-collection) option.

  • Oracle collector: Fixed an issue with harvesting database columns of LONG type.

Release version 2.210

Details about the release

Table 12.

Item

Details

Release version

2.210

Release date

June 7, 2024

Docker image ID

Jar file



New features and changes

  • Power BI and Power BI Gov collectors:

    • Added support for parsing SQL statements within table expressions, enabling column-level lineage harvesting. To use this feature, specify the credentials with the --datasource-mapping-file parameter. These credentials allow the collectors to link lineage to the database sources.

    • The collectors now harvest measures.

  • Databricks collector: The collector now harvests table and column tags by schema.

Bug fixes

  • Snowflake collector: Fixed an issue where the collector was unable to harvest lineage if the SQL statement included a dash in the column aliases.

  • Snowflake, Teradata, Netezza collectors: Fixed an issue that occurred because of insufficient information while harvesting agent resources for functions and procedures.

  • SQL Server collector: Fixed an issue that occurred while parsing view queries where columns have dashes in their names.

Release version 2.209

Details about the release

Table 13.

Item

Details

Release version

2.209

Release date

June 2, 2024

Docker image ID

Jar file



New features and changes

  • Databricks collector: The collector now harvests table and column lineage from system tables. To use this feature, you need to set new permissions for the collector.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Resolved a problem concerning column statistics when an aggregate statistic has a zero value.

  • Tableau collector: Resolved an issue to correctly associate lineage with the appropriate parent project.

  • Sigma collector: Resolved an issue which occurred when a dataset referred to in the lineage was not available among the harvested datasets.

  • Snowflake collector: Fixed an issue associated with external URLs containing special characters.

Release version 2.208

Details about the release

Table 14.

Item

Details

Release version

2.208

Release date

May 24, 2024

Docker image ID

Jar file



Bug fixes

  • Snowflake collector:

    • Resolved the issue that arose from Snowflake not returning function metadata.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Addressed the issue encountered during the harvesting of column statistics when the result set contained non-integer values.

Release version 2.207

Details about the release

Table 15.

Item

Details

Release version

2.207

Release date

May 21, 2024

Docker image ID

Jar file



Bug fixes

  • BigQuery collector: The collector is updated to generate catalog records for BigQuery Label instances. This allows them to be visible on the resource pages in the application.

  • Sigma collector: Resolved an issue that could result in an exception when the Sigma APIs failed to return a table path for a connection.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Enhanced error log statements by adding fully qualified table names when certain tables or columns in the database cannot be located during lineage resolution.

Release version 2.206

Details about the release

Table 16.

Item

Details

Release version

2.206

Release date

May 17, 2024

Docker image ID

Jar file



New features and changes

  • Sigma collector:

    • A new --pagination-limit parameter is now available for the collector. You can use this parameter to set the page size for the Sigma API response. The maximum value you can set is 1000. If you do not specify a value, the default page size is 25.

    • The collector is optimized to improve the efficiency of lineage harvesting.

  • Snowflake collector: The collector now harvests extended metadata for tables, views, and materialized views.
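The --pagination-limit rules stated above for the Sigma collector (default page size of 25, maximum of 1000) can be sketched as a small validation helper. The function name, the lower bound of 1, and the choice to reject out-of-range values rather than clamp them are assumptions for illustration; the release note does not say how the collector reacts to an invalid value.

```python
DEFAULT_PAGE_SIZE = 25   # default stated in the release note
MAX_PAGE_SIZE = 1000     # maximum stated in the release note


def resolve_page_size(pagination_limit=None):
    """Return the effective page size for an optional --pagination-limit value."""
    if pagination_limit is None:
        return DEFAULT_PAGE_SIZE
    # Assumption: values below 1 or above the maximum are rejected outright.
    if not 1 <= pagination_limit <= MAX_PAGE_SIZE:
        raise ValueError(
            f"pagination limit must be between 1 and {MAX_PAGE_SIZE}, got {pagination_limit}"
        )
    return pagination_limit


print(resolve_page_size())      # 25
print(resolve_page_size(500))   # 500
```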

Bug fixes

  • SQL Server collector: Incorporated additional debug logging for when the collector fails to harvest extended metadata.

  • Oracle collector:

    • The collector is now able to handle column names with single quotes in them.

    • Fixed an issue with synonyms being harvested in the wrong schema.

Release version 2.205

Details about the release

Table 17.

Item

Details

Release version

2.205

Release date

May 17, 2024

Docker image ID

Jar file



New features and changes

  • Power BI and Power BI Gov collectors: The ODBC data sources YAML file (datasources.yml) is updated to allow user-specified aliases for the database location (host). This ensures that resources are accurately linked across collectors.

  • Snowflake collector: Added support for harvesting the SQL definition and external URL (Snowsight) of materialized views.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • The collectors are optimized to load JDBC drivers more efficiently, thereby reducing memory usage.

Release version 2.204

Details about the release

Table 18.

Item

Details

Release version

2.204

Release date

May 10, 2024

Docker image ID

Jar file



Bug fixes

  • SQL Server collector: The collector now uses a consistent case when a collation is set.

  • dbt Core and dbt Cloud collectors: The collectors are optimized to correctly manage scenarios that previously caused an exception while harvesting lineage.

  • Sigma collector: The collector is optimized to manage scenarios that were previously causing the collector to not run properly.

Release version 2.203

Details about the release

Table 19.

Item

Details

Release version

2.203

Release date

May 8, 2024

Docker image ID

Jar file



New features and changes

  • dbt Core collector:

    • Now supports multiple run_results.json files in a single collector run. Add the new parameter --run-results-directory to your command/YAML file to use this new feature.

    • Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.

  • dbt Cloud collector: Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.

Bug fixes

  • Sigma collector: The collector now properly deserializes objects from the Sigma API.

  • Power BI and Power BI Gov collectors: The collectors now properly obtain the server name and port from Power BI data source parameters.

Release version 2.202

Details about the release

Table 20.

Item

Details

Release version

2.202

Release date

May 7, 2024

Docker image ID

Jar file



New features and changes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Optimized view query parsing to improve the processing time for large SQL statements.

    • Optimized the querying of metadata during view lineage harvesting.

  • Oracle Collector: Added a new --oracle-jdbc-timezone-as-region parameter. This allows you to decide if the Oracle JDBC connection timezone should utilize the JVM's default timezone.

Bug fixes

  • AWS Glue collector: Improved the log messages that are recorded when the harvesting of job lineage fails.

Release version 2.201

Details about the release

Table 21.

Item

Details

Release version

2.201

Release date

May 2, 2024

Docker image ID

Jar file



New features and changes

  • Oracle and SQL Server collectors: The collectors now catalog column-level lineage when functions and stored procedures contain sub-selects.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Optimized the collectors to improve their overall runtime.

    • A new parameter --disable-extended-metadata is now available that allows you to skip harvesting of extended metadata for resource types such as databases, schemas, tables, columns, functions, stored procedures, user-defined types, and synonyms. Basic metadata for these resource types will still be harvested.

  • Power BI and Power BI Gov collectors now catalog:

    • Relationships between Power BI apps and workspaces

    • Apps with associated workspace IDs (when service principal authentication is used)

Bug fixes

  • Teradata collector: The collector now properly harvests lineage metadata from views whose SQL statements contain REPLACE RECURSIVE VIEW or LOCK ROW ACCESS.

  • Oracle collector: The collector now properly harvests lineage metadata from views with COLLECT.

  • All collectors: The collectors now properly handle config file options that start with option flags.

Release version 2.200

Details about the release

Table 22.

Item

Details

Release version

2.200

Release date

April 19, 2024

Docker image ID

Jar file



New features and changes

  • All collectors: Users now have the option to define a custom output file name for the collector catalog during run time. To do this, use the --output-name parameter. The system automatically adds .dwec.ttl to the end of the provided file name.

    Note

    If you are updating the file name for an already configured collector, make sure to check and modify any existing SPARQL queries that explicitly mention existing collector output files.

  • Oracle Collector: The collector now harvests Oracle package bodies and Oracle package specifications.
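The --output-name item above states that .dwec.ttl is added to the end of the provided file name, and its note recommends checking existing SPARQL queries that mention the old output file. The sketch below shows both steps under stated assumptions: the helper names, the sample query store, and the old file name are hypothetical, and the sketch assumes the suffix is appended unconditionally.

```python
OUTPUT_SUFFIX = ".dwec.ttl"  # suffix stated in the release note


def catalog_file_name(output_name: str) -> str:
    """Return the output file name for a given --output-name value."""
    # Assumption: the suffix is appended even if the value already ends with it.
    return output_name + OUTPUT_SUFFIX


def queries_mentioning(file_name: str, queries: dict[str, str]) -> list[str]:
    """Return the names of saved queries whose text mentions file_name."""
    return sorted(name for name, text in queries.items() if file_name in text)


saved_queries = {
    "stale-tables": "SELECT ?t FROM <old-catalog.dwec.ttl> WHERE { ?t a ?type }",
    "all-columns": "SELECT ?c WHERE { ?c a ?type }",
}
print(catalog_file_name("sales-catalog"))
# sales-catalog.dwec.ttl
print(queries_mentioning("old-catalog.dwec.ttl", saved_queries))
# ['stale-tables']
```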

Bug fixes

  • SQL Server collector: Fixed an error that occurred when harvesting column statistics.

  • Power BI and Power BI Gov collectors: Resolved an issue that was causing errors during the parsing of expressions that used the Table.RenameColumns Power Query table function in certain cases.

  • Snowflake collector: The collector now properly harvests tags that are defined in a different schema than the schemas specified for the collector.

  • The following collectors are updated to harvest lineage accurately for group by, order by, where, and having SQL expressions. Prior to this update, the relationships were incorrectly directed.

    Postgres, Databricks, Derby, Netezza, Oracle, Redshift, Snowflake, SQL Server, Teradata collectors

Release version 2.199

Details about the release

Table 23.

Item

Details

Release version

2.199

Release date

April 11, 2024

Docker image ID

Jar file



New features and changes

  • A new collector is now available for Amazon Managed Streaming for Kafka.

  • Oracle collector: The collector now harvests lineage from views, stored procedures, and functions.

  • Snowflake collector: The collector now harvests Streamlit apps.

  • The following collectors now support harvesting from multiple databases specified by users. This means you can provide the --database parameter multiple times while running the collector.

    • Databricks, PostgreSQL, SQL Server, Db2, Redshift, Denodo, Oracle, MySQL, Snowflake, Teradata
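Providing the --database parameter multiple times, as described above, amounts to repeating the flag once per database. The sketch below builds such an argument list. The helper name and the database names are made up for illustration, and the --database=VALUE form is an assumption about the accepted syntax; the rest of the collector command is omitted because it is not given here.

```python
def database_args(databases):
    """Build one --database argument per database name, preserving order."""
    return [f"--database={name}" for name in databases]


print(database_args(["SALES", "FINANCE"]))
# ['--database=SALES', '--database=FINANCE']
```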

Bug fixes

  • Power BI and Power BI Gov collectors: Resolved an issue that was caused by parsing expand column expressions.

  • dbt Cloud collector: The collector now properly harvests metadata of dbt Cloud artifacts when the target database is not Snowflake. Note that the collector harvests metadata only from the dbt Cloud artifacts and does not connect to any unsupported target database to obtain database lineage metadata.

  • Snowflake collector: The collector now harvests policies associated with cataloged database objects, regardless of the database in which the policies reside.

Release version 2.198

Details about the release

Table 24.

Item

Details

Release version

2.198

Release date

April 9, 2024

Docker image ID

Jar file



New features and changes

  • Oracle collector: The collector now harvests synonyms.

  • Athena collector: Starting with release 2.198, data.world no longer packages the Athena JDBC driver with the Athena collector. You can continue to use releases earlier than 2.198 as-is, but when you update the collector to version 2.198 or higher, you must download and mount the driver for the collector and update the collector command to include the driver path.

Release version 2.197

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 25.

Item

Details

Release version

2.197

Release date

April 5, 2024

Docker image ID

Jar file



Release version 2.196

Details about the release

Table 26.

Item

Details

Release version

2.196

Release date

April 2, 2024

Docker image ID

Jar file



New features and changes

  • Log files for collectors: The collector log files for each collector run now have unique names. This allows logs to be written to separate files when running multiple collector instances.

  • Reltio collector: Survivorship groups and mappings are now recognized as primary entities with catalog records.

  • Snowflake collector: The collector now harvests tags associated with database objects in the user-specified database, regardless of the database in which the tag resides.
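The log file item above says that each collector run now writes to a uniquely named log file, but it does not describe the naming scheme. The sketch below shows one common way to get per-run uniqueness, combining a UTC timestamp with a short random identifier. The file name pattern, the function name, and the collector label are all assumptions for illustration and are not the scheme the collectors actually use.

```python
import uuid
from datetime import datetime, timezone


def log_file_name(collector: str, now=None, run_id=None) -> str:
    """Return a log file name that differs between concurrent runs."""
    now = now or datetime.now(timezone.utc)
    # A random suffix separates runs that start within the same second.
    run_id = run_id or uuid.uuid4().hex[:8]
    return f"{collector}-{now:%Y%m%dT%H%M%S}-{run_id}.log"


fixed = datetime(2024, 4, 2, 12, 0, 0, tzinfo=timezone.utc)
print(log_file_name("snowflake", now=fixed, run_id="abc12345"))
# snowflake-20240402T120000-abc12345.log
```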

Bug fixes

  • Teradata collector: Fixed an issue that was blocking column harvesting due to invalid column references in Views.

  • Azure Data Factory collector: Fixed an issue preventing successful file uploads to data.world.

Release notes for previous versions