Catalog collector release notes
Important
Published versions of collectors are available as a Docker image and a JAR file.
Release version 2.222
Details about the release
Item | Details |
---|---|
Release version | 2.222 |
Release date | July 23, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI Service and Power BI Gov collectors: The collectors now support Denodo sources in Power BI column-level lineage parsing.
Denodo collector: The collector now harvests column-level lineage.
SQL Server Integration Services (SSIS) collector: Added a new --jdbc-property parameter. This allows you to provide authentication details for the NTLM authentication type.
dbt Core and dbt Cloud collectors: The collectors now harvest model columns from catalog.json and manifest.json database objects.
Bug fixes
Power BI collector: The collector now properly handles scenarios where renaming columns in Power BI resulted in duplicate columns in source tables.
Azure Data Factory collector: The collector now properly performs date transformation when the time zone is not available as ZoneID.
Azure Data Lake Storage Gen2 collector:
Updated the collector to remove redundant permission-related relationships.
Fixed an issue with IRI creation for collector resources by using the correct terms.
Release version 2.221
Details about the release
Item | Details |
---|---|
Release version | 2.221 |
Release date | July 15, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI Gov collector: The collector now harvests preview images for Power BI reports. Add the new parameter --image-collection to your command/YAML file to use this new feature.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Fixed an issue where placing a comment directly after a keyword without a space was sometimes causing parsing issues.
Fixed an issue with parsing CREATE VIEW statements where parentheses were being incorrectly removed during the SQL pre-processing.
Proper error messages are now logged when users run the collectors with the --dry-run option without specifying a single database or with multiple databases.
Snowflake collector: Resolved an issue where the collector was cataloging an incorrect database when the user had a default namespace set in Snowflake.
Databricks collector: Fixed an issue where uploads of collector output files were failing due to spaces in IRIs.
QuickSight collector: Fixed an AwsAccountId null error while listing resources using pagination, which was causing issues in cataloging all the specified resources.
Azure Data Factory collector:
Resolved an issue with truncated paginated results.
Fixed an issue with the title of global parameters by correctly using the parameter name.
Release version 2.220
Details about the release
Item | Details |
---|---|
Release version | 2.220 |
Release date | July 10, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle collector: Added support for lineage when the SELECT statement contains synonyms. This enhancement fixes lineage tracking between Oracle and Power BI when synonyms are used.
Power BI collector: The collector now harvests preview images for Power BI reports. Add the new parameter --image-collection to your command/YAML file to use this new feature.
Bug fixes
Power BI and Power BI Gov collectors:
Fixed an issue with parameter value replacement in expressions when the parameter contains a $ symbol.
Fixed an issue where Power BI reports failed to process when the page name is null.
Release version 2.219
Details about the release
Item | Details |
---|---|
Release version | 2.219 |
Release date | July 8, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
All collectors: Each catalog resource in the catalog output file now contains information about the collector that harvested the resource. This information is available only in the catalog file and can be used in SPARQL automations.
Bug fixes
Power BI and Power BI Gov collectors: The collectors now properly handle scenarios when they run into API request limits. A new parameter Disable max requests wait (--disable-max-requests-wait) is added for handling these scenarios.
Azure Data Lake Storage Gen2 collector: Resolved an issue where certain ACL information missing in the Azure Data Lake Storage API response caused errors in the collector.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Harvesting of column-level lineage from views now supports view definitions containing unaliased subselects.
Release version 2.218
Details about the release
Item | Details |
---|---|
Release version | 2.218 |
Release date | July 1, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors: The collectors now support parsing data source expressions for Power BI tables where the source connection information is defined as a parameter. This means that if Power BI users specify data source connection information in a parameter and use that parameter in place of the source in the expression, the collectors will correctly parse and resolve the expression/lineage.
Oracle collector: The collector now harvests from DBA_ views if the credential used to execute the collector lacks permissions for information schema views.
dbt Core collector: The collector now harvests database objects and intra-database lineage from dbt projects and artifacts that use Azure Synapse as a backend.
All collectors: Collectors now verify that the user-requested upload location exists with proper permissions before execution and issue a warning if a problem exists.
Databricks collector: The collector no longer supports Databricks-managed password authentication. If you used this method of authentication, you must change the authentication to personal access token. For details, see "Preparing Databricks for collectors".
Bug fixes
SQL Server collector:
Fixed an issue where large values for column statistics produced an arithmetic overflow.
Resolved a problem where view definitions that include the TOP() expression were not properly handled when harvesting column-level lineage for views.
Power BI and Power BI Gov collectors: Fixed an issue where logging operations were causing an exception if certain Power BI objects were null.
Tableau collector: Fixed an issue where certain Tableau projects were not fully cataloged.
Release version 2.216
Details about the release
Important
This release was for internal improvements and has no customer-impacting changes.
Item | Details |
---|---|
Release version | 2.216 |
Release date | June 26, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.215
Details about the release
Item | Details |
---|---|
Release version | 2.215 |
Release date | June 26, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI Gov collector:
The collector now supports harvesting of all workspaces and apps using the --all-workspaces-and-apps parameter.
Added the ability to disable lineage harvesting using the --disable-expression-lineage parameter.
Release version 2.214
Important
This release was for internal improvements and has no customer-impacting changes.
Details about the release
Item | Details |
---|---|
Release version | 2.214 |
Release date | June 25, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.213
Details about the release
Item | Details |
---|---|
Release version | 2.213 |
Release date | June 25, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Azure Data Factory collector: The collector now harvests expressions for table names, schema names, and file names.
A new collector for SQL Server Integration Services (SSIS) is now available in private preview. If you would like access to this collector, please contact your Customer Success Director.
Bug fixes
Power BI and Power BI Gov collectors: The collectors now correctly harvest lineage for column types.
Release version 2.212
Details about the release
Item | Details |
---|---|
Release version | 2.212 |
Release date | June 21, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Snowflake collector: The collector now harvests metadata for functions and stored procedures from the snowflake.account_usage views when the metadata is unavailable from the information_schema of the database.
Power BI and Power BI Gov collectors now catalog:
Dataset table expression
Description for the workspace, app, and dataset
Bug fixes
Azure Data Factory collector: Fixed an issue with datetime parse errors while harvesting triggers.
Release version 2.211
Details about the release
Item | Details |
---|---|
Release version | 2.211 |
Release date | June 15, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors: The collectors now support lineage for Oracle database objects.
Bug fixes
Power BI and Power BI Gov collectors: Resolved an issue with collecting child resources for apps when using service principal authentication.
Snowflake and Oracle collectors: Fixed an issue where function lineage was still harvested when users enabled the Disable lineage collection (--disable-lineage-collection) option.
Oracle collector: Fixed an issue with harvesting database columns of LONG type.
Release version 2.210
Details about the release
Item | Details |
---|---|
Release version | 2.210 |
Release date | June 7, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors:
Added support for parsing SQL statements within table expressions, enabling column-level lineage harvesting. To use this feature, you need to use the --datasource-mapping-file parameter to specify the credentials. These credentials allow the collector to link lineage to the database sources.
The collector now harvests measures.
Databricks collector: The collector now harvests table and column tags by schema.
Bug fixes
Snowflake collector: Fixed an issue where the collector was unable to harvest lineage if the SQL statement included a dash in the column aliases.
Snowflake, Teradata, Netezza collectors: Fixed an issue that occurred because of insufficient information while harvesting agent resources for functions and procedures.
SQL Server collector: Fixed an issue that occurred while parsing view queries where columns have dashes in their names.
Release version 2.209
Details about the release
Item | Details |
---|---|
Release version | 2.209 |
Release date | June 2, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Databricks collector: The collector now harvests table and column lineage from system tables. To use this feature, you need to set new permissions for the collector.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Resolved a problem concerning column statistics when an aggregate statistic has a zero value.
Tableau collector: Resolved an issue to correctly associate lineage with the appropriate parent project.
Sigma collector: Resolved an issue which occurred when a dataset referred to in the lineage was not available among the harvested datasets.
Snowflake collector: Fixed an issue associated with external URLs containing special characters.
Release version 2.208
Details about the release
Item | Details |
---|---|
Release version | 2.208 |
Release date | May 24, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
Snowflake collector:
Resolved an issue that arose from Snowflake not returning function metadata.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Addressed an issue encountered during the harvesting of column statistics when the result set contained non-integer values.
Release version 2.207
Details about the release
Item | Details |
---|---|
Release version | 2.207 |
Release date | May 21, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
BigQuery collector: The collector is updated to generate catalog records for BigQuery Label instances. This allows them to be visible on the resource pages in the application.
Sigma collector: Resolved an issue that could result in an exception when the Sigma APIs failed to return a table path for a connection.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Enhanced error log statements by adding fully qualified table names when certain tables or columns in the database cannot be located during lineage resolution.
Release version 2.206
Details about the release
Item | Details |
---|---|
Release version | 2.206 |
Release date | May 17, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Sigma collector:
A new --pagination-limit parameter is now available for the collector. You can use this parameter to set the page size for the Sigma API response. The maximum value you can set is 1000. If you do not specify a value, the default page size is 25.
The collector is optimized to enhance the efficiency of lineage harvesting.
Snowflake collector: The collector now harvests extended metadata for tables, views, and materialized views.
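The limits stated for the Sigma collector's --pagination-limit parameter (default 25, maximum 1000) can be sketched as a small validation helper. This is only an illustration: the function name is hypothetical, and rejecting out-of-range values is an assumption, since the notes do not say how the collector treats a value above 1000.

```python
from typing import Optional

# Values taken from the release note for --pagination-limit.
DEFAULT_PAGE_SIZE = 25
MAX_PAGE_SIZE = 1000


def resolve_page_size(requested: Optional[int] = None) -> int:
    """Return the page size to use for a given --pagination-limit value.

    Hypothetical helper: raising on out-of-range values is an assumption;
    the release note only states the default and the maximum.
    """
    if requested is None:
        # No value specified: fall back to the documented default.
        return DEFAULT_PAGE_SIZE
    if not 1 <= requested <= MAX_PAGE_SIZE:
        raise ValueError(
            f"--pagination-limit must be between 1 and {MAX_PAGE_SIZE}, got {requested}"
        )
    return requested
```

A check like this could run before the collector is launched, so that an invalid page size is caught early rather than at harvest time.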
Bug fixes
SQL Server collector: Incorporated additional debug logging for when the collector fails to harvest extended metadata.
Oracle collector:
The collector is now able to handle column names with single quotes in them.
Fixed an issue with synonyms being harvested in the wrong schema.
Release version 2.205
Details about the release
Item | Details |
---|---|
Release version | 2.205 |
Release date | May 17, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors: The ODBC data sources YAML file (datasources.yml) is updated to allow user-specified aliases for the database location (host). This ensures that resources are accurately linked across collectors.
Snowflake collector: Added support for harvesting the SQL definition and external URL (Snowsight) of materialized views.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
The collectors are optimized to load JDBC drivers more efficiently, thereby reducing memory usage.
Release version 2.204
Details about the release
Item | Details |
---|---|
Release version | 2.204 |
Release date | May 10, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
SQL Server collector: The collector now uses a consistent case when a collation is set.
dbt Core and dbt Cloud collectors: The collectors are optimized to correctly manage scenarios that previously caused an exception while harvesting lineage.
Sigma collector: The collector is optimized to manage scenarios that were previously causing the collector to not run properly.
Release version 2.203
Details about the release
Item | Details |
---|---|
Release version | 2.203 |
Release date | May 8, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
dbt Core collector:
Now supports multiple run_results.json files in a single collector run. Add the new parameter --run-results-directory to your command/YAML file to use this new feature.
Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.
dbt Cloud collector: The collector now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.
Bug fixes
Sigma collector: The collector now properly deserializes objects from the Sigma API.
Power BI and Power BI Gov collectors: The collectors now properly obtain the server name and port from Power BI data source parameters.
Release version 2.202
Details about the release
Item | Details |
---|---|
Release version | 2.202 |
Release date | May 7, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Optimized view query parsing to improve the processing time for large SQL statements.
Optimized the querying of metadata during view lineage harvesting.
Oracle collector: Added a new --oracle-jdbc-timezone-as-region parameter. This allows you to decide if the Oracle JDBC connection timezone should utilize the JVM's default timezone.
Bug fixes
AWS Glue collector: Improved the log messages that are recorded when the harvesting of job lineage fails.
Release version 2.201
Details about the release
Item | Details |
---|---|
Release version | 2.201 |
Release date | May 2, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle and SQL Server collectors: The collectors now catalog column-level lineage when functions and stored procedures contain sub-selects.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
The collectors are optimized to improve their overall runtime.
A new parameter --disable-extended-metadata is now available that allows you to skip harvesting of extended metadata for resource types such as database, schema, table, columns, functions, stored procedures, user defined types, and synonyms. Basic metadata for these resource types will still be harvested.
Power BI and Power BI Gov collectors now catalog:
Relationships between Power BI apps and workspaces
Apps with associated workspace IDs (when service principal authentication is used)
Bug fixes
Teradata collector: The collector now properly harvests lineage metadata from views with SQL statements containing REPLACE RECURSIVE VIEW or LOCK ROW ACCESS.
Oracle collector: The collector now properly harvests lineage metadata from views with COLLECT.
All collectors: The collectors now properly handle config file options that start with option flags.
Release version 2.200
Details about the release
Item | Details |
---|---|
Release version | 2.200 |
Release date | April 19, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
All collectors: Users now have the option to define a custom output file name for the collector catalog during run time. To do this, use the --output-name parameter. The system automatically adds .dwec.ttl to the end of the provided file name.
Note
If you are updating the file name for an already configured collector, make sure to check and modify any existing SPARQL queries that explicitly mention existing collector output files.
Oracle collector: The collector now harvests Oracle package bodies and Oracle package specifications.
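The naming rule for the --output-name parameter described above (the system appends .dwec.ttl to the provided name) can be sketched as follows. The helper name is hypothetical and the code only mirrors the documented suffix rule; it is not taken from the collector itself.

```python
CATALOG_SUFFIX = ".dwec.ttl"  # suffix stated in the release note


def catalog_output_filename(output_name: str) -> str:
    """Return the file name expected for a given --output-name value.

    Hypothetical helper; it only applies the documented suffix rule.
    """
    return f"{output_name}{CATALOG_SUFFIX}"
```

For example, a collector run with --output-name set to sales-catalog would be expected to produce sales-catalog.dwec.ttl, which is the name any existing SPARQL queries would need to reference after the change.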
Bug fixes
SQL Server collector: Fixed an error that occurred when harvesting column statistics.
Power BI and Power BI Gov collectors: Resolved an issue that was causing errors during the parsing of expressions that used the Table.RenameColumns Power Query table function in certain cases.
Snowflake collector: The collector now properly harvests tags that are defined in a different schema than the schemas specified for the collector.
The following collectors are updated to harvest lineage accurately for GROUP BY, ORDER BY, WHERE, and HAVING SQL expressions. Prior to this update, the relationships were incorrectly directed.
PostgreSQL, Databricks, Derby, Netezza, Oracle, Redshift, Snowflake, SQL Server, Teradata collectors
Release version 2.199
Details about the release
Item | Details |
---|---|
Release version | 2.199 |
Release date | April 11, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
A new collector is now available for Amazon Managed Streaming for Kafka.
Oracle collector: The collector now harvests lineage from views, stored procedures, and functions.
Snowflake collector: The collector now harvests Streamlit apps.
The following collectors now support harvesting from multiple databases specified by users. This means you can provide the --database parameter multiple times while running the collector.
Databricks, PostgreSQL, SQL Server, Db2, Redshift, Denodo, Oracle, MySQL, Snowflake, Teradata
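Providing the --database parameter multiple times, as described above, amounts to repeating the flag once per database. A minimal sketch of assembling those arguments is shown below; the helper name and the database names are hypothetical, and the rest of the collector command is deliberately left out.

```python
from typing import List


def database_args(databases: List[str]) -> List[str]:
    """Repeat the --database flag once per database name.

    Hypothetical helper for building the argument portion of a collector
    command; only the --database flag itself comes from the release note.
    """
    args: List[str] = []
    for name in databases:
        args.extend(["--database", name])
    return args


# Example with made-up database names.
print(" ".join(database_args(["sales", "finance"])))
```

The resulting fragment can be appended to an existing collector command in place of a single --database entry.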
Bug fixes
Power BI and Power BI Gov collectors: Resolved an issue that was caused by parsing expand column expressions.
dbt Cloud collector: The collector now properly harvests metadata of dbt Cloud artifacts when the target database is not Snowflake. Note that the collector harvests metadata only from the dbt Cloud artifacts and does not connect to any unsupported target database to obtain database lineage metadata.
Snowflake collector: The collector now harvests policies associated with cataloged database objects, regardless of the database in which the policies reside.
Release version 2.198
Details about the release
Item | Details |
---|---|
Release version | 2.198 |
Release date | April 9, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle collector: The collector now harvests synonyms.
Athena collector: Starting with release 2.198, data.world no longer packages the Athena JDBC driver with the Athena collector. You can continue to use releases earlier than 2.198 as-is, but when you update the collector to version 2.198 or higher, you must download and mount the driver for the collector and update the collector command to include the driver path.
Release version 2.197
Details about the release
Important
This release was for internal improvements and has no customer-impacting changes.
Item | Details |
---|---|
Release version | 2.197 |
Release date | April 5, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.196
Details about the release
Item | Details |
---|---|
Release version | 2.196 |
Release date | April 2, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Log files for collectors: The collector log files for each collector run now have unique names. This allows logs to be written to separate files when running multiple collector instances.
Reltio collector: Survivorship groups and mappings are now recognized as primary entities with catalog records.
Snowflake collector: The collector now harvests tags associated with database objects in the user-specified database, regardless of the database in which the tag resides.
Bug fixes
Teradata collector: Fixed an issue that was blocking column harvesting due to invalid column references in Views.
Azure Data Factory collector: Fixed an issue preventing successful file uploads to data.world.
Release notes for previous versions
Go here to access release notes for previous versions.