Catalog collector release notes
Important
Published versions of collectors are available as a Docker image and a JAR file.
Release version 2.206
Details about the release
Item | Details |
---|---|
Release version | 2.206 |
Release date | 17 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Sigma collector:
A new --pagination-limit parameter is now available for the collector. You can use this parameter to set the page size for the Sigma API response. The maximum value you can set is 1000. If you do not specify a value, the default page size is 25.
The collector is optimized to enhance the efficiency of lineage harvesting.
Snowflake collector: The collector now harvests extended metadata for tables.
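The documented limits for the Sigma collector's --pagination-limit parameter (default 25, maximum 1000) can be sketched as follows. This is a minimal illustration of those bounds, not the collector's implementation. The function name, the lower bound of 1, and the choice to raise an error for out-of-range values are assumptions.

```python
DEFAULT_PAGE_SIZE = 25   # default page size stated in the release note
MAX_PAGE_SIZE = 1000     # maximum value stated in the release note


def resolve_page_size(pagination_limit=None):
    """Return the page size to use for an optional --pagination-limit value."""
    if pagination_limit is None:
        # No value specified: fall back to the documented default.
        return DEFAULT_PAGE_SIZE
    if not 1 <= pagination_limit <= MAX_PAGE_SIZE:
        # Assumption: out-of-range values are rejected. The lower bound of 1
        # and the error itself are illustrative, not documented behavior.
        raise ValueError(
            f"--pagination-limit must be between 1 and {MAX_PAGE_SIZE}"
        )
    return pagination_limit
```

For example, resolve_page_size() returns 25 and resolve_page_size(500) returns 500.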
Bug fixes
SQL Server collector: Incorporated additional debug logging for when the collector fails to harvest extended metadata.
Oracle collector:
The collector is now able to handle column names with single quotes in them.
Fixed an issue with synonyms being harvested in the wrong schema.
Release version 2.205
Details about the release
Item | Details |
---|---|
Release version | 2.205 |
Release date | 17 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors: The ODBC data sources YAML file (datasources.yml) is updated to allow user-specified aliases for the database location (host). This ensures that resources are accurately linked across collectors.
Snowflake collector: Added support for harvesting the SQL definition and External URL (Snowsight) of materialized views.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
The collectors are optimized to load JDBC drivers more efficiently, thereby reducing memory usage.
Release version 2.204
Details about the release
Item | Details |
---|---|
Release version | 2.204 |
Release date | 10 May, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
SQL Server collector: The collector now uses a consistent case when a collation is set.
dbt Core and dbt Cloud collectors: The collectors are updated to correctly manage scenarios that previously caused an exception while harvesting lineage.
Sigma collector: The collector is optimized to manage scenarios that were previously causing the collector to not run properly.
Release version 2.203
Details about the release
Item | Details |
---|---|
Release version | 2.203 |
Release date | 8 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
dbt Core collector:
Now supports multiple run_results.json files in a single collector run. To use this feature, add the new --run-results-directory parameter to your command or YAML file.
Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.
dbt Cloud collector: Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.
Bug fixes
Sigma collector properly deserializes objects from Sigma API.
Power BI and Power BI Gov collectors now properly obtain the server name and port from Power BI data source parameters.
Release version 2.202
Details about the release
Item | Details |
---|---|
Release version | 2.202 |
Release date | 7 May, 2024 |
Docker image ID |
|
Jar file |
|
New features
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Optimized view query parsing to improve the processing time for large SQL statements.
Optimized the querying of metadata during view lineage harvesting.
Oracle Collector: Added a new --oracle-jdbc-timezone-as-region parameter. This allows you to decide if the Oracle JDBC connection timezone should utilize the JVM's default timezone.
Bug fixes
AWS Glue Collector: Improved the log messages that are recorded when the harvesting of job lineage fails.
Release version 2.201
Details about the release
Item | Details |
---|---|
Release version | 2.201 |
Release date | 2 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle and SQL Server collectors: The collectors now catalog column-level lineage when functions and stored procedures contain sub-selects.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
The collectors are optimized to improve their overall runtime.
A new --disable-extended-metadata parameter is now available that allows users to skip the harvesting of extended metadata, such as database, schema, table, columns, functions, stored procedures, user-defined types, and synonyms.
Power BI and Power BI Gov collectors now catalog:
Relationships between Power BI apps and workspaces
Apps with associated workspace IDs (when service principal authentication is used)
Bug fixes
Teradata collector properly harvests lineage metadata from views with SQL statements containing REPLACE RECURSIVE VIEW, LOCK ROW ACCESS.
Oracle collector properly harvests lineage metadata from views with COLLECT.
All collectors properly handle config file options that start with option flags.
Release version 2.200
Details about the release
Item | Details |
---|---|
Release version | 2.200 |
Release date | 19 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
All collectors: Users now have the option to define a custom output file name for the collector catalog during run time. To do this, use the --output-name parameter. The system automatically adds .dwec.ttl to the end of the provided file name.
Note
If you are updating the file name for an already configured collector, make sure to check and modify any existing SPARQL queries that explicitly mention existing collector output files.
Oracle Collector: The collector now harvests Oracle package bodies and Oracle package specifications.
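The file naming rule for the --output-name parameter described above, and the follow-up check recommended in the note, can be sketched as follows. This is a minimal illustration, not the collector's implementation. The function names, the sample file names, and the dictionary of saved queries are hypothetical.

```python
OUTPUT_SUFFIX = ".dwec.ttl"  # suffix the release note says is added automatically


def catalog_file_name(output_name):
    """Return the file name produced for a given --output-name value."""
    return output_name + OUTPUT_SUFFIX


def queries_to_review(old_file_name, queries):
    """Return the names of saved queries whose text mentions the old output file.

    `queries` maps a query name to its SPARQL text (a hypothetical structure).
    """
    return [name for name, text in queries.items() if old_file_name in text]
```

For example, catalog_file_name("sales-catalog") returns "sales-catalog.dwec.ttl". A plain substring search like queries_to_review is only a starting point for finding queries that need to be updated after a rename.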
Bug fixes
SQL Server collector: Fixed an error that occurred when harvesting column statistics.
Power BI and Power BI Gov collectors: Resolved an issue that was causing errors during the parsing of expressions that used the Table.RenameColumns Power Query table function in certain cases.
Snowflake Collector: The collector now properly harvests tags that are defined in a different schema than the schemas specified for the collector.
The following collectors are updated to harvest lineage accurately for group by, order by, where, and having SQL expressions. Prior to this update, the relationships were incorrectly directed.
Postgres, Databricks, Derby, Netezza, Oracle, Redshift, Snowflake, SQL Server, Teradata collectors
Release version 2.199
Details about the release
Item | Details |
---|---|
Release version | 2.199 |
Release date | 11 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
A new collector is now available for Amazon Managed Streaming for Kafka.
Oracle collector: The collector now harvests lineage from views, stored procedures, and functions.
Snowflake collector: The collector now harvests Streamlit apps.
The following collectors now support harvesting from multiple databases specified by users. This means you can provide the --database parameter multiple times while running the collector.
Databricks, PostgreSQL, SQL Server, Db2, Redshift, Denodo, Oracle, MySQL, Snowflake, Teradata
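Repeating the --database parameter, as described above, can be sketched as a small helper that builds the argument list for a collector command. This is an illustration only. The function name and the sample database names are hypothetical, and passing the value as a separate token (rather than in --database=NAME form) is an assumption.

```python
def database_args(databases):
    """Build a repeated --database argument list, one flag per database."""
    args = []
    for name in databases:
        # The parameter is provided once per database to harvest.
        args.extend(["--database", name])
    return args


if __name__ == "__main__":
    # Hypothetical database names, shown only to illustrate the repetition.
    print(" ".join(database_args(["SALES", "FINANCE"])))
```

The resulting tokens would be appended to the rest of the collector command, which is not shown here.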
Bug fixes
Power BI and Power BI Gov collectors: Resolved an issue that was caused by parsing expand column expressions.
dbt Cloud collector: The collector now properly harvests metadata of dbt Cloud artifacts when the target database is not Snowflake. Note that the collector harvests metadata only from the dbt Cloud artifacts and does not connect to any unsupported target database to obtain database lineage metadata.
Snowflake collector: The collector now harvests policies associated with cataloged database objects, regardless of the database in which the policies reside.
Release version 2.198
Details about the release
Item | Details |
---|---|
Release version | 2.198 |
Release date | 9 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle collector: The collector now harvests synonyms.
Athena collector: Starting with release 2.198, data.world no longer packages the Athena JDBC driver with the Athena collector. You can continue to use releases earlier than 2.198 as-is, but when you update the collector to version 2.198 or higher, you must download and mount the driver for the collector and update the collector command to include the driver path.
Release version 2.197
Details about the release
Important
This release was for internal improvements and has no customer-impacting changes.
Item | Details |
---|---|
Release version | 2.197 |
Release date | 5 April, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.196
Details about the release
Item | Details |
---|---|
Release version | 2.196 |
Release date | 2 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Log files for collectors: The collector log files for each collector run now have unique names. This allows logs to be written to separate files when running multiple collector instances.
Reltio collector: Survivorship groups and mappings are now recognized as primary entities with catalog records.
Snowflake collector: The collector now harvests tags associated with database objects in the user-specified database, regardless of the database in which the tag resides.
Bug fixes
Teradata collector: Fixed an issue that was blocking column harvesting due to invalid column references in Views.
Azure Data Factory collector: Fixed an issue preventing successful file uploads to data.world.
Release version 2.195
Details about the release
Item | Details |
---|---|
Release version | 2.195 |
Release date | 25 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Databricks collector: The collector now harvests tags for Databases, Schemas, Tables, and Columns.
Bug fixes
Power BI Service and Power BI Gov collectors: The collectors now correctly harvest skipped data sources during metadata scans.
Azure Data Lake Storage Gen2 collector: The collector is updated to refresh API authorization requests per ADLS requirements to avoid session expiration.
Azure Data Factory collector: Fixed an issue to accommodate varying data returned from the Azure Data Factory API.
Release version 2.194
Details about the release
Item | Details |
---|---|
Release version | 2.194 |
Release date | 21 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
The Power BI Service and Power BI Gov collectors now support harvesting lineage from ODBC data source types. A new parameter --datasource-mapping-file can be used to provide the information required for harvesting lineage relationships when the data source uses an ODBC connection in Power BI.
Bug fixes
The Amazon S3 collector now continues to harvest objects in the bucket when a 403 error is encountered.
The Azure Data Lake Storage Gen2 collector properly handles the scenario involving special characters in the blob name.
The Azure Data Factory collector properly handles a scenario that causes the collector to stop due to the format of information returned from the Azure Data Factory APIs.
BigQuery Collector properly handles a scenario when a table is in a different database from the one being harvested.
Release version 2.193
Details about the release
Item | Details |
---|---|
Release version | 2.193 |
Release date | 15 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
The following two new collectors are now available in Private Preview. Please contact your Customer Success Director to get access to these collectors:
Bug fixes
The Azure Data Factory collector is updated to correctly handle a situation that previously caused the collector to stop, due to the format of the information returned from the ADF APIs.
Release version 2.192
Details about the release
Item | Details |
---|---|
Release version | 2.192 |
Release date | 12 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Amazon S3 collector: The collector now offers the options, --include-object and --exclude-object. These options allow you to select which objects should be included or excluded from the harvesting process.
Databricks collector: The collector now harvests Databricks tags for database, schema, table, view, and column as key-value pairs. The collector also harvests tags for clusters and jobs, replacing the existing ClusterTag and JobTag resource types.
Release version 2.191
Details about the release
Item | Details |
---|---|
Release version | 2.191 |
Release date | 7 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
All collectors: The --dry-run option is now available for all collectors. This option allows you to do a test run to validate that the collector can authenticate to the specified source system. If specified, the collector does not harvest any metadata. It only checks the connection parameters provided by the user and reports whether the connection succeeded or failed.
Bug fixes
Teradata collector: The collector is updated to correctly parse view SQL syntax for extracting lineage metadata. It also now includes improved logging of any errors encountered during lineage harvesting.
BigQuery collector: The collector now properly handles fully qualified table names that include dashes (-).
Release version 2.190
Details about the release
Item | Details |
---|---|
Release version | 2.190 |
Release date | 5 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Snowflake, Teradata, and Netezza collectors: In the harvested metadata, the owners of resources are now correctly referenced as owner objects. Previously, they were referenced as string text.
Bug fixes
The Teradata collector now correctly manages variations in database cases within SQL statements while gathering lineage metadata.
Release version 2.189
Details about the release
Item | Details |
---|---|
Release version | 2.189 |
Release date | 24 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
The Tableau collector now captures all sub-projects when you specify certain projects to catalog. Additionally, it enables users to exclude specific projects using the --tableau-exclude-project parameter. Any sub-projects under an excluded project are also automatically excluded.
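The inclusion and exclusion behavior described above can be sketched as a prefix check on project paths. This is a minimal illustration, not the collector's implementation. Representing a project as a tuple of path segments, the function names, and the sample project names are assumptions, and the sketch covers only the case where specific projects are specified.

```python
def is_under(project_path, roots):
    """Return True if project_path equals, or is nested under, any root path."""
    return any(project_path[:len(root)] == root for root in roots)


def is_harvested(project_path, included, excluded):
    """Decide whether a project is cataloged.

    A project is cataloged when it sits under a specified project and does
    not sit under a project passed to --tableau-exclude-project.
    """
    return is_under(project_path, included) and not is_under(project_path, excluded)


# Hypothetical project hierarchy: "Finance" is specified for cataloging and
# its sub-project "Finance/Archive" is excluded.
included = [("Finance",)]
excluded = [("Finance", "Archive")]
```

With these values, ("Finance", "Reports") is harvested because it is a sub-project of a specified project, while ("Finance", "Archive", "2019") is skipped because its parent is excluded.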
Release version 2.188
Details about the release
Item | Details |
---|---|
Release version | 2.188 |
Release date | 23 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
The Information Schema Catalog Collector now collects descriptions from both tables and columns, if they are present in the source.
The Snowflake collector now harvests comments from Snowflake databases, schemas, and views (as resource description).
The Teradata collector has been enhanced to better parse view SQL definitions that use specific Teradata syntax elements, particularly when extracting lineage from views.
Bug fixes
BigQuery collector:
Fixed issues with handling identifiers with hyphens (-).
Fixed issues with harvesting lineage when a view refers to columns in a separate database.
Release version 2.187
Details about the release
Item | Details |
---|---|
Release version | 2.187 |
Release date | 20 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
Netezza collector: A new and improved collector is now available for Netezza.
Oracle collector: The collector now harvests definitions for views, functions, and stored procedures.
Release version 2.186
Details about the release
Item | Details |
---|---|
Release version | 2.186 |
Release date | 14 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
The following collectors now harvest all databases in a single collector run when the --database parameter is not specified.
The collectors also support a new parameter --exclude-database to exclude specific databases from metadata collection:
Databricks
DB2
MySQL
Oracle
PostgreSQL
Redshift
SQL Server
Snowflake
Teradata
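The database selection behavior described above can be sketched as follows. This is a minimal illustration, not the collector's implementation. The function name, the sample database names, exact-match comparison of names, and applying --exclude-database on top of explicitly requested databases are assumptions.

```python
def select_databases(available, requested=(), excluded=()):
    """Return the databases a collector run would harvest.

    available: databases visible in the source system
    requested: values passed with --database (empty means "all databases")
    excluded:  values passed with --exclude-database
    """
    # When --database is not specified, start from every available database.
    candidates = list(requested) if requested else list(available)
    excluded_names = set(excluded)
    return [name for name in candidates if name not in excluded_names]
```

For example, select_databases(["SALES", "FINANCE", "SCRATCH"], excluded=["SCRATCH"]) returns ["SALES", "FINANCE"].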
Bug fixes
Databricks collector: The collector properly handles malformed task responses.
Power BI collector: The collector properly handles harvesting lineage relationships from Power BI data sources when parameters are used in place of the Snowflake Warehouse value.
For the following collectors, the behavior of the --include-information-schema option is changed. Now, if you use this option in the command without the --all-schemas option, the system will generate a warning to alert you about the missing parameter.
Databricks
DB2
Oracle
PostgreSQL
Redshift
SQL Server
Snowflake
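The new warning for --include-information-schema can be sketched as a check over the command-line tokens. This is a minimal illustration, not the collector's implementation. The function name and the warning text are hypothetical, and the sketch assumes both options appear as bare flags.

```python
def schema_option_warnings(args):
    """Return warnings for a list of command-line tokens.

    Mirrors the documented rule: --include-information-schema used without
    --all-schemas produces a warning about the missing parameter.
    """
    warnings = []
    if "--include-information-schema" in args and "--all-schemas" not in args:
        # Hypothetical wording; the collector's actual message is not documented here.
        warnings.append(
            "--include-information-schema was specified without --all-schemas"
        )
    return warnings
```

Calling schema_option_warnings(["--include-information-schema"]) returns one warning, while adding "--all-schemas" to the list returns none.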
Release version 2.185
Details about the release
Item | Details |
---|---|
Release version | 2.185 |
Release date | 9 February, 2024 |
Docker image ID |
|
JAR file |
|
Bug fixes
Fixed an issue that was causing database collectors to enter an error state.
Release version 2.184
Details about the release
Item | Details |
---|---|
Release version | 2.184 |
Release date | 7 February, 2024 |
Docker image ID |
|
JAR file |
|
Bug fixes
Azure Data Lake Storage Gen2 collector: Fixed an issue that previously prevented the collector from running successfully on machines using an amd64 processor.
Microsoft SQL Server collector now properly harvests views from Azure Synapse Analytics.
Release version 2.183
Details about the release
Item | Details |
---|---|
Release version | 2.183 |
Release date | 1 February, 2024 |
Docker image ID |
|
JAR file |
|
Bug fixes
Tableau collector: The collector is updated to properly harvest usage data in newer versions of Tableau Server.
Azure Data Lake Storage Gen2 Collector: Fixed an authentication issue in the collector that resulted in failures to initialize a channel.
Snowflake collector: The collector now properly harvests lineage between function and source table if the source table is in the cataloged schema.
Release notes for previous versions
Go here to access release notes for previous versions.