Catalog collector release notes
Important
Stay updated on collector releases! To keep up with the latest updates and enhancements to data.world collectors, subscribe to the RSS feed from your favorite RSS reader.
Release version 2.300
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.300 |
Release date | 9 October, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Marquez collector: The collector now harvests job run history. A new optional parameter, --job-run-count, is now available to enable this feature.
Bug fixes
Tableau collector: Fixed an issue where relationships between sheets and fields were not being harvested under certain conditions, ensuring more complete lineage capture.
Release version 2.299
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.299 |
Release date | 2 October, 2025 |
Docker image ID | |
JAR file | |
New features and changes
PostgreSQL collector: Now harvests all view definitions regardless of the user's SELECT privileges, ensuring more complete metadata capture.
OpenAPI collector:
Changed hasParameter to be a containment property, improving accuracy of relationships.
Updated titles for OpenAPI resources to provide clearer labeling in the catalog.
Databricks collector:
Added support for Volume resources and harvesting lineage across tables, jobs, and notebooks.
Improved performance when harvesting lineage, reducing runtime for large environments.
dbt Core and dbt Cloud collectors: Now extract the host from the database-server option when it is provided as a URL, ensuring correct connection details.
Tableau collector:
Added support for JWT tokens for authentication, expanding login options.
Added support for Databricks database servers as a data source.
Now harvests combinedFields, improving completeness of metadata.
Marquez collector: Added support for AWS S3, enabling metadata harvesting for S3 datasets through Marquez.
Oracle collector: Now harvests Oracle roles, extending coverage of user and access metadata.
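The dbt Core and dbt Cloud change to the database-server option amounts to reducing a URL to its host. Below is a minimal Python sketch that assumes the option holds either a bare hostname or a URL with a scheme; the helper name `extract_host` is hypothetical and is not the collectors' code.

```python
from urllib.parse import urlparse


def extract_host(database_server: str) -> str:
    """Return the host portion of a database-server value (hypothetical helper).

    A URL such as "postgresql://user@db.example.com:5432/analytics" is reduced
    to its hostname; a bare host is returned unchanged.
    """
    value = database_server.strip()
    if "://" in value:
        # .hostname drops the scheme, credentials, port, and path (and lower-cases the host)
        host = urlparse(value).hostname
        if host:
            return host
    return value
```

For example, `extract_host("https://abc123.cloud.databricks.com/")` yields `abc123.cloud.databricks.com`, while a plain `db.example.com` passes through as-is.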
Release version 2.298
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.298 |
Release date | 25 September, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Databricks collector: Now harvests models registered in Unity Catalog, expanding coverage of Databricks assets.
Confluent Cloud and Confluent Platform collectors: Added support for TLS connections to Confluent Kafka, improving security and compatibility.
Bug fixes
Microsoft Fabric collector: Fixed issues with missing DirectLake column-level lineage and corrected some improperly formed source IRIs for DirectQuery lineage.
Monte Carlo collector: Updated enum types to match the Monte Carlo GraphQL definitions, fixing an exception caused by unrecognized types.
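The Monte Carlo fix refreshed the enum types themselves. A common defensive pattern for the same failure mode is to map unrecognized enum values to a fallback member instead of raising. The sketch below illustrates that pattern only; the `IncidentType` enum and its members are hypothetical and are not taken from the Monte Carlo GraphQL schema.

```python
from enum import Enum


class IncidentType(Enum):
    """Hypothetical enum mirroring a GraphQL enum; the members are illustrative."""
    ANOMALIES = "ANOMALIES"
    SCHEMA_CHANGES = "SCHEMA_CHANGES"
    UNKNOWN = "UNKNOWN"


def parse_incident_type(raw: str) -> IncidentType:
    """Map a raw enum string to IncidentType without raising on new values."""
    try:
        return IncidentType(raw)
    except ValueError:
        # A value added to the API after this enum was written falls back to UNKNOWN
        return IncidentType.UNKNOWN
```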
Release version 2.297
Important
This release was for internal improvements and has no customer-impacting changes.
Release version 2.296
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.296 |
Release date | 15 September, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Tableau collector: Added a new configuration option --tableau-convert-database-identifiers. When enabled, this converts the case of database identifiers (such as schema and table names) to the default collation of the associated database, improving consistency.
SQL Server Integration Services (SSIS) collector: Enhanced debug-level logging to support more effective troubleshooting and analysis.
Microsoft Fabric collector: The collector now harvests lineage from report pages to the columns and measures in the semantic model used by each page. Additionally, an "is hidden" flag is captured for report pages, providing clearer visibility into page-level metadata.
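As a rough illustration of --tableau-convert-database-identifiers, the sketch below folds an identifier to the case a database applies to unquoted identifiers. It assumes that "default collation" here means that case-folding behavior; the mapping and the `convert_identifier` function are hypothetical, not the collector's implementation. Snowflake, Oracle, and Db2 fold unquoted identifiers to upper case, while PostgreSQL and Redshift fold them to lower case.

```python
# Hypothetical mapping: the case each database applies to unquoted identifiers.
DEFAULT_IDENTIFIER_CASE = {
    "snowflake": "upper",
    "oracle": "upper",
    "db2": "upper",
    "postgresql": "lower",
    "redshift": "lower",
}


def convert_identifier(name: str, database_type: str) -> str:
    """Convert an identifier to the database's default case.

    Database types not in the mapping are left unchanged.
    """
    case = DEFAULT_IDENTIFIER_CASE.get(database_type.lower())
    if case == "upper":
        return name.upper()
    if case == "lower":
        return name.lower()
    return name
```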
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Fixed an issue so that harvesting continues across multiple databases even if the connection to one database fails.
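The multi-database fix follows a standard isolation pattern: each database is harvested inside its own error boundary, so a failed connection is recorded rather than ending the run. A minimal sketch, in which `harvest_databases` and the `harvest_one` callback are hypothetical names:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def harvest_databases(
    databases: Iterable[str],
    harvest_one: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Harvest each database independently so one failure does not stop the rest."""
    results: Dict[str, List[str]] = {}
    failures: Dict[str, str] = {}
    for db in databases:
        try:
            results[db] = harvest_one(db)
        except ConnectionError as exc:
            # Record the failure and move on to the next database
            failures[db] = str(exc)
    return results, failures
```

Returning the failures alongside the results lets the caller log or report the skipped databases once the run completes.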
Release version 2.295
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.295 |
Release date | 5 September 2025 |
Docker image ID | |
JAR file | |
New features and changes
Marquez collector: Added support for harvesting Marquez datasets associated with Databricks, extending lineage and metadata coverage.
SQL Server collector: Added support for SQL Server replications, improving visibility into replicated database environments.
Tableau collector: Introduced validation for user-configured site identifiers and added warnings when identifiers are invalid or inaccessible.
Microsoft Fabric collector:
Added support for variables and parameters in data pipeline activities even when those activities have not had a recent run.
Added support for warehouse sources that use SQL queries in Copy Activities, broadening coverage of pipeline sources.
Sigma collector: Now supports lineage from datasets to source tables or other datasets, improving traceability of dataset dependencies.
Databricks collector: Enhanced lineage harvesting to support SQL statements containing the struct function.
dbt Core and dbt Cloud collectors: Added support for dbt projects targeting SQL Server databases that use encryption, improving compatibility in secure environments.
Power BI collector: Added support for harvesting report images embedded in a zip file, ensuring complete metadata capture from reports.
Bug fixes
Databricks collector:
Fixed an issue where column properties were cataloged with incorrect values due to an API bug.
Fixed redundant collection of workspace resources and jobs.
Updated the gitProvider property to support both uppercase and camelCase values returned by the Databricks Jobs API.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Resolved parsing errors that occurred when harvesting view lineage from views whose SQL contained column comments with parentheses.
SQL Server Reporting Services (SSRS) collector: Corrected the detection of when to use the SOAP API versus the REST API, ensuring proper connectivity to both older and newer SSRS versions.
OpenAPI collector: Fixed an issue with duplicate identification of API resources, ensuring unique resource cataloging.
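For the Databricks gitProvider fix, accepting both forms can be as simple as normalizing the value before lookup. This sketch assumes the uppercase form is the camelCase name upper-cased, possibly with underscores (for example `GITHUB` or `GIT_HUB` for `gitHub`); the lookup table and `normalize_git_provider` are hypothetical.

```python
# Hypothetical lookup keyed by a normalized (lower-case, no underscores) provider name.
KNOWN_GIT_PROVIDERS = {
    "github": "GitHub",
    "gitlab": "GitLab",
    "bitbucketcloud": "Bitbucket Cloud",
    "azuredevopsservices": "Azure DevOps Services",
}


def normalize_git_provider(raw: str) -> str:
    """Accept camelCase or uppercase provider values; unknown values pass through."""
    key = raw.replace("_", "").lower()
    return KNOWN_GIT_PROVIDERS.get(key, raw)
```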
Release version 2.294
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.294 |
Release date | 21 August, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Tableau collector: Harvests the last published date for workbooks and data sources, providing greater visibility into update history.
Alteryx collector: Catalogs additional metadata, including the caption tag in ToolContainer, the query in LockInInput, and the SQL in DbFileInput.
QlikSense collector: Added a configuration option to include or exclude applications, giving users more control over the scope of harvested metadata.
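Include and exclude options such as the new QlikSense application filter typically combine as shown below. This is a sketch under assumptions: names are matched exactly, an empty include list means "include everything", and exclusions win over inclusions. The collector's actual matching rules (patterns, IDs, precedence) may differ, and `select_in_scope` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Set


def select_in_scope(
    names: Iterable[str],
    include: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
) -> List[str]:
    """Return names in scope; an empty include set means everything, exclude always wins."""
    include = include or set()
    exclude = exclude or set()
    return [
        name
        for name in names
        if (not include or name in include) and name not in exclude
    ]
```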
Bug fixes
Athena collector: Fixed an issue where the collector stopped after 100 tables. It now correctly harvests more than 100 tables in a database.
Sigma collector: Enhanced the workbook filter to avoid exceptions caused by missing workbooks, improving reliability during harvesting.
MySQL collector: Fixed an error in fetching statistics for columns whose names are reserved SQL keywords.
Microsoft Fabric collector: Resolved issues in stored procedure harvesting by properly resolving names when using pipeline variables and parameters, and updated relationship types to represent dependencies more accurately.
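A listing that stops at a fixed count, as in the Athena issue, is a typical symptom of reading only part of a paginated result. The release note does not state the cause; the sketch below only shows the general token-following loop, with `fetch_page` as a hypothetical stand-in for any paginated API call.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher takes the previous token and returns (items, next_token);
# next_token is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def list_all_tables(fetch_page: PageFetcher) -> List[str]:
    """Follow continuation tokens until the service reports no further pages."""
    tables: List[str] = []
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        tables.extend(items)
        if token is None:
            return tables
```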
Release version 2.293
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.293 |
Release date | 11 August, 2025 |
Docker image ID | |
JAR file | |
Bug fixes
Power BI collector: Fixed an issue so that a database name provided in the datasources.yaml file is always used and is not overridden by values retrieved from a database query.
Tableau collector: Fixed an issue so that Published Datasources are only cataloged when they are being used in a Project that is in scope, preventing unnecessary or irrelevant catalog entries.
Informatica CDI collector: Resolved an unexpected exception that was causing the collector to fail, improving stability and reliability.
Release version 2.292
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.292 |
Release date | 1 August 2025 |
Docker image ID | |
JAR file | |
New features and changes
SQL Server collector: Added support for harvesting agent jobs.
Sigma collector: Added configuration options to include or exclude workspaces, providing greater control over which resources are harvested.
Redshift collector: Now supports harvesting external tables defined via AWS Glue.
Microsoft Fabric collector:
Dataflow Gen2 is now treated as a separate resource type from Dataflows.
Added support for cataloging destinations and table-level lineage for sources and destinations in Dataflow Gen2 CI/CD types.
Microsoft Fabric and Power BI collectors: Now catalog refresh schedules for resources where refresh configuration is available, helping track automated data updates.
AWS Glue collector: Now identifies partitioned columns separately from other columns.
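In the AWS Glue Data Catalog, a table's regular columns are listed under StorageDescriptor.Columns, while partition columns are listed separately under PartitionKeys. The sketch reads a dictionary shaped like the Table element of a Glue GetTable response (for example, boto3's `get_table(...)["Table"]`) and makes no API calls; `split_columns` is a hypothetical helper, not the collector's code.

```python
from typing import Any, Dict, List, Tuple


def split_columns(table: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (regular column names, partition column names) for a Glue Table dict."""
    regular = [c["Name"] for c in table.get("StorageDescriptor", {}).get("Columns", [])]
    # Partition columns live at the table level, not inside the storage descriptor
    partitions = [c["Name"] for c in table.get("PartitionKeys", [])]
    return regular, partitions
```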
Bug fixes
Marquez collector: Fixed a null pointer exception that could occur when a job lacked a latest run.
Release version 2.291
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.291 |
Release date | 1 August 2025 |
Docker image ID | |
JAR file | |
New features and changes
Microsoft Fabric collector:
Added support for harvesting Apps and Org Apps in Microsoft Fabric.
Added support for harvesting GraphQL instances.
Bug fixes
Marquez collector: Now skips unsupported dataset types, preventing errors during harvesting.
Release version 2.290
Important
Published versions of collectors are available as a Docker image and a JAR file.
Details about the release
Item | Details |
---|---|
Release version | 2.290 |
Release date | 17 July 2025 |
Docker image ID | |
JAR file | |
New features and changes
A new collector, the OpenAPI collector, is now available in public preview. It supports harvesting metadata from APIs described using OpenAPI v3.0, enabling documentation and cataloging of API assets.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Added a Sensitive Data Classification option to allow classification using a hosted private-ai instance.
Microsoft Fabric collector:
Added support for harvesting Spark Job Definition details.
Now also captures Mirrored Database details, expanding coverage of key metadata elements in Microsoft Fabric.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Now include column statistics support for Date, Timestamp, and Boolean data types, enhancing profiling depth across supported databases.
PostgreSQL collector: Now supports AWS IAM authentication via secret and access key parameters, offering more secure and flexible credential management.
Bug fixes
Oracle collector: Fixed an issue in the table index feature that caused permission errors or maximum open cursor errors. The query logic now uses DBA_ views when they are available.
SSIS collector: Now harvests deeply nested control flow executables, ensuring complete control flow visibility.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Improved sampling behavior for environments where TABLESAMPLE is unsupported by falling back to LIMIT or TOP clauses to compute statistics more reliably.
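The TABLESAMPLE fallback can be pictured as choosing a query form per database. In this sketch, the `build_sample_query` function, its parameters, and the knowledge of which databases support TABLESAMPLE are assumptions supplied by the caller; the TABLESAMPLE syntax shown is PostgreSQL's, and other databases spell it differently.

```python
def build_sample_query(
    table: str,
    dialect: str,
    supports_tablesample: bool,
    percent: int = 10,
    row_limit: int = 1000,
) -> str:
    """Return a sampling query, falling back to TOP or LIMIT without TABLESAMPLE.

    `table` is assumed to be an already-quoted, trusted identifier.
    """
    if supports_tablesample:
        # PostgreSQL-style syntax; other databases spell TABLESAMPLE differently
        return f"SELECT * FROM {table} TABLESAMPLE SYSTEM ({percent})"
    if dialect == "sqlserver":
        # T-SQL limits rows with TOP rather than LIMIT
        return f"SELECT TOP {row_limit} * FROM {table}"
    # Simplification: some databases (Oracle, for example) need FETCH FIRST n ROWS ONLY instead
    return f"SELECT * FROM {table} LIMIT {row_limit}"
```

Note that a LIMIT or TOP sample returns the first rows the engine produces rather than a random sample, which is a trade-off of this kind of fallback.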
Release version 2.289
Details about the release
Item | Details |
---|---|
Release version | 2.289 |
Release date | 4 July, 2025 |
Docker image ID | |
JAR file | |
New features and changes
A new collector, the AWS Lake Formation collector, is now available in public preview.
Microsoft Fabric collector: Now catalogs metadata for Eventhouses, including the addition of associated KQL databases, expanding visibility into the Microsoft Fabric ecosystem.
dbt Core and dbt Cloud collectors: Added support for harvesting semantic model metadata, enriching the data model layer within your catalog.
Power BI collector: Introduced support for an additional Databricks source MQuery function type in lineage resolution, improving coverage and accuracy of Power BI lineage.
SSIS collector: Enhanced debug-level logging to support better root cause analysis for missing catalog resources, aiding in troubleshooting and diagnostics.
Bug fixes
Alteryx collector: Fixed an issue where the workflow description was not correctly captured from the user-provided meta info section.
AWS Glue collector: Resolved a null pointer exception that could occur when the Glue Data Catalog tables are empty, improving stability.
Snowflake collector: Improved data type standardization by stripping parenthesized size/length values (for example, VARCHAR(255) → VARCHAR) for cleaner and more consistent metadata.
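The Snowflake data type standardization can be approximated with a single regular expression that removes a trailing parenthesized size or precision. A minimal sketch; `standardize_type` is a hypothetical name, and only a trailing parenthesized suffix is stripped.

```python
import re

# Matches a trailing parenthesized size/precision, e.g. "(255)" or "(38, 0)"
_SIZE_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")


def standardize_type(data_type: str) -> str:
    """Strip a trailing size/length specification from a SQL type name."""
    return _SIZE_SUFFIX.sub("", data_type.strip())
```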
Release version 2.288
Important
This release was for internal improvements and has no customer-impacting changes.
Release version 2.287
Details about the release
Item | Details |
---|---|
Release version | 2.287 |
Release date | 17 June, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Tableau collector: Added view-based filtering for bin and group fields to align with how calculated fields are handled.
Salesforce collector: Now harvests all reports and dashboards, not just recently viewed ones.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Column decimal digits are now written only for appropriately typed columns.
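One way to read the decimal digits change: the value is kept only for types where scale is meaningful and dropped otherwise. The type set and `decimal_digits_for` below are hypothetical; in JDBC metadata, DECIMAL_DIGITS can also describe fractional-second precision for timestamp types, so a real implementation might include those as well.

```python
from typing import Optional

# Hypothetical set of base type names for which decimal digits (scale) are meaningful.
_SCALED_TYPES = {"DECIMAL", "NUMERIC", "NUMBER"}


def decimal_digits_for(data_type: str, decimal_digits: Optional[int]) -> Optional[int]:
    """Return the decimal digits to record, or None when the type does not use them."""
    base_type = data_type.split("(", 1)[0].strip().upper()
    if base_type in _SCALED_TYPES:
        return decimal_digits
    # Some drivers may report a value (such as 0) for non-numeric columns; drop it
    return None
```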
Bug fixes
Tableau collector: Fixed a null pointer exception in column lineage processing for Custom SQL tables.
OpenAPI collector: Resolved errors caused by malformed spec files that previously triggered null pointer exceptions.
Release version 2.286
Details about the release
Item | Details |
---|---|
Release version | 2.286 |
Release date | 11 June, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Logging framework (all collectors): Introduced minor updates to the logging framework. As a result, users may notice slightly different log messages compared to previous versions.
Snowflake collector: Upgraded the embedded Snowflake JDBC driver to version 3.34.2, addressing potential exceptions and improving stability.
Azure Data Factory collector: Now catalogs parameters used in parameterized linked services, along with the relationship between each linked service and its data source, providing deeper lineage visibility.
Power BI collector: Added support for parsing lineage from certain SQL statement types without requiring database credentials, making it easier to extract lineage in more restricted environments.
Bug fixes
Tableau collector: Prevented exceptions that could occur when harvesting table-view relationships, particularly when table information is missing from the Tableau GraphQL API.
Power BI collector: Fixed an issue with Denodo sources that use custom SQL, improving support for a wider range of Power BI source types.
SQL Server Integration Services (SSIS) collector: Resolved an exception that occurred during the harvesting of column information, enhancing reliability in metadata extraction.
Release version 2.285
Details about the release
Item | Details |
---|---|
Release version | 2.285 |
Release date | 3 June, 2025 |
Docker image ID | |
JAR file | |
New features and changes
Tableau collector: Now determines the associated project for a Custom SQL Table based on its workbook rather than its datasource, improving accuracy in project assignments.
Power BI collector: Added support for Oracle Autonomous Database as a source, expanding connectivity and metadata coverage within Power BI environments.
Release notes for previous versions
Go here to access release notes for previous versions.