Catalog collector release notes
Important
Stay updated on collector releases! To keep up with the latest updates and enhancements to data.world collectors, subscribe to the RSS feed from your favorite RSS reader.
Release version 2.323
Details about the release
Item | Details |
|---|---|
Release version | 2.323 |
Release date | 10 April, 2026 |
Docker image ID | |
Jar file | |
Bug fixes
Microsoft Fabric collector: Fixed an issue where Direct Lake tables displayed null schema names when the source schema was not explicitly defined, improving metadata accuracy.
Release version 2.322
Details about the release
Item | Details |
|---|---|
Release version | 2.322 |
Release date | 9 April, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Hive Metastore collector: New collector added for Hive Metastore, enabling direct metadata harvesting from the Hive Metastore database (PostgreSQL, Oracle, MySQL/MariaDB, or SQL Server) without requiring HiveServer2. The collector harvests databases, schemas, tables, columns, Hive-specific table properties, and basic view lineage. This provides a lightweight alternative for environments where HiveServer2 is unavailable or when only structural metadata is needed.
This collector replaces the existing Hive and Hive Metastore collectors.
Snowflake, Redshift, Databricks, Denodo, Dremio, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, Microsoft SQL Server, and SAP HANA collectors: Added sample data preview capability, enabling the catalog to display a sample of table data. When Sensitive Data Classification is enabled, sensitive columns are automatically masked in the preview, ensuring secure data visibility.
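As a rough illustration of the masking behavior described above, the sketch below replaces the values of columns flagged as sensitive before a sample is shown. It is not the collectors' implementation: the function name `mask_preview`, the mask string, and the row format are all assumptions made for this example.

```python
from typing import Any

MASK = "****"  # hypothetical placeholder; the real mask format is not documented here


def mask_preview(rows: list[dict[str, Any]], sensitive: set[str]) -> list[dict[str, Any]]:
    """Return a copy of the sample rows with sensitive column values masked."""
    return [
        {col: (MASK if col in sensitive else val) for col, val in row.items()}
        for row in rows
    ]


# Illustrative sample rows; "email" stands in for a column classified as sensitive.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
masked = mask_preview(sample, sensitive={"email"})
```

The function builds new rows rather than editing the input, so the unmasked sample is never altered in place.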
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Dremio, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Reduced unnecessary warning messages during SQL lineage processing, improving log clarity.
Release version 2.321
Details about the release
Item | Details |
|---|---|
Release version | 2.321 |
Release date | 6 April, 2026 |
Docker image ID | |
Jar file | |
New features and changes
BigQuery collector: The collector now harvests primary and foreign keys, expanding metadata coverage for BigQuery table relationships.
Bug fixes
All collectors: Fixed an issue where boolean false values in YAML configuration files were not properly recognized.
Snowflake, Redshift, Databricks, Denodo, Dremio, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Resolved memory issues when processing columns with very large string values, improving collector stability.
Tableau collector: Improved handling of recursive field queries, enhancing collection reliability for complex Tableau workbooks.
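The release note for the YAML boolean fix above does not describe the root cause. One common way this class of bug arises is a truthiness check that treats an explicit `false` the same as a missing key. The sketch below shows that pitfall and a safer lookup; the key name `enable-lineage` and both function names are hypothetical, and the dictionary stands in for an already-parsed YAML file.

```python
from typing import Any

# Hypothetical parsed YAML configuration; the key name is illustrative only.
config: dict[str, Any] = {"enable-lineage": False}


def read_flag_buggy(cfg: dict[str, Any], key: str, default: bool) -> bool:
    # Truthiness check: an explicit `false` is indistinguishable from a missing key,
    # so the default silently wins.
    return cfg.get(key) or default


def read_flag(cfg: dict[str, Any], key: str, default: bool) -> bool:
    # Fall back to the default only when the key is actually absent.
    value = cfg.get(key)
    return default if value is None else bool(value)
```

With a default of `True`, `read_flag_buggy` ignores the configured `false`, while `read_flag` honors it.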
Release version 2.320
Details about the release
Item | Details |
|---|---|
Release version | 2.320 |
Release date | 24 March, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Qlik Replicate collector: New collector added for Qlik Replicate, enabling metadata harvesting from Qlik's data replication platform.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Dremio, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Fixed column statistics metrics to ensure consistent data type handling across database sources.
Release version 2.319
Details about the release
Item | Details |
|---|---|
Release version | 2.319 |
Release date | 20 March, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Microsoft SQL Server collector: The collector now harvests usage metrics, providing insights into table and query access patterns.
BigQuery collector: Support added for harvesting multiple BigQuery projects in a single run, improving efficiency for multi-project environments.
Bug fixes
Qlik Talend Data Integration collector: Resolved an issue that caused collector failures in certain configurations.
ServiceNow collector: Improved metadata structure for tables and fields, ensuring better catalog integration.
Release version 2.318
Details about the release
Item | Details |
|---|---|
Release version | 2.318 |
Release date | 16 March, 2026 |
Docker image ID | Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags |
Jar file | |
New features and changes
Airflow collector: Support added for Airflow 3.x, enabling metadata collection from the latest Airflow version.
ServiceNow collector: The collector now harvests data interfaces, expanding metadata coverage.
Snowflake collector: Added support for semantic views, providing metadata capture for Snowflake's business-friendly data representations.
SQL Server Integration Services (SSIS) collector: Added lineage support for tables and SQL statements with variable references, improving lineage accuracy for dynamic SSIS workflows.
Bug fixes
Databricks collector: Fixed regex used for parsing metric views, resolving parsing errors for certain metric view configurations.
Microsoft Fabric collector: Fixed an issue that occurred when property values were not present, improving collector stability.
Power BI Service and Microsoft Fabric collectors: Improved performance when harvesting calculation groups, reducing collection time for Power BI models that contain them.
Microsoft SQL Server collector: Addressed issues with sample data size, ensuring consistent sample data collection across different table sizes.
ServiceNow collector: Fixed CatalogingEventReporter behavior when it is called off the main thread, resolving threading issues during collection.
Release version 2.317
Details about the release
Item | Details |
|---|---|
Release version | 2.317 |
Release date | 10 March, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Databricks collector:
Harvest information on Databricks Lakeflow, enabling complete metadata capture for Lakeflow workflows and pipelines.
Harvest Unity Catalog metastore metadata, expanding coverage of Databricks governance and catalog structures.
Snowflake collector: Added Tasks harvesting capability, providing complete visibility into Snowflake automated processes and dependencies.
Microsoft Fabric and Power BI Service collectors: Support for harvesting Paginated Power BI Reports, including preview images, extending metadata coverage for Power BI assets.
Microsoft Fabric collector: Added page size configuration and improved logging for better performance monitoring and troubleshooting.
Bug fixes
SQL Server Integration Services (SSIS) collector:
Removed stored procedure dependency cataloging to improve accuracy and reduce false positive dependencies.
Fixed missing lineage for SQL statements, ensuring complete data flow visibility across SSIS packages.
Tableau collector: Fixed null pointer exception (NPE) in field harvesting.
Release version 2.316
Details about the release
Item | Details |
|---|---|
Release version | 2.316 |
Release date | 26 February, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Tableau collector: Added an option to exclude hidden fields within published data sources.
Microsoft Fabric collector: Added an option to exclude internal Delta table files from Lakehouse file cataloging, reducing noise in harvested metadata.
Snowflake collector: Now logs warnings for unsupported stage references in SQL, improving transparency during lineage harvesting.
Athena collector: Added include/exclude options for databases, giving users more control over collection scope.
Snowflake, Redshift, Databricks, Denodo, Dremio, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Reorganized database index harvesting for improved structure and consistency.
Bug fixes
Power BI and Microsoft Fabric collectors: Fixed the relationship between semantic models/dataflows and data sources to properly represent uses relationships, improving lineage accuracy.
Tableau collector: Corrected embedsView relationships for unpublished views in published dashboards, ensuring complete dashboard-to-view associations.
Release version 2.315
Details about the release
Item | Details |
|---|---|
Release version | 2.315 |
Release date | 18 February, 2026 |
Docker image ID | |
Jar file | |
Bug fixes
OpenAPI collector: Added handling to prevent a null pointer exception when expected content is missing from an OpenAPI specification.
Tableau collector: Ensured that published dashboards correctly include embedsView relationships, even when they contain unpublished sheets.
Release version 2.314
Details about the release
Item | Details |
|---|---|
Release version | 2.314 |
Release date | 11 February, 2026 |
Docker image ID | |
Jar file | |
Bug fixes
ServiceNow collector: Fixed an issue in the generation of IRIs for database tables referenced by other resources, ensuring consistent and accurate identification of those tables in the catalog.
Release version 2.313
Details about the release
Item | Details |
|---|---|
Release version | 2.313 |
Release date | 6 February, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Microsoft Fabric collector: Added support for harvesting EventStream resources, enabling lineage tracking for Fabric’s real-time streaming data pipelines.
Bug fixes
dbt collector: Corrected resource identifier generation for tables to match BigQuery collector format, ensuring proper lineage connections between dbt models and their underlying BigQuery tables.
OpenAPI collector: Added support for OpenAPI 3.1’s updated nullable type syntax, ensuring schemas using the latest specification version are parsed correctly.
SSIS collector: Fixed parsing of SSIS dataflow tasks to handle a wider variety of schema and table name patterns in SQL queries.
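For context on the OpenAPI 3.1 fix above: OpenAPI 3.0 marks a nullable schema with `nullable: true`, while OpenAPI 3.1 lists `"null"` alongside the other types in a `type` array. The sketch below normalizes both forms. It is an illustration only, not the collector's parser, and the function name `describe_type` is hypothetical.

```python
from typing import Any


def describe_type(schema: dict[str, Any]) -> tuple[list[str], bool]:
    """Return (non-null type names, nullable) for a 3.0- or 3.1-style schema."""
    raw = schema.get("type")
    # 3.0 uses a single string; 3.1 also allows a list of type names.
    types = [raw] if isinstance(raw, str) else list(raw or [])
    nullable = bool(schema.get("nullable", False)) or "null" in types
    return [t for t in types if t != "null"], nullable


old_style = describe_type({"type": "string", "nullable": True})  # OpenAPI 3.0 syntax
new_style = describe_type({"type": ["string", "null"]})          # OpenAPI 3.1 syntax
```

Both calls yield the same normalized result, which is the property a parser needs in order to treat the two specification versions consistently.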
Release version 2.312
Details about the release
Item | Details |
|---|---|
Release version | 2.312 |
Release date | 3 February, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Redshift collector: Added support for harvesting lineage from stored procedures, improving visibility into procedural data transformations.
Bug fixes
ServiceNow collector: Improved pagination handling to ensure results are fully harvested when total counts do not equal 100, preventing missing resources in larger result sets.
Tableau collector: Fixed a defect affecting sheet-to-dashboard associations, ensuring relationships are harvested accurately.
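The ServiceNow pagination fix above concerns harvesting every page of a result set. One common way to make paging robust is to stop on a short or empty page instead of relying on a reported total. The sketch below shows that pattern under stated assumptions: a page size of 100, and a hypothetical `fetch_page(offset, limit)` callable standing in for the remote API. It does not reflect the collector's actual code.

```python
from typing import Callable

PAGE_SIZE = 100  # assumed page size for this sketch


def fetch_all(fetch_page: Callable[[int, int], list[str]]) -> list[str]:
    """Collect every record by paging until a short or empty page is returned."""
    results: list[str] = []
    offset = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)
        results.extend(page)
        if len(page) < PAGE_SIZE:  # last page reached
            break
        offset += PAGE_SIZE
    return results


# Stand-in for a remote API holding 250 records.
records = [f"table_{i}" for i in range(250)]


def fake_fetch(offset: int, limit: int) -> list[str]:
    return records[offset:offset + limit]


harvested = fetch_all(fake_fetch)
```

When the record count is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops.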
Release version 2.311
Details about the release
Item | Details |
|---|---|
Release version | 2.311 |
Release date | 30 January, 2026 |
Docker image ID | |
Jar file | |
Bug fixes
Snowflake and Databricks collectors: Fixed relationship mapping so that database tables are correctly associated with their parent database, ensuring accurate hierarchy and navigation in the catalog.
Release version 2.310
Details about the release
Item | Details |
|---|---|
Release version | 2.310 |
Release date | 29 January, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Microsoft Fabric collector: Added support for harvesting Environments in Microsoft Fabric, expanding visibility into Fabric workspace configurations.
Oracle collector: Added support for harvesting lineage from CREATE TABLE statements by querying historical tables, improving lineage coverage for table creation workflows.
Bug fixes
OpenAPI collector: Fixed an issue where restrictive server configurations caused specification requests to be rejected.
Microsoft SQL Server collector: Resolved an issue where SQL preprocessing of TOP statements did not work correctly in some cases.
Microsoft Fabric collector: Fixed issues with Lakehouse Delta tables to ensure they are cataloged under the correct parent folder and to prevent duplicate resources from being created.
Release version 2.309
Details about the release
Item | Details |
|---|---|
Release version | 2.309 |
Release date | 14 January, 2026 |
Docker image ID | |
Jar file | |
New features and changes
SSIS collector: Added support for harvesting variables and their references, improving lineage and package-level context.
Microsoft SQL Server collector: Now harvests table descriptions from extended properties, enriching table-level documentation.
Databricks collector: Renamed the CLI option --http-path to --compute-resources-url for clearer configuration of compute connectivity.
Snowflake, Redshift, Databricks, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Added support for excluding system functions across all JDBC collectors, helping reduce noise in harvested metadata.
Microsoft Fabric collector:
Added support for listing tables for schema-enabled Lakehouses using the OneLake Tables API, and harvesting columns for those tables via the same API.
Added support for schema-enabled Fabric Lakehouses in Pipelines, improving pipeline lineage coverage.
Added support for SQL Server database tables and stored procedures as sources/sinks in Fabric Data Pipelines, expanding supported pipeline endpoints.
Sensitive data classification feature: Improved sensitive data classification for collectors. On-premises collectors now support Microsoft Presidio classification services in addition to Private AI.
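Scripts that still pass the old Databricks option name will need updating after the `--http-path` to `--compute-resources-url` rename listed above. The sketch below is one possible way to rewrite a saved argument list. The helper `migrate_args` is hypothetical, the example path value is made up, and support for the `--option=value` form is an assumption about how arguments might be written, not a statement about the collector's CLI.

```python
# Old option name mapped to its replacement, as stated in the release note.
RENAMED = {"--http-path": "--compute-resources-url"}


def migrate_args(args: list[str]) -> list[str]:
    """Rewrite renamed option names, leaving values and other options untouched."""
    migrated = []
    for arg in args:
        # Split "--name=value" into its parts; plain "--name" leaves sep and value empty.
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED.get(name, name) + sep + value)
    return migrated


# Hypothetical saved arguments from an older run configuration.
old_args = ["--http-path", "/sql/1.0/warehouses/example"]
new_args = migrate_args(old_args)
```

Arguments that are not in the mapping pass through unchanged, so the helper can be applied to a full command line.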
Bug fixes
Tableau collector: Fixed embedsView relationships for unpublished Tableau views and dashboards, ensuring these relationships are harvested correctly.
Azure Data Lake Storage Gen2 collector: Fixed an error that occurred when harvesting a non-hierarchical storage account, improving compatibility and stability.
Release version 2.308
Details about the release
Item | Details |
|---|---|
Release version | 2.308 |
Release date | 5 January, 2026 |
Docker image ID | |
Jar file | |
New features and changes
Snowflake collector: Added lineage resolution support for QUALIFY clauses, LATERAL FLATTEN constructs, and subqueries in previously unsupported contexts, including WHERE, JOIN ON, and ORDER BY.
Oracle collector: The collector now harvests public synonyms, expanding visibility into shared database objects.
SSIS collector: Added support for harvesting stored procedure references, improving lineage completeness.
Bug fixes
Tableau collector:
The collector now harvests component fields and their relationships to constituent fields.
Ensured GraphQL filtering by project name is followed by filtering by project ID, preventing harvesting of unintended projects.
Snowflake collector: Fixed an issue where some SQL statements took an excessive amount of time to parse during lineage harvesting.
SQL Server collector: Prevented attempts to harvest table size metadata for views that do not support this operation.
Databricks collector: Added a safeguard to prevent SQL execution for excluded databases during lineage collection.
Oracle collector: Fixed harvesting of statistic indexes, ensuring they are correctly captured.
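The Tableau project-filtering fix above applies a second filter by project ID after filtering by name. A name filter alone can match more than one project if two projects share a name. The sketch below illustrates the two-stage idea with in-memory data; the `Project` type, the function name, and the sample IDs are hypothetical, and no Tableau API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    id: str
    name: str


def select_projects(projects: list[Project], name: str, wanted_ids: set[str]) -> list[Project]:
    """Filter by name first, then keep only the projects whose ID was requested."""
    by_name = [p for p in projects if p.name == name]   # coarse filter, may over-match
    return [p for p in by_name if p.id in wanted_ids]   # exact filter on ID


# Two projects share the name "Sales"; only one of them is intended.
projects = [
    Project(id="p-1", name="Sales"),
    Project(id="p-2", name="Sales"),
    Project(id="p-3", name="Finance"),
]
selected = select_projects(projects, name="Sales", wanted_ids={"p-1"})
```

Only the project with the requested ID survives the second stage, which is what prevents an unintended same-named project from being harvested.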
Release version 2.307
Details about the release
Item | Details |
|---|---|
Release version | 2.307 |
Release date | 18 December, 2025 |
Docker image ID | |
Jar file | |
New features and changes
ServiceNow collector: Introduced a new collector that harvests metadata for scoped applications, tables, fields, views, and data fabric tables from a ServiceNow instance.
Dremio collector: The API server (--api-server) option is no longer required. When it is not provided, the collector does not harvest the API-based lineage and dataset information.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Improved memory usage during the harvesting of column statistics, enhancing stability for large datasets.
Tableau collector:
Added hostname mapping configuration support, enabling more flexible environment and network setups.
Added a new option, --tableau-exclude-unpublished-views, to exclude unpublished views (sheets and dashboards) from harvesting.
Bug fixes
Snowflake collector: Updated SQL parsing behavior to ensure parsing terminates correctly when a timeout occurs, and made the timeout value configurable to better handle complex or long-running queries.
Release version 2.306
Details about the release
Item | Details |
|---|---|
Release version | 2.306 |
Release date | 10 December, 2025 |
Docker image ID | |
Jar file | |
New features and changes
Salesforce collector: Added support for grouping and aggregation columns, expanding metadata coverage for analytical queries.
Databricks collector: Added support for harvesting jobs from external workspaces, enabling visibility into cross-environment job metadata.
Azure Data Factory collector: Now harvests SQL queries used in activities, providing richer lineage and operational context.
Dremio collector:
Migrated to the Arrow Flight JDBC driver in place of the legacy Dremio driver, improving performance and compatibility.
Updated the --graphApiServer parameter to --api-server for clarity and consistency, and added a new --use-tls parameter to support secure connections.
Bug fixes
Microsoft Fabric collector: Fixed an issue where some files were not being harvested, ensuring more complete metadata collection.
QlikSense collector: Updated logic so that the API key is no longer written to the TTL file.
Microsoft Fabric collector: Fixed a modeling issue where Lakehouse tables were incorrectly displayed as folders, ensuring proper representation of table structures.
Release notes for previous versions
Go here to access release notes for previous versions.