
Catalog collector release notes

Important

Stay updated on collector releases! To keep up with the latest updates and enhancements to data.world collectors, subscribe to the RSS feed from your favorite RSS reader.

Release version 2.307

Details about the release

Table 1.

Item               Details
Release version    2.307
Release date       18 December 2025
Docker image ID
Jar file

New features and changes

  • ServiceNow collector: Introduced a new collector that harvests metadata for scoped applications, tables, fields, views, and data fabric tables from a ServiceNow instance.

  • Dremio collector: The API server (--api-server) option is no longer required. When it is not provided, the collector does not harvest the API-based lineage and dataset information.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Reduced memory usage when harvesting column statistics, improving stability for large datasets.

  • Tableau collector:

    • Added hostname mapping configuration support, enabling more flexible environment and network setups.

    • Added a new option, --tableau-exclude-unpublished-views, to exclude unpublished views (sheets and dashboards) from harvesting.

Bug fixes

  • Snowflake collector: Updated SQL parsing behavior to ensure parsing terminates correctly when a timeout occurs, and made the timeout value configurable to better handle complex or long-running queries.

Release version 2.306

Details about the release

Table 2.

Item               Details
Release version    2.306
Release date       10 December 2025
Docker image ID
Jar file

New features and changes

  • Salesforce collector: Added support for grouping and aggregation columns, expanding metadata coverage for analytical queries.

  • Databricks collector: Added support for harvesting jobs from external workspaces, enabling visibility into cross-environment job metadata.

  • Azure Data Factory collector: Now harvests SQL queries used in activities, providing richer lineage and operational context.

  • Dremio collector:

    • Migrated to the Arrow Flight JDBC driver in place of the legacy Dremio driver, improving performance and compatibility.

    • Renamed the --graphApiServer parameter to --api-server for clarity and consistency, and added a new --use-tls parameter to support secure connections.

Bug fixes

  • Microsoft Fabric collector: Fixed an issue where some files were not being harvested, ensuring more complete metadata collection.

  • Qlik Sense collector: The API key is no longer written to the TTL file.

  • Microsoft Fabric collector: Fixed a modeling issue where Lakehouse tables were incorrectly displayed as folders, ensuring proper representation of table structures.

Release version 2.305

Details about the release

Table 3.

Item               Details
Release version    2.305
Release date       25 November 2025
Docker image ID
Jar file

New features and changes

  • Microsoft SQL Server collector: The collector now harvests column comments from the MS_Description extended property, improving metadata detail and context.

  • SQL Server Integration Services (SSIS) collector: Added support for lineage when Oracle is used as a source or target, expanding cross-platform lineage coverage.

Bug fixes

  • Azure Data Factory collector: Fixed an issue where lineage relationships were missing when a Snowflake credential had an assigned default role, ensuring complete lineage capture.

Release version 2.304

Details about the release

Table 4.

Item               Details
Release version    2.304
Release date       20 November 2025
Docker image ID
Jar file

New features and changes

  • OpenAPI collector: Added a new configuration option, --environment-qualifier. When provided, harvested resources are made unique to that environment qualifier, improving multi-environment cataloging.

  • Power BI collector: The collector now harvests calculation groups and calculation items, expanding metadata coverage for Power BI semantic models.

Bug fixes

  • dbt Core and dbt Cloud collectors: Fixed an issue where tags were represented as strings instead of relationships. Tags are now properly modeled as relationships in the catalog graph.

  • Power BI collector: Fixed an issue where descriptions in parameter definitions were incorrectly appended to parameter values.

Release version 2.303

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 5.

Item               Details
Release version    2.303
Release date       13 November 2025
Docker image ID
Jar file

New features and changes

  • Databricks collector: Added support for harvesting metric views, expanding metadata coverage within Databricks. A new optional parameter, --enable-metric-views, is now available to enable this feature.

  • Power BI collector: Added support for column-level lineage between Power BI and BigQuery.

Bug fixes

  • Informatica CDI collector: Fixed an issue that caused incomplete harvesting in large CDI instances by ensuring the collector continues harvesting beyond the Informatica CDI API’s default pagination limit.

  • dbt Cloud and dbt Core collectors: Corrected tag representation so that tags are now modeled as resources rather than strings, improving metadata consistency and usability.

Release version 2.302

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 6.

Item               Details
Release version    2.302
Release date       5 November 2025
Docker image ID
Jar file

New features and changes

  • OpenAPI collector: Added handling for schema references and array item references, improving completeness of OpenAPI metadata harvesting.

  • Power BI collector: Now harvests endorsement details for reports, datasets, and dataflows, providing better visibility into trusted and certified content.

  • Databricks collector: Added support for harvesting materialized views, expanding metadata coverage within Databricks.

  • SQL Server collector: Now harvests database synonyms, improving understanding of object relationships and dependencies.

  • Tableau collector: Updated logic to associate published data sources with the project that contains them, improving metadata organization and context.

  • SSRS collector: Now harvests the external URL for linked reports, providing direct traceability to linked content.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Updated lineage logic to represent INSERT stored procedures as the triggering agent in lineage relationships, improving lineage accuracy.

Bug fixes

  • Informatica collector: Fixed an issue that limited harvesting to a maximum of 4,200 mapping tasks. The collector now processes all available mappings.

  • Databricks collector: Fixed handling of non-alphanumeric characters in identifiers, which previously caused invalid identifiers to be created.

  • dbt Core and dbt Cloud collectors: Fixed a case mismatch issue when coining database IRIs for Snowflake accounts, ensuring consistent IRI generation.

  • SSIS collector: Corrected package lookup logic to prioritize the Package GUID, using the Version GUID only when multiple versions share the same Package GUID.

  • Talend collector: Fixed a classification issue where Talend Jobs were incorrectly identified as activities instead of agents.

  • Snowflake collector: Implemented a workaround for a Snowflake JDBC driver defect that was triggered by unexpected datatypes, preventing runtime errors.

  • SQL Server collector: The collector now harvests table size for tables in non-dbo schemas, ensuring complete table metadata coverage.

Release version 2.301

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 7.

Item               Details
Release version    2.301
Release date       17 October 2025
Docker image ID
Jar file

New features and changes

  • Qlik Sense collector: Added lineage between Qlik Sense apps and datasets, providing greater visibility into data flow and dependencies.

  • Marquez collector: Added column-level lineage, offering more granular insight into data transformations and relationships.

Bug fixes

  • Databricks collector: Fixed the creation of invalid agent IRIs, ensuring accurate and consistent identifier generation.

  • Qlik Sense collector: Resolved a memory leak related to closing WebSocket connections, improving stability and performance.

  • AWS Glue collector: Added handling for large object size properties, preventing errors during metadata harvesting for large datasets.

Release version 2.300

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 8.

Item               Details
Release version    2.300
Release date       9 October 2025
Docker image ID
Jar file

New features and changes

  • Marquez collector: The collector now harvests job run history. A new optional parameter, --job-run-count, enables this feature.

Bug fixes

  • Tableau collector: Fixed an issue where relationships between sheets and fields were not being harvested under certain conditions, ensuring more complete lineage capture.

Release version 2.299

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 9.

Item               Details
Release version    2.299
Release date       2 October 2025
Docker image ID
Jar file

New features and changes

  • PostgreSQL collector: Now harvests all view definitions, regardless of the user's SELECT privileges, ensuring more complete metadata capture.

  • OpenAPI collector:

    • Changed hasParameter to be a containment property, improving accuracy of relationships.

    • Updated titles for OpenAPI resources to provide clearer labeling in the catalog.

  • Databricks collector:

    • Added support for Volume resources and for harvesting lineage across tables, jobs, and notebooks.

    • Improved performance when harvesting lineage, reducing runtime for large environments.

  • dbt Core and dbt Cloud collectors: Now extract the host from the database-server option when it is provided as a URL, ensuring correct connection details.

  • Tableau collector:

    • Added support for JWT authentication, expanding login options.

    • Added support for Databricks database servers as a data source.

    • Now harvests combinedFields, improving completeness of metadata.

  • Marquez collector: Added support for AWS S3, enabling metadata harvesting for S3 datasets through Marquez.

  • Oracle collector: Now harvests Oracle roles, extending coverage of user and access metadata.

Release version 2.298

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 10.

Item               Details
Release version    2.298
Release date       25 September 2025
Docker image ID
Jar file

New features and changes

  • Databricks collector: Now harvests models registered in Unity Catalog, expanding coverage of Databricks assets.

  • Confluent Cloud and Confluent Platform collectors: Added support for TLS connections to Confluent Kafka, improving security and compatibility.

Bug fixes

  • Microsoft Fabric collector: Fixed issues with missing DirectLake column-level lineage and corrected some improperly formed source IRIs for DirectQuery lineage.

  • Monte Carlo collector: Updated enum types from Monte Carlo GraphQL definitions to fix an exception caused by unrecognized types.

Release version 2.297

Important

This release contains only internal improvements and has no customer-impacting changes.

Release version 2.296

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 11.

Item               Details
Release version    2.296
Release date       15 September 2025
Docker image ID
Jar file

New features and changes

  • Tableau collector: Added a new configuration option --tableau-convert-database-identifiers. When enabled, this converts the case of database identifiers (such as schema and table names) to the default collation of the associated database, improving consistency.

  • SQL Server Integration Services (SSIS) collector: Enhanced debug-level logging to support more effective troubleshooting and analysis.

  • Microsoft Fabric collector: The collector now harvests lineage from report pages to the columns and measures in the semantic model used by each page. Additionally, an "is hidden" flag is captured for report pages, providing clearer visibility into page-level metadata.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Fixed an issue to ensure harvesting continues from multiple databases even if the connection to one database fails.

Release version 2.295

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 12.

Item               Details
Release version    2.295
Release date       5 September 2025
Docker image ID
Jar file

New features and changes

  • Marquez collector: Added support for harvesting Marquez datasets associated with Databricks, extending lineage and metadata coverage.

  • SQL Server collector: Added support for SQL Server replication, improving visibility into replicated database environments.

  • Tableau collector: Introduced validation for user-configured site identifiers and added warnings when identifiers are invalid or inaccessible.

  • Microsoft Fabric collector:

    • Added support for variables and parameters in data pipeline activities even when those activities have not had a recent run.

    • Added support for warehouse sources that use SQL queries in Copy Activities, broadening coverage of pipeline sources.

  • Sigma collector: Now supports lineage from datasets to source tables or other datasets, improving traceability of dataset dependencies.

  • Databricks collector: Enhanced lineage harvesting to support SQL statements containing the struct function.

  • dbt Core and dbt Cloud collectors: Added support for dbt projects targeting SQL Server databases using encryption, improving compatibility in secure environments.

  • Power BI collector: Added support for harvesting report images embedded in a zip file, ensuring complete metadata capture from reports.

Bug fixes

  • Databricks collector:

    • Fixed an issue where incorrect values were cataloged for column properties because of an API bug.

    • Fixed redundant collection of workspace resources and jobs.

    • Updated the gitProvider property to support both uppercase and camelCase values returned by the Databricks Jobs API.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, SAP HANA collectors: Resolved parsing errors that occurred when harvesting view lineage from views whose SQL contained column comments with parentheses.

  • SQL Server Reporting Services (SSRS) collector: Corrected the detection of when to use the SOAP API versus the REST API, ensuring proper connectivity for older and newer SSRS versions.

  • OpenAPI collector: Fixed an issue with duplicate identification of API resources, ensuring unique resource cataloging.

Release version 2.294

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 13.

Item               Details
Release version    2.294
Release date       21 August 2025
Docker image ID
Jar file

New features and changes

  • Tableau collector: Now harvests the last published date for workbooks and data sources, providing greater visibility into update history.

  • Alteryx collector: Catalogs additional metadata, including the caption tag in ToolContainer, the query in LockInInput, and the SQL in DbFileInput.

  • Qlik Sense collector: Added a configuration option to include or exclude applications, giving users more control over the scope of harvested metadata.

Bug fixes

  • Athena collector: Fixed an issue where the collector stopped after 100 tables. It now correctly harvests more than 100 tables in a database.

  • Sigma collector: Enhanced the workbook filter to prevent exceptions caused by missing workbooks, improving reliability during harvesting.

  • MySQL collector: Fixed an error in fetching statistics for columns whose names are reserved SQL keywords.

  • Microsoft Fabric collector: Resolved issues in stored procedure harvesting by properly resolving names when using pipeline variables and parameters, and updated relationship types to represent dependencies more accurately.

Release version 2.293

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 14.

Item               Details
Release version    2.293
Release date       11 August 2025
Docker image ID
Jar file

Bug fixes

  • Power BI collector: Fixed an issue to ensure that when a database name is provided in the datasources.yaml file, it is always used and not overridden by values retrieved from a database query.

  • Tableau collector: Fixed an issue so that published data sources are cataloged only when they are used in a project that is in scope, preventing unnecessary or irrelevant catalog entries.

  • Informatica CDI collector: Resolved an unexpected exception that was causing the collector to fail, improving stability and reliability.

Release version 2.292

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 15.

Item               Details
Release version    2.292
Release date       1 August 2025
Docker image ID    • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags
                       • amd64: sha256:c8813ea1b7589f37397f26ccd31824ab1bf12f2797066a21498fd3e889f12a87
                       • arm64: sha256:80793256c70d6afb5385d9f067a88ba353ed47d298183b403d20c0f4a14fb8b4
Jar file

New features and changes

  • SQL Server collector: Added support for harvesting SQL Server Agent jobs.

  • Sigma collector: Added configuration options to include or exclude workspaces, providing greater control over which resources are harvested.

  • Redshift collector: Now supports harvesting external tables defined via AWS Glue.

  • Microsoft Fabric collector:

    • Dataflow Gen2 is now treated as a separate resource type from Dataflows.

    • Added support for cataloging destinations and table-level lineage for sources and destinations in Dataflow Gen2 CI/CD types.

  • Microsoft Fabric and Power BI collectors: Now catalog refresh schedules for resources where refresh configuration is available, helping track automated data updates.

  • AWS Glue collector: Now identifies partitioned columns separately from other columns.

Bug fixes

  • Marquez collector: Fixed a null pointer exception that could occur when a job lacked a latest run.

Release version 2.291

Important

Published versions of collectors are available as a Docker image and a JAR file.

Details about the release

Table 16.

Item               Details
Release version    2.291
Release date       1 August 2025
Docker image ID    • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags
                       • amd64: sha256:2138f6297b46e4e3e1f103272c5d6ee5c4b9ccac62298312c51a4291be510cfe
                       • arm64: sha256:7a91ef17763f34c61f8e164392a47476876b04048cd6d3df27921f35ae571b5b
Jar file

New features and changes

  • Microsoft Fabric collector:

    • Added support for harvesting Apps and Org Apps in Microsoft Fabric.

    • Added support for harvesting GraphQL instances.

Bug fixes

  • Marquez collector: Now skips unsupported dataset types, preventing errors during harvesting.

Release notes for previous versions