Catalog collector release notes

Important

Published versions of the collectors are available as a Docker image.

Release version 2.158

Details about the release

Table 1.

Release version: 2.158

Release date: 18 September, 2023

Docker image ID:

  • arm64: c74f5866de2865782c6f1ef70b97d02bbd15a5bb824088362fe8e6037d500322

  • amd64: 5223699780b6dc2798c5e4173e55b2e74468b7f163a6e0e4c7928699c09d7849



New features and changes

  • Amazon S3 collector now harvests resources up to and including the maximum count specified by the user or the 10,000 default limit.

  • dbt Cloud collector handles scenarios where dbt Cloud runs contain more than the current dbt Cloud limit of 1000 artifacts.

Bug fixes

  • Databricks collector properly harvests column statistics from tables with no columns.

  • MS SQL Server collector properly handles case sensitivity of SQL keywords when parsing lineage.

Release version 2.157

Details about the release

Table 2.

Release version: 2.157

Release date: 14 September, 2023

Docker image ID:

  • arm64: c76986c17f6d1b7ff36c052a21491eb5bbb4f68dc8418a0fe0f9f8d19abda6b3

  • amd64: e36c90c0fc7dbe9887951168b5c93236cb3eb357f7626f22eea6eedcf75cb2de



New features and changes

  • The following two new collectors are now available:

  • Power BI and Power BI Gov collectors: The following parameters for these two collectors now support regular expressions: --exclude-workspace and --include-workspace.

  • The following JDBC collectors now test for connection status before executing queries. If a connection is closed by the database, the collector detects this condition and re-opens the connection.
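The regular-expression support for --include-workspace and --exclude-workspace can be pictured with a small sketch. This is not the collector's implementation: the function name `select_workspaces`, the whole-name matching rule, and the choice that exclusion wins over inclusion are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Optional


def select_workspaces(
    names: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Return the workspace names kept by the include/exclude patterns.

    Assumptions: a pattern must match the whole name (re.fullmatch), and
    exclusion wins when a name matches both patterns.
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for name in names:
        if include_re and not include_re.fullmatch(name):
            continue  # an include pattern was given and this name misses it
        if exclude_re and exclude_re.fullmatch(name):
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Hypothetical workspace names, for illustration only.
workspaces = ["Sales EU", "Sales US", "Sales Sandbox", "Finance"]
kept = select_workspaces(workspaces, include=r"Sales .*", exclude=r".*Sandbox")
print(kept)  # ['Sales EU', 'Sales US']
```

Whether the collectors match the whole name or any substring is not stated in these notes, so check the collector documentation before relying on either behavior.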

Bug fixes

  • Databricks collector now properly handles non-alphanumeric characters in object names.

Release version 2.156

Details about the release

Table 3.

Release version: 2.156

Release date: 7 September, 2023

Docker image ID:

  • arm64: f1f5b73979468a5997a3e29e3a9ec0ae34b535a3863918078417894b67046d19

  • amd64: 75ea1cd791e8680fc5d24088ad8294b9561d48d9682f028105937180310593ef



New features and changes

Bug fixes

  • dbt Cloud Collector now properly selects the job that the user specified while running the collector.

Release version 2.155

Details about the release

Table 4.

Release version: 2.155

Release date: 29 August, 2023

Docker image ID:

  • arm64: 4a09c0486552990aa75131e4ec88e47a4eb2eded2bf88a8cb559d88ab8ac11a4

  • amd64: 1108f76e93e4cb614cae9ddc03aec49a104566fbe62a93a307c4b2c648fb4bd3



Bug fixes

  • The AWS Glue collector properly handles AWS Glue Catalog instances with more than 100 databases.
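A limit of 100 items per request is typically handled by following a continuation token until none is returned. The sketch below shows that general pattern only; it is not the collector's code. The response keys `DatabaseList` and `NextToken` are modeled on the AWS Glue GetDatabases response, while `list_all_databases`, `make_fake_fetcher`, and the page size constant are hypothetical names used so the example runs without AWS access.

```python
from typing import Callable, Dict, List, Optional

PAGE_SIZE = 100  # assumed per-request cap, matching the "more than 100" in the note


def list_all_databases(fetch_page: Callable[[Optional[str]], Dict]) -> List[str]:
    """Collect database names across pages until no continuation token remains."""
    names: List[str] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        names.extend(db["Name"] for db in page["DatabaseList"])
        token = page.get("NextToken")
        if not token:
            return names


def make_fake_fetcher(total: int) -> Callable[[Optional[str]], Dict]:
    """Hypothetical stand-in for a catalog API serving PAGE_SIZE items per call."""
    all_names = [f"db_{i:03d}" for i in range(total)]

    def fetch_page(token: Optional[str]) -> Dict:
        start = int(token) if token else 0
        end = min(start + PAGE_SIZE, total)
        page: Dict = {"DatabaseList": [{"Name": n} for n in all_names[start:end]]}
        if end < total:
            page["NextToken"] = str(end)  # more pages remain
        return page

    return fetch_page


databases = list_all_databases(make_fake_fetcher(250))
print(len(databases))  # 250
```

Stopping after the first page is the kind of defect that silently drops every database past the first 100, which is why the loop only ends when the token is absent.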

Release version 2.154

Details about the release

Table 5.

Release version: 2.154

Release date: 27 August, 2023

Docker image ID:

  • arm64: b243aee7fdf18855b387d99b03955a811ebf1888fb36136ee32802bcb0c5b7e3

  • amd64: b512e67115bbe9ff63f7adbe39416e885db9649301c665e11c6d620fe9c17a5f



New features and changes

  • AWS Glue collector has improved logging to help with troubleshooting of access and permissions issues.

Release version 2.153

Details about the release

Table 6.

Release version: 2.153

Release date: 25 August, 2023

Docker image ID:

  • arm64: b5d818c510c161d3f08c10e932661e2f582a9053810d97942ac1a12b0f994ad0

  • amd64: e4b92cc1323516b104f7ea146b80bce46debe87246f9298c4eb1f17b3f030fa9



New features and changes

  • dbt Cloud collector: The following enhancements have been made to the collector.

    • The collector now harvests information about dbt Cloud resources associated with the artifacts from which metadata is harvested.

    • The collector now supports two new parameters (--dbt-cloud-environment and --dbt-cloud-job) to allow users to filter runs by environment and job.

Release version 2.152

Details about the release

Table 7.

Release version: 2.152

Release date: 22 August, 2023

Docker image ID:

  • arm64: f8606120f88e38e658902478b5e26181d0d260b732c389dc22a3f3fe89e41c58

  • amd64: ebf746424eda52ebf7edc7ce8013f4da6cb985abdd8036ede8af9885e8560f83



New features and changes

  • Snowflake and SQL Server Collectors now harvest column-level lineage from Stored Procedures.

  • Logging improvements:

    • Improved log messages for instances when the service account/user account used by the collector does not have access to upload to a dataset in data.world.

    • Debug log messages now include current memory and stack size.

Release version 2.151

Details about the release

Table 8.

Release version: 2.151

Release date: 15 August, 2023

Docker image ID:

  • arm64: 6cf7d92cd8fdf9fdc8b9ff2ea3bfb369203c43f1bc57e1e3f14c12f7ef651af8

  • amd64: 655cf89d3bd60e6a95da428abb1d3eb3621f96a384166ac15cdc4d73ca1a354e



New features and changes

  • A new collector is now available for SQL Server Reporting Services.

  • Databricks collector: Updated the collector to retry after a pause when the Databricks API responds with a "too many requests" error.

  • All database collectors: Optimized the database collectors to reuse database connections where possible.
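The retry-after-a-pause behavior described for the Databricks collector follows a common pattern, sketched here under stated assumptions: "too many requests" is taken to mean HTTP status 429, and the exponential pause, the attempt limit, and the name `call_with_retry` are illustrative choices, not details taken from the collector.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request() and retry after a growing pause while it returns HTTP 429.

    Any other status is returned to the caller unchanged.
    """
    for attempt in range(max_attempts):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Simulated API: rate limited twice, then succeeds. Pauses are recorded, not slept.
responses = iter([(429, ""), (429, ""), (200, "ok")])
pauses = []
result = call_with_retry(lambda: next(responses), sleep=pauses.append)
print(result, pauses)  # (200, 'ok') [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the example fast to run; with the default, the function really waits between attempts.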

Release version 2.150

Details about the release

Table 9.

Release version: 2.150

Release date: 10 August, 2023

Docker image ID:

  • arm64: 57a1a0425917d69c40688dfcf46b05a531a8cdca5ae3c798b0d20e518fcb60ee

  • amd64: 53920c9e9b80e45f3494b8f755c2e3dbf8275a0d125dd0997a0161f6d2edba99



Bug fixes

  • All collectors: Fixed an issue that prevented the user from passing command-line options containing spaces, when running the collectors using the docker container.
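Option values that contain spaces are easy to break when a command line is split on whitespace instead of being passed as a list of arguments. The sketch below illustrates that general pitfall with Python's standard `shlex` module; it does not describe how the fix was made in the container. The image name and the option value are hypothetical.

```python
import shlex

# Hypothetical image name and option value, for illustration only.
argv = [
    "docker", "run", "--rm", "example/collector-image",
    "--include-workspace", "Sales Reports",
]

command_line = shlex.join(argv)         # quotes the value that contains a space
round_trip = shlex.split(command_line)  # shell-style parsing restores the list
naive = command_line.split(" ")         # whitespace splitting breaks the value apart

print(command_line)
print(round_trip == argv)  # True
print(len(naive) - len(argv))  # 1 (the value was split into two arguments)
```

Keeping arguments as a list end to end, for example when calling `subprocess.run(argv)`, avoids the quoting question entirely.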

Release version 2.149

Details about the release

Table 10.

Release version: 2.149

Release date: 9 August, 2023

Docker image ID:

  • arm64: d456d1966e6ede73cfabd4d11e77c4fddc9b3861ca82ab360493cf9a6b0f782b

  • amd64: 8406dae1ab24932c24a3bbc82618368653e32e0f4a5c09f0caa0eb22ed4e711a



New features and changes

  • Databricks collector: The collector now allows users to authenticate with a Personal Access Token without specifying a username and password.

  • Power BI Gov collector: The --include-user-workspace parameter is removed from the collector CLI options.

Release version 2.148

Details about the release

Table 11.

Release version: 2.148

Release date: 27 July, 2023

Docker image ID:

  • arm64: 2c856f96af8576024b9a81fb89eb3803eaa03abbe055f601f5597f2f79a62019

  • amd64: dcb90da4d519a165da4228f02ee75da4a892320901305df85279a99ea84e2cee



New features and changes

  • Databricks Collector has a new option --workflow-exclude to exclude harvesting of jobs/workflows.

  • Power BI and Power BI Gov Collectors now support parameter values in Power BI expressions.

Bug fixes

  • Tableau collector properly handles duplicate data sources when multiple filtered projects are specified.

Release version 2.147

Details about the release

Table 12.

Release version: 2.147

Release date: 24 July, 2023

Docker image ID:

  • arm64: 872cbd1f19c7e54d6db57fdab60dc08fab2690a59df54e74f0718d8cd8794381

  • amd64: d0a77c9cb81754d217f53dc4373a3b9b85425d687dcb6fb218abea678c42c148



New features and changes

  • Power BI and Power BI Gov collectors: The collectors have a new option, --max-parseable-expression-length, which sets the maximum number of characters in a Power BI expression that will be parsed for lineage metadata.

Bug fixes

  • Power BI and ThoughtSpot collectors now refresh expired authentication tokens.

  • The SQL Server collector now properly handles missing SQL definition when harvesting stored procedures.
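Refreshing an expired authentication token, as noted above for the Power BI and ThoughtSpot collectors, is commonly done by caching the token with its expiry time and fetching a new one when that time is reached. The following is a minimal sketch of that idea, not the collectors' code; `TokenCache`, the 30-second safety margin, and the fake token source are assumptions.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hold an access token and fetch a new one once it has expired."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        skew_seconds: float = 30.0,
    ) -> None:
        self._fetch_token = fetch_token  # returns (token, lifetime in seconds)
        self._clock = clock
        self._skew = skew_seconds        # refresh slightly before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            self._token, lifetime = self._fetch_token()
            self._expires_at = now + lifetime
        return self._token


# Fake token source and clock so the example runs without any service.
issued = []


def fake_fetch() -> Tuple[str, float]:
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 3600.0


now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.get()
now[0] = 1000.0
same = cache.get()       # still valid, no new fetch
now[0] = 3600.0
refreshed = cache.get()  # past expiry, a new token is fetched
print(first, same, refreshed)  # token-1 token-1 token-2
```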

Release version 2.146

Details about the release

Table 13.

Release version: 2.146

Release date: 19 July, 2023

Docker image ID:

  • arm64: 3c70ebdae29b700f381f8dd079279ae16403083846549b747fe1be2efb03d5d2

  • amd64: 60fa58a68e2ea530faef3a5286d0e2fb55bd0d1b48260a31658ad25ea1f97644



New features and changes

  • Tableau collector: The collector now harvests lineage relationships between embedded data sources and published data sources to reflect any such relationship that exists in Tableau.

Bug fixes

  • Power BI collector: Improvements made to the collector to avoid hitting Power BI Admin API rate limits that prevented successful collection for certain large Power BI organizations.

  • Marquez collector: API authentication token (--marquez-api-key) is now a required parameter for the collector.

  • Fivetran collector: Fivetran API key (--fivetran-apikey) and Fivetran secret (--fivetran-apisecret) options are now required parameters for the collector.

Release version 2.145

Details about the release

Table 14.

Release version: 2.145

Release date: 14 July, 2023

Docker image ID:

  • arm64: fcb792fef00634de9e02e635d522b776dad844d6918b8bd002d84eda1bc1c9a1

  • amd64: 318ce1c64752c55318941576a952015e9d8f11b2db1c1f94c86607504d3896ab



New features and changes

Release version 2.144

Details about the release

Table 15.

Release version: 2.144

Release date: 11 July, 2023

Docker image ID:

  • arm64: d98bbc978ee4beb798b3068fc554f7ce4478fd9c5af971e598ad2b349422e0f6

  • amd64: 93bb18e5dd6a0f2ff4caaa8058f040d28f23420bf4a535209909a48cbb4f028f



Bug fixes

  • Databricks collector: The collector now properly handles missing information returned by the Databricks APIs.

  • dbt Core and dbt Cloud Collectors: The collectors now use the description property for resources in a dbt manifest file to populate the description of associated catalog resources.

  • Power BI collector: The collector now properly handles unexpected source formats.

Release version 2.143

Details about the release

Table 16.

Release version: 2.143

Release date: 7 July, 2023

Docker image ID:

  • arm64: 98423e91c32ea3a45410da41290a0b4afb6041e1d59e4488adfb12269f48d695

  • amd64: 506104e806a64eb170b3f27a4b4f40067ff27ed2f43de101f9c7e1f004aac463



New features and changes

  • DB2 collector: The collector now supports harvesting of column statistics and of function and stored procedure information. For details about using the new parameters (--target-sample-size, --sample-string-values, --enable-column-statistics) for these features, see the DB2 collector documentation.

  • Redshift collector: The collector now properly distinguishes between user-defined functions and stored procedures when harvesting function and stored procedure metadata.

  • Tableau collector: Improved error messages and handling of missing Salesforce connection information within the Tableau collector.

Bug fixes

  • Databricks collector: Fixed defects in the collector to accommodate invalid number formats and missing information returned by Databricks APIs in some cases.

Release version 2.142

Details about the release

Table 17.

Release version: 2.142

Release date: 23 June, 2023

Docker image ID (use this to verify the integrity of the Docker image):

  • arm64: ccc19cfe72b618bce99c18f755f1c7c7489f012f626e5ecf7abf98f8f9590012

  • amd64: 507d600ade53fdad8973e0ef9cccdb116cb46b1e198ed38427feb2f8ebb8ac95
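One way to use the published ID for verification is to compare it with the ID Docker reports for the local image, for example from `docker image inspect --format '{{.Id}}' <image>`, which prints the ID with a `sha256:` prefix. The sketch below only does the comparison; it assumes the published value is the image ID in that form, and the function names are hypothetical.

```python
def normalize_image_id(image_id: str) -> str:
    """Lowercase the ID and drop an optional 'sha256:' prefix."""
    value = image_id.strip().lower()
    prefix = "sha256:"
    return value[len(prefix):] if value.startswith(prefix) else value


def image_id_matches(local_id: str, published_id: str) -> bool:
    """True when the local image ID equals the published one (full 64 hex chars)."""
    local = normalize_image_id(local_id)
    published = normalize_image_id(published_id)
    return len(published) == 64 and local == published


# Published arm64 ID for release 2.142, copied from the list above.
published_arm64 = "ccc19cfe72b618bce99c18f755f1c7c7489f012f626e5ecf7abf98f8f9590012"
print(image_id_matches("sha256:" + published_arm64, published_arm64))  # True
```

Comparing the full 64 characters matters: Docker often displays a shortened ID, and a match on the short form alone is weaker evidence.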



New features and changes

  • Power BI collector: A new parameter --all-workspaces-and-apps is available for the Power BI collector which allows users to catalog all available data from the tenant using the admin API.

Bug fixes

  • Databricks collector: Fixed an issue where the collector was terminating abnormally when it encountered a notebook that had no language specified for it.

  • Microsoft SQL Server collector: Fixed an issue with parsing the SQL for certain Views in Microsoft SQL Server that prevented harvesting of lineage.

Release version 2.141

Details about the release

Table 18.

Release version: 2.141

Release date: 20 June, 2023

Docker image ID (use this to verify the integrity of the Docker image):

  • arm64: d17699bcb6ac1a11f8f32be735dd04367517a4cd32c2afaecce17e1069f1e203

  • amd64: a0da0c8f2db9ad45ee028fe491a7e43159969033edc1bbdd906d6327c44d7812



New features and changes

  • ThoughtSpot collector now harvests:

    • Column-level lineage between JDBC source table columns and ThoughtSpot logical columns.

    • Column-level lineage between ThoughtSpot logical columns and Answers and Liveboards that connect to the data.

  • Databricks collector now harvests additional metadata for Databricks tables.

  • The Redshift, SQL Server, and PostgreSQL collectors now harvest:

    • Functions

    • Stored procedures

  • Power BI, Looker, and ThoughtSpot collectors: Resources cataloged by these collectors now automatically include a link to the resource in the source system. Users can go directly from data.world to the same resource in the source system without having to find it manually.

Release version 2.140

Details about the release

Table 19.

Release version: 2.140

Release date: 13 June, 2023

Docker image ID (use this to verify the integrity of the Docker image):

  • arm64: 9c2d1cf893eddc89924f212d1e10c9f26deb05ac254db783d3bdc8fc83fb6d5e

  • amd64: dc9dc94deed7fbf7da17fb822d9a085434f0a72baee8e965c7dfe1b537d62b95



New features and changes

  • dbt Cloud collector allows the user to pass in a Snowflake role and Snowflake warehouse to override values found in the dbt Cloud project configuration.

Bug fixes

  • dbt Core and dbt Cloud collectors properly handle source meta config values that are objects rather than strings in the generated dbt manifest file.

  • SQL Server Collector properly disables lineage collection when the --disable-lineage-collection parameter is set.

  • Databricks collector includes additional checks for existence of and access to Unity Catalog.

Release version 2.139

Details about the release

Table 20.

Release version: 2.139

Release date: 7 June, 2023

Docker image ID (use this to verify the integrity of the Docker image):

  • arm64: 95c668b9ceb092cc7d99a78caa4b64b1d70ee2a1de9d47fadd1f8206aecb6948

  • amd64: fbe3a314510718a74a0162918db20d345a48e76c879f49ed6851768547e9e855



New features and changes

  • SQL Server collector now harvests created date and modified date for tables and schemas, and harvests table size in bytes.

  • Snowflake collector now harvests table size in bytes.

  • Power BI collector allows setting the --azure-tenantid option when using username and password authentication.

  • All collectors now support the ability to set the JVM stack size using the DWCC_JVM_OPTIONS parameter.

Bug fixes

  • SQL Server collector properly handles harvesting of view SQL that is longer than the SQL Server default column character length (6000).

  • dbt Cloud collector: Rather than reporting an error, the collector now skips job runs that do not have generated documentation artifacts.

  • Tableau collector properly catalogs all sites if no site is specified in the CLI/YAML.

Release version 2.138

Details about the release

Table 21.

Release version: 2.138

Release date: 2 June, 2023

Docker image ID (use this to verify the integrity of the Docker image):

  • arm64: ae9e23f1063eea7c998e4028a286e93411483a6d7a46c81eb71f11ca48f3e0ca

  • amd64: a729ad549567254d7cd4e96bf7546457b8977f250c20ecef558bf97c1565dd0c



New features and changes

  • Monte Carlo collector: The collector now:

    • Catalogs additional metadata for incidents, monitors, and tables.

    • Uses a smaller default GraphQL page size.

Bug fixes

  • BigQuery collector:

    • Properly handles issue where table IDs are returned as null.

    • Properly handles issue with table IDs that have quotes in them.

  • Monte Carlo collector: Properly handles external URLs for tables that contain spaces.

  • dbt Core and dbt Cloud collectors: Properly harvests all meta config containing object values.

  • Profiling: Properly handles histogram values containing excessive range, overflow, or underflow values.

Release notes for previous versions