Catalog collector release notes
Important
Published versions of the collectors are available as a Docker image here.
Release version 2.158
Details about the release
Item | Details |
---|---|
Release version | 2.158 |
Release date | 18 September, 2023 |
Docker image ID | |
New features and changes
Amazon S3 collector now harvests resources up to and including the maximum count specified by the user or the 10,000 default limit.
dbt Cloud collector handles scenarios where dbt Cloud runs contain more than the current dbt Cloud limit of 1000 artifacts.
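The "up to and including" wording in the Amazon S3 note describes an inclusive cap. A minimal sketch of that behavior, assuming a plain iterable of resource names; the function and constant names here are hypothetical and this is not the collector's own code:

```python
from itertools import islice
from typing import Iterable, Iterator

# Default limit quoted in the release note; the constant name is hypothetical.
DEFAULT_MAX_RESOURCES = 10_000


def take_up_to(resources: Iterable[str],
               max_count: int = DEFAULT_MAX_RESOURCES) -> Iterator[str]:
    """Yield at most max_count items, including the max_count-th one."""
    # islice stops after max_count items, so a limit of 3 yields items 1, 2, and 3.
    return islice(resources, max_count)


keys = [f"object-{i}" for i in range(1, 6)]
print(list(take_up_to(keys, max_count=3)))
```

With a limit of 3, the third resource is still harvested; only the fourth and later ones are skipped.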
Bug fixes
Databricks collector properly harvests column statistics from tables with no columns.
MS SQL Server collector properly handles case sensitivity of SQL keywords when parsing lineage.
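To illustrate why keyword case matters when parsing SQL for lineage, here is a deliberately simplified sketch that finds source tables after FROM and JOIN regardless of case. The pattern and function name are assumptions for illustration; the collector's actual parser is not described in these notes and is certainly more complete than a single regular expression.

```python
import re
from typing import List

# Hypothetical, simplified pattern: capture the identifier that follows FROM or JOIN.
# re.IGNORECASE makes "from", "FROM", and "From" all match.
SOURCE_TABLE = re.compile(r"\b(?:from|join)\s+([\w.\[\]]+)", re.IGNORECASE)


def source_tables(sql: str) -> List[str]:
    """Return the table identifiers referenced after FROM/JOIN, in order."""
    return SOURCE_TABLE.findall(sql)


print(source_tables("select * From dbo.orders o jOiN dbo.customers c ON o.cid = c.id"))
# ['dbo.orders', 'dbo.customers']
```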
Release version 2.157
Details about the release
Item | Details |
---|---|
Release version | 2.157 |
Release date | 14 September, 2023 |
Docker image ID | |
New features and changes
The following two new collectors are now available:
Power BI and Power BI Gov collectors: The --exclude-workspace and --include-workspace parameters for these two collectors now support regular expressions.
The following JDBC collectors now test for connection status before executing queries. If a connection is closed by the database, the collector detects this condition and re-opens the connection.
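For the regular-expression support in --include-workspace and --exclude-workspace, the following sketch shows one plausible way include and exclude patterns could combine. The matching rules here (full-name match, exclude wins over include) and the workspace names are assumptions, not documented collector behavior; check the Power BI collector documentation for the exact semantics.

```python
import re
from typing import Iterable, List, Optional


def filter_names(names: Iterable[str],
                 include: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None) -> List[str]:
    """Keep names matching any include pattern and no exclude pattern."""
    include_patterns = [re.compile(p) for p in (include or [])]
    exclude_patterns = [re.compile(p) for p in (exclude or [])]
    kept = []
    for name in names:
        # Assumption: with no include patterns, every name is a candidate.
        if include_patterns and not any(p.fullmatch(name) for p in include_patterns):
            continue
        # Assumption: an exclude match always removes the name.
        if any(p.fullmatch(name) for p in exclude_patterns):
            continue
        kept.append(name)
    return kept


workspaces = ["Sales EU", "Sales US", "Finance", "Sales Sandbox"]
print(filter_names(workspaces, include=[r"Sales.*"], exclude=[r".*Sandbox"]))
# ['Sales EU', 'Sales US']
```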
Bug fixes
Databricks collector now properly handles non-alphanumeric characters in object names.
Release version 2.156
Details about the release
Item | Details |
---|---|
Release version | 2.156 |
Release date | 7 September, 2023 |
Docker image ID | |
New features and changes
Amazon S3 collector now allows users to filter by bucket with --include-bucket and --exclude-bucket options.
Kafka - Confluent Platform Collector now supports SASL/SCRAM-SHA-512 authentication. If you want to use this authentication, use the --kafka-cluster-sasl-type parameter while running the collector.
New parameters for API retries: The following collectors now support two new parameters (--api-max-retries and --api-retry-delay), which let users specify the maximum number of times the collector retries a failed API call to the data source and the delay between retries.
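The retry behavior behind --api-max-retries and --api-retry-delay can be pictured with a small sketch. The function below is hypothetical and only mirrors the two option names as arguments; the delay unit, the fixed (non-exponential) delay, and the choice of which errors count as retryable are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T],
                      max_retries: int,
                      retry_delay: float,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on ConnectionError, wait retry_delay and try again,
    up to max_retries additional attempts."""
    attempt = 0
    while True:
        try:
            return call()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(retry_delay)


# Usage: a stand-in API call that fails twice, then succeeds.
state = {"calls": 0}


def flaky_api_call() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated API failure")
    return "ok"


print(call_with_retries(flaky_api_call, max_retries=3, retry_delay=0))
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting.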
Bug fixes
dbt Cloud Collector now properly selects the job that the user specified while running the collector.
Release version 2.155
Details about the release
Item | Details |
---|---|
Release version | 2.155 |
Release date | 29 August, 2023 |
Docker image ID | |
Bug fixes
The AWS Glue collector properly handles AWS Glue Catalog instances with more than 100 databases.
Release version 2.154
Details about the release
Item | Details |
---|---|
Release version | 2.154 |
Release date | 27 August, 2023 |
Docker image ID | |
New features and changes
AWS Glue collector has improved logging to help with troubleshooting of access and permissions issues.
Release version 2.153
Details about the release
Item | Details |
---|---|
Release version | 2.153 |
Release date | 25 August, 2023 |
Docker image ID | |
New features and changes
dbt Cloud collector: The following enhancements were made to the collector.
The collector now harvests information about dbt Cloud resources associated with the artifacts from which metadata is harvested.
The collector now supports two new parameters (--dbt-cloud-environment and --dbt-cloud-job) to allow users to filter runs by environment and job.
Release version 2.152
Details about the release
Item | Details |
---|---|
Release version | 2.152 |
Release date | 22 August, 2023 |
Docker image ID | |
New features and changes
Snowflake and SQL Server Collectors now harvest column-level lineage from Stored Procedures.
Logging improvements:
Improved log messages for instances when the service account/user account used by the collector does not have access to upload to a dataset in data.world.
Debug log messages now include the current memory and stack size.
Release version 2.151
Details about the release
Item | Details |
---|---|
Release version | 2.151 |
Release date | 15 August, 2023 |
Docker image ID | |
New features and changes
A new collector is now available for SQL Server Reporting Services.
Databricks collector: Updated the collector to retry after a pause when the Databricks API responds with a "too many requests" error.
All database collectors: Optimized the database collectors to reuse database connections where possible.
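Connection reuse of the kind described for the database collectors can be sketched as a small cache that hands back an open connection when one exists and opens a new one only when needed. This is an illustration under stated assumptions: Python's sqlite3 module stands in for a JDBC driver, the class name is hypothetical, and the liveness probe (SELECT 1) is one common choice rather than the collectors' documented mechanism.

```python
import sqlite3
from typing import Dict


class ConnectionCache:
    """Reuse one connection per database, reopening it if it was closed."""

    def __init__(self) -> None:
        self._connections: Dict[str, sqlite3.Connection] = {}
        self.opened = 0  # how many physical connections were opened

    @staticmethod
    def _is_alive(conn: sqlite3.Connection) -> bool:
        try:
            conn.execute("SELECT 1")  # cheap probe query
            return True
        except sqlite3.Error:
            # A closed sqlite3 connection raises ProgrammingError here.
            return False

    def get(self, database: str) -> sqlite3.Connection:
        conn = self._connections.get(database)
        if conn is None or not self._is_alive(conn):
            conn = sqlite3.connect(database)
            self._connections[database] = conn
            self.opened += 1
        return conn


cache = ConnectionCache()
first = cache.get(":memory:")
second = cache.get(":memory:")   # reused, no new connection
first.close()                    # simulate the database dropping the connection
third = cache.get(":memory:")    # detected as closed and reopened
print(cache.opened)
```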
Release version 2.150
Details about the release
Item | Details |
---|---|
Release version | 2.150 |
Release date | 10 August, 2023 |
Docker image ID | |
Bug fixes
All collectors: Fixed an issue that prevented users from passing command-line options containing spaces when running the collectors in the Docker container.
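The general class of problem behind this fix is easy to reproduce: joining arguments into one string without quoting loses the boundary around a value that contains spaces. The sketch below demonstrates that with Python's shlex module. The option name --example-option is a hypothetical placeholder, not a collector parameter, and the sketch does not claim to show how the collector's entrypoint was actually changed.

```python
import shlex
from typing import List


def naive_command(args: List[str]) -> str:
    """Join arguments with spaces; values containing spaces get split later."""
    return " ".join(args)


def quoted_command(args: List[str]) -> str:
    """Join arguments with shell quoting so each one survives re-splitting."""
    return shlex.join(args)


# Hypothetical option name and value, for illustration only.
args = ["--example-option", "value with spaces"]

print(shlex.split(naive_command(args)))   # the value is broken into three words
print(shlex.split(quoted_command(args)))  # the original two arguments come back
```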
Release version 2.149
Details about the release
Item | Details |
---|---|
Release version | 2.149 |
Release date | 9 August, 2023 |
Docker image ID | |
New features and changes
Databricks collector: The collector now allows users to authenticate with a Personal Access Token without specifying a username and password.
Power BI Gov collector: The --include-user-workspace parameter is removed from the collector CLI options.
Release version 2.148
Details about the release
Item | Details |
---|---|
Release version | 2.148 |
Release date | 27 July, 2023 |
Docker image ID | |
New features and changes
Databricks Collector has a new option --workflow-exclude to exclude harvesting of jobs/workflows.
Power BI and Power BI Gov Collectors now support parameter values in Power BI expressions.
Bug fixes
Tableau collector properly handles duplicate data sources when multiple filtered projects are specified.
Release version 2.147
Details about the release
Item | Details |
---|---|
Release version | 2.147 |
Release date | 24 July, 2023 |
Docker image ID | |
New features and changes
Power BI and Power BI Gov collectors: The collectors have a new option, --max-parseable-expression-length, which sets the maximum number of characters in a Power BI expression that will be parsed for lineage metadata.
Bug fixes
Power BI and ThoughtSpot collectors now refresh expired authentication tokens.
The SQL Server collector now properly handles missing SQL definition when harvesting stored procedures.
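Refreshing an expired authentication token, as in the Power BI and ThoughtSpot fix above, typically means tracking the token's expiry and fetching a new one shortly before it lapses. A minimal sketch follows; the class name, the fetch callback, and the 30-second safety margin are all hypothetical, and nothing here reflects the collectors' actual authentication code.

```python
import time
from typing import Callable, Optional, Tuple


class TokenProvider:
    """Cache a token and fetch a new one once it is expired or about to expire."""

    def __init__(self,
                 fetch: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 skew: float = 30.0) -> None:
        self._fetch = fetch      # returns (token, lifetime_in_seconds)
        self._clock = clock      # injectable so the logic can be tested
        self._skew = skew        # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Usage with a stand-in fetch function that issues numbered tokens.
issued = []


def fake_fetch() -> Tuple[str, float]:
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 3600.0


provider = TokenProvider(fake_fetch)
print(provider.get())
print(provider.get())  # served from the cache while the token is still valid
```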
Release version 2.146
Details about the release
Item | Details |
---|---|
Release version | 2.146 |
Release date | 19 July, 2023 |
Docker image ID | |
New features and changes
Tableau collector: The collector now harvests lineage relationships between embedded data sources and published data sources to reflect any such relationship that exists in Tableau.
Bug fixes
Power BI collector: Improvements made to the collector to avoid hitting Power BI Admin API rate limits that prevented successful collection for certain large Power BI organizations.
Marquez collector: API authentication token (--marquez-api-key) is now a required parameter for the collector.
Fivetran collector: Fivetran API key (--fivetran-apikey) and Fivetran secret (--fivetran-apisecret) options are now required parameters for the collector.
Release version 2.145
Details about the release
Item | Details |
---|---|
Release version | 2.145 |
Release date | 14 July, 2023 |
Docker image ID | |
New features and changes
The following two new collectors are now available:
Release version 2.144
Details about the release
Item | Details |
---|---|
Release version | 2.144 |
Release date | 11 July, 2023 |
Docker image ID | |
Bug fixes
Databricks collector: The collector now properly handles missing information returned by the Databricks APIs.
dbt Core and dbt Cloud Collectors: The collectors now use the description property for resources in a dbt manifest file to populate the description of associated catalog resources.
Power BI collector: The collector now properly handles unexpected source formats.
Release version 2.143
Details about the release
Item | Details |
---|---|
Release version | 2.143 |
Release date | 7 July, 2023 |
Docker image ID | |
New features and changes
DB2 collector: The collector now supports harvesting of column statistics and of function and stored procedure information. For details about using the new parameters (--target-sample-size, --sample-string-values, --enable-column-statistics) for these features, see the DB2 collector documentation.
Redshift collector: The collector now properly distinguishes between user-defined functions and stored procedures when harvesting function and stored procedure metadata.
Tableau collector: Improved error messages and handling of missing Salesforce connection information.
Bug fixes
Databricks collector: Fixed defects in the collector to accommodate invalid number formats and missing information returned by Databricks APIs in some cases.
Release version 2.142
Details about the release
Item | Details |
---|---|
Release version | 2.142 |
Release date | 23 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | |
New features and changes
Power BI collector: A new parameter --all-workspaces-and-apps is available for the Power BI collector which allows users to catalog all available data from the tenant using the admin API.
Bug fixes
Databricks collector: Fixed an issue where the collector was terminating abnormally when it encountered a notebook that had no language specified for it.
Microsoft SQL Server collector: Fixed an issue with parsing the SQL for certain Views in Microsoft SQL Server that prevented harvesting of lineage.
Release version 2.141
Details about the release
Item | Details |
---|---|
Release version | 2.141 |
Release date | 20 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | |
New features and changes
ThoughtSpot collector now harvests:
Column-level lineage between JDBC source table columns and ThoughtSpot logical columns.
Column-level lineage between ThoughtSpot logical columns and Answers and Liveboards that connect to the data.
Databricks collector now harvests additional metadata for Databricks tables.
The Redshift, SQL Server, and PostgreSQL collectors now harvest:
Functions
Stored procedures
Power BI, Looker, and ThoughtSpot collectors: Resources cataloged by these collectors now automatically include a link to the resource in the source system, so users can go from data.world directly to the same resource in the source system without searching for it manually.
Release version 2.140
Details about the release
Item | Details |
---|---|
Release version | 2.140 |
Release date | 13 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | |
New features and changes
dbt Cloud collector allows the user to pass in a Snowflake role and Snowflake warehouse to override values found in the dbt Cloud project configuration.
Bug fixes
dbt Core and dbt Cloud collectors properly handle source meta config values that are objects rather than strings in the generated dbt manifest file.
SQL Server Collector properly disables lineage collection when the --disable-lineage-collection parameter is set.
Databricks collector includes additional checks for existence of and access to Unity Catalog.
Release version 2.139
Details about the release
Item | Details |
---|---|
Release version | 2.139 |
Release date | 7 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | |
New features and changes
SQL Server collector now harvests created date and modified date for tables and schemas, and harvests table size in bytes.
Snowflake collector now harvests table size in bytes.
Power BI collector allows setting the --azure-tenantid option when using username and password authentication.
All collectors now support the ability to set the JVM stack size using the DWCC_JVM_OPTIONS parameter.
Bug fixes
SQL Server collector properly harvests View SQL that is longer than the SQL Server column default character length (6000).
dbt Cloud collector: Rather than reporting an error, the collector now skips job runs that do not have generated documentation artifacts.
Tableau collector properly catalogs all sites if no site is specified in the CLI/YAML.
Release version 2.138
Details about the release
Item | Details |
---|---|
Release version | 2.138 |
Release date | 2 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | |
New features and changes
Monte Carlo collector: The collector now:
Catalogs additional metadata for incidents, monitors, and tables.
Uses a smaller default GraphQL page size.
Bug fixes
BigQuery collector:
Properly handles issue where table IDs are returned as null.
Properly handles issue with table IDs that have quotes in them.
Monte Carlo collector: Properly handles external URLs for tables that contain spaces.
dbt Core and dbt Cloud collectors: Properly harvests all meta config containing object values.
Profiling: Properly handles histogram values containing excessive range, overflow, or underflow values.
Release notes for previous versions
Go here to access release notes for previous versions.