
Release notes for previous versions

Release version 2.227

Details about the release

Table 1.

Item

Details

Release version

2.227

Release date

July 31, 2024

Docker image ID

Jar file



New features and changes:

  • SQL Server collector: If an error occurs while fetching columns from the database by schema, the collector now attempts to fetch columns by table instead.
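The fallback described above for the SQL Server collector can be pictured as a try-then-degrade pattern. The following is a minimal Python sketch, not the collector's actual code: fetch_columns_with_fallback, fetch_by_schema, and fetch_by_table are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, List

Column = Dict[str, str]


def fetch_columns_with_fallback(
    schema: str,
    tables: Iterable[str],
    fetch_by_schema: Callable[[str], List[Column]],
    fetch_by_table: Callable[[str, str], List[Column]],
) -> List[Column]:
    """Try one schema-wide fetch; on any error, fall back to per-table fetches."""
    try:
        return fetch_by_schema(schema)
    except Exception:
        # Hypothetical fallback: query each table separately so that one
        # failing schema-wide query does not block column harvesting.
        columns: List[Column] = []
        for table in tables:
            columns.extend(fetch_by_table(schema, table))
        return columns
```

The per-table path issues more queries, so in this sketch it is used only after the schema-wide fetch has failed.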

Bug fixes:

  • SQL Server Reporting Services (SSRS) and Power BI Report Server (PBIRS) collectors: Fixed an issue where collector resources were not returned by the API.

Release version 2.226

Details about the release

Table 2.

Item

Details

Release version

2.226

Release date

July 31, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:f55a432cbf6aa6d260cafffc528371e76d3bdeb94c19ef4131c1bfec189448b1

    • amd64: sha256:351dd8559fbc460de302d9bd87bcae815244f92ecbbca651aa403db97be5110b

Jar file



Bug fixes:

  • SQL Server Reporting Services (SSRS) and Power BI Report Server (PBIRS) collectors: Fixed an issue where the collector would terminate abnormally if the SSRS API returned no data under certain circumstances.

  • Power BI Service and Power BI Gov collectors: The collectors now correctly handle case mismatches in source column names when resolving SQL statements for lineage.
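One common way to tolerate case mismatches in column names is to compare case-folded names. The sketch below illustrates that idea in Python under that assumption; build_column_index and resolve_column are hypothetical helpers, not part of the collectors.

```python
from typing import Dict, Iterable, Optional


def build_column_index(source_columns: Iterable[str]) -> Dict[str, str]:
    """Map a case-folded column name to the name as stored in the source."""
    return {name.casefold(): name for name in source_columns}


def resolve_column(name_in_sql: str, index: Dict[str, str]) -> Optional[str]:
    """Return the source column matching name_in_sql regardless of case, or None."""
    return index.get(name_in_sql.casefold())
```

This approach assumes that the source does not contain two columns whose names differ only by case. If it does, the later one wins in the index.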

Release version 2.225

Details about the release

Table 3.

Item

Details

Release version

2.225

Release date

July 30, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:5506911e638a5b7493395bcce874837cf3fb19bdfe864cfab1b501996f115b4f

    • amd64: sha256:cbdeed21a29be42cf423c8114de30f180b7ce67225f422d91c2f73a56d6de686

Jar file



Bug fixes

  • SQL Server Reporting Services (SSRS) and Power BI Report Server (PBIRS) collectors: Fixed an issue that caused the collector to terminate unexpectedly when encountering Linked Reports with names containing non-alphanumeric characters.

Release version 2.224

Details about the release

Table 4.

Item

Details

Release version

2.224

Release date

July 30, 2024

Docker image ID

Jar file



New features and changes

  • Oracle collector: Enabled caching for primary keys and foreign keys, and reduced the number of queries used to gather table and column extended metadata, resulting in improved collector run time.

  • SQL Server Reporting Services (SSRS) and Power BI Report Server (PBIRS) collectors: Item path is now harvested for report, data source, and dataset titles.

Bug fixes

  • SQL Server Reporting Services (SSRS) and Power BI Report Server (PBIRS) collectors: Resolved an issue with NTLM Authentication.

Release version 2.223

Details about the release

Table 5.

Item

Details

Release version

2.223

Release date

July 29, 2024

Docker image ID

Jar file



New features and changes

  • Power BI Service and Power BI Gov collectors: The collectors now support TNS connection strings in lineage parsing for Oracle sources if HOST and SID are specified. For example, (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=SERVER_NAME)(PORT=1521))(CONNECT_DATA=(SID=KOSTEST))).

  • SQL Server Reporting Services (SSRS) and Power BI Report Server (PBIRS) collectors: The collectors now support authentication using NTLM.

  • Amazon S3 collector: The collector now harvests objects that begin with a prefix.

  • Salesforce collector: The collector now harvests metadata for Objects, Fields, Dashboards, and Reports. It also supports OAuth authentication instead of Basic authentication. You must complete the new prerequisite tasks to use OAuth authentication.

  • Tableau collector: Enhanced resiliency for Tableau GraphQL query execution.
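For the TNS connection string item above, the following Python sketch shows one way to pull HOST, PORT, and SID out of a descriptor like the example given. It is an illustration only: parse_tns and its regular expression are hypothetical, and the collectors' real parser may handle more descriptor forms.

```python
import re
from typing import Dict, Optional

# Matches simple "(KEY=value)" pairs for the three keys of interest.
_TNS_FIELD = re.compile(r"\(\s*(HOST|PORT|SID)\s*=\s*([^()\s]+)\s*\)", re.IGNORECASE)


def parse_tns(descriptor: str) -> Optional[Dict[str, str]]:
    """Extract HOST, PORT, and SID; return None unless both HOST and SID are present."""
    fields = {key.upper(): value for key, value in _TNS_FIELD.findall(descriptor)}
    if "HOST" not in fields or "SID" not in fields:
        return None
    return fields


example = (
    "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=SERVER_NAME)(PORT=1521))"
    "(CONNECT_DATA=(SID=KOSTEST)))"
)
parsed = parse_tns(example)
```

Returning None when HOST or SID is missing mirrors the condition stated in the release note that both must be specified.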

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • The collectors now parse SQL for lineage correctly when statements contain \r newline characters.

    • Fixed an issue with the usage of variable names in stored procedures.

  • Power BI Service and Power BI Gov collectors: Fixed an issue with handling parameters that are defined in the tables section of Semantic Models, allowing for successful parsing of source information for tables using those parameters.

  • Snowflake collector: The collector now appropriately handles date time parsing for the timestamp NTZ format.
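For the \r parsing fix listed above for the database collectors, a typical remedy is to normalize line endings before parsing. The Python sketch below shows that idea; normalize_newlines is a hypothetical helper, and the release note does not say how the collectors actually handle these characters.

```python
def normalize_newlines(sql: str) -> str:
    """Convert Windows (\\r\\n) and bare carriage-return (\\r) line endings to \\n."""
    # Replace \r\n first so that it does not become two newlines.
    return sql.replace("\r\n", "\n").replace("\r", "\n")
```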

Release version 2.222

Details about the release

Table 6.

Item

Details

Release version

2.222

Release date

July 23, 2024

Docker image ID

Jar file



New features and changes

  • Power BI Service and Power BI Gov collectors: The collectors now support Denodo sources in Power BI column-level lineage parsing.

  • Denodo collector: The collector now harvests column-level lineage.

  • SQL Server Integration Services (SSIS) collector: Added a new --jdbc-property parameter. This allows you to provide authentication details for NTLM Authentication type.

  • dbt Core and dbt Cloud collectors: The collectors now harvest model columns from catalog.json and manifest.json database objects.

Bug fixes

  • Power BI collector: The collector now properly handles scenarios where renaming columns in Power BI resulted in duplicate columns in source tables.

  • Azure Data Factory collector: The collector now properly performs Date transformation when the time zone is not available as ZoneID.

  • Azure Data Lake Storage Gen2 collector:

    • Updated the collector to remove redundant permission-related relationships.

    • Fixed an issue with IRI creation for collector resources by using the correct terms.

Release version 2.221

Details about the release

Table 7.

Item

Details

Release version

2.221

Release date

July 15, 2024

Docker image ID

Jar file



New features and changes

  • Power BI Gov collector: The collector now harvests preview images for Power BI reports. Add the new parameter --image-collection to your command/YAML file to use this new feature.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Fixed an issue where placing a comment directly after a keyword without a space was sometimes causing parsing issues.

    • Fixed an issue with parsing CREATE VIEW statements where parentheses were being incorrectly removed during the SQL pre-processing.

    • Proper error messages are now logged when users run the collectors with the --dry-run option without specifying a single database or with multiple databases.

  • Snowflake collector: Resolved an issue where the collector was cataloging an incorrect database when the user had a default namespace set in Snowflake.

  • Databricks collector: Fixed an issue where uploads of collector output files were failing due to spaces in IRIs.

  • QuickSight Collector: Fixed an AwsAccountId null error while listing resources using pagination, which was causing issues in cataloging all the specified resources.

  • Azure Data Factory collector:

    • Resolved an issue with truncated paginated results.

    • Fixed an issue with the title of global parameters by correctly using the parameter name.

Release version 2.220

Details about the release

Table 8.

Item

Details

Release version

2.220

Release date

July 10, 2024

Docker image ID

Jar file



New features and changes

  • Oracle collector: Added support for lineage when the SELECT statement contains synonyms. This enhancement fixes lineage tracking between Oracle and Power BI when synonyms are used.

  • Power BI collector: The collector now harvests preview images for Power BI reports. Add the new parameter --image-collection to your command/YAML file to use this new feature.

Bug fixes

  • Power BI and Power BI Gov collectors:

    • Fixed an issue with parameter value replacement in expressions when the parameter contains a $ symbol.

    • Fixed an issue where Power BI reports failed to process when the page name is null.

Release version 2.219

Details about the release

Table 9.

Item

Details

Release version

2.219

Release date

July 8, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:a9f524581a769ade5b7c01a887a7dbd90510a94ce6a37e115fc7a6a7e40f0557

    • amd64: sha256:f10deae54bbf33be7a40352ae048dd74d3981698dabceae748d513250dcf800a

Jar file



New features and changes

  • All collectors: Each catalog resource in the catalog output file now contains information about the collector that harvested the resource. This information is available only in the catalog file and can be used in SPARQL automations.

Bug fixes

  • Power BI and Power BI Gov collectors: The collectors now properly handle scenarios when they run into API request limits. A new parameter Disable max requests wait (--disable-max-requests-wait) is added for handling these scenarios.

  • Azure Data Lake Storage Gen2 collector: Resolved an issue where certain ACL information missing in the Azure Data Lake Storage API response caused errors in the collector.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Harvesting of column-level lineage from views now supports view definitions containing unaliased subselects.
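For the Power BI API request limit item above, the general pattern is to wait and retry when the API signals that the limit was reached. The Python sketch below is a generic illustration under stated assumptions: HTTP status 429 is assumed as the limit signal, and call_with_rate_limit_wait with its disable_wait argument is a hypothetical stand-in. The release note does not describe the exact behavior of --disable-max-requests-wait.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


def call_with_rate_limit_wait(
    request: Callable[[], Response],
    wait_seconds: float,
    disable_wait: bool = False,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call request(); on a 429 response, wait and retry unless waiting is disabled."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status != 429:
            return status, body
        # Assumption: with waiting disabled, give up immediately and
        # hand the 429 response back to the caller.
        if disable_wait or attempt == max_attempts:
            break
        sleep(wait_seconds)
    return status, body
```

Passing sleep as an argument keeps the sketch testable without real delays.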

Release version 2.218

Details about the release

Table 10.

Item

Details

Release version

2.218

Release date

July 1, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:93c8ebbd04553ad2123d944efca841dccc44b06dacdd0bd9683bbd7fe41c282e

    • amd64: sha256:d078c72ae238ab02d0a1e8cc003509eba463ed78f97cf2835073ee586d9fc741

Jar file



New features and changes

  • Power BI and Power BI Gov collectors: The collectors now support parsing data source expressions for Power BI tables where the source connection information is defined as a parameter. This means that if Power BI users specify data source connection information in a parameter and use that parameter in place of the source in the expression, the collectors will correctly parse and resolve the expression/lineage.

  • Oracle collector: The collector now harvests from DBA_ views if the credential used to execute the collector lacks permissions for information schema views.

  • dbt Core collector: The collector now harvests database objects and intra-database lineage from dbt projects and artifacts that use Azure Synapse as a backend.

  • All collectors: Collectors now verify that the user-requested upload location exists with proper permissions before execution and issue a warning if a problem exists.

  • Databricks collector: The collector no longer supports Databricks-managed password authentication. If you used this method of authentication, you must change the authentication to personal access token. For details, see "Preparing Databricks for collectors".

Bug fixes

  • SQL Server collector:

    • Fixed an issue where large values for column statistics produced an arithmetic overflow.

    • Resolved a problem where view definitions that include the TOP() expression were not properly handled when harvesting column-level lineage for views.

  • Power BI and Power BI Gov collectors: Fixed an issue where logging operations were causing an exception if certain Power BI objects were null.

  • Tableau collector: Fixed an issue where certain Tableau projects were not fully cataloged.

Release version 2.216

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 11.

Item

Details

Release version

2.216

Release date

June 26, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:52406bc14a26061e473987b281f3c354592b811af99c7fc8cf86833cffc9e819

    • amd64: sha256:52a347688c2e68806bf82a284e5a0ee8c9dfbda5f970f24ab69037e8025e3e8f

Jar file



Release version 2.215

Details about the release

Table 12.

Item

Details

Release version

2.215

Release date

June 26, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:e8bd4f7ba78527cea800acb3a03a1440b283f1fdd81097ad19a7a8b9a5362d9c

    • amd64: sha256:9018f2d0a0b734e5bbd858e1ba5a00b88b56982da6b9c0f690f6fff901016630

Jar file



New features and changes

  • Power BI Gov Collector:

    • The collector now supports harvesting of all workspaces and apps using the --all-workspaces-and-apps parameter.

    • Added the ability to disable lineage harvesting using the --disable-expression-lineage parameter.

Release version 2.214

Important

This release was for internal improvements and has no customer-impacting changes.

Details about the release

Table 13.

Item

Details

Release version

2.214

Release date

June 25, 2024

Docker image ID

Jar file



Release version 2.213

Details about the release

Table 14.

Item

Details

Release version

2.213

Release date

June 25, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:5c57df9b1285ce0f663955d0f90ad8c7c6e90acc110538473065d418a31ad2c9

    • amd64: sha256:20bf229751e3657c960ecc68a484b2df5be094215bc6e6163f206c8dcb9f0dba

Jar file



New features and changes

  • Azure Data Factory collector: The collector now harvests Expressions for table names, schema names, and file names.

  • A new collector for SQL Server Integration Services (SSIS) is now available in private preview. If you would like access to this collector, please contact your Customer Success Director.

Bug fixes

  • Power BI and Power BI Gov collectors: The collectors now correctly harvest lineage for column types.

Release version 2.212

Details about the release

Table 15.

Item

Details

Release version

2.212

Release date

June 21, 2024

Docker image ID

  • Link to download the Docker image: https://hub.docker.com/r/datadotworld/dwcc/tags

    • arm64: sha256:7880afc402b3db8f45a31bb90f565d417feb0c2fbfcc78715854da24c594cfc6

    • amd64: sha256:6218bc7e03a1d5b3195d64625ee4525f7739d59bbcb920afb95d719e8b90fe6a

Jar file



New features and changes:

  • Snowflake collector: The collector now harvests metadata for functions and stored procedures from the snowflake.account_usage views when the metadata is unavailable from the information_schema of the database.

  • Power BI and Power BI Gov collectors now catalog:

    • Dataset table expression

    • Description for the workspace, app, and dataset

Bug fixes:

  • ADF collector: Fixed an issue with datetime parse errors while harvesting triggers.

Release version 2.211

Details about the release

Table 16.

Item

Details

Release version

2.211

Release date

June 15, 2024

Docker image ID

Jar file



New features and changes

  • Power BI and Power BI Gov collectors: The collectors now support lineage for Oracle database objects.

Bug fixes

  • Power BI and Power BI Gov collectors: Resolved an issue with collecting child resources for apps when using service principal authentication.

  • Snowflake and Oracle collectors: Fixed an issue where function lineage was harvested even when users enabled the Disable lineage collection (--disable-lineage-collection) option.

  • Oracle collector: Fixed an issue with harvesting database columns of LONG type.

Release version 2.210

Details about the release

Table 17.

Item

Details

Release version

2.210

Release date

June 7, 2024

Docker image ID

Jar file



New features and changes

  • Power BI and Power BI Gov collectors:

    • Added a new feature that provides support to parse SQL statements within table expressions, enabling column-level lineage harvesting. To use this feature, you need to use the --datasource-mapping-file to specify the credentials. These credentials allow the collector to link lineage to the database sources.

    • The collector now harvests measures.

  • Databricks collector: The collector now harvests table and column tags by schema.

Bug fixes

  • Snowflake collector: Fixed an issue where the collector was unable to harvest lineage if the SQL statement included a dash in the column aliases.

  • Snowflake, Teradata, Netezza collectors: Fixed an issue that occurred because of insufficient information while harvesting agent resources for functions and procedures.

  • SQL Server collector: Fixed an issue that occurred while parsing view queries where columns have dashes in their names.

Release version 2.209

Details about the release

Table 18.

Item

Details

Release version

2.209

Release date

June 2, 2024

Docker image ID

Jar file



New features and changes

  • Databricks collector: The collector now harvests table and column lineage from system tables. To use this feature, you need to set new permissions for the collector.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Resolved a problem concerning column statistics when an aggregate statistic has a zero value.

  • Tableau collector: Resolved an issue to correctly associate lineage with the appropriate parent project.

  • Sigma collector: Resolved an issue which occurred when a dataset referred to in the lineage was not available among the harvested datasets.

  • Snowflake collector: Fixed an issue associated with external URLs containing special characters.

Release version 2.208

Details about the release

Table 19.

Item

Details

Release version

2.208

Release date

24 May, 2024

Docker image ID

Jar file



Bug fixes

  • Snowflake collector:

    • Resolved the issue that arose from Snowflake not returning function metadata.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Addressed the issue encountered during the harvesting of column statistics when the result set contained non-integer values.

Release version 2.207

Details about the release

Table 20.

Item

Details

Release version

2.207

Release date

21 May, 2024

Docker image ID

Jar file



Bug fixes

  • BigQuery collector: The collector is updated to generate catalog records for BigQuery Label instances. This allows them to be visible on the resource pages in the application.

  • Sigma collector: Resolved an issue that could result in an exception when the Sigma APIs failed to return a table path for a connection.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Enhanced error log statements by adding fully qualified table names when certain tables or columns in the database cannot be located during lineage resolution.

Release version 2.206

Details about the release

Table 21.

Item

Details

Release version

2.206

Release date

17 May, 2024

Docker image ID

Jar file



New features and changes

  • Sigma collector:

    • A new --pagination-limit parameter is now available for the collector. You can use this parameter to set the page size for the Sigma API response. The maximum value you can set is 1000. If you do not specify a value, the default page size is 25.

    • The collector is optimized to enhance the efficiency of lineage harvesting.

  • Snowflake collector: The collector now harvests extended metadata for tables, views, and materialized views.
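The limits stated above for the Sigma --pagination-limit parameter (default 25, maximum 1000) can be expressed as a small validation step. This Python sketch is illustrative only: resolve_page_size is a hypothetical function, the lower bound of 1 is an assumption, and the release note does not say how the collector reacts to an out-of-range value, so raising an error here is also an assumption.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 25  # default stated in the release note
MAX_PAGE_SIZE = 1000    # maximum stated in the release note


def resolve_page_size(requested: Optional[int]) -> int:
    """Return the page size to use, applying the documented default and maximum."""
    if requested is None:
        return DEFAULT_PAGE_SIZE
    # Assumption: values below 1 or above the maximum are rejected.
    if not 1 <= requested <= MAX_PAGE_SIZE:
        raise ValueError(f"page size must be between 1 and {MAX_PAGE_SIZE}")
    return requested
```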

Bug fixes

  • SQL Server collector: Incorporated additional debug logging for when the collector fails to harvest extended metadata.

  • Oracle collector:

    • The collector is now able to handle column names with single quotes in them.

    • Fixed an issue with synonyms being harvested in the wrong schema.

Release version 2.205

Details about the release

Table 22.

Item

Details

Release version

2.205

Release date

17 May, 2024

Docker image ID

Jar file



New features and changes

  • Power BI and Power BI Gov collectors: The ODBC data sources YAML file (datasources.yml) is updated to allow user-specified aliases for the database location (host). This ensures that resources are accurately linked across collectors.

  • Snowflake collector: Added support for harvesting the SQL definition and External URL (Snowsight) of materialized views.

Bug fixes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • The collectors are optimized to load JDBC drivers more efficiently, thereby reducing memory usage.

Release version 2.204

Details about the release

Table 23.

Item

Details

Release version

2.204

Release date

10 May, 2024

Docker image ID

Jar file



Bug fixes

  • SQL Server collector: The collector now uses a consistent case when a collation is set.

  • dbt Core and dbt Cloud collectors: The collectors now correctly handle scenarios that previously caused an exception while harvesting lineage.

  • Sigma collector: The collector now handles scenarios that previously prevented it from running properly.

Release version 2.203

Details about the release

Table 24.

Item

Details

Release version

2.203

Release date

8 May, 2024

Docker image ID

Jar file



New features and changes

  • dbt Core collector:

    • Now supports multiple run_results.json files in a single collector run. Add the new parameter --run-results-directory to your command/YAML file to use this new feature.

    • Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.

  • dbt Cloud collector: The collector now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.

Bug fixes

  • Sigma collector: The collector now properly deserializes objects from the Sigma API.

  • Power BI and Power BI Gov collectors: The collectors now properly obtain the server name and port from Power BI data source parameters.

Release version 2.202

Details about the release

Table 25.

Item

Details

Release version

2.202

Release date

7 May, 2024

Docker image ID

Jar file



New features and changes

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • Optimized view query parsing to improve the processing time for large SQL statements.

    • Optimized the querying of metadata during view lineage harvesting.

  • Oracle Collector: Added a new --oracle-jdbc-timezone-as-region parameter. This allows you to decide if the Oracle JDBC connection timezone should utilize the JVM's default timezone.

Bug fixes

  • AWS Glue collector: Improved the log messages that are recorded when the harvesting of job lineage fails.

Release version 2.201

Details about the release

Table 26.

Item

Details

Release version

2.201

Release date

2 May, 2024

Docker image ID

Jar file



New features and changes

  • Oracle and SQL Server collectors: The collectors now catalog column-level lineage when functions and stored procedures contain sub-selects.

  • Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:

    • The collectors are optimized to improve their overall run time.

    • A new parameter --disable-extended-metadata is now available that allows you to skip harvesting of extended metadata for resource types such as database, schema, table, columns, functions, stored procedures, user defined types, and synonyms. Basic metadata for these resource types will still be harvested.

  • Power BI and Power BI Gov collectors now catalog:

    • Relationships between Power BI apps and workspaces

    • Apps with associated workspace IDs (when service principal authentication is used)

Bug fixes

  • Teradata collector properly harvests lineage metadata from views with SQL statements containing REPLACE RECURSIVE VIEW, LOCK ROW ACCESS.

  • Oracle collector properly harvests lineage metadata from views with COLLECT.

  • All collectors properly handle config file options that start with option flags.

Release version 2.200

Details about the release

Table 27.

Item

Details

Release version

2.200

Release date

19 April, 2024

Docker image ID

Jar file



New features and changes

  • All collectors: Users now have the option to define a custom output file name for the collector catalog during run time. To do this, use the --output-name parameter. The system automatically adds .dwec.ttl to the end of the provided file name.

    Note

    If you are updating the file name for an already configured collector, make sure to check and modify any existing SPARQL queries that explicitly mention existing collector output files.

  • Oracle Collector: The collector now harvests Oracle package bodies and Oracle package specifications.
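For the --output-name item and its note above, the Python sketch below shows how the resulting file name is formed and how existing SPARQL queries could be checked for references to a previous output file name. catalog_file_name and queries_mentioning are hypothetical helpers written for illustration. Only the .dwec.ttl suffix comes from the release note.

```python
from typing import Dict, List

CATALOG_SUFFIX = ".dwec.ttl"  # suffix stated in the release note


def catalog_file_name(output_name: str) -> str:
    """Return the catalog file name produced for a given --output-name value."""
    return output_name + CATALOG_SUFFIX


def queries_mentioning(old_file_name: str, queries: Dict[str, str]) -> List[str]:
    """Return the names of saved queries whose text mentions old_file_name."""
    return sorted(name for name, text in queries.items() if old_file_name in text)
```

Running a check like queries_mentioning before renaming the output file gives a list of queries to review, which is the manual step the note asks for.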

Bug fixes

  • SQL Server collector: Fixed an error that occurred when harvesting column statistics.

  • Power BI and Power BI Gov collectors: Resolved an issue that was causing errors during the parsing of expressions that used the Table.RenameColumns Power Query table function in certain cases.

  • Snowflake collector: The collector now properly harvests tags that are defined in a different schema than the schemas specified for the collector.

  • The following collectors are updated to harvest lineage accurately for GROUP BY, ORDER BY, WHERE, and HAVING SQL expressions. Prior to this update, the relationships were incorrectly directed.

    PostgreSQL, Databricks, Derby, Netezza, Oracle, Redshift, Snowflake, SQL Server, Teradata collectors

Release version 2.199

Details about the release

Table 28.

Item

Details

Release version

2.199

Release date

11 April, 2024

Docker image ID

Jar file



New features and changes

  • A new collector is now available for Amazon Managed Streaming for Kafka.

  • Oracle collector: The collector now harvests lineage from views, stored procedures, and functions.

  • Snowflake collector: The collector now harvests Streamlit apps.

  • The following collectors now support harvesting from multiple databases specified by users. This means you can provide the --database parameter multiple times while running the collector.

    • Databricks, PostgreSQL, SQL Server, Db2, Redshift, Denodo, Oracle, MySQL, Snowflake, Teradata
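A repeatable option such as --database is commonly modeled as a list that grows with each occurrence. The Python sketch below uses argparse to show that behavior. It is not the collectors' command-line parser: build_parser and the program name are hypothetical, and only the --database option name comes from the release note.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which --database may be given any number of times."""
    parser = argparse.ArgumentParser(prog="collector-sketch")
    parser.add_argument(
        "--database",
        action="append",  # each occurrence appends one value to the list
        default=[],
        dest="databases",
        help="database to harvest; repeat the option to harvest several",
    )
    return parser


def parse_databases(argv: List[str]) -> List[str]:
    """Return the databases named on the command line, in the order given."""
    return build_parser().parse_args(argv).databases
```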

Bug fixes

  • Power BI and Power BI Gov collectors: Resolved an issue that was caused by parsing expand column expressions.

  • dbt Cloud collector: The collector now properly harvests metadata of dbt Cloud artifacts when the target database is not Snowflake. Note that the collector harvests metadata only from the dbt Cloud artifacts and does not connect to any unsupported target database to obtain database lineage metadata.

  • Snowflake collector: The collector now harvests policies associated with cataloged database objects, regardless of the database in which the policies reside.

Release version 2.198

Details about the release

Table 29.

Item

Details

Release version

2.198

Release date

9 April, 2024

Docker image ID

Jar file



New features and changes

  • Oracle collector: The collector now harvests synonyms.

  • Athena collector: Starting with release 2.198, data.world no longer packages the Athena JDBC driver with the Athena collector. You can continue to use releases earlier than 2.198 as-is, but when you update the collector to version 2.198 or higher, you must download and mount the driver for the collector and update the collector command to include the driver path.

Release version 2.197

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 30.

Item

Details

Release version

2.197

Release date

5 April, 2024

Docker image ID

Jar file



Release version 2.196

Details about the release

Table 31.

Item

Details

Release version

2.196

Release date

2 April, 2024

Docker image ID

Jar file



New features and changes

  • Log files for collectors: The collector log files for each collector run now have unique names. This allows logs to be written to separate files when running multiple collector instances.

  • Reltio collector: Survivorship groups and mappings are now recognized as primary entities with catalog records.

  • Snowflake collector: The collector now harvests tags associated with database objects in the user-specified database, regardless of the database in which the tag resides.
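For the log file item above, unique per-run names are often built from a timestamp plus a random component. The Python sketch below shows one such scheme. The naming pattern and unique_log_name are hypothetical, and the release note does not document the actual log file name format.

```python
import uuid
from datetime import datetime, timezone


def unique_log_name(collector: str) -> str:
    """Return a log file name that differs for every call, even within one second."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # The random suffix keeps names distinct when several collector
    # instances start at the same moment.
    return f"{collector}-{stamp}-{uuid.uuid4().hex[:12]}.log"
```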

Bug fixes

  • Teradata collector: Fixed an issue that was blocking column harvesting due to invalid column references in Views.

  • Azure Data Factory collector: Fixed an issue preventing successful file uploads to data.world.

Release version 2.195

Details about the release

Table 32.

Item

Details

Release version

2.195

Release date

25 March, 2024

Docker image ID

Jar file



New features and changes

  • Databricks collector: The collector now harvests tags for Databases, Schemas, Tables, and Columns.

Bug fixes

  • Power BI Service and Power BI Gov collectors: The collectors now correctly harvest skipped data sources during metadata scans.

  • Azure Data Lake Storage Gen2 collector: The collector is updated to refresh API authorization requests per ADLS requirements to avoid session expiration.

  • Azure Data Factory collector: Fixed an issue to accommodate varying data returned from the Azure Data Factory API.

Release version 2.194

Details about the release

Table 33.

Item

Details

Release version

2.194

Release date

21 March, 2024

Docker image ID

Jar file



New features and changes

  • The Power BI Service and Power BI Gov collectors now support harvesting lineage from ODBC data source types. A new parameter --datasource-mapping-file can be used to provide the information required for harvesting lineage relationships when the data source uses an ODBC connection in Power BI.

Bug fixes

  • The Amazon S3 collector now continues to harvest objects in the bucket when a 403 error is encountered.

  • The Azure Data Lake Storage Gen2 collector properly handles the scenario involving special characters in the blob name.

  • The Azure Data Factory collector properly handles a scenario that causes the collector to stop due to the format of information returned from the Azure Data Factory APIs.

  • The BigQuery collector properly handles a scenario where a table is in a different database from the one being harvested.

Release version 2.193

Details about the release

Table 34.

Item

Details

Release version

2.193

Release date

15 March, 2024

Docker image ID

Jar file



Bug fixes

  • The Azure Data Factory collector is updated to correctly handle a situation that previously caused the collector to stop, due to the format of the information returned from the ADF APIs.

Release version 2.192

Details about the release

Table 35.

Item

Details

Release version

2.192

Release date

12 March, 2024

Docker image ID

Jar file



New features and changes

  • Amazon S3 collector: The collector now offers the options, --include-object and --exclude-object. These options allow you to select which objects should be included or excluded from the harvesting process.

  • Databricks collector: The collector now harvests Databricks tags for database, schema, table, view, and column as key-value pairs. The collector also harvests tags for clusters and jobs, replacing the existing ClusterTag and JobTag resource types.

Release version 2.191

Details about the release

Table 36.

Item

Details

Release version

2.191

Release date

7 March, 2024

Docker image ID

Jar file



New features and changes

  • All collectors: The --dry-run option is now available for all collectors. This option allows you to do a test run to validate that the collector can authenticate to the specified source system. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports whether the connection succeeded or failed.

Bug fixes

  • Teradata collector: The collector is updated to correctly parse view SQL syntax for extracting lineage metadata. It also now includes improved logging of any errors encountered during lineage harvesting.

  • BigQuery collector: The collector now properly handles fully qualified table names that include dashes (-).

Release version 2.190

Details about the release

Table 37.

Item

Details

Release version

2.190

Release date

5 March, 2024

Docker image ID

Jar file



New features and changes

  • Snowflake, Teradata, and Netezza collectors: In the harvested metadata, the owners of resources are now correctly referenced as owner objects. Previously, they were referenced as string text.

Bug fixes

  • The Teradata collector now correctly handles case variations in database names within SQL statements while gathering lineage metadata.

Release version 2.189

Details about the release

Table 38.

Item

Details

Release version

2.189

Release date

24 February, 2024

Docker image ID

JAR file



New features and changes

  • The Tableau collector now captures all sub-projects when you specify certain projects to catalog. Additionally, it enables users to exclude specific projects using the --tableau-exclude-project parameter. Any sub-projects under an excluded project are also automatically excluded.
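The cascading exclusion described above can be sketched as follows. This is an illustration only, not the collector's implementation: the `parents` mapping, the project names, and the `excluded_projects` helper are hypothetical, and the sketch assumes the project hierarchy is a tree.

```python
def excluded_projects(parents, excluded):
    """Return every project that is excluded directly or through an ancestor.

    parents maps each project name to its parent project (None for top level).
    excluded is the set of project names the user asked to exclude.
    """
    result = set()
    for project in parents:
        node = project
        # Walk up the hierarchy; stop at the first excluded ancestor.
        while node is not None:
            if node in excluded:
                result.add(project)
                break
            node = parents.get(node)
    return result


# Hypothetical hierarchy used only for this example.
parents = {
    "Finance": None,
    "Finance/Budgets": "Finance",
    "Finance/Budgets/2024": "Finance/Budgets",
    "Sales": None,
}
skipped = excluded_projects(parents, {"Finance/Budgets"})
```

Here `skipped` contains "Finance/Budgets" and its sub-project "Finance/Budgets/2024", while "Finance" and "Sales" remain eligible for cataloging.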

Release version 2.188

Details about the release

Table 39.

Item

Details

Release version

2.188

Release date

23 February, 2024

Docker image ID

JAR file



New features and changes

  • The Information Schema Catalog Collector now collects descriptions from both tables and columns, if they are present in the source.

  • The Snowflake collector now harvests comments from Snowflake databases, schemas, and views (as resource description).

  • The Teradata collector has been enhanced to better parse view SQL definitions that use specific Teradata syntax elements, particularly when extracting lineage from views.

Bug fixes

  • BigQuery collector:

    • Fixed issues with handling identifiers with hyphens (-).

    • Fixed issues with harvesting lineage when a view refers to columns in a separate database.

Release version 2.187

Details about the release

Table 40.

Item

Details

Release version

2.187

Release date

20 February, 2024

Docker image ID

JAR file



New features and changes

  • Netezza collector: A new and improved collector is now available for Netezza.

  • Oracle collector: The collector now harvests definitions for views, functions, and stored procedures.

Release version 2.186

Details about the release

Table 41.

Item

Details

Release version

2.186

Release date

14 February, 2024

Docker image ID

JAR file



New features and changes

  • The following collectors now harvest all databases in a single collector run when the --database parameter is not specified. 

    The collectors also support a new parameter --exclude-database to exclude specific databases from metadata collection:

    • Databricks

    • DB2

    • MySQL

    • Oracle

    • PostgreSQL

    • Redshift

    • SQL Server

    • Snowflake

    • Teradata
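As a rough sketch of the selection behavior described above: when no database is specified, every available database is a candidate, minus the excluded ones. The function name, the argument names, and the exact-name matching are assumptions made for this example; the release note does not say how the collectors match names or how the two parameters interact when both are given.

```python
def databases_to_harvest(available, database=None, exclude=()):
    """Pick the databases for one collector run (illustrative only)."""
    if database is not None:
        # Assumption: an explicit database limits the run to that database.
        return [database]
    excluded = set(exclude)
    # No database given: harvest everything that is not excluded.
    return [name for name in available if name not in excluded]


# Hypothetical database names.
available = ["sales", "hr", "scratch"]
all_but_scratch = databases_to_harvest(available, exclude=["scratch"])
only_hr = databases_to_harvest(available, database="hr")
```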

Bug fixes

  • Databricks collector: The collector properly handles malformed task responses.

  • Power BI collector: The collector properly handles harvesting lineage relationships from Power BI data sources when parameters are used in place of the Snowflake Warehouse value.

  • For the following collectors, the behavior of the --include-information-schema option is changed. Now, if you use this option in the command without the --all-schemas option, the system will generate a warning to alert you about the missing parameter.

    • Databricks

    • DB2

    • Oracle

    • PostgreSQL

    • Redshift

    • SQL Server

    • Snowflake

Release version 2.185

Details about the release

Table 42.

Item

Details

Release version

2.185

Release date

9 February, 2024

Docker image ID

JAR file



Bug fixes

  • Fixed an issue that was causing database collectors to enter an error state.

Release version 2.184

Details about the release

Table 43.

Item

Details

Release version

2.184

Release date

7 February, 2024

Docker image ID

JAR file



Bug fixes

  • Azure Data Lake Storage Gen2 collector: Fixed an issue that previously prevented the collector from running successfully on machines using amd64 processors.

  • Microsoft SQL Server collector now properly harvests views from Azure Synapse Analytics.

Release version 2.183

Details about the release

Table 44.

Item

Details

Release version

2.183

Release date

1 February, 2024

Docker image ID

JAR file



Bug fixes

  • Tableau collector: The collector is updated to properly harvest usage data in newer versions of Tableau Server.

  • Azure Data Lake Storage Gen2 Collector: Fixed an authentication issue in the collector that resulted in failures to initialize a channel.

  • Snowflake collector: The collector now properly harvests lineage between function and source table if the source table is in the cataloged schema.

Release version 2.182

Details about the release

Table 45.

Item

Details

Release version

2.182

Release date

30 January, 2024

Docker image ID

JAR file



New features and changes

  • All collectors: In addition to being available as Docker Images, collectors are now also accessible as JAR files. Follow these instructions to run collectors using JAR files.

  • The following collectors now harvest all versions of overloaded function and stored procedure resources, each as its own resource:

    • Db2

    • MS SQL Server

    • Netezza

    • Oracle

    • PostgreSQL

    • Redshift

    • Snowflake

    • Teradata

Bug fixes

  • Teradata and MySQL collectors: The following schema options have been removed for these collectors: --all-schemas, --include-information-schema, and --schema.

Release version 2.181

Details about the release

Table 46.

Item

Details

Release version

2.181

Release date

22 January, 2024

Docker image ID

  • arm64: 55898dd6bee4c8760f2f242467887298b10afebef6a4e7b21022b8dbd50d6595

  • amd64: 3583b8ca098d37f47efcb815934e8e58b3b9bf774b0c03101e367908957b964a



New features and changes

  • The Snowflake collector now harvests Data Metric Functions, their associations to tables and observed metrics.

Release version 2.180

Details about the release

Table 47.

Item

Details

Release version

2.180

Release date

17 January, 2024

Docker image ID

  • arm64: bd1c31006bdccb9dfc55849999fb80a25b0602dc3e6233444b4c36e06ececc9a

  • amd64: 9ea00c32bf8d5b214b20e98bce0fd11e7b15673d61f2bbc3da13fbd804ff9bac



New features and changes

  • Snowflake collector harvests allowed tag values from Snowflake.

Bug fixes

  • Oracle collector properly harvests Column descriptions from Oracle Data Dictionary tables.

Release version 2.179

Details about the release

Table 48.

Item

Details

Release version

2.179

Release date

10 January, 2024

Docker image ID

  • arm64: d80d17e87ce7925c9ef46ff1fee577940e73b0479e19414f4f0266e3da2f7f99

  • amd64: cd2a2d0ae59a44ebf519399acda91772cd62e3bac04341a51c93abbb2a34c6f9



New features and changes

  • The latest tag for Docker images has been removed and is no longer available for use going forward.

    What does this change mean for users using the latest tag?

    • If you were using the latest tag, you can continue to use the image with the latest tag. However, we recommend all users update their docker run command to use an explicit version.

    • If you make a change to your local Docker environment (such as removing the latest image), your collector run will no longer work. You will need to update the run command to use a specific version. You can open a support ticket for assistance with updating the command.

  • Athena, Snowflake, SQL Server, and DB2 collectors now harvest basic metadata for materialized views (name, and description if available).

  • The Postgres collector now harvests materialized views with name, description, view SQL definition (DDL), and column-level lineage.

Bug fixes

  • All collectors: Environment variables referenced in collector config (YAML) files can now have values containing backslashes and dollar signs.
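To illustrate why backslashes and dollar signs are awkward in substituted values, here is a minimal sketch of placeholder expansion. The `${NAME}` placeholder syntax, the `expand` helper, and the variable name are assumptions for this example and do not describe the collector's internals. The point is that the value is inserted literally rather than being interpreted as a replacement template.

```python
import re

# Assumed placeholder syntax for this sketch: ${NAME}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(text, env):
    """Replace ${NAME} placeholders with values from env, inserted literally."""
    # A function replacement returns the value as-is. A plain replacement
    # string would have its backslash sequences interpreted by re.sub.
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), text)


# Hypothetical variable whose value contains dollar signs and a backslash.
env = {"DB_PASSWORD": r"pa$$w\0rd"}
line = expand("password: ${DB_PASSWORD}", env)
```

Unknown placeholders are left untouched in this sketch, which is one possible design choice among several.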

Release version 2.178

Details about the release

Table 49.

Item

Details

Release version

2.178

Release date

5 January, 2024

Docker image ID

  • arm64: 3d05719236c2838e9693bd6db37455728763daf52458f28d026b4d5c28c1d518

  • amd64: fa7f73cb70c10d8fe6cf8ec882a72afa54e26386491f6eec92c96a1090005833



New features and changes

  • The Snowflake collector now harvests the External URL for Snowsight for tables and views.

  • The dbt Cloud collector now includes the --dbt-cloud-host option to enable interaction with dbt static access URLs.

Bug fixes

  • Databricks collector: Addressed an issue related to correctly forming IRIs for tables under certain circumstances. This was previously causing duplicate tables and databases to be cataloged and non-existent tables to be referenced by columns.

  • The Tableau collector now properly handles a scenario when the Tableau instance has no databases defined.

Release version 2.177

Details about the release

Table 50.

Item

Details

Release version

2.177

Release date

22 December, 2023

Docker image ID

  • amd64: 3fd446534e173b1773d11d7afdb8dbf9256afa21798e81e79e35928f694afda7

  • arm64: 3ca604807c9c2829cc527db1d6b6f43093e1cb1870b8f34c8d3f29d7f1513436



Bug fixes

  • dbt Core and dbt Cloud collectors now catalog the dbt product version.

  • Tableau collector properly handles columns with missing names.

  • Monte Carlo collector:

    • The collector now correctly associates views with incidents, rectifying previous issues caused by missing details for certain incident types and subtypes.

    • The collector has improved log messages when relating tables to incidents.

Release version 2.176

Details about the release

Table 51.

Item

Details

Release version

2.176

Release date

20 December, 2023

Docker image ID

  • arm64: 795a717210ad6fd9bfc66335cc6a178e479604f5e7232ae8343248910cb28b57

  • amd64: f72b137b0b47b533a3714e9b358c5cdd0919d931fab4c2dd68155bf334010a03



Bug fixes

  • Teradata collector: Fixed an issue where information was missing when harvesting functions from Teradata.

  • dbt Cloud and dbt Core collectors: Fixed an issue where information was missing when harvesting test results from dbt Cloud and dbt Core.

Release version 2.175

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 52.

Item

Details

Release version

2.175

Release date

19 December, 2023



Release version 2.174

Details about the release

Table 53.

Item

Details

Release version

2.174

Release date

18 December, 2023

Docker image ID

  • amd64: 8253823ee1192c842b373baa89cc92f653925f9c31bc66a2e8570b254c412120

  • arm64: 7f8275ab1eedb57275818b807ed8412be3c644fdbfdd7d436739bfbd5b2ae287



New features and changes

  • The following two new collectors are now available:

  • The following collectors now harvest Schema resources from the source:

    • Databricks, PostgreSQL, SQL Server, Db2, Redshift, Generic JDBC Collector, Denodo, Dremio, Infor ION, Oracle, Salesforce, SQL Anywhere, Athena, MySQL, Snowflake, Teradata, Presto, Vertica

  • dbt Cloud and dbt Core collectors now harvest the following additional metadata: test results (failed, warning, success), last test run timestamp, test name, test arguments, and type of dbt test.

Bug fixes

  • Teradata collector: Fixed an issue where information was missing when harvesting triggers from Teradata.

Release version 2.173

Details about the release

Table 54.

Item

Details

Release version

2.173

Release date

12 December, 2023

Docker image ID

  • amd64: 209d4c7a184fde3357548700fcd8b7fd88ccf5baea2d4ac0ad71c7060f2ed30d

  • arm64: 4d75d8c96a24955f77c2c35da4c1d01aa8ececdbbbdbcd4192eec0713219e2c9



New features and changes

  • dbt Cloud and dbt Core collectors now harvest metadata for columns defined within models and sources.

  • The Power BI collector now automatically filters out workspaces named My workspace or PersonalWorkspace <User> when the --all-workspaces-and-apps parameter is used. If you want to include these workspaces in the catalog, use the --include-user-workspace option.

Release version 2.172

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 55.

Item

Details

Release version

2.172

Release date

12 December, 2023



Release version 2.171

Details about this release

Table 56.

Item

Details

Release version

2.171

Release date

6 December, 2023

Docker image ID

  • amd64: 16194c9acaa97741b17dd14525968d3a4ee6afd5babe8b0d4cf32763de6b4c0d

  • arm64: f7eae1d3f25c88eea85c6353e902baeb5e6645440494c382546a627cda5873e3



New features and changes

  • Monte Carlo collector: The Monte Carlo collector is enhanced to automatically retry harvesting from Monte Carlo in case of API failure.

Bug fixes

  • All collectors: If errors occur while running the collectors using the YAML file, the collectors now return an unsuccessful exit status.
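A scheduler or wrapper script can use the exit status to detect a failed run. The following sketch assumes that an unsuccessful status is any non-zero exit code, which is the usual convention but is not stated in this note. The `step_succeeded` helper is hypothetical, and a small Python subprocess stands in for the real collector command.

```python
import subprocess
import sys


def step_succeeded(command):
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Stand-in commands for the example; replace with the real collector command.
ok = step_succeeded([sys.executable, "-c", "raise SystemExit(0)"])
failed = step_succeeded([sys.executable, "-c", "raise SystemExit(2)"])
```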

Release version 2.170

Details about this release

Table 57.

Item

Details

Release version

2.170

Release date

30 November, 2023

Docker image ID

  • amd64: 45b5f9fe009db8d6104378614fe3e23210421695825dd74c533d5106e14b9f27

  • arm64: c79b930cd056efdd0886139a7a884606c6f7aba427de3250f1b3494a32a7cfa0



New features and changes

  • The Monte Carlo collector supports harvesting resources from Monte Carlo when the monitored target database is Databricks.

Release version 2.169

Details about this release

Table 58.

Item

Details

Release version

2.169

Release date

27 November, 2023

Docker image ID

  • amd64: 61ac84e57bbcc75c8dc19890796020735980f126fea0d44b2a5c8a386fc589be

  • arm64: 83afe33b1632662240f522203130b141f953bd27343ed1dbb5958f1d97de99cd



New features and changes

  • Monte Carlo collector supports new warehouse types in the latest Monte Carlo GraphQL APIs.

  • dbt Cloud and dbt Core collectors support dbt versions 1.5.0, 1.6.0, and 1.7.0.

  • Log files for collectors: All collectors now compress log files prior to upload.

Bug fixes

  • Power BI, Sigma, and ThoughtSpot collectors: OAuth tokens are now properly refreshed when they expire.

Release version 2.168

Details about this release

Table 59.

Item

Details

Release version

2.168

Release date

17 November, 2023

Docker image ID

  • amd64: 0416059f885cd43bc6f5d40ee47012504b009e8c667e7b3df78bc081bc0a4009

  • arm64: a3cd7057132f519161dd12964b6f30caa2cb360f114b8a8d08ab182ee66aac68



New features and changes

  • A new collector is now available for Teradata.

Bug fixes

  • Fixed an issue where the Databricks collector stopped abruptly due to a lack of permission on the referenced resources.

Release version 2.167

Details about this release

Table 60.

Item

Details

Release version

2.167

Release date

10 November, 2023

Docker image ID

  • arm64: 18e2f40735ab011f9293b067188524da2677130a3c615f7b2657d8e2e3de10fa

  • amd64: 0506c93a875da238b468bf415954f0cb267aa586eedc30386178597b3be2c639



New features and changes

  • Databricks collector: Additional lineage and object metadata are now harvested for database objects via Unity Catalog.

  • Log message improvements: Improved logs and guidance are available for situations where SSL certificate problems occur during metadata harvesting from sources secured with self-signed certificates.

Bug fixes

  • Monte Carlo collector: Database Views are now accurately differentiated from Database Tables.

Release version 2.166

Details about this release

Table 61.

Item

Details

Release version

2.166

Release date

2 November, 2023

Docker image ID

  • arm64: 4613831341f85ff155ff03df0d1b1f923de45c68d81da852029f52421a14b936

  • amd64: ffc674cb7f13098becacdee1c2b9e89a6f77ee17107f14a334b7b18b28c26f23



New features and changes

Release version 2.165

Details about this release

Table 62.

Item

Details

Release version

2.165

Release date

25 October, 2023

Docker image ID

  • arm64: b05a2ec51918ff86213184857d553985da4fba0a2f5b0aa9f6dc59fa896ed3bb

  • amd64: bc177db62b32b302e871397f2e0e0deaa5cbfb0332649e239f185814fa3fd2ff



New features and changes

  • Manta collector has improved log messages.

Release version 2.164

Details about the release

Table 63.

Item

Details

Release version

2.164

Release date

24 October, 2023

Docker image ID

  • arm64: a41cf5b2872d414fa11965e410d7c96cfe593aed492c1b2f6c6bb0f2c34d1881

  • amd64: 0995995e7165ee90845e522583723243385007dc5af728d788177e96e8ba8635



New features and changes

  • Amazon S3, AWS Athena, and AWS Glue collectors now support authentication using AWS config files that reference the credentials on an Amazon EC2 instance profile. For details, see the AWS documentation.

Release version 2.163

Details about the release

Table 64.

Item

Details

Release version

2.163

Release date

20 October, 2023

Docker image ID

  • arm64: c73514806422fb6684abd76c2720be5070f9c0abcd5ffb188c1333e85f5ab089

  • amd64: 027d87cd40668fb8a0c9d2bd56bce75e35eeb2059b724c4104c1ce50ee673266



New features and changes

  • Amazon S3 collector: When the collector is run in dry run mode (with the --dry-run parameter), it now also checks the credentials and lists the buckets found.

  • Monte Carlo Collector:

    • The incident log messages for the collector now include UUIDs.

    • Domain filtering logic is changed to address an API change when fetching monitors, tables, and incidents from Monte Carlo.

Bug fixes

  • All collectors:

    • The collector output files generated after running the collectors now include the collector version.

    • Collectors now return an unsuccessful exit status if there are errors when uploading the catalog output files to data.world.

Release version 2.162

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 65.

Item

Details

Release version

2.162

Release date

19 October, 2023

Docker image ID

  • arm64: 4bef6711f00b6b83bdb1e4abd2091055de16f70b5f475eaacf5324a5270be655

  • amd64: fab55cff57e53861d676f8f4e465997d6bdc1cec8190b2aa0addfdcdfb80c3ef



Release version 2.161

Details about the release

Table 66.

Item

Details

Release version

2.161

Release date

5 October, 2023

Docker image ID

  • amd64: 035adb8a23d3a23fbf13dc602e8bd27ad2d2aedde641e1baa4753096bdcb674f

  • arm64: cc18462eb8e205885cec0f49b3e19f9e122a5442ab8b013539237f12f29073ff



Bug fixes

  • Oracle collector: The collector gracefully handles exceptions during column harvesting.

Release version 2.160

Details about the release

Table 67.

Item

Details

Release version

2.160

Release date

4 October, 2023

Docker image ID

  • arm64: 8d0d9067bf92a22b857d7f5e08173b6c73316b479b4a2c82446808cbb3c6b770

  • amd64: 2be3e2ea9d63e236179d6d122a5821ad03dabc6d0f79489fd67a00842cd8543a



New features and changes

  • Log messages improvements: Collector log messages now include both date and time.

Bug fixes

  • Tableau Collector now harvests column identifiers for column resources.

  • Confluent Platform Collector now properly handles --include-topic and --exclude-topic options.

  • Monte Carlo, ThoughtSpot, and InfluxDB Collectors now produce only one catalog record for each resource.

Release version 2.159

Details about the release

Table 68.

Item

Details

Release version

2.159

Release date

26 September, 2023

Docker image ID

  • arm64: 4a35703e63ec5c46786172661d8006882387c37ff39c825a7a8d6c1f8397a83b

  • amd64: 9aa2c832c17107e2e3ea391c82961e82d00b26f0a526f9aac678290b093bee2f



New features and changes

  • Confluent Cloud collector now supports topic filtering using regular expressions or exact topic names using two new parameters: --include-topic and --exclude-topic.

  • Confluent Platform collector:

    • The collector now supports Confluent Platform 6.1.0 and above.

    • The collector now supports topic filtering using regular expressions or exact topic names using two new parameters: --include-topic and --exclude-topic.
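Include and exclude filtering of this kind can be sketched as follows. This is not the collectors' implementation: the `select_topics` helper and the topic names are hypothetical, and the sketch assumes that patterns must match the whole topic name and that exclusion takes precedence over inclusion. With whole-name matching, an exact topic name without special characters works as a pattern too.

```python
import re


def select_topics(topics, include=None, exclude=None):
    """Keep topics that match an include pattern and no exclude pattern."""

    def matches(name, patterns):
        # fullmatch: the pattern must cover the entire topic name.
        return any(re.fullmatch(pattern, name) for pattern in patterns)

    selected = []
    for name in topics:
        if include and not matches(name, include):
            continue  # include list given, but this topic is not on it
        if exclude and matches(name, exclude):
            continue  # assumption: exclusion wins over inclusion
        selected.append(name)
    return selected


# Hypothetical topic names.
topics = ["orders", "orders-dlq", "payments", "_internal"]
chosen = select_topics(
    topics,
    include=[r"orders.*", "payments"],
    exclude=[r".*-dlq"],
)
```

In this example `chosen` is `["orders", "payments"]`: "orders-dlq" matches an include pattern but is removed by the exclude pattern, and "_internal" matches no include pattern.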

Bug fixes

  • Databricks collector properly harvests column statistics from columns containing dash characters.

Release version 2.158

Details about the release

Table 69.

Item

Details

Release version

2.158

Release date

18 September, 2023

Docker image ID

  • arm64: c74f5866de2865782c6f1ef70b97d02bbd15a5bb824088362fe8e6037d500322

  • amd64: 5223699780b6dc2798c5e4173e55b2e74468b7f163a6e0e4c7928699c09d7849



New features and changes

  • Amazon S3 collector now harvests resources up to and including the maximum count specified by the user or the 10,000 default limit.

  • dbt Cloud collector handles scenarios where dbt Cloud runs contain more than the current dbt Cloud limit of 1000 artifacts.

Bug fixes

  • Databricks collector properly harvests column statistics from tables with no columns.

  • MS SQL Server collector properly handles case sensitivity of SQL keywords when parsing lineage.

Release version 2.157

Details about the release

Table 70.

Item

Details

Release version

2.157

Release date

14 September, 2023

Docker image ID

  • arm64: c76986c17f6d1b7ff36c052a21491eb5bbb4f68dc8418a0fe0f9f8d19abda6b3

  • amd64: e36c90c0fc7dbe9887951168b5c93236cb3eb357f7626f22eea6eedcf75cb2de



New features and changes

  • The following two new collectors are now available:

  • Power BI and Power BI Gov collectors: The --exclude-workspace and --include-workspace parameters for these two collectors now support regular expressions.

  • The following JDBC collectors now test for connection status before executing queries. If a connection is closed by the database, the collector detects this condition and re-opens the connection:

    Databricks, DB2, MySQL, PostgreSQL, Redshift, Snowflake, MS SQL Server, Azure Synapse Analytics collectors
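The check-then-reopen pattern described above can be sketched as follows. The collectors use JDBC; in this illustration Python's built-in `sqlite3` module stands in for the database driver, and the `ReconnectingClient` class and its probe query are assumptions made for the example.

```python
import sqlite3


class ReconnectingClient:
    """Probe the connection before each query and reopen it if it is closed."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument factory that opens a connection
        self._conn = connect()
        self.reconnects = 0

    def _is_open(self):
        try:
            # Cheap probe query; raises if the connection has been closed.
            self._conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def query(self, sql):
        if not self._is_open():
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn.execute(sql).fetchall()


client = ReconnectingClient(lambda: sqlite3.connect(":memory:"))
first = client.query("SELECT 42")
client._conn.close()  # simulate the database closing the connection
second = client.query("SELECT 42")
```

Both queries return the same rows, and `client.reconnects` is 1 after the second call because the closed connection was detected and reopened.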

Bug fixes

  • Databricks collector now properly handles non-alphanumeric characters in object names.

Release version 2.156

Details about the release

Table 71.

Item

Details

Release version

2.156

Release date

7 September, 2023

Docker image ID

  • arm64: f1f5b73979468a5997a3e29e3a9ec0ae34b535a3863918078417894b67046d19

  • amd64: 75ea1cd791e8680fc5d24088ad8294b9561d48d9682f028105937180310593ef



Bug fixes

  • dbt Cloud Collector now properly selects the job that the user specified while running the collector.

Release version 2.155

Details about the release

Table 72.

Item

Details

Release version

2.155

Release date

29 August, 2023

Docker image ID

  • arm64: 4a09c0486552990aa75131e4ec88e47a4eb2eded2bf88a8cb559d88ab8ac11a4

  • amd64: 1108f76e93e4cb614cae9ddc03aec49a104566fbe62a93a307c4b2c648fb4bd3



Bug fixes

  • The AWS Glue collector properly handles AWS Glue Catalog instances with more than 100 databases.

Release version 2.154

Details about the release

Table 73.

Item

Details

Release version

2.154

Release date

27 August, 2023

Docker image ID

  • arm64: b243aee7fdf18855b387d99b03955a811ebf1888fb36136ee32802bcb0c5b7e3

  • amd64: b512e67115bbe9ff63f7adbe39416e885db9649301c665e11c6d620fe9c17a5f



New features and changes

  • AWS Glue collector has improved logging to help with troubleshooting of access and permissions issues.

Release version 2.153

Details about the release

Table 74.

Item

Details

Release version

2.153

Release date

25 August, 2023

Docker image ID

  • arm64: b5d818c510c161d3f08c10e932661e2f582a9053810d97942ac1a12b0f994ad0

  • amd64: e4b92cc1323516b104f7ea146b80bce46debe87246f9298c4eb1f17b3f030fa9



New features and changes

  • dbt Cloud collector: The following enhancements are made to the collector:

    • The collector now harvests information about dbt Cloud resources associated with the artifacts from which metadata is harvested.

    • The collector now supports two new parameters (--dbt-cloud-environment and --dbt-cloud-job) to allow users to filter runs by environment and job.

Release version 2.152

Details about the release

Table 75.

Item

Details

Release version

2.152

Release date

22 August, 2023

Docker image ID

  • arm64: f8606120f88e38e658902478b5e26181d0d260b732c389dc22a3f3fe89e41c58

  • amd64: ebf746424eda52ebf7edc7ce8013f4da6cb985abdd8036ede8af9885e8560f83



New features and changes

  • Snowflake and SQL Server Collectors now harvest column-level lineage from Stored Procedures.

  • Logging improvements:

    • Improved log messages for instances when the service account/user account used by the collector does not have access to upload to a dataset in data.world.

    • Debug log messages now include the current memory and stack size.

Release version 2.151

Details about the release

Table 76.

Item

Details

Release version

2.151

Release date

15 August, 2023

Docker image ID

  • arm64: 6cf7d92cd8fdf9fdc8b9ff2ea3bfb369203c43f1bc57e1e3f14c12f7ef651af8

  • amd64: 655cf89d3bd60e6a95da428abb1d3eb3621f96a384166ac15cdc4d73ca1a354e



New features and changes

  • A new collector is now available for SQL Server Reporting Services.

  • Databricks collector: The collector now retries after a pause when the Databricks API responds with a "too many requests" error.

  • All database collectors: Optimized the database collectors to reuse database connections where possible.
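Retrying after a pause when an API signals rate limiting can be sketched as follows. The `TooManyRequests` exception, the `call_with_retry` helper, the attempt limit, and the linearly growing pause are all assumptions for this example. The release note does not describe the collector's actual retry policy.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical error raised when the API signals rate limiting."""


def call_with_retry(request, max_attempts=4, pause_seconds=1.0, sleep=time.sleep):
    """Call request(); on rate limiting, pause and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TooManyRequests:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Pause a little longer after each failed attempt.
            sleep(pause_seconds * attempt)


# Simulated API call that is rate limited twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TooManyRequests()
    return "ok"


pauses = []  # record the pauses instead of really sleeping
result = call_with_retry(flaky_request, pause_seconds=0.5, sleep=pauses.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.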

Release version 2.150

Details about the release

Table 77.

Item

Details

Release version

2.150

Release date

10 August, 2023

Docker image ID

  • arm64: 57a1a0425917d69c40688dfcf46b05a531a8cdca5ae3c798b0d20e518fcb60ee

  • amd64: 53920c9e9b80e45f3494b8f755c2e3dbf8275a0d125dd0997a0161f6d2edba99



Bug fixes

  • All collectors: Fixed an issue that prevented the user from passing command-line options containing spaces when running the collectors using the Docker container.

Release version 2.149

Details about the release

Table 78.

Item

Details

Release version

2.149

Release date

9 August, 2023

Docker image ID

  • arm64: d456d1966e6ede73cfabd4d11e77c4fddc9b3861ca82ab360493cf9a6b0f782b

  • amd64: 8406dae1ab24932c24a3bbc82618368653e32e0f4a5c09f0caa0eb22ed4e711a



New features and changes

  • Databricks collector: The collector now allows users to authenticate with a personal access token without specifying a username and password.

  • Power BI Gov collector: The --include-user-workspace parameter is removed from the collector CLI options.

Release version 2.148

Details about the release

Table 79.

Item

Details

Release version

2.148

Release date

27 July, 2023

Docker image ID

  • arm64: 2c856f96af8576024b9a81fb89eb3803eaa03abbe055f601f5597f2f79a62019

  • amd64: dcb90da4d519a165da4228f02ee75da4a892320901305df85279a99ea84e2cee



New features and changes

  • Databricks Collector has a new option --workflow-exclude to exclude harvesting of jobs/workflows.

  • Power BI and Power BI Gov Collectors now support parameter values in Power BI expressions.

Bug fixes

  • Tableau collector properly handles duplicate data sources when multiple filtered projects are specified.

Release version 2.147

Details about the release

Table 80.

Item

Details

Release version

2.147

Release date

24 July, 2023

Docker image ID

  • arm64: 872cbd1f19c7e54d6db57fdab60dc08fab2690a59df54e74f0718d8cd8794381

  • amd64: d0a77c9cb81754d217f53dc4373a3b9b85425d687dcb6fb218abea678c42c148



New features and changes

  • Power BI and Power BI Gov collectors: The collectors have a new option, --max-parseable-expression-length, which sets the maximum number of characters in a Power BI expression that will be parsed for lineage metadata.

Bug fixes

  • Power BI and ThoughtSpot collectors now refresh expired authentication tokens.

  • The SQL Server collector now properly handles missing SQL definition when harvesting stored procedures.

Release version 2.146

Details about the release

Table 81.

Item

Details

Release version

2.146

Release date

19 July, 2023

Docker image ID

  • arm64: 3c70ebdae29b700f381f8dd079279ae16403083846549b747fe1be2efb03d5d2

  • amd64: 60fa58a68e2ea530faef3a5286d0e2fb55bd0d1b48260a31658ad25ea1f97644



New features and changes

  • Tableau collector: The collector now harvests lineage relationships between embedded data sources and published data sources to reflect any such relationship that exists in Tableau.

Bug fixes

  • Power BI collector: Improved the collector to avoid hitting Power BI Admin API rate limits, which prevented successful collection for certain large Power BI organizations.

  • Marquez collector: The API authentication token (--marquez-api-key) is now a required parameter for the collector.

  • Fivetran collector: Fivetran API key (--fivetran-apikey) and Fivetran secret (--fivetran-apisecret) options are now required parameters for the collector.

Release version 2.145

Details about the release

Table 82.

Item

Details

Release version

2.145

Release date

14 July, 2023

Docker image ID

  • arm64: fcb792fef00634de9e02e635d522b776dad844d6918b8bd002d84eda1bc1c9a1

  • amd64: 318ce1c64752c55318941576a952015e9d8f11b2db1c1f94c86607504d3896ab



New features and changes

Release version 2.144

Details about the release

Table 83.

Item

Details

Release version

2.144

Release date

11 July, 2023

Docker image ID

  • arm64: d98bbc978ee4beb798b3068fc554f7ce4478fd9c5af971e598ad2b349422e0f6

  • amd64: 93bb18e5dd6a0f2ff4caaa8058f040d28f23420bf4a535209909a48cbb4f028f



Bug fixes

  • Databricks collector: The collector now properly handles missing information returned by the Databricks APIs.

  • dbt Core and dbt Cloud Collectors: The collectors now use the description property for resources in a dbt manifest file to populate the description of associated catalog resources.

  • Power BI collector: The collector now properly handles unexpected source formats.

Release version 2.143

Details about the release

Table 84.

Item

Details

Release version

2.143

Release date

7 July, 2023

Docker image ID

  • arm64: 98423e91c32ea3a45410da41290a0b4afb6041e1d59e4488adfb12269f48d695

  • amd64: 506104e806a64eb170b3f27a4b4f40067ff27ed2f43de101f9c7e1f004aac463



New features and changes

  • DB2 collector: The collector now supports harvesting of column statistics and of function and stored procedure information. For details about using the new parameters (--target-sample-size, --sample-string-values, --enable-column-statistics) for these features, see the DB2 collector documentation.

  • Redshift collector: The collector now properly distinguishes between user-defined functions and stored procedures when harvesting function and stored procedure metadata.

  • Tableau collector: Improved error messages and handling of missing Salesforce connection information within the Tableau collector.

Bug fixes

  • Databricks collector: Fixed defects in the collector to accommodate invalid number formats and missing information returned by Databricks APIs in some cases.

Release version 2.142

Details about the release

Table 85.

Item

Details

Release version

2.142

Release date

23 June, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: ccc19cfe72b618bce99c18f755f1c7c7489f012f626e5ecf7abf98f8f9590012

  • amd64: 507d600ade53fdad8973e0ef9cccdb116cb46b1e198ed38427feb2f8ebb8ac95



New features and changes

  • Power BI collector: A new parameter, --all-workspaces-and-apps, allows users to catalog all available data from the tenant using the admin API.

Bug fixes

  • Databricks collector: Fixed an issue where the collector was terminating abnormally when it encountered a notebook that had no language specified for it.

  • Microsoft SQL Server collector: Fixed an issue with parsing the SQL for certain Views in Microsoft SQL Server that prevented harvesting of lineage.

Release version 2.141

Details about the release

Table 86.

Item

Details

Release version

2.141

Release date

20 June, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: d17699bcb6ac1a11f8f32be735dd04367517a4cd32c2afaecce17e1069f1e203

  • amd64: a0da0c8f2db9ad45ee028fe491a7e43159969033edc1bbdd906d6327c44d7812



New features and changes

  • ThoughtSpot collector now harvests:

    • Column-level lineage between JDBC source table columns and ThoughtSpot logical columns.

    • Column-level lineage between ThoughtSpot logical columns and Answers and Liveboards that connect to the data.

  • Databricks collector now harvests additional metadata for Databricks tables.

  • The Redshift, SQL Server, and PostgreSQL collectors now harvest:

    • Functions

    • Stored procedures

  • Power BI, Looker, and ThoughtSpot collectors: Resources cataloged by these collectors now automatically include a link to the resource in the source system, so users can go directly from data.world to the same resource in the source system without searching for it manually.

Release version 2.140

Details about the release

Table 87.

Item

Details

Release version

2.140

Release date

13 June, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 9c2d1cf893eddc89924f212d1e10c9f26deb05ac254db783d3bdc8fc83fb6d5e

  • amd64: dc9dc94deed7fbf7da17fb822d9a085434f0a72baee8e965c7dfe1b537d62b95



New features and changes

  • dbt Cloud collector: The collector now allows the user to pass in a Snowflake role and Snowflake warehouse to override the values found in the dbt Cloud project configuration.

Bug fixes

  • dbt Core and dbt Cloud collectors: The collectors now properly handle source meta config values that are objects rather than strings in the generated dbt manifest file.

  • SQL Server collector: The collector now properly disables lineage collection when the --disable-lineage-collection parameter is set.

  • Databricks collector includes additional checks for existence of and access to Unity Catalog.

Release version 2.139

Details about the release

Table 88.

Item

Details

Release version

2.139

Release date

7 June, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 95c668b9ceb092cc7d99a78caa4b64b1d70ee2a1de9d47fadd1f8206aecb6948

  • amd64: fbe3a314510718a74a0162918db20d345a48e76c879f49ed6851768547e9e855



New features and changes

  • SQL Server collector now harvests created date and modified date for tables and schemas, and harvests table size in bytes.

  • Snowflake collector now harvests table size in bytes.

  • Power BI collector allows setting the --azure-tenantid option when using username and password authentication.

  • All collectors now support the ability to set the JVM stack size using the DWCC_JVM_OPTIONS parameter.

Bug fixes

  • SQL Server collector: The collector now properly harvests view SQL whose length exceeds the SQL Server default column character length (6000).

  • dbt Cloud collector: Rather than reporting an error, the collector now skips job runs that do not have generated documentation artifacts.

  • Tableau collector properly catalogs all sites if no site is specified in the CLI/YAML.

Release version 2.138

Details about the release

Table 89.

Item

Details

Release version

2.138

Release date

2 June, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: ae9e23f1063eea7c998e4028a286e93411483a6d7a46c81eb71f11ca48f3e0ca

  • amd64: a729ad549567254d7cd4e96bf7546457b8977f250c20ecef558bf97c1565dd0c



New features and changes

  • Monte Carlo collector: The collector now:

    • Catalogs additional metadata for incidents, monitors, and tables.

    • Uses a smaller default GraphQL page size.

Bug fixes

  • BigQuery collector:

    • Properly handles issue where table IDs are returned as null.

    • Properly handles issue with table IDs that have quotes in them.

  • Monte Carlo collector: Properly handles external URLs for tables that contain spaces.

  • dbt Core and dbt Cloud collectors: Properly harvests all meta config containing object values.

  • Profiling: Properly handles histogram values containing excessive range, overflow, or underflow values.

Release version 2.137

Details about the release

Table 90.

Item

Details

Release version

2.137

Release date

23 May, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 4976a94145f0872a6bf99135accfa3bf2d656d8a7ff9b492c8064a1f4cf1d807

  • amd64: a5f3651c5816c27888f5b0434255e35e6005cd4ede3849c254f8a18ac3843350



New features and changes

  • The new Azure Data Lake Storage Gen2 collector is now available.

  • Snowflake collector: The collector now harvests:

    • Snowflake Stored Procedures and Functions

    • Lineage between functions and database objects

  • Databricks collector and Power BI collector: The collectors now properly identify the database object for Power BI connections to Databricks database objects.

  • Databricks collector: The collector now harvests:

    • Jobs, Tasks, and Clusters

    • Functions

    • Column-level lineage for Hive metastore and Unity Catalog

    • Lineage between Tasks and the Notebooks referenced in Tasks

    • Lineage between upstream and downstream tables with an intermediate Job

    • Primary and foreign keys for Tables

    • Column statistics

Bug fixes

  • dbt and dbt Cloud collectors now properly harvest meta config containing object values.

Release version 2.136

Important

This release was for internal improvements and has no customer-impacting changes.

Table 91.

Item

Details

Release version

2.136

Release date

22 May, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 41889d3723ac38294a45771188f294729eff046deca34f986a4e6a580d0b77ba

  • amd64: ae06ee0bb4445f1c532318c2e6bf0cfdf89d750805eff0d23ad46646d5f6468b



Release version 2.134

Details about the release

Table 92.

Item

Details

Release version

2.134

Release date

9 May, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: a97fa7126a89d4167f631932f9961e0de9b7a50522763372475a0e9f1cae7597

  • amd64: 1542530a6d7fab3487182e52178eeccb9c52f838feef4ca2bf966552e0d6f3f0



Bug fixes

  • Snowflake collector: Fixed an issue in the collector when calculating sample size for harvesting column-statistics.

Release version 2.133

Details about the release

Table 93.

Item

Details

Release version

2.133

Release date

26 April, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: bec64902a5d224a3167690e30d28e92e977d5f80b2409496f9de81f8f1f71e13

  • amd64: 9c7b987f273be3d1267c9e22205e3b23505446fcd80a50c59f488d0338f2437a



New features and changes

  • Monte Carlo collector:

    • Reduced the number of API calls to improve the performance of the collector runs for large Monte Carlo instances.

    • A new parameter --montecarlo-incident-lookback-days is now available to harvest incidents from a specified number of days before the collector run.

    • A new parameter --montecarlo-domain is now available to harvest resources from specified domain names.

      For details about these new parameters, please see the Monte Carlo collector documentation.
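As a rough illustration of what a lookback window means, the following Python sketch keeps only the incidents created within a given number of days before the run time. The function name, the shape of the incident tuples, and the inclusive boundaries are assumptions made for illustration; these notes do not describe how the collector implements the parameter.

```python
from datetime import datetime, timedelta, timezone


def incidents_in_lookback(incidents, run_time, lookback_days):
    """Return incidents created within `lookback_days` before `run_time`.

    `incidents` is a list of (incident_id, created_at) tuples. This shape
    is hypothetical and is not taken from the Monte Carlo API.
    """
    cutoff = run_time - timedelta(days=lookback_days)
    return [(iid, ts) for iid, ts in incidents if cutoff <= ts <= run_time]


run = datetime(2023, 4, 26, tzinfo=timezone.utc)
sample = [
    ("inc-1", run - timedelta(days=2)),   # inside a 7-day window
    ("inc-2", run - timedelta(days=10)),  # outside a 7-day window
]
print(incidents_in_lookback(sample, run, lookback_days=7))
```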

Bug fixes

  • Fixed a NullPointerException that occurred while harvesting column statistics for columns with large strings.

Release version 2.132

Details about the release

Table 94.

Item

Details

Release version

2.132

Release date

18 April, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: f3d1cb23e0b3077741f2d05321cb4f456d8eb8703074d82de3225dd8c81fb2a0

  • amd64: d5d328a7460617d45dc9db2c868e6faf5e84365a74bc9e858e99879079293ce0



New features and changes

  • The new ThoughtSpot collector is now available.

  • Grafana collector: The collector now produces catalog outputs containing a hashed namespace. This allows resources with spaces to be properly harvested.

  • Monte Carlo collector: The collector now has improved logging messages.

Release version 2.131

Details about the release

Table 95.

Item

Details

Release version

2.131

Release date

17 April, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: c8f4d7e5374ce7fc2d835fac1ff008ef0e0e0ddfc421997abd87d3453295c702

  • amd64: 2aca9a07affac355c6e2ed688b4aa297337c2245aa015db1a932c6cfcab9f0cf



New features and changes

  • Power BI collector: Improved performance. The collector now uses less memory when parsing expressions to harvest lineage relationships.

  • Snowflake collector: Removed JDBC URL parsing warning messages from the collector log file. These warnings were caused by the Snowflake JDBC driver.

  • Collector logs: The logs now include information about the operating system on which the collector (jar) is running.

Release version 2.130

Details about the release

Table 96.

Item

Details

Release version

2.130

Release date

7 April, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: 2403a222d94cdcc1a1fbc19c921de986aa74aaf7e9ea7b729ee146d207489012

amd64: 67b534504174f86ce0a611b85e6aebd665e4a19b02008b117d37401c59ab9f4b



Bug fixes

  • Power BI collector: The collector now properly handles escape characters in the directory paths of SharePoint files.

  • BigQuery collector: The release includes an update to how catalog resource IRIs are generated to ensure proper lineage relationships to other systems such as Tableau.

  • Profiling:

    • Profiling properly generates column histograms for string data types. 

    • Profiling properly supports decimal values that are stored in scientific notation format.
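To illustrate what decimal values stored in scientific notation look like, the sketch below parses such strings with Python's `decimal` module. It only demonstrates the number format. It is not the profiler's code, and the helper name and sample values are made up.

```python
from decimal import Decimal


def parse_decimal(text):
    """Parse a decimal string, including scientific notation such as '1.5E+3'."""
    return Decimal(text.strip())


values = ["1.5E+3", "2.50e-2", "42"]
parsed = [parse_decimal(v) for v in values]

# Exponent-form inputs compare equal to their plain decimal equivalents,
# so statistics such as min, max, and mean can be computed on them directly.
print(min(parsed), max(parsed))
```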

Release version 2.129

Details about the release

Table 97.

Item

Details

Release version

2.129

Release date

28 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 13b4a37f5c21531bb282525c7282001857b5959ee6276665f149b4a5ebe7f7d4

  • amd64: 771c0739c3698f8bdfaab1acc97524ad9a6de0813245a6c38cfb5803de9c584d



Bug fixes

  • Microsoft SQL Server collector:

    • Fixed an issue that prevented profiling from working with the collector.

    • Addressed an issue that prevented parsing of some Microsoft SQL Server views to harvest lineage.

  • Monte Carlo collector: Added a page size option to the collector, which helps if a customer runs into timeouts with the current default of 5000. Set the optional --montecarlo-graphql-page-size parameter to use this option.

  • Tableau collector: Made an adjustment to a query in Tableau so that a warning message which previously printed Column null with id... will now show the column name rather than null.

Release version 2.128

Details about the release

Important

This release was for internal improvements and has no customer-impacting changes.

Table 98.

Item

Details

Release version

2.128

Release date

22 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 70328cef174353aaee098316df6324d6c7777805d0e8dca25210976c4083b979

  • amd64: 4568d255e7c9a36ab82ac980ec915decc7beddebb1e3f124a8e5f18bd3515c27



Release version 2.127

Details about the release

Table 99.

Item

Details

Release version

2.127

Release date

21 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 1a2686c8317986cea74500dd1018152614f08937cbbbd4232ff675cc4e3a2100

  • amd64: 3f1a3f175fcfafa3d7dae390850eb7fa2aa9393c9f1196316bce4c80e64b910b



New features and changes

  • The new dbt Cloud collector is now available. Detailed documentation about the collector is available here.

Release version 2.126

Details about the release

Table 100.

Item

Details

Release version

2.126

Release date

17 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: 44d78be268b948226cad5fc41310202e34e30c5313f026380a00554e135ddb27

amd64: f9033e2060fea8f22beb599296800a4b4494592878e0048dd2de59bf7a308321



New features and changes

  • Snowflake collector: The collector now supports profiling for columns with values stored in scientific notation.

Bug fixes

  • Power BI collector: Fixed an issue with tabular files so that the collector properly handles invalid paths and HTTP paths.

Release version 2.125

Details about the release

Table 101.

Item

Details

Release version

2.125

Release date

9 March, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: 37c33410f1e3162b11e0885600f860d3fe9a41790faeed62cf791b4289797703

amd64: 64e38233b4c47fac90e2d68eafa948a4c46fbf7f3504968e28244857298b2a46



Bug fixes

  • Fivetran collector: Updated destination identifiers to match the case for currently supported database types. Specifically, this resolves the duplicate Snowflake resource pages issue.

  • Snowflake collector: Fixed an issue that was causing duplicate Snowflake tag-value pairs.

  • Tableau collector: Updated project filtering to ensure that the collector harvests calculated fields that are referenced in a sheet but were not created in that sheet.

Release version 2.124

Details about the release

Table 102.

Item

Details

Release version

2.124

Release date

21 February, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

arm64: b756d2f91373067746af00d951e117fabcd930d65df2dcb27706ee05689f495c

amd64: 165071142006ba509759d1e5d7fa49a57e9b09ff9d1ce665bf41a6683685d27b



New features and changes

  • Amazon S3 collector: The new Amazon S3 collector harvests metadata about buckets and objects, including the Region, Version State, Size, Last Modified Date, ACL Owner, Grantee, and Grant Permission, among others. For details about this collector, see this documentation.

  • BigQuery collector enhancements:

    • You can now harvest column-level lineage between views and tables, as well as more metadata about datasets, projects, tables, and views.

    • The collector now provides an option to do a test run to validate that the collector can authenticate to the specified source system. This is done by adding the --dry-run parameter while running the collector. If specified, the collector does not actually harvest any metadata, but just checks the connection parameters provided by the user and reports success or failure at connecting.

Bug fixes

  • Postgres, Snowflake, Redshift, Microsoft SQL Server collectors: When parsing view definition SQL to harvest column-level lineage, the collectors now correctly parse SQL in which tables are fully qualified in the FROM clause but not in the SELECT clause.

  • Power BI collector: The Power BI collector has changed the URL used as the dwec:externalUrl property from Power BI's embedUrl to Power BI's webUrl, which now allows the user to open the Report, Dashboard, or Dataset in a browser. Additionally, the collector now harvests the embedUrl from Power BI as a separate property, kos:embedUrl.

  • Snowflake collector: The collector now handles the scenario in which the Snowflake JDBC driver does not provide valid default values for certain database columns.

Release version 2.123

Details about this release

Table 103.

Item

Details

Release version

2.123

Release date

13 February, 2023

Docker image ID

  • arm64: 7e2738ad5f2dae819332ef2f17a1cc34adaa0e3af167bdfa1fd6fedd36520871

  • amd64: ac00e4820508b612a7dbceb865112ad5d9115c258e02842991b0850dc8b4ea89



New features and changes

  • Postgres, Snowflake, Redshift, Microsoft SQL Server collectors: Enhancements have been made to parsing of view definition SQL to harvest column-level lineage. We now support joins on named subqueries and correctly handle quoted identifiers.

Bug fixes

  • Snowflake collector: Fixed an issue where the sampling queries used to calculate column statistics were failing.

Release version 2.122

Details about this release

Table 104.

Item

Details

Release version

2.122

Release date

10 February, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: b50f49f324e6df07864a39b904609b007105eb0378daee89fda1e255fd07a075

  • amd64: 87e2c4d932f4c584c42b87d2842b81cc3e54a7a42a18af5dd400340a4eae62e5



New features and changes

  • Manta collector:

    • The collector now supports Manta version r38.1.

    • The collector now also supports token-based authentication.

  • JDBC collectors: The description of the --jdbc-property parameter for JDBC collectors is updated for clarity.

  • The following additional collectors now provide an option to do a test run to validate that the collector can authenticate to the specified source system. This is done by adding the --dry-run parameter while running the collector. If specified, the collector does not actually harvest any metadata, but just checks the connection parameters provided by the user and reports success or failure at connecting.

    • Tableau collector

  • YAML configuration files used to configure collectors can now interpolate system environment variables and Java system properties. For details about using this feature, see this documentation.
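These notes do not show the interpolation syntax, so the following Python sketch assumes a `${NAME}` placeholder form purely for illustration. It resolves each placeholder from environment variables first and then from a dictionary standing in for Java system properties. The lookup order, the function name, and the error behavior are all assumptions, and the linked documentation remains the authority on the real syntax.

```python
import os
import re

# Hypothetical placeholder form: ${NAME}; dots are allowed so that
# property-style names such as user.home can be matched.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def interpolate(text, system_properties=None, environ=None):
    """Replace ${NAME} placeholders using env vars, then system properties."""
    environ = os.environ if environ is None else environ
    system_properties = system_properties or {}

    def resolve(match):
        name = match.group(1)
        if name in environ:
            return environ[name]
        if name in system_properties:
            return system_properties[name]
        raise KeyError(f"no value for placeholder {name!r}")

    return PLACEHOLDER.sub(resolve, text)


line = "password: ${DB_PASSWORD}"
print(interpolate(line, environ={"DB_PASSWORD": "s3cret"}))
```

Keeping secrets in the environment rather than in the configuration file itself is the usual motivation for this kind of substitution.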

Release version 2.121

Details about this release

Table 105.

Item

Details

Release version

2.121

Release date

2 February, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • amd64: a730cf46ed312e871e5d54c6fc89ab1cd050a8c3ed7f0a0155bcae7c1aab7ac3

  • arm64: 894f34263dd50b1290d5cbbd63849830150815c10a4969ab57c58ed625466ab1



New features and changes

  • Tableau collector: Improved detection of underlying database type when a Tableau data source uses ODBC.

Bug fixes

  • dbt collector: Fixed an issue in the dbt collector that caused coining of IRIs that were inconsistent with IRIs coined by the Snowflake collector, which prevented the linking of database objects between dbt and Snowflake in the catalog. The application now ensures consistency of database object IRIs created by the dbt and Snowflake collectors.

Release version 2.120

Details about this release

Table 106.

Item

Details

Release version

2.120

Release date

27 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • amd64: 78e999b0bcd7493ab9c0c48680483689dd7272efe84d006df74e5f33a17c700d

  • arm64: d26d71eb535ef56bb68cb411292865967b5399a8c49bedb1f5d0e8cda9b8173c



New features and changes

  • BigQuery collector: The collector now harvests catalog resources representing BigQuery datasets and their associated metadata.

  • dbt collector:

    • The collector now supports key pair authentication to Snowflake, allowing users to use a private-public key pair for authenticating to Snowflake.

    • The collector now has improved detection of target database type information when that information is missing in the profiles.yml file.

    • Users can now use the new --snowflake-account CLI parameter to override Snowflake account information from the command line.

    • The help text for the --snowflake-role, --snowflake-warehouse, and --snowflake-application parameters now includes examples and case-sensitivity information.

  • Snowflake collector:

    • The collector now supports key pair authentication allowing users to use a private-public key pair for authenticating to Snowflake.

    • Enhancements to parsing of the Snowflake SQL dialect when harvesting column-level lineage allow for parsing of statements with copy grants.

  • Tableau collector: The help text now includes examples for the --tableau-project and --tableau-exclude parameters.

  • Power BI collector: The help text now includes examples for the --include-workspace and --exclude-workspace parameters.

Release version 2.119

Details about this release

Table 107.

Item

Details

Release version

2.119

Release date

20 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 5a8f9e24ebe05dc027caf74075cac4ce51667271da30935640fc3c9471578445

arm64: af0b4528e0ee097d29d286c29c803db185f616babdb2a867b6228e77efaf1cd5



New features and changes

  • A new Further Help section has been added to the collector help that is displayed with the -H or --help parameter. It directs users to the collector documentation on the data.world documentation site.

  • The collectors now emit a globally unique IRI to track collector runs.

Bug fixes

  • Snowflake collector: Column statistics now support the Number data type.

Release version 2.118

Details about this release

Table 108.

Item

Details

Release version

2.118

Release date

18 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: aca6f9202192bac23a5579f88eb155576e46425ad6b901c3febfbab32ff4158a

  • amd64: 56cbb748ada10f006d41a841034a0a6ba7211085c9682aa32331682ea92d20b0



 

Bug fixes

  • Snowflake collector: Column statistics now support columns with spaces in their names.

  • Tableau collector: Versions 2.113 through 2.117 of the Tableau collector had an issue that prevented the collector from parsing GraphQL queries. If you are using a collector version from 2.113 through 2.117, you must upgrade to 2.118 to use the Tableau collector successfully.

Release version 2.117

Details about this release

Note

This release was for internal improvements and has no customer-impacting changes.

Table 109.

Item

Details

Release version

2.117

Release date

12 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 429a55a11d4bcd15647d1316d9debd9ead4b4ab5c0b9146894d07c39aa814290

  • amd64: 481dd2da6de71525248eba186feeeafcc73cc956ade0a196a4e8b0c2424e74b9



Release version 2.116

Details about this release

Table 110.

Item

Details

Release version

2.116

Release date

10 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 011ebeaf6000b1fdc47f1d3f8cb8a7655cbbe3528b844abe2a2cd9bd9fddc0fe

  • amd64: 192a2b94b6e58016c8e5f7ae871480e6e38fb74214597640f8b862a245d5c629



New features and changes

  • Power BI collector: The following alternate names are now available for some of the command-line parameters:

    • For --include-user-workspace, the alternate parameter is --user-workspace-include.

    • For --include-workspace, the alternate parameter is --workspace-include.

    • For --exclude-workspace, the alternate parameter is --workspace-exclude.

  • BigQuery collector:

    • The collector now harvests additional metadata from projects, datasets, views, and tables available in BigQuery.

    • The collector now harvests column-level lineage between views and tables.

Bug fixes

  • Snowflake collector: Fixed an issue with parsing SQL statements that contain copy grants in views. This improves the column-level lineage harvested by the collector.

Release version 2.115

Details about this release

Table 111.

Item

Details

Release version

2.115

Release date

10 January, 2023

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 693e1661d3ae178c0d5a2bca8e40f406928e91d34b4c1c749f1cce31bf720592

  • amd64: 060e0268ecdfb3d9c7382cce30192c334c1240831edf21681887bc8ee29a33c4



New features and changes

  • The documentation of the --jdbc-property parameter for database collectors now explains how users can specify multiple properties. This change applies to the 19 collectors that include this parameter.

  • A new resource dwec:Source is added to the catalog emitted from database collectors. It is a mechanism that allows users to render specified resource properties as read-only in the data.world catalog UI.

  • Power BI collector: The collector now has enhanced parsing of Power BI transformation expressions. As a result of this change, more column-level lineage information is harvested from Power BI.

  • Snowflake collector: The collector now harvests table usage counts information.

  • dbt collector: User-defined database attributes are now enabled for the dbt collector to fully mitigate a missing or incomplete profiles YAML file when cataloging database objects referenced by dbt.

Release version 2.114

Details about this release

Table 112.

Item

Details

Release version

2.114

Release date

22 December, 2022

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 7202f9ae528a73e8ff7e6a29c36a22c8202680e733000886e760b0a3864b692a

  • amd64: 321d046a526f04b47fee389c3e48222b0b6b6c0d940ff66b938be92d85f59b0



New collectors

  • The Grafana collector is now available as a private beta release for select customers. Please contact data.world if you are interested in using this collector.

New features and changes

  • The following additional collectors now provide an option to do a test run to validate that the collector can authenticate to the specified source system. This is done by adding the --dry-run parameter while running the collector. If specified, the collector does not actually harvest any metadata, but just checks the connection parameters provided by the user and reports success or failure at connecting.

    • Power BI

  • Catalog graphs (.ttl files) that are automatically uploaded to the data.world platform with -u / --upload are now compressed, enabling larger graphs to be uploaded.

  • Power BI collector: The collector now provides a new option --disable-expression-lineage to skip parsing lineage from the source expressions.

  • Snowflake collector: The collector has a new ability to harvest table usage and query count. This functionality is enabled by passing --table-usage-collection. It calculates, for each table in the database being harvested, the percentage of tables in the database that have been queried no fewer times than the subject table. The time period over which this analysis is performed is controlled with option --table-usage-lookback-days (that is, the number of days prior to the time when the collector is being run during which queries of each table are tallied), which defaults to a value of 7.
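The table usage figure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the sample counts are hypothetical, the query counts are taken as already tallied over the lookback window, and the subject table is counted among the tables it is compared against. The collector's actual query is not shown in these notes.

```python
def usage_percentiles(query_counts):
    """Map each table to the percentage of tables queried at least as often.

    `query_counts` maps table name -> number of queries in the lookback
    window. The subject table itself is included in the comparison
    (an assumption, since the notes do not say either way).
    """
    total = len(query_counts)
    result = {}
    for table, count in query_counts.items():
        at_least = sum(1 for other in query_counts.values() if other >= count)
        result[table] = 100.0 * at_least / total
    return result


counts = {"orders": 40, "customers": 10, "audit_log": 0, "products": 10}
for table, pct in usage_percentiles(counts).items():
    print(f"{table}: {pct:.0f}%")
```

Under this reading, a lower percentage indicates a more heavily queried table: only a small share of tables are queried as often as it is.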

Bug fixes

  • Snowflake collector: Fixed an issue with SQL parsing in Snowflake for windowed aggregate functions.

  • Power BI collector: Fixed an issue with the Power BI expression parsing related to joins in source expressions.

  • Looker collector: Fixed an issue in the Looker collector that caused an abnormal termination of the collector run with certain Looker views.

Release version 2.113

Details about this release

Table 113.

Item

Details

Release version

2.113

Release date

12 December, 2022

Docker image ID

(use this to verify the integrity of the Docker image.)

  • arm64: 0517b905198728ba73bc59304ce06dd60ac99c3f7a25ad84569b94bef41eb1c2

  • amd64: b52b36ccf20c00fc0bb16b6abcb01496d55c1f64a8425a23a24f9473de54c9e3



New features and changes

  • The following collectors now provide an option to do a test run to validate that the collector can authenticate to the specified source system. This is done by adding the --dry-run parameter while running the collector. If specified, the collector does not actually harvest any metadata, but just checks the connection parameters provided by the user and reports success or failure at connecting.

    • Databricks

    • Db2

    • Denodo

    • Dremio

    • Generic JDBC

    • Hive

    • Infor Ion

    • MySQL

    • Oracle

    • Presto

    • Salesforce

    • SQL Anywhere

    • Vertica

  • Power BI collector: Updated the Power BI collector to harvest metadata for Dataflows.

  • Databricks collector: Updated the Databricks collector driver version to 2.6.32. Drivers are available here.

  • dbt collector: Updated the dbt collector to harvest meta config (as key-value pairs) for dbt resources.

Release version 2.112

Details about this release

Table 114.

Item

Details

Release version

6 December, 2022

Release date

2.112

Docker image ID (use this to verify the integrity of the Docker image.)

  • arm64: 0921bdcf30a1e28f7a1d5094ff806537bfa023af93d2904bce6c9624e8cde3cf

  • amd64: 80e973a297d89d73d1a3c62d319d11baa23f45bc804e01050868827c60c2ad64



New features and changes

  • Added the following options for the Snowflake, Redshift, PostgreSQL, and Microsoft SQL Server collectors:

    • --dry-run: If specified, the collector does not actually harvest any metadata, but just checks the database connection parameters provided by the user and reports success or failure at connecting.

    • --enable-column-statistics: to enable harvesting of column statistics (i.e., data profiling)

    • --sample-string-values: to enable harvesting of sample values and histograms for columns containing string data

    • --target-sample-size: to control the number of rows sampled for computation of column statistics and string-value histograms

Release version 2.111

Details about this release

Table 115.

Item

Details

Release date

29 November, 2022

Release version

2.111

Docker image ID

(use this to verify the integrity of the Docker image.)

 

  • arm64: 4d4cd1fde0816ae5209b72f92f87c798da83dba5b2f155e3614bc89c68f39b71

  • amd64: bdd7daa56b2a62864c59b8c8958100e12080e61dc8f18559555840bef58c8079



New features and changes

  • data.world now produces images for the arm64 architecture (in addition to amd64). The addition of arm64 means that dwcc images run seamlessly on M1 Macs. As a result of this change, from this release onward two hashes are available per release.

Release version 2.110

Details about this release

Note

This release was for internal improvements and has no customer-impacting changes.

Table 116.

Item

Details

Release date

12 November, 2022

Release version

2.110

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 98583ecda023782df1e08a0f2347a536e239186dcca3936d16c67ae1f6aad0f6



Release version 2.109

Details about this release

Table 117.

Item

Details

Release date

10 November, 2022

Release version

2.109

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 6602f313506e5eb3ea74c296994f7e4d7bd56845c6f2b35e6d1d4cde5f402832



New features and changes

  • Snowflake collector: Fully qualified names of Snowflake policies are now written to the title property instead of the description property.

Release version 2.108

Details about this release

Table 118.

Item

Details

Release date

8 November, 2022

Release version

2.108

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: b81a221abff982a356a21c4430f80da1e4459f0c3ead2d4f4f51a8f1d45c5604



New features and changes

  • The --post-process-sparql CLI option is now available for all remaining collectors (it was previously made available for some collectors in release version 2.107). This option allows the user to pass in a SPARQL query to post-process the catalog graph created by the collector prior to it being written to the filesystem and/or uploaded to the data.world API.

  • BigQuery collector: The option to use the credential file for BigQuery no longer allows use of -c. It must be specified with --credentialFile.

Release version 2.107

Details about this release

Table 119.

Item

Details

Release date

3 November, 2022

Release version

2.107

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 4a33db022e92488914d1f088d4041a31d76883706ae13a88bf1a0e8aa67eaa51



New features and changes

  • Added the --post-process-sparql CLI option to some collectors. This option allows the user to pass in a SPARQL query to post-process the catalog graph created by the collector prior to it being written to the filesystem or uploaded to the data.world API.

Bug fixes

  • Fixed an issue where MS SQL Server database objects referenced from Power BI did not always link to those objects harvested by the SQL Server collector due to mismatched IRIs.

Release version 2.106

Details about this release

Table 120.

Item

Details

Release date

30 October, 2022

Release version

2.106

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 0ac22da04737fbcaaac0da9d076eaf92e3cdd870c85544dd16a556f54a8900a8



New features and changes

  • Google BigQuery collector: The collector is updated to coin IRIs for database objects that align with IRIs coined by other collectors.

  • dbt collector: The collector now writes catalog records for each Snowflake tag and policy.

Bug fixes

  • Fixed a defect in which boolean properties that appeared in the global_options section of a dwcc configuration file were not properly recognized.

Release version 2.105

Details about this release

Table 121.

Item

Details

Release date

28 October, 2022

Release version

2.105

Docker image ID

(use this to verify the integrity of the Docker image.)

amd64: 427443cadbc21a3e26f095a4c054f6193bf3c2b96d257cdb22b57abd061bad68



New features and changes

  • Power BI collector: The collector now supports the ability to include specific workspaces for cataloging via the parameter --include-workspace. The collector continues to allow exclusion of specific workspaces with --exclude-workspace. Use of --include-workspace takes precedence.

  • dbt collector: 

    • The collector now supports harvesting of dbt projects/artifacts that specify Snowflake as the target database.

    • The collector now correctly coins database object (e.g., database, schema, table, column) IRIs that align with IRIs coined by the JDBC collectors. Previously, if the case used for identifiers in dbt artifacts did not match the target database’s default collation, the IRIs would not align (they do now).
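
The interaction between --include-workspace and --exclude-workspace in the Power BI collector can be pictured with the Python sketch below. It shows one plausible reading of "takes precedence" (when any include value is given, the exclude list is ignored); the function name and this exact behavior are assumptions, not the collector's actual implementation.

```python
from typing import Iterable, List


def select_workspaces(
    all_workspaces: Iterable[str],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[str]:
    """Pick the workspaces to catalog from include/exclude lists."""
    include_set, exclude_set = set(include), set(exclude)
    if include_set:
        # Assumed reading of "takes precedence": an include list wins outright.
        return [w for w in all_workspaces if w in include_set]
    # No include list: catalog everything that is not explicitly excluded.
    return [w for w in all_workspaces if w not in exclude_set]


if __name__ == "__main__":
    workspaces = ["Finance", "Sales", "Sandbox"]
    print(select_workspaces(workspaces, include=["Finance"], exclude=["Finance", "Sales"]))
    print(select_workspaces(workspaces, exclude=["Sandbox"]))
```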

All other versions

10-25-22

Collector v2.104

hash: 0b01f8c379e52f3167577a6fd1e5ad2f8d2f3d73871797ae3859b79f83bf5c29

  • Updated the Monte Carlo collector to add a --bigquery-credentials-file option, matching the option of the same name in the dbt collector. The existing --big-query-credentialFile option still works in the Monte Carlo collector; the new option is an alias for it.

  • The Snowflake collector now harvests Tags, Masking Policies, and Row Access Policies, and associates these resources with the database objects to which they apply. Two new CLI options enable this harvesting: --tag-collection and --policy-collection.

10-14-22 Collector v2.103

hash: 0f4f021c4c8fc17c7f47618ef9942e255327eef0d4c749e453a06e1d0e96760b

  • Updated SQL Server collector to harvest intra-database lineage from views.

  • Updated the log messages for missing files that the dbt collector does not require in order to run.

  • Added table name to warning messages in Tableau collector in addition to the table ID.

  • Added parent-child relationship between projects in Tableau collector when the parent project is not included in the filtered projects.

  • Added pagination for certain queries in Tableau to prevent the result hitting the max node limit.

  • Updated automatic catalog upload functionality to accommodate large catalog graphs.

9-30-22 Collector v2.102

hash: 4dd8a1bdc776f0e8eb352954298842867c7873224d658f2abd8faefe31c40a76

  • Updated the Tableau collector to accommodate changes in the Tableau Metadata API that were preventing detection of lineage relationships between Tableau fields and underlying database objects.

  • Updated the AWS Glue collector to handle an error with jobs that have a space or other invalid characters in their paths.

  • Updated the Databricks collector to include the UserAgentEntry property in the JDBC connection.

  • The collectors now write the Collector version to the logs.

  • Added a fix to SQL parsing for window aggregate functions (e.g., SUM(X) OVER (PARTITION BY Y ORDER BY Z ...)).

9-19-22 Collector v2.101

hash: 2be46a6268e34acceedd5b80412787d10f732ad1a2f1ceb83c6d5ce2fe819457

  • Added the ability to filter Tableau fields by project in the Tableau collector.

  • Fixed an intermittent authentication issue associated with harvesting metadata from a single site with Tableau collector.

  • Added a log message for missing job script in AWS Glue collector.

  • Enhanced harvesting of column-level lineage from database views, including handling SQL SELECT statements missing a FROM clause, and updated list of Snowflake keywords passed to functions.

9-9-22 Collector v2.100

hash: 946a0c51c091e74d6043dea1450a1ac818546b040e702e91526da185297a2858

  • Fixed the Fivetran collector so that it doesn't produce "blank" nodes (no id or name)

  • Added a change to use log_level rather than log-level.

9-2-22 Collector v2.99

hash: 68e4c4d6a6b40cb91a8e574a1f106c9c20ba2f1156a93f5f871b0284e975a766

  • Fixed an issue in Power BI with the new metadata API calls.

9-1-22 Collector v2.98

hash: 228090b0af31681952b7ccd5abef9beaf070450692501fef527bff8ca32280cb

  • Added harvesting of column-level lineage in the dbt collector, for dbt projects that target one of the collectors for which intra-database lineage is supported (i.e., Snowflake, Redshift, and PostgreSQL).

8-29-22 Collector v2.97

hash: 0a2134ef29a057b0c003ab353c14f034e65702b298a450f01fafcea5b6e8c1ea

  • PostgreSQL can now be cataloged using either catalog-postgres or catalog-postgresql for the command.

  • Microsoft SQL Server collector now harvests SQL Server extended properties for databases, schemas, tables, and columns.

  • Column-level lineage harvesting in the Snowflake, Redshift, and PostgreSQL collectors now properly harvests lineage from views whose SQL statements include comments starting with “--”, and also from statements with inline subselects.

  • The dbt collector harvests process (activity) and model (agent) metadata using PROV-O qualified derivations.

8-17-22 Collector v2.96 (no 2.95 release)

hash: a09e365296d57965385563569c6c58a6f706da1a4c1c6d711141aabd316d8629

  • Tableau collector now supports multiple --tableau-project options, allowing the user to include multiple projects in the same collector run.

  • Tableau collector no longer associates Custom SQL Table resources as part of a database.

  • The Collector no longer includes a bundled JDBC driver for Salesforce. Please contact the data.world support team for assistance in obtaining an appropriate JDBC driver.

8-15-22 Collector v2.94

hash: ac21b2f728b79e3dff38c2a395a81c0b0b1558979b8385747c6d00b76e1d6724

  • Enhancements to the Tableau Collector for project filtering and additional logging

  • PostgreSQL collector: Added a triple that links each table to its database.

8-2-22 Collector v2.93

hash: afdecd160fd38e3db565cb14db3805fed05fa86b5c3a70662d0c8f0b0d10799f

  • Includes some internal dependency updates.

  • Enhancements to the DBT collector to validate the input profile.yaml file

7-29-22 Collector v2.92

hash: 9b87934376246cd3926bfe413d36f2a7a0f2e7d848d7f2e68380d6035fe276f6

  • Enhancement to the Tableau collector to retry a GraphQL query if it fails.

  • Enhancement to the collectors to check at the beginning of a collector run that the output directory exists (if the -o/--output option is used). If the output directory doesn't exist, the collector logs an error and stops.
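
The output directory check described above amounts to a fail-fast validation before any work starts. A minimal Python sketch of that idea follows; the function name, logger name, and message text are hypothetical and are not taken from the collector.

```python
import logging
import os
from typing import Optional

log = logging.getLogger("collector-sketch")


def output_dir_ok(output_dir: Optional[str]) -> bool:
    """Return True when the run may proceed with the given output directory."""
    if output_dir is None:
        # The output option was not used, so there is nothing to check.
        return True
    if os.path.isdir(output_dir):
        return True
    # Report the problem up front instead of failing after a long harvest.
    log.error("Output directory does not exist: %s", output_dir)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    if not output_dir_ok(os.path.join(os.getcwd(), "catalog-output")):
        raise SystemExit(1)
```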

7-21-22 Collector v2.91

hash: 708b34b19b2695d14d5a74a8281d5365cb659c4eeeebb791b3ebf2aa2e4d6686

  • Enhancement to the Tableau collector to reauthenticate if the Tableau API reports failed authorization during the collector run.

  • Released the new Fivetran collector (catalog-fivetran)

7-12-22 Collector v2.90

hash: bf94b0431b5a99dd95f485a8a48f202ea138f103b54873b4440b5080d86d529a

  • Added the parameter --include-information-schema for Snowflake and SQL Server collectors; we no longer catalog the information schema in these collectors when --all-schemas is specified, unless the user also specifies --include-information-schema.

  • Improved handling of manifest json structures with some nulls in the dbt collector.

  • Added reporting on user access issues during parsing/resolution for Snowflake collector.
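
The effect of --include-information-schema together with --all-schemas can be sketched in Python as below. The function and parameter names are hypothetical, and matching the schema name INFORMATION_SCHEMA case-insensitively is an assumption made for the illustration.

```python
from typing import Iterable, List


def schemas_to_catalog(
    available: Iterable[str],
    all_schemas: bool = False,
    requested: Iterable[str] = (),
    include_information_schema: bool = False,
) -> List[str]:
    """Choose which schemas to harvest from the ones the database reports."""
    if all_schemas:
        selected = list(available)
        if not include_information_schema:
            # With all schemas requested, the information schema is skipped
            # unless the user opts in explicitly.
            selected = [s for s in selected if s.upper() != "INFORMATION_SCHEMA"]
        return selected
    wanted = {s.upper() for s in requested}
    return [s for s in available if s.upper() in wanted]


if __name__ == "__main__":
    found = ["PUBLIC", "SALES", "INFORMATION_SCHEMA"]
    print(schemas_to_catalog(found, all_schemas=True))
    print(schemas_to_catalog(found, all_schemas=True, include_information_schema=True))
```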

7-9-22 Collector v2.89

hash: 848e38708b832c703652dc45d148e471a2341bce7f6ec159c2471f287a8d3620

  • Updated the Tableau collector to print a clear log message when authentication expires during a collector run.

  • Updated the Tableau collector to allow optimized serialization of API requests under JDK 17.

7-1-22 Collector v2.88

hash: 8534190cb3f0f93bd2a326abd54086e89eb38ece8180bf0487486dc66242d6c8

  • Significant updates to the Power BI collector. As of this release, the Power BI collector outputs different classes than the previous version did. The collector now emits information about where it is sourcing its data.

  • Internal developer and testing improvements

  • The MANTA collector is now more specific about the concepts that it emits about Informatica PC.

6-24-22 Collector v2.87

hash: 7860e33213ba90783851cd7f7e6529ee99a5f261ae086d3a7038938c6f290ae6

  • The information schema collector now explicitly supports Oracle.

  • We have added enhancements to the dbt collector to harvest dbt snapshots and sources.

6-22-22 Collector v2.86

hash: 6fdae2dd70896e402ca648701bcd48210a8fd5979230c958b0dc06030bd7b1ec 

  • For collectors that take API endpoint URLs, the data.world Collector now adds a trailing slash to the URL if one is needed and the user did not supply it.

  • New command-line option --warehouse available for the Snowflake collector that allows the user to specify which warehouse to use to connect to Snowflake.
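
The trailing-slash handling for API endpoint URLs can be illustrated with a few lines of Python. This is only a sketch of the general idea; the function name and the example host are hypothetical, and the collector's own rules for deciding when a slash is "needed" are not documented here.

```python
def ensure_trailing_slash(url: str) -> str:
    """Return the URL unchanged if it already ends with '/', else append one."""
    return url if url.endswith("/") else url + "/"


if __name__ == "__main__":
    # Hypothetical endpoint used only for illustration.
    print(ensure_trailing_slash("https://tableau.example.com/api/3.10"))
    print(ensure_trailing_slash("https://tableau.example.com/api/3.10/"))
```

Normalizing the base URL once means later path segments can be appended without checking for a missing or doubled slash each time.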

6-18-22 Collector v2.85

hash: aaa6e55bf19af7ef37f1ab80ad28522af77a6ff286ef616085d92ab51f7d7899

  • Added a dbt collector to the data.world Collector. The legacy dbt collector is still available.

  • Fixed an issue with auto-uploaded log files, in which not all log messages were being written out.

6-15-22 Collector v2.84

hash: 2e128cd3c89ffc8c35fbad12f6ee4ba7e6e5cdf9bfcf991fac78e8033d5d17d0

  • Looker collector now emits resources for Looker Views and relates the Looker Dimensions and Looker Measures that are configured within those Views.

  • Improved handling of unexpected database types encountered when cataloging Tableau.

6-3-22 Collector v2.83

hash: 9487027423a076231cec76f5679f044493a3d75032882c4ca0e5cf1c0304e6cf

  • Further improvements in handling of SQL ORDER BY, GROUP BY, WHERE, and HAVING clauses when harvesting intra-database lineage from database views.

6-2-22 Collector v2.82

hash: 22459e3d3a2a38f448d4e56137ed4ecd05170767b5a682dd4870135cceff23c2

  • Corrected coining of IRIs in catalog graphs emitted by the Tableau collector.

  • Improved logging in the Tableau collector to detect unknown linked database types.

  • Improved harvesting of lineage between database views and referenced columns, including support for columns in SQL ORDER BY, HAVING, GROUP BY, and WHERE clauses, and parsing of a wider range of column expressions.

the data.world Collector v2.81 - INTERNAL RELEASE

5-24-22 the data.world Collector v2.80

Hash: be5a85c754d54328accabe332dec55ce507baddbe68d2fe9e29a211e9ea1420f

  • With this release, the data.world Collector now requires Java 17. If you run the collector from within Docker, this change will not affect you. If you run the data.world Collector from a .jar file, you will need to upgrade your JRE to 17 to run the data.world Collector v2.80 and later.

  • Added the parameter --disable-lineage-collection, which lets users turn off lineage cataloging for PostgreSQL, Redshift, MS SQL Server, and Snowflake.

5-13-22 the data.world Collector v2.79

Hash: 5b548c82b96ad5e5dbd4770adff205c9d07cac3c5f949882d7d9381240366ddb

  • The Manta collector can now accept OAuth tokens for MANTA authentication (for harvesting metadata from MANTA version R35 and above).

  • We have released a new collector, powerbigov, that connects to the government Power BI API URLs. It authenticates with a tenant ID only and does not accept a username and password.

5-11-22 the data.world Collector v2.78

Hash: 71edd8ff7a4c3ed8a91eaf36d59c8e2745b7a76f8666b5750cbee8205021c9c6

  • Added some small Tableau collector enhancements.

  • New PowerBiGov collector with specific endpoints for .gov customers. This collector does not accept a username or password.

  • For PowerBI, a new way to authenticate is available. A user can now enter a tenant ID with a client id and a client secret to authenticate, in addition to using a username and password.

  • For both PowerBI and PowerBiGov, when the tenant ID, client ID, and client secret authentication method is used, the collector no longer emits information about PowerBI Apps.

4-27-22 the data.world Collector v2.77

hash: 4bed848791cfa9e46c9db4a78c7a593bb1c986900dc6fcfcd4255ddce1528579

  • Fixes an issue with the Snowflake collector that prevented the bundled JDBC driver from being found. Any users working with the data.world Collector 2.76 should update.

4-22-22 the data.world Collector v2.76

hash: 30e60a4434ee64d2981b40eb2dc92506da3d367eab22bc0bca0c61bdd44a3f02

  • The Snowflake collector harvests some intra-database lineage information from database views.

  • Improved the host mapping in the Manta collector.

4-7-22 the data.world Collector v2.75

hash: 1a59dbb3ff8679fb6ee22eadaeb04ccdb28c5660be029e78fbc96403ae33096f

  • The Manta collector now emits resources for file sources and targets and their directory structure. It also emits sources and targets as files.

4-1-22 the data.world Collector v2.74

hash: 219428f6a72be91205408d5cb3f8cc8b27e1a9a4df0208e4cacb8fbaa1352f90

  • The Tableau collector now emits “column-level lineage”.

  • Improved styling of the data.world Collector command-line errors

  • Updated command-line options for Datakin and Marquez.

3-16-22 dbt collector v.05

This version adds a third command-line argument to specify an output file name.

3-8-22 the data.world Collector v2.73

hash: 119daf987dcfad25db599e1c1affedf17a35ff2aa002d0618d642eb309cebaaf

  • Permalinks to Looker explores included via externalUrl

  • Improvements to datakin/marquez collectors

  • Tableau collector now emits resources for Tableau Projects, allowing us to establish full relationships between projects and the workbooks and views that they contain

  • Monte Carlo data collector now emits data quality information using enhanced dwec ontology concepts

  • Looker collector now emits descriptions for measures and dimensions

  • MANTA collector now emits Snowflake resources found in MANTA scans

3-1-22 the data.world Collector v2.72

Hash: 62d156aca58ec92513e8d6490f00fd10ee52dfb7a65f71c20c6a988c938dfddd

  • [BUGFIX] Invalid prefix when using --base option

  • Update the data.world Collector transform to add catalog events to specific collectors

  • Added a Snowflake Sensitive Data Discovery collector

  • Sync CLI options between collector types

  • Validation of CLI options for the data.world Collector

  • Improvements to the data.world Collector CLI

  • Update the MonteCarlo Collector to use the new Data Quality Ontology

2-17-22 the data.world Collector v2.71

Digest: 03fc3df90ae63896d62ea22e00688f42cacf5b76d0f47691c06c104736680b2a

  • Bug fix for Marquez collector

  • Bug fix for Manta collector

2-9-22 the data.world Collector v2.70

Digest: 06bb747c4d7705c1e44664de7854158d87468316bab549ec5604b0a075380c69

  • Preview images for Tableau assets are now harvested much more efficiently, and the resulting image data in the catalog graph are much smaller, reducing catalog harvest run time and enabling image objects to remain within platform constraints during ingest.

  • Fix for unexpected column type errors in BigQuery collector

2-8-22 the data.world Collector v2.69

Digest: 5ab9b97d5f8f4568613438a9e52b0bdc12974f8d6edd0dab374a281c4982c737

  • Created new collectors for Marquez and Datakin

  • Added schema information to the Tableau collector outputs

2-4-22 the data.world Collector v2.68

Digest: 23674ee02a6b725d5f9a453615dc507286da2ee606dca83c386472f3aa36d118

  • The Tableau collector now accepts Tableau “Personal Access Tokens” for authentication, via the new CLI options --tableau-pat-name and --tableau-pat-secret.

  • Fixed an issue with mis-identification of views as tables in BigQuery.

2-2-22 the data.world Collector v2.67

Digest: 032867c9c52c8d46dc0b90a61a128be65ecec1440bb0adccb8b0d1b249b4e351

  • Fixed an issue with server name identification in Manta.

1-26-22 the data.world Collector v2.66

Digest: fa9ae2eb3d68375a3ff01ac7bde98fd36f372b84dce0d411444146ea9566b47b

  • With this release the Athena collector is no longer a JDBC collector. We now harvest metadata by accessing the Athena API directly, rather than going through a JDBC driver. This means that it is no longer necessary to provide a JDBC driver when running the collector.

1-10-22 the data.world Collector v2.65

Digest: ed08cdd21a374c30456de0989076f5180bc4187ca998358b051807e521fd44e6

  • This release adds a new option for the MANTA collector, --manta-max-parallel-scenarios. Specifying this option and passing an integer value will configure the MANTA API to export the specified number of scenarios in the MANTA graph in parallel. The default value is 4; adjusting this up or down can improve performance.
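
To give a feel for what a parallelism limit such as --manta-max-parallel-scenarios controls, here is a generic Python sketch of bounded parallel work using a thread pool. It illustrates the concept only: the actual parallel export happens inside MANTA, and export_all and export_one are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")
R = TypeVar("R")

DEFAULT_MAX_PARALLEL = 4  # default value stated in the release note


def export_all(
    scenarios: Iterable[S],
    export_one: Callable[[S], R],
    max_parallel: int = DEFAULT_MAX_PARALLEL,
) -> List[R]:
    """Run export_one over all scenarios, at most max_parallel at a time."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be a positive integer")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves the input order of the results.
        return list(pool.map(export_one, scenarios))


if __name__ == "__main__":
    print(export_all(["s1", "s2", "s3"], lambda name: name.upper(), max_parallel=2))
```

A higher limit can shorten total run time when the server has spare capacity, while a lower limit reduces load on it, which is why tuning in either direction can help.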

1-5-22 the data.world Collector v2.64

Digest: 45b72798b0602885790388331a75db1f4286b15bf57b21f30f416eda79041571 

This release upgrades the data.world Collector's dependency on the Apache Jena RDF library to version 4.2.0, which addresses security vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-39239.

12-23-21 the data.world Collector v2.63

Digest: sha256:eb4208c914269c793a5e2143d59a9982e7b087c5da1c17dd075e02a326e64a3e

  • The Athena JDBC driver is no longer bundled with the data.world Collector, as we have discovered that the Athena driver itself has a dependency on a vulnerable version of log4j. Customers that use the Athena collector will now need to supply their own driver and put it in the JDBC driver directory (as is done with other collectors for which we don’t distribute a driver).

12-15-21 the data.world Collector v2.62

Digest: sha256:2cd579e09f4eee94e141e8cf7e4e40e9a9b8803029df1be7112d67d62ef33b9e

  • The Oracle collector now supports connecting to the database via SID (instance ID) or Service Name. Service Name is the default. If a connection via SID is desired, pass the SID as the value of the -d/--database option and add the --oracle-sid-mode option (flag).

12-13-21 the data.world Collector v2.61
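
For background on the SID versus Service Name distinction, the Oracle thin JDBC driver uses a different URL shape for each. The Python sketch below builds both forms; how the collector itself assembles its connection string is not documented here, so the function and its parameters are hypothetical.

```python
def oracle_jdbc_url(host: str, port: int, database: str, sid_mode: bool = False) -> str:
    """Build an Oracle thin-driver JDBC URL for a service name or a SID."""
    if sid_mode:
        # SID form: host, port, and SID separated by colons.
        return f"jdbc:oracle:thin:@{host}:{port}:{database}"
    # Service name form (treated as the default here): '//' prefix, '/' before the name.
    return f"jdbc:oracle:thin:@//{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical host and names, for illustration only.
    print(oracle_jdbc_url("db.example.com", 1521, "ORCLPDB1"))
    print(oracle_jdbc_url("db.example.com", 1521, "ORCL", sid_mode=True))
```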

Digest: sha256:bd0ba96208d714ecef4131867cf5d16372be0a33f416c1d6bd01f132c8517323

  • The information schema collector has been modified so that the files table_constraints.csv and constraint_column_usage are now optional, not required.

12-10-21 the data.world Collector v2.60

Digest: sha256:7fd825bfe7d2f99c9a1298ad26bc1934c9657cc7c5868dd093844344d18fc7b7

  • Updated the BigQuery collector to support current Google Cloud API enhancements.

  • Added a new Information Schema Collector. This collector runs via the catalog-information-schema command. Notably, it catalogs four CSV files that are provided to the collector via a --csv-file-directory parameter rather than connecting to a database. This collector is an option for customers with tricky DB setups that do not allow them to authenticate or establish connections to their DB via our standard collectors.

12-2-21 the data.world Collector v2.59

Digest: sha256:051f76748be1c6cf2c7557600dde71a39e1b822c9e49120881ce938f1c8c2b80

  • Verified the Manta collector works with MANTA R34.

  • Released the config file command.

  • Modified the Tableau collector to remove schema and database names from table names.

  • Updated the BigQuery collector to catalog all datasets in a project at once by default, and to allow CLI options to select specific datasets in a project as well. With this change, the --dataset parameter is no longer required. The help text has been updated to reflect these changes.

11-10-21 the data.world Collector v2.58

Digest: sha256:82ebc1cec46f70de000aa94695359bd28d65c2782afc362c9ce14fadc04eae07

  • Added a new collector for Hive (as an alternative to catalog-hive) that uses only the Hive metastore; it does not connect to the Hive server directly.

  • The PowerBI collector now harvests workspaces and identifies other assets as being in workspaces.

  • The data.world Collector now emits “catalog events” into the catalog graph. These capture details about the cataloging process itself, including selected configuration options with which the Collector was run, and summary statistics about the catalog. The ingest process will soon extract this information from catalogs at ingest time and send it to Segment for downstream analysis.

11-1-21 the data.world Collector v2.57

Digest: sha256:606f7cfbe60bf56b4c2ecd5fb3902d4de621e31ae76ad78e68c56c788f81e5e6

  • Fixed an issue in the Tableau collector in which Custom SQL Table objects without an associated database were not handled correctly.

10-27-21 the data.world Collector v2.56

Digest: sha256:335f7e110a9506d95dff05971492e6509fb8537e74f9275d04dcf9e2427df0f0

  • Added new CLI options to the Salesforce collector so that it can handle sandbox environments and any custom login domains that customers might have.

10-25-21 the data.world Collector v2.55

Digest: sha256:c60ae69edc88b8801be833d578ef5dca73b6302646be9b30d31ccdfd7444288a

  • This release updates the BigQuery collector to handle fields in BigQuery tables for which the BigQuery API returns null type.

10-5-21 the data.world Collector v2.53

Digest: sha256:59c960d525e66e77d08dd34fd58c9b5027334a4bd2271f1f059370ae006a4b0b

  • Enhancements to the MANTA collector to harvest additional lineage information from MANTA scans (lineage from Informatica PowerCenter in particular)

  • Tableau collector enhancement to provide a better warning to the user when an obsolete version of the Tableau API is specified

9-29-21 the data.world Collector v2.52

Digest: sha256:915e4e91841001f80a84a65fcd76350b9a1d53f4e31678bb0e628d32beab94a1

  • Fixed an issue with the handling of certain fields and database information when the Tableau collector was run with a non-admin credential.

9-28-21 the data.world Collector v2.51 (internal)

Digest: sha256:261c5bf33b2ae38cbda35a346fcb37c56bbf8ebfb773f328deb9140efba1c8bf

  • Fixed an issue in the Tableau collector's handling of views and workbooks that exist outside of a project.

9-28-21 the data.world Collector v2.50 (internal)

Digest: sha256:b407c629247f36afac3869eb8320464fce8caeb2865dd79811882b54ef94d1b5

  • Fixed an issue with the Tableau collector to handle workbooks that exist outside of projects.

9-24-21 the data.world Collector v2.49

Digest: sha256:397e78867f41aaa393ff69f42b0fa524fdcad662ddd027925cf27f80497b24ce 

  • Added a collector for Salesforce (catalog-salesforce).

  • Fixed an IRI mismatch issue in the Tableau collector when running on Tableau instances with a Snowflake data source.

9-18-21 the data.world Collector v2.48

Digest: sha256:c36755489b6235408aa4e639e6e184cab027a32a34e3b8ca369c3c6b3c4bff96

  • Made internal improvements to the Tableau collector to enable more efficient querying of the Tableau Metadata API.

  • Fixed an issue in the MANTA collector in which certain missing data in the MANTA lineage graph caused an exception.

9-10-21 the data.world Collector v2.47

Digest: sha256:219edfa247929e15d7c4e2be99ef890b2487c398abc1a23b2f85b3de11812be3

  • Fixed an issue in the Reltio collector that occurred when a Reltio configuration was missing certain objects.

  • Added a collector for Databricks (catalog-databricks).

9-8-21 the data.world Collector v2.46

Digest: sha256:e48cba45b457e076714d94d3a83d1164cb892864213732b3b2b334c041ff178a

  • Fixed an issue with creation of resource IRIs by certain collectors when the user chooses version 1 minting

  • Updated BigQuery collector to enable integration with data.world platform / connection manager

  • Fixed an issue with the MANTA collector in which certain large MANTA scans caused a numeric overflow during json de-serialization

  • Updated Reltio collector to include information about survivorship groups in the emitted catalog

8-24-21 the data.world Collector v2.45

Digest: sha256:77f4c784b1d0166cf3bb87903696528f712fbe6aee1d4cb7e60097a0f494c7de

  • This release fixed an issue with JDBC drivers not being loaded by the Athena collector.

  • Added a collector for Reltio configurations (catalog-reltio).

the data.world Collector v2.44

Digest: sha256:47c1bb38b88c25801adf1f765e23c63637d15a60ae11fca8d63b53a8cd4755b2

  • Fixes an issue with URLs for sheets and dashboards that exist in Tableau Online or in Tableau Server within a site other than the default site.

the data.world Collector v2.43

Digest: sha256:696deaad59d2948a6adf3c275a90539cbf87057c93de9ee94d911fe105c574ce

  • Additional datetime fields added for Looker objects and typed as xsd:dateTime.

  • Fixed an issue caused by an undocumented change in Tableau Online’s REST API when using the Tableau collector to harvest metadata from Tableau Online.

the data.world Collector v2.42

Digest: sha256:e6bc353ea4b2ec3486b54d4e9280856d328d93f5d406e367c0c50303cde93704

  • The generic JDBC collector now harvests the database name when cataloging InterSystems Caché databases.

  • Running the Snowflake collector with the -A/--all-schemas option now harvests metadata from all available schemas, as with other collectors.

the data.world Collector v2.41

Digest: sha256:bb79aa8afd19bf35b4b7e75840c21598702ec1d74b5f8640cc72a6758a3a0bc9

  • Fixed an issue with permalinks to objects in the MANTA collector.

the data.world Collector v2.40

Digest: sha256:44dd710a49a1500863f49e2f2e4ef261a45cdc6c7354702fe8e764210c27293b

  • Added support for Looker folders and additional attributes to the Looker metadata collector.

  • Added the ability to preview images to the Tableau metadata collector.

the data.world Collector v2.39

Digest: sha256:992671530f7483bfeb8a2aab52880a524b7df79caf427b373bd825115d71f4dc

  • Fixed an issue with the handling of certain special characters in catalog resource IRIs.

  • The --schema option for JDBC collectors can now be specified multiple times to enable the cataloging of multiple schemas in a single catalog.
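
An option that "can be specified multiple times" is usually modeled as a list that accumulates one value per occurrence. The Python sketch below shows that pattern with argparse; the collector itself is a Java application, so this is an analogy rather than its actual option parser, and the schema names are made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with a repeatable --schema option, collected into a list."""
    parser = argparse.ArgumentParser(prog="collector-sketch")
    parser.add_argument(
        "--schema",
        action="append",  # each occurrence appends one value
        default=[],
        help="schema to catalog; may be given more than once",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--schema", "sales", "--schema", "hr"])
    for schema in args.schema:
        print("would catalog schema:", schema)
```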

the data.world Collector v2.38

Internal release

the data.world Collector v2.37

Digest: sha256:6a84217fa33df75d67ce51c486a90a802a8313a3432835abb55fffb5f1d3afc7

  • Updated the Tableau collector to paginate additional GraphQL queries to avoid hitting Tableau Metadata API limits.

  • Updated the Hive2 collector to capture table-level metadata from the Hive metastore.

  • Updated the Tableau collector to allow the user to exclude specified Tableau objects from the catalog.

the data.world Collector v2.36

Digest: sha256:8dd9793f3b0e74adcd7e7bc153f06b8c3098470217fb07af4336dde611269671

  • Improvements to error messages produced when using a config-file to run the data.world Collector

  • Running catalog-postgres and catalog-redshift in the same config file is now disallowed, because the two collectors use incompatible JDBC drivers

  • Improved error handling throughout the data.world Collector

  • Improvements in representation of Tableau data source names in Tableau catalogs

  • Improvements to the MANTA collector

the data.world Collector v2.35 Changes in this release:

  • Upgrade of Denodo collector to Denodo 8

  • Handle edge case of very large field values embedded in MANTA’s exported artifacts

  • Support for sites

  • Handle edge case of stored procedure columns in MANTA

the data.world Collector v2.34 This release includes:

  • Enhancements to Domo collector output

  • Testing improvements

  • A minor Tableau collector enhancement

  • Fix for an issue in the Tableau collector in which column fields sometimes did not properly identify the Tableau Table from which they sourced their data

  • Improvement to the presentation of Domo catalogs in the platform UI.

  • Changes to the dockerhub repository where we house images containing non-released versions of the data.world Collector. Previously we were calling these “beta” releases; we now call them “release candidates”. The new repository is datadotworld/dwcc-rc and the image tags are x.y-rc-z where x.y is the next expected Collector release, and z is an increment.
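
The x.y-rc-z tag scheme for release candidates lends itself to simple parsing, for example to pick the newest candidate from a list of tags. The Python sketch below assumes all three parts are plain decimal integers; the function name and the sample tags are hypothetical.

```python
import re
from typing import Tuple

# x.y is the next expected Collector release; z is the increment.
RC_TAG = re.compile(r"^(\d+)\.(\d+)-rc-(\d+)$")


def parse_rc_tag(tag: str) -> Tuple[int, int, int]:
    """Split a release-candidate tag into (major, minor, increment)."""
    match = RC_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {tag!r}")
    major, minor, increment = (int(group) for group in match.groups())
    return major, minor, increment


if __name__ == "__main__":
    tags = ["2.35-rc-2", "2.35-rc-10", "2.34-rc-7"]  # made-up examples
    # Sorting on the parsed tuple orders rc-10 after rc-2, unlike a string sort.
    print(max(tags, key=parse_rc_tag))
```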

the data.world Collector v2.33 Adds support for harvesting intra-database lineage from MANTA scans, and accommodates changes in MANTA R32 (aka 1.32). We no longer support MANTA versions earlier than MANTA R32.

the data.world Collector v2.32 This release adds collector support for Vertica databases.

the data.world Collector v2.31 Issued fix to ensure alignment of identifiers for databases referenced by Tableau and Looker collectors.

the data.world Collector v2.30 Added config file-driven configuration (as a hidden feature for now). Issued a fix for handling empty PowerBI objects returned by the API.

the data.world Collector v2.29 The data.world catalog collector now supports Tableau Online! Additionally there was a bugfix for PowerBi.

the data.world Collector v2.28 Bugfix release

the data.world Collector v2.27 Added the optional CLI option tableau-graphql-page-size to the Tableau collector, which allows the user to set the number of objects included in each page of paginated queries.
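
To show what a page-size setting controls, here is a generic Python sketch of a paginated fetch loop. It uses simple offset-based paging as a stand-in; the Tableau Metadata API's own pagination mechanism may differ, and fetch_all and fetch_page are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_all(fetch_page: Callable[[int, int], List[T]], page_size: int) -> List[T]:
    """Collect every item by requesting page_size items at a time.

    fetch_page(offset, limit) must return at most `limit` items starting
    at `offset`, and a short (or empty) page once the data is exhausted.
    """
    if page_size < 1:
        raise ValueError("page_size must be a positive integer")
    items: List[T] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:
            # A short page means there is nothing left to request.
            return items
        offset += page_size


if __name__ == "__main__":
    data = list(range(7))
    print(fetch_all(lambda offset, limit: data[offset:offset + limit], page_size=3))
```

A smaller page size means more requests but a smaller result per request, which is the trade-off such an option lets the user tune.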

the data.world Collector v2.26 Updated the PowerBi collector so that if a report is unavailable via the API it will be logged, and cataloging will continue on the rest of the repository.

the data.world Collector v2.25 This release includes better and more user-friendly error handling and reporting. We have also added an enhanced collection of Tableau metadata via the Tableau Metadata API (graphql endpoint). New metadata includes data sources, databases, fields, metrics, and many more inter-object relationships.

the data.world Collector v2.24 The data.world Collector is now distributed via Dockerhub. Additionally, there are changes to the Tableau and PowerBI collectors, the ability to change the level of error messages written to the console and log file, and a new subcommand to display the data.world Collector license text.

For Tableau:

  • The Tableau collector now emits RDF in which the object of `dct:creator` is a `dwec:Agent` instead of a string literal. This means we write additional details about the Tableau account that created the dashboard, via properties of the `dwec:Agent` resource. These details include: account name, account “full name”, and account email address (if they are populated in Tableau).

For PowerBI:

  • The PowerBI collector now writes resources representing PowerBI “data sources” as instances of a PowerBI-specific class, rather than `dwec:DataArtifact`.

Logging changes:

  • It is now possible for users to set the level (severity) of log messages written to the console and log file. By default, we write “info” level messages; users can choose to write only errors (level=“ERROR”), errors+warnings (level=“WARN”), or all messages including debug trace (level=“DEBUG”). This is useful if we want to have customers run the data.world Collector with debug logging turned on, for troubleshooting problems etc.
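
The level names listed above behave like a severity threshold: a message is written only if its severity is at or above the configured level. The Python sketch below mirrors that with the standard logging module; the mapping and the function names are illustrative assumptions, not the collector's implementation.

```python
import logging

# Level names from the release note mapped onto Python's logging levels.
LEVELS = {
    "ERROR": logging.ERROR,
    "WARN": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def resolve_level(name: str = "INFO") -> int:
    """Translate a level name into a numeric threshold (default: INFO)."""
    try:
        return LEVELS[name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {name!r}") from None


def would_log(configured: str, message_level: str) -> bool:
    """Return True when a message at message_level passes the threshold."""
    return resolve_level(message_level) >= resolve_level(configured)


if __name__ == "__main__":
    for level in ("ERROR", "WARN", "INFO", "DEBUG"):
        print(level, "shown at WARN:", would_log("WARN", level))
```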

Display the data.world Collector license information:

  • License information for the data.world Collector is now available as a subcommand of the data.world Collector. To get all licensing information, run the command docker run -it --rm datadotworld/dwcc:X.XX display-license where X.XX is a version of the data.world Collector greater than or equal to 2.24.

the data.world Collector v2.23 Internal release

the data.world Collector v2.22 Internal release

the data.world Collector v2.21 Fixed some timeout issues with the Looker collector when fetching images from the Looker API. Fixed an issue with cataloging reports and dashboards based on user workspace permissions in PowerBi.

the data.world Collector v2.20 With this release our Tableau collector now supports cataloging of workbooks and non-dashboard views as well as harvesting tags on workbooks and views. Fixed an issue in the Looker collector where preview images returned from the Looker API were missing.

the data.world Collector v2.19 Includes a clean-up of the embedded help commands for several collectors and:

  • Fixes an issue with the Tableau Server collector when cataloging multi-site server instances.

  • Adds --tableau-site parameter to enable user to restrict cataloging to a single site (not required, by default all sites in the instance are scanned). Value provided to --tableau-site can be a site ID or name.

In the data.world Collector v2.18, the Tableau collector has a new flag, --tableau-skip-images, which skips the harvesting of preview images for views. Example usage:

... catalog-tableau --tableau-api-base-url=http://ec2-44-192-86-11.compute-1.amazonaws.com/api/3.10/ --tableau-username=admin --tableau-password=password -a sc-test3 -n tableau-test --tableau-skip-images

The data.world Collector v2.17 adds a collector for Presto.

The data.world Collector v2.16 release:

  • Adds the parameter --all-databases to the Athena collector so that it can catalog all the databases accessible from the logged-in account.

  • Fixes some issues with datatypes for dwec:externalUrl predicates.

The data.world Collector v2.15 release contains the following:

  • The Tableau collector formerly had a CLI parameter, --tableau-project-id, which could be used to catalog only assets in the project with the specified ID. The parameter is now --tableau-project and takes either a project ID or a project name.

  • Updates the MANTA collector to accommodate a minor change in the MANTA API in v1.31. Customers who have updated their MANTA instance to v1.31 or later should use the data.world Collector 2.15 or later.

  • The Looker collector now works for non-admin Looker users; however, when the data.world Collector is run by a non-admin, the emitted catalog will not contain any information about databases used by Looker analysis assets (access to database information in Looker requires admin permissions).

  • All JDBC collectors now populate two new properties for dwec:DatabaseColumn: dwec:columnDefaultValue and dwec:columnIsNullable, which contain the default value for that column in newly inserted rows and whether the column can be null, respectively. (Note that only some databases and drivers provide this metadata; it is added to the catalog when it is available.)

The data.world Collector v2.14 adds a collector for Looker. It also includes a minor update to the docker-save.sh script, which now lists the available versions in the error message if you don’t supply a version.

The data.world Collector v2.13 adds CLI parameters that make it possible to pass arbitrary driver properties through to the connection.

The data.world Collector v2.12 adds a metadata collector for SAP (formerly Sybase) SQL Anywhere.

The data.world Collector v2.11 improves the Dremio collector’s handling of data sources nested within multiple layers of folders, and fixes a minor issue with the Dremio collector’s harvesting of lineage metadata from the Dremio graph API.

The data.world Collector v2.10 adds a collector for Domo. In addition, JDBC database collectors can now catalog all schemas in the database at once (the default remains to catalog only the user's default schema).

The data.world Collector v2.9 adds a Tableau Server collector and extends the OpenAPI collector to include a few additional metadata properties for schema properties.

The data.world Collector v2.8 adds an Infor ION data lake collector and optimizes the collection of JDBC metadata (a performance improvement).

The data.world Collector v2.7 adds a collector for PowerBI.

The data.world Collector v2.6 adds the MANTA collector.

The data.world Collector v2.5 upgrades the Java runtime.

The data.world Collector v2.4 extends handling of OpenAPI collector parameters and responses.

The data.world Collector v2.3 adds the OpenAPI (formerly known as Swagger) collector.

The data.world Collector v2.2 is a refactoring release.

The data.world Collector v2.1 fixes an issue with the port in the Denodo cataloger JDBC URL.

As of the data.world Collector v2.0, we use v2 URIs as the official locator IDs for metadata resources. This is a breaking change (for structural, intentional reasons) and is not backwards compatible with v1 URIs. For more information, see the article on the data.world Collector v2.X.

The data.world Collector v1.20 addresses some memory issues and open-cursor leaks.

The data.world Collector v1.19 writes statements to the catalog graph indicating that the catalog was generated by the data.world Collector (with a version). We also added the ability to write database schema objects to the catalog graph.

The data.world Collector v1.18 allows you to specify alternate organization permissions and upload locations when performing an automatic upload of the metadata.

The data.world Collector v1.16 and v1.17 address issues with the SQL Server cataloger.

The data.world Collector v1.15 adds Dremio support with optional Catalog API lineage fetching.

The data.world Collector v1.14 enables you to change the amount of memory allocated to a data.world Collector Docker process. See our article on allocating additional memory to Docker for more information.

The data.world Collector v1.13 adds support for Microsoft SQL Server and enables the JVM to use the available memory in the container (useful for creating large catalogs). Additionally, it improves data type recognition in the AWS Glue cataloger.

As of the data.world Collector v1.12, we support not only Glue ETL jobs but also Glue Data Catalog tables and columns.

With the data.world Collector v1.11, you can:

  • Upload generated catalogs via the --upload / -U command-line parameters

  • Upload the data.world Collector log when uploading generated catalogs with --upload

  • Fetch an organization's current catalog with the fetch-catalog command

In the data.world Collector v1.10, we added support for AWS Glue and AWS Athena, including cataloging ETL jobs associated with an AWS account. There is no need to mount a JDBC drivers directory, as the Glue cataloger uses the Glue API, not JDBC.

The data.world Collector v1.9 is a bug cleanup release.

With the data.world Collector v1.8, it is now possible to use JDBC drivers on the classpath as well as those found in the user-specified JDBC driver directory (drivers in the directory have higher precedence than classpath drivers).

The data.world Collector v1.7 is a bug-fix release.

The data.world Collector v1.6 adds support for arbitrary JDBC data sources and the ability to build one-off Docker images for testing, demos, etc.

With the data.world Collector v1.5, we add support for Oracle.

In the data.world Collector v1.4, we add support for Google BigQuery.

The data.world Collector v1.3 brings much new functionality, including:

  • Support for Denodo and Snowflake

  • Compatibility of JDBC catalogs with tables imported through data.world integrations

  • Ability to differentiate source information for databases cataloged from localhost

  • Cataloging of REMARKS fields into dct:description

With the data.world Collector v1.2, we support Redshift databases.

The data.world Collector v1.1 contains documentation clarifications and expansions to streamline tags on customer Docker hosts.

The initial release, the data.world Collector v1.0, provides support for metadata catalog extraction for DB2, Hive, MySQL, and Postgres.