Release notes for previous versions
Release version 2.216
Details about the release
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.216 |
Release date | June 26, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.215
Details about the release
Item | Details |
---|---|
Release version | 2.215 |
Release date | June 26, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI Gov Collector:
The collector now supports harvesting of all workspaces and apps using the --all-workspaces-and-apps parameter.
Added the ability to disable lineage harvesting using the --disable-expression-lineage parameter.
Release version 2.214
Important
This release was for internal improvements and has no customer impacting changes.
Details about the release
Item | Details |
---|---|
Release version | 2.214 |
Release date | June 25, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.213
Details about the release
Item | Details |
---|---|
Release version | 2.213 |
Release date | June 25, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Azure Data Factory collector: The collector now harvests expressions for table names, schema names, and file names.
A new collector for SQL Server Integration Services (SSIS) is now available in private preview. If you would like access to this collector, please contact your Customer Success Director.
Bug fixes
Power BI and Power BI Gov collectors: The collectors now correctly harvest lineage for column types.
Release version 2.212
Details about the release
Item | Details |
---|---|
Release version | 2.212 |
Release date | June 21, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Snowflake collector: The collector now harvests metadata for functions and stored procedures from the snowflake.account_usage views when the metadata is unavailable from the information_schema of the database.
Power BI and Power BI Gov collectors now catalog:
Dataset table expression
Description for the workspace, app, and dataset
Bug fixes
ADF collector: Fixed an issue with datetime parse errors while harvesting triggers.
Release version 2.211
Details about the release
Item | Details |
---|---|
Release version | 2.211 |
Release date | June 15, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors: The collectors now support lineage for Oracle database objects.
Bug fixes
Power BI and Power BI Gov collectors: Resolved an issue with collecting child resources for apps when using service principal authentication.
Snowflake and Oracle collectors: Fixed an issue so that the collectors no longer harvest function lineage when the Disable lineage collection (--disable-lineage-collection) option is enabled.
Oracle collector: Fixed an issue with harvesting database columns of LONG type.
Release version 2.210
Details about the release
Item | Details |
---|---|
Release version | 2.210 |
Release date | June 7, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors:
Added support for parsing SQL statements within table expressions, which enables column-level lineage harvesting. To use this feature, specify the credentials with the --datasource-mapping-file parameter. These credentials allow the collector to link lineage to the database sources.
The collector now harvests measures.
Databricks collector: The collector now harvests table and column tags by schema.
Bug fixes
Snowflake collector: Fixed an issue where the collector was unable to harvest lineage if the SQL statement included a dash in the column aliases.
Snowflake, Teradata, Netezza collectors: Fixed an issue that occurred because of insufficient information while harvesting agent resources for functions and procedures.
SQL Server collector: Fixed an issue that occurred while parsing view queries where columns have dashes in their names.
Release version 2.209
Details about the release
Item | Details |
---|---|
Release version | 2.209 |
Release date | June 2, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Databricks collector: The collector now harvests table and column lineage from system tables. To use this feature, you need to set new permissions for the collector.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors: Resolved a problem concerning column statistics when an aggregate statistic has a zero value.
Tableau collector: Resolved an issue to correctly associate lineage with the appropriate parent project.
Sigma collector: Resolved an issue which occurred when a dataset referred to in the lineage was not available among the harvested datasets.
Snowflake collector: Fixed an issue associated with external URLs containing special characters.
Release version 2.208
Details about the release
Item | Details |
---|---|
Release version | 2.208 |
Release date | 24 May, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
Snowflake collector:
Resolved the issue that arose from Snowflake not returning function metadata.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Addressed the issue encountered during the harvesting of column statistics when the result set contained non-integer values.
Release version 2.207
Details about the release
Item | Details |
---|---|
Release version | 2.207 |
Release date | 21 May, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
BigQuery collector: The collector is updated to generate catalog records for BigQuery Label instances. This allows them to be visible on the resource pages in the application.
Sigma collector: Resolved an issue that could result in an exception when the Sigma APIs failed to return a table path for a connection.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Enhanced error log statements by adding fully qualified table names when certain tables or columns in the database cannot be located during lineage resolution.
Release version 2.206
Details about the release
Item | Details |
---|---|
Release version | 2.206 |
Release date | 17 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Sigma collector:
A new --pagination-limit parameter is now available for the collector. You can use this parameter to set the page size for the Sigma API response. The maximum value you can set is 1000. If you do not specify a value, the default page size is 25.
The collector is optimized to enhance the efficiency of lineage harvesting.
Snowflake collector: The collector now harvests extended metadata for tables, views, and materialized views.
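The page-size rules for the Sigma collector's --pagination-limit parameter described above can be sketched as follows. This is an illustrative sketch only, not the collector's implementation: the function name resolve_page_size is hypothetical, and both the lower bound of 1 and the choice to reject out-of-range values (rather than clamp them) are assumptions. The notes state only the default of 25 and the maximum of 1000.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 25   # default stated in the release notes
MAX_PAGE_SIZE = 1000     # maximum stated in the release notes


def resolve_page_size(pagination_limit: Optional[int] = None) -> int:
    """Return the page size implied by an optional --pagination-limit value."""
    if pagination_limit is None:
        return DEFAULT_PAGE_SIZE
    # Assumption: values outside 1..1000 are rejected rather than clamped.
    if not 1 <= pagination_limit <= MAX_PAGE_SIZE:
        raise ValueError(
            f"--pagination-limit must be between 1 and {MAX_PAGE_SIZE}"
        )
    return pagination_limit
```

A larger page size means fewer round trips to the Sigma API for the same number of objects, which is the usual reason to raise it from the default.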
Bug fixes
SQL Server collector: Incorporated additional debug logging for when the collector fails to harvest extended metadata.
Oracle collector:
The collector is now able to handle column names with single quotes in them.
Fixed an issue with synonyms being harvested in the wrong schema.
Release version 2.205
Details about the release
Item | Details |
---|---|
Release version | 2.205 |
Release date | 17 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Power BI and Power BI Gov collectors: The ODBC data sources YAML file (datasources.yml) is updated to allow user-specified aliases for the database location (host). This ensures that resources are accurately linked across collectors.
Snowflake collector: Added support for harvesting the SQL definition and External URL (Snowsight) of materialized views.
Bug fixes
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
The collectors are optimized to load JDBC drivers more efficiently, thereby reducing memory usage.
Release version 2.204
Details about the release
Item | Details |
---|---|
Release version | 2.204 |
Release date | 10 May, 2024 |
Docker image ID |
|
Jar file |
|
Bug fixes
SQL Server collector: The collector now uses a consistent case when a collation is set.
dbt core and dbt cloud collectors: The collectors are optimized to correctly manage scenarios that previously caused an exception while harvesting lineage.
Sigma collector: The collector is optimized to manage scenarios that were previously causing the collector to not run properly.
Release version 2.203
Details about the release
Item | Details |
---|---|
Release version | 2.203 |
Release date | 8 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
dbt Core collector:
Now supports multiple run_results.json files in a single collector run. Add the new parameter --run-results-directory to your command/YAML file to use this new feature.
Now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.
dbt Cloud collector: The collector now comes with enhancements that optimize the harvesting of column-level lineage for dbt models.
Bug fixes
Sigma collector: The collector now properly deserializes objects from the Sigma API.
Power BI and Power BI Gov collectors: The collectors now properly obtain the server name and port from Power BI data source parameters.
Release version 2.202
Details about the release
Item | Details |
---|---|
Release version | 2.202 |
Release date | 7 May, 2024 |
Docker image ID |
|
Jar file |
|
New features
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
Optimized view query parsing to improve the processing time for large SQL statements.
Optimized the querying of metadata during view lineage harvesting.
Oracle Collector: Added a new --oracle-jdbc-timezone-as-region parameter. This allows you to decide if the Oracle JDBC connection timezone should utilize the JVM's default timezone.
Bug fixes
AWS Glue Collector: Improved the log messages that are recorded when the harvesting of job lineage fails.
Release version 2.201
Details about the release
Item | Details |
---|---|
Release version | 2.201 |
Release date | 2 May, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle and SQL Server collectors: The collectors now catalog column-level lineage when functions and stored procedures contain sub-selects.
Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server collectors:
The collectors are optimized to improve overall runtime.
A new parameter --disable-extended-metadata is now available that allows you to skip harvesting of extended metadata for resource types such as database, schema, table, columns, functions, stored procedures, user defined types, and synonyms. Basic metadata for these resource types will still be harvested.
Power BI and Power BI Gov collectors now catalog:
Relationships between Power BI apps and workspaces
Apps with associated workspace IDs (when service principal authentication is used)
Bug fixes
Teradata collector properly harvests lineage metadata from views with SQL statements containing REPLACE RECURSIVE VIEW, LOCK ROW ACCESS.
Oracle collector properly harvests lineage metadata from views with COLLECT.
All collectors properly handle config file options that start with option flags.
Release version 2.200
Details about the release
Item | Details |
---|---|
Release version | 2.200 |
Release date | 19 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
All collectors: Users now have the option to define a custom output file name for the collector catalog during run time. To do this, use the --output-name parameter. The system automatically adds .dwec.ttl to the end of the provided file name.
Note
If you are updating the file name for an already configured collector, make sure to check and modify any existing SPARQL queries that explicitly mention existing collector output files.
Oracle Collector: The collector now harvests Oracle package bodies and Oracle package specifications.
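The effect of the --output-name parameter described above, together with the note about reviewing SPARQL queries, can be sketched as follows. This is a minimal sketch under assumptions: the function names and the sample names in the usage note are hypothetical, and the query check is a plain substring match rather than anything the product itself performs.

```python
from typing import Dict, List

SUFFIX = ".dwec.ttl"  # suffix stated in the release notes


def catalog_file_name(output_name: str) -> str:
    """Return the file name that results from a given --output-name value."""
    return output_name + SUFFIX


def queries_to_review(old_file_name: str, queries: Dict[str, str]) -> List[str]:
    """Return the names of saved queries whose text mentions the old file name."""
    return [name for name, text in queries.items() if old_file_name in text]
```

For example, catalog_file_name("sales-catalog") yields sales-catalog.dwec.ttl, and queries_to_review can then be run with the previous file name to list the queries that still point at it.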
Bug fixes
SQL Server collector: Fixed an error that occurred when harvesting column statistics.
Power BI and Power BI Gov collectors: Resolved an issue that was causing errors during the parsing of expressions that used the Table.RenameColumns Power Query table function in certain cases.
Snowflake Collector: The collector now properly harvests tags that are defined in a different schema than the schemas specified for the collector.
The following collectors are updated to harvest lineage accurately for group by, order by, where, and having SQL expressions. Prior to this update, the relationships were incorrectly directed.
Postgres, Databricks, Derby, Netezza, Oracle, Redshift, Snowflake, SQL Server, Teradata collectors
Release version 2.199
Details about the release
Item | Details |
---|---|
Release version | 2.199 |
Release date | 11 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
A new collector is now available for Amazon Managed Streaming for Kafka.
Oracle collector: The collector now harvests lineage from views, stored procedures, and functions.
Snowflake collector: The collector now harvests Streamlit apps.
The following collectors now support harvesting from multiple databases specified by users. This means you can provide the --database parameter multiple times while running the collector.
Databricks, PostgreSQL, SQL Server, Db2, Redshift, Denodo, Oracle, MySQL, Snowflake, Teradata
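When collector runs are scripted, repeating the --database parameter as described above can be sketched as below. This is a minimal sketch under assumptions: the helper name database_args and the database names in the usage note are hypothetical, and the rest of the collector command is intentionally left out.

```python
from typing import List, Sequence


def database_args(databases: Sequence[str]) -> List[str]:
    """Build one --database argument pair per database name."""
    args: List[str] = []
    for name in databases:
        # The parameter is repeated once for every database to harvest.
        args.extend(["--database", name])
    return args
```

Calling database_args(["sales", "hr"]) produces the argument list --database sales --database hr, which can be appended to the remainder of the command.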
Bug fixes
Power BI and Power BI Gov collectors: Resolved an issue that was caused by parsing expand column expressions.
dbt Cloud collector: The collector now properly harvests metadata of dbt Cloud artifacts when the target database is not Snowflake. Note that the collector harvests metadata only from the dbt Cloud artifacts and does not connect to any unsupported target database to obtain database lineage metadata.
Snowflake collector: The collector now harvests policies associated with cataloged database objects, regardless of the database in which the policies reside.
Release version 2.198
Details about the release
Item | Details |
---|---|
Release version | 2.198 |
Release date | 9 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Oracle collector: The collector now harvests synonyms.
Athena collector: Starting with release 2.198, data.world no longer packages the Athena JDBC driver with the Athena collector. You can continue to use releases earlier than 2.198 as-is, but when you update the collector to version 2.198 or higher, you will have to download and mount the driver for the collector and update the collector command to include the driver path.
Release version 2.197
Details about the release
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.197 |
Release date | 5 April, 2024 |
Docker image ID |
|
Jar file |
|
Release version 2.196
Details about the release
Item | Details |
---|---|
Release version | 2.196 |
Release date | 2 April, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Log files for collectors: The collector log files for each collector run now have unique names. This allows logs to be written to separate files when running multiple collector instances.
Reltio collector: Survivorship groups and mappings are now recognized as primary entities with catalog records.
Snowflake collector: The collector now harvests tags associated with database objects in the user-specified database, regardless of the database in which the tag resides.
Bug fixes
Teradata collector: Fixed an issue that was blocking column harvesting due to invalid column references in Views.
Azure Data Factory collector: Fixed an issue preventing successful file uploads to data.world.
Release version 2.195
Details about the release
Item | Details |
---|---|
Release version | 2.195 |
Release date | 25 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Databricks collector: The collector now harvests tags for Databases, Schemas, Tables, and Columns.
Bug fixes
Power BI Service and Power BI Gov collectors: The collectors now correctly harvest skipped data sources during metadata scans.
Azure Data Lake Storage Gen2 collector: The collector is updated to refresh API authorization requests per ADLS requirements to avoid session expiration.
Azure Data Factory collector: Fixed an issue to accommodate varying data returned from the Azure Data Factory API.
Release version 2.194
Details about the release
Item | Details |
---|---|
Release version | 2.194 |
Release date | 21 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
The Power BI Service and Power BI Gov collectors now support harvesting lineage from ODBC data source types. A new parameter --datasource-mapping-file can be used to provide the information required for harvesting lineage relationships when the data source uses an ODBC connection in Power BI.
Bug fixes
The Amazon S3 collector now continues to harvest objects in the bucket when a 403 error is encountered.
The Azure Data Lake Storage Gen2 collector properly handles the scenario involving special characters in the blob name.
The Azure Data Factory collector properly handles a scenario that causes the collector to stop due to the format of information returned from the Azure Data Factory APIs.
BigQuery Collector properly handles a scenario when a table is in a different database from the one being harvested.
Release version 2.193
Details about the release
Item | Details |
---|---|
Release version | 2.193 |
Release date | 15 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
The following two new collectors are now available in Private Preview. Please contact your Customer Success Director to get access to these collectors:
Bug fixes
The Azure Data Factory collector is updated to correctly handle a situation that previously caused the collector to stop, due to the format of the information returned from the ADF APIs.
Release version 2.192
Details about the release
Item | Details |
---|---|
Release version | 2.192 |
Release date | 12 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Amazon S3 collector: The collector now offers the options, --include-object and --exclude-object. These options allow you to select which objects should be included or excluded from the harvesting process.
Databricks collector: The collector now harvests Databricks tags for database, schema, table, view, and column as key-value pairs. The collector also harvests tags for clusters and jobs, replacing the existing ClusterTag and JobTag resource types.
Release version 2.191
Details about the release
Item | Details |
---|---|
Release version | 2.191 |
Release date | 7 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
All collectors: The --dry-run option is now available for all collectors. This option allows you to do a test run to validate that the collector can authenticate to the specified source system. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.
Bug fixes
Teradata collector: The collector is updated to correctly parse view SQL syntax for extracting lineage metadata. It also now includes improved logging of any errors encountered during lineage harvesting.
BigQuery collector: The collector now properly handles fully qualified table names that include dashes (-).
Release version 2.190
Details about the release
Item | Details |
---|---|
Release version | 2.190 |
Release date | 5 March, 2024 |
Docker image ID |
|
Jar file |
|
New features and changes
Snowflake, Teradata, and Netezza collectors: In the harvested metadata, the owners of resources are now correctly referenced as owner objects. Earlier they were referenced as string text.
Bug fixes
The Teradata collector now correctly manages variations in database cases within SQL statements while gathering lineage metadata.
Release version 2.189
Details about the release
Item | Details |
---|---|
Release version | 2.189 |
Release date | 24 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
The Tableau collector now captures all sub-projects when you specify certain projects to catalog. Additionally, it enables users to exclude specific projects using the --tableau-exclude-project parameter. Any sub-projects under an excluded project are also automatically excluded.
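The exclusion rule above, where excluding a project also excludes everything nested under it, can be sketched as follows. This is an illustrative sketch, not the collector's logic: it assumes projects are identified by slash-separated paths such as Finance/Archive, which is a hypothetical representation, and the function names are hypothetical as well.

```python
from typing import Iterable, List


def is_excluded(project_path: str, excluded: Iterable[str]) -> bool:
    """True if the project is excluded directly or sits under an excluded project."""
    parts = project_path.split("/")
    for name in excluded:
        excluded_parts = name.split("/")
        # Compare whole path segments so "Finance/ArchiveNew" is not
        # mistaken for a child of "Finance/Archive".
        if parts[: len(excluded_parts)] == excluded_parts:
            return True
    return False


def projects_to_catalog(projects: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Keep only the projects that are not excluded."""
    excluded_list = list(excluded)
    return [p for p in projects if not is_excluded(p, excluded_list)]
```

Comparing path segments instead of raw string prefixes keeps sibling projects with similar names in the catalog.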
Release version 2.188
Details about the release
Item | Details |
---|---|
Release version | 2.188 |
Release date | 23 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
The Information Schema Catalog Collector now collects descriptions from both tables and columns, if they are present in the source.
The Snowflake collector now harvests comments from Snowflake databases, schemas, and views (as resource description).
The Teradata collector has been enhanced to better parse view SQL definitions that use specific Teradata syntax elements, particularly when extracting lineage from views.
Bug fixes
BigQuery collector:
Fixed issues with handling identifiers with hyphens (-).
Fixed issues with harvesting lineage when a view refers to columns in a separate database.
Release version 2.187
Details about the release
Item | Details |
---|---|
Release version | 2.187 |
Release date | 20 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
Netezza collector: A new and improved collector is now available for Netezza.
Oracle collector: The collector now harvests definitions for views, functions, and stored procedures.
Release version 2.186
Details about the release
Item | Details |
---|---|
Release version | 2.186 |
Release date | 14 February, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
The following collectors now harvest all databases in a single collector run when the --database parameter is not specified.
The collectors also support a new parameter --exclude-database to exclude specific databases from metadata collection:
Databricks
DB2
MySQL
Oracle
PostgreSQL
Redshift
SQL Server
Snowflake
Teradata
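The database selection behavior described above can be sketched as follows. This is a minimal sketch, not the collectors' implementation: the function name and the sample names in the test data are hypothetical, and applying --exclude-database even when --database is also given is an assumption, since the notes do not describe that combination.

```python
from typing import Iterable, List


def databases_to_harvest(
    available: Iterable[str],
    requested: Iterable[str] = (),
    excluded: Iterable[str] = (),
) -> List[str]:
    """Pick the requested databases, or all available ones when none are requested."""
    requested_list = list(requested)
    excluded_set = set(excluded)
    # No --database values: fall back to every database found in the source.
    selected = requested_list if requested_list else list(available)
    # Assumption: --exclude-database values are removed in both cases.
    return [name for name in selected if name not in excluded_set]
```

With no requested databases and one exclusion, every available database except the excluded one is returned in its original order.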
Bug fixes
Databricks collector: The collector properly handles malformed task responses.
Power BI collector: The collector properly handles harvesting lineage relationships from Power BI data sources when parameters are used in place of the Snowflake Warehouse value.
For the following collectors, the behavior of the --include-information-schema option is changed. Now, if you use this option in the command without the --all-schemas option, the system will generate a warning to alert you about the missing parameter.
Databricks
DB2
Oracle
PostgreSQL
Redshift
SQL Server
Snowflake
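The warning condition for --include-information-schema described above can be sketched with Python's argparse module. This is an illustrative sketch only: the function name and the warning text are hypothetical, and the real collectors may word or surface the warning differently.

```python
import argparse
from typing import List, Sequence


def schema_option_warnings(argv: Sequence[str]) -> List[str]:
    """Return warnings for schema options that are used without their companion."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--include-information-schema", action="store_true")
    parser.add_argument("--all-schemas", action="store_true")
    # parse_known_args leaves every other collector option in argv alone.
    options, _unknown = parser.parse_known_args(list(argv))

    warnings: List[str] = []
    if options.include_information_schema and not options.all_schemas:
        # Hypothetical wording; only the condition comes from the release notes.
        warnings.append(
            "--include-information-schema was specified without --all-schemas"
        )
    return warnings
```

An empty list means the combination of options needs no warning.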
Release version 2.185
Details about the release
Item | Details |
---|---|
Release version | 2.185 |
Release date | 9 February, 2024 |
Docker image ID |
|
JAR file |
|
Bug fixes
Fixed an issue that was causing database collectors to enter an error state.
Release version 2.184
Details about the release
Item | Details |
---|---|
Release version | 2.184 |
Release date | 7 February, 2024 |
Docker image ID |
|
JAR file |
|
Bug fixes
Azure Data Lake Storage Gen2 collector: Fixed an issue that previously prevented the collector from running successfully on machines using an amd64 processor.
Microsoft SQL Server collector now properly harvests views from Azure Synapse Analytics.
Release version 2.183
Details about the release
Item | Details |
---|---|
Release version | 2.183 |
Release date | 1 February, 2024 |
Docker image ID |
|
JAR file |
|
Bug fixes
Tableau collector: The collector is updated to properly harvest usage data in newer versions of Tableau Server.
Azure Data Lake Storage Gen2 Collector: Fixed an authentication issue in the collector that resulted in failures to initialize a channel.
Snowflake collector: The collector now properly harvests lineage between function and source table if the source table is in the cataloged schema.
Release version 2.182
Details about the release
Item | Details |
---|---|
Release version | 2.182 |
Release date | 30 January, 2024 |
Docker image ID |
|
JAR file |
|
New features and changes
All collectors: In addition to being available as Docker Images, collectors are now also accessible as JAR files. Follow these instructions to run collectors using JAR files.
The following collectors now harvest all versions of overloaded function and stored procedure resources, each as its own resource:
Db2
MS SQL Server
Netezza
Oracle
PostgreSQL
Redshift
Snowflake
Teradata
Bug fixes
Teradata and MySQL collectors: The following schema options have been removed for these collectors: --all-schemas, --include-information-schema, and --schema.
Release version 2.181
Details about the release
Item | Details |
---|---|
Release version | 2.181 |
Release date | 22 January, 2024 |
Docker image ID |
|
New features and changes
The Snowflake collector now harvests Data Metric Functions, their associations to tables and observed metrics.
Release version 2.180
Details about the release
Item | Details |
---|---|
Release version | 2.180 |
Release date | 17 January, 2024 |
Docker image ID |
|
New features and changes
Snowflake collector harvests allowed tag values from Snowflake.
Bug fixes
Oracle collector properly harvests Column descriptions from Oracle Data Dictionary tables.
Release version 2.179
Details about the release
Item | Details |
---|---|
Release version | 2.179 |
Release date | 10 January, 2024 |
Docker image ID |
|
New features and changes
The latest tag for docker images has been removed and is not available for use going forward.
What does this change mean for users using the latest tag?
If you were using the latest tag, you can continue to use the image with the latest tag. However, we recommend all users update their docker run command to use an explicit version.
If you make a change to your local docker environment (such as removing the latest image), then your collector run will not work. You will need to update the run command to use a specific version. You can open a support ticket for assistance on updating the command.
Athena, Snowflake, SQL Server, DB2 collectors now harvest basic metadata for materialized views (name, description if available).
The Postgres collector now harvests materialized views with name, description, view SQL definition (DDL), and column-level lineage.
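To find run commands that still rely on the latest tag, as discussed in the notes above, an image reference can be checked as sketched below. This is a minimal sketch under assumptions: the function names and the image names in the usage note are hypothetical, and digest references (name@sha256:...) are not treated specially.

```python
def image_tag(image_ref: str) -> str:
    """Return the tag of a Docker image reference, or "" when none is given."""
    # Drop the registry/repository path first; a registry host may carry a
    # port colon (for example localhost:5000/name) that is not a tag.
    name = image_ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return ""
    return name.rsplit(":", 1)[1]


def uses_explicit_version(image_ref: str) -> bool:
    """True when the reference pins a tag other than latest."""
    # Docker falls back to the latest tag when no tag is written.
    return image_tag(image_ref) not in ("", "latest")
```

For example, uses_explicit_version("example/collector:2.179") is true, while the same reference with :latest or with no tag at all is reported as not pinned.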
Bug fixes
All collectors: Environment variables referenced in collector config (YAML) files can now have values containing backslashes and dollar signs.
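The kind of problem this fix addresses can be illustrated with a small substitution routine. This is a sketch under stated assumptions, not the collectors' code: the ${NAME} reference syntax, the function name, and the choice to leave unknown variables untouched are all hypothetical. In Python, passing a function to re.sub inserts the value literally, whereas a replacement string would have its backslash sequences interpreted.

```python
import re
from typing import Mapping

# Assumption: variables are referenced as ${NAME} in the config text.
_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} references with values, keeping backslashes and $ intact."""

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Unknown variables are left as written (an assumption for this sketch).
        return env.get(name, match.group(0))

    # A function replacement is used so the value is never re-parsed as a
    # replacement template; the single pass also avoids re-expanding "$"
    # characters that appear inside a value.
    return _REFERENCE.sub(lookup, text)
```

Passing os.environ as the env argument applies the same substitution to real environment variables.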
Release version 2.178
Details about the release
Item | Details |
---|---|
Release version | 2.178 |
Release date | 5 January, 2024 |
Docker image ID |
|
New features and changes
The Snowflake collector now harvests the External URL for Snowsight for tables and views.
The dbt Cloud collector now includes the --dbt-cloud-host option to enable interaction with dbt static access URLs.
Bug fixes
Databricks collector: Addressed an issue related to correctly forming IRIs for tables under certain circumstances. This was previously causing duplicate tables and databases to be cataloged and non-existent tables to be referenced by columns.
The Tableau collector now properly handles a scenario when the Tableau instance has no databases defined.
Release version 2.177
Details about the release
Item | Details |
---|---|
Release version | 2.177 |
Release date | 22 December, 2023 |
Docker image ID |
|
Bug fixes
dbt Core and dbt Cloud collectors now catalog the dbt product version.
Tableau collector properly handles columns with missing names.
Monte Carlo collector:
The collector now correctly associates views with incidents, rectifying previous issues caused by missing details for certain incident types and subtypes.
The collector has improved log messages when relating tables to incidents.
Release version 2.176
Details about the release
Item | Details |
---|---|
Release version | 2.176 |
Release date | 20 December, 2023 |
Docker image ID |
|
Bug fixes
Teradata collector: Fixed an issue where information was missing while harvesting functions from Teradata.
dbt Cloud and dbt Core collectors: Fixed an issue where information was missing while harvesting test results from dbt Cloud and dbt Core.
Release version 2.175
Details about the release
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.175 |
Release date | 19 December, 2023 |
Release version 2.174
Details about the release
Item | Details |
---|---|
Release version | 2.174 |
Release date | 18 December, 2023 |
Docker image ID |
|
New features and changes
The following two new collectors are now available:
Important
These collectors are available in Private Preview. If you would like access to the collectors, please contact your Customer Success Director.
The following collectors now harvest Schema resources from the source:
Databricks, PostgreSQL, SQL Server, Db2, Redshift, Generic JDBC Collector, Denodo, Dremio, Infor ION, Oracle, Salesforce, SQL Anywhere, Athena, MySQL, Snowflake, Teradata, Presto, Vertica
dbt Cloud and dbt Core collectors now harvest the following additional metadata: test results (failed, warning, success), last test run timestamp, test name, test arguments, and type of dbt test.
Bug fixes
Teradata collector: Fixed an issue where information was missing while harvesting triggers from Teradata.
Release version 2.173
Details about the release
Item | Details |
---|---|
Release version | 2.173 |
Release date | 12 December, 2023 |
Docker image ID |
|
New features and changes
dbt Cloud and dbt Core collectors now harvest metadata for Columns defined within Models and Sources.
The Power BI collector now automatically filters out workspaces named My workspace or PersonalWorkspace <User> when the --all-workspaces-and-apps parameter is used. However, if you wish to include these workspaces in the catalog, you can use the --include-user-workspace option.
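The workspace filtering described above can be sketched as follows. This is a minimal sketch, not the collector's implementation: the function names are hypothetical, and the matching rule (an exact match on My workspace and a prefix match on PersonalWorkspace followed by a space and a user name) is an assumption based on the two names given in the notes.

```python
from typing import Iterable, List


def is_user_workspace(name: str) -> bool:
    """True for personal workspaces, under the matching rule assumed above."""
    return name == "My workspace" or name.startswith("PersonalWorkspace ")


def workspaces_to_harvest(
    names: Iterable[str], include_user_workspace: bool = False
) -> List[str]:
    """Drop personal workspaces unless --include-user-workspace is set."""
    if include_user_workspace:
        return list(names)
    return [name for name in names if not is_user_workspace(name)]
```

Setting include_user_workspace to True mirrors passing the --include-user-workspace option, so the full list is returned unchanged.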
Release version 2.172
Details about the release
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.172 |
Release date | 12 December, 2023 |
Release version 2.171
Details about this release
Item | Details |
---|---|
Release version | 2.171 |
Release date | 6 December, 2023 |
Docker image ID |
|
New features and changes
Monte Carlo collector: The Monte Carlo collector is enhanced to automatically retry harvesting from Monte Carlo in case of API failure.
Bug fixes
All collectors: If errors occur while running the collectors using the YAML file, the collectors now return an unsuccessful exit status.
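With this fix, automation that launches a collector can rely on the exit status to detect failures. The sketch below shows one way to do that; it assumes the common convention that status 0 means success and anything else means failure, and it uses a stand-in Python command instead of a real collector command, which is not shown here.

```python
import subprocess
import sys
from typing import List, Sequence

# Stand-in for a failing collector run; a real collector command would go here.
STAND_IN_FAILURE: List[str] = [sys.executable, "-c", "raise SystemExit(1)"]


def run_succeeded(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(list(command))
    # Assumption: 0 signals success, any other value signals failure.
    return completed.returncode == 0
```

A scheduler or wrapper script can call run_succeeded with the collector command and alert or retry when it returns False.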
Release version 2.170
Details about this release
Item | Details |
---|---|
Release version | 2.170 |
Release date | 30 November, 2023 |
Docker image ID |
|
New features and changes
The Monte Carlo collector supports harvesting resources from Monte Carlo when the monitored target database is Databricks.
Release version 2.169
Details about this release
Item | Details |
---|---|
Release version | 2.169 |
Release date | 27 November, 2023 |
Docker image ID |
|
New features and changes
Monte Carlo collector supports new warehouse types in the latest Monte Carlo GraphQL APIs.
dbt cloud and dbt core collectors support versions dbt 1.5.0, 1.6.0, 1.7.0.
Log files for collectors: All collectors now compress log files prior to upload.
Bug fixes
Power BI, Sigma, and Thoughtspot collectors: OAuth tokens for the Power BI, Sigma, and Thoughtspot collectors are properly refreshed when they expire.
Release version 2.168
Details about this release
Item | Details |
---|---|
Release version | 2.168 |
Release date | 17 November, 2023 |
Docker image ID |
|
New features and changes
A new collector is now available for Teradata.
Bug fixes
Databricks collector: Fixed an issue where the collector stopped abruptly due to a lack of permission on the referenced resources.
Release version 2.167
Details about this release
Item | Details |
---|---|
Release version | 2.167 |
Release date | 10 November, 2023 |
Docker image ID |
|
New features and changes
Databricks collector: Additional lineage and object metadata are now harvested for database objects via Unity Catalog.
Log message improvements: Improved logs and guidance are available for situations where SSL certificate problems occur during metadata harvesting from sources secured with self-signed certificates.
Bug fixes
Monte Carlo collector: Database Views are now accurately differentiated from Database Tables.
Release version 2.166
Details about this release
Item | Details |
---|---|
Release version | 2.166 |
Release date | 2 November, 2023 |
Docker image ID |
|
New features and changes
The Manta collector now supports Manta versions R41 and R42.
Release version 2.165
Details about this release
Item | Details |
---|---|
Release version | 2.165 |
Release date | 25 October, 2023 |
Docker image ID |
|
New features and changes
Manta collector has improved log messages.
Release version 2.164
Details about the release
Item | Details |
---|---|
Release version | 2.164 |
Release date | 24 October, 2023 |
Docker image ID |
|
New features and changes
Amazon S3, AWS Athena, and AWS Glue collectors now support authentication using AWS config files that reference the credentials on an Amazon EC2 instance profile. For details see the AWS documentation.
Release version 2.163
Details about the release
Item | Details |
---|---|
Release version | 2.163 |
Release date | 20 October, 2023 |
Docker image ID |
|
New features and changes
Amazon S3 collector: When the collector is run in dry run mode (with the --dry-run parameter), it now also checks the credentials and lists the buckets found.
Monte Carlo Collector:
The incident log messages for the collector now include UUIDs.
Domain filtering logic is changed to address an API change when fetching monitors, tables, and incidents from Monte Carlo.
Bug fixes
All collectors:
The collector output files generated after running the collectors now include the collector version.
Collectors now return an unsuccessful exit status if there are errors when uploading the catalog output files to data.world.
Release version 2.162
Details about the release
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.162 |
Release date | 19 October, 2023 |
Docker image ID |
|
Release version 2.161
Details about the release
Item | Details |
---|---|
Release version | 2.161 |
Release date | 5 October, 2023 |
Docker image ID |
|
Bug fixes
Oracle collector: The collector gracefully handles exceptions during column harvesting.
Release version 2.160
Details about the release
Item | Details |
---|---|
Release version | 2.160 |
Release date | 4 October, 2023 |
Docker image ID |
|
New features and changes
Log messages improvements: Collector log messages now include both date and time.
Bug fixes
Tableau Collector now harvests column identifiers for column resources.
Confluent Platform Collector now properly handles --include-topic and --exclude-topic options.
Monte Carlo, ThoughtSpot, and InfluxDB Collectors now produce only one catalog record for each resource.
Release version 2.159
Details about the release
Item | Details |
---|---|
Release version | 2.159 |
Release date | 26 September, 2023 |
Docker image ID |
|
New features and changes
Confluent Cloud collector now supports topic filtering by regular expression or exact topic name using two new parameters: --include-topic and --exclude-topic.
Confluent Platform collector
The collector now supports Confluent Platform 6.1.0 and above.
The collector now supports topic filtering by regular expression or exact topic name using two new parameters: --include-topic and --exclude-topic.
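The include/exclude filtering described for the Confluent collectors can be pictured with a short sketch. This is only an illustration under stated assumptions: the function name select_topics is hypothetical, patterns are assumed to match the whole topic name, and an exclude match is assumed to win over an include match. The collectors' actual matching rules may differ, so check the collector documentation.

```python
import re


def select_topics(topics, include=None, exclude=None):
    """Return the topics kept by include/exclude patterns (illustrative only)."""
    include = include or []
    exclude = exclude or []

    def matches(name, patterns):
        # An exact topic name is also a valid pattern that matches itself.
        return any(re.fullmatch(pattern, name) for pattern in patterns)

    selected = []
    for topic in topics:
        if include and not matches(topic, include):
            continue  # include patterns were given and none matched
        if matches(topic, exclude):
            continue  # assumption: exclusion takes precedence
        selected.append(topic)
    return selected


topics = ["orders", "orders.dlq", "payments", "_internal"]
kept = select_topics(topics, include=[r"orders.*", "payments"], exclude=[r".*\.dlq"])
print(kept)  # ['orders', 'payments']
```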
Bug fixes
Databricks collector properly harvests column statistics from columns containing dash characters.
Release version 2.158
Details about the release
Item | Details |
---|---|
Release version | 2.158 |
Release date | 18 September, 2023 |
Docker image ID |
|
New features and changes
Amazon S3 collector now harvests resources up to and including the maximum count specified by the user or the 10,000 default limit.
dbt Cloud collector handles scenarios where dbt Cloud runs contain more than the current dbt Cloud limit of 1000 artifacts.
Bug fixes
Databricks collector properly harvests column statistics from tables with no columns.
MS SQL Server collector properly handles case sensitivity of SQL keywords when parsing lineage.
Release version 2.157
Details about the release
Item | Details |
---|---|
Release version | 2.157 |
Release date | 14 September, 2023 |
Docker image ID |
|
New features and changes
The following two new collectors are now available:
Power BI and Power BI Gov Collectors: The following parameters for these two collectors now support regular expressions: --exclude-workspace and --include-workspace.
The following JDBC collectors now test for connection status before executing queries. If a connection is closed by the database, the collector detects this condition and re-opens the connection:
Databricks, DB2, MySQL, PostgreSQL, Redshift, Snowflake, MS SQL Server, Azure Synapse Analytics collectors
Bug fixes
Databricks collector now properly handles non-alphanumeric characters in object names.
Release version 2.156
Details about the release
Item | Details |
---|---|
Release version | 2.156 |
Release date | 7 September, 2023 |
Docker image ID |
|
New features and changes
Amazon S3 collector now allows users to filter by bucket with --include-bucket and --exclude-bucket options.
Kafka - Confluent Platform Collector now supports SASL/SCRAM-SHA-512 authentication. If you want to use this authentication, use the --kafka-cluster-sasl-type parameter while running the collector.
New parameters for API retries when API calls fail: The following collectors now support two new parameters (--api-max-retries and --api-retry-delay), which allow users to specify the maximum number of times the collector retries, and the delay between retries, when an API call to the data source fails.
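The retry behavior controlled by these two parameters follows a common pattern, sketched below. This is not the collector's implementation: the function call_with_retries, its arguments, and the choice of ConnectionError as the retryable failure are all assumptions made for illustration.

```python
import time


def call_with_retries(call, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Call `call()`; on ConnectionError, retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return call()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last failure
            attempt += 1
            sleep(retry_delay)  # fixed pause before the next attempt


# Hypothetical API call that fails twice and then succeeds.
state = {"calls": 0}


def flaky_api_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


pauses = []  # record the pauses instead of actually sleeping
result = call_with_retries(flaky_api_call, max_retries=5, retry_delay=0.5, sleep=pauses.append)
print(result, state["calls"], pauses)  # ok 3 [0.5, 0.5]
```

A fixed delay keeps the sketch simple; whether the collectors use a fixed or increasing delay is not stated in these notes.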
Bug fixes
dbt Cloud Collector now properly selects the job that the user specified while running the collector.
Release version 2.155
Details about the release
Item | Details |
---|---|
Release version | 2.155 |
Release date | 29 August, 2023 |
Docker image ID |
|
Bug fixes
The AWS Glue collector properly handles AWS Glue Catalog instances with more than 100 databases.
Release version 2.154
Details about the release
Item | Details |
---|---|
Release version | 2.154 |
Release date | 27 August, 2023 |
Docker image ID |
|
New features and changes
AWS Glue collector has improved logging to help with troubleshooting of access and permissions issues.
Release version 2.153
Details about the release
Item | Details |
---|---|
Release version | 2.153 |
Release date | 25 August, 2023 |
Docker image ID |
|
New features and changes
dbt Cloud collector: The following enhancements have been made to the collector.
The collector now harvests information about dbt Cloud resources associated with the artifacts from which metadata is harvested.
The collector now supports two new parameters (--dbt-cloud-environment and --dbt-cloud-job) to allow users to filter runs by environment and job.
Release version 2.152
Details about the release
Item | Details |
---|---|
Release version | 2.152 |
Release date | 22 August, 2023 |
Docker image ID |
|
New features and changes
Snowflake and SQL Server Collectors now harvest column-level lineage from Stored Procedures.
Logging improvements:
Improved log messages for instances when the service account/user account used by the collector does not have access to upload to a dataset in data.world.
Debug log messages now include current memory and stack size.
Release version 2.151
Details about the release
Item | Details |
---|---|
Release version | 2.151 |
Release date | 15 August, 2023 |
Docker image ID |
|
New features and changes
A new collector is now available for SQL Server Reporting Services.
Databricks collector: The collector now retries after a pause when the Databricks API responds with a "too many requests" error.
All database collectors: Optimized the database collectors to reuse database connections where possible.
Release version 2.150
Details about the release
Item | Details |
---|---|
Release version | 2.150 |
Release date | 10 August, 2023 |
Docker image ID |
|
Bug fixes
All collectors: Fixed an issue that prevented the user from passing command-line options containing spaces when running the collectors using the Docker container.
Release version 2.149
Details about the release
Item | Details |
---|---|
Release version | 2.149 |
Release date | 9 August, 2023 |
Docker image ID |
|
New features and changes
Databricks Collector: The collector now allows users to authenticate with a Personal Access Token without specifying a username and password.
Power BI Gov collector: The --include-user-workspace parameter is removed from the collector CLI options.
Release version 2.148
Details about the release
Item | Details |
---|---|
Release version | 2.148 |
Release date | 27 July, 2023 |
Docker image ID |
|
New features and changes
Databricks Collector has a new option --workflow-exclude to exclude harvesting of jobs/workflows.
Power BI and Power BI Gov Collectors now support parameter values in Power BI expressions.
Bug fixes
Tableau collector properly handles duplicate data sources when multiple filtered projects are specified.
Release version 2.147
Details about the release
Item | Details |
---|---|
Release version | 2.147 |
Release date | 24 July, 2023 |
Docker image ID |
|
New features and changes
Power BI and Power BI Gov collectors: The collectors have a new option --max-parseable-expression-length, which sets the maximum number of characters in a Power BI expression that will be parsed for lineage metadata.
Bug fixes
Power BI and ThoughtSpot collectors now refresh expired authentication tokens.
The SQL Server collector now properly handles missing SQL definition when harvesting stored procedures.
Release version 2.146
Details about the release
Item | Details |
---|---|
Release version | 2.146 |
Release date | 19 July, 2023 |
Docker image ID |
|
New features and changes
Tableau collector: The collector now harvests lineage relationships between embedded data sources and published data sources to reflect any such relationship that exists in Tableau.
Bug fixes
Power BI collector: Improvements made to the collector to avoid hitting Power BI Admin API rate limits that prevented successful collection for certain large Power BI organizations.
Marquez collector: API authentication token (--marquez-api-key) is now a required parameter for the collector.
Fivetran collector: Fivetran API key (--fivetran-apikey) and Fivetran secret (--fivetran-apisecret) options are now required parameters for the collector.
Release version 2.145
Details about the release
Item | Details |
---|---|
Release version | 2.145 |
Release date | 14 July, 2023 |
Docker image ID |
|
New features and changes
The following two new collectors are now available:
Release version 2.144
Details about the release
Item | Details |
---|---|
Release version | 2.144 |
Release date | 11 July, 2023 |
Docker image ID |
|
Bug fixes
Databricks collector: The collector now properly handles missing information returned by the Databricks APIs.
dbt Core and dbt Cloud Collectors: The collectors now use the description property for resources in a dbt manifest file to populate the description of associated catalog resources.
Power BI collector: The collector now properly handles unexpected source formats.
Release version 2.143
Details about the release
Item | Details |
---|---|
Release version | 2.143 |
Release date | 7 July, 2023 |
Docker image ID |
|
New features and changes
DB2 collector: The collector now supports harvesting of column statistics and of function and stored procedure information. For details about using the new parameters (--target-sample-size, --sample-string-values, --enable-column-statistics) for these features, see the DB2 collector documentation.
Redshift collector: The collector now properly distinguishes between user-defined functions and stored procedures when harvesting function and stored procedure metadata.
Tableau collector: Improved error messages and handling of missing Salesforce connection information within the Tableau collector.
Bug fixes
Databricks collector: Fixed defects in the collector to accommodate invalid number formats and missing information returned by Databricks APIs in some cases.
Release version 2.142
Details about the release
Item | Details |
---|---|
Release version | 2.142 |
Release date | 23 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Power BI collector: A new parameter --all-workspaces-and-apps is available for the Power BI collector which allows users to catalog all available data from the tenant using the admin API.
Bug fixes
Databricks collector: Fixed an issue where the collector was terminating abnormally when it encountered a notebook that had no language specified for it.
Microsoft SQL Server collector: Fixed an issue with parsing the SQL for certain Views in Microsoft SQL Server that prevented harvesting of lineage.
Release version 2.141
Details about the release
Item | Details |
---|---|
Release version | 2.141 |
Release date | 20 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
ThoughtSpot collector now harvests:
Column-level lineage between JDBC source table columns and ThoughtSpot logical columns.
Column-level lineage between ThoughtSpot logical columns and Answers and Liveboards that connect to the data.
Databricks collector now harvests additional metadata for Databricks tables.
The Redshift, SQL Server, and PostgreSQL collectors now harvest:
Functions
Stored procedures
Power BI, Looker, and ThoughtSpot collectors: The resources cataloged by these collectors now automatically include a link to the resource in the source system. This allows users to go directly from data.world to the same resource in the source system without having to find it manually.
Release version 2.140
Details about the release
Item | Details |
---|---|
Release version | 2.140 |
Release date | 13 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
dbt Cloud collector allows the user to pass in a Snowflake role and Snowflake warehouse to override values found in the dbt Cloud project configuration.
Bug fixes
dbt Core and dbt Cloud collectors properly handle source meta config values that are objects rather than strings in the generated dbt manifest file.
SQL Server Collector properly disables lineage collection when the --disable-lineage-collection parameter is set.
Databricks collector includes additional checks for existence of and access to Unity Catalog.
Release version 2.139
Details about the release
Item | Details |
---|---|
Release version | 2.139 |
Release date | 7 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
SQL Server collector now harvests created date and modified date for tables and schemas, and harvests table size in bytes.
Snowflake collector now harvests table size in bytes.
Power BI collector allows setting the --azure-tenantid option when using username and password authentication.
All collectors now support the ability to set the JVM stack size using the DWCC_JVM_OPTIONS parameter.
Bug fixes
SQL Server collector properly harvests View SQL that is longer than the SQL Server column default character length (6000).
dbt Cloud collector: Rather than reporting an error, the collector now skips job runs that do not have generated documentation artifacts.
Tableau collector properly catalogs all sites if no site is specified in the CLI/YAML.
Release version 2.138
Details about the release
Item | Details |
---|---|
Release version | 2.138 |
Release date | 2 June, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Monte Carlo collector: The collector now:
Catalogs additional metadata for incidents, monitors, and tables.
Uses a smaller default GraphQL page size.
Bug fixes
BigQuery collector:
Properly handles issue where table IDs are returned as null.
Properly handles issue with table IDs that have quotes in them.
Monte Carlo collector: Properly handles external URLs for tables that contain spaces.
dbt Core and dbt Cloud collectors: Properly harvests all meta config containing object values.
Profiling: Properly handles histogram values containing excessive range, overflow, or underflow values.
Release version 2.137
Details about the release
Item | Details |
---|---|
Release version | 2.137 |
Release date | 23 May, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
The new Azure Data Lake Storage Gen2 collector is now available.
Snowflake collector: The collector now harvests:
Snowflake Stored Procedures and Functions
Lineage between functions and database objects
Databricks collector and Power BI collector properly identify database objects for Power BI connections to Databricks database objects.
Databricks collector: The collector now harvests:
Jobs, Tasks, and Clusters
Functions
Column-level lineage for Hive metastore and Unity Catalog
Lineage between Tasks and the Notebooks referenced in Tasks
Lineage between upstream and downstream tables with an intermediate Job
Primary and foreign keys for Tables
Column statistics
Bug fixes
dbt and dbt Cloud collectors now properly harvest meta config containing object values.
Release version 2.136
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.136 |
Release date | 22 May, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
Release version 2.134
Details about the release
Item | Details |
---|---|
Release version | 2.134 |
Release date | 9 May, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
Bug fixes
Snowflake collector: Fixed an issue in the collector when calculating sample size for harvesting column-statistics.
Release version 2.133
Details about the release
Item | Details |
---|---|
Release version | 2.133 |
Release date | 26 April, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Monte Carlo collector
Reduced the number of API calls to improve the performance of the collector runs for large Monte Carlo instances.
A new parameter --montecarlo-incident-lookback-days is now available to harvest incidents from a specified number of days before the collector run.
A new parameter --montecarlo-domain is now available to harvest resources from specified domain names.
For details about these new parameters, please see the Monte Carlo collector documentation.
Bug fixes
Fixed a NullPointerException issue that occurred while harvesting column statistics for columns with large strings.
Release version 2.132
Details about the release
Item | Details |
---|---|
Release version | 2.132 |
Release date | 18 April, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
The new ThoughtSpot collector is now available.
Grafana collector: The collector now produces catalog outputs containing a hashed namespace. This allows resources with spaces to be properly harvested.
Monte Carlo collector: The collector now has improved logging messages.
Release version 2.131
Details about the release
Item | Details |
---|---|
Release version | 2.131 |
Release date | 17 April, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Power BI collector: Performance improvements made to the Power BI collector. The collector now utilizes less memory when parsing expressions to harvest Lineage relationships.
Snowflake collector: Removed JDBC URL parsing warning messages from the collector log file. These warnings were caused by the Snowflake JDBC driver.
Collector logs: The logs now include information about the operating system on which the collector (jar) is running.
Release version 2.130
Details about the release
Item | Details |
---|---|
Release version | 2.130 |
Release date | 7 April, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | arm64: 2403a222d94cdcc1a1fbc19c921de986aa74aaf7e9ea7b729ee146d207489012 amd64: 67b534504174f86ce0a611b85e6aebd665e4a19b02008b117d37401c59ab9f4b |
Bug fixes
Power BI collector: The collector now properly handles escape characters in the directory paths of SharePoint files.
BigQuery collector: The release includes an update to how catalog resource IRIs are generated to ensure proper lineage relationships to other systems such as Tableau.
Profiling:
Profiling properly generates column histograms for string data types.
Profiling properly supports decimal values that are stored in scientific notation format.
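For readers unfamiliar with the format, a decimal stored in scientific notation is an ordinary number written with an exponent. The sketch below uses Python's standard decimal module to show the equivalence. It illustrates the number format only and says nothing about how profiling is implemented; the helper name parse_decimal is hypothetical.

```python
from decimal import Decimal


def parse_decimal(text):
    """Parse a decimal string, with or without an exponent."""
    return Decimal(text)


large = parse_decimal("1.25E+3")
small = parse_decimal("4.2e-3")

# The "f" format renders the value in plain fixed-point notation.
print(format(large, "f"))  # 1250
print(format(small, "f"))  # 0.0042
```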
Release version 2.129
Details about the release
Item | Details |
---|---|
Release version | 2.129 |
Release date | 28 March, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
Bug fixes
Microsoft SQL Server collector:
Fixed an issue that prevented profiling from working with the collector.
Addressed an issue that prevented parsing of some Microsoft SQL Server views to harvest lineage.
Monte Carlo collector: Added a page size option to the collector, which helps if a customer runs into timeouts with the current default of 5000. Set the optional --montecarlo-graphql-page-size parameter to use this option.
Tableau collector: Made an adjustment to a query in Tableau so that a warning message which previously printed Column null with id... will now show the column name rather than null.
Release version 2.128
Details about the release
Important
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.128 |
Release date | 22 March, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
Release version 2.127
Details about the release
Item | Details |
---|---|
Release version | 2.127 |
Release date | 21 March, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
dbt cloud collector is now available. Detailed documentation about the collector is available here.
Release version 2.126
Details about the release
Item | Details |
---|---|
Release version | 2.126 |
Release date | 17 March, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | arm64: 44d78be268b948226cad5fc41310202e34e30c5313f026380a00554e135ddb27 amd64: f9033e2060fea8f22beb599296800a4b4494592878e0048dd2de59bf7a308321 |
New features and changes
Snowflake collector: The collector now supports profiling for columns with values stored in scientific notation.
Bug fixes
Power BI collector: Fixed an issue with tabular files to properly handle invalid paths or http paths.
Release version 2.125
Details about the release
Item | Details |
---|---|
Release version | 2.125 |
Release date | 9 March, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | arm64: 37c33410f1e3162b11e0885600f860d3fe9a41790faeed62cf791b4289797703 amd64: 64e38233b4c47fac90e2d68eafa948a4c46fbf7f3504968e28244857298b2a46 |
Bug fixes
Fivetran collector: Updated destination identifiers to match the case for currently supported database types. Specifically, this resolves the duplicate Snowflake resource pages issue.
Snowflake collector: Fixed an issue that was causing duplicate Snowflake tag-value pairs.
Tableau collector: Updated project filtering to ensure collector harvests calculated fields which are referenced in a sheet but were not created in the sheet.
Release version 2.124
Details about the release
Item | Details |
---|---|
Release version | 2.124 |
Release date | 21 February, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | arm64: b756d2f91373067746af00d951e117fabcd930d65df2dcb27706ee05689f495c amd64: 165071142006ba509759d1e5d7fa49a57e9b09ff9d1ce665bf41a6683685d27b |
New features and changes
Amazon S3 Collector: The new Amazon S3 collector harvests metadata about buckets and objects, including the Region, Version State, Size, Last Modified Date, ACL Owner, Grantee and Grant Permission, amongst others. See all the details about this collector in this documentation.
BigQuery collector enhancements:
You can now harvest column-level lineage between views and tables, as well as more metadata about datasets, projects, tables, and views.
The collector now provides an option to do a test run to validate that the collector can authenticate to the specified source system. This is done by adding the --dry-run parameter while running the collector. If specified, the collector does not actually harvest any metadata, but just checks the connection parameters provided by the user and reports success or failure at connecting.
Bug fixes
Postgres, Snowflake, Redshift, Microsoft SQL Server collectors: When parsing view definition SQL to harvest column-level lineage, the collectors now correctly parse SQL in which tables are fully qualified in the FROM clause but not in the SELECT clause.
Power BI collector: The Power BI collector has changed the URL used as the dwec:externalUrl property from Power BI's embedUrl to Power BI's webUrl, which now allows the user to open the Report, Dashboard, or Dataset in a browser. Additionally, the collector now harvests the embedUrl from Power BI as a separate property, kos:embedUrl.
Snowflake Collector: The collector now handles the scenario in which the Snowflake JDBC driver does not provide valid default values for certain database columns.
Release version 2.123
Details about this release
Item | Details |
---|---|
Release version | 2.123 |
Release date | 13 February, 2023 |
Docker image ID |
|
New features and changes
Postgres, Snowflake, Redshift, Microsoft SQL Server collectors: Enhancements have been made to parsing of view definition SQL to harvest column-level lineage. We now support joins on named subqueries and correctly handle quoted identifiers.
Bug fixes
Snowflake collector: Fixed an issue where the sampling queries used to calculate the column statistics were failing.
Release version 2.122
Details about this release
Item | Details |
---|---|
Release version | 2.122 |
Release date | 10 February, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Manta collector:
The collector now supports Manta version r38.1.
The collector now also supports token-based authentication.
JDBC collectors: The description of the --jdbc-property parameter for JDBC collectors is updated for clarity.
The following additional collectors now provide an option to do a test run to validate that the collector can authenticate to the specified source system. This is done by adding the --dry-run parameter while running the collector. If specified, the collector does not actually harvest any metadata, but just checks the connection parameters provided by the user and reports success or failure at connecting.
Tableau collector
YAML configuration files used to configure collectors can now interpolate system environment variables and Java system properties. For details about using this feature, see this documentation.
Release version 2.121
Details about this release
Item | Details |
---|---|
Release version | 2.121 |
Release date | 2 February, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Tableau collector: Improved detection of underlying database type when a Tableau data source uses ODBC.
Bug fixes
dbt collector: Fixed an issue in the dbt collector that caused coining of IRIs that were inconsistent with IRIs coined by the Snowflake collector, which prevented the linking of database objects between dbt and Snowflake in the catalog. The application now ensures consistency of database object IRIs created by the dbt and Snowflake collectors.
Release version 2.120
Details about this release
Item | Details |
---|---|
Release version | 2.120 |
Release date | 27 January, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
BigQuery collector: The collector now harvests catalog resources representing BigQuery datasets and their associated metadata.
dbt collector:
The collector now supports key pair authentication to Snowflake allowing users to use private-public key pair for authenticating to Snowflake.
The collector now has improved detection of target database type information when that information is missing in the profiles.yml file.
Users can now use the new --snowflake-account CLI parameter to override Snowflake account information from the command line.
The help text for the --snowflake-role, --snowflake-warehouse, and --snowflake-application parameters now includes examples and case-sensitivity information.
Snowflake collector:
The collector now supports key pair authentication allowing users to use a private-public key pair for authenticating to Snowflake.
Enhancements made to parsing of Snowflake SQL dialect when harvesting column-level lineage allows for parsing of statements with copy grants.
Tableau collector: The help text now includes examples for the --tableau-project and --tableau-exclude parameters.
Power BI collector: The help text now includes examples for the --include-workspace and --exclude-workspace parameters.
Release version 2.119
Details about this release
Item | Details |
---|---|
Release version | 2.119 |
Release date | 20 January, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) | AMD64: 5a8f9e24ebe05dc027caf74075cac4ce51667271da30935640fc3c9471578445 ARM64: af0b4528e0ee097d29d286c29c803db185f616babdb2a867b6228e77efaf1cd5 |
New features and changes
A new Further Help section is added to the help available for collectors that is accessed using the -H or --help parameters in the command. It now guides users to the collectors help available on the data.world documentation site.
The collectors now emit a globally unique IRI to track collector runs.
Bug fixes
Snowflake collector: Column statistics now supports Number data type.
Release version 2.118
Details about this release
Item | Details |
---|---|
Release version | 2.118 |
Release date | 18 January, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
Bug fixes
Snowflake collector: Column statistics now supports columns with spaces in names.
Tableau collector: The Tableau collector in versions 2.113 through 2.117 had an issue that prevented it from parsing GraphQL queries. If you are using a collector version from 2.113 through 2.117, you must upgrade to 2.118 to use the Tableau collector successfully.
Release version 2.117
Details about this release
Note
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release version | 2.117 |
Release date | 12 January, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
Release version 2.116
Details about this release
Item | Details |
---|---|
Release version | 2.116 |
Release date | 10 January, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Power BI Collector: The following alternate options are added for some of the command line parameters:
For --include-user-workspace, the alternate parameter is --user-workspace-include.
For --include-workspace, the alternate parameter is --workspace-include.
For --exclude-workspace, the alternate parameter is --workspace-exclude.
BigQuery collector:
The collector now harvests additional metadata from projects, datasets, views, and tables available in BigQuery.
Column-level lineage added between Views and Tables.
Bug fixes
Snowflake collector: Fixed issue for parsing a SQL statement that contained copy grants in Views. This helps improve the column-level lineage harvested by the collector.
Release version 2.115
Details about this release
Item | Details |
---|---|
Release version | 2.115 |
Release date | 10 January, 2023 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
The documentation of the jdbc-property parameter for database collectors is improved to explain how users can specify multiple properties. This change applies to the 19 collectors that include this parameter.
A new resource, dwec:Source, is added to the catalog emitted from database collectors. It is a mechanism that allows users to render specified resource properties as read-only in the data.world catalog UI.
Power BI collector: The collector now has enhanced parsing of Power BI transformation expressions. As a result, more column-level lineage information is harvested from Power BI.
Snowflake collector: The collector now harvests table usage counts information.
dbt collector: User-defined database attributes are now enabled for the dbt collector, so that database objects referenced by dbt can be cataloged even when the profiles YAML file is missing or incomplete.
Release version 2.114
Details about this release
Item | Details |
---|---|
Release version | 2.114 |
Release date | 22 December, 2022 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New collectors
The Grafana collector is now available as a private beta release for select customers. Please contact data.world if you are interested in using this collector.
New features and changes
The following additional collectors now provide a test-run option that validates that the collector can authenticate to the specified source system. To use it, add the --dry-run parameter when running the collector. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.
Power BI
Catalog graphs (.ttl files) that are automatically uploaded to the data.world platform with -u/--upload are now compressed, enabling larger graphs to be uploaded.
Power BI collector: The collector now provides a new option, --disable-expression-lineage, to skip parsing lineage from the source expressions.
Snowflake collector: The collector can now harvest table usage and query counts. This functionality is enabled by passing --table-usage-collection. For each table in the database being harvested, it calculates the percentage of tables in the database that have been queried no fewer times than the subject table. The time period over which this analysis is performed is controlled with the --table-usage-lookback-days option (that is, the number of days prior to the collector run during which queries of each table are tallied), which defaults to 7.
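As an illustration of the table usage calculation described above, the following Python sketch computes, for each table, the percentage of tables queried at least as many times as that table. It is not the collector's implementation: the function name and the sample query counts are hypothetical, and it assumes the subject table itself is included in the comparison.

```python
def usage_percentages(query_counts):
    """Map each table to the percentage of tables queried no fewer times than it.

    query_counts: dict of table name -> number of queries tallied in the
    lookback window (hypothetical input shape, assumed for this sketch).
    """
    total = len(query_counts)
    return {
        table: 100.0 * sum(1 for other in query_counts.values() if other >= count) / total
        for table, count in query_counts.items()
    }


# Hypothetical tallies over a 7-day lookback window.
counts = {"orders": 120, "customers": 45, "audit_log": 45, "archive": 0}
percentages = usage_percentages(counts)
```

Under this reading, the most heavily queried table receives the lowest percentage, and a table that was never queried receives 100.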
Bug fixes
Snowflake collector: Fixed an issue with SQL parsing in Snowflake for windowed aggregate functions.
Power BI collector: Fixed an issue with the Power BI expression parsing related to joins in source expressions.
Looker collector: Fixed an issue in the Looker collector that caused an abnormal termination of the collector run with certain Looker views.
Release version 2.113
Details about this release
Item | Details |
---|---|
Release version | 2.113 |
Release date | 12 December, 2022 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
The following collectors now provide a test-run option that validates that the collector can authenticate to the specified source system. To use it, add the --dry-run parameter when running the collector. If specified, the collector does not harvest any metadata; it only checks the connection parameters provided by the user and reports success or failure at connecting.
Databricks
Db2
Denodo
Dremio
Generic JDBC
Hive
Infor ION
MySQL
Oracle
Presto
Salesforce
SQL Anywhere
Vertica
Power BI collector: Updated the Power BI collector to harvest metadata for Dataflows.
Databricks collector: Updated the Databricks collector driver version to 2.6.32. Drivers available here
dbt collector: Updated the dbt collector to harvest meta values (as key-value pairs) for dbt resources.
Release version 2.112
Details about this release
Item | Details |
---|---|
Release version | 2.112 |
Release date | 6 December, 2022 |
Docker image ID (use this to verify the integrity of the Docker image.) |
|
New features and changes
Added the following options for the Snowflake, Redshift, PostgreSQL, and MS SQL collectors:
--dry-run: If specified, the collector does not harvest any metadata; it only checks the database connection parameters provided by the user and reports success or failure at connecting.
--enable-column-statistics: Enables harvesting of column statistics (that is, data profiling).
--sample-string-values: Enables harvesting of sample values and histograms for columns containing string data.
--target-sample-size: Controls the number of rows sampled for computation of column statistics and string-value histograms.
Release version 2.111
Details about this release
Item | Details |
---|---|
Release date | 29 November, 2022 |
Release version | 2.111 |
Docker image ID (use this to verify the integrity of the Docker image.)
|
|
New features and changes
data.world now produces images for the arm64 architecture (in addition to amd64). The addition of arm64 means that dwcc images run seamlessly on M1 Macs. As a result of this change, two hashes are available per release from this release onward.
Release version 2.110
Details about this release
Note
This release was for internal improvements and has no customer impacting changes.
Item | Details |
---|---|
Release date | 12 November, 2022 |
Release version | 2.110 |
Docker image ID (use this to verify the integrity of the Docker image.) | amd64: 98583ecda023782df1e08a0f2347a536e239186dcca3936d16c67ae1f6aad0f6 |
Release version 2.109
Details about this release
Item | Details |
---|---|
Release date | 10 November, 2022 |
Release version | 2.109 |
Docker image ID (use this to verify the integrity of the Docker image.) | amd64: 6602f313506e5eb3ea74c296994f7e4d7bd56845c6f2b35e6d1d4cde5f402832 |
New features and changes
Snowflake collector: Fully qualified names of Snowflake policies are now written to the title property instead of the description property.
Release version 2.108
Details about this release
Item | Details |
---|---|
Release date | 8 November, 2022 |
Release version | 2.108 |
Docker image ID (use this to verify the integrity of the Docker image.) | amd64: b81a221abff982a356a21c4430f80da1e4459f0c3ead2d4f4f51a8f1d45c5604 |
New features and changes
The --post-process-sparql CLI option is now available for all remaining collectors (it was made available for some collectors in release version 2.107). This option allows the user to pass in a SPARQL query to post-process the catalog graph created by the collector before it is written to the filesystem and/or uploaded to the data.world API.
BigQuery collector: The option for specifying the BigQuery credential file can no longer be given as -c. It must be specified as --credentialFile.
Release version 2.107
Details about this release
Item | Details |
---|---|
Release date | 3 November, 2022 |
Release version | 2.107 |
Docker image ID (use this to verify the integrity of the Docker image.) | amd64: 4a33db022e92488914d1f088d4041a31d76883706ae13a88bf1a0e8aa67eaa51 |
New features and changes
Added the --post-process-sparql CLI option to some collectors. This option allows the user to pass in a SPARQL query to post-process the catalog graph created by the collector before it is written to the filesystem or uploaded to the data.world API.
Bug fixes
Fixed an issue where MS SQL Server database objects referenced from Power BI did not always link to those objects harvested by the SQL Server collector due to mismatched IRIs.
Release version 2.106
Details about this release
Item | Details |
---|---|
Release date | 30 October, 2022 |
Release version | 2.106 |
Docker image ID (use this to verify the integrity of the Docker image.) | amd64: 0ac22da04737fbcaaac0da9d076eaf92e3cdd870c85544dd16a556f54a8900a8 |
New features and changes
Google BigQuery collector: The collector is updated to coin IRIs for database objects that align with IRIs coined by other collectors.
dbt collector: The collector now writes catalog records for each Snowflake tag and policy.
Bug fixes
Fixed a defect in which boolean properties that appeared in the global_options section of a dwcc configuration file were not properly recognized.
Release version 2.105
Details about this release
Item | Details |
---|---|
Release date | 28 October, 2022 |
Release version | 2.105 |
Docker image ID (use this to verify the integrity of the Docker image.) | amd64: 427443cadbc21a3e26f095a4c054f6193bf3c2b96d257cdb22b57abd061bad68 |
New features and changes
Power BI collector: The collector now supports including specific workspaces for cataloging via the --include-workspace parameter. The collector continues to allow exclusion of specific workspaces with --exclude-workspace. Use of --include-workspace takes precedence.
dbt collector:
The collector now supports harvesting of dbt projects/artifacts that specify Snowflake as the target database.
The collector now correctly coins database object (e.g., database, schema, table, column) IRIs that align with IRIs coined by the JDBC collectors. Previously, if the case used for identifiers in dbt artifacts did not match the target database’s default collation, the IRIs would not align.
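One possible reading of the --include-workspace precedence rule for the Power BI collector is sketched below in Python. The function and workspace names are hypothetical, and the sketch assumes that when any workspace is explicitly included, exclusions are ignored; the collector's actual behavior may differ in detail.

```python
def select_workspaces(all_workspaces, include=(), exclude=()):
    """Return the workspaces to catalog; a non-empty include list wins over exclude.

    Assumption for this sketch: include and exclude are lists of workspace
    names, and "takes precedence" means exclude is ignored when include is set.
    """
    if include:
        wanted = set(include)
        return [ws for ws in all_workspaces if ws in wanted]
    blocked = set(exclude)
    return [ws for ws in all_workspaces if ws not in blocked]


# Hypothetical workspace names.
workspaces = ["Finance", "Marketing", "Sandbox"]
only_included = select_workspaces(workspaces, include=["Finance"], exclude=["Finance", "Sandbox"])
without_excluded = select_workspaces(workspaces, exclude=["Sandbox"])
```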
All other versions
10-25-22 Collector v2.104
hash: 0b01f8c379e52f3167577a6fd1e5ad2f8d2f3d73871797ae3859b79f83bf5c29
Updated the Monte Carlo collector to add a --bigquery-credentials-file option, standardizing the name with the dbt collector's --bigquery-credentials-file option (note that --big-query-credentialFile still exists in Monte Carlo; this is a new alias for the same option).
The Snowflake collector now harvests Tags, Masking Policies, and Row Access Policies, and associates these resources with the database objects to which they apply. New CLI options enable this harvesting: --tag-collection and --policy-collection.
10-14-22 Collector v2.103
hash: 0f4f021c4c8fc17c7f47618ef9942e255327eef0d4c749e453a06e1d0e96760b
Updated SQL Server collector to harvest intra-database lineage from views.
Updated the log messages for missing files that are not required for the dbt collector to run.
Added table name to warning messages in Tableau collector in addition to the table ID.
Added parent-child relationship between projects in Tableau collector when the parent project is not included in the filtered projects.
Added pagination for certain queries in Tableau to prevent the result hitting the max node limit.
Updated automatic catalog upload functionality to accommodate large catalog graphs.
9-30-22 Collector v2.102
hash: 4dd8a1bdc776f0e8eb352954298842867c7873224d658f2abd8faefe31c40a76
Updated the Tableau collector to accommodate changes in the Tableau Metadata API that were preventing detection of lineage relationships between Tableau fields and underlying database objects.
Updated the AWS Glue collector to handle an error with jobs that have a space or other invalid characters in their paths.
Updated the Databricks collector to include the UserAgentEntry property in the JDBC connection.
The collectors now emit the collector version to the logs.
Added a fix to SQL parsing for window aggregate functions (e.g., SUM(X) OVER (PARTITION BY Y ORDER BY Z..)).
9-19-22 Collector v2.101
hash: 2be46a6268e34acceedd5b80412787d10f732ad1a2f1ceb83c6d5ce2fe819457
Added a filtering feature to filter Tableau fields by project in Tableau collector.
Fixed an intermittent authentication issue associated with harvesting metadata from a single site with Tableau collector.
Added a log message for missing job script in AWS Glue collector.
Enhanced harvesting of column-level lineage from database views, including handling SQL SELECT statements missing a FROM clause, and updated the list of Snowflake keywords passed to functions.
9-9-22 Collector v2.100
hash: 946a0c51c091e74d6043dea1450a1ac818546b040e702e91526da185297a2858
Fixed the Fivetran collector so that it doesn't produce "blank" nodes (no id or name)
Added a change to use log_level rather than log-level.
9-2-22 Collector v2.99
hash: 68e4c4d6a6b40cb91a8e574a1f106c9c20ba2f1156a93f5f871b0284e975a766
Fixed an issue in Power BI with the new metadata API calls.
9-1-22 Collector v2.98
hash: 228090b0af31681952b7ccd5abef9beaf070450692501fef527bff8ca32280cb
Added harvesting of column-level lineage in the dbt collector, for dbt projects that target one of the databases for which intra-database lineage is supported (i.e., Snowflake, Redshift, and PostgreSQL).
8-29-22 Collector v2.97
hash: 0a2134ef29a057b0c003ab353c14f034e65702b298a450f01fafcea5b6e8c1ea
PostgreSQL can now be cataloged using either catalog-postgres or catalog-postgresql for the command.
Microsoft SQL Server collector now harvests SQL Server extended properties for databases, schemas, tables, and columns.
Column-level lineage harvesting in the Snowflake, Redshift, and PostgreSQL collectors now properly harvests lineage from views whose SQL statements include comments starting with “--”, and also statements with inline subselects.
The dbt collector harvests process (activity) and model (agent) metadata using PROV-O qualified derivations.
8-17-22 Collector v2.96 (no 2.95 release)
hash: a09e365296d57965385563569c6c58a6f706da1a4c1c6d711141aabd316d8629
Tableau collector now supports multiple --tableau-project options, allowing the user to include multiple projects in the same collector run.
Tableau collector no longer associates Custom SQL Table resources as part of a database.
The Collector no longer includes a bundled JDBC driver for Salesforce. Please contact the data.world support team for assistance in obtaining an appropriate JDBC driver.
8-15-22 Collector v2.94
hash: ac21b2f728b79e3dff38c2a395a81c0b0b1558979b8385747c6d00b76e1d6724
Enhancements to the Tableau collector for project filtering and additional logging.
PostgreSQL collector: Added an additional triple for linking tables to their database.
8-2-22 Collector v2.93
hash: afdecd160fd38e3db565cb14db3805fed05fa86b5c3a70662d0c8f0b0d10799f
Includes some internal dependency updates.
Enhancements to the dbt collector to validate the input profile.yaml file.
7-29-22 Collector v2.92
hash: 9b87934376246cd3926bfe413d36f2a7a0f2e7d848d7f2e68380d6035fe276f6
Enhancement to the Tableau collector to add a retry if a GraphQL query fails.
Enhancement to the collectors to add a check at the beginning of a collector run to ensure the output directory exists (if the -o/--output option is used). If the output directory doesn't exist, the collector logs an error and stops.
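The start-of-run output directory check can be pictured with the short Python sketch below. It only illustrates the idea of failing fast: the function name and log message are hypothetical and are not taken from the collector.

```python
import logging
import os
import tempfile


def output_dir_exists(path):
    """Return True if the output directory exists; otherwise log an error."""
    if not os.path.isdir(path):
        logging.error("Output directory does not exist: %s", path)
        return False
    return True


# The system temp directory exists; the second (hypothetical) path does not.
ok = output_dir_exists(tempfile.gettempdir())
missing = output_dir_exists(os.path.join(tempfile.gettempdir(), "dwcc-sketch-missing-dir"))
```

A caller would typically stop the run (for example with sys.exit(1)) when the check returns False, before any harvesting starts.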
7-21-22 Collector v2.91
hash: 708b34b19b2695d14d5a74a8281d5365cb659c4eeeebb791b3ebf2aa2e4d6686
Enhancement to the Tableau collector to reauthenticate if the Tableau API reports failed authorization during the collector run.
Released the new Fivetran collector (catalog-fivetran).
7-12-22 Collector v2.90
hash: bf94b0431b5a99dd95f485a8a48f202ea138f103b54873b4440b5080d86d529a
Added the --include-information-schema parameter for the Snowflake and SQL Server collectors; these collectors no longer catalog the information schema when --all-schemas is specified, unless the user also specifies --include-information-schema.
Improved handling of manifest JSON structures with some nulls in the dbt collector.
Added reporting on user access issues during parsing/resolution for the Snowflake collector.
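The interaction between --all-schemas and --include-information-schema in this release can be sketched as a simple filter. This Python sketch is illustrative only: the function and its arguments are hypothetical, and it assumes the information schema is named INFORMATION_SCHEMA and is matched case-insensitively.

```python
def schemas_to_catalog(available, all_schemas=False, requested=(),
                       include_information_schema=False):
    """Pick which schemas to catalog (hypothetical helper for illustration)."""
    if not all_schemas:
        # Without --all-schemas, only explicitly requested schemas are kept.
        wanted = set(requested)
        return [s for s in available if s in wanted]
    # With --all-schemas, skip the information schema unless it is opted in.
    return [
        s for s in available
        if include_information_schema or s.upper() != "INFORMATION_SCHEMA"
    ]


available = ["PUBLIC", "SALES", "INFORMATION_SCHEMA"]
default_all = schemas_to_catalog(available, all_schemas=True)
with_info = schemas_to_catalog(available, all_schemas=True, include_information_schema=True)
```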
7-9-22 Collector v2.89
hash: 848e38708b832c703652dc45d148e471a2341bce7f6ec159c2471f287a8d3620
Updated the Tableau collector to print a clear log message when authentication expires during a collector run.
Updated the Tableau collector to allow optimized serialization of API requests under JDK 17.
7-1-22 Collector v2.88
hash: 8534190cb3f0f93bd2a326abd54086e89eb38ece8180bf0487486dc66242d6c8
Significant updates to the Power BI collector. As of this release, the Power BI collector outputs different classes than the previous version. The collector now emits information about where it is sourcing its data.
Internal developer and testing improvements
The MANTA collector is now more specific about the concepts that it emits about Informatica PC.
6-24-22 Collector v2.87
hash: 7860e33213ba90783851cd7f7e6529ee99a5f261ae086d3a7038938c6f290ae6
The information schema collector now explicitly supports Oracle.
Added enhancements to the dbt collector to harvest dbt snapshots and sources.
6-22-22 Collector v2.86
hash: 6fdae2dd70896e402ca648701bcd48210a8fd5979230c958b0dc06030bd7b1ec
For collectors that take API endpoint URLs, the data.world Collector now adds a trailing slash to the URL if one is needed and not specified by the user.
A new command-line option, --warehouse, is available for the Snowflake collector. It allows the user to specify which warehouse to use when connecting to Snowflake.
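The trailing-slash handling for API endpoint URLs amounts to a small normalization step, sketched here in Python. The function name and the example URL are hypothetical; the collector's own logic for deciding when a slash is "needed" is not documented here.

```python
def ensure_trailing_slash(url):
    """Append a trailing slash to the URL unless it already ends with one."""
    return url if url.endswith("/") else url + "/"


# Hypothetical API endpoint URLs.
added = ensure_trailing_slash("https://bi.example.com/api")
unchanged = ensure_trailing_slash("https://bi.example.com/api/")
```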
6-18-22 Collector v2.85
hash: aaa6e55bf19af7ef37f1ab80ad28522af77a6ff286ef616085d92ab51f7d7899
Added a new dbt collector to the data.world Collector; the legacy collector is still available.
Fixed an issue with auto-uploaded log files, in which not all log messages were being written out.
6-15-22 Collector v2.84
hash: 2e128cd3c89ffc8c35fbad12f6ee4ba7e6e5cdf9bfcf991fac78e8033d5d17d0
Looker collector now emits resources for Looker Views and relates the Looker Dimensions and Looker Measures that are configured within those Views.
Improved handling of unexpected database types encountered when cataloging Tableau.
6-3-22 Collector v2.83
hash: 9487027423a076231cec76f5679f044493a3d75032882c4ca0e5cf1c0304e6cf
Further improvements in handling of SQL ORDER BY, GROUP BY, WHERE, and HAVING clauses when harvesting intra-database lineage from database views.
6-2-22 Collector v2.82
hash: 22459e3d3a2a38f448d4e56137ed4ecd05170767b5a682dd4870135cceff23c2
Corrected coining of IRIs in catalog graphs emitted by the Tableau collector.
Improved logging in the Tableau collector to detect unknown linked database types.
Improved harvesting of lineage between database views and referenced columns, including support for columns in SQL ORDER BY, HAVING, GROUP BY, and WHERE clauses, and parsing of a wider range of column expressions.
the data.world Collector v2.81 - INTERNAL RELEASE
5-24-22 the data.world Collector v2.80
Hash: be5a85c754d54328accabe332dec55ce507baddbe68d2fe9e29a211e9ea1420f
With this release, the data.world Collector now requires Java 17. If you run the collector from within Docker this change will not affect you. If you run the data.world Collector from a .jar file, you will need to upgrade your JRE to 17 to run DWCv2.80 and greater.
Added the --disable-lineage-collection parameter to enable users to turn off cataloging lineage for PostgreSQL, Redshift, MS SQL Server, and Snowflake.
5-13-22 the data.world Collector v2.79
Hash: 5b548c82b96ad5e5dbd4770adff205c9d07cac3c5f949882d7d9381240366ddb
The MANTA collector can now accept OAuth tokens for MANTA authentication (for harvesting metadata from MANTA version R35 and above).
Released a new collector, powerbigov, that allows only a tenant ID for authentication (not a username and password) and connects to the government Power BI API URLs.
5-11-22 the data.world Collector v2.78
Hash: 71edd8ff7a4c3ed8a91eaf36d59c8e2745b7a76f8666b5750cbee8205021c9c6
Added some small Tableau collector enhancements.
New PowerBiGov collector with specific endpoints for .gov customers. This collector does not accept a username or password.
For PowerBI, a new way to authenticate is available. A user can now enter a tenant ID with a client id and a client secret to authenticate, in addition to using a username and password.
For both PowerBi and PowerBiGov, when using the tenantid, secret and client id authentication method, this collector no longer emits information about PowerBI Apps.
4-27-22 the data.world Collector v2.77
hash: 4bed848791cfa9e46c9db4a78c7a593bb1c986900dc6fcfcd4255ddce1528579
Fixes an issue with the Snowflake collector that prevented the bundled jdbc driver from being found. Any users working with the data.world Collector 2.76 should update.
4-22-22 the data.world Collector v2.76
hash: 30e60a4434ee64d2981b40eb2dc92506da3d367eab22bc0bca0c61bdd44a3f02
The Snowflake collector harvests some intra-database lineage information from database views.
Improved the host mapping in the Manta collector.
4-7-22 the data.world Collector v2.75
hash: 1a59dbb3ff8679fb6ee22eadaeb04ccdb28c5660be029e78fbc96403ae33096f
The MANTA collector now emits resources for file sources and targets and their directory structure. It also emits sources and targets as files.
4-1-22 the data.world Collector v2.74
hash: 219428f6a72be91205408d5cb3f8cc8b27e1a9a4df0208e4cacb8fbaa1352f90
The Tableau collector now emits “column-level lineage”.
Improved styling of the data.world Collector command-line errors
Updated command-line options for Datakin and Marquez.
3-16-22 dbt collector v.05
This version adds a third command-line argument to specify an output file name.
3-8-22 the data.world Collector v2.73
hash: 119daf987dcfad25db599e1c1affedf17a35ff2aa002d0618d642eb309cebaaf
Permalinks to Looker explores included via externalUrl
Improvements to datakin/marquez collectors
Tableau collector now emits resources for Tableau Projects, allowing us to establish full relationships between projects and the workbooks and views that they contain
Monte Carlo data collector now emits data quality information using enhanced dwec ontology concepts
Looker collector now emits descriptions for measures and dimensions
MANTA collector now emits Snowflake resources found in MANTA scans
3-1-22 the data.world Collector v2.72
Hash: 62d156aca58ec92513e8d6490f00fd10ee52dfb7a65f71c20c6a988c938dfddd
[BUGFIX] Invalid prefix when using --base option
Update the data.world Collector transform to add catalog events to specific collectors
Added a Snowflake Sensitive Data Discovery collector
Sync CLI options between collector types
Validation of CLI options for the data.world Collector
Improvements to the data.world Collector CLI
Update the MonteCarlo Collector to use the new Data Quality Ontology
2-17-22 the data.world Collector v2.71
Digest: 03fc3df90ae63896d62ea22e00688f42cacf5b76d0f47691c06c104736680b2a
Bug fix for Marquez collector
Bug fix for Manta collector
2-9-22 the data.world Collector v2.70
Digest: 06bb747c4d7705c1e44664de7854158d87468316bab549ec5604b0a075380c69
Preview images for Tableau assets are now harvested much more efficiently, and the resulting image data in the catalog graph are much smaller, reducing catalog harvest run time and enabling image objects to remain within platform constraints during ingest.
Fix for unexpected column type errors in BigQuery collector
2-8-22 the data.world Collector v2.69
Digest: 5ab9b97d5f8f4568613438a9e52b0bdc12974f8d6edd0dab374a281c4982c737
Created new collectors for Marquez and Datakin
Added schema information to the Tableau collector outputs
2-4-22 the data.world Collector v2.68
Digest: 23674ee02a6b725d5f9a453615dc507286da2ee606dca83c386472f3aa36d118
The Tableau collector now accepts Tableau “Personal Access Tokens” for authentication, via the new CLI options --tableau-pat-name and --tableau-pat-secret.
Fixed an issue with mis-identification of views as tables in BigQuery.
2-2-22 the data.world Collector v2.67
Digest: 032867c9c52c8d46dc0b90a61a128be65ecec1440bb0adccb8b0d1b249b4e351
Fixed an issue with server name identification in Manta.
1-26-22 the data.world Collector v2.66
Digest: fa9ae2eb3d68375a3ff01ac7bde98fd36f372b84dce0d411444146ea9566b47b
With this release the Athena collector is no longer a JDBC collector--we harvest metadata by accessing the Athena API directly, rather than going through a JDBC driver. This means that it is no longer necessary to provide a JDBC driver when running the collector.
1-10-22 the data.world Collector v2.65
Digest: ed08cdd21a374c30456de0989076f5180bc4187ca998358b051807e521fd44e6
This release adds a new option for the MANTA collector, --manta-max-parallel-scenarios. Specifying this option and passing an integer value will configure the MANTA API to export the specified number of scenarios in the MANTA graph in parallel. The default value is 4; adjusting this up or down can improve performance.
1-5-22 the data.world Collector v2.64
Digest: 45b72798b0602885790388331a75db1f4286b15bf57b21f30f416eda79041571
This release upgrades the data.world Collector's dependency on the Apache Jena RDF library to version 4.2.0, which addresses security vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-39239.
12-23-21 the data.world Collector v2.63
Digest: sha256:eb4208c914269c793a5e2143d59a9982e7b087c5da1c17dd075e02a326e64a3e
The Athena JDBC driver is no longer bundled with the data.world Collector as we have discovered that the Athena driver itself has a dependency on a vulnerable version of log4j. Customers that use the data.world Collector Athena collector will now need to supply their own driver and put it in the JDBC driver directory (as is done with other collectors for which we don’t distribute a driver).
12-15-21 the data.world Collector v2.62
Digest: sha256:2cd579e09f4eee94e141e8cf7e4e40e9a9b8803029df1be7112d67d62ef33b9e
The Oracle collector now supports connecting to the database via SID (instance ID) or Service Name. Service Name is the default. If a connection via SID is desired, pass the SID as the value of the -d/ --database option and add the --oracle-sid-mode option (flag).
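The difference between the two connection modes can be seen in the shape of the JDBC URL. The Python sketch below builds both forms using the standard Oracle thin-driver URL formats; the function name, host, and database values are hypothetical, and whether the collector assembles its connection string exactly this way is an assumption.

```python
def oracle_jdbc_url(host, port, database, sid_mode=False):
    """Build an Oracle thin-driver JDBC URL for a SID or a Service Name.

    sid_mode mirrors the --oracle-sid-mode flag; database mirrors the value
    passed with -d/--database (an assumption made for this sketch).
    """
    if sid_mode:
        # SID form: host:port:SID
        return f"jdbc:oracle:thin:@{host}:{port}:{database}"
    # Service Name form (the default): //host:port/service_name
    return f"jdbc:oracle:thin:@//{host}:{port}/{database}"


service_url = oracle_jdbc_url("db.example.com", 1521, "sales_svc")
sid_url = oracle_jdbc_url("db.example.com", 1521, "ORCL", sid_mode=True)
```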
12-13-21 the data.world Collector v2.61
Digest: sha256:bd0ba96208d714ecef4131867cf5d16372be0a33f416c1d6bd01f132c8517323
The information schema collector has been modified so that the files table_constraints.csv and constraint_column_usage are now optional, not required.
12-10-21 the data.world Collector v2.60
Digest: sha256:7fd825bfe7d2f99c9a1298ad26bc1934c9657cc7c5868dd093844344d18fc7b7
Updated the BigQuery collector to support current Google Cloud API enhancements.
Added a new Information Schema Collector. This collector runs via the catalog-information-schema command and catalogs four CSV files that are provided to the collector via the --csv-file-directory parameter, rather than connecting to a database. This collector is an option for customers with database setups that do not allow them to authenticate or establish connections to their database via the standard data.world Collector collectors.
12-2-21 the data.world Collector v2.59
Digest: sha256:051f76748be1c6cf2c7557600dde71a39e1b822c9e49120881ce938f1c8c2b80
Verified the Manta collector works with MANTA R34.
Released the config file command.
Modified the Tableau collector to remove schema and database names from table names.
Updated the BigQuery collector to support cataloging all datasets in a project at once by default, and to be able to use CLI options to select specific datasets in a project as well. With this last change, the --dataset param is no longer required. The help text has been updated with new messaging to reflect these changes.
11-10-21 the data.world Collector v2.58
Digest: sha256:82ebc1cec46f70de000aa94695359bd28d65c2782afc362c9ce14fadc04eae07
Added a new collector for Hive (as an alternative to catalog-hive) that uses only the Hive metastore--it does not connect to the Hive server directly.
The PowerBI collector now harvests workspaces and identifies other assets as being in workspaces
The data.world Collector now emits “catalog events” into the catalog graph. These capture details about the cataloging process itself, including selected configuration options with which the Collector was run, and summary statistics about the catalog. The ingest process will soon extract this information from catalogs at ingest time and send it to Segment for downstream analysis.
11-1-21 the data.world Collector v2.57
Digest: sha256:606f7cfbe60bf56b4c2ecd5fb3902d4de621e31ae76ad78e68c56c788f81e5e6
Fixed an issue in the Tableau collector in which Custom SQL Table objects without an associated database were not handled correctly.
10-27-21 the data.world Collector v2.56
Digest: sha256:335f7e110a9506d95dff05971492e6509fb8537e74f9275d04dcf9e2427df0f0
Added new CLI options to the Salesforce collector so that it can handle sandbox environments and custom login domains that customers might have.
10-25-21 the data.world Collector v2.55
Digest: sha256:c60ae69edc88b8801be833d578ef5dca73b6302646be9b30d31ccdfd7444288a
This release updates the BigQuery collector to handle fields in BigQuery tables for which the BigQuery API returns null type.
10-5-21 the data.world Collector v2.53
Digest: sha256:59c960d525e66e77d08dd34fd58c9b5027334a4bd2271f1f059370ae006a4b0b
Enhancements to the MANTA collector to harvest additional lineage information from MANTA scans (lineage from Informatica PowerCenter in particular)
Tableau collector enhancement to provide a better warning to the user when an obsolete version of the Tableau API is specified
9-29-21 the data.world Collector v2.52
Digest: sha256:915e4e91841001f80a84a65fcd76350b9a1d53f4e31678bb0e628d32beab94a1
Fixed an issue with the handling of certain fields and database information when the Tableau collector was run with a non-admin credential.
9-28-21 the data.world Collector v2.51 (internal)
Digest: sha256:261c5bf33b2ae38cbda35a346fcb37c56bbf8ebfb773f328deb9140efba1c8bf
Fixed an issue in the Tableau collector with handling views/workbooks that exist outside of a project.
9-28-21 the data.world Collector v2.50 (internal)
Digest: sha256:b407c629247f36afac3869eb8320464fce8caeb2865dd79811882b54ef94d1b5
Fixed an issue with the Tableau collector to handle workbooks that exist outside of projects.
9-24-21 the data.world Collector v2.49
Digest: sha256:397e78867f41aaa393ff69f42b0fa524fdcad662ddd027925cf27f80497b24ce
Added a collector for Salesforce (catalog-salesforce).
Fixed an IRI mismatch issue in the Tableau collector when running on Tableau instances with a Snowflake data source.
9-18-21 the data.world Collector v2.48
Digest: sha256:c36755489b6235408aa4e639e6e184cab027a32a34e3b8ca369c3c6b3c4bff96
Made internal improvements to the Tableau collector to enable more efficient querying of the Tableau Metadata API.
Fixed an issue in the MANTA collector in which certain missing data in the MANTA lineage graph caused an exception.
9-10-21 the data.world Collector v2.47
Digest: sha256:219edfa247929e15d7c4e2be99ef890b2487c398abc1a23b2f85b3de11812be3
Fixed an issue in the Reltio collector that occurred when a Reltio configuration was missing certain objects.
Added a collector for Databricks (catalog-databricks)
9-8-21 the data.world Collector v2.46
Digest: sha256:e48cba45b457e076714d94d3a83d1164cb892864213732b3b2b334c041ff178a
Fixed an issue with creation of resource IRIs by certain collectors when the user chooses version 1 minting
Updated BigQuery collector to enable integration with data.world platform / connection manager
Fixed an issue with the MANTA collector in which certain large MANTA scans caused a numeric overflow during json de-serialization
Updated Reltio collector to include information about survivorship groups in the emitted catalog
8-24-21 the data.world Collector v2.45
Digest: sha256:77f4c784b1d0166cf3bb87903696528f712fbe6aee1d4cb7e60097a0f494c7de
This release fixed an issue with JDBC drivers not being loaded by the Athena collector.
Added a collector for Reltio configurations (catalog-reltio).
the data.world Collector v2.44
Digest: sha256:47c1bb38b88c25801adf1f765e23c63637d15a60ae11fca8d63b53a8cd4755b2
Fixes an issue with URLs for sheets and dashboards that exist in Tableau Online or in Tableau Server within a site other than the default site.
the data.world Collector v2.43
Digest: sha256:696deaad59d2948a6adf3c275a90539cbf87057c93de9ee94d911fe105c574ce
Added additional datetime fields for Looker objects, typed as xsd:dateTime.
Fixed an issue caused by an undocumented change in Tableau Online’s REST API when using the Tableau collector to harvest metadata from Tableau Online.
the data.world Collector v2.42
Digest: sha256:e6bc353ea4b2ec3486b54d4e9280856d328d93f5d406e367c0c50303cde93704
The generic JDBC collector now harvests the database name when cataloging InterSystems Caché databases.
Running the Snowflake collector with the -A / --all-schemas option now harvests metadata from all available schemas, as with other collectors.
the data.world Collector v2.41
Digest: sha256:bb79aa8afd19bf35b4b7e75840c21598702ec1d74b5f8640cc72a6758a3a0bc9
Fixed an issue with permalinks to objects in the MANTA collector.
the data.world Collector v2.40
Digest: sha256:44dd710a49a1500863f49e2f2e4ef261a45cdc6c7354702fe8e764210c27293b
Added support for Looker folders and additional attributes to the Looker metadata collector.
Added the ability to preview images to the Tableau metadata collector.
the data.world Collector v2.39
Digest: sha256:992671530f7483bfeb8a2aab52880a524b7df79caf427b373bd825115d71f4dc
Fixed an issue with the handling of certain special characters in catalog resource IRIs.
The --schema option for JDBC collectors can now be specified multiple times to enable the cataloging of multiple schemas in a single catalog.
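As a rough illustration of what a repeatable option looks like from the caller's side, the sketch below uses Python's argparse to accumulate every occurrence of --schema into a list. Only the --schema option name comes from the note above; the parser, function names, and schema names are hypothetical, and this is not the collector's own implementation.

```python
import argparse
from typing import List, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration; not the collector's real CLI.
    parser = argparse.ArgumentParser(prog="repeatable-option-sketch")
    # action="append" collects each occurrence of --schema into one list.
    parser.add_argument("--schema", action="append", default=None,
                        help="schema to catalog; may be given more than once")
    return parser


def parse_schemas(argv: Sequence[str]) -> List[str]:
    # Return the schemas in the order they were given, or an empty list.
    args = build_parser().parse_args(list(argv))
    return args.schema or []


if __name__ == "__main__":
    # "sales" and "hr" are made-up schema names.
    print(parse_schemas(["--schema", "sales", "--schema", "hr"]))
```

Each occurrence is kept in the order given, so a single run can cover several schemas.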
the data.world Collector v2.38
Internal release
the data.world Collector v2.37
Digest: sha256:6a84217fa33df75d67ce51c486a90a802a8313a3432835abb55fffb5f1d3afc7
Updated the Tableau collector to paginate additional GraphQL queries to avoid hitting Tableau Metadata API limits.
Updated the Hive2 collector to capture table-level metadata from the Hive metastore.
Updated the Tableau collector to allow the user to exclude specified Tableau objects from the catalog.
the data.world Collector v2.36
Digest: sha256:8dd9793f3b0e74adcd7e7bc153f06b8c3098470217fb07af4336dde611269671
Improved the error messages produced when using a config file to run the data.world Collector.
Running catalog-postgres and catalog-redshift in the same config file is now disallowed because the two collectors use incompatible JDBC drivers.
Improved error handling throughout the data.world Collector.
Improved the representation of Tableau data source names in Tableau catalogs.
Improvements to the MANTA collector.
the data.world Collector v2.35 Changes in this release:
Upgrade of the Denodo collector to Denodo 8
Handle edge case of very large field values embedded in MANTA’s exported artifacts
Support for sites
Handle edge case of stored procedure columns in MANTA
the data.world Collector v2.34 This release includes:
Enhancements to Domo collector output
Testing improvements
A minor Tableau collector enhancement
Fix for an issue in the Tableau collector in which column fields sometimes did not properly identify the Tableau table from which they sourced their data
Improvement to the presentation of Domo catalogs in the platform UI
Changes to the Docker Hub repository where we house images containing non-released versions of the data.world Collector. Previously we called these “beta” releases; we now call them “release candidates”. The new repository is datadotworld/dwcc-rc and the image tags are x.y-rc-z, where x.y is the next expected Collector release and z is an increment.
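For anyone scripting against the release-candidate repository, the x.y-rc-z tag format can be parsed and ordered with a few lines of Python. This is a minimal sketch: the helper names and the example tags are made up for illustration, and it assumes x, y, and z are plain non-negative integers.

```python
import re
from typing import NamedTuple


class RcTag(NamedTuple):
    major: int      # x in x.y-rc-z
    minor: int      # y in x.y-rc-z
    increment: int  # z in x.y-rc-z


# Assumption: all three parts are plain decimal integers.
_RC_PATTERN = re.compile(r"(\d+)\.(\d+)-rc-(\d+)")


def parse_rc_tag(tag: str) -> RcTag:
    # Parse a release-candidate image tag such as "2.35-rc-3".
    match = _RC_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {tag!r}")
    return RcTag(*(int(part) for part in match.groups()))


def latest_rc(tags):
    # Pick the newest tag by comparing (major, minor, increment) numerically.
    return max(tags, key=parse_rc_tag)


if __name__ == "__main__":
    # Hypothetical tags, not real published images.
    print(latest_rc(["2.35-rc-2", "2.35-rc-10", "2.34-rc-7"]))
```

Comparing the parsed integers rather than the raw strings keeps 2.35-rc-10 ordered after 2.35-rc-2.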
the data.world Collector v2.33 Adds support for harvesting intra-database lineage from MANTA scans, and accommodates changes in MANTA R32 (aka 1.32). We no longer support MANTA versions earlier than MANTA R32.
the data.world Collector v2.32 This release adds collector support for Vertica databases.
the data.world Collector v2.31 Issued a fix to ensure alignment of identifiers for databases referenced by the Tableau and Looker collectors.
the data.world Collector v2.30 Introduced config file-driven configuration (as a hidden feature for now). Issued a fix for handling empty PowerBI objects returned by the API.
the data.world Collector v2.29 The data.world catalog collector now supports Tableau Online! Additionally, there was a bug fix for PowerBI.
the data.world Collector v2.28 Bugfix release
the data.world Collector v2.27 Added the optional CLI option tableau-graphql-page-size to the Tableau collector, which allows the user to set the number of objects included in each page of paginated queries.
the data.world Collector v2.26 Updated the PowerBI collector so that if a report is unavailable via the API, it is logged and cataloging continues on the rest of the repository.
the data.world Collector v2.25 This release includes better and more user-friendly error handling and reporting. We have also added enhanced collection of Tableau metadata via the Tableau Metadata API (GraphQL endpoint). New metadata includes data sources, databases, fields, metrics, and many more inter-object relationships.
the data.world Collector v2.24 The data.world Collector is now distributed via Docker Hub. Additionally, there are changes to the Tableau and PowerBI collectors, the ability to change the level of error messages written to the console and log file, and a new subcommand to display the data.world Collector license text.
For Tableau:
The Tableau collector now emits RDF in which the object of `dct:creator` is a `dwec:Agent` instead of a string literal. This means we write additional details about the Tableau account that created the dashboard, via properties of the `dwec:Agent` resource. These details include: account name, account “full name”, and account email address (if they are populated in Tableau).
For PowerBI:
The PowerBI collector now writes resources representing PowerBI “data sources” as a PowerBI-specific class, rather than `dwec:DataArtifact`.
Logging changes:
It is now possible for users to set the level (severity) of log messages written to the console and log file. By default, we write “info” level messages; users can choose to write only errors (level=“ERROR”), errors+warnings (level=“WARN”), or all messages including debug trace (level=“DEBUG”). This is useful if we want to have customers run the data.world Collector with debug logging turned on, for troubleshooting problems etc.
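The level setting described above behaves like a severity threshold. The sketch below illustrates that idea only: the level names come from the text, while the numeric ranks and function names are assumptions borrowed from common logging conventions. It also assumes that the default “info” level includes warnings and errors.

```python
# Assumed ordering from least to most severe; the numbers are illustrative.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def is_written(message_level: str, configured_level: str = "INFO") -> bool:
    # A message is written when it is at least as severe as the configured level.
    return SEVERITY[message_level] >= SEVERITY[configured_level]


def written_levels(configured_level: str = "INFO") -> list:
    # List every message level that would be written at the configured level.
    return [level for level in SEVERITY if is_written(level, configured_level)]


if __name__ == "__main__":
    for configured in ("ERROR", "WARN", "INFO", "DEBUG"):
        print(configured, "->", written_levels(configured))
```

Under these assumptions, level=“WARN” writes warnings and errors, and level=“DEBUG” writes everything.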
Display the data.world Collector license information:
License information for the data.world Collector is now available as a subcommand of the data.world Collector. To get all licensing information, run the command
docker run -it --rm datadotworld/dwcc:X.XX display-license
where X.XX is a version of the data.world Collector greater than or equal to 2.24.
the data.world Collector v2.23 Internal release
the data.world Collector v2.22 Internal release
the data.world Collector v2.21 Fixed some timeout issues with the Looker collector when fetching images from the Looker API. Fixed an issue with cataloging reports and dashboards based on user workspace permissions in PowerBI.
the data.world Collector v2.20 With this release our Tableau collector now supports cataloging of workbooks and non-dashboard views as well as harvesting tags on workbooks and views. Fixed an issue in the Looker collector where preview images returned from the Looker API were missing.
the data.world Collector v2.19 Includes a clean-up of the embedded help commands for several collectors and:
Fixes an issue with the Tableau Server collector when cataloging multi-site server instances.
Adds the --tableau-site parameter to enable the user to restrict cataloging to a single site (not required; by default all sites in the instance are scanned). The value provided to --tableau-site can be a site ID or name.
the data.world Collector v2.18 The Tableau collector now has a flag option, --tableau-skip-images, which skips the harvesting of preview images for views. Usage is like this:
... catalog-tableau --tableau-api-base-url=http://ec2-44-192-86-11.compute-1.amazonaws.com/api/3.10/ --tableau-username=admin --tableau-password=password -a sc-test3 -n tableau-test --tableau-skip-images
the data.world Collector v2.17 Adds a collector for Presto
the data.world Collector v2.16 This release:
Adds the parameter --all-databases to the Athena collector so that it can catalog all the databases accessible from the logged-in account.
Fixes some issues with datatypes for dwec:externalUrl predicates.
the data.world Collector v2.15 This release contains the following:
The Tableau collector formerly had a CLI parameter --tableau-project-id which could be used to catalog only assets in the project with the specified ID. The parameter is now --tableau-project and takes either a project ID or project name.
Update to the MANTA collector to accommodate a minor change in the MANTA API with v 1.31. Customers who have updated their MANTA instance to v 1.31+ will want to use the data.world Collector 2.15+.
The Looker collector now works for non-admin Looker users; however, when the data.world Collector is run by a non-admin, the emitted catalog will not contain any information about databases used by Looker analysis assets (access to database information in Looker requires admin permissions).
All JDBC collectors now populate two new properties for dwec:DatabaseColumn: dwec:columnDefaultValue and dwec:columnIsNullable, which contain the default value for that column in newly inserted rows, and whether the column can be null, respectively. (Note that only some databases/drivers provide this metadata; we put it in the catalog if it’s there.)
the data.world Collector v2.14 Adds a collector for Looker. Minor update to the docker-save.sh script that includes available versions in the error message if you don’t supply a version.
the data.world Collector v2.13 Adds CLI parameters that make it possible to pass arbitrary driver properties through to the connection.
the data.world Collector v2.12 Adds a metadata collector for SAP (formerly Sybase) SQL Anywhere.
the data.world Collector v2.11 Improves the Dremio collector’s handling of data sources nested within multiple layers of folders, and fixes a minor issue with the Dremio collector’s harvesting of lineage metadata from the Dremio graph API.
the data.world Collector v2.10 Adds a collector for Domo. JDBC database collectors can now catalog all schemas in the database at once (the default remains to catalog only the user's default schema).
the data.world Collector v2.9 Adds a Tableau Server collector and extends the OpenAPI collector to include a few additional schema property metadata properties.
the data.world Collector v2.8 Adds an Infor ION data lake collector. Optimizes collection of JDBC metadata (performance improvement).
the data.world Collector v2.7 Adds a collector for PowerBI.
the data.world Collector v2.6 Adds the MANTA collector.
the data.world Collector v2.5 Upgrades the Java runtime.
the data.world Collector v2.4 Extends handling of OpenAPI collector parameters and responses.
the data.world Collector v2.3 Adds support for OpenAPI (fka Swagger) collector.
the data.world Collector v2.2 A refactoring release.
the data.world Collector v2.1 Fixes an issue with the Denodo cataloger JDBC URL port.
the data.world Collector v2.0 We now use v2 URIs as the official locator IDs for metadata resources. This is a breaking change (for structural, intentional reasons) which is not backwards compatible with v1 URIs. For more information see the article on the data.world Collector v2.X.
the data.world Collector v1.20 Addresses some memory issues and open-cursor leaks.
the data.world Collector v1.19 Adds statements to the catalog graph indicating that the catalog was generated by the data.world Collector (with a version). We also added the ability to write database schema objects to the catalog graph.
the data.world Collector v1.18 Allows you to specify alternate organization permissions and upload locations when performing an automatic upload of the metadata.
the data.world Collector v1.16 and the data.world Collector v1.17 Address issues with the SQL Server cataloger.
the data.world Collector v1.15 Adds Dremio support with optional Catalog API lineage fetching.
the data.world Collector v1.14 Enables you to change the amount of memory that gets allocated to a data.world Collector Docker process. See our article on allocating additional memory to Docker for more information.
the data.world Collector v1.13 Adds support for Microsoft SQL Server and enables the JVM to use available memory in the container (useful for creating large catalogs). Additionally, we improved data type recognition in the AWS Glue cataloger.
As of the data.world Collector v1.12 we can support not only Glue ETL jobs, but also Glue Data Catalog tables and columns.
With the data.world Collector v1.11 you can:
Upload generated catalogs via the --upload / -U command-line parameters
Upload the data.world Collector log when uploading generated catalogs with --upload
Fetch an organization's current catalog with the fetch-catalog command
In the data.world Collector v1.10 we added support for AWS Glue and AWS Athena, including cataloging ETL jobs associated with an AWS account. There is no need to mount a JDBC drivers directory, as the Glue cataloger uses the Glue API, not JDBC.
the data.world Collector v1.9 is a bug cleanup release.
With the data.world Collector v1.8 it is now possible to use JDBC drivers on the classpath as well as those found in a user-specified JDBC driver directory (drivers in the directory have higher precedence than classpath drivers).
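The precedence rule for v1.8 can be pictured as a simple override merge. The following is only a sketch of that rule, not the collector's actual driver-loading code: the function name, the mapping from driver class name to jar path, and the example entries are all hypothetical.

```python
from typing import Dict


def resolve_drivers(classpath_drivers: Dict[str, str],
                    directory_drivers: Dict[str, str]) -> Dict[str, str]:
    # Start from the classpath drivers, then let directory drivers override them.
    resolved = dict(classpath_drivers)
    resolved.update(directory_drivers)
    return resolved


if __name__ == "__main__":
    # Hypothetical driver class names and jar locations.
    on_classpath = {"org.example.DriverA": "/app/lib/driver-a-1.0.jar",
                    "org.example.DriverB": "/app/lib/driver-b-1.0.jar"}
    in_directory = {"org.example.DriverA": "/drivers/driver-a-2.0.jar"}
    for name, jar in resolve_drivers(on_classpath, in_directory).items():
        print(name, "->", jar)
```

In this sketch, a driver found in both places resolves to the copy from the user-specified directory, and drivers found only on the classpath remain available.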
the data.world Collector v1.7 is a bug-fix release.
the data.world Collector v1.6 adds support for arbitrary JDBC data sources and the ability to build one-off Docker images for testing, demos, etc.
With the data.world Collector v1.5 we add support for Oracle.
In the data.world Collector v1.4 we add support for Google BigQuery.
the data.world Collector v1.3 brings much new functionality including:
Support for Denodo and Snowflake
Compatibility of JDBC catalogs with tables imported through data.world integrations
Ability to differentiate source information for databases cataloged from localhost
Cataloging of REMARKS fields into dct:description
With the data.world Collector v1.2 we support Redshift databases.
the data.world Collector v1.1 contains documentation clarification and expansion for the documents to streamline tags on customer Docker hosts.
The initial release of the data.world Collector v1.0 provides support for metadata catalog extraction for DB2, Hive, MySQL, Postgres.