Catalog collector change log
8-2-22 the data.world Collector v2.93
hash: afdecd160fd38e3db565cb14db3805fed05fa86b5c3a70662d0c8f0b0d10799f
Includes some internal dependency updates.
Enhancements to the dbt collector to validate the input profile.yaml file.
7-29-22 the data.world Collector v2.92
hash: 9b87934376246cd3926bfe413d36f2a7a0f2e7d848d7f2e68380d6035fe276f6
Enhancement to the Tableau collector to add a retry if a GraphQL query fails.
Enhancement to the collectors to check at the beginning of a collector run that the output directory exists (if the -o/--output option is used). If the directory doesn't exist, the collector logs an error and stops.
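The output directory check described above can be illustrated with a minimal Python sketch. It is not the collector's actual implementation; the function names, the logger name, and the exit code are assumptions made for illustration only.

```python
import logging
from pathlib import Path

# Hypothetical logger name; the real collector's logging setup is not documented here.
logger = logging.getLogger("collector-sketch")


def ensure_output_dir(output_dir: str) -> bool:
    """Return True if output_dir is an existing directory; log an error otherwise."""
    path = Path(output_dir)
    if path.is_dir():
        return True
    logger.error("Output directory does not exist: %s", path)
    return False


def main(argv: list[str]) -> int:
    """Hypothetical entry point: argv[0] stands in for the output option value."""
    if not argv or not ensure_output_dir(argv[0]):
        return 1  # stop before any cataloging work starts
    return 0  # a real run would continue with the collection here
```

Failing fast in this way avoids spending a full harvest run only to discover at the end that the catalog file cannot be written.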
7-21-22 the data.world Collector v2.91
hash: 708b34b19b2695d14d5a74a8281d5365cb659c4eeeebb791b3ebf2aa2e4d6686
Enhancement to the Tableau collector to reauthenticate if the Tableau API reports failed authorization during the collector run.
Released the new Fivetran collector (catalog-fivetran).
7-12-22 the data.world Collector v2.90
hash: bf94b0431b5a99dd95f485a8a48f202ea138f103b54873b4440b5080d86d529a
Added the parameter --include-information-schema for Snowflake and SQL Server collectors; we no longer catalog the information schema in these collectors when --all-schemas is specified, unless the user also specifies --include-information-schema.
Improved handling of manifest JSON structures with some nulls in the dbt collector.
Added reporting on user access issues during parsing/resolution for the Snowflake collector.
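The interaction between --all-schemas and --include-information-schema can be sketched as a simple filter. This is only an illustration of the rule stated above: the function name is hypothetical, and the case-insensitive match on the name INFORMATION_SCHEMA is an assumption, not documented collector behavior.

```python
# Assumed schema name; matched case-insensitively in this sketch.
INFORMATION_SCHEMA = "INFORMATION_SCHEMA"


def schemas_for_all_schemas_run(available, include_information_schema=False):
    """Model which schemas an --all-schemas run would catalog.

    The information schema is skipped unless the caller opts in,
    mirroring the --include-information-schema flag.
    """
    if include_information_schema:
        return list(available)
    return [s for s in available if s.upper() != INFORMATION_SCHEMA]
```

For example, given the schemas PUBLIC, SALES, and INFORMATION_SCHEMA, the default call returns only PUBLIC and SALES, while passing include_information_schema=True returns all three.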
7-9-22 the data.world Collector v2.89
hash: 848e38708b832c703652dc45d148e471a2341bce7f6ec159c2471f287a8d3620
Updated the Tableau collector to print a clear log message when authentication expires during a collector run.
Updated the Tableau collector to allow optimized serialization of API requests under JDK 17.
7-1-22 the data.world Collector v2.88
hash: 8534190cb3f0f93bd2a326abd54086e89eb38ece8180bf0487486dc66242d6c8
Significant updates to the Power BI collector. As of this release, the Power BI collector outputs different classes than the previous version did. The collector now emits information about where it sources its data.
Internal developer and testing improvements
The MANTA collector is now more specific about the concepts that it emits about Informatica PC.
6-24-22 the data.world Collector v2.87
hash: 7860e33213ba90783851cd7f7e6529ee99a5f261ae086d3a7038938c6f290ae6
The information schema collector now explicitly supports Oracle.
We have added enhancements to the dbt collector to harvest dbt snapshots and sources.
6-22-22 the data.world Collector v2.86
hash: 6fdae2dd70896e402ca648701bcd48210a8fd5979230c958b0dc06030bd7b1ec
For collectors that take API endpoint URLs, the data.world Collector adds a trailing slash to the URL if one is needed and the user did not specify it.
New command-line option --warehouse available for the Snowflake collector, which allows the user to specify which warehouse to use to connect to Snowflake.
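The trailing-slash handling for API endpoint URLs mentioned in this release amounts to a small normalization step. A minimal sketch follows; the function name is hypothetical and the real collector may handle more cases (query strings, for instance) than this does.

```python
def normalize_endpoint(url: str) -> str:
    """Append a trailing slash to an API base URL if it does not already end with one."""
    return url if url.endswith("/") else url + "/"
```

With this normalization, a base URL given as https://example.com/api/3.10 and one given as https://example.com/api/3.10/ are treated the same way when request paths are appended.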
6-18-22 the data.world Collector v2.85
hash: aaa6e55bf19af7ef37f1ab80ad28522af77a6ff286ef616085d92ab51f7d7899
Added a dbt collector to the data.world Collector; the legacy collector is still available.
Fixed an issue with auto-uploaded log files, in which not all log messages were being written out.
6-15-22 the data.world Collector v2.84
hash: 2e128cd3c89ffc8c35fbad12f6ee4ba7e6e5cdf9bfcf991fac78e8033d5d17d0
Looker collector now emits resources for Looker Views and relates the Looker Dimensions and Looker Measures that are configured within those Views.
Improved handling of unexpected database types encountered when cataloging Tableau.
6-3-22 the data.world Collector v2.83
hash: 9487027423a076231cec76f5679f044493a3d75032882c4ca0e5cf1c0304e6cf
Further improvements in handling of SQL ORDER BY, GROUP BY, WHERE, and HAVING clauses when harvesting intra-database lineage from database views.
6-2-22 the data.world Collector v2.82
hash: 22459e3d3a2a38f448d4e56137ed4ecd05170767b5a682dd4870135cceff23c2
Corrected coining of IRIs in catalog graphs emitted by the Tableau collector.
Improved logging in the Tableau collector to detect unknown linked database types.
Improved harvesting of lineage between database views and referenced columns, including support for columns in SQL ORDER BY, HAVING, GROUP BY, and WHERE clauses, and parsing of a wider range of column expressions.
the data.world Collector v2.81 - INTERNAL RELEASE
5-24-22 the data.world Collector v2.80
Hash: be5a85c754d54328accabe332dec55ce507baddbe68d2fe9e29a211e9ea1420f
With this release, the data.world Collector now requires Java 17. If you run the collector from within Docker, this change will not affect you. If you run the data.world Collector from a .jar file, you will need to upgrade your JRE to version 17 to run the data.world Collector v2.80 and later.
5-13-22 the data.world Collector v2.79
Hash: 5b548c82b96ad5e5dbd4770adff205c9d07cac3c5f949882d7d9381240366ddb
The MANTA collector can now accept OAuth tokens for MANTA authentication (for harvesting metadata from MANTA version R35 and above).
We have released a new collector, powerbigov, that allows only a tenant ID for authentication (not a username and password) and connects to the government Power BI API URLs.
5-11-22 the data.world Collector v2.78
Hash: 71edd8ff7a4c3ed8a91eaf36d59c8e2745b7a76f8666b5750cbee8205021c9c6
Added some small Tableau collector enhancements.
New PowerBiGov collector with specific endpoints for .gov customers. This collector does not accept a username or password.
For PowerBI, a new way to authenticate is available. A user can now enter a tenant ID with a client ID and a client secret to authenticate, in addition to using a username and password.
For both PowerBI and PowerBiGov, when the tenant ID, client ID, and client secret authentication method is used, the collector no longer emits information about PowerBI Apps.
4-27-22 the data.world Collector v2.77
hash: 4bed848791cfa9e46c9db4a78c7a593bb1c986900dc6fcfcd4255ddce1528579
Fixes an issue with the Snowflake collector that prevented the bundled JDBC driver from being found. Any users working with the data.world Collector 2.76 should update.
4-22-22 the data.world Collector v2.76
hash: 30e60a4434ee64d2981b40eb2dc92506da3d367eab22bc0bca0c61bdd44a3f02
The Snowflake collector harvests some intra-database lineage information from database views.
Improved the host mapping in the Manta collector.
4-7-22 the data.world Collector v2.75
hash: 1a59dbb3ff8679fb6ee22eadaeb04ccdb28c5660be029e78fbc96403ae33096f
The MANTA collector now emits resources for file sources and targets and their directory structure. It also emits sources and targets as files.
4-1-22 the data.world Collector v2.74
hash: 219428f6a72be91205408d5cb3f8cc8b27e1a9a4df0208e4cacb8fbaa1352f90
The Tableau collector now emits “column-level lineage”.
Improved styling of the data.world Collector command-line errors
Updated command-line options for Datakin and Marquez.
3-16-22 dbt collector v.05
This version adds a third command-line argument to specify an output file name.
3-8-22 the data.world Collector v2.73
hash: 119daf987dcfad25db599e1c1affedf17a35ff2aa002d0618d642eb309cebaaf
Permalinks to Looker explores included via externalUrl
Improvements to the Datakin and Marquez collectors
Tableau collector now emits resources for Tableau Projects, allowing us to establish full relationships between projects and the workbooks and views that they contain
Monte Carlo data collector now emits data quality information using enhanced dwec ontology concepts
Looker collector now emits descriptions for measures and dimensions
MANTA collector now emits Snowflake resources found in MANTA scans
3-1-22 the data.world Collector v2.72
Hash: 62d156aca58ec92513e8d6490f00fd10ee52dfb7a65f71c20c6a988c938dfddd
[BUGFIX] Invalid prefix when using --base option
Update the data.world Collector transform to add catalog events to specific collectors
Added a Snowflake Sensitive Data Discovery collector
Sync CLI options between collector types
Validation of CLI options for the data.world Collector
Improvements to the data.world Collector CLI
Update the MonteCarlo Collector to use the new Data Quality Ontology
2-17-22 the data.world Collector v2.71
Digest: 03fc3df90ae63896d62ea22e00688f42cacf5b76d0f47691c06c104736680b2a
Bug fix for Marquez collector
Bug fix for Manta collector
2-9-22 the data.world Collector v2.70
Digest: 06bb747c4d7705c1e44664de7854158d87468316bab549ec5604b0a075380c69
Preview images for Tableau assets are now harvested much more efficiently, and the resulting image data in the catalog graph are much smaller, reducing catalog harvest run time and enabling image objects to remain within platform constraints during ingest.
Fix for unexpected column type errors in BigQuery collector
2-8-22 the data.world Collector v2.69
Digest: 5ab9b97d5f8f4568613438a9e52b0bdc12974f8d6edd0dab374a281c4982c737
Created new collectors for Marquez and Datakin
Added schema information to the Tableau collector outputs
2-4-22 the data.world Collector v2.68
Digest: 23674ee02a6b725d5f9a453615dc507286da2ee606dca83c386472f3aa36d118
The Tableau collector now accepts Tableau “Personal Access Tokens” for authentication, via new CLI options --tableau-pat-name and --tableau-pat-secret.
Fixed an issue with mis-identification of views as tables in BigQuery.
2-2-22 the data.world Collector v2.67
Digest: 032867c9c52c8d46dc0b90a61a128be65ecec1440bb0adccb8b0d1b249b4e351
Fixed an issue with server name identification in Manta.
1-26-22 the data.world Collector v2.66
Digest: fa9ae2eb3d68375a3ff01ac7bde98fd36f372b84dce0d411444146ea9566b47b
With this release the Athena collector is no longer a JDBC collector--we harvest metadata by accessing the Athena API directly, rather than going through a JDBC driver. This means that it is no longer necessary to provide a JDBC driver when running the collector.
1-10-22 the data.world Collector v2.65
Digest: ed08cdd21a374c30456de0989076f5180bc4187ca998358b051807e521fd44e6
This release adds a new option for the MANTA collector, --manta-max-parallel-scenarios. Specifying this option and passing an integer value will configure the MANTA API to export the specified number of scenarios in the MANTA graph in parallel. The default value is 4; adjusting this up or down can improve performance.
1-5-22 the data.world Collector v2.64
Digest: 45b72798b0602885790388331a75db1f4286b15bf57b21f30f416eda79041571
This release upgrades the data.world Collector's dependency on the Apache Jena RDF library to version 4.2.0, which addresses security vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-39239.
12-23-21 the data.world Collector v2.63
Digest: sha256:eb4208c914269c793a5e2143d59a9982e7b087c5da1c17dd075e02a326e64a3e
The Athena JDBC driver is no longer bundled with the data.world Collector, as we have discovered that the Athena driver itself has a dependency on a vulnerable version of log4j. Customers that use the Athena collector will now need to supply their own driver and put it in the JDBC driver directory (as is done with other collectors for which we don’t distribute a driver).
12-15-21 the data.world Collector v2.62
Digest: sha256:2cd579e09f4eee94e141e8cf7e4e40e9a9b8803029df1be7112d67d62ef33b9e
The Oracle collector now supports connecting to the database via SID (instance ID) or Service Name. Service Name is the default. If a connection via SID is desired, pass the SID as the value of the -d/--database option and add the --oracle-sid-mode option (flag).
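The difference between the two connection modes shows up in the shape of the Oracle thin-driver JDBC URL: a SID is appended after a colon, while a service name follows a double slash and a path separator. The sketch below illustrates that difference only. The function name and its sid_mode parameter are hypothetical stand-ins for the --oracle-sid-mode flag, and how the collector actually builds its connection string is not documented here.

```python
def oracle_jdbc_url(host: str, port: int, database: str, sid_mode: bool = False) -> str:
    """Build an Oracle thin-driver JDBC URL for either a SID or a service name.

    database carries the value of the -d/--database option; sid_mode models
    the --oracle-sid-mode flag (service name is the default, as in the release note).
    """
    if sid_mode:
        # SID form: jdbc:oracle:thin:@host:port:SID
        return f"jdbc:oracle:thin:@{host}:{port}:{database}"
    # Service name form: jdbc:oracle:thin:@//host:port/service_name
    return f"jdbc:oracle:thin:@//{host}:{port}/{database}"
```

Passing a SID without enabling SID mode produces a service-name URL, which is one likely reason a connection attempt would fail after an otherwise correct configuration.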
12-13-21 the data.world Collector v2.61
Digest: sha256:bd0ba96208d714ecef4131867cf5d16372be0a33f416c1d6bd01f132c8517323
The information schema collector has been modified so that the files table_constraints.csv and constraint_column_usage are now optional, not required.
12-10-21 the data.world Collector v2.60
Digest: sha256:7fd825bfe7d2f99c9a1298ad26bc1934c9657cc7c5868dd093844344d18fc7b7
Updated the BigQuery collector to support current Google Cloud API enhancements.
Added a new Information Schema Collector. This collector runs via the catalog-information-schema command and, notably, catalogs four CSV files that are provided to the collector via a --csv-file-directory parameter rather than connecting to a database. This collector is an option for customers with tricky database setups that do not allow them to authenticate or establish connections to their database via our standard collectors.
12-2-21 the data.world Collector v2.59
Digest: sha256:051f76748be1c6cf2c7557600dde71a39e1b822c9e49120881ce938f1c8c2b80
Verified the Manta collector works with MANTA R34.
Released the config file command.
Modified the Tableau collector to remove schema and database names from table names.
Updated the BigQuery collector to support cataloging all datasets in a project at once by default, and to be able to use CLI options to select specific datasets in a project as well. With this last change, the --dataset parameter is no longer required. The help text has been updated with new messaging to reflect these changes.
11-10-21 the data.world Collector v2.58
Digest: sha256:82ebc1cec46f70de000aa94695359bd28d65c2782afc362c9ce14fadc04eae07
Added a new collector for Hive (as an alternative to catalog-hive) that uses only the Hive metastore--it does not connect to the Hive server directly.
The PowerBI collector now harvests workspaces and identifies other assets as being in workspaces
The data.world Collector now emits “catalog events” into the catalog graph. These capture details about the cataloging process itself, including selected configuration options with which the Collector was run, and summary statistics about the catalog. The ingest process will soon extract this information from catalogs at ingest time and send it to Segment for downstream analysis.
11-1-21 the data.world Collector v2.57
Digest: sha256:606f7cfbe60bf56b4c2ecd5fb3902d4de621e31ae76ad78e68c56c788f81e5e6
Fixed an issue in the Tableau collector in which Custom SQL Table objects without an associated database were not handled correctly.
10-27-21 the data.world Collector v2.56
Digest: sha256:335f7e110a9506d95dff05971492e6509fb8537e74f9275d04dcf9e2427df0f0
Added new CLI options to the Salesforce collector so that it can handle sandbox environments and custom login domains that customers might have.
10-25-21 the data.world Collector v2.55
Digest: sha256:c60ae69edc88b8801be833d578ef5dca73b6302646be9b30d31ccdfd7444288a
This release updates the BigQuery collector to handle fields in BigQuery tables for which the BigQuery API returns null type.
10-5-21 the data.world Collector v2.53
Digest: sha256:59c960d525e66e77d08dd34fd58c9b5027334a4bd2271f1f059370ae006a4b0b
Enhancements to the MANTA collector to harvest additional lineage information from MANTA scans (lineage from Informatica PowerCenter in particular)
Tableau collector enhancement to provide a better warning to the user when an obsolete version of the Tableau API is specified
9-29-21 the data.world Collector v2.52
Digest: sha256:915e4e91841001f80a84a65fcd76350b9a1d53f4e31678bb0e628d32beab94a1
Fixed an issue with the handling of certain fields and database information when the Tableau collector was run with a non-admin credential.
9-28-21 the data.world Collector v2.51 (internal)
Digest: sha256:261c5bf33b2ae38cbda35a346fcb37c56bbf8ebfb773f328deb9140efba1c8bf
Fixed an issue with the Tableau collector to handle views/workbooks that exist outside of a project.
9-28-21 the data.world Collector v2.50 (internal)
Digest: sha256:b407c629247f36afac3869eb8320464fce8caeb2865dd79811882b54ef94d1b5
Fixed an issue with the Tableau collector to handle workbooks that exist outside of projects.
9-24-21 the data.world Collector v2.49
Digest: sha256:397e78867f41aaa393ff69f42b0fa524fdcad662ddd027925cf27f80497b24ce
Added a collector for Salesforce (catalog-salesforce)
Fixed an IRI mismatch issue for the Tableau collector when running on Tableau instances with a Snowflake datasource.
9-18-21 the data.world Collector v2.48
Digest: sha256:c36755489b6235408aa4e639e6e184cab027a32a34e3b8ca369c3c6b3c4bff96
Made internal improvements to the Tableau collector to enable more efficient querying of the Tableau Metadata API.
Fixed an issue in the MANTA collector in which certain missing data in the MANTA lineage graph caused an exception.
9-10-21 the data.world Collector v2.47
Digest: sha256:219edfa247929e15d7c4e2be99ef890b2487c398abc1a23b2f85b3de11812be3
Fixed an issue in the Reltio collector that occurred when a Reltio configuration was missing certain objects.
Added a collector for Databricks (catalog-databricks)
9-8-21 the data.world Collector v2.46
Digest: sha256:e48cba45b457e076714d94d3a83d1164cb892864213732b3b2b334c041ff178a
Fixed an issue with creation of resource IRIs by certain collectors when the user chooses version 1 minting
Updated BigQuery collector to enable integration with data.world platform / connection manager
Fixed an issue with the MANTA collector in which certain large MANTA scans caused a numeric overflow during JSON deserialization
Updated Reltio collector to include information about survivorship groups in the emitted catalog
8-24-21 the data.world Collector v2.45
Digest: sha256:77f4c784b1d0166cf3bb87903696528f712fbe6aee1d4cb7e60097a0f494c7de
This release fixed an issue with JDBC drivers not being loaded by the Athena collector.
Added a collector for Reltio configurations (catalog-reltio).
the data.world Collector v2.44
Digest: sha256:47c1bb38b88c25801adf1f765e23c63637d15a60ae11fca8d63b53a8cd4755b2
Fixes an issue with URLs for sheets and dashboards that exist in Tableau Online or in Tableau Server within a site other than the default site.
the data.world Collector v2.43
Digest: sha256:696deaad59d2948a6adf3c275a90539cbf87057c93de9ee94d911fe105c574ce
Additional datetime fields added for Looker objects and typed as xsd:dateTime.
Fixed an issue caused by an undocumented change in Tableau Online’s REST API when using the Tableau collector to harvest metadata from Tableau Online.
the data.world Collector v2.42
Digest: sha256:e6bc353ea4b2ec3486b54d4e9280856d328d93f5d406e367c0c50303cde93704
The generic jdbc collector harvests database name when cataloging Intersystems Cache databases
Running the Snowflake collector with the -A / --all-schemas option harvests metadata from all available schemas, as with other collectors
the data.world Collector v2.41
Digest: sha256:bb79aa8afd19bf35b4b7e75840c21598702ec1d74b5f8640cc72a6758a3a0bc9
Fixed an issue with permalinks to objects in the MANTA collector.
the data.world Collector v2.40
Digest: sha256:44dd710a49a1500863f49e2f2e4ef261a45cdc6c7354702fe8e764210c27293b
Added support for Looker folders and additional attributes to the Looker metadata collector.
Added the ability to preview images to the Tableau metadata collector.
the data.world Collector v2.39
Digest: sha256:992671530f7483bfeb8a2aab52880a524b7df79caf427b373bd825115d71f4dc
Fixed an issue with the handling of certain special characters in catalog resource IRIs.
The --schema option for JDBC collectors can now be specified multiple times to enable the cataloging of multiple schemas in a single catalog.
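A repeatable option of this kind is commonly modeled as an append-style argument. The following sketch uses Python's argparse to show the pattern; it is not the collector's own CLI code, and the program name and help text are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --schema option that may be given more than once."""
    parser = argparse.ArgumentParser(prog="collector-sketch")  # hypothetical program name
    parser.add_argument(
        "--schema",
        action="append",  # each occurrence adds one more schema to the list
        default=None,
        help="schema to catalog; repeat the option to catalog several schemas",
    )
    return parser


def requested_schemas(argv: list[str]) -> list[str]:
    """Return the schemas named on the command line, in the order given."""
    args = build_parser().parse_args(argv)
    return args.schema or []
```

A call such as requested_schemas(["--schema", "sales", "--schema", "hr"]) yields both schema names, which can then be cataloged into a single output.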
the data.world Collector v2.38
Internal release
the data.world Collector v2.37
Digest: sha256:6a84217fa33df75d67ce51c486a90a802a8313a3432835abb55fffb5f1d3afc7
Updated the Tableau collector to paginate additional GraphQL queries to avoid hitting Tableau Metadata API limits.
Updated the Hive2 collector to capture table-level metadata from the Hive metastore
Updated the Tableau collector to allow the user to exclude specified Tableau objects from the catalog
the data.world Collector v2.36
Digest: sha256:8dd9793f3b0e74adcd7e7bc153f06b8c3098470217fb07af4336dde611269671
Improvements to error messages produced when using a config-file to run the data.world Collector
We disallow running catalog-postgres and catalog-redshift in the same config file, as the two collectors use incompatible JDBC drivers
Improved error handling throughout the data.world Collector
Improvements in representation of Tableau data source names in Tableau catalogs
Improvements to the MANTA collector
the data.world Collector v2.35 Changes in this release:
Upgrade of Denodo collector to Denodo 8
Handle edge case of very large field values embedded in MANTA’s exported artifacts
Support for sites
Handle edge case of stored procedure columns in MANTA
the data.world Collector v2.34 This release includes:
Enhancements to Domo collector output
Testing improvements
A minor Tableau collector enhancement
Fix for an issue in the Tableau collector in which column fields were sometimes not properly identifying the Tableau Table from which they sourced their data
Improvement to the presentation of Domo catalogs in the platform UI.
Changes to the Docker Hub repository where we house images containing non-released versions of the data.world Collector. Previously we were calling these “beta” releases; we now call them “release candidates”. The new repository is datadotworld/dwcc-rc and the image tags are x.y-rc-z, where x.y is the next expected Collector release and z is an increment.
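The x.y-rc-z tag scheme for release candidates is easy to check or sort programmatically. The sketch below parses such a tag with a regular expression; the function names are hypothetical, and the tag 2.35-rc-3 mentioned afterwards is an invented example, not a tag known to exist in the repository.

```python
import re
from typing import Optional, Tuple

# x.y is the next expected release, z is the release-candidate increment.
RC_TAG = re.compile(r"^(\d+)\.(\d+)-rc-(\d+)$")


def parse_rc_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, increment) for an x.y-rc-z tag, or None if it does not match."""
    match = RC_TAG.match(tag)
    if match is None:
        return None
    major, minor, increment = (int(part) for part in match.groups())
    return major, minor, increment


def rc_image_reference(tag: str) -> str:
    """Combine the release-candidate repository named above with a tag."""
    return f"datadotworld/dwcc-rc:{tag}"
```

Parsing the parts as integers means candidates sort correctly, so (2, 35, 10) ranks after (2, 35, 9) even though the raw strings would compare the other way.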
the data.world Collector v2.33 Adds support for harvesting intra-database lineage from MANTA scans, and accommodates changes in MANTA R32 (aka 1.32). We no longer support MANTA versions earlier than MANTA R32.
the data.world Collector v2.32 This release adds collector support for Vertica databases.
the data.world Collector v2.31 Issued fix to ensure alignment of identifiers for databases referenced by Tableau and Looker collectors.
the data.world Collector v2.30 Added config-file-driven configuration (as a hidden feature for now). Issued a fix for handling empty PowerBI objects returned by the API.
the data.world Collector v2.29 The data.world catalog collector now supports Tableau Online! Additionally there was a bugfix for PowerBi.
the data.world Collector v2.28 Bugfix release
the data.world Collector v2.27 Added the optional CLI option tableau-graphql-page-size to the Tableau collector, which allows the user to set the number of objects to be included in each page of paginated queries.
the data.world Collector v2.26 Updated the PowerBi collector so that if a report is unavailable via the API it will be logged, and cataloging will continue on the rest of the repository.
the data.world Collector v2.25 This release includes better and more user-friendly error handling and reporting. We have also added enhanced collection of Tableau metadata via the Tableau Metadata API (GraphQL endpoint). New metadata includes data sources, databases, fields, metrics, and many more inter-object relationships.
the data.world Collector v2.24 The data.world Collector is now distributed via Docker Hub. Additionally, there are changes to the Tableau and PowerBI collectors, the ability to change the level of error messages written to the console and log file, and a new subcommand to display the data.world Collector license text.
For Tableau:
The Tableau collector now emits RDF in which the object of `dct:creator` is a `dwec:Agent` instead of a string literal. This means we write additional details about the Tableau account that created the dashboard, via properties of the `dwec:Agent` resource. These details include: account name, account “full name”, and account email address (if they are populated in Tableau).
For PowerBI:
The PowerBI collector writes resources representing PowerBI “data sources” that are now of a PowerBI-specific class, rather than `dwec:DataArtifact`.
Logging changes:
It is now possible for users to set the level (severity) of log messages written to the console and log file. By default, we write “info” level messages; users can choose to write only errors (level=“ERROR”), errors+warnings (level=“WARN”), or all messages including debug trace (level=“DEBUG”). This is useful if we want to have customers run the data.world Collector with debug logging turned on, for troubleshooting problems etc.
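The effect of the configurable level can be illustrated with a severity threshold: a message is written only if its level is at or above the configured one. The sketch below maps the level names from the paragraph above onto Python's standard logging constants. It is an illustration of the filtering idea only; the function name is hypothetical and the collector's real logging configuration is not shown here.

```python
import logging

# Level names as used in the release note, mapped to standard severities.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
}


def messages_written(configured_level, messages):
    """Return the texts of (level, text) pairs that pass the configured threshold."""
    threshold = LEVELS[configured_level.upper()]
    return [text for level, text in messages if LEVELS[level] >= threshold]
```

With the level set to ERROR only error messages pass, WARN adds warnings, INFO (the default) adds informational messages, and DEBUG lets everything through, including trace output useful for troubleshooting.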
Display the data.world Collector license information:
License information for the data.world Collector is now available as a subcommand of the data.world Collector. To get all licensing information, run the command
docker run -it --rm datadotworld/dwcc:X.XX display-license
where X.XX is a version of the data.world Collector greater than or equal to 2.24.
the data.world Collector v2.23 Internal release
the data.world Collector v2.22 Internal release
the data.world Collector v2.21 Fixed some timeout issues with the Looker collector when fetching images from the Looker API. Fixed an issue with cataloging reports and dashboards based on user workspace permissions in PowerBI.
the data.world Collector v2.20 With this release our Tableau collector now supports cataloging of workbooks and non-dashboard views as well as harvesting tags on workbooks and views. Fixed an issue in the Looker collector where preview images returned from the Looker API were missing.
the data.world Collector v2.19 Includes a clean-up of the embedded help commands for several collectors and:
Fixes an issue with the Tableau Server collector when cataloging multi-site server instances.
Adds the --tableau-site parameter to enable the user to restrict cataloging to a single site (not required; by default all sites in the instance are scanned). The value provided to --tableau-site can be a site ID or name.
the data.world Collector v2.18 The Tableau collector now has a flag option, --tableau-skip-images, which skips the harvesting of preview images for views. Usage is like this:
... catalog-tableau --tableau-api-base-url=http://ec2-44-192-86-11.compute-1.amazonaws.com/api/3.10/ --tableau-username=admin --tableau-password=password -a sc-test3 -n tableau-test --tableau-skip-images
the data.world Collector v2.17 Adds a collector for Presto
the data.world Collector v2.16 This release:
Adds the parameter --all-databases to the Athena collector so that it can catalog all the databases accessible from the logged-in account.
Fixes some issues with datatypes for dwec:externalUrl predicates.
the data.world Collector v2.15 This release contains the following:
The Tableau collector formerly had a CLI parameter --tableau-project-id which could be used to catalog only assets in the project with the specified ID. The parameter is now --tableau-project and takes either a project ID or project name.
Update to the MANTA collector to accommodate a minor change in the MANTA API with v 1.31. Customers who have updated their MANTA instance to v 1.31+ will want to use the data.world Collector 2.15+.
The Looker collector now works for non-admin Looker users; however, when the data.world Collector is run by a non-admin, the emitted catalog will not contain any information about databases used by Looker analysis assets (access to database information in Looker requires admin permissions).
All JDBC collectors now populate two new properties for dwec:DatabaseColumn: dwec:columnDefaultValue and dwec:columnIsNullable, which contain the default value for that column in newly inserted rows, and whether the column can be null, respectively. (Note that only some databases/drivers provide this metadata; we put it in the catalog if it’s there.)
the data.world Collector v2.14 Adds a collector for Looker. Minor update to the docker-save.sh script that includes available versions in the error message if you don’t supply a version.
the data.world Collector v2.13 Adds CLI parameters so that it is now possible to pass arbitrary driver properties through to the connection.
the data.world Collector v2.12 Adds a metadata collector for SAP (formerly Sybase) SQL Anywhere.
the data.world Collector v2.11 Improves the Dremio collector’s handling of data sources nested within multiple layers of folders, and fixes a minor issue with the Dremio collector’s harvesting of lineage metadata from the Dremio graph API.
the data.world Collector v2.10 Adds a collector for Domo. JDBC database collectors can now catalog all schemas in the database at once (the default remains to catalog only the user's default schema).
the data.world Collector v2.9 Adds the Tableau Server collector and extends the OpenAPI collector to include a few additional schema property metadata properties.
the data.world Collector v2.8 Adds Infor ION data lake collector. Optimized collection of JDBC metadata (performance improvement).
the data.world Collector v2.7 Adds a collector for PowerBI.
the data.world Collector v2.6 Adds the Manta collector.
the data.world Collector v2.5 Upgrades the Java runtime.
the data.world Collector v2.4 Extends handling of OpenAPI collector parameters and responses.
the data.world Collector v2.3 Adds support for OpenAPI (fka Swagger) collector.
the data.world Collector v2.2 A refactoring release.
the data.world Collector v2.1 Fixes an issue with the Denodo cataloger jdbc url port.
the data.world Collector v2.0 We now use v2 URIs as the official locator IDs for metadata resources. This is a breaking change (for structural, intentional reasons) which is not backwards compatible with v1 URIs. For more information see the article on the data.world Collector v2.X.
the data.world Collector v1.20 Addresses some memory issues and open-cursor leaks.
the data.world Collector v.1.19 Adds statements to the catalog graph indicating that the catalog was generated by the data.world Collector (with a version). We also added the ability to write database schema objects to the catalog graph.
the data.world Collector v1.18 Allows you to specify alternate organization permissions and upload locations when performing an automatic upload of the metadata.
the data.world Collector v.1.16 and the data.world Collector v.1.17 Address issues with the SQL Server cataloger.
the data.world Collector v.1.15 Adds Dremio support with optional Catalog API lineage fetching.
the data.world Collector v1.14 Enables you to change the amount of memory that gets allocated to a data.world Collector Docker process. See our article on allocating additional memory to Docker for more information.
the data.world Collector v.1.13 Adds support for Microsoft SQL Server and enables the JVM to use available memory in the container (useful for creating large catalogs). Additionally, we improved data type recognition in the AWS Glue cataloger.
As of the data.world Collector v1.12 we can support not only Glue ETL jobs, but also Glue Data Catalog tables and columns.
With the data.world Collector v.1.11 you can:
Upload generated catalogs via the --upload / -U command-line parameters
Upload the data.world Collector log when uploading generated catalogs with --upload
Fetch an organization's current catalog with the fetch-catalog command
In the data.world Collector v1.10 we added support for AWS Glue and AWS Athena, including cataloging ETL jobs associated with an AWS account. There is no need to mount a JDBC drivers directory, as the Glue cataloger uses the Glue API, not JDBC.
the data.world Collector v.1.9 is a bug cleanup release.
With the data.world Collector v.1.8 it is now possible to use JDBC drivers on the classpath as well as those found in a user-specified JDBC driver directory (drivers in the directory have higher precedence than classpath drivers).
the data.world Collector v.1.7 is a bug-fix release
the data.world Collector v.1.6 adds support for arbitrary JDBC data sources and the ability to build one-off Docker images for testing, demos, etc.
With the data.world Collector v.1.5 we add support for Oracle.
In the data.world Collector v.1.4 we add support for Google BigQuery.
the data.world Collector v.1.3 brings much new functionality including:
Support for Denodo and Snowflake
Compatibility of JDBC catalogs with tables imported through data.world integrations
Ability to differentiate source information for databases cataloged from localhost
Cataloging of REMARKS fields into dct:description
With the data.world Collector v.1.2 we support Redshift databases.
the data.world Collector v.1.1 contains documentation clarification and expansion for the documents to streamline tags on customer docker hosts.
The initial release of the data.world Collector v.1.0 provides support for metadata catalog extraction for DB2, Hive, MySQL, Postgres.