About the BigQuery collector
Use this collector to harvest metadata for BigQuery datasets, projects, tables, and columns in a BigQuery instance and make it searchable and discoverable in data.world. The collector also harvests column-level lineage relationships between tables and views.
Important
The BigQuery collector can be run in the cloud or on-premises, using either Docker or JAR files.
Note
The latest version of the collector is 2.248. For details on this version and all previous versions, see the release notes.
What is cataloged
The collector catalogs the following information.
Object | Information cataloged |
---|---|
Datasets | ID, name, description, labels (key/value pairs), created date, last modified date, default table expiry, default partition expiry, data location |
Projects | Name |
Tables | Name, Description, Created date, Last modified date, Default table expiration, Data location, Labels, Type (Standard, External, Snapshot, Model), Partitioned on field, Clustered by columns (for standard and snapshot tables), Partition type (range or time), Requires partition filter, Range partitioning (start, end, interval), Time partitioning (partition type: hour, day, month, or year; expiration) |
Columns | Name, Description, Data Type, Is Nullable, Column size |
Views | Name, description, created date, default table expiration, last modified date, data location, default collation, labels, view SQL, clustered by columns (for materialized views) |
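To see what some of this column metadata looks like at the source, you can query BigQuery's `INFORMATION_SCHEMA.COLUMNS` view yourself. The following is a minimal sketch that only builds the query text. The function name, the project `my-project`, and the dataset `sales` are hypothetical, and nothing here is meant to describe how the collector itself reads metadata.

```python
import re

# Conservative identifier check for illustration only; the real BigQuery
# naming rules for projects and datasets are broader than this pattern.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def column_metadata_query(project_id: str, dataset_id: str) -> str:
    """Build a query listing each column's name, data type, and nullability."""
    for value in (project_id, dataset_id):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unexpected identifier: {value!r}")
    return (
        "SELECT table_name, column_name, data_type, is_nullable\n"
        f"FROM `{project_id}.{dataset_id}.INFORMATION_SCHEMA.COLUMNS`\n"
        "ORDER BY table_name, ordinal_position"
    )


# Hypothetical project and dataset names.
print(column_metadata_query("my-project", "sales"))
```

The resulting text can be pasted into the BigQuery console or passed to any BigQuery client to compare the source values with what appears in the catalog.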
Relationship between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Datasets | Tables, Views |
Projects | Dataset |
Tables | Column, Labels |
Columns | Table, View |
Views | Column |
Label Value | Table, View, Project, Dataset |
Lineage for BigQuery
The following lineage information is collected by the BigQuery collector.
Object | Lineage available |
---|---|
View Column | The collector identifies the associated column in an upstream view or table. |
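As an illustration of what column-level lineage means for views, the sketch below models each view column as pointing to the upstream columns it is derived from, then walks those links. The `ColumnRef` type, the `LINEAGE` mapping, and the `sales` views are all hypothetical. This is one way to picture the relationship, not the collector's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """A column identified by dataset, table or view name, and column name."""
    dataset: str
    relation: str
    column: str


# Hypothetical views:
#   sales.v_orders         AS SELECT id AS order_id, amount FROM sales.orders
#   sales.v_orders_summary AS SELECT amount AS total FROM sales.v_orders
# Each view column maps to the upstream columns it is derived from.
LINEAGE: dict[ColumnRef, list[ColumnRef]] = {
    ColumnRef("sales", "v_orders", "order_id"): [ColumnRef("sales", "orders", "id")],
    ColumnRef("sales", "v_orders", "amount"): [ColumnRef("sales", "orders", "amount")],
    ColumnRef("sales", "v_orders_summary", "total"): [
        ColumnRef("sales", "v_orders", "amount")
    ],
}


def upstream_columns(column: ColumnRef) -> list[ColumnRef]:
    """Return every upstream column reachable from the given column."""
    seen: list[ColumnRef] = []
    stack = list(LINEAGE.get(column, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            # Follow the chain through intermediate views.
            stack.extend(LINEAGE.get(current, []))
    return seen
```

Calling `upstream_columns(ColumnRef("sales", "v_orders_summary", "total"))` returns the intermediate view column `v_orders.amount` followed by the base table column `orders.amount`.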
Authentication supported
The collector authenticates to BigQuery using a service account associated with the project.
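Before handing a service account key file to the collector, it can help to confirm that the file is a well-formed key for the intended project. The sketch below checks a few fields that appear in standard Google Cloud service account key files. The function name and the sample values are hypothetical, and the check does not verify that the key is valid or has the right permissions.

```python
import json

# Fields present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> str:
    """Validate key-file contents and return the project ID they refer to."""
    key = json.loads(raw_json)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key["project_id"]


# Placeholder values only; a real key file holds an actual private key.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "placeholder",
    "client_email": "collector@my-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```

In practice you would read the file contents from disk and compare the returned project ID with the project you intend to catalog.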