About the BigQuery collector

Use this collector to harvest metadata for BigQuery datasets, projects, tables, and columns in a BigQuery instance and make it searchable and discoverable in data.world. The collector also harvests column-level lineage relationships between tables and views.

Important

The BigQuery collector can be run in the cloud or on-premises, using either Docker or JAR files.

Note

The latest version of the collector is 2.292. To view the release notes for this version and all previous versions, see the collector release notes.

What is cataloged

The collector catalogs the following information.

Table 1.

Object	Information cataloged
Datasets	ID, name, description, labels (key/value pairs), created date, last modified date, default table expiry, default partition expiry, data location
Projects	Name
Tables	Name, description, created date, last modified date, default table expiration, data location, labels, type (standard, external, snapshot, model), partitioned-on field, clustered-by columns (standard and snapshot tables), partition type (range or time), requires partition filter, range partition details (start, end, interval), time partition details (partition type: hour, day, month, or year; expiration)
Columns	Name, description, data type, is nullable, column size
Views	Name, description, created date, default table expiration, last modified date, data location, default collation, labels, view SQL, clustered-by columns (materialized views)

Relationship between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

Resource page	Relationship
Datasets	Tables, Views
Projects	Datasets
Tables	Columns, Labels
Columns	Tables, Views
Views	Columns
Label Value	Tables, Views, Projects, Datasets

Lineage for BigQuery

The following lineage information is collected by the BigQuery collector.

Table 3.

Object	Lineage available
View Column	The collector identifies the associated columns in upstream views or tables: columns that the data is sourced from, columns that sort the rows via ORDER BY, columns that filter the rows via WHERE/HAVING, and columns that aggregate the rows via GROUP BY.
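
To make the four lineage roles concrete, the following sketch pairs a small view definition with a hand-written mapping of the upstream columns each role would point to. The view, the table `sales.orders`, the role names, and the dictionary layout are all hypothetical and chosen for illustration only; they do not represent the collector's actual output format.

```python
# Hypothetical view definition, used only to illustrate the lineage roles.
VIEW_SQL = """
CREATE VIEW sales.region_totals AS
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE status = 'shipped'
GROUP BY region
ORDER BY region
"""

# Hand-written illustration (not collector output): upstream columns
# associated with the view column total_amount, grouped by role.
LINEAGE_BY_VIEW_COLUMN = {
    "total_amount": {
        "sourced_from": ["sales.orders.amount"],   # SELECT expression
        "filtered_by": ["sales.orders.status"],    # WHERE / HAVING
        "aggregated_by": ["sales.orders.region"],  # GROUP BY
        "sorted_by": ["sales.orders.region"],      # ORDER BY
    }
}


def upstream_columns(view_column):
    """Return every upstream column related to a view column, in any role."""
    roles = LINEAGE_BY_VIEW_COLUMN[view_column]
    return {column for columns in roles.values() for column in columns}


print(sorted(upstream_columns("total_amount")))
# ['sales.orders.amount', 'sales.orders.region', 'sales.orders.status']
```

In this example, `region` appears under two roles because the same upstream column both groups and sorts the rows.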

Authentication supported

The collector authenticates to BigQuery using a Service Account associated with the project.
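
Before handing a Service Account key to the collector, it can help to sanity-check the key file. The sketch below is one possible pre-flight check, not part of the collector: the helper `check_service_account_key` is hypothetical, the sample values are placeholders, and the project comparison assumes the key should belong to the same project being harvested, which may not hold if the account was granted access across projects. The field names are those found in a standard Google Cloud service account JSON key.

```python
# Fields expected in a Google Cloud service account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key, expected_project):
    """Return a list of problems found in a parsed key file (empty if none)."""
    problems = [
        "missing field: " + name for name in REQUIRED_FIELDS if not key.get(name)
    ]
    if key.get("type") and key["type"] != "service_account":
        problems.append("unexpected type: " + key["type"])
    # Assumption: the key should belong to the project being harvested.
    if key.get("project_id") and key["project_id"] != expected_project:
        problems.append(
            "key belongs to project " + key["project_id"]
            + ", expected " + expected_project
        )
    return problems


# Placeholder values only; in practice the dict would come from
# json.load() on the downloaded key file.
sample_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "<redacted>",
    "client_email": "collector@my-project.iam.gserviceaccount.com",
}

print(check_service_account_key(sample_key, "my-project"))  # prints []
```

An empty list means the key has the expected shape; it does not confirm that the account has the BigQuery permissions the collector needs.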
