
About the BigQuery collector

Use this collector to harvest metadata for BigQuery datasets, projects, tables, and columns across your enterprise systems and make it searchable and discoverable in data.world. The collector also harvests column-level lineage relationships between tables and views.

Important

The BigQuery collector can be run in the cloud or on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.200. To view the release notes for this version and all previous versions, see the collector release notes.

What is cataloged

The collector catalogs the following information.

Table 1.

| Object | Information cataloged |
| --- | --- |
| Datasets | ID, name, description, labels (key/value pairs), created date, last modified date, default table expiry, default partition expiry, data location |
| Projects | Name |
| Tables | Name, description, created date, last modified date, default table expiration, data location, labels, type (Standard, External, Snapshot, Model), partitioned-on field, clustered-by columns (for standard and snapshot tables), partition type (range or time), requires partition filter, range partition details (start, end, interval), time partition details (partition type: hour, day, month, or year; expiration) |
| Columns | Name, description, data type, is nullable, column size |
| Views | Name, description, created date, default table expiration, last modified date, data location, default collation, labels, view SQL, clustered-by columns (for materialized views) |



Relationship between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

| Resource page | Relationship |
| --- | --- |
| Datasets | Tables, Views |
| Projects | Datasets |
| Tables | Columns, Labels |
| Columns | Tables, Views |
| Views | Columns |
| Label Value | Tables, Views, Projects, Datasets |



Lineage for BigQuery

The following lineage information is collected by the BigQuery collector.

Table 3.

| Object | Lineage available |
| --- | --- |
| View Column | The collector identifies the associated columns in upstream views or tables: columns the data is sourced from; columns that sort the rows via ORDER BY; columns that filter the rows via WHERE/HAVING; columns that aggregate the rows via GROUP BY |
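To make the four lineage categories concrete, here is a minimal sketch that pairs a hypothetical view with a hand-written reading of which upstream columns fall into each category. The view, the `sales.orders` table, the role names, and the `upstream_columns` helper are all assumptions for illustration; this is not the collector's output format, and the collector may report lineage differently.

```python
# Hypothetical view definition (not taken from the collector documentation).
VIEW_SQL = """
CREATE VIEW sales.region_totals AS
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE status = 'shipped'
GROUP BY region
ORDER BY region
"""

# Hand-written interpretation of the lineage categories for one view column.
# Keys are illustrative role names, values are upstream columns of sales.orders.
LINEAGE_BY_ROLE = {
    "total_amount": {
        "sourced_from": ["orders.amount"],    # data comes from SUM(amount)
        "sorted_by": ["orders.region"],       # ORDER BY region
        "filtered_by": ["orders.status"],     # WHERE status = 'shipped'
        "aggregated_by": ["orders.region"],   # GROUP BY region
    },
}


def upstream_columns(view_column: str) -> list[str]:
    """Return every upstream column linked to a view column, in any role."""
    roles = LINEAGE_BY_ROLE[view_column]
    return sorted({column for columns in roles.values() for column in columns})


print(upstream_columns("total_amount"))
```

Under this reading, `total_amount` is linked to three upstream columns even though only `orders.amount` supplies its values; the other two affect which rows are included and how they are grouped and ordered.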



Authentication supported

The collector authenticates to BigQuery using a Service Account associated with the project.
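This page does not say how the Service Account credentials are passed to the collector. Assuming they are supplied as a standard Google Cloud JSON key file, a small pre-flight check such as the sketch below can catch an incomplete or wrong-type key before a collector run. The `check_service_account_key` function is a hypothetical helper, not part of the collector; the field names are those found in Google service account key files.

```python
import json

# Fields present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in a service account key document."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]

    # A key for another credential type (for example a user credential) is rejected.
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty list means the key has the expected shape. It does not confirm that the account has the BigQuery permissions the collector needs.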