About the Tableau collector (legacy version)

Important

The Tableau collector can be run in the cloud or on-premises, using Docker or JAR files.

Note

The latest version of the Collector is 2.255. To view the release notes for this version and all previous versions, please go here.

Use this collector to:

  • Discover Tableau objects, such as workbooks and dashboards, in your Tableau Online or Tableau Server instance

  • Perform impact analysis to understand how changes to upstream data sources impact Tableau objects

Tableau version supported

  • The collector supports Tableau Cloud and Tableau Server. The specific versions supported are Tableau API versions 3.7 through 3.10 on Tableau Server 2022.1.

    The collector is expected to support current versions of Tableau Online and Tableau Server. If you have any questions or encounter problems, please contact data.world Support.

Authentication supported

The Tableau collector supports the following methods for authentication:

These authentication details are used while generating the CLI or YAML file for the collector.

What is cataloged

The collector catalogs the following information.

Table 1.

Object | Information cataloged
Databases | Name, Identifier, Description, Database Connection Type
Database tables | Name, Identifier
Database columns | Name, Identifier
Projects | Name, Description
Workbooks | Name, Description, Creator Email, Creator Name, Creator Tableau User, Preview Image, and Workbook URL. Note: Unpublished workbooks are not harvested, because the Tableau REST APIs do not return objects that are not published.
Dashboards | Name, Creator Email, Creator Name, Creator Tableau User, Preview Image, and Dashboard URL. Note: Unpublished dashboards are not harvested, because the Tableau REST APIs do not return objects that are not published.
Views | Name, Creator Email, Creator Name, Creator Tableau User, Number of Views, Number of Favorites, Preview Image, and View URL. Note: Unpublished views are not harvested, because the Tableau REST APIs do not return objects that are not published.
Fields | Name, Identifier, Description
Calculated fields | Name, Identifier, Description, Calculation Formula
Dimensions | Name, Identifier, Description
Measures | Name, Identifier, Description
Metrics | Name, Identifier, Creator, Creation Date, Modified Date, Metrics URL, Field Data Type, Field Format, Field Type
Custom SQL tables | Name, Identifier, Description, Query
Embedded data sources | Name, Identifier
Published data sources | Name, Identifier, Description



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

Resource page

Relationship

Databases

  • Schemas contained within database

  • Tables contained within database

Database tables

  • Views that use database table

  • Schema containing database table

  • Database containing the database table

Database columns

  • Table that a database column is part of

Projects

  • Views contained within the project

  • Workbooks contained within the project

  • Dashboards contained within project

  • Subprojects contained within project

Workbooks

  • Projects that contain workbook

  • Data sources embedded within workbook

  • Views contained within workbook

Dashboards

  • Fields used by dashboard

  • Projects containing dashboard

  • Tables used by dashboard

  • Workbooks containing dashboard

  • Views embedded in dashboard

Views

  • Fields used by view

  • Projects containing view

  • Tables used by view

  • Workbooks containing view

  • Dashboards which embed the view

Fields

  • Data sources containing field

  • Views using field

Calculated fields

  • Views that use the calculated field

  • Data sources that contain the calculated field

Dimensions

  • Data sources containing dimension

  • Table related to dimension

Measures

  • Data source containing measure

  • Views using measure

Custom SQL tables

  • Views using Custom SQL table

Embedded data sources

  • Fields contained within embedded data source

  • Workbook embedding embedded data source

Published data sources

  • Fields contained within published data source



Lineage for Tableau

The collector does not support harvesting cross-system lineage when Tableau reports connect to a source system using ODBC connections.

Table 3.

Object | Lineage available
Database columns and tables | Fields that use database columns and tables
Dashboards | Fields and tables that dashboards source their data from
Views | Fields and tables that views source their data from
Fields | Columns, tables, and other fields that a field sources its data from
Embedded data sources | Published data sources



Supported cross-system lineage

The currently supported data sources for cross-system lineage are:

  • Postgres 

  • Snowflake

  • BigQuery

  • Redshift

    Important

    While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between Tableau and these sources.
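The lineage rules in this section can be summarized in a few lines of Python. This is only an illustrative sketch: the function name, the status strings, and the lowercase source names are hypothetical and are not part of the collector. It encodes three statements from the text: ODBC connections are not supported for cross-system lineage, four sources are formally supported, and other sources may still work.

```python
# Sources the text lists as formally supported for cross-system lineage.
# The lowercase spellings are an assumption made for this sketch.
FORMALLY_SUPPORTED_SOURCES = {"postgres", "snowflake", "bigquery", "redshift"}


def cross_system_lineage_status(source_type: str, uses_odbc: bool = False) -> str:
    """Classify a Tableau source connection (hypothetical helper)."""
    if uses_odbc:
        # The text states that ODBC connections are not supported.
        return "unsupported"
    if source_type.strip().lower() in FORMALLY_SUPPORTED_SOURCES:
        return "supported"
    # Not formally supported, but lineage may still be visible.
    return "best-effort"


if __name__ == "__main__":
    for source, odbc in [("Snowflake", False), ("Snowflake", True), ("MySQL", False)]:
        print(source, odbc, cross_system_lineage_status(source, odbc))
```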

Important things to note about improving the performance of collector runs

Depending on the size of your Tableau instance, you may want to exclude or include specific resources from your catalog.

  1. Exclude object types: Use the --tableau-exclude parameter to skip harvesting of certain object types. The supported object types are: View, Dashboard, Database, PublishedDataSource, EmbeddedDataSource, CalculatedField, ColumnField, BinField, GroupField, DatasourceField, CustomSQLTable, and Metric.

  2. Filter to specific Tableau site: Use the --tableau-site parameter to filter to a specific site.

  3. Filter to specific Tableau projects: Use the --tableau-project parameter to harvest only from specific Tableau projects. Repeat the parameter once for each project.

  4. Filter out specific Tableau projects: Use the --tableau-exclude-project parameter to skip harvesting from specific Tableau projects. Repeat the parameter once for each project.

  5. GraphQL page size: Use the --tableau-graphql-page-size parameter to adjust the GraphQL page size. The maximum page size is 1000.

  6. Increase Docker resources: If you run into out-of-memory errors, increase the memory on the machine running the collector, increase the Java heap size when running a JAR file, or use the filtering options described in this list.
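As an illustration of how the filtering parameters above fit together, the following Python sketch assembles them into an argument list that could be appended to a collector command. The helper name build_filter_args and the sample site and project names are hypothetical. The sketch also assumes that --tableau-exclude is repeated once per object type, which the text does not confirm, so check the collector's parameter reference before relying on it. Only the parameter names, the list of object types, and the maximum page size of 1000 come from this page.

```python
from typing import Iterable, List, Optional

# Object types the text lists as valid values for --tableau-exclude.
SUPPORTED_EXCLUDE_TYPES = {
    "View", "Dashboard", "Database", "PublishedDataSource",
    "EmbeddedDataSource", "CalculatedField", "ColumnField", "BinField",
    "GroupField", "DatasourceField", "CustomSQLTable", "Metric",
}
MAX_GRAPHQL_PAGE_SIZE = 1000  # maximum stated in the text


def build_filter_args(
    site: Optional[str] = None,
    projects: Iterable[str] = (),
    exclude_projects: Iterable[str] = (),
    exclude_types: Iterable[str] = (),
    graphql_page_size: Optional[int] = None,
) -> List[str]:
    """Build the filtering portion of a collector command line (hypothetical helper)."""
    args: List[str] = []
    if site is not None:
        args += ["--tableau-site", site]
    # The text says to repeat these parameters once per project.
    for project in projects:
        args += ["--tableau-project", project]
    for project in exclude_projects:
        args += ["--tableau-exclude-project", project]
    # Assumption: one --tableau-exclude flag per object type.
    for object_type in exclude_types:
        if object_type not in SUPPORTED_EXCLUDE_TYPES:
            raise ValueError(f"unsupported object type: {object_type}")
        args += ["--tableau-exclude", object_type]
    if graphql_page_size is not None:
        if not 1 <= graphql_page_size <= MAX_GRAPHQL_PAGE_SIZE:
            raise ValueError("GraphQL page size must be between 1 and 1000")
        args += ["--tableau-graphql-page-size", str(graphql_page_size)]
    return args


if __name__ == "__main__":
    print(" ".join(build_filter_args(
        site="marketing",              # sample value
        projects=["Sales", "Finance"],  # sample values
        exclude_types=["Metric"],
        graphql_page_size=500,
    )))
```

Validating the object types and the page size before launching a run is a design choice of this sketch: it surfaces a mistyped value immediately instead of partway through a long harvest.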