About the Tableau collector
Important
The Tableau collector can be run on-premise using Docker or Jar files.
Note
The latest version of the collector is 2.253. See the release notes for details on this version and all previous versions.
Use this collector to:
Discover Tableau objects (such as Tableau workbooks and dashboards) in your Tableau Online or Tableau Server instance.
Perform impact analysis to understand how changes to upstream data sources impact Tableau objects.
Tableau version supported
The collector supports Tableau Cloud and Tableau Server. Specifically, it supports Tableau API versions 3.10 and above on Tableau Server v2022.1.
The collector is expected to support current versions of Tableau Online and Tableau Server. If you have any questions or encounter problems, please contact data.world Support.
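When checking an instance against the "3.10 and above" requirement, note that API versions are dotted pairs, not decimals: 3.10 is newer than 3.9. The following is a minimal sketch of such a check, not part of the collector; the function names are hypothetical, and it assumes you already have the API version string reported by your Tableau instance.

```python
from typing import Tuple

# Minimum Tableau API version stated in this document.
MIN_API_VERSION = (3, 10)


def parse_api_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor' string into integers so 3.10 sorts after 3.9."""
    major, minor = version.strip().split(".")
    return int(major), int(minor)


def is_supported_api_version(version: str) -> bool:
    """Return True if the version is at or above the documented minimum."""
    return parse_api_version(version) >= MIN_API_VERSION
```

Comparing the versions as floats or plain strings would wrongly treat 3.10 as older than 3.9, which is why the sketch compares integer tuples.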
Authentication supported
The Tableau collector supports the following methods for authentication:
These authentication details are used while generating the CLI or YAML file for the collector.
What is cataloged
The collector catalogs the following information.
Object | Information cataloged |
---|---|
Databases | Name, Identifier, Description, Database Connection Type |
Database Schemas | Name, Identifier |
Database tables | Name, Identifier |
Database columns | Name, Identifier |
Tableau Databases | Name, Identifier, Connection Type |
Tableau Database tables | Name, Identifier, Connection Type |
Tableau Database columns | Name, Identifier |
Projects | Name, Description |
Workbooks | Name, Description, Creator Email, Creator Name, Creator Tableau User, Preview Image, and Workbook URL. Note: Unpublished workbooks are not harvested because the Tableau REST APIs do not return objects that are not published. |
Dashboards | Name, Creator Email, Creator Name, Creator Tableau User, Preview Image, Dashboard URL, Number of Favorites, and Number of Views. Note: Unpublished dashboards are not harvested because the Tableau REST APIs do not return objects that are not published. |
Views | Name, Creator Email, Creator Name, Creator Tableau User, Number of Views, Number of Favorites, Preview Image, and View URL. Note: Unpublished views are not harvested because the Tableau REST APIs do not return objects that are not published. |
Datasource fields | Name, Identifier, Description |
Calculated fields | Name, Identifier, Description, Calculation Formula, Category, Role, Type |
Group fields | Name, Identifier, Description, Category, Role, Type |
Bin fields | Name, Identifier, Description, Category, Role, Type, Bin Size |
Column fields | Name, Identifier, Description, Category, Role, Type |
Metrics | Name, Identifier, Description, Creator Email, Creator Name, Creator Tableau User, Metric URL |
Custom SQL tables | Name, Identifier, Description, SQL Query |
Embedded data sources | Name, Identifier |
Published data sources | Name, Identifier, Description |
Relationships between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Databases | |
Database Schemas | |
Database tables | |
Database columns | |
Tableau Databases | |
Tableau Database tables | |
Tableau Database columns | |
Projects | |
Workbooks | |
Dashboards | |
Views | |
Datasource fields | |
Calculated fields | |
Group fields | |
Bin fields | |
Column fields | |
Custom SQL tables | |
Embedded data sources | |
Published data sources | |
Lineage for Tableau
The collector does not support harvesting cross-system lineage when Tableau reports connect to a source system using ODBC connections.
Object | Lineage available |
---|---|
Database columns and tables | Fields that use database columns and tables |
Projects | Databases, Database schemas, Database tables, Database columns, Workbooks, Views, Dashboards, Custom SQL tables, and Data sources that projects contain |
Dashboards | Fields and tables that dashboards source their data from |
Views | Fields and tables that views source their data from |
Fields | Columns, tables, and other fields that a field sources its data from |
Tableau Database tables | Tableau Databases containing the Tableau Database table |
Tableau Database columns | Fields that reference the Tableau Database column, Tableau Database tables containing the Tableau Database column |
Published data sources | Embedded data sources that were derived from the published data source |
Embedded data sources | Database tables and Database columns that the Embedded data source uses data from |
Supported cross-system lineage
The currently supported data sources for cross-system lineage are:
Postgres
Snowflake
BigQuery
Redshift
Important
While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between Tableau and these sources.
Important things to note about improving the performance of collector runs
Depending on the size of your Tableau instance, you may want to exclude or include specific resources from your catalog.
Exclude object types: Use the --tableau-exclude parameter to exclude certain object types from harvesting. The supported object types are: View, Dashboard, Database, PublishedDataSource, EmbeddedDataSource, CalculatedField, ColumnField, BinField, GroupField, DatasourceField, CustomSQLTable, and Metric.
Filter to specific Tableau site: Use the --tableau-site parameter to filter to a specific site.
Filter to specific Tableau projects: Use the --tableau-project parameter to harvest only from specific Tableau projects. Use the parameter multiple times for multiple projects.
Filter out specific Tableau projects: Use the --tableau-exclude-project parameter to skip harvesting from specific Tableau projects. Use the parameter multiple times for multiple projects.
GraphQL page size: Use the --tableau-graphql-page-size parameter to adjust the GraphQL page size. The maximum page size is 1000.
Increase Docker resources: If you run into out-of-memory errors, increase the memory available on the machine running the collector, increase the Java heap size when running the Jar file, or use the filtering options above to reduce the amount of metadata harvested.
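The filtering parameters above can be assembled and validated before they are passed to the collector command. The following is a minimal sketch rather than part of the collector. The helper name build_filter_args is hypothetical, the --flag=value form is an assumption, and so is repeating --tableau-exclude once per object type. Only the parameter names, the supported object types, and the maximum page size of 1000 come from this document.

```python
from typing import Iterable, List, Optional

# Object types this document lists as valid for --tableau-exclude.
SUPPORTED_EXCLUDE_TYPES = {
    "View", "Dashboard", "Database", "PublishedDataSource",
    "EmbeddedDataSource", "CalculatedField", "ColumnField", "BinField",
    "GroupField", "DatasourceField", "CustomSQLTable", "Metric",
}
MAX_GRAPHQL_PAGE_SIZE = 1000  # maximum stated in this document


def build_filter_args(
    exclude_types: Iterable[str] = (),
    site: Optional[str] = None,
    projects: Iterable[str] = (),
    exclude_projects: Iterable[str] = (),
    graphql_page_size: Optional[int] = None,
) -> List[str]:
    """Build a list of filter arguments (hypothetical helper)."""
    args: List[str] = []
    for object_type in exclude_types:
        if object_type not in SUPPORTED_EXCLUDE_TYPES:
            raise ValueError(f"Unsupported object type: {object_type}")
        # Assumption: the parameter is repeated once per object type.
        args.append(f"--tableau-exclude={object_type}")
    if site is not None:
        args.append(f"--tableau-site={site}")
    # The document says to repeat these parameters for multiple projects.
    for project in projects:
        args.append(f"--tableau-project={project}")
    for project in exclude_projects:
        args.append(f"--tableau-exclude-project={project}")
    if graphql_page_size is not None:
        # Assumption: the page size must be at least 1.
        if not 1 <= graphql_page_size <= MAX_GRAPHQL_PAGE_SIZE:
            raise ValueError("GraphQL page size must be between 1 and 1000")
        args.append(f"--tableau-graphql-page-size={graphql_page_size}")
    return args
```

For example, build_filter_args(exclude_types=["Metric"], projects=["Sales", "Finance"]) yields one --tableau-exclude argument followed by two --tableau-project arguments. The project names here are placeholders, and the resulting list would still need to be appended to your actual Docker or Jar command.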