Set up the system for deploying collector output and MDP

This section walks you through setting up your system to process the collector output and show it, along with the related UI elements (the metadata profile, or MDP), to your end users. This example uses the Tableau collector. You can follow the same process for the other collectors in the system.

Feature availability

This feature is currently available for the following collectors:

  • Amazon DynamoDB

  • AWS S3

  • AWS Glue

  • Azure Data Factory

  • Azure Data Lake Storage Gen2

  • Databricks

  • DB2

  • dbt

  • Denodo

  • Fivetran

  • Generic JDBC

  • Google BigQuery

  • Grafana

  • InfluxDB

  • Kafka

  • Looker

  • MANTA

  • Microsoft SQL Server

  • Microsoft SSRS

  • Monte Carlo

  • Netezza

  • OpenAPI

  • Oracle

  • PostgreSQL

  • Power BI

  • Redshift

  • Reltio

  • Salesforce

  • Sigma

  • Snowflake

  • Tableau

  • Teradata

  • ThoughtSpot

  • Hive

  • Hive Metastore

If you want to use it for another collector, contact the data.world support team.

STEP 1: Check availability of datasets and projects for the data source

First, check that the datasets and projects for the data source are available and ready for use.

Note

This task is performed in the Catalog Sources organization.

  1. In the Catalog Sources organization, browse to the Organization profile page.

  2. From the Resources tab, check and verify that the following datasets and projects are available to you:

    • Project: DDW Catalog Tableau

    • Dataset for the Sandbox organization: DDW Tableau Sandbox

    • Dataset for the Main organization: DDW Tableau Main

    Important

    If your organizations are named differently, the datasets are named accordingly. For example, if the organization is named Global, the dataset name is DDW Tableau Global.

    locate_dataset_project.png
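The naming pattern described in this step can be sketched as a small helper that lists the names to look for on the Resources tab. This is a minimal sketch: the function name `expected_resources` is hypothetical, and applying the same `DDW <collector> <organization>` pattern to collectors other than Tableau is an assumption based on the Tableau example above.

```python
from typing import Dict, List, Union


def expected_resources(collector: str, organizations: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Return the project and dataset names to look for in STEP 1.

    Assumes the pattern shown for Tableau: one project named
    "DDW Catalog <collector>" and one dataset per organization
    named "DDW <collector> <organization>".
    """
    return {
        "project": f"DDW Catalog {collector}",
        "datasets": [f"DDW {collector} {org}" for org in organizations],
    }


resources = expected_resources("Tableau", ["Sandbox", "Main", "Global"])
print(resources["project"])
for dataset_name in resources["datasets"]:
    print(dataset_name)
```

These are display names, not dataset IDs. The IDs used when running the collector come from the dataset URL.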

STEP 2: Run the collector

Note

This task is performed in the Catalog Sources organization.

  1. Run the collector.

    1. If you are running the collector using the CLI or YAML files, make sure the following parameters are set correctly:

      1. --site: Set to the name of the data.world site into which you will load this catalog. For example, for https://YourSiteName.app.data.world/orgName, use --site="YourSiteName". Do not use this parameter for multi-tenant or VPC instances.

      2. --account: Set to point to the Catalog Sources organization.

      3. --upload-location: Set to the corresponding dataset for the collector in the Catalog Sources organization. For example, for Tableau, it is ddw-tableau-catalog-sandbox for the DDW Tableau Sandbox dataset or ddw-tableau-main for the DDW Tableau Main dataset, depending on which organization you want the output to show in.

        Provide the ID of the dataset, not its name. You can find the dataset ID in the dataset URL in your browser's address bar. For instance, in the dataset URL https://data.world/8bank/ddw-datasource-sandbox, the dataset ID is ddw-datasource-sandbox, and this is the value to enter for this parameter. If you need to provide the value in the format account/dataset ID, the value is 8bank/ddw-datasource-sandbox.

    2. If you are running the collector using Cloud collectors:

      1. Run the Cloud collector in the Catalog Sources organization.

      2. In the Configure the collector window, set the Automatic upload location to the corresponding dataset for the collector in the Catalog Sources organization. For example, for Tableau, it is ddw-tableau-catalog-sandbox for the DDW Tableau Sandbox dataset or ddw-tableau-main for the DDW Tableau Main dataset, depending on which organization you want the output to show in.

        Provide the ID of the dataset, not its name. You can find the dataset ID in the dataset URL in your browser's address bar. For instance, in the dataset URL https://data.world/8bank/ddw-datasource-sandbox, the dataset ID is ddw-datasource-sandbox, and this is the value to enter for this parameter. If you need to provide the value in the format account/dataset ID, the value is 8bank/ddw-datasource-sandbox.

  2. Once the collector has run successfully, browse to the collector-specific dataset in the Catalog Sources organization. In this example, it is the DDW Tableau Sandbox dataset or the DDW Tableau Main dataset. Verify that the output of the collector has been added to the dataset.

    view_output_added_to_dataset.png
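To avoid typing the dataset name where the dataset ID is expected, the ID can be read from the dataset URL programmatically. The following is a minimal sketch, not part of the collector tooling: the helper names `parse_dataset_url` and `location_args` are hypothetical, and it assumes the first path segment of the dataset URL is the Catalog Sources organization that owns the dataset. It builds only the three options named in this step; every other option the collector needs is out of scope here.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse


def parse_dataset_url(url: str) -> Tuple[str, str]:
    """Split a dataset URL into (account, dataset_id)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected <account>/<dataset-id> in the path of {url!r}")
    return segments[0], segments[1]


def location_args(dataset_url: str, site: Optional[str] = None) -> List[str]:
    """Build the --site, --account, and --upload-location options.

    Pass `site` only for a private instance; leave it as None for
    multi-tenant or VPC instances, where --site should not be used.
    """
    account, dataset_id = parse_dataset_url(dataset_url)
    args = [f"--account={account}", f"--upload-location={dataset_id}"]
    if site:
        args.insert(0, f"--site={site}")
    return args


account, dataset_id = parse_dataset_url("https://data.world/8bank/ddw-datasource-sandbox")
print(dataset_id)                 # ddw-datasource-sandbox
print(f"{account}/{dataset_id}")  # 8bank/ddw-datasource-sandbox
print(location_args("https://data.world/8bank/ddw-datasource-sandbox"))
```

The same dataset ID is the value to enter as the Automatic upload location when using Cloud collectors.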

STEP 3: Add the metadata profile for the collector

Note

This task is performed in the Catalog Configuration organization.

  1. In the Catalog Configuration organization, go to the Metadata profiles collection.

  2. On the Overview tab, click the Edit button. On the Source tab of the window, select Tableau, and then click Save.

  3. Next, in the Catalog Configuration organization, go to the ddw-catalogs dataset. Locate the sync source-modules.ttl file in the dataset and click the Sync now button. This will load your selected source modules into Catalog Config.

  4. A collection with the name of the data source (in this case, Tableau) is added to the organization. It contains all the MDP configurations related to the data source.

    view_tableau_collection.png

STEP 4: Adjust the Collector metadata profile for your business needs

Note

This task is performed in the Catalog Configuration organization.

Now review the contents of the metadata profile to see if you want to make any changes. At this point, the MDP for the collector exists in two places in the organization:

  1. The collection specific to the collector, for example, the Tableau collection: This collection preserves a pristine copy of the MDP related to the collector. Use it only for reference and do not edit it.

  2. The Metadata profiles collection: An entity must be in the Metadata profiles collection to sync to your destination organizations for the collector outputs. Therefore, to hide an entity or relationship in your destination organization without deleting it, simply remove it from the Metadata profiles collection.

    remove_field_from_collection.gif
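The relationship between the two collections can be modeled with plain sets. This is only a conceptual sketch of the rule described above, not how the platform is implemented: the entity names (`Workbook`, `Dashboard`, `Datasource`) and the function names are hypothetical.

```python
from typing import FrozenSet, List, Set

# Pristine, collector-specific collection (for example, the Tableau collection).
# Modeled as a frozenset because it is for reference only and should not be edited.
# The entity names are hypothetical.
REFERENCE_COLLECTION: FrozenSet[str] = frozenset({"Workbook", "Dashboard", "Datasource"})


def hide(metadata_profiles: Set[str], entity: str) -> Set[str]:
    """Return the Metadata profiles collection with one entity removed."""
    return metadata_profiles - {entity}


def entities_that_sync(metadata_profiles: Set[str]) -> List[str]:
    """Only members of the Metadata profiles collection sync to the destination."""
    return sorted(metadata_profiles)


# Initially the Metadata profiles collection holds everything in the reference copy.
metadata_profiles = set(REFERENCE_COLLECTION)

# Hiding an entity changes what syncs but leaves the reference copy untouched.
metadata_profiles = hide(metadata_profiles, "Datasource")
print(entities_that_sync(metadata_profiles))  # ['Dashboard', 'Workbook']
print(sorted(REFERENCE_COLLECTION))           # ['Dashboard', 'Datasource', 'Workbook']
```

Because the reference collection is never modified, it still shows which entities were originally shipped with the collector's MDP.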

STEP 5: Publish the MDP and collected metadata to end users

Note

This task is performed in the Main or Sandbox organization.

  1. In the Main or Sandbox organization, go to the ddw-catalogs dataset.

  2. In the dataset, find the matching file for the collector, and click Sync now.

    deploy_.png

View the results🎉

  • You can now browse to the Sandbox or Main organization and view the metadata collected by the Tableau collector.

    enable_deployment05.png