
About the Databricks Publisher automation

Important

This automation is available only for customers that have purchased the Data Governance Premium tier.

Configure this automation to publish comments and tags from data.world to Databricks. The automation lets you update descriptions and metadata fields for Databricks table and column resources in data.world and publish those updates back to Databricks, either automatically or manually with a single button click.


Frequently asked questions

General questions

  • What types of resources can be updated?

    You can update the following information for Databricks tables and columns:

    • Descriptions for Databricks tables and columns in data.world will be published to Databricks as comments.

    • Metadata fields you select for Databricks tables and columns in data.world will be published to Databricks as tags.

      Note: Only custom metadata fields can be published. Standard fields such as title, description, summary, and relationships are not supported.

  • Can I publish these changes automatically to Databricks?

    Yes. When you configure the automation, you can choose to publish changes to Databricks automatically when they are saved in data.world. Alternatively, you can give users a Publish Metadata to Databricks button to publish changes manually. When automatic publishing is enabled, any changes made to descriptions and metadata fields in the UI (on individual resource pages or through the bulk update/upload flow) are published automatically.

  • How quickly do changes publish to Databricks?

    Changes are usually published to Databricks immediately and appear there after you refresh the page.

  • Can I set up multiple instances of Databricks Publisher automations?

    No, only one Databricks Publisher automation should be set up per organization.

FAQs about publishing descriptions

  • What is the source of truth for Databricks comments?

    When you enable the Databricks Publisher automation, data.world becomes the source of truth for table and column descriptions (called comments in Databricks). By setting up the automation, users accept this premise. For example, if you add a description in data.world and then remove it in Databricks, it is not removed from data.world, even after a more recent collector run. Therefore, apply or remove descriptions in data.world to ensure they are correctly reflected in Databricks.

  • Are users notified when they publish the descriptions for tables and columns from data.world?

    Yes. The user who updates the descriptions receives notification emails when the descriptions are successfully published to Databricks as comments, or if an error occurs during the update.

    Important

    These notification emails are only sent when the user has the Confirmation notifications enabled.

FAQs about publishing custom metadata as tags

  • What is the source of truth for tags?

    When you enable the Databricks Publisher automation, data.world becomes the source of truth for table and column tags (metadata fields in data.world). By setting up the automation, users accept this premise. For example, if you add a metadata field in data.world and then remove the corresponding tag in Databricks, it is not removed from data.world, even after a more recent collector run. Therefore, apply or remove tags (metadata fields) in data.world to ensure they are correctly reflected in Databricks.

    Note

    You can create and manage your own tags directly in Databricks. However, avoid creating tags with the same keys as those published from data.world metadata fields, because the automation will overwrite them.

  • How do tags get added to Databricks?

    When the automation is configured, selected metadata fields in data.world are converted and published to Databricks as tags.

    1. For example, a metadata field called Data Steward with the value John Doe is published to Databricks as a tag with the key data_steward and the value John Doe.

    2. If a metadata field has multiple values, each value is published as a separate tag with the same key. For example, the metadata field Geography with the values Paris and London is published as two tags: geography:Paris and geography:London.

  • How do metadata fields get converted?

    Metadata field names are converted to lowercase, and any whitespace and special characters in them are replaced with underscores (_). Metadata field values remain unchanged.

  • What are the limitations when publishing metadata fields to Databricks as tags?

    Databricks applies its own constraints to tags. For details, see the Databricks tag constraints page.

  • Are users notified when they update the tags for tables and columns in data.world?

    Yes. The user who updates the metadata fields in data.world receives notification emails when the tags are successfully updated in Databricks, or if an error occurs during the update.

    Important

    These notification emails are only sent when the user has the Confirmation notifications enabled.
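The tag conversion rules described in this section (lowercase keys, underscores in place of whitespace and special characters, unchanged values, one tag per value) can be sketched in Python as follows. This is a minimal illustration, not the automation's actual implementation: the function names to_tag_key and to_tags are hypothetical, and the sketch assumes that "special characters" means any character other than ASCII letters, digits, and underscores, with each such character replaced individually.

```python
import re
from typing import Dict, List, Tuple, Union

# Assumption: "special characters" = anything other than ASCII letters,
# digits, or underscores. Checked after lowercasing, so A-Z is not needed.
_NON_KEY_CHARS = re.compile(r"[^a-z0-9_]")


def to_tag_key(field_name: str) -> str:
    """Hypothetical helper: lowercase the field name, then replace each
    whitespace or special character with an underscore."""
    return _NON_KEY_CHARS.sub("_", field_name.lower())


def to_tags(fields: Dict[str, Union[str, List[str]]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: convert metadata fields into (key, value) tag pairs.

    Values are left unchanged; a multi-valued field yields one tag per value.
    """
    tags: List[Tuple[str, str]] = []
    for name, value in fields.items():
        key = to_tag_key(name)
        values = value if isinstance(value, list) else [value]
        for v in values:
            tags.append((key, v))
    return tags


if __name__ == "__main__":
    example = {"Data Steward": "John Doe", "Geography": ["Paris", "London"]}
    for key, value in to_tags(example):
        print(f"{key}:{value}")
```

Running the script prints data_steward:John Doe, geography:Paris, and geography:London, matching the examples above. The sketch does not check the additional constraints Databricks places on tag keys and values.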