Skip to main content

Generate relationships between tables and business glossary terms

Goal of this Eureka automation

This automation runs against the all cataloged database tables and all business glossary terms and generates relationships between them.

Note

Before you begin - Ensure your organization has cataloged database tables and business glossary terms and they appear in your catalog.

flow_eu_automation_simple.png

STEP 1: Add automation SPARQL query to the Eureka Automations Project

Note

The SPARQL query goes in the ddw-eureka-automations project created here.

  1. Go to the project ddw-eureka-automations.

  2. On the Project details page, click the Launch Workspace button.

  3. Create a new SPARQL Query by clicking +Add > SPARQL Query.

  4. Add the following query. Replace yourorgname with the name of your organization.

    PREFIX : <https://yourorgname.linked.data.world/d/generate-relationships/>
    PREFIX fn: <http://data.world/function/functions#>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX dwec: <https://dwec.data.world/v0/>
    
    construct {
        ?table_IRI  dwec:subject ?term_IRI.
    }
    where {
        ?btType rdfs:subClassOf* dwec:BusinessTerm .
        ?term_IRI a ?btType ;
            dct:title ?BizTerm ;
        .
        MINUS{?term_IRI prov:wasGeneratedBy dwec:Eureka}
        BIND(LCASE(?BizTerm) AS ?BizTerm1) # standardize to lower case
        BIND(REPLACE(?BizTerm1, "[^a-z0-9]", "") AS ?BizTerm_ClusterKey) # collapse whitespace and punctuation
    
        ?table_IRI a dwec:DatabaseTable ;
            dct:identifier ?table_ID ;
        .
        BIND(LCASE(?table_ID) AS ?table_ID1) # standardize to lower case
        BIND(REPLACE(?table_ID1, "[^a-z0-9]", "") AS ?table_ClusterKey) # collapse whitespace and punctuation
    
        ##StringMatching
        BIND(fn:jaroWinklerDistance(?BizTerm_ClusterKey, ?table_ClusterKey) AS ?jw)
        BIND(fn:levenshteinDistance(?BizTerm_ClusterKey, ?table_ClusterKey) AS ?lev)
    
        FILTER(?jw > .8)
    }
  5. Save the query with:

    1. Nameea-generate-relationships.result.ttl.rq

    2. Description: Given a business glossary, automatically create relationships between Database Tables and Business Glossary terms.

STEP 2: Run your Eureka automation

  1. Execute the SPARQL query from the previous step.

  2. From the Results page, select Download the results > Save to dataset.

  3. Set file name. The query name will automatically appear. Just delete the suffix .rq

  4. Save the file to the ddw-catalogs dataset of your organization. You must set the file to auto-sync every hour.

    Note

    This file goes in the ddw-catalogs dataset.

View the results! 🎉

  1. Browse to the Glossary of your organization.

    Note

    🎉 Congrats! You will see the list of newly added glossary terms.

  2. Browse to the tables you used for creating a list of glossary terms.

    Note

    🎉 Congrats! You will see the that the table now shows related business terms.