Skip to main content

Generate glossary terms and relate to tables and columns

Goal of this Eureka automation

This automation runs against the cataloged database tables and generate a list of glossary terms and relates them with the appropriate tables and columns in the organization.

Note

Before you begin - Ensure your organization has cataloged database tables and they appear in your catalog.

flow_eu_automation_simple.png

STEP 1: Add automation SPARQL query to the Eureka Automations Project

Note

The SPARQL query goes in the ddw-eureka-automations project create here.

  1. Go to the project ddw-eureka-automations.

  2. On the Project details page, click the Launch Workspace button.

  3. Create a new SPARQL Query by clicking +Add > SPARQL Query.

  4. Add the following query. Replace yourorgname with the name of your organization.

    PREFIX : <https://yourorgname.linked.data.world/d/ddw-catalogs/>
    prefix dcat:  <http://www.w3.org/ns/dcat#> .
    PREFIX fn: <http://data.world/function/functions#>
    PREFIX arq: <http://jena.apache.org/ARQ/property#>
    
    CONSTRUCT {
        ?s ?p ?o .
    
        :cat.eureka-glossary dcat:record ?record .
        ?record a dcat:CatalogRecord ;
            foaf:primaryTopic ?termIRI ;
            dct:issued ?timestamp ;
            dct:modified ?timestamp ;
            .
    
        ?termIRI a dwec:BusinessTerm ;
            dct:identifier ?term ;
            dct:title ?termTitle ;
            prov:wasGeneratedBy dwec:Eureka;
            dwec:hasAssetStatus :EurekaSuggestedStatus;
        .
        ?t dwec:subject ?termIRI.
    
    }
    where {
         {
            VALUES (?s ?p ?o ) {
                (:cat.eureka-glossary    rdf:type                    dwec:Catalog )
                (:cat.eureka-glossary     dct:identifier              "eureka-glossary" )
                (:cat.eureka-glossary     dct:description             "Business Glossary created automatically by data.world's Eureka" )
                (:cat.eureka-glossary     dct:title                   "Eureka Business Glossary" )
            }
         }
         union {
    
        
    
    {
    Select (count(*) as ?total) ?termIRI
    where {
        { 
        ?t a dwec:DatabaseTable;
            dct:identifier ?id1;
        .
        }
        union {
            ?t a dwec:DatabaseColumn;
                dct:identifier ?id1;
        .
        }
        BIND(LCASE(?id1) AS ?label) # standardize to lower case
        ?y arq:strSplit(?label " ") .
        ?x arq:strSplit(?y "_") . 
        ?term arq:strSplit(?x "\\.") .
        FILTER( strlen( ?term ) > 3 )
        BIND(IRI(CONCAT(STR(:), "EurekaTerm.", ?term)) as ?termIRI)
    }
    group by ?termIRI having (?total > 15)
    }
    
        { 
        ?t a dwec:DatabaseTable;
            dct:identifier ?id1;
        .
        }
        union {
            ?t a dwec:DatabaseColumn;
                dct:identifier ?id1;
        .
        }
        BIND(LCASE(?id1) AS ?label) # standardize to lower case
        ?y arq:strSplit(?label " ") .
        ?x arq:strSplit(?y "_") . 
        ?term arq:strSplit(?x "\\.") .
        BIND(IRI(CONCAT(STR(:), "EurekaTerm.", ?term)) as ?termIRI2)
        Filter (?termIRI = ?termIRI2)
    
        BIND(IRI(CONCAT(STR(:), "Record_", SHA1(CONCAT("EurekaTerm.", ?id1)))) as ?record)
        BIND(NOW() AS ?timestamp)
        BIND(CONCAT(UCASE(SUBSTR(?term, 1, 1)), SUBSTR(?term, 2)) as ?termTitle)
    }
     }
  5. Save the query with:

    • Name: ea-generate-glossary-and-relationships-from-databases.result.ttl.rq

    • Description: Generate a business glossary and automatically create relationships between the Database Tables and Business Glossary terms.

STEP 2: Run the Eureka automation

  1. Execute the SPARQL query from the previous step.

  2. From the Results page, select Download the results > Save to dataset.

  3. Set file name. The query name will automatically appear. Just delete the suffix .rq

  4. Save the file to the ddw-catalogs dataset of your organization. You must set the file to auto-sync every hour.

    Note

    This file goes in the ddw-catalogs dataset.

View the results! 🎉

  1. Browse to the Glossary of your organization.

    Note

    🎉 Congrats! You will see the list of newly added glossary terms.

  2. Click through to any of the glossary term added using the Eureka automation.

    Note

    🎉 Congrats! You will see the term has the Status set as Eureka suggested.

  3. Browse to the tables you used for creating a list of glossary terms.

    Note

    🎉 Congrats! You will see the that the table now shows related business terms.