Generate glossary terms and relate to tables and columns
Goal of this Eureka automation
This automation runs against the cataloged database tables and generate a list of glossary terms and relates them with the appropriate tables and columns in the organization.
Note
Before you begin - Ensure your organization has cataloged database tables and they appear in your catalog.
STEP 1: Add automation SPARQL query to the Eureka Automations Project
Note
The SPARQL query goes in the ddw-eureka-automations project create here.
Go to the project ddw-eureka-automations.
On the Project details page, click the Launch Workspace button.
Create a new SPARQL Query by clicking +Add > SPARQL Query.
Add the following query. Replace yourorgname with the name of your organization.
PREFIX : <https://yourorgname.linked.data.world/d/ddw-catalogs/> prefix dcat: <http://www.w3.org/ns/dcat#> . PREFIX fn: <http://data.world/function/functions#> PREFIX arq: <http://jena.apache.org/ARQ/property#> CONSTRUCT { ?s ?p ?o . :cat.eureka-glossary dcat:record ?record . ?record a dcat:CatalogRecord ; foaf:primaryTopic ?termIRI ; dct:issued ?timestamp ; dct:modified ?timestamp ; . ?termIRI a dwec:BusinessTerm ; dct:identifier ?term ; dct:title ?termTitle ; prov:wasGeneratedBy dwec:Eureka; dwec:hasAssetStatus :EurekaSuggestedStatus; . ?t dwec:subject ?termIRI. } where { { VALUES (?s ?p ?o ) { (:cat.eureka-glossary rdf:type dwec:Catalog ) (:cat.eureka-glossary dct:identifier "eureka-glossary" ) (:cat.eureka-glossary dct:description "Business Glossary created automatically by data.world's Eureka" ) (:cat.eureka-glossary dct:title "Eureka Business Glossary" ) } } union { { Select (count(*) as ?total) ?termIRI where { { ?t a dwec:DatabaseTable; dct:identifier ?id1; . } union { ?t a dwec:DatabaseColumn; dct:identifier ?id1; . } BIND(LCASE(?id1) AS ?label) # standardize to lower case ?y arq:strSplit(?label " ") . ?x arq:strSplit(?y "_") . ?term arq:strSplit(?x "\\.") . FILTER( strlen( ?term ) > 3 ) BIND(IRI(CONCAT(STR(:), "EurekaTerm.", ?term)) as ?termIRI) } group by ?termIRI having (?total > 15) } { ?t a dwec:DatabaseTable; dct:identifier ?id1; . } union { ?t a dwec:DatabaseColumn; dct:identifier ?id1; . } BIND(LCASE(?id1) AS ?label) # standardize to lower case ?y arq:strSplit(?label " ") . ?x arq:strSplit(?y "_") . ?term arq:strSplit(?x "\\.") . BIND(IRI(CONCAT(STR(:), "EurekaTerm.", ?term)) as ?termIRI2) Filter (?termIRI = ?termIRI2) BIND(IRI(CONCAT(STR(:), "Record_", SHA1(CONCAT("EurekaTerm.", ?id1)))) as ?record) BIND(NOW() AS ?timestamp) BIND(CONCAT(UCASE(SUBSTR(?term, 1, 1)), SUBSTR(?term, 2)) as ?termTitle) } }
Save the query with:
Name: ea-generate-glossary-and-relationships-from-databases.result.ttl.rq
Description: Generate a business glossary and automatically create relationships between the Database Tables and Business Glossary terms.
STEP 2: Run the Eureka automation
Execute the SPARQL query from the previous step.
From the Results page, select Download the results > Save to dataset.
Set file name. The query name will automatically appear. Just delete the suffix .rq
Save the file to the ddw-catalogs dataset of your organization. You must set the file to auto-sync every hour.
Note
This file goes in the ddw-catalogs dataset.
View the results! 🎉
Browse to the Glossary of your organization.
Note
🎉 Congrats! You will see the list of newly added glossary terms.
Click through to any of the glossary term added using the Eureka automation.
Note
🎉 Congrats! You will see the term has the Status set as Eureka suggested.
Browse to the tables you used for creating a list of glossary terms.
Note
🎉 Congrats! You will see the that the table now shows related business terms.