Skip to main content

Apply column count tag to tables

Goal of this Eureka automation

With this automation we will tag tables based on the numbers of columns in them.

Note

Before you begin -

Ensure your organization has cataloged database tables and they appear in your catalog.

flow_eu_automation_comp.png

STEP 1: Prepare a seed file and add to a dataset

First we will prepare the seed file:

  1. Create a seed file that assigns a tag based on a minimum and maximum columns in a table. In this example, if tables have between 0 and 10 columns, assign the tag low. If tables have between 11 and 50 columns, then tag medium, and so on. You can add/edit this seed file as you wish.

    column-count-tag.seed.csv

    min,max,tag
    0,10,low
    11,50,medium
    51,1000,high

Now, we will create a dataset and add the seed file to it:

  1. Create a dataset and name it ddw-eureka-column-count-tag.

  2. Upload the seed file to the ddw-eureka-column-count-tag dataset.

STEP 2: Add dataset and SPARQL query to the Eureka Automations Project

Note

These tasks are done in the ddw-eureka-automations project created here.

  1. Go to the project ddw-eureka-automations.

  2. On the Project details page, click the Launch Workspace button.

  3. Link the dataset ddw-eureka-column-count-tag created in STEP 1 to the project by clicking Add > Dataset > .

  4. Create a new SPARQL Query by clicking +Add > SPARQL Query.

  5. Add the following query. Replace yourorgname with the name of your organization.

    PREFIX : <https://yourorgname.linked.data.world/d/ddw-eureka-column-count-tag/>
    PREFIX dwec: <https://dwec.data.world/v0/> 
    PREFIX dct:  <http://purl.org/dc/terms/>
    
    CONSTRUCT{
        ?table dwec:textTag ?ColumnTag
    }
    WHERE {
        {
            select (count(*) as ?total) ?table
            where {
            ?table a dwec:DatabaseTable.
            ?column  a dwec:DatabaseColumn.
            ?column dct:isPartOf ?table.
            }
            group by ?table
        }
        ?x a :tbl-column_count_tag_seed;
            :col-column_count_tag_seed-min ?min;
            :col-column_count_tag_seed-max ?max;
            :col-column_count_tag_seed-tag ?tag;
        .
        filter(?min <= ?total && ?total <= ?max)
        BIND(IF(?min <= ?total && ?total <= ?max, ?tag, "Unknown") as ?ColumnTag)
    
    }
  6. Save the query with:

    1. Name ea-column-count-tag.result.ttl.rq

    2. Description Depending on the number of columns a table has, tag with High, Medium or Low tag, based on the seed file.

STEP 3: Run your Eureka automation

  1. Execute the SPARQL query added in Step 2.

  2. From the Results page, select Download the results > Save to dataset.

  3. Set file name. The query name will automatically appear. Just delete the suffix >.rq

  4. Save the file to the ddw-catalogs dataset of your organization. You must set the file to auto-sync every hour.

    Note

    This file goes in the ddw-catalogs dataset.

View the Results! 🎉

Go to your organization profile page and from the Resources tab, view the tags available for filtering.

tags.png

Note

🎉 Congrats!Your tables are automatically tagged based on numbers of columns in them. You can use these tag to narrow down your lists.

filter_on_tags.png