Setting up Popularity Analytics

What is Popularity Analytics?

Popularity analytics helps data consumers find the trending or popular resources that are available in the application. It helps data producers and stewards focus their time and energy on the things that provide the most value, and spend less time (or even remove / curate away) content that is less valuable.

Pre-requisites: Metrics Dataset

Popularity Analytics uses the Metrics of the organization to identify the popularity of the resources. Before you begin you must make sure have the Metrics dataset is configured in your organization. For details, see Metrics for the catalog.

STEP 1: Configure the metadata profile for popularity

To configure the metadata profile for people:

Prepare the mdp-popularity.ttl file by doing the following:
1. Copy the following ttl file contents in a text editor.
2. Replace yourorgname with the name of your organization.
3. For dwec:forType property, specify the resources for which you want to enable the Popularity field. For example, set the value as dwec:DwDataset, dwec:DatabaseTable, dwec:DatabaseColumn, dwec:BusinessTerm ; to enable for datasets, tables, columns, and business terms.
4. Save the file as mdp-popularity.ttl on your machine.
Upload the mdp-popularity.ttl file to the ddw-catalogs dataset.
Note
This file goes in the ddw-catalogs dataset.
Note
Success! Browse to the resource in your organization for which you enabled the field (for example, dataset) and you should see a new Popularity option.

STEP 2: Create Popularity Analytics Project and Link to Metrics Dataset

In this section, we create the Popularity Analytics project and link it with the Metrics dataset in the organization.

To create the project:

Create a project called ddw-popularity-analytics.
On the Project details page, click the Launch Workspace button.
Link the dataset ddw-metrics (this name will be different for your organization) to the project by clicking Add > Dataset.

STEP 3: Configure Eureka Automation for Metadata Popularity

STEP A: Add and Run the Metadata Popularity SQL Query

Note

What does this query do?

The query creates a valuation scores based on 2 metrics - number of views and number of edits on the metadata resources. By default, both views and edits have the same weights. The valuation score is calculated for two different time periods. By default, it is 14 and 60 days. The popularity rank is calculated by a cumulated distribution over the valuation score. Based on the popularity rank, the metadata resources are categorized as - Popular, Unpopular, Rising. A metadata resource is popular if it's the top 20th percentile (the default value), unpopular if it's in the bottom 20th percentile and rising if was unpopular in the last 60 days but it is popular in the last 14 days.

The default values can be changed.

You can find technical details about this query in the reference information section.

To add and run the SQL query:

In the ddw-popularity-analytics project, add a new SQL Query.

Save the following query with the name metadata-popularity

DECLARE ?views_wt AS INTEGER = 1
DECLARE ?edit_wt AS INTEGER = 1
DECLARE ?daysbackStart AS INTEGER = 14
DECLARE ?daysbackFinish AS INTEGER = 60
DECLARE ?popularity AS DECIMAL = .2

WITH valuationStart as (
SELECT resourcetype, resource,
  sum(views * ?views_wt + edits * ?edit_wt) AS valuation_score,
  sum(views) as views,
  sum(edits) as edits
FROM events_catalog_resources_pages_activity_by_day
where date > DATE_SUB(now(), ?daysbackStart, "day")
GROUP BY resourcetype, resource
),
valuationFinish as (
SELECT resourcetype, resource,
  sum(views * ?views_wt + edits * ?edit_wt) AS valuation_score,
  sum(views) as views,
  sum(edits) as edits
FROM events_catalog_resources_pages_activity_by_day
where date > DATE_SUB(now(), ?daysbackFinish, "day")
GROUP BY resourcetype, resource
),
rankStart as (
SELECT
    resourcetype, resource,
    valuation_score,
    CUME_DIST() OVER (PARTITION BY resourcetype ORDER BY valuation_score DESC) as popularity_rank
FROM valuationStart
),
rankFinish as (
SELECT
    resourcetype, resource,
    valuation_score,
    CUME_DIST() OVER (PARTITION BY resourcetype ORDER BY valuation_score DESC) as popularity_rank
FROM valuationFinish
)
select 
    finish.resource,
    finish.resourcetype,
    finish.popularity_rank as popularity_rank_finish,
    start.popularity_rank as popularity_rank_start,
    if(start.popularity_rank <= ?popularity,"Y","N") as popular,
    if(start.popularity_rank >= (1-?popularity),"Y","N") as unpopular,
    if(start.popularity_rank <= ?popularity and finish.popularity_rank > ?popularity,"Y","N") as rising
from rankFinish finish
left join rankStart start on  finish.resource = start.resource and finish.resourcetype = start.resourcetype
order by finish.popularity_rank asc

Next, run the query and save the result of the query as a file in the ddw-popularity-analytics project.
1. In the Query results page, click Download and select Save to dataset or project.
2. In the Choose how data updates window, select the Use data extract option.
3. In the Save data extract window, enable the Maintain link to saved query and save the file with the name metadata-popularity.
  Note
  Success!
  You will now see the metadata-popularity file under the Project Files.

STEP B: Add Eureka Automation SPARQL query for Metadata Popularity

In the ddw-popularity-analytics project, add a new SPARQL Query.

Add the following query. Replace yourorgname with the name of your organization.

PREFIX popularity: <https://yourorgname.linked.data.world/d/ddw-popularity-analytics/>
PREFIX : <https://yourorgname.linked.data.world/d/ddw-catalogs/>

CONSTRUCT {
    ?iri :popularity ?popularityAnnotation .
}
 WHERE {

    [   a popularity:tbl-metadata_popularity ;
        popularity:col-metadata_popularity-resource ?iri ; 
        popularity:col-metadata_popularity-popular ?popular ;
        popularity:col-metadata_popularity-unpopular ?unpopular ;
        popularity:col-metadata_popularity-rising ?rising ;
    ]  .

    VALUES ?nil { UNDEF }
     BIND(IF(?popular = "N" && ?unpopular = "N" && ?rising = "Y", :popularity_rising,
         IF(?popular = "N" && ?unpopular = "Y" && ?rising = "N", :popularity_unpopular,
         IF(?popular = "Y" && ?unpopular = "N" && ?rising = "N", :popularity_popular, ?nil))) AS ?popularityAnnotation)

 }

Save the query with the name popularity-metadata.ttl.rq

STEP C: Run the Eureka Automation for Metadata Popularity

Run the popularity-metadata.ttl.rq SPARQL query from the previous step.
From the Results page, select Download the results > Save to dataset or project.
In the Save data extract window:
1. The query name will automatically appear. Just delete the suffix .rq
2. Save the file to the ddw-catalogs dataset of your organization. You must set the file to auto-sync every hour.
  Note
  This file goes in the ddw-catalogs dataset.

STEP 4: Configure Eureka Automation for Dataset Popularity

STEP A: Add and Run the Dataset Popularity SQL Query

Note

What does this query do?

The query creates a valuation scores based on 5 metrics - page views, queries run, downloads, bookmarks, authorization requests of a dataset resources. Each metric has a corresponding weight. The valuation score is calculated for two different time periods. By default, it is 14 and 60 days. The popularity rank is calculated by a cumulated distribution over the valuation score. Based on the popularity rank, the dataset resources are categorized as - Popular, Unpopular, Rising. A dataset resource is popular if it's the top 20th percentile (the default value), unpopular if it's in the bottom 20th percentile and rising if was unpopular in the last 60 days but it is popular in the last 14 days.

The default values can be changed.

You can find technical details about this query in the reference information section.

In the ddw-popularity-analytics project, add a new SQL Query.

Save the following query with the name dataset-popularity.

DECLARE ?daysbackStart AS INTEGER = 14
DECLARE ?daysbackFinish AS INTEGER = 60
DECLARE ?queryrun_wt AS INTEGER = 5
DECLARE ?download_wt AS INTEGER = 5
DECLARE ?bookmark_wt AS INTEGER = 10
DECLARE ?authreq_wt AS INTEGER = 10
DECLARE ?popularity AS DECIMAL = .2

WITH valuationStart as (
SELECT resource,
  sum(pageviews + queries_run * ?queryrun_wt + downloads * ?download_wt + bookmarks * ?bookmark_wt + auth_requests * ?authreq_wt) AS valuation_score,
  sum(pageviews) as pageviews,
  sum(queries_run) as queries_run,
  sum(downloads) as downloads,
  sum(bookmarks) as bookmarks,
  sum(auth_requests) as auth_requests
FROM events_dataset_activity_by_day
WHERE
date > DATE_SUB(now(), ?daysbackStart, "day")
 AND resource NOT LIKE "ddw%"
GROUP BY resource
), 
valuationFinish as (
SELECT resource,
  sum(pageviews + queries_run * ?queryrun_wt + downloads * ?download_wt + bookmarks * ?bookmark_wt + auth_requests * ?authreq_wt) AS valuation_score,
  sum(pageviews) as pageviews,
  sum(queries_run) as queries_run,
  sum(downloads) as downloads,
  sum(bookmarks) as bookmarks,
  sum(auth_requests) as auth_requests
FROM events_dataset_activity_by_day
WHERE
date > DATE_SUB(now(), ?daysbackFinish, "day")
 AND resource NOT LIKE "ddw%"
GROUP BY resource
), 
rankStart as (
SELECT
    resource,
    valuation_score,
    CUME_DIST() OVER (ORDER BY valuation_score DESC) as popularity_rank
FROM
    valuationStart
), 
rankFinish as (
SELECT
    resource,
    valuation_score,
    CUME_DIST() OVER (ORDER BY valuation_score DESC) as popularity_rank
FROM
    valuationFinish
)
select 
    finish.resource,
    finish.popularity_rank as popularity_rank_finish,
    start.popularity_rank as popularity_rank_start,
    if(start.popularity_rank <= ?popularity,"Y","N") as popular,
    if(start.popularity_rank >= (1-?popularity),"Y","N") as unpopular,
    if(start.popularity_rank <= ?popularity and finish.popularity_rank > ?popularity,"Y","N") as rising
from rankFinish finish
left join rankStart start on  finish.resource = start.resource 
order by finish.popularity_rank asc

Next, run the query and save the result of the query as a file in the ddw-popularity-analytics project.
1. In the Query results page, click Download and select Save to dataset or project.
2. In the Choose how data updates window, select the Use data extract option.
3. In the Save data extract window, enable the Maintain link to saved query and save the file with the name dataset-popularity.
  Note
  Success! You will now see the dataset-popularity file under the Project Files.

STEP B: Add Eureka Automation SPARQL query for Dataset Popularity

In the ddw-popularity-analytics project, add a new SPARQL Query.

Add the following query. Replace yourorgname with the name of your organization at three places in the query.

Note

Organization name needs to be updated at 3 places!

Please note that the organization name in this query has to be updated in 3 lines in this query.

PREFIX popularity: <https://yourorgname.linked.data.world/d/ddw-popularity-analytics/>
PREFIX : <https://yourorgname.linked.data.world/d/ddw-catalogs/>
PREFIX dwec: <https://dwec.data.world/v0/>
PREFIX dct: <http://purl.org/dc/terms/>

CONSTRUCT {
    ?iri a dwec:DwDataset.
    ?iri dct:identifier ?resource.
    ?iri :popularity ?popularityAnnotation .
}
 WHERE {

    [   a popularity:tbl-dataset_popularity ;
        popularity:col-dataset_popularity-resource ?resource ; 
        popularity:col-dataset_popularity-popular ?popular ;
        popularity:col-dataset_popularity-unpopular ?unpopular ;
        popularity:col-dataset_popularity-rising ?rising ;
    ]  .

    VALUES ?nil { UNDEF }
     BIND(IF(?popular = "N" && ?unpopular = "N" && ?rising = "Y", :popularity_rising,
         IF(?popular = "N" && ?unpopular = "Y" && ?rising = "N", :popularity_unpopular,
         IF(?popular = "Y", :popularity_popular, ?nil))) AS ?popularityAnnotation)
    BIND(IRI(CONCAT(STR("https://yourorgname.linked.data.world/d/"),  ?resource)) as ?iri)

 }

Save the query with the name popularity-dataset.ttl.rq.

STEP C: Run the Eureka Automation for Dataset Popularity

Run the popularity-dataset.ttl.rq SPARQL query from the previous step.
From the Results page, select Download the results > Save to dataset or project.
In the Save data extract window:
1. The query name will automatically appear. Just delete the suffix .rq
2. Save the file to the ddw-catalogs dataset of your organization. You must set the file to auto-sync every hour.
  Note
  This file goes in the ddw-catalogs dataset.

View the results🥳

Now that you have set up the popularity analysis for the datasets and metadata resources, you can:

Browse to the resources to see them automatically categorized as Popular, Unpopular, and Rising!
Click the Edit button in the More information section, to edit the popularity!
On various product pages such as Search results Resources, use the Popularity filter to narrow down lists by popularity scores attached to the resources.

In this section:

Setting up Popularity Analytics

What is Popularity Analytics?

Pre-requisites: Metrics Dataset

STEP 1: Configure the metadata profile for popularity

Note

Note

STEP 2: Create Popularity Analytics Project and Link to Metrics Dataset

STEP 3: Configure Eureka Automation for Metadata Popularity

STEP A: Add and Run the Metadata Popularity SQL Query

Note

Note

STEP B: Add Eureka Automation SPARQL query for Metadata Popularity

STEP C: Run the Eureka Automation for Metadata Popularity

Note

STEP 4: Configure Eureka Automation for Dataset Popularity

STEP A: Add and Run the Dataset Popularity SQL Query

Note

Note

STEP B: Add Eureka Automation SPARQL query for Dataset Popularity

Note

STEP C: Run the Eureka Automation for Dataset Popularity

Note

View the results🥳

Search results