Reference information

Technical details about the query for calculating metadata popularity

Note

This section captures the technical details of the query used for calculating popularity analytics for metadata resources.

Metadata popularity is a ranking of a valuation score derived from the views and edits of a metadata resource.

Views and edits can each be assigned a weight: views_wt and edit_wt.

The valuation score is calculated for two time periods, daysbackStart and daysbackFinish, in the common table expressions valuationStart and valuationFinish:

  • For each day (a row in the events_catalog_resources_pages_activity_by_day table) in the corresponding period (daysbackStart, daysbackFinish), multiply each of the selected metrics (views and edits) by its declared weight and add the products together. This gives the daily valuation score for that resource.

  • Sum all the daily valuation scores, grouped by the metadata resource type. The result is the valuation_score.
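The two steps above can be sketched in Python as follows. This is only an illustration, not the actual query: the row layout, the sample resource names, the `valuation_scores` helper, and the reading of each period as "the most recent N days up to a reference date" are all assumptions, and the sketch groups by individual resource for simplicity.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily activity rows: (resource, day, views, edits).
# The real source is events_catalog_resources_pages_activity_by_day; its
# column names are not given in this document, so this layout is assumed.
rows = [
    ("glossary-term-a", date(2024, 2, 28), 10, 2),
    ("glossary-term-a", date(2024, 1, 15), 40, 0),
    ("glossary-term-b", date(2024, 2, 20), 3, 1),
]


def valuation_scores(rows, today, days_back, views_wt=1, edit_wt=1):
    """Sum the weighted daily activity per resource over the last `days_back` days."""
    cutoff = today - timedelta(days=days_back)  # assumed window definition
    scores = defaultdict(float)
    for resource, day, views, edits in rows:
        if cutoff <= day <= today:
            # Daily valuation score: each metric times its weight, added up.
            scores[resource] += views * views_wt + edits * edit_wt
    return dict(scores)


today = date(2024, 3, 1)
valuation_start = valuation_scores(rows, today, days_back=14)
valuation_finish = valuation_scores(rows, today, days_back=60)
```

With this sample data, the January activity counts only toward the 60-day window, so the same resource can have different valuation scores in the two periods.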

The popularity ranking is also calculated for two time periods, daysbackStart and daysbackFinish, in the common table expressions rankStart and rankFinish:

  • For each value in valuationStart and valuationFinish, calculate the cumulative distribution (using CUME_DIST()) of the valuation score across all valuation scores. The result is the popularity_rank.
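As a rough illustration of what CUME_DIST() returns, the sketch below computes, for each score, the fraction of scores that are less than or equal to it. It assumes an ascending ordering, which this document does not state; the `cume_dist` helper and the sample scores are hypothetical.

```python
from bisect import bisect_right


def cume_dist(scores):
    """Map each key to the fraction of scores <= its own score.

    Mirrors SQL CUME_DIST() over an ascending ORDER BY (an assumption);
    tied scores receive the same value.
    """
    ordered = sorted(scores.values())
    total = len(ordered)
    return {key: bisect_right(ordered, value) / total for key, value in scores.items()}


valuation = {"a": 52, "b": 4, "c": 4, "d": 20}
popularity_rank = cume_dist(valuation)
# a -> 1.0, b -> 0.5, c -> 0.5, d -> 0.75
```

Under this ordering, the highest valuation score always gets a popularity_rank of 1.0.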

A metadata resource is classified by comparing its popularity_rank to the popularity threshold. The resource is:

  • popular if it ranks in the top popularity fraction of resources (for example, the top 20% with the default popularity of .2)

  • unpopular if it ranks in the bottom popularity fraction of resources

  • rising if the rank increases over a window of time: not popular daysbackFinish days ago, but popular daysbackStart days ago
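The three labels can be sketched as below. Several details are assumptions rather than documented behavior: ranks are taken to be ascending cumulative distributions (1.0 is the highest), "popular" and "unpopular" are judged on the more recent (daysbackStart) rank, the boundary comparisons are chosen arbitrarily, and a resource missing from the older ranking is treated as not popular then. The `classify` helper is hypothetical.

```python
def classify(rank_start, rank_finish, popularity=0.2):
    """Return {resource: set of labels} from two popularity rankings.

    rank_start  -- ranks for the daysbackStart period (more recent)
    rank_finish -- ranks for the daysbackFinish period (longer lookback)
    """
    top_cutoff = 1 - popularity  # ranks above this are in the top fraction
    labels = {}
    for resource, recent in rank_start.items():
        older = rank_finish.get(resource, 0.0)  # assumed: missing means not popular
        tags = set()
        if recent > top_cutoff:
            tags.add("popular")
            if older <= top_cutoff:
                # Popular now, but not popular over the longer window.
                tags.add("rising")
        if recent <= popularity:
            tags.add("unpopular")
        labels[resource] = tags
    return labels


rank_start = {"a": 1.0, "b": 0.1, "c": 0.5}
rank_finish = {"a": 0.6, "b": 0.1, "c": 0.9}
labels = classify(rank_start, rank_finish)
# a -> {"popular", "rising"}, b -> {"unpopular"}, c -> set()
```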

The default values are:

  • views_wt = 1

  • edit_wt = 1

  • daysbackStart = 14

  • daysbackFinish = 60

  • popularity = .2

Technical details about the query for calculating Dataset popularity

Note

This section captures the technical details of the query used for calculating popularity analytics for datasets.

Dataset popularity is a ranking of a valuation score derived from the page views, queries run, downloads, bookmarks, and authorization requests of a dataset resource. Each of these has a corresponding weight.

The valuation score is calculated for two time periods, daysbackStart and daysbackFinish, in the common table expressions valuationStart and valuationFinish:

  • For each day (a row in the events_dataset_activity_by_day table) in the corresponding period (daysbackStart, daysbackFinish), multiply each of the selected metrics by its declared weight and add the products together. This gives the daily valuation score for that resource.

  • Sum all the daily valuation scores, grouped by the metadata resource type. The result is the valuation_score.

The popularity ranking is also calculated for two time periods, daysbackStart and daysbackFinish, in the common table expressions rankStart and rankFinish:

  • For each value in valuationStart and valuationFinish, calculate the cumulative distribution (using CUME_DIST()) of the valuation score across all valuation scores. The result is the popularity_rank.

A dataset resource is classified by comparing its popularity_rank to the popularity threshold. The resource is:

  • popular if it ranks in the top popularity fraction of resources (for example, the top 20% with the default popularity of .2)

  • unpopular if it ranks in the bottom popularity fraction of resources

  • rising if the rank increases over a window of time: not popular daysbackFinish days ago, but popular daysbackStart days ago

The default values are:

  • queryrun_wt = 5

  • download_wt = 5

  • bookmark_wt = 10

  • authreq_wt = 10

  • daysbackStart = 14

  • daysbackFinish = 60

  • popularity = .2
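To show how the default dataset weights shape a single day's valuation score, here is a minimal sketch. The metric keys and the `daily_valuation` helper are hypothetical. The list above gives no weight for page views, so the value of 1 used here is purely a placeholder assumption.

```python
# Weights keyed by hypothetical metric names. Four values come from the
# defaults listed above; the page-view weight is NOT documented here and
# is an assumed placeholder.
WEIGHTS = {
    "page_views": 1,      # assumed, not listed in the defaults
    "queries_run": 5,     # queryrun_wt
    "downloads": 5,       # download_wt
    "bookmarks": 10,      # bookmark_wt
    "auth_requests": 10,  # authreq_wt
}


def daily_valuation(day_counts, weights=WEIGHTS):
    """Weighted sum of one day's activity counts for a dataset."""
    return sum(weight * day_counts.get(metric, 0) for metric, weight in weights.items())


day = {"page_views": 20, "queries_run": 3, "downloads": 1, "bookmarks": 2, "auth_requests": 0}
score = daily_valuation(day)
# 20*1 + 3*5 + 1*5 + 2*10 + 0*10 = 60
```

Under these weights, two bookmarks contribute as much to the score as four queries run, which shows how the defaults favor stronger signals of interest.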