Skip to main content

Platform analytics

A collection of resources that track and analyze various aspects of catalog usage and user behavior. It includes daily summaries of dataset interactions, user activity, API usage, and metadata management. Provides comprehensive insights into platform engagement, resource utilization, and customer trends. These tables are essential for monitoring the overall health and performance of the platform.

customer_agents_dim table

Captures details of customer agents/users, including agent ID, display name, email, onboard date, and organization. Tracks whether an agent account is registered and if it belongs to a non-paid metrics organization. Useful for understanding agent onboarding and registration status within the catalog.

Table 1.

Column

Description

AGENTID

Unique identifier for each user within a customer's environment.

CUSTOMER

Identifier for the customer to whom the agent's daily activity pertains.

DATE

The calendar date when the agent's daily activity metrics were recorded.

DISPLAYNAME

Identifies the name of the user associated with the user account in the customer's environment.

EMAIL

The email address associated with the individual user agent.

ENVIRONMENT

This column contains the specific environment where the customer's agent operates.

NON_PAID_METRICS_ORG

Whether the customer entity is not associated with paid services

ONBOARD_DATE

The date when the agent (user or organization) joined data.world.

ORG

The org that the individual user account has authorization to.

REGISTERED_ACCOUNT

SITEID

Unique identifier for the data.world Private Instance where an agent's activity takes place.



customer_daily_agent_facts table

Provides a daily summary of activity metrics for each agent (user) within a customer’s environment. Each row corresponds to a single agent’s activity on a specific date. The table includes various metrics such as page views, UI events, API events, session details, and engagement indicators. It allows customers to monitor and analyze user behavior and usage within their data.world catalog over time.

Table 2.

Column

Description

AGENTID

Unique identifier for each user within a customer's environment.

API_EVENTS

Records the number of API events triggered by an agent daily.

AVG_LENGTH_OF_SESSION_IN_MILLISECONDS

The averange length of sessions in milliseconds.

AVG_NUMBER_OF_UI_ACTIONS_PER_SESSION

The average number of UI actions per session.

CUSTOMER

Identifier for the customer to whom the agent's daily activity pertains.

DATE

The calendar date when the agent's daily activity metrics were recorded.

DISPLAYNAME

Identifies the name of the user associated with the user account in the customer's environment.

EMAIL

The email address associated with the individual user agent.

ENVIRONMENT

This column contains the specific environment where the customer's agent operates.

IS_RAD

Is the user recently active.

LAST_ACTIVE_DAY

When was the user last active.

NON_PAID_METRICS_ORG

Whether the customer entity is not associated with paid services.

NUMBER_OF_SESSIONS

The count of sessions an agent has had in a day. A session is defined as any actions or cluster of actions bounded by inactive periods of 30-minutes or more. A session is measured as beginning upon the earliest event and ending upon the last event.

PAGEVIEWS

Represents the number of page views an agent had on a specific date.

SITEID

Unique identifier for the data.world Private Instance where an agent's activity takes place.

TOTAL_ACTIVE_DAYS_WITHIN_LAST_MONTH

The total number of active days within the last month.

TOTAL_LENGTH_OF_STAY_IN_MILLISECONDS

The total time an agent (user) was active on data.world during a specific day, expressed in milliseconds.

UI_EVENTS

Tracks user interface interactions made by an agent daily.



customer_daily_facts table

Provides a daily summary of key activity metrics at the customer level, without individual user data. Each row corresponds to a specific customer’s activity on a given day, capturing overall usage and engagement within their environment. Key metrics include the number of active users (daily, weekly, monthly), search counts, cumulative assets created, and RAD (Recent Active Days) metrics. This data is essential for understanding customer engagement trends and overall platform utilization.

Table 3.

Column

Description

CUMULATIVE_DATABASESOURCES_CREATED

Total count of databases connections created by a customer till the referenced date.

CUMULATIVE_DATASETS_CREATED

Total number of datasets created by a customer till the current date.

CUMULATIVE_LIVE_ASSETS

Total count of live assets a customer has accumulated till that day.

CUMULATIVE_LIVE_ASSETS_15DAYSAGO

Cumulative count of live assets the customer had 15 days prior.

CUMULATIVE_LIVE_ASSETS_30DAYSAGO

Total live assets a customer had 30 days ago.

CUMULATIVE_LIVE_ASSETS_7DAYSAGO

Cumulative live assets of customer 7 days ago.

CUMULATIVE_LIVE_ASSETS_STATUS_15DAYSAGO

Status of cumulative live assets for a customer 15 days ago.

CUMULATIVE_LIVE_ASSETS_STATUS_30DAYSAGO

Status of customer's cumulative live assets 30 days ago.

CUMULATIVE_LIVE_ASSETS_STATUS_7DAYSAGO

Snapshot of live assets status count, 7 days prior to given day.

CUMULATIVE_QUERIES_CREATED

Total number of queries created by a customer till the given date.

CUMULATIVE_SEARCH_COUNT

Total count of searches performed by a customer till the specified date.

CUMULATIVE_SEARCH_COUNT_15DAYSAGO

Total search count for a specific customer 15 days prior.

CUMULATIVE_SEARCH_COUNT_STATUS_15DAYSAGO

Total search count status 15 days ago from the customer's daily activity.

CUSTOMER

Identifies the specific customer in the daily metrics summary.

DAILY_ACTIVE_USERS

Count of unique users active on a customer's environment per day.

DAILY_ACTIVE_USERS_15DAYSAGO

Number of users active per day 15 days prior, for each customer.

DAILY_ACTIVE_USERS_30DAYSAGO

Count of users active daily, captured 30 days ago.

DAILY_ACTIVE_USERS_7DAYSAGO

Number of active users for a specific customer 7 days ago.

DAILY_ACTIVE_USERS_STATUS_15DAYSAGO

Status of daily active users 15 days ago for a specific customer.

DAILY_ACTIVE_USERS_STATUS_30DAYSAGO

Status of daily active users 30 days ago for each customer.

DAILY_ACTIVE_USERS_STATUS_7DAYSAGO

Status of daily active users 7 days prior for each customer.

DAILY_SEARCH_COUNT

Number of searches by a customer on a specific day.

DAILY_SEARCH_COUNT_ZERO_RESULTS

Daily count of customer searches that yielded no results.

DAILY_SEARCH_COUNT_ZERO_RESULTS_RATE

Daily rate of searches yielding zero results for each customer.

DATE

The specific day corresponding to the customer's activity summary.

ENVIRONMENT

Refers to the specific environment where customer activity takes place.

MONTHLY_ACTIVE_USERS

Number of users active on a monthly basis for a specific customer.

MONTHLY_ACTIVE_USERS_15DAYSAGO

Count of active users in the customer's environment 15 days ago.

MONTHLY_ACTIVE_USERS_30DAYSAGO

Number of users active 30 days prior for each customer's daily record

MONTHLY_ACTIVE_USERS_7DAYSAGO

Number of users actively using the platform 7 days ago.

MONTHLY_ACTIVE_USERS_90DAYSAGO

Count of active users for a customer from 90 days prior to the current date.

MONTHLY_ACTIVE_USERS_STATUS_15DAYSAGO

Status of monthly active users 15 days prior, related to a specific customer.

MONTHLY_ACTIVE_USERS_STATUS_30DAYSAGO

Status of monthly active users 30 days ago for each customer.

MONTHLY_ACTIVE_USERS_STATUS_7DAYSAGO

Status of monthly active users 7 days ago for each unique customer.

MONTHLY_ACTIVE_USERS_STATUS_90DAYSAGO

Status of monthly active users 90 days ago for a specific customer day.

NON_PAID_METRICS_ORG

Metrics related to non-paid activities performed by the organization.

PROVISIONED_AGENTS

Number of provisioned agents.

RADS

Represents the Recent Active Days for a specific customer's activity.

RADS_15DAYSAGO

Count of the customer's recent active days 15 days prior.

RADS_30DAYSAGO

Number of recent active days for a customer 30 days ago.

RADS_7DAYSAGO

Number of Recent Active Days for a customer 7 days ago.

RADS_NOT_API_ONLY

Metrics of recent active days excluding API-only activities.

RADS_STATUS_15DAYSAGO

Customer's RAD status 15 days ago, part of customer daily engagement metrics.

RADS_STATUS_30DAYSAGO

Status of customer's recent active days 30 days ago.

RADS_STATUS_7DAYSAGO

Shows the customer's RAD status 7 days ago.

SITEID

Unique identifier for a specific customer's activity.

WEEKLY_ACTIVE_USERS

The total count of weekly active users for a specific customer.

WEEKLY_ACTIVE_USERS_15DAYSAGO

Number of users who were active a week and a half prior.

WEEKLY_ACTIVE_USERS_30DAYSAGO

Number of users who were active weekly 30 days ago.

WEEKLY_ACTIVE_USERS_7DAYSAGO

Number of users active from 7 days ago on a weekly basis.

WEEKLY_ACTIVE_USERS_STATUS_15DAYSAGO

Status of weekly active users 15 days prior for a specific customer.

WEEKLY_ACTIVE_USERS_STATUS_30DAYSAGO

Weekly active user status of a customer 30 days ago.

WEEKLY_ACTIVE_USERS_STATUS_7DAYSAGO

Status of weekly active users from seven days ago.



daily_agent_facts table

Captures the daily actions and engagement metrics for data.world “agents” (users and organizations). Each row represents a summary of key activities for an agent on a specific date, including API events, UI interactions, page views, queries, project creation, and more. This table is essential for analyzing user behavior, tracking engagement, and generating aggregations across organizations, teams, or other groupings. It can be joined with dimension tables to provide detailed insights into daily user activities and trends on the platform.

Table 4.

Column

Description

ACTION_EVENTS

Records the number of action type events by a user for a specific date.

AGENTID

Unique identifier for each agent (user) in data.world.

API_EVENTS

Records the count of API events executed by a data.world user for the measured calendar date.

AVG_LENGTH_OF_SESSION_IN_MILLISECONDS

Avg. duration of user sessions on a specific date, in milliseconds.

AVG_NUMBER_OF_UI_ACTIONS_PER_SESSIONS

Average number of user interface interactions per session.

DATASETS_CREATED

Number of datasets created by an agent on a specific date.

DATE

The specific date on which agents' interactions and activities were recorded.

DOWNLOADS

Counts the number of daily downloads by individual agents.

ENVIRONMENT

Specifies the operating environment for the agents' daily activities.

NUMBER_OF_SESSIONS

Counts daily sessions of a specific agent on the platform.

ONBOARD_DATE

The date when the agent (user or organization) joined data.world.

PAGEVIEWS

Represents the number of page views an agent had on a specific date.

PROJECTS_CREATED

Number of projects created by an agent on a specific date.

QUERIES_CREATED

Counts the number of queries created by an agent in a single day.

QUERIES_RUN

Number of queries run by an agent in a day.

SITEID

Unique identifier for each agent (user/organization) in the platform.

TOTAL_LENGTH_OF_STAY_IN_MILLISECONDS

Total time spent by an agent on the platform in milliseconds.

TYPE

Specifies the category of the agent in data.world, like users or organizations.

UI_EVENTS

Tracks daily UI interactions performed by agents.



daily_api_client_facts table

Provides daily metrics on API usage by customer and agent. Each row represents an agent’s API activity for a specific customer on a given day, capturing details such as the number of API acquisitions and events. This data is crucial for tracking API adoption and usage patterns across different environments and clients, offering insights into how customers and agents interact with the API over time.

Table 5.

Column

Description

AGENTID

Unique identifier for the agent in the API usage record.

API_ACQUISITIONS

we measure an 'API Acquisition' anytime someone uses a new API client for the first time

API_CLIENT

examples, data-dot-world-tableau, datadotworld-excel-addon, powerbi, etc.

API_EVENTS

count of all api_events from yellowbird.events grouped by api_client (properties:clientid::string)

CUSTOMER

Identifier for the customer associated with the agent's API activity.

CUSTOMER_NAME

The name of the customer associated with the daily API usage metrics.

DATE

The date on which the specific API activity by an agent for a customer took place.

EMAIL

Email of the user associated with the API activity

ENVIRONMENT

Specifies the environment where the client's API activity took place.

SITEID

Unique identifier for the data.world Private Instance where the API activity took place.



daily_catalog_collector_fact table

Records metrics for each run of a catalog collector job. Each row represents a single run, capturing key data such as the start and end times, the customer and organization details. This table is helpful for monitoring the performance and efficiency of catalog collector jobs.

Table 6.

Column

Description

AVERAGE_TIME_IN_MINUTES

Avg. duration of catalog collector job run, in minutes.

CATALOG

the value of the catalogIri in the event.

COLLECTOR

the value of the collectorIri in the event. For e.g. Redshift, Tableau, Snowflake collector etc. This represents the type of collector that was run.

CUSTOMER

Identifies the customer related to the catalog collector job run.

DATE

date in utc. This is derived from the timestamp metadata field form Segment events. Please see here - https://segment.com/docs/connections/spec/common/

DWCC_CATALOGER

the dwcc scope cataloger name

DWCC_VERSION

The version of the catalog that was run

ENVIRONMENT

p, m, v, g etc. The single letter identifying the main environment.

EVENT_COUNT

total count of events

FIRST_TIME_RUN

Timestamp when the collector was first run by a specific customer.

LAST_TIME_RUN

Timestamp of the last time the catalog collector job was run.

MAX_TIME_IN_MINUTES

Maximum time in minutes taken by a catalog collector job run.

MIN_TIME_IN_MINUTES

Shortest duration of a catalog collector job run in minutes.

ORG

Represents the organization associated with a catalog collector job run.

RUN_END_TIME

The completion timestamp of each catalog collector job run.

RUN_START_TIME

Timestamp indicating when a catalog collector job run started.

SITEID

The site scope of the dwcc. The production environment can have multiple sites.

TOTAL_RUNS

Total number of catalog collector job runs.



daily_catalog_resources_pages_facts table

Captures detailed metrics on user interactions with metadata assets within the catalog. Each row represents activity related to a specific resource by an agent on a given date, including views, edits, creates, deletes, and suggestions (submitted, approved, and denied). This data is valuable for understanding user engagement and the lifecycle of metadata resources.

Table 7.

Column

Description

AGENTID

the userid who took the action.

CREATES

Tracks the number of resource creation activities by users per day.

CUSTOMER

the customer to which the agent belonged

DATE

the date derived from the timestamp value when the event happened. The timestamp itself is a metadata field that is available in raw segment events. Please see here - https://segment.com/docs/connections/spec/common/#timestamp

DELETES

Counts the number of times a resource was deleted by an agent in a specific day.

EDITS

count of edit events that happened.

ENVIRONMENT

a single letter signifying the environment derived from segmentid value. p for production, m for mckinsey, n for indeed etc.

ORG

the org in which the resource belongs.

OVERVIEW_TAB_VIEWS

Tracks the daily count of user views on the overview tab of catalog resources.

REQUESTER

Identifier for the user who interacted with a metadata resource.

RESOURCE_PAGE_VIEWS

Count of the times a resource page was viewed in the catalog each day.

RESOURCEID

the unique iri value of the resource on which the action occurred.

RESOURCENAME

the name of the resource on which the action occurred.

RESOURCETYPE

the type of the resource. This is derived from the entryTypeLable field in the properties object.

SITEID

string signifying the siteid unique identifier.

SUGGESTIONS_APPROVED

Counts daily approved suggestions for specific metadata resources.

SUGGESTIONS_DENIED

Count of suggestions denied for a specific resource on a given date.

SUGGESTIONS_SUBMITTED

count of suggestions that were submitted on the resource. For e.g. if there is business glossary metadata resource, then how many suggestions to update the definitions where submitted

VIEWS

count of views events that happened.



daily_dataset_facts table

Provides a daily summary of activity related to individual datasets within the catalog, including metrics such as downloads, queries saved, etc. This data is important for tracking how frequently and in what ways datasets are being accessed and utilized.

Table 8.

Column

Description

AUTH_REQUESTS

Number of authorization requests for a dataset on a specific calendar date.

BOOKMARKS

Count of bookmarks placed on a dataset on a specific calendar date.

CUSTOMER

Identifies the customer that owns the dataset.

DATASETID

Unique identifier for individual datasets within the catalog.

DATE

The date on which the dataset activities occurred.

DOWNLOADS

Count of the times the dataset was downloaded on a specific calendar date.

ENVIRONMENT

The data.world environment where the dataset resides.

OWNERID

Unique agentid for the owner of the dataset in the catalog. The owner may be an individual user or an org.

PAGEVIEWS

Number of times a dataset was viewed on a particular day.

QUERIES_RUN

Tally of queries executed on a dataset for a particular day.

QUERIES_SAVED

Count of queries saved per dataset each day.

SITEID

Unique 5-digit identifier for the data.world Private Instance where the dataset resides.



daily_download_facts table

Provides a daily summary of download activities performed by agents within the catalog. Each row records the details of downloads, including the type of resource, the agent involved, and the number of downloads. This data is useful for tracking and analyzing download patterns and resource utilization.

Table 9.

Column

Description

AGENTID

Identifier of the user who performed the download activity.

CUSTOMER

Identifier for the customer owning the data.world resource associated with the download action.

DATE

The date when the download activities were performed.

DISPLAYNAME

The name displayed for each agent involved in the daily download activities.

EMAIL

Agent's email address involved in the download activity.

ENVIRONMENT

Specifies the data.world environment where the agents performed download activities.

FILELABELS

Labels related to file types.

FILENAME

Name of the file downloaded in the catalog by the agent.

N_DOWNLOADS

The number of daily downloads performed by an agent within the catalog.

OWNER

The agentid identifier for the user or org that owns the downloaded resource.

RESOURCE

The identifier for the downloaded resource. This identifier is unique per owner.

RESOURCEID

The unique identifier for the downloaded resource. Composed of the following elements: owner/resource.

SITEID

Unique identifier for the data.world Private Instance where the download activity occurred.

TYPE

Indicates the type of download context. Eg, file, dataset, query-result, bulk export.



daily_dwec_asset_facts table

Provides counts and cumulative totals of live and overall metadata assets created, broken down by date, customer, owner, and asset type. Tracks both daily creation metrics and the running total of assets within a given environment and site, offering insights into the growth and management of metadata assets over time.

Table 10.

Column

Description

ASSET

CUMULATIVE_LIVE_ASSETS

Total count of live metadata assets in a customer's catalog on a specific calendar date.

CUMULATIVE_TOTAL_ASSETS

cumulative sum of live and staged assets -- partitioned by customer, owner, asset

CUSTOMER

Identifier for the customer associated with the metadata assets.

DATE

The date when metadata assets facts were recorded.

ENVIRONMENT

Specifies the environment where metadata assets are managed.

LIVE_ASSETS_CREATED

Count of live metadata assets created each day.

OWNER

the org where the asset resides

SITEID

Unique 5-digit identifier for the specific data.world Private Instance where metadata assets reside.

TOTAL_ASSETS_CREATED

Total count of metadata assets created till the date in respective environment and site.



daily_search_facts table

Tabulates the daily count of unique search phrases categorized by search phrase, search type, number of results, selected facets, and agent ID. This data provides insights into search behaviors in the catalog.

Table 11.

Column

Description

AGENTID

The account identifier for the user that executed the search; the AGENTID is unique at the level of the siteid

CUSTOMER

The customer environment or Private Instance where the search was executed

DATE

The calendar date when the search was executed

ENVIRONMENT

The environment where the search was executed

NUM_RESULTS

The number of results the search returned

SEARCH_COUNT

The number of times that exact search was executed on the recorded date

SEARCH_TYPE

The method of search entry; examples include search view results, search bar suggestion click, search bar full search, search view submit and a few others that are unlisted here.

SEARCH_VALUE

The inputted search terms

SELECTED_FACETS

The filters, or facets, selected by the user in the search interface

SITEID

The unique five-digit identifier representing the Private Instance



files_from_latest_dataset_dim table

Lists files from the latest dataset version, including file name, creation time, size, discoverability, and status. Also includes dataset ID, name, version, agent ID, and environment context (e.g., production). Files can be materialized (downloadable) or virtualized (non-downloadable).

Table 12.

Column

Description

AGENTID

The identifier for the individual user or org that owns the specified dataset.

DATASET_NAME

Name of the dataset from the latest version in FILES_FROM_LATEST_DATASET_DIM table.

DATASET_VERSIONID

Identifies the version of the dataset.

DATASETID

Unique identifier of the dataset pertaining to the listed files.

ENVIRONMENT

Identifies the data.world environment.

FILE_CREATED_TIME

Timestamp indicating when each file in the dataset was created.

FILE_CREATEDBY

Identifies the agentid for the user that created the file in the latest dataset version.

FILE_MATERIALIZED_OR_VIRTUALIZED

Indicates if the file is materialized or virtualized.

FILE_NAME

The name of the file from the latest version of the dataset.

FILE_SIZE_IN_BYTES

Represents the size of each file from the latest dataset, in bytes.

FILE_STATUS

Indicates the current status of a file in the latest dataset version.

IS_FILE_DISCOVERABLE

Indicates whether a file from the latest dataset version is discoverable.

SITEID

Unique 5-digit identifier for the data.world Private Instance where the file is located.



org_membership_dim table

Tracks individual user account memberships within organizational groups. Captures direct and indirect (through group membership) associations between users and organizations. Includes details on authorization levels, visibility settings, and timestamps for when memberships were created or updated. Useful for analyzing and managing organizational roles and permissions.

Table 13.

Column

Description

AGENTID

the agentid for the individual user account

CUSTOMER

the customeragentid for the customer account

ENVIRONMENT

Identifies the data.world environment where the org is located.

LEVEL_AGENTID_TO_ORGGROUP

authorization level for the individual user to the orggroup ("ADMIN", "DISCOVER", "MEMBER", "READ" or "WRITE")

LEVEL_AGENTID_TO_SECONDARY_GROUP

authorization level for the individual user to the secondary group ("ADMIN", "READ" or "WRITE")

LEVEL_ORGGROUP_TO_ORG

authorization level for the orggroup to the org ("ADMIN", "NONE" or "READ")

LEVEL_SECONDARY_GROUP_TO_ORGGROUP

authorizaton level for the secondary group to the orggroup ("ADMIN", "MEMBER", "READ" or "WRITE")

ORG

The org that the individual user account has authorization to.

ORGGROUP

a group that has direct membership to an org

SECONDARY_GROUP

a group having direct membership to an orggroup

SITEID

Unique 5-digit identifier for the data.world Private Instance.

TS_AGENTID_TO_ORGGROUP

timestamp in UTC when authorization for individual user to orggroup was created or last updated

TS_AGENTID_TO_SECONDARY_GROUP

timestamp in UTC when authorization for individual user to secondary group was created or last updated

TS_ORGGROUP_TO_ORG

timestamp in UTC when authorization for orrgroup to org was created or last updated

TS_SECONDARY_GROUP_TO_ORGGROUP

timestamp in UTC when authorization for secondary_group_to_orggroup was created or last updated

VISIBILITY_AGENTID_TO_ORGGROUP

visibility setting for the user's authorization to the orggroup ("PRIVATE", "PUBLIC" or null)



provisioned_seats_by_date table

Lists currently active provisioned seats by date, agent ID, and customer. Includes details such as onboard date and whether the seat is part of a non-paid metrics organization.

Table 14.

Column

Description

AGENTID

Identifier for the user. Unique at the environment-level and siteid-level.

CUSTOMER

Designates the customer associated with the provisioned seats.

DATE

The date(s) when the users were recorded as having a provisioned seat.

DISPLAYNAME

Denotes the name listed for the user having the provisioned seat.

EMAIL

Email address associated with the user having the provisioned seat.

ENVIRONMENT

Identifies the data.world environment in which the provisioned seat is active.

NON_PAID_METRICS_ORG

Indicates if the seat is part of a non-paid metrics organization.

ONBOARD_DATE

The date when the user's data.world account was originally created.

SITEID

Unique 5-digit identifier for the data.world Private Instance.



registered_account_dates_dim table

Tracks registration dates for accounts, including agent ID and agent type (e.g., organization). Provides a daily record of when each account was registered, useful for monitoring account creation trends.

Table 15.

Column

Description

AGENTID

Identifier for the user. Unique at the environment-level and siteid-level.

AGENTTYPE

The type of agent. For example, organization.

DATE

The calendar date when the agent's daily activity metrics were recorded.

ENV

Identifies the data.world environment.

ENVIRONMENT

Identifies the data.world environment in which the provisioned seat is active.

SITEID

Unique 5-digit identifier for the data.world Private Instance.