
Key terms and concepts

Table 1.

Term

Definition

Advanced search

Advanced search is a feature that allows users to refine their search queries and get more specific results. Access advanced search by clicking the Advanced search button in the search bar.

All members group

This group includes every member of the organization by default. Users in this group cannot view catalog resources, and they can access only those datasets and projects that are shared with the organization or set as Discoverable. The access level of this group can be adjusted to align with your business needs. Removing a member from this group is the same as removing the member from the organization.

API

An API (Application Programming Interface) is a set of rules or protocols that enables software applications to communicate with each other to exchange data, features and functionality. The data.world API provides a structured interface for interacting with its capabilities via interactive API endpoints, requiring API tokens for secure authentication.
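As a minimal sketch of token-based authentication, the Python snippet below builds (but does not send) an HTTP request that carries an API token in its header. The base URL, the /user path, the Bearer header scheme, and the placeholder token are assumptions made for illustration; consult the data.world API reference for the actual endpoints and authentication details.

```python
import urllib.request

# Assumptions (not stated in this glossary): the base URL, the /user path,
# the Bearer scheme, and the placeholder token are illustrative only.
API_BASE = "https://api.data.world/v0"
API_TOKEN = "YOUR_API_TOKEN"  # placeholder; never hard-code a real token


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the API token."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_request("/user", API_TOKEN)
print(request.full_url)
print(request.get_header("Authorization"))
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, so the sketch runs without network access or a real token.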

Archie Bots

Archie Bots are AI-powered features in data.world that simplify data discovery, enrichment, and analysis. They integrate large language models with the data.world knowledge graph, enabling enterprise users to perform AI-assisted searches, auto-generate descriptions for data assets, receive guided research questions, and convert natural language queries into SQL. These capabilities help users quickly find and understand data, making data management and analysis more efficient and accessible.

Authors group

Users in this group are automatically authorized to add new projects, datasets and catalog resources within the organization. Members of this group can contribute content but do not have administrative privileges over the organization management. The access level of this group cannot be changed.

Business Glossary

A collection of business terms that helps standardize concepts and vocabulary. Each organization in data.world maintains its own glossary.

Business terms

Business terms are a special catalog resource type that allows you to set a standardized, agreed-upon definition of a specific concept or piece of vocabulary used within the organization.

Browse card for application home page

A Browse card for the application home page is a customizable navigation card designed to guide users swiftly to important organizations, collections, datasets, projects, and other resources across organizations. It appears on the application home page and can be personalized with multiple sections, each containing links to internal resources or external sites. This makes it a versatile component for enhancing user navigation and resource discovery across the platform. Administrators can adjust the card's layout, content, and links through the user interface or a configuration file.

Browse card for organization profile page

A Browse card for the organization profile page is a customizable navigation card designed to guide users swiftly to important collections, datasets, projects, and other resources within an organization. It appears on the organization profile page and can be personalized with multiple sections, each containing links to internal resources or external sites. This makes it a versatile component for enhancing user navigation and resource discovery within the organization. Administrators can adjust the card's layout, content, and links through a configuration file.

Catalog Toolkit (CTK)

A UI-based tool for configuring the catalog UI to suit your business needs. It allows you to add custom fields, sections, and resources, and to create glossary term subtypes. The Catalog Toolkit is set up with four organizations: Catalog Configuration, where you create the metadata profile; Catalog Sources, for data upload; Catalog Sandbox, a preview or QA environment; and Catalog Main, the production organization seen by end users.

Collection

Collections are organization-defined groupings of resources. Every resource will belong to one or more collections. The two most common types of collections are source collections that correlate to a source system such as Snowflake or Databricks, and domain collections that represent how your business is organized such as Marketing, Finance, Product, etc.

Beyond grouping resources together, collections allow you to manage access and stewardship by the content group.

Columns

A column is an individual attribute or characteristic within a table. Columns represent specific pieces of data within a table, such as names, dates, quantities, or any other relevant information. They provide a way to categorize and organize data, enabling you to filter and sort information based on these specific attributes.

Database view

A database view is a virtual table that presents a tailored subset of information from one or more tables. In other words, it offers a customized perspective on the data stored in tables. By creating views, you can simplify complex data models, restrict access to sensitive data, and present a unified and simplified view of relevant information to different stakeholders within your organization.
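To make the idea concrete, here is a small sketch using Python's built-in sqlite3 module. The table, columns, and rows are invented for illustration, and SQLite stands in for whatever source system actually holds your data. The view exposes only two columns and one region, so readers of the view never see the email column.

```python
import sqlite3

# In-memory database; table, columns, and rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        region TEXT
    );
    INSERT INTO customers (name, email, region) VALUES
        ('Ada', 'ada@example.com', 'EMEA'),
        ('Grace', 'grace@example.com', 'AMER');
    -- The view exposes only non-sensitive columns for one region.
    CREATE VIEW emea_customers AS
        SELECT id, name FROM customers WHERE region = 'EMEA';
    """
)

# Querying the view looks exactly like querying a table.
view_rows = conn.execute("SELECT id, name FROM emea_customers").fetchall()
print(view_rows)
```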

Data extraction

Data extraction is a method that retrieves data from external sources and stores a copy within the platform. This approach is ideal for small datasets under 3GB, and it enhances query performance. The extracted data is securely encrypted and stored within the platform. To keep the data accurate and current, it requires regular synchronization with the source, which can be managed through manual uploads or set to run automatically on an hourly, daily, or weekly basis.

Data virtualization

Data virtualization provides real-time access to data directly from its original sources, without needing to import it. This method sets up secure connections to data sources and allows queries to be executed by external systems. The data stays updated, mirroring any changes at the source immediately, making it suitable for datasets that exceed the platform's size limits. Data virtualization facilitates seamless access to the latest data without the need for local storage or frequent synchronization.
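The trade-off between data extraction and data virtualization can be sketched as a simple decision helper. This is an illustration of the guidance above, not platform behavior: the function name, the needs_live_data flag, and the reading of "3GB" as 3 x 1024^3 bytes are all assumptions.

```python
# "Under 3GB" comes from the text above; treating GB as 1024**3 bytes
# is an assumption made for this sketch.
EXTRACTION_LIMIT_BYTES = 3 * 1024**3


def suggest_access_method(size_bytes: int, needs_live_data: bool = False) -> str:
    """Suggest 'extraction' for small datasets, 'virtualization' otherwise.

    Hypothetical helper: virtualization is suggested when the dataset is at
    or above the size limit, or when changes at the source must be visible
    immediately.
    """
    if needs_live_data or size_bytes >= EXTRACTION_LIMIT_BYTES:
        return "virtualization"
    return "extraction"


print(suggest_access_method(500 * 1024**2))            # a 500 MB dataset
print(suggest_access_method(10 * 1024**3))             # a 10 GB dataset
print(suggest_access_method(1, needs_live_data=True))  # tiny but must be live
```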

Data catalog

A data catalog is a structured collection of data that allows an organization to find and manage its data. It includes metadata that describes the location, structure, and quality of data as well as information about data usage, relationships, meaning, and lineage. A data catalog acts like a central repository, storing all crucial information about datasets, helping users to discover, organize, access, understand, and use the available data.

Datasets

Datasets are the basic repositories for data files and associated metadata, documentation, scripts, and any other supporting resources. They store and document data for later sharing and use in projects.

Hoots

Hoots are visual indicators added to reports, dashboards, or documents, displaying data health via a color scheme: green for healthy data, yellow for potential issues, and red for serious problems. Hoots help users quickly understand the reliability of the data without additional research.

Insights

Insights allow you to capture the conclusions from your work on projects, packaging them up in a way that quickly communicates a nugget of information, while giving the viewer the tools they need to dig down into your methods and sources. Use insights to capture the results and analysis of your work and synthesize them so they are understandable and accessible to stakeholders at all levels in the project.

Instance administrator

An Instance Administrator is a user with elevated permissions that allow access to the Admin Portal. This portal supports advanced administrative tasks such as promoting users to administrators, auditing user access, and deactivating and reactivating user accounts. Instance Administrators can also configure and manage UI elements like Browse cards and customize UI branding. To become an Instance Administrator, a user must be granted the necessary privileges by existing administrators or through data.world support.

IRI

An IRI (Internationalized Resource Identifier) is a unique identifier within data.world, essential for referencing various elements and relationships in a knowledge graph. Every dataset and project with a defined metadata profile features a technical reference page. This page includes an IRI for resources, metadata fields, relationships, and other elements. By using IRIs, you can seamlessly interact with the data.world API, enhancing the ability to manage your catalogs.

Knowledge graph

A knowledge graph organizes and links your data into a visual map, where nodes are key entities such as databases, tables, and columns, and edges represent the relationships between them. This graph structure accumulates and conveys the knowledge, enabling users to efficiently navigate and integrate information from various sources. It transforms complex data connections into an intuitive, searchable network, enhancing data discovery and decision-making processes in your organization.
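The node-and-edge structure can be sketched in a few lines of Python. The entity names and relationship labels below are invented for illustration and do not reflect how data.world stores its graph; the point is only that following edges from one node reveals everything connected to it.

```python
from collections import deque

# Edges as (source, relationship, target); all names are made up.
EDGES = [
    ("sales_db", "contains", "orders"),
    ("sales_db", "contains", "customers"),
    ("orders", "has_column", "order_id"),
    ("orders", "has_column", "customer_id"),
    ("customers", "has_column", "customer_id_pk"),
]


def reachable(start: str) -> list[str]:
    """Breadth-first walk returning every node reachable from start."""
    adjacency: dict[str, list[str]] = {}
    for source, _relationship, target in EDGES:
        adjacency.setdefault(source, []).append(target)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


print(reachable("sales_db"))
print(reachable("orders"))
```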

Metadata field

Metadata fields play a crucial role in organizing and making sense of the collected data in the catalog. Metadata fields in a data catalog might include Name, Description, Summary, Confidentiality Classification, Steward, and so on. The application includes pre-configured metadata fields. To further customize your user interface, you can also create your own custom metadata fields.

Metadata profile

The Metadata Profile (MDP) is a configuration tool used to customize the user interface to meet specific business needs. The MDP is a graph file written in Turtle syntax and added to the catalog configuration dataset, ddw-catalogs. It can add and alter custom metadata sections, fields, asset statuses, resources, resource types, and even relationships between resources. These changes allow users to adapt, extend, and tailor the system to their unique use case.

Metadata sections

A specific part of the user interface that displays metadata fields. There are three default metadata sections: Informational, Technical, and People. In addition to these default sections, users can add custom metadata sections. When adding custom sections, users must define the custom fields that will appear in these sections. Custom metadata sections will only display when the metadata fields within the section have values or are marked as Mandatory.
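The display rule for custom sections can be expressed as a short predicate. The Field class and function name are hypothetical, and the sketch assumes that a single field with a value, or a single Mandatory field, is enough for the section to appear; the text above does not say whether one field or all fields must qualify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical stand-in for a metadata field in a custom section."""
    name: str
    value: Optional[str] = None
    mandatory: bool = False


def section_is_displayed(fields: list[Field]) -> bool:
    """Assumed rule: show the section if any field has a value or is mandatory."""
    return any(bool(field.value) or field.mandatory for field in fields)


print(section_is_displayed([Field("Owner")]))                  # empty, optional
print(section_is_displayed([Field("Owner", value="Ada")]))     # has a value
print(section_is_displayed([Field("Owner", mandatory=True)]))  # mandatory, empty
```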

Organization

Organizations are spaces where content (data, metadata resources, and business glossary) and access to that content are managed. Installations that have Catalog Toolkit are typically configured with four organizations: Catalog configuration, Catalog sources, Sandbox, and Main, each with a unique purpose. End users of the catalog typically interact with the Main organization.

Organization administrator group

This is the group with the highest level of access within an organization. Members of this group can manage all datasets, projects, and catalog resources, as well as organization settings, billing, and member groups. There must always be at least one member in the group, and the access level of this group cannot be changed.

Projects

Projects are where all querying, analysis and discussion of data takes place in data.world. Data in different datasets can be used for many different projects, but each project contains all and only the data that is relevant for that project. The information in a project can come from datasets, files attached directly to the project, insights written by the project's team members about the data and the project, and discussions about the project.

Query

A query is a precise request for information retrieval from data sources or catalogs. Queries can be executed using SQL or SPARQL to filter, analyze, and extract specific data based on your requirements. This process involves specifying conditions and instructions that the data source interprets and executes to return the requested information. Queries are essential for dynamic data analysis and can range from simple data retrieval to complex analyses involving multiple datasets. They are executed within the Workspace, where users can create, save, and share their queries to facilitate collaborative exploration and analysis.

Resource

All discrete items that can be acted upon in data.world are considered resources. These resources can either be metadata or data resources. For example, business terms, collections, datasets, projects, and insights are all resources.

Search

A feature enabling users to find resources across data.world, including auto-suggestions and the last eight accessed resources.

In systems where Archie Bots are enabled, users can also use an Archie Bot to discover data through a chat-like interface that lets them search quickly, get suggested filters, and refine their results.

Sentries

Sentries act as automated monitors that listen for updates from external processes such as pipelines, monitoring tools, or APIs. When a sentry detects an issue, for example, a pipeline failure, it updates the corresponding Hoot, which then informs users of the data status in real time. This proactive approach helps data teams communicate effectively and reduces the need for manual status checks.
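The relationship between Sentries and Hoots can be modeled in a few lines. Everything here is hypothetical: the class names, the event names, and the mapping from events to colors are invented to illustrate the flow described above (an external event arrives, the Sentry updates the Hoot, and the Hoot shows green, yellow, or red). This is not the data.world API.

```python
from dataclasses import dataclass
from enum import Enum


class HootStatus(Enum):
    """Color scheme described in the Hoots entry."""
    GREEN = "healthy"
    YELLOW = "potential issues"
    RED = "serious problems"


@dataclass
class Hoot:
    resource: str
    status: HootStatus = HootStatus.GREEN


# Hypothetical mapping from an external event type to a Hoot status.
EVENT_TO_STATUS = {
    "pipeline_succeeded": HootStatus.GREEN,
    "pipeline_delayed": HootStatus.YELLOW,
    "pipeline_failed": HootStatus.RED,
}


def sentry_handle(hoot: Hoot, event: str) -> Hoot:
    """Update the Hoot when a known event arrives; ignore unknown events."""
    new_status = EVENT_TO_STATUS.get(event)
    if new_status is not None:
        hoot.status = new_status
    return hoot


hoot = Hoot(resource="sales_dashboard")
sentry_handle(hoot, "pipeline_failed")
print(hoot.status.name, "-", hoot.status.value)
```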

SQL

SQL (Structured Query Language) is a standardized language used for managing and querying relational databases. In data.world, SQL plays a crucial role in enabling users to interact with and analyze structured data stored within datasets by querying, inserting, updating, and deleting data. SQL is distinguished by its ability to handle structured data, where data is organized into tables consisting of rows and columns.
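The four operations named above look like this in practice. The sketch uses Python's built-in sqlite3 module with an invented inventory table; it illustrates generic SQL rather than the data.world SQL dialect, whose supported statements and functions may differ.

```python
import sqlite3

# In-memory database with a made-up table, used only to illustrate SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")

# Insert rows.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("widget", 10), ("gadget", 0), ("gizmo", 4)],
)

# Update one row, then delete rows that are out of stock.
conn.execute("UPDATE inventory SET quantity = quantity + 5 WHERE item = ?", ("gizmo",))
conn.execute("DELETE FROM inventory WHERE quantity = 0")

# Query what remains, sorted by item name.
in_stock = conn.execute(
    "SELECT item, quantity FROM inventory ORDER BY item"
).fetchall()
print(in_stock)
```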

SPARQL

SPARQL is a powerful query language used to retrieve, modify and make the best use out of linked data. In data.world, SPARQL is essential for querying complex datasets and leveraging semantic data integration and analysis. It enables users to write detailed queries to extract and analyze data from RDF graphs, making it an invaluable tool for advanced data operations.
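The core idea behind a SPARQL query is matching triple patterns against a graph. The sketch below is not SPARQL and does not use an RDF library; it is a tiny pure-Python matcher over made-up triples, meant only to show how variables (terms starting with "?") are bound to values. The comment on the final call shows a roughly analogous SPARQL query.

```python
# Triples as (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("orders", "type", "Table"),
    ("customers", "type", "Table"),
    ("order_id", "type", "Column"),
    ("order_id", "columnOf", "orders"),
]


def match(pattern: tuple[str, str, str]) -> list[dict[str, str]]:
    """Return one binding dict per triple that matches the pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    results = []
    for triple in TRIPLES:
        bindings: dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(bindings)
    return results


# Roughly analogous to: SELECT ?t WHERE { ?t :type :Table }
tables = match(("?t", "type", "Table"))
print(tables)
```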

Tables

A table is a resource that contains specific types of information related to a particular topic. For example, you might have a table for customer details, sales transactions, or inventory records. Tables ensure structured storage and easy retrieval of data, allowing you to efficiently manage and analyze vast amounts of information.

Tags

Tags are descriptive keywords or labels applied to metadata resources, datasets, and projects. These labels serve as an efficient tool for organizing, categorizing, and identifying specific characteristics of the data. They facilitate prompt search and discovery of data assets by enabling users to filter and locate relevant resources based on the associated tags.
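Filtering by tags amounts to a subset check: a resource matches when it carries every requested tag. The resource names, tags, and function below are invented for illustration and say nothing about how data.world implements search.

```python
# Made-up resources mapped to their tags.
RESOURCES = {
    "q3_sales": {"finance", "quarterly"},
    "churn_model": {"marketing", "ml"},
    "budget_2024": {"finance"},
}


def find_by_tags(required: set[str]) -> list[str]:
    """Return names of resources that carry every required tag, sorted."""
    return sorted(
        name for name, tags in RESOURCES.items() if required <= tags
    )


print(find_by_tags({"finance"}))
print(find_by_tags({"finance", "quarterly"}))
```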

User access

User access refers to the rules and settings that govern what actions users can perform and what resources they can view or interact with on the data.world platform. These permissions are crucial for maintaining the security and integrity of data, ensuring that sensitive information is only accessible to authorized users. Access levels can range from viewing and querying data to editing and managing datasets, projects, and organizational settings. In organizations, administrators can create user groups with specific permission sets, simplifying the management of access rights for users.

User groups

User groups are a collection of users that share similar functions or roles in an organization. Groups make it much easier to manage members and their access requirements.

User profile

Each user in data.world has a profile page in the UI where other users can see that user's associated resources, organizations, activity, and followers. Users can also follow other users from this page.

Workspace

A workspace is an interactive environment where users can manage, analyze, and collaborate on projects. It serves as the central hub for all activities related to a specific project or dataset, providing tools and features for data exploration, query execution, visualization creation, and file management. Workspaces are designed to support both individual and collaborative work, allowing users to share insights, document and discuss findings, and contribute to projects in real time.

Within a workspace, users can access and manipulate data using SQL or SPARQL queries, connect to data sources, and utilize insights for data visualization.