
Creating advanced searches manually

This article describes the advanced search operators available in the data.world search bar. Many searches can be run with the filters on the search page, but building queries manually gives more precise results. The article is intended for advanced users who need finer control over their searches.

How to use operators for specific searches

The following table pairs common search goals with the operator syntax and keywords that achieve them on data.world.

Note

All the links in the following examples open search results in the data.world open data community. For the best experience, make sure you are logged in to the application when you click these links.

If you want to

Use

Find all projects owned by "siyeh".

owner:siyeh AND resourcetype:project

Find all insights owned by "siyeh".

owner:siyeh AND resourcetype:insight

Find all datasets that are tagged "health" and have the word "shelter" somewhere in them.

type:dataset AND tag:health AND shelter

Find all datasets tagged as "economics" and created between the specified dates.

type:dataset AND tag:"economics" AND created:{2021-01-01 TO 2021-07-03}

Find all datasets owned by "democorp" that have the status "deprecated".

type:dataset and owner:democorp AND status:deprecated

Find all datasets that have a specific tech owner.

Note: This shows how to use custom metadata to form complex searches.

type:dataset AND metadata:"Tech Owner:brenda griffith"

Find all datasets verified by a specific user.

Note: This shows how to use custom metadata to form complex searches.

type:dataset AND metadata:"verified by:Sarah Smart"

Find all columns available in files in datasets and projects with the term "county" in them.

county AND type:"dataset column"

Find all table columns in the catalog with the term "sales_agent" in the column name or description.

type:column AND "sales_agent"

Find all business terms with the word "bill" in the name or description.

type:"business term" AND bill

Find everything verified by a specific user.

metadata:"verified by:Sarah Smart"

Find everything that has a value set for a specific custom metadata field defined for your catalog.

hasMetadata:"Steward"

Find everything that does not have a value set for a specific custom metadata field defined for your catalog.

NOT hasMetadata:"Steward"

Find everything that has a value/does not have a value set for the following standard data.world fields.

title: For datasets, metadata resources, columns, users, and organizations.

has:title

NOT has:title

tag: For datasets, metadata resources, and columns

has:tag

NOT has:tag

bio: For users and organizations

has:bio

NOT has:bio

datatype: For columns

has:datatype

NOT has:datatype

status: For datasets, metadata resources, and columns

has:status

NOT has:status

summary: For datasets and metadata resources

has:summary

NOT has:summary

description: For datasets, metadata resources, and columns

has:description

NOT has:description

Do a targeted search for words in a resource title, description, or summary.

Exact matches

title:"Sales reports"

description:"customer shipment address us state"

summary:"customer shipment address us state"

Partial matches

summary:state: Searches for resources with the word state in the summary.

description:"*state name": Searches for resources with the words state and name in the description.

title:8bank: Searches for resources whose title contains the term 8bank.

title:"*space station": Searches for resources with the words space and station in the title.

metadata:"Submitted by:Jane Smith": Searches for exact matches on the Submitted by field with the value Jane Smith.

metadata:"Steward:*Juan": Searches for matches on the Steward field that contain the term Juan.
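The custom metadata terms in the table above follow a regular pattern, so they can be generated from a field name and a value. The following is a minimal Python sketch of that pattern. The helper names metadata_term and has_metadata_term are hypothetical and are not part of any data.world library; the code only builds query strings to paste into the search bar.

```python
def metadata_term(field: str, value: str, contains: bool = False) -> str:
    """Build a metadata:"Field:value" term.

    With contains=True the value is prefixed with *, mirroring the
    metadata:"Steward:*Juan" example for partial matches.
    """
    prefix = "*" if contains else ""
    return f'metadata:"{field}:{prefix}{value}"'


def has_metadata_term(field: str, present: bool = True) -> str:
    """Build hasMetadata:"Field", or its NOT form when present is False."""
    term = f'hasMetadata:"{field}"'
    return term if present else f"NOT {term}"


# Rebuild three of the searches from the table above.
print("type:dataset AND " + metadata_term("Tech Owner", "brenda griffith"))
print(metadata_term("Steward", "Juan", contains=True))
print(has_metadata_term("Steward", present=False))
```

Keeping the field name and value as separate arguments makes it harder to drop one of the surrounding double quotes, which is an easy mistake when typing these terms by hand.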

Complex searches

Combining search operators is a powerful way to restrict search results and really drill down through the data to find what you want. Here are some examples of complex searches created by combining operators:

Table 1.

Operator

Description

Example searches

AND

Default operator; combines multiple terms.

AND is the default operator; no need to specify it explicitly if you are doing a simple text search.

colony collapse: Returns results containing both "colony" and "collapse".

owner:siyeh AND resourcetype:insight: Finds all insights written by a specific person.

resourcetype:dataset AND owner:dave: Finds all datasets owned by anyone with 'dave' in their ID or display name.

extension:jpeg and resourcetype:project: Finds all projects that include files with the .jpeg extension.

OR

Returns results with either one term or the other.

Can be used multiple times in a search string to broaden the search results.

sales or order: Returns results containing either "sales" or "order".

sales or order or analysis: Also returns results containing "analysis".

"sales analysis" or "sales_order": Returns results containing "sales analysis" or "sales_order".

"sales analysis" or "sales_order" or "order": Also returns results containing "order".

NOT

Excludes items from search results.

Cannot be used in complex searches without specifying keyword grouping. Use it to refine search results by excluding unwanted terms.

wildlife NOT refuge: Excludes results containing "refuge".

wildlife NOT refuge NOT "us-doi-gov": Further excludes results containing "us-doi-gov".

Combining Operators

Combination of AND, OR, NOT

Combining operators can refine search results significantly. Carefully group terms to ensure accurate search results. Complex searches require precise grouping to avoid incorrect parsing.

For example, the search string bee AND pesticide OR colony AND collapse could be parsed in a few different ways, including:

(bee AND pesticide) OR (colony AND collapse) - all results that either have bee and pesticide or have colony and collapse.

bee AND (pesticide OR colony) AND collapse - all results that have bee and either pesticide or colony and also have collapse.

bee AND (pesticide OR (colony AND collapse)) - all results that have bee and either pesticide or both colony and collapse.
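The three groupings above can be written unambiguously by adding parentheses explicitly instead of relying on how the search string is parsed. The following is a minimal Python sketch of that idea. The helper names all_of, any_of, and group are hypothetical and only assemble strings; they do not call data.world.

```python
def all_of(*parts: str) -> str:
    """Join terms with AND (no surrounding parentheses)."""
    return " AND ".join(parts)


def any_of(*parts: str) -> str:
    """Join terms with OR (no surrounding parentheses)."""
    return " OR ".join(parts)


def group(expression: str) -> str:
    """Wrap an expression in parentheses so it is treated as one unit."""
    return f"({expression})"


# The three readings of: bee AND pesticide OR colony AND collapse
reading_1 = any_of(group(all_of("bee", "pesticide")),
                   group(all_of("colony", "collapse")))
reading_2 = all_of("bee", group(any_of("pesticide", "colony")), "collapse")
reading_3 = all_of("bee",
                   group(any_of("pesticide",
                                group(all_of("colony", "collapse")))))

for query in (reading_1, reading_2, reading_3):
    print(query)
```

Each printed string matches one of the grouped forms listed above, so the intended reading is fixed before the query reaches the search bar.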

Exact Matches

Searching for exact phrases.

Searching for exact matches in complex searches also requires careful construction of the search string to get the desired results.

university degree OR high school diploma: Does not return the desired results because the terms are not grouped.

"university degree" OR "high school diploma": Returns all results containing either the exact phrase "university degree" or the exact phrase "high school diploma". The quotes ensure exact matches for the complete phrases.

(university degree) OR (high school diploma): Returns all results that have either the terms university and degree (together or separate, in any order or location), or the terms high, school, and diploma (also together or separate, in any order or location).



Operators and keywords guide

Keywords can be used with data.world-specific operators to further refine your searches. These operators have a consistent syntax: operator:keyword. There is no space after the colon (operator:keyword, not operator: keyword), and double quotes are required to match strings with spaces, hyphens, or underscores (for example, operator:"key word").
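The quoting rule above is mechanical enough to express in a few lines. The following is a minimal Python sketch, assuming the rule is exactly as stated: quote the keyword when it contains a space, hyphen, or underscore. The function name term is hypothetical, and rejecting keywords that already contain a double quote is an assumption made here for simplicity, not documented data.world behavior.

```python
import re


def term(operator: str, keyword: str) -> str:
    """Format one operator:keyword search term with no space after the colon."""
    if '"' in keyword:
        # Assumption: this sketch refuses embedded quotes instead of escaping them.
        raise ValueError("keyword must not contain double quotes")
    if re.search(r"[\s_-]", keyword):
        # Spaces, hyphens, and underscores require double quotes.
        return f'{operator}:"{keyword}"'
    return f"{operator}:{keyword}"


print(term("tag", "health"))
print(term("tag", "property tax"))
print(term("table", "austin_animal_center_outcomes"))
```

Terms built this way can be joined with AND, OR, or NOT in the same way as the hand-written examples in this article.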

Table 1.

Aspect

Description

Example searches

General Syntax

Syntax of an operator search is operator:keyword. No space after the colon and use double quotes for strings with spaces, hyphens, or underscores.

operator:keyword 

operator:"key word"

Ensure proper syntax for accurate results.

Created and Updated

Use to find items based on creation or last update date. Timestamps are UTC.

created:>2022-01-22: Finds items created after Jan 22, 2022.

updated:>=2021-07-01: Finds items updated on or after Jul 1, 2021.

created:<2021-07-01: Finds items created before Jul 1, 2021.

created:{2021-07-01 TO 2021-07-03}: Finds items created between Jul 1 and Jul 3, 2021 (not inclusive).

Created and Updated yesterday

Use the keywords today and yesterday to find resources created or updated relative to the current calendar day.

created:today: Resources that were created on the same calendar day as today.

updated:yesterday: Resources that were updated on the same calendar day as yesterday.

created:<=yesterday: Resources that were created on or before the same calendar day as yesterday.

updated:>yesterday: Resources updated after yesterday. (The same as updated:today.)

Created and Updated (Relative)

Use relative time syntax to find items created or updated within a specific number of past days. Only days is supported as a time unit.

created:{last 1 day}: Resources that were created in the last 24 hours.

updated:{last 7 days}: Resources that were updated in the last 7 days.

Extension

Search for datasets/projects with specific file extensions. Exact-match only; '.' is optional.

Search for all resources with .jpg files: extension:jpg, extension:"jpg", or extension:.jpg

Search for all projects with .jpeg files: extension:jpeg and resourcetype:project, extension:.jpeg and resourcetype:project, or extension:"jpeg" and resourcetype:project

File

Search for datasets/projects with specific files or filenames.

file:damage: Finds projects/datasets with files named "damage".

type:file and damage: Finds files with "damage" in the name.

Owner, Creator, and Contributor

Search for resources owned or created by users/orgs, or where the user is a contributor.

owner:dave: Resources owned by any user/org with 'dave' in their name.

owner:"dave", owner:@dave, owner:"@dave": Resources owned by any user or organization with the exact display name or ID 'dave'.

owner:"dave griffith": Resources owned by Dave Griffith.

creator:@stateofny: Resources created by user with login stateofny.

contributor:dave, contributor:"dave": All resources contributed by user dave.

Resourcetype

Search for specific types of resources (such as dataset, project, insight, file, table, query, catalogTable, catalog, term, datatype, analysis, catalogEntry, collection). Useful in conjunction with other search strings.

resourcetype:project: Searches for projects.

resourcetype:project and sales: Searches for projects with "sales" in the name.

Status

Filter results based on status.

status:approved: Finds all resources with "approved" status.

status:approved AND type:dataset: Finds datasets with "approved" status.

status:deprecated and type:dataset and owner:democorp: Finds datasets owned by democorp in "deprecated" status.

Table

The table operator finds datasets and projects that contain specific tables. It looks for tabular data either in a table or as a sheet in a spreadsheet.

table:income: All datasets and projects with tables that have "income" in the name.

resourcetype:table and income: Tables with the word "income" in them. This returns a list of tables.

table:"austin_animal_center_outcomes": All datasets and projects that contain a table with the exact name "austin_animal_center_outcomes".

resourcetype:table and "austin_animal_center_outcomes": Tables with the exact name "austin_animal_center_outcomes". This returns a list of tables.

Tag

Search for datasets/projects with specific tags. Supports partial and exact matches.

tag:property: Finds datasets/projects with 'property' in their tags.

tag:"property tax": Exact match for the tag 'property tax'.

tag:"property" not tag:"land": Finds results tagged 'property' and excludes any that are also tagged 'land'.

User and Org

Search for users or organizations containing specific strings in their name or ID. The character "@" restricts to exact matches of the ID.

@denver: Exact match for user/org.

org:denver: Any org with "denver" in name or ID.

user:"dave": Users with exact name 'dave'.

user:"dave griffith": User with display name Dave Griffith.

Visibility

Verify data permissions.

visibility:private: Finds all private resources owned by you or your org.

visibility:open: Finds all public resources on data.world.
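The date filters in the Created and Updated rows of the table above share three shapes: a comparison against a date, a range written as {start TO end}, and a relative window written as {last N days}. The following is a minimal Python sketch that formats each shape. The function names are hypothetical, and the code only produces query strings; it does not contact data.world.

```python
from datetime import date

DATE_FIELDS = ("created", "updated")
COMPARISONS = (">", ">=", "<", "<=")


def date_term(field: str, comparison: str, day: date) -> str:
    """Build a comparison such as created:>2022-01-22."""
    if field not in DATE_FIELDS or comparison not in COMPARISONS:
        raise ValueError("unsupported field or comparison")
    return f"{field}:{comparison}{day.isoformat()}"


def range_term(field: str, start: date, end: date) -> str:
    """Build a range such as created:{2021-07-01 TO 2021-07-03}."""
    if field not in DATE_FIELDS:
        raise ValueError("unsupported field")
    return f"{field}:{{{start.isoformat()} TO {end.isoformat()}}}"


def last_days_term(field: str, days: int) -> str:
    """Build a relative window such as updated:{last 7 days}."""
    if field not in DATE_FIELDS or days < 1:
        raise ValueError("unsupported field or day count")
    unit = "day" if days == 1 else "days"
    return f"{field}:{{last {days} {unit}}}"


print(date_term("created", ">", date(2022, 1, 22)))
print(range_term("created", date(2021, 7, 1), date(2021, 7, 3)))
print(last_days_term("updated", 7))
```

Using date objects instead of hand-typed strings keeps the YYYY-MM-DD format consistent. Remember that the timestamps being compared are in UTC.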



A comprehensive guide to tokenization in searches

Tokenization is the process of breaking down text into smaller elements, or tokens, that can be analyzed individually. Understanding how tokenization works in search engines can help you refine your searches to get the most accurate results.

Table 1.

Aspect

Description

Example Searches

Hyphen and Underscore Handling

In general searches, hyphens and underscores are treated as token separators and are not matched as literal characters. They are matched literally only in exact match searches.

animal_center, animal-center

Both return the same results because hyphens and underscores are treated as spaces during tokenization.

Space Handling

Spaces are tokenized differently and impact search results.

animal center

Returns a different set of results compared to animal_center and animal-center because spaces separate words into distinct tokens.

Exact Match Searches

To search for exact strings, use quotes around the terms.

"animal-center", "animal_center", "animal center"

Each search returns different results, reflecting exact matches for the given string. Quotes enforce exact string matching, so hyphens, underscores, and spaces are respected.

Exact Table Searches

Using exact match searches for table names requires a specific format.

table:"bee_colony_census_data_by_county"

Returns exact matches for the specified table name. Ensure the entire table name is within quotes for an exact match.
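To make two of the behaviors described above concrete, the following toy Python sketch splits unquoted terms on hyphens and underscores and keeps quoted phrases verbatim. It is an illustration only and is not data.world's actual tokenizer: the function name toy_tokens and the lowercasing step are assumptions, and the sketch does not model why a plain space returns different results from a hyphen or underscore.

```python
import re


def toy_tokens(query: str) -> list[str]:
    """Split a query into tokens using a simplified, assumed rule set."""
    tokens = []
    # Match either a double-quoted phrase or a run of non-space characters.
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if quoted:
            # Quoted phrases are kept exactly as typed.
            tokens.append(quoted)
        else:
            # Unquoted text is split wherever a hyphen or underscore appears.
            tokens.extend(t for t in re.split(r"[-_]", bare.lower()) if t)
    return tokens


print(toy_tokens("animal_center"))
print(toy_tokens("animal-center"))
print(toy_tokens('"animal-center"'))
print(toy_tokens('"animal_center"'))
```

In this sketch the two unquoted searches produce identical tokens, while the two quoted searches stay distinct, which matches the difference between general and exact match searches described in the table.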