Creating advanced searches manually
This article is an advanced look at the search operators used in the search bar. For an introduction to all search capabilities, including filtering search results and finding similar data, start with the article on finding data.
There is a lot of information on data.world, and finding exactly the data resource you're looking for can be a daunting task. Fortunately, the robust search options available with the data.world search engine let you craft just the right search string to find it. From the search bar (located at the top of your homepage and on many other data.world pages), you can search for an entire phrase or for matches on single words. Additionally, you can qualify your search with various operators or perform complex searches combining operators.
There are two ways to run advanced searches:
Use the Advanced Search tool. This friendly form helps you construct more complex searches with multiple filters, logical operators, categories, and custom metadata fields.
If you are an expert user and know the advanced queries, you can type them directly in the Search bar. See the following table to get you started on the advanced search queries. Replace the resource types and search terms to adapt the queries for your needs.
How to use operators for specific searches
The following table shows how to use operators, with their syntax and keywords, to perform specific and complex searches on data.world.
Note
All the links in the following examples open search results in the data.world open data community.
If you want to | Use |
---|---|
Find all projects owned by "siyeh". | |
Find all insights owned by "siyeh". | owner:siyeh AND resourcetype:insight |
Find all datasets that are tagged "health" and have the word "shelter" somewhere in them. | |
Find all datasets tagged as "economics" and created between the specified dates. | type:dataset AND tag:"economics" AND created:{2021-01-01 TO 2021-07-03} |
Find all datasets owned by "democorp" that have the status "deprecated". | status:deprecated and type:dataset and owner:democorp |
Find all datasets that have a specific tech owner. Note: This shows how to use custom metadata to form complex searches. | |
Find all datasets verified by a specific user. Note: This shows how to use custom metadata to form complex searches. | |
Find all columns available in files in datasets and projects with the term "county" in them. | |
Find all table columns in the catalog with the term "sales_agent" in the column name or description. | |
Find all business terms with the word "bill" in the name or description. | |
Find everything verified by a specific user. | |
Find everything that has a value set for a specific custom metadata field defined for your catalog. | |
Find everything that does not have a value set for a specific custom metadata field defined for your catalog. | NOT hasMetadata:"Steward" |
Find everything that has a value/does not have a value set for the following standard data.world fields. | |
title: For datasets, metadata resources, columns, users, and organizations. | has:title NOT has:title |
tag: For datasets, metadata resources, and columns | has:tag NOT has:tag |
bio: For users and organizations | has:bio NOT has:bio |
datatype: For columns | has:datatype NOT has:datatype |
status: For datasets, metadata resources, and columns | has:status NOT has:status |
summary: For datasets and metadata resources | has:summary NOT has:summary |
description: For datasets, metadata resources, and columns | has:description NOT has:description |
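
The presence checks in the table above follow a regular pattern, so they can be generated rather than typed by hand. The following is a minimal sketch in Python, not part of data.world itself: the function names `presence_query` and `metadata_query` are hypothetical, the field list is copied from the table, and the positive `hasMetadata:"..."` form is inferred from the `NOT hasMetadata:"Steward"` example.

```python
# Standard fields and the resource types they apply to, copied from the table above.
STANDARD_FIELDS = {
    "title": ["datasets", "metadata resources", "columns", "users", "organizations"],
    "tag": ["datasets", "metadata resources", "columns"],
    "bio": ["users", "organizations"],
    "datatype": ["columns"],
    "status": ["datasets", "metadata resources", "columns"],
    "summary": ["datasets", "metadata resources"],
    "description": ["datasets", "metadata resources", "columns"],
}


def presence_query(field: str, present: bool = True) -> str:
    """Build has:<field> or NOT has:<field> for a standard field (hypothetical helper)."""
    if field not in STANDARD_FIELDS:
        raise ValueError(f"not a standard field: {field!r}")
    query = f"has:{field}"
    return query if present else f"NOT {query}"


def metadata_query(name: str, present: bool = True) -> str:
    """Build the custom-metadata check; the positive form is an inference (assumption)."""
    query = f'hasMetadata:"{name}"'
    return query if present else f"NOT {query}"


# Print the "missing value" search for every standard field.
for field, applies_to in STANDARD_FIELDS.items():
    print(f"{presence_query(field, present=False):22} applies to: {', '.join(applies_to)}")

print(metadata_query("Steward", present=False))
```

Paste any of the printed strings into the search bar as-is. The lookup against `STANDARD_FIELDS` only guards against typos in field names; it does not check that a field makes sense for the resource type you are searching.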
Operators and keywords guide
Keywords can be used with data.world-specific operators to further refine your searches. These operators have a consistent syntax: operator:keyword. There is no space after the colon (operator:keyword, not operator: keyword), and double quotes are required to match strings with spaces, hyphens, or underscores (for example, operator:"key word").
Aspect | Description | Example searches |
---|---|---|
General Syntax | The syntax of an operator search is operator:keyword. Do not put a space after the colon, and use double quotes for strings with spaces, hyphens, or underscores. | operator:keyword, operator:"key word". Ensure proper syntax for accurate results. |
Created and Updated | Use to find items based on creation or last update date. Timestamps are UTC. | created:>2022-01-22: Finds items created after Jan 22, 2022. updated:>=2021-07-01: Finds items updated on or after Jul 1, 2021. created:<2021-07-01: Finds items created before Jul 1, 2021. created:{2021-07-01 TO 2021-07-03}: Finds items created between Jul 1 and Jul 3, 2021 (not inclusive). |
Extension | Search for datasets/projects with specific file extensions. Exact-match only; '.' is optional. | extension:jpg, extension:"jpg", or extension:.jpg: Finds all resources with .jpg files. extension:jpeg and resourcetype:project, extension:.jpeg and resourcetype:project, or extension:"jpeg" and resourcetype:project: Finds all projects with .jpeg files. |
File | Search for datasets/projects with specific files or filenames. | file:damage: Finds projects/datasets with files named "damage". type:file and damage: Finds files with "damage" in the name. |
Owner, Creator, and Contributor | Search for resources owned or created by users/orgs, or where the user is a contributor. | owner:dave: Resources owned by any user/org with 'dave' in their name. owner:"dave", owner:@dave, owner:"@dave": Resources owned by any user or organization with the exact display name or ID 'dave'. owner:"dave griffith": Resources owned by Dave Griffith. creator:@stateofny: Resources created by user with login stateofny. contributor:dave, contributor:"dave": All resources contributed by user dave. |
Resourcetype | Search for specific types of resources (such as dataset, project, insight, file, table, query, catalogTable, catalog, term, datatype, analysis, catalogEntry, collection). Useful in conjunction with other search strings. | resourcetype:project: Searches for projects. resourcetype:project and sales: Searches for projects with "sales" in the name. |
Status | Filter results based on status. | status:approved: Finds all resources with "approved" status. status:approved AND type:dataset: Finds datasets with "approved" status. status:deprecated and type:dataset and owner:democorp: Finds datasets owned by democorp in "deprecated" status. |
Table | The | table:income: All datasets and projects with tables with "income" in the name. resourcetype:table and income: Tables with the word "income" in them. This returns a list of tables. table:"austin_animal_center_outcomes": All datasets and projects with the exact table name "austin_animal_center_outcomes" in them. resourcetype:table and "austin_animal_center_outcomes": Tables with the exact name "austin_animal_center_outcomes". This returns a list of tables. |
Tag | Search for datasets/projects with specific tags. Supports partial and exact matches. | tag:property: Finds datasets/projects with 'property' in their tags. tag:"property tax": Exact match for the tag 'property tax'. tag:"property" not tag:"land": Finds results tagged 'property', excluding those that also have the 'land' tag. |
User and Org | Search for users or organizations containing specific strings in their name or ID. The character "@" restricts to exact matches of the ID. | @denver: Exact match for user/org. org:denver: Any org with "denver" in name or ID. user:"dave": Users with exact name 'dave'. user:"dave griffith": User with display name Dave Griffith. |
Visibility | Verify data permissions. | visibility:private: Finds all private resources owned by you or your org. visibility:open: Finds all public resources on data.world. |
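
The syntax rules in this guide (no space after the colon, and double quotes around keywords that contain spaces, hyphens, or underscores) are easy to apply programmatically. Here is a minimal Python sketch under those stated rules; `term` and `created_between` are hypothetical helper names, not part of any data.world client, and `term` is meant for string keywords, not for date comparisons such as created:>2022-01-22.

```python
import re
from datetime import date

# Characters that, per the guide above, require the keyword to be double-quoted.
_NEEDS_QUOTES = re.compile(r"[\s_-]")


def term(operator: str, keyword: str) -> str:
    """Format one operator:keyword term, with no space after the colon."""
    if _NEEDS_QUOTES.search(keyword):
        keyword = f'"{keyword}"'
    return f"{operator}:{keyword}"


def created_between(start: date, end: date) -> str:
    """Format a created:{START TO END} range term using ISO dates."""
    return f"created:{{{start.isoformat()} TO {end.isoformat()}}}"


print(term("tag", "property tax"))
print(term("table", "austin_animal_center_outcomes"))
print(term("owner", "democorp"))
print(created_between(date(2021, 7, 1), date(2021, 7, 3)))
```

Quoting only when needed keeps partial-match behavior for simple keywords such as owner:dave, while multi-word or underscored keywords are sent as a single quoted string.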
Complex searches
Combining search operators is a powerful way to restrict search results and really drill down through the data to find what you want. Here are some examples of complex searches created by combining operators:
Note
All the links in the following examples open search results in the data.world open data community.
Operator | Description | Example searches |
---|---|---|
AND | Combines multiple terms. AND is the default operator, so you do not need to specify it explicitly in a simple text search. | colony collapse: Returns results containing both "colony" and "collapse". owner:siyeh AND resourcetype:insight: Finds all insights written by a specific person. resourcetype:dataset AND owner:dave: Finds all datasets owned by anyone with 'dave' in their ID or display name. extension:jpeg and resourcetype:project: Finds all projects which include files with the .jpeg extension. |
OR | Returns results with either one term or the other. Can be used multiple times in a search string to broaden the search results. | sales or order: Returns results containing either "sales" or "order". sales or order or analysis: Also returns results containing "analysis". "sales analysis" or "sales_order": Returns results containing "sales analysis" or "sales_order". |
NOT | Excludes items from search results. Cannot be used in complex searches without specifying keyword grouping. Use it to refine search results by excluding unwanted terms. | wildlife NOT refuge: Excludes results containing "refuge". wildlife NOT refuge NOT "us-doi-gov": Further excludes results containing "us-doi-gov". |
Combining Operators | Combination of AND, OR, and NOT. Combining operators can refine search results significantly, but complex searches require precise grouping of terms to avoid incorrect parsing. | For example, the search string bee AND pesticide OR colony AND collapse could be parsed in a few different ways, including: (bee AND pesticide) OR (colony AND collapse): All results that either have bee and pesticide or have colony and collapse. bee AND (pesticide OR colony) AND collapse: All results that have bee and either pesticide or colony and also have collapse. bee AND (pesticide OR (colony AND collapse)): All results that have bee and either pesticide or both colony and collapse. |
Exact Matches | Searching for exact phrases. Searching for exact matches in complex searches also requires careful construction of the search string to get the desired results. | university degree OR high school diploma: Will not return the desired results because of the lack of grouping. (university degree) OR (high school diploma): All results that have either the terms university and degree (together or separate, in any order or location), or the terms high, school, and diploma (also together or separate, in any order or location). "university degree" OR "high school diploma": All results containing either the exact phrase "university degree" or the exact phrase "high school diploma". |
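
To see why grouping matters, it helps to evaluate the three groupings of bee AND pesticide OR colony AND collapse against a concrete set of words. The sketch below uses plain Python booleans as a stand-in; it illustrates the logic only and makes no claim about how the data.world parser resolves an ungrouped string. The function name `matches` is hypothetical.

```python
def matches(tokens: set[str]) -> dict[str, bool]:
    """Evaluate three groupings of `bee AND pesticide OR colony AND collapse`."""
    bee, pesticide = "bee" in tokens, "pesticide" in tokens
    colony, collapse = "colony" in tokens, "collapse" in tokens
    return {
        "(bee AND pesticide) OR (colony AND collapse)":
            (bee and pesticide) or (colony and collapse),
        "bee AND (pesticide OR colony) AND collapse":
            bee and (pesticide or colony) and collapse,
        "bee AND (pesticide OR (colony AND collapse))":
            bee and (pesticide or (colony and collapse)),
    }


# Two sample "documents", each reduced to the set of words it contains.
for doc in ({"bee", "pesticide"}, {"colony", "collapse"}):
    print(sorted(doc))
    for grouping, hit in matches(doc).items():
        print(f"  {hit!s:5}  {grouping}")
```

A resource containing only "bee" and "pesticide" satisfies the first and third groupings but not the second, while one containing only "colony" and "collapse" satisfies just the first. Writing the parentheses explicitly removes that ambiguity.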
A comprehensive guide to tokenization in searches
Tokenization is the process of breaking down text into smaller elements, or tokens, that can be analyzed individually. Understanding how tokenization works in search engines can help you refine your searches to get the most accurate results.
Aspect | Description | Example Searches |
---|---|---|
Hyphen and Underscore Handling | In general searches, hyphens and underscores are treated as token separators rather than literal characters. They are matched literally only in exact match searches. | animal_center, animal-center: Both return the same results because hyphens and underscores are treated as spaces during tokenization. |
Space Handling | Spaces are tokenized differently and impact search results. | animal center: Returns a different set of results compared to animal_center and animal-center because spaces separate words into distinct tokens. |
Exact Match Searches | To search for exact strings, use quotes around the terms. | "animal-center", "animal_center", "animal center": Each search returns different results, reflecting exact matches for the given string. Quotes enforce exact string matching, so hyphens, underscores, and spaces are respected. |
Exact Table Searches | Using exact match searches for table names requires a specific format. | table:"bee_colony_census_data_by_county": Returns exact matches for the specified table name. Ensure the entire table name is within quotes for an exact match. |
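
The behavior described in this table can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not the search engine's actual tokenizer: it only reproduces which plain-text queries the table says are equivalent, it does not handle operator prefixes such as table:, and the function name `search_units` is hypothetical.

```python
import re


def search_units(query: str) -> list[tuple[str, tuple[str, ...]]]:
    """Toy model: split a plain-text query into comparable units."""
    units = []
    # Each match is either a double-quoted string or a run of non-space characters.
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if quoted:
            # Quoted: keep the string verbatim, separators included.
            units.append(("exact", (quoted,)))
        elif bare:
            # Unquoted: hyphens and underscores act as separators within one unit.
            parts = tuple(p for p in re.split(r"[-_]", bare.lower()) if p)
            units.append(("tokens", parts))
    return units


for q in ("animal_center", "animal-center", "animal center",
          '"animal-center"', '"animal_center"', '"animal center"'):
    print(f"{q:18} -> {search_units(q)}")
```

In this model, animal_center and animal-center produce the same single unit, animal center produces two independent units, and each quoted form stays distinct, which mirrors the differences in results described above.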