Advanced search

This article is an advanced look at the search operators used in the search bar on data.world. For an introduction to all search capabilities, including filtering search results and finding similar data, start with the article on finding data.

There is a lot of information on data.world, and finding exactly the data resource you're looking for can be a daunting task. Fortunately, the search options available with the data.world search engine let you craft just the right search string. From the search bar (located at the top of your homepage and on many other data.world pages), you can search for an entire phrase or for matches on single words. You can also qualify your search with various operators or perform complex searches that combine operators.

The logical operators AND, OR, and NOT can all be used to restrict results returned from a search.

Keywords can also be used with a set of data.world-specific operators to further qualify your searches. The same rules apply to all of these keyword operators:

  • The syntax of an operator search string is operator:keyword where operator is the name of the operator and keyword is the string you want to match.

  • There is no space after the colon (operator:keyword, not operator: keyword).

  • If the search string contains underscores, hyphens, or spaces, enclose it in double quotes to match the entire string.
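
The rules above can be sketched as a small helper that assembles an operator search string. This is only an illustration: the function name build_operator_term is hypothetical and not part of any data.world tool, and it simply quotes the keyword whenever it contains an underscore, hyphen, or space.

```python
def build_operator_term(operator: str, keyword: str) -> str:
    """Return an 'operator:keyword' search term (hypothetical helper)."""
    # Quote the keyword when it contains characters that would otherwise
    # prevent the entire string from being matched.
    needs_quotes = any(ch in keyword for ch in "_- ")
    already_quoted = keyword.startswith('"') and keyword.endswith('"')
    if needs_quotes and not already_quoted:
        keyword = f'"{keyword}"'
    # No space after the colon.
    return f"{operator}:{keyword}"


print(build_operator_term("column", "outcome"))       # column:outcome
print(build_operator_term("column", "test_outcome"))  # column:"test_outcome"
print(build_operator_term("owner", "dave griffith"))  # owner:"dave griffith"
```

The resulting string is what you would type or paste into the search bar.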

Searching with the column operator returns all datasets and projects that have a tabular file containing a column with that name:

  • column:outcome - As expected, all datasets and projects which have tables containing columns with 'outcome' in their names

  • column:"test_outcome" - Only the project with the column named 'test_outcome' in one of its tabular files.

Created and updated are two operators that can be used to find datasets, projects, insights, users, and organizations based on the date they were added or last updated. Timestamps are set in UTC, not your local time, so depending on where you are, results might be a day off from your local date.
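
To see why a UTC timestamp can land on a different calendar day, here is a minimal sketch. The update time and the UTC-5 offset are made-up values chosen for illustration; they do not come from data.world.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example: a resource updated at 9:30 PM on 4 September 2019
# by someone in a UTC-5 time zone (the offset is an assumption).
local_zone = timezone(timedelta(hours=-5))
local_time = datetime(2019, 9, 4, 21, 30, tzinfo=local_zone)

# Convert to UTC, the zone in which timestamps are set.
utc_time = local_time.astimezone(timezone.utc)

print(local_time.date())  # 2019-09-04
print(utc_time.date())    # 2019-09-05
```

In this case a search for the local date would miss the resource, because its UTC timestamp falls on the following day.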

Searching with the extension operator returns all datasets and projects that include files with the specified extension. The searches are exact-match only, and the leading '.' is optional.
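
The two rules for the extension operator (exact match, optional dot) can be modeled as follows. This is a sketch of the described behavior, not the search engine's actual implementation; the function name and the file names are hypothetical.

```python
from pathlib import PurePath


def has_extension(filename: str, query: str) -> bool:
    """Hypothetical model of an exact-match extension search."""
    wanted = query.lstrip(".")                      # '.csv' and 'csv' are equivalent
    actual = PurePath(filename).suffix.lstrip(".")  # 'data.csv' -> 'csv'
    return actual == wanted                         # exact match only


files = ["hives.csv", "hives.xls", "hives.xlsx"]
print([f for f in files if has_extension(f, "xls")])    # ['hives.xls']
print([f for f in files if has_extension(f, ".xlsx")])  # ['hives.xlsx']
```

Under this model a search for xls does not pick up xlsx files, which is what exact matching implies.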

The way the search engine treats the file operator has been updated, and file results are now shown the same way as other primary resources such as datasets, projects, and insights. When you search for a file, you get a list of all the files that match your search, not a list of the datasets and projects that use that file:

  • file:bee - Returns a list of files with 'bee' in the name. The location of each file is shown by the icon at the bottom of the result card: an orange icon indicates the file is in a project, and a blue icon indicates it is in a dataset.

  • file:bee colony - Returns any file with 'bee' in the name and 'colony' in any other searchable field

The owner of a resource is the person or entity who was designated as such when the resource was created. If a person was selected as the owner, that person will also be the creator. If an organization was selected as the owner, the creator will still be the person who created the resource. The owner operator returns all the datasets, projects, and insights owned by either a person or an organization. The creator operator returns all the datasets, projects, and insights created by a user. They both follow the same patterns as user and org:

  • owner:dave - All datasets, projects and insights owned by any user or organization with 'dave' in either the display name or id.

  • owner:"dave", owner:@dave, owner:"@dave" - All datasets, projects and insights owned by any user or organization with the exact display name or id 'dave'.

  • owner:"dave griffith" - All datasets, projects and insights owned by the user whose display name is Dave Griffith.

  • creator:@stateofny - Everything created by the user with the login stateofny, regardless of whether the owner is that user or an organization the user belongs to.

  • owner:data-ny-gov - Everything owned by the organization data-ny-gov (created by individuals in the organization).

The resourcetype operator restricts a search to a single type of resource, such as datasets, projects, or insights. The choices are:

  • dataset

  • project

  • insight

  • file

  • table

  • query

  • catalogTable

  • catalog

  • term

  • datatype

  • analysis

It is best used in conjunction with another search string in a complex search (see the examples below under Complex Searches).

Status is also available as a search operator for filtering by asset status. For example, status:"my status" or status:approved finds resources with that asset status value.

The table operator is used to find tabular data, either in a table or as a sheet in a spreadsheet:

  • table:cutting - Returns all tables that have 'cutting' either in the name of the file or as the name of a sheet in a spreadsheet.

  • table:"austin_animal_center_outcomes" - Using an exact search on the table operator is one way to find all the public datasets and projects that were created from the same source data. In this case, several different people imported the Austin Animal Center statistics from the City of Austin government website.

The tag operator specifically searches against the tags associated with a dataset or project and returns a list of only the datasets and projects that have that tag. Partial and exact matches are allowed:

  • tag:bee - Any dataset or project with the word 'bee' in its tag (e.g., 'bee', 'bees' and 'bee colony').

  • tag:"bee" - Only datasets and projects with the exact tag 'bee'.

  • tag:bees - Any dataset or project that has a tag which includes 'bees' (e.g., 'bees' and 'native bees'). NOTE: This does not include datasets or projects with the tag 'bee'.

  • tag:"bees knees" - Only datasets and projects with the exact tag 'bees knees'.

  • tag:bees knees - Any dataset or project that has a tag which includes 'bees' and the string 'knees' in any searchable field.
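
The difference between partial and exact tag matches can be modeled with a few lines of code. This is a sketch that reproduces the examples above, assuming a partial match means "some word in the tag begins with the search string"; the real matching rules of the search engine may differ, and the function name is hypothetical.

```python
def tag_matches(query: str, tag: str) -> bool:
    """Model of tag matching (an assumption, not the actual engine)."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Quoted query: the whole tag must equal the quoted text.
        return tag == query[1:-1]
    # Unquoted query: some word in the tag begins with the query.
    return any(word.startswith(query) for word in tag.split())


tags = ["bee", "bees", "bee colony", "native bees", "bees knees"]

print([t for t in tags if tag_matches("bee", t)])
# ['bee', 'bees', 'bee colony', 'native bees', 'bees knees']
print([t for t in tags if tag_matches('"bee"', t)])
# ['bee']
print([t for t in tags if tag_matches("bees", t)])
# ['bees', 'native bees', 'bees knees']
```

Note that the unquoted search for bees skips the tag 'bee', matching the behavior described in the list.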

The user and org operators search for a string found in the display name or the id of a user or organization respectively. The character "@" restricts the search to an exact match of the id:

  • user:dave - All users with the string 'dave' in the login name or in the display name.

  • user:"dave" - All users with the exact login name 'dave' or display name 'dave'.

  • user:@dave and user:"@dave" - Only the user whose id is @dave.

  • user:dave griffith - Users with 'dave' in either the login or display name, and 'griffith' in either the login or display name fields.

  • user:da - Any user or organization with the leading string 'da' in either the login or display name

  • user:"dave griffith" - Only the user whose display name is Dave Griffith.

  • org:data - Any organization with the string 'data' in either the display name or the organization id.

  • org:"denver" and org:"@denver" - Only the organization with the id 'denver'.

NOTE: The search @name will return either the organization or the user with the id 'name'.
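
The three forms of the user operator (plain, quoted, and @) can be summarized in a small model. This is a sketch under assumptions: the Account class, the sample accounts, and case-insensitive comparison are all invented for illustration and are not taken from data.world.

```python
from dataclasses import dataclass


@dataclass
class Account:
    id: str
    display_name: str


def user_matches(query: str, account: Account) -> bool:
    """Hypothetical model of a single-term user: search."""
    quoted = len(query) >= 2 and query[0] == '"' and query[-1] == '"'
    text = (query[1:-1] if quoted else query).lower()
    ident, name = account.id.lower(), account.display_name.lower()
    if text.startswith("@"):
        return ident == text[1:]            # '@' restricts to an exact id match
    if quoted:
        return text in (ident, name)        # exact id or exact display name
    return text in ident or text in name    # partial match on either field


accounts = [
    Account("dave", "Dave Griffith"),
    Account("davegriffith", "David G"),
    Account("dg", "dave"),
]

for q in ["dave", '"dave"', "@dave", '"dave griffith"']:
    print(q, [a.id for a in accounts if user_matches(q, a)])
```

The same pattern would apply to org, owner, and creator, which the article says follow the same rules.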

The visibility operator is mainly useful to verify permissions on your data:

  • visibility:private - Returns all private datasets and projects owned by you or an organization you are in.

  • visibility:open - Returns all public datasets and projects on data.world.

Hyphen and underscore (- _) characters are tokenized in some searches: outside of exact-match searches, the search engine does not read them as literal hyphens and underscores.
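
One way to picture this is a tokenizer that splits a name into separate words wherever a hyphen or underscore appears. The sketch below assumes hyphens, underscores, and whitespace all act as separators; it is not the search engine's actual tokenizer, and the function name is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into tokens (illustrative model only)."""
    # Assumption: hyphens, underscores, and whitespace all act as separators.
    return [token for token in re.split(r"[-_\s]+", text) if token]


print(tokenize("austin_animal_center_outcomes"))
# ['austin', 'animal', 'center', 'outcomes']
print(tokenize("data-ny-gov"))
# ['data', 'ny', 'gov']
```

Under this model an unquoted search sees separate words, while a quoted, exact-match search keeps the whole string together with its hyphens and underscores.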

Combining search operators is a powerful way to restrict search results and really drill down through the data to find what you want. Here are some examples of complex searches created by combining operators:

It is also possible to combine different operators in a complex search, but you need to clearly group the parts of the search string that go with each operator, or the search engine will not process your request correctly. For example, the search string

bee AND pesticide OR colony AND collapse

could be parsed in a few different ways, including:

The search string

bee AND pesticide OR colony AND collapse

will not return predictable results.
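
To illustrate why grouping matters, the sketch below evaluates three possible readings of that string against a tiny made-up collection of resources, using Python sets for AND (&) and OR (|). The resources and their words are hypothetical, and the parentheses here are Python grouping used to show the logic, not a statement about search bar syntax.

```python
# Hypothetical mini-collection: each resource is reduced to a set of words.
resources = {
    "A": {"bee", "pesticide"},
    "B": {"colony", "collapse"},
    "C": {"bee", "colony", "collapse"},
    "D": {"bee", "pesticide", "collapse"},
}


def has(word: str) -> set[str]:
    """Return the names of all resources containing the word."""
    return {name for name, words in resources.items() if word in words}


bee, pesticide = has("bee"), has("pesticide")
colony, collapse = has("colony"), has("collapse")

# (bee AND pesticide) OR (colony AND collapse)
grouping_1 = (bee & pesticide) | (colony & collapse)
# bee AND (pesticide OR colony) AND collapse
grouping_2 = bee & (pesticide | colony) & collapse
# ((bee AND pesticide) OR colony) AND collapse
grouping_3 = ((bee & pesticide) | colony) & collapse

print(sorted(grouping_1))  # ['A', 'B', 'C', 'D']
print(sorted(grouping_2))  # ['C', 'D']
print(sorted(grouping_3))  # ['B', 'C', 'D']
```

Each reading returns a different set of resources, which is why the ungrouped string is ambiguous.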

Searching for exact matches in complex searches also requires careful construction of the search string to get the desired results. For example, if you wanted to search for everything that had to do with either a university degree or a high school diploma, the following search strings would give you completely different results: