The SELECT and FROM Clauses

An introduction to the SELECT and FROM clauses and their related keywords

SELECT and FROM

The first clause in any query is the SELECT clause. The SELECT clause contains either a list of the columns you want returned from the query separated by a comma, or the wildcard *. The second clause in the query is the FROM clause. The FROM clause indicates against which table to run the query.

The wildcard * when used after SELECT means that all the columns in the table should be returned and they are presented in the order in which they are found in the original table.

The query to see everything in the Intakes table would look like:

SELECT *
  FROM austin_animal_center_intakes

And the first seven rows of the result would look like this:

animal_id name datetime monthyear found_location intake_type intake_condition animal_type sex_upon_intake age_upon_intake breed color
A006100 Scamp 2017-12-07T14:07:00 2017-12 Colony Creek And Hunters Trace in Austin (TX) Stray Normal Dog Neutered Male 10 years Spinone Italiano Mix Yellow/White
A006100 Scamp 2014-12-19T10:21:00 2014-12 8700 Research Blvd in Austin (TX) Public Assist Normal Dog Neutered Male 7 years Spinone Italiano Mix Yellow/White
A191351 Bri-Bri 2015-11-13T15:57:00 2015-11 1912 E William Cannon Rd in Austin (TX) Stray Normal Cat Intact Female 16 years Domestic Longhair Mix Black/White
A322813 Tyson 2015-03-05T14:49:00 2015-03 Austin (TX) Public Assist Normal Dog Neutered Male 11 years Rottweiler Mix Black/Brown
A553074 Jo Jo 2016-03-27T00:04:00 2016-03 3614 Bill Price in Travis (TX) Public Assist Normal Dog Spayed Female 7 years Labrador Retriever/German Shepherd Brown/Black
A672744 Oso 2014-02-17T17:10:00 2014-02 Travis (TX) Owner Surrender Normal Dog Intact Male 3 years Rottweiler Mix Black/Tan
A672744 Oso 2014-02-17T17:10:00 2014-02 Austin (TX) Public Assist Normal Dog Intact Male 3 years Rottweiler Mix Black/Tan

In the previous example the keywords SELECT and FROM are both capitalized. This is a convention, not a requirement. The SQL parser is case-insensitive so both upper and lower cases work for keywords. The clauses are also on different lines–another convention. SQL has a lot of conventions. You, however, don’t need to worry about them as data.world autocompletes your keywords and source names for you, and auto-formats the entire query in a very readable, industry-standard format when you run it–an extremely handy feature as you’ll see next when you go to enter specific column names.

Okay, we got a report with all of the data in the table by running a SELECT * query. But what if we didn’t want all of that information? What if we only wanted to see what type of animal, which sex, what age, and what condition they were in on intake? That query would look like this:

SELECT animal_type,
       sex_upon_intake,
       age_upon_intake,
       intake_condition
  FROM austin_animal_center_intakes

And the first several rows of the results would render as:

animal_type sex_upon_intake age_upon_intake intake_condition
Dog Neutered Male 10 years Normal
Dog Neutered Male 7 years Normal
Cat Intact Female 16 years Normal
Dog Neutered Male 11 years Normal
Dog Spayed Female 7 years Normal
Dog Intact Male 3 years Normal
Dog Intact Male 3 years Normal

Notice that the order of the columns in the results match the order in which the columns were listed–not the order in which they are found in the original table (the table order is shown in the example above for the SELECT * query). This feature is very handy for presenting information in order of importance for your purposes, which may or may not be the way in which it was originally captured.

SELECT AS

Another way you can change the presentation of the data–gussy it up, if you will–is to replace the column names in the dataset with something more readable or meaningful to your purpose. In the last example, our query returned data on the type, sex, age and condition of the animals. If we wanted to rename the columns to match those names in our results we would select the same columns, but we would use the keyword AS after each column name followed by the column name we would like to see. AS introduces the column name you would like to see in the results of a query.

To get the column names written the way we want, we would write the query thusly:

SELECT animal_type AS Animal,
       sex_upon_intake AS Sex,
       age_upon_intake AS Age,
       intake_condition AS Condition
  FROM austin_animal_center_intakes

And the table we’d get back would look like:

Type Sex Age Condition
Dog Neutered Male 10 years Normal
Dog Neutered Male 7 years Normal
Cat Intact Female 16 years Normal
Dog Neutered Male 11 years Normal
Dog Spayed Female 7 years Normal
Dog Intact Male 3 years Normal
Dog Intact Male 3 years Normal

Note that if you use AS with nonnumeric characters in the replacement column name (spaces, e.g.,) you have to surround the replacement text with backticks. The backtick (`) is the keyboard character on the same key as the tilda (~)–not to be confused with the single quote character.

If instead of replacing the column names with one-word titles we wanted to just take out the underscores, we’d need to write the query using backticks:

SELECT animal_type AS `animal type`,
       sex_upon_intake AS `sex upon intake`,
       age_upon_intake AS `age upon intake`,
       intake_condition AS `intake condition`
  FROM austin_animal_center_intakes

Resulting in:

animal type sex upon intake age upon intake intake condition
Dog Neutered Male 10 years Normal
Dog Neutered Male 7 years Normal
Cat Intact Female 16 years Normal
Dog Neutered Male 11 years Normal
Dog Spayed Female 7 years Normal
Dog Intact Male 3 years Normal
Dog Intact Male 3 years Normal

Adding capitalization to our changes would yield:

SELECT animal_type AS `Animal Type`,
       sex_upon_intake AS `Sex Upon Intake`,
       age_upon_intake AS `Age Upon Intake`,
       intake_condition AS `Intake Condition`
  FROM austin_animal_center_intakes

And the resulting table would look like:

Animal Type Sex Upon Intake Age Upon Intake Intake Condition
Dog Neutered Male 10 years Normal
Dog Neutered Male 7 years Normal
Cat Intact Female 16 years Normal
Dog Neutered Male 11 years Normal
Dog Spayed Female 7 years Normal
Dog Intact Male 3 years Normal
Dog Intact Male 3 years Normal

SELECT DISTINCT

The dataset we are using for our examples is very large. There were 54,724 animals taken in by the Austin Texas Animal Center in the almost three years covered by the dataset. If you wanted to know the different types of animals that were taken in (were they all dogs and cats, e.g.,) you could scan through all the data, or you could use the nifty little modifier for the select clause called DISTINCT. DISTINCT is used in a SELECT clause to return unique combinations of data across all the columns returned by the query.

The query for distinct animal types would be written like this:

SELECT DISTINCT animal_type
  FROM austin_animal_center_intakes

Returning:

animal_type
Dog
Cat
Other
Bird
Livestock

DISTINCT can be a bit tricky to use, but if you keep in mind that what is distinct is the combination of data for a single row across all of the columns displayed, then you are less likely to get tripped up.

If you wanted to show only the unique animal types, but you want to see the columns for age and sex as well you might construct a query like this:

SELECT DISTINCT animal_type,
       sex_upon_intake,
       age_upon_intake
  FROM austin_animal_center_intakes

It is a legitimate query so you wouldn’t get an error, but the resulting table would not have distinct values for just animal type. Instead the values considered to be distinct would be each combination of type, sex, and age. The DISTINCT qualifier refers to a unique row of data, not a unique column. This is what the first few rows of the results would look like:

animal_type sex_upon_intake age_upon_intake
Dog Neutered Male 10 years
Dog Neutered Male 7 years
Cat Intact Female 16 years
Dog Neutered Male 11 years
Dog Spayed Female 7 years
Dog Intact Male 3 years
Dog Spayed Female 2 years

For this dataset the number of records (rows) returned was reduced to 539 of the original 75,947.

Exercises

There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.

Exercise 1

Write a query against the allergies table that returns all the records.

Try query

Hint

SELECT * (all the records) FROM the allergies table.

See solution

Exercise 2

Write a query against the allergies table to return the columns start, stop and description.

Try query

Hint

SELECT the columns start, stop, and description renamed AS condition FROM the conditions table.

See solution

Exercise 3

Write a query against the allergies table that shows a list of every allergy with each one only showing up once.

Try query

Hint

SELECT DISTINCT on the description column FROM the allergies table.

See solution