The SELECT and FROM Clauses
An introduction to the SELECT
and FROM
clauses and their related keywords
SELECT
and FROM
The first clause in any query is the SELECT
clause. The SELECT
clause
contains either a list of the columns you want returned from the query
separated by a comma, or the wildcard *
. The second clause in the query is
the FROM
clause. The FROM
clause indicates against which table to run the
query.
The wildcard *
when used after SELECT
means that all
the columns in the table should be returned and they are presented in the order
in which they are found in the original table.
The query to see everything in the Intakes table would look like:
SELECT *
FROM austin_animal_center_intakes
And the first seven rows of the result would look like this:
animal_id | name | datetime | monthyear | found_location | intake_type | intake_condition | animal_type | sex_upon_intake | age_upon_intake | breed | color |
---|---|---|---|---|---|---|---|---|---|---|---|
A006100 | Scamp | 2017-12-07T14:07:00 | 2017-12 | Colony Creek And Hunters Trace in Austin (TX) | Stray | Normal | Dog | Neutered Male | 10 years | Spinone Italiano Mix | Yellow/White |
A006100 | Scamp | 2014-12-19T10:21:00 | 2014-12 | 8700 Research Blvd in Austin (TX) | Public Assist | Normal | Dog | Neutered Male | 7 years | Spinone Italiano Mix | Yellow/White |
A191351 | Bri-Bri | 2015-11-13T15:57:00 | 2015-11 | 1912 E William Cannon Rd in Austin (TX) | Stray | Normal | Cat | Intact Female | 16 years | Domestic Longhair Mix | Black/White |
A322813 | Tyson | 2015-03-05T14:49:00 | 2015-03 | Austin (TX) | Public Assist | Normal | Dog | Neutered Male | 11 years | Rottweiler Mix | Black/Brown |
A553074 | Jo Jo | 2016-03-27T00:04:00 | 2016-03 | 3614 Bill Price in Travis (TX) | Public Assist | Normal | Dog | Spayed Female | 7 years | Labrador Retriever/German Shepherd | Brown/Black |
A672744 | Oso | 2014-02-17T17:10:00 | 2014-02 | Travis (TX) | Owner Surrender | Normal | Dog | Intact Male | 3 years | Rottweiler Mix | Black/Tan |
A672744 | Oso | 2014-02-17T17:10:00 | 2014-02 | Austin (TX) | Public Assist | Normal | Dog | Intact Male | 3 years | Rottweiler Mix | Black/Tan |
In the previous example the keywords SELECT
and FROM
are
both capitalized. This is a convention, not a requirement. The SQL parser is
case-insensitive so both upper and lower cases work for keywords. The clauses
are also on different lines–another convention. SQL has a lot of conventions.
You, however, don’t need to worry about them as data.world autocompletes your
keywords and source names for you, and auto-formats the entire query in a very
readable, industry-standard format when you run it–an extremely handy feature
as you’ll see next when you go to enter specific column names.
Okay, we got a report with all of the data in the table by running a SELECT *
query. But what if we didn’t want all of that information? What if we only
wanted to see what type of animal, which sex, what age, and what condition they
were in on intake? That query would look like this:
SELECT animal_type,
sex_upon_intake,
age_upon_intake,
intake_condition
FROM austin_animal_center_intakes
And the first several rows of the results would render as:
animal_type | sex_upon_intake | age_upon_intake | intake_condition |
---|---|---|---|
Dog | Neutered Male | 10 years | Normal |
Dog | Neutered Male | 7 years | Normal |
Cat | Intact Female | 16 years | Normal |
Dog | Neutered Male | 11 years | Normal |
Dog | Spayed Female | 7 years | Normal |
Dog | Intact Male | 3 years | Normal |
Dog | Intact Male | 3 years | Normal |
Notice that the order of the columns in the results match the
order in which the columns were listed–not the order in which they are found
in the original table (the table order is shown in the example above for the
SELECT *
query). This feature is very handy for presenting information in
order of importance for your purposes, which may or may not be the way in which
it was originally captured.
SELECT AS
Another way you can change the presentation of the data–gussy it up, if you
will–is to replace the column names in the dataset with something more
readable or meaningful to your purpose. In the last example, our query returned
data on the type, sex, age and condition of the animals. If we wanted to rename
the columns to match those names in our results we would select the same
columns, but we would use the keyword AS
after each column name followed by
the column name we would like to see. AS
introduces the
column name you would like to see in the results of a query.
To get the column names written the way we want, we would write the query thusly:
SELECT animal_type AS Animal,
sex_upon_intake AS Sex,
age_upon_intake AS Age,
intake_condition AS Condition
FROM austin_animal_center_intakes
And the table we’d get back would look like:
Type | Sex | Age | Condition |
---|---|---|---|
Dog | Neutered Male | 10 years | Normal |
Dog | Neutered Male | 7 years | Normal |
Cat | Intact Female | 16 years | Normal |
Dog | Neutered Male | 11 years | Normal |
Dog | Spayed Female | 7 years | Normal |
Dog | Intact Male | 3 years | Normal |
Dog | Intact Male | 3 years | Normal |
Note that if you use AS
with nonnumeric characters in the replacement column name (spaces, e.g.,) you have to surround the replacement text with backticks. The backtick (`) is the keyboard character on the same key as the tilda (~)–not to be confused with the single quote character.
If instead of replacing the column names with one-word titles we wanted to just take out the underscores, we’d need to write the query using backticks:
SELECT animal_type AS `animal type`,
sex_upon_intake AS `sex upon intake`,
age_upon_intake AS `age upon intake`,
intake_condition AS `intake condition`
FROM austin_animal_center_intakes
Resulting in:
animal type | sex upon intake | age upon intake | intake condition |
---|---|---|---|
Dog | Neutered Male | 10 years | Normal |
Dog | Neutered Male | 7 years | Normal |
Cat | Intact Female | 16 years | Normal |
Dog | Neutered Male | 11 years | Normal |
Dog | Spayed Female | 7 years | Normal |
Dog | Intact Male | 3 years | Normal |
Dog | Intact Male | 3 years | Normal |
Adding capitalization to our changes would yield:
SELECT animal_type AS `Animal Type`,
sex_upon_intake AS `Sex Upon Intake`,
age_upon_intake AS `Age Upon Intake`,
intake_condition AS `Intake Condition`
FROM austin_animal_center_intakes
And the resulting table would look like:
Animal Type | Sex Upon Intake | Age Upon Intake | Intake Condition |
---|---|---|---|
Dog | Neutered Male | 10 years | Normal |
Dog | Neutered Male | 7 years | Normal |
Cat | Intact Female | 16 years | Normal |
Dog | Neutered Male | 11 years | Normal |
Dog | Spayed Female | 7 years | Normal |
Dog | Intact Male | 3 years | Normal |
Dog | Intact Male | 3 years | Normal |
SELECT DISTINCT
The dataset we are using for our examples is very large. There were 54,724
animals taken in by the Austin Texas Animal Center in the almost three years
covered by the dataset. If you wanted to know the different types of animals
that were taken in (were they all dogs and cats, e.g.,) you could scan through
all the data, or you could use the nifty little modifier for the select clause
called DISTINCT
. DISTINCT
is used in a SELECT
clause to return unique combinations of data across all the columns returned by
the query.
The query for distinct animal types would be written like this:
SELECT DISTINCT animal_type
FROM austin_animal_center_intakes
Returning:
animal_type |
---|
Dog |
Cat |
Other |
Bird |
Livestock |
DISTINCT
can be a bit tricky to use, but if you keep in
mind that what is distinct is the combination of data for a single row across
all of the columns displayed, then you are less likely to get tripped
up.
If you wanted to show only the unique animal types, but you want to see the columns for age and sex as well you might construct a query like this:
SELECT DISTINCT animal_type,
sex_upon_intake,
age_upon_intake
FROM austin_animal_center_intakes
It is a legitimate query so you wouldn’t get an error, but the resulting table
would not have distinct values for just animal type. Instead the values
considered to be distinct would be each combination of type, sex, and age. The
DISTINCT
qualifier refers to a unique row of data, not a unique column. This
is what the first few rows of the results would look like:
animal_type | sex_upon_intake | age_upon_intake |
---|---|---|
Dog | Neutered Male | 10 years |
Dog | Neutered Male | 7 years |
Cat | Intact Female | 16 years |
Dog | Neutered Male | 11 years |
Dog | Spayed Female | 7 years |
Dog | Intact Male | 3 years |
Dog | Spayed Female | 2 years |
For this dataset the number of records (rows) returned was reduced to 539 of the original 75,947.
Exercises
There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.
Exercise 1
Write a query against the allergies table that returns all the records.
Hint
SELECT *
(all the records) FROM
the allergies
table.
Exercise 2
Write a query against the allergies table to return the columns start, stop and description.
Hint
SELECT
the columns start,
stop,
and description
renamed AS
condition
FROM
the conditions
table.
Exercise 3
Write a query against the allergies table that shows a list of every allergy with each one only showing up once.
Hint
SELECT
DISTINCT
on the description
column FROM
the allergies
table.
An introduction to the LIMIT
clause.