Joins
An introduction to table joins including the conditions used in them, and their different types.
In order for tables to be joined there has to be a way of linking the data in one to the data in the other called a join condition. Usually this is done by having a column present in both tables that has the same data and linking the records in each table based on the values in that column being equal. There are other ways to link tables based on calculated values or different comparisons, but they are very rare.
The keywords used to specify join conditions are WHERE
, ON
, USING
,
NATURAL
and CROSS
. WHERE
can be used to to create a join between tables
without using the keyword JOIN
, but it can only be used for inner joins.
ON
is used with JOIN
in a join condition to identify which columns is each
table to link and can be used with all types of joins. USING
is also combined
with JOIN
in a join condition, but it requires that the column name be
identical in both joined tables. A NATURAL
join links all columns which have
identical names in the tables being joined. A CROSS
join returns a table of
rows representing each row in the first table combined with each row in the
second table. CROSS
joins should be used with extreme caution as they cause
the size of the data returned to increase exponentially based on the number of
columns in the tables.
Types of Joins
When you join tables together there are different ways to specify which columns
are returned from which tables. An INNER JOIN
returns all the rows for which
data exists and matches for both instances of the specified column. OUTER
JOIN
comes in three flavors. A FULL OUTER JOIN
returns all the data from one
table and all the data from another table linking them by the join condition.
LEFT OUTER JOIN
returns all the data from the first table specified and all
the data from the second table which matches the join condition. RIGHT OUTER
JOIN
returns all the data from the second table specified and all the data
from the first table that matches the join condition. RIGHT OUTER JOIN
can be
avoided altogether by changing the order the tables are presented in the FROM
clause. Join clauses can also be chained. The least common join type is a self
join. A self join is used to link a table to itself when data from one column
is reused in another column and you want to show the relationship between the
columns.
While it’s not strictly necessary to include the word ‘OUTER’
in an OUTER JOIN
clause, it’s good style for clarity and readability.
Inner Join
The simplest join to write uses a WHERE
clause. For example, if we wanted to
see the managers’ names as well as the sales agents’ names by account for deals
which had been won, we could write the query like this with a WHERE
clause:
SELECT sales_teams.manager,
sales_pipeline.sales_agent,
sales_pipeline.account
FROM sales_teams, sales_pipeline
WHERE (sales_pipeline.sales_agent = sales_teams.sales_agent)
AND sales_pipeline.deal_stage = "Won"
The resulting table would look like this:
manager | sales_agent | account |
---|---|---|
Cara Losch | Garret Kinder | Acme Corporation |
Rocco Neubert | Reed Clapper | Acme Corporation |
Rocco Neubert | Donn Cantrell | Acme Corporation |
Summer Sewald | Kary Hendrixson | Condax |
Summer Sewald | Maureen Marcano | Condax |
Dustin Brinkmann | Lajuana Vencill | Condax |
Rocco Neubert | Donn Cantrell | Condax |
Dustin Brinkmann | Cecily Lampkin | Condax |
Celia Rouche | Rosalina Dieter | Condax |
Cara Losch | Garret Kinder | Condax |
Rocco Neubert | Donn Cantrell | Betasoloin |
If you wanted to format your results table and make the query a little more
readable, you could use the AS
keyword to create aliases for column and table
names in joins as shown in the example below:
SELECT sales_teams.manager,
sales_pipeline.sales_agent AS agent,
sales_pipeline.account
FROM sales_teams AS teams, sales_pipeline AS pipeline
WHERE (pipeline.sales_agent = teams.sales_agent)
AND pipeline.deal_stage = "Won"
The first several rows in the result would look like this:
manager | agent | account |
---|---|---|
Cara Losch | Garret Kinder | Acme Corporation |
Rocco Neubert | Reed Clapper | Acme Corporation |
Rocco Neubert | Donn Cantrell | Acme Corporation |
Summer Sewald | Kary Hendrixson | Condax |
Summer Sewald | Maureen Marcano | Condax |
Dustin Brinkmann | Lajuana Vencill | Condax |
Rocco Neubert | Donn Cantrell | Condax |
Dustin Brinkmann | Cecily Lampkin | Condax |
Celia Rouche | Rosalina Dieter | Condax |
Cara Losch | Garret Kinder | Condax |
Rocco Neubert | Donn Cantrell | Betasoloin |
The above example also shows how you can use WHERE
as both
a join condition and as a filter in the same query.
The exact same query can be written with a USING
keyword:
SELECT DISTINCT sales_teams.manager,
sales_pipeline.sales_agent AS agent,
sales_pipeline.account
FROM sales_teams AS teams
JOIN sales_pipeline AS pipeline USING (sales_agent)
WHERE pipeline.deal_stage = "Won"
manager | agent | account |
---|---|---|
Cara Losch | Garret Kinder | Acme Corporation |
Rocco Neubert | Reed Clapper | Acme Corporation |
Rocco Neubert | Donn Cantrell | Acme Corporation |
Summer Sewald | Kary Hendrixson | Condax |
Summer Sewald | Maureen Marcano | Condax |
Dustin Brinkmann | Lajuana Vencill | Condax |
Rocco Neubert | Donn Cantrell | Condax |
Dustin Brinkmann | Cecily Lampkin | Condax |
Celia Rouche | Rosalina Dieter | Condax |
Cara Losch | Garret Kinder | Condax |
Rocco Neubert | Donn Cantrell | Betasoloin |
It can also be written with the keyword ON
:
SELECT DISTINCT sales_teams.manager,
sales_pipeline.sales_agent AS agent,
sales_pipeline.account
FROM sales_teams AS teams
JOIN sales_pipeline AS pipeline ON pipeline.sales_agent = teams.sales_agent
WHERE pipeline.deal_stage = "Won"
manager | agent | account |
---|---|---|
Cara Losch | Garret Kinder | Acme Corporation |
Rocco Neubert | Reed Clapper | Acme Corporation |
Rocco Neubert | Donn Cantrell | Acme Corporation |
Summer Sewald | Kary Hendrixson | Condax |
Summer Sewald | Maureen Marcano | Condax |
Dustin Brinkmann | Lajuana Vencill | Condax |
Rocco Neubert | Donn Cantrell | Condax |
Dustin Brinkmann | Cecily Lampkin | Condax |
Celia Rouche | Rosalina Dieter | Condax |
Cara Losch | Garret Kinder | Condax |
Rocco Neubert | Donn Cantrell | Betasoloin |
The main difference between USING
and ON
in a join
condition is that when SELECT *
is used in a query with a USING
join, the
resulting table will only have one instance of the column specified in the
USING
condition. In SELECT *
with ON
in a join, both instances of the
column specified in the ON
condition will be present.
Taking our previous query and modifying it to return all columns with USING
would look like this:
SELECT *
FROM sales_teams
JOIN sales_pipeline AS pipeline USING (sales_agent)
WHERE pipeline.deal_stage = "Won"
sales_agent | manager | regional_office | status | account | opportunity_id | deal_stage | product | created_date | close_date | close_value |
---|---|---|---|---|---|---|---|---|---|---|
Moses Frase | Dustin Brinkmann | Central | Current | Cancity | 1C1I7A6R | Won | GTX Plus Basic | 2016-10-20 | 2017-03-01 | 1054 |
Darcel Schlecht | Melvin Marxen | Central | Current | Isdom | Z063OYW0 | Won | GTXPro | 2016-10-25 | 2017-03-11 | 4514 |
Anna Snelling | Dustin Brinkmann | Central | Current | Bioholding | NL3JZH1Z | Won | MG Special | 2016-11-04 | 2017-03-10 | 53 |
Violet Mclelland | Cara Losch | East | Current | Donquadtech | 0LVWSWEW | Won | MG Special | 2016-11-25 | 2017-03-06 | 65 |
Cassey Cress | Rocco Neubert | East | Current | Sumace | NHDTSMX2 | Won | GTXPro | 2017-02-22 | 2017-03-02 | 4711 |
Corliss Cosme | Cara Losch | East | Current | Zencorporation | PNL17RE9 | Won | GTXPro | 2017-02-22 | 2017-06-28 | 5133 |
Modifying the same query to return all columns with ON
would look like this:
SELECT *
FROM sales_teams AS teams
JOIN sales_pipeline AS pipeline ON pipeline.sales_agent = teams.sales_agent
WHERE pipeline.deal_stage = "Won"
teams.sales_agent | manager | regional_office | account | opportunity_id | pipeline.sales_agent | deal_stage | product | created_date | close_date | close_value |
---|---|---|---|---|---|---|---|---|---|---|
Daniell Hammack | Rocco Neubert | East | Umbrella Corporation | 8H1VRXAK | Daniell Hammack | Won | GTXBasic | 2017-05-15 | 2017-09-15 | 536.0 |
Anna Snelling | Dustin Brinkmann | Central | Good Burger | H8P6K2DX | Anna Snelling | Won | MGRPFU | 2017-08-18 | 2017-11-10 | 3300.0 |
Jonathan Berthelot | Melvin Marxen | Central | Treequote | GQT2DFWZ | Jonathan Berthelot | Won | MGRPFU | 2017-10-10 | 2017-10-20 | 3892.0 |
Kary Hendrixson | Summer Sewald | West | Scotfind | FBEP7939 | Kary Hendrixson | Won | GTXPro | 2017-08-23 | 2017-11-15 | 5280.0 |
Anna Snelling | Dustin Brinkmann | Central | Globex Corporation | IF8PILJR | Anna Snelling | Won | MGRPFS | 2017-07-27 | 2017-07-29 | 46.0 |
Corliss Cosme | Cara Losch | East | Konex | 1KVGSF08 | Corliss Cosme | Won | GTXPlusPro | 2017-07-14 | 2017-09-23 | 4989.0 |
ON
is the most common keyword used in join clauses as
WHERE
is only for inner joins, and USING
requires the names in each table
to match exactly.
Outer Join
So far we’ve only looked at inner joins–the join which produces an
intersection of the data in the joined tables. There are, however, times when
you might want to return all the rows form one table matched up with values
(where they exist) from another table. For example, if we wanted a list of all
the sales agents with a indication of they have deals in the pipeline, we could
run the following query with a LEFT OUTER JOIN
:
SELECT DISTINCT sales_teams.sales_agent AS agent,
sales_pipeline.deal_stage
FROM sales_teams
LEFT OUTER JOIN sales_pipeline ON sales_teams.sales_agent = sales_pipeline.sales_agent
WHERE sales_pipeline.deal_stage = "In_Progress"
agent | deal_stage |
---|---|
Anna Snelling | In_Progress |
Cecily Lampkin | In_Progress |
Mei-Mei Johns | In_Progress |
Violet Mclelland | In_Progress |
Corliss Cosme | In_Progress |
Rosie Papadopoulos | In_Progress |
Garret Kinder | In_Progress |
A RIGHT OUTER JOIN
does the exact same thing as a LEFT OUTER JOIN
, it just
returns all the values from the second table specified (after the LEFT OUTER
JOIN
expression) instead of from the first table specified (after the FROM
keyword). A RIGHT OUTER JOIN
can be rewritten as a LEFT OUTER JOIN
, and vice
versa so they are not very common. If we did want to use a RIGHT OUTER JOIN
for our previous query, we would write it like this:
SELECT DISTINCT sales_teams.sales_agent AS agent,
sales_pipeline.deal_stage
FROM sales_pipeline
RIGHT OUTER JOIN sales_teams ON sales_pipeline.sales_agent = sales_teams.sales_agent
WHERE sales_pipeline.deal_stage = "In_Progress"
agent | deal_stage |
---|---|
Anna Snelling | In_Progress |
Cecily Lampkin | In_Progress |
Mei-Mei Johns | In_Progress |
Violet Mclelland | In_Progress |
Corliss Cosme | In_Progress |
Rosie Papadopoulos | In_Progress |
Garret Kinder | In_Progress |
A FULL OUTER JOIN
query is written just like a LEFT OUTER JOIN
query or
a RIGHT OUTER JOIN
query and will return all the values from both the first
and second tables specified as they match up together so there will be null
values in columns returned from both tables. In LEFT OUTER JOIN
query
results, only the values returned from the first table in the join condition
can have null values in the joined rows. In RIGHT OUTER JOIN
results, only
the values returned from the second table in the join condition can have null
values in the joined rows.
Self Join
One final join type we will look at is the self join. A self join is useful when a table has the values from one column re-used in another column. The canonical example of this kind of re-use is in an employee table where each employee has a unique id, and this id is also in the column listing the manager for the employee. If you would like a list of all the employees with their managers names, you would need to use a self join to pull that data, and the query would look like this:
SELECT emp.name AS employee,
mgr.name AS manager
FROM employees AS emp
JOIN employees AS mgr ON emp.mgr_id = mgr.emp_id
And the first several rows of the resulting table would look like this:
employee | manager |
---|---|
Anna Snelling | Dustin Brinkmann |
Cecily Lampkin | Dustin Brinkmann |
Mei-Mei Johns | Melvin Marxen |
Violet Mclelland | Cara Losch |
Corliss Cosme | Cara Losch |
Rosie Papadopoulos | Cara Losch |
Garret Kinder | Cara Losch |
Wilburn Farren | Cara Losch |
Elizabeth Anderson | Cara Losch |
Daniell Hammack | Rocco Neubert |
Cassey Cress | Rocco Neubert |
Exercises (Continued from previous section)
There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.
Exercise 19
Write a query joining the patients table to the medications table using the patient column. Use WHERE
in your join condition.. Your results should contain the columns patient, marital, race, ethnicity, and gender from the patients table and description (renamed as medications), and reasondescription (renamed as diagnosis) from the medications table.
Hint
SELECT
the columns patient.patient,
patient.marital,
patient.race,
patient.ethnicity,
patient.gender,
medications.description
(relabeled AS
medication
), and medications.reasondescription
(relabeled AS
diagnosis
) FROM
the tables patients,
and medications
WHERE
the column patients.patient
=
the column medications.patient
.
Exercise 20
Write the same query again, but this time join the tables with USING
.
Hint
SELECT
the columns patient.patient,
patient.marital,
patient.race,
patient.ethnicity,
patient.gender,
medications.description
relabeled AS
medication,
and medications.reasondescription
relabeled AS
diagnosis
FROM
the table patients
, JOIN
the table medications
USING
(patient)
.
Exercise 21
Write the same query again, but this time join the tables using ON
.
Hint
SELECT
the columns patient.patient,
patient.marital,
patient.race,
patient.ethnicity,
patient.gender,
medications.description
relabeled AS
medication,
and medications.reasondescription
relabeled AS
diagnosis
FROM
the table patients
. JOIN
the table medications
ON
the column medications.patient
when it =
patients.patient
.
Exercise 22
Write a query to show a list of all the asian female patients and the procedures they had. (Use LEFT OUTER JOIN
and join on the patient column from the patient and procedure tables.) Include the columns patients.patient, patients.race, patients.ethnicity, patients.gender, procedures.description (relabeled procedure), and procedures.reasondescription (relabeled diagnosis) in your results.
Hint
SELECT
the columns patients.patient,
patients.race,
patients.ethnicity,
patients.gender,
procedures.description
relabeled AS
procedure,
procedures.reasondescription
relabeled AS
diagnosis
FROM
the table patients
. Use a LEFT OUTER JOIN
on the table procedures
ON
the column patients.patient
when it =
procedures.patient
. Only return the records WHERE
the gender
=
”F”
AND
the race
=
“asian”
.
An introduction to aggregations, aggregation functions, and aggregation modifiers.