Joins

An introduction to table joins including the conditions used in them, and their different types.

In order for tables to be joined there has to be a way of linking the data in one to the data in the other called a join condition. Usually this is done by having a column present in both tables that has the same data and linking the records in each table based on the values in that column being equal. There are other ways to link tables based on calculated values or different comparisons, but they are very rare.

The keywords used to specify join conditions are WHERE, ON, USING, NATURAL and CROSS. WHERE can be used to to create a join between tables without using the keyword JOIN, but it can only be used for inner joins. ON is used with JOIN in a join condition to identify which columns is each table to link and can be used with all types of joins. USING is also combined with JOIN in a join condition, but it requires that the column name be identical in both joined tables. A NATURAL join links all columns which have identical names in the tables being joined. A CROSS join returns a table of rows representing each row in the first table combined with each row in the second table. CROSS joins should be used with extreme caution as they cause the size of the data returned to increase exponentially based on the number of columns in the tables.

Types of Joins

When you join tables together there are different ways to specify which columns are returned from which tables. An INNER JOIN returns all the rows for which data exists and matches for both instances of the specified column. OUTER JOIN comes in three flavors. A FULL OUTER JOIN returns all the data from one table and all the data from another table linking them by the join condition. LEFT OUTER JOIN returns all the data from the first table specified and all the data from the second table which matches the join condition. RIGHT OUTER JOIN returns all the data from the second table specified and all the data from the first table that matches the join condition. RIGHT OUTER JOIN can be avoided altogether by changing the order the tables are presented in the FROM clause. Join clauses can also be chained. The least common join type is a self join. A self join is used to link a table to itself when data from one column is reused in another column and you want to show the relationship between the columns.

While it’s not strictly necessary to include the word ‘OUTER’ in an OUTER JOIN clause, it’s good style for clarity and readability.

Inner Join

The simplest join to write uses a WHERE clause. For example, if we wanted to see the managers’ names as well as the sales agents’ names by account for deals which had been won, we could write the query like this with a WHERE clause:

SELECT sales_teams.manager,
       sales_pipeline.sales_agent,
       sales_pipeline.account
  FROM sales_teams, sales_pipeline
 WHERE (sales_pipeline.sales_agent = sales_teams.sales_agent)
       AND sales_pipeline.deal_stage = "Won"

The resulting table would look like this:

manager sales_agent account
Cara Losch Garret Kinder Acme Corporation
Rocco Neubert Reed Clapper Acme Corporation
Rocco Neubert Donn Cantrell Acme Corporation
Summer Sewald Kary Hendrixson Condax
Summer Sewald Maureen Marcano Condax
Dustin Brinkmann Lajuana Vencill Condax
Rocco Neubert Donn Cantrell Condax
Dustin Brinkmann Cecily Lampkin Condax
Celia Rouche Rosalina Dieter Condax
Cara Losch Garret Kinder Condax
Rocco Neubert Donn Cantrell Betasoloin

If you wanted to format your results table and make the query a little more readable, you could use the AS keyword to create aliases for column and table names in joins as shown in the example below:

SELECT sales_teams.manager,
       sales_pipeline.sales_agent AS agent,
       sales_pipeline.account
  FROM sales_teams AS teams, sales_pipeline AS pipeline
 WHERE (pipeline.sales_agent = teams.sales_agent)
       AND pipeline.deal_stage = "Won"

The first several rows in the result would look like this:

manager agent account
Cara Losch Garret Kinder Acme Corporation
Rocco Neubert Reed Clapper Acme Corporation
Rocco Neubert Donn Cantrell Acme Corporation
Summer Sewald Kary Hendrixson Condax
Summer Sewald Maureen Marcano Condax
Dustin Brinkmann Lajuana Vencill Condax
Rocco Neubert Donn Cantrell Condax
Dustin Brinkmann Cecily Lampkin Condax
Celia Rouche Rosalina Dieter Condax
Cara Losch Garret Kinder Condax
Rocco Neubert Donn Cantrell Betasoloin

The above example also shows how you can use WHERE as both a join condition and as a filter in the same query.

The exact same query can be written with a USING keyword:

SELECT DISTINCT sales_teams.manager,
       sales_pipeline.sales_agent AS agent,
       sales_pipeline.account
  FROM sales_teams AS teams
       JOIN sales_pipeline AS pipeline USING (sales_agent)
 WHERE pipeline.deal_stage = "Won"
manager agent account
Cara Losch Garret Kinder Acme Corporation
Rocco Neubert Reed Clapper Acme Corporation
Rocco Neubert Donn Cantrell Acme Corporation
Summer Sewald Kary Hendrixson Condax
Summer Sewald Maureen Marcano Condax
Dustin Brinkmann Lajuana Vencill Condax
Rocco Neubert Donn Cantrell Condax
Dustin Brinkmann Cecily Lampkin Condax
Celia Rouche Rosalina Dieter Condax
Cara Losch Garret Kinder Condax
Rocco Neubert Donn Cantrell Betasoloin

It can also be written with the keyword ON:

SELECT DISTINCT sales_teams.manager,
       sales_pipeline.sales_agent AS agent,
       sales_pipeline.account
  FROM sales_teams AS teams
       JOIN sales_pipeline AS pipeline ON pipeline.sales_agent = teams.sales_agent
 WHERE pipeline.deal_stage = "Won"
manager agent account
Cara Losch Garret Kinder Acme Corporation
Rocco Neubert Reed Clapper Acme Corporation
Rocco Neubert Donn Cantrell Acme Corporation
Summer Sewald Kary Hendrixson Condax
Summer Sewald Maureen Marcano Condax
Dustin Brinkmann Lajuana Vencill Condax
Rocco Neubert Donn Cantrell Condax
Dustin Brinkmann Cecily Lampkin Condax
Celia Rouche Rosalina Dieter Condax
Cara Losch Garret Kinder Condax
Rocco Neubert Donn Cantrell Betasoloin

The main difference between USING and ON in a join condition is that when SELECT * is used in a query with a USING join, the resulting table will only have one instance of the column specified in the USING condition. In SELECT * with ON in a join, both instances of the column specified in the ON condition will be present.

Taking our previous query and modifying it to return all columns with USING would look like this:

SELECT *
  FROM sales_teams
       JOIN sales_pipeline AS pipeline USING (sales_agent)
 WHERE pipeline.deal_stage = "Won"
sales_agent manager regional_office status account opportunity_id deal_stage product created_date close_date close_value
Moses Frase Dustin Brinkmann Central Current Cancity 1C1I7A6R Won GTX Plus Basic 2016-10-20 2017-03-01 1054
Darcel Schlecht Melvin Marxen Central Current Isdom Z063OYW0 Won GTXPro 2016-10-25 2017-03-11 4514
Anna Snelling Dustin Brinkmann Central Current Bioholding NL3JZH1Z Won MG Special 2016-11-04 2017-03-10 53
Violet Mclelland Cara Losch East Current Donquadtech 0LVWSWEW Won MG Special 2016-11-25 2017-03-06 65
Cassey Cress Rocco Neubert East Current Sumace NHDTSMX2 Won GTXPro 2017-02-22 2017-03-02 4711
Corliss Cosme Cara Losch East Current Zencorporation PNL17RE9 Won GTXPro 2017-02-22 2017-06-28 5133

Modifying the same query to return all columns with ON would look like this:

SELECT *
  FROM sales_teams AS teams
       JOIN sales_pipeline AS pipeline ON pipeline.sales_agent = teams.sales_agent
 WHERE pipeline.deal_stage = "Won"
teams.sales_agent manager regional_office account opportunity_id pipeline.sales_agent deal_stage product created_date close_date close_value
Daniell Hammack Rocco Neubert East Umbrella Corporation 8H1VRXAK Daniell Hammack Won GTXBasic 2017-05-15 2017-09-15 536.0
Anna Snelling Dustin Brinkmann Central Good Burger H8P6K2DX Anna Snelling Won MGRPFU 2017-08-18 2017-11-10 3300.0
Jonathan Berthelot Melvin Marxen Central Treequote GQT2DFWZ Jonathan Berthelot Won MGRPFU 2017-10-10 2017-10-20 3892.0
Kary Hendrixson Summer Sewald West Scotfind FBEP7939 Kary Hendrixson Won GTXPro 2017-08-23 2017-11-15 5280.0
Anna Snelling Dustin Brinkmann Central Globex Corporation IF8PILJR Anna Snelling Won MGRPFS 2017-07-27 2017-07-29 46.0
Corliss Cosme Cara Losch East Konex 1KVGSF08 Corliss Cosme Won GTXPlusPro 2017-07-14 2017-09-23 4989.0

ON is the most common keyword used in join clauses as WHERE is only for inner joins, and USING requires the names in each table to match exactly.

Outer Join

So far we’ve only looked at inner joins–the join which produces an intersection of the data in the joined tables. There are, however, times when you might want to return all the rows form one table matched up with values (where they exist) from another table. For example, if we wanted a list of all the sales agents with a indication of they have deals in the pipeline, we could run the following query with a LEFT OUTER JOIN:

SELECT DISTINCT sales_teams.sales_agent AS agent,
       sales_pipeline.deal_stage
  FROM sales_teams
       LEFT OUTER JOIN sales_pipeline ON sales_teams.sales_agent = sales_pipeline.sales_agent
 WHERE sales_pipeline.deal_stage = "In_Progress"
agent deal_stage
Anna Snelling In_Progress
Cecily Lampkin In_Progress
Mei-Mei Johns In_Progress
Violet Mclelland In_Progress
Corliss Cosme In_Progress
Rosie Papadopoulos In_Progress
Garret Kinder In_Progress

A RIGHT OUTER JOIN does the exact same thing as a LEFT OUTER JOIN, it just returns all the values from the second table specified (after the LEFT OUTER JOIN expression) instead of from the first table specified (after the FROM keyword). A RIGHT OUTER JOINcan be rewritten as a LEFT OUTER JOIN, and vice versa so they are not very common. If we did want to use a RIGHT OUTER JOIN for our previous query, we would write it like this:

SELECT DISTINCT sales_teams.sales_agent AS agent,
       sales_pipeline.deal_stage
  FROM sales_pipeline
       RIGHT OUTER JOIN sales_teams ON sales_pipeline.sales_agent = sales_teams.sales_agent
 WHERE sales_pipeline.deal_stage = "In_Progress"
agent deal_stage
Anna Snelling In_Progress
Cecily Lampkin In_Progress
Mei-Mei Johns In_Progress
Violet Mclelland In_Progress
Corliss Cosme In_Progress
Rosie Papadopoulos In_Progress
Garret Kinder In_Progress

A FULL OUTER JOIN query is written just like a LEFT OUTER JOIN query or a RIGHT OUTER JOIN query and will return all the values from both the first and second tables specified as they match up together so there will be null values in columns returned from both tables. In LEFT OUTER JOIN query results, only the values returned from the first table in the join condition can have null values in the joined rows. In RIGHT OUTER JOIN results, only the values returned from the second table in the join condition can have null values in the joined rows.

Self Join

One final join type we will look at is the self join. A self join is useful when a table has the values from one column re-used in another column. The canonical example of this kind of re-use is in an employee table where each employee has a unique id, and this id is also in the column listing the manager for the employee. If you would like a list of all the employees with their managers names, you would need to use a self join to pull that data, and the query would look like this:

SELECT emp.name AS employee,
       mgr.name AS manager
  FROM employees AS emp
       JOIN employees AS mgr ON emp.mgr_id = mgr.emp_id

And the first several rows of the resulting table would look like this:

employee manager
Anna Snelling Dustin Brinkmann
Cecily Lampkin Dustin Brinkmann
Mei-Mei Johns Melvin Marxen
Violet Mclelland Cara Losch
Corliss Cosme Cara Losch
Rosie Papadopoulos Cara Losch
Garret Kinder Cara Losch
Wilburn Farren Cara Losch
Elizabeth Anderson Cara Losch
Daniell Hammack Rocco Neubert
Cassey Cress Rocco Neubert

Exercises (Continued from previous section)

There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.

Exercise 19

Write a query joining the patients table to the medications table using the patient column. Use WHERE in your join condition.. Your results should contain the columns patient, marital, race, ethnicity, and gender from the patients table and description (renamed as medications), and reasondescription (renamed as diagnosis) from the medications table.

Try query

Hint

SELECT the columns patient.patient, patient.marital, patient.race, patient.ethnicity, patient.gender, medications.description (relabeled AS medication), and medications.reasondescription (relabeled AS diagnosis) FROM the tables patients, and medications WHERE the column patients.patient = the column medications.patient.

See solution

Exercise 20

Write the same query again, but this time join the tables with USING.

Try query

Hint

SELECT the columns patient.patient, patient.marital, patient.race, patient.ethnicity, patient.gender, medications.description relabeled AS medication, and medications.reasondescription relabeled AS diagnosis FROM the table patients, JOIN the table medications USING (patient).

See solution

Exercise 21

Write the same query again, but this time join the tables using ON.

Try query

Hint

SELECT the columns patient.patient, patient.marital, patient.race, patient.ethnicity, patient.gender, medications.description relabeled AS medication, and medications.reasondescription relabeled AS diagnosis FROM the table patients. JOIN the table medications ON the column medications.patient when it = patients.patient.

See solution

Exercise 22

Write a query to show a list of all the asian female patients and the procedures they had. (Use LEFT OUTER JOIN and join on the patient column from the patient and procedure tables.) Include the columns patients.patient, patients.race, patients.ethnicity, patients.gender, procedures.description (relabeled procedure), and procedures.reasondescription (relabeled diagnosis) in your results.

Try query

Hint

SELECT the columns patients.patient, patients.race, patients.ethnicity, patients.gender, procedures.description relabeled AS procedure, procedures.reasondescription relabeled AS diagnosis FROM the table patients. Use a LEFT OUTER JOIN on the table procedures ON the column patients.patient when it = procedures.patient. Only return the records WHERE the gender = ”F” AND the race = “asian”.

See solution