Multi-Tables
An introduction to multi-tables.
There are times when the data you would like to access in a query isn’t stored
in one table so it would be nice to be able to combine multiple tables into one
in order to run queries against all the data in both at the same time. As we
saw in the intermediate section of these docs, this kind of table combination
can be accomplished with UNION
, but there are instances in which using
UNION
is like using a sledgehammer to drive a nail. Enter data.world
multi-tables. UNION
combines queries; multi-tables combine tables. With
multi-tables you can easily combine tables if they have the same columns and
then run queries against the resulting table. With UNION
, queries can be run
against the individual tables before they are joined into one table by the
UNION
. With multi-tables, the tables are combined first and then can be used
in queries just like any other table.
In our sample dataset there is one table for the sales pipeline, but there are many tables for orders: one for each month to be precise. To see all the order data for a quarter you could run the following query using multi-tables:
SELECT *
FROM [apr_2017_orders, may_2017_orders, jun_2017_orders]
ORDER BY order_value DESC
The []
are unique to multi-tables, and are used to indicate
that what comes in between them is a comma-separated list.
The first several rows of the resulting table would look like:
sales_agent | account | product | order_value | create_date |
---|---|---|---|---|
Rosalina Dieter | Groovestreet | GTK 500 | 30,288 | 2017-06-07 |
Markita Hansen | Goodsilron | GTK 500 | 29,617 | 2017-06-27 |
Elease Gluck | Labdrill | GTK 500 | 27,385 | 2017-06-18 |
Elease Gluck | Cheers | GTK 500 | 25,464 | 2017-06-30 |
Rosalina Dieter | Y-corporation | GTK 500 | 25,288 | 2017-04-05 |
Elease Gluck | Plexzap | GTK 500 | 24,949 | 2017-06-28 |
Elease Gluck | Lexiqvolax | GTK 500 | 23,746 | 2017-04-20 |
If the tables being selected for the query also have similar names (as the ones in the above query do) then you can use a standard Java regular expression to select which tables to included based on pattern-matching their names. For example, to select from all tables in the dataset that contain orders from 2017, the query would be written thusly:
SELECT *
FROM [/.*2017_orders/]
ORDER BY order_value DESC
In a standard Java regular expression the .
stands as a wildcard for any
one character, and the *
means to repeat whatever came before it any number
of times. In a .*
regular expression, the Java single wildcard character is
repeated, effectively making the .*
regular expression operate the same way
as the *
wildcard does elsewhere in SQL.
The first several rows of the results from the query would be:
sales_agent | account | product | order_value | create_date |
---|---|---|---|---|
Rosalina Dieter | Groovestreet | GTK 500 | 30,288 | 2017-06-07 |
Rosalina Dieter | Groovestreet | GTK 500 | 30,288 | 2017-06-07 |
Markita Hansen | Goodsilron | GTK 500 | 29,617 | 2017-06-27 |
Markita Hansen | Goodsilron | GTK 500 | 29,617 | 2017-06-27 |
Elease Gluck | Xx-holding | GTK 500 | 29,220 | 2017-12-04 |
Elease Gluck | Xx-holding | GTK 500 | 29,220 | 2017-12-04 |
Markita Hansen | Cheers | GTK 500 | 29,166 | 2017-09-26 |
So far we have used a SELECT *
clause to pick the columns returned from each
of the tables. This syntax only works if all the columns have identical names.
However tables grow and change with time, and you might find yourself wanting
to compare tables that have some columns which have the same names and formats,
and some that don’t match up. The only columns that can be returned in
a multi-table query must match. If you want to include matching and
non-matching columns in your query, you’ll need to use UNION
. If there are
matching and non-matching columns, but you only want to query on the ones that
match you can specify them by name as in the following query:
SELECT sales_agent,
account,
order_value
FROM [/.*2017_orders/](sales_agent, account, order_value)
ORDER BY order_value DESC
sales_agent | account | order_value |
---|---|---|
Rosalina Dieter | Groovestreet | 30,288 |
Markita Hansen | Goodsilron | 29,617 |
Elease Gluck | Xx-holding | 29,220 |
Markita Hansen | Cheers | 29,166 |
Elease Gluck | Cheers | 27,971 |
Elease Gluck | Labdrill | 27,385 |
Multi-tables have one more useful, unique function, and it’s that you can add
a column to your results which contains the name of the table from which each
row was selected. This column is called table_name
.
SELECT sales_agent,
account,
order_value,
table_name
FROM [/.*2017_orders/] AS orders
ORDER BY order_value DESC
sales_agent | account | order_value | table_name |
---|---|---|---|
Rosalina Dieter | Groovestreet | 30,288 | jun_2017_orders |
Markita Hansen | Goodsilron | 29,617 | jun_2017_orders |
Elease Gluck | Xx-holding | 29,220 | dec_2017_orders |
Markita Hansen | Cheers | 29,166 | sep_2017_orders |
Elease Gluck | Cheers | 27,971 | oct_2017_orders |
Elease Gluck | Labdrill | 27,385 | jun_2017_orders |
Rosalina Dieter | Xx-holding | 26,186 | nov_2017_orders |
An introduction to federated queries.