Multi-Tables

An introduction to multi-tables.

There are times when the data you would like to access in a query isn’t stored in one table so it would be nice to be able to combine multiple tables into one in order to run queries against all the data in both at the same time. As we saw in the intermediate section of these docs, this kind of table combination can be accomplished with UNION, but there are instances in which using UNION is like using a sledgehammer to drive a nail. Enter data.world multi-tables. UNION combines queries; multi-tables combine tables. With multi-tables you can easily combine tables if they have the same columns and then run queries against the resulting table. With UNION, queries can be run against the individual tables before they are joined into one table by the UNION. With multi-tables, the tables are combined first and then can be used in queries just like any other table.

In our sample dataset there is one table for the sales pipeline, but there are many tables for orders: one for each month to be precise. To see all the order data for a quarter you could run the following query using multi-tables:

SELECT *
  FROM [apr_2017_orders, may_2017_orders, jun_2017_orders]
 ORDER BY order_value DESC

The [] are unique to multi-tables, and are used to indicate that what comes in between them is a comma-separated list.

The first several rows of the resulting table would look like:

sales_agent account product order_value create_date
Rosalina Dieter Groovestreet GTK 500 30,288 2017-06-07
Markita Hansen Goodsilron GTK 500 29,617 2017-06-27
Elease Gluck Labdrill GTK 500 27,385 2017-06-18
Elease Gluck Cheers GTK 500 25,464 2017-06-30
Rosalina Dieter Y-corporation GTK 500 25,288 2017-04-05
Elease Gluck Plexzap GTK 500 24,949 2017-06-28
Elease Gluck Lexiqvolax GTK 500 23,746 2017-04-20

If the tables being selected for the query also have similar names (as the ones in the above query do) then you can use a standard Java regular expression to select which tables to included based on pattern-matching their names. For example, to select from all tables in the dataset that contain orders from 2017, the query would be written thusly:

SELECT *
  FROM [/.*2017_orders/]
 ORDER BY order_value DESC

In a standard Java regular expression the . stands as a wildcard for any one character, and the * means to repeat whatever came before it any number of times. In a .* regular expression, the Java single wildcard character is repeated, effectively making the .* regular expression operate the same way as the * wildcard does elsewhere in SQL.

The first several rows of the results from the query would be:

sales_agent account product order_value create_date
Rosalina Dieter Groovestreet GTK 500 30,288 2017-06-07
Rosalina Dieter Groovestreet GTK 500 30,288 2017-06-07
Markita Hansen Goodsilron GTK 500 29,617 2017-06-27
Markita Hansen Goodsilron GTK 500 29,617 2017-06-27
Elease Gluck Xx-holding GTK 500 29,220 2017-12-04
Elease Gluck Xx-holding GTK 500 29,220 2017-12-04
Markita Hansen Cheers GTK 500 29,166 2017-09-26

So far we have used a SELECT * clause to pick the columns returned from each of the tables. This syntax only works if all the columns have identical names. However tables grow and change with time, and you might find yourself wanting to compare tables that have some columns which have the same names and formats, and some that don’t match up. The only columns that can be returned in a multi-table query must match. If you want to include matching and non-matching columns in your query, you’ll need to use UNION. If there are matching and non-matching columns, but you only want to query on the ones that match you can specify them by name as in the following query:

SELECT sales_agent,
       account,
       order_value
  FROM [/.*2017_orders/](sales_agent, account, order_value)
 ORDER BY order_value DESC
sales_agent account order_value
Rosalina Dieter Groovestreet 30,288
Markita Hansen Goodsilron 29,617
Elease Gluck Xx-holding 29,220
Markita Hansen Cheers 29,166
Elease Gluck Cheers 27,971
Elease Gluck Labdrill 27,385

Multi-tables have one more useful, unique function, and it’s that you can add a column to your results which contains the name of the table from which each row was selected. This column is called table_name.

SELECT sales_agent,
       account,
       order_value,
       table_name
  FROM [/.*2017_orders/] AS orders
 ORDER BY order_value DESC
sales_agent account order_value table_name
Rosalina Dieter Groovestreet 30,288 jun_2017_orders
Markita Hansen Goodsilron 29,617 jun_2017_orders
Elease Gluck Xx-holding 29,220 dec_2017_orders
Markita Hansen Cheers 29,166 sep_2017_orders
Elease Gluck Cheers 27,971 oct_2017_orders
Elease Gluck Labdrill 27,385 jun_2017_orders
Rosalina Dieter Xx-holding 26,186 nov_2017_orders