Another piece of data.world-specific functionality is the ability to join tables which are in different datasets and which might not even have the same owner. Normally querying in SQL is database or dataset-specific. However as more and more datasets come on-line, it’s increasingly important to be able to pull information from combinations of them to present a more complete picture of the data.
In the CRM dataset we have been using, we have a table called accounts which has a small bit of information about our client accounts. We have recently learned that one of the regional offices has a dataset with the id “crm-account-stats” which has the table “external_account” which contains other information about the accounts, but doesn’t have some of the data we have. If we combined these tables, we could run a federated query like the following against them:
SELECT accounts.account, accounts2.sector, accounts2.year_established, accounts2.subsidiary_of, accounts.revenue FROM siyeh.`crm-account-stats`.external_account AS accounts2, accounts WHERE accounts2.account = accounts.account
The most difficult part of using federated queries is to know the exact dataset id, table and column names for the table that’s not in your dataset. Currently you can find the dataset id by opening a page for that dataset from data.world and looking in the url. After data.world comes the owner of the dataset’s name, and after that is the dataset’s name. On the main page for the dataset you can also see a list of all the tables in it as well as all of the column names.
``in the query.
You can achieve implicit federation of tables in different datasets by putting them in your own project. For example if we wanted to see the sales price of the products in all the west office orders, we would need to write a federated query because the west office order table is in one dataset an the product table is in another. However, because we have both datasets in the same project the query can be written like this:
SELECT west.create_date, west.sales_agent, west.product, products.sales_price FROM west_office_orders AS west, products WHERE west.product = products.product
The only reason we used a table alias was to shorten a long table name–we didn’t have to specify the owner of the dataset. The first several rows of the results table would look like:
|2017-12-27||Markita Hansen||GTX Basic||550|
|2017-09-16||Vicki Laflamme||GTX Basic||550|
|2017-12-04||Zane Levy||GTX Basic||550|
|2017-09-14||James Ascencio||GTX Basic||550|
|2017-12-04||Vicki Laflamme||GTX Basic||550|
|2017-03-27||Maureen Marcano||GTX Basic||550|
|2017-12-17||Kary Hendrixson||GTX Basic||550|
|2017-11-03||Vicki Laflamme||GTX Basic||550|
|2017-11-26||Zane Levy||GTX Basic||550|
|2017-11-04||James Ascencio||GTX Basic||550|
An introduction to multi-tables.