Data is just information. Data for a particular entity or user can be stored in a container called a relational database. data.world doesn’t use databases. Instead, we use datasets. A dataset is a snapshot of all the information in a database at a given moment in time.
The data in a dataset is further segmented into structures called tables. A table contains information that goes together. For example, all of the people in an address book could go in a table called Contacts. In this table there would be places to put the names, addresses, cities, states, zip codes, phone numbers, etc., for all the people in the address book. The table is organized into columns and rows. A column has, e.g., all the city data or all the name data for every entry in the table, while a row is a single entry that has all the different data for one person.
|Nathanial Hawthorne||27 Union St.||Salem||MA||01970||978-555-4244|
|Lizzie Borden||92 Second St.||Fall River||MA||02721||508-555-5282|
Some tables in a dataset might share data. For instance you might keep a list of Christmas cards written to and received from your friends. This table would probably have the same names in it that your Contacts table did and you could use these tables together. Other tables in your dataset might have nothing to do with any other tables. You might have a table for your Pokemon card collection, e.g., and have all your cards organized in it. Unless you shared your cards with your friends, there would probably be no reason to use your Contacts table with your Pokemon table.
In addition to datasets, data.world also has a projects resource. Projects can contain multiple datasets and give you the ability to use them together in new and more powerful ways than a standard dataset would.
Now that you know where the data is stored you need to know how to get it out. By far the most common language used to retrieve data from a dataset is SQL. If you want to use it to get information from a computer you have to phrase your request in a specific way called a query. Queries are broken up into smaller pieces called clauses. Each clause contains one or more words called keywords. A clause is written in a specific order called its syntax. It isn’t enough just to use the right words in your query, you also have to use the correct syntax for your clauses so that the computer will be able to process the query. For example, when you want to get a list of all the names in your address book you can’t just type a sentence like:
Get all the names out of my address book
Instead, using the example of the table above, you have to use the word SELECT for get, the exact name of the column where the names are kept, and FROM with the exact name of the table that holds your contact list, giving you a query with a SELECT clause and a WHERE clause:
SELECT Name FROM contact_list
For the rest of this section we’ll delve into the meaning of all the basic keywords and where they fit into the syntax of a query with accompanying examples and exercises from real data.world datasets.
An introduction to the
FROM clauses and their related keywords