Objective

Learn to know the different type of joins and how to use them working with mysql or mariadb databases

Requirements

  • No particular requirements

Difficulty

EASY

Conventions

  • # - requires given linux command to be executed with root privileges either directly as a root user or by use of sudo command
  • $ - given linux command to be executed as a regular non-privileged user

Introduction

In a relational database system the data is organized in tables, composed by rows and columns. Each row is an instance of the entity represented by the table, with the columns used as its properties. Relationships between tables are established by the use of foreign keys, and the statement with which we can perform queries that span over multiple tables it's called a join. In this tutorial we will see the different type of joins available when using MySQL or MariaDB.

The "movie_store" database

What are we going to do in this tutorial, is to reproduce some concrete cases in which joins can help us accomplish what we want.

The first thing to do is to create a test database. Let's say we own a movie store and we need to keep track of the titles we have available: we are going to create a "movie_store" database and a table to host information about the movie directors:

MariaDB [(none)]> CREATE DATABASE movie_store;
MariaDB [(none)]> USE movie_store;
MariaDB [movie_store]> CREATE TABLE director(
    -> id SMALLINT(1) UNSIGNED NOT NULL AUTO_INCREMENT,
    -> name VARCHAR(35) NOT NULL,
    -> birthdate DATE NOT NULL,
    -> PRIMARY KEY(id));
Here is the visual representation of the table we just created:

MariaDB [movies]> DESCRIBE director;
+-----------+----------------------+------+-----+---------+----------------+
| Field     | Type                 | Null | Key | Default | Extra          |
+-----------+----------------------+------+-----+---------+----------------+
| id        | smallint(1) unsigned | NO   | PRI | NULL    | auto_increment |
| name      | varchar(35)          | NO   |     | NULL    |                |
| birthdate | date                 | NO   |     | NULL    |                |
+-----------+----------------------+------+-----+---------+----------------+


First we created the database movie_store, than we "entered" it by using the USE statement, and finally created the director table. As we said before, each row in a table represents an "instance" of the entity represented by the table itself, in this case a movie director.

Each director has some properties which are represented by the table columns, so for example, each director has a name and a birthday. Each row has an unique identifier, which is the value in the column that is the primary key of the table.

In this example the primary key is also what is called a surrogate key. This type of key is an "artificial" identifier, in the sense that it is not related to the nature of the entity (a directory in this case): It has not semantic meaning, and it is generated and used by the system for its own internal working. The key is generated automatically, and since it has the AUTO_INCREMENT property, it's incrementally inserted each time we create a new row, so we don't need to insert it explicitly:

MariaDB [movie_store]> INSERT INTO director(`name`, `birthdate`) VALUES
    -> ('George Lucas', '1944-05-14'),
    -> ('George Romero', '1940-02-04'),
    -> ('John McTiernan', '1951-01-08'),
    -> ('Rian Johnson', '1973-12-17');
Our table now contains four directors:

+----+----------------+------------+
| id | name           | birthdate  |
+----+----------------+------------+
|  1 | George Lucas   | 1944-05-14 |
|  2 | George Romero  | 1940-02-04 |
|  3 | John McTiernan | 1951-01-08 |
|  4 | Rian Johnson   | 1973-12-17 |
+----+----------------+------------+
Each of those directors has one or more movies associated with him: how could we represent them ? We cannot add information about the movies in this table: this would mean having a lot of repeated data: each time we add a movie, we would repeat its director information, and this would be horrible to say the least. We need to create a dedicated table to host movies information, and at the same time, we need to be able to create a reference between it and its director. That's what foreign keys are for:



MariaDB [movie_store]> CREATE TABLE title(
    -> id SMALLINT(1) UNSIGNED NOT NULL AUTO_INCREMENT,
    -> name VARCHAR(35) NOT NULL,
    -> release_date DATE NOT NULL,
    -> genre VARCHAR(10) NOT NULL,
    -> director_id SMALLINT(1) UNSIGNED NOT NULL,
    -> PRIMARY KEY(id),
    -> FOREIGN KEY(director_id) REFERENCES director(id));
We created the table just as before, defining a primary key, and adding a foreign key constraint. This is how we enable a relationship between two tables: basically we are imposing that for a row to be inserted, the value of the director_id column must correspond to a value in the id column of the director table (which is unique, since it's the table primary key). In other words, each title must have a reference to an existing director in our database, otherwise an error will be triggered: this ensures consistency.

Let's insert some titles in our table:

MariaDB [movie_store]> INSERT INTO title (`name`, `release_date`, `genre`, `director_id`) VALUES
    -> ('Night of the Living Dead', '1968-10-01', 'horror', 2),
    -> ('Revenge of the Sith', '2005-05-19', 'space opera', 1),
    -> ('Die Hard', '1988-07-15', 'action', 3);
That's it, we have some title. First we inserted that masterpiece of a movie that is 'Night of the Living Dead', directed by George Romero: observe that the 2 in the director_id column corresponds to the id of George Romero in the director table.

Using the same principle we inserted a movie from George Lucas (id 1 in director table), 'Revenge of the Sith', and 'Die Hard', a famous action movie directed by John McTiernan (id 3 in director table). At the moment we have no movies from Rian Johnson: there is a reason for this (apart from the fact that I was disappointed by The Last Jedi), and we will see it later on. Now that we setup a very basic database structure, we can start talking about joins.

How many types of join?

Different names are used to reference the same type of joins, but basically we have inner and outer joins. The former are also called crossed joins or simply joins (they are synonyms in MySQL - MariaDB). The latter category includes left and right joins.


Inner joins

An inner join let us match rows in one table with rows in another one. This association can be based on the relationship between the two tables or can be made regardless of it: in this case all rows of a table will be joined with all rows of the other one, producing what it's called a Cartesian product. This has not much sense in our example, but lets demonstrate it:

MariaDB [movie_store]> SELECT * FROM director JOIN title;
+----+----------------+------------+----+--------------------------+--------------+------------+-------------+
| id | name           | birthdate  | id | name                     | release_date | genre      | director_id |
+----+----------------+------------+----+--------------------------+--------------+------------+-------------+
|  1 | George Lucas   | 1944-05-14 |  1 | Night of the Living Dead | 1968-10-01   | horror     |           2 |
|  1 | George Lucas   | 1944-05-14 |  2 | Revenge of the Sith      | 2005-05-19   | space oper |           1 |
|  1 | George Lucas   | 1944-05-14 |  3 | Die Hard                 | 1988-07-15   | action     |           3 |
|  2 | George Romero  | 1940-02-04 |  1 | Night of the Living Dead | 1968-10-01   | horror     |           2 |
|  2 | George Romero  | 1940-02-04 |  2 | Revenge of the Sith      | 2005-05-19   | space oper |           1 |
|  2 | George Romero  | 1940-02-04 |  3 | Die Hard                 | 1988-07-15   | action     |           3 |
|  3 | John McTiernan | 1951-01-08 |  1 | Night of the Living Dead | 1968-10-01   | horror     |           2 |
|  3 | John McTiernan | 1951-01-08 |  2 | Revenge of the Sith      | 2005-05-19   | space oper |           1 |
|  3 | John McTiernan | 1951-01-08 |  3 | Die Hard                 | 1988-07-15   | action     |           3 |
|  4 | Rian Johnson   | 1973-12-17 |  1 | Night of the Living Dead | 1968-10-01   | horror     |           2 |
|  4 | Rian Johnson   | 1973-12-17 |  2 | Revenge of the Sith      | 2005-05-19   | space oper |           1 |
|  4 | Rian Johnson   | 1973-12-17 |  3 | Die Hard                 | 1988-07-15   | action     |           3 |
+----+----------------+------------+----+--------------------------+--------------+------------+-------------+
As you can see, each row of one table has been combined with each row of the other, producing 12 rows.

Let's now see a different use case for a join. Say we want to inspect our database to check all the movies directed by George Lucas we have in store. To accomplish this task we must restrict the join with a ON clause, so that it will be based on the relationship between titles and their director:

MariaDB [movie_store]> SELECT director.name, title.name AS movie_title FROM director
    -> JOIN title ON director.id = title.director_id
    -> WHERE director.name = "George Lucas"
Here is the result of the query above:

+--------------+---------------------+
| name         | movie_title         |
+--------------+---------------------+
| George Lucas | Revenge of the Sith |
+--------------+---------------------+
Using a restricted join, based on the relationship between the two tables, we discovered that we have only one title of George Lucas in store: Revenge of the Sith. Not only we restricted the join on the base of the relationship existing between the two tables, but we further restricted the query to the movies directed by Lucas, using the WHERE statement. If we had omitted it, the query would have produced a table with all the existing director - movie correspondence:

+----------------+--------------------------+
| name           | movie_title              |
+----------------+--------------------------+
| George Lucas   | Revenge of the Sith      |
| George Romero  | Night of the Living Dead |
| John McTiernan | Die Hard                 |
+----------------+--------------------------+
Notice that Rian Johnson is not included in the query. Why this happens? This is a characteristic of the inner joins: they show only rows where a match exists in both tables. Since no correspondence for Rian Johnson exists in the title table, we have no results for this director.


Outer joins

The other type of joins we have are the outer joins. This category is itself divided in left joins and right joins. What is the difference with the inner joins we saw above ? Contrarily to what happens with an inner join, an outer join shows matches even when a correspondence does not exists in both tables. When it's the case, it will show a null value in the requested column(s) of the table where the match doesn't exists.This can be useful, for example, if we want to know if there are some directors associated with no movies. In our case we already know it's the case, but lets verify it using a left join:

MariaDB [movie_store]> SELECT director.name, title.name AS movie_title
    -> FROM director LEFT JOIN title ON title.director_id = director.id
The result of the query:

+----------------+--------------------------+
| name           | movie_title              |
+----------------+--------------------------+
| George Romero  | Night of the Living Dead |
| George Lucas   | Revenge of the Sith      |
| John McTiernan | Die Hard                 |
| Rian Johnson   | NULL                     |
+----------------+--------------------------+
The only director which has no movies in our store is Rian Johnson. When using an outer join the order in which we specify the tables is important. For example, using a LEFT JOIN, as we just did above, when the row from the left table (in this case director) has no match in the rows of the right table (title), a NULL value is specified in each requested column of the latter; when a match is found, instead, it's value is displayed just as happens with an inner join.

A RIGHT JOIN works just the same, the only difference is that the role of the tables are inverted. In the right join all row of the right table which has no match in the left table are marked with a NULL value.

This property of the outer joins it's very useful, but there are cases in which a little confusion can arise, particularly when a table has NULL value allowed in some of its columns.


Comments and Discussions