In a previous article we talked about the various type of JOIN
we can use in a MariaDB/MySQL database. This time, instead, we take a look at the UNION
statement: how it works, how we can use it to combine the result of queries run on different tables, and what are its peculiarities.
In this tutorial you will learn:
- How to use the UNION statement in a MariaDB/MySQL server
- What are the properties of the UNION statement
Software Requirements and Conventions Used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Os-independent |
Software | A working MariaDB/MySQL database |
Other | Basic knowledge of the MariaDB/MySQL database |
Conventions | # – requires given linux commands to be executed with root privileges either directly as a root user or by use of sudo command$ – requires given linux commands to be executed as a regular non-privileged user |
The UNION statement
The UNION
statement let us combine the results of two or more queries. While when performing a JOIN we can execute some kind of action or retrieve additional information on the base of existing relationships between tables, when using the UNION
statement, if some conditions are met, the rows resulting from queries launched on different, even unrelated tables, can be combined. In this tutorial we will see a basic and a real world example of how can we use the UNION
statement in a MariaDB/MySQL environment.
A Basic Example
Let’s start with a very basic example to introduce the peculiarities of the UNION
statement. Suppose we have two completely unrelated tables: the first called “movie” and the second one “color”. In the former, each row contains information about a movie: the title, the genre and the release date. The latter hosts just the name of some colors. Here is how the tables look like:
+----+---------------+---------+--------------+
| id | title | genre | release_date |
+----+---------------+---------+--------------+
| 1 | A New Hope | fantasy | 1977-05-25 |
| 2 | The Godfather | Drama | 1972-05-24 |
+----+---------------+---------+--------------+
+----+--------+
| id | name |
+----+--------+
| 1 | blue |
| 2 | yellow |
+----+--------+
And this is their description:
+--------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+----------------+
| id | int(2) | NO | PRI | NULL | auto_increment |
| title | varchar(20) | NO | | NULL | |
| genre | varchar(20) | NO | | NULL | |
| release_date | date | NO | | NULL | |
+--------------+-------------+------+-----+---------+----------------+
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(2) | NO | PRI | NULL | auto_increment |
| name | varchar(10) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
As said before, this two tables have absolutely no connection with each other. By using the UNION
statement, however, we can combine the results of two separate queries launched on them. Let’s run:
SELECT title, genre FROM movie UNION SELECT id, name FROM color;
The command above returns the following result:
+---------------+---------+
| title | genre |
+---------------+---------+
| A New Hope | fantasy |
| The Godfather | Drama |
| 1 | blue |
| 2 | yellow |
+---------------+---------+
Let’s explain. We performed two different SELECT
queries: in the first one we selected the value of the columns “title” and “genre” for every row in the movie table. In the second one, instead, we selected the “id” and “name” columns from the “color” table, again without using any filter.
Even if the two tables are completely unrelated, since we used the UNION
statement between the two queries, the rows returned by each of them are combined: the result is the table you can see above.
Even if in the vast majority of real world cases the columns selected from the tables involved would probably have the same data types, in the silly example above, we can clearly see how the UNION
happens even if the columns of the two original tables contains different data types: both the column selected from the “movie” table are of the VARCHAR
data type, while the “id” column of the “color” table is of type INT
. This is possible because the database automatically performs the needed data conversions.
Another very important thing to notice is that the columns in the UNION
result, inherited their names from the ones selected in the first
query, the one at the left of the UNION
keyword: “title” and “genre”. Looking at the above example would probably make you ask what the UNION
statement can be useful for in real life scenario: let’s see another example.
The fantasy football case
Some time ago I have been involved in the creation of a small fantasy football application. In the database of the application, there was a table called “club”, which hosted information about the fantasy clubs involved in the competition. This is an extract of it:
+----+-----------------+--------+
| id | name | budget |
+----+-----------------+--------+
| 1 | Havana Blu | 4 |
| 2 | Longobarda | 4 |
| 3 | Real Siderno | 0 |
| 4 | Earthquake Team | 66 |
| 5 | Kalapagos | 33 |
| 6 | Cantasant | 5 |
| 7 | F.C. Mojito | 0 |
| 8 | Apoel Nicotina | 1 |
| 9 | Dharma | 0 |
| 10 | Real 1908 | 12 |
+----+-----------------+--------+
In the same project there was also a table call “calendar”, in which each row represented a match between two of the clubs listed above. Since we had 10 clubs, each championship day hosted a total of 5 matches. As an example, here is an extract of all the matches of the first four days:
+----+-----+------+-------------+-------+--------------+
| id | day | host | host_scores | guest | guest_scores |
+----+-----+------+-------------+-------+--------------+
| 1 | 1 | 2 | 75.5 | 8 | 67 |
| 2 | 1 | 4 | 80 | 6 | 77 |
| 3 | 1 | 7 | 63 | 9 | 71.5 |
| 4 | 1 | 3 | 79.5 | 5 | 68 |
| 5 | 1 | 10 | 64 | 1 | 72.5 |
| 6 | 2 | 5 | 66.5 | 10 | 65.5 |
| 7 | 2 | 9 | 82 | 3 | 62.5 |
| 8 | 2 | 6 | 83 | 7 | 69.5 |
| 9 | 2 | 8 | 77 | 4 | 79.5 |
| 10 | 2 | 1 | 67 | 2 | 81.5 |
| 11 | 3 | 4 | 73 | 2 | 58 |
| 12 | 3 | 7 | 70.5 | 8 | 75.5 |
| 13 | 3 | 3 | 66.5 | 6 | 88 |
| 14 | 3 | 10 | 74.5 | 9 | 60.5 |
| 15 | 3 | 5 | 68.5 | 1 | 72.5 |
| 16 | 4 | 9 | 68 | 5 | 69 |
| 17 | 4 | 6 | 60 | 10 | 66 |
| 18 | 4 | 8 | 70.5 | 3 | 73.5 |
| 19 | 4 | 2 | 71.5 | 7 | 79 |
| 20 | 4 | 1 | 68.5 | 4 | 68 |
+----+-----+------+-------------+-------+--------------+
The first column of each row contains a surrogate key
used as the primary key
for the table. The second one contains the integer representing the day the match is part of. The host
, host_scores
, and guest
, guest_scores
columns contain, respectively, the id and the scores of the club which played as host and the ones of the club which played as guest.
Now, say we want to generate a rank in which all the clubs are listed in descending order on the base of the total scores they performed in the first four championship days. If each club id was listed only in a column, say “host”, the operation would be really easy: we would just calculate the sum of the scores using the SUM()
aggregate function, and group the results by the id of the clubs, displaying them in descending order:
SELECT
host, SUM(host_scores) AS total_scores
FROM
calendar
GROUP BY
host
ORDER BY
total_scores DESC
However, since each championship day a club plays alternatively as host and as guest, the query above would not return the results we want, but would produce the total scores of a team including only the scores made when it played as host (or alternatively, as guest).
That’s one case where the UNION
statement can come in handy: we can perform two separate queries, one involving the “host” and “host_scores” columns, and the other involving the “guest” and “guest_scores” ones; we can then use the UNION
statement to append the row resulting from the second query to the ones returned by the first, and finally calculate the aggregate values. Additionally, we can perform a JOIN with the “club” table, to make the name of each club appear in the result. Here is the complete query:
SELECT
data.team_id
, club.name
, SUM(scores) AS total_scores
FROM (
SELECT
host as team_id
,host_scores AS scores
FROM
calendar
UNION ALL
SELECT
guest
,guest_scores
FROM
calendar
) AS data
JOIN club ON club.id = data.team_id
GROUP BY
data.team_id
ORDER BY
total_scores DESC;
Here is the result of the query:
+---------+-----------------+--------------+
| team_id | name | total_scores |
+---------+-----------------+--------------+
| 6 | Cantasant | 308 |
| 4 | Earthquake Team | 300.5 |
| 8 | Apoel Nicotina | 290 |
| 2 | Longobarda | 286.5 |
| 3 | Real Siderno | 282 |
| 9 | Dharma | 282 |
| 7 | F.C. Mojito | 282 |
| 1 | Havana Blu | 280.5 |
| 5 | Kalapagos | 272 |
| 10 | Real 1908 | 270 |
+---------+-----------------+--------------+
As you can see, at the end of the fourth championship day, the “Cantasant” team was the one with the highest scores. Another thing to notice in the query above, is the use of the ALL
keyword together with UNION
: it was necessary because when the UNION
statement is used, by default, duplicate rows are removed; if UNION ALL
is used, instead, the rows are preserved.
Conclusions
In this tutorial we learned to know the UNION
statement in MariaDB/MySQL databases. We saw a basic example to demonstrate some of the properties of the statement and a real world example, taken from a real project. To summarize, the characteristics of a UNION
statement:
- In the resulting table, the name of the columns selected in the first query are used;
- The number of columns must be the same in all the queries;
- The data types of the columns can be different, the database will perform the conversion;
- By default, when the
UNION
statement is used, duplicate rows in the results are removed: to avoid this we can useUNION ALL
;
Take further expand your knowledge of the UNION statement, you can take a look at the official documentation.