How to combine the results of multiple SQL queries using the UNION statement

In a previous article we talked about the various type of JOIN we can use in a MariaDB/MySQL database. This time, instead, we take a look at the UNION statement: how it works, how we can use it to combine the result of queries run on different tables, and what are its peculiarities.

In this tutorial you will learn:

  • How to use the UNION statement in a MariaDB/MySQL server
  • What are the properties of the UNION statement


union-statement-result

The result of an UNION statement

Software Requirements and Conventions Used

Software Requirements and Linux Command Line Conventions
Category Requirements, Conventions or Software Version Used
System Os-independent
Software A working MariaDB/MySQL database
Other Basic knowledge of the MariaDB/MySQL database
Conventions # – requires given linux commands to be executed with root privileges either directly as a root user or by use of sudo command
$ – requires given linux commands to be executed as a regular non-privileged user

The UNION statement

The UNION statement let us combine the results of two or more queries. While when performing a JOIN we can execute some kind of action or retrieve additional information on the base of existing relationships between tables, when using the UNION statement, if some conditions are met, the rows resulting from queries launched on different, even unrelated tables, can be combined. In this tutorial we will see a basic and a real world example of how can we  use the UNION statement in a MariaDB/MySQL environment.

A Basic Example

Let’s start with a very basic example to introduce the peculiarities of the UNION statement. Suppose we have two completely unrelated tables: the first called “movie” and the second one “color”. In the former, each row contains information about a movie: the title, the genre and the release date. The latter hosts just the name of some colors. Here is how the tables look like:

+----+---------------+---------+--------------+
| id | title         | genre   | release_date |
+----+---------------+---------+--------------+
|  1 | A New Hope    | fantasy | 1977-05-25   |
|  2 | The Godfather | Drama   | 1972-05-24   |
+----+---------------+---------+--------------+

+----+--------+
| id | name   |
+----+--------+
|  1 | blue   |
|  2 | yellow |
+----+--------+


And this is their description:

+--------------+-------------+------+-----+---------+----------------+
| Field        | Type        | Null | Key | Default | Extra          |
+--------------+-------------+------+-----+---------+----------------+
| id           | int(2)      | NO   | PRI | NULL    | auto_increment |
| title        | varchar(20) | NO   |     | NULL    |                |
| genre        | varchar(20) | NO   |     | NULL    |                |
| release_date | date        | NO   |     | NULL    |                |
+--------------+-------------+------+-----+---------+----------------+

+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(2)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(10) | NO   |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+

As said before, this two tables have absolutely no connection with each other. By using the UNION statement, however, we can combine the results of two separate queries launched on them. Let’s run:

SELECT title, genre FROM movie UNION SELECT id, name FROM color;

The command above returns the following result:

+---------------+---------+
| title         | genre   |
+---------------+---------+
| A New Hope    | fantasy |
| The Godfather | Drama   |
| 1             | blue    |
| 2             | yellow  |
+---------------+---------+

Let’s explain. We performed two different SELECT queries: in the first one we selected the value of the columns “title” and “genre” for every row in the movie table. In the second one, instead, we selected the “id” and “name” columns from the “color” table, again without using any filter.

Even if the two tables are completely unrelated, since we used the UNION statement between the two queries, the rows returned by each of them are combined: the result is the table you can see above.

Even if in the vast majority of real world cases the columns selected from the tables involved would probably have the same data types, in the silly example above, we can clearly see how the UNION happens even if the columns of the two original tables contains different data types: both the column selected from the “movie” table are of  the VARCHAR data type, while the “id” column of the “color” table is of type INT. This is possible because the database automatically performs the needed data conversions.



Another very important thing to notice is that the columns in the UNION result, inherited their names from the ones selected in the first query, the one at the left of the UNION keyword: “title” and “genre”. Looking at the above example would probably make you ask what the UNION statement can be useful for in real life scenario: let’s see another example.

The fantasy football case

Some time ago I have been involved in the creation of a small fantasy football application. In the database of the application, there was a table called “club”, which hosted information about the fantasy clubs involved in the competition. This is an extract of it:

+----+-----------------+--------+
| id | name            | budget |
+----+-----------------+--------+
|  1 | Havana Blu      |      4 |
|  2 | Longobarda      |      4 |
|  3 | Real Siderno    |      0 |
|  4 | Earthquake Team |     66 |
|  5 | Kalapagos       |     33 |
|  6 | Cantasant       |      5 |
|  7 | F.C. Mojito     |      0 |
|  8 | Apoel Nicotina  |      1 |
|  9 | Dharma          |      0 |
| 10 | Real 1908       |     12 |
+----+-----------------+--------+

In the same project there was also a table call “calendar”, in which each row represented a match between two of the clubs listed above. Since we had 10 clubs, each championship day hosted a total of 5 matches. As an example, here is an extract of  all the matches of the first four days:

+----+-----+------+-------------+-------+--------------+
| id | day | host | host_scores | guest | guest_scores |
+----+-----+------+-------------+-------+--------------+
|  1 |   1 |    2 |        75.5 |     8 |           67 |
|  2 |   1 |    4 |          80 |     6 |           77 |
|  3 |   1 |    7 |          63 |     9 |         71.5 |
|  4 |   1 |    3 |        79.5 |     5 |           68 |
|  5 |   1 |   10 |          64 |     1 |         72.5 |
|  6 |   2 |    5 |        66.5 |    10 |         65.5 |
|  7 |   2 |    9 |          82 |     3 |         62.5 |
|  8 |   2 |    6 |          83 |     7 |         69.5 |
|  9 |   2 |    8 |          77 |     4 |         79.5 |
| 10 |   2 |    1 |          67 |     2 |         81.5 |
| 11 |   3 |    4 |          73 |     2 |           58 |
| 12 |   3 |    7 |        70.5 |     8 |         75.5 |
| 13 |   3 |    3 |        66.5 |     6 |           88 |
| 14 |   3 |   10 |        74.5 |     9 |         60.5 |
| 15 |   3 |    5 |        68.5 |     1 |         72.5 |
| 16 |   4 |    9 |          68 |     5 |           69 |
| 17 |   4 |    6 |          60 |    10 |           66 |
| 18 |   4 |    8 |        70.5 |     3 |         73.5 |
| 19 |   4 |    2 |        71.5 |     7 |           79 |
| 20 |   4 |    1 |        68.5 |     4 |           68 |
+----+-----+------+-------------+-------+--------------+

The first column of each row contains a surrogate key used as the primary key for the table. The second one contains the integer representing the day the match is part of. The host, host_scores, and guest, guest_scores columns contain, respectively, the id and the scores of the club which played as host and the ones of the club which played as guest.



Now, say we want to generate a rank in which all the clubs are listed in descending order on the base of the total scores they performed in the first four championship days. If each club id was listed only in a column, say “host”, the operation would be really easy: we would just calculate the sum of the scores using the SUM() aggregate function, and group the results by the id of the clubs, displaying them in descending order:

SELECT
    host, SUM(host_scores) AS total_scores
FROM
    calendar
GROUP BY
    host
ORDER BY
    total_scores DESC

However, since each championship day a club plays alternatively as host and as guest, the query above would not return the results we want, but would produce the total scores of a team including only the scores made when it played as host (or alternatively, as guest).

That’s one case where the UNION statement can come in handy: we can perform two separate queries, one involving the “host” and “host_scores” columns, and the other involving the “guest” and “guest_scores” ones; we can then use the UNION statement to append the row resulting from the second query to the ones returned by the first, and finally calculate the aggregate values. Additionally, we can perform a JOIN with the “club” table, to make the name of each club appear in the result. Here is the complete query:

SELECT
    data.team_id
    , club.name
    , SUM(scores) AS total_scores

FROM  (
    SELECT
        host as team_id
        ,host_scores AS scores
    FROM
        calendar

    UNION ALL

    SELECT
        guest
        ,guest_scores
    FROM
        calendar
) AS data

JOIN club ON club.id = data.team_id
GROUP BY
    data.team_id
ORDER BY
    total_scores DESC;

Here is the result of the query:

+---------+-----------------+--------------+
| team_id | name            | total_scores |
+---------+-----------------+--------------+
|       6 | Cantasant       |          308 |
|       4 | Earthquake Team |        300.5 |
|       8 | Apoel Nicotina  |          290 |
|       2 | Longobarda      |        286.5 |
|       3 | Real Siderno    |          282 |
|       9 | Dharma          |          282 |
|       7 | F.C. Mojito     |          282 |
|       1 | Havana Blu      |        280.5 |
|       5 | Kalapagos       |          272 |
|      10 | Real 1908       |          270 |
+---------+-----------------+--------------+

As you can see, at the end of the fourth championship day, the “Cantasant” team was the one with the highest scores. Another thing to notice in the query above, is the use of the ALL keyword together with UNION: it was necessary because when the UNION statement is used, by default, duplicate rows are removed; if UNION ALL is used, instead, the rows are preserved.

Conclusions

In this tutorial we learned to know the UNION statement in MariaDB/MySQL databases. We saw a basic example to demonstrate some of the properties of the statement and a real world example, taken from a real project. To summarize, the characteristics of a UNION statement:

  • In the resulting table, the name of the columns selected in the first query are used;
  • The number of columns must be the same in all the queries;
  • The data types of the columns can be different, the database will perform the conversion;
  • By default, when the UNION statement is used, duplicate rows in the results are removed: to avoid this we can use UNION ALL ;

Take further expand your knowledge of the UNION statement, you can take a look at the official documentation.



Comments and Discussions
Linux Forum