How to build a Docker image using a Dockerfile

Docker skills are in high demand, mainly because, thanks to Docker, we can automate the deployment of applications inside so-called containers, creating tailored environments that can be easily replicated anywhere the Docker technology is supported. In this tutorial we will see how to create a Docker image from scratch, using a Dockerfile. We will learn the most important instructions we can use to customize our image, how to build the image, and how to run containers based on it.

In this tutorial you will learn:

  • How to create a docker image using a Dockerfile
  • Some of the most frequently used Dockerfile instructions
  • How to achieve data persistence in containers

Software Requirements and Conventions Used

Software Requirements and Linux Command Line Conventions
Category      Requirements, Conventions or Software Version Used
System        OS-independent
Software      Docker
Other         • A running Docker daemon
              • The docker command line utility
              • Familiarity with the Linux command line interface
Conventions   # – requires given linux commands to be executed with root privileges, either directly as a root user or by use of the sudo command
              $ – requires given linux commands to be executed as a regular non-privileged user

Images and containers

Before we start, it may be useful to define clearly what we mean when we talk about images and containers in the context of Docker. Images can be considered as the building blocks of the Docker world. They represent the “blueprints” used to create containers. Indeed, when a container is created it represents a concrete instance of the image it is based on.

Many containers can be created from the same image. In the rest of this article we will learn how to provide the instructions needed to create an image tailored to our needs inside a Dockerfile, how to actually build the image, and how to run a container based on it.

Build our own image using a Dockerfile

To build our own image we will use a Dockerfile. A Dockerfile contains all the instructions needed to create and setup an image. Once our Dockerfile is ready we will use the docker build command to actually build the image.

The first thing we should do is to create a new directory to host our project. For the sake of this tutorial we will build an image containing the Apache web server, so we will name the root directory of the project “dockerized-apache”:

$ mkdir dockerized-apache


This directory is what we call the build context. During the build process, all the files and directories contained in it, including the Dockerfile we will create, are sent to the Docker daemon so they can be easily accessed, unless they are listed in the .dockerignore file.
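
As a quick, hypothetical sketch, a .dockerignore file placed in the root of the build context could contain entries like the following (the paths listed are just examples of files we may not want to send to the daemon):

# .dockerignore - paths excluded from the build context
.git
*.log
notes/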

Let’s create our Dockerfile. The file must be called Dockerfile and will contain, as we said above, all the instructions needed to create an image with the desired features. We fire up our favorite text editor and start by writing the following instructions:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

The first instruction we must provide is FROM: with it we can specify an existing image to use as our base (this is called a base image) to create our own. In this case our base image will be ubuntu. Apart from the image name, we also used a tag, in order to specify the version of the image we want to use, in this case 18.10. If no tag is specified, the latest tag is used by default: this will cause the latest available version of the base image to be used. If the image is not already present on our system, it will be downloaded from Dockerhub.

After the FROM instruction, we used LABEL. This instruction is optional, can be repeated multiple times, and is used to add metadata to our image. In this case we used it to specify the image maintainer.

The RUN instruction

At this point, if we run docker build, we will just produce an image identical to the base one, except for the metadata we added. This would be of no use for us. We said we want to “dockerize” the Apache web server, so the next thing to do in our Dockerfile is to provide an instruction to install the web server as part of the image. The instruction that lets us accomplish this task is RUN:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2

The RUN instruction is used to execute commands on top of the image. One very important thing to remember is that for every RUN instruction we use, a new layer is created and added to the stack. In this regard Docker is very smart: already built layers are “cached”: this means that if we build an image based on our Dockerfile, and then we decide, for example, to add another RUN instruction (and thus a new layer) at the end of it, the build will not start from scratch, but will run only the new instructions.

For this to happen, of course, the instructions already built in the Dockerfile must not be modified. It is even possible to avoid this behavior completely when building an image, just by using the --no-cache option of the docker build command.
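
For example, once we have the complete build command we will use later in this tutorial, a full rebuild that ignores any cached layers would look like this:

$ sudo docker build --no-cache -t linuxconfig/dockerized-apache .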

In our case we used the RUN instruction to execute the apt-get update && apt-get -y install apache2 commands. Notice how we passed the -y option to the apt-get install command: this option makes an affirmative answer be given automatically to all the confirmations required by the command. This is necessary because we are installing the package non-interactively.

Exposing port 80

As we know, the Apache web server listens on port 80 for standard connections. We must instruct Docker to make that port accessible on the container. To accomplish the task we use the EXPOSE instruction and provide the port number. For security reasons the specified port is opened only when the container is launched. Let’s add this instruction to our Dockerfile:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2
EXPOSE 80

Building the image

At this point we can already try to build our image. From inside the root directory of our project, “dockerized-apache”, we run the following command:

$ sudo docker build -t linuxconfig/dockerized-apache .

Let’s examine the command. First of all, we prefixed the command with sudo, in order to run it with administrative privileges. It is possible to avoid this by adding a user to the docker group, but this represents a security risk. The -t option we provided, short for --tag, lets us apply a repository name and optionally a tag to our image if the build succeeds.
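
As a side note, if we wanted to apply an explicit tag instead of relying on the default latest, we could run, for example (the 1.0 tag here is just an arbitrary choice):

$ sudo docker build -t linuxconfig/dockerized-apache:1.0 .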

Finally, the . instructs docker to look for the Dockerfile in the current directory. As soon as we launch the command, the build process will start. The progress and build messages will be displayed on screen:

Sending build context to Docker daemon  2.048 kB
Step 1/4 : FROM ubuntu:18.10
Trying to pull repository docker.io/library/ubuntu ...
[...]

In a few minutes our image should be created successfully. To verify it, we can run the docker images command, which returns a list of all the images existing in our local Docker repository:

$ sudo docker images
REPOSITORY                      TAG      IMAGE ID       CREATED         SIZE
linuxconfig/dockerized-apache   latest   7ab7b6873614   2 minutes ago   191 MB


As expected the image appears in the list. As we can notice, since we didn’t provide a tag (only a repository name, linuxconfig/dockerized-apache) the latest tag has been automatically applied to our image. An ID has also been assigned to it, 7ab7b6873614: we can use it to reference the image in future commands.
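
Since an image can be referenced either by its repository name or by its ID, we could, for example, display the layers it is composed of with the docker history command:

$ sudo docker history 7ab7b6873614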

Launching a container based on the image

Now that our image is ready, we can create and launch a container based on it. To accomplish the task we use the docker run command:

$ sudo docker run --name=linuxconfig-apache -d -p 8080:80 linuxconfig/dockerized-apache apachectl -D FOREGROUND

Let’s examine the command above. The first option we provided was --name: with it, we specify a name for the container, in this case “linuxconfig-apache”. If we omitted this option, a randomly generated name would have been assigned to our container.

The -d option (short for --detach) causes the container to run in background.

The -p option, short for --publish, is needed in order to publish a container port  (or a range of ports) to the host system. The syntax of the option is the following:

-p localhost_port:container_port

In this case we published the port 80 we previously exposed in the container, to the host port 8080. For the sake of completeness we must say that it’s also possible to use the -P option (short for --publish-all) instead, causing all the ports exposed in the container to be mapped to random ports on the host.
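
Just as a sketch outside of the main tutorial flow, a container launched with -P could look like the following; the container name used here and the host port shown in the output are purely examples, since the port is assigned randomly:

$ sudo docker run --name=apache-random-port -d -P linuxconfig/dockerized-apache apachectl -D FOREGROUND
$ sudo docker port apache-random-port
80/tcp -> 0.0.0.0:32768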

The last two things we specified in the command above are the image the container should be based on and, optionally, the command to run when the container is started. The image is of course linuxconfig/dockerized-apache, the one we built before.

The command we specified is apachectl -D FOREGROUND. With this command the Apache web server is launched in foreground mode: this is mandatory for it to work in the container. The docker run command runs the specified command on a new container:

$ sudo docker run --name=linuxconfig-apache -d -p 8080:80 linuxconfig/dockerized-apache apachectl -D FOREGROUND
a51fc9a6dd66b02117f00235a341003a9bf0ffd53f90a040bc1122cbbc453423

What is the number printed on the screen? It is the ID of the container! Once we have the container up and running, we should be able to access the page served by the default Apache VirtualHost at the localhost:8080 address (port 8080 on the host is mapped on port 80 on the container):


Default Apache index.html page
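
We can also check from the command line that the server is responding; for example, assuming curl is available on the host (the output is abbreviated here):

$ curl -I localhost:8080
HTTP/1.1 200 OK
[...]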

Our setup is working correctly. If we run the docker ps command, which lists all the active containers in the system, we can retrieve information about our container: its ID (short version, easier to reference from the command line for a human), the image it was run from, the command used, its creation time and current status, the ports mapping and its name.

$ sudo docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS          PORTS                  NAMES
a51fc9a6dd66   linuxconfig/dockerized-apache   "apachectl -D FORE..."   28 seconds ago   Up 28 seconds   0.0.0.0:8080->80/tcp   linuxconfig-apache

To stop the container all we need to do is to reference it by its ID or name, and run the docker stop command. For example:

$ sudo docker stop linuxconfig-apache

To start it again:

$ sudo docker start linuxconfig-apache
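
If, later on, we want to get rid of the container completely (we will need to do this before re-creating it from a rebuilt image in the following sections), we stop it and then remove it with the docker rm command:

$ sudo docker stop linuxconfig-apache
$ sudo docker rm linuxconfig-apache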

Executing a command directly via the Dockerfile

Up to this point we built a basic image and, at runtime, using the docker run command, we specified the command to be launched when the container is started. Sometimes we want to specify the latter directly inside the Dockerfile. We can do it in two ways: using CMD or ENTRYPOINT.

Both instructions can be used for the same purpose but they behave differently when a command is also specified from the command line. Let’s see how.

The CMD instruction

The CMD instruction can basically be used in two forms. The first is the exec form:

CMD ["/usr/sbin/apachectl", "-D", "FOREGROUND"]

The other one is the shell form:

CMD /usr/sbin/apachectl -D FOREGROUND

The exec form is usually preferred. It is worth noticing that when using the exec form a shell is not invoked, therefore variable expansion will not happen. If variable expansion is needed we can use the shell form, or we can invoke a shell directly in the exec form, as:

CMD ["sh", "-c", "echo $HOME"]

The CMD instruction can be specified only once in the Dockerfile. If multiple CMD instructions are provided, only the last one will take effect. The purpose of the instruction is to provide a default command to be launched when the container starts:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2
EXPOSE 80

CMD ["/usr/sbin/apachectl", "-D", "FOREGROUND"]

The command specified with CMD inside the Dockerfile works as a default, and will be overridden if another command is specified from the command line when executing docker run.
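
For example, assuming we rebuilt the image with the CMD instruction above, we could override the default at runtime by passing another command to docker run; here, just as an illustration, apachectl only prints the server version instead of starting the web server (the --rm option removes the temporary container once the command exits):

$ sudo docker run --rm linuxconfig/dockerized-apache apachectl -v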

The ENTRYPOINT instruction

The ENTRYPOINT instruction can also be used to configure a command to be used when the container is started, and like CMD, both the exec and shell form can be used with it. The big difference between the two is that a command passed from the command line will not override the one specified with ENTRYPOINT: instead it will be appended to it.

By using this instruction we can specify a basic command and modify it with the options we provide when running the docker run command, making our container behave like an executable. Let’s see an example with our Dockerfile:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2
EXPOSE 80

ENTRYPOINT ["/usr/sbin/apachectl"]

In this case we substituted the CMD instruction with ENTRYPOINT and also removed the -D FOREGROUND arguments from it. Suppose we now rebuild the image, and recreate the container using the following command:

$ sudo docker run --name=linuxconfig-apache -d -p 8080:80 linuxconfig/dockerized-apache -D FOREGROUND


When the container starts, the -D FOREGROUND arguments are appended to the command provided in the Dockerfile with the ENTRYPOINT instruction, but only if using the exec form. This can be verified by running the docker ps command (here we added some options to better display and format its output, selecting only the information we need):

$ sudo docker ps --no-trunc --format "{{.Names}}\t{{.Command}}"
linuxconfig-apache	"/usr/sbin/apachectl -D FOREGROUND"

Just like CMD, the ENTRYPOINT instruction can be provided only once. If it appears multiple times in the Dockerfile, only the last occurrence will be considered. It is possible to override the default ENTRYPOINT of the image from the command line, by using the --entrypoint option of the docker run command.
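
For example, to open an interactive shell in a container based on our image, bypassing the default entrypoint, we could run something along these lines:

$ sudo docker run --rm -it --entrypoint /bin/bash linuxconfig/dockerized-apache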

Combining CMD and ENTRYPOINT

Now that we know the peculiarity of the CMD and ENTRYPOINT instructions we can also combine them. What can we obtain by doing so? We can use ENTRYPOINT to specify a valid base command, and the CMD instruction to specify default parameters for it.

The command will run with those default parameters by default, unless we override them from the command line when running docker run. Sticking to our Dockerfile, we could write:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2
EXPOSE 80

ENTRYPOINT ["/usr/sbin/apachectl"]
CMD ["-D", "FOREGROUND"]

If we rebuild the image from this Dockerfile, remove the previous container we created, and re-launch the docker run command without specifying any additional argument, the /usr/sbin/apachectl -D FOREGROUND command will be executed. If we instead provide some arguments, they will override those specified in the Dockerfile with the CMD instruction. For example, if we run:

$ sudo docker run --name=linuxconfig-apache -d -p 8080:80 linuxconfig/dockerized-apache -X

The command that will be executed when starting the container will be /usr/sbin/apachectl -X. Let’s verify it:

$ sudo docker ps --no-trunc --format "{{.Names}}\t{{.Command}}"
linuxconfig-apache	"/usr/sbin/apachectl -X"

The command launched was as expected: the -X option, by the way, makes the httpd daemon start in debug mode.

Copying files into the container

Our “dockerized” Apache server works. As we saw, if we navigate to localhost:8080, we see the default Apache welcome page. Now, say we have a website ready to be shipped with the container: how can we “load” it so that Apache will serve it instead?

Well, for the sake of this tutorial we will just replace the default index.html file. To accomplish the task we can use the COPY instruction. Suppose we have an alternative index.html file inside the root of our project (our build context) with this content:

<html>
  <body>
    <h2>Hello!</h2>
    <h3>This file has been copied into the container with the COPY instruction!</h3>
  </body>
</html>

We want to load it and copy it to the /var/www/html directory inside the container, therefore inside our Dockerfile we add the COPY instruction:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2
EXPOSE 80

ENTRYPOINT ["/usr/sbin/apachectl"]
CMD ["-D", "FOREGROUND"]
COPY index.html /var/www/html/index.html

We rebuild the image and the container. If we now navigate to localhost:8080, we will see the new message.


The COPY instruction can be used to copy both files and directories. When the destination path doesn’t exist it is created inside the container. All new files and directories are created with a UID and GID of 0.
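
On reasonably recent Docker versions, if we need the copied files to be owned by a specific user instead, the COPY instruction also accepts a --chown flag; a minimal sketch, assuming the www-data user and group exist in the image (they are created when apache2 is installed):

COPY --chown=www-data:www-data index.html /var/www/html/index.html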

Another possible solution to copy files inside the container is to use the ADD instruction, which is more powerful than COPY. With this instruction we can copy files and directories, but also URLs. Additionally, if we copy a local tar archive in a recognized compression format, it will be automatically uncompressed and copied as a directory inside the container.
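
Just as an illustration, assuming the build context contained a hypothetical website.tar.gz archive with our site inside, the following instruction would extract its content into the document root:

ADD website.tar.gz /var/www/html/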

The ideal strategy would be to use COPY unless the additional features provided by ADD are really needed.

Creating a VOLUME

In the previous example, to demonstrate how the COPY instruction works, we replaced the default index.html file of the default Apache VirtualHost inside the container.

If we stop and start the container, we will still find the modification we made, but if the container for some reason is removed, all the data contained on its writable layer will be lost with it. How to solve this problem? One approach is to use the VOLUME instruction:

FROM ubuntu:18.10
LABEL maintainer="egidio.docile@linuxconfig.org"

RUN apt-get update && apt-get -y install apache2
EXPOSE 80

ENTRYPOINT ["/usr/sbin/apachectl"]
CMD ["-D", "FOREGROUND"]
COPY index.html /var/www/html/index.html
VOLUME /var/www/html


The VOLUME instruction takes one or more directories (in this case /var/www/html) and causes them to be used as mountpoints for external, randomly-named volumes generated when the container is created.

This way, the data we put into the directories used as mountpoints will be persisted inside the mounted volumes and will still exist even if the container is destroyed. If a directory set to be used as a mountpoint already contains data at initialization time, that data is copied inside the volume that is mounted on it.

Let’s rebuild the image and the container. We can now verify that the volume has been created and is in use by inspecting the container:

$ sudo docker inspect linuxconfig-apache
[...]
"Mounts": [
            {
                "Type": "volume",
                "Name": "8f24f75459c24c491b2a5e53265842068d7c44bf1b0ef54f98b85ad08e673e61",
                "Source": "/var/lib/docker/volumes/8f24f75459c24c491b2a5e53265842068d7c44bf1b0ef54f98b85ad08e673e61/_data",
                "Destination": "/var/www/html",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
[...]

As already said, the volume will survive even after the container is destroyed so our data will not be lost.

The VOLUME instruction inside the Dockerfile, as we can see from the output of the docker inspect command above, makes so that a randomly named volume is created. To define a named volume, or to mount an already existing volume inside a container, we must specify it at runtime, when running the docker run command, using the -v option (short for --volume). Let’s see an example:

$ sudo docker run --name=linuxconfig-apache -d -p 8080:80 -v myvolume:/var/www/html linuxconfig/dockerized-apache

In the command above, we used the -v option specifying the volume name (very important: notice that it is not a path, but a simple name) and the mountpoint inside the container using the following syntax:

<volume_name>:<mountpoint>

When we run this command, the volume named “myvolume” will be mounted at the specified path inside the container (the volume will be created if it doesn’t already exist). As we said before, if the volume is empty, the data already existing on the mountpoint inside the container will be copied inside of it. Using the docker volume ls command, we can confirm a volume with the name we specified has been created:

$ sudo docker volume ls
DRIVER              VOLUME NAME
local               myvolume

To remove a volume we use the docker volume rm command, and provide the name of the volume to remove. Docker, however, will not let us remove a volume used by an active container:

$ sudo docker volume rm myvolume
Error response from daemon: Unable to remove volume, volume still in use: remove myvolume: volume is in use - [95381b7b6003f6165dfe2e1912d2f827f7167ac26e22cf26c1bcab704a2d7e02]
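
To actually get rid of the volume, we must first stop and remove the container that is using it, and then repeat the command:

$ sudo docker stop linuxconfig-apache
$ sudo docker rm linuxconfig-apache
$ sudo docker volume rm myvolume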

Another approach for data persistence, especially useful during development, is to bind-mount a host directory inside the container. This approach has the advantage of letting us work on our code locally with our favorite tools and see the effect of the changes immediately reflected inside the container, but it has a big disadvantage: the container becomes dependent on the host directory structure.

For this reason, since portability is one of the main targets of Docker, it is not possible to define a bind-mount inside a Dockerfile, but only at runtime. To accomplish this task, we use the -v option of the docker run command again, but this time we provide the path of a directory inside the host filesystem instead of a volume name:

$ sudo docker run --name=linuxconfig-apache -d -p 8080:80 -v /path/on/host:/var/www/html linuxconfig/dockerized-apache

When launching the command above, the host directory /path/on/host will be mounted on /var/www/html inside the container. If the directory on the host doesn’t exist, it is created automatically. In this case the data in the mountpoint directory inside the container (/var/www/html in our example) is not copied to the host directory that is mounted on it, as happens with volumes instead.

Conclusion

In this tutorial we learned the basic concepts needed to create and build a docker image using a Dockerfile, and how to run a container based on it. We built a very simple image which lets us run a “dockerized” version of the Apache web server. In the process, we saw how to use the FROM instruction, which is mandatory to specify a base image to work on, the LABEL instruction to add metadata to our image, and the EXPOSE instruction to declare the ports to be exposed in the container. We also learned how to map said port(s) to the host system port(s).

We learned how to use the RUN instruction to run commands on the image, and we learned how to specify a command to be executed when the container is started, both from the command line and inside the Dockerfile. We saw how to accomplish this by using the CMD and ENTRYPOINT instructions, and what the differences between the two are. Finally, we saw how to COPY data inside the container, and how to achieve data persistence using volumes. In our examples, we discussed only a small subset of the instructions that can be used in a Dockerfile.

For a complete and detailed list, please consult the official Docker documentation. In the meantime, if you want to know how to build an entire LAMP stack using Docker and the docker-compose tool, you can take a look at our article on How to create a docker-based LAMP stack using docker-compose on Ubuntu 18.04 Bionic Beaver Linux.


