How to compare files using diff

The diff utility is installed by default on virtually every Linux distribution. It calculates and displays the differences between the contents of two files, and is mainly used when working with source code, to compare two versions of the same file and highlight what changed between them. In this article we will learn the various modes in which diff can work, and how to create a diff file which can later be applied as a patch with the patch utility.

In this tutorial you will learn:

  • How to use diff
  • How to display the output of diff in two columns
  • How to read the diff output in normal, context and unified mode
  • How to create a diff file and apply it as a patch with the patch utility

Software requirements and conventions used

Software Requirements and Linux Command Line Conventions
Category Requirements, Conventions or Software Version Used
System Distribution independent
Software diff, patch
Other None
Conventions # – requires given linux-commands to be executed with root privileges either directly as a root user or by use of sudo command
$ – requires given linux-commands to be executed as a regular non-privileged user

The diff utility

The diff utility compares files line by line; its syntax is very simple:

$ diff [OPTION] FILES

All we have to do is invoke the program followed by the paths of the files we want to compare. Before we look at some usage examples, we need to learn how to read the output of the utility and the meaning of the symbols it uses. We can summarize them in the following table:

Symbol Meaning
a An “addition” is needed in order for the content of the two files to match
c A “change” action is needed in order for the content of the two files to match
d A “delete” action is needed in order for the content of the two files to match
< Indicates a line from the first file
> Indicates a line from the second file


We can now see some examples of the basic diff usage. Suppose we have two files, called lotr0.txt and lotr1.txt. The content of the first file is the following:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all, and in the darkness bind them,
In the Land of Mordor where the Shadows lie
# end

You surely recognized the “ring” poem from the “Lord of The Rings” book. Now suppose the second file, lotr1.txt, contains the following lines instead:

# The ring poem in the black speech of mordor
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
Ash nazg durbatulûk, ash nazg gimbatul,
ash nazg thrakatulûk,agh burzum-ishi krimpatul,
In the Land of Mordor where the Shadows lie

The content of the two files is pretty similar, but not identical. Let’s run the diff utility on them, and see what output it produces:

$ diff lotr0.txt lotr1.txt
0a1
> # The ring poem in the black speech of mordor
6,7c7,8
< One Ring to rule them all, One Ring to find them,
< One Ring to bring them all, and in the darkness bind them,
---
> Ash nazg durbatulûk, ash nazg gimbatul,
> ash nazg thrakatulûk,agh burzum-ishi krimpatul,
9d9
< # end

On the first line of the output, we can read 0a1; what does this mean? In this case we are notified that for the first file to match the content of the second, at its beginning (line 0), a new line should be “added” (a), which corresponds to the first line (1) of the second file. What is this line? The one reported after the > symbol on the second line of the output:

> # The ring poem in the black speech of mordor

This makes sense: the line doesn’t exist in the first file, so it should be added for the content of the two files to match.

Let’s continue. We can see the notation 6,7c7,8: this means that lines 6 to 7 in the first file (6,7) should be changed in order to match lines 7 to 8 (7,8) in the second file. How should they be changed? The lines from the first file, which we can distinguish because they are preceded by the < symbol, are:

< One Ring to rule them all, One Ring to find them,
< One Ring to bring them all, and in the darkness bind them,

They should be changed to the following lines of the second file, which can be spotted because they are preceded by the > symbol in the diff output:

> Ash nazg durbatulûk, ash nazg gimbatul,
> ash nazg thrakatulûk,agh burzum-ishi krimpatul,

The lines from the first file, and the lines from the second one, in the output, are separated by three dashes: (---).

Finally, we have the 9d9 notation: this means that, for the content of the two files to match, line 9 of the first file (# end) should be deleted; once it is removed, the two files are back in sync at line 9 of the second file.
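
To practice reading the normal-mode notation on something shorter than the poem, the sketch below builds two tiny files that trigger each command (a, c and d) exactly once. The file names old.txt and new.txt and their contents are hypothetical and exist only for this illustration.

```shell
# Scratch directory so the example does not touch existing files
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample files: one line added, one changed, one deleted
printf '%s\n' alpha beta gamma delta > old.txt
printf '%s\n' intro alpha BETA gamma > new.txt

# diff exits with status 1 when the files differ; "|| true" keeps the
# script going if it is run under "set -e"
normal_out=$(diff old.txt new.txt) || true
printf '%s\n' "$normal_out"
```

The output should contain the three commands 0a1 (add "intro" at the top), 2c3 (change line 2 of the first file into line 3 of the second) and 4d4 (delete line 4 of the first file), each followed by the affected lines.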

Displaying the output side-by-side

In the examples above we can see that the output produced by the diff utility is organized “vertically”. If we prefer, we can have it formatted and displayed in two columns. All we have to do is use the -y option (short for --side-by-side):

$ diff -y lotr0.txt lotr1.txt
                                                              > # The ring poem in the black speech of mordor
Three Rings for the Elven-kings under the sky,                  Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,              Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,                              Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne                        One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.                    In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,             | Ash nazg durbatulûk, ash nazg gimbatul,
One Ring to bring them all, and in the darkness bind them,    | ash nazg thrakatulûk,agh burzum-ishi krimpatul,
In the Land of Mordor where the Shadows lie                     In the Land of Mordor where the Shadows lie
# end                                                         <

The content of the first file is displayed in the left column, and that of the second one in the right column. We can easily spot the differences between them: a > marks a line which exists only in the second file, a < marks a line which exists only in the first, and a | marks lines which differ. The -y option selects its own output format, so it cannot be combined with the context or unified modes, which we discuss in the next section.
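
On a narrow terminal the two columns can be hard to read. As a hedged sketch, assuming GNU diff (other implementations may not offer these options), the -W option limits the total output width and --suppress-common-lines hides the lines which are identical in both files. The files old.txt and new.txt are hypothetical samples.

```shell
# Scratch directory with two small hypothetical files
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' alpha beta gamma delta > old.txt
printf '%s\n' intro alpha BETA gamma > new.txt

# -y: two columns; -W 40: at most 40 columns of output;
# --suppress-common-lines: print only the lines that differ.
# "|| true" absorbs the exit status 1 that diff returns for differing files.
side_out=$(diff -y -W 40 --suppress-common-lines old.txt new.txt) || true
printf '%s\n' "$side_out"
```

Only three lines should be printed: one marked with > for the added line, one marked with | for the changed line, and one marked with < for the deleted line.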

Normal, context and unified mode

By default the diff utility works in normal mode, and produces an output similar to the one we saw in the previous examples. There are, however, two other modes we can use: the context and the unified mode. Let’s take a look at them.

The context mode

The context mode is selected by invoking the program with the -c option (short for --context). In our case it would produce the following output:

$ diff -c lotr0.txt lotr1.txt
*** lotr0.txt   2021-03-13 16:10:25.248286081 +0100
--- lotr1.txt   2021-03-13 15:30:54.060911632 +0100
***************
*** 1,9 ****
  Three Rings for the Elven-kings under the sky,
  Seven for the Dwarf-lords in their halls of stone,
  Nine for Mortal Men doomed to die,
  One for the Dark Lord on his dark throne
  In the Land of Mordor where the Shadows lie.
! One Ring to rule them all, One Ring to find them,
! One Ring to bring them all, and in the darkness bind them,
  In the Land of Mordor where the Shadows lie
- # end
--- 1,9 ----
+ # The ring poem in the black speech of mordor
  Three Rings for the Elven-kings under the sky,
  Seven for the Dwarf-lords in their halls of stone,
  Nine for Mortal Men doomed to die,
  One for the Dark Lord on his dark throne
  In the Land of Mordor where the Shadows lie.
! Ash nazg durbatulûk, ash nazg gimbatul,
! ash nazg thrakatulûk,agh burzum-ishi krimpatul,
  In the Land of Mordor where the Shadows lie


Let’s take a look at this result. First of all we can see that the two files are referenced by using different symbols: *** for the first one, and --- for the second one.

The first two lines provide information about the two files. We can see:

  • The file name
  • The file modification time with timezone (+0100 in this case)

The first two lines are separated from the rest of the output by 15 asterisks (***************).

Immediately after the separator comes the notation which specifies the range of lines of the first file reported in the output, in this case lines 1 to 9 (1,9), followed by the lines themselves. The same structure is then repeated for the second file. Certain lines are preceded by a symbol; let’s see what each one means:

Symbol Meaning
! The lines prefixed by this symbol in the first file need to be changed to the lines prefixed by it in the second file for the content of the two files to match
- The lines preceded by this symbol in the first file should be deleted for the content of the two files to match
+ The lines in the second file preceded by this symbol should be added to the first file for the content of the two files to match
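
In the poem example the whole file fits in a single block, so the role of the line ranges is not obvious. The sketch below uses a hypothetical ten-line file with a single changed line, and asks for one line of context around each change with -C 1 (the -c option is the same as -C 3), to show how the reported range shrinks to the neighborhood of the change.

```shell
# Scratch directory with a hypothetical ten-line file and a modified copy
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > old.txt
sed 's/^5$/five/' old.txt > new.txt   # change only line 5

# -C 1: context mode with one line of context before and after each change.
# "|| true" absorbs the exit status 1 that diff returns for differing files.
context_out=$(diff -C 1 old.txt new.txt) || true
printf '%s\n' "$context_out"
```

After the two header lines and the row of asterisks, the ranges should read "*** 4,6 ****" and "--- 4,6 ----": only lines 4 to 6 of each file are shown, with the changed line marked by ! in both blocks.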

The unified mode

To use the diff utility in “unified” mode, we invoke it with the -u option, which is the short form of --unified. This is how the output of diff in unified mode would look in this case:

$ diff -u lotr0.txt lotr1.txt
--- lotr0.txt   2021-03-13 16:10:25.248286081 +0100
+++ lotr1.txt   2021-03-13 15:30:54.060911632 +0100
@@ -1,9 +1,9 @@
+# The ring poem in the black speech of mordor
 Three Rings for the Elven-kings under the sky,
 Seven for the Dwarf-lords in their halls of stone,
 Nine for Mortal Men doomed to die,
 One for the Dark Lord on his dark throne
 In the Land of Mordor where the Shadows lie.
-One Ring to rule them all, One Ring to find them,
-One Ring to bring them all, and in the darkness bind them,
+Ash nazg durbatulûk, ash nazg gimbatul,
+ash nazg thrakatulûk,agh burzum-ishi krimpatul,
 In the Land of Mordor where the Shadows lie
-# end

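The first two lines produced when diff is invoked with the -u option carry the same information as in “context” mode (file names and modification times), but here the first file is marked with --- and the second with +++. The line delimited by @@ reports the range of lines covered in each file: -1,9 means 9 lines starting at line 1 of the first file, and +1,9 means 9 lines starting at line 1 of the second. The main difference from context mode is that the output is not split depending on the file it belongs to: all the lines are “unified”, with the lines to be removed from the first file prefixed by - and the lines to be added from the second file prefixed by +.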
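
In the poem example both ranges on the @@ line happen to be identical, because one line was added and one was removed. The following sketch uses two hypothetical files where a line is only added, so the two line counts differ; it then extracts the @@ line with grep.

```shell
# Scratch directory with two hypothetical files: new.txt has one extra line
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 3 4 5 6 > old.txt
printf '%s\n' 1 2 3 3.5 4 5 6 > new.txt

# "|| true" absorbs the exit status 1 that diff returns for differing files
unified_out=$(diff -u old.txt new.txt) || true

# Keep only the line that starts with @@
hunk_header=$(printf '%s\n' "$unified_out" | grep '^@@')
printf '%s\n' "$hunk_header"
```

The header should read "@@ -1,6 +1,7 @@": six lines of the first file starting at line 1, and seven lines of the second file starting at line 1, the extra one being the added "3.5" line.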

Creating a diff file and applying it as a patch

Suppose we want to apply the necessary changes to the content of the first file we used in the previous examples, lotr0.txt, so that it is updated to match the content of the second file, lotr1.txt; how would we proceed? To achieve our goal we can use the patch utility and apply a diff file to the original one. A diff file contains the output of diff, so to create one, all we have to do is to redirect the output of the utility:

$ diff -u lotr0.txt lotr1.txt > lotr.patch


Once we have our diff file, we can apply the necessary changes to the original file using the patch utility:

$ patch -b lotr0.txt lotr.patch

We invoked patch using the -b option: this is not mandatory, but it is useful since it creates a backup of the original file before the patch is applied (in this case it will be named lotr0.txt.orig). The arguments we provided are:

  • The name of the original file on which the patch should be applied
  • The name of the file containing the patch.

After the patch is applied the lotr0.txt file should be identical to lotr1.txt. We can verify it by using diff again, which this time should produce no output:

$ diff lotr0.txt lotr1.txt
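
The whole round trip can be rehearsed safely in a scratch directory. The sketch below uses hypothetical files (old.txt, new.txt, changes.patch) and relies on the exit status of diff, which is 0 when the files are identical and 1 when they differ. As an addition not covered above, it also undoes the change with the -R (reverse) option of patch.

```shell
# Scratch directory with two small hypothetical files
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' alpha beta gamma delta > old.txt
printf '%s\n' intro alpha BETA gamma > new.txt

# Create the diff file; "|| true" absorbs the exit status 1 for differing files
diff -u old.txt new.txt > changes.patch || true

# Apply it, keeping a backup of the original as old.txt.orig
patch -b old.txt changes.patch

# diff exits with status 0 only when the two files are identical
if diff old.txt new.txt > /dev/null; then
    patch_result="identical"
else
    patch_result="different"
fi
echo "old.txt vs new.txt after patching: $patch_result"

# Apply the same diff file in reverse to restore the original content
patch -R old.txt changes.patch
if diff old.txt old.txt.orig > /dev/null; then
    revert_result="identical"
else
    revert_result="different"
fi
echo "old.txt vs backup after reverting: $revert_result"
```

Both checks should report "identical": the first confirms that the patch brought old.txt in line with new.txt, the second that reversing it restored the content saved in the backup.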

Conclusions

In this tutorial we learned how to use diff to calculate the differences between two files. We saw the modes in which diff can be used and the meaning of the symbols used in its output. Finally, we saw how to create a diff file, and how to apply it as a patch using the patch utility.


