GDB debugging tutorial for beginners

You may already be versed in debugging Bash scripts (see How to Debug Bash Scripts if you are not familiar with debugging Bash yet), but how do you debug C or C++? Let’s explore.

GDB is a long-standing and comprehensive Linux debugging utility, which would take many years to learn if you wanted to know the tool well. However, even for beginners, the tool can be very powerful and useful when it comes to debugging C or C++.

For example, if you’re a QA engineer and a C program your team is working on crashes, you can use GDB to obtain a backtrace (the stack of function calls, each one calling the next, that eventually led to the crash). Or, if you are a C or C++ developer and you just introduced a bug into your code, you can use GDB to inspect variables, code and more! Let’s dive in!

In this tutorial you will learn:

  • How to install and use the GDB utility from the command line in Bash
  • How to do basic GDB debugging using the GDB console and prompt
  • How to interpret the detailed output GDB produces


Software requirements and conventions used

Software Requirements and Linux Command Line Conventions

Category       Requirements, Conventions or Software Version Used
System         Linux Distribution-independent
Software       Bash and GDB command lines, Linux based system
Other          The GDB utility can be installed using the commands provided below
Conventions    # – requires linux-commands to be executed with root privileges either directly as a root user or by use of sudo command
               $ – requires linux-commands to be executed as a regular non-privileged user

Setting up GDB and a test program

For this article, we will look at a small test.c program in the C programming language, which contains a division-by-zero error. The code is a bit longer than strictly needed (a few lines would do, and no functions besides main would be required), but this was done on purpose to highlight how function names appear clearly inside GDB when debugging.

Let’s first install the tools we will need using sudo apt install (or sudo yum install if you use a Red Hat based distribution, where the build-essential package does not exist; install gdb and gcc instead):

sudo apt install gdb build-essential gcc

The build-essential and gcc packages provide the compiler and related tools needed to compile the test.c program on your system.

Next, let us create the test.c program as follows (you can copy and paste the following into your favorite editor and save the file as test.c):

int actual_calc(int a, int b){
  int c;
  c=a/b;
  return 0;
}

int calc(){
  int a;
  int b;
  a=13;
  b=0;
  actual_calc(a, b);
  return 0;
}

int main(){
  calc();
  return 0;
}


A few notes about this program: when the main function starts (main is the function called at program startup, as specified by the C standard for hosted environments), it immediately calls the function calc, which in turn sets the variables a and b to 13 and 0 respectively and then calls actual_calc.

Executing our program and configuring core dumps

Let us now compile this program using gcc and execute it:

$ gcc -ggdb test.c -o test.out
$ ./test.out
Floating point exception (core dumped)

The -ggdb option to gcc makes our GDB debugging session a friendly one: it adds GDB-specific debugging information to the test.out binary. The -o option names the output binary, and the input is our source file test.c.

When we execute the program, we immediately get the somewhat cryptic message Floating point exception (core dumped). The part we are interested in for the moment is the core dumped message. If you do not see this message (or you do see it but cannot locate the core file), you can set up better core dumping as follows:

if ! grep -qi 'kernel.core_pattern' /etc/sysctl.conf; then
  sudo sh -c 'echo "kernel.core_pattern=core.%p.%u.%s.%e.%t" >> /etc/sysctl.conf'
  sudo sysctl -p
fi
ulimit -c unlimited

Here we first make sure there is no Linux kernel core pattern (kernel.core_pattern) setting yet in /etc/sysctl.conf (the configuration file for kernel parameters on Ubuntu and many other distributions) and, provided none was found, append a handy core file name pattern (core.%p.%u.%s.%e.%t) to that file. In this pattern, %p is the process ID, %u the real user ID, %s the number of the signal that caused the dump, %e the executable name and %t the time of the dump in seconds since the Epoch.

The sysctl -p command (executed as root, hence the sudo) then reloads the file immediately, without requiring a reboot. For more information on core patterns, see the "Naming of core dump files" section of the core manual page (man core).

Finally, the ulimit -c unlimited command sets the maximum core file size to unlimited for the current shell session. This setting does not persist across restarts. To make it permanent, you can run:

sudo bash -c "cat << EOF >> /etc/security/limits.conf
* soft core unlimited
* hard core unlimited
EOF
"

This appends * soft core unlimited and * hard core unlimited to /etc/security/limits.conf, removing the core file size limit for all users in new login sessions.

When you now re-execute test.out, you should again see the core dumped message, and a core file named according to the configured pattern should appear in the current directory:

$ ls
core.1341870.1000.8.test.out.1598867712  test.c  test.out

Let’s next examine the metadata of the core file:

$ file core.1341870.1000.8.test.out.1598867712
core.1341870.1000.8.test.out.1598867712: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from './test.out', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: './test.out', platform: 'x86_64'

We can see that this is a 64-bit core file, which user ID was in use, what the platform was, and finally which executable was used. We can also see from the filename (.8.) that signal 8 terminated the program. Signal 8 is SIGFPE, the floating point exception signal. GDB will later show us that this is an arithmetic exception.

Using GDB to analyze the core dump

Let’s open the core file with GDB and assume for a second we do not know what happened (if you’re a seasoned developer, you may have already seen the actual bug in the source!):

$ gdb ./test.out ./core.1341870.1000.8.test.out.1598867712
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test.out...
[New LWP 1341870]

Core was generated by `./test.out'.
Program terminated with signal SIGFPE, Arithmetic exception.

#0  0x000056468844813b in actual_calc (a=13, b=0) at test.c:3
3     c=a/b;
(gdb)


As you can see, on the first line we invoked gdb with our binary as the first argument and the core file as the second. Simply remember: binary, then core. Next we see GDB initialize and print some information.

If you see a warning such as warning: Unexpected size of section `.reg-xstate/1341870' in core file. or a similar message, you may ignore it for the time being.

We see that the core dump was generated by test.out, and we are told that the signal was SIGFPE, an arithmetic exception. Great; we already know that something is amiss with the arithmetic our program performs!

Next we see the frame on which the program terminated: frame #0 (for now, think of a frame as one active function call). GDB adds all sorts of handy information to it: the memory address, the function name actual_calc, the values of its arguments, and even the line (3) and file (test.c) where the issue happened.

Next we see the line of code (line 3) again, this time with the actual code (c=a/b;) from that line included. Finally we are presented with a GDB prompt.

The issue is likely very clear by now: we did c=a/b, or, with the values filled in, c=13/0. Division by zero is mathematically undefined, and in C, integer division by zero is undefined behavior. On x86-64 Linux, the processor raises a divide error for this integer division, which the kernel delivers to the program as the SIGFPE signal; despite its name, SIGFPE covers integer arithmetic errors as well as floating point ones.

Backtracing

So let’s see what else we can discover with GDB by looking at a few basic commands. The first one is the one you are likely to use most often: bt:

(gdb) bt
#0  0x000056468844813b in actual_calc (a=13, b=0) at test.c:3
#1  0x0000564688448171 in calc () at test.c:12
#2  0x000056468844818a in main () at test.c:17

bt is shorthand for backtrace; it shows the chain of function calls (procedure after procedure) that led to the program's current state. Read it as the reverse order of what happened: frame #0 is the innermost frame, the function that was executing when the program crashed, and frame #2 is the outermost frame, main(), which was called first when the program started.

We can thus analyze what happened: the program started, and main() was automatically called. Next, main() called calc() (and we can confirm this in the source code above), and finally calc() called actual_calc and there things went wrong.

Nicely, we can also see the source line for each frame. For example, frame #1 reads calc () at test.c:12. This does not mean that calc() was called from line 12; it means that execution inside calc() had reached line 12, which is where calc() called actual_calc(). Likewise, frame #2 shows that main() was at line 17, where it called calc().

Power user tip: if you use multiple threads, you can use the command thread apply all bt to obtain a backtrace for all threads which were running as the program crashed!

Frame inspection

If we want, we can inspect each frame, the matching source code (if it is available), and each variable step by step:

(gdb) f 2
#2  0x000055fa2323318a in main () at test.c:17
17    calc();
(gdb) list
12    actual_calc(a, b);
13    return 0;
14  }
15  
16  int main(){
17    calc();
18    return 0;
19  }
(gdb) p a
No symbol "a" in current context.

Here we ‘jump into’ frame 2 using the f 2 command (f is shorthand for the frame command). Next we list the source code around the current line using the list command, and finally we try to print the value of the variable a (p is shorthand for print). This fails because a is a local variable of calc(), not of main(): in frame 2 we are at line 17 inside main(), and GDB can only resolve variables that are in scope within the selected function/frame.

Note that source listings, including the source lines shown in the earlier outputs, are only available if GDB can find the source file (test.c in our case).

Here we also immediately see a gotcha: if the source code differs from the code the binary was compiled from, you can easily be misled, as the output may show non-applicable or changed source. GDB does not verify that the source matches the revision the binary was built from; at most, it may warn that the source file is more recent than the executable, based on file timestamps. It is thus of paramount importance that you use the exact same source code revision as the one from which your binary was compiled.

An alternative is to not rely on GDB's source listings at all, and instead debug a particular situation in a particular function using the frames and variable values GDB reports, with a newer revision of the source code as a rough reference. Advanced developers and debuggers often work this way, as they need few clues about where the issue may lie in a given function once they know the variable values.

Let’s next examine frame 1:

(gdb) f 1
#1  0x000055fa23233171 in calc () at test.c:12
12    actual_calc(a, b);
(gdb) list
7   int calc(){
8     int a;
9     int b;
10    a=13;
11    b=0;
12    actual_calc(a, b);
13    return 0;
14  }
15  
16  int main(){

Here again GDB outputs plenty of information that helps the developer debug the issue at hand. Since we are now in calc (at line 12), where the variables a and b have already been declared and set to 13 and 0 respectively, we can print their values:

(gdb) p a
$1 = 13
(gdb) p b
$2 = 0
(gdb) p c
No symbol "c" in current context.
(gdb) p a/b
Division by zero


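Note that printing c still fails, because c is a local variable of actual_calc(), not of calc(), so it is not in scope in this frame (this is what GDB means by ‘in current context’). Also note that GDB itself refuses to evaluate a/b and simply reports Division by zero instead of crashing.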

Finally, we look into frame #0, our crashing frame:

(gdb) f 0
#0  0x000055fa2323313b in actual_calc (a=13, b=0) at test.c:3
3     c=a/b;
(gdb) p a
$3 = 13
(gdb) p b
$4 = 0
(gdb) p c
$5 = 22010

All self-evident, except for the value reported for c. We declared the variable c but never gave it an initial value, and the assignment c=a/b never completed because the division trapped. The value of c is therefore indeterminate: 22010 is simply whatever happened to be left in the stack memory assigned to c. Initializing variables at declaration (for example int c = 0;) makes such values predictable when inspecting them in a debugger.

Conclusion

Great. We were able to debug a core dump for a C program, and we learned the basics of GDB debugging along the way. If you are a QA engineer or a junior developer and have understood everything in this tutorial, you are already quite a bit ahead of many QA engineers, and potentially other developers around you.

And the next time you watch Star Trek and Captain Janeway or Captain Picard wants to ‘dump the core’, you will surely smile a little wider. Enjoy debugging your next dumped core, and leave us a comment below with your debugging adventures.


