Extract a Backtrace from a Running Daemon

More Information

http://www.bonsai.com/wiki/howtos/debugging/daemon_backtrace/
GDB Documentation

Attach to the daemon with gdb

IMPORTANT NOTES:

  1. You’ll need to be root (or the owner of the daemon process).
  2. You’ll want to do this first part “briskly” since the daemon will be stopped until you give the “continue” command.
  3. This is reasonably invasive; be prepared to restart the daemon if something goes wrong.

Figure out the pid of the daemon with “pidof” (“ps” works too):

[root@lap2 tmp]# /sbin/pidof ntpd
2580
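
The “ps” alternative mentioned above can be sketched as a small shell function. This is an illustration, not part of the original procedure: the name daemon_pids is made up here, and it assumes a “ps” that accepts the POSIX -e and -o options.

```shell
# Hypothetical helper (not from the original how-to): print the pid of every
# process whose command name is exactly $1, using "ps" instead of "pidof".
daemon_pids() {
    # "pid=" and "comm=" select the two columns and suppress the header line.
    ps -e -o pid= -o comm= | awk -v name="$1" '$2 == name { print $1 }'
}
```

Running “daemon_pids ntpd” should list the same pid that pidof reported. If it prints more than one line, several processes share that name and you will have to pick the right one yourself.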

Attach gdb to the daemon. The gdb command line takes the absolute path to the daemon binary and the pid as arguments. You can either type them in directly or use the command line below, which fills them in with “which” and “pidof”. After gdb starts and attaches it prints a lot of startup output and leaves the daemon stopped; issue the “continue” command to let the daemon run again:

[root@lap2 tmp]# gdb `which ntpd` `pidof ntpd`
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
................................................................
(lots of startup removed)
................................................................
Loaded symbols for /lib64/libresolv.so.2
0x0000000000e04a53 in __select_nocancel () from /lib64/libc.so.6
(gdb) continue
Continuing.

At this point the daemon should be running normally. There should be no noticeable performance impact on the daemon while it runs under the debugger. You can take a coffee break here if you like …
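
The backtick command line used for attaching passes whatever “pidof” prints straight to gdb, which goes wrong if the daemon is not running or if several processes share its name. A slightly more defensive wrapper might look like the sketch below. The function name attach_daemon and the GDB override variable are assumptions made for this example, not part of gdb.

```shell
# Hypothetical wrapper (illustration only): attach gdb to daemon $1, but only
# if exactly one process with that name is running.
attach_daemon() {
    name=$1
    bin=$(command -v "$name") || { echo "cannot find $name in PATH" >&2; return 1; }
    pids=$(pidof "$name") || { echo "$name is not running" >&2; return 1; }
    set -- $pids    # split the pid list into positional parameters
    [ $# -eq 1 ] || { echo "expected one pid, got: $pids" >&2; return 1; }
    # GDB is an assumed override hook; it defaults to plain "gdb".
    ${GDB:-gdb} "$bin" "$1"
}
```

“attach_daemon ntpd” would then behave like the command line shown earlier, refusing to start gdb when the pid is ambiguous.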

Dumping all thread backtraces to a log file

It takes a few seconds to dump the thread backtraces. Once again, the daemon is stopped while you follow this procedure, so execute it “briskly” to get to the continue command at the end.

First we’ll need to “break” back into the debugger. In the gdb session, which now should not have a prompt, type Control-C:

^C
Program received signal SIGINT, Interrupt.
0x0000000000e04a53 in __select_nocancel () from /lib64/libc.so.6
(gdb)

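At this point the gdb prompt returns. The following commands disable the “more” paging prompt, enable logging to a file, and dump all thread backtraces to that file. (Newer gdb releases, 12 and later, spell the logging command “set logging enabled on”; the “set logging on” form shown here is deprecated there.) Once we are done we’ll use the “continue” command again: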

(gdb) set height 0
(gdb) set logging file /tmp/mybacktrace.log
(gdb) set logging on
Copying output to /tmp/mybacktrace.log.
(gdb) thread apply all bt
Thread 1 (Thread 0x7f40ebc686f0 (LWP 2580)):
#0  0x0000000000e04a53 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007f40ebca7506 in settimeofday () from /usr/sbin/ntpd
#2  0x0000000000d44576 in __libc_start_main (main=0x7f40ebca7dc0 ,
argc=6, ubp_av=0x7ffff3d12fd8, init=0x7f40ebcf93a0 ,
fini=, rtld_fini=,
stack_end=0x7ffff3d12fc8) at libc-start.c:220
#3  0x00007f40ebc9dd49 in settimeofday () from /usr/sbin/ntpd
#4  0x00007ffff3d12fc8 in ?? ()
#5  0x000000000000001c in ?? ()
#6  0x0000000000000006 in ?? ()
#7  0x00007ffff3d13ede in ?? ()
#8  0x00007ffff3d13ee3 in ?? ()
#9  0x00007ffff3d13ee6 in ?? ()
#10 0x00007ffff3d13eee in ?? ()
#11 0x00007ffff3d13ef1 in ?? ()
#12 0x00007ffff3d13f03 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) continue
Continuing.

At this point the daemon is happily running under the debugger again and you can go for another coffee break …
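
If all you need is the log file, the interactive steps above can probably be collapsed into a single non-interactive gdb run, which keeps the time the daemon spends stopped short. The following is a sketch, assuming a gdb that supports the -p, -batch and -ex options. The function name dump_backtraces and the GDB override variable are made up for this example.

```shell
# Hypothetical helper (illustration only): write all thread backtraces of
# pid $1 to file $2 in one shot. In batch mode gdb exits after the last
# command, which detaches from the process and leaves it running.
dump_backtraces() {
    pid=$1
    out=$2
    # GDB is an assumed override hook; it defaults to plain "gdb".
    ${GDB:-gdb} -p "$pid" -batch \
        -ex "set height 0" \
        -ex "thread apply all bt" > "$out" 2>&1
}
```

For example, “dump_backtraces 2580 /tmp/mybacktrace.log” would be the one-shot counterpart of the session above. Because gdb attaches from scratch here, do not run it while another gdb is still attached to the same pid.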

Detaching from the daemon (leaving it running)

To detach from the daemon and leave it running, “break” into the debugger with ^C and then give the “quit” command. You will be asked to confirm that you want to detach and leave the process running:

^C
Program received signal SIGINT, Interrupt.
0x0000000000e04a53 in __select_nocancel () from /lib64/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/sbin/ntpd, process 2580

That’s it! The daemon should be running as though nothing had happened.
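
To double-check that the daemon really was resumed, you can look at its process state. A minimal sketch, assuming the procps “ps” found on most Linux systems, where a state starting with T means stopped and t means stopped by a tracer such as gdb. The function name daemon_is_running is made up for this example.

```shell
# Hypothetical helper (illustration only): succeed if pid $1 exists and is
# neither stopped (state T) nor stopped by a tracer (state t).
daemon_is_running() {
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    [ -n "$state" ] || return 1    # no such process
    case $state in
        [Tt]*) return 1 ;;         # still stopped
        *)     return 0 ;;
    esac
}
```

For instance, “daemon_is_running 2580 && echo ok” after the detach step. If the check fails although the process exists, the daemon is still stopped and may need a restart, as warned in the notes at the top.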