Stacks, Frames and GDB

One of the things I have been working on in GDB's Python API (hopefully for GDB 7.5) is frame-printing and frame-filtering. This is where we allow customization of how a frame is printed. The most common example is a “backtrace”, where GDB prints each frame of the call stack that led to the point where the program stopped executing.

What does this mean? What are frame-printing and frame-filtering anyway?

A little history. This is not a new concept: the capability already exists in the versions of GDB shipped with Fedora. It is, however, written as a number of Python wrappers, and is a utility rather than “true” integration with GDB. Because of that, you cannot customize existing GDB commands (like backtrace, for example).

What we want to do is tightly integrate it with GDB internals, so that every time GDB needs to print a frame, the user can intercept and customize that action.

What is frame-printing?

Each frame is made up of a number of individual components, which together form the printed frame. Take this snippet of an example backtrace from GDB:

#0  0x00000038ebce6ab8 in poll () from /lib64/
#1  0x00000000005952bb in gdb_wait_for_event (block=1) at ../../archer/gdb/event-loop.c:863
#2  0x000000000059559a in gdb_do_one_event () at ../../archer/gdb/event-loop.c:461
#3  0x0000000000595735 in start_event_loop () at ../../archer/gdb/event-loop.c:490

That is a fairly typical backtrace. Each line contains the frame number, the address, the function name, the arguments, and the location in the source. This is how GDB prints each frame. Currently, you can change how GDB prints frames only with a few modifiers, such as “full”, which is fairly limited. For frame printing, there are two aspects we want to allow:

  • Allow customization of each element in the frame.

This means calling a Python object each time a frame is ready to be printed, similar to how value pretty-printers currently work. We keep three registration collections: one list for the current object-file, one for the current program-space, and one global list. Frame formatters/filters register themselves in whatever lists are appropriate. When GDB is ready to print a frame, that frame is passed to each element in the lists until one shows interest in printing it. If no object is interested, GDB prints the frame the old-fashioned way.

When a frame formatter/filter object shows interest in a frame, we call several methods on that object. The GDB frame is always passed to the Python object as a point of reference, so the object can interrogate the current frame’s data. We don’t really care how the object is constructed, as long as it provides the methods we need to call. So for the “function” element, we call the object’s method that describes the “function” in the backtrace. Each element of the frame that needs to be printed has a corresponding method call. This allows the user to customize the data presented in the backtrace.
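To make the proposed lookup concrete, here is a minimal plain-Python model of it. It does not use GDB's actual API: `FakeFrame`, the formatter classes, and the method names `interested`, `function`, `args` and `location` are all hypothetical, and searching the object-file list first, then the program-space list, then the global list is an assumption. It only illustrates the idea: the first interested object wins, each printed element gets its own method call, and uninterested frames fall back to default printing.

```python
# Plain-Python model of the proposed frame-formatter lookup.
# Hypothetical throughout: this is NOT GDB's real Python API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeFrame:
    """Stand-in for a GDB frame: the data a formatter may interrogate."""
    level: int
    pc: int
    function: str
    args: dict = field(default_factory=dict)
    filename: Optional[str] = None
    line: Optional[int] = None
    solib: Optional[str] = None


class DefaultFormatter:
    """Prints a frame 'the old-fashioned way', mimicking GDB's layout."""

    def function(self, frame):
        return frame.function

    def args(self, frame):
        return ", ".join(f"{name}={value}" for name, value in frame.args.items())

    def location(self, frame):
        if frame.filename is not None:
            return f" at {frame.filename}:{frame.line}"
        if frame.solib is not None:
            return f" from {frame.solib}"
        return ""


def render(frame, fmt):
    # One method call per printed element of the frame.
    return (f"#{frame.level:<2} 0x{frame.pc:016x} in {fmt.function(frame)} "
            f"({fmt.args(frame)}){fmt.location(frame)}")


# The three registration collections (hypothetical names; search order assumed).
objfile_formatters = []
progspace_formatters = []
global_formatters = []


class UpperCaseGdbFunctions(DefaultFormatter):
    """Example formatter: only interested in functions named gdb_*."""

    def interested(self, frame):
        return frame.function.startswith("gdb_")

    def function(self, frame):
        return frame.function.upper()


def print_frame(frame):
    for collection in (objfile_formatters, progspace_formatters, global_formatters):
        for fmt in collection:
            if fmt.interested(frame):
                return render(frame, fmt)
    # No interested object: fall back to default printing.
    return render(frame, DefaultFormatter())


# Formatters register themselves in whichever list suits them.
global_formatters.append(UpperCaseGdbFunctions())

for f in (FakeFrame(1, 0x5952bb, "gdb_wait_for_event", {"block": 1},
                    "../../archer/gdb/event-loop.c", 863),
          FakeFrame(3, 0x595735, "start_event_loop", {},
                    "../../archer/gdb/event-loop.c", 490)):
    print(print_frame(f))
```

Run as-is, frame #1 comes out as `#1  0x00000000005952bb in GDB_WAIT_FOR_EVENT (block=1) at ../../archer/gdb/event-loop.c:863`, while `start_event_loop` matches no formatter and is printed exactly as GDB would print it today.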

  • Allow ad-hoc options

The next problem: if we want to allow customized printing, we have to somehow allow ad-hoc options to be passed to existing built-in GDB commands like “backtrace”. This also bears on the second part of this work, frame-filtering. For now, though, allowing script writers to accept options beyond the usual built-in ones is important too.

What is frame-filtering?

Currently, backtraces are printed sequentially. They have been printed this way more or less since GDB was first written (as, I hazard to guess, have those of most debuggers). This seemed the most straightforward and useful way to present the flow of execution to a user. But, if you think about it, that is not always the case. There are often many frames, and much data, that the user has no interest in; or there are several frames whose data, in context, would be more useful presented as one frame. Anyone who hacks on the Python C API knows this ;)

The concept of frame-filtering is best described in terms of “replacement” synthetic frames and “replaced” frames. Take a scripting language, for example. There may be three frames that together describe one atomic action on the script side of the language:

  • frame 1
  • frame 2
  • frame 3

These are useful from a contextual view of how the interpreter is preparing and constructing that one script-side operation, but not very useful for presenting what that one operation really is. What if we could organize that a little better? With frame-filters we allow you to create a “replacement” synthetic frame, with the “replaced” frames as its children:

  • synthetic frame (i.e., it does not really exist)
    • replaced frame 1
    • replaced frame 2
    • replaced frame 3

So in that example, the user, via a frame-formatter object, has created a synthetic frame that describes what the interpreter is really doing, but we also include the original frames that make up that synthetic frame.
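Here is one way such grouping could work, as a self-contained Python sketch rather than GDB code. A hypothetical filter walks the frames and collapses each run of consecutive interpreter-internal frames into one synthetic frame whose children are the replaced originals. The `interp_` function names, the `is_internal` test and the description string are invented for illustration; how a real filter would recognize interpreter frames is left open.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    function: str
    synthetic: bool = False
    replaced: List["Frame"] = field(default_factory=list)


def group_frames(frames: List[Frame],
                 is_internal: Callable[[Frame], bool],
                 describe: Callable[[List[Frame]], str]) -> List[Frame]:
    """Collapse each run of consecutive internal frames into one synthetic
    frame that keeps the original ("replaced") frames as its children."""
    result: List[Frame] = []
    run: List[Frame] = []

    def flush() -> None:
        if run:
            result.append(Frame(describe(run), synthetic=True, replaced=list(run)))
            run.clear()

    for frame in frames:
        if is_internal(frame):
            run.append(frame)
        else:
            flush()
            result.append(frame)
    flush()
    return result


def show(frames: List[Frame], depth: int = 0) -> List[str]:
    """Render frames as indented lines, replaced frames under their parent."""
    lines = []
    for frame in frames:
        suffix = " (synthetic)" if frame.synthetic else ""
        lines.append("  " * depth + frame.function + suffix)
        lines.extend(show(frame.replaced, depth + 1))
    return lines


# Hypothetical stack: three interpreter frames implement one script-level call.
stack = [Frame("builtin_print"), Frame("interp_prepare_call"),
         Frame("interp_lookup"), Frame("interp_dispatch"), Frame("main")]
grouped = group_frames(stack,
                       is_internal=lambda f: f.function.startswith("interp_"),
                       describe=lambda run: "script call: foo()")
print("\n".join(show(grouped)))
```

Keeping the replaced frames as children, rather than discarding them, is what leaves room for choices like the ones raised next, such as whether filters should also run on them or whether they may be omitted.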



There are a number of questions that present themselves with this concept.

  • How do you represent these to MI and Annotations in GDB?
  • Should frame filters/formatters also run on replaced frames?
  • Should we allow the user to omit replaced frames?
  • How do we number synthetic frames and replaced frames so that we still honor bounded backtraces (i.e., bt 20)?
  • Should we allow frames to be omitted completely?
  • Should value pretty-printers be excluded from the data printed in “args” and “locals”? Conversely, should frame filters even be passed the values? (That is, should frame filters only be allowed to manipulate the argument or variable name, but not its content?)

I hope to answer these questions in a future series of blog articles. As always, if you have any feedback, just email me or leave a comment.




GDB Python Scripting

It’s been quite some time since I last wrote something. I’ve been busy working on Project Archer and GDB, mainly on the GDB Python scripting support. My big goal at the moment is to move all of the existing Python scripting API from the Archer git repository to the FSF CVS repository. This of course means public code reviews, which are always a little nerve-racking, even after years of doing them. But … so far, so good! The GDB community is very responsive to reviews, and to date everything has gone great. Hopefully this means we’ll be done by 7.2. But there is lots left to do, and bugs to fix (not to mention new features at some point). Do you use the Python GDB support? Do you have pretty-printers written using the API? I’d like to know!


I’ve been working on Project Archer for some months now, and it has been pretty interesting. It has also been challenging. There are several deep, dark wells of technical knowledge that I’ve had to explore in detail: unwinding, DWARF, debuginfo, and exceptions (generation, handling and personality routines). So this last week I’ve been reading about, and stepping through, a lot of these areas in GDB. When does a program grow so big that one mortal human cannot work on it in its entirety? I don’t know the metric, but I bet GDB surpasses it.

As I’ve worked on improving C++ exception handling in GDB, it occurred to me that the different bugs I’ve filed could ultimately be merged into one: “Make GDB work better with the GCC unwinder.” As GCC has changed in some areas, GDB has not changed in tandem. One example is the “next” and “finish” commands relying purely on longjmp breakpoints. (If you “next” over a C++ “throw” statement in GDB, you will lose control of the inferior. The “next” command code sets a “longjmp” breakpoint to re-establish control, but the C++ unwinder does not use setjmp/longjmp semantics to switch context. Once resumed, the inferior won’t stop at all, or won’t stop where expected.)

So this is a problem. It really irritates me when I lose control of an inferior while debugging. The pain is in proportion to the length of the debugging session. Sometimes I spend hours stepping a process. I’ve cursed a good line on several occasions when this has happened.

It’s easy to see this negatively, and even easier to write something negative about it. But it is a fact of life. So what’s the problem? Well, in most areas the longjmp trick will work. It won’t for C++ exceptions. But this grey area really bothers me. What if there are other areas where expectations do not match? Both GCC and GDB are highly complex programs. They change all the time, and where there is no direct transactional specification (debuginfo, for example, is written to a specification, as are ELF binaries, and so on), GDB’s assumptions about how GCC generates code will eventually break. If they break in a big way, they will be fixed, and quickly. But if they break in minor little ways, then the user experience dies a death of a thousand tiny paper cuts. Or a thousand tiny curses.

Systemtap Editor home

Thanks for all the emails and suggestions regarding where to host the Systemtap Editor for Eclipse that I am hacking on. I ended up hosting it, under incubation, at the Eclipse Linux Distributions Project.

The Subversion repository can be browsed with ViewVC.

The editor is under the Systemtap Module.

The danger of rainy weekends

Besides Project Archer, I have been mucking around with Systemtap. I’ve always had a bit of trouble writing Systemtap scripts: my brain is not big enough, or my practice regular enough, to write a comprehensive script without continually looking at the man pages or the language reference guide, or poking around in the Systemtap source. It makes for slow going sometimes.

A couple of days ago, I was chatting with Frank, and he mentioned that Systemtap can now list the probe points in Systemtap’s tapset library, along with their context variables, with:

stap -L tapset.*

I thought … hmmm.

I’ve been itching to get back to some Eclipse hacking, and I’ve been waiting for something to come along and scratch that itch.

I thought … hmmm.

It’s raining Saturday. “I’ll hack on this for a bit,” I thought.

hmmmmm ….

Well it ate up my whole weekend, but I hacked up a little Systemtap editor in Eclipse that offers syntax highlighting and probe completion.

Here is a view of the editor and completion:

Systemtap Editor Syscall Completion

And here is a screen-shot showing the completion window as we drill down through all the signal probes (in this case to*)

Systemtap Editor Partial Signal Completion

I was very impressed with Eclipse, and with how everything just worked on Fedora. It took about a day to get the completion, the syntax highlighting and the engine-room work to generate the completion metadata from Systemtap (this has to be done dynamically, and cached). I’ll hack on this as my “weekend” project; it is still pretty raw. I’ll put the plug-in and source up when I can work out a place to host it.
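The completion itself can be quite simple once the probe list is cached. Below is a minimal Python sketch of the idea; the real editor is an Eclipse plug-in, so this is an illustration, not its code. It assumes the probe names have already been generated from Systemtap once and cached, and the short `PROBES` tuple is a hypothetical sample. Completion extends the typed prefix only up to the next dot, so the user drills down one component at a time, as in the screenshots.

```python
from typing import Iterable, Tuple

# Hypothetical cached probe list. In the editor this would be generated
# dynamically from Systemtap once, then reused for every completion request.
PROBES: Tuple[str, ...] = (
    "signal.send",
    "signal.send.return",
    "signal.handle",
    "signal.do_action",
    "syscall.open",
    "syscall.open.return",
    "syscall.close",
)


def complete(prefix: str, probes: Iterable[str] = PROBES) -> Tuple[str, ...]:
    """Return the distinct completions for `prefix`, each extended only up to
    the next '.', so the user drills down one component at a time."""
    results = []
    for name in probes:
        if not name.startswith(prefix):
            continue
        dot = name.find(".", len(prefix))
        candidate = name if dot == -1 else name[:dot]
        if candidate not in results:
            results.append(candidate)
    return tuple(results)


print(complete("sys"))      # ('syscall',)
print(complete("signal."))  # ('signal.send', 'signal.handle', 'signal.do_action')
```

A prefix scan over a cached tuple is plenty for a few thousand probe names; a dotted-component tree would only be worth building if the list grew much larger.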

Also, while I’m here, I’d like to point you in the direction of another Systemtap UI. It has a different focus from what I hack on, and seems to concentrate more on execution, while I am more focused on script development. It’s all good.

Project Archer

I’ve started working on Project Archer with a few other hackers. The purpose is to improve C++ debugging with GDB. The Roadmap and the Development process are under review. If you want to get involved as a hacker, commentator or tester, or are just generally interested, come find us on the mailing list.

Cool little Systemtap scriptlet

One of the things I’ve always found hard to do via ptrace is observing system-wide state: watching all processes across a system for a behaviour “trend”. This is difficult because ptrace is not really designed for that. Frysk tried to address this in a different way, but Systemtap does it in a very scriptable way.

So … lately I’ve been writing a series of articles about Systemtap, and I was hacking up a little script. I found this tiny scriptlet very useful. It is so simple, too, and child’s play for the experienced Systemtap hackers out there. It watches every process for fork/clone and prints the name and pid of the processes involved. It also watches for a process exec and prints the process name, pid and the executable being exec’d. The actual heavy lifting is done in 6 lines of code, which I find remarkable.

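#! /usr/bin/env stap

# Announce that tracing has started.
probe begin {
  print("Tracking process creations .... \n\n")
}

# Any process forking/cloning a new process.
probe process.create {
  printf("%s (%d) created %d\n", execname(), pid(), new_pid)
}

# Any process exec'ing a new program.
probe process.exec {
  printf("%s (%d) is exec'ing %s\n", execname(), pid(), filename)
}

# Runs when the script exits (for example, on Ctrl-C).
probe end {
  print("All done!\n")
}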

Example output. During this run, I started Thunderbird from the GNOME panel:

sudo ./stap -v ~/process_creation.stp 

Pass 1: parsed user script and 43 library script(s) in 220usr/10sys/223real ms.
Pass 2: analyzed script: 5 probe(s), 7 function(s), 1 embed(s), 0 global(s) in 220usr/60sys/294real ms.
Pass 3: using cached .systemtap/cache/dd/stap_dd2b93e5305e7a0f5b95894e9f0d798a_2825.c
Pass 4: using cached .systemtap/cache/dd/stap_dd2b93e5305e7a0f5b95894e9f0d798a_2825.ko
Pass 5: starting run.
Tracking process creations .... 

hald-runner (2128) created 21509
hald-runner (21509) is exec'ing /usr/lib64/hal/scripts/hal-system-killswitch-get-power
hal-system-kill (21509) created 21510
hal-system-kill (21510) is exec'ing /usr/bin/hal-is-caller-privileged
hal-system-kill (21509) created 21511
hal-system-kill (21511) is exec'ing /bin/basename
hal-system-kill (21509) is exec'ing /usr/lib64/hal/scripts/linux/hal-system-killswitch-get-power-linux
hal-system-kill (21509) created 21512
hal-system-kill (21512) is exec'ing /usr/libexec/hal-ipw-killswitch-linux
gnome-panel (3031) created 21513
gnome-panel (21513) created 21514
gnome-panel (21514) is exec'ing /usr/lib64/qt-3.3/bin/thunderbird
gnome-panel (21514) is exec'ing /usr/kerberos/bin/thunderbird
gnome-panel (21514) is exec'ing /usr/lib64/ccache/thunderbird
gnome-panel (21514) is exec'ing /usr/local/bin/thunderbird
gnome-panel (21514) is exec'ing /usr/bin/thunderbird
thunderbird (21514) created 21515
thunderbird (21515) is exec'ing /bin/uname
thunderbird (21514) is exec'ing /usr/lib64/thunderbird-
thunderbird (21514) created 21516
thunderbird (21516) is exec'ing /usr/bin/dirname
thunderbird (21514) created 21517
thunderbird (21517) is exec'ing /bin/basename
thunderbird (21514) created 21518
thunderbird (21518) is exec'ing /usr/lib64/thunderbird-
(21518) created 21519
(21519) is exec'ing /bin/basename
(21518) created 21520
(21520) is exec'ing /usr/bin/dirname
(21518) created 21521
(21521) created 21522
(21522) is exec'ing /usr/bin/which
(21518) created 21523
(21523) is exec'ing /usr/lib64/thunderbird-
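Because each output line has a fixed shape, it is easy to post-process. The following Python sketch (my own illustration, not part of Systemtap) parses lines in the two formats the script prints and builds a parent-to-children map plus the list of programs each pid exec’d, then prints them as an indented tree. Lines that match neither format, such as the banner or the Pass messages, are ignored. The sample input is a subset of the output above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The two line formats printed by the process-creation script.
CREATE = re.compile(r"^(?P<name>.*?) \((?P<pid>\d+)\) created (?P<child>\d+)$")
EXEC = re.compile(r"^(?P<name>.*?) \((?P<pid>\d+)\) is exec'ing (?P<path>\S+)$")


def summarize(lines: Iterable[str]) -> Tuple[Dict[int, List[int]], Dict[int, List[str]]]:
    """Return (children, execs): pid -> child pids, pid -> programs exec'd."""
    children: Dict[int, List[int]] = defaultdict(list)
    execs: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        m = CREATE.match(line)
        if m:
            children[int(m["pid"])].append(int(m["child"]))
            continue
        m = EXEC.match(line)
        if m:
            execs[int(m["pid"])].append(m["path"])
    return children, execs


def print_tree(pid: int, children, execs, depth: int = 0) -> None:
    """Print pid, the programs it exec'd, then its children, indented."""
    programs = " ".join(execs.get(pid, []))
    print("  " * depth + f"{pid} {programs}".rstrip())
    for child in children.get(pid, []):
        print_tree(child, children, execs, depth + 1)


sample = """\
Tracking process creations ....
gnome-panel (3031) created 21513
gnome-panel (21513) created 21514
gnome-panel (21514) is exec'ing /usr/bin/thunderbird
thunderbird (21514) created 21515
thunderbird (21515) is exec'ing /bin/uname
"""
children, execs = summarize(sample.splitlines())
print_tree(3031, children, execs)
```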

Getting started with Systemtap (Part 2)

In part 2 of this series, I’ll describe how I built Systemtap from source and installed it.

After I fetched the source with:

git clone git://

A “systemtap” directory containing the source was created in my current working directory. I like to build out-of-tree to keep the source pristine, so I created a new build directory:

mkdir systemtap_obj
cd systemtap_obj

and ran the configure step


On a Fedora 9 LiveCD install, with a few extra custom RPMs added, I found I had to install some libraries. The steps to install them are all similar, but here is an example of a missing-library error I encountered:

checking sys/capability.h usability... no
checking sys/capability.h presence... no
checking for sys/capability.h... no
configure: error: cannot find required libcap header (libcap-devel may need to be installed)

And here is how I installed the library to fix this error:

sudo yum install libcap-devel

I had to rerun the configure script several times to catch all the missing libraries. In the end I had to install both libcap-devel and elfutils-devel. Your experience may vary depending on your install.

And finally, I built Systemtap with:


The build took a few minutes. I installed Systemtap with:

sudo make install

The whole process from fetching source, to building it, to installing it took less than five minutes, which was a pleasant surprise.

Tomorrow I’ll take a look at the example scripts, but here is a neat example I ran:

sudo stap ~/systemtap/testsuite/systemtap.examples/syscalls_by_proc.stp
Collecting data... Type Ctrl-C to exit and display results

#SysCalls  Process Name

917        thunderbird-bin
807        firefox
489        hal-system-kill
390        tpb
206        dbus-daemon

Getting started with Systemtap (Part 1)

I decided to write this up as a series of articles. I am really interested in the psychology of an individual becoming interested in, using, and hopefully participating in an open-source project. So I decided to journal my experiences with a new project. I always like to dabble in side-projects as a hobby alongside my main job, and Systemtap is so close to what I do that it became a natural choice.

So here is the first short journal entry of a newbie’s journey into getting involved with Systemtap. I’ll keep the dispatches short. A dabbler’s use case, if you will. I’ve always wished that someone would do this for Frysk; hackers (myself included) can sometimes lose the ground-level perspective. I constantly worry that our projects are too technical, too complex and too oblique to attract new developers. So as a new user of Systemtap, I thought, hey, time to do what I ask for.

I’ll reproduce a lot of the instructions from the website with some small tweaks. The website for getting started is here:

Installing Systemtap from yum on Fedora 9

To install Systemtap from yum on Fedora, as a superuser (or sudo) do:

yum install systemtap kernel-devel

We’ll also need to install the kernel debuginfo packages. It is important to stress that as your kernel updates, you also need to keep the debuginfo packages up to date. This caught me a few times, producing unreliable or inaccurate results when a mismatch occurred. To install the debuginfo:

 yum --enablerepo=updates-debuginfo install kernel-debuginfo

This differs from the command on the site. The yum command in the Getting Started Guide also enables my rawhide repo, so it installed the rawhide kernel debuginfo. Your experience may vary.

And that is it. This will install the latest release. And that’s OK. But … if I’m going to participate, I prefer to be at the leading edge. So I’ll be brave and go straight to the source. We’ll need git for this, so install that first:

yum install git

To get the source, type this into a shell in the directory where you wish to fetch the Systemtap repo:

git clone git://

Tomorrow I’ll write about building Systemtap and running the examples.

Hardware Watchpoints and Frysk 0.3

There is some beta/experimental hardware watchpoint code in Frysk 0.3. Give it a try and file bugs. Use the “watch” command from fhpd to access it. Also note that these are purely hardware watchpoints, so the supported sizes are 1, 2 and 4 bytes (and 8 bytes on x86_64). Teresa is working on some code for 0.4 that allows watchpoints to be chained together to watch larger regions.
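The chaining idea can be sketched as splitting a region into pieces that each fit one hardware watchpoint. This Python sketch is my own illustration, not Frysk’s or Teresa’s code. It assumes each piece must be naturally aligned to its size (as x86 debug registers require) and that the allowed sizes are 1, 2 and 4 bytes, plus 8 on x86_64; it greedily covers the region with the largest piece that fits. Keep in mind that x86 has only four debug address registers, so a long chain can run out of them.

```python
from typing import List, Sequence, Tuple


def split_watch_range(addr: int, length: int,
                      sizes: Sequence[int] = (8, 4, 2, 1)) -> List[Tuple[int, int]]:
    """Cover [addr, addr + length) with naturally aligned pieces whose sizes
    come from `sizes`, largest first. Returns (address, size) pairs."""
    if length <= 0:
        raise ValueError("length must be positive")
    if 1 not in sizes:
        raise ValueError("sizes must include 1 to cover any range")
    pieces = []
    end = addr + length
    while addr < end:
        for size in sorted(sizes, reverse=True):
            # A piece must start on a multiple of its size and stay in range.
            if addr % size == 0 and addr + size <= end:
                pieces.append((addr, size))
                addr += size
                break
    return pieces


# 10 bytes starting at an odd address, with the x86_64 sizes (1, 2, 4, 8).
print([(hex(a), s) for a, s in split_watch_range(0x1003, 10)])
# 32-bit x86 sizes only (no 8-byte watchpoints).
print(split_watch_range(0x2000, 8, sizes=(4, 2, 1)))
```

The first call needs four pieces (1, 4, 4 and 1 bytes) because the region starts and ends off an aligned boundary, which is exactly where the four-register limit starts to bite.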

Copyright © Phil Muldoon
