VMware debugging II: "Hardware" debugging

A few days ago I wrote an article about debugging the OS X kernel with VMware and GDB, using Apple’s Kernel Debugger Protocol (KDP). There is another method of debugging XNU that is worth mentioning - VMware Fusion’s built in debug server. This is the virtual equivalent of a hardware debugger on a physical machine. According to a VMware engineer:

… when you stop execution, all cores are halted, the guest doesn’t even know that time has stopped, and you can happily single-step interrupt handlers, exceptions, etc.

This is pretty awesome, and has a few advantages over KDP:

  • It’s easier to break into the debugger - you can use the normal ^C method from the GDB session, rather than having to either insert int 3’s into your code or insert breakpoints on predictable function calls like kext_alloc() when you attach the debugger at boot time.
  • It’s faster - KDP works over UDP and seems to have a few timing issues where it drops packets or the target kernel doesn’t respond in time (particularly in the more complex kgmacros commands), whereas the VMware debug stub seems to be substantially faster and (so far) more reliable.
  • You can debug anything from the time the VM is powered on - this means that you can debug non-DEBUG XNU kernels, along with EFI stuff, the bootloader (boot.efi), whatever you want.

VMware setup

Getting this going is pretty easy, it just requires a couple of config options to be added to the .vmx file for your virtual machine. For example, if you have a VM called Lion.vmwarevm there’ll be a file inside called Lion.vmx which contains the configuration for the VM. Add the following lines (while the VM is not running):

debugStub.listen.guest32 = "TRUE"
debugStub.listen.guest64 = "TRUE"

The debug stub listens on the loopback interface on the Mac OS X host OS on which Fusion is running. If you want to debug from another machine (or VM) you need to enable the ‘remote’ listener in the .vmx file instead of (or as well as) the local listener:

debugStub.listen.guest32.remote = "TRUE"
debugStub.listen.guest64.remote = "TRUE"

Using this method you can connect to the debug stub from an instance of the FSF version of GDB on a Linux box.

That’s it, start up the VM. If you’re using a VM with a DEBUG kernel and you’ve set the boot-args variable in NVRAM to contain debug=0x1, as per the previous article, you will need to attach another instance of GDB via KDP at this point and continue in that instance to let the boot process finish.

GDB

I’ve found that if you try to connect to the debug stub without loading a file to debug you get errors like this:

[New thread 1]
Remote register badly formatted: T05thread:00000001;06:10d3fc7f00000000;07:c0d2fc7f00000000;10:8a18a07d00000000;
here: 0000000;07:c0d2fc7f00000000;10:8a18a07d00000000;

So start up GDB with whatever you’re intending to debug. In this example, the DEBUG kernel that is installed on the VM:

$ gdb /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel

If you’re debugging a 32-bit VM on a 64-bit machine, you’ll need to set the architecture:

gdb$ set architecture i386

Or, if you are debugging 64-bit on 64-bit and have trouble connecting to the debug stub, you may need to explicitly set it to 64-bit:

gdb$ set architecture i386:x86-64

If you’re debugging a 64-bit VM, connect to the 64-bit debug stub:

gdb$ target remote localhost:8864

Or the 32-bit debug stub for a 32-bit VM:

gdb$ target remote localhost:8832

At this point you should be connected to the debug stub, and the VM should be paused. You’ll see a dark translucent version of the ‘play’ button used to start the VM on the VM console (indicating the VM is paused and the debugger has control), and something like this in GDB:

[New thread 1]
warning: Error 268435459 getting port names from mach_port_names
[Switching to process 1 thread 0x0]
0xffffff80008bf4c2 in tweak_crypt_group ()
gdb$

tweak_crypt_group() - heh. My VM is encrypting its disk at the moment.

Now you’re in familiar territory:

gdb$ source /Volumes/KernelDebugKit/kgmacros 
Loading Kernel GDB Macros package.  Type "help kgm" for more info.
gdb$ bt
#0  0xffffff7f817315b4 in ?? ()
#1  0xffffff7f8172343e in ?? ()
#2  0xffffff7f81724f68 in ?? ()
#3  0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4  0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
#5  0xffffff8000257060 in thread_select_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1793
#6  0xffffff8000256d8e in thread_select (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1728
#7  0xffffff8000258bbf in thread_block_reason (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>, parameter=0x0, reason=0x0) at sched_prim.c:2396
#8  0xffffff8000258cbc in thread_block (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>) at sched_prim.c:2415
#9  0xffffff8000227357 in ipc_mqueue_receive (mqueue=0xffffff8008854728, option=0x7000006, max_size=0xc00, rcv_timeout=0xffffffff, interruptible=0x2) at ipc_mqueue.c:698
#10 0xffffff8000237542 in mach_msg_overwrite_trap (args=0xffffff800872b804) at mach_msg.c:528
#11 0xffffff80002375b4 in mach_msg_trap (args=0xffffff800872b804) at mach_msg.c:554
#12 0xffffff8000354a01 in mach_call_munger64 (state=0xffffff800872b800) at bsd_i386.c:534
gdb$ showalltasks
task                vm_map              ipc_space          #acts   pid  process             io_policy    wq_state   command
0xffffff80067ac938  0xffffff800249ee98  0xffffff80066ebdb0    60     0  0xffffff8000cb4c20                          kernel_task
0xffffff80067ac5a0  0xffffff800249e200  0xffffff80066ebd10     3     1  0xffffff8007576820                          launchd
0xffffff80067ac208  0xffffff800249e010  0xffffff80066ebc70     1     2  0xffffff80075763d0                          launchctl
0xffffff80067ab740  0xffffff800249e108  0xffffff80066eba90     3    10  0xffffff80075756e0                2  1  0   kextd
0xffffff80067abe70  0xffffff8007003568  0xffffff80066ebbd0     3    11  0xffffff8007575f80                1  0  0   UserEventAgent
0xffffff80067abad8  0xffffff8007e692f8  0xffffff80066ebb30     3    12  0xffffff8007575b30                1  0  0   mDNSResponder
<snip>

Don’t forget you can just ^C to drop back into the debuggger just like back in the good old userland days:

gdb$ c
^C
Program received signal SIGINT, Interrupt.
0xffffff7f817315b4 in ?? ()
gdb$ bt
#0  0xffffff7f817315b4 in ?? ()
#1  0xffffff7f8172343e in ?? ()
#2  0xffffff7f81724f68 in ?? ()
#3  0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4  0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
<snip>

Enjoy.