A few days ago I wrote an article about debugging the OS X kernel with VMware and GDB, using Apple’s Kernel Debugger Protocol (KDP). There is another method of debugging XNU that is worth mentioning: VMware Fusion’s built-in debug server. This is the virtual equivalent of a hardware debugger on a physical machine. According to a VMware engineer:
… when you stop execution, all cores are halted, the guest doesn’t even know that time has stopped, and you can happily single-step interrupt handlers, exceptions, etc.
This is pretty awesome, and has a few advantages over KDP:
- It’s easier to break into the debugger - you can use the normal ^C method from the GDB session, rather than having to either insert int 3’s into your code or insert breakpoints on predictable function calls like kext_alloc() when you attach the debugger at boot time.
- It’s faster - KDP works over UDP and seems to have a few timing issues where it drops packets or the target kernel doesn’t respond in time (particularly in the more complex kgmacros commands), whereas the VMware debug stub seems to be substantially faster and (so far) more reliable.
- You can debug anything from the time the VM is powered on - this means that you can debug non-DEBUG XNU kernels, along with EFI stuff, the bootloader (boot.efi), whatever you want.
Getting this going is pretty easy: it just requires adding a couple of config options to the .vmx file for your virtual machine. For example, if you have a VM called Lion.vmwarevm there’ll be a file inside called Lion.vmx which contains the configuration for the VM. Add the following lines (while the VM is not running):
debugStub.listen.guest32 = "TRUE"
debugStub.listen.guest64 = "TRUE"
The debug stub listens on the loopback interface on the Mac OS X host OS on which Fusion is running. If you want to debug from another machine (or VM) you need to enable the ‘remote’ listener in the .vmx file instead of (or as well as) the local listener:
debugStub.listen.guest32.remote = "TRUE"
debugStub.listen.guest64.remote = "TRUE"
Using this method you can connect to the debug stub from an instance of the FSF version of GDB on a Linux box.
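If you edit .vmx files often, the change is easy to script. Here is a minimal Python sketch that sets the options named above and replaces any stale copies rather than duplicating them. The function names are my own, and the code assumes the .vmx file is plain `key = "value"` text with case-insensitive keys, which is an assumption rather than something VMware documents here.

```python
from pathlib import Path

# Option names taken from the article; the ".remote" variants are only
# needed when debugging from another machine or VM.
LOCAL_KEYS = ("debugStub.listen.guest32", "debugStub.listen.guest64")
REMOTE_KEYS = tuple(key + ".remote" for key in LOCAL_KEYS)


def enable_debug_stub(vmx_text, remote=False):
    """Return vmx_text with the debug stub options set to "TRUE"."""
    wanted = LOCAL_KEYS + (REMOTE_KEYS if remote else ())
    wanted_lower = {key.lower() for key in wanted}
    kept = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key not in wanted_lower:  # drop stale copies of the keys we set
            kept.append(line)
    kept.extend('%s = "TRUE"' % key for key in wanted)
    return "\n".join(kept) + "\n"


def patch_vmx(path, remote=False):
    """Rewrite a .vmx file in place (hypothetical helper).

    Only run this while the VM is not running.
    """
    vmx = Path(path)
    vmx.write_text(enable_debug_stub(vmx.read_text(), remote))
```

Calling `patch_vmx("Lion.vmwarevm/Lion.vmx")` would enable the local listeners only; pass `remote=True` to add the remote ones as well. Running it twice leaves the file unchanged the second time.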
That’s it: start up the VM. If you’re using a VM with a DEBUG kernel and you’ve set the boot-args variable in NVRAM to contain debug=0x1, as per the previous article, you will need to attach another instance of GDB via KDP at this point and continue in that instance to let the boot process finish.
I’ve found that if you try to connect to the debug stub without loading a file to debug you get errors like this:
[New thread 1]
Remote register badly formatted: T05thread:00000001;06:10d3fc7f00000000;07:c0d2fc7f00000000;10:8a18a07d00000000;
here: 0000000;07:c0d2fc7f00000000;10:8a18a07d00000000;
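The string in that error is a stop reply from the GDB remote serial protocol: a signal number, a thread ID, then `number:value` pairs for a few registers, with each value sent as hex bytes in target byte order. The sketch below decodes it. The function names are mine, and the register names assume GDB's x86-64 numbering (6 = rbp, 7 = rsp, 16 = rip), so treat those labels as an assumption.

```python
# Assumed mapping from GDB's x86-64 register numbers to names.
AMD64_NAMES = {6: "rbp", 7: "rsp", 16: "rip"}


def parse_stop_reply(packet):
    """Split a 'T' stop-reply packet into (signal, thread, registers)."""
    if not packet.startswith("T"):
        raise ValueError("not a T stop reply")
    signal = int(packet[1:3], 16)
    thread, registers = None, {}
    for pair in packet[3:].split(";"):
        if not pair:
            continue
        key, value = pair.split(":", 1)
        if key == "thread":
            thread = int(value, 16)
            continue
        try:
            number = int(key, 16)
        except ValueError:
            continue  # some other key we do not handle in this sketch
        # Values are hex bytes in target byte order (little-endian on x86).
        raw = bytes.fromhex(value)
        registers[number] = (len(raw), int.from_bytes(raw, "little"))
    return signal, thread, registers


def describe(packet):
    """Return {register name: hex value} for the registers in the packet."""
    _, _, registers = parse_stop_reply(packet)
    return {AMD64_NAMES.get(n, "reg%d" % n): hex(v)
            for n, (_, v) in registers.items()}
```

Fed the packet above, this reports signal 5 on thread 1 with three 8-byte registers. That fits one plausible reading of the error: a GDB that has no file loaded and has not been told the target is 64-bit expects 4-byte registers, and the `here:` remainder does start partway into the upper half of the first value.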
So start up GDB with whatever you’re intending to debug. In this example, the DEBUG kernel that is installed on the VM:
$ gdb /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel
If you’re debugging a 32-bit VM on a 64-bit machine, you’ll need to set the architecture:
gdb$ set architecture i386
Or, if you are debugging 64-bit on 64-bit and have trouble connecting to the debug stub, you may need to explicitly set it to 64-bit:
gdb$ set architecture i386:x86-64
If you’re debugging a 64-bit VM, connect to the 64-bit debug stub:
gdb$ target remote localhost:8864
Or the 32-bit debug stub for a 32-bit VM:
gdb$ target remote localhost:8832
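If you switch between 32-bit and 64-bit VMs, the architecture and port pairing above can be captured in a few lines of Python that generate the GDB commands for you. This is only a sketch: the function name and the `attach.gdb` file name in the usage note are made up, and the ports and architecture strings are simply the ones used in this article.

```python
# Ports and architecture names as used in the article.
STUB_PORTS = {32: 8832, 64: 8864}
ARCHITECTURES = {32: "i386", 64: "i386:x86-64"}


def gdb_commands(bits, host="localhost"):
    """Return the GDB commands that attach to a VM's debug stub."""
    if bits not in STUB_PORTS:
        raise ValueError("bits must be 32 or 64")
    return [
        "set architecture %s" % ARCHITECTURES[bits],
        "target remote %s:%d" % (host, STUB_PORTS[bits]),
    ]


if __name__ == "__main__":
    # Print the commands for a 64-bit VM on the local host.
    print("\n".join(gdb_commands(64)))
```

Redirect the output to a file such as `attach.gdb` and pass it to GDB with `-x attach.gdb` along with the kernel image. For a remote host, pass its name or address as the second argument, assuming the ‘remote’ listener is enabled in the .vmx file.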
At this point you should be connected to the debug stub, and the VM should be paused. You’ll see a dark translucent version of the ‘play’ button used to start the VM on the VM console (indicating the VM is paused and the debugger has control), and something like this in GDB:
[New thread 1]
warning: Error 268435459 getting port names from mach_port_names
[Switching to process 1 thread 0x0]
0xffffff80008bf4c2 in tweak_crypt_group ()
gdb$
tweak_crypt_group() - heh. My VM is encrypting its disk at the moment.
Now you’re in familiar territory:
gdb$ source /Volumes/KernelDebugKit/kgmacros
Loading Kernel GDB Macros package. Type "help kgm" for more info.
gdb$ bt
#0  0xffffff7f817315b4 in ?? ()
#1  0xffffff7f8172343e in ?? ()
#2  0xffffff7f81724f68 in ?? ()
#3  0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4  0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
#5  0xffffff8000257060 in thread_select_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1793
#6  0xffffff8000256d8e in thread_select (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1728
#7  0xffffff8000258bbf in thread_block_reason (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>, parameter=0x0, reason=0x0) at sched_prim.c:2396
#8  0xffffff8000258cbc in thread_block (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>) at sched_prim.c:2415
#9  0xffffff8000227357 in ipc_mqueue_receive (mqueue=0xffffff8008854728, option=0x7000006, max_size=0xc00, rcv_timeout=0xffffffff, interruptible=0x2) at ipc_mqueue.c:698
#10 0xffffff8000237542 in mach_msg_overwrite_trap (args=0xffffff800872b804) at mach_msg.c:528
#11 0xffffff80002375b4 in mach_msg_trap (args=0xffffff800872b804) at mach_msg.c:554
#12 0xffffff8000354a01 in mach_call_munger64 (state=0xffffff800872b800) at bsd_i386.c:534
gdb$ showalltasks
task                vm_map              ipc_space           #acts  pid  process             io_policy  wq_state  command
0xffffff80067ac938  0xffffff800249ee98  0xffffff80066ebdb0     60    0  0xffffff8000cb4c20                       kernel_task
0xffffff80067ac5a0  0xffffff800249e200  0xffffff80066ebd10      3    1  0xffffff8007576820                       launchd
0xffffff80067ac208  0xffffff800249e010  0xffffff80066ebc70      1    2  0xffffff80075763d0                       launchctl
0xffffff80067ab740  0xffffff800249e108  0xffffff80066eba90      3   10  0xffffff80075756e0             2 1 0     kextd
0xffffff80067abe70  0xffffff8007003568  0xffffff80066ebbd0      3   11  0xffffff8007575f80             1 0 0     UserEventAgent
0xffffff80067abad8  0xffffff8007e692f8  0xffffff80066ebb30      3   12  0xffffff8007575b30             1 0 0     mDNSResponder
<snip>
Don’t forget you can just ^C to drop back into the debugger, just like back in the good old userland days:
gdb$ c
^C
Program received signal SIGINT, Interrupt.
0xffffff7f817315b4 in ?? ()
gdb$ bt
#0  0xffffff7f817315b4 in ?? ()
#1  0xffffff7f8172343e in ?? ()
#2  0xffffff7f81724f68 in ?? ()
#3  0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4  0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
<snip>