Resolving kernel symbols

KXLD doesn’t like us much. He has KPIs to meet and doesn’t have time to help out shifty rootkit developers. KPIs are Kernel Programming Interfaces - lists of symbols in the kernel that KXLD (the kernel extension linker) will allow kexts to be linked against. The KPIs on which your kext depends are specified in the Info.plist file like this:

<key>OSBundleLibraries</key>
<dict>
	<key>com.apple.kpi.bsd</key>
	<string>11.0</string>
	<key>com.apple.kpi.libkern</key>
	<string>11.0</string>
	<key>com.apple.kpi.mach</key>
	<string>11.0</string>
	<key>com.apple.kpi.unsupported</key>
	<string>11.0</string>
	<key>com.apple.kpi.iokit</key>
	<string>11.0</string>
	<key>com.apple.kpi.dsep</key>
	<string>11.0</string>
</dict>
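If you want to check what a given kext declares, here’s a rough sketch that reads the same dictionary back out with Python’s standard plistlib. The function name kpi_dependencies and the cut-down sample plist are made up for illustration; only the OSBundleLibraries key and the bundle identifiers come from the plist above.

```python
import plistlib

def kpi_dependencies(info_plist_bytes):
    """Return {bundle identifier: minimum version} from OSBundleLibraries."""
    info = plistlib.loads(info_plist_bytes)
    # A kext with no declared dependencies simply lacks the key.
    return dict(info.get("OSBundleLibraries", {}))

# Hypothetical, cut-down Info.plist used only as sample input.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>CFBundleIdentifier</key>
    <string>ax.ho.kext.AllProcRocks</string>
    <key>OSBundleLibraries</key>
    <dict>
        <key>com.apple.kpi.bsd</key>
        <string>11.0</string>
        <key>com.apple.kpi.libkern</key>
        <string>11.0</string>
    </dict>
</dict>
</plist>
"""

for bundle_id, version in sorted(kpi_dependencies(SAMPLE).items()):
    print(bundle_id, version)
```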

Those bundle identifiers correspond to the CFBundleIdentifier key specified in the Info.plist files for “plug-ins” to the System.kext kernel extension. Each KPI has its own plug-in kext - for example, the com.apple.kpi.bsd symbol table lives in BSDKernel.kext. These aren’t exactly complete kexts; they’re just Mach-O binaries whose symbol tables are full of undefined symbols (the symbols themselves really reside within the kernel image), which you can see if we dump the load commands:

$ otool -l /System/Library/Extensions/System.kext/PlugIns/BSDKernel.kext/BSDKernel 
/System/Library/Extensions/System.kext/PlugIns/BSDKernel.kext/BSDKernel:
Load command 0
     cmd LC_SYMTAB
 cmdsize 24
  symoff 80
   nsyms 830
  stroff 13360
 strsize 13324
Load command 1
     cmd LC_UUID
 cmdsize 24
    uuid B171D4B0-AC45-47FC-8098-5B2F89B474E6

That’s it - just the LC_SYMTAB (symbol table) and an LC_UUID. So, how many symbols are there in the kernel image?

$ nm /mach_kernel|wc -l
   16122

Surely all the symbols in all the KPI symbol tables add up to the same number, right?

$ find /System/Library/Extensions/System.kext/PlugIns -type f|grep -v plist|xargs nm|sort|uniq|wc -l
    7677

Nope. Apple doesn’t want us to play with a whole bunch of their toys. 8445 of them. Some of them are pretty fun too :( Like allproc:

$ nm /mach_kernel|grep allproc
ffffff80008d9e40 S _allproc
$ find /System/Library/Extensions/System.kext/PlugIns -type f|grep -v plist|xargs nm|sort|uniq|grep allproc
$ 

Damn. The allproc symbol is the head of the kernel’s list (the queue(3) kind of list) of running processes. It’s what gets queried when you run ps(1) or top(1). Why do we want to find allproc? If we want to hide processes in a kernel rootkit that’s the best place to start. So, what happens if we build a kernel extension that imports allproc and try to load it?

bash-3.2# kextload AllProcRocks.kext
/Users/admin/AllProcRocks.kext failed to load - (libkern/kext) link error; check the system/kernel logs for errors or try kextutil(8).

Console says:

25/02/12 6:30:47.000 PM kernel: kxld[ax.ho.kext.AllProcRocks]: The following symbols are unresolved for this kext:
25/02/12 6:30:47.000 PM kernel: kxld[ax.ho.kext.AllProcRocks]: 	_allproc

OK, whatever.
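The gap between the two nm counts earlier can also be turned into an actual list of names. This is a minimal sketch, assuming the two nm outputs have been captured as text; symbol_names and unexported are hypothetical helper names, and the sample strings below are invented apart from the _allproc line.

```python
def symbol_names(nm_output):
    """Extract symbol names from `nm` text output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Symbol lines end in "<type letter> <name>"; the per-file headers
        # and blank lines nm prints for multiple inputs do not.
        if len(fields) >= 2 and len(fields[-2]) == 1:
            names.add(fields[-1])
    return names

def unexported(kernel_nm, kpi_nm):
    """Names present in the kernel image but in none of the KPI tables."""
    return symbol_names(kernel_nm) - symbol_names(kpi_nm)

# Invented sample input; _example_exported is not a real kernel symbol.
KERNEL_NM = (
    "ffffff80008d9e40 S _allproc\n"
    "ffffff8000200000 T _example_exported\n"
)
KPI_NM = (
    "\n"
    "/path/to/BSDKernel.kext/BSDKernel:\n"
    "                 U _example_exported\n"
)

print(sorted(unexported(KERNEL_NM, KPI_NM)))
```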

What do we do?

There are a few steps that we need to take in order to resolve symbols in the kernel (or any other Mach-O binary):

  • Find the __LINKEDIT segment - this contains an array of struct nlist_64’s which represent all the symbols in the symbol table, and an array of symbol name strings.
  • Find the LC_SYMTAB load command - this contains the offsets within the file of the symbol and string tables.
  • Calculate the position of the string table within __LINKEDIT based on the offsets in the LC_SYMTAB load command.
  • Iterate through the struct nlist_64’s in __LINKEDIT, comparing the corresponding string in the string table to the name of the symbol we’re looking for until we find it (or reach the end of the symbol table).
  • Grab the address of the symbol from the struct nlist_64 we’ve found.
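Those steps can be sketched end to end in a few lines of Python. This is not the sample kext’s code, just an illustration of the walk: read(addr, size) is a hypothetical helper that returns bytes of target memory (from a debugger, a memory dump, whatever you have), and the struct layouts are the little-endian 64-bit Mach-O ones. The sketch rebases both offsets on the __LINKEDIT segment’s fileoff, which gives the same answer as working from symoff whenever the symbol table sits at the very start of the segment.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_SYMTAB = 0x2
LC_SEGMENT_64 = 0x19

def resolve_symbol(read, base, wanted):
    """Return the n_value of `wanted` in a loaded 64-bit Mach-O image, or None.

    `read(addr, size)` is a caller-supplied (hypothetical) memory reader and
    `base` is the address of the image's mach_header_64.
    """
    magic, _, _, _, ncmds, _, _, _ = struct.unpack("<IiiIIIII", read(base, 32))
    if magic != MH_MAGIC_64:
        raise ValueError("not a 64-bit Mach-O image")

    linkedit = symtab = None
    lc = base + 32                      # load commands follow the header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack("<II", read(lc, 8))
        if cmd == LC_SEGMENT_64:
            segname, vmaddr, _, fileoff = struct.unpack("<16sQQQ", read(lc + 8, 40))
            if segname.rstrip(b"\0") == b"__LINKEDIT":
                linkedit = (vmaddr, fileoff)
        elif cmd == LC_SYMTAB:
            symtab = struct.unpack("<IIII", read(lc + 8, 16))
        lc += cmdsize
    if linkedit is None or symtab is None:
        raise ValueError("missing __LINKEDIT or LC_SYMTAB")

    vmaddr, fileoff = linkedit
    symoff, nsyms, stroff, strsize = symtab
    # File offsets become addresses once rebased onto __LINKEDIT.
    symbols = vmaddr + (symoff - fileoff)
    strtab = read(vmaddr + (stroff - fileoff), strsize)

    target = wanted.encode()
    for i in range(nsyms):
        # struct nlist_64: n_strx, n_type, n_sect, n_desc, n_value (16 bytes)
        n_strx, _, _, _, n_value = struct.unpack("<IBBHQ", read(symbols + 16 * i, 16))
        end = strtab.index(b"\0", n_strx)
        if strtab[n_strx:end] == target:
            return n_value
    return None
```

A linear scan with a string compare per entry is slow but simple; for a handful of lookups at kext load time that is usually fine.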

Parse the load commands

One easy way to look at the symbol table would be to read the kernel file on disk at /mach_kernel, but we can do better than that if we’re already in the kernel - the kernel image is loaded into memory at a known address. If we have a look at the load commands for the kernel binary:

$ otool -l /mach_kernel
/mach_kernel:
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 472
  segname __TEXT
   vmaddr 0xffffff8000200000
   vmsize 0x000000000052f000
  fileoff 0
 filesize 5435392
  maxprot 0x00000007
 initprot 0x00000005
   nsects 5
    flags 0x0
<snip>

We can see that the vmaddr field of the first segment is 0xffffff8000200000. If we fire up GDB and point it at a VM running Mac OS X (as per my previous posts here and here), we can see the start of the Mach-O header in memory at this address:

gdb$ x/xw 0xffffff8000200000
0xffffff8000200000:	0xfeedfacf

0xfeedfacf is the magic number denoting a 64-bit Mach-O image (the 32-bit version is 0xfeedface). We can actually display this as a struct if we’re using the DEBUG kernel with all the DWARF info:

gdb$ print *(struct mach_header_64 *)0xffffff8000200000
$1 = {
  magic = 0xfeedfacf, 
  cputype = 0x1000007, 
  cpusubtype = 0x3, 
  filetype = 0x2, 
  ncmds = 0x12, 
  sizeofcmds = 0x1010, 
  flags = 0x1, 
  reserved = 0x0
}
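Without the DWARF info you can still decode the header by hand, since it’s just eight little-endian 32-bit fields. A minimal sketch, with MachHeader64 and parse_mach_header_64 as made-up names and the input bytes packed from the values GDB printed above:

```python
import struct
from collections import namedtuple

MachHeader64 = namedtuple(
    "MachHeader64",
    "magic cputype cpusubtype filetype ncmds sizeofcmds flags reserved")

MH_MAGIC = 0xFEEDFACE      # 32-bit image
MH_MAGIC_64 = 0xFEEDFACF   # 64-bit image

def parse_mach_header_64(data):
    """Decode the 32-byte little-endian mach_header_64 at the start of `data`."""
    header = MachHeader64._make(struct.unpack_from("<IiiIIIII", data))
    if header.magic != MH_MAGIC_64:
        raise ValueError("unexpected magic 0x%08x" % header.magic)
    return header

# The same field values GDB printed above, packed back into raw bytes.
raw = struct.pack("<IiiIIIII", 0xFEEDFACF, 0x1000007, 0x3, 0x2, 0x12, 0x1010, 0x1, 0x0)
header = parse_mach_header_64(raw)
print("ncmds=%d sizeofcmds=%#x" % (header.ncmds, header.sizeofcmds))
```

Here cputype 0x1000007 is CPU_TYPE_X86_64 and filetype 0x2 is MH_EXECUTE; ncmds tells us how many load commands follow the header.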

The mach_header and mach_header_64 structs (along with the other Mach-O-related structs mentioned in this post) are documented in the Mach-O File Format Reference, but we aren’t particularly interested in the header at the moment. I recommend having a look at the kernel image with MachOView to get the gist of where everything is and how it’s laid out.

Directly following the Mach-O header is the first load command:

gdb$ set $mh=(struct mach_header_64 *)0xffffff8000200000
gdb$ print *(struct load_command*)((void *)$mh + sizeof(struct mach_header_64))
$6 = {
  cmd = 0x19, 
  cmdsize = 0x1d8
}

This is the load command for the __TEXT segment, the first one we saw with otool. We can cast it as a segment_command_64 in GDB and have a look:

gdb$ set $lc=((void *)$mh + sizeof(struct mach_header_64))
gdb$ print *(struct segment_command_64 *)$lc
$7 = {
  cmd = 0x19, 
  cmdsize = 0x1d8, 
  segname = "__TEXT\000\000\000\000\000\000\000\000\000", 
  vmaddr = 0xffffff8000200000, 
  vmsize = 0x8c8000, 
  fileoff = 0x0, 
  filesize = 0x8c8000, 
  maxprot = 0x7, 
  initprot = 0x5, 
  nsects = 0x5, 
  flags = 0x0
}

This isn’t the load command we are looking for, so we have to iterate through all of them until we come across one with a cmd of 0x19 (LC_SEGMENT_64) and a segname of __LINKEDIT. In the debug kernel, this happens to be located at 0xffffff8000200e68:

gdb$ set $lc=0xffffff8000200e68
gdb$ print *(struct load_command*)$lc   
$14 = {
  cmd = 0x19, 
  cmdsize = 0x48
}
gdb$ print *(struct segment_command_64*)$lc
$16 = {
  cmd = 0x19, 
  cmdsize = 0x48, 
  segname = "__LINKEDIT\000\000\000\000\000", 
  vmaddr = 0xffffff8000d08000, 
  vmsize = 0x109468, 
  fileoff = 0xaf4698, 
  filesize = 0x109468, 
  maxprot = 0x7, 
  initprot = 0x1, 
  nsects = 0x0, 
  flags = 0x0
}

Then we grab the vmaddr field from the load command, which specifies the address at which the __LINKEDIT segment’s data will be located:

gdb$ set $linkedit=((struct segment_command_64*)$lc)->vmaddr
gdb$ print $linkedit
$19 = 0xffffff8000d08000
gdb$ print *(struct nlist_64 *)$linkedit
$20 = {
  n_un = {
    n_strx = 0x68a29
  }, 
  n_type = 0xe, 
  n_sect = 0x1, 
  n_desc = 0x0, 
  n_value = 0xffffff800020a870
}

And there’s the first struct nlist_64.

As for the LC_SYMTAB load command, we just need to iterate through the load commands until we find one with the cmd field value of 0x02 (LC_SYMTAB). In this case, it’s located at 0xffffff8000200eb0:

gdb$ set $symtab=*(struct symtab_command*)0xffffff8000200eb0
gdb$ print $symtab
$23 = {
  cmd = 0x2, 
  cmdsize = 0x18, 
  symoff = 0xaf4698, 
  nsyms = 0x699d, 
  stroff = 0xb5e068, 
  strsize = 0x9fa98
}

The useful parts here are the symoff field, which specifies the offset in the file to the symbol table (here, the start of the __LINKEDIT segment - it matches the segment’s fileoff of 0xaf4698), and the stroff field, which specifies the offset in the file to the string table (somewhere in the middle of the __LINKEDIT segment). Why, you ask, did we need to find the __LINKEDIT segment as well, since we have the offsets here in the LC_SYMTAB command? If we were looking at the file on disk we wouldn’t have needed to, but the kernel image we’re inspecting has already been loaded into memory, with each segment placed at the virtual memory address specified in its load command. symoff and stroff are file offsets, so they can’t be used as memory addresses directly. They’re still useful, though: the difference between the two is the offset into the __LINKEDIT segment at which the string table starts:

gdb$ print $linkedit
$24 = 0xffffff8000d08000
gdb$ print $linkedit + ($symtab->stroff - $symtab->symoff)
$25 = 0xffffff8000d719d0
gdb$ set $strtab=$linkedit + ($symtab->stroff - $symtab->symoff)
gdb$ x/16s $strtab
0xffffff8000d719d0:	 ""
0xffffff8000d719d1:	 ""
0xffffff8000d719d2:	 ""
0xffffff8000d719d3:	 ""
0xffffff8000d719d4:	 ".constructors_used"
0xffffff8000d719e7:	 ".destructors_used"
0xffffff8000d719f9:	 "_AddFileExtent"
0xffffff8000d71a08:	 "_AllocateNode"
0xffffff8000d71a16:	 "_Assert"
0xffffff8000d71a1e:	 "_BF_decrypt"
0xffffff8000d71a2a:	 "_BF_encrypt"
0xffffff8000d71a36:	 "_BF_set_key"
0xffffff8000d71a42:	 "_BTClosePath"
0xffffff8000d71a4f:	 "_BTDeleteRecord"
0xffffff8000d71a5f:	 "_BTFlushPath"
0xffffff8000d71a6c:	 "_BTGetInformation"
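The same arithmetic outside GDB, using the numbers from the two load commands above. string_table_address is just a name picked for this sketch, and it assumes (as is the case in this kernel) that the symbol table starts at the beginning of __LINKEDIT:

```python
NLIST_64_SIZE = 16  # sizeof(struct nlist_64)

def string_table_address(linkedit_vmaddr, symoff, stroff):
    """Address of the string table in a loaded image.

    Assumes the symbol table sits at the very start of __LINKEDIT, so the
    distance between the two file offsets is also the distance in memory.
    """
    return linkedit_vmaddr + (stroff - symoff)

linkedit = 0xFFFFFF8000D08000   # __LINKEDIT vmaddr
symoff = 0xAF4698               # LC_SYMTAB symoff
stroff = 0xB5E068               # LC_SYMTAB stroff
nsyms = 0x699D                  # LC_SYMTAB nsyms

strtab = string_table_address(linkedit, symoff, stroff)
print(hex(strtab))  # 0xffffff8000d719d0, matching $25 above

# Sanity check: the string table begins right after the last nlist_64.
print(stroff - symoff == nsyms * NLIST_64_SIZE)
```

The sanity check is a handy one to keep around: if the gap between symoff and stroff isn’t nsyms * 16, something else lives between the two tables and the offsets deserve a second look.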

Actually finding some symbols

Now that we know where the symbol table and string table live, we can get on to the srs bznz. So, let’s find that damn _allproc symbol we need. Have a look at that first struct nlist_64 again:

gdb$ print *(struct nlist_64 *)$linkedit
$28 = {
  n_un = {
    n_strx = 0x68a29
  }, 
  n_type = 0xe, 
  n_sect = 0x1, 
  n_desc = 0x0, 
  n_value = 0xffffff800020a870
}

The n_un.n_strx field there specifies the offset into the string table at which the string corresponding to this symbol exists. If we add that offset to the address at which the string table starts, we’ll see the symbol name:

gdb$ x/s $strtab + ((struct nlist_64 *)$linkedit)->n_un.n_strx
0xffffff8000dda3f9:	 "_ps_vnode_trim_init"

Now all we need to do is iterate through all the struct nlist_64’s until we find the one with the matching name. In this case it’s at 0xffffff8000d482a0:

gdb$ set $nlist=0xffffff8000d482a0
gdb$ print *(struct nlist_64*)$nlist
$31 = {
  n_un = {
    n_strx = 0x35a07
  }, 
  n_type = 0xf, 
  n_sect = 0xb, 
  n_desc = 0x0, 
  n_value = 0xffffff8000cb5ca0
}
gdb$ x/s $strtab + ((struct nlist_64 *)$nlist)->n_un.n_strx
0xffffff8000da73d7:	 "_allproc"

The n_value field there (0xffffff8000cb5ca0) is the virtual memory address at which the symbol’s data/code exists. _allproc is not a great example as it’s a piece of data, rather than a function, so let’s try it with a function:

gdb$ set $nlist=0xffffff8000d618f0
gdb$ print *(struct nlist_64*)$nlist
$32 = {
  n_un = {
    n_strx = 0x52ed3
  }, 
  n_type = 0xf, 
  n_sect = 0x1, 
  n_desc = 0x0, 
  n_value = 0xffffff80007cceb0
}
gdb$ x/s $strtab + ((struct nlist_64 *)$nlist)->n_un.n_strx
0xffffff8000dc48a3:	 "_proc_lock"

If we disassemble a few instructions at that address:

gdb$ x/12i 0xffffff80007cceb0
0xffffff80007cceb0 <proc_lock>:	push   rbp
0xffffff80007cceb1 <proc_lock+1>:	mov    rbp,rsp
0xffffff80007cceb4 <proc_lock+4>:	sub    rsp,0x10
0xffffff80007cceb8 <proc_lock+8>:	mov    QWORD PTR [rbp-0x8],rdi
0xffffff80007ccebc <proc_lock+12>:	mov    rax,QWORD PTR [rbp-0x8]
0xffffff80007ccec0 <proc_lock+16>:	mov    rcx,0x50
0xffffff80007cceca <proc_lock+26>:	add    rax,rcx
0xffffff80007ccecd <proc_lock+29>:	mov    rdi,rax
0xffffff80007cced0 <proc_lock+32>:	call   0xffffff800035d270 <lck_mtx_lock>
0xffffff80007cced5 <proc_lock+37>:	add    rsp,0x10
0xffffff80007cced9 <proc_lock+41>:	pop    rbp
0xffffff80007cceda <proc_lock+42>:	ret

We can see that GDB has resolved the symbol for us, and we’re right on the money.
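The n_type values in the nlist_64 dumps above (0xe for _ps_vnode_trim_init, 0xf for _allproc and _proc_lock) are bit fields rather than plain enums. Here’s a small sketch that unpacks a raw 16-byte entry and decodes them, using the mask values from <mach-o/nlist.h>; describe_nlist_64 is a made-up helper name.

```python
import struct

N_STAB = 0xE0   # any of these bits set: debugging (stab) entry
N_PEXT = 0x10   # private external
N_TYPE = 0x0E   # mask for the type bits
N_EXT = 0x01    # external symbol
TYPES = {0x0: "N_UNDF", 0x2: "N_ABS", 0xE: "N_SECT", 0xC: "N_PBUD", 0xA: "N_INDR"}

def describe_nlist_64(raw):
    """Decode one little-endian struct nlist_64 (16 bytes)."""
    n_strx, n_type, n_sect, n_desc, n_value = struct.unpack("<IBBHQ", raw)
    return {
        "n_strx": n_strx,
        "type": TYPES.get(n_type & N_TYPE, "unknown"),
        "external": bool(n_type & N_EXT),
        "private_external": bool(n_type & N_PEXT),
        "stab": bool(n_type & N_STAB),
        "n_sect": n_sect,
        "n_value": n_value,
    }

# The _allproc entry from above, packed back into raw bytes.
allproc = struct.pack("<IBBHQ", 0x35A07, 0xF, 0xB, 0x0, 0xFFFFFF8000CB5CA0)
info = describe_nlist_64(allproc)
print(info["type"], info["external"], hex(info["n_value"]))
```

So 0xf is a symbol defined in a section (N_SECT) and marked external, while 0xe is the same thing without the external bit.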

Sample code

I’ve posted an example kernel extension on github to check out. When we load it with kextload KernelResolver.kext, we should see something like this on the console:

25/02/12 8:06:49.000 PM kernel: [+] _allproc @ 0xffffff8000cb5ca0
25/02/12 8:06:49.000 PM kernel: [+] _proc_lock @ 0xffffff80007cceb0
25/02/12 8:06:49.000 PM kernel: [+] _kauth_cred_setuidgid @ 0xffffff80007abbb0
25/02/12 8:06:49.000 PM kernel: [+] __ZN6OSKext13loadFromMkextEjPcjPS0_Pj @ 0xffffff80008f8606

Update: It was brought to my attention that I was using a debug kernel in these examples. Just to be clear - the method described in this post, as well as the sample code, works on a non-debug, default install >=10.7.0 (xnu-1699.22.73) kernel as well, but the GDB inspection probably won’t (unless you load up the struct definitions etc, as they are all stored in the DEBUG kernel). The debug kernel contains every symbol from the source, whereas many symbols are stripped from the distribution kernel (e.g. sLoadedKexts). Previously (before 10.7), the kernel would write out the symbol table to a file on disk and jettison it from memory altogether. I suppose when kernel extensions were loaded, kextd or kextload would resolve symbols from within that on-disk symbol table or from the on-disk kernel image. These days the symbol table memory is just marked as pageable, so it can potentially get paged out if the system is short of memory.

I hope somebody finds this useful. Shoot me an email or get at me on twitter if you have any questions. I’ll probably sort out comments for this blog at some point, but I cbf at the moment.


VMware debugging II: "Hardware" debugging

A few days ago I wrote an article about debugging the OS X kernel with VMware and GDB, using Apple’s Kernel Debugger Protocol (KDP). There is another method of debugging XNU that is worth mentioning - VMware Fusion’s built-in debug server. This is the virtual equivalent of a hardware debugger on a physical machine. According to a VMware engineer:

… when you stop execution, all cores are halted, the guest doesn’t even know that time has stopped, and you can happily single-step interrupt handlers, exceptions, etc.

This is pretty awesome, and has a few advantages over KDP:

  • It’s easier to break into the debugger - you can use the normal ^C method from the GDB session, rather than having to either insert int 3’s into your code or insert breakpoints on predictable function calls like kext_alloc() when you attach the debugger at boot time.
  • It’s faster - KDP works over UDP and seems to have a few timing issues where it drops packets or the target kernel doesn’t respond in time (particularly in the more complex kgmacros commands), whereas the VMware debug stub seems to be substantially faster and (so far) more reliable.
  • You can debug anything from the time the VM is powered on - this means that you can debug non-DEBUG XNU kernels, along with EFI stuff, the bootloader (boot.efi), whatever you want.

VMware setup

Getting this going is pretty easy - it just requires a couple of config options to be added to the .vmx file for your virtual machine. For example, if you have a VM called Lion.vmwarevm there’ll be a file inside called Lion.vmx which contains the configuration for the VM. Add the following lines (while the VM is not running):

debugStub.listen.guest32 = "TRUE"
debugStub.listen.guest64 = "TRUE"

The debug stub listens on the loopback interface on the Mac OS X host OS on which Fusion is running. If you want to debug from another machine (or VM) you need to enable the ‘remote’ listener in the .vmx file instead of (or as well as) the local listener:

debugStub.listen.guest32.remote = "TRUE"
debugStub.listen.guest64.remote = "TRUE"

Using this method you can connect to the debug stub from an instance of the FSF version of GDB on a Linux box.
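If you set this up on a lot of VMs, editing the .vmx by hand gets old. A minimal sketch of doing it programmatically, operating on the file’s text so it can be run twice without duplicating lines; enable_debug_stub is a hypothetical helper, and the only keys it knows about are the four shown above.

```python
DEBUG_STUB_KEYS = (
    "debugStub.listen.guest32",
    "debugStub.listen.guest64",
)

def enable_debug_stub(vmx_text, remote=False):
    """Return `vmx_text` with the debug stub listeners set to "TRUE".

    With remote=True the '.remote' variants are set instead of the
    loopback-only ones.
    """
    suffix = ".remote" if remote else ""
    wanted = {key + suffix: "TRUE" for key in DEBUG_STUB_KEYS}
    lines = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key not in wanted:
            lines.append(line)   # keep everything except stale copies
    lines.extend('%s = "%s"' % item for item in wanted.items())
    return "\n".join(lines) + "\n"

# Sample input only; a real .vmx has many more settings.
sample = 'displayName = "Lion"\ndebugStub.listen.guest64 = "FALSE"\n'
print(enable_debug_stub(sample), end="")
```

As with the manual edit, do this while the VM is not running.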

That’s it, start up the VM. If you’re using a VM with a DEBUG kernel and you’ve set the boot-args variable in NVRAM to contain debug=0x1, as per the previous article, you will need to attach another instance of GDB via KDP at this point and continue in that instance to let the boot process finish.

GDB

I’ve found that if you try to connect to the debug stub without loading a file to debug you get errors like this:

[New thread 1]
Remote register badly formatted: T05thread:00000001;06:10d3fc7f00000000;07:c0d2fc7f00000000;10:8a18a07d00000000;
here: 0000000;07:c0d2fc7f00000000;10:8a18a07d00000000;

So start up GDB with whatever you’re intending to debug. In this example, the DEBUG kernel that is installed on the VM:

$ gdb /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel

If you’re debugging a 32-bit VM on a 64-bit machine, you’ll need to set the architecture:

gdb$ set architecture i386

Or, if you are debugging 64-bit on 64-bit and have trouble connecting to the debug stub, you may need to explicitly set it to 64-bit:

gdb$ set architecture i386:x86-64

If you’re debugging a 64-bit VM, connect to the 64-bit debug stub:

gdb$ target remote localhost:8864

Or the 32-bit debug stub for a 32-bit VM:

gdb$ target remote localhost:8832
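The architecture and port pairing is easy to mix up, so here’s a tiny sketch that spits out the right two GDB commands for a given guest. debug_stub_commands is a made-up helper; the ports and architecture names are the ones used above, and the output could be saved to a file and loaded with GDB’s source command.

```python
STUB_PORTS = {32: 8832, 64: 8864}
ARCHITECTURES = {32: "i386", 64: "i386:x86-64"}

def debug_stub_commands(guest_bits, host="localhost"):
    """GDB commands to attach to the VMware debug stub of a 32- or 64-bit guest."""
    if guest_bits not in STUB_PORTS:
        raise ValueError("guest_bits must be 32 or 64")
    return [
        "set architecture %s" % ARCHITECTURES[guest_bits],
        "target remote %s:%d" % (host, STUB_PORTS[guest_bits]),
    ]

print("\n".join(debug_stub_commands(64)))
```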

At this point you should be connected to the debug stub, and the VM should be paused. You’ll see a dark translucent version of the ‘play’ button used to start the VM on the VM console (indicating the VM is paused and the debugger has control), and something like this in GDB:

[New thread 1]
warning: Error 268435459 getting port names from mach_port_names
[Switching to process 1 thread 0x0]
0xffffff80008bf4c2 in tweak_crypt_group ()
gdb$

tweak_crypt_group() - heh. My VM is encrypting its disk at the moment.

Now you’re in familiar territory:

gdb$ source /Volumes/KernelDebugKit/kgmacros 
Loading Kernel GDB Macros package.  Type "help kgm" for more info.
gdb$ bt
#0  0xffffff7f817315b4 in ?? ()
#1  0xffffff7f8172343e in ?? ()
#2  0xffffff7f81724f68 in ?? ()
#3  0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4  0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
#5  0xffffff8000257060 in thread_select_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1793
#6  0xffffff8000256d8e in thread_select (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1728
#7  0xffffff8000258bbf in thread_block_reason (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>, parameter=0x0, reason=0x0) at sched_prim.c:2396
#8  0xffffff8000258cbc in thread_block (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>) at sched_prim.c:2415
#9  0xffffff8000227357 in ipc_mqueue_receive (mqueue=0xffffff8008854728, option=0x7000006, max_size=0xc00, rcv_timeout=0xffffffff, interruptible=0x2) at ipc_mqueue.c:698
#10 0xffffff8000237542 in mach_msg_overwrite_trap (args=0xffffff800872b804) at mach_msg.c:528
#11 0xffffff80002375b4 in mach_msg_trap (args=0xffffff800872b804) at mach_msg.c:554
#12 0xffffff8000354a01 in mach_call_munger64 (state=0xffffff800872b800) at bsd_i386.c:534
gdb$ showalltasks
task                vm_map              ipc_space          #acts   pid  process             io_policy    wq_state   command
0xffffff80067ac938  0xffffff800249ee98  0xffffff80066ebdb0    60     0  0xffffff8000cb4c20                          kernel_task
0xffffff80067ac5a0  0xffffff800249e200  0xffffff80066ebd10     3     1  0xffffff8007576820                          launchd
0xffffff80067ac208  0xffffff800249e010  0xffffff80066ebc70     1     2  0xffffff80075763d0                          launchctl
0xffffff80067ab740  0xffffff800249e108  0xffffff80066eba90     3    10  0xffffff80075756e0                2  1  0   kextd
0xffffff80067abe70  0xffffff8007003568  0xffffff80066ebbd0     3    11  0xffffff8007575f80                1  0  0   UserEventAgent
0xffffff80067abad8  0xffffff8007e692f8  0xffffff80066ebb30     3    12  0xffffff8007575b30                1  0  0   mDNSResponder
<snip>

Don’t forget you can just ^C to drop back into the debugger, just like back in the good old userland days:

gdb$ c
^C
Program received signal SIGINT, Interrupt.
0xffffff7f817315b4 in ?? ()
gdb$ bt
#0  0xffffff7f817315b4 in ?? ()
#1  0xffffff7f8172343e in ?? ()
#2  0xffffff7f81724f68 in ?? ()
#3  0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4  0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
<snip>

Enjoy.


Debugging the Mac OS X kernel with VMware and GDB

Edit 13 July 2013: I’ve made a couple of updates to this post to clarify a couple of things and resolve issues people have had.

fG! did a great write up here on how to set up two-machine debugging with VMware on Leopard a couple of years ago, but as a few things have changed since then and I will probably refer to this topic in future posts I thought it was worth revisiting.

Debugging kernel extensions can be a bit of a pain. printf()-debugging is the worst, and being in kernel-land, it might not be immediately obvious how to go about debugging your (or other people’s) code. Apple has long provided methods for kernel debugging via the Kernel Debugger Protocol (KDP), along with ddb, the in-kernel serial debugger. KDP is implemented in-kernel by an instance of IOKernelDebugger, and allows you to connect to the debug stub from an instance of gdb (Apple’s outdated fork only AFAIK) running on another machine connected via FireWire or Ethernet. ddb can be used to debug the running kernel from the target machine itself, but is pretty low-level and arcane. Apple suggests in the Kernel Programming Guide that you are better off using gdb for most tasks, so that’s what we’ll do.

Enter VMware

We don’t really want to use two physical machines for debugging, because who the hell uses physical boxes these days when VMs will do the job? With the release of Mac OS X 10.7 (Lion), Apple changed the EULA to allow running virtualised instances of Lion on top of an instance running on bare metal. Prior to this, only the “server” version of Mac OS X was allowed to be virtualised, and VMware ‘prevented’ the client version from being installed through some hardcoded logic in vmware-vmx (which some sneaky hackers patched). VMware Fusion 4 introduced the ability to install Mac OS X 10.7 into a VM without any dodgy hacks, just by choosing the Install Mac OS X Lion.app bundle as the installation disc.

So, the first step of the process is: install yourself a Mac OS X VM as per the VMware documentation.

Edit 13 July 2013: Once you’re done it’s probably a good idea to take a snapshot of your VM in case there are problems installing the debug kernel. Generally it’s not a problem, but it’s annoying to roll back and much easier to use a VMware snapshot.

Install the debug kernel

Once we’ve got our VM installed, we need to install the Kernel Debug Kit. This contains a version of the XNU kernel built with the DEBUG flag set, which includes the debug stubs for KDP and ddb, and a second DEBUG version with a full symbol table to load in GDB so we can use breakpoints on symbol names and not go insane. The debug kits used to live here, but it seems Apple decided they only want ADC members to be able to access them, so now they’re here (requires ADC login). Download the appropriate version for the target kernel you’re debugging in the VM (not necessarily the same as the kernel version on your host debugger machine). In this case I’m using Kernel Debug Kit 10.7.3 build 11D50. Copy this image up to the target VM, and install the debug kernel as per the instructions in the readme file:

macvm$ sudo -s
macvm# cd /
macvm# ditto /Volumes/KernelDebugKit/DEBUG_Kernel/System.kext /System/Library/Extensions/System.kext
macvm# cp -r /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel* /
macvm# chown -R root:wheel /System/Library/Extensions/System.kext /mach_kernel*
macvm# chmod -R g-w /System/Library/Extensions/System.kext /mach_kernel*
macvm# touch /System/Library/Extensions
macvm# shutdown -r now

Hopefully your VM has successfully booted with the debug kernel and no magic blue smoke has been let out.

Edit 13 July 2013: If your VM has panicked at boot time make sure you’ve allocated at least 4GB of RAM to the VM or it will not boot on newer OS X versions.

Next we need to set the kernel boot arguments to tell it to wait for a debugger connection at boot time. There are other options but, as fG! said previously, there isn’t an obvious way to generate an NMI within VMware (I haven’t really looked further into this - if there is I’d like to hear about it). In VMware Fusion 4, the proper NVRAM support means we can specify normal boot-args in NVRAM rather than the old com.apple.Boot.plist, by using the nvram utility on the target VM like this:

macvm# nvram boot-args="-v debug=0x1"

Now we’ll do a bit of config on the debug host, then reboot the VM.
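The debug= value is a bitmask, and 0x1 is just the “halt at boot and wait for a debugger” bit. A small sketch for decoding a boot-args string follows; the flag names and values are the ones I know from Apple’s kernel debugging documentation, so treat the table as an assumption and check it against the docs for your kernel version. debug_value and decode_debug_flags are made-up helper names.

```python
# Assumed flag table (from Apple's kernel debugging docs; verify for your kernel).
DEBUG_FLAGS = [
    (0x001, "DB_HALT"),         # halt at boot and wait for debugger attach
    (0x002, "DB_PRT"),          # send kernel printf output to the console
    (0x004, "DB_NMI"),          # drop into the debugger on NMI
    (0x008, "DB_KPRT"),         # send kprintf output to the serial port
    (0x010, "DB_KDB"),          # make ddb the default debugger
    (0x020, "DB_SLOG"),         # log diagnostic info to the system log
    (0x040, "DB_ARP"),          # allow the debugger to ARP and route
    (0x080, "DB_KDP_BP_DIS"),   # support old versions of gdb
    (0x100, "DB_LOG_PI_SCRN"),  # disable the graphical panic dialog
]

def debug_value(boot_args):
    """Pull the numeric debug= value out of a boot-args string (0 if absent)."""
    for token in boot_args.split():
        if token.startswith("debug="):
            return int(token[len("debug="):], 0)
    return 0

def decode_debug_flags(value):
    return [name for bit, name in DEBUG_FLAGS if value & bit]

print(decode_debug_flags(debug_value("-v debug=0x1")))
```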

Debug host config

Traditionally, two-machine debugging would use either FireWire or Ethernet. We can simulate Ethernet with the VMware network bridging.

Edit 13 July 2013: With newer versions of OS X (I’m not sure exactly when they introduced this but it definitely works on 10.8.4) you don’t actually need to do this static ARP trick any more. When the VM boots it will stop at “Waiting for remote debugger connection” after telling you its MAC and IP address. You should be able to skip the static ARP and just kdp-reattach (as below) directly to the IP address displayed here.

Grab the MAC address and IP address of your VM:

macvm$ ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	options=2b<RXCSUM,TXCSUM,VLAN_HWTAGGING,TSO4>
	ether 00:0c:29:d6:df:02 
	inet6 fe80::20c:29ff:fed6:df02%en0 prefixlen 64 scopeid 0x4 
	inet 10.0.0.15 netmask 0xffffff00 broadcast 10.0.0.255
	media: autoselect (1000baseT <full-duplex>)
	status: active

And back on your debug host, add a static ARP entry for the VM:

debughost# arp -s 10.0.0.15 0:c:29:d6:df:2
debughost# arp 10.0.0.15
macvm (10.0.0.15) at 0:c:29:d6:df:2 on en0 permanent [ethernet]

I also have an /etc/hosts entry for the VM, hence the hostname macvm.
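If you rebuild VMs often, the arp line can be generated straight from the ifconfig output. A rough sketch, with static_arp_command as a made-up helper; it keeps the MAC exactly as ifconfig prints it (leading zeros included), which arp -s should accept just as well as the shortened form shown above.

```python
import re

def static_arp_command(ifconfig_output):
    """Build an `arp -s` command from the ether/inet lines of ifconfig output."""
    mac = re.search(r"^\s*ether\s+([0-9a-fA-F:]{11,17})", ifconfig_output, re.M)
    # "inet6" lines don't match: the pattern needs whitespace right after "inet".
    ip = re.search(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)", ifconfig_output, re.M)
    if not (mac and ip):
        raise ValueError("no ether/inet line found")
    return "arp -s %s %s" % (ip.group(1), mac.group(1))

# Trimmed copy of the ifconfig output above.
IFCONFIG = (
    "en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n"
    "\tether 00:0c:29:d6:df:02 \n"
    "\tinet6 fe80::20c:29ff:fed6:df02%en0 prefixlen 64 scopeid 0x4 \n"
    "\tinet 10.0.0.15 netmask 0xffffff00 broadcast 10.0.0.255\n"
)

print(static_arp_command(IFCONFIG))
```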

Now we should be able to reboot the VM and it will pause waiting for the debugger connection at the start of the boot process. It used to actually say Waiting for debugger connection… or something similar in previous kernel versions, but it seems to pause after [PCI configuration begin] on 10.7.

Fire up GDB

Now it’s time to actually start GDB and connect to the KDP debug stub. Assuming you’ve just mounted the Kernel Debug Kit dmg file, the following paths should be correct. On the debug host machine:

$ gdb /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel
GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Thu Nov  3 21:59:02 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...

This is contrary to the instructions in the readme file for the Kernel Debug Kit, which tells you to target /Volumes/KernelDebugKit/mach_kernel with gdb. I haven’t been able to get this kernel to work correctly - symbols are not looked up properly and lots of addresses seem to be wrong, resulting in the kgmacros stuff not working, and breakpoints being set at the wrong addresses. If you load the kernel in the DEBUG_Kernel directory it works OK.

Next, source the kgmacros file - this contains a bunch of GDB macros that make dealing with kernel introspection and debugging much easier (particularly when you want to start looking at stuff like the virtual memory subsystem, and other fun stuff):

gdb$ source /Volumes/KernelDebugKit/kgmacros 
Loading Kernel GDB Macros package.  Type "help kgm" for more info.

Note: if you’re attaching to a kernel running on a different arch (ie. you created a 32-bit VM on a 64-bit machine), you’ll need to use the --arch flag:

The --arch=i386 option allows you to use a system running the 64-bit kernel to connect to a target running the 32-bit kernel. The --arch=x86_64 option allows you to go the other direction.

Now we attach to the debug target machine:

gdb$ kdp-reattach 10.0.0.15
Connected.

Edit 13 July 2013: If you’re using a recent OS X you can kdp-reattach to the IP address that was printed when the debug kernel paused waiting for the debugger.

You can also attach using target remote-kdp and attach 10.0.0.15. Allow the kernel to continue execution:

gdb$ c

At this point the disk icon in VMware should be going blue with activity, and the VM should continue booting as normal.

Breaking into the debugger

Unfortunately, we can’t use the normal method of hitting ^C in the debugger to pause execution, so we have to rely on software breakpoints. The method fG! initially suggested was to break on tcp_connect() or something similar, so you can drop into the debugger by attempting to telnet somewhere. This proves to be a bit cumbersome in Lion with all the fancy (scary) network autodetect stuff - connections going out from agents all over the place means constantly dropping into the debugger.

The method that I have primarily used is to set a breakpoint on the kext_alloc() function. This is called once during the initialisation of a kernel extension, so it can be a reasonably useful point at which to break if you want to debug the initialisation of the kext, and a good on-demand breakpoint for general kernel memory inspection.

Edit 13 July 2013: @chicagoben pointed me at a simple method of replicating the behaviour of an NMI and dropping into the debugger using the technique in this handy kernel module.

Breaking on kext_alloc() (after setting the breakpoint with break kext_alloc and continuing):

Breakpoint 1, kext_alloc (_addr=0xffffff804650b5f0, size=0x3000, fixed=0x0) at kext_alloc.c:107
107	in kext_alloc.c

And getting a stack trace:

gdb$ bt
#0  kext_alloc (_addr=0xffffff804650b5f0, size=0x3000, fixed=0x0) at kext_alloc.c:107
#1  0xffffff80008f4166 in kern_allocate (size=0x3000, flags=0xffffff804650b664, user_data=0xffffff80096f9880) at OSKext.cpp:408
#2  0xffffff8000922874 in allocate_kext (context=0xffffff800af06420, callback_data=0xffffff80096f9880, vmaddr_out=0xffffff804650b710, vmsize_out=0xffffff804650b708, linked_object_alloc_out=0xffffff804650b6f8) at kxld.c:468
#3  0xffffff8000921e69 in kxld_link_file (context=0xffffff800af06420, file=0xffffff8036641000 "????\a", size=0x2600, name=0xffffff8007e14a90 "ax.ho.kext.DebugTest", callback_data=0xffffff80096f9880, dependencies=0xffffff80091e4a60, ndependencies=0x6, linked_object_out=0xffffff804650b8f8, kmod_info_kern=0xffffff80096f98c8) at kxld.c:273
#4  0xffffff80008f0b55 in OSKext::loadExecutable (this=0xffffff80096f9880) at OSKext.cpp:4751
#5  0xffffff80008f3cc4 in OSKext::load (this=0xffffff80096f9880, startOpt=0x0, startMatchingOpt=0x0, personalityNames=0x0) at OSKext.cpp:4420
#6  0xffffff80008f741b in OSKext::loadKextWithIdentifier (kextIdentifier=0xffffff8007e1adf0, allowDeferFlag=0x0, delayAutounloadFlag=0x0, startOpt=0x0, startMatchingOpt=0x0, personalityNames=0x0) at OSKext.cpp:4184
#7  0xffffff80008f8c91 in OSKext::loadFromMkext (clientLogFilter=0x0, mkextBuffer=0xffffff8046362400 "MKXTMOSX", mkextBufferLength=0x2da8, logInfoOut=0xffffff804650bc30, logInfoLengthOut=0xffffff804650bc2c) at OSKext.cpp:3271
#8  0xffffff8000909f32 in kext_request (hostPriv=0xffffff8000c8bec0, clientLogSpec=0x0, requestIn=0xffffff80075c9d30, requestLengthIn=0x2da8, responseOut=0xffffff800a976918, responseLengthOut=0xffffff800a976940, logDataOut=0xffffff800a976928, logDataLengthOut=0xffffff800a976944, op_result=0xffffff800a976948) at OSKextLib.cpp:281
#9  0xffffff800028d9ab in _Xkext_request (InHeadP=0xffffff800abbec38, OutHeadP=0xffffff800a9768f4) at host_priv_server.c:5961
#10 0xffffff80002443d2 in ipc_kobject_server (request=0xffffff800abbebc0) at ipc_kobject.c:339
#11 0xffffff8000221570 in ipc_kmsg_send (kmsg=0xffffff800abbebc0, option=0x0, send_timeout=0x0) at ipc_kmsg.c:1376
#12 0xffffff8000237393 in mach_msg_overwrite_trap (args=0xffffff80067c65a4) at mach_msg.c:487
#13 0xffffff80002375b4 in mach_msg_trap (args=0xffffff80067c65a4) at mach_msg.c:554
#14 0xffffff8000354a01 in mach_call_munger64 (state=0xffffff80067c65a0) at bsd_i386.c:534

If you’re debugging a kernel extension that you’re writing yourself (or have the source for), a better way to drop into the debugger is to put an int 3 (software breakpoint) in your code at the point where you want to break, like this:

kern_return_t DebugTest_start(kmod_info_t * ki, void *d)
{
    printf("hurr\n");
    asm("int $3");
    derp();
    return KERN_SUCCESS;
}
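
If you don’t want that trap to end up in a build that runs without a debugger attached, one option is to hide it behind a macro. This is only a sketch: DEBUG_BREAK, DEBUGTEST_BREAK and debugtest_start_body are names made up for illustration, not anything from Apple’s headers, and the function is a userland stand-in for the body of DebugTest_start.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of any Apple header): expands to a software
 * breakpoint only when built with -DDEBUGTEST_BREAK on x86, and to nothing
 * otherwise, so ordinary builds never trap.
 */
#if defined(DEBUGTEST_BREAK) && (defined(__x86_64__) || defined(__i386__))
#define DEBUG_BREAK() __asm__ volatile("int $3")
#else
#define DEBUG_BREAK() do { } while (0)
#endif

/* Stand-in for the body of DebugTest_start(); returns 0, the value of KERN_SUCCESS. */
static int debugtest_start_body(void)
{
    printf("hurr\n");
    DEBUG_BREAK();   /* no-op unless DEBUGTEST_BREAK is defined */
    return 0;
}
```

The rest of this walkthrough uses the unconditional asm("int $3") version from the original listing.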

Now when we load this kext we get dropped into the debugger:

Program received signal SIGTRAP, Trace/breakpoint trap.
0xffffff7f80b2af12 in ?? ()

The call stack at this point looks somewhat similar to before, passing through the OSKext class:

gdb$ bt
#0  0xffffff7f80b27f12 in ?? ()
#1  0xffffff80008eebb4 in OSKext::start (this=0xffffff8007d37400, startDependenciesFlag=0x1) at OSKext.cpp:5456
#2  0xffffff80008f3e97 in OSKext::load (this=0xffffff8007d37400, startOpt=0x0, startMatchingOpt=0x0, personalityNames=0x0) at OSKext.cpp:4475
#3  0xffffff80008f741b in OSKext::loadKextWithIdentifier (kextIdentifier=0xffffff80068955b0, allowDeferFlag=0x0, delayAutounloadFlag=0x0, startOpt=0x0, startMatchingOpt=0x0, personalityNames=0x0) at OSKext.cpp:4184
#4  0xffffff80008f8c91 in OSKext::loadFromMkext (clientLogFilter=0x0, mkextBuffer=0xffffff804623e400 "MKXTMOSX", mkextBufferLength=0x2da8, logInfoOut=0xffffff8045c23c30, logInfoLengthOut=0xffffff8045c23c2c) at OSKext.cpp:3271
<snip>

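And we can disassemble the code at and after the breakpoint. The address GDB reports is that of the instruction following the one-byte int3, hence the - 1 (the upper part of the address varies between the captures shown here; the 0xf12 page offset is what stays the same):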

gdb$ x/11i 0xffffff7f80b2df12 - 1
0xffffff7f80b2df11:	int3   
0xffffff7f80b2df12:	xor    cl,cl
0xffffff7f80b2df14:	mov    al,cl
0xffffff7f80b2df16:	call   0xffffff7f80b2df70
0xffffff7f80b2df1b:	mov    DWORD PTR [rbp-0x18],0x0
0xffffff7f80b2df22:	mov    eax,DWORD PTR [rbp-0x18]
0xffffff7f80b2df25:	mov    DWORD PTR [rbp-0x14],eax
0xffffff7f80b2df28:	mov    eax,DWORD PTR [rbp-0x14]
0xffffff7f80b2df2b:	add    rsp,0x20
0xffffff7f80b2df2f:	pop    rbp
0xffffff7f80b2df30:	ret

This corresponds to the following code from the binary (extracted using otool -tv):

0000000000000f11	int	$0x3
0000000000000f12	xorb	%cl,%cl
0000000000000f14	movb	%cl,%al
0000000000000f16	callq	0x00000f70
0000000000000f1b	movl	$0x00000000,0xe8(%rbp)
0000000000000f22	movl	0xe8(%rbp),%eax
0000000000000f25	movl	%eax,0xec(%rbp)
0000000000000f28	movl	0xec(%rbp),%eax
0000000000000f2b	addq	$0x20,%rsp
0000000000000f2f	popq	%rbp
0000000000000f30	ret
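
Two things differ between the listings: the addresses and the way the rbp offsets are printed. Here is a rough sketch of the arithmetic that ties them together, assuming the kext’s text is simply shifted by a constant when it is loaded. The function names are made up for illustration, and the numbers in the comments come straight from the two listings above.

```c
#include <stdint.h>

/* Constant shift between the GDB listing and the otool listing. */
static uint64_t load_base(uint64_t runtime_addr, uint64_t file_addr)
{
    /* e.g. 0xffffff7f80b2df11 - 0xf11 */
    return runtime_addr - file_addr;
}

/* Translate an address from the otool listing into a runtime address. */
static uint64_t to_runtime(uint64_t base, uint64_t file_addr)
{
    /* e.g. the callq target 0xf70 becomes the call target GDB shows */
    return base + file_addr;
}

/*
 * otool prints the 8-bit displacement as a raw byte (0xe8, 0xec), while GDB
 * prints the signed offset it encodes ([rbp-0x18], [rbp-0x14]).
 */
static int disp8(uint8_t raw)
{
    return (int)(int8_t)raw;
}
```

So 0xe8(%rbp) and [rbp-0x18] are the same operand, and adding the load base to any address in the otool output gives you something you can feed to x/i in GDB.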

Poking around in kernel memory

Let’s check out a few neat things in memory. The start of the Mach-O header for the kernel image in memory:

gdb$ x/x 0xffffff8000200000
0xffffff8000200000:	0xfeedfacf

This is the “magic number” indicating a 64-bit Mach-O executable. The 32-bit version is 0xfeedface.
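
If you wanted to do the same check in code, it might look something like this. The two values are the ones mentioned above (they correspond to MH_MAGIC and MH_MAGIC_64 in <mach-o/loader.h>), redefined here under made-up SKETCH_ names so the snippet builds anywhere; macho_bits is a hypothetical helper, not an existing API.

```c
#include <stdint.h>

/* Local copies of the Mach-O magic numbers; the SKETCH_ names are illustrative. */
#define SKETCH_MH_MAGIC    0xfeedfaceu  /* 32-bit Mach-O */
#define SKETCH_MH_MAGIC_64 0xfeedfacfu  /* 64-bit Mach-O */

/* Returns 64 or 32 for a native-endian Mach-O magic, 0 for anything else. */
static int macho_bits(uint32_t magic)
{
    if (magic == SKETCH_MH_MAGIC_64)
        return 64;
    if (magic == SKETCH_MH_MAGIC)
        return 32;
    return 0;
}
```

Feeding it the word we just read from 0xffffff8000200000 gives 64.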

The “system verification code”:

gdb$ x/s 0xffffff8000002000
0xffffff8000002000:	 "Catfish "

On previous PowerPC versions of the OS this was located at 0x5000 and said "Hagfish ". Here is the corresponding assembly source from osfmk/x86_64/lowmem_vectors.s in the kernel source tree:

/* 
 * on x86_64 the low mem vectors live here and get mapped to 0xffffff8000200000 at
 * system startup time
 */

	.text
	.align	12
	.globl	EXT(lowGlo)
EXT(lowGlo):

	.ascii "Catfish "	/* +0x000 System verification code */

Interestingly, that comment appears to be incorrect - 0xffffff8000200000 is where the kernel image itself starts and the stuff in lowmem_vectors.s starts at 0xffffff8000002000 as we’ve seen.

If you’re interested in kernel internals (which you probably are if you’re reading this) then you might want to have a look at the kgmacros help at this point:

gdb$ help kgm
| These are the kernel gdb macros.  These gdb macros are intended to be
| used when debugging a remote kernel via the kdp protocol.  Typically, you
| would connect to your remote target like so:
| 		 (gdb) target remote-kdp
| 		 (gdb) attach <name-of-remote-host>
<snip>

There’s heaps of cool and useful stuff there to look at.

Listing the process tree by walking the list from allproc down:

gdb$ showproctree
PID   PROCESS       POINTER]
===   =======       =======
0    kernel_task      [ 0xffffff80073e8820 ]
|--1    launchd      [ 0xffffff80073e8820 ]
|  |--163    xpchelper      [ 0xffffff800912a9f0 ] 
|  |--158    launchd    [ 0xffffff8007c65e40 ]
|  |  |--162    distnoted      [ 0xffffff80081f8010 ] 
|  |  |--161    mdworker    [ 0xffffff80073e83d0 ]
|  |--157    mdworker    [ 0xffffff80082c6010 ]
|  |--139    com.apple.dock.e    [ 0xffffff800912ae40 ]
|  |--138    filecoordination    [ 0xffffff800912b290 ]
|  |--111    xpchelper    [ 0xffffff8007c66f80 ]
|  |--106    launchdadd    [ 0xffffff80081fa290 ]
|  |--104    launchd    [ 0xffffff80082c86e0 ]
<snip>

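Print the struct proc (kernel version, not the userland one) at that address. Although showproctree lists the same pointer for kernel_task, the p_pid of 0x1 and p_ppid of 0x0 show that this is actually launchd’s entry: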

gdb$ print *(struct proc *)0xffffff80073e8820
$4 = {
  p_list = {
    le_next = 0xffffff8000cb4c20, 
    le_prev = 0xffffff80073e76e0
  }, 
  p_pid = 0x1, 
  task = 0xffffff80067c25a0, 
  p_pptr = 0xffffff8000cb4c20, 
  p_ppid = 0x0, 
  p_pgrpid = 0x1, 
  p_uid = 0x0, 
  p_gid = 0x0, 
  <snip>
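
The p_list member at the top of that dump is what chains the processes together: le_next points at the next struct proc, and le_prev points back at the previous entry’s le_next. Here is a minimal userland sketch of walking such a list. struct sketch_proc is a heavily cut-down, hypothetical stand-in for the real struct proc, and the helper names are made up.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct proc. */
struct sketch_proc {
    struct {
        struct sketch_proc *le_next;   /* next process in the list */
        struct sketch_proc **le_prev;  /* address of the previous le_next */
    } p_list;
    int p_pid;
};

/* Push p onto the front of the list whose head pointer is *head. */
static void sketch_insert_head(struct sketch_proc **head, struct sketch_proc *p)
{
    p->p_list.le_next = *head;
    if (*head != NULL)
        (*head)->p_list.le_prev = &p->p_list.le_next;
    *head = p;
    p->p_list.le_prev = head;
}

/* Follow le_next from the head until a matching pid or the end of the list. */
static struct sketch_proc *sketch_find_pid(struct sketch_proc *head, int pid)
{
    for (struct sketch_proc *p = head; p != NULL; p = p->p_list.le_next)
        if (p->p_pid == pid)
            return p;
    return NULL;
}
```

In the dump above, le_next and p_pptr hold the same address, which is what you would expect if the kernel’s own entry sits right behind PID 1 in the list.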

Have a poke around and see what you can find.

Source-level debugging

Now that we’ve explored kernel memory a bit, it’s worth noting that you can use the kernel source for source-level debugging within GDB, or possibly even in Xcode (anybody done this?). Some of the documentation seems to be a bit out of date on this. For example, the Kernel Programming Guide references a .gdbinit file in the osfmk directory (the Mach part of the kernel) which no longer exists, and earlier documentation mentions creating a /SourceCache/xnu/... directory for source-level debugging, but this trick doesn’t seem to work any more. It seems that these days the kernel debug symbol information records only the filename and line number, not the full file path, like this:

Breakpoint 1, kext_alloc (_addr=0xffffff80463735f0, size=0x3000, fixed=0x0) at kext_alloc.c:107
107	kext_alloc.c: No such file or directory.

We can still load source code on a per-directory basis if we know where the file in question is located. In this instance it’s osfmk/kern/kext_alloc.c within the kernel source tree, so we’ll do this:

gdb$ dir /path/to/xnu-1699.24.23/osfmk/kern/

And magic:

gdb$ l
102	}
103	
104	kern_return_t
105	kext_alloc(vm_offset_t *_addr, vm_size_t size, boolean_t fixed)
106	{
107	    kern_return_t rval = 0;
108	    mach_vm_offset_t addr = (fixed) ? *_addr : kext_alloc_base;
109	    int flags = (fixed) ? VM_FLAGS_FIXED : VM_FLAGS_ANYWHERE;
110	 
111	    /* Allocate the kext virtual memory */

Go grab yourself a copy of the source for your kernel version at opensource.apple.com and give it a try.

So, yeah…

Have fun.