SyScan 2012 is Over
SyScan 2012 was a blast. I talked shit about EFI rootkits, which was pretty fun. My slides are uploaded here if you’re interested.
A couple of highlights for me were Brett Moore’s talk about process continuation (I’m kinda surprised IE didn’t crash spontaneously and ruin his demos), Alex Ionescu’s talk about ACPI 5.0 rootkits (Alex lost his laptop on the way over and had to rewrite his talk AND demos - still nailed it), and Stefan Esser’s talk about the iOS kernel heap (the crossover with the OS X kernel is very interesting to me). Oh, and the chilli crab.
I’ll definitely be making an effort to get over to Singapore for SyScan 2013. Thomas Lim knows how to put on a con/party.
RIP-Relative Addressing and Kernel Payloads
The x86-64 architecture introduced a new way to generate Position-Independent Code (PIC): RIP-relative addressing. RIP-relative addressing references data and functions by a signed 32-bit displacement from the instruction pointer (RIP, which holds the address of the next instruction). As a result, “fixups” are not needed for local functions and data when relocating a piece of code to a base address other than the one it was linked for. I won’t go into too much detail about load-time relocation or PIC on x86. If you’re interested, I recommend Eli Bendersky’s excellent write-ups on how load-time relocation, x86 PIC and x86_64 PIC work on Linux/ELF; the concepts are fairly similar to how it works on OS X/Mach-O.
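The target arithmetic is simple enough to show directly. Below is a small illustrative C sketch; the helper name `rel32_target` is made up for this example. It sign-extends the 32-bit displacement and adds it to the address of the *next* instruction. It assumes the displacement is the last field of the encoding, which holds for `call rel32` and for a `leaq` with no trailing immediate.

```c
#include <stdint.h>
#include <string.h>

/*
 * Target of a RIP-relative operand: RIP (the address of the next
 * instruction) plus the sign-extended 32-bit displacement.
 *   insn_addr - address of the instruction itself
 *   insn_len  - its total encoded length in bytes
 *   disp      - pointer to the 4 little-endian displacement bytes
 */
uint64_t rel32_target(uint64_t insn_addr, unsigned insn_len, const uint8_t *disp)
{
    int32_t d;
    memcpy(&d, disp, sizeof d);                /* assumes a little-endian host, like x86 */
    return insn_addr + insn_len + (int64_t)d;  /* negative d wraps correctly in uint64_t */
}
```

The example values below are taken from the TestPayload disassembly in this post:

- For the 5-byte `e8 00 00 00 00` call at 0xf3c, `rel32_target(0xf3c, 5, bytes + 1)` gives 0xf41. That is the very next instruction, because the displacement is still a zero placeholder.
- For the 7-byte `leaq 0x000000b3(%rip)` at 0xf32 (bytes `48 8d 0d b3 00 00 00`), `rel32_target(0xf32, 7, bytes + 3)` gives 0xfec. That address lies just past the end of the dumped __text bytes, presumably where the "sup\n" string lives.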
RIP-relative addressing became a bit of a problem for me when I was generating kernel payloads that I wanted to be able to relocate to different areas of memory. I’ll explain by way of example.
Consider the following dummy kernel extension:
#include <mach/mach_types.h>
#include <sys/systm.h>
kern_return_t TestPayload_start(kmod_info_t * ki, void *d);
kern_return_t TestPayload_stop(kmod_info_t *ki, void *d);
kern_return_t TestPayload_start(kmod_info_t * ki, void *d)
{
printf("sup\n");
return KERN_SUCCESS;
}
kern_return_t TestPayload_stop(kmod_info_t *ki, void *d)
{
return KERN_SUCCESS;
}
This is only slightly modified from the default code that is generated when we create a new Kernel Extension project in Xcode – I just added the printf() and relevant #include. If we compile this in the normal way with Xcode, and disassemble the executable:
$ otool -tv TestPayload.kext/Contents/MacOS/TestPayload
_TestPayload_start:
0000000000000f20 pushq %rbp
0000000000000f21 movq %rsp,%rbp
0000000000000f24 subq $0x20,%rsp
0000000000000f28 movq %rdi,0xf8(%rbp)
0000000000000f2c movq %rsi,0xf0(%rbp)
0000000000000f30 xorb %al,%al
0000000000000f32 leaq 0x000000b3(%rip),%rcx
0000000000000f39 movq %rcx,%rdi
0000000000000f3c callq 0x00000f41
0000000000000f41 movl $0x00000000,0xe8(%rbp)
0000000000000f48 movl 0xe8(%rbp),%eax
0000000000000f4b movl %eax,0xec(%rbp)
0000000000000f4e movl 0xec(%rbp),%eax
0000000000000f51 addq $0x20,%rsp
0000000000000f55 popq %rbp
0000000000000f56 ret
0000000000000f57 nopw 0x00000000(%rax,%rax)
_TestPayload_stop:
0000000000000f60 pushq %rbp
0000000000000f61 movq %rsp,%rbp
0000000000000f64 subq $0x18,%rsp
0000000000000f68 movq %rdi,0xf8(%rbp)
0000000000000f6c movq %rsi,0xf0(%rbp)
0000000000000f70 movl $0x00000000,0xe8(%rbp)
0000000000000f77 movl 0xe8(%rbp),%eax
0000000000000f7a movl %eax,0xec(%rbp)
0000000000000f7d movl 0xec(%rbp),%eax
0000000000000f80 addq $0x18,%rsp
0000000000000f84 popq %rbp
0000000000000f85 ret
<snip>
Note the callq 0x00000f41 at 0xf3c there – that’s the call to printf(). If we dump the section without disassembling:
$ otool -t TestPayload.kext/Contents/MacOS/TestPayload
TestPayload:
(__TEXT,__text) section
0000000000000f20 55 48 89 e5 48 83 ec 20 48 89 7d f8 48 89 75 f0
0000000000000f30 30 c0 48 8d 0d b3 00 00 00 48 89 cf e8 00 00 00
0000000000000f40 00 c7 45 e8 00 00 00 00 8b 45 e8 89 45 ec 8b 45
0000000000000f50 ec 48 83 c4 20 5d c3 66 0f 1f 84 00 00 00 00 00
0000000000000f60 55 48 89 e5 48 83 ec 18 48 89 7d f8 48 89 75 f0
0000000000000f70 c7 45 e8 00 00 00 00 8b 45 e8 89 45 ec 8b 45 ec
0000000000000f80 48 83 c4 18 5d c3 55 48 89 e5 48 8d 05 37 01 00
0000000000000f90 00 48 8b 00 48 85 c0 75 04 31 c0 5d c3 5d ff e0
0000000000000fa0 55 48 89 e5 48 8d 05 55 00 00 00 48 83 c0 10 5d
0000000000000fb0 c3 55 48 89 e5 48 8d 05 44 00 00 00 48 83 c0 50
0000000000000fc0 5d c3 55 48 89 e5 48 8d 05 33 00 00 00 8b 40 0c
0000000000000fd0 5d c3 55 48 89 e5 48 8d 05 f3 00 00 00 48 8b 00
0000000000000fe0 48 85 c0 75 04 31 c0 5d c3 5d ff e0
We can see at 0xf3c an instruction that looks like e8 00 00 00 00 – this is a RIP-relative call instruction opcode (e8), followed by the 32-bit displacement (00 00 00 00). This is supposed to be the printf() call? Well, yeah. The compiler doesn’t know the address of the printf() function in the kernel at compile time, so it puts in 0x0 as a placeholder, which will be updated when the executable is loaded and linked by KXLD. A displacement of zero simply points at the next instruction, which is why otool showed the target as 0xf41. So how does KXLD know that this instruction needs updating? Relocation entries. Have a look at the relocation entries for the executable:
$ otool -r TestPayload.kext/Contents/MacOS/TestPayload
TestPayload.kext/Contents/MacOS/TestPayload:
External relocation information 1 entries
address pcrel length extern type scattered symbolnum/value
00000f3d 1 2 1 2 0 31
<snip>
We’re only concerned about the external relocations in this instance. There is only one of these, and its address (offset within the executable file) is 0xf3d. This happens to be one byte after the e8 (call) opcode at 0xf3c, which is the location of the displacement value for the RIP-relative call. The pcrel field is 1, indicating that this is, in fact, a RIP-relative instruction. The length field is 2, the log2 of the field size, so the value being patched is a 4-byte displacement. The other fields give the linker more information about how the relocation entry should be handled; for example, type 2 is X86_64_RELOC_BRANCH. You can find more info about these fields in the ABI documentation.
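The two 32-bit words of each relocation entry can be decoded by hand. This sketch mirrors the field layout of `struct relocation_info` from <mach-o/reloc.h>: a 32-bit `r_address`, followed by a word packing `r_symbolnum:24`, `r_pcrel:1`, `r_length:2`, `r_extern:1` and `r_type:4`. It shifts and masks that word itself rather than relying on compiler bitfield ordering. It assumes both words have already been read as host-order `uint32_t`s. `decode_reloc`, `reloc_size_bytes` and `struct reloc_fields` are invented names.

```c
#include <stdint.h>

/* Decoded form of one Mach-O relocation entry */
struct reloc_fields {
    int32_t  r_address;    /* offset of the bytes to patch */
    uint32_t r_symbolnum;  /* symbol table index when r_extern == 1 */
    unsigned r_pcrel;      /* 1 = value is relative to RIP */
    unsigned r_length;     /* log2 of the patched field size */
    unsigned r_extern;     /* 1 = r_symbolnum names an external symbol */
    unsigned r_type;       /* X86_64_RELOC_* value */
};

/* words[0] is r_address; words[1] packs the bitfields, lowest field first */
struct reloc_fields decode_reloc(const uint32_t words[2])
{
    struct reloc_fields r;
    r.r_address   = (int32_t)words[0];
    r.r_symbolnum = words[1] & 0xffffffu;   /* bits 0-23 */
    r.r_pcrel     = (words[1] >> 24) & 0x1; /* bit 24 */
    r.r_length    = (words[1] >> 25) & 0x3; /* bits 25-26 */
    r.r_extern    = (words[1] >> 27) & 0x1; /* bit 27 */
    r.r_type      = (words[1] >> 28) & 0xf; /* bits 28-31 */
    return r;
}

/* Number of bytes the linker rewrites for this entry */
unsigned reloc_size_bytes(const struct reloc_fields *r)
{
    return 1u << r->r_length;
}
```

Fed the entry above (address 0xf3d, packed word 0x2D00001F), this yields symbol 31, pcrel 1, length 2 (a 4-byte field), extern 1 and type 2.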
So, back to my kernel payloads – I wanted to be able to move the payload around in memory without having to update the relocation entries each time, as that would require keeping the code to perform this updating within the payload. There are a few compiler options for generating slightly-more-position-independent code, but the OS X version of GCC doesn’t seem to support them. Fortunately, Clang does. If we compile with the -mcmodel=large option (by adding it to the “Other C Flags” field in the Xcode build settings), and disassemble the executable:
$ otool -tv TestPayload.kext/Contents/MacOS/TestPayload
TestPayload.kext/Contents/MacOS/TestPayload:
(__TEXT,__text) section
_TestPayload_start:
0000000000000f30 pushq %rbp
0000000000000f31 movq %rsp,%rbp
0000000000000f34 subq $0x20,%rsp
0000000000000f38 movq %rdi,0xf8(%rbp)
0000000000000f3c movq %rsi,0xf0(%rbp)
0000000000000f40 xorb %al,%al
0000000000000f42 movq $0x0000000000000ff1,%rdi
0000000000000f4c movq $0x0000000000000000,%rsi
0000000000000f56 call *%rsi
0000000000000f58 movl $0x00000000,%ecx
0000000000000f5d movl %eax,0xec(%rbp)
0000000000000f60 movl %ecx,%eax
0000000000000f62 addq $0x20,%rsp
0000000000000f66 popq %rbp
0000000000000f67 ret
0000000000000f68 nopl 0x00000000(%rax,%rax)
_TestPayload_stop:
0000000000000f70 pushq %rbp
0000000000000f71 movq %rsp,%rbp
0000000000000f74 subq $0x10,%rsp
0000000000000f78 movl $0x00000000,%eax
0000000000000f7d movq %rdi,0xf8(%rbp)
0000000000000f81 movq %rsi,0xf0(%rbp)
0000000000000f85 addq $0x10,%rsp
0000000000000f89 popq %rbp
0000000000000f8a ret
<snip>
Now the call goes through an absolute 64-bit address. The address of the function is moved into a register (movq $imm64,%rsi), and the call is made through that register (call *%rsi). If we have a look at the relocation entries now:
$ otool -r TestPayload.kext/Contents/MacOS/TestPayload
TestPayload.kext/Contents/MacOS/TestPayload:
External relocation information 1 entries
address pcrel length extern type scattered symbolnum/value
00000f4e 0 3 1 0 0 31
<snip>
Notice pcrel is now 0 and length is 3 (8 bytes), because it’s an absolute 64-bit address that we’re updating instead of a 32-bit displacement from RIP. Type 0 is X86_64_RELOC_UNSIGNED. The relocation address, 0xf4e, is two bytes into the movq at 0xf4c, which is where its 64-bit immediate begins. This means we can look up the address of the symbol (e.g. printf()) once, when we initially load the payload, and write that address into the location(s) named by the relocation entries. The patched value no longer depends on where the payload itself sits, so it stays valid when the payload is moved. Unfortunately this inflates the size of the code a bit, as all function calls are treated this way. Once the payload reaches a certain size, that kind of defeats the purpose of trimming out the relocation code. Oh well, it’s still a bit easier to handle. The next step might be to write an LLVM pass to convert only external calls to absolute calls.
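To make the difference between the two relocation shapes concrete, here is a hedged C sketch of a payload loader patching one external relocation. It is not how KXLD implements this, and `patch_reloc` and its parameters are made-up names. It makes these assumptions:

- It handles only the two cases shown above.
- The addend is zero, as it is for the printf() call here.
- The host is little-endian.
- RIP equals the address just past the 4-byte field. This holds for `call rel32`, but not for instructions that carry an immediate after the displacement.

```c
#include <stdint.h>
#include <string.h>

/*
 * Patch one external relocation in a payload that will run at load_addr.
 *   pcrel=1, length=2: 32-bit RIP-relative displacement (default code model)
 *   pcrel=0, length=3: 64-bit absolute address (-mcmodel=large)
 * Assumes a zero addend. Returns 0 on success, -1 if unsupported or out of range.
 */
int patch_reloc(uint8_t *image, uint64_t load_addr, uint32_t r_address,
                unsigned r_pcrel, unsigned r_length, uint64_t sym_addr)
{
    uint8_t *field = image + r_address;   /* bytes named by the relocation */

    if (r_pcrel == 1 && r_length == 2) {
        /* RIP when the call executes: just past the 4-byte displacement */
        uint64_t rip = load_addr + r_address + 4;
        int64_t disp = (int64_t)(sym_addr - rip);
        if (disp < INT32_MIN || disp > INT32_MAX)
            return -1;                    /* target is outside rel32 range (~2GB) */
        int32_t d = (int32_t)disp;
        memcpy(field, &d, sizeof d);      /* depends on load_addr: redo after every move */
        return 0;
    }
    if (r_pcrel == 0 && r_length == 3) {
        memcpy(field, &sym_addr, sizeof sym_addr);  /* independent of load_addr: patch once */
        return 0;
    }
    return -1;                            /* other relocation kinds are out of scope */
}
```

The pcrel case has to be recomputed whenever the payload moves, and it fails outright if the kernel symbol is more than about 2GB away from the payload. The absolute case is written once and survives any move, which is the property -mcmodel=large buys.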
I’m not sure how useful this will be to others, but I thought it was interesting!
Resolving kernel symbols
KXLD doesn’t like us much. He has KPIs to meet and doesn’t have time to help out shifty rootkit developers. KPIs are Kernel Programming Interfaces - lists of symbols in the kernel that KXLD (the kernel extension linker) will allow kexts to be linked against. The KPIs on which your kext depends are specified in the Info.plist file like this:
<key>OSBundleLibraries</key>
<dict>
<key>com.apple.kpi.bsd</key>
<string>11.0</string>
<key>com.apple.kpi.libkern</key>
<string>11.0</string>
<key>com.apple.kpi.mach</key>
<string>11.0</string>
<key>com.apple.kpi.unsupported</key>
<string>11.0</string>
<key>com.apple.kpi.iokit</key>
<string>11.0</string>
<key>com.apple.kpi.dsep</key>
<string>11.0</string>
</dict>
Those bundle identifiers correspond to the CFBundleIdentifier key specified in the Info.plist files for “plug-ins” to the System.kext kernel extension. Each KPI has its own plug-in kext - for example, the com.apple.kpi.bsd symbol table lives in BSDKernel.kext. These aren’t exactly complete kexts; they’re just Mach-O binaries with symbol tables full of undefined symbols (the symbols really reside within the kernel image), as you can see if we dump the load commands:
$ otool -l /System/Library/Extensions/System.kext/PlugIns/BSDKernel.kext/BSDKernel
/System/Library/Extensions/System.kext/PlugIns/BSDKernel.kext/BSDKernel:
Load command 0
cmd LC_SYMTAB
cmdsize 24
symoff 80
nsyms 830
stroff 13360
strsize 13324
Load command 1
cmd LC_UUID
cmdsize 24
uuid B171D4B0-AC45-47FC-8098-5B2F89B474E6
That’s it - just the LC_SYMTAB (symbol table). So, how many symbols are there in the kernel image?
$ nm /mach_kernel|wc -l
16122
Surely all the symbols in all the KPI symbol tables add up to the same number, right?
$ find /System/Library/Extensions/System.kext/PlugIns -type f|grep -v plist|xargs nm|sort|uniq|wc -l
7677
Nope. Apple doesn’t want us to play with a whole bunch of their toys. 8445 of them. Some of them are pretty fun too :( Like allproc:
$ nm /mach_kernel|grep allproc
ffffff80008d9e40 S _allproc
$ find /System/Library/Extensions/System.kext/PlugIns -type f|grep -v plist|xargs nm|sort|uniq|grep allproc
$
Damn. The allproc symbol is the head of the kernel’s list (the queue(3) kind of list) of running processes. It’s what gets queried when you run ps(1) or top(1). Why do we want to find allproc? If we want to hide processes in a kernel rootkit, that’s the best place to start. So, what happens if we build a kernel extension that imports allproc and try to load it?
bash-3.2# kextload AllProcRocks.kext
/Users/admin/AllProcRocks.kext failed to load - (libkern/kext) link error; check the system/kernel logs for errors or try kextutil(8).
Console says:
25/02/12 6:30:47.000 PM kernel: kxld[ax.ho.kext.AllProcRocks]: The following symbols are unresolved for this kext:
25/02/12 6:30:47.000 PM kernel: kxld[ax.ho.kext.AllProcRocks]: _allproc
OK, whatever.
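A quick aside on the queue(3) list that allproc heads. A LIST_HEAD is just a pointer to the first element. Each element embeds a LIST_ENTRY holding `le_next`, plus `le_prev`, which points back at the previous element's next pointer. Enumerating processes is therefore a plain pointer walk from the head. The userland sketch below builds a toy list with the standard <sys/queue.h> macros and walks it with LIST_FOREACH. `struct fake_proc`, its fields and the helper names are hypothetical stand-ins; the real XNU `struct proc` is far larger, and its layout changes between releases.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for XNU's struct proc: just a pid and the list linkage */
struct fake_proc {
    int pid;
    LIST_ENTRY(fake_proc) p_list;   /* le_next / le_prev links */
};

/* Declares struct proclist { struct fake_proc *lh_first; } */
LIST_HEAD(proclist, fake_proc);

/* Walk the list the way a process enumerator would; NULL if not found */
struct fake_proc *find_pid(struct proclist *head, int pid)
{
    struct fake_proc *p;
    LIST_FOREACH(p, head, p_list) {
        if (p->pid == pid)
            return p;
    }
    return NULL;
}

/* Count the entries reachable from the head */
size_t count_procs(struct proclist *head)
{
    size_t n = 0;
    struct fake_proc *p;
    LIST_FOREACH(p, head, p_list)
        n++;
    return n;
}
```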
What do we do?
There are a few steps that we need to take in order to resolve symbols in the kernel (or any other Mach-O binary):
- Find the __LINKEDIT segment - this contains an array of struct nlist_64’s which represent all the symbols in the symbol table, and an array of symbol name strings.
- Find the LC_SYMTAB load command - this contains the offsets within the file of the symbol and string tables.
- Calculate the position of the string table within __LINKEDIT based on the offsets in the LC_SYMTAB load command.
- Iterate through the struct nlist_64’s in __LINKEDIT, comparing the corresponding string in the string table to the name of the symbol we’re looking for until we find it (or reach the end of the symbol table).
- Grab the address of the symbol from the struct nlist_64 we’ve found.
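The steps above can be sketched in C roughly as follows. This is not the KernelResolver sample code from this post's GitHub repo; it is a standalone, minimal version with a made-up function name, `resolve_symbol`. It re-declares the handful of Mach-O structs it needs, with the same field layout as Apple's <mach-o/loader.h> and <mach-o/nlist.h>, so that it compiles anywhere. It also assumes that, as in the kernel, the image's segments are already mapped at their vmaddrs, so __LINKEDIT's vmaddr can be dereferenced directly.

```c
#include <stdint.h>
#include <string.h>

/* Minimal re-declarations of the Mach-O layouts used below */
#define MH_MAGIC_64   0xfeedfacfu
#define LC_SYMTAB     0x2u
#define LC_SEGMENT_64 0x19u

struct mach_header_64 { uint32_t magic; int32_t cputype, cpusubtype;
                        uint32_t filetype, ncmds, sizeofcmds, flags, reserved; };
struct load_command { uint32_t cmd, cmdsize; };
struct segment_command_64 { uint32_t cmd, cmdsize; char segname[16];
                            uint64_t vmaddr, vmsize, fileoff, filesize;
                            int32_t maxprot, initprot; uint32_t nsects, flags; };
struct symtab_command { uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize; };
struct nlist_64 { union { uint32_t n_strx; } n_un; uint8_t n_type, n_sect;
                  uint16_t n_desc; uint64_t n_value; };

/* Return n_value for `name` in an image mapped at its vmaddrs, or 0 if absent */
uint64_t resolve_symbol(const struct mach_header_64 *mh, const char *name)
{
    const struct segment_command_64 *linkedit = NULL;
    const struct symtab_command *symtab = NULL;

    if (mh->magic != MH_MAGIC_64)
        return 0;

    /* Steps 1-2: load commands start right after the header, each cmdsize long */
    const uint8_t *p = (const uint8_t *)(mh + 1);
    for (uint32_t i = 0; i < mh->ncmds; i++) {
        const struct load_command *lc = (const struct load_command *)p;
        if (lc->cmd == LC_SEGMENT_64 &&
            strncmp(((const struct segment_command_64 *)lc)->segname, "__LINKEDIT", 16) == 0)
            linkedit = (const struct segment_command_64 *)lc;
        else if (lc->cmd == LC_SYMTAB)
            symtab = (const struct symtab_command *)lc;
        p += lc->cmdsize;
    }
    if (linkedit == NULL || symtab == NULL)
        return 0;

    /* Step 3: symoff/stroff are file offsets; rebase them onto __LINKEDIT's vmaddr */
    const uint8_t *le = (const uint8_t *)(uintptr_t)linkedit->vmaddr;
    const struct nlist_64 *syms =
        (const struct nlist_64 *)(le + (symtab->symoff - linkedit->fileoff));
    const char *strtab = (const char *)(le + (symtab->stroff - linkedit->fileoff));

    /* Steps 4-5: compare each entry's name and return its n_value on a match */
    for (uint32_t i = 0; i < symtab->nsyms; i++) {
        uint32_t strx = syms[i].n_un.n_strx;
        if (strx < symtab->strsize && strcmp(strtab + strx, name) == 0)
            return syms[i].n_value;
    }
    return 0;
}
```

One deliberate difference from the GDB walkthrough: this sketch rebases symoff and stroff on __LINKEDIT's fileoff, instead of assuming the symbol table sits at the very start of the segment. For the debug kernel examined here both values are 0xaf4698, so the results coincide. It also returns the first entry whose name matches, without looking at n_type.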
Parse the load commands
One easy way to look at the symbol table would be to read the kernel file on disk at /mach_kernel, but we can do better than that if we’re already in the kernel - the kernel image is loaded into memory at a known address. If we have a look at the load commands for the kernel binary:
$ otool -l /mach_kernel
/mach_kernel:
Load command 0
cmd LC_SEGMENT_64
cmdsize 472
segname __TEXT
vmaddr 0xffffff8000200000
vmsize 0x000000000052f000
fileoff 0
filesize 5435392
maxprot 0x00000007
initprot 0x00000005
nsects 5
flags 0x0
<snip>
We can see that the vmaddr field of the first segment is 0xffffff8000200000. If we fire up GDB and point it at a VM running Mac OS X (as per my previous posts here and here), we can see the start of the Mach-O header in memory at this address:
gdb$ x/xw 0xffffff8000200000
0xffffff8000200000: 0xfeedfacf
0xfeedfacf is the magic number denoting a 64-bit Mach-O image (the 32-bit version is 0xfeedface). We can actually display this as a struct if we’re using the DEBUG kernel with all the DWARF info:
gdb$ print *(struct mach_header_64 *)0xffffff8000200000
$1 = {
magic = 0xfeedfacf,
cputype = 0x1000007,
cpusubtype = 0x3,
filetype = 0x2,
ncmds = 0x12,
sizeofcmds = 0x1010,
flags = 0x1,
reserved = 0x0
}
The mach_header and mach_header_64 structs (along with the other Mach-O-related structs mentioned in this post) are documented in the Mach-O File Format Reference, but we aren’t particularly interested in the header at the moment. I recommend having a look at the kernel image with MachOView to get the gist of where everything is and how it’s laid out.
Directly following the Mach-O header is the first load command:
gdb$ set $mh=(struct mach_header_64 *)0xffffff8000200000
gdb$ print *(struct load_command*)((void *)$mh + sizeof(struct mach_header_64))
$6 = {
cmd = 0x19,
cmdsize = 0x1d8
}
This is the load command for the first __TEXT segment we saw with otool. We can cast it as a segment_command_64 in GDB and have a look:
gdb$ set $lc=((void *)$mh + sizeof(struct mach_header_64))
gdb$ print *(struct segment_command_64 *)$lc
$7 = {
cmd = 0x19,
cmdsize = 0x1d8,
segname = "__TEXT\000\000\000\000\000\000\000\000\000",
vmaddr = 0xffffff8000200000,
vmsize = 0x8c8000,
fileoff = 0x0,
filesize = 0x8c8000,
maxprot = 0x7,
initprot = 0x5,
nsects = 0x5,
flags = 0x0
}
This isn’t the load command we are looking for, so we have to iterate through all of them until we come across one with a cmd of 0x19 (LC_SEGMENT_64) and a segname of __LINKEDIT. Each load command starts cmdsize bytes after the previous one. In the debug kernel, the one we want happens to be located at 0xffffff8000200e68:
gdb$ set $lc=0xffffff8000200e68
gdb$ print *(struct load_command*)$lc
$14 = {
cmd = 0x19,
cmdsize = 0x48
}
gdb$ print *(struct segment_command_64*)$lc
$16 = {
cmd = 0x19,
cmdsize = 0x48,
segname = "__LINKEDIT\000\000\000\000\000",
vmaddr = 0xffffff8000d08000,
vmsize = 0x109468,
fileoff = 0xaf4698,
filesize = 0x109468,
maxprot = 0x7,
initprot = 0x1,
nsects = 0x0,
flags = 0x0
}
Then we grab the vmaddr field from the load command, which specifies the address at which the __LINKEDIT segment’s data will be located:
gdb$ set $linkedit=((struct segment_command_64*)$lc)->vmaddr
gdb$ print $linkedit
$19 = 0xffffff8000d08000
gdb$ print *(struct nlist_64 *)$linkedit
$20 = {
n_un = {
n_strx = 0x68a29
},
n_type = 0xe,
n_sect = 0x1,
n_desc = 0x0,
n_value = 0xffffff800020a870
}
And there’s the first struct nlist_64.
As for the LC_SYMTAB load command, we just need to iterate through the load commands until we find one with the cmd field value of 0x02 (LC_SYMTAB). In this case, it’s located at 0xffffff8000200eb0:
gdb$ set $symtab=*(struct symtab_command*)0xffffff8000200eb0
gdb$ print $symtab
$23 = {
cmd = 0x2,
cmdsize = 0x18,
symoff = 0xaf4698,
nsyms = 0x699d,
stroff = 0xb5e068,
strsize = 0x9fa98
}
The useful parts here are the symoff and stroff fields:

- symoff specifies the offset in the file to the symbol table (the start of the __LINKEDIT segment).
- stroff specifies the offset in the file to the string table (somewhere in the middle of the __LINKEDIT segment).

Why, you ask, did we need to find the __LINKEDIT segment as well, since we have the offsets here in the LC_SYMTAB command? If we were looking at the file on disk we wouldn’t have needed to. However, the kernel image we’re inspecting has already been loaded into memory, so its segments have been placed at the virtual memory addresses specified in their load commands. This means the symoff and stroff fields can no longer be used directly as offsets from the start of the image in memory. They’re still useful, though: the difference between the two gives us the offset into the __LINKEDIT segment at which the string table exists:
gdb$ print $linkedit
$24 = 0xffffff8000d08000
gdb$ print $linkedit + ($symtab->stroff - $symtab->symoff)
$25 = 0xffffff8000d719d0
gdb$ set $strtab=$linkedit + ($symtab->stroff - $symtab->symoff)
gdb$ x/16s $strtab
0xffffff8000d719d0: ""
0xffffff8000d719d1: ""
0xffffff8000d719d2: ""
0xffffff8000d719d3: ""
0xffffff8000d719d4: ".constructors_used"
0xffffff8000d719e7: ".destructors_used"
0xffffff8000d719f9: "_AddFileExtent"
0xffffff8000d71a08: "_AllocateNode"
0xffffff8000d71a16: "_Assert"
0xffffff8000d71a1e: "_BF_decrypt"
0xffffff8000d71a2a: "_BF_encrypt"
0xffffff8000d71a36: "_BF_set_key"
0xffffff8000d71a42: "_BTClosePath"
0xffffff8000d71a4f: "_BTDeleteRecord"
0xffffff8000d71a5f: "_BTFlushPath"
0xffffff8000d71a6c: "_BTGetInformation"
Actually finding some symbols
Now that we know where the symbol table and string table live, we can get on to the srs bznz. So, let’s find that damn _allproc symbol we need. Have a look at that first struct nlist_64 again:
gdb$ print *(struct nlist_64 *)$linkedit
$28 = {
n_un = {
n_strx = 0x68a29
},
n_type = 0xe,
n_sect = 0x1,
n_desc = 0x0,
n_value = 0xffffff800020a870
}
The n_un.n_strx field there specifies the offset into the string table at which the string corresponding to this symbol exists. If we add that offset to the address at which the string table starts, we’ll see the symbol name:
gdb$ x/s $strtab + ((struct nlist_64 *)$linkedit)->n_un.n_strx
0xffffff8000dda3f9: "_ps_vnode_trim_init"
Now all we need to do is iterate through all the struct nlist_64’s until we find the one with the matching name. In this case it’s at 0xffffff8000d482a0:
gdb$ set $nlist=0xffffff8000d482a0
gdb$ print *(struct nlist_64*)$nlist
$31 = {
n_un = {
n_strx = 0x35a07
},
n_type = 0xf,
n_sect = 0xb,
n_desc = 0x0,
n_value = 0xffffff8000cb5ca0
}
gdb$ x/s $strtab + ((struct nlist_64 *)$nlist)->n_un.n_strx
0xffffff8000da73d7: "_allproc"
The n_value field there (0xffffff8000cb5ca0) is the virtual memory address at which the symbol’s data/code exists. _allproc is not a great example: it’s a piece of data rather than a function, which makes it hard to sanity-check by disassembling it. Let’s try it with a function instead:
gdb$ set $nlist=0xffffff8000d618f0
gdb$ print *(struct nlist_64*)$nlist
$32 = {
n_un = {
n_strx = 0x52ed3
},
n_type = 0xf,
n_sect = 0x1,
n_desc = 0x0,
n_value = 0xffffff80007cceb0
}
gdb$ x/s $strtab + ((struct nlist_64 *)$nlist)->n_un.n_strx
0xffffff8000dc48a3: "_proc_lock"
If we disassemble a few instructions at that address:
gdb$ x/12i 0xffffff80007cceb0
0xffffff80007cceb0 <proc_lock>: push rbp
0xffffff80007cceb1 <proc_lock+1>: mov rbp,rsp
0xffffff80007cceb4 <proc_lock+4>: sub rsp,0x10
0xffffff80007cceb8 <proc_lock+8>: mov QWORD PTR [rbp-0x8],rdi
0xffffff80007ccebc <proc_lock+12>: mov rax,QWORD PTR [rbp-0x8]
0xffffff80007ccec0 <proc_lock+16>: mov rcx,0x50
0xffffff80007cceca <proc_lock+26>: add rax,rcx
0xffffff80007ccecd <proc_lock+29>: mov rdi,rax
0xffffff80007cced0 <proc_lock+32>: call 0xffffff800035d270 <lck_mtx_lock>
0xffffff80007cced5 <proc_lock+37>: add rsp,0x10
0xffffff80007cced9 <proc_lock+41>: pop rbp
0xffffff80007cceda <proc_lock+42>: ret
We can see that GDB has resolved the symbol for us, and we’re right on the money.
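To double-check the address arithmetic from this walkthrough, the values GDB printed can be plugged into a few one-line helpers (the names are invented for this sketch). The string table starts stroff - symoff = 0x699d0 bytes into __LINKEDIT. That is also exactly nsyms × sizeof(struct nlist_64) = 0x699d × 16, so the string table begins immediately after the last symbol entry. Adding each n_strx to that base reproduces the name addresses GDB showed.

```c
#include <stdint.h>

/* sizeof(struct nlist_64): 4-byte n_strx, n_type, n_sect, 2-byte n_desc, 8-byte n_value */
#define NLIST_64_SIZE 16u

/* Bytes occupied by the symbol table */
uint64_t symtab_bytes(uint32_t nsyms)
{
    return (uint64_t)nsyms * NLIST_64_SIZE;
}

/* In-memory string table address, assuming the symbol table starts __LINKEDIT */
uint64_t strtab_vmaddr(uint64_t linkedit_vmaddr, uint32_t symoff, uint32_t stroff)
{
    return linkedit_vmaddr + (stroff - symoff);
}

/* In-memory address of one symbol's name */
uint64_t symbol_name_vmaddr(uint64_t strtab, uint32_t n_strx)
{
    return strtab + n_strx;
}
```

With the debug kernel’s numbers:

- `strtab_vmaddr(0xffffff8000d08000, 0xaf4698, 0xb5e068)` is 0xffffff8000d719d0.
- Adding n_strx 0x35a07 gives 0xffffff8000da73d7, the "_allproc" string.
- Adding n_strx 0x52ed3 gives 0xffffff8000dc48a3, the "_proc_lock" string.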
Sample code
I’ve posted an example kernel extension on github to check out. When we load it with kextload KernelResolver.kext, we should see something like this on the console:
25/02/12 8:06:49.000 PM kernel: [+] _allproc @ 0xffffff8000cb5ca0
25/02/12 8:06:49.000 PM kernel: [+] _proc_lock @ 0xffffff80007cceb0
25/02/12 8:06:49.000 PM kernel: [+] _kauth_cred_setuidgid @ 0xffffff80007abbb0
25/02/12 8:06:49.000 PM kernel: [+] __ZN6OSKext13loadFromMkextEjPcjPS0_Pj @ 0xffffff80008f8606
Update: It was brought to my attention that I was using a debug kernel in these examples. Just to be clear - the method described in this post, as well as the sample code, works on a non-debug, default install >=10.7.0 (xnu-1699.22.73) kernel as well, but the GDB inspection probably won’t (unless you load up the struct definitions etc, as they are all stored in the DEBUG kernel). The debug kernel contains every symbol from the source, whereas many symbols are stripped from the distribution kernel (e.g. sLoadedKexts). Previously (before 10.7), the kernel would write out the symbol table to a file on disk and jettison it from memory altogether. I suppose when kernel extensions were loaded, kextd or kextload would resolve symbols from within that on-disk symbol table or from the on-disk kernel image. These days the symbol table memory is just marked as pageable, so it can potentially get paged out if the system is short of memory.
I hope somebody finds this useful. Shoot me an email or get at me on twitter if you have any questions. I’ll probably sort out comments for this blog at some point, but I cbf at the moment.
Carving up EFI fat binaries
Apple uses a custom fat binary format so their EFI applications can contain both 32-bit and 64-bit sections. IDA Pro isn’t too keen on this format, and (last time I looked) won’t disassemble them unless you specify the starting offset for the architecture section you want to disassemble.
The format is just a header that looks like this:
typedef struct {
UINT32 magic; // Apple EFI fat binary magic number (0x0ef1fab9)
UINT32 num_archs; // number of architectures
EFIFatArchHeader archs[]; // architecture headers
} EFIFatHeader;
Followed by some architecture headers that look like this:
typedef struct {
UINT32 cpu_type; // probably 0x07 (CPU_TYPE_X86) or 0x01000007 (CPU_TYPE_X86_64)
UINT32 cpu_subtype; // probably 3 (CPU_SUBTYPE_I386_ALL)
UINT32 offset; // offset to beginning of architecture section
UINT32 size; // size of arch section
UINT32 align; // alignment
} EFIFatArchHeader;
Followed by the data for the sections.
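The layout is simple enough to parse in a few lines of C. The sketch below is illustrative, and `efi_fat_parse` and `rd32` are made-up names. It assumes each EFIFatArchHeader is five packed little-endian UINT32s directly after the 8-byte fat header, exactly as in the structs above. It reads fields byte by byte, so the host’s byte order doesn’t matter.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_FAT_MAGIC 0x0ef1fab9u

/* Same five fields as the EFIFatArchHeader above, using <stdint.h> types */
typedef struct {
    uint32_t cpu_type;     /* e.g. 0x07 (X86) or 0x01000007 (X64) */
    uint32_t cpu_subtype;
    uint32_t offset;       /* start of this architecture's image in the file */
    uint32_t size;         /* length of that image */
    uint32_t align;
} EFIFatArchHeader;

/* Read a little-endian UINT32 without assuming host byte order or alignment */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Parse an Apple EFI fat header. Copies up to max architecture headers into
 * out and returns the number of architectures, or -1 if buf isn't an EFI fat
 * binary or a section would run past the end of the buffer.
 */
int efi_fat_parse(const uint8_t *buf, size_t len, EFIFatArchHeader *out, int max)
{
    if (len < 8 || rd32(buf) != EFI_FAT_MAGIC)
        return -1;
    uint32_t n = rd32(buf + 4);
    if (n > (len - 8) / 20)                  /* 20 bytes per arch header */
        return -1;
    for (uint32_t i = 0; i < n; i++) {
        const uint8_t *a = buf + 8 + 20 * (size_t)i;
        EFIFatArchHeader h = { rd32(a), rd32(a + 4), rd32(a + 8), rd32(a + 12), rd32(a + 16) };
        if (h.offset > len || h.size > len - h.offset)
            return -1;
        if ((int)i < max)
            out[i] = h;
    }
    return (int)n;
}
```

Each returned offset/size pair delimits one single-architecture image, so splitting the binary is just a matter of copying size bytes starting at buf + offset. With two architectures the header is 8 + 2 × 20 = 0x30 bytes, which matches the X86 offset in the SmcFlasher.efi output.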
I wrote a quick bit of Python early last year to parse the headers and split the fat binaries into their single-architecture sections, and thought someone might find it useful. It’s on my GitHub. I’ve got a couple of other half-finished EFI-related scripts that I’ll add to that repo soon, when they are a bit more useful.
Running efi_lipo.py:
$ ./efi_lipo.py SmcFlasher.efi
processing 'SmcFlasher.efi'
this is an EFI fat binary with 2 architectures
architecture 0 (X86):
offset: 0x30
size: 0x8bd0
architecture 1 (X64):
offset: 0x8c00
size: 0x9e70
saving X86 section to 'SmcFlasher.efi.X86'
saving X64 section to 'SmcFlasher.efi.X64'
It might have been better to write an IDA Python script to do it instead; maybe I’ll do that at some stage, but this does the job for now.
The rEFIt site has some good info on the data structure layout, as does awkwardTV.
VMware debugging II: "Hardware" debugging
A few days ago I wrote an article about debugging the OS X kernel with VMware and GDB, using Apple’s Kernel Debugger Protocol (KDP). There is another method of debugging XNU that is worth mentioning - VMware Fusion’s built in debug server. This is the virtual equivalent of a hardware debugger on a physical machine. According to a VMware engineer:
… when you stop execution, all cores are halted, the guest doesn’t even know that time has stopped, and you can happily single-step interrupt handlers, exceptions, etc.
This is pretty awesome, and has a few advantages over KDP:
- It’s easier to break into the debugger - you can use the normal ^C method from the GDB session, rather than having to either insert int 3’s into your code or insert breakpoints on predictable function calls like kext_alloc() when you attach the debugger at boot time.
- It’s faster - KDP works over UDP and seems to have a few timing issues where it drops packets or the target kernel doesn’t respond in time (particularly in the more complex kgmacros commands), whereas the VMware debug stub seems to be substantially faster and (so far) more reliable.
- You can debug anything from the time the VM is powered on - this means that you can debug non-DEBUG XNU kernels, along with EFI stuff, the bootloader (boot.efi), whatever you want.
VMware setup
Getting this going is pretty easy; it just requires a couple of config options to be added to the .vmx file for your virtual machine. For example, if you have a VM called Lion.vmwarevm, there’ll be a file inside it called Lion.vmx which contains the configuration for the VM. Add the following lines (while the VM is not running):
debugStub.listen.guest32 = "TRUE"
debugStub.listen.guest64 = "TRUE"
The debug stub listens on the loopback interface on the Mac OS X host OS on which Fusion is running. If you want to debug from another machine (or VM) you need to enable the ‘remote’ listener in the .vmx file instead of (or as well as) the local listener:
debugStub.listen.guest32.remote = "TRUE"
debugStub.listen.guest64.remote = "TRUE"
Using this method you can connect to the debug stub from an instance of the FSF version of GDB on a Linux box.
That’s it, start up the VM. If you’re using a VM with a DEBUG kernel and you’ve set the boot-args variable in NVRAM to contain debug=0x1, as per the previous article, you will need to attach another instance of GDB via KDP at this point and continue in that instance to let the boot process finish.
GDB
I’ve found that if you try to connect to the debug stub without loading a file to debug you get errors like this:
[New thread 1]
Remote register badly formatted: T05thread:00000001;06:10d3fc7f00000000;07:c0d2fc7f00000000;10:8a18a07d00000000;
here: 0000000;07:c0d2fc7f00000000;10:8a18a07d00000000;
So start up GDB with whatever you’re intending to debug. In this example, the DEBUG kernel that is installed on the VM:
$ gdb /Volumes/KernelDebugKit/DEBUG_Kernel/mach_kernel
If you’re debugging a 32-bit VM on a 64-bit machine, you’ll need to set the architecture:
gdb$ set architecture i386
Or, if you are debugging 64-bit on 64-bit and have trouble connecting to the debug stub, you may need to explicitly set it to 64-bit:
gdb$ set architecture i386:x86-64
If you’re debugging a 64-bit VM, connect to the 64-bit debug stub:
gdb$ target remote localhost:8864
Or the 32-bit debug stub for a 32-bit VM:
gdb$ target remote localhost:8832
At this point you should be connected to the debug stub, and the VM should be paused. You’ll see a dark translucent version of the ‘play’ button used to start the VM on the VM console (indicating the VM is paused and the debugger has control), and something like this in GDB:
[New thread 1]
warning: Error 268435459 getting port names from mach_port_names
[Switching to process 1 thread 0x0]
0xffffff80008bf4c2 in tweak_crypt_group ()
gdb$
tweak_crypt_group() - heh. My VM is encrypting its disk at the moment.
Now you’re in familiar territory:
gdb$ source /Volumes/KernelDebugKit/kgmacros
Loading Kernel GDB Macros package. Type "help kgm" for more info.
gdb$ bt
#0 0xffffff7f817315b4 in ?? ()
#1 0xffffff7f8172343e in ?? ()
#2 0xffffff7f81724f68 in ?? ()
#3 0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4 0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
#5 0xffffff8000257060 in thread_select_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1793
#6 0xffffff8000256d8e in thread_select (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:1728
#7 0xffffff8000258bbf in thread_block_reason (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>, parameter=0x0, reason=0x0) at sched_prim.c:2396
#8 0xffffff8000258cbc in thread_block (continuation=0xffffff8000227270 <ipc_mqueue_receive_continue>) at sched_prim.c:2415
#9 0xffffff8000227357 in ipc_mqueue_receive (mqueue=0xffffff8008854728, option=0x7000006, max_size=0xc00, rcv_timeout=0xffffffff, interruptible=0x2) at ipc_mqueue.c:698
#10 0xffffff8000237542 in mach_msg_overwrite_trap (args=0xffffff800872b804) at mach_msg.c:528
#11 0xffffff80002375b4 in mach_msg_trap (args=0xffffff800872b804) at mach_msg.c:554
#12 0xffffff8000354a01 in mach_call_munger64 (state=0xffffff800872b800) at bsd_i386.c:534
gdb$ showalltasks
task vm_map ipc_space #acts pid process io_policy wq_state command
0xffffff80067ac938 0xffffff800249ee98 0xffffff80066ebdb0 60 0 0xffffff8000cb4c20 kernel_task
0xffffff80067ac5a0 0xffffff800249e200 0xffffff80066ebd10 3 1 0xffffff8007576820 launchd
0xffffff80067ac208 0xffffff800249e010 0xffffff80066ebc70 1 2 0xffffff80075763d0 launchctl
0xffffff80067ab740 0xffffff800249e108 0xffffff80066eba90 3 10 0xffffff80075756e0 2 1 0 kextd
0xffffff80067abe70 0xffffff8007003568 0xffffff80066ebbd0 3 11 0xffffff8007575f80 1 0 0 UserEventAgent
0xffffff80067abad8 0xffffff8007e692f8 0xffffff80066ebb30 3 12 0xffffff8007575b30 1 0 0 mDNSResponder
<snip>
Don’t forget you can just ^C to drop back into the debugger, just like back in the good old userland days:
gdb$ c
^C
Program received signal SIGINT, Interrupt.
0xffffff7f817315b4 in ?? ()
gdb$ bt
#0 0xffffff7f817315b4 in ?? ()
#1 0xffffff7f8172343e in ?? ()
#2 0xffffff7f81724f68 in ?? ()
#3 0xffffff8000379b18 in machine_idle () at pmCPU.c:107
#4 0xffffff800025c357 in processor_idle (thread=0xffffff8008712b80, processor=0xffffff8000c9be20) at sched_prim.c:3928
<snip>
Enjoy.