RIP-Relative Addressing and Kernel Payloads
The x86-64 architecture introduced a new way to generate Position-Independent Code (PIC): RIP-relative addressing. RIP-relative addressing references data and functions by an offset from the current instruction pointer, so “fixups” are not needed for local functions and data when a piece of code is relocated to a base address other than the one it was linked for. I won’t go into too much detail about load-time relocation or PIC on x86, but if you’re interested in the details I recommend reading Eli Bendersky’s excellent write-ups on how load-time relocation, x86 PIC and x86_64 PIC work on Linux/ELF, as the concepts are fairly similar to how it works on OS X/Mach-O.
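To make the arithmetic concrete, here is a minimal C sketch of how a RIP-relative operand resolves. The helper names are made up for illustration, and the sample values in the note after the code are taken from the leaq instruction in the disassembly further down (7 bytes long, at 0xf32, displacement 0xb3).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper (not part of any real API): resolve a RIP-relative
 * operand. While an instruction executes, RIP already points at the *next*
 * instruction, so the displacement is added to insn_addr + insn_len.
 */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp32)
{
    /* x86 instructions are between 1 and 15 bytes long */
    assert(insn_len >= 1 && insn_len <= 15);
    /* disp32 is sign-extended to 64 bits before the addition */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/*
 * Resolve the same instruction after the whole image has been moved to
 * `base`. `insn_offset` is the instruction's offset from the image start.
 */
static uint64_t rip_relative_target_at(uint64_t base, uint64_t insn_offset,
                                       unsigned insn_len, int32_t disp32)
{
    return rip_relative_target(base + insn_offset, insn_len, disp32);
}
```

With those values, rip_relative_target(0xf32, 7, 0xb3) gives 0xfec. Loading the same bytes at some other base adds that base to both the instruction address and the result, so a reference that stays inside the image needs no patching.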
RIP-relative addressing became a bit of a problem for me when I was generating kernel payloads that I wanted to be able to relocate to different areas of memory. I’ll explain by way of example.
Consider the following dummy kernel extension:
#include <mach/mach_types.h>
#include <sys/systm.h>

kern_return_t TestPayload_start(kmod_info_t *ki, void *d);
kern_return_t TestPayload_stop(kmod_info_t *ki, void *d);

kern_return_t TestPayload_start(kmod_info_t *ki, void *d)
{
    printf("sup\n");
    return KERN_SUCCESS;
}

kern_return_t TestPayload_stop(kmod_info_t *ki, void *d)
{
    return KERN_SUCCESS;
}
This is only slightly modified from the default code that is generated when we create a new Kernel Extension project in Xcode; I just added the printf() and relevant #include. If we compile this in the normal way with Xcode, and disassemble the executable:
$ otool -tv TestPayload.kext/Contents/MacOS/TestPayload
_TestPayload_start:
0000000000000f20 pushq %rbp
0000000000000f21 movq %rsp,%rbp
0000000000000f24 subq $0x20,%rsp
0000000000000f28 movq %rdi,0xf8(%rbp)
0000000000000f2c movq %rsi,0xf0(%rbp)
0000000000000f30 xorb %al,%al
0000000000000f32 leaq 0x000000b3(%rip),%rcx
0000000000000f39 movq %rcx,%rdi
0000000000000f3c callq 0x00000f41
0000000000000f41 movl $0x00000000,0xe8(%rbp)
0000000000000f48 movl 0xe8(%rbp),%eax
0000000000000f4b movl %eax,0xec(%rbp)
0000000000000f4e movl 0xec(%rbp),%eax
0000000000000f51 addq $0x20,%rsp
0000000000000f55 popq %rbp
0000000000000f56 ret
0000000000000f57 nopw 0x00000000(%rax,%rax)
_TestPayload_stop:
0000000000000f60 pushq %rbp
0000000000000f61 movq %rsp,%rbp
0000000000000f64 subq $0x18,%rsp
0000000000000f68 movq %rdi,0xf8(%rbp)
0000000000000f6c movq %rsi,0xf0(%rbp)
0000000000000f70 movl $0x00000000,0xe8(%rbp)
0000000000000f77 movl 0xe8(%rbp),%eax
0000000000000f7a movl %eax,0xec(%rbp)
0000000000000f7d movl 0xec(%rbp),%eax
0000000000000f80 addq $0x18,%rsp
0000000000000f84 popq %rbp
0000000000000f85 ret
<snip>
Note the callq 0x00000f41 at 0xf3c there: that’s the call to printf(). If we dump the section without disassembling:
$ otool -t TestPayload.kext/Contents/MacOS/TestPayload
TestPayload:
(__TEXT,__text) section
0000000000000f20 55 48 89 e5 48 83 ec 20 48 89 7d f8 48 89 75 f0
0000000000000f30 30 c0 48 8d 0d b3 00 00 00 48 89 cf e8 00 00 00
0000000000000f40 00 c7 45 e8 00 00 00 00 8b 45 e8 89 45 ec 8b 45
0000000000000f50 ec 48 83 c4 20 5d c3 66 0f 1f 84 00 00 00 00 00
0000000000000f60 55 48 89 e5 48 83 ec 18 48 89 7d f8 48 89 75 f0
0000000000000f70 c7 45 e8 00 00 00 00 8b 45 e8 89 45 ec 8b 45 ec
0000000000000f80 48 83 c4 18 5d c3 55 48 89 e5 48 8d 05 37 01 00
0000000000000f90 00 48 8b 00 48 85 c0 75 04 31 c0 5d c3 5d ff e0
0000000000000fa0 55 48 89 e5 48 8d 05 55 00 00 00 48 83 c0 10 5d
0000000000000fb0 c3 55 48 89 e5 48 8d 05 44 00 00 00 48 83 c0 50
0000000000000fc0 5d c3 55 48 89 e5 48 8d 05 33 00 00 00 8b 40 0c
0000000000000fd0 5d c3 55 48 89 e5 48 8d 05 f3 00 00 00 48 8b 00
0000000000000fe0 48 85 c0 75 04 31 c0 5d c3 5d ff e0
We can see at 0xf3c an instruction that looks like e8 00 00 00 00: this is the opcode for a RIP-relative call instruction (e8), followed by the 32-bit displacement (00 00 00 00). This is supposed to be the printf() call? Well, yeah. The compiler doesn’t know the address of the printf() function in the kernel at compile time, so it puts in 0x0 as a placeholder which will be updated when the executable is loaded and linked by KXLD. A displacement of zero is measured from the end of the 5-byte instruction, which is why otool shows the target as 0xf41. So how does KXLD know that this instruction needs updating? Relocation entries. Have a look at the relocation entries for the executable:
$ otool -r TestPayload.kext/Contents/MacOS/TestPayload
TestPayload.kext/Contents/MacOS/TestPayload:
External relocation information 1 entries
address pcrel length extern type scattered symbolnum/value
00000f3d 1 2 1 2 0 31
<snip>
We’re only concerned about the external relocations in this instance. We can see there is only one of these, and its address (offset within the executable file) is 0xf3d. This happens to be one byte after the e8 (call) opcode: the location of the displacement value for the RIP-relative call. It’s also worth noting that the pcrel field is 1, indicating that this is, in fact, a RIP-relative instruction, and that the length field is 2, meaning the value to be updated is 2^2 = 4 bytes wide. The other fields give the linker more information about how the relocation entry should be handled. You can find more info about these fields in the ABI documentation.
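Conceptually, resolving a pcrel relocation like this one means computing the distance from the end of the call instruction to the target and storing it in those four bytes. The sketch below only illustrates that arithmetic. It is not KXLD's actual code, the function names are invented, and it assumes a little-endian host (as on x86-64).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_REL32_LEN 5 /* one opcode byte (e8) plus a 4-byte displacement */

/* Decode a `call rel32` located at insn_addr and return its target. */
static uint64_t call_rel32_target(const uint8_t *insn, uint64_t insn_addr)
{
    int32_t disp;

    assert(insn[0] == 0xe8);
    /* Assumes a little-endian host, so the bytes can be copied directly */
    memcpy(&disp, insn + 1, sizeof disp);
    /* The displacement is relative to the address of the next instruction */
    return insn_addr + CALL_REL32_LEN + (uint64_t)(int64_t)disp;
}

/*
 * Fill in the displacement so the call at insn_addr reaches `target`.
 * Returns 0 on success, or -1 if the target is further away than a signed
 * 32-bit displacement can express.
 */
static int patch_call_rel32(uint8_t *insn, uint64_t insn_addr, uint64_t target)
{
    /* Wrapping subtraction, reinterpreted as a signed distance */
    int64_t delta = (int64_t)(target - (insn_addr + CALL_REL32_LEN));
    int32_t disp;

    assert(insn[0] == 0xe8);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return -1;
    disp = (int32_t)delta;
    memcpy(insn + 1, &disp, sizeof disp);
    return 0;
}
```

Decoding the unpatched e8 00 00 00 00 at 0xf3c yields 0xf41, matching the otool output. The stored displacement depends on where the call itself lives, so it has to be recomputed whenever the code moves relative to its target.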
So, back to my kernel payloads. I wanted to be able to move the payload around in memory without having to update the relocation entries each time, as that would require keeping the code to perform this updating within the payload. There are a few compiler options for generating slightly-more-position-independent code, but the OS X version of GCC doesn’t seem to support them. Fortunately, Clang does. If we compile with the -mcmodel=large option (by adding it to the “Other C Flags” field in the Xcode build settings), and disassemble the executable:
$ otool -tv TestPayload.kext/Contents/MacOS/TestPayload
TestPayload.kext/Contents/MacOS/TestPayload:
(__TEXT,__text) section
_TestPayload_start:
0000000000000f30 pushq %rbp
0000000000000f31 movq %rsp,%rbp
0000000000000f34 subq $0x20,%rsp
0000000000000f38 movq %rdi,0xf8(%rbp)
0000000000000f3c movq %rsi,0xf0(%rbp)
0000000000000f40 xorb %al,%al
0000000000000f42 movq $0x0000000000000ff1,%rdi
0000000000000f4c movq $0x0000000000000000,%rsi
0000000000000f56 call *%rsi
0000000000000f58 movl $0x00000000,%ecx
0000000000000f5d movl %eax,0xec(%rbp)
0000000000000f60 movl %ecx,%eax
0000000000000f62 addq $0x20,%rsp
0000000000000f66 popq %rbp
0000000000000f67 ret
0000000000000f68 nopl 0x00000000(%rax,%rax)
_TestPayload_stop:
0000000000000f70 pushq %rbp
0000000000000f71 movq %rsp,%rbp
0000000000000f74 subq $0x10,%rsp
0000000000000f78 movl $0x00000000,%eax
0000000000000f7d movq %rdi,0xf8(%rbp)
0000000000000f81 movq %rsi,0xf0(%rbp)
0000000000000f85 addq $0x10,%rsp
0000000000000f89 popq %rbp
0000000000000f8a ret
<snip>
Now we have a call to an absolute 64-bit address: the address of the function is moved into a register, and the call instruction calls the value of that register. If we have a look at the relocation entries now:
$ otool -r TestPayload.kext/Contents/MacOS/TestPayload
TestPayload.kext/Contents/MacOS/TestPayload:
External relocation information 1 entries
address pcrel length extern type scattered symbolnum/value
00000f4e 0 3 1 0 0 31
<snip>
Notice pcrel is now 0 and length is 3 (8 bytes), as it’s an absolute 64-bit address that we’re updating instead of a 32-bit displacement from RIP. This means that we can look up the address of the symbol (e.g. printf()) once when we initially load the payload, and write that address into the location that each relocation entry refers to. The patched value then stays valid wherever the payload is moved. Unfortunately this inflates the size of the code a bit, as all function calls are treated this way, which kind of defeats the purpose of trimming the relocation code, once we reach a certain payload size anyway. Oh well, it’s still a bit easier to handle. Next stop might be to write an LLVM pass to convert only external calls to absolute calls.
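As a rough sketch of that one-time fixup, the code below writes a 64-bit address into a payload buffer at the offset named by a non-pcrel, length-3 relocation. The function names are hypothetical, the symbol lookup is left out entirely, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Store an absolute 64-bit address at reloc_offset inside the payload.
 * Returns 0 on success, or -1 if the 8-byte field would not fit.
 */
static int patch_abs64(uint8_t *payload, size_t payload_len,
                       size_t reloc_offset, uint64_t symbol_addr)
{
    if (reloc_offset > payload_len ||
        payload_len - reloc_offset < sizeof symbol_addr)
        return -1;
    /* Assumes a little-endian host, so the value can be copied directly */
    memcpy(payload + reloc_offset, &symbol_addr, sizeof symbol_addr);
    return 0;
}

/* Read the 64-bit immediate back, e.g. after the payload has been copied. */
static uint64_t read_abs64(const uint8_t *payload, size_t payload_len,
                           size_t reloc_offset)
{
    uint64_t value;

    assert(reloc_offset <= payload_len &&
           payload_len - reloc_offset >= sizeof value);
    memcpy(&value, payload + reloc_offset, sizeof value);
    return value;
}
```

For the listing above, the bytes at 0xf4c are 48 be followed by the 8-byte immediate (movq $imm64,%rsi), then ff d6 (call *%rsi), so the relocation address 0xf4e is two bytes into the mov. Once patch_abs64() has filled in the immediate, copying the buffer elsewhere leaves it intact, unlike a RIP-relative displacement to a target outside the payload.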
I’m not sure how useful this will be to others, but I thought it was interesting!