RIP-Relative Addressing and Kernel Payloads

The x86-64 architecture introduced a new way to generate Position-Independent Code (PIC) – RIP-relative addressing. It works by referencing data and functions at an offset from the current instruction pointer, so no “fixups” are needed for local references when a piece of code is relocated to a base address other than the one it was linked for. I won’t go into too much detail about load-time relocation or PIC on x86, but if you’re interested I recommend Eli Bendersky’s excellent write-ups on how load-time relocation, x86 PIC and x86_64 PIC work on Linux/ELF, as the concepts are fairly similar on OS X/Mach-O.

RIP-relative addressing became a bit of a problem for me when I was generating kernel payloads that I wanted to be able to relocate to different areas of memory. I’ll explain by way of example.

Consider the following dummy kernel extension:

#include <mach/mach_types.h>
#include <sys/systm.h>

kern_return_t TestPayload_start(kmod_info_t * ki, void *d);
kern_return_t TestPayload_stop(kmod_info_t *ki, void *d);

kern_return_t TestPayload_start(kmod_info_t * ki, void *d)
{
    printf("sup\n");
    return KERN_SUCCESS;
}

kern_return_t TestPayload_stop(kmod_info_t *ki, void *d)
{
    return KERN_SUCCESS;
}

This is only slightly modified from the default code that is generated when we create a new Kernel Extension project in Xcode – I just added the printf() and the relevant #include. If we compile this the normal way with Xcode and disassemble the executable, we get:

$ otool -tv TestPayload.kext/Contents/MacOS/TestPayload
_TestPayload_start:
0000000000000f20	pushq	%rbp
0000000000000f21	movq	%rsp,%rbp
0000000000000f24	subq	$0x20,%rsp
0000000000000f28	movq	%rdi,0xf8(%rbp)
0000000000000f2c	movq	%rsi,0xf0(%rbp)
0000000000000f30	xorb	%al,%al
0000000000000f32	leaq	0x000000b3(%rip),%rcx
0000000000000f39	movq	%rcx,%rdi
0000000000000f3c	callq	0x00000f41
0000000000000f41	movl	$0x00000000,0xe8(%rbp)
0000000000000f48	movl	0xe8(%rbp),%eax
0000000000000f4b	movl	%eax,0xec(%rbp)
0000000000000f4e	movl	0xec(%rbp),%eax
0000000000000f51	addq	$0x20,%rsp
0000000000000f55	popq	%rbp
0000000000000f56	ret
0000000000000f57	nopw	0x00000000(%rax,%rax)
_TestPayload_stop:
0000000000000f60	pushq	%rbp
0000000000000f61	movq	%rsp,%rbp
0000000000000f64	subq	$0x18,%rsp
0000000000000f68	movq	%rdi,0xf8(%rbp)
0000000000000f6c	movq	%rsi,0xf0(%rbp)
0000000000000f70	movl	$0x00000000,0xe8(%rbp)
0000000000000f77	movl	0xe8(%rbp),%eax
0000000000000f7a	movl	%eax,0xec(%rbp)
0000000000000f7d	movl	0xec(%rbp),%eax
0000000000000f80	addq	$0x18,%rsp
0000000000000f84	popq	%rbp
0000000000000f85	ret
<snip>

Note the callq 0x00000f41 at 0xf3c there – that’s the call to printf(). If we dump the section without disassembling:

$ otool -t TestPayload.kext/Contents/MacOS/TestPayload 
TestPayload:
(__TEXT,__text) section
0000000000000f20 55 48 89 e5 48 83 ec 20 48 89 7d f8 48 89 75 f0 
0000000000000f30 30 c0 48 8d 0d b3 00 00 00 48 89 cf e8 00 00 00 
0000000000000f40 00 c7 45 e8 00 00 00 00 8b 45 e8 89 45 ec 8b 45 
0000000000000f50 ec 48 83 c4 20 5d c3 66 0f 1f 84 00 00 00 00 00 
0000000000000f60 55 48 89 e5 48 83 ec 18 48 89 7d f8 48 89 75 f0 
0000000000000f70 c7 45 e8 00 00 00 00 8b 45 e8 89 45 ec 8b 45 ec 
0000000000000f80 48 83 c4 18 5d c3 55 48 89 e5 48 8d 05 37 01 00 
0000000000000f90 00 48 8b 00 48 85 c0 75 04 31 c0 5d c3 5d ff e0 
0000000000000fa0 55 48 89 e5 48 8d 05 55 00 00 00 48 83 c0 10 5d 
0000000000000fb0 c3 55 48 89 e5 48 8d 05 44 00 00 00 48 83 c0 50 
0000000000000fc0 5d c3 55 48 89 e5 48 8d 05 33 00 00 00 8b 40 0c 
0000000000000fd0 5d c3 55 48 89 e5 48 8d 05 f3 00 00 00 48 8b 00 
0000000000000fe0 48 85 c0 75 04 31 c0 5d c3 5d ff e0 

We can see at 0xf3c an instruction that looks like e8 00 00 00 00 – this is the opcode for a RIP-relative call (e8), followed by its 32-bit displacement (00 00 00 00). With a zero displacement the call targets the very next instruction, which is why otool printed callq 0x00000f41. This is supposed to be the printf() call? Well, yeah. The compiler doesn’t know the address of the printf() function in the kernel at compile time, so it puts in 0x0 as a placeholder, which is updated when the executable is loaded and linked by KXLD. So how does KXLD know that this instruction needs updating? Relocation entries. Have a look at the relocation entries for the executable:

$ otool -r TestPayload.kext/Contents/MacOS/TestPayload 
TestPayload.kext/Contents/MacOS/TestPayload:
External relocation information 1 entries
address  pcrel length extern type    scattered symbolnum/value
00000f3d 1     2      1      2       0         31
<snip>

We’re only concerned about the external relocations in this instance – we can see there is only one of these, and its address (offset within the executable file) is 0xf3d. This happens to be one byte after the e8 (call) opcode – the location of the displacement value for the RIP-relative call. It’s also worth noting that the pcrel field is 1, indicating that the value to fix up is, in fact, relative to RIP. The length of 2 means the field is 4 bytes (2^2) wide, and type 2 is X86_64_RELOC_BRANCH. The other fields give the linker more information about how the relocation entry should be handled. You can find more info about these fields in the ABI documentation.
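To make the arithmetic concrete, here is a minimal sketch in C of what resolving that entry involves. This is not KXLD's code: the helper names (call_rel32_target, patch_branch_rel32) are made up for illustration, and the sketch assumes a little-endian host. The only facts taken from above are the five bytes at 0xf3c and the entry's address (0xf3d), pcrel (1) and length (2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bytes at 0xf3c..0xf40 from the otool -t dump: call rel32, zero placeholder. */
const uint8_t kCallBytes[5] = { 0xe8, 0x00, 0x00, 0x00, 0x00 };

/* Target of a 5-byte `call rel32` located at insn_addr. The displacement is
 * relative to the address of the *next* instruction (insn_addr + 5).
 * Assumes a little-endian host, like the x86-64 target itself. */
uint64_t call_rel32_target(const uint8_t *insn, uint64_t insn_addr)
{
    int32_t disp;
    assert(insn[0] == 0xe8);
    memcpy(&disp, insn + 1, sizeof disp);
    return insn_addr + 5 + (int64_t)disp;
}

/* Resolve a pc-relative branch relocation (pcrel 1, length 2 = 4 bytes).
 * slot      : pointer to the 4 displacement bytes
 * slot_addr : runtime address of those bytes (load base + 0xf3d here)
 * target    : runtime address of the symbol
 * Returns 0 on success, -1 if the target is outside the +/-2 GiB reach
 * of a 32-bit displacement. */
int patch_branch_rel32(uint8_t *slot, uint64_t slot_addr, uint64_t target)
{
    int64_t disp = (int64_t)(target - (slot_addr + 4));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    int32_t d32 = (int32_t)disp;
    memcpy(slot, &d32, sizeof d32);
    return 0;
}
```

Calling call_rel32_target(kCallBytes, 0xf3c) on the unpatched bytes gives 0xf41, the placeholder target otool displayed. Note that patch_branch_rel32() needs slot_addr, so the displacement it writes is only correct for one load address; move the code and the same bytes point somewhere else.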

So, back to my kernel payloads – I wanted to be able to move the payload around in memory without having to re-apply the relocations each time, as that would mean keeping the relocation code inside the payload. There are a few compiler options for generating slightly-more-position-independent code, but the OS X version of GCC doesn’t seem to support them. Fortunately, Clang does. If we compile with the -mcmodel=large option (by adding it to the “Other C Flags” field in the Xcode build settings) and disassemble the executable, we get:

$ otool -tv TestPayload.kext/Contents/MacOS/TestPayload 
TestPayload.kext/Contents/MacOS/TestPayload:
(__TEXT,__text) section
_TestPayload_start:
0000000000000f30	pushq	%rbp
0000000000000f31	movq	%rsp,%rbp
0000000000000f34	subq	$0x20,%rsp
0000000000000f38	movq	%rdi,0xf8(%rbp)
0000000000000f3c	movq	%rsi,0xf0(%rbp)
0000000000000f40	xorb	%al,%al
0000000000000f42	movq	$0x0000000000000ff1,%rdi
0000000000000f4c	movq	$0x0000000000000000,%rsi
0000000000000f56	call	*%rsi
0000000000000f58	movl	$0x00000000,%ecx
0000000000000f5d	movl	%eax,0xec(%rbp)
0000000000000f60	movl	%ecx,%eax
0000000000000f62	addq	$0x20,%rsp
0000000000000f66	popq	%rbp
0000000000000f67	ret
0000000000000f68	nopl	0x00000000(%rax,%rax)
_TestPayload_stop:
0000000000000f70	pushq	%rbp
0000000000000f71	movq	%rsp,%rbp
0000000000000f74	subq	$0x10,%rsp
0000000000000f78	movl	$0x00000000,%eax
0000000000000f7d	movq	%rdi,0xf8(%rbp)
0000000000000f81	movq	%rsi,0xf0(%rbp)
0000000000000f85	addq	$0x10,%rsp
0000000000000f89	popq	%rbp
0000000000000f8a	ret
<snip>

Now the call uses an absolute 64-bit address: the address of the function is loaded into a register (%rsi), and the call goes through that register. If we have a look at the relocation entries now:

$ otool -r TestPayload.kext/Contents/MacOS/TestPayload 
TestPayload.kext/Contents/MacOS/TestPayload:
External relocation information 1 entries
address  pcrel length extern type    scattered symbolnum/value
00000f4e 0     3      1      0       0         31
<snip>

Notice pcrel is now 0 and length is 3 (8 bytes), as it’s an absolute 64-bit address that we’re updating instead of a 32-bit displacement from RIP. This means that we can look up the address of the symbol (e.g. printf()) once when we initially load the payload, and write it into the location(s) the relocation entries point at; the value stays valid wherever the payload is moved afterwards. Unfortunately this inflates the size of the code a bit, as all function calls are treated this way, which kind of defeats the purpose of trimming the relocation code – once we reach a certain payload size anyway. Oh well, it’s still a bit easier to handle. Next stop might be to write an LLVM pass to convert only external calls to absolute calls.
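As a rough illustration of why the absolute form survives being moved, here is a small C sketch. The byte sequence is hand-encoded from the disassembly (movq $0,%rsi at 0xf4c followed by call *%rsi at 0xf56), not copied from a dump, and the helper names patch_abs64 and abs_call_target are hypothetical. It assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0xf4c..0xf57 in the -mcmodel=large build, encoded by hand:
 * 48 be imm64 = movabs $imm64, %rsi ; ff d6 = call *%rsi.
 * The relocation address 0xf4e is 0xf4c + 2, i.e. the imm64 field. */
const uint8_t kAbsCall[12] = {
    0x48, 0xbe, 0, 0, 0, 0, 0, 0, 0, 0,   /* movabs $0x0, %rsi */
    0xff, 0xd6                            /* call *%rsi        */
};

/* Resolve an absolute 64-bit relocation (pcrel 0, length 3 = 8 bytes):
 * just store the symbol's address in the slot. */
void patch_abs64(uint8_t *slot, uint64_t target)
{
    memcpy(slot, &target, sizeof target);
}

/* Address the sequence will call. There is deliberately no "where is this
 * code loaded" parameter: the result cannot depend on it. */
uint64_t abs_call_target(const uint8_t *seq)
{
    uint64_t target;
    assert(seq[0] == 0x48 && seq[1] == 0xbe);
    memcpy(&target, seq + 2, sizeof target);
    return target;
}
```

In use, you would copy kAbsCall into a writable buffer, call patch_abs64(buf + 2, addr) once with the resolved symbol address, and then memcpy() the buffer wherever you like; abs_call_target() returns the same addr for every copy. A RIP-relative displacement would have to be recomputed for each new location.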

I’m not sure how useful this will be to others, but I thought it was interesting!