Or how to escape PIN in 5 instructions, using the self-modification technique seen in the previous post. Ready? Go:
#include <stdio.h>
main() {
asm("call foo\n\t"
"foo: pop %rax\n\t"
"movl $0x4004e7, 10(%eax)\n\t" // put @nottraced() in the next mov
"movl $0x4004fb, %eax\n\t" // @traced(), will be overwritten
// by @nottraced() if not instrumented
"call *%rax\n\t");
}
// we don't want PIN to analyse this
nottraced() {
printf("trace me if you can!\n");
}
// we want PIN to analyse this, a dummy function
traced() {
printf("you're not supposed to get here\n");
}
As usual: compile, make the .text section and the program header writable, and run.
reynaudd@lhs-2:~/test/packed$ ./escape2
trace me if you can!
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-24110-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/inscount0.so -- ./escape2
you're not supposed to get here
‘Nuff said.
UPDATE: as the authors of PIN pointed out, this situation is handled correctly by PIN with the option -smc_strict. That’s because, for performance reasons (and standards compliance), PIN assumes that there is at least one taken branch between a modification of the code and its execution (i.e. no basic block modifies itself). My example violates this assumption.
I’m still playing with PIN. I thought it was cool that it could follow the flow of packed programs, and I wanted to push it a bit more. So here are a few more tests to check how PIN reacts to self-checking and self-modifying code.
A Self-Checking Function (using C function pointers)
The output is the same in the normal program and in the instrumented program. This is because the compiler replaces the address of main() with an immediate value (0x400598), as you can see in the following disassembly. PIN does not instrument main() in place but executes a copy of main(), so checksumming the original main() function is pointless: the original bytes are never executed.
reynaudd@lhs-2:~/test/packed$ gdb ./schello
GNU gdb 6.8-debian […]
This GDB was configured as "x86_64-linux-gnu"...
(gdb) disas main
Dump of assembler code for function main:
0x0000000000400598 <main+0>: push %rbp
0x0000000000400599 <main+1>: mov %rsp,%rbp
0x000000000040059c <main+4>: mov $0x20,%esi
0x00000000004005a1 <main+9>: mov $0x400598,%edi
0x00000000004005a6 <main+14>: callq 0x40051c <dump>
0x00000000004005ab <main+19>: mov $0x0,%eax
0x00000000004005b0 <main+24>: leaveq
0x00000000004005b1 <main+25>: retq
End of assembler dump.
(gdb)
A Self-Checking Function (using a call/pop to get the current address)
So the next step is to get the address of the instrumented function that is really executed. I tried to do that with a call followed by a pop to get the address of the pop instruction (thanks to joe from joebox.org for giving me a hand on this one):
#include <stdio.h>
void foo() {
int var = -1;
asm("call bar\n\t"
"bar: pop %%rax\n\t"
: "=a"(var)); // "a" pins the output to the A register (eax for an int)
printf("0x%x\n", var);
}
int main() {
foo();
}
I was pretty surprised to see the same value both in the normal program and in the instrumented program. It means that PIN has sufficient control over the program to give it the value it would normally see. So, we still can’t get the address of the instrumented function that way.
A Self-Modifying Function
Now that we know PIN gracefully handles dynamically generated code and self-checking code, we still have to check how it handles self-modification. It is functionally equivalent to dynamic generation, but the technical difference might trip up PIN’s code cache.
#include <stdio.h>
void foo() {
int var = -1;
asm ("call bar\n\t"
"bar: pop %%rax\n\t"
"movl $0xcafebabe, 10(%%eax)\n\t" // this is an attempt to replace 0xffffffff
// with 0xcafebabe in the next instruction
"movl $0xffffffff,%%eax\n\t"
: "=a"(var)); // "a" pins the output to the A register (eax for an int)
printf("0x%x\n", var);
}
int main() {
foo();
}
Since this program modifies itself, we must set the permissions of the .text section and the program header to RWX rather than just RX; otherwise it will segfault. Matthieu Kaczmarek pointed me to HT editor, a very handy hex editor for this purpose.
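If you would rather script this step than click through a hex editor, here is a minimal Python sketch of the program header half of the job. It assumes a little-endian ELF64 file (as on this x86_64 box), it only adds the write flag to executable PT_LOAD segments, and it leaves the section header flags alone. The function name make_code_writable is made up for this example.

```python
import struct

PT_LOAD = 1           # loadable segment
PF_X, PF_W = 0x1, 0x2  # execute and write permission bits in p_flags


def make_code_writable(image):
    """Return a copy of an ELF64 image with PF_W set on executable PT_LOAD segments."""
    data = bytearray(image)
    # e_ident: magic, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size and count
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, p_flags = struct.unpack_from("<II", data, off)    # first two fields of Elf64_Phdr
        if p_type == PT_LOAD and p_flags & PF_X:
            struct.pack_into("<I", data, off + 4, p_flags | PF_W)
    return bytes(data)
```

To apply it to a binary, read the file in binary mode, pass the bytes through the function and write the result back out (on a copy, preferably).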
Finally, we get two different results, which means we have a reliable way to detect PIN. The difference comes from the fact that the normal program really modifies the second mov before executing it, so we see the modified value. In the instrumented program, the modification lands in the original code (which is never executed) and not in PIN’s cache, so we see the unmodified value. I guess the proper way to handle this situation would be to invalidate PIN’s cache whenever a memory write hits the code section.
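To make the stale-cache effect concrete, here is a toy model in Python. It is not how PIN works internally: the two-instruction "program", the instruction names patch and load, and both functions are invented for illustration. The only point is that a translated copy made before the write keeps the old operand unless the copy is rebuilt.

```python
def run_native(program):
    """Execute straight from memory, so a patched instruction is seen as patched."""
    mem = [list(ins) for ins in program]
    acc = None
    for pc in range(len(mem)):
        op, *args = mem[pc]
        if op == "patch":            # overwrite the operand of another instruction
            target, value = args
            mem[target][1] = value
        elif op == "load":           # load an immediate into the accumulator
            acc = args[0]
    return acc


def run_cached(program, invalidate_on_write=False):
    """Execute from a translated copy of the code; writes only reach the original."""
    mem = [list(ins) for ins in program]
    cache = [list(ins) for ins in mem]       # the whole block is translated up front
    acc = None
    for pc in range(len(cache)):
        op, *args = cache[pc]
        if op == "patch":
            target, value = args
            mem[target][1] = value           # the copy in the cache is now stale
            if invalidate_on_write:
                cache = [list(ins) for ins in mem]  # throw the copy away and retranslate
        elif op == "load":
            acc = args[0]
    return acc


# Same shape as the C example: patch the immediate of the next instruction.
PROGRAM = [("patch", 1, 0xCAFEBABE), ("load", 0xFFFFFFFF)]

print(hex(run_native(PROGRAM)))                            # 0xcafebabe
print(hex(run_cached(PROGRAM)))                            # 0xffffffff
print(hex(run_cached(PROGRAM, invalidate_on_write=True)))  # 0xcafebabe
```

The first two lines reproduce the mismatch between the normal run and the instrumented run, and the third shows the effect of invalidating the cache on a write to code.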
I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).
Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.
This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.
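As a rough illustration of the "record writes, compare with executed addresses" idea, here is a minimal Python sketch. It assumes a single time-ordered stream of events, which is a simplification: the event tuples, the "W"/"X" tags and the function name are all hypothetical and do not correspond to any PIN output format.

```python
def first_written_then_executed(events):
    """Return the first executed address that was written to earlier, or None."""
    written = set()
    for kind, addr in events:
        if kind == "W":                        # memory write: remember the target
            written.add(addr)
        elif kind == "X" and addr in written:  # executing bytes the program wrote itself
            return addr
    return None


# A tiny made-up trace: an unpacking stub runs at 0x1000, writes at 0x2000, then jumps there.
trace = [("X", 0x1000), ("W", 0x2000), ("X", 0x1004), ("X", 0x2000)]
candidate = first_written_then_executed(trace)
print(hex(candidate))  # 0x2000
```

A program that never executes anything it wrote produces None, which is the "does not look packed" case.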
So let’s take the Linux ls utility as a test binary:
reynaudd@lhs-2:~/test/packed$ cp /bin/ls .
Let’s generate a list of memory references with a pintool:
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48
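Each line is the address of the instruction, R or W for read or write, and the address being accessed. A small Python sketch for pulling those fields apart, assuming exactly the three-column format shown above (other versions of the tool may print more columns); parse_line is a hypothetical helper name:

```python
from collections import Counter


def parse_line(line):
    """Split 'ip: R|W addr' into (ip, kind, addr), or return None for other lines."""
    parts = line.split()
    if len(parts) != 3 or parts[1] not in ("R", "W"):
        return None                        # blank lines, trailers and anything unexpected
    ip = int(parts[0].rstrip(":"), 16)     # instruction address
    addr = int(parts[2], 16)               # address being read or written
    return ip, parts[1], addr


sample = [
    "0x7fb878380a63: W 0x7fff80599d98",
    "0x7fb878381092: R 0x7fb87859bbe0",
    "#eof",
]
records = [r for r in map(parse_line, sample) if r is not None]
counts = Counter(kind for _, kind, _ in records)
print(counts["R"], counts["W"])  # 1 1
```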
Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2008
UPX 3.03 Markus Oberhumer, Laszlo Molnar & John Reiser Apr 27th 2008
File size Ratio Format Name
-------------------- ------ ----------- -----------
101992 -> 43612 42.76% linux/ElfAMD ls
Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
198746 pinatrace.out.ls.normal
646742 pinatrace.out.ls.packed
845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
523197 itrace.out.ls.normal
2673802 itrace.out.ls.packed
3196999 total
As you can see, PIN works surprisingly well on a packed executable (UPX is quite analysis-friendly, though). The stats confirm the intuition: the packed executable executes more instructions than the normal one, and it performs more memory reads and writes. Now let’s roll some highly unoptimised Python code to match the memory writes against the hit trace:
#!/usr/bin/env python3
"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""
import sys
import getopt


def parse(pinatrace):
    """Return the list of addresses written to, according to a pinatrace file."""
    writes = []
    with open(pinatrace, 'r') as f:
        for line in f:
            if "W" in line:  # indicates a memory write
                tokens = line.split()
                writes.append(int(tokens[-1], 16))  # last token is the written address
    return writes


def match(writes, itrace):
    """Return the first executed address that also appears in writes, or None."""
    with open(itrace, 'r') as f:
        for line in f:
            if "0x" in line:
                eip = int(line, 16)
                if eip in writes:  # this eip has been written to during the run,
                                   # we guess this is the oep
                    return eip
    return None


def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.GetoptError as msg:
        print(msg)
        print("for help use --help")
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print(__doc__)
            sys.exit(0)
    # process arguments
    if len(args) != 2:
        print(__doc__)
        print("for help use --help")
        sys.exit(2)
    print("parsing", args[0])
    writes = parse(args[0])
    print("done, parsed", len(writes), "memory writes")
    print("looking for hits in", args[1])
    hit = match(writes, args[1])
    if hit is None:
        print("no hits found, the binary doesn't look packed")
    else:
        print("Candidate OEP: 0x%X" % hit)


if __name__ == "__main__":
    main()
Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):
reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000
reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes.
sorting the writes list
done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed