Howdy fellas,
I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).
Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.
This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.
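To make the write-then-execute idea concrete before touching PIN, here is a toy sketch on a made-up, time-ordered event list. The event format (a kind and an address) and the function name are assumptions for illustration only; this is not what PIN produces.

```python
def first_written_then_executed(events):
    """Return the first executed address that was written earlier, else None.

    `events` is a hypothetical, time-ordered list of (kind, address) pairs,
    where kind is "W" for a memory write and "X" for an executed instruction.
    """
    written = set()
    for kind, address in events:
        if kind == "W":
            written.add(address)
        elif kind == "X" and address in written:
            return address
    return None


# Made-up trace: a stub at 0x1000 writes code to 0x2000, then jumps there.
events = [
    ("X", 0x1000),
    ("W", 0x2000),
    ("W", 0x2001),
    ("X", 0x1002),
    ("X", 0x2000),
]
candidate = first_written_then_executed(events)
print(hex(candidate) if candidate is not None else "no hit")
```

The first executed address that was written beforehand is the guess for the original entrypoint; a binary that never runs code it wrote yields no hit.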
So let’s take the Linux ls utility as a test binary:
reynaudd@lhs-2:~/test/packed$ cp /bin/ls .
Let’s generate a list of memory references with a pintool:
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48
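Each line of the trace reads as: instruction address, access type (R or W), target address. As a rough sketch, assuming that three-column layout holds for every useful line, the trace can be split and tallied like this (the helper name parse_access is mine, not part of PIN):

```python
from collections import Counter


def parse_access(line):
    """Split one trace line into (instruction address, kind, target address).

    Returns None for lines that do not match the three-column layout.
    """
    parts = line.split()
    if len(parts) != 3 or parts[1] not in ("R", "W"):
        return None
    try:
        return int(parts[0].rstrip(":"), 16), parts[1], int(parts[2], 16)
    except ValueError:
        return None


# Three lines copied from the head output above.
sample = """\
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381092: R 0x7fb87859bbe0
0x7fb8783810a3: W 0x7fb87859bda8
"""
accesses = [a for a in map(parse_access, sample.splitlines()) if a is not None]
kinds = Counter(kind for _, kind, _ in accesses)
print(kinds["R"], "reads,", kinds["W"], "writes")
```

Only the write targets matter for finding the entrypoint, but the read/write split is handy for sanity-checking a trace.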
Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head itrace.out
0x7f313aa1da60
0x7f313aa1da63
0x7f313aa1e070
0x7f313aa1e071
0x7f313aa1e074
0x7f313aa1e076
0x7f313aa1e078
0x7f313aa1e07b
0x7f313aa1e07c
0x7f313aa1e080
Now let’s pack ls with upx and see how it goes:
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
    101992 ->     43612   42.76%  linux/ElfAMD   ls

Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
 198746 pinatrace.out.ls.normal
 646742 pinatrace.out.ls.packed
 845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
 523197 itrace.out.ls.normal
2673802 itrace.out.ls.packed
3196999 total
As you can see, PIN works surprisingly well on a packed executable (UPX is quite analysis-friendly though). The stats confirm the intuition: the packed executable runs about five times as many instructions as the normal one (2,673,802 trace lines versus 523,197) and makes about three times as many memory references (646,742 versus 198,746). Now let’s roll some highly unoptimised Python code to match the memory writes and the hit trace:
#!/usr/bin/python
"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""

import sys
import getopt

def parse(pinatrace):
    f = open(pinatrace, 'r')
    writes = []
    for line in f:
        if "W" in line: # indicates a memory write
            tokens = line.split()
            writes.append(eval(tokens[len(tokens)-1]))
    f.close()
    return writes

def match(writes, itrace):
    f = open(itrace, 'r')
    for line in f:
        if "0x" in line:
            eip = eval(line)
            if eip in writes:
                # this eip has previously been written
                # to, we guess this is the oep
                f.close()
                return eip
    return None

def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.error, msg:
        print msg
        print "for help use --help"
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print __doc__
            sys.exit(0)
    # process arguments
    if len(args) != 2:
        print __doc__
        print "for help use --help"
        sys.exit(2)

    print "parsing", args[0]
    writes = parse(args[0])
    print "done, parsed", len(writes), "memory writes"
    print "looking for hits in", args[1]
    hit = match(writes, args[1])
    if hit == None:
        print "no hits found, the binary doesn't look packed"
    else:
        print "Candidate OEP: 0x%X" % hit

if __name__ == "__main__":
    main()
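The script tests each executed address against a list, so every lookup walks the whole list of writes. For the impatient, here is a sketch of the same matching logic in Python 3 with a set instead, and int(..., 16) instead of eval. The function names are mine, and it assumes the same two trace formats shown earlier; treat it as a variant to experiment with, not a drop-in replacement.

```python
def load_writes(lines):
    """Collect the target address of every write line into a set."""
    writes = set()
    for line in lines:
        tokens = line.split()
        if len(tokens) == 3 and tokens[1] == "W":
            writes.add(int(tokens[2], 16))
    return writes


def first_executed_write(writes, lines):
    """Return the first executed address found in `writes`, else None."""
    for line in lines:
        line = line.strip()
        if line.startswith("0x"):
            eip = int(line, 16)
            if eip in writes:
                return eip
    return None


# Tiny made-up traces, just to exercise the two functions.
memory_trace = ["0x400100: W 0x129000", "0x400104: R 0x129000"]
hit_trace = ["0x400100", "0x400104", "0x129000"]
hit = first_executed_write(load_writes(memory_trace), hit_trace)
print("Candidate OEP: 0x%X" % hit if hit is not None else "no hits found")
```

Both functions accept any iterable of lines, so an open file object works as well as a list. The list-based script above is still the one used for the runs in this post.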
Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):
reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000
reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes. sorting the writes list done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed
That’s all folks!
thanks for your analysis ;-) and your unoptimized py source helps, Marc
hi Marc, thanks for stopping by!