Automated Unpacking, Dynamic Binary Instrumentation and You

Howdy fellas,

I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).

Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.

This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.

So let’s take the Linux ls utility as a test binary:

reynaudd@lhs-2:~/test/packed$ cp /bin/ls .

Let’s generate a list of memory references with a pintool:

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48

Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head itrace.out
0x7f313aa1da60
0x7f313aa1da63
0x7f313aa1e070
0x7f313aa1e071
0x7f313aa1e074
0x7f313aa1e076
0x7f313aa1e078
0x7f313aa1e07b
0x7f313aa1e07c
0x7f313aa1e080

Now let’s pack ls with upx and see how it goes:

reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
ls  1/5  [.......................................................]  100.0%
ls  1/5  [******.................................................]   33.8%
ls  1/5  [************...........................................]   42.6%
ls  1/5  [*****************......................................]   45.4%
ls  1/5  [***********************................................]   43.3%
ls  1/5  [*****************************..........................]   45.5%
ls  1/5  [**********************************.....................]   45.8%
ls  1/5  [****************************************...............]   45.5%
ls  1/5  [*********************************************..........]   44.2%
ls  1/5  [***************************************************....]   41.5%
ls  1/5  [*******************************************************]   41.2%
ls  2/5  [.......................................................]  100.0%
ls  2/5  [******.................................................]   36.0%
ls  2/5  [************...........................................]   45.1%
ls  2/5  [*****************......................................]   49.1%
ls  2/5  [***********************................................]   47.4%
ls  2/5  [*****************************..........................]   49.1%
ls  2/5  [**********************************.....................]   49.5%
ls  2/5  [****************************************...............]   49.2%
ls  2/5  [*********************************************..........]   47.5%
ls  2/5  [***************************************************....]   44.4%
ls  2/5  [*******************************************************]   43.8%
ls  3/5  [.......................................................]  100.0%
ls  3/5  [*******************************************************]   26.2%
ls  4/5  [.......................................................]  100.0%
ls  5/5  [.......................................................]  100.0%
ls  5/5  [*******************************************************]   27.7%
 101992 ->     43612   42.76%  linux/ElfAMD   ls

Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
 198746 pinatrace.out.ls.normal
 646742 pinatrace.out.ls.packed
 845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
 523197 itrace.out.ls.normal
 2673802 itrace.out.ls.packed
 3196999 total

As you can see, PIN works surprisingly well on a packed executable (upx is quite analysis friendly though). The stats are here to confirm the intuition: there are more instructions executed in the packed executable than in the normal executable, and there are more memory reads and writes. Now let’s roll some highly unoptimised python code to match the memory writes and the hit trace:

#!/usr/bin/python

"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""

import sys
import getopt

def parse(pinatrace):
    f = open(pinatrace, 'r')
    writes = []
    for line in f:
        if "W" in line: # indicates a memory write
            tokens = line.split()
            writes.append(eval(tokens[len(tokens)-1]))
    f.close()
    return writes

def match(writes, itrace):
    f = open(itrace, 'r')
    for line in f:
        if "0x" in line:
            eip = eval(line)
            if eip in writes: # this eip has previously been written
                                  # to, we guess this is the oep
                f.close()
                return eip
    return None

def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.error, msg:
        print msg
        print "for help use --help"
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print __doc__
            sys.exit(0)
    # process arguments

    if len(args) != 2:
        print __doc__
        print "for help use --help"
        sys.exit(2)
    print "parsing", args[0]
    writes = parse(args[0])
    print "done, parsed", len(writes), "memory writes"
    print "looking for hits in", args[1]
    hit = match(writes, args[1])
    if hit == None:
        print "no hits found, the binary doesn't look packed"
    else:
        print "Candidate OEP: 0x%X" % hit

if __name__ == "__main__":
    main()

Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):

reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000

reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes.
sorting the writes list
done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed

That’s all folks!

Advertisements

2 thoughts on “Automated Unpacking, Dynamic Binary Instrumentation and You

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s