Automated Unpacking, Dynamic Binary Instrumentation and You

Howdy fellas,

I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).

Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.

This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.

So let’s take the Linux ls utility as a test binary:

reynaudd@lhs-2:~/test/packed$ cp /bin/ls .

Let’s generate a list of memory references with a pintool:

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48

Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head itrace.out
0x7f313aa1da60
0x7f313aa1da63
0x7f313aa1e070
0x7f313aa1e071
0x7f313aa1e074
0x7f313aa1e076
0x7f313aa1e078
0x7f313aa1e07b
0x7f313aa1e07c
0x7f313aa1e080

Now let’s pack ls with upx and see how it goes:

reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
ls  1/5  [.......................................................]  100.0%
ls  1/5  [******.................................................]   33.8%
ls  1/5  [************...........................................]   42.6%
ls  1/5  [*****************......................................]   45.4%
ls  1/5  [***********************................................]   43.3%
ls  1/5  [*****************************..........................]   45.5%
ls  1/5  [**********************************.....................]   45.8%
ls  1/5  [****************************************...............]   45.5%
ls  1/5  [*********************************************..........]   44.2%
ls  1/5  [***************************************************....]   41.5%
ls  1/5  [*******************************************************]   41.2%
ls  2/5  [.......................................................]  100.0%
ls  2/5  [******.................................................]   36.0%
ls  2/5  [************...........................................]   45.1%
ls  2/5  [*****************......................................]   49.1%
ls  2/5  [***********************................................]   47.4%
ls  2/5  [*****************************..........................]   49.1%
ls  2/5  [**********************************.....................]   49.5%
ls  2/5  [****************************************...............]   49.2%
ls  2/5  [*********************************************..........]   47.5%
ls  2/5  [***************************************************....]   44.4%
ls  2/5  [*******************************************************]   43.8%
ls  3/5  [.......................................................]  100.0%
ls  3/5  [*******************************************************]   26.2%
ls  4/5  [.......................................................]  100.0%
ls  5/5  [.......................................................]  100.0%
ls  5/5  [*******************************************************]   27.7%
 101992 ->     43612   42.76%  linux/ElfAMD   ls

Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
 198746 pinatrace.out.ls.normal
 646742 pinatrace.out.ls.packed
 845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
 523197 itrace.out.ls.normal
 2673802 itrace.out.ls.packed
 3196999 total

As you can see, PIN works surprisingly well on a packed executable (upx is quite analysis friendly though). The stats are here to confirm the intuition: there are more instructions executed in the packed executable than in the normal executable, and there are more memory reads and writes. Now let’s roll some highly unoptimised python code to match the memory writes and the hit trace:

#!/usr/bin/python

"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""

import sys
import getopt

def parse(pinatrace):
    f = open(pinatrace, 'r')
    writes = []
    for line in f:
        if "W" in line: # indicates a memory write
            tokens = line.split()
            writes.append(eval(tokens[len(tokens)-1]))
    f.close()
    return writes

def match(writes, itrace):
    f = open(itrace, 'r')
    for line in f:
        if "0x" in line:
            eip = eval(line)
            if eip in writes: # this eip has previously been written
                                  # to, we guess this is the oep
                f.close()
                return eip
    return None

def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.error, msg:
        print msg
        print "for help use --help"
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print __doc__
            sys.exit(0)
    # process arguments

    if len(args) != 2:
        print __doc__
        print "for help use --help"
        sys.exit(2)
    print "parsing", args[0]
    writes = parse(args[0])
    print "done, parsed", len(writes), "memory writes"
    print "looking for hits in", args[1]
    hit = match(writes, args[1])
    if hit == None:
        print "no hits found, the binary doesn't look packed"
    else:
        print "Candidate OEP: 0x%X" % hit

if __name__ == "__main__":
    main()

Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):

reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000

reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes.
sorting the writes list
done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed

That’s all folks!

Python XML for Real Men Cheat Sheet

Howdy again,

Now let’s do some serious XML output in Python using cElementTree.

# this is included in Python 2.5+
from xml.etree.cElementTree import ElementTree, Element, dump

# let's create the root element
root = Element("teabag")

# give it a child with an attribute
child1 = Element("spam")
child1.attrib["name"] = "value"
root.append(child1)

# and a child with text content
child2 = Element("eggs")
child2.text = "spam and eggs"
root.append(child2)

# print the whole thing to stdout
dump(root)

# or to a file
ElementTree(root).write("teabag.xml")

See the author’s website for downloads and usage information of cElementTree.

Using this API, I was able to create a 47 Mb XML file in a few minutes, burning roughly 300 Mb of heap space. This XML file represents a graph of graphs, namely the control flow graph of each function of IDA Pro. Here are some screenshots, using yEd for the visualization part:

Python XML Cheat Sheet

[UPDATED] Finally xml.dom.minidom sucks balls, it can burn up to hundreds/gigs of megabytes of sweet memory when working with “large” xml files (10 Mb or more). See this post for a really lightweight implementation.

Howdy,

Here is a quick reference of how to create an XML document and output it in Python.

import xml.dom.minidom

# create the document
doc = xml.dom.minidom.Document()

# populate it with an element
root = doc.createElement("teabag")
doc.appendChild(root)

# time to give some children to the root element, one with an attribute for instance
child1 = doc.createElement("spam")
child1.setAttribute("name", "value")
root.appendChild(child1)

# and another one with some text
child2 = doc.createElement("eggs")
text = doc.createTextNode("spam and eggs!")
child2.appendChild(text)
root.appendChild(child2)

# let's get the output, as a string
print doc.toprettyxml()

# you're supposed to get the following output:
#<?xml version="1.0" ?>
#<teabag>
#    <spam name="value"/>
#    <eggs>
#        spam and eggs!
#    </eggs>
#</teabag>

How nice is that ? Yep, a lot.

Ant Skeleton Build.xml

Here is a quick Ant skeleton build file, based on the Ant manual.

  • Install and configure Ant (you must set a few environment variables)
  • Save the following file as ‘build.xml’ in your base directory (noted as ‘.’).
  • Replace PROJECTNAME with what you want and MAINCLASSNAME with the class that must be launched in your resulting jar (the class with a static void main(String[]) method).
  • Put your source files in ./src, and any external libraries (jar files) in ./lib. Ant will then compile the classes in ./build/classes and store them as a jar file in ./build/jar.
  • Run Ant in the base directory with the command ‘ant’ or ‘ant main’, it will execute the directives in build.xml (it will clean, compile, package and run your program).
<project name="PROJECTNAME" basedir="." default="main">

  <!-- set global properties for this build -->
  <property name="main-class"  value="MAINCLASSNAME"/>
  <property name="src.dir"     value="src"/>
  <property name="build.dir"   value="build"/>
  <property name="classes.dir" value="${build.dir}/classes"/>
  <property name="jar.dir"     value="${build.dir}/jar"/>
  <property name="lib.dir"     value="lib"/>

  <!-- adds every jar in the lib directory to the classpath-->
  <path id="classpath">
  <fileset dir="${lib.dir}" includes="**/*.jar"/>
  </path>

  <target name="clean">
    <delete dir="${build.dir}"/>
  </target>

  <target name="compile">
    <mkdir dir="${classes.dir}"/>
    <javac srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath"/>
  </target>

  <target name="jar" depends="compile">
    <mkdir dir="${jar.dir}"/>
    <jar destfile="${jar.dir}/${ant.project.name}.jar" basedir="${classes.dir}">
      <manifest>
        <attribute name="Main-Class" value="${main-class}"/>
      </manifest>
    </jar>
  </target>

  <target name="run" depends="jar">
    <java fork="true" classname="${main-class}">
      <classpath>
        <path refid="classpath"/>
        <path location="${jar.dir}/${ant.project.name}.jar"/>
      </classpath>
    </java>
  </target>

  <target name="clean-build" depends="clean,jar"/>

  <target name="main" depends="clean,run"/>
</project>