Automated Unpacking, Dynamic Binary Instrumentation and You

Howdy fellas,

I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).

Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.

This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.

So let’s take the Linux ls utility as a test binary:

reynaudd@lhs-2:~/test/packed$ cp /bin/ls .

Let’s generate a list of memory references with a pintool:

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48

Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head itrace.out
0x7f313aa1da60
0x7f313aa1da63
0x7f313aa1e070
0x7f313aa1e071
0x7f313aa1e074
0x7f313aa1e076
0x7f313aa1e078
0x7f313aa1e07b
0x7f313aa1e07c
0x7f313aa1e080

Now let’s pack ls with upx and see how it goes:

reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
 101992 ->     43612   42.76%  linux/ElfAMD   ls

Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
 198746 pinatrace.out.ls.normal
 646742 pinatrace.out.ls.packed
 845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
 523197 itrace.out.ls.normal
 2673802 itrace.out.ls.packed
 3196999 total

As you can see, PIN works surprisingly well on a packed executable (upx is quite analysis-friendly though). The stats confirm the intuition: the packed executable executes more instructions than the normal one (2673802 vs 523197 lines in the hit trace), and it makes more memory reads and writes (646742 vs 198746 lines in the memory trace). Now let’s roll some highly unoptimised Python code to match the memory writes and the hit trace:

#!/usr/bin/env python3

"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""

import sys
import getopt

def parse(pinatrace):
    """Return the list of addresses written to, read from a pinatrace.out file."""
    writes = []
    with open(pinatrace, 'r') as f:
        for line in f:
            tokens = line.split()
            # expected line format: "<ip>: W <address>"
            if len(tokens) >= 3 and tokens[1] == "W": # indicates a memory write
                writes.append(int(tokens[-1], 16))
    return writes

def match(writes, itrace):
    """Return the first executed address found in writes, or None."""
    written = set(writes) # constant-time lookups instead of scanning a list
    with open(itrace, 'r') as f:
        for line in f:
            line = line.strip()
            if line.startswith("0x"): # skips the trailing "#eof" marker
                eip = int(line, 16)
                if eip in written: # this eip has been written to at some
                                   # point, we guess this is the oep
                    return eip
    return None

def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.GetoptError as msg:
        print(msg)
        print("for help use --help")
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print(__doc__)
            sys.exit(0)
    # process arguments

    if len(args) != 2:
        print(__doc__)
        print("for help use --help")
        sys.exit(2)
    print("parsing", args[0])
    writes = parse(args[0])
    print("done, parsed", len(writes), "memory writes")
    print("looking for hits in", args[1])
    hit = match(writes, args[1])
    if hit is None:
        print("no hits found, the binary doesn't look packed")
    else:
        print("Candidate OEP: 0x%X" % hit)

if __name__ == "__main__":
    main()

Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):

reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000

reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes.
sorting the writes list
done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed
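
One thing the script glosses over is ordering: it checks whether an executed address was written at any point, not whether the write came first, and the two traces come from separate runs. Here is a minimal sketch of the order-aware check, assuming a hypothetical merged trace where every event is a ("W", address) or ("X", address) pair in program order. Neither of the stock pintools used here produces that format, and the function name is mine.

```python
def first_written_then_executed(events):
    """Return the first executed address that was written earlier, or None.

    `events` is a hypothetical merged trace: an iterable of (kind, address)
    pairs in program order, where kind is "W" (write) or "X" (execute).
    Only the start address of each write is tracked, as in the script above.
    """
    written = set()
    for kind, address in events:
        if kind == "W":
            written.add(address)
        elif kind == "X" and address in written:
            return address
    return None


# Toy trace: a stub at 0x1000 writes to 0x2000, then jumps there.
trace = [("X", 0x1000), ("W", 0x2000), ("X", 0x1001), ("X", 0x2000)]
print(hex(first_written_then_executed(trace)))  # prints 0x2000
```

An address that is executed first and only overwritten later is not reported, which is the difference with the order-blind matching.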

That’s all folks!


A Quick Survey on Automatic Unpacking Techniques

This is a non-comprehensive list of papers and tools dealing with automated unpacking. Please let me know if I’ve missed another technique or if I misunderstood any of the techniques below.

Ring0/Ring3 components, using manual unpacking and heuristics

OllyBonE:

OllyBonE (Break on Execution) uses a Windows driver to prevent memory pages from being executed, and an OllyDbg plugin communicating with the driver. As such it is not an automatic unpacker and requires manual tagging of the pages in which the unpacked code is expected to be found.

Technology used: Windows driver to prevent memory page execution, debugger plugin

Handles unknown packers: no.

Drawbacks: requires a priori knowledge of the memory location of the unpacked code, vulnerable to anti-debugging techniques, modification of the integrity of the host operating system due to the driver.

Code Available: yes, http://www.joestewart.org/ollybone/.

Original Site

(Updated) Dream of Every Reverser / Generic Unpacker:

It is a Windows driver used to hook ring 3 memory accesses. It is used in a project called Generic Unpacker by the same author to find the original entrypoint. The tool then tries to find all import references, dumps the file and fixes the imports. It is reported to work against UPX, FSG and AsPack, but not against more complex packers.

Technology used: Windows driver to hook userland memory access

Handles unknown packers: no.

Drawbacks: requires a priori knowledge of the memory location of the unpacked code, modification of the integrity of the host operating system due to the driver.

Code Available: yes, http://deroko.phearless.org/GenericUnpacker.rar.

Original Site

(updated) RL!Depacker:

No description for this one, however it looks similar to Dream of Every Reverser / Generic Unpacker.

Code Available: yes, http://ap0x.jezgra.net/RL!dePacker.rar.

Original Site

(updated) QuickUnpack:

Again, no real description, but it looks similar to RL!Depacker and DOER / Generic Unpacker. It is a scriptable engine using a debugging API. It is reported to work against 60+ simple packers.

Code Available: yes, http://www.team-x.ru/guru-exe/?path=Tools/Unpackers/QuickUnpack/

Original Site (in Russian)

Universal PE Unpacker:

This is an IDA Pro plugin, using the IDA Pro Debugger interface. It waits for the packer to call GetProcAddress and then activates single-stepping mode until EIP is in a predefined range (an estimate for the OEP). It only works well against UPX, Morphine, Aspack, FSG and MEW (according to the authors of Renovo).

Technology used: Debugging and heuristics.

Handles unknown packers: no, needs an approximation of the OEP and assumes that the unpacker will call GetProcAddress before calling the original code.

Drawbacks: not fully automatic, very vulnerable to debugger detection, does not necessarily work against all packers or self-modifying code.

Code Available: yes, since IDA Pro 4.9

Original Site
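
To make the heuristic concrete, here is a minimal sketch of it as I understand it from the description above. It does not use the IDA Pro debugger API: the trace format (a list of (eip, api_name) steps) and the function name are assumptions of mine, for illustration only.

```python
def find_oep_candidate(steps, oep_range):
    """Return the first EIP inside oep_range seen after a GetProcAddress call.

    `steps` is a hypothetical execution trace: an iterable of
    (eip, api_name) pairs, where api_name is None unless the step
    is a call to a named API. `oep_range` is a (low, high) pair,
    high excluded, i.e. the user's estimate for the OEP.
    """
    low, high = oep_range
    armed = False  # becomes True once the packer calls GetProcAddress
    for eip, api_name in steps:
        if not armed:
            armed = api_name == "GetProcAddress"
            continue
        if low <= eip < high:
            return eip
    return None


steps = [
    (0x405000, None),
    (0x405010, "GetProcAddress"),
    (0x405015, None),
    (0x401000, None),  # first step inside the estimated range
]
print(hex(find_oep_candidate(steps, (0x401000, 0x402000))))  # prints 0x401000
```

The sketch shows why the approach is not generic: without the range estimate, or with a packer that never calls GetProcAddress, nothing is found.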

Instruction-level analysis, comparison between written addresses and executed addresses

Renovo:

Built on TEMU (BitBlaze), it uses full system emulation to record memory writes (and mark those memory locations as dirty). Each time a new basic block is executed, if it contains a dirty memory location, a hidden layer has been found. Cost: 8 times slower than normal execution. It seems to unpack everything correctly except Armadillo and Obsidium (due to incorrect system emulation?). It seems to only obtain partial results against Themida with the VM option on.

Technology used: Full system emulation.

Handles unknown packers: yes.

Drawbacks: order of magnitude slowdown, detection of the emulation stage

Code Available: I couldn’t find it.

Original Site, Local Copy
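
A minimal sketch of the dirty-memory idea, assuming a hypothetical event stream of ("W", start, size) writes and ("B", start, size) basic-block executions. This is not TEMU's interface, and clearing the dirty set after each hit is my reading of how successive layers would be told apart.

```python
def hidden_layers(events):
    """Return the start address of each basic block that revealed a new layer.

    `events` is a hypothetical trace of (kind, start, size) triples in
    program order: "W" marks bytes as dirty, "B" is an executed basic block.
    """
    dirty = set()
    layers = []
    for kind, start, size in events:
        span = range(start, start + size)
        if kind == "W":
            dirty.update(span)
        elif kind == "B" and not dirty.isdisjoint(span):
            layers.append(start)
            dirty.clear()  # start tracking the next layer from scratch
    return layers


events = [
    ("B", 0x1000, 8),   # packer stub, nothing dirty yet
    ("W", 0x2000, 16),  # stub writes the first layer
    ("B", 0x2004, 4),   # executes dirty bytes: layer 1
    ("W", 0x3000, 4),   # layer 1 writes layer 2
    ("B", 0x2000, 4),   # no longer dirty after the reset
    ("B", 0x3000, 2),   # executes dirty bytes: layer 2
]
print([hex(a) for a in hidden_layers(events)])  # prints ['0x2004', '0x3000']
```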

Azure:

Paul Royal’s solution, named after BluePill because it is based on KVM, a Linux-based hypervisor. It uses Intel’s VT extension to trace the target process at the instruction level, by setting the trap flag and intercepting the resulting exception. The memory writes are then recorded and compared to the address of the current instruction. According to the paper, it handles every packer correctly (including Armadillo, Obsidium and Themida VM).

Technology used: Hardware assisted virtualization and virtual machine introspection.

Handles unknown packers: yes.

Drawbacks: detection of the hypervisor. Slowdown?

Code Available: yes, http://blackhat.com/presentations/bh-usa-08/Royal/Royal_Extras.zip.

Original Site, Local Copy

Saffron:

Developed by Danny Quist and Valsmith, a first version uses Intel PIN to dynamically instrument the analyzed code. It actually inserts instructions in the code flow, allowing lightweight fine-grained control (no need for emulation or virtualization), but it modifies the integrity of the packer. A second version modifies the page fault handler of Windows and traps when a written memory page is executed. It has mixed results with Molebox, Themida, Obsidium, and doesn’t handle Armadillo correctly (according to Paul Royal).

Technology used: Dynamic instrumentation, Pagefault handling (with a kernel component in the host operating system).

Handles unknown packers: yes.

Drawbacks: modifies the integrity of the code (with DI) and of the host operating system. It probably does not work in a virtual machine. The dynamic instrumentation is very slow. The memory monitoring of the pagefault handler is coarse-grained (pages are aligned on a 4k boundary), and therefore some memory accesses can go unnoticed.

Code Available: dynamic instrumentation available, what about the driver?

Original Site, Local Copy

(updated) OmniUnpack:

Uses a technique similar to the second version of Saffron: a Windows driver to enforce a W^X policy on memory pages.

Technology used: Pagefault handling and system call tracing (with a kernel component in the host operating system)

Handles unknown packers: yes.

Drawbacks: modifies the integrity of the host operating system. It probably does not work in a virtual machine. The memory monitoring of the pagefault handler is coarse-grained, leading to spurious unpacking stages.

Code Available: ?

Original Site, Local Copy
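
To see why page-level monitoring (as in OmniUnpack and the second version of Saffron) is coarse-grained, here is a minimal sketch that runs the same written-then-executed check at byte and at page granularity. The event format and function name are hypothetical, and nothing here touches a real pagefault handler.

```python
PAGE_SIZE = 4096  # 4k pages


def written_then_executed(events, granularity=1):
    """Return executed addresses whose unit (byte or page) was written earlier.

    `events` is a hypothetical trace of (kind, address) pairs in program
    order, where kind is "W" (write) or "X" (execute). `granularity` is the
    size of the tracked unit: 1 for bytes, PAGE_SIZE for pages.
    """
    written = set()
    hits = []
    for kind, address in events:
        unit = address // granularity
        if kind == "W":
            written.add(unit)
        elif kind == "X" and unit in written:
            hits.append(address)
    return hits


# The program updates a variable at 0x401F00, then runs code at 0x401000.
events = [("W", 0x401F00), ("X", 0x401000)]
byte_hits = written_then_executed(events, granularity=1)
page_hits = written_then_executed(events, granularity=PAGE_SIZE)
print(byte_hits, [hex(a) for a in page_hits])  # prints [] ['0x401000']
```

Both addresses share page 0x401, so the page-level check reports an unpacking stage even though the executed bytes were never modified.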

Pandora’s Bochs:

Developed by Lutz Böhne, it is based on Bochs, which is used to monitor memory writes and compare them with branch targets. Interestingly, the assumptions about the program are stated explicitly (which is a GOOD thing): the unpacking does not involve multiple processes, it does not happen in kernel mode, the unpacked code is reached through a branch instruction (not a fall-through edge), etc. Another interesting point in this approach is that it uses no component in the guest OS (as opposed to Renovo for example): all the information is retrieved from outside the matrix (as with Azure).

Technology used: Full system emulation based on Bochs.

Handles unknown packers: yes.

Drawbacks: as stated in the paper, the limitations are speed and compatibility (not all packed samples seemed to run under Bochs). Detection of the OEP and reconstruction of imports also failed in some cases.

Code Available: http://damogran.de/blog/archives/21-To-release,-or-not-to-release-….html

Original Site, Local Copy

Other techniques (comparison with static disassembly or disk image)

Secure and Advanced Unpacking by Sebastien Josse:

The idea developed by Sebastien Josse is to use full system emulation (based on QEMU?) and to compare the basic blocks that are going to be executed by the virtual CPU with the equivalent address in the file image of the executable. If the memory and the disk version differ, it means that the code has been generated on the fly and therefore a hidden layer has been found. Josse then proposes techniques to rebuild a fully functional executable based on the memory dump. This technique seems to work well (but sometimes requires human intervention) against several packers, including Armadillo, ASProtect, PEtite, UPX, yC…

Technology used: Full system emulation, comparison between memory images and disk images.

Handles unknown packers: yes, manual intervention might be required in some cases.

Drawbacks: slowdown due to the full system emulation, full reconstruction of the unpacked program is not always possible.

Code Available: ?

Original Site
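
A minimal sketch of the memory-versus-disk comparison, under a big simplifying assumption: the file is mapped flat, so a memory offset equals a file offset. A real PE loader maps sections at different offsets, so an actual tool needs an address translation step that is left out here. The function name is mine.

```python
def block_differs_from_disk(memory, file_image, offset, size):
    """Return True if the block about to run differs from the on-disk bytes.

    `memory` and `file_image` are bytes-like objects. Assumption: flat
    mapping, i.e. memory offset == file offset (not true for real PE files).
    """
    return memory[offset:offset + size] != file_image[offset:offset + size]


# Toy image: 8 bytes of stub code followed by 8 zero bytes.
file_image = bytes([0x90] * 8 + [0x00] * 8)
memory = bytearray(file_image)
memory[8:12] = b"\x31\xc0\xc3\x90"  # the stub writes fresh code at offset 8

print(block_differs_from_disk(memory, file_image, 0, 8))  # prints False
print(block_differs_from_disk(memory, file_image, 8, 4))  # prints True
```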

PolyUnpack:

The idea behind PolyUnpack is to address the fundamental nature of unpacking, which is runtime code generation. To identify code that has been generated at runtime, PolyUnpack uses a conceptually elegant technique: it first statically analyses the program to build a map of statically accessible code, and then traces the execution of the program. The dynamically intercepted instructions are compared with the static disassembly; if they do not appear in the static disassembly, then they have been generated at runtime.

Technology used: comparison between static disassembly and dynamic tracing. The dynamic trace is extracted with single-step debugging APIs.

Handles unknown packers: yes.

Drawbacks: vulnerable to debugger detection. Note that this is a limitation of the implementation, not of the concept.

Code Available: http://polyunpack.cc.gt.atl.ga.us/polyunpack.zip (updated 26/06/2009)

Original Site, Local Copy
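
A minimal sketch of the static-versus-dynamic comparison. The static view is modelled as a plain dict from address to instruction bytes and the trace as (address, bytes) pairs. Both formats and the function name are assumptions of mine: the real tool works from an actual disassembly and a single-step debugger.

```python
def first_runtime_generated(static_code, executed):
    """Return the address of the first executed instruction missing from
    the static view, or None if the whole trace is statically known.

    `static_code` maps address -> instruction bytes (hypothetical result of
    the static analysis). `executed` is an iterable of (address, bytes)
    pairs in execution order (hypothetical single-step trace).
    """
    for address, raw in executed:
        if static_code.get(address) != raw:
            return address
    return None


static_code = {0x1000: b"\x60", 0x1001: b"\xe9"}
executed = [
    (0x1000, b"\x60"),
    (0x1001, b"\xe9"),
    (0x2000, b"\x55"),  # not in the static view: generated at runtime
]
print(hex(first_runtime_generated(static_code, executed)))  # prints 0x2000
```

Comparing the bytes as well as the address means code overwritten in place is also flagged, not just code appearing at a new address.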