The Halting Problem for Reverse Engineers

I often have the feeling that technically savvy people don’t have a very high opinion of academia, and this is particularly true of security people. They have to deal with low-level details such as hardware architecture, operating systems internals (more or less documented), proprietary protocols and data structures, all of which require very specialized knowledge. Logic, theorems and algorithms don’t have a predominant place in that picture.

For the past few years I have been working on these subjects, and found some fundamental theorems to be actually *useful* in understanding the security properties of computer architectures. One of them is, of course, the undecidability of the halting problem. In essence, it says that we can not know if the computation of a program on some input will ever terminate. So what? Who cares if a program terminates, what we want is to find vulnerabilities and unpack malware samples, right?

The importance of the undecidability of the halting problem lies in its generality. In particular we can see Rice’s theorem as a generalization of this result, and to put it very simply, it says that whatever properties of programs you’re interested in, no program can tell if this property holds for every program (i.e. it is undecidable).

This is very bad news for all of us, since basically everything about programs is undecidable. Say that you are interested in finding functions that do out-of-bounds memory writes (or as I said, any other property), Rice’s theorem says that there is no program that will give you a correct answer all the time. You must accept that your program sometimes fails or infinitely loops.

I want to emphasize how bad this is. Do not let the terminology used in the theorems confuse you. In particular, the notions of input, output, and function computed by a program do not map nicely to binaries. An output is anything that gets modified by your program — any register or memory location, as soon as it is touched by an instruction, is an output. And basically, everything about outputs is undecidable. As a consequence, simple tasks such as disassembling are undecidable.

For instance, take this seemingly innocent indirect jump:

jmp [eax]

If eax is an output of instructions before it, no luck, its value is undecidable. You can run the program, write the value down, and assume it will not change, but you have no guarantee that it will not change at a given date. Undecidable. You could argue that eax can only take a finite number of values, and hence disassembling is still possible, just very intractable. But that would be without counting on self-modifying code. SMC (think packers) is the scourge of disassemblers because it gives the ability to transfer control to an output. Since I can’t decide the value of the output, I can’t disassemble.

To sum things up, here are a few direct consequences of the undecidability of the halting problem:

  1. you can’t decide the target of indirect jumps, reads and writes
  2. you can not decide if a particular memory address is code, data, or both
  3. you can’t decide values written in memory
  4. you can’t decide the numbers of occurrences of loops
  5. you can’t decide if control flow can reach a given instruction
  6. whatever you see in a given run can change arbitrarily in another run
  7. disassembling is undecidable
  8. unpacking is undecidable

I will leave how all this led to the bad habit of relying on “heuristics” to a further post. Stay classy!

Advertisement

The Sad State of Computer Security

Current computer systems have very serious, fundamental problems. And these problems can be technically exploited by attackers to run arbitrary code on our machines. So I guess we can all agree that the situation right now is somewhere between very bad and super bad. What is even more surprising is our reaction to this situation: we focus on preventing the technical exploitations of the flaws, and we leave the underlying problems intact. This leads to absurd “solutions” such as ASLR, which make it harder to develop reliable exploits and leave the initial problem (memory corruption) untouched. In a way, it’s like adding a bulletproof jacket to a dead body: sure the jacket might stop some bullets, but the body was dead in the first place. There are some really smart guys out there in the industry, and they discuss how to make really sophisticated bulletproof vests and what their weaknesses are, but nobody seems to notice that the body they’re trying to protect is dead anyway.

The fundamental problems that I mentioned earlier are that we don’t know how to produce programs without vulnerabilities, and we don’t know how to analyze compiled programs.

  • we don’t know how to produce programs without vulnerabilities: or rather, we know how to avoid large classes of vulnerabilities such as memory corruption bugs with type-safe languages but we prefer sticking with unsafe languages such as C. “Hey that’s crazy, why is that?”, you wonder. The only reason I can see is performance: right now nobody rivals with the speed and memory usage of C/C++ programs. In my opinion, it’s just a matter of time before new languages beat the performance of C (yes, you can go faster-than-C). And hopefully, we’ll forget about memory corruption bugs altogether. Note that it will still be a long way from producing *correct* programs, but that will already be an important milestone.
  • we don’t know how to analyze compiled programs, or rather we know that compiled programs are impossible to analyze. I already explained that bit in a post about malware analysis. The good side is that if we wanted, we could make programs statically analyzable (security-wise). I’m talking about binaries here, and this is no science fiction: just check out Google NaCl and the permission system on Android devices. We just have to decide that we want this kind of binaries to become standard desktop executables, and that would partly free us from the dependance to AV signatures updates.

Deepsec slides and tool releases

Deepest apologies to Edward Hopper

I’m writing this from deepsec, where I just finished my talk [slides] about dynamic instrumentation. It was a wrap up of what I did last year with Pin (malware analysis, unpacking) and a Javascript deobfuscator I didn’t blog about.

Some of you might be happy to know that the tools are now available at Google code:

  • Crême Brûlée: Javascript deobfuscation using dynamic instrumentation
  • Tarte Tatin Tools: my set of pintools for tracing and unpacking (including an IDA Python script)

As you might guess, the tools are more prototypes than anything else, and I advise you to use them only if you feel really adventurous.

Puppet Master… or how to poison Red Pill

Today we’ve been working with Vincent Mussot on a way to prevent VMM detection with red pill. We implemented it as a simple pintool:

// see the PIN manual for compilation instructions
#include <iostream>
#include <string>
#include "pin.H"

unsigned long int fake_idt = 0;

VOID poison(unsigned int* m) {
    *(unsigned int*)((*m)+2) = fake_idt;
}

VOID Instruction(INS ins, VOID *v) {
    string dis = INS_Disassemble(ins);
    if (dis.substr(0,4)=="sidt") {       
        unsigned int rpill = INS_Address(ins);
        unsigned int* m = (unsigned int *)(((char *)rpill)+3);
        INS_InsertCall(ins, IPOINT_AFTER, (AFUNPTR)poison, IARG_PTR, m, IARG_END);
    }   
}

// omitted for brevity
// parses the command line and sets fakeidt
int parseArguments(int argc, char **argv);

int main(int argc, char * argv[]) {
    if(parseArguments(argc, argv)) {
        printf("Usage: pin -f <toolname> <fake_idt> -- <binary> [arguments]\n");
        exit(1);
    }
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();   
    return 0;
}

Definitely not easy to read, but we followed the red pill coding standards ;) It waits for red pill to execute the SIDT instruction and then overwrites the memory location containing the value of the interrupt descriptor table register (IDTR) with an arbitrary value, given on the command line.

> redpill.exe
idt base: 0xff03f400
Inside Matrix!

> pin -t puppetmaster.dll 0x2501 -- redpill.exe
idt base: 0x2501
Not in Matrix.

Automated Unpacking, Dynamic Binary Instrumentation and You

Howdy fellas,

I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).

Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.

This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.

So let’s take the Linux ls utility as a test binary:

reynaudd@lhs-2:~/test/packed$ cp /bin/ls .

Let’s generate a list of memory references with a pintool:

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48

Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head itrace.out
0x7f313aa1da60
0x7f313aa1da63
0x7f313aa1e070
0x7f313aa1e071
0x7f313aa1e074
0x7f313aa1e076
0x7f313aa1e078
0x7f313aa1e07b
0x7f313aa1e07c
0x7f313aa1e080

Now let’s pack ls with upx and see how it goes:

reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
ls  1/5  [.......................................................]  100.0%
ls  1/5  [******.................................................]   33.8%
ls  1/5  [************...........................................]   42.6%
ls  1/5  [*****************......................................]   45.4%
ls  1/5  [***********************................................]   43.3%
ls  1/5  [*****************************..........................]   45.5%
ls  1/5  [**********************************.....................]   45.8%
ls  1/5  [****************************************...............]   45.5%
ls  1/5  [*********************************************..........]   44.2%
ls  1/5  [***************************************************....]   41.5%
ls  1/5  [*******************************************************]   41.2%
ls  2/5  [.......................................................]  100.0%
ls  2/5  [******.................................................]   36.0%
ls  2/5  [************...........................................]   45.1%
ls  2/5  [*****************......................................]   49.1%
ls  2/5  [***********************................................]   47.4%
ls  2/5  [*****************************..........................]   49.1%
ls  2/5  [**********************************.....................]   49.5%
ls  2/5  [****************************************...............]   49.2%
ls  2/5  [*********************************************..........]   47.5%
ls  2/5  [***************************************************....]   44.4%
ls  2/5  [*******************************************************]   43.8%
ls  3/5  [.......................................................]  100.0%
ls  3/5  [*******************************************************]   26.2%
ls  4/5  [.......................................................]  100.0%
ls  5/5  [.......................................................]  100.0%
ls  5/5  [*******************************************************]   27.7%
 101992 ->     43612   42.76%  linux/ElfAMD   ls

Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
 198746 pinatrace.out.ls.normal
 646742 pinatrace.out.ls.packed
 845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
 523197 itrace.out.ls.normal
 2673802 itrace.out.ls.packed
 3196999 total

As you can see, PIN works surprisingly well on a packed executable (upx is quite analysis friendly though). The stats are here to confirm the intuition: there are more instructions executed in the packed executable than in the normal executable, and there are more memory reads and writes. Now let’s roll some highly unoptimised python code to match the memory writes and the hit trace:

#!/usr/bin/python

"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""

import sys
import getopt

def parse(pinatrace):
    f = open(pinatrace, 'r')
    writes = []
    for line in f:
        if "W" in line: # indicates a memory write
            tokens = line.split()
            writes.append(eval(tokens[len(tokens)-1]))
    f.close()
    return writes

def match(writes, itrace):
    f = open(itrace, 'r')
    for line in f:
        if "0x" in line:
            eip = eval(line)
            if eip in writes: # this eip has previously been written
                                  # to, we guess this is the oep
                f.close()
                return eip
    return None

def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.error, msg:
        print msg
        print "for help use --help"
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print __doc__
            sys.exit(0)
    # process arguments

    if len(args) != 2:
        print __doc__
        print "for help use --help"
        sys.exit(2)
    print "parsing", args[0]
    writes = parse(args[0])
    print "done, parsed", len(writes), "memory writes"
    print "looking for hits in", args[1]
    hit = match(writes, args[1])
    if hit == None:
        print "no hits found, the binary doesn't look packed"
    else:
        print "Candidate OEP: 0x%X" % hit

if __name__ == "__main__":
    main()

Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):

reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000

reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes.
sorting the writes list
done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed

That’s all folks!

A First Look at Google Native Client’s Inner Sandbox

Google has recently announced a new project to allow web applications to be programmed in x86. Yep, seriously.

Doing so securely is quite a challenge. x86 machine code has proved rather malware-friendly because it is really permissive and mixing up code and data complicates seriously static analysis. Since Google’s objective is to speed up web apps, they’re not going to use emulation to ensure that the code does not perform unauthorised calls, they rather chose to enforce some constraints on the code with a static verification algorithm. It makes sense, but doing so correctly is tricky.

The Inner Sandbox

The key component is called the inner sandbox, where code can’t execute sensitive operations such as system calls. To ensure statically that the code will stay in the inner sandbox, the following constraints are checked:

  • C1. Once loaded into the memory, the binary is not writable,
    enforced by OS-level protection mechanisms during execution.
  • C2. The binary is statically linked at a start address of zero,
    with the first byte of text at 64K.
  • C3. All indirect control transfers use a nacljmp pseudoinstruction
  • C4. The binary is padded up to the nearest page with at least
    one hlt instruction (0xf4).
  • C5. The binary contains no instructions or pseudo-instructions
    overlapping a 32-byte boundary.
  • C6. All valid instruction addresses are reachable by a fallthrough
    disassembly that starts at the load (base) address.
  • C7. All direct control transfers target valid instructions.

C1 means no self-modifying code, and relies on the segment permissions (as you can see in the screenshot, the .text section is not writable). segments1

To sum things up, the purpose of these constraints is to ensure the following properties:

  • everything in the code segment can be reliably disassembled (so that illegal instructions can be detected statically)
  • if you jump somewhere, it must be in something that has already been disassembled

These properties, if they are enforced correctly, ensure that you can’t execute arbitrary code. Now the problem is that enforcing these properties correctly is quite hard. For example, a typical static analysis problem is: does a function return to the caller or somewhere else ? It is hard to ensure that the return address has not been modified dynamically on the stack. To bypass this problem, NaCl does not allow the return instruction, but rather uses…

The nacljmp pseudo-instruction

It consists in a mask and a jump:

and %eax, 0xffffffe0
jmp *(%eax)

It puzzled me for some time. The purpose of this construct is to enable jumping to an unknown location, by ensuring that the target is a 0 mod 32 address. Constraint C5 ensures that every 0 mod 32 address is a valid target (= it is not in the middle of an instruction), and therefore it has already been disassembled. So we don’t know statically where we’re jumping, but we know there’s no illegal instruction there.

This indirect jumping stuff sounds juicy enough, and it didn’t last long until somebody found an exploit for it.

“The validator was checking for calls through
registers, as in this two-byte call:

 83 e2 e0                and    $0xffffffe0,%edx
 ff d2                   call   %edx

but erroneously permitting similar calls through memory using this
addressing mode:

 83 e2 e0                and    $0xffffffe0,%edx
 ff 12                   call   *(%edx)"

Ironically enough, the example given in the research paper used memory addressing but also stated “we disallow memory addressing modes on indirect jmp and call instructions“.

We can note the position of Google with this project: they are releasing a research paper and the source code of a very early, experimental project to share with the security and research communities, and the developers can also be reached on the discussion group. They encourage people to break the sandbox and credit the author of the first exploit. Ain’t it cool ?

Ruxcon08 Cover Up

Hi folks,

I just come back from Ruxcon08 in Sydney, a really nice security conference with lots of high-level talks, an excellent venue and super-duper ambiance. Thanks to everyone for the positive feedback on my presentation on GPU Powered Malware (slides here).

My personal highlights are:

Silvio Cesare, Security Applications for Emulation. A really good talk dealing in part with automated unpacking, tracing, dynamic taint analysis and symbolic execution. All of which are the next step in malware analysis (me thinks). Silvio also produced a quite amazing memory bug checker for the Linux kernel.

– Nishad Herath, Now you see it, now you don’t! – Obfuscation ’08 style… He introduced the new trends that we see in obfuscation (not malware-specific) like code fragmentation and instruction virtualisation. Too bad he was (intendedly ?) light on details on the use of abstract interpretation and trace analyses. Me wants more !

Though not related with what I do, I also enjoyed Browser Rider by Ben Mossé (you really have to see a Javascript tunnel between a rogue server and a browser in action to get the full flavour) and Intelligent Web Fuzzing by Mr and Mrs Archibald. Instrumenting PHP, MySQL and Bash with all that hooks looks definitely hairy ! Oh yeah and a special mention for Netscreen of the Dead by Graeme Neilson – this guy is getting Zombie-creepiness and gore to the next level.

Well see you next year I guess ;)

Links:

World of Warcraft – Scan.dll is not a Virus

Howdy,

As I logged on today on WoW, my antivirus raised a virus alert for the file Scan.dll in WoW’s installation folder. I am using Avast 4.8 Home Edition (Virus Database 081028-0 dating from today). It is probably a false positive, but it is also possible that my WoW install has been infected with something (since some malware specifically targets WoW accounts). Since I don’t want to lose all my precious gold, I preferred checking a few things before logging in.

Here is a VirusTotal scan of the suspicious file Scan.dll: http://www.virustotal.com/fr/analisis/2b23aa39412bdfffdc8e8f34215de118. Here is what we can learn from this:

  • only 3/36 antiviruses flag scan.dll as suspicious (which is low but still more than one)
  • all raise generic alerts such as “suspicious file” and “Win32:Trojan-gen {Other}”. In plain English, this means “I have no idea what this file does, however it is protected against analysis and I am better off raising a scary alert”.
  • according to multiple tools the file is packed with UPX.

Hopefully, UPX is a really weak protection and the official release can unpack the file for us. So, all we have to do is download the latest UPX release, install it and unpack Scan.dll:

D:\test\upx303w>upx -d Scan.dll -o UnpackedScan.dll
                       Ultimate Packer for eXecutables
                     Copyright (C) 1996 - 2008UPX 3.03w
        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
        90372 <- 39684    43.91%    win32/pe     UnpackedScan.dll

Unpacked 1 file.

Now that we have an unpacked Scan.dll, let’s check again what Virus Total has to tell us: http://www.virustotal.com/fr/analisis/fdfe09829e1dd865144d6f80313209c4

This time, we have 0/36 detection rate, and the file does not seem to include additional layers of protection (which means it can now be fully analysed by antivirus engines).

Conclusion: it was the UPX protection layer that caused the new Avast and eSafe alert. Therefore this alert is a false positive, Scan.dll is not infected and you can go back to murloc farming. What is strange however is that UPX is a really common packer, I’m wondering exactly what part of it is confusing Avast’s heuristics.

Ok I didn’t understand anything above, what should I do/know ?

  • at the time of writing, Scan.dll is not a virus
  • the alert only comes from Avast and related antiviruses
  • when Avast asks you what to do, choose Nothing. On the next virus database update, the alert will probably be removed.
  • if you chose Quarantine, try to restore the file since WoW will probably complain that you messed with one of its DLLs.

Here is the post dealing with this issue on the official forums.

A Quick Survey on Automatic Unpacking Techniques

This is a non-comprehensive list of papers and tools dealing with automated unpacking. Please let me know if I’ve missed another technique or if I misunderstood any of the techniques below.

Ring0/Ring3 components, using manual unpacking and heuristics

OllyBonE:

OllyBonE (Break on Execution) uses a Windows driver to prevent memory pages from being executed, and an OllyDbg plugin communicating with the driver. As such it is not an automatic unpacker and requires manual tagging of the pages in which the unpacked code is expected to be found.

Technology used: Windows driver to prevent memory page execution, debugger plugin

Handles unknown packers: no.

Drawbacks: requires a priori knowledge of the memory location of the unpacked code, vulnerable to anti-debugging techniques, modification of the integrity of the host operating system due to the driver.

Code Available: yes, http://www.joestewart.org/ollybone/.

Original Site

(Updated) Dream of Every Reverser / Generic Unpacker:

It is a Windows driver used to hook ring 3 memory accesses. It is used in a project called Generic Unpacker by the same author to find the original entrypoint. The tool then tries to find all import references, dumps the file and fixes the imports. It is reported to work against UPX, FSG and AsPack, but not against more complex packers.

Technology used: Windows driver to hook userland memory access

Handles unknown packers: no.

Drawbacks: requires a priori knowledge of the memory location of the unpacked code, modification of the integrity of the host operating system due to the driver.

Code Available: yes, http://deroko.phearless.org/GenericUnpacker.rar.

Original Site

(updated) RL!Depacker

No description for this one, however it looks similar to Dream of Every Reverser / Generic Unpacker.

Code Available: yes,  http://ap0x.jezgra.net/RL!dePacker.rar.

Original Site

(updated) QuickUnpack

Again, no real description, but it looks similar to RL!Depacker and DOER / Generic Unpacker. It is a scriptable engine using a debugging API. It is reported to work against 60+ simple packers.

Code Available: yes, http://www.team-x.ru/guru-exe/?path=Tools/Unpackers/QuickUnpack/

Original Site (in Russian)

Universal PE Unpacker:

This is an IDA Pro plugin, using the IDA Pro Debugger interface. It waits for the packer to call GetProcAddress and then activates single-stepping mode until EIP is in a predefined range (an estimate for the OEP). It only works well against UPX, Morphine, Aspack, FSG and MEW (according to the authors of Renovo).

Technology used: Debugging and heuristics.

Handles unknown packers: no, needs an approximation of the OEP and assumes that the unpacker will call GetProcAddress before calling the original code.

Drawbacks: not fully automatic, very vulnerable to debugger detection, does not necessarily work against all packers or self-modifying code.

Code Available: yes, since IDA Pro 4.9

Original Site

Instruction-level analysis, comparison between written addresses and executed addresses

Renovo:

Built on TEMU (BitBlaze), it uses full system emulation to record memory writes (and mark those memory locations as dirty). Each time a new basic block is executed, if it contains a dirty memory location a hidden layer has been found. Cost: 8 times slower than normal execution. It seems to unpack everything correctly except Armadillon and Obsidium (due to incorrect system emulation ?). It seems to only obtain partial results against Themida with the VM option on.

Technology used: Full system emulation.

Handles unknown packers: yes.

Drawbacks: order of magnitude slowdown, detection of the emulation stage

Code Available: I couldn’t find it.

Original Site, Local Copy

Azure:

Paul Royal’s solution, named after BluePill because it is based on KVM, a Linux-based hypervisor. It uses Intel’s VT extension to trace the target process (at the instruction-level), by setting the trap flag and intercepting the resulting exception. The memory writes are then recorded and compared to the address of the current instruction. According to the paper, it handles every packer correctly (including Armadillo, Obsidium and Themida VM).

Technology used: Hardware assisted virtualization and virtual machine introspection.

Handles unknown packers: yes.

Drawbacks: detection of the hypervisor. Slowdown ?

Code Available: yes, http://blackhat.com/presentations/bh-usa-08/Royal/Royal_Extras.zip.

Original Site, Local Copy

Saffron:

Developed by Danny Quist and Valsmith, a first version uses Intel PIN to dynamically instrument the analyzed code. It actually inserts instructions in the code flow, allowing lightweight fine-grained control (no need for emulation or virtualization), but it modifies the integrity of the packer. A second version modifies the page fault handler of Windows and traps when a written memory page is executed. It has mixed results with Molebox, Themida, Obsidium, and doesn’t handle Armadillo correctly (according to Paul Royal).

Technology used: Dynamic instrumentation, Pagefault handling (with a kernel component in the host operating system).

Handles unknown packers: yes.

Drawbacks: modifies the integrity of the code (with DI) and of the host operating system. It must not work in a virtual machine. The dynamic instrumentation is very slow. The memory monitoring of the pagefault handler is coarse-grained (pages are aligned on a 4k boundary), and therefore some memory access can go unnoticed.

Code Available: dynamic instrumentation available, what about the driver ?

Original Site, Local Copy

(updated) OmniUnpack:

Uses a technique similar to the second version of Saffron: a Windows driver to enforce a W^X policy on memory pages.

Technology used: Pagefault handling  and system call tracing (with a kernel component in the host operating system)

Handles unknown packers: yes.

Drawbacks: modifies the integrity of the host operating system. It must not work in a virtual machine. The memory monitoring of the pagefault handler is coarse-grained, leading to spurious unpacking stages.

Code Available: ?

Original SiteLocal Copy

Pandora’s Bochs:

Developed by Lutz Böhne, it is based on Bochs which is used to monitor memory writes and compare them with branch targets. Interestingly, the assumptions about the program are stated explicitly (which is a GOOD thing) : the unpacking does not involve multiple processes, it does not happen in kernel mode, the unpacked code is reached through a branch instruction (not a fall-through edge), etc… Another interesting point in this approach is that it uses no component in the guest OS (as opposed to Renovo for example), all the information is retrieved from outside the matrix (as with Azure).

Technology used: Full system emulation based on Bochs.

Handles unknown packers: yes.

Drawbacks: As stated in the paper the limitations are speed, compatibility (not all packed samples seemed to run under Bochs), detection of OEP and reconstruction of imports sometimes failed.

Code Available: http://damogran.de/blog/archives/21-To-release,-or-not-to-release-….html

Original Site, Local Copy

Other techniques (comparison with static disassembly or disk image)

Secure and Avanced Unpacking by Sebastien Josse:

The idea developed by Sebastien Josse is to use full system emulation (based on QEMU ?) and to compare the basic blocks that are going to be executed by the virtual CPU with the equivalent address in the file image of the executable. If the memory and the disk version differ, it means that the code has been generated on the fly and therefore a hidden layer has been found. Josse then proposes techniques to rebuild a fully functional executable based on the memory dump. This technique seems to work well (but sometimes requires human intervention) against several packers, including Armadillo, ASProtect, PEtite, UPX, yC…

Technology used:Full system emulation, comparison between memory images and disk images.

Handles unknown packers: yes, manual intervention might be required in some cases.

Drawbacks: slowdown due to the full system emulation, full reconstruction of the unpacked program is not always possible.

Code Available: ?

Original Site

PolyUnpack:

The idea behind PolyUnpack is to address the fundamental nature of unpacking, which is runtime code generation. To identifiy code that has been generated at runtime, PolyUnpack uses a conceptually elegant technique: it first statically analyses the program to build a map of statically accessible code, and then traces the execution of the program. The dynamically intercepted instructions are compared with the static disassembly, if they do not appear in the static disassembly then they have been generated at runtime.

Technology used: comparison between static disassembly and dynamic tracing. The dynamic trace is extracted with single-step debugging APIs.

Handles unknown packers: yes.

Drawbacks: vulnerable to debugger detection. Note that this is a limitation of the implementation, not of the concept.

Code Available: http://polyunpack.cc.gt.atl.ga.us/polyunpack.zip (updated 26/06/2009)

Original Site, Local Copy