The Halting Problem for Reverse Engineers

I often have the feeling that technically savvy people don’t have a very high opinion of academia, and this is particularly true of security people. They have to deal with low-level details such as hardware architecture, operating systems internals (more or less documented), proprietary protocols and data structures, all of which require very specialized knowledge. Logic, theorems and algorithms don’t have a predominant place in that picture.

For the past few years I have been working on these subjects, and found some fundamental theorems to be actually *useful* in understanding the security properties of computer architectures. One of them is, of course, the undecidability of the halting problem. In essence, it says that we can not know if the computation of a program on some input will ever terminate. So what? Who cares if a program terminates, what we want is to find vulnerabilities and unpack malware samples, right?

The importance of the undecidability of the halting problem lies in its generality. In particular we can see Rice’s theorem as a generalization of this result, and to put it very simply, it says that whatever properties of programs you’re interested in, no program can tell if this property holds for every program (i.e. it is undecidable).

This is very bad news for all of us, since basically everything about programs is undecidable. Say that you are interested in finding functions that do out-of-bounds memory writes (or as I said, any other property), Rice’s theorem says that there is no program that will give you a correct answer all the time. You must accept that your program sometimes fails or infinitely loops.

I want to emphasize how bad this is. Do not let the terminology used in the theorems confuse you. In particular, the notions of input, output, and function computed by a program do not map nicely to binaries. An output is anything that gets modified by your program — any register or memory location, as soon as it is touched by an instruction, is an output. And basically, everything about outputs is undecidable. As a consequence, simple tasks such as disassembling are undecidable.

For instance, take this seemingly innocent indirect jump:

jmp [eax]

If eax is an output of instructions before it, no luck, its value is undecidable. You can run the program, write the value down, and assume it will not change, but you have no guarantee that it will not change at a given date. Undecidable. You could argue that eax can only take a finite number of values, and hence disassembling is still possible, just very intractable. But that would be without counting on self-modifying code. SMC (think packers) is the scourge of disassemblers because it gives the ability to transfer control to an output. Since I can’t decide the value of the output, I can’t disassemble.

To sum things up, here are a few direct consequences of the undecidability of the halting problem:

  1. you can’t decide the target of indirect jumps, reads and writes
  2. you can not decide if a particular memory address is code, data, or both
  3. you can’t decide values written in memory
  4. you can’t decide the numbers of occurrences of loops
  5. you can’t decide if control flow can reach a given instruction
  6. whatever you see in a given run can change arbitrarily in another run
  7. disassembling is undecidable
  8. unpacking is undecidable

I will leave how all this led to the bad habit of relying on “heuristics” to a further post. Stay classy!

The Sad State of Computer Security

Current computer systems have very serious, fundamental problems. And these problems can be technically exploited by attackers to run arbitrary code on our machines. So I guess we can all agree that the situation right now is somewhere between very bad and super bad. What is even more surprising is our reaction to this situation: we focus on preventing the technical exploitations of the flaws, and we leave the underlying problems intact. This leads to absurd “solutions” such as ASLR, which make it harder to develop reliable exploits and leave the initial problem (memory corruption) untouched. In a way, it’s like adding a bulletproof jacket to a dead body: sure the jacket might stop some bullets, but the body was dead in the first place. There are some really smart guys out there in the industry, and they discuss how to make really sophisticated bulletproof vests and what their weaknesses are, but nobody seems to notice that the body they’re trying to protect is dead anyway.

The fundamental problems that I mentioned earlier are that we don’t know how to produce programs without vulnerabilities, and we don’t know how to analyze compiled programs.

  • we don’t know how to produce programs without vulnerabilities: or rather, we know how to avoid large classes of vulnerabilities such as memory corruption bugs with type-safe languages but we prefer sticking with unsafe languages such as C. “Hey that’s crazy, why is that?”, you wonder. The only reason I can see is performance: right now nobody rivals with the speed and memory usage of C/C++ programs. In my opinion, it’s just a matter of time before new languages beat the performance of C (yes, you can go faster-than-C). And hopefully, we’ll forget about memory corruption bugs altogether. Note that it will still be a long way from producing *correct* programs, but that will already be an important milestone.
  • we don’t know how to analyze compiled programs, or rather we know that compiled programs are impossible to analyze. I already explained that bit in a post about malware analysis. The good side is that if we wanted, we could make programs statically analyzable (security-wise). I’m talking about binaries here, and this is no science fiction: just check out Google NaCl and the permission system on Android devices. We just have to decide that we want this kind of binaries to become standard desktop executables, and that would partly free us from the dependance to AV signatures updates.

Deepsec slides and tool releases

Deepest apologies to Edward Hopper

I’m writing this from deepsec, where I just finished my talk [slides] about dynamic instrumentation. It was a wrap up of what I did last year with Pin (malware analysis, unpacking) and a Javascript deobfuscator I didn’t blog about.

Some of you might be happy to know that the tools are now available at Google code:

  • Crême Brûlée: Javascript deobfuscation using dynamic instrumentation
  • Tarte Tatin Tools: my set of pintools for tracing and unpacking (including an IDA Python script)

As you might guess, the tools are more prototypes than anything else, and I advise you to use them only if you feel really adventurous.

Puppet Master… or how to poison Red Pill

Today we’ve been working with Vincent Mussot on a way to prevent VMM detection with red pill. We implemented it as a simple pintool:

// see the PIN manual for compilation instructions
#include <iostream>
#include <string>
#include "pin.H"

unsigned long int fake_idt = 0;

VOID poison(unsigned int* m) {
    *(unsigned int*)((*m)+2) = fake_idt;
}

VOID Instruction(INS ins, VOID *v) {
    string dis = INS_Disassemble(ins);
    if (dis.substr(0,4)=="sidt") {       
        unsigned int rpill = INS_Address(ins);
        unsigned int* m = (unsigned int *)(((char *)rpill)+3);
        INS_InsertCall(ins, IPOINT_AFTER, (AFUNPTR)poison, IARG_PTR, m, IARG_END);
    }   
}

// omitted for brevity
// parses the command line and sets fakeidt
int parseArguments(int argc, char **argv);

int main(int argc, char * argv[]) {
    if(parseArguments(argc, argv)) {
        printf("Usage: pin -f <toolname> <fake_idt> -- <binary> [arguments]\n");
        exit(1);
    }
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();   
    return 0;
}

Definitely not easy to read, but we followed the red pill coding standards ;) It waits for red pill to execute the SIDT instruction and then overwrites the memory location containing the value of the interrupt descriptor table register (IDTR) with an arbitrary value, given on the command line.

> redpill.exe
idt base: 0xff03f400
Inside Matrix!

> pin -t puppetmaster.dll 0x2501 -- redpill.exe
idt base: 0x2501
Not in Matrix.

Automated Unpacking, Dynamic Binary Instrumentation and You

Howdy fellas,

I was thinking about Google Native Client and their sandbox model, and all of a sudden I realised that you could achieve the same level of control with dynamic binary instrumentation. This is the kind of moment where you think you have a genius idea, just to realise that lots of other people had it before you (including at least Skape, Danny Quist and Ivanlef0u).

Anyway it sounded fun, so here is my toy experiment: finding the original entrypoint of packed executables in 60 lines of Python.

This project uses PIN for the analysis of the file, and even better: we only use the examples in the user guide. I’ll explain how PIN works another day, if you don’t mind. The point of interest is that I apply the usual technique for generic unpacking (record memory writes, compare that with the executed addresses) but with dynamic instrumentation rather than emulation or technical approaches like page permissions and such. For more info on these approaches, see my outrageously brilliant post here.

So let’s take the Linux ls utility as a test binary:

reynaudd@lhs-2:~/test/packed$ cp /bin/ls .

Let’s generate a list of memory references with a pintool:

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head pinatrace.out
0x7fb878380a63: W 0x7fff80599d98
0x7fb878381070: W 0x7fff80599d90
0x7fb878381074: W 0x7fff80599d88
0x7fb878381076: W 0x7fff80599d80
0x7fb87838107b: W 0x7fff80599d78
0x7fb878381092: R 0x7fb87859bbe0
0x7fb87838109c: R 0x7fb87859bfb8
0x7fb8783810a3: W 0x7fb87859bda8
0x7fb8783810aa: W 0x7fb87859c528
0x7fb8783810b1: R 0x7fb87859be48

Then, let’s generate a hit trace with another pintool (it just prints the address of executed instructions, which is fine for us):

reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls
reynaudd@lhs-2:~/test/packed$ head itrace.out
0x7f313aa1da60
0x7f313aa1da63
0x7f313aa1e070
0x7f313aa1e071
0x7f313aa1e074
0x7f313aa1e076
0x7f313aa1e078
0x7f313aa1e07b
0x7f313aa1e07c
0x7f313aa1e080

Now let’s pack ls with upx and see how it goes:

reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.normal > /dev/null
reynaudd@lhs-2:~/test/packed$ upx ls
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
ls  1/5  [.......................................................]  100.0%
ls  1/5  [******.................................................]   33.8%
ls  1/5  [************...........................................]   42.6%
ls  1/5  [*****************......................................]   45.4%
ls  1/5  [***********************................................]   43.3%
ls  1/5  [*****************************..........................]   45.5%
ls  1/5  [**********************************.....................]   45.8%
ls  1/5  [****************************************...............]   45.5%
ls  1/5  [*********************************************..........]   44.2%
ls  1/5  [***************************************************....]   41.5%
ls  1/5  [*******************************************************]   41.2%
ls  2/5  [.......................................................]  100.0%
ls  2/5  [******.................................................]   36.0%
ls  2/5  [************...........................................]   45.1%
ls  2/5  [*****************......................................]   49.1%
ls  2/5  [***********************................................]   47.4%
ls  2/5  [*****************************..........................]   49.1%
ls  2/5  [**********************************.....................]   49.5%
ls  2/5  [****************************************...............]   49.2%
ls  2/5  [*********************************************..........]   47.5%
ls  2/5  [***************************************************....]   44.4%
ls  2/5  [*******************************************************]   43.8%
ls  3/5  [.......................................................]  100.0%
ls  3/5  [*******************************************************]   26.2%
ls  4/5  [.......................................................]  100.0%
ls  5/5  [.......................................................]  100.0%
ls  5/5  [*******************************************************]   27.7%
 101992 ->     43612   42.76%  linux/ElfAMD   ls

Packed 1 file.
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/pinatrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./ls > /dev/null
reynaudd@lhs-2:~/test/packed$ mv itrace.out itrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ mv pinatrace.out pinatrace.out.ls.packed
reynaudd@lhs-2:~/test/packed$ wc -l pinatrace.out.ls.*
 198746 pinatrace.out.ls.normal
 646742 pinatrace.out.ls.packed
 845488 total
reynaudd@lhs-2:~/test/packed$ wc -l itrace.out.ls.*
 523197 itrace.out.ls.normal
 2673802 itrace.out.ls.packed
 3196999 total

As you can see, PIN works surprisingly well on a packed executable (upx is quite analysis friendly though). The stats are here to confirm the intuition: there are more instructions executed in the packed executable than in the normal executable, and there are more memory reads and writes. Now let’s roll some highly unoptimised python code to match the memory writes and the hit trace:

#!/usr/bin/python

"""Usage: python tracesurfer.py <pinatrace.out file> <itrace.out file>"""

import sys
import getopt

def parse(pinatrace):
    f = open(pinatrace, 'r')
    writes = []
    for line in f:
        if "W" in line: # indicates a memory write
            tokens = line.split()
            writes.append(eval(tokens[len(tokens)-1]))
    f.close()
    return writes

def match(writes, itrace):
    f = open(itrace, 'r')
    for line in f:
        if "0x" in line:
            eip = eval(line)
            if eip in writes: # this eip has previously been written
                                  # to, we guess this is the oep
                f.close()
                return eip
    return None

def main():
    # parse command line options
    try:
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
    except getopt.error, msg:
        print msg
        print "for help use --help"
        sys.exit(2)
    # process options
    for o, a in opts:
        if o in ("-h", "--help"):
            print __doc__
            sys.exit(0)
    # process arguments

    if len(args) != 2:
        print __doc__
        print "for help use --help"
        sys.exit(2)
    print "parsing", args[0]
    writes = parse(args[0])
    print "done, parsed", len(writes), "memory writes"
    print "looking for hits in", args[1]
    hit = match(writes, args[1])
    if hit == None:
        print "no hits found, the binary doesn't look packed"
    else:
        print "Candidate OEP: 0x%X" % hit

if __name__ == "__main__":
    main()

Finally, let’s execute it with the two generated files for our packed executable (and the normal executable, just to check if everything is fine):

reynaudd@lhs-2:~/test/packed$ ./tracesurfer2.py pinatrace.out.ls.packed itrace.out.ls.packed
parsing pinatrace.out.ls.packed
done, parsed 208638 memory writes
looking for hits in itrace.out.ls.packed
Candidate OEP: 0x129000

reynaudd@lhs-2:~/test/packed$ ./tracesurfer.py pinatrace.out.ls.normal itrace.out.ls.normal
parsing pinatrace.out.ls.normal
done, parsed 55829 memory writes.
sorting the writes list
done
looking for hits in itrace.out.ls.normal
no hits found, the binary doesn't look packed

That’s all folks!

A First Look at Google Native Client’s Inner Sandbox

Google has recently announced a new project to allow web applications to be programmed in x86. Yep, seriously.

Doing so securely is quite a challenge. x86 machine code has proved rather malware-friendly because it is really permissive and mixing up code and data complicates seriously static analysis. Since Google’s objective is to speed up web apps, they’re not going to use emulation to ensure that the code does not perform unauthorised calls, they rather chose to enforce some constraints on the code with a static verification algorithm. It makes sense, but doing so correctly is tricky.

The Inner Sandbox

The key component is called the inner sandbox, where code can’t execute sensitive operations such as system calls. To ensure statically that the code will stay in the inner sandbox, the following constraints are checked:

  • C1. Once loaded into the memory, the binary is not writable,
    enforced by OS-level protection mechanisms during execution.
  • C2. The binary is statically linked at a start address of zero,
    with the first byte of text at 64K.
  • C3. All indirect control transfers use a nacljmp pseudoinstruction
  • C4. The binary is padded up to the nearest page with at least
    one hlt instruction (0xf4).
  • C5. The binary contains no instructions or pseudo-instructions
    overlapping a 32-byte boundary.
  • C6. All valid instruction addresses are reachable by a fallthrough
    disassembly that starts at the load (base) address.
  • C7. All direct control transfers target valid instructions.

C1 means no self-modifying code, and relies on the segment permissions (as you can see in the screenshot, the .text section is not writable). segments1

To sum things up, the purpose of these constraints is to ensure the following properties:

  • everything in the code segment can be reliably disassembled (so that illegal instructions can be detected statically)
  • if you jump somewhere, it must be in something that has already been disassembled

These properties, if they are enforced correctly, ensure that you can’t execute arbitrary code. Now the problem is that enforcing these properties correctly is quite hard. For example, a typical static analysis problem is: does a function return to the caller or somewhere else ? It is hard to ensure that the return address has not been modified dynamically on the stack. To bypass this problem, NaCl does not allow the return instruction, but rather uses…

The nacljmp pseudo-instruction

It consists in a mask and a jump:

and %eax, 0xffffffe0
jmp *(%eax)

It puzzled me for some time. The purpose of this construct is to enable jumping to an unknown location, by ensuring that the target is a 0 mod 32 address. Constraint C5 ensures that every 0 mod 32 address is a valid target (= it is not in the middle of an instruction), and therefore it has already been disassembled. So we don’t know statically where we’re jumping, but we know there’s no illegal instruction there.

This indirect jumping stuff sounds juicy enough, and it didn’t last long until somebody found an exploit for it.

“The validator was checking for calls through
registers, as in this two-byte call:

 83 e2 e0                and    $0xffffffe0,%edx
 ff d2                   call   %edx

but erroneously permitting similar calls through memory using this
addressing mode:

 83 e2 e0                and    $0xffffffe0,%edx
 ff 12                   call   *(%edx)"

Ironically enough, the example given in the research paper used memory addressing but also stated “we disallow memory addressing modes on indirect jmp and call instructions“.

We can note the position of Google with this project: they are releasing a research paper and the source code of a very early, experimental project to share with the security and research communities, and the developers can also be reached on the discussion group. They encourage people to break the sandbox and credit the author of the first exploit. Ain’t it cool ?

Ruxcon08 Cover Up

Hi folks,

I just come back from Ruxcon08 in Sydney, a really nice security conference with lots of high-level talks, an excellent venue and super-duper ambiance. Thanks to everyone for the positive feedback on my presentation on GPU Powered Malware (slides here).

My personal highlights are:

Silvio Cesare, Security Applications for Emulation. A really good talk dealing in part with automated unpacking, tracing, dynamic taint analysis and symbolic execution. All of which are the next step in malware analysis (me thinks). Silvio also produced a quite amazing memory bug checker for the Linux kernel.

– Nishad Herath, Now you see it, now you don’t! – Obfuscation ’08 style… He introduced the new trends that we see in obfuscation (not malware-specific) like code fragmentation and instruction virtualisation. Too bad he was (intendedly ?) light on details on the use of abstract interpretation and trace analyses. Me wants more !

Though not related with what I do, I also enjoyed Browser Rider by Ben Mossé (you really have to see a Javascript tunnel between a rogue server and a browser in action to get the full flavour) and Intelligent Web Fuzzing by Mr and Mrs Archibald. Instrumenting PHP, MySQL and Bash with all that hooks looks definitely hairy ! Oh yeah and a special mention for Netscreen of the Dead by Graeme Neilson – this guy is getting Zombie-creepiness and gore to the next level.

Well see you next year I guess ;)

Links: