The Halting Problem for Reverse Engineers

I often have the feeling that technically savvy people don’t have a very high opinion of academia, and this is particularly true of security people. They have to deal with low-level details such as hardware architecture, operating systems internals (more or less documented), proprietary protocols and data structures, all of which require very specialized knowledge. Logic, theorems and algorithms don’t have a predominant place in that picture.

For the past few years I have been working on these subjects, and found some fundamental theorems to be actually *useful* in understanding the security properties of computer architectures. One of them is, of course, the undecidability of the halting problem. In essence, it says that we can not know if the computation of a program on some input will ever terminate. So what? Who cares if a program terminates, what we want is to find vulnerabilities and unpack malware samples, right?

The importance of the undecidability of the halting problem lies in its generality. In particular we can see Rice’s theorem as a generalization of this result, and to put it very simply, it says that whatever properties of programs you’re interested in, no program can tell if this property holds for every program (i.e. it is undecidable).

This is very bad news for all of us, since basically everything about programs is undecidable. Say that you are interested in finding functions that do out-of-bounds memory writes (or as I said, any other property), Rice’s theorem says that there is no program that will give you a correct answer all the time. You must accept that your program sometimes fails or infinitely loops.

I want to emphasize how bad this is. Do not let the terminology used in the theorems confuse you. In particular, the notions of input, output, and function computed by a program do not map nicely to binaries. An output is anything that gets modified by your program — any register or memory location, as soon as it is touched by an instruction, is an output. And basically, everything about outputs is undecidable. As a consequence, simple tasks such as disassembling are undecidable.

For instance, take this seemingly innocent indirect jump:

jmp [eax]

If eax is an output of instructions before it, no luck, its value is undecidable. You can run the program, write the value down, and assume it will not change, but you have no guarantee that it will not change at a given date. Undecidable. You could argue that eax can only take a finite number of values, and hence disassembling is still possible, just very intractable. But that would be without counting on self-modifying code. SMC (think packers) is the scourge of disassemblers because it gives the ability to transfer control to an output. Since I can’t decide the value of the output, I can’t disassemble.

To sum things up, here are a few direct consequences of the undecidability of the halting problem:

  1. you can’t decide the target of indirect jumps, reads and writes
  2. you can not decide if a particular memory address is code, data, or both
  3. you can’t decide values written in memory
  4. you can’t decide the numbers of occurrences of loops
  5. you can’t decide if control flow can reach a given instruction
  6. whatever you see in a given run can change arbitrarily in another run
  7. disassembling is undecidable
  8. unpacking is undecidable

I will leave how all this led to the bad habit of relying on “heuristics” to a further post. Stay classy!


14 thoughts on “The Halting Problem for Reverse Engineers

  1. Uhm I’m not really sure we need to be so pessimist about this. One good point that “we” have in favor is that the things we look at are finite state machines, and our solutions (ie. analysis programs) do not need to apply to all other programs.

    If you look at things by extension, you can decide them.

    I might be wrong, I’m looking forward for discussion :)

      1. Hi Agustin,
        Dan is right, let us consider an example to approximate the complexity of program analysis.

        Indeed the halting problem is decidable for finite-state machines considering that the duration of repeating pattern cannot exceed the number of internal states. If you take a 1Ghz machine with a 100GB hard drive then this duration is about 2^800 seconds (if I’m not wrong)… that’s pretty long, almost infinite from my point of view.

        If you consider that 2^800 is *almost infinite* then a 1Ghz machine with a 100GB is an *almost infinite-state* machine.

        I think that have encountered theoretical developments about this notion of *almost infinite* but I cannot remember where.


    1. It’s not pessimism, it’s knowing fundamental limitations.

      What do you mean when you say we do not need to apply to all other programs? What class of programs are we interested in then?

      A discussion is possible about the finiteness of our state, as is the case for indirect jumps. The number of possible configurations however is so astronomically big that from a practical point of view it is pretty close to infinity. And then we have genuine infinites that really throw decidability out the window (like self-modifying code).

      1. Yes, I had in mind that the amount of states is huge^huge, but what I think I wanted to point is that we can limit the search space in many cases and make the problem tractable.

        “What do you mean when you say we do not need to apply to all other programs? What class of programs are we interested in then?”

        Please disregard my previous statement :)

      2. We can indeed limit the search space in some/many cases, the problem is that the worst case can quickly show its ugly head. For instance, where should you start disassembling after a call instruction? IDA Pro does not try very hard, it just assumes that you always ret right after a call and disassembles from here. Something as simple as “call, pop, inc, push, ret” introduces an off-by-one error that fools IDA Pro. Metasm does a better job at solving this. The example shown above is “trivial” to solve (as long as you have a semantic description of each instruction) because a single path is exercised, even if the path forms an acyclic graph you can solve every possible value for the return site. But if the return address is manipulated in a loop, as far as I know, everybody fails. This is what I meant when I mentioned the undecidability of disassembling: in the process of determining control flow, you might have to face undecidable problems (such as loop termination). Happy holidays :)

  2. Disassembly isn’t inherently undecidable, but the form of assembly used by x86 (variable length instructions with no alignment restrictions) makes it so. With some restrictions, even x86 disassembly becomes decidable — Google, for example, does this with NaCl binaries, which allows it to statically verify that they contain no unsafe instructions.

    1. The variable length encoding makes it more complex but this is not where the undecidability comes from. The longest encoding being 15 bytes, that makes only at worst 14 wrong starting points per instruction. Of course having to start disassembling at every byte is undesirable but it is definitely in the realm of tractability.

      The purpose of NaCl is to have *reliable* disassembling, in the sense that you don’t have to do this overly conservative per-byte disassembling. In essence, it brings you the same properties as if every x86 instruction could be encoded in 1 byte. The control flow can still not be determined, an indirect jump can jump anywhere except in the middle of an instruction.

      What really makes NaCl binaries statically verifiable is the interdiction of self-modifying code.

      1. Dan why do always agree? I was more fun before.
        Nevertheless, I would like make the discussion more controversial.

        Behavior decision is undecidable because we define behavior as a property that should be true for any program which compute the same function.
        But this is quite a theoretical definition of behavior.

        If we relax this notion, we may have some good new. For example If we define behavior as the set of system calls that can triggered during the execution then the behavior of Nacl like programs can be easily approximated. The same holds if we define behavior as the control flow, the information flow or the execution time. (hum…Yes we can always put an upper bound on the execution time but I never said that this bound was always finite ;)).

        Sometimes we are able to approximate behavioral properties. This is a great news! The counterpart is that this approximation is often uninteresting; e.g. the program can trigger any system call or the program can reach any memory address.

        Merry Christmas

  3. The halting problem on Turing machines, which use “unbounded” tapes (memory), is undecidable.

    You can model “real computers” as linear bounded automata in which case the halting problem becomes decidable via proofs that reject by configuration cycling (just google this).

    The focus here is the difference between bounded and unbounded machine models. Once memory becomes finite (bounded) most decision problems (such as the halting) become decidable but in many cases its algorithm is intractable which is the real problem we have to deal with within program analysis.

  4. I wish that people would stop pretending that real-world problems are undecidable. They never are, not a single one of them. People like undecidability because it’s so easy to prove. So when you point out to them that their undecidability proofs make factually incorrect assumptions, they don’t want to believe it, even though they know it’s true. They try to argue that the proofs have some kind of fuzzy approximate relevance. I wish that people would prove things NP-complete instead. It’s more work, but it has the advantage of meaning something in the real world. Although even NP-complete problems are sometimes surprisingly tractable in the domain of interest. Even better would be to prove an actual lower bound on complexity, but nobody knows how to do that. (And, therefore, it must not be important.)

    Matthieu, you established an upper bound of 2^800 seconds on a certain problem. That’s not the same as a lower bound.

    1. You use the term “real-world problem” to refer to the fact that the (real-world, hence bounded) machine size is one of the inputs. I have no problem factoring that out and reasoning about the abstract problem, with arbitrarily large machines. If you prefer keeping the massive exponent and reason about particular machine instances, feel free to do so.

    2. Concerning ‘fuzzy approximation’, Ben you I suppose that you know that NP completeness is based on two unprouved hypothesis P!=NP and the absence of speed-up in P.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s