Fun with diagonalization

No more loops, embrace the future with recursive self-modifying code and diagonalization. In the following code snippet, we show how to replace loops like ‘while E do C’ with only ‘if’ and ‘eval’ statements:

// expects stmt == 'if(expr) { command }'
function Z(stmt) {
    start = stmt.indexOf('(')+1;
    end = stmt.indexOf(')');
    expr = stmt.substr(start, end-start);

    start = stmt.indexOf('{')+1;
    end = stmt.lastIndexOf('}');
    command = stmt.substr(start, end-start);

    out = 'if ('+expr+') {'+command+'; stmt=\'if('+expr+') {'+command+'}\'; eval(Z(stmt))}';
    return out;

i = 3;

// the following line is equivalent to while(i>=0) { alert(i); i-- }:
eval(Z('if(i>=0) { alert(i); i-- }'))

Read more about diagonalization and its use by Cantor and Gödel here.

UPDATE: a nicer code snippet by Matt Kaczmarek below.

special = "/**/"
function Z (stmt){
    stmt = stmt.replace (special, "\'" + stmt + "\'");
    return stmt;

i = 3;

// the following line is equivalent to while(i>=0) { alert(i); i-- }:
eval(Z('if(i>=0) {alert(i); i-- ; eval(Z(/**/))}'));

Newsflash: clever tricks have never been clever

“DOS applications were rarely well behaved in terms of using existing system resources since they typically ran one at time. Developers often used tricks or even existing bugs in the OS or processors in clever ways to get more memory or prevent running of unlicensed copies. Windows was intended to enable multiple applications to run at the same time, so it was difficult to support running existing applications while isolating them so their clever tricks would not corrupt other applications.”
Tandy Trower []

And that was 25 years ago. Ouch.

US Domination over Security Conferences

(UPDATE: now with NDSS data)

Looking at the program committees of a few top-ranking security conferences, you have to stop and think “wow, there’s a lot of American people in there” (I am referring to people working in the US, not necessarily their nationality). Just to name a few:

  • ACM CCS 2010: 39 out of 54 PC members in the US (72.2%)
  • Internet Society NDSS 2010: 31 out of 37 (83.8%)
  • IEEE S&P 2010: 42 out of 49 (85.7%)
  • Usenix Security 2010: 26 out of 28 (92.9%)

And an approximation of the similarity of program committees (only based on last names):

ndss & s_p : ['Feamster', 'Traynor', 'Shacham', 'Ateniese', 'Kirda',
'Smith', 'Stavrou', 'Barth', 'Backes', 'Kruegel', 'Borisov', 'Kohno',
'Brumley', 'Sion', 'Locasto', 'Reiter']
similarity: 33.33%

ndss & usenix : ['Traynor', 'Shacham', 'Provos', 'Barth', 'Danezis',
'Borisov', 'Ganapathy', 'Garfinkel', 'Locasto']
similarity: 24.32%

ndss & ccs : ['Christodorescu', 'Kirda', 'Stavrou', 'Barth',
'Danezis', 'Borisov', 'Kohno', 'Ganapathy', 'Locasto', 'Cui']
similarity: 18.52%

s_p & usenix : ['Jha', 'Traynor', 'Giffin', 'Halderman', 'Shacham',
'Barth', 'Borisov', 'Locasto']
similarity: 16.67%

s_p & ccs : ['Halderman', 'Molnar', 'Waters', 'XWang', 'Stavrou',
'Barth', 'Gligor', 'Borisov', 'Kohno', 'Locasto', 'Yu', 'Kirda']
similarity: 22.22%

usenix & ccs : ['Sekar', 'Halderman', 'Gribble', 'Sailer', 'Barth',
'Danezis', 'Borisov', 'Ganapathy', 'Locasto', 'HWang']
similarity: 18.52%


I’m not sure how to interpret these numbers, they can mean lots of different things:

  • they might reflect the proportion of US/RestOfWorld academics working on computer security
  • or they might reflect the proportion of US/RestOfWorld academics working on software engineering (and therefore there is nothing specific to computer security)
  • causation 1: security conferences are top ranking when they have mostly US PC members
  • causation 2: top ranking conferences attract US PC members
  • what else?

1998 has called and they want their software back

“The need to improve current industrial software engineering practice is widely felt. To mention only a few striking examples: too many software projects fail and are canceled before completion or are not delivered on time or exceed their budget the energy spent on testing code is ever increasing and tends to represent more than half of the entire development cost, the number of errors found in complex software is ofter rather high, there is almost no software warranty whatsoever, but again and again the world is surprised by Trojan horses and security holes, etc.”

Abstract State Machines, Egon Börger and Robert Stärk, 1998

The Sad State of Reverse Engineering

When you look at software engineering as a research field, you can see some pretty serious progress there. There are amazing projects like PyPy and LLVM, massive optimizations in gcc and JIT compilers (HotSpot, Psyco, TraceMonkey). Compared to that, I have the impression that the reverse engineering community did not produce any significant results. What we have is disassemblers, that is to say parsers.

To make things even worse, the more advanced tools used in RE have been created for a totally different purpose (think Pin, VEX, QEMU, Bochs, virtualization…). Some nice works are being performed by folks like Sean Heelan, Silvio Cesare, the Sogeti R&D team (metasm, fuzzgrind) and the BitBlaze team (TEMU, Vine). But overall I can see no open, community-driven, formally sound approach. The tools are either not FOSS, limited in scope, or just not-that-reusable.

There is a number of potential factors to explain the situation:

  • reversers are not developers (this, I think, is a big factor)
  • reversers are solitary, basement programmers (not to mention cheese pops and japanese tentacle porn)
  • the complexity of x86 + Windows makes the entry cost too high for academics

We are therefore left with a research niche with virtually no academics, little to no developer community, that still pumps some big bucks. The only player left is the security industry, i.e. corporations which have absolutely no incentive to solve the problem.

Did I miss something, or is the picture really that grim?

Differential Reversing

I love this []: a known technique with a clean, elegant and almost free approach. The idea is to find interesting input-dependent spots in binaries:

  • first, instrument the binary and record a hit trace (basic block granularity is enough) for a base input and a trigger input
  • then, compute the difference between the hit traces
  • finally, highlight the differences in a disassembler, and plug a wetware to analyse the result

Dion does (1.) with a pintool, (2.) with a python script and (3.) with IDAPython. Sweeeet :)

Automatic Exploit Generation

One of the best MSc dissertations I’ve read:

“We present a novel algorithm that integrates data-flow analysis and a decision procedure with the aim of automatically building exploits. The exploits we generate are constructed to hijack the control flow of an application and redirect it to malicious code.

Our algorithm is designed to build exploits for three common classes of security vulnerability; stack-based buffer overflows that corrupt a stored instruction pointer, buffer overflows that corrupt a function pointer, and buffer overflows that corrupt the destination address used by instructions that write to memory. For these vulnerability classes we present a system capable of generating functional exploits in the presence of complex arithmetic modification of inputs and arbitrary constraints. Exploits are generated using dynamic data-flow analysis in combination with a decision procedure.”

(yes, I am now Schneier-blogging)