Fun with diagonalization

No more loops, embrace the future with recursive self-modifying code and diagonalization. In the following code snippet, we show how to replace loops like ‘while E do C’ with only ‘if’ and ‘eval’ statements:


// expects stmt == 'if(expr) { command }'
function Z(stmt) {
    start = stmt.indexOf('(')+1;
    end = stmt.indexOf(')');
    expr = stmt.substr(start, end-start);

    start = stmt.indexOf('{')+1;
    end = stmt.lastIndexOf('}');
    command = stmt.substr(start, end-start);

    out = 'if ('+expr+') {'+command+'; stmt=\'if('+expr+') {'+command+'}\'; eval(Z(stmt))}';
    return out;
}

i = 3;

// the following line is equivalent to while(i>=0) { alert(i); i-- }:
eval(Z('if(i>=0) { alert(i); i-- }'))



Read more about diagonalization and its use by Cantor and Gödel here.

UPDATE: a nicer code snippet by Matt Kaczmarek below.


special = "/**/"
function Z (stmt){
    stmt = stmt.replace (special, "\'" + stmt + "\'");
    return stmt;
}

i = 3;

// the following line is equivalent to while(i>=0) { alert(i); i-- }:
eval(Z('if(i>=0) {alert(i); i-- ; eval(Z(/**/))}'));

Self Advertisement

———————————————————————-
*** Malware 2010 Call for Papers ***
———————————————————————-
Important dates
———————————————————————-
Submission of papers: June 30, 2010 23:59:59 EST
Notification of Acceptance: August 27th, 2010, 23:59:59 EST
Camera ready paper: September 10th, 2010: 23:59:59 EST
Workshop dates: October 20-21, 2010
———————————————————————-
Overview
———————————————————————-
The 5th IEEE International Conference on Malicious and Unwanted
Software (Malware 2010) will be held in Nancy, France, Oct. 20-21,
2010. The conference is designed to bring together experts from
industry, academia, and government to present and discuss, in an open
environment, the latest advances and discoveries in the field of
malicious and unwanted software. Techniques, economics and legal
issues surrounding the topic of Malware, and the methods to detect and
control them will be discussed.
This year’s conference will pay particular attention to (and will also
be extensively discussed in a panel session) the pressing topic of
“Malware and Cloud Computing”.  As low-cost netbooks become popular,
Google’s Chrome OS enters the mainstream, and social networks
(Facebook, YouTube, Twitter, LinkedIn, and so forth) become
ubiquitous, the security dangers associated with the new computing
paradigm increase exponentially.  In effect, “Cloud Computing”,
Multi-tenant, Single Schema, Single Server Platforms (C2S3P) increase
vulnerabilities by providing a single point of failure and attack for
organized criminal networks. Critical/sensitive/private information is
at risk, and very much like previous technology adoption trends, such
as wireless networks, the dash for success is trumping the need for
security.
Thus, the organizers of Malware 2010 solicit original written
contributions addressing these issues and research questions.
Manuscripts focusing on the security properties of Cloud Computing,
the risks associated with the deployment of such networks, and the
analysis of real incidents where a breach has occurred will be
particularly welcomed.
———————————————————————-
Submissions are solicited in, but not limited to, the following areas
———————————————————————-
Theoretical aspects and new directions on Malware related research
Analysis and measurements of real malware incidents
Worms, viruses and other propagating Malware
Spyware, keystroke loggers, information theft Malware
Honeypots and other sample collection methodologies
Botnet attacks, detection/tracking and defense
Malware economics and black market studies
Code reverse engineering tools and practices
Malware performance, analysis and capture tools
Anti-spam and anti-phishing techniques and practices
Legal aspects of unwanted software use
Malware and its impact in social networking and cloud computing
Rootkit and virtualization techniques
Malware in wireless mobile devices
———————————————————————-
Publication
———————————————————————-
The proceedings of the conference will be published in printed, and
DVD, form and will be included in the IEEE Xplore digital library.  In
addition, the Conference’s Technical Program Committee will select one
manuscript as a recipient of the “Best Paper Award”.  The Best Paper
Award author, together with the authors of a few selected manuscripts
from the conference, will be invited to submit an extended version to
a special issue of the Journal.
———————————————————————-
Paper Submission Information
———————————————————————-
Papers should be submitted through EDAS system at:
Submitted manuscripts must be 10-point font size, and should not
exceed 8 single-spaced pages in length, including the abstract,
figures, and references. Authors whose manuscript exceeds the 8 page
limit may be allowed to include two additional pages for an extra
charge.  However, under no circumstances shall a submitted manuscript
exceed the 10 page limit. Submitted papers must not substantially
overlap with papers that have been published or that are
simultaneously submitted to a journal or a conference with
proceedings.
———————————————————————-
Additional Information
———————————————————————-
For more information on Malware 2010 or if you are interested in
contributing to the organization of the conference please contact
Dr. Fernando C. Colon Osorio, General Program Chair, Malware 2010 at
information concerning submission of an original manuscript to the
conference, please contact the Technical Program Committee Chairs
(TPC).
———————————————————————-
Chairs of Malware 2010
———————————————————————-
Jean-Yves Marion, Nancy University, France, TPC
Noam Rathaus, Beyond Security, USA, TPC
Cliff Zhou, University Central Florida, USA, TPC
———————————————————————-
PC members
———————————————————————-
Anthony Arrott, Trends Micro, USA
Pierre-Marc Bureau, ESET, Canada
Mila Dalla Preda, Verona University, Italy
Saumya Debray, Arizona University, USA
Thomas Engel,  University of Luxembourg, Luxembourg
Jose M. Fernandez, Ecole Polytechnique de Montreal, Canada
Olivier Festor, INRIA Nancy Grand-Est, France
Brent Kang, North Carolina University, USA
Felix Leder, Bonn University, Germany
Bo Olsen, Kaspersky, USA
Jose Nazario, Arbor networks, USA
Phil Porras, SRI International, USA
Fred Raynal, Sogeti, France
Andrew Walenstein, Lafayette University, USA
Jeff Williams, Microsoft, USA
Yang Xiang, Deakin University, Australia
———————————————————————-
Publicity chair: Daniel Reynaud, Nancy University – Loria, France
Local chair: Matthieu Kaczmarek, INRIA Nancy Grand-Est, France

Newsflash: clever tricks have never been clever

“DOS applications were rarely well behaved in terms of using existing system resources since they typically ran one at time. Developers often used tricks or even existing bugs in the OS or processors in clever ways to get more memory or prevent running of unlicensed copies. Windows was intended to enable multiple applications to run at the same time, so it was difficult to support running existing applications while isolating them so their clever tricks would not corrupt other applications.”
Tandy Trower [technologizer.com]

And that was 25 years ago. Ouch.

US Domination over Security Conferences

(UPDATE: now with NDSS data)

Looking at the program committees of a few top-ranking security conferences, you have to stop and think “wow, there’s a lot of American people in there” (I am referring to people working in the US, not necessarily their nationality). Just to name a few:

  • ACM CCS 2010: 39 out of 54 PC members in the US (72.2%)
  • Internet Society NDSS 2010: 31 out of 37 (83.8%)
  • IEEE S&P 2010: 42 out of 49 (85.7%)
  • Usenix Security 2010: 26 out of 28 (92.9%)

And an approximation of the similarity of program committees (only based on last names):

ndss & s_p : ['Feamster', 'Traynor', 'Shacham', 'Ateniese', 'Kirda',
'Smith', 'Stavrou', 'Barth', 'Backes', 'Kruegel', 'Borisov', 'Kohno',
'Brumley', 'Sion', 'Locasto', 'Reiter']
similarity: 33.33%

ndss & usenix : ['Traynor', 'Shacham', 'Provos', 'Barth', 'Danezis',
'Borisov', 'Ganapathy', 'Garfinkel', 'Locasto']
similarity: 24.32%

ndss & ccs : ['Christodorescu', 'Kirda', 'Stavrou', 'Barth',
'Danezis', 'Borisov', 'Kohno', 'Ganapathy', 'Locasto', 'Cui']
similarity: 18.52%

s_p & usenix : ['Jha', 'Traynor', 'Giffin', 'Halderman', 'Shacham',
'Barth', 'Borisov', 'Locasto']
similarity: 16.67%

s_p & ccs : ['Halderman', 'Molnar', 'Waters', 'XWang', 'Stavrou',
'Barth', 'Gligor', 'Borisov', 'Kohno', 'Locasto', 'Yu', 'Kirda']
similarity: 22.22%

usenix & ccs : ['Sekar', 'Halderman', 'Gribble', 'Sailer', 'Barth',
'Danezis', 'Borisov', 'Ganapathy', 'Locasto', 'HWang']
similarity: 18.52%

.

I’m not sure how to interpret these numbers, they can mean lots of different things:

  • they might reflect the proportion of US/RestOfWorld academics working on computer security
  • or they might reflect the proportion of US/RestOfWorld academics working on software engineering (and therefore there is nothing specific to computer security)
  • causation 1: security conferences are top ranking when they have mostly US PC members
  • causation 2: top ranking conferences attract US PC members
  • what else?

Better shellcode disassembly in Pym’s

Pym’s is an x86 disassembler packaged as a web application. It now includes a new feature, that I like to call “shellcode disassembly for really lazy dudes”. Just paste a C char* in the textbox and let the magic occur.

For instance, try with this shellcode (chosen completely arbitrarily), and you should get something like this:

[0x00000000] 	jmp near 0x0000001e
[0x00000002] 	pop edx
[0x00000003] 	mov esi edx
[0x00000005] 	mov eax [dx]
[0x00000007] 	cmp ax 0x7dca
[0x0000000b] 	jnz 0x00000013
[0x0000000d] 	add ax 0x303
[0x00000011] 	mov [dx] eax
[0x00000013] 	inc dl
[0x00000015] 	cmp eax 0x41414141
[0x0000001a] 	jnz 0x00000005
[0x0000001c] 	jmp esi
[0x0000001e] 	call 0x00000002
[0x00000023] 	xor edx edx
[0x00000025] 	push 0xb
[0x00000027] 	pop eax
[0x00000028] 	cdq
[0x00000029] 	push edx
[0x0000002a] 	push 0x68732f2f
[0x0000002f] 	push 0x6e69622f
[0x00000034] 	mov ebx esp
[0x00000036] 	push edx
[0x00000037] 	push ebx
[0x00000038] 	mov ecx esp
[0x0000003a] 	retf 0x417d

A note on the x86 semantics modeling in JPC

JPC is a full PC emulator (à la Bochs but in Java), including the BIOS, VGA BIOS, floppy drive and other hardware components. Of particular interest to me, is the way x86 instructions are modeled and executed. It works and stays at the binary level (no fancy disassembly), and actually compiles x86 instructions to a simpler microcode language representing “atomic” instructions. This microcode language is then straightforward to execute, although a bit more complex than similar micro-languages (such as VEX, REIL or BIL).

The core of the x86 semantics is contained in the x86 to microcode compiler, found in org.jpc.emulator.memory.codeblock.optimised.ProtectedModeUDecoder.decodeOpcode(). This method takes a binary x86 instruction and decodes its prefices, opcode, modrm, sib, displacement and immediate parameters. Then it delegates the translation of the microcode to this sequence of methods:


writeInputOperands(prefices, opcode, modrm, sib, displacement, immediate);
writeOperation(prefices, opcode, modrm);
writeOutputOperands(prefices, opcode, modrm, sib, displacement);
writeFlags(prefices, opcode, modrm);
<span style="color: #000000;">

For instance, if we take the binary instruction 04 42 (add al, 0x42), it is decoded with opcode = 0x04 and immediate = 0x42. Then based on these values, the instruction is translated to the following microcode sequence:


// writeInputOperands:
LOAD0_AL
LOAD1_IB 0x42
// writeOperation:
ADD
// writeOutputOperands:
STORE0_AL
// writeFlags:
ADD_O8_FLAGS<span style="color: #ffffff;">
</span>

Now, understanding the semantics of an x86 instruction reduces to understanding the semantics of the microcode language. For this, we need the microcode interpreter, which is org.jpc.emulator.memory.codeblock.optimised.ProtectedModeUBlock.execute(). It is a relatively simple execution language (execution-wise), with 5 general-purpose registers but with roughly 750 opcodes. The execution of the above microcodes translates to this Java sequence:


reg0 = cpu.eax & 0xff;
reg1 = 0x42 & 0xff;
reg2 = reg0; reg0 = reg2 + reg1;
cpu.eax = (cpu.eax & ~0xff) | (reg0 & 0xff);
cpu.setZeroFlag((byte)reg0);
cpu.setParityFlag(reg0);
cpu.setSignFlag((byte)reg0);
cpu.setCarryFlag(reg0, Processor.CY_TWIDDLE_FF);
cpu.setAuxiliaryCarryFlag(reg2, reg1, result, Processor.AC_XOR);
cpu.setOverflowFlag(reg0, reg2, reg1, Processor.OF_ADD_BYTE);

A super quick and super dirty shellcode emulator

Using JPC, it gives something like this:


import java.io.IOException;
import org.jpc.emulator.PC;
import org.jpc.emulator.processor.Processor;
import org.jpc.j2se.VirtualClock;

public class EmulateShellcode {
    public static void main(String[] args) throws IOException {
	VirtualClock clock = new VirtualClock();
	PC pc = new PC(clock, args);
	Processor cpu = pc.getProcessor();
	cpu.cs.setSelector(0);
	cpu.eip = 0;

	if(args.length == 0) {
	    System.out.println("Usage: java EmulateShellcode ");
	    System.out.println("*** examples:");
	    System.out.println("* add al, 0x42:\n\tjava EmulateShellcode 04 42");
	    System.out.println("* nop:\n\tjava EmulateShellcode 0x90");
	    System.out.println("* inc edx:\n\tjava EmulateShellcode 42h");
	    System.exit(1);
	}

	System.out.println("Loading shellcode in memory...");
	for(int i=0; i<args.length; i++) {
	    String str = args[i];
	    if(args[i].startsWith("0x"))
		str = args[i].substring(2);
	    else if(args[i].startsWith("\\x"))
		str = args[i].substring(2);
	    else if(args[i].endsWith("h"))
		str = args[i].substring(0, args[i].length()-1);

            <strong>// the byte at address i is the argument at position i cpu.linearMemory.setByte(i, (byte)Integer.parseInt(str, 16));</strong>
	}

	System.out.println("Before execution:");
	cpu.printState();
        <strong>// execute starting at address 0 cpu.linearMemory.executeProtected(cpu, 0);</strong>
	System.out.println("\nAfter execution:");
	cpu.printState();
    }
}

Creating a toy virtual machine with PyPy

Here, you can use “virtual machine” as in “Java Virtual Machine”, not as in virtualization. We will play with the virtual machine described in the paper Tracing the Meta-Level: PyPy’s Tracing JIT Compiler by C.F. Bolz, A. Cuni, M. Fijalkowski and A. Rigo (it’s a great read by the way).

PyPy is a fascinating project, too complex to describe here. Among other things, PyPy can take any interpreter written in a subset of Python, translate it to C, and automatically generate a JIT compiler for this language. Does it sound too good to be true? Let’s try this.

  • grab PyPy source code
  • create the interpreter in pypy/translator/goal/target-toy.py with the following code:
</pre>
<pre style="text-align: justify;">import os, sys
import autopath
import py

# these are the opcodes for the interpreted language
JUMP_IF_A  = 1
MOV_A_R    = 2
MOV_R_A    = 3
ADD_R_TO_A = 4
DECR_A     = 5
RETURN_A   = 6

<em>from pypy.rlib.jit import JitDriver tlrjitdriver = JitDriver(greens = ['pc', 'bytecode'], reds = ['a', 'regs'])</em>

# the main interpreter loop
def interpret(bytecode, a):
   regs = [0] * 256
   pc = 0
   while True:
<em> tlrjitdriver.jit_merge_point(bytecode=bytecode, pc=pc, a=a, regs=regs) </em>       opcode = bytecode[pc]
       pc += 1
       if opcode == JUMP_IF_A:
           target = bytecode[pc]
           pc += 1
           if a:
<em> if target<pc: tlrjitdriver.can_enter_jit(bytecode=bytecode, pc=target, a=a, regs=regs) </em>               pc = target
       elif opcode == MOV_A_R:
           n = bytecode[pc]
           pc += 1
           regs[n] = a
       elif opcode == MOV_R_A:
           n = bytecode[pc]
           pc += 1
           a = regs[n]
       elif opcode == ADD_R_TO_A:
           n = bytecode[pc]
           pc += 1
           a += regs[n]
       elif opcode == DECR_A:
           a -= 1
       elif opcode == RETURN_A:
           return a

# __________  Entry point  __________
def entry_point(argv):
    # the program we want to interpret
    # it computes the square of its argument
    bytecode = [
        MOV_A_R,    0, # i = a
        MOV_A_R,    1, # copy of ’a’
        # 4:
        MOV_R_A,    0, # i--
        DECR_A,
        MOV_A_R,    0,
        MOV_R_A,    2, # res += a
        ADD_R_TO_A, 1,
        MOV_A_R,    2,
        MOV_R_A,    0, # if i!=0: goto 4
        JUMP_IF_A,  4,
        MOV_R_A,    2,
        RETURN_A
    ]
    result = interpret(bytecode, int(argv[1]))
    print result
    return 0

def jitpolicy(driver):
    from pypy.jit.metainterp.policy import JitPolicy
    return JitPolicy()

# _____ Define and setup target ___
def target(*args):
    return entry_point, None

# main function, if this script is called from the command line
if __name__ == '__main__':
    entry_point(sys.argv)</pre>
<div><span style="color: #000000;">

  • the lines in italic are the annotations for the JIT compiler. We need to give PyPy some insight on the interpreted language by declaring what is the instruction pointer (the green variables), the beginning of the dispatch loop and the backward branches (see the paper for full details).
  • check that you can execute this script correctly by running python target-toy.py 12, the output should be 144
  • PyPy can translate this script in C. For this, first install the dependencies and then run the following command: python translate.py target-toy.py
  • this should give you an executable target-toy-c, rename it target-toy-native and check that ./target-toy-native 12 yields 144
  • now we can ask PyPy to translate target-toy.py in C and generate a JIT compiler for it. For this, we just run python translate.py --opt=jit target-toy.py
    • note: the 64-bit backend of PyPy is not implemented yet, so if you are on a 64-bit system, you will have to struggle a bit. You will have to use a 32-bit Python interpreter (see my former post), create an alias for gcc -m32 (let’s call it gcc32) and then pass the option –cc=gcc32 to translate.py.
  • this should give you another target-toy-c executable, rename it to target-toy-jit and check that ./target-toy-jit 12 yields 144

Ok, everything is working, so let’s now see how all this performs by computing large squares:

~/pypy-trunk/pypy/translator/goal$ time python target-toy.py 1000000
1000000000000
real 0m18.637s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-native 1000000
-727379968
real 0m0.024s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-jit 1000000
-727379968
[...]
real 0m0.005s

The first run is the square program interpreted by our program, itself interpreted by the Python interpreter. Double interpretation is slow.

The second run is the square program interpreted by a native version of our interpret function. Interpretation by native code is ok.

The third run is the square program interpreted and JIT’ed on the fly. It’s super awesome :)

Final note: I must thank everybody from #pypy on freenode, for their help and resilience to stupid questions. Thanks guys!