Indefinite Studies

Academic ramblings about software security.

Newsflash: clever tricks have never been clever

leave a comment »

“DOS applications were rarely well behaved in terms of using existing system resources since they typically ran one at time. Developers often used tricks or even existing bugs in the OS or processors in clever ways to get more memory or prevent running of unlicensed copies. Windows was intended to enable multiple applications to run at the same time, so it was difficult to support running existing applications while isolating them so their clever tricks would not corrupt other applications.”
Tandy Trower [technologizer.com]

And that was 25 years ago. Ouch.

Written by dan

March 9, 2010 at 23:00

Posted in useless

US Domination over Security Conferences

with 2 comments

(UPDATE: now with NDSS data)

Looking at the program committees of a few top-ranking security conferences, you have to stop and think “wow, there’s a lot of American people in there” (I am referring to people working in the US, not necessarily their nationality). Just to name a few:

  • ACM CCS 2010: 39 out of 54 PC members in the US (72.2%)
  • Internet Society NDSS 2010: 31 out of 37 (83.8%)
  • IEEE S&P 2010: 42 out of 49 (85.7%)
  • Usenix Security 2010: 26 out of 28 (92.9%)

And an approximation of the similarity of program committees (only based on last names):

ndss & s_p : ['Feamster', 'Traynor', 'Shacham', 'Ateniese', 'Kirda',
'Smith', 'Stavrou', 'Barth', 'Backes', 'Kruegel', 'Borisov', 'Kohno',
'Brumley', 'Sion', 'Locasto', 'Reiter']
similarity: 33.33%

ndss & usenix : ['Traynor', 'Shacham', 'Provos', 'Barth', 'Danezis',
'Borisov', 'Ganapathy', 'Garfinkel', 'Locasto']
similarity: 24.32%

ndss & ccs : ['Christodorescu', 'Kirda', 'Stavrou', 'Barth',
'Danezis', 'Borisov', 'Kohno', 'Ganapathy', 'Locasto', 'Cui']
similarity: 18.52%

s_p & usenix : ['Jha', 'Traynor', 'Giffin', 'Halderman', 'Shacham',
'Barth', 'Borisov', 'Locasto']
similarity: 16.67%

s_p & ccs : ['Halderman', 'Molnar', 'Waters', 'XWang', 'Stavrou',
'Barth', 'Gligor', 'Borisov', 'Kohno', 'Locasto', 'Yu', 'Kirda']
similarity: 22.22%

usenix & ccs : ['Sekar', 'Halderman', 'Gribble', 'Sailer', 'Barth',
'Danezis', 'Borisov', 'Ganapathy', 'Locasto', 'HWang']
similarity: 18.52%

.

I’m not sure how to interpret these numbers, they can mean lots of different things:

  • they might reflect the proportion of US/RestOfWorld academics working on computer security
  • or they might reflect the proportion of US/RestOfWorld academics working on software engineering (and therefore there is nothing specific to computer security)
  • causation 1: security conferences are top ranking when they have mostly US PC members
  • causation 2: top ranking conferences attract US PC members
  • what else?

Written by dan

February 25, 2010 at 13:46

Posted in useless

Better shellcode disassembly in Pym’s

leave a comment »

Pym’s is an x86 disassembler packaged as a web application. It now includes a new feature, that I like to call “shellcode disassembly for really lazy dudes”. Just paste a C char* in the textbox and let the magic occur.

For instance, try with this shellcode (chosen completely arbitrarily), and you should get something like this:

[0x00000000] 	jmp near 0x0000001e
[0x00000002] 	pop edx
[0x00000003] 	mov esi edx
[0x00000005] 	mov eax [dx]
[0x00000007] 	cmp ax 0x7dca
[0x0000000b] 	jnz 0x00000013
[0x0000000d] 	add ax 0x303
[0x00000011] 	mov [dx] eax
[0x00000013] 	inc dl
[0x00000015] 	cmp eax 0x41414141
[0x0000001a] 	jnz 0x00000005
[0x0000001c] 	jmp esi
[0x0000001e] 	call 0x00000002
[0x00000023] 	xor edx edx
[0x00000025] 	push 0xb
[0x00000027] 	pop eax
[0x00000028] 	cdq
[0x00000029] 	push edx
[0x0000002a] 	push 0x68732f2f
[0x0000002f] 	push 0x6e69622f
[0x00000034] 	mov ebx esp
[0x00000036] 	push edx
[0x00000037] 	push ebx
[0x00000038] 	mov ecx esp
[0x0000003a] 	retf 0x417d

Written by dan

February 15, 2010 at 15:55

Posted in code analysis

A note on the x86 semantics modeling in JPC

with one comment

JPC is a full PC emulator (à la Bochs but in Java), including the BIOS, VGA BIOS, floppy drive and other hardware components. Of particular interest to me, is the way x86 instructions are modeled and executed. It works and stays at the binary level (no fancy disassembly), and actually compiles x86 instructions to a simpler microcode language representing “atomic” instructions. This microcode language is then straightforward to execute, although a bit more complex than similar micro-languages (such as VEX, REIL or BIL).

The core of the x86 semantics is contained in the x86 to microcode compiler, found in org.jpc.emulator.memory.codeblock.optimised.ProtectedModeUDecoder.decodeOpcode(). This method takes a binary x86 instruction and decodes its prefices, opcode, modrm, sib, displacement and immediate parameters. Then it delegates the translation of the microcode to this sequence of methods:


writeInputOperands(prefices, opcode, modrm, sib, displacement, immediate);
writeOperation(prefices, opcode, modrm);
writeOutputOperands(prefices, opcode, modrm, sib, displacement);
writeFlags(prefices, opcode, modrm);
<span style="color: #000000;">

For instance, if we take the binary instruction 04 42 (add al, 0x42), it is decoded with opcode = 0x04 and immediate = 0x42. Then based on these values, the instruction is translated to the following microcode sequence:


// writeInputOperands:
LOAD0_AL
LOAD1_IB 0x42
// writeOperation:
ADD
// writeOutputOperands:
STORE0_AL
// writeFlags:
ADD_O8_FLAGS<span style="color: #ffffff;">
</span>

Now, understanding the semantics of an x86 instruction reduces to understanding the semantics of the microcode language. For this, we need the microcode interpreter, which is org.jpc.emulator.memory.codeblock.optimised.ProtectedModeUBlock.execute(). It is a relatively simple execution language (execution-wise), with 5 general-purpose registers but with roughly 750 opcodes. The execution of the above microcodes translates to this Java sequence:


reg0 = cpu.eax & 0xff;
reg1 = 0x42 & 0xff;
reg2 = reg0; reg0 = reg2 + reg1;
cpu.eax = (cpu.eax & ~0xff) | (reg0 & 0xff);
cpu.setZeroFlag((byte)reg0);
cpu.setParityFlag(reg0);
cpu.setSignFlag((byte)reg0);
cpu.setCarryFlag(reg0, Processor.CY_TWIDDLE_FF);
cpu.setAuxiliaryCarryFlag(reg2, reg1, result, Processor.AC_XOR);
cpu.setOverflowFlag(reg0, reg2, reg1, Processor.OF_ADD_BYTE);

Written by dan

February 11, 2010 at 14:15

A super quick and super dirty shellcode emulator

leave a comment »

Using JPC, it gives something like this:


import java.io.IOException;
import org.jpc.emulator.PC;
import org.jpc.emulator.processor.Processor;
import org.jpc.j2se.VirtualClock;

public class EmulateShellcode {
    public static void main(String[] args) throws IOException {
	VirtualClock clock = new VirtualClock();
	PC pc = new PC(clock, args);
	Processor cpu = pc.getProcessor();
	cpu.cs.setSelector(0);
	cpu.eip = 0;

	if(args.length == 0) {
	    System.out.println("Usage: java EmulateShellcode ");
	    System.out.println("*** examples:");
	    System.out.println("* add al, 0x42:\n\tjava EmulateShellcode 04 42");
	    System.out.println("* nop:\n\tjava EmulateShellcode 0x90");
	    System.out.println("* inc edx:\n\tjava EmulateShellcode 42h");
	    System.exit(1);
	}

	System.out.println("Loading shellcode in memory...");
	for(int i=0; i<args.length; i++) {
	    String str = args[i];
	    if(args[i].startsWith("0x"))
		str = args[i].substring(2);
	    else if(args[i].startsWith("\\x"))
		str = args[i].substring(2);
	    else if(args[i].endsWith("h"))
		str = args[i].substring(0, args[i].length()-1);

            <strong>// the byte at address i is the argument at position i cpu.linearMemory.setByte(i, (byte)Integer.parseInt(str, 16));</strong>
	}

	System.out.println("Before execution:");
	cpu.printState();
        <strong>// execute starting at address 0 cpu.linearMemory.executeProtected(cpu, 0);</strong>
	System.out.println("\nAfter execution:");
	cpu.printState();
    }
}

Written by dan

February 10, 2010 at 15:13

Posted in code analysis

Creating a toy virtual machine with PyPy

with 14 comments

Here, you can use “virtual machine” as in “Java Virtual Machine”, not as in virtualization. We will play with the virtual machine described in the paper Tracing the Meta-Level: PyPy’s Tracing JIT Compiler by C.F. Bolz, A. Cuni, M. Fijalkowski and A. Rigo (it’s a great read by the way).

PyPy is a fascinating project, too complex to describe here. Among other things, PyPy can take any interpreter written in a subset of Python, translate it to C, and automatically generate a JIT compiler for this language. Does it sound too good to be true? Let’s try this.

  • grab PyPy source code
  • create the interpreter in pypy/translator/goal/target-toy.py with the following code:

</pre>
<pre style="text-align: justify;">import os, sys
import autopath
import py

# these are the opcodes for the interpreted language
JUMP_IF_A  = 1
MOV_A_R    = 2
MOV_R_A    = 3
ADD_R_TO_A = 4
DECR_A     = 5
RETURN_A   = 6

<em>from pypy.rlib.jit import JitDriver tlrjitdriver = JitDriver(greens = ['pc', 'bytecode'], reds = ['a', 'regs'])</em>

# the main interpreter loop
def interpret(bytecode, a):
   regs = [0] * 256
   pc = 0
   while True:
<em> tlrjitdriver.jit_merge_point(bytecode=bytecode, pc=pc, a=a, regs=regs) </em>       opcode = bytecode[pc]
       pc += 1
       if opcode == JUMP_IF_A:
           target = bytecode[pc]
           pc += 1
           if a:
<em> if target<pc: tlrjitdriver.can_enter_jit(bytecode=bytecode, pc=target, a=a, regs=regs) </em>               pc = target
       elif opcode == MOV_A_R:
           n = bytecode[pc]
           pc += 1
           regs[n] = a
       elif opcode == MOV_R_A:
           n = bytecode[pc]
           pc += 1
           a = regs[n]
       elif opcode == ADD_R_TO_A:
           n = bytecode[pc]
           pc += 1
           a += regs[n]
       elif opcode == DECR_A:
           a -= 1
       elif opcode == RETURN_A:
           return a

# __________  Entry point  __________
def entry_point(argv):
    # the program we want to interpret
    # it computes the square of its argument
    bytecode = [
        MOV_A_R,    0, # i = a
        MOV_A_R,    1, # copy of ’a’
        # 4:
        MOV_R_A,    0, # i--
        DECR_A,
        MOV_A_R,    0,
        MOV_R_A,    2, # res += a
        ADD_R_TO_A, 1,
        MOV_A_R,    2,
        MOV_R_A,    0, # if i!=0: goto 4
        JUMP_IF_A,  4,
        MOV_R_A,    2,
        RETURN_A
    ]
    result = interpret(bytecode, int(argv[1]))
    print result
    return 0

def jitpolicy(driver):
    from pypy.jit.metainterp.policy import JitPolicy
    return JitPolicy()

# _____ Define and setup target ___
def target(*args):
    return entry_point, None

# main function, if this script is called from the command line
if __name__ == '__main__':
    entry_point(sys.argv)</pre>
<div><span style="color: #000000;">

  • the lines in italic are the annotations for the JIT compiler. We need to give PyPy some insight on the interpreted language by declaring what is the instruction pointer (the green variables), the beginning of the dispatch loop and the backward branches (see the paper for full details).
  • check that you can execute this script correctly by running python target-toy.py 12, the output should be 144
  • PyPy can translate this script in C. For this, first install the dependencies and then run the following command: python translate.py target-toy.py
  • this should give you an executable target-toy-c, rename it target-toy-native and check that ./target-toy-native 12 yields 144
  • now we can ask PyPy to translate target-toy.py in C and generate a JIT compiler for it. For this, we just run python translate.py --opt=jit target-toy.py
    • note: the 64-bit backend of PyPy is not implemented yet, so if you are on a 64-bit system, you will have to struggle a bit. You will have to use a 32-bit Python interpreter (see my former post), create an alias for gcc -m32 (let’s call it gcc32) and then pass the option –cc=gcc32 to translate.py.
  • this should give you another target-toy-c executable, rename it to target-toy-jit and check that ./target-toy-jit 12 yields 144

Ok, everything is working, so let’s now see how all this performs by computing large squares:

~/pypy-trunk/pypy/translator/goal$ time python target-toy.py 1000000
1000000000000
real 0m18.637s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-native 1000000
-727379968
real 0m0.024s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-jit 1000000
-727379968
[...]
real 0m0.005s

The first run is the square program interpreted by our program, itself interpreted by the Python interpreter. Double interpretation is slow.

The second run is the square program interpreted by a native version of our interpret function. Interpretation by native code is ok.

The third run is the square program interpreted and JIT’ed on the fly. It’s super awesome :)

Final note: I must thank everybody from #pypy on freenode, for their help and resilience to stupid questions. Thanks guys!

Written by dan

February 8, 2010 at 19:30

Posted in code analysis

How to build 32-bit Python on Ubuntu 9.10 x86_64

with one comment

This might be useful to someone:
$ cd Python-2.6.4
$ CC="gcc -m32" LDFLAGS="-L/lib32 -L/usr/lib32 \
-L`pwd`/lib32 -Wl,-rpath,/lib32 -Wl,-rpath,/usr/lib32" \
./configure --prefix=/opt/pym32
$ make 
$ sudo make install

Now, to check if the build is correct, try to check the max integer size on your standard 64-bit interpreter and in your fresh 32-bit binary:

$ python -c "import sys; print sys.maxint"
9223372036854775807
$ /opt/pym32/bin/python -c "import sys; print sys.maxint"
2147483647

Written by dan

February 8, 2010 at 10:42

Posted in python

Getting Started with Savarin

with 6 comments

(disclaimer: the author of Savarin, Matthieu Kaczmarek, is a colleague working in the office next door and a friend of mine)

Savarin is a free online binary classification service (you can think of it as automatic diff’ing against large databases of programs). It is in beta, not fully polished yet, but you can still squeeze some interesting results out of it. Here is your daily shot of binary analysis, freshly brewed.

You will need:

  • 2 different malware samples in the same malware family. We are going to use Sasser.A (already in Savarin’s database) and an unpacked Sasser.G (md5 b973853d0863070aca89ce00d4ee0fb9 [offensivecomputing.net])
  • IDA with IDAPython for the actual diff’ing (I have IDA 5.5, I don’t know if this works with the free version)

Let’s go:

  1. open Savarin
  2. in “Classification against custom database”, choose SasserA
  3. upload the Sasser.G sample
  4. in the results page, click More to see the similarity with other binaries in the Sasser family
  5. you can see that the sample is 41.95% similar to a sample with md5 edc66a4031f5a41f9ddf08595a1d4c92

At this point, you have a classification of a sample against a (small) database of programs. You can therefore see the distance between this sample and other samples. If you ask me, it’s a lot better to see that unknownsample.exe is 80% similar to badguy.exe and 90% similar to badguy2.0.exe than just “infected” or “not infected”.

For the actual diff’ing, follow these steps:

  1. open the Sasser.G sample in IDA
  2. download the IDAPython analysis report on Savarin’s analysis page (this report contains all the data needed to visualize the binary differences in IDA)
  3. execute the IDAPython analysis report
  4. right now, the situation is pretty anticlimactic since you should see no change apart from a few lines in the console. Wait until next step for the interesting stuff. Yes, you had nothing to do in this step, so what?
  5. type SavColor(‘md5.edc66a4031f5a41f9ddf08595a1d4c92′, 0x0088ff) in the IDAPython console (it is the md5 value of the Sasser.A sample)
  6. type SavComment(‘md5.edc66a4031f5a41f9ddf08595a1d4c92′) in the IDAPython console
  7. this is it, now you can browse the Sasser.G sample, and the common parts with Sasser.A will be colored. Additionally, for two matching instructions you will see the corresponding address in the Sasser.A sample.

The Fine Screenshots:

Written by dan

January 20, 2010 at 10:39

Posted in code analysis, malware

1998 has called and they want their software back

with 2 comments

“The need to improve current industrial software engineering practice is widely felt. To mention only a few striking examples: too many software projects fail and are canceled before completion or are not delivered on time or exceed their budget the energy spent on testing code is ever increasing and tends to represent more than half of the entire development cost, the number of errors found in complex software is ofter rather high, there is almost no software warranty whatsoever, but again and again the world is surprised by Trojan horses and security holes, etc.”

Abstract State Machines, Egon Börger and Robert Stärk, 1998

Written by dan

January 15, 2010 at 12:44

Posted in useless

Packers, egg, sausage and packers

with 2 comments

Thanks to Silvio Cesare and Felix Gröbert, I now have 44 species in my packer zoo. I tested TraceSurfer on these 44 packers, here are the full results and the visualizations.

(quick recap: TraceSurfer uses Pin to trace every instruction in target binaries, extracting a trace file. Then the trace is analysed (or surfed) to detect layers of self-modifying code (or code waves) and code protection patterns)

Highlights:

  • only 35 binaries execute correctly on my machine
  • Pin v. 31933 works on 30 of these binaries (success rate of 85.71%)
  • some packers don’t seem to pack anything since they have only 1 code wave (!Epack Lite 1.4, Pepack, VmProtect). Either the binaries are indeed packed and I fail to detect the self-modifying code, or they’re not self-modifying at all. It’s probably the latter, since the trace sizes are very close to the original (unpacked) binary.
  • very few packers seem to use code scrambling (supposedly an advanced anti-dumping technique): Acprotect, Petite and the Yoda family
  • integrity checking seems to be more popular, it is used in Acprotect, Themida, Enigma, Pelock, NSPack…
  • given the trace size, it seems that anti-emulation loops can be found in Morphine (164M instructions) and !Epack 1.0 (87M instructions)
  • nice similarities between Yoda Crypter 1.3 and Yoda Protector 1.4
  • the Monster Award is attributed to… Themida 1.8.5.2 (80M instructions, 31 code waves, 142890 decrypted bytes + integrity checking)

Written by dan

December 21, 2009 at 12:58

Posted in code analysis

Follow

Get every new post delivered to your Inbox.