The Mythical Man-Month Reading Notes

  • the man-month is a fallacious and dangerous myth, for it implies that men and months are interchangeable
  • Brooks’s Law: adding manpower to a late project makes it later (effort is needed to repartition the work, train the new people, and more time is spent on intercommunication)
  • “Good cooking takes time” (Restaurant Antoine’s Chef)
  • discipline is good for art: providing an architecture enhances the creative style of an implementing group
  • the purpose of organization is to reduce the amount of communication and coordination necessary
  • “The programmer delivers satisfaction of a user need rather than any tangible product” (Cosgrove)
  • all repairs tend to destroy structure, to increase the entropy of a system
  • “How does one project get to be a year late? One day at a time”
  • the first step in controlling a project on a tight schedule is to have a schedule with concrete, measurable milestones. A programmer will rarely lie about a milestone if it’s so sharp he can’t deceive himself.
  • chronic schedule slippage is a morale-killer: “if you miss one deadline, make sure you make the next one” (McCarthy)

A note on the x86 semantics modeling in JPC

JPC is a full PC emulator (à la Bochs, but in Java), including the BIOS, VGA BIOS, floppy drive and other hardware components. Of particular interest to me is the way x86 instructions are modeled and executed. It works at the binary level (no fancy disassembly) and compiles x86 instructions to a simpler microcode language of “atomic” operations. This microcode language is then straightforward to execute, although it is a bit more complex than similar micro-languages (such as VEX, REIL or BIL).

The core of the x86 semantics is contained in the x86-to-microcode compiler, found in org.jpc.emulator.memory.codeblock.optimised.ProtectedModeUDecoder.decodeOpcode(). This method takes a binary x86 instruction and decodes its prefixes, opcode, modrm, sib, displacement and immediate parameters. It then delegates the translation to microcode to this sequence of methods:

writeInputOperands(prefices, opcode, modrm, sib, displacement, immediate);
writeOperation(prefices, opcode, modrm);
writeOutputOperands(prefices, opcode, modrm, sib, displacement);
writeFlags(prefices, opcode, modrm);

For instance, if we take the binary instruction 04 42 (add al, 0x42), it is decoded with opcode = 0x04 and immediate = 0x42. Then based on these values, the instruction is translated to the following microcode sequence:

// writeInputOperands:
LOAD0_AL
LOAD1_IB 0x42
// writeOperation:
ADD
// writeOutputOperands:
STORE0_AL
// writeFlags:
ADD_O8_FLAGS

Now, understanding the semantics of an x86 instruction reduces to understanding the semantics of the microcode language. For this, we need the microcode interpreter, which is org.jpc.emulator.memory.codeblock.optimised.ProtectedModeUBlock.execute(). It is a relatively simple language to execute, with 5 general-purpose registers, but it has roughly 750 opcodes. The execution of the above microcodes translates to this Java sequence:

reg0 = cpu.eax & 0xff;
reg1 = 0x42 & 0xff;
reg2 = reg0; reg0 = reg2 + reg1;
cpu.eax = (cpu.eax & ~0xff) | (reg0 & 0xff);
cpu.setCarryFlag(reg0, Processor.CY_TWIDDLE_FF);
cpu.setAuxiliaryCarryFlag(reg2, reg1, reg0, Processor.AC_XOR);
cpu.setOverflowFlag(reg0, reg2, reg1, Processor.OF_ADD_BYTE);
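Those setFlag helpers hide the actual flag equations behind constants like CY_TWIDDLE_FF. As a sanity check, here is the full semantics of add al, imm8 re-modeled in Python — my own sketch, not JPC code; the flag formulas are the standard x86 ones for an 8-bit add:

```python
def add_al_imm8(eax, imm):
    """Model 'add al, imm8' the way JPC's microcode sequence executes it."""
    reg0 = eax & 0xff                     # LOAD0_AL
    reg1 = imm & 0xff                     # LOAD1_IB
    reg2 = reg0
    reg0 = reg2 + reg1                    # ADD (result may exceed 8 bits)
    eax = (eax & ~0xff) | (reg0 & 0xff)   # STORE0_AL
    flags = {
        "CF": reg0 > 0xff,                                  # carry out of bit 7
        "AF": ((reg2 ^ reg1 ^ reg0) & 0x10) != 0,           # carry out of bit 3
        "OF": ((reg2 ^ reg0) & (reg1 ^ reg0) & 0x80) != 0,  # signed overflow
        "ZF": (reg0 & 0xff) == 0,
        "SF": (reg0 & 0x80) != 0,
    }
    return eax, flags

eax, flags = add_al_imm8(0x000000f0, 0x42)
# 0xf0 + 0x42 = 0x132, so al becomes 0x32 with the carry flag set
```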

Python Idioms: + versus join

I was told to use ''.join() instead of the ‘+’ operator in Python. However, a (bad) benchmark showed ‘+’ to be a lot faster. I think it is reasonable to say that in some cases ‘+’ is faster; here is my test:

def test0(b, c, d, e, f):
    for i in xrange(10**7):
        a = b + c + d + e + f

def test1():
    l = ['hello ', 'world ', 'with ', '+ ', 'operator']
    for i in xrange(10**7):
        a = ''
        for j in l:
            a += j

def test2():
    l = ['hello', 'world', 'with', 'join', 'function']
    for i in xrange(10**7):
        a = ' '.join(l)

test0('hello ', 'world ', 'with ', '+ ', 'operator')
test1()
test2()

And the result of the test:

$ python -m cProfile -s cumulative
hello world with + operator
hello world with + operator
hello world with join function

   10000007 function calls in 14.968 CPU seconds
   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    6.838    6.838    6.838    6.838
        1    2.683    2.683    5.113    5.113
        1    3.016    3.016    3.016    3.016

So clearly the worst way of using ‘+’ is when iterating over a list of strings and accumulating the concatenations in a variable (function test1). But there is nothing wrong with performing multiple ‘+’ operations in a single line and then storing the result in a variable (function test0).

A quick look at the bytecode of the function confirms this intuition: we can see a bunch of LOADs and ADDs and only one STORE:

>>> import dis
>>> dis.dis(test0)
             19 LOAD_FAST                0 (b)
             22 LOAD_FAST                1 (c)
             25 BINARY_ADD          
             26 LOAD_FAST                2 (d)
             29 BINARY_ADD          
             30 LOAD_FAST                3 (e)
             33 BINARY_ADD          
             34 LOAD_FAST                4 (f)
             37 BINARY_ADD          
             38 STORE_FAST               6 (a)

The test was performed with Python 2.5.4 on Debian sid. It would be nice to see whether the results hold for newer versions of the Python interpreter.
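For anyone wanting to re-run the comparison on a modern interpreter, here is a rough timeit version of the same two patterns (Python 3 syntax, so range instead of xrange, and a smaller iteration count):

```python
import timeit

l = ['hello', 'world', 'with', 'join', 'function']

def concat_plus():
    # accumulate with '+=' in a loop, the pattern from test1
    a = ''
    for j in l:
        a += j
    return a

def concat_join():
    # a single join call, the pattern from test2
    return ' '.join(l)

for fn in (concat_plus, concat_join):
    print(fn.__name__, timeit.timeit(fn, number=10**5))
```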

A Quick Survey on Intermediate Representations for Program Analysis

This is mostly a note to myself, but I guess people interested in automating reverse engineering will at some point be interested in IRs suitable for low-level abstractions. I consider both top-down IRs used by optimizing compilers and bottom-up IRs used by decompilers and other reversing tools.

Intermediate Representations for Reverse Engineering

REIL. Used in BinNavi, the Reverse Engineering Intermediate Language defines a very simple RISC architecture (17 instructions), with the nice property that each instruction has at most one side effect. Thomas Dullien and Sebastian Porst recently presented an abstract interpretation framework for REIL at CanSecWest (paper, slides). Given x86 -> REIL and REIL -> x86 translators, it becomes easy to write analyses and transformation passes on REIL without getting into the complexity of the whole x86 architecture.

Here are some sample REIL instructions:

1006E4B00: str edi, , edi
1006E4D00: sub esp, 4, esp
1006E4D01: and esp, 4294967295, esp
1006E4D02: stm ebp, , esp

Language Reference
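The “at most one side effect” property is what makes a REIL-like IR so pleasant to analyze: the dispatch logic stays trivial. Here is a toy Python interpreter for just the opcodes in the sample above — my own sketch, not BinNavi's actual representation:

```python
# Toy interpreter for a REIL-like subset: each instruction has at most
# one side effect, so executing it is a flat dispatch on the opcode.
regs = {'edi': 0x1000, 'esp': 0x2000, 'ebp': 0xdead}
mem = {}

def val(operand):
    # an operand is either a register name or an integer literal
    return regs[operand] if operand in regs else int(operand)

def execute(addr, opcode, op1, op2, op3):
    if opcode == 'str':      # copy op1 into register op3
        regs[op3] = val(op1)
    elif opcode == 'sub':    # op3 = op1 - op2
        regs[op3] = val(op1) - val(op2)
    elif opcode == 'and':    # op3 = op1 & op2
        regs[op3] = val(op1) & val(op2)
    elif opcode == 'stm':    # store op1 to memory at address op3
        mem[val(op3)] = val(op1)

program = [
    (0x1006E4B00, 'str', 'edi', '', 'edi'),
    (0x1006E4D00, 'sub', 'esp', '4', 'esp'),
    (0x1006E4D01, 'and', 'esp', '4294967295', 'esp'),
    (0x1006E4D02, 'stm', 'ebp', '', 'esp'),
]
for insn in program:
    execute(*insn)
# esp ends up at (0x2000 - 4) & 0xffffffff = 0x1ffc, with ebp stored there
```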

Hex-Rays Microcode. Presented at Black Hat USA 2008 by Ilfak Guilfanov (paper, slides), it is an IR used during decompilation. From the paper: “The microcode language is very detailed and precisely represents how each instruction modifies the memory, registers, and processor condition codes. Typically one CPU instruction is converted into 5-15 microinstructions”. According to the REIL paper, REIL and the microcode language are significantly different: for instance, microinstructions can have a variable number of operands and perform multiple side effects.

Sample microcode:

mov esi.4, eoff.4
mov ds.2, seg.2
add eoff.4, #4.4, eoff.4
ldx seg.2, eoff.4, et1.4
mov et1.4, eax.4

I couldn’t find the language reference.

ELIR. Part of the ERESI project, the goal of ELIR is to simplify static analysis by providing a platform independent abstraction. An overview was presented at Ekoparty08 (slides) and some ideas appeared in Phrack 64, but 30s of Googling didn’t get me to the language reference or a code sample, so that’s all I will say about ELIR for the moment.

Pin Inspection API. PIN, Intel’s Dynamic Binary Instrumentation framework, provides a very handy instruction-inspection API. This is not an IR, but it provides the same type of information about complex instructions without having to write giant switch statements. For instance, this is the way to log memory writes with PIN given an instruction:

VOID RecordMemWrite(VOID * addr, UINT32 size) {
    fprintf(trace, ",%dW%p", size, addr);
}

// this function is called each time an instruction is encountered
VOID Instruction(INS ins, VOID *v) {
    // isn't that a nice API ?
    if (WRITES && INS_IsMemoryWrite(ins))
        INS_InsertPredicatedCall(
            ins, IPOINT_BEFORE, (AFUNPTR)RecordMemWrite,
            IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE, IARG_END);
}

API Documentation

Valgrind IR. On my todo list.

FermaT Transformation System. I’ll have to write something about it someday. Oh lucky you, a wikipedia entry and a bunch of papers!

Optimizing Compilers Intermediate Representations

LLVM Bitcode. This language uses low-level RISC-like instructions in SSA form with type information. It is clean and well defined, and is a very suitable target for platform-independent analysis and optimization. It is designed to convey high-level information in lower level operations, so converting machine code to LLVM bitcode probably requires some intensive work.

Here is the hello world example:

; Declare the string constant as a global constant...
@.LC0 = internal constant [13 x i8] c"hello world\0A\00"

; External declaration of the puts function
declare i32 @puts(i8 *)                                           

; Definition of main function
define i32 @main() {
        ; Convert [13 x i8]* to i8  *...
        %cast210 = getelementptr [13 x i8]* @.LC0, i64 0, i64 0 ; i8 *

        ; Call puts function to write out the string to stdout...
        call i32 @puts(i8 * %cast210)
        ret i32 0
}

Language reference

Register Transfer Language. One of the IRs used in GCC, it is an architecture-neutral assembly language that represents instructions in a LISP-like form (d’oh), like this:

(insn 2 49 3 test.c:3 (set (mem/c/i:SI (plus:DI (reg/f:DI 6 bp)
                (const_int -20 [0xffffffffffffffec])) [0 argc+0 S4 A32])
        (reg:SI 5 di [ argc ])) 47 {*movsi_1} (nil))

It feels a bit old-fashioned and less clean than LLVM bitcode, but this is just a gut feeling. Use gcc -fdump-rtl-all to see what it looks like.

Side note: the idea of dumping RTL to a file, performing transformations on it and feeding it back to GCC is quite common, but RMS qualifies it as “not feasible”, even though the creator of RTL says it is not only feasible but actually quite useful.

Python XML for Real Men Cheat Sheet

Howdy again,

Now let’s do some serious XML output in Python using cElementTree.

# this is included in Python 2.5+
from xml.etree.cElementTree import ElementTree, Element, dump

# let's create the root element
root = Element("teabag")

# give it a child with an attribute
child1 = Element("spam")
child1.attrib["name"] = "value"
root.append(child1)

# and a child with text content
child2 = Element("eggs")
child2.text = "spam and eggs"
root.append(child2)

# print the whole thing to stdout
dump(root)

# or to a file
ElementTree(root).write("output.xml")

See the author’s website for downloads and usage information of cElementTree.
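If you want the serialized document as a string rather than on stdout or in a file, tostring does the job. Same tree as above, shown here with xml.etree.ElementTree, which shares the API (and replaces cElementTree in Python 3):

```python
from xml.etree.ElementTree import Element, tostring

# rebuild the teabag/spam/eggs tree from the snippet above
root = Element("teabag")
child1 = Element("spam")
child1.attrib["name"] = "value"
root.append(child1)
child2 = Element("eggs")
child2.text = "spam and eggs"
root.append(child2)

# serialize to a string instead of a file or stdout
print(tostring(root))
```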

Using this API, I was able to create a 47 MB XML file in a few minutes, burning roughly 300 MB of heap space. This XML file represents a graph of graphs, namely the control flow graph of each function in an IDA Pro database. Here are some screenshots, using yEd for the visualization part:

Python XML Cheat Sheet

[UPDATED] Finally, xml.dom.minidom sucks balls: it can burn hundreds of megabytes, even gigabytes, of sweet memory when working with “large” XML files (10 MB or more). See this post for a really lightweight alternative.


Here is a quick reference of how to create an XML document and output it in Python.

import xml.dom.minidom

# create the document
doc = xml.dom.minidom.Document()

# populate it with a root element
root = doc.createElement("teabag")
doc.appendChild(root)

# time to give some children to the root element, one with an attribute for instance
child1 = doc.createElement("spam")
child1.setAttribute("name", "value")
root.appendChild(child1)

# and another one with some text
child2 = doc.createElement("eggs")
text = doc.createTextNode("spam and eggs!")
child2.appendChild(text)
root.appendChild(child2)

# let's get the output, as a string
print doc.toprettyxml()

# you're supposed to get the following output:
#<?xml version="1.0" ?>
#<teabag>
#    <spam name="value"/>
#    <eggs>
#        spam and eggs!
#    </eggs>
#</teabag>

How nice is that? Yep, a lot.