Following Flow Dependences with Metasm

Metasm is an LGPL’ed assembly manipulation framework written in Ruby. It is capable of following flow dependences along multiple paths, a feature called backtracking in metasm jargon and close to the notion of program slicing [].

Definition: a node j is flow-dependent on a node i if (1) a variable x is referenced at j, (2) x is defined at i and (3) there exists a path from i to j without intervening definitions of x. Source []

We are going to use the following ruby script to test the backtracker with different source programs, it takes as input a program (sourcecode), a node (labeled ‘sliceme’) and a variable (eax) and returns the flow-dependences of eax and in some cases, the actual values that eax will take.

require 'metasm'
include Metasm

def sliceit(asmsource, label, reg)
    # encode the shellcode
    sc = Shellcode.assemble(, asmsource)
    dasm = sc.disassemble(0)

    # get the address of the label
    offset = sc.encoded.export[label]

    reg = reg.to_sym

    # backtrace
    log = []
    dasm.backtrace(reg, offset, :log => log)

    # return the trace

sourcecode = <<EOS
; asm program here

slice = sliceit(sourcecode, 'sliceme', 'eax')
# slice est un tableau qui contient le log du backtrace
slice.each { |ev, *args|
    if ev == :di    # decodedinstruction
        after = args[0]
        before = args[1]
        instr = args[2]
        puts "#{instr} [#{before} -> #{after}]"

Let’s take some simple code samples. If there is a single execution path, metasm can compute the value contained in the variable, for instance:single_path

mov eax, 42h
inc eax
sliceme: jmp eax

-> metasm returns:
5 inc eax [eax -> eax+1]
0 mov eax, 42h [eax+1 -> 43h]

We can see that useless definitions of the variable are not taken into account:

mov eax, 0h
mov eax, 42h
inc eax
sliceme: jmp eax

-> metasm returns:
0ah inc eax [eax -> eax+1]
5 mov eax, 42h [eax+1 -> 43h]

Definitions of other variables are also ignored (if eax does not depend on them):

mov eax, 42h
mov ebx, 0
sliceme: jmp eax

-> metasm returns:
0 mov eax, 42h [eax -> 42h]

But they are included if eax depends on them:

mov ebx, 1h
mov eax, 42h
add eax, ebx
sliceme: jmp eax

-> metasm returns:
0ah add eax, ebx [eax -> eax+ebx]
5 mov eax, 42h [eax+ebx -> ebx+42h]
0 mov ebx, 1 [ebx+42h -> 43h]

If there are multiple acyclic paths, metasm will be able to compute the value of the sliced variable along each path, for instance:multiple

mov eax, 42h
cmp ebx, 0
jnz pouet
dec eax
jmp sliceme
pouet: inc eax
sliceme: jmp eax

-> metasm returns:
0dh inc eax [eax -> eax+1]
0 mov eax, 42h [eax+1 -> 43h]
0ah dec eax [eax -> eax-1]
0 mov eax, 42h [eax-1 -> 41h]

Metasm states that at the end, eax will be either 41h or 43h but it doesn’t know which one. It doesn’t state it explicitly, but the value of eax is control-dependent on ebx.

Finally in the worst case scenario, if there are cyclic paths (such as loops), metasm will go through each path once but will be unable to compute the value after the cycle. Note that in the general case, this is undecidable.cycle

mov eax, 0
entry: cmp ebx, 0
jnz sliceme

dec ebx
inc eax
jmp entry

sliceme: jmp eax

-> metasm returns:
0bh inc eax [eax -> eax+1]
0 mov eax, entrypoint_0 [eax -> 0]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s