Metasm is an LGPL’ed assembly manipulation framework written in Ruby. It is capable of following flow dependences along multiple paths, a feature called backtracking in metasm jargon and close to the notion of program slicing [wikipedia.org].
Definition: a node j is flow-dependent on a node i if (1) a variable x is referenced at j, (2) x is defined at i and (3) there exists a path from i to j without intervening definitions of x. Source [research.ibm.com]
We are going to use the following ruby script to test the backtracker with different source programs, it takes as input a program (sourcecode), a node (labeled ‘sliceme’) and a variable (eax) and returns the flow-dependences of eax and in some cases, the actual values that eax will take.
require 'metasm' include Metasm def sliceit(asmsource, label, reg) # encode the shellcode sc = Shellcode.assemble(Ia32.new, asmsource) dasm = sc.disassemble(0) # get the address of the label offset = sc.encoded.export[label] reg = reg.to_sym # backtrace log = [] dasm.backtrace(reg, offset, :log => log) # return the trace log end sourcecode = <<EOS ; asm program here EOS slice = sliceit(sourcecode, 'sliceme', 'eax') # slice est un tableau qui contient le log du backtrace slice.each { |ev, *args| if ev == :di # decodedinstruction after = args[0] before = args[1] instr = args[2] puts "#{instr} [#{before} -> #{after}]" end# }
Let’s take some simple code samples. If there is a single execution path, metasm can compute the value contained in the variable, for instance:
mov eax, 42h inc eax sliceme: jmp eax -> metasm returns: 5 inc eax [eax -> eax+1] 0 mov eax, 42h [eax+1 -> 43h]
We can see that useless definitions of the variable are not taken into account:
mov eax, 0h mov eax, 42h inc eax sliceme: jmp eax -> metasm returns: 0ah inc eax [eax -> eax+1] 5 mov eax, 42h [eax+1 -> 43h]
Definitions of other variables are also ignored (if eax does not depend on them):
mov eax, 42h mov ebx, 0 sliceme: jmp eax -> metasm returns: 0 mov eax, 42h [eax -> 42h]
But they are included if eax depends on them:
mov ebx, 1h mov eax, 42h add eax, ebx sliceme: jmp eax -> metasm returns: 0ah add eax, ebx [eax -> eax+ebx] 5 mov eax, 42h [eax+ebx -> ebx+42h] 0 mov ebx, 1 [ebx+42h -> 43h]
If there are multiple acyclic paths, metasm will be able to compute the value of the sliced variable along each path, for instance:
mov eax, 42h cmp ebx, 0 jnz pouet dec eax jmp sliceme pouet: inc eax sliceme: jmp eax -> metasm returns: 0dh inc eax [eax -> eax+1] 0 mov eax, 42h [eax+1 -> 43h] 0ah dec eax [eax -> eax-1] 0 mov eax, 42h [eax-1 -> 41h]
Metasm states that at the end, eax will be either 41h or 43h but it doesn’t know which one. It doesn’t state it explicitly, but the value of eax is control-dependent on ebx.
Finally in the worst case scenario, if there are cyclic paths (such as loops), metasm will go through each path once but will be unable to compute the value after the cycle. Note that in the general case, this is undecidable.
mov eax, 0 entry: cmp ebx, 0 jnz sliceme dec ebx inc eax jmp entry sliceme: jmp eax -> metasm returns: 0bh inc eax [eax -> eax+1] 0 mov eax, entrypoint_0 [eax -> 0]