I have been working with my PhD supervisor on a dynamic typing system to detect and visualize the temporal evolution of self-modifying programs (it’s not as complicated as it sounds). The typing system works as follows:
- each memory address has a read, write and execution level (r, w, x)
- initially, every memory address begins with type (0, 0, 0)
- when an address with type (r, w, x) is executed, its type becomes (r, w, w+1)
- when an instruction with type (r1, w1, x1) reads a memory address with type (r2, w2, x2), the target address type becomes (x1, w2, x2)
- when an instruction with type (r1, w1, x1) writes to a memory address with type (r2, w2, x2), the target address type becomes (r2, x1, x2)
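The rules above can be sketched as three pure functions over (r, w, x) triples. This is only a minimal illustration, not our actual implementation: the names `Type`, `on_execute`, `on_read` and `on_write` are made up for the example.

```python
from typing import NamedTuple


class Type(NamedTuple):
    """Hypothetical container for the (r, w, x) levels of one address."""
    r: int = 0
    w: int = 0
    x: int = 0


def on_execute(t: Type) -> Type:
    # Executing an address sets its execution level to write level + 1.
    return Type(t.r, t.w, t.w + 1)


def on_read(insn: Type, target: Type) -> Type:
    # The read level of the target becomes the execution level of the reader.
    return Type(insn.x, target.w, target.x)


def on_write(insn: Type, target: Type) -> Type:
    # The write level of the target becomes the execution level of the writer.
    return Type(target.r, insn.x, target.x)


# A level-1 instruction writes fresh memory, which is then executed.
unpacker = on_execute(Type())         # Type(r=0, w=0, x=1)
payload = on_write(unpacker, Type())  # Type(r=0, w=1, x=0)
payload = on_execute(payload)         # Type(r=0, w=1, x=2)
```

The last three lines show why code written by the program ends up one level above the code that wrote it.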
With that we can get a trace from a program (with DBI, an emulator, a debugger, whatever) and see what is executed (execution level >= 1). By construction, if we have code with an execution level of 2, it means that it has been written by the program itself before being executed, therefore it is self-modifying code.
Again by construction, if we see code with an execution level k+1, it means that it has been written by code at level k. Hence we can precisely distinguish between different layers of code (in our jargon, different code waves).
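To make the wave idea concrete, here is a rough sketch that replays a trace and groups executed addresses by execution level. The trace format (tuples of instruction address, operation, target address) and the function name `code_waves` are assumptions made for this example; a real trace from DBI or an emulator would need adapting.

```python
from collections import defaultdict


def code_waves(trace):
    """Replay a trace and group executed addresses by execution level.

    Each trace entry is a hypothetical (insn_addr, op, target_addr) tuple,
    where op is "r" for a read, "w" for a write, or None for no memory access.
    """
    types = defaultdict(lambda: (0, 0, 0))  # address -> (r, w, x)
    waves = defaultdict(set)                # execution level -> addresses

    for insn, op, target in trace:
        # Executing the instruction: its x level becomes w + 1.
        r, w, _ = types[insn]
        x1 = w + 1
        types[insn] = (r, w, x1)
        waves[x1].add(insn)

        if op == "r":
            _, w2, x2 = types[target]
            types[target] = (x1, w2, x2)
        elif op == "w":
            r2, _, x2 = types[target]
            types[target] = (r2, x1, x2)

    return dict(waves)


# Three layers: 0x1000 writes 0x2000, which writes 0x3000, which then runs.
trace = [
    (0x1000, "w", 0x2000),
    (0x2000, "w", 0x3000),
    (0x3000, None, None),
]
waves = code_waves(trace)
# waves == {1: {0x1000}, 2: {0x2000}, 3: {0x3000}}
```

Note that in this sketch an address executed again after being overwritten shows up in more than one wave.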
Now we can detect some interesting properties based on the type of memory addresses:
- if an address has been read, written and then executed (RWX), we label it decrypted
- if an address has only been written and executed (WX), we label it blind write
- if an address has been executed and then read (XR), we assume there has been an integrity check
- if an address has been executed and then written (XW), we assume the code has been scrambled (supposedly as an anti-memory-dump technique)
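As an illustration of the four labels, the sketch below classifies an address from the order of its events. This is a simplification and an assumption on my part: it works on a per-address history string made of the letters R, W and X rather than on the (r, w, x) type itself, and the names `is_subsequence` and `labels` are invented for the example.

```python
def is_subsequence(pattern, history):
    """True if the events in pattern occur in history in that order."""
    events = iter(history)
    # "event in events" consumes the iterator up to the first match,
    # so each pattern letter must appear after the previous one.
    return all(event in events for event in pattern)


def labels(history):
    """Return the set of labels for one address, given its event history."""
    found = set()
    if is_subsequence("RWX", history):
        found.add("decrypted")
    elif is_subsequence("WX", history):
        found.add("blind write")
    if is_subsequence("XR", history):
        found.add("integrity check")
    if is_subsequence("XW", history):
        found.add("scrambled")
    return found


assert labels("RWX") == {"decrypted"}
assert labels("WX") == {"blind write"}
assert labels("XR") == {"integrity check"}
assert labels("WXW") == {"blind write", "scrambled"}
```

An address can carry several labels at once, for instance code that is blindly written, executed, and later overwritten.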
Therefore we have a way to trace different layers of code, and some relations between the layers (decryption, blind writes, integrity checking and code scrambling). This gives us the following visualization for some packers:
Note 1: thanks to Silvio Cesare for providing the packed samples
Note 2: we are going to present all this stuff at Malware (Montréal) with Jean-Yves Marion and Wadie Guizani, and at Deepsec (Vienna)
The graph visualizations look great. I’m looking forward to reading the upcoming papers.
Thanks, same for your malware classification paper! I hope you get some nice malware phylogeny graphs :)