Recently I’ve been wondering: how is malware analysis different from traditional program analysis? The fundamental difference is that programs can, in general, modify their own code. There is a direct consequence: with malware we have to admit that we don’t have static access to the program listing, which rules out standard program analyses. And since turning self-modifying code (SMC) into normal code is undecidable, we end up with only technical (i.e. partial) solutions. This is why virtually every paper on malware analysis is essentially a report on how a given technique/implementation is better/faster/stronger than the others.
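To make this concrete, here is a minimal sketch of runtime-generated code on x86-64 Linux (my own toy illustration, not taken from any real sample): the six payload bytes only become code at run time, so a purely static disassembly of main() never sees the actual payload.

/* Toy runtime-generated ("self-modifying") code, x86-64 Linux only.
 * The payload bytes encode "mov eax, 42; ret". */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    unsigned char payload[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* Ask for a page that is writable AND executable; many hardened
     * systems refuse this combination, which is precisely the kind of
     * OS-level regulation discussed below. */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, payload, sizeof payload);   /* "unpack" at run time */
    int (*fn)(void) = (int (*)(void))buf;   /* jump into the generated code */
    printf("generated code returned %d\n", fn());
    return 0;
}

A real packer does the same thing at scale: the on-disk bytes are a compressed or encrypted blob plus a small stub that reconstructs the actual program in memory.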
This has a corollary too: since we have only partial solutions, they sometimes fail, and malware authors actively exploit that by implementing techniques to defeat our implementations. This opens a sub-research field: the production of techniques to defeat the analysis-defeating techniques. Yes, there is some irony in this; think, for instance, of packing -> emulation-based unpacking -> anti-emulation techniques -> other-wonderful-unpacking-techniques…
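As a toy illustration of the anti-emulation step in that arms race (mine, and deliberately simplistic: the workload and threshold below are arbitrary, and real samples combine many such heuristics), a timing check is a classic trick, since emulated or heavily instrumented execution tends to burn far more time-stamp-counter cycles than bare metal:

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() with GCC/Clang on x86 */

int main(void) {
    unsigned long long t0 = __rdtsc();
    volatile unsigned long long sink = 0;
    for (int i = 0; i < 1000; i++)      /* small, cheap workload */
        sink += (unsigned long long)i;
    unsigned long long elapsed = __rdtsc() - t0;

    if (elapsed > 500000ULL)            /* arbitrary cut-off */
        puts("looks emulated/instrumented: stay dormant");
    else
        puts("looks native: an unpacking stub would proceed here");
    return 0;
}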
Now, you might wonder, how did we get into this quagmire? As Schneier pointed out before me (http://www.schneier.com/blog/archives/2007/05/do_we_really_ne.html), this is an accident – a historic by-product of the way the IT industry evolved. The x86 architecture allowed self-modifying code, and operating systems did nothing to prevent or regulate that. And bam, a research niche was born.
Good that accident happened. It made your life so damn cool!
If I wasn’t doing malware analysis, I would probably be playing in a rock band right now. *That* would be cool ;)
Let us generalize the question: to what extent do we need to understand attacks in order to produce working defenses? I am a software security tester, and I often don’t care about the details. Abstractions are sufficient in many cases. For instance, if a system allows unauthorized parties to execute program code in a trust domain they should have only limited access to, this is always bad, no matter how that code is designed. On the other hand, I may care about the details that matter for getting the code into this trust domain.
So what are the promises of malware analysis?
The purpose of malware analysis should be to answer the question: “is this program potentially malicious?”
The problem is: we really don’t know how to do that. Dealing with the “potentially” part is intractable in the worst case (I can give examples). And it’s somewhat pointless, since we don’t know what “malicious” means anyway.
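For one toy illustration of that worst case (a sketch of mine, not a real sample), take a program whose payload is guarded by an arbitrary computation; deciding whether the payload is even reachable is then the halting problem in disguise:

#include <stdio.h>

/* Stand-in for arbitrary computation. Whether this loop terminates for
 * every input is an open question (Collatz); for arbitrary code the
 * question is undecidable. */
static int mystery(unsigned long n) {
    while (n != 1)
        n = (n % 2) ? 3 * n + 1 : n / 2;
    return 1;
}

int main(void) {
    if (mystery(27)) {
        /* a destructive payload would sit here */
        puts("payload reached");
    }
    return 0;
}

So “is it potentially malicious?” collapses to a reachability question that no analyzer can decide for arbitrary programs.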
For some subsets of the programs out there I can answer a slightly modified question without even looking at the program. The modification is to replace “malicious” with “harmful”; I prefer considering effects, not intentions. Any program not executed on my machine is definitely not potentially harmful, since it has no chance of doing any harm to me. Any program properly confined to a sandbox is also not potentially harmful (unless the sandbox has a problem, and only within the definition of harm underlying the sandbox design). Any program executed on a machine without any assets on it is not potentially harmful to this machine (though it may be to others).
These examples suggest that we have plenty of ways of limiting harm without knowing what individual programs may do or not do. Again, what exactly are the promises of malware analysis?
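To sketch what “properly confined to a sandbox” can look like in practice (a minimal, Linux-specific illustration of mine, certainly not a complete answer), seccomp strict mode reduces a process to reading and writing already-open descriptors plus exiting:

#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/seccomp.h>

int main(void) {
    /* After this call only read(), write(), _exit() and sigreturn()
     * are permitted; any other system call kills the process. */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
        return 1;

    const char msg[] = "confined: can still write to an open fd\n";
    write(1, msg, sizeof msg - 1);

    /* An open("/etc/passwd", ...) here would get the process killed. */
    syscall(SYS_exit, 0);   /* glibc exit() uses exit_group(), which is not allowed */
    return 0;               /* not reached */
}

Strict mode is deliberately crude; real sandboxes use seccomp-bpf filters, namespaces and the like, but the principle is the same: bound the effects without understanding the code.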
I agree there are some cases where you know a program won’t hurt you. That explains the success of Flash programs for instance.
But I’m not sure how reasoning about “harmful” solves the problem, because most programs can be harmful in some way. For instance, formatting a hard drive is clearly harmful, but it is only malicious if I did not intend dancingbunnies.exe to format my hard drive. This is an extreme example, but I think in most cases intent matters.
Good point, a program should only do things that you are aware of, or maybe things that you would approve of if you were aware it was doing them. But I don’t see how this could be formalized and automatically determined: this definition implies that any program can be both harmful/malicious and harmless, depending solely on the awareness, intentions and expectations of the person running it.
Perhaps a probabilistic definition is more suitable. Some behaviors of a program or effects of program execution are more likely to be found in malicious/harmful programs than they are in harmless ones. Self-replication into other programs or machines is perhaps the most obvious example.
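That intuition can be turned into a score. A toy sketch (mine; the behaviours and probabilities below are invented for illustration, and a real system would need measured base rates): sum the log-likelihood ratios of the behaviours actually observed, naive-Bayes style.

/* Toy behaviour scoring. Build with: cc score.c -lm */
#include <math.h>
#include <stdio.h>

struct behaviour {
    const char *name;
    double p_malicious;   /* P(behaviour | malicious) -- assumed value */
    double p_benign;      /* P(behaviour | benign)    -- assumed value */
    int observed;         /* seen in the sample under analysis? */
};

int main(void) {
    struct behaviour obs[] = {
        { "writes into other executables",   0.60, 0.001, 1 },
        { "copies itself to other machines", 0.40, 0.002, 1 },
        { "formats a volume",                0.05, 0.010, 0 },
    };

    double score = 0.0;   /* sum of log-likelihood ratios */
    for (size_t i = 0; i < sizeof obs / sizeof obs[0]; i++)
        if (obs[i].observed)
            score += log(obs[i].p_malicious / obs[i].p_benign);

    printf("log-likelihood score: %.2f (higher = more malware-like)\n", score);
    return 0;
}

Self-replication dominates the score precisely because it is so rare in benign software, which matches the intuition above.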