The Sad State of Computer Security

20 11 2009

Current computer systems have very serious, fundamental problems, and attackers can exploit these problems to run arbitrary code on our machines. So I guess we can all agree that the situation right now is somewhere between very bad and super bad. What is even more surprising is our reaction to this situation: we focus on preventing the technical exploitation of the flaws, and we leave the underlying problems intact. This leads to absurd “solutions” such as ASLR, which make it harder to develop reliable exploits but leave the initial problem (memory corruption) untouched. In a way, it’s like putting a bulletproof jacket on a dead body: sure, the jacket might stop some bullets, but the body was dead in the first place. There are some really smart guys out there in the industry, and they discuss how to make really sophisticated bulletproof vests and what their weaknesses are, but nobody seems to notice that the body they’re trying to protect is dead anyway.

The fundamental problems that I mentioned earlier are that we don’t know how to produce programs without vulnerabilities, and we don’t know how to analyze compiled programs.

  • we don’t know how to produce programs without vulnerabilities: or rather, we know how to avoid large classes of vulnerabilities such as memory corruption bugs with type-safe languages, but we prefer sticking with unsafe languages such as C. “Hey that’s crazy, why is that?”, you wonder. The only reason I can see is performance: right now nothing rivals the speed and memory usage of C/C++ programs. In my opinion, it’s just a matter of time before new languages beat the performance of C (yes, you can go faster than C). And hopefully, we’ll forget about memory corruption bugs altogether. Note that it will still be a long way from producing *correct* programs, but that will already be an important milestone.
  • we don’t know how to analyze compiled programs, or rather we know that compiled programs are impossible to analyze in general. I already explained that bit in a post about malware analysis. The good side is that if we wanted, we could make programs statically analyzable (security-wise). I’m talking about binaries here, and this is no science fiction: just check out Google NaCl and the permission system on Android devices. We just have to decide that we want this kind of binary to become the standard desktop executable, and that would partly free us from the dependence on AV signature updates.




Deepsec slides and tool releases

19 11 2009

Deepest apologies to Edward Hopper

I’m writing this from Deepsec, where I just finished my talk [slides] about dynamic instrumentation. It was a wrap-up of what I did last year with Pin (malware analysis, unpacking) and a JavaScript deobfuscator I didn’t blog about.

Some of you might be happy to know that the tools are now available on Google Code:

  • Crême Brûlée: Javascript deobfuscation using dynamic instrumentation
  • Tarte Tatin Tools: my set of pintools for tracing and unpacking (including an IDA Python script)

As you might guess, the tools are more prototypes than anything else, and I advise you to use them only if you feel really adventurous.





The Sad State of Reverse Engineering

1 11 2009

When you look at software engineering as a research field, you can see some pretty serious progress there. There are amazing projects like PyPy and LLVM, massive optimizations in gcc and JIT compilers (HotSpot, Psyco, TraceMonkey). Compared to that, I have the impression that the reverse engineering community did not produce any significant results. What we have is disassemblers, that is to say parsers.

To make things even worse, the more advanced tools used in RE have been created for a totally different purpose (think Pin, VEX, QEMU, Bochs, virtualization…). Some nice work is being done by folks like Sean Heelan, Silvio Cesare, the Sogeti R&D team (metasm, fuzzgrind) and the BitBlaze team (TEMU, Vine). But overall I can see no open, community-driven, formally sound approach. The tools are either not FOSS, limited in scope, or just not-that-reusable.

There are a number of potential factors to explain the situation:

  • reversers are not developers (this, I think, is a big factor)
  • reversers are solitary, basement programmers (not to mention cheese pops and Japanese tentacle porn)
  • the complexity of x86 + Windows makes the entry cost too high for academics

We are therefore left with a research niche with virtually no academics, little to no developer community, that still pumps some big bucks. The only player left is the security industry, i.e. corporations which have absolutely no incentive to solve the problem.

Did I miss something, or is the picture really that grim?





Stop the Bullshit, People

29 10 2009

Here is the top 5 list of bad ideas that show up every time you discuss malware or desktop security. These ideas are so bad that they get you sucked into a depressingly bad exchange of stupid arguments. So please, stop using them. Or else I’ll kick you in the nuts.

 

“Yeah sure it works… if there’s no vulnerability in it lol”
That, sir, is a tautology. Besides, with this kind of argument, you can quickly infer that nothing actually works.

“Yeah your technique is nice and all, but there’s no way it’s going to be included in mainstream computers (i.e. Windows)”
This is such a bad idea, that I’m not even going to comment on it.

“Your anti-malware technique will not work in cases X and Y”
Of course it won’t. We only have informal definitions of malware, so basically every anti-malware scheme is based on heuristics (i.e. sometimes they work, sometimes not).

“You can’t ask the user to make informed decisions”
As stated above, we have no automatic way to decide if actions are malicious or not. So of course at some point we’ll have to ask the user. Just because the Vista UAC sucked does not mean all ask-the-user schemes suck.

“I don’t care about malware, I’m not running Windows”
Deep inside you, you know that there is no secret sauce in other OSes that makes them magically immune to malware, don’t you?

 





Differential Reversing

2 10 2009

I love this [dion.t-rexin.org]: a known technique with a clean, elegant and almost free approach. The idea is to find interesting input-dependent spots in binaries:

  • first, instrument the binary and record a hit trace (basic block granularity is enough) for a base input and a trigger input
  • then, compute the difference between the hit traces
  • finally, highlight the differences in a disassembler, and plug a wetware to analyse the result

Dion does (1.) with a pintool, (2.) with a python script and (3.) with IDAPython. Sweeeet :)
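To make step 2 concrete, here is a minimal sketch of the trace difference. It is not Dion's script: the function name, the idea that a hit trace is just a list of basic block addresses, and the addresses themselves are all assumptions made for illustration.

```python
def diff_hit_traces(base_trace, trigger_trace):
    """Compare two hit traces (iterables of basic block addresses).

    Returns (only_in_trigger, only_in_base), each as a sorted list.
    Hit counts are ignored: a block is either hit or not.
    """
    base = set(base_trace)
    trigger = set(trigger_trace)
    return sorted(trigger - base), sorted(base - trigger)


# Hypothetical traces, as an instrumentation tool might record them.
base_trace = [0x401000, 0x401010, 0x401020, 0x401010]
trigger_trace = [0x401000, 0x401010, 0x401030, 0x401044]

only_in_trigger, only_in_base = diff_hit_traces(base_trace, trigger_trace)

# These are the blocks worth highlighting in the disassembler.
for addr in only_in_trigger:
    print("hit only with the trigger input:", hex(addr))
for addr in only_in_base:
    print("hit only with the base input:", hex(addr))
```

Using sets keeps the comparison cheap and independent of execution order, which fits the "basic block granularity is enough" remark. A variant that compares hit counts would also catch loops that run a different number of times.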





A new visualization for packed and self-modifying programs

21 09 2009

I have been working with my PhD supervisor on a dynamic typing system to detect and visualize the temporal evolution of self-modifying programs (it’s not as complicated as it sounds). The typing system works as follows:

  • each memory address has a read, write and execution level (r, w, x)
  • initially, every memory address begins with type (0, 0, 0)
  • when an address with type (r, w, x) is executed, its type becomes (r, w, w+1)
  • when an instruction with type (r1, w1, x1) reads a memory address with type (r2, w2, x2), the target address type becomes (x1, w2, x2)
  • when an instruction with type (r1, w1, x1) writes to a memory address with type (r2, w2, x2), the target address type becomes (r2, x1, x2)
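The rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not our actual implementation: the class name, the method names and the addresses in the example trace are made up.

```python
from collections import defaultdict


class TypeTracker:
    """Keeps a (read, write, execution) level triple per memory address."""

    def __init__(self):
        # Every address starts with type (0, 0, 0).
        self.types = defaultdict(lambda: (0, 0, 0))

    def execute(self, addr):
        # Executed address: (r, w, x) becomes (r, w, w + 1).
        r, w, _ = self.types[addr]
        self.types[addr] = (r, w, w + 1)

    def read(self, insn, target):
        # The target's read level takes the reader's execution level.
        x1 = self.types[insn][2]
        _, w2, x2 = self.types[target]
        self.types[target] = (x1, w2, x2)

    def write(self, insn, target):
        # The target's write level takes the writer's execution level.
        x1 = self.types[insn][2]
        r2, _, x2 = self.types[target]
        self.types[target] = (r2, x1, x2)


# Hypothetical trace: a stub at 0x1000 writes code at 0x2000 and runs it,
# then that code writes code at 0x3000 and runs it.
tracker = TypeTracker()
tracker.execute(0x1000)
tracker.write(0x1000, 0x2000)
tracker.execute(0x2000)
tracker.write(0x2000, 0x3000)
tracker.execute(0x3000)

for addr in (0x1000, 0x2000, 0x3000):
    print(hex(addr), tracker.types[addr])
```

In this toy trace the three addresses end up with execution levels 1, 2 and 3: each time code is written by already-executed code and then run, the level goes up by one.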

With that we can get a trace from a program (with DBI, an emulator, a debugger, whatever) and see what is executed (execution level >= 1). By construction, if we have code with an execution level of 2, it means that it has been written by the program itself before being executed, therefore it is self-modifying code.

Again by construction, if we see code with an execution level k+1, it means that it has been written by code at level k. Hence we can precisely distinguish between different layers of code (in our jargon, different code waves).

Now we can detect some interesting properties based on the type of memory addresses:

  • if an address has been read, written and then executed (RWX), we label it decrypted
  • if an address has only been written and executed (WX), we label it blind write
  • if an address has been executed and then read (XR), we assume there has been an integrity check
  • if an address has been executed and then written (XW), we assume the code has been scrambled (supposedly as an anti-memory-dump technique)
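As a rough sketch, the four labels can be derived from the order of accesses to a single address. Note the assumption: the code below works on an explicit, ordered event string per address ('R', 'W', 'X'), which is a simplification chosen for this example and not the level-based encoding described above. The function name is hypothetical.

```python
def label_address(events):
    """Label one address from its ordered accesses ('R', 'W' or 'X')."""
    labels = set()
    seen_read = False
    seen_exec = False
    pending_write = None  # label to apply if the written data gets executed

    for event in events:
        if event == "R":
            if seen_exec:
                labels.add("integrity check")  # XR
            seen_read = True
        elif event == "W":
            if seen_exec:
                labels.add("scrambled")  # XW
            # Read before the write suggests decryption, else a blind write.
            pending_write = "decrypted" if seen_read else "blind write"
        elif event == "X":
            if pending_write is not None:
                labels.add(pending_write)  # RWX or WX
                pending_write = None
            seen_exec = True
    return labels


for trace in ("RWX", "WX", "XR", "XW", "X"):
    print(trace, sorted(label_address(trace)))
```

An address can collect several labels over its lifetime, for example code that is decrypted, executed and then scrambled.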

Therefore we have a way to trace different layers of code, and some relations between the layers (decryption, blind writes, integrity checking and code scrambling). This gives us the following visualization for some packers:

[Figures: visualizations for upx-hostname, molebox-hostname, pec2-hostname, yp-1, allaple, pelock-hostname, acprotect-hostname, telock-hostname]

Note 1: thanks to Silvio Cesare for providing the packed samples

Note 2: we are going to present all this stuff at Malware (Montréal) with Jean-Yves Marion and Wadie Guizani, and at Deepsec (Vienna)





A look at anti-virtualization in malware samples

21 09 2009

In previous posts, I described PuppetMaster, a way to dynamically detect and control CPU-based VMM detection methods in malware samples. We ran it on 2 sets of malware samples, and here are the results.

1. 60k samples from a Nepenthes honeypot

  • 62498 samples on the honeypot
  • 59554 of them being executable files
  • 48404 were analysed “correctly”
  • 13409 samples were terminated due to a 2-minute timeout

The number of samples trying to detect virtualization is surprisingly low:

  • 71 (0.15%) binaries used at least one anti-virtualization technique
  • 65 (0.13%) binaries used the SIDT anti-virtualization technique
  • 0 (0.00%) binaries used the STR anti-virtualization technique
  • 0 (0.00%) binaries used the SLDT anti-virtualization technique
  • 0 (0.00%) binaries used the SGDT anti-virtualization technique
  • 14 (0.03%) binaries used the VMware channel anti-virtualization technique

2. 25k samples from uh… somewhere

These samples were shared by Paul Royal, so thanks Paul :)

  • 25118 samples
  • 23104 of them being executable files
  • 18670 were analysed “correctly”
  • 8298 samples were terminated due to a 2-minute timeout

Again, the number of samples trying to detect virtualization is very low:

  • 117 (0.63%) binaries used at least one anti-virtualization technique
  • 56 (0.30%) binaries used the SIDT anti-virtualization technique
  • 0 (0.00%) binaries used the STR anti-virtualization technique
  • 2 (0.01%) binaries used the SLDT anti-virtualization technique
  • 6 (0.03%) binaries used the SGDT anti-virtualization technique
  • 58 (0.31%) binaries used the VMware channel anti-virtualization technique
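For reference, the percentages in both lists appear to be relative to the number of samples analysed “correctly” (48404 and 18670), not to the total number of samples. The short check below recomputes them from the counts given above; the helper name and the dictionary layout are just for this example.

```python
def percentage(count, analysed):
    """Share of analysed samples, in percent, rounded to two decimals."""
    return round(100.0 * count / analysed, 2)


# Counts copied from the two result lists.
datasets = {
    "Nepenthes honeypot": (48404, {
        "any technique": 71, "SIDT": 65, "STR": 0,
        "SLDT": 0, "SGDT": 0, "VMware channel": 14,
    }),
    "Paul Royal's samples": (18670, {
        "any technique": 117, "SIDT": 56, "STR": 0,
        "SLDT": 2, "SGDT": 6, "VMware channel": 58,
    }),
}

for name, (analysed, counts) in datasets.items():
    print(name)
    for technique, count in counts.items():
        print(f"  {technique}: {count} ({percentage(count, analysed):.2f}%)")
```

A single sample can use several techniques, which is why the per-technique counts do not have to add up to the “at least one” count.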

Conclusion

There are a few potential reasons why the numbers are so low:

  1. the samples used other techniques that we do not support (such as detecting the VMware tools, or hardware version)
  2. or the samples we got are really not representative of malware samples in the wild. Indeed, our 60k samples contain mostly Allaple samples.
  3. or anti-virtualization techniques are not that common in actual malware samples…

It would be interesting to run the test on better malware repositories. Unfortunately, these are not easy to get our hands on. So if you have a big malware repo ready to be dissected, and you would like to share it with an academic lab for free, I’d be glad to hear from you: reynaudd at loria dot fr.





Do We Really Need Malware Analysis?

15 09 2009
Recently I’ve been wondering: how is malware analysis different from traditional program analysis? The fundamental reason is that programs can generally modify themselves. There is a direct consequence: with malware we have to admit that we don’t have static access to the program listing (which prevents standard program analyses). And since turning self-modifying code (SMC) into normal code is undecidable, we end up only with technical (i.e. partial) solutions. This is why virtually every paper on malware analysis will only be a report on how a given technology/implementation is better/faster/stronger than the others.

This has a corollary too: since we have only partial solutions, in some cases they don’t work. And malware authors actively exploit that fact by implementing techniques to defeat our implementations. This opened a sub-research field: the production of techniques to defeat the analysis-defeating techniques. Yes, there is some irony in this; for instance, think about packing -> emulation-based unpacking -> anti-emulation techniques -> other-wonderful-unpacking-techniques…

Now, you might wonder, how did we get into this quagmire? As Schneier (http://www.schneier.com/blog/archives/2007/05/do_we_really_ne.html) pointed out before me, this is an accident: a historic by-product of the way the IT industry evolved. The x86 architecture allowed self-modifying code, and operating systems did nothing to prevent or regulate that. And bam, a research niche was born.
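To illustrate why self-modification takes away static access to the program listing, here is a harmless toy in Python. It is only an analogy for what packers do on x86: the XOR key, the payload and the function name are all made up, and a real packer would of course ship only the encoded blob, not the clear-text source.

```python
KEY = 0x5A

# Build step (normally done offline): encode the real program.
clear_source = "result = sum(range(10))"
packed = bytes(b ^ KEY for b in clear_source.encode())


def unpack_and_run(blob, key):
    """Decode the blob at run time and execute the recovered code."""
    code = bytes(b ^ key for b in blob).decode()
    scope = {}
    exec(code, scope)  # the executed code only exists at run time
    return scope["result"]


print(packed)                       # what a static analysis gets to see
print(unpack_and_run(packed, KEY))  # what actually runs
```

A static tool looking only at `packed` and `unpack_and_run` sees a decoding loop and an opaque blob. Recovering the payload means simulating the decoder, which is exactly the kind of partial, technical solution discussed above.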





Automatic Exploit Generation

11 09 2009

One of the best MSc dissertations I’ve read:

“We present a novel algorithm that integrates data-flow analysis and a decision procedure with the aim of automatically building exploits. The exploits we generate are constructed to hijack the control flow of an application and redirect it to malicious code.

Our algorithm is designed to build exploits for three common classes of security vulnerability; stack-based buffer overflows that corrupt a stored instruction pointer, buffer overflows that corrupt a function pointer, and buffer overflows that corrupt the destination address used by instructions that write to memory. For these vulnerability classes we present a system capable of generating functional exploits in the presence of complex arithmetic modification of inputs and arbitrary constraints. Exploits are generated using dynamic data-flow analysis in combination with a decision procedure.”

(yes, I am now Schneier-blogging)





Merry Christmas everybody!

20 07 2009

books