Current computer systems have very serious, fundamental problems, and these problems can be exploited by attackers to run arbitrary code on our machines. So I guess we can all agree that the situation right now is somewhere between very bad and super bad. What is even more surprising is our reaction to this situation: we focus on preventing the technical exploitation of the flaws, and we leave the underlying problems intact. This leads to absurd “solutions” such as ASLR, which makes it harder to develop reliable exploits but leaves the initial problem (memory corruption) untouched. In a way, it’s like putting a bulletproof jacket on a dead body: sure, the jacket might stop some bullets, but the body was dead in the first place. There are some really smart guys out there in the industry, and they discuss how to make really sophisticated bulletproof vests and what their weaknesses are, but nobody seems to notice that the body they’re trying to protect is dead anyway.
The fundamental problems that I mentioned earlier are that we don’t know how to produce programs without vulnerabilities, and we don’t know how to analyze compiled programs.
- we don’t know how to produce programs without vulnerabilities: or rather, we know how to avoid large classes of vulnerabilities such as memory corruption bugs with type-safe languages, but we prefer sticking with unsafe languages such as C (a small C sketch after this list shows the kind of bug I mean). “Hey, that’s crazy, why is that?”, you wonder. The only reason I can see is performance: right now nothing rivals the speed and memory usage of C/C++ programs. In my opinion, it’s just a matter of time before new languages beat the performance of C (yes, you can go faster than C). And hopefully, we’ll then forget about memory corruption bugs altogether. Note that we will still be a long way from producing *correct* programs, but that would already be an important milestone.
- we don’t know how to analyze compiled programs, or rather we know that compiled programs are impossible to analyze. I already explained that bit in a post about malware analysis. The good side is that if we wanted, we could make programs statically analyzable (security-wise). I’m talking about binaries here, and this is no science fiction: just check out Google NaCl and the permission system on Android devices. We just have to decide that we want this kind of binary to become the standard desktop executable, and that would partly free us from the dependence on AV signature updates.
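To make the memory corruption point concrete, here is a deliberately naive C sketch (made up for this post, not taken from any real code base):

```c
#include <stdio.h>
#include <string.h>

/* A classic stack buffer overflow: strcpy() performs no bounds check,
 * so any argument longer than 15 characters silently overwrites
 * adjacent stack memory (saved registers, return address...). */
void greet(const char *name)
{
    char buf[16];
    strcpy(buf, name);           /* overflow when strlen(name) >= 16 */
    printf("hello %s\n", buf);
}

int main(int argc, char **argv)
{
    if (argc > 1)
        greet(argv[1]);          /* attacker-controlled input reaches strcpy */
    return 0;
}
```

A memory-safe language rules this out by construction: the equivalent code either does not compile or aborts cleanly at runtime instead of silently corrupting the stack.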
Regarding the phrase “we don’t know how to produce programs without vulnerabilities”: we’ll NEVER create software without flaws of any type (design flaws, implementation flaws, concept flaws, etc.). That’s because computer science is strictly and intrinsically a human science, whose object of study is made by humans, and we are fallible beings; humankind will never be free of errors, and software is no exception. In any big enough collection of ideas and implementations around a specific aspect, a flaw and its workaround will eventually appear. Sure, we can approach an ideal state, but we won’t fully reach it.
Sorry for my English.
Kind regards,
Knyghte.
> we’ll NEVER create software without flaws of any type
sure, that’s what I meant by “correct” programs. But I still believe that if you find a way to specify certain classes of vulnerabilities, then you can have tools that decide if your program is correct with regards to the specification or not. Of course you still have potential pitfalls (correctness of the model and the specification), but that would be a nice step away from empirical testing.
Good post. I’m also a believer that it is only a matter of time before most programs are written in languages without the possibility of memory corruption. I think systems code like an OS kernel will still be written in traditional systems languages like C/C++ for quite some time, however. But I also think that static analysis will eventually reach the point where we can make some strong safety guarantees about the system with regard to memory corruption – even if that means we have to extend the language to resolve ambiguous cases. I think that exploit mitigation like ASLR is a good strategy to use until then on mainstream operating systems, and in the near-term future (in the grand scheme of things) this will be the end of code-execution bugs for the majority of cases in applications. But for kernel land, I guess we’ll have to wait for static analysis.
I might be naive, but I think security research is on the right path. It just needs another 15 or 20 years to hit the big time.
> I might be naive, but I think security research is on the right path. It just needs another 15 or 20 years to hit the big time.
this is probably true, but right now I’m in a phase where, as a newcomer, things seem sophisticated until you see how they work under the hood, and then you’re like “dammit, isn’t this supposed to be computer *science*?”
(I’m talking about the fact that we have nothing better than developing with empirical methods in antiquated systems languages)
Hmm… I am not sure whether I am quite so optimistic. I agree that we’ll get the ‘code execution’ problem sorted, but I am not sure whether we’ll get ‘memory corruption’ sorted. For straight ANSI C with limited indirection of control transfer, yes, but I do not see the same advances in static analysis for C++ code…
My scenario is more something like this: what if I give you a reasonably dev-friendly language with really nice safety properties that outperforms everything out there? OK, this is über-optimism, but hey… ^^
As you know, I agree with you on both points, but you are missing some details.
As you said, the first way to get rid of memory corruption would be to move to a sound tool chain where programs can be proved safe with respect to memory corruption. There are many such frameworks, so why is industry still using poor old compilers? One answer is ‘it would be too expensive’. When MS releases a new MS Office, they re-use most of the old code and just add some new features. Moving Office to a sound tool chain would require reviewing the whole code base, and that would be a real pain in the …
That is why smart people such as Podelsky try to patch the tool chain instead, providing new components that would prove a C program to be safe and point out all the ambiguous cases. This is also a good approach that would help reuse the old libraries. To sum up the idea: I would not like to be the one porting the STL from C++ to some other good language; I would prefer a prover to tell me ‘OK, just review these few functions and the library is bug-free’.
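To illustrate what “patching the tool chain” can look like, here is a minimal sketch using ACSL, the annotation language of the Frama-C platform (the function and its contract are made up for the example; I’m not claiming this is the exact approach of any particular group):

```c
#include <stddef.h>

/* A prover (e.g. Frama-C's WP plugin) can check that every memory
 * access in copy() stays within the bounds promised by the contract,
 * and will point out exactly the spots it cannot prove. */

/*@ requires \valid(dst + (0 .. n-1));
    requires \valid_read(src + (0 .. n-1));
    assigns dst[0 .. n-1];
    ensures \forall integer k; 0 <= k < n ==> dst[k] == src[k];
*/
void copy(int *dst, const int *src, size_t n)
{
    size_t i;
    /*@ loop invariant 0 <= i <= n;
        loop invariant \forall integer k; 0 <= k < i ==> dst[k] == src[k];
        loop assigns i, dst[0 .. n-1];
        loop variant n - i;
    */
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The point is that the old C code stays as it is; only annotations are added, and the prover tells you which functions still need a human review.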
Another point is that a world without vulnerabilities would be better, but not free of malware. Vulnerabilities help on the propagation side, but malware such as rogue AVs don’t need a flaw to exist.
That is why we need ‘open binaries’, I mean binaries whose behavior can be soundly guessed. On this aspect, I think there is still much to do. Indeed, you can ensure that a program follows some security policy, and I hope that Google will enforce NaCl standards in Chrome OS. Nevertheless, it is not sufficient. Enforcing policies doesn’t tell you what the program really does, and since it is impossible to separate good from bad behaviors, we are still stuck with some kind of blacklisting, just at a higher level.
So my questions are ‘How do we guess program behaviors?’ and, more difficult, ‘How do we represent a behavior?’. Those are old logician problems (Oops, I did it again ;-)).
> Another point is that a world without vulnerabilities would be better, but not free of malware
that’s right. I like the ‘open binaries’ denomination :) And I agree that open binaries don’t solve the malware problem either, but at least we will be able to get out of the current quagmire.
>> But I still believe that if you find a way to specify certain classes of vulnerabilities, then you can have tools that decide if your program is correct with regards to the specification or not.
I wish there were a taxonomy for software …
>> Another point is that a world without vulnerabilities would be better, but not free of malware.
I subscribe to Matthieu’s thought. It’d be ideal to bring speed, flexibility and big cost savings to threat analysis with an open specification.
But the bad guys can choose not to follow standards (have they ever?) and keep coding for the current binary formats (PE, ELF). Hey Thomas More, can you help us here? (no offense intended)
I’d suggest “a new, trustworthy, stable and trend-aware scriptable programming language”, amongst other things.
>> So my questions are ‘How do we guess program behaviors?’ and, more difficult, ‘How do we represent a behavior?’
That’s right, in my humble opinion: behaviour’s not computable, and that’s why we need heuristics so much.
I love this post, by the way.
I think that with open binaries, you can statically compute a safe superset of the program’s behaviour. For instance you could say “this program can delete files” (you have not proved that it will, since that depends on lots of undecidable stuff, but you have proved that it has the potential to do so).
Again, this does not solve the malware problem but at least it would be a step in the right direction. It would be possible to detect “green binaries” for instance, such as programs that only access the display and sound devices. Think Flash or Java applets, but in x86.
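As a toy illustration of what such a “safe superset” could look like (purely hypothetical: the symbol-to-capability table and the import list below are made up, and a real tool would need an actual ELF/PE import parser):

```c
#include <stdio.h>
#include <string.h>

/* Over-approximate a binary's behavior from the symbols it imports.
 * Extracting the import list itself (ELF dynamic symbols, PE import
 * directory) is assumed to be done elsewhere; this only shows the
 * "safe superset" step. A match means the program MAY exercise the
 * capability, never that it will. */

struct rule {
    const char *symbol;
    const char *capability;
};

static const struct rule rules[] = {
    { "unlink",  "can delete files"         },
    { "fopen",   "can read/write files"     },
    { "connect", "can open network sockets" },
    { "execve",  "can launch programs"      },
};

static void report(const char *imports[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++)
            if (strcmp(imports[i], rules[r].symbol) == 0)
                printf("%s (imports %s)\n",
                       rules[r].capability, rules[r].symbol);
}

int main(void)
{
    /* made-up import list of some example binary */
    const char *imports[] = { "fopen", "unlink", "printf" };
    report(imports, sizeof imports / sizeof imports[0]);
    return 0;
}
```

A “green binary” in the sense above would simply be one that matches no rule beyond the display and sound ones.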
Dan is right: even if behaviors are not computable in most programming languages, you can always compute an abstract interpretation. This interpretation does not give you the exact semantics, but it can help decide some security policies.
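For instance, here is a toy abstract interpretation over the classic sign domain (a sketch only; real analyzers use much richer domains such as intervals or polyhedra):

```c
#include <stdio.h>

/* Toy abstract interpretation over the sign domain: we forget the
 * exact value of each variable and keep only its sign. The result is
 * an over-approximation: TOP means "could be anything". */
typedef enum { NEG, ZERO, POS, TOP } sign_t;

static const char *show(sign_t s)
{
    static const char *names[] = { "negative", "zero", "positive", "unknown" };
    return names[s];
}

/* Abstract addition: sound but imprecise (e.g. NEG + POS = TOP). */
static sign_t add(sign_t a, sign_t b)
{
    if (a == ZERO) return b;
    if (b == ZERO) return a;
    if (a == b && a != TOP) return a;   /* NEG+NEG=NEG, POS+POS=POS */
    return TOP;
}

/* Abstract multiplication is exact on this domain. */
static sign_t mul(sign_t a, sign_t b)
{
    if (a == ZERO || b == ZERO) return ZERO;
    if (a == TOP  || b == TOP)  return TOP;
    return (a == b) ? POS : NEG;
}

int main(void)
{
    sign_t x = POS, y = NEG;        /* all we assume about the inputs  */
    sign_t z = mul(add(x, x), y);   /* abstractly run: z = (x + x) * y */
    printf("z is %s\n", show(z));   /* -> "z is negative", proved for
                                       every concrete x > 0, y < 0 */
    return 0;
}
```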
Though, this still amounts to a white/black (green, yellow, purple…) list; it is sad, but I don’t see any other way to do it.
I still think that the problem is to design a specification language that would be abstract enough to be understandable, generic enough to cover most use cases (and strict enough to rule out ambiguous malicious behavior). Dan, are you thinking about yet another thesis chapter? ;-)