When you look at software engineering as a research field, you can see some pretty serious progress there. There are amazing projects like PyPy and LLVM, massive optimizations in gcc and JIT compilers (HotSpot, Psyco, TraceMonkey). Compared to that, I have the impression that the reverse engineering community has not produced any significant results. What we have are disassemblers, that is to say parsers.
To make things even worse, the more advanced tools used in RE have been created for a totally different purpose (think Pin, VEX, QEMU, Bochs, virtualization…). Some nice work is being done by folks like Sean Heelan, Silvio Cesare, the Sogeti R&D team (metasm, fuzzgrind) and the BitBlaze team (TEMU, Vine). But overall I can see no open, community-driven, formally sound approach. The tools are either not FOSS, limited in scope, or just not-that-reusable.
There are a number of potential factors to explain the situation:
- reversers are not developers (this, I think, is a big factor)
- reversers are solitary, basement programmers (not to mention cheese pops and Japanese tentacle porn)
- the complexity of x86 + Windows makes the entry cost too high for academics
We are therefore left with a research niche with virtually no academics, little to no developer community, that still pumps some big bucks. The only player left is the security industry, i.e. corporations which have absolutely no incentive to solve the problem.
Did I miss something, or is the picture really that grim?
Maybe one of the reasons RE has not evolved as much is that most software engineering projects tend to be built on top of consistent, well-established bases (e.g. Hardware Independent Libraries), which allow you to work with the system while almost forgetting any low-level hardware connection. As you say, RE is a very complex task and most of the time relies on the tiny little details that populate our systems. All in all, I mean that software engineering projects can be tackled with a global, hardware-independent mindset, whereas RE tends to be the complete opposite (you build the ideas from the implementation).
Not very developed reasoning on my side here, and it doesn’t intend to be either; I just thought of it while reading the post and dunno, kinda makes sense to me :)
What you say is true: in RE we build one-shot tools to tackle specific problems because each time we have to cope with all the complexity and the obscure corners of the current implementations. I still think it should be possible however to abstract most of the details away with good binary analysis libraries, and I find it surprising that such a thing has not emerged yet.
From the academic point of view, I cannot speak for the whole world, but in France many ‘one-shot’ RE projects result from Master’s or PhD theses. It is uncommon for a PhD to be recruited by his former team. As a result, the project ends and it is difficult for the research team to recover the lost ‘know-how’.
Now, why doesn’t the industry build long-term public RE projects? I think this is also because of ‘know-how’. The skills of a freshly minted PhD can be very valuable for a business as long as the ‘know-how’ remains secret. An illustration of this is the AV industry, where the quality of the products mainly depends on the skills of the analysts.
RE tools are too expensive to develop just to give them away (would you give away something like IDA for free, or even open its sources, when you can get a bunch of benjamins?). There are many cool projects, but they are developed behind closed doors by the AV industry.
Sure, I guess companies everywhere have their own projects, and they see no incentive to open them. Yet usually communities emerge and create open products… but not in RE.
There are many factors that make it difficult to develop RE tools, be they open or commercial.
Before we begin, we should clarify to ourselves just how tiny the RE community is. I would be surprised if we have 5000 active reverse engineers in the world.
By different estimates, there are between 1m and 10m software developers in the world, leading to a 200:1 or 2000:1 ratio.
Reason #1: Development makes money, RE does not
Companies like Microsoft, Google etc. essentially make their money writing code. They will invest heavily in development tools. Some of them will provide funds to FOSS to build stuff. But all this money goes to *development*, not to RE.
Reason #2: FOSS is altruistic, building for RE’s isn’t
Let’s assume your employer doesn’t support FOSS, and you’re going to spend your free time on a project. That in itself is quite altruistic. Would you not prefer to contribute to a community that touches and improves the lives of *many* people, not just the lives of roughly 5000 hardcore reversers?
Reason #3: RE tools are really difficult to build.
They require people that combine the following traits:
1) Knowledge of RE. This is not easily acquired in school.
2) Ability to read the more “formal” parts of literature. Many CS graduates can’t read basic program analysis papers.
3) Ability to develop larger projects. This is also rare.
Being able to do any of the three things above will provide you with a job outside of RE tool development. Being able to do two of the three will provide you with a well-paid job outside of RE tool development.
Reason #4: One-off-tools to get publications
A lot of good research work gets done on a “one-off” basis. There is a long painful stretch between “prototype that works often enough for our paper” and “usable tool”.
For publications, the former is sufficient.
Reason #5: PhD theses are time-limited
People that work on a RE-related topic for their PhD thesis need to eat after they are PhDs. This means they have two options: go to industry, which is unlikely to pick up their PhD work (there is no money to be made in RE tools), or go to academia, which actively disincentivizes maintaining an existing tool (you’re supposed to publish new papers, not spend time fixing bugs in existing code).
Reason #6: RE is a difficult market commercially
I think few people realize this (perhaps primarily Ilfak and me ;), but RE is a difficult market to be in. You have *very* demanding customers who are at the same time quite resource-constrained. You need extremely good developers to work on the tools. This puts you in a situation where you essentially can’t pay the developers the same amount of money they could make scaling Facebook or working on Photoshop, and you have to hope that the work environment you create for the devs, coupled with the interesting problems, convinces them to work in a less well-paid job.
So…errr…I could go on for a while, but really: the state of RE tools is not surprising. In fact, I am rather astonished that we *have* the tools we have in the first place, and this is mainly due to the fact that a lot of us like to work on RE in spite of all the economics being stacked against us.
Wow, that is indeed a lot of good reasons.
> Reason #2: FOSS is altruistic, building for RE’s isn’t
That’s true if you build something specifically for reversers, but I can imagine projects originating from RE that could be useful to other communities.
> Reason #3: RE tools are really difficult to build.
This is probably the major concern. We would need someone who combines the three traits and is crazy enough to devote his time to developing a free RE project, and the probability of all that happening at the same time is minus something.
> Reason #4: One-off-tools to get publications
Why does this sound all too familiar?
> People that work on a RE-related topic for their PhD thesis need to eat after they are PhDs.
Don’t break my delusions please.
> Reason #4: One-off-tools to get publications
even if their tools aren’t FOSS!
I can understand if companies do not open source their applications, but why are so many academia RE tools closed source?
They could form a base toolset to enable the further development and maintenance (eradicating Reason #5). And if there is an existing FOSS RE toolset, one-shot tools could be integrated into the existing toolsets.
Although there is a current trend to publish the sources of RE research tools (Dan’s PIN tools, BitBlaze/TEMU), there is a set of tools coming from academia which haven’t been open sourced: Anubis, TaintBochs, …
> I can understand if companies do not open source their applications, but why are so many academia RE tools closed source?
The reasons we face in my lab are:
1. you don’t own your code, so you can’t give it away. Technically, the owner of the code is not the PhD student but his institute or the people spending money on him/her. For something “big”, potentially patentable (meh), you would have to ask formally for the authorization to open source it…
2. really dirty code. Honestly, the reason why my tools were private so far was that it takes time to make something public. You have to make sure that your code does not look too crappy and compiles without you, write some documentation, set up a website, provide some form of support… And you’re not even sure to get feedback, so it is an immediate cost with potentially no benefit.
No (academic) newcomers into the RE fields?
– So what do we do now? The complexity of Windows (7) on x86(-64) is enormously high. Most people at my university are far too uninterested to dive into RE these days. It takes years. And patience. People (professors, for example) tell you that it’s impossible to analyze binaries, just because their own RE abilities are very limited. Who teaches RE skills? No one does. That is the first part of the problem. Or rather: who can teach that?
I wouldn’t say I’m good at RE. My work doesn’t require deeply sophisticated skills of that kind. I see many very interesting tools that could be open source, like BinVis or BinNavi. But they aren’t. That’s the industry’s own choice: no one contributes to the whole. Or very few people.
If you want to do practical reverse engineering with just open source tools: have fun. There aren’t very many of these tools. Maybe metasm, VILE, VEX, BitBlaze, maybe even IDA plugins. But seriously: using these tools requires a hardcore set of skills or – speaking of IDA – money and at least some skills.
If no one teaches RE, and no one offers free tools – no one will follow the path. Correct me if I’m wrong, but the picture is much more grim: if you attempt to research RE topics at a university there may even be legal problems…
> Correct me if I’m wrong, but the picture is much more grim: if you attempt to research RE topics at a university there may even be legal problems…
The magic answer here is “malware analysis”. You don’t do RE, you do malware analysis. Just to make sure you won’t have any problems, you have to mention that at least 15 zillion malware samples are produced each day by friggin’ terrorists, and they cost 110 trouzillions dollars a year to the free world, resulting in lots of pain, natural disasters, and unnecessary deaths of innocent baby seals. Legal problem solved.
Hey Dan,
sorry if my post sounds too negative :)
>> Reason #2: FOSS is altruistic, building for RE’s isn’t
> That’s true if you build something specifically for reversers, but I can imagine projects originating from RE that could be useful to other communities.
Yes, of course, but not “in the short run” (e.g. less than 3-5 years).
>This is probably the major concern. We would need someone who combines the
> three traits and is crazy enough to devote his time to developing a free RE project,
>and the probability of all that happening at the same time is minus something.
Well… look at it from another angle, too: I built “free” RE tools earlier, never got feedback, and had a gazillion other things to do. Then I built BinDiff non-free, and at least I could afford to spend a long and sustained effort on it, which would not have happened otherwise. Neither BinNavi nor VxClass could’ve been built in my free time.
And there’s a *long* painful stretch from “prototype” to “usable product”. Literally, years.
>> People that work on a RE-related topic for their PhD thesis need to eat after they are PhDs.
>Don’t break my delusions please.
Hahaha. Sorry…
Well, I’m not giving up just yet ;-)
Fuzzing has made some strides.