Indefinite Studies

Academic ramblings about software security.

The Berkeley Overmind

leave a comment »

Some postdocs at my university get paid to play StarCraft. My life sucks.

Written by dan

January 20, 2011 at 11:08

Posted in useless

Obfuscation Patterns in Academic Papers

with 6 comments

These techniques seem to work, since I stumble quite frequently upon them in papers accepted at important venues:

  • misleading claim — come up with your own definition of a widely accepted term, preferably in a way inconsistent with the intuition and the general understanding of the term. Then claim spectacular results that happen to be true only with respect to your definition.
  • marketing pitch — use data irrelevant to the scientific discussion. Bonus: cite a dubious, hard-to-verify source with a non-existent methodology. Ex: “Problem X costs more than Y gazillion $$$ per year (hence our work is of epic importance)”.
  • over-sophistication —  use overly sophisticated terms for techniques or ideas that would otherwise be simple to understand (especially if it makes them look better). Ex: “heuristics” [1] sounds a lot better than “hack”.
  • no limitations — there are no limitations to your work, only things that are either out-of-scope or left for future work.
  • outsourcing problems — outsource large portions of the hard problems in your area by “leveraging” (see over-sophistication) existing work from a different domain with a complex literature, just to make sure that nobody can grasp all the dependencies of your paper.
  • biased evaluation — perform experimental evaluation designed to make others look bad. Even better if the evaluation is not reproducible, because you fail to disclose your source code, detailed algorithm or input set.

[1] heuristics (noun — hyu’-ris-tik): algorithm that does not work.

Written by dan

December 26, 2010 at 16:24

Posted in rants

The Halting Problem for Reverse Engineers

with 14 comments

I often have the feeling that technically savvy people don’t have a very high opinion of academia, and this is particularly true of security people. They have to deal with low-level details such as hardware architecture, operating systems internals (more or less documented), proprietary protocols and data structures, all of which require very specialized knowledge. Logic, theorems and algorithms don’t have a predominant place in that picture.

For the past few years I have been working on these subjects, and found some fundamental theorems to be actually *useful* in understanding the security properties of computer architectures. One of them is, of course, the undecidability of the halting problem. In essence, it says that we can not know if the computation of a program on some input will ever terminate. So what? Who cares if a program terminates, what we want is to find vulnerabilities and unpack malware samples, right?

The importance of the undecidability of the halting problem lies in its generality. In particular we can see Rice’s theorem as a generalization of this result, and to put it very simply, it says that whatever properties of programs you’re interested in, no program can tell if this property holds for every program (i.e. it is undecidable).

This is very bad news for all of us, since basically everything about programs is undecidable. Say that you are interested in finding functions that do out-of-bounds memory writes (or as I said, any other property), Rice’s theorem says that there is no program that will give you a correct answer all the time. You must accept that your program sometimes fails or infinitely loops.

I want to emphasize how bad this is. Do not let the terminology used in the theorems confuse you. In particular, the notions of input, output, and function computed by a program do not map nicely to binaries. An output is anything that gets modified by your program — any register or memory location, as soon as it is touched by an instruction, is an output. And basically, everything about outputs is undecidable. As a consequence, simple tasks such as disassembling are undecidable.

For instance, take this seemingly innocent indirect jump:

jmp [eax]

If eax is an output of instructions before it, no luck, its value is undecidable. You can run the program, write the value down, and assume it will not change, but you have no guarantee that it will not change at a given date. Undecidable. You could argue that eax can only take a finite number of values, and hence disassembling is still possible, just very intractable. But that would be without counting on self-modifying code. SMC (think packers) is the scourge of disassemblers because it gives the ability to transfer control to an output. Since I can’t decide the value of the output, I can’t disassemble.

To sum things up, here are a few direct consequences of the undecidability of the halting problem:

  1. you can’t decide the target of indirect jumps, reads and writes
  2. you can not decide if a particular memory address is code, data, or both
  3. you can’t decide values written in memory
  4. you can’t decide the numbers of occurrences of loops
  5. you can’t decide if control flow can reach a given instruction
  6. whatever you see in a given run can change arbitrarily in another run
  7. disassembling is undecidable
  8. unpacking is undecidable

I will leave how all this led to the bad habit of relying on “heuristics” to a further post. Stay classy!

Written by dan

December 19, 2010 at 08:12

Posted in code analysis, security

Headin’ Out Californee Way

with 5 comments

Well, I have not been very active on this blog lately. Since my last post I got married, defended my PhD (in front of a superlatively awesome committee, really) and left the snowy plains of Lorraine for California, among other things. I started my new job as a research staff member in Dawn Song‘s group in Berkeley, so I will probably keep complaining about malware, vulnerabilities, the price of housing and pretty much everything else on this blog.

Also, I might actually start training to become a cage fighter and learn emacs — not sure which one is less painful.

Written by dan

December 8, 2010 at 08:57

Posted in useless

On the unsolvability of program behaviour

leave a comment »

This is spectacularly well stated:

Due to the unsolvability of the halting problem (and nearly any other question concerning program behaviour), no analysis that always terminates can be exact. Therefore we only have three alternatives:

  • Consider systems with a finite number of finite behaviours (e.g. programs without loops) [...]
  • Ask interactively for help in case of doubt [...]
  • Accept approximate but correct information.

Neil D. Jones and Flemming Nielson, Abstract interpretation: a semantics-based tool for program analysis [PS]

Written by dan

September 13, 2010 at 15:55

Posted in code analysis

Aboriginal Art

leave a comment »

Pictures from an exhibition on aboriginal art at Vogüé Castle, France. Shot with a Nexus One.

Written by dan

July 29, 2010 at 17:28

Posted in useless

Extended Self-Advertisement

leave a comment »

The deadline of the Malware 2010 conference is extended to July 17, 2010. Now you have no excuse.

Written by dan

July 8, 2010 at 08:13

Posted in useless

HITB Magazine #2 is out

leave a comment »

Featuring me, woot.

Written by dan

April 23, 2010 at 12:05

Posted in useless

Fun with diagonalization

leave a comment »

No more loops, embrace the future with recursive self-modifying code and diagonalization. In the following code snippet, we show how to replace loops like ‘while E do C’ with only ‘if’ and ‘eval’ statements:


// expects stmt == 'if(expr) { command }'
function Z(stmt) {
    start = stmt.indexOf('(')+1;
    end = stmt.indexOf(')');
    expr = stmt.substr(start, end-start);

    start = stmt.indexOf('{')+1;
    end = stmt.lastIndexOf('}');
    command = stmt.substr(start, end-start);

    out = 'if ('+expr+') {'+command+'; stmt=\'if('+expr+') {'+command+'}\'; eval(Z(stmt))}';
    return out;
}

i = 3;

// the following line is equivalent to while(i>=0) { alert(i); i-- }:
eval(Z('if(i>=0) { alert(i); i-- }'))


Read more about diagonalization and its use by Cantor and Gödel here.

UPDATE: a nicer code snippet by Matt Kaczmarek below.


special = "/**/"
function Z (stmt){
    stmt = stmt.replace (special, "\'" + stmt + "\'");
    return stmt;
}

i = 3;

// the following line is equivalent to while(i>=0) { alert(i); i-- }:
eval(Z('if(i>=0) {alert(i); i-- ; eval(Z(/**/))}'));

Written by dan

April 19, 2010 at 15:05

Posted in useless

Self Advertisement

with 2 comments

———————————————————————-
*** Malware 2010 Call for Papers ***
http://malware10.loria.fr/
———————————————————————-
Important dates
———————————————————————-
Submission of papers: June 30, 2010 23:59:59 EST
Notification of Acceptance: August 27th, 2010, 23:59:59 EST
Camera ready paper: September 10th, 2010: 23:59:59 EST
Workshop dates: October 20-21, 2010
———————————————————————-
Overview
———————————————————————-
The 5th IEEE International Conference on Malicious and Unwanted
Software (Malware 2010) will be held in Nancy, France, Oct. 20-21,
2010. The conference is designed to bring together experts from
industry, academia, and government to present and discuss, in an open
environment, the latest advances and discoveries in the field of
malicious and unwanted software. Techniques, economics and legal
issues surrounding the topic of Malware, and the methods to detect and
control them will be discussed.
This year’s conference will pay particular attention to (and will also
be extensively discussed in a panel session) the pressing topic of
“Malware and Cloud Computing”.  As low-cost netbooks become popular,
Google’s Chrome OS enters the mainstream, and social networks
(Facebook, YouTube, Twitter, LinkedIn, and so forth) become
ubiquitous, the security dangers associated with the new computing
paradigm increase exponentially.  In effect, “Cloud Computing”,
Multi-tenant, Single Schema, Single Server Platforms (C2S3P) increase
vulnerabilities by providing a single point of failure and attack for
organized criminal networks. Critical/sensitive/private information is
at risk, and very much like previous technology adoption trends, such
as wireless networks, the dash for success is trumping the need for
security.
Thus, the organizers of Malware 2010 solicit original written
contributions addressing these issues and research questions.
Manuscripts focusing on the security properties of Cloud Computing,
the risks associated with the deployment of such networks, and the
analysis of real incidents where a breach has occurred will be
particularly welcomed.
———————————————————————-
Submissions are solicited in, but not limited to, the following areas
———————————————————————-
Theoretical aspects and new directions on Malware related research
Analysis and measurements of real malware incidents
Worms, viruses and other propagating Malware
Spyware, keystroke loggers, information theft Malware
Honeypots and other sample collection methodologies
Botnet attacks, detection/tracking and defense
Malware economics and black market studies
Code reverse engineering tools and practices
Malware performance, analysis and capture tools
Anti-spam and anti-phishing techniques and practices
Legal aspects of unwanted software use
Malware and its impact in social networking and cloud computing
Rootkit and virtualization techniques
Malware in wireless mobile devices
———————————————————————-
Publication
———————————————————————-
The proceedings of the conference will be published in printed, and
DVD, form and will be included in the IEEE Xplore digital library.  In
addition, the Conference’s Technical Program Committee will select one
manuscript as a recipient of the “Best Paper Award”.  The Best Paper
Award author, together with the authors of a few selected manuscripts
from the conference, will be invited to submit an extended version to
a special issue of the Journal.
———————————————————————-
Paper Submission Information
———————————————————————-
Papers should be submitted through EDAS system at:
http://www.edas.info/
Submitted manuscripts must be 10-point font size, and should not
exceed 8 single-spaced pages in length, including the abstract,
figures, and references. Authors whose manuscript exceeds the 8 page
limit may be allowed to include two additional pages for an extra
charge.  However, under no circumstances shall a submitted manuscript
exceed the 10 page limit. Submitted papers must not substantially
overlap with papers that have been published or that are
simultaneously submitted to a journal or a conference with
proceedings.
———————————————————————-
Additional Information
———————————————————————-
For more information on Malware 2010 or if you are interested in
contributing to the organization of the conference please contact
Dr. Fernando C. Colon Osorio, General Program Chair, Malware 2010 at
fcco@brandeis.edu or visit our web site www.malware2010.org. For
information concerning submission of an original manuscript to the
conference, please contact the Technical Program Committee Chairs
(TPC).
———————————————————————-
Chairs of Malware 2010
———————————————————————-
Jean-Yves Marion, Nancy University, France, TPC
Noam Rathaus, Beyond Security, USA, TPC
Cliff Zhou, University Central Florida, USA, TPC
———————————————————————-
PC members
———————————————————————-
Anthony Arrott, Trends Micro, USA
Pierre-Marc Bureau, ESET, Canada
Mila Dalla Preda, Verona University, Italy
Saumya Debray, Arizona University, USA
Thomas Engel,  University of Luxembourg, Luxembourg
Jose M. Fernandez, Ecole Polytechnique de Montreal, Canada
Olivier Festor, INRIA Nancy Grand-Est, France
Brent Kang, North Carolina University, USA
Felix Leder, Bonn University, Germany
Bo Olsen, Kaspersky, USA
Jose Nazario, Arbor networks, USA
Phil Porras, SRI International, USA
Fred Raynal, Sogeti, France
Andrew Walenstein, Lafayette University, USA
Jeff Williams, Microsoft, USA
Yang Xiang, Deakin University, Australia
———————————————————————-
Publicity chair: Daniel Reynaud, Nancy University – Loria, France
Local chair: Matthieu Kaczmarek, INRIA Nancy Grand-Est, France

Written by dan

March 20, 2010 at 12:11

Posted in malware

Follow

Get every new post delivered to your Inbox.