Indefinite Studies

Academic ramblings about software security.

Creating a toy virtual machine with PyPy

with 15 comments

Here, you can use “virtual machine” as in “Java Virtual Machine”, not as in virtualization. We will play with the virtual machine described in the paper Tracing the Meta-Level: PyPy’s Tracing JIT Compiler by C.F. Bolz, A. Cuni, M. Fijalkowski and A. Rigo (it’s a great read by the way).

PyPy is a fascinating project, too complex to describe here. Among other things, PyPy can take any interpreter written in a subset of Python, translate it to C, and automatically generate a JIT compiler for this language. Does it sound too good to be true? Let’s try this.

  • grab PyPy source code
  • create the interpreter in pypy/translator/goal/target-toy.py with the following code:
</pre>
<pre style="text-align: justify;">import os, sys
import autopath
import py

# these are the opcodes for the interpreted language
JUMP_IF_A  = 1
MOV_A_R    = 2
MOV_R_A    = 3
ADD_R_TO_A = 4
DECR_A     = 5
RETURN_A   = 6

<em>from pypy.rlib.jit import JitDriver tlrjitdriver = JitDriver(greens = ['pc', 'bytecode'], reds = ['a', 'regs'])</em>

# the main interpreter loop
def interpret(bytecode, a):
   regs = [0] * 256
   pc = 0
   while True:
<em> tlrjitdriver.jit_merge_point(bytecode=bytecode, pc=pc, a=a, regs=regs) </em>       opcode = bytecode[pc]
       pc += 1
       if opcode == JUMP_IF_A:
           target = bytecode[pc]
           pc += 1
           if a:
<em> if target<pc: tlrjitdriver.can_enter_jit(bytecode=bytecode, pc=target, a=a, regs=regs) </em>               pc = target
       elif opcode == MOV_A_R:
           n = bytecode[pc]
           pc += 1
           regs[n] = a
       elif opcode == MOV_R_A:
           n = bytecode[pc]
           pc += 1
           a = regs[n]
       elif opcode == ADD_R_TO_A:
           n = bytecode[pc]
           pc += 1
           a += regs[n]
       elif opcode == DECR_A:
           a -= 1
       elif opcode == RETURN_A:
           return a

# __________  Entry point  __________
def entry_point(argv):
    # the program we want to interpret
    # it computes the square of its argument
    bytecode = [
        MOV_A_R,    0, # i = a
        MOV_A_R,    1, # copy of ’a’
        # 4:
        MOV_R_A,    0, # i--
        DECR_A,
        MOV_A_R,    0,
        MOV_R_A,    2, # res += a
        ADD_R_TO_A, 1,
        MOV_A_R,    2,
        MOV_R_A,    0, # if i!=0: goto 4
        JUMP_IF_A,  4,
        MOV_R_A,    2,
        RETURN_A
    ]
    result = interpret(bytecode, int(argv[1]))
    print result
    return 0

def jitpolicy(driver):
    from pypy.jit.metainterp.policy import JitPolicy
    return JitPolicy()

# _____ Define and setup target ___
def target(*args):
    return entry_point, None

# main function, if this script is called from the command line
if __name__ == '__main__':
    entry_point(sys.argv)</pre>
<div><span style="color: #000000;">

  • the lines in italic are the annotations for the JIT compiler. We need to give PyPy some insight on the interpreted language by declaring what is the instruction pointer (the green variables), the beginning of the dispatch loop and the backward branches (see the paper for full details).
  • check that you can execute this script correctly by running python target-toy.py 12, the output should be 144
  • PyPy can translate this script in C. For this, first install the dependencies and then run the following command: python translate.py target-toy.py
  • this should give you an executable target-toy-c, rename it target-toy-native and check that ./target-toy-native 12 yields 144
  • now we can ask PyPy to translate target-toy.py in C and generate a JIT compiler for it. For this, we just run python translate.py --opt=jit target-toy.py
    • note: the 64-bit backend of PyPy is not implemented yet, so if you are on a 64-bit system, you will have to struggle a bit. You will have to use a 32-bit Python interpreter (see my former post), create an alias for gcc -m32 (let’s call it gcc32) and then pass the option –cc=gcc32 to translate.py.
  • this should give you another target-toy-c executable, rename it to target-toy-jit and check that ./target-toy-jit 12 yields 144

Ok, everything is working, so let’s now see how all this performs by computing large squares:

~/pypy-trunk/pypy/translator/goal$ time python target-toy.py 1000000
1000000000000
real 0m18.637s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-native 1000000
-727379968
real 0m0.024s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-jit 1000000
-727379968
[...]
real 0m0.005s

The first run is the square program interpreted by our program, itself interpreted by the Python interpreter. Double interpretation is slow.

The second run is the square program interpreted by a native version of our interpret function. Interpretation by native code is ok.

The third run is the square program interpreted and JIT’ed on the fly. It’s super awesome :)

Final note: I must thank everybody from #pypy on freenode, for their help and resilience to stupid questions. Thanks guys!

About these ads

Written by dan

February 8, 2010 at 19:30

Posted in code analysis

15 Responses

Subscribe to comments with RSS.

  1. Interesting, short but nice example

    Ange

    February 8, 2010 at 23:03

  2. Small question, it appears that the native pypy runs are returning a different value at the end? Have I missed something?

    Alex Gaynor

    February 8, 2010 at 23:25

    • That’s right, because the C and C-jit versions are 32-bit binaries and the translation process probably does not take into account the overflow.

      dan

      February 9, 2010 at 08:50

  3. Will you be doing more articles about using PyPy?

    Donny Viszneki

    February 9, 2010 at 06:50

    • Probably. Right now I’m just playing with it, but I’d like to dive deeper.

      dan

      February 9, 2010 at 08:52

  4. It’s great to get the wrong answer quickly :>

    maht

    February 9, 2010 at 14:04

  5. Yep finally …

    adesnos

    February 11, 2010 at 09:15

  6. [...] Creating a toy virtual machine with PyPy. The PyPy project is pretty cool. [...]

    Beej's Bit Bucket » Linkfest

    February 11, 2010 at 19:15

  7. […] in Lua, LuaJit and Pypy.  (I think the original toy interpreter example comes from PyPy).   After some suggestions, he ended up with a new PyPy version which performed very well — close to what […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: