Creating a toy virtual machine with PyPy

Here, “virtual machine” is meant as in “Java Virtual Machine”, not as in virtualization. We will play with the virtual machine described in the paper Tracing the Meta-Level: PyPy’s Tracing JIT Compiler by C.F. Bolz, A. Cuni, M. Fijalkowski and A. Rigo (it’s a great read, by the way).

PyPy is a fascinating project, too complex to describe here. Among other things, PyPy can take any interpreter written in a subset of Python, translate it to C, and automatically generate a JIT compiler for this language. Does it sound too good to be true? Let’s try this.

  • grab the PyPy source code
  • create the interpreter in pypy/translator/goal/target-toy.py with the following code:
import os, sys
import autopath
import py

# these are the opcodes for the interpreted language
JUMP_IF_A  = 1
MOV_A_R    = 2
MOV_R_A    = 3
ADD_R_TO_A = 4
DECR_A     = 5
RETURN_A   = 6

from pypy.rlib.jit import JitDriver
tlrjitdriver = JitDriver(greens = ['pc', 'bytecode'], reds = ['a', 'regs'])

# the main interpreter loop
def interpret(bytecode, a):
   regs = [0] * 256
   pc = 0
   while True:
       tlrjitdriver.jit_merge_point(bytecode=bytecode, pc=pc, a=a, regs=regs)
       opcode = bytecode[pc]
       pc += 1
       if opcode == JUMP_IF_A:
           target = bytecode[pc]
           pc += 1
           if a:
               if target < pc:
                   tlrjitdriver.can_enter_jit(bytecode=bytecode, pc=target, a=a, regs=regs)
               pc = target
       elif opcode == MOV_A_R:
           n = bytecode[pc]
           pc += 1
           regs[n] = a
       elif opcode == MOV_R_A:
           n = bytecode[pc]
           pc += 1
           a = regs[n]
       elif opcode == ADD_R_TO_A:
           n = bytecode[pc]
           pc += 1
           a += regs[n]
       elif opcode == DECR_A:
           a -= 1
       elif opcode == RETURN_A:
           return a

# __________  Entry point  __________
def entry_point(argv):
    # the program we want to interpret
    # it computes the square of its argument
    bytecode = [
        MOV_A_R,    0, # i = a
        MOV_A_R,    1, # copy of 'a'
        # 4:
        MOV_R_A,    0, # i--
        DECR_A,
        MOV_A_R,    0,
        MOV_R_A,    2, # res += a
        ADD_R_TO_A, 1,
        MOV_A_R,    2,
        MOV_R_A,    0, # if i!=0: goto 4
        JUMP_IF_A,  4,
        MOV_R_A,    2,
        RETURN_A
    ]
    result = interpret(bytecode, int(argv[1]))
    print result
    return 0

def jitpolicy(driver):
    from pypy.jit.metainterp.policy import JitPolicy
    return JitPolicy()

# _____ Define and setup target ___
def target(*args):
    return entry_point, None

# main function, if this script is called from the command line
if __name__ == '__main__':
    entry_point(sys.argv)
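
To make the bytecode listing easier to follow, here is the same program written as plain Python (the function name square_by_addition is mine, purely for illustration). It computes the square of a by adding a to an accumulator a times, which is also why, like the bytecode version, it only behaves sensibly for a >= 1:

# plain-Python equivalent of the bytecode program in entry_point
# (illustration only, not part of target-toy.py)
def square_by_addition(a):
    i = a            # regs[0]
    copy = a         # regs[1]
    res = 0          # regs[2]
    while True:
        i -= 1           # MOV_R_A 0; DECR_A; MOV_A_R 0
        res += copy      # MOV_R_A 2; ADD_R_TO_A 1; MOV_A_R 2
        if not i:        # MOV_R_A 0; JUMP_IF_A 4
            return res   # MOV_R_A 2; RETURN_A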

  • the lines that mention JitDriver and tlrjitdriver are the annotations for the JIT compiler. We need to give PyPy some insight into the interpreted language by declaring which variables make up the instruction pointer (the green variables), where the dispatch loop begins (jit_merge_point) and where the backward branches are (can_enter_jit); see the paper for full details
  • check that you can execute this script correctly by running python target-toy.py 12; the output should be 144 (a slightly more thorough sanity check is sketched after this list)
  • PyPy can translate this script to C. For this, first install the dependencies and then run the following command: python translate.py target-toy.py
  • this should give you an executable target-toy-c; rename it to target-toy-native and check that ./target-toy-native 12 yields 144
  • now we can ask PyPy to translate target-toy.py to C and generate a JIT compiler for it. For this, we just run python translate.py --opt=jit target-toy.py
    • note: the 64-bit backend of PyPy is not implemented yet, so if you are on a 64-bit system you will have to struggle a bit: use a 32-bit Python interpreter (see my earlier post), create an alias for gcc -m32 (let's call it gcc32) and then pass the option --cc=gcc32 to translate.py
  • this should give you another target-toy-c executable; rename it to target-toy-jit and check that ./target-toy-jit 12 yields 144
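
Here is the sanity check referred to above: a short loop one could append to the end of target-toy.py to exercise interpret on a range of inputs before translating anything (the names SQUARE_BYTECODE and selftest are mine, not part of the original script). When the script is run with plain Python, the JitDriver hints behave as no-ops, so this needs nothing beyond what is already set up:

# hypothetical sanity check, appended to target-toy.py (names are mine)
SQUARE_BYTECODE = [
    MOV_A_R,    0,
    MOV_A_R,    1,
    # 4:
    MOV_R_A,    0,
    DECR_A,
    MOV_A_R,    0,
    MOV_R_A,    2,
    ADD_R_TO_A, 1,
    MOV_A_R,    2,
    MOV_R_A,    0,
    JUMP_IF_A,  4,
    MOV_R_A,    2,
    RETURN_A
]

def selftest():
    # compare the interpreted result against Python's own arithmetic
    for n in range(1, 50):
        assert interpret(SQUARE_BYTECODE, n) == n * n
    print 'selftest passed'

Calling selftest() from the __main__ block should print "selftest passed".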

Ok, everything is working, so let’s now see how all this performs by computing large squares:

~/pypy-trunk/pypy/translator/goal$ time python target-toy.py 1000000
1000000000000
real 0m18.637s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-native 1000000
-727379968
real 0m0.024s

~/pypy-trunk/pypy/translator/goal$ time ./target-toy-jit 1000000
-727379968
[...]
real 0m0.005s

The first run is the square program interpreted by our program, itself interpreted by the Python interpreter. Double interpretation is slow.

The second run is the square program interpreted by a native version of our interpret function. Interpretation by native code is ok.

The third run is the square program interpreted and JIT’ed on the fly. It’s super awesome :)
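
A side note on the negative numbers printed by the two binaries: the translated code works with machine-sized integers, and since the executables were (presumably) built as 32-bit binaries (see the gcc -m32 note above), 1000000 squared overflows and wraps around. A quick check in plain Python (my own snippet, not part of the post) reproduces the printed value:

# reinterpret 1000000**2 as a signed 32-bit integer (two's-complement wraparound)
n = 1000000 ** 2                      # 1000000000000
wrapped = (n + 2**31) % 2**32 - 2**31
print wrapped                         # -727379968, matching the output above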

Final note: I must thank everybody from #pypy on freenode for their help and their patience with stupid questions. Thanks guys!
