Creating a toy virtual machine with PyPy
Here, you can use “virtual machine” as in “Java Virtual Machine”, not as in virtualization. We will play with the virtual machine described in the paper Tracing the Meta-Level: PyPy’s Tracing JIT Compiler by C.F. Bolz, A. Cuni, M. Fijalkowski and A. Rigo (it’s a great read by the way).
PyPy is a fascinating project, too complex to describe here. Among other things, PyPy can take any interpreter written in a subset of Python, translate it to C, and automatically generate a JIT compiler for this language. Does it sound too good to be true? Let’s try this.
- grab PyPy source code
- create the interpreter in pypy/translator/goal/target-toy.py with the following code:
</pre> <pre style="text-align: justify;">import os, sys import autopath import py # these are the opcodes for the interpreted language JUMP_IF_A = 1 MOV_A_R = 2 MOV_R_A = 3 ADD_R_TO_A = 4 DECR_A = 5 RETURN_A = 6 <em>from pypy.rlib.jit import JitDriver tlrjitdriver = JitDriver(greens = ['pc', 'bytecode'], reds = ['a', 'regs'])</em> # the main interpreter loop def interpret(bytecode, a): regs =  * 256 pc = 0 while True: <em> tlrjitdriver.jit_merge_point(bytecode=bytecode, pc=pc, a=a, regs=regs) </em> opcode = bytecode[pc] pc += 1 if opcode == JUMP_IF_A: target = bytecode[pc] pc += 1 if a: <em> if target<pc: tlrjitdriver.can_enter_jit(bytecode=bytecode, pc=target, a=a, regs=regs) </em> pc = target elif opcode == MOV_A_R: n = bytecode[pc] pc += 1 regs[n] = a elif opcode == MOV_R_A: n = bytecode[pc] pc += 1 a = regs[n] elif opcode == ADD_R_TO_A: n = bytecode[pc] pc += 1 a += regs[n] elif opcode == DECR_A: a -= 1 elif opcode == RETURN_A: return a # __________ Entry point __________ def entry_point(argv): # the program we want to interpret # it computes the square of its argument bytecode = [ MOV_A_R, 0, # i = a MOV_A_R, 1, # copy of ’a’ # 4: MOV_R_A, 0, # i-- DECR_A, MOV_A_R, 0, MOV_R_A, 2, # res += a ADD_R_TO_A, 1, MOV_A_R, 2, MOV_R_A, 0, # if i!=0: goto 4 JUMP_IF_A, 4, MOV_R_A, 2, RETURN_A ] result = interpret(bytecode, int(argv)) print result return 0 def jitpolicy(driver): from pypy.jit.metainterp.policy import JitPolicy return JitPolicy() # _____ Define and setup target ___ def target(*args): return entry_point, None # main function, if this script is called from the command line if __name__ == '__main__': entry_point(sys.argv)</pre> <div><span style="color: #000000;">
- the lines in italic are the annotations for the JIT compiler. We need to give PyPy some insight on the interpreted language by declaring what is the instruction pointer (the green variables), the beginning of the dispatch loop and the backward branches (see the paper for full details).
- check that you can execute this script correctly by running
python target-toy.py 12, the output should be 144
- PyPy can translate this script in C. For this, first install the dependencies and then run the following command:
python translate.py target-toy.py
- this should give you an executable target-toy-c, rename it target-toy-native and check that
./target-toy-native 12yields 144
- now we can ask PyPy to translate target-toy.py in C and generate a JIT compiler for it. For this, we just run
python translate.py --opt=jit target-toy.py
- note: the 64-bit backend of PyPy is not implemented yet, so if you are on a 64-bit system, you will have to struggle a bit. You will have to use a 32-bit Python interpreter (see my former post), create an alias for gcc -m32 (let’s call it gcc32) and then pass the option –cc=gcc32 to translate.py.
- this should give you another target-toy-c executable, rename it to target-toy-jit and check that
./target-toy-jit 12 yields 144
Ok, everything is working, so let’s now see how all this performs by computing large squares:
~/pypy-trunk/pypy/translator/goal$ time python target-toy.py 1000000
~/pypy-trunk/pypy/translator/goal$ time ./target-toy-native 1000000
~/pypy-trunk/pypy/translator/goal$ time ./target-toy-jit 1000000
The first run is the square program interpreted by our program, itself interpreted by the Python interpreter. Double interpretation is slow.
The second run is the square program interpreted by a native version of our interpret function. Interpretation by native code is ok.
The third run is the square program interpreted and JIT’ed on the fly. It’s super awesome :)
Final note: I must thank everybody from #pypy on freenode, for their help and resilience to stupid questions. Thanks guys!