Python Idioms: + versus join
I was told to use ''.join([]) instead of the ‘+’ operator in Python. However, a (bad) benchmark showed ‘+’ to be a lot faster. I think it is reasonable to say that ‘+’ is faster in some cases. Here is my test:
def test0(b, c, d, e, f):
    for i in xrange(10**7):
        a = b + c + d + e + f
    print(a)

def test1():
    l = ['hello ', 'world ', 'with ', '+ ', 'operator']
    for i in xrange(10**7):
        a = ''
        for j in l:
            a += j
    print(a)

def test2():
    l = ['hello', 'world', 'with', 'join', 'function']
    for i in xrange(10**7):
        a = ' '.join(l)
    print(a)

test0('hello ', 'world ', 'with ', '+ ', 'operator')
test1()
test2()
And the result of the test:
$ python -m cProfile -s cumulative test.py
hello world with + operator
hello world with + operator
hello world with join function
         10000007 function calls in 14.968 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   ...
        1    6.838    6.838    6.838    6.838 test.py:7(test1)
        1    2.683    2.683    5.113    5.113 test.py:15(test2)
        1    3.016    3.016    3.016    3.016 test.py:2(test0)
So clearly the worst way of using ‘+’ is to iterate over a list of strings and accumulate the concatenations in a variable (function test1). But there is nothing wrong with performing several ‘+’ operations in a single expression and then storing the result in a variable (function test0).
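One reason the benchmark is "bad": cProfile records an event for every function call, and the 10000007 calls in the report are almost all calls to join, while ‘+’ runs as plain bytecode and is not recorded. The join version probably pays profiler overhead that the other two do not. A fairer comparison can be sketched with the timeit module. This is only a sketch, written for Python 3. The function names chained_plus, loop_plus, join_words and benchmark are made up for this example, and all three variants build the same string so that only the concatenation method differs.

```python
import timeit

WORDS = ['hello ', 'world ', 'with ', '+ ', 'operator']


def chained_plus(b, c, d, e, f):
    # One expression: four additions, one result (like test0).
    return b + c + d + e + f


def loop_plus(words):
    # Accumulate with += inside a loop (like test1).
    a = ''
    for w in words:
        a += w
    return a


def join_words(words):
    # A single join call (like test2, but with an empty separator
    # so that all three functions produce the same string).
    return ''.join(words)


def benchmark(number=100000):
    # Map each label to the elapsed seconds for `number` calls.
    # Every variant goes through one lambda and one function call,
    # so the call overhead is the same for all three.
    return {
        'chained +': timeit.timeit(lambda: chained_plus(*WORDS), number=number),
        'loop +=': timeit.timeit(lambda: loop_plus(WORDS), number=number),
        'join': timeit.timeit(lambda: join_words(WORDS), number=number),
    }


if __name__ == '__main__':
    # Print the variants from fastest to slowest.
    for label, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print('%-10s %.3f s' % (label, seconds))
```

The absolute numbers depend on the interpreter version and the machine, so the ordering should be checked locally rather than assumed.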
A quick look at the bytecode of the function confirms this intuition. We can see a series of LOADs and ADDs and only one STORE:
>>> import dis
>>> dis.dis(test0)
...
     19 LOAD_FAST      0 (b)
     22 LOAD_FAST      1 (c)
     25 BINARY_ADD
     26 LOAD_FAST      2 (d)
     29 BINARY_ADD
     30 LOAD_FAST      3 (e)
     33 BINARY_ADD
     34 LOAD_FAST      4 (f)
     37 BINARY_ADD
     38 STORE_FAST     6 (a)
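Opcode names and offsets change between interpreter versions (newer CPython releases no longer emit BINARY_ADD, for instance), so the listing above will not look the same everywhere. As a rough way to repeat the check on Python 3, the instructions can be counted programmatically instead of read by eye. This is a minimal sketch. chained_plus and count_ops are hypothetical helpers, and matching opcodes by name prefix or substring is an assumption meant to cover several CPython versions.

```python
import dis


def chained_plus(b, c, d, e, f):
    # Same shape as the body of test0: four additions, one assignment.
    a = b + c + d + e + f
    return a


def count_ops(func):
    # Collect the opcode names of the compiled function.
    names = [ins.opname for ins in dis.get_instructions(func)]
    # Additions appear as BINARY_ADD on older interpreters and as
    # BINARY_OP on newer ones, so match on the common prefix.
    adds = sum(1 for name in names if name.startswith('BINARY_'))
    # Match by substring so that fused store instructions used by
    # some newer interpreters are counted as well.
    stores = sum(1 for name in names if 'STORE_FAST' in name)
    return adds, stores


if __name__ == '__main__':
    print(count_ops(chained_plus))
```

If the counts come out as four additions and a single store, the chained expression still compiles to the same pattern as in the listing above.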
The test was performed with Python 2.5.4 on Debian sid. It would be nice to see whether the results hold for newer versions of the Python interpreter.