UPDATED: Don’t use set.add() in Python when running cProfile

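UPDATE: As pointed out in the comments, my original conclusion was wrong: it is an artifact of the profiler’s effect on performance. When timing with the time command instead, using Python 3.2 on OS X Lion (averaged over 3 runs), the version with set.add takes 1.37s versus 4.85s for the version with set union (2.17s and 7.12s respectively with Python 2.7.1). The profile output below already hints at this: cProfile recorded 10,000,008 function calls, essentially the 10**7 s.add calls in mister, and about half of mister’s 22.978s cumulative time is spent in the calls it makes, whereas hankey’s set([1]) constructions add almost nothing on top of its own time (5.278s versus 5.138s). Sorry for the “d’oh” moment. Take-home lesson: use a profiler to count events, not to time them.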
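The update’s measurements came from the shell’s time command. A comparable check can be done from Python with the standard-library timeit module, which runs the code with no profiler attached. This is a minimal sketch, not the benchmark used for the numbers above: N = 10**6 is an assumption chosen to keep runs short, and best_of is a hypothetical helper name.

```python
import timeit

N = 10**6  # assumption: smaller than the post's 10**7 so the sketch finishes quickly

def mister(n=N):
    s = set([1, 2, 3])
    for _ in range(n):
        s.add(1)           # insert in place; 1 is already a member
    return s

def hankey(n=N):
    s = set([1, 2, 3])
    for _ in range(n):
        s |= set([1])      # build a new one-element set, then union it in place
    return s

def best_of(func, repeat=3):
    """Return the fastest of `repeat` single calls to func, in seconds (no profiler)."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

if __name__ == "__main__":
    print(f"set.add : {best_of(mister):.3f} s")
    print(f"set |=  : {best_of(hankey):.3f} s")
```

Given the update’s numbers, set.add would be expected to come out ahead here; absolute timings depend on the machine and the Python version.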

Use the union operator instead; it’s 2 to 3x faster on my machine.

$ cat sets.py
def mister():
    s = set([1,2,3])
    for i in range(10**7):
        s.add(1)  # insert in place; 1 is already a member

def hankey():
    s = set([1,2,3])
    for i in range(10**7):
        s |= set([1])  # build a new one-element set, then union it in place

mister()
hankey()

$ python -m cProfile sets.py
10000008 function calls in 28.257 CPU seconds

Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   11.507   11.507   22.978   22.978 sets.py:1(mister)
        1    5.138    5.138    5.278    5.278 sets.py:6(hankey)
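Following the take-home lesson of using the profiler to count events, here is a minimal sketch of reading call counts back out of cProfile programmatically. It assumes Python 3 and reads the raw stats dictionary of pstats.Stats, whose key and value layout is a CPython implementation detail; call_counts is a hypothetical helper name.

```python
import cProfile
import pstats

def mister(n):
    s = set([1, 2, 3])
    for _ in range(n):
        s.add(1)
    return s

def call_counts(func, *args):
    """Run func(*args) under cProfile and map each function name to its call count."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # Keys are (filename, lineno, funcname); value[1] is the total number of calls.
    # This tuple layout is an implementation detail, not a documented interface.
    return {key[2]: value[1] for key, value in stats.stats.items()}

if __name__ == "__main__":
    for name, ncalls in sorted(call_counts(mister, 1000).items()):
        print(f"{ncalls:>8}  {name}")
```

The entry for set’s add method should report exactly n calls however much the profiler slows the loop down, which is the kind of number a profiler can be trusted with.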

(tested on Python 2.5, 2.6 and 3.1 on Windows and Linux)
