Indefinite Studies

Academic ramblings about software security.

UPDATED: Don’t use set.add() in python when running cProfile

with 9 comments

UPDATE: As pointed out in the comments, my original conclusion was wrong and is due to the effect of the profiler on the performance. When using the time command instead with Python 3.2 on OS X Lion (averaged over 3 runs), the version with set.add takes 1.37s versus 4.85s for the version with set union (results are 2.17s and 7.12s respectively with Python 2.7.1). Sorry for the “d’oh” moment. Take-home lesson: use a profiler to count events, not to time them.

Use the union operator instead, it’s 2 to 3x faster on my machine.

$ cat
def mister():
    s = set([1,2,3])
    for i in range(10**7):

def hankey():
    s = set([1,2,3])
    for i in range(10**7):
        s |= set([1])

$ python -m cProfile
10000008 function calls in 28.257 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
 1 11.507 11.507 22.978 22.978 1 5.138 5.138 5.278 5.278

(tested on Python 2.5, 2.6 and 3.1 on Windows and Linux)

About these ads

Written by dan

March 11, 2009 at 18:18

Posted in python

Tagged with , , , ,

9 Responses

Subscribe to comments with RSS.

  1. On a Celeron M, 1.5 GHz, Python 2.6:
    ncalls tottime percall cumtime percall filename:lineno(function)
    1 15.340 15.340 27.984 27.984
    1 12.124 12.124 12.574 12.574

    This is very odd, because for the union it also has to create a set() object, which should take more time by itself. Any idea why this happens?


    March 12, 2009 at 13:00

    • Hi vlad,

      Sure, I found this odd enough to post it ^^ The object oriented programmer in me told me “don’t create objects, rather use methods in existing objects”. I guess this is happening because sets are native objects for the interpreter and calling a function has a more important overhead.

      This is a not particularly educated guess though, I might be plain wrong.


      March 12, 2009 at 13:26

  2. 9 40 LOAD_FAST 0 (s)
    43 LOAD_GLOBAL 0 (set)
    46 LOAD_CONST 1 (1)
    49 BUILD_LIST 1
    56 STORE_FAST 0 (s)

    4 40 LOAD_FAST 0 (s)
    43 LOAD_ATTR 2 (add)
    46 LOAD_CONST 1 (1)
    52 POP_TOP

    I guess __dict__ lookup(LOAD_ATTR) is VERY slow


    November 29, 2009 at 18:14

    • I realized I did a bad guess, calling overhead is the point


      November 29, 2009 at 18:40

  3. Really old post I know, but in case somebody else runs into this site.

    If you aren’t running the profiler on this code, the opposite is true. I timed the two versions using time python … and found that. The .add() is registered by the profiler and is recorded where as the |= is not. The difference you are measuring is the difference in overhead of running the profiler. The actual cost of adding an element to a set is so small it doesn’t really register.

    Winston Ewert

    February 24, 2011 at 10:18

    • Hi, thanks for the update. As far as I remember, I actually noticed a speed-up in the overall application when using |=, but it is quite possible that it depends on a specific version of the interpreter.


      February 24, 2011 at 10:23

      • Ok. I have a very definite speed advantage in favor of add when not run under profiling. It is certainly quite possible the interpreter has changed since you wrote this.

        Winston Ewert

        February 24, 2011 at 13:04

  4. Winston Ewert is right on, the title on this article is completely deceiving. It should be “Don’t use set.add() in python when running profiler”


    September 26, 2011 at 12:12

    • Okay, I finally re-ran the tests without the profiler and it turns out you’re right. I have updated the post to reflect this. Thanks to you and Winston for pointing this out.


      September 26, 2011 at 12:33

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

%d bloggers like this: