Profiler Evidence (That I am an Idiot)

In this post, I wrote:

I am intentionally not using for loops. For loops in Python are inefficient. Were I to use range(const.WELL_H), the interpreter would allocate a list of all numbers in the range. xrange is more efficient for memory, but incurs function call overhead on each iteration.

This has bugged me ever since, as I didn’t have any actual evidence as to whether Python’s range-based for loop is actually inefficient. (Spoiler: it’s not.)

Today I wrote some simple tests to profile:

# -*- coding: utf-8 -*-

import cProfile as profile

def for_range(n):
    for i in range(n):
        for j in range(n):
            # just some random calculations to make the loops actually do something
            i * j + j * i / 54321 

def for_xrange(n):    
    for i in xrange(n):
        for j in xrange(n):
            i * j + j * i / 54321 

def as_while(n):
    i = 0
    while i < n:
        j = 0
        while j < n:
            i * j + j * i / 54321 
            j += 1            
        i += 1


if __name__ == '__main__':
    
    # warm-up

    for i in xrange(10):
        for_range(100)
        for_xrange(100)
        as_while(100)

    profile.run("for_range(100)")
    profile.run("for_xrange(100)")
    profile.run("as_while(100)")

And ran it (n = 100):

> python .\looptests.py
       104 function calls in 0.002 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    0.002    0.002 <string>:1(<module>)
      1    0.002    0.002    0.002    0.002 looptests.py:5(for_range)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    101    0.000    0.000    0.000    0.000 {range}


       3 function calls in 0.002 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    0.002    0.002 <string>:1(<module>)
      1    0.002    0.002    0.002    0.002 looptests.py:11(for_xrange)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


       3 function calls in 0.003 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    0.003    0.003 <string>:1(<module>)
      1    0.003    0.003    0.003    0.003 looptests.py:17(as_while)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Already it looks like my theory was wrong, but let’s get a clearer result by bumping n up to 10000:

 > python .\looptests.py
       10004 function calls in 22.219 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000   22.219   22.219 <string>:1(<module>)
      1   21.430   21.430   22.219   22.219 looptests.py:5(for_range)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  10001    0.789    0.000    0.789    0.000 {range}


       3 function calls in 22.508 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000   22.508   22.508 <string>:1(<module>)
      1   22.508   22.508   22.508   22.508 looptests.py:11(for_xrange)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


       3 function calls in 30.369 seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000   30.369   30.369 <string>:1(<module>)
      1   30.369   30.369   30.369   30.369 looptests.py:17(as_while)
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Well, then. As we can see, range() narrowly edges out xrange() (which, strictly speaking, is a lazy sequence object rather than a generator) and handily beats the manual while loop. Indeed, the “optimized” while loop performs terribly by comparison, roughly a third slower than range(). I don’t have Heapy set up to check, but I’m fairly certain we’d see xrange() come out with the most efficient memory usage.
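
The memory half of that claim can at least be sanity-checked without Heapy, using the standard library’s sys.getsizeof. This is a sketch in Python 3, where range() is lazy like Python 2’s xrange(), and list(range(n)) stands in for the eager list that Python 2’s range() built:

```python
import sys

n = 1000000

# In Python 3, range() is a lazy sequence, like Python 2's xrange();
# list(range(n)) materializes the full list, as Python 2's range() did.
lazy = range(n)
eager = list(range(n))

# The lazy range object is a small, constant-size struct regardless of n;
# the eager list holds n pointers, so its size grows linearly with n.
print(sys.getsizeof(lazy))
print(sys.getsizeof(eager))
```

Note that getsizeof reports only the container itself (the list’s pointer array, not the int objects it references), but the asymmetry is already stark: the lazy object stays a few dozen bytes no matter how large n gets.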

This is not, overall, a surprising result. (The only thing that really surprises me is that xrange() doesn’t appear to generate function calls in the profile. The likely explanation is that iterating an xrange happens through the C-level iterator protocol, which cProfile doesn’t record as calls; each range(n) invocation, by contrast, is a visible builtin call, which is why {range} shows up 10001 times.) But it serves as a reminder that even if you think you understand a system, you cannot accurately identify bottlenecks and inefficiencies without actual data.
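
One caveat worth checking is that cProfile itself adds per-call overhead, so timings taken under the profiler can distort small differences. A minimal cross-check with timeit, which measures plain wall time, looks like this; it is a sketch in Python 3, where range() is already lazy, so only the for-versus-while comparison carries over:

```python
import timeit

# Same loop bodies as the script above, re-run without the profiler.
def for_range(n):
    for i in range(n):
        for j in range(n):
            i * j + j * i / 54321

def as_while(n):
    i = 0
    while i < n:
        j = 0
        while j < n:
            i * j + j * i / 54321
            j += 1
        i += 1

for fn in (for_range, as_while):
    # number=1 runs the nested loops once; repeat three times and keep the best
    best = min(timeit.repeat(lambda: fn(300), number=1, repeat=3))
    print("%-10s %.4f s" % (fn.__name__, best))
```

Taking the minimum of several repeats is the usual timeit idiom, since the fastest run is the one least perturbed by other activity on the machine.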
