Liang Nuren – Failing In So Many Ways

Python and Pypy, Psycopg2 vs Psycopg2cffi

I’ve been using Python with psycopg2 pretty much full time for a couple of years now.  However, today I was browsing through the documentation and realized that I didn’t know the performance penalty for various operations and cursor factories.  I’ve traditionally assumed that the default tuple-based cursor must have a pretty significant performance benefit to make up for its lack of features, but the performance penalty for DictCursor never felt all that bad, so I’ve just rolled with it.

Another habit I’ve taken up recently is trying to get processing underway sooner while minimizing the memory footprint.  In a lot of applications this means avoiding cursor.fetchall() in favor of iteratively fetching from the cursor.  I didn’t have a quantitative measurement for the performance impact of this either.  For the curious, the approach looks something like this (check below for the gist):

with conn.cursor() as cur:
    for row in cur:
        do_something(row)

So today at work I resolved that I’d spend my BART ride home writing a quick benchmark to test various interesting cursor factories, as well as fetchall() vs [ x for x in cursor ].  Once the testing got underway I realized that I could run the same code to test the new psycopg2cffi module as well as pypy-2.0.2 (with psycopg2cffi).  These are the results for fetching 10000 rows with 8 columns, 1000 times, on my computer:

# 1k calls, cume duration
# 10k rows fetched
# +--------------------+----------------+--------------------+-------------------------+
# |   Default Cursor   | psycopg2/2.7.3 | psycopg2cffi/2.7.3 | psycopg2cffi/pypy-2.0.2 |
# +====================+================+====================+=========================+
# | fetch_results      | 18.072         | 18.076             | 32.817                  |
# +--------------------+----------------+--------------------+-------------------------+
# | fetch_iter_results | 20.560         | 20.691             | 33.817                  |
# +--------------------+----------------+--------------------+-------------------------+
# 
# +--------------------+----------------+--------------------+-------------------------+
# |     DictCursor     | psycopg2/2.7.3 | psycopg2cffi/2.7.3 | psycopg2cffi/pypy-2.0.2 |
# +====================+================+====================+=========================+
# | fetch_results      | 18.405         | 18.377             | 32.434                  |
# +--------------------+----------------+--------------------+-------------------------+
# | fetch_iter_results | 19.563         | 19.674             | 33.265                  |
# +--------------------+----------------+--------------------+-------------------------+
# 
# +--------------------+----------------+--------------------+-------------------------+
# |  NamedTupleCursor  | psycopg2/2.7.3 | psycopg2cffi/2.7.3 | psycopg2cffi/pypy-2.0.2 |
# +====================+================+====================+=========================+
# | fetch_results      | 18.296         | 18.291             | 32.158                  |
# +--------------------+----------------+--------------------+-------------------------+
# | fetch_iter_results | 19.599         | 19.650             | 32.999                  |
# +--------------------+----------------+--------------------+-------------------------+
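
For reference, the two access patterns being timed boil down to something like this. This is a minimal sketch rather than the actual benchmark file (the table name, connection, and query are illustrative):

import psycopg2

conn = psycopg2.connect("dbname=test")

def fetch_results(cursor_factory=None):
    # Materialize the whole result set in one go.
    with conn.cursor(cursor_factory=cursor_factory) as cur:
        cur.execute("SELECT * FROM some_table LIMIT 10000")
        return cur.fetchall()

def fetch_iter_results(cursor_factory=None):
    # Pull rows from the cursor one at a time instead.
    with conn.cursor(cursor_factory=cursor_factory) as cur:
        cur.execute("SELECT * FROM some_table LIMIT 10000")
        return [row for row in cur]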

The thing that surprised me most about these results was that iterating across the cursor wasn’t really that much more expensive than fetchall().  I suspect that the cost increases with network latency, but at least some of that cost will be paid with fetchall() as well.  I think it’s a good idea to set up a more rigorous benchmark before calling it “low cost”, but either way I really appreciate the ability to start operating on a smaller dataset while keeping the memory footprint low.

I was also pretty surprised by how small a performance penalty DictCursor and NamedTupleCursor carry.  It probably shouldn’t surprise me much, considering that network latency and data transfer should trivialize object-creation costs.  I guess the takeaway from this is: if you’re going to go through all the effort of going to the database to get a piece of data, make sure you return it in a way that makes sense and is convenient to manipulate.

I was, unfortunately, not terribly surprised by Pypy’s poor performance here.  While this application is ostensibly a tight loop pulling from the database and creating objects, it doesn’t feel like it plays to what I think are Pypy’s best strengths.  I’ve had my best luck using Pypy to run specialized applications that spend most of their time in tight computation loops.

For the curious, the benchmarking file is here (along with the above chart).


Filed under: Data Warehousing, Software Development

There Is No Excuse For Bad Programming

I’d like to take a moment to discuss a common problem in software development. It comes up everywhere. That problem is the problem of what to do. No really, hear me out on this. Suppose we want to get a value based on the input to a function. We might build a function that looks like this:

def if_block_func(v):
    if v == 'some string1': x = 1
    elif v == 'some string2': x = 2
    elif v == 'some string3': x = 3
    elif v == 'some string4': x = 4
    elif v == 'some string5': x = 5
    elif v == 'some string6': x = 6
    elif v == 'some string7': x = 7
    else: x = 8

    return x

Of course, our piece of code works and it’s relatively fast. However, it’s kinda ugly and painful to maintain. I’m pretty sure we can come up with a more maintainable piece of code, even if it’s slightly slower. Seeing as how we’re using Python for this example, we might come up with something clever like this:


def recreate_dict_func(v):
    d = {
        'some string1' : 1,
        'some string2' : 2,
        'some string3' : 3,
        'some string4' : 4,
        'some string5' : 5,
        'some string6' : 6,
        'some string7' : 7,
    }
    return d.get(v, 8)

Of course, this only works if the requirements are really simple and a single value maps to a single value. What if we needed something slightly more complicated? Maybe the return value for ‘some string4’ depends on several other variables. Well, that’s certainly a much stronger case for if/else. However, we might be able to adjust the dictionary-based approach by doing something more clever like this:


import random

def recreate_lambda_dict_func(v):
    dl = {
        'some string1' : lambda: 1,
        'some string2' : lambda: 2,
        'some string3' : lambda: 3,
        'some string4' : lambda: 10 if random.choice(range(5)) % 3 == 0 else 4,
        'some string5' : lambda: 5,
        'some string6' : lambda: 6,
        'some string7' : lambda: 7,
    }
    return dl.get(v, lambda: 8)()

Of course, the penalty of working in a higher-level language is generally a performance penalty, and it’s pretty easy to put together a benchmark that shows that penalty here. However, sometimes programmer time is worth that penalty. But sometimes it isn’t. Maybe there’s a way we could improve these functions to be even faster? Maybe if we instantiated fewer objects the method would be faster.


external_dict = {
    'some string1' : 1,
    'some string2' : 2,
    'some string3' : 3,
    'some string4' : 4,
    'some string5' : 5,
    'some string6' : 6,
    'some string7' : 7,
}

def external_dict_func(v):
    global external_dict
    return external_dict.get(v, 8)

external_lambda_dict = {
    'some string1' : lambda: 1,
    'some string2' : lambda: 2,
    'some string3' : lambda: 3,
    'some string4' : lambda: 4,
    'some string5' : lambda: 5,
    'some string6' : lambda: 6,
    'some string7' : lambda: 7,
}

def external_lambda_dict_func(v):
    global external_lambda_dict
    return external_lambda_dict.get(v, lambda: 8)()

So let’s put together this benchmark that shows how bad the performance penalty is. However, to keep things even, let’s just have the lambda function always return 4 (just because the rest of the code exhibits that behavior). Let’s throw a @benchmark decorator on those methods and run each one a million times.
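
The @benchmark decorator itself isn’t shown here; a minimal sketch of such a timing decorator might look like this (illustrative only, not necessarily the exact implementation used for the numbers below):

import functools
import time

def benchmark(func):
    # Accumulate total duration and call count on the wrapper so the
    # results can report "Sum Duration" and "Calls".
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.calls += 1
            wrapper.duration += time.time() - start

    wrapper.calls = 0
    wrapper.duration = 0.0
    return wrapper

Here are the results: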


~ $ python dict_vs_ifs.py
+---------------------------+--------------+---------+
| Function                  | Sum Duration | Calls   |
+===========================+==============+=========+
| external_dict_func        | 0.541        | 1000000 |
+---------------------------+--------------+---------+
| if_block_func             | 0.682        | 1000000 |
+---------------------------+--------------+---------+
| external_lambda_dict_func | 0.727        | 1000000 |
+---------------------------+--------------+---------+
| recreate_dict_func        | 1.258        | 1000000 |
+---------------------------+--------------+---------+
| recreate_lambda_dict_func | 1.746        | 1000000 |
+---------------------------+--------------+---------+

Ok, that’s pretty interesting. The external dictionary function is actually the fastest (almost 25% faster!) – while being the most maintainable. That’s absolutely wild. I know, I know – you’re going to tell me that this doesn’t represent “the real world”. After all, we’re frequently working with classes and you’ve got to do the instance dereference to get the dictionary and .. ok. Sure. Let’s modify the code such that we have that layer of dereferencing. It’s certain to add some complexity and maybe our naive if/else block will be the most performant option again. While we’re at it, we can run a variant where the variable we’re switching on is an instance variable.
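
The modified code isn’t reproduced here, but the class-attribute variant looks roughly like this (a sketch with illustrative names, not the exact benchmark code):

class Dispatcher(object):
    # The lookup table now lives on the class, so every call pays an extra
    # attribute dereference through self.
    LOOKUP = {
        'some string1' : 1,
        'some string2' : 2,
        'some string3' : 3,
        'some string4' : 4,
        'some string5' : 5,
        'some string6' : 6,
        'some string7' : 7,
    }

    def __init__(self, v):
        self.v = v  # used by the instance-variable variant below

    def class_dict_func(self, v):
        return self.LOOKUP.get(v, 8)

    def instance_var_func(self):
        # Same lookup, but the value we switch on is an instance variable.
        return self.LOOKUP.get(self.v, 8)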

Class attribute lookups

Benchmark Results
+---------------------------+--------------+---------+
| Function                  | Sum Duration | Calls   |
+===========================+==============+=========+
| external_dict_func        | 0.609        | 1000000 |
+---------------------------+--------------+---------+
| if_block_func             | 0.707        | 1000000 |
+---------------------------+--------------+---------+
| external_lambda_dict_func | 0.807        | 1000000 |
+---------------------------+--------------+---------+
| recreate_dict_func        | 1.314        | 1000000 |
+---------------------------+--------------+---------+
| recreate_lambda_dict_func | 1.749        | 1000000 |
+---------------------------+--------------+---------+

Instance Variable

Benchmark Results
+---------------------------+--------------+---------+
| Function                  | Sum Duration | Calls   |
+===========================+==============+=========+
| external_dict_func        | 0.645        | 1000000 |
+---------------------------+--------------+---------+
| if_block_func             | 0.851        | 1000000 |
+---------------------------+--------------+---------+
| external_lambda_dict_func | 0.854        | 1000000 |
+---------------------------+--------------+---------+
| recreate_dict_func        | 1.342        | 1000000 |
+---------------------------+--------------+---------+
| recreate_lambda_dict_func | 1.821        | 1000000 |
+---------------------------+--------------+---------+

As we can see, the dictionary-based approach continues to excel at both readability and performance. As situations come closer to real life, the lambda approach also begins to look better than if/else blocks. Basically, there is no excuse for writing bad code.

Note: I apologize for the shitty code formatting.  WordPress has deleted it like 12 times and I’m done fighting it.  Maybe tomorrow.

Filed under: Software Development

Primitive Class Variables in Python

I recently ran across something peculiar in my Python development.  I was writing some builders for complex JSON objects and decided to move away from random.randint and simply use a class variable.  I had some code that looked something like this:

class FooBuilder(object):
    def __init__(self, **kwargs):
        options = {
            "obj_id" : random.randint(1, 10000000),
        }

        options.update(kwargs)

I know, it’s not a great design and I could expect some failures due to random number collisions. It was also a bit slower than I really wanted, so I modified it to look like this:

class FooBuilder(object):
    next_obj_id = 0
    def __init__(self, **kwargs):
        options = {
            "obj_id" : self.next_obj_id,
        }

        options.update(kwargs)    
        self.next_obj_id += 1

However, it had a peculiar property: all my tests failed because the class variable never appeared to update. I did some experimenting and found that lists, dictionaries, and pretty much everything but ‘native’ (immutable) types worked exactly as expected, because mutating a list or dict in place never rebinds the name, so it is still the class-level object being changed. With an int, though, the augmented assignment reads the class attribute, produces a brand-new int object, and binds it to the instance, shadowing the class variable from then on. In order to assign back to the class you have to take a special precaution: type(self).next_obj_id += 1. Here’s some sample code that demonstrates what I’m talking about:

import random

class Foo(object):
    def __init__(self, **kwargs):
        options = {
            "obj_id" : random.randint(1, 10000)
        }

        self.data = options
        print "Foo." + str(self.data)

class Bar(object):
    next_obj_id = 0

    def __init__(self, **kwargs):
        options = {
            "obj_id" : self.next_obj_id
        }

        self.next_obj_id += 1

        self.data = options
        print "Bar." + str(self.data)

class Working(object):
    next_obj_id = 0

    def __init__(self, **kwargs):
        options = {
            "obj_id" : self.next_obj_id
        }

        type(self).next_obj_id += 1

        self.data = options
        print "Working." + str(self.data)

Foo()
Foo()
Foo()

Bar()
Bar()
Bar()

Working()
Working()
Working()

It outputs:

Foo.{'obj_id': 1234}
Foo.{'obj_id': 40}
Foo.{'obj_id': 2770}
Bar.{'obj_id': 0}
Bar.{'obj_id': 0}
Bar.{'obj_id': 0}
Working.{'obj_id': 0}
Working.{'obj_id': 1}
Working.{'obj_id': 2}

tl;dr:
Using type(self).class_variable_name or self.__class__.class_variable_name to modify class variables seems to be a better choice than self.class_variable_name.

Filed under: Software Development

Pessimistic vs Optimistic Memoization

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        if args not in cache:
            cache[args] = obj(*args, **kwargs)
        return cache[args]
    return memoizer

I think one of the first things that anyone does with a dynamic language is write a memoizer to save time on expensive calculations. This is the memoizer from http://wiki.python.org/moin/PythonDecoratorLibrary. I like this one because it exposes the cache for clearing – an important feature in tests. This can be accomplished by creating a list of memoized functions and manually resetting their cache. The actual cache reset looks something like this:

@memoize
def method_name(some_arg):
    return some_arg + 1

method_name.cache = {}
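
If a test suite has more than a handful of memoized functions, keeping a registry makes the reset a one-liner in teardown. Here’s a sketch of that idea (the registry is my own addition on top of the decorator above):

import functools

_memoized_functions = []

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        if args not in cache:
            cache[args] = obj(*args, **kwargs)
        return cache[args]

    _memoized_functions.append(memoizer)
    return memoizer

def clear_all_caches():
    # Call this from a test fixture to reset every memoized function.
    for func in _memoized_functions:
        func.cache.clear()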

However, I think there are several things we should know about this particular decorator before just using it:

  • It does not properly account for **kwargs. The thing to remember here is that **kwargs is implicitly a dictionary – an unhashable object. There are several popular methods of hashing a dictionary, but the far and away most popular appears to be hashing on frozenset(dict.items()). Another much less popular way is tuple(zip(dict)). We’ll do some testing to determine which is superior. One important thing to remember here is that large amounts of kwargs and long variable names can lead to quite a performance penalty no matter which one is ultimately better.
  • It does not properly handle unhashable or nested arguments. I think this is probably an acceptable limitation because solving it imposes a large penalty on both code performance and code maintainability. I think it is imperative to have a proper test suite to ensure that @memoized methods are not passed unhashable or nested arguments.
  • There appear to be two competing ways to do caching in Python. The first is the Look Before You Leap approach that conventional wisdom dictates, and it is the one used here. Some cursory thought on the matter tells me that a more optimistic method of handling cache hits with try/except might work better. We’ll do some testing to determine which is superior (a sketch combining the frozenset key with Look Before You Leap follows this list).
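
To make that concrete, here’s a sketch of the frozenset-keyed, Look Before You Leap variant (the names are mine; the optimistic, zip-keyed version is the final memoizer at the bottom of this post):

import functools

def memoize_lbyl_frozenset(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        # frozenset(kwargs.items()) is hashable as long as the values are.
        key = (args, frozenset(kwargs.items()))
        if key not in cache:
            cache[key] = obj(*args, **kwargs)
        return cache[key]
    return memoizer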

Each caching strategy was tested over a list of 1 million tuples and exercises kwargs.  The numbers in the legend represent the basic cardinality of the unique values in the tested list.  The cache miss rate can be found by dividing the cardinality by 1 million.  Each memoization strategy was tested 20 times and the test results here are the average.  I think that a picture is worth a thousand words, and so I’ve included a pretty graph.  However, I’ve also included the base data below.
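
The driver for these numbers isn’t included here, but each trial amounts to something like this (a sketch; the workload generation and keyword argument are assumptions based on the description above):

import random
import time

def run_trial(memoized_func, cardinality, iterations=1000000):
    # Build a workload whose unique values match the stated cardinality,
    # then time the memoized function over the whole list.
    values = [random.randint(0, cardinality - 1) for _ in xrange(iterations)]
    start = time.time()
    for v in values:
        memoized_func(v, flag=True)  # exercises the kwargs code path
    return time.time() - start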

I think there should be several takeaways from this:

  • tuple(zip(dict)) is superior to frozenset(dict.items())
  • Optimistic caching (try/except) is generally superior to Look Before You Leap (key in cache)
  • There is a noticeable performance penalty for caching on kwargs. It might be worth having several memoize annotations and using the most appropriate one.
  • Lots of **kwargs with long names cause a major performance penalty

This is the final version of the memoizer (many thanks to Bryce Drennan in the comments for catching a bug in the memoizer):

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        key = (args, tuple(zip(kwargs.iteritems())))
        try:
            return cache[key]
        except KeyError, e:
            value = obj(*args, **kwargs)
            cache[key] = value
            return value
    return memoizer

This is the previous version of the memoizer:

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        key = (args, tuple(zip(kwargs)))
        try:
            return cache[key]
        except KeyError, e:
            cache[key] = value = obj(*args, **kwargs)
            return value
    return memoizer

Raw data

Tiny (Iterations: 1000000, Cardinality: 100)

  • Reference : 1.2129
  • Set : 3.8267
  • Zip : 3.0283
  • Pessimistic : 3.0055
  • Optimistic : 2.4478
  • No kwargs Reference : 0.5133
  • No kwargs Pessimistic : 1.1473
  • No kwargs Optimistic : 0.9309

Low (Iterations: 1000000, Cardinality: 10000)

  • Reference : 1.3167
  • Set : 4.5701
  • Zip : 3.4687
  • Pessimistic : 3.5359
  • Optimistic : 2.9393
  • No kwargs Reference : 0.6553
  • No kwargs Pessimistic : 1.3239
  • No kwargs Optimistic : 1.1201

Med (Iterations: 1000000, Cardinality: 99995)

  • Reference : 1.6757
  • Set : 4.9049
  • Zip : 3.8719
  • Pessimistic : 3.8955
  • Optimistic : 3.2962
  • No kwargs Reference : 0.9838
  • No kwargs Pessimistic : 1.7194
  • No kwargs Optimistic : 1.5371

Filed under: Software Development

Finding Unused Functions in a Python Source Tree

So for the last few months I have been crunching up a storm on getting the analytics out the door for our latest game.  I just finished some cool new features and had to do some major refactors (one step at a time, of course).  I began to suspect that I’d left some stray code hanging out there that wasn’t being used anymore.  I figured a great way to solve the problem was by looking at each function and grepping the source tree to see if it existed anywhere else.  First, let me introduce you to ack-grep [betterthangrep.com] (aliased as ‘ack’ below).  The best thing about ack-grep is that it lets you filter out certain file types from your results – and despite being written in Perl it’s frequently much faster than grep.

Now I’ll go over the evolution of a shell command.  Some of the steps are injected here because I knew where I was going – but for your benefit I’ll explain how things evolved.  The code base in question is about 10k lines over 140 files.

$ ack 'def'

… bunch of functions and a bunch of other stuff …

$ ack ' def '

… function definitions …

Now let’s get the function name itself.  It should look like 'def "function name"(arguments):' so the first order of business is to cook up a regular expression to filter the line.  I’m better with Perl than with sed, so I did it like this:

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/'

I scrolled through looking for any obvious errors but didn’t find any.  Now comes the magic sauce.  xargs -I{} creates a new command for every input line and inserts that line wherever {} appears.  Basically, this creates a for-each loop in the shell.

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | xargs -I{} bash -c 'ack "{}"'

One thing I saw was that function usages were being attributed to functions that *contained* their name.

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | xargs -I{} bash -c 'ack "\b{}\b"'

That’s better, but functions that are defined multiple times (like __init__) are coming up a lot…

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | sort | uniq | xargs -I{} bash -c 'ack "\b{}\b"'

Ok, but now the text is just flying by so fast and some functions are used hundreds of times and others very few…

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | sort | uniq | xargs -I{} bash -c 'ack "\b{}\b" | wc -l'

Ok, now I know how many times something appears but not what it is…

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | sort | uniq | xargs -I{} bash -c 'echo {} && ack "\b{}\b" | wc -l'

Ok, that’s great.  Now I have the name of the function and how many times that name appears in the source tree.  It’s kinda unwieldy though because the name is on a different line than the count and I can’t use grep.

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | sort | uniq | xargs -I{} bash -c 'echo -n {} && ack "\b{}\b" | wc -l'

Fantastic.  Now I can look at just the functions that rarely get used!  Oh look, there’s so many tests coming up… :-/

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | sort | uniq | xargs -I{} bash -c 'echo -n {} && ack "\b{}\b" | wc -l' | egrep -v '^test_'

Ok, now let’s limit it to things appearing once…

$ ack 'def ' | perl -pe 's/.*def (\w+)\(.*/\1/' | sort | uniq | xargs -I{} bash -c 'echo -n {} && ack "\b{}\b" | wc -l' | egrep -v '^test_' | egrep ' 1$'

Fantastic.  That’s exactly what I was looking for.
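
For anyone who would rather do this in Python than in the shell, the same idea translates roughly into the script below. This is a sketch under assumptions: the regex and filters mirror the pipeline above, and it is not a battle-tested tool.

import os
import re

DEF_RE = re.compile(r'^\s*def\s+(\w+)\(')

def python_files(root='.'):
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.py'):
                yield os.path.join(dirpath, name)

sources = dict((path, open(path).read()) for path in python_files())

# Every function name defined anywhere in the tree.
names = set()
for text in sources.values():
    for line in text.splitlines():
        match = DEF_RE.match(line)
        if match:
            names.add(match.group(1))

# Count whole-word occurrences of each name across all files, then print
# the non-test functions whose name appears exactly once (the definition).
for name in sorted(names):
    if name.startswith('test_'):
        continue
    pattern = re.compile(r'\b%s\b' % re.escape(name))
    count = sum(len(pattern.findall(text)) for text in sources.values())
    if count == 1:
        print name, count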

Filed under: Software Development