Failing In So Many Ways

Icon

Liang Nuren – Failing In So Many Ways

Pessimistic vs Optimistic Memoization

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        if args not in cache:
            cache[args] = obj(*args, **kwargs)
        return cache[args]
    return memoizer

I think one of the first things that anyone does with a dynamic language is write a memoizer to save time on expensive calculations. This is the memoizer from http://wiki.python.org/moin/PythonDecoratorLibrary. I like this one because it exposes the cache for clearing – an important feature in tests. This can be accomplished by creating a list of memoized functions and manually resetting their cache. The actual cache reset looks something like this:

@memoize
def method_name(some_arg):
    return some_arg + 1

method_name.cache = {}

However, I think there’s several things we should know about this particular decorator before just using it:

  • It does not properly account for **kwargs. The thing to remember here is that **kwargs is implicitly a dictionary – an unhashable object. There are several popular methods of hashing a dictionary, but the far and away most popular appears to be hashing on frozenset(dict.items()). Another much less popular way is tuple(zip(dict)). We’ll do some testing to determine which is superior. One important thing to remember here is that large amounts of kwargs and long variable names can lead to quite a performance penalty no matter which one is ultimately better.
  • It does not properly handle unhashable or nested arguments. I think this is probably an acceptable limitation because solving it imposes a large penalty on both code performance and code maintainability. I think it is imperative to have a proper test suite to ensure that @memoized methods are not passed unhashable or nested arguments.
  • There appears to be two competing ways to do caching in Python. The first is the Look Before You Leap approach that conventional wisdom dictates, and is the one used here. Some cursory thought on the matter tells me that a more optimistic method of handling cache hits with try/except might work better. We’ll do some testing to determine which is superior.

Each caching strategy was tested over a list of 1 million tuples and utilize kwargs.  The numbers in the legend represent the basic cardinality of the unique values in the tested list.  The cache hit rate can be found by dividing the cardinality by 1 million.  Each memoization strategy was tested 20 times and the test results here are the average. I think that a picture is worth a thouand words, and so I’ve included a pretty graph. However, I’ve also included the base data below.

I think there should be several take aways from this:

  • tuple(zip(dict)) is superior to frozenset(dict.items())
  • Optimistic caching (try/except) is generally superior Look Before You Leap (key in cache)
  • There is a noticeable performance penalty for caching on kwargs. It might be worth having several memoize annotations and using the most appropriate one.
  • Lots of **kwargs with long names causes a major performance penalty

This is the final version of the memoizer (many thanks to Bryce Drennan in the comments for catching a bug in the memoizer):

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        key = (args, tuple(zip(kwargs.iteritems())))
        try:
            return cache[key]
        except KeyError, e:
            value = obj(*args, **kwargs)
            cache[key] = value
            return value
    return memoizer

This is the previous version of the memoizer:

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        key = (args, tuple(zip(kwargs)))
        try:
            return cache[key]
        except KeyError, e:
            cache[key] = value = obj(*args, **kwargs)
            return value
    return memoizer

Raw data

Tiny (Iterations: 1000000, Cardinality: 100)

  • Reference : 1.2129
  • Set : 3.8267
  • Zip : 3.0283
  • Pessimistic : 3.0055
  • Optimistic : 2.4478
  • No kwargs Reference : 0.5133
  • No kwargs Pessimistic : 1.1473
  • No kwargs Optimistic : 0.9309

Low (Iterations: 1000000, Cardinality: 10000)

  • Reference : 1.3167
  • Set : 4.5701
  • Zip : 3.4687
  • Pessimistic : 3.5359
  • Optimistic : 2.9393
  • No kwargs Reference : 0.6553
  • No kwargs Pessimistic : 1.3239
  • No kwargs Optimistic : 1.1201

Med (Iterations: 1000000, Cardinality: 99995)

  • Reference : 1.6757
  • Set : 4.9049
  • Zip : 3.8719
  • Pessimistic : 3.8955
  • Optimistic : 3.2962
  • No kwargs Reference : 0.9838
  • No kwargs Pessimistic : 1.7194
  • No kwargs Optimistic : 1.5371
Advertisements

Filed under: Software Development, , ,