Failing In So Many Ways


Liang Nuren – Failing In So Many Ways

Pypy, Groovy, and more

Let me preface this with the fact that I currently work at a Java shop on the Data Warehousing team and there’s always some people making noise about what our Java.Next() will be.  For a long time I was sure that Java.Next() returned “Groovy”, but it appears that Scala has fast growing fan base where I work.  My new team lead even went so far as to imply that getting anything past the Sr Staff Steering Committee that wasn’t written in Java or Scala was going to be impossible.  That’s quite the bold statement given we have no Scala and lots of Groovy in the code base currently!

So this made me go out looking for examples of Scala and soon enough was wading through Scala, Scheme, Lisp, Clojure, Ocaml, Haskell, and Erlang.  The core concepts behind the languages aren’t really new to me, but the syntax certainly was.  In a lot of ways, I’d say that even well written programs looked every bit as bad as the worst Perl Line Noise I’ve ever seen.  Jumping off the deep end like that had a pretty predictable result: I got absolutely nowhere.  Still, I studied them enough that my head was swimming and I was dreaming about lambda calculus and color blindness tests.

Somewhere along the way, I ran across a rosetta stone [wikipedia] when I found this question on Stack Overflow.  The first thing I noticed was that the algorithm used was extremely naive – but that’s ok because it wasn’t really the purpose of the question.  The author even specifically asked people not to change the way factors were calculated.  I’d answer him on Stack Overflow myself, but frankly this blog post delves far too much into discussion and I don’t want to start my SO career off with negative karma.

So I’m pretty familiar with C and Python, but Erland and Haskell are obviously almost wholly new to me.  I probably would have kept going if it weren’t for Pypy’s superior performance really catching my eye.  Then I began to dig into the question (as posed) and noticed he was calling range() instead of xrange() for a Python2.7 implementation.  Then I thought his Python wasn’t very Pythonic so I rewrote it to use List Comprehensions.

def factorCount (n):
    square = math.sqrt(n)
    isquare = int(square)
    offset = -1 if isquare == square else 0
    return offset + sum([ 2 for x in xrange(1, isquare+1) if n % x == 0 ])

This lead me to some surprising results:


– His code: 1m1.468s
– With list comprehension: 1m13.966s


– His code: 0m34.382s
– With xrange(): 0m30.881s
– With xrange() and list comprenesion: 0m32.628s

Pypy 1.8:

– His code: 0m5.451s
– With xrange(): 0m4.780s
– With xrange() and list comprehension: 0m3.127s

I wasn’t totally sure what to make of CPython’s poor handling of list comprehensions, and Pypy’s superior performance came as a total shock to me.  Last time I’d checked up on Pypy, I had thought it was a Python interpreter written in Python but once all the cards are on the table (including JIT), it appears to run quite a bit closer to the machine than even CPython does.  If I had to describe it to a total neophyte, I think I’d call it something closer to a Python compiler.

At any rate, during the course of the weekend I looked back at the Java/Groovy discussion from above.  If you’re familiar with it, you’ll know that Groovy is basically a superset of Java so you can run exactly the same Java code as Groovy.  So I wrote the totally expected Java implementation and ran it as Java.  Then I ran it as Groovy.  Then I def’ed all the variables and ran it as groovy again.  And then I got a wild hankering to do find out what the performance was like in Scala and Perl.

Anyway: here are the sorted results, including all optimization from the original discussion:

  • Pypy 1.8 2.95 sec
  • Haskell [GHC 7.4.1] 3.27 sec
  • C [ gcc 4.6.3 ] 3.32 sec
  • Scala 2.9.1 9.79 sec
  • Java 1.6 15.77 sec
  • Erlang 5.8.5 28.43 sec
  • Python 2.7 31.79 sec
  • Groovy 1.8 47.61 sec
  • Perl 5.14 61.95 sec
  • Java as Groovy 1.8 69.97 sec
  • Python 3.2 70.38 sec

There’s two big shockers here.  The first is that Pypy was the fastest implementation.  That’s a pretty big shocker to me anyways.  However, the other big shocker was that Groovy with “dynamic” types was actually straight up faster than fully qualified Java-named-Groovy.  I don’t even know what to make of that so I’ll probably explore it a bit.  I currently have all the source code if anyone is actually curious and wants to try to reproduce it.


Filed under: Software Development