Failing In So Many Ways


Liang Nuren – Failing In So Many Ways

Pair Programming, Code Reviews, and Data Warehousing

Code Reviewing

Code reviewing [Wikipedia] is the practice of having peers review finished code to make sure that it does what it's supposed to do and that the approach taken to solve the problem was a good one.  There are two really common forms of code review: the formal code review and the lightweight code review.  A formal code review involves a thorough review and understanding of every line of code, frequently by everyone on a team.  Obviously, this is a very heavy process; formal code reviews are considered too time-intensive for anything but the most sensitive code, and they are considered almost antiquated these days.  Lightweight code reviews tend to be more informal and involve shorter looks at smaller blocks of code, but the danger is that the review can become meaningless because of “rubber stamping”.  Both formal and lightweight code reviews have been shown to decrease the defect rate and improve knowledge transfer within a team.

Pair Programming

Pair programming [Wikipedia] is the practice of having two (or more) developers work on the same piece of code at the same time at the same workstation.  In a very real way, pair programming is “on the fly” code reviewing; as such, it also lowers the defect rate and improves knowledge transfer.  It’s generally accepted that two programmers get a single piece of work done faster than one, but not twice as fast.  There is a net productivity loss when pair programming, and it’s hoped that the benefits make up for it.  I’ve personally seen it work in a variety of ways, from Driver/Navigator to Test Ping Pong.  In all cases, both parties are expected to fully understand the overall design and the code being written.

Data Warehousing

Data Warehousing [Wikipedia] is a branch of computing that involves the creation and care of large stores of data for the purpose of answering questions.  For instance, it is useful to know how many people clicked on a particular ad banner, or how many RC helicopters were sold at Best Buy stores in Ohio.  For the EVE Online readers, killboards are examples of either data marts or data warehouses, depending on who you ask.  The discipline is closely related to data mining [Wikipedia], which often makes use of a data warehouse.
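To make “answering questions” a little more concrete, here is a minimal sketch of the ad banner question against a toy star schema, using Python’s built-in sqlite3 module.  The table names (dim_banner, fact_click), columns, and rows are all hypothetical; a real warehouse would have far more dimensions and far more facts, but the shape of the query (join facts to a dimension, then aggregate) is the same.

```python
import sqlite3

# Toy star schema (hypothetical names): a dimension table describing ad banners
# and a fact table with one row per click.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_banner (banner_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_click (banner_id INTEGER REFERENCES dim_banner, user_id INTEGER);
""")
conn.executemany(
    "INSERT INTO dim_banner VALUES (?, ?)",
    [(1, "summer_sale"), (2, "rc_helicopters")],
)
conn.executemany(
    "INSERT INTO fact_click VALUES (?, ?)",
    [
        (1, 100),
        (1, 101),
        (1, 100),  # the same person clicking twice
        (2, 102),
    ],
)


def distinct_clickers(conn, banner_name):
    """Count how many distinct people clicked the named banner."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT f.user_id)
        FROM fact_click AS f
        JOIN dim_banner AS d ON d.banner_id = f.banner_id
        WHERE d.name = ?
        """,
        (banner_name,),
    ).fetchone()
    return row[0]


print(distinct_clickers(conn, "summer_sale"))  # 2
```

Counting distinct users rather than rows is the kind of small decision that changes the answer to “how many people clicked?”, which is part of why mistakes in this field are expensive.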

Most data warehousing is done via Extract, Transform, Load [Wikipedia] processes in databases like PostgreSQL, Oracle, and MySQL, though certainly most serious data warehousing is done with a combination of technologies involving Distributed File Systems [Wikipedia] and Map/Reduce.  To give you some idea of the scales involved: the largest single-instance databases in the world weigh in at over 2 PB, and data about larger data stores is amazingly scarce.  I’d estimate that some of the larger data warehouses in the world weigh in at hundreds of PB now.  Personally, I’ve worked with data warehouses ranging from 500GB and millions of facts per day to 150TB+ and up to trillions of facts per day.  I’d say your average data warehousing company isn’t likely to see more than 75GB of data per day and will store something on the order of 500GB-2TB.
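For readers who haven’t seen one, the three ETL stages can be sketched in a few lines of Python.  This is a toy, not how any particular warehouse does it: the raw CSV log format, the normalization rules, and the fact_click table are all hypothetical, and sqlite3 stands in for a real warehouse database.

```python
import csv
import io
import sqlite3

# Hypothetical raw click log: banner name, user id, timestamp.
# The second row has messy casing/whitespace; the third has a bad user id.
RAW_LOG = """banner,user_id,ts
summer_sale,100,2013-06-01T10:00:00
Summer_Sale ,101,2013-06-01T10:05:00
rc_helicopters,not-a-number,2013-06-01T11:00:00
"""


def extract(text):
    """Extract: read raw rows from the source (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, cast types, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            user_id = int(row["user_id"])
        except ValueError:
            continue  # a real pipeline would usually quarantine these, not silently drop them
        banner = row["banner"].strip().lower()
        day = row["ts"][:10]  # keep only the YYYY-MM-DD part of the timestamp
        clean.append((banner, user_id, day))
    return clean


def load(conn, facts):
    """Load: append the cleaned facts to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_click (banner TEXT, user_id INTEGER, day TEXT)"
    )
    conn.executemany("INSERT INTO fact_click VALUES (?, ?, ?)", facts)
    conn.commit()
    return len(facts)


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(extract(RAW_LOG)))
print(loaded)  # 2
```

Note that a bug in transform() doesn’t crash anything; it quietly writes wrong facts into the warehouse, and every query built on them inherits the error until the data is reprocessed.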

The Dilemma

The internet debates over pair programming vs. code reviews seem to be endless, but most of the teams I’ve encountered practicing some form of XP have a fairly strong preference for pair programming over code reviews.  The argument tends to be that what really matters is a second set of eyes on the code.  Furthermore, pair programming naturally avoids the danger of “rubber stamp” code reviews, because it’s much harder to rubber-stamp code when your reviewer is helping write it.  These are absolutely valid observations, and I’m a big fan of pair programming.

However, I feel like the right answer for a data warehousing team is not to pick one, but to pick both when possible.  While this does make the process very slightly heavier, I want to point out that the cost of failure is much higher.  A friend of mine points out that when most SaaS developers make a mistake, they fix it and bounce a web server, but when I make a mistake, we spend three weeks (re)migrating data.  Ultimately, what everyone involved (from product managers to developers) wants is for the team to deliver results in a timely manner… and really, three weeks is a hell of a delay to pay for skipping a 20-minute code review.
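To put rough numbers on that trade-off, here is a back-of-the-envelope break-even calculation using the figures above.  The assumption that “three weeks” means 15 working days of 8 hours is mine, and the comparison deliberately ignores everything else a re-migration costs (delayed features, lost trust in the data), so it understates the case if anything.

```python
# Figures from the text: a 20-minute review vs. roughly three weeks of re-migration.
# Assumption: "three weeks" = 3 weeks * 5 working days * 8 hours, measured in minutes.
REVIEW_MINUTES = 20
REMIGRATION_MINUTES = 3 * 5 * 8 * 60  # 7,200 minutes


def break_even_probability(review_cost, failure_cost):
    """Minimum chance a review must have of preventing the failure to pay for itself."""
    return review_cost / failure_cost


p = break_even_probability(REVIEW_MINUTES, REMIGRATION_MINUTES)
print(f"{p:.2%}")  # 0.28%
```

In other words, if a review catches a migration-worthy mistake even once in every 360 changes or so, it has already paid for itself.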

So please consider the cost of failure when you’re considering whether you should do pair programming or code reviews.


Filed under: Data Warehousing, Databases, Software Development

2 Responses

  1. Our agile teams practice pair programming and informal code reviews with occasional formal reviews.

  2. Mara Rinn says:

    My company doesn’t have teams so much as a loose alliance of rogue programmers. So far my job has been cleaning up after the rogues, refactoring, documenting, adding test suites into code that was built to work once only, and assumed to work for every instance since then. Amusingly enough when it breaks, the people responsible just fix the errors by hand, rather than fix the code. After all, having stuff to do by hand means you can prove that you’ve been working. /facepalm
