More Statistical Musings

February 5, 2008 at 10:27 PM

Here's a big surprise: the MPAA has been publishing grossly inaccurate research to corroborate its ridiculous claims about the cost of internet-based piracy and where it comes from. The original claim was that 44% of the movie industry's domestic losses came from illegal downloading by college students; the newly published numbers put it at only 15%. The original figure was 293% of the corrected one; no big deal, right?

I posted a few days ago about statistical bugbears in Virgil Griffith's "Books that make you dumb". I just re-read my post, and I sound like a bit of a jerk; Virgil, however, responded graciously, and we had a nice little email thread about possible ways to coax the data he used into more meaningful relationships. His project is very interesting, and it's really cool to see what you can cobble together without actually having to perform a full-on survey.

Of more interest to me, however, is Phil Haack's research purporting to demonstrate the effectiveness of Test-Driven Development (read up on TDD if you're unfamiliar with the concept; there's a minimal sketch of the workflow after the quote below). While I haven't delved into the numbers myself, there's an interesting post that takes the paper's results, then dissects and largely refutes them:

Productivity is an example where causality is far from certain. It makes sense to me that more productive programmers write more tests if only because productive programmers feel like they have the time to do things the way they know they should. Even with "Test First" the emotional effect of "being productive" is going to have an impact on the number of tests you create before moving on to the next phase.

[snip]

Ask yourself this, which unit tests are going to be better, the ones created before you've implemented the functionality or those created after? Or, considered another way, at which point do you have a better understanding of the problem domain, before implementing it or after? If you're like me, you understand the problem better after implementing it and can thus create more relevant tests after the fact than before. I know better where the important edge-cases are and how to expose them and that could mean that my tests are more robust.

(via zedshaw.com)
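For anyone who skipped the "read up on TDD" link, here's a minimal sketch of the test-first workflow in Python; the slugify function and its expected behavior are invented for illustration and aren't taken from the paper.

    import unittest

    # Test-first: this test is written before slugify exists, so the first run fails ("red").
    class TestSlugify(unittest.TestCase):
        def test_lowercases_and_hyphenates(self):
            self.assertEqual(slugify("Hello World"), "hello-world")

    # Only after watching the test fail do you write just enough code to make it
    # pass ("green"), then refactor with the test as a safety net.
    def slugify(title):
        return "-".join(title.lower().split())

    if __name__ == "__main__":
        unittest.main()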

Personally, I agree with this critique. When I was at Jobster, we tried to maintain good test coverage and write tests for all new code. I believe these practices helped bugs surface more quickly and served as built-in documentation of sorts. Sometimes, however, writing a test turned out to be harder than writing the code under test: I remember many painful hours spent setting up test objects correctly, because we had mocked our internal web service calls so that tests could run without actually hitting a given service (note to anyone who has ever worked on a test that saves an instance of the Job model: you deserve the software engineer's version of a Purple Heart).
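To give a rough sense of that kind of stubbing, here's a sketch of how one might do it today with Python's unittest.mock; the JobServiceClient, fetch_job, and job_title names are hypothetical stand-ins, not Jobster's actual code.

    import unittest
    from unittest.mock import patch

    # Hypothetical client for an internal web service; stands in for the kind
    # of call we stubbed out so the test suite never hit a live service.
    class JobServiceClient:
        def fetch_job(self, job_id):
            raise RuntimeError("would make a real HTTP call in production")

    # Code under test: pulls one field out of the service payload.
    def job_title(client, job_id):
        return client.fetch_job(job_id)["title"]

    class TestJobTitle(unittest.TestCase):
        @patch.object(JobServiceClient, "fetch_job")
        def test_returns_title_from_payload(self, mock_fetch):
            # The mock replaces the network call with canned data.
            mock_fetch.return_value = {"id": 42, "title": "Software Engineer"}
            self.assertEqual(job_title(JobServiceClient(), 42), "Software Engineer")

    if __name__ == "__main__":
        unittest.main()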

Tests are not an end in themselves, and I found that they often misled me in the initial design of a piece of software. Design first; once you've perfected the design, testing your code can help make sure it's working (and that others' changes don't break it too badly).
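As a contrast to the test-first sketch above, here's what a test written after the fact might look like once an edge case turns up during implementation; the parse_salary function and the "80k" quirk are made up for the example.

    import unittest

    # Implemented first; only while writing it did the "80k" form of the input
    # turn up, so the parser handles both spellings.
    def parse_salary(text):
        cleaned = text.strip().lower().replace(",", "")
        if cleaned.endswith("k"):
            return int(float(cleaned[:-1]) * 1000)
        return int(cleaned)

    # Written after the implementation, once the edge case was understood.
    class TestParseSalary(unittest.TestCase):
        def test_handles_abbreviated_thousands(self):
            self.assertEqual(parse_salary("80k"), 80000)

        def test_handles_commas(self):
            self.assertEqual(parse_salary("80,000"), 80000)

    if __name__ == "__main__":
        unittest.main()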

And please, use statistics carefully and wisely!