Thursday, March 22, 2007

Lies, Damned Lies, and Statistics

Lately I've been thinking I should blog about things I actually know and do less of my usual, ahem, making shit up. The problem is, the stuff I know is largely boring, and moreover it kinda reminds me of work. Well, maybe my 2 weeks of pseudo-unemployment (I start a new job on Monday, and nominally had these 2 weeks off) have rekindled an interest in blogging about technical stuff. Or maybe I'm feeling didactic. Either way, when I read this bit below, I thought it was time for a lesson. (from TPM)

Six of the eight U.S. attorneys fired by the Justice Department ranked in the top third among their peers for the number of prosecutions filed last year, according to an analysis of federal records.

I thought...what are the odds of that? And then I thought, well, that can easily be calculated with binomial statistics (better here). The binomial distribution is used when each outcome falls into one of two states (binary) with a known probability. The chance of rolling a 6 on a die is 1 in 6. The chance you don't is 5 in 6. You either do or you don't (i.e. binary), and the probability is 1 in 6 (i.e. a known probability). In engineering terms, you have a probability distribution over two states, each with a defined probability. In general, the probability distribution for the total over multiple independent events can be obtained by mathematically convolving the single-event distribution with itself, once per additional event. Convolution sounds scary, but it's not bad for binomials. It is a little harder because there are more states: with five rolls you could have 0 sixes, 1 six, 2 sixes ... 5 sixes. Anyway, the whole problem was worked out long ago and reduced to a formula.
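For the curious, here is a minimal sketch of that convolution idea in Python. Nothing here comes from Excel: the function names `convolve` and `repeat_trials` are made up for illustration, and the example assumes five independent rolls of a fair die.

```python
from fractions import Fraction


def convolve(a, b):
    """Discrete convolution of two probability lists indexed by count."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def repeat_trials(p, n):
    """Distribution of the number of successes in n independent trials."""
    single = [1 - p, p]      # one trial: index 0 = miss, index 1 = hit
    dist = [Fraction(1)]     # zero trials: certainly zero hits
    for _ in range(n):
        dist = convolve(dist, single)
    return dist


# Assumed example: five rolls of a fair die, counting sixes.
five_rolls = repeat_trials(Fraction(1, 6), 5)
for sixes, prob in enumerate(five_rolls):
    print(f"{sixes} sixes: {float(prob):.4f}")
```

Each pass through the loop folds in one more roll, which is why the list grows by one state every time. Exact fractions are used only so the probabilities add up to exactly 1.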

Unfortunately, some of the mathematical notation required to explain the formula is really hard to type into Blogger (if you recognize that as a cheap excuse, you win a cookie). So instead of hitting the theory, I'll show you how to cheat and just get the answer out of Excel! (Go surf a mathematician's blog if you want theory, engineers are all about plug 'n chug.)

The Excel formula is:
=BINOMDIST(number,trials,probability,cumulative)

In this case the number is 6 (high performers) and the trials are 8 (fired attorneys).

On to probability. We are given the criterion that some attorneys were in the top third of their peers. What's the probability that any one attorney is in the top third? Hmm, let's think hard... AHA! How about 1/3 (or 0.33333)?

The next part gets technical. If we want a cumulative value, which is the sum of the probabilities for every count from zero up to and including the number, we type "true". If we type "false" we get the probability for exactly 6 of 8. In this case we'll take the cumulative value to express the odds of getting 5 or fewer, which means entering 5 (not 6) as the number: =BINOMDIST(5,8,1/3,TRUE).

That gives us an answer of 0.9803. Probabilities are given on a scale of zero to one, so that means you have a 98.03% chance of getting zero through five attorneys from the top third. Alternatively, it means that if you randomly picked eight U.S. attorneys, there's a 1.97% chance that you would get 6 or more in the top third. When conducting a difference test, one starts with the null hypothesis that two populations are the same. In this case, we compare our population to the general U.S. attorney population. In most applications the threshold to demonstrate a significant difference is 5%. A result at 1% is usually considered a "highly significant" difference. So our value of 1.97% is significant, though it falls short of highly significant. All of which reinforces the intuitive point of the article: that these attorneys were better than the baseline.
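If you don't have Excel handy, the same calculation can be sketched in Python. This is my own stand-in, not Excel's actual implementation: the `binomdist` function below just mirrors the argument order of the spreadsheet formula, and the variable names are invented for this example.

```python
from math import comb


def binomdist(number, trials, probability, cumulative):
    """Rough stand-in for the spreadsheet formula (same argument order)."""

    def exactly(k):
        # Probability of exactly k successes in the given number of trials.
        return comb(trials, k) * probability**k * (1 - probability) ** (trials - k)

    if cumulative:
        # Sum over 0..number, INCLUDING number itself.
        return sum(exactly(k) for k in range(number + 1))
    return exactly(number)


p_top_third = 1 / 3

five_or_fewer = binomdist(5, 8, p_top_third, True)
six_or_more = 1 - five_or_fewer
exactly_six = binomdist(6, 8, p_top_third, False)

print(f"P(5 or fewer of 8) = {five_or_fewer:.4f}")
print(f"P(6 or more of 8)  = {six_or_more:.4f}")
print(f"P(exactly 6 of 8)  = {exactly_six:.4f}")
```

The thing to watch is that the cumulative value includes the number itself. Entering 6 instead of 5 would give the chance of six or fewer, which is a larger number and answers a different question.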

The White House originally claimed it fired the attorneys for performance reasons...and now we find that as a group, they do have a pretty uncommon performance record. Uncommonly good. Maybe so good they got fired?

Now the White House has retracted the performance issue (probably to draw attention away from the relatively good records). Even so, the chance of them randomly picking a set of attorneys with this quality of work is about 2 in 100. The chance of the White House being full of shit...it's significant.


BONUS MATH: If this is true, and the White House had not fired the loyal loser, then 6 of the 7 fired would have been high performers and the probability value would have been 0.9931, or a 0.69% chance of getting 6 or more. That would have been "highly significant".

TAKEHOME PROBLEM: How many losers should the White House have fired to cover up for their purge of people actually doing too good a job?
