On Teachers, Baseball Players, WAR, and My Son’s Blog

When I launched Wachter’s World three years ago, I was aiming for a narrow sweet spot: I wanted the writing to have an unmistakable “voice” (that would be mine) without being too personal. Not to put too fine a point on it, but I remember telling a friend at the time, “If you catch me blogging about my kid’s soccer game, I want you to shoot me.”

Six weeks ago, my older son Doug, now a sophomore and sports management major at the University of Michigan, launched a baseball blog. This not-very-unbiased observer thinks the blog – called Saber by the Bay (the study of baseball statistics is known as “Sabermetrics”) – is splendid: fascinating, funny, and beautifully written. It has already garnered a fair amount of notice, and nearly 2,500 visits. Being a dutiful dad, I’ve spent the past month thinking about how to work a mention of Doug’s blog into one of my posts… without causing my friend to pull out his six-shooter.

So here goes.

You’ve probably seen the controversy over some of the newfangled teacher evaluation methods being employed in U.S. school districts, most famously by Washington, D.C.’s no-nonsense and combative school chancellor, Michelle Rhee. I recently learned of an increasingly popular system called “value-added modeling,” in which a teacher is assessed by first examining each student’s trajectory through, let’s say, 2nd and 3rd grades. Based on this past performance (and some other variables such as family situation and income), one can make a reasonable prediction of the student’s 4th grade test scores.

Enter the fourth grade teacher, who presumably has some impact on his or her pupils. If a student does better than predicted, the improvement is credited to the teacher’s skills; if the student does worse, the teacher is deemed the culprit. Of course, the usual problems with measurement lurk: Is the change a statistical fluke? Are there unmeasured variables that influence the results? Are test scores really the best way to assess student performance? One can find experts on both sides of the roiling debate over whether these teacher assessment methods are ready for prime time.
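
For the curious, here is a minimal Python sketch of the core value-added idea, under heavy simplifying assumptions: the prediction uses a single predictor (a hypothetical prior_score, say the average of a student’s 2nd and 3rd grade scores) fit by ordinary least squares, and a teacher’s “value added” is just the average gap between actual and predicted 4th grade scores. Real systems use many more variables and statistical adjustments; the names Student, fit_line, and value_added are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Student:
    teacher: str          # hypothetical 4th grade teacher label
    prior_score: float    # assumed summary of 2nd and 3rd grade scores
    actual_score: float   # observed 4th grade score


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def value_added(students):
    """Average (actual - predicted) 4th grade score for each teacher's students."""
    a, b = fit_line([s.prior_score for s in students],
                    [s.actual_score for s in students])
    gaps = defaultdict(list)
    for s in students:
        predicted = a + b * s.prior_score
        gaps[s.teacher].append(s.actual_score - predicted)
    return {teacher: mean(g) for teacher, g in gaps.items()}


# Toy data: both teachers get similar students, but A's beat the prediction.
students = [
    Student("A", 70, 75), Student("A", 80, 85),
    Student("B", 70, 65), Student("B", 80, 75),
]
print(value_added(students))  # {'A': 5.0, 'B': -5.0}
```

With only two students per teacher, as in this toy example, one good or bad test day would swing the result, which is exactly the “statistical fluke” worry.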

My friend David Leonhardt, who is generally supportive of the accountability movement in education (and, for that matter, healthcare), made note of the shortcomings of the measures in a recent article in the New York Times Sunday Magazine.

Value-added data is not gospel. Among the limitations, scores can bounce around from year to year for any one teacher… In addition, students are not randomly assigned to teachers; indeed, principals may deliberately assign slow learners to certain teachers, unfairly lowering their scores.

Notwithstanding these caveats, Leonhardt ultimately comes down on the “use these measures, with caution” side of the argument. Because of the methodologic limitations, he writes, “some teachers, no doubt, are being done a disservice. Then again, [without such measures] so were a whole lot of students.”

Of course, concerns over value-added modeling are either tamped down or amped up depending on how the measures are deployed. If the data are used simply to provide feedback to individual teachers, no one’s collar gets too hot. But in many school districts, results are now being used in the calculation of bonus payments or pay cuts, markedly raising the stakes. And Chancellor Rhee went even further, firing 26 teachers based in part on the results of the modeling. The teachers union, predictably, went ballistic.

Blogger Kent Bottles, a former UCSF colleague, was, like me, struck by how this kind of methodology might be used in healthcare. One can envision following a large panel of outpatients and, based on baseline demographics and several years of observation, predicting that, say, 175 of the patients would have an emergency department visit over the next year, 6 would have heart attacks, and 28 would be hospitalized. Could one judge a physician’s quality by assessing how his or her panel of patients actually fared, compared with these predictions? It’s an intriguing idea. Such models would doubtless be as controversial in medicine as they are in teaching.
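
The panel idea boils down to an observed-versus-expected comparison. Here is a minimal Python sketch, assuming some risk model (not shown, and hypothetical) has already given each patient a probability of being hospitalized: the expected count is the sum of those probabilities, and the physician’s actual count is compared against it. Patient and panel_performance are illustrative names, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    predicted_risk: float  # assumed model probability of hospitalization this year
    hospitalized: bool     # what actually happened


def panel_performance(panel):
    """Compare a physician's observed hospitalizations with the expected count."""
    expected = sum(p.predicted_risk for p in panel)
    observed = sum(1 for p in panel if p.hospitalized)
    return {
        "observed": observed,
        "expected": expected,
        "o_to_e": observed / expected,  # below 1.0: fewer events than predicted
        "excess": observed - expected,  # negative: fewer events than predicted
    }


panel = [Patient(0.5, True), Patient(0.5, False),
         Patient(0.25, False), Patient(0.25, False)]
print(panel_performance(panel))  # observed 1 vs. expected 1.5, so o_to_e is about 0.67
```

By the same arithmetic, if the model expected 28 hospitalizations and 21 occurred (made-up numbers), the ratio would be 21/28 = 0.75. Whether that gap reflects the doctor, the patients, or chance is the same question that dogs value-added teacher scores.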

As I was thinking about this, I wondered whether similar methods were being used to judge performance in other fields. Like, maybe (and I’m pulling one out of a hat here), baseball?

In my day (think Ron Swoboda if you’re wondering when that was), baseball stats were confined to hits, batting average, home runs and RBIs. Baseball addicts like me understood what a slugging percentage was, but that’s as far as it went.

On his blog, Doug doesn’t bother with these pedestrian measures. Instead, the leading statistic on Saber by the Bay is “WAR,” which stands for “wins above replacement.” As he explained it to me, these fancy models depend on the statistical fabrication of a truly mediocre 2nd baseman, or right fielder, or pitcher. The performance of every real-live player in the league is then pitted against these mythic middling players, position by position. These comparisons tell us how many extra runs a given player would have delivered for his team over a year (“Runs Above Replacement”) and, ultimately, how many wins (WAR). According to the art of WAR, this year’s leading player is Josh Hamilton of the Texas Rangers; his WAR of 8.0 means that having Hamilton in the lineup instead of a run-of-the-mill schlub accounts for 8 additional wins over 162 games, easily enough to sway most pennant races.
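
The replacement-level arithmetic can be sketched in a few lines of Python. This is a deliberately simplified sketch, not the method behind any published WAR figures: it assumes we already know how many runs a player produced and how many a replacement-level player at the same position would have produced (the REPLACEMENT_RUNS values below are invented), and it converts runs to wins using a rough rule of thumb of about 10 runs per win.

```python
RUNS_PER_WIN = 10.0  # rough rule of thumb; real WAR systems derive this from the run environment

# Invented full-season run totals for a replacement-level player at each position.
REPLACEMENT_RUNS = {"2B": 60.0, "RF": 70.0}


def runs_above_replacement(player_runs, position):
    """Runs a player produced beyond the replacement-level player at his position."""
    return player_runs - REPLACEMENT_RUNS[position]


def wins_above_replacement(player_runs, position, runs_per_win=RUNS_PER_WIN):
    """Convert runs above replacement into wins over a season."""
    return runs_above_replacement(player_runs, position) / runs_per_win


# A made-up right fielder worth 150 runs over the season:
print(runs_above_replacement(150.0, "RF"))  # 80.0
print(wins_above_replacement(150.0, "RF"))  # 8.0
```

In this toy setup, a WAR of 8.0 simply means 80 runs beyond what a freely available substitute would have produced, divided by 10 runs per win.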

I’m not enough of a statistical maven to appreciate all the nuances of this, but it seems to me that this concept of a “total value statistic” – a roll-up measure that compares the performance of physicians or hospitals against that of some “replacement level player” – might be an interesting way of capturing the overall quality of care. As Doug wrote me,

You could do something along the lines of taking the ‘death prevention’ ability of any given mediocre doctor and use that value to compare to a good or a great doctor.

The development of a healthcare total value statistic wouldn’t entirely replace our present measures, which tend to look at performance through very narrow lenses (door-to-balloon time, readmission rates, risk-adjusted mortality). But sometimes a wide-angle shot offers the best view.

While you’re mulling this over, check out Saber by the Bay, and tell your baseball fan friends and family. It’s cool.

OK, now shoot me.

2 Responses to “On Teachers, Baseball Players, WAR, and My Son’s Blog”

  1. kevinh76 September 28, 2010 at 3:39 am

    Ban……. Hey wait. I like the Giants. And that site is pretty slick. And I think Buster Posey should be the rookie of the year (didn’t really know it was abbreviated ROY). And if the Giants make the playoffs, I’ll be going to that site every day! Thanks Bob!

  2. ppk September 28, 2010 at 10:16 pm

    I pitched (pardon the pun) a similar argument to a colleague last year. We were sitting at a ballgame chatting about his trials/tribulations re: quality improvement at his hospital when a switch flipped. “Moneyball” – that’s what it’s all about. What Billy Beane had learned about baseball, namely, getting low-cost high-statistic players to plug into any system, anywhere, and get results is actually what we should be striving for in healthcare. An apt metaphor, no doubt, for the concept “high quality does not necessarily equal high cost.” How is it that the 2007 Colorado Rockies made it to the World Series to face the Boston Red Sox, with a payroll somewhere around $70M less? The key, in Moneyball and healthcare, as you allude to, is rigorous measurement (of the right variables) and then defining (divining?) the right formula – which is easier said than done.

    And then all we have to do is convince Red Sox fans that a middle-market expansion team from flyover country is an acceptable alternative.
