A disconcerting pattern has emerged: a blockbuster study finds that a certain practice leads to improved outcomes. Large national organizations codify the practice into a quality measure, forcing widespread adoption. Later studies prove the practice to be unhelpful, perhaps even dangerous.
Oops.
Think about it – we’ve now seen quality measures that prompted the use of boatloads of unnecessary antibiotics (“door-to-antibiotic time” in patients with suspected community-acquired pneumonia), “can’t miss” quality measures that proved wrong (giving beta blockers to every perioperative patient), quality measures that promote gamesmanship and box-checking as a surrogate for meaningful action (smoking cessation counseling), and quality measures that trade efforts to prevent one kind of harm (preventing falls) for another (tethering some elderly hospitalized patients to their beds, leading to deconditioning and pressure ulcers).
Let’s now add tight glucose control in critically ill patients to the Hall of Hiccups.
A multicenter study of ICU patients in Australia, New Zealand, and Canada (NICE-SUGAR) in this week’s NEJM demonstrates that tight glucose control – as pathophysiologically sensible as can be (remember those vivid movies of drunken white blood cells vainly trying to breaststroke their way through sugary serum to reach sites of infection) – resulted in significant harm. The study found that driving glucose levels down to the range of 81-108 mg/dL led to a 90-day mortality rate of 27.5% (vs. 24.9% in controls). In other words, intensive insulin therapy was associated with nearly 15% higher odds of death. The exact mechanism of harm is unclear. Some of it undoubtedly came from severe hypoglycemia (272 episodes among the 3,016 patients in the tight-control group, vs. 16 among the 3,014 controls), but there may have been other, as-yet-undefined detrimental effects.
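For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the effect sizes from the rates quoted above. It uses only the percentages and counts as given in this post. The helper names (`odds`, `compare_rates`) are my own, and the sketch ignores confidence intervals and the trial's own statistical analysis.

```python
def odds(p: float) -> float:
    """Convert a probability such as 0.275 into odds, p / (1 - p)."""
    return p / (1.0 - p)


def compare_rates(treated: float, control: float) -> dict[str, float]:
    """Summarize a two-arm comparison of event rates (e.g. 90-day mortality)."""
    return {
        # difference in rates, as a fraction (0.026 = 2.6 percentage points)
        "absolute_difference": treated - control,
        # ratio of the two event rates
        "relative_risk": treated / control,
        # ratio of the two odds; this is the "nearly 15% higher odds" figure
        "odds_ratio": odds(treated) / odds(control),
    }


# 90-day mortality as quoted above: 27.5% with tight control vs. 24.9% in controls
mortality = compare_rates(treated=0.275, control=0.249)
for name, value in mortality.items():
    print(f"{name}: {value:.3f}")

# Severe hypoglycemia episodes per 100 patients (these are episodes, not
# patients, so a single patient can contribute more than once)
print(f"tight control: {100 * 272 / 3016:.1f} per 100 patients")
print(f"controls:      {100 * 16 / 3014:.1f} per 100 patients")
```

The odds ratio comes out at about 1.14, which is where "nearly 15% higher odds" comes from; the relative risk of death is about 1.10, and the absolute difference is 2.6 percentage points. When the outcome is common, as death is here, the odds ratio runs larger than the relative risk, which is why the two numbers differ.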
Tight glucose control first hit the charts in 2001, when a single-site, unblinded study of surgical ICU patients in Leuven, Belgium, reported a whopping 34% reduction in mortality. The study setting was atypical in many ways: two-thirds of the patients were recovering from cardiothoracic (CT) surgery (and the post-op patients in the control group had an unusually high mortality rate), the nurse-to-patient ratio was 1:1 (vs. 1:2 in most U.S. ICUs), and the nurses were often backed up by a study physician.
Nevertheless, based almost entirely on this single (and singular) result, tight glucose control became a standard-of-care practice for all ICU patients, and even for some non-ICU patients. It was also integrated into “bundles” in other quality initiatives, such as the Surviving Sepsis Campaign and the Surgical Care Improvement Project (SCIP).
Perhaps unsurprisingly (given Leuven’s rarefied conditions), subsequent studies in other populations (medical ICU patients, for example) were less impressive; most [two nice reviews are here and here] found that tight control yielded no benefit. Two studies [here and here] were even stopped early because of unacceptably high rates of hypoglycemia. And now there’s NICE-SUGAR, whose results, if you’re an intensive insulin fan, are anything but “nice.”
Hypoglycemia isn’t the only downside of intensive insulin therapy; the therapy is also a resource hog. Observing efforts to administer intensive insulin therapy in our UCSF critical care units, our ICU director Mike Gropper noted:
Each glucose determination required 7 minutes of nursing time; a nurse caring for 2 patients on the insulin protocol would spend approximately 2 hours of a 12-hour shift monitoring the patient, obtaining samples, performing tests, and intervening.
In other words, in a 16-bed ICU in which nearly half the patients might have “indications” for intensive insulin therapy, the equivalent of about one nurse is focused on glucose control, around the clock. Across our 80 ICU beds, that would add up to 4-5 nurses doing nothing but glucose control at any given moment. The yearly cost would run to several million dollars (if the nurse time were paid for). If the nurse-to-patient ratio went unchanged, the cost would instead be exacted in patient care, as nurses found the time by doing fewer of their other important tasks.
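The staffing estimate is back-of-the-envelope arithmetic, and a short Python sketch makes its assumptions explicit. The inputs are the figures quoted above: about 2 hours per 12-hour shift for a nurse with 2 protocol patients, and roughly half of a 16-bed unit on the protocol. The function name `glucose_nursing_load` and the 40-hour paid work week used to convert continuous coverage into FTEs are my own illustrative assumptions, not UCSF data.

```python
SHIFT_HOURS = 12
HOURS_PER_WEEK = 24 * 7        # continuous, around-the-clock coverage
PAID_HOURS_PER_FTE = 40        # assumption: one FTE = a 40-hour work week


def glucose_nursing_load(beds: int, fraction_on_protocol: float,
                         hours_per_patient_per_shift: float = 1.0) -> tuple[float, float]:
    """Return (nurses busy with glucose control at any moment, paid FTEs needed).

    hours_per_patient_per_shift defaults to 1.0, i.e. ~2 hours per 12-hour
    shift for a nurse caring for 2 protocol patients, as quoted above.
    """
    patients_on_protocol = beds * fraction_on_protocol
    nurse_hours_per_shift = patients_on_protocol * hours_per_patient_per_shift
    continuous_nurses = nurse_hours_per_shift / SHIFT_HOURS
    # Staffing one position around the clock takes 168 / 40 = 4.2 paid FTEs.
    paid_ftes = continuous_nurses * HOURS_PER_WEEK / PAID_HOURS_PER_FTE
    return continuous_nurses, paid_ftes


for beds in (16, 80):
    continuous, ftes = glucose_nursing_load(beds, fraction_on_protocol=0.5)
    print(f"{beds} beds: {continuous:.2f} nurses at any moment, {ftes:.1f} paid FTEs")
```

With exactly half the beds on the protocol, these inputs give about two-thirds of a nurse at any moment in a 16-bed unit (about 3.3 across 80 beds), the same order of magnitude as the rounder estimates above; more frequent checks or more patients on the protocol push the numbers up. Because each around-the-clock position takes roughly four paid FTEs under the 40-hour assumption, the payroll cost climbs quickly.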
The bottom line: when we chisel a practice into the Tablet of Quality Measures and bring it down the Mountain, it's a very big deal. When we get one wrong, we can do an awful lot of harm. And it seems like we’ve gotten quite a few wrong lately.
Although many, including me, have long lamented the glacial pace of adoption of evidence-based practices in American medicine (often 5-15 years after the emergence of truly robust evidence supporting a practice), this traditional time lag did have one virtue: it allowed the literature to mature. Not infrequently, after an early trial showed benefit, later studies – performed under less controlled circumstances by less committed investigators with fewer dedicated resources – found no benefit or even harm. By the time the laggards were ready to consider changing their practice, the practice had been debunked by new evidence.
My friend and colleague Kaveh Shojania wrote a terrific piece on this phenomenon in our web-based safety journal, AHRQ WebM&M. Kaveh described his approach to the single blockbuster study that shows staggering benefit:
No single principle can encapsulate all of the interpretive issues for a body of literature, but “nothing works that well” comes close. To allow some wiggle room for the discovery of penicillin and other occasional quantum leaps in medical care, we can tone it down a little: “most things don’t work that well.” The relevant corollary is that any study reporting dramatic improvements in any major clinical outcome is probably flawed. When clinical interventions do work, they tend to bring very modest gains: relative improvements of 20% to 40% are often cause for celebration, and absolute improvements in the 5% to 10% range represent major advances in care. If an article reports improvements in these ranges, scrutinize it closely. If the improvements exceed these ranges, expect subsequent studies to show less impressive effects, or even no benefit.
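Kaveh’s rule of thumb can be written down as a simple triage function. This Python sketch is my own rendering, not his: the names `ReportedEffect` and `triage_reported_benefit` are hypothetical, the sketch assumes the outcome is an adverse event (such as death) whose rate an intervention lowers, and it treats exceeding either the relative (40%) or the absolute (10 percentage point) ceiling as a red flag.

```python
from dataclasses import dataclass


@dataclass
class ReportedEffect:
    control_rate: float   # adverse-event rate without the intervention (0-1)
    treated_rate: float   # adverse-event rate with the intervention (0-1)

    @property
    def absolute_reduction(self) -> float:
        return self.control_rate - self.treated_rate

    @property
    def relative_reduction(self) -> float:
        return self.absolute_reduction / self.control_rate


def triage_reported_benefit(effect: ReportedEffect) -> str:
    """Classify a reported benefit using the 20-40% relative / 5-10% absolute ranges."""
    rel, ab = effect.relative_reduction, effect.absolute_reduction
    if rel > 0.40 or ab > 0.10:
        return "too good: expect later studies to shrink or erase it"
    if rel >= 0.20 or ab >= 0.05:
        return "major advance if real: scrutinize closely"
    return "modest"


# Hypothetical trial results, not drawn from any study discussed here
for control, treated in [(0.30, 0.12), (0.20, 0.15), (0.20, 0.19)]:
    effect = ReportedEffect(control, treated)
    print(f"{control:.0%} -> {treated:.0%}: {triage_reported_benefit(effect)}")
```

The point of separating the two measures is that the same trial can look very different depending on which one is reported: a drop from 20% to 15% is only 5 percentage points in absolute terms but a 25% relative reduction.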
Today's environment of quality metrics, public reporting, and pay-for-performance (P4P) has turbo-charged the adoption curve. Just as the 24-hour news cycle sometimes makes us too impatient to give policy initiatives a chance to ripen (see Joe Klein’s thoughtful essay on this regarding Obama’s economic policies in last week’s Time magazine), we need to recognize that this shorter cycle, which takes a single positive study and mainlines it into a quality standard, risks disseminating some practices that will prove to be unhelpful, or even harmful.
Yes, we need more quality and safety measures to promote improvement. But we don’t need them so badly that we can’t afford to wait until the practices that look good in early studies, in atypical environments, are road tested in more typical settings to ensure that they really work.