On Quality Measurement, Babies, and Bathwater

Quality Measurement mavens are reeling these days, as a result of the air being let out of high-profile measures such as tight glucose control, door-to-antibiotic time, and beta-blockers. Some critics have even suggested that we put a moratorium on new quality measures until the science improves.

I hope we don’t.

I think we’re seeing a natural, and fairly predictable, ebb and flow, and our reaction – even to these significant setbacks – should be thoughtful and measured. Here’s why:

The publication of the IOM’s Quality Chasm report (and McGlynn’s findings that we adhere to evidence-based practice about half the time) generated intense pressure to launch transparency and pay-for-performance initiatives. Finding virtually no outcome measures ready for prime time (the data collection burden was too large and the science of case-mix adjustment too immature), policymakers and payers logically looked to process measures (aspirin, ACE inhibitors, pneumovax) for common diseases (MI, CHF, pneumonia), delivered in settings (hospitals) that could be held accountable. And they sought levels of evidence that were, if not perfect, then at least good enough.

The National Quality Forum was created to vet this evidence. But the NQF has a problem not unlike that of the FDA: too low an evidence bar and bad measures become “law”; too high a bar and the ravenous hunger for quality measures goes unsated. Unsurprisingly, the demand for measures won out and the bar was set relatively low – not so much in terms of study design, but rather in terms of the degree to which initial promising studies had their findings confirmed by subsequent research.

With that as prelude, we shouldn’t be shocked by what we’re seeing now: a mini-pattern in which one or two tightly managed, single-site studies showing great benefit are followed by studies done in more diverse, real-world settings whose results are disappointing. It has always been thus. The difference is that now, by the time the later studies are published, the quality measures have long since been disseminated.

I won’t belabor the point since I’ve covered this ground previously in my discussions of the individual measures (such as glucose, beta-blockers, and door-to-antibiotics). But the fascinating trend to watch now is the beginnings of a Quality Measurement Backlash – it’s not a full-fledged, “spontaneous” Sean Hannity tea party just yet, but the night is young. Consider, for example, the Jerry Groopman/Pamela Hartzband article in last week’s Wall Street Journal:

In too many cases, the quality measures have been hastily adopted, only to be proven wrong and even potentially dangerous to patients….Yet too often quality metrics coerce doctors into rigid and ill-advised procedures. Orwell could have written about how the word “quality” became zealously defined by regulators, and then redefined with each change in consensus guidelines….

The solution, say the authors, is to stop the presses:

Before a surgeon begins an operation, he must stop and call a “time-out” to verify that he has all the correct information and instruments to safely proceed. We need a national time-out in the rush to mandate what policymakers term quality care to prevent doing more harm than good.

If that wasn’t enough fun for the quality measurers, the article by Rachel Werner and Bob McNutt in last week’s JAMA surely was. After critiquing today’s measures for the usual reasons, the authors suggest a “new approach”:

First, the focus of quality improvement initiatives should be on improving rather than measuring quality of care… Second, quality improvement initiatives should be tied to local actions and local results rather than national norms. This acknowledges that quality improvement efforts are not generalizable and one solution does not fit all.

…Quality improvement incentives can be restructured based on these principles. Current incentives are based on measured performance and are benchmarked to national norms. An alternative approach is to tie incentives to the local process of improving quality of care rather than the results of quality measures. This could take the form of requiring local teams of quality improvement personnel to identify problems through investigation, identify solutions to these problems, implement solutions, and document local improvement… A logical next step is to tie current quality improvement incentives to this approach—pay based on participation in quality improvement efforts rather than simply comparing each other on measures that do not reflect the learning that is required to really improve care.

The small, elite group of nationally recognized measure vetters is feeling increasingly besieged. You may have seen the comment on this blog by one of them, Dale Bratzler of the Oklahoma Foundation for Medical Quality (creator of many national measures, including those used in the Surgical Care Improvement Project), written in response to my post on tight glucose control. Bratzler wrote:

…I am tiring of some of the criticisms related to quality initiatives because the authors of those criticisms often fall victim to the same practices that they criticize. It seems to be increasingly common for opinion pieces, editorials, anecdotal reports, underpowered studies, and single-institution studies to be used to suggest that quality initiatives are resulting in widespread patient harm. Frankly, I have not seen systematic evidence of that for most national quality initiatives and in some cases, have data to suggest that for many of the conditions targeted in those initiatives, patient outcomes are slowly but progressively improving.

Bratzler goes on to state, correctly, that the glucose standard in SCIP was not the brutally tight 80-110 mg/dL, but rather a more generous (and less dangerous) <200 mg/dL. He then acknowledges that

…some hospitals undoubtedly go beyond the requirements of the SCIP measure and that could result in harm… But on a national basis, surgical outcomes actually are improving over time and there is no national requirement to implement programs of intensive blood sugar control.

That last point is technically accurate, but other national campaigns, or influential national organizations, have promoted tighter control than that recommended in SCIP. For example, the Surviving Sepsis Campaign targets a glucose level of <150 mg/dL, and the Institute for Healthcare Improvement’s target is the Van Den Berghe standard of 80-110 mg/dL (although, to be fair, IHI stuck to the SCIP standards in their recently completed 5 Million Lives campaign).

But before we get too distracted by all these angels dancing atop an insulin pen, let’s take a step back and consider the big picture. We’ve seen that some widely disseminated and promoted performance measures haven’t worked out as intended – usually because evidence emerged that was less impressive than the initial salvo.

And now we have Groopman and Hartzband arguing that we should take a “time out” on quality measures, leaving it to doctors to make their own choices since only they truly know their patients. Do we really believe that the world will be a better place if we went back to every doctor deciding by him or herself what treatment to offer, when we have irrefutable data demonstrating huge gaps between evidence-based and actual practice? Even when we KNOW the right thing to do (as in handwashing), we fail to do it nearly half the time! Do the authors really believe that the strategy should remain “Doctor Knows Best”; just stay out of our collective hair? Pullease…

And if we agree that we need some measurement to catalyze improvement efforts, do we really want measures that can be met through elaborate dog-and-pony shows, with no demonstration of improved processes or outcomes? Really? Sure, the Joint Commission should check to be sure that hospitals use strong QI methods (a major interest of new Joint Commission prez Mark Chassin, BTW), but there has to be more, much more. A close colleague, one of the world’s foremost quality experts, wrote me about the Werner/McNutt article, finding it unhelpful

…because you still have the measurement problem – how are you going to know whether or not any of these actions are actually happening, and whether or not they are actually improving anything?



The sentence in the JAMA piece, “This could take the form of requiring local teams of quality improvement personnel to identify problems through investigation, identify solutions to these problems, implement solutions, and document local improvement” reminded this colleague of the TQM fad twenty years ago, when some accreditors and insurers began requiring documentation of “storyboards” of hospitals’ PDSA cycles. He recalled visiting some hospitals preparing for inspections and

…I would see the storyboards on the wards and they were just laughable in terms of what they presented and their supposed relation to cause-and-effect. It was all a charade, done to get through the inspection and nothing – or very close to nothing – meaningful was really being accomplished.

So, to the Dale Bratzlers of the world, I say, Courage! Keep it up. And don’t let the bums (including this one) get you (too) down. The bottom line is that we need quality measures, and we need rigorous research to create good ones. When we do create a flawed measure (an inevitability), let’s admit it and fix it. If hospitals have been pushed toward tight glucose control based on now-partly-discredited evidence, let’s say so, improve the measure, and resolve to learn something from the experience – not just when we’re thinking about glucose measurement but also when we’re considering the strength of the evidence supporting other difficult-to-implement and potentially dangerous practices. Ditto door-to-antibiotic timing.

As for me, I’ll keep critiquing bad measures and pointing out when new science emerges that changes how we should think about existing measures. But I’ll continue to support thoughtful implementation of transparency programs and experiments using P4P. From where I sit, of all our options to meet the mandates to improve quality and safety, tenaciously clinging to a Marcus Welbyan (and demonstrably low quality) status quo or creating tests that can be passed by appearing to be working on improvement seem like two of the worst.

6 Responses to “On Quality Measurement, Babies, and Bathwater”

  1. Dan Varga April 16, 2009 at 5:07 pm #

    An excellent post, Bob.

    Just a brief thought. Years ago, the standard “measurement” for hyperlipidemia was a goal cholesterol of 240 or less. We gave countless patients constipation (cholestyramine) and hot flashes (niacin) to try to meet a “performance indicator” that we all know today to be completely inadequate. Yet striving to achieve this inadequate indicator with these side-effect-laden interventions reduced cardiac risk. When we recognized that we had subsets of patients at goal suffering heart attacks, etc., we did not abandon cholesterol measurement or lipid-lowering therapy. Rather, we made the treatment and the indicators better. We save countless lives today because we did not “throw the baby out with the bathwater.”

    A moratorium on the use of quality measures while we wait for the “perfect” indicator is misguided, primarily because almost every indicator and indicator threshold we use today is the improved version of some “gold standard” indicator or threshold that has subsequently been proven inadequate. It reveals a simple fact: using indicators makes them better. We certainly must refine and improve our clinical evidence base, our comparative effectiveness understanding, and the consensus development process. We must also do our best to prevent harm and unintended consequences in the process. But we cannot stop measuring our performance, holding ourselves accountable, and improving our care while we wait for the tablets to be miraculously distributed.

    Thanks for all of the great posts and insights.

    Regards.

  2. menoalittle April 19, 2009 at 2:25 am #

    Bob,

    This is a thought-provoking piece. The reason that “Quality Measurement” mavens are reeling these days is the rigidity of their judgment about what to measure. The generally accepted definition of quality in medicine is giving the right medicine (or treatment) to the right patient at the right time.

    The perfect execution of care to the standard of that definition requires traits such as clinical judgment, clinical creativity, and seasoning. These are never constructively critiqued or measured in any way, shape, or form in peer review or elsewhere in the clinic. The quality definition presupposes that no two patients are carbon copies of each other. Of course, there are general guidelines and sage teaching as to how to manage medical and surgical illness. But it is the differences in how any two people are affected by an illness or its treatment that make the practice of medicine an art built on a foundation of scientific precepts. Then add to this the distortion of clinical judgment caused by not-so-benevolent government and insurance carrier threats and interventions.

    The quality mavens should measure (and encourage) how often the right medicine or surgery is given to the right patient at the right time, instead of promoting rigid care criteria that predispose patients with an “infiltrate” of edema fluid to be treated with antibiotics, and then be coded as a pneumonia in perpetuity (note the recent BIDMC-Google Health conversation on diagnosis codes as reported in the Boston Globe).

    http://www.boston.com/news/nation/washington/articles/2009/04/13/electronic_health_records_raise_doubt/

    http://www.boston.com/news/nation/washington/articles/2009/04/18/beth_israel_halts_sending_insurance_data_to_google/

    And while we are focused on the quality of care we think we are measuring, we and our government should also measure, before they are used on patients, the quality of the measuring tools themselves: the HIT CPOE instruments (thought by at least some of these same “Quality Measurement mavens” to be the panacea for quality measurement and improvement, but never proven to be) that are proliferating across the US despite flaws, defects, and unusability.

    Do the gag and hold harmless clauses signed by CIOs and hospital “C suite” folks as recently reported in JAMA keep quality mavens from measuring the quality of their tools to measure quality? I will not even raise the impact of financial conflicts in this venue.

    Best regards,

    Menoalittle

  3. Helen Burstin April 22, 2009 at 5:30 pm #

    Bob, thanks for another thoughtful blog post on quality.  

    I wanted to chime in about the role of NQF.  As you point out, NQF was “created to vet this evidence” and we have worked hard to raise the bar on quality measurement.  In fact, for several of the examples cited, such as tight control of HbA1c, we have held the line and rejected measures that did not have a stable evidence base.  It is not always popular to do so, but it is the right thing to do. Last year, NQF rejected a measure on hemoglobin levels for ESRD patients receiving ESA [erythropoiesis-stimulating agent] therapy. Though safety concerns with ESAs were primarily focused on cancer patients, the lack of clinical consensus suggested that it was not the right time to fix a tight hemoglobin target while the evidence remained unclear at best.

    The quality measures that are endorsed and implemented must be as dynamic as the evidence on which they are based.  Evidence of unintended consequences from the measure requiring antibiotic administration within 4 hours for pneumonia rapidly launched a review at NQF, and the modified measure was quickly endorsed.  We are developing a web-based feedback loop so that end-users can alert us to problems with measures during implementation or to changes in the evidence base.  The required three-year maintenance reviews on all measures should also ensure that NQF-endorsed measures are based on the highest levels of up-to-date evidence.

    Last August, we updated the NQF measurement evaluation criteria in order to raise the bar on measurement and move away from narrow process measures and toward outcomes and composite measures. A must-pass criterion for all new measures is now “importance to measure and report.”  If a significant gap in care or significant variation between providers cannot be demonstrated, then our energies are better directed toward areas where real improvement in patient outcomes is possible.

    Thanks for the dialogue.

  4. fmozdy May 3, 2009 at 9:54 pm #

    Bob,

    Something was bothering me about the backlash against tight glycemic control in particular, and quality measures in general. Improved glycemic control does make physiologic sense, as you and others have noted. The bad outcome for the intervention group in the NICE-SUGAR study tells us more about the deficiencies of the process than about the goal (improved control). As Dan Varga commented, we should be encouraged to improve our targets (no more 80-110 sugars) and our methods (imperfect IV insulin protocols, lack of real-time monitoring devices to save nursing time). It is discouraging to hear people call for cessation of efforts to improve our care of patients with hyperglycemia because of a lack of skill in lowering sugar safely.

    I recall your post one year ago describing the Gartner Group’s Hype Cycle of Emerging Technology. It seems we are in the Trough of Disillusionment. But I have high hopes that soon we will be moving up the Slope of Enlightenment.

    Thanks,
    Frank Mozdy

  5. MKirschMD May 10, 2009 at 8:21 pm #

    Supporting medical quality is a slogan. Who would challenge the mission? Yet, how do you define and measure medical quality? This is the Gordian knot of the issue. See http://mdwhistleblower.blogspot.com/2009/02/appraising-art-and-medicine.html for additional thoughts on this issue. Medicine is an art that can’t be assessed and quantified like factory widgets or electronic devices. How can you assess and reward thorough medical history taking, physical examination skills, and medical judgment – the true components of medical quality? Just because we can measure it doesn’t mean it really matters.

    Michael Kirsch, M.D.
    http://www.MDWhistleblower.blogspot.com

  6. william reichert,MD May 27, 2010 at 2:33 pm #

    Last week I received a notice from my hospital that, from now on, any patient under my care with more than 20% of blood glucose readings over 180 will be deemed to have received “inappropriate care”.
    The committee noted that I treated a patient for DKA with an initial blood glucose of over 700. Placed on an insulin drip, the readings came down slowly and the patient was discharged after an inpatient stay of 36 hours with a blood glucose of 127. Of course, the majority of readings were above 180 before the goal was reached. Hence, I had delivered “inappropriate care”.
    I guess I should, in the future, keep the patient in the hospital till the number of readings below 180 exceeds 80%.
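
    The arithmetic of that screen is easy to sketch. Assuming the rule is literally “more than 20% of readings above 180 mg/dL,” a hypothetical glucose course like the DKA admission described above gets flagged even though the care was entirely appropriate (the glucose values below are invented for illustration):

    ```python
    def flag_inappropriate(readings, threshold=180, max_fraction=0.20):
        """Return True if more than max_fraction of readings exceed threshold."""
        over = sum(1 for g in readings if g > threshold)
        return over / len(readings) > max_fraction

    # A plausible 36-hour DKA course: starts above 700, falls steadily on an
    # insulin drip, and ends at 127 -- yet most early readings are necessarily
    # above 180, so the metric condemns the admission.
    dka_course = [720, 640, 540, 450, 380, 310, 260, 220, 190, 160, 140, 127]

    print(flag_inappropriate(dka_course))  # → True: 9 of 12 readings exceed 180
    ```

    The same rule passes only courses that start near goal, which is exactly the feasibility problem raised below.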

    Despite my chagrin regarding the above, my greater concern is not the target of 180 but the feasibility of reaching this target safely in patients outside the ICU setting. I have read several articles recommending protocols for achieving excellent glycemic control on the wards, but I have yet to find reports that these protocols actually work on patients who eat irregularly, are put on and off drugs that abruptly affect blood sugar, have fluctuating metabolic influences on the glucose level, and so on. Furthermore, I have used these protocols and find that they do NOT work. Do you know of any hospital that has met the glucose target of consistent levels below 180 in patients outside the ICU (off the insulin pump, etc.)?
    If so, I would like to pay a site visit and discover the methods they are using.
    Thank you.

    William Reichert,MD
