Urea breath testing and analysis in the primary care office

Article Type
Changed
Mon, 01/14/2019 - 13:07
Display Headline
Urea breath testing and analysis in the primary care office

The13C-urea breath test provides accurate, noninvasive diagnosis of active Helicobacter pylori infection and can document posttherapy cure. This study evaluated point-of-care testing with onsite sample analysis with the use of a desktop infrared spectrophotometer. Ambulatory patients (N = 320) underwent 13C-urea breath testing, and breath samples were analyzed immediately by clinic staff with no prior breath testing experience. Duplicate samples were sent to a reference laboratory, and the results of both methods were compared. Point-of-care testing was simple, with an overall agreement of 99.1%. Accurate near-patient 13C-urea breath testing is now practical in the primary care setting even when done by inexperienced personnel.

Helicobacter pylori infection is etiologically associated with chronic gastritis, peptic ulcer disease, and gastric cancer.1-5Helicobacter pylori infection is also a consideration in the evaluation of uninvestigated or undifferentiated dyspepsia.4-7 The steps in the management of H pylori infection include diagnosis, choice of appropriate therapy, and confirmation of cure.4,5,8 Diagnosis and confirmation of cure require diagnostic testing, and the recent consensus has been in the direction of noninvasive testing.4,5

The urea breath test (UBT) is generally considered the clinical, gold standard, noninvasive test for detection of active H pylori infection.9 The intragastric hydrolysis of orally administered urea by H pylori urease produces a change in the isotopic ratio (13CO2/12CO2) in the breath.10 The 13C-UBT has been approved by the US Food and Drug Administration for pre- and posttherapy testing.11,12

Recently the 13C-UBT has been shortened and simplified by the use of a citric acid test meal and elimination of the requirement of a 4-hour fast.13 Use of infrared mass spectrophotometry for measuring 13CO2 enrichment in breath samples makes it theoretically possible to do office-based breath testing with onsite analysis.14-16 Infrared spectrophotometry has been shown in comparative studies to be an accurate method for assessing 13CO2 enrichment in breath.17-19 No comparative studies of the 2 analytical methods have been done in the United States.

This study examined the utility of UBT testing by comparing infrared spectrophotometry with traditional gas isotope ratio mass spectrometry in the primary care environment. Our hypothesis was that the results obtained from the primary care clinics would be as accurate as those obtained from the commercial laboratory or the more experienced hospital-based clinical laboratory.

Methods

This was a multicenter, prospective study designed to compare a new infrared spectrophotometer (UBiTIR300, Otsuka Pharmaceuticals, Tokyo, Japan) for measuring 13CO2 enrichment in breath with gas isotope ratio mass spectrometry (ABCA, Europa Scientific, Cheshire, UK). Subjects were recruited at the offices or clinics of 4 physicians’ including an indigent care primary care clinic, a hospital-based gastroenterology clinic, a private practice internal medicine office, an academic family medicine clinic, and a tertiary care clinical laboratory site. The study was done between July and September 2001. Consecutive patients were enrolled in the study if they expressed interest in participating and met the inclusion and exclusion criteria. Each site was provided with an infrared spectrophotometer, a breath gas transfer device, and commercially available UBT breath test collection kits (Meretek Diagnostics, Nashville, TN). Each kit contained a 13C-urea solution (125 mg in 75 mL of water), test meal pudding, and a specimen return box containing 4 bar-coded, evacuated 10-mL sample tubes. Before enrollment, all study personnel received approximately 1 hour of training in the performance of the test and use of the equipment.

Study procedures

Subjects included in the study were medically stable, ambulatory patients between 18 and 75 years old who were asymptomatic or experiencing dyspepsia. Potential subjects were excluded from study participation if they took bismuth preparations, antibiotics (ie, amoxicillin, tetracycline, metronidazole, clarithromycin, or azithromycin) or any anti-ulcer medication in dosages indicated for ulcer disease (ie, proton pump inhibitors, type 2 histamine blockers, or misoprostol) within 2 weeks before the study breath test. Exclusion criteria also included participation in a drug study within 4 weeks, treatment for eradication of H pylori within 4 weeks of the study breath test, or a history of gastric surgery or vagotomy for ulcer disease except simple closure of a gastric perforation.

The protocol was approved by the local Institutional Review Board for Human Studies, and all subjects provided written informed consent.

Testing began with a minimum 4-hour fast from solid food. Breath samples were obtained in disposable, balloonlike, breath collection bags designed for use with the infrared spectrophotometer. One sample was obtained immediately before ingestion of the 13C-urea test solution, and the second was collected 30 minutes after substrate ingestion. Paired sample aliquots were taken for separate analyses. Results from local infrared instruments were blinded to the central laboratory.

 

 

Statistical analyses

Each primary care site was asked to enroll subjects until 80 positive cases were identified from among all sites. The clinical laboratory site tested a minimum of 30 positive and 30 negative cases. The primary endpoint was the percentage of agreement (overall and within positive and negative cases separately) of results from both methods. Delta-over-baseline (DOB) enrichment values below 2.4 per mil were deemed negative and values greater than or equal to 2.4 per mil were deemed positive. The predicate reference method was gas isotope ratio mass spectrometry. Equivalence was defined by the percentage of agreement for positive and negative cases based on gas isotope ratio mass spectrometry results of at least 95%, and the lower limit of the 95% confidence interval was based on a percentage of agreement of at least 90% for positive cases.

Results

The primary care centers enrolled 258 subjects and the clinical laboratory enrolled 64 subjects, for a total enrollment of 322. The subjects’ mean age was 41.5 years (range, 18–70 years), with 88 black non-Hispanics, 106 Hispanics, 92 whites, 32 Asian/Pacific Islanders, and 4 other ethnic groups. There were 215 women and 107 men. Approximately 18% had active or previous gastrointestinal ailments, including previously diagnosed H pylori infection (11%) and peptic ulcer disease (3%).

There was excellent agreement between methods (Table), with an overall agreement of 99% (95% confidence interval, 97.3–99.7). Two subjects were excluded from the analysis of the primary and secondary endpoints because 1 or both assay values were missing. The data showed close correlation between methods among all sites (Figure). Evaluation by the personnel who performed the tests and analyses indicated that the office procedure was easy and nonintrusive.

There were 3 disagreements between the results obtained with the devices, 1 from the gastroenterology site and 2 from the clinical laboratory site. All results were near the cutoff value.

TABLE
Comparison of results between IRMS and GIRMS*

 GIRMS, %
IRMS, %NegativePositiveTotal
Negative1115116
Positive2022204
Total203117320
*Overall agreement, 99.1% (95% confidence interval, 97.3–99.7); positive agreement, 98.2% (95% confidence interval, 94.2–99.7); negative agreement, 99.5% (95% confidence interval, 97.3–99.9).
Kappa statistic = 0.98.
GIRMS, gas isotope ratio mass spectrometry; IRMS, infrared mass spectrophotometry.

FIGURE
Comparison of results between 2 methods of analysis of 13CO2 breath sample enrichments

Discussion

13C-urea breath testing is an accurate diagnostic method for the detection of active H pylori infection12,19-21 and point-of-care assessment of curative therapy. This study confirmed the hypothesis that the infrared instrument is an easy to operate alternative to the original sendout analyses. Rapid turnaround allows for decisions regarding therapy to be made at the time of care. The currently approved UBT in the United States does not require fasting from solid food for longer than 1 hour.

Noninvasive alternatives to UBT include serology and stool antigen testing. Serology assays cannot discriminate between active and recent past infections. Stool antigen testing requires patient compliance with specimen collection and is a sendout test. In general, although studies using pretreatment stool antigen tests have shown sensitivity and specificity comparable to those of histology or UBT, it has become evident that there can be considerable lot-to-lot variation in stool antigen tests.22 The most likely explanation is that the polyclonal serum used for the capture antibody is obtained from rabbits and thus difficult to standardize.23 Stool antigen testing has also proven to be less reliable when used soon after the end of therapy, and it is now generally recommended that one must wait 6 or 8 weeks after therapy when using the stool antigen test to confirm eradication. For example, a recent study had a false negative rate of 12.5% (95% confidence interval, 1.5–33).24Recent recommendations are that the UBT is preferred where available.4

The US Food and Drug Administration recently cleared the UBiT-IR300 instrument for use with the commercial 13C-UBT. The costs of the UBT are but a fraction of those of endoscopy, not including indirect patient costs. Office-based testing has a separate reimbursement for testing, and overall the costs appear less than those of the stool antigen test. Economic impact studies comparing the tests are planned. Office-based infrared analysis for 13C makes near-patient or point-of-care UBT and analysis practical and should make accurate diagnosis of active H pylori infection readily available.

Acknowledgments

The authors thank the clinical study coordinators, including Ms Rebecca Garza, Ms Flora Godard, and Mr Zachary Patton for their conscientious efforts.

References

1. Levine TS, Price AB. Helicobacter pylori: enough to give anyone an ulcer. Br J Clin Pract 1993;47:328-32.

2. Staat MA, Kruszon-Moran D, McQuillan GM, Kaslow RA. A population-based survey of Helicobacter pylori infection in children and adolescents in the United States. J Infect Dis 1996;174:1120-3.

3. Opekun AR, Gilger MA, Denyes SM, et al. Helicobacter pylori infection in children of Texas. J Pediatr Gastroenterol Nutr 2000;31:405-10.

4. Malfertheiner P, Megraud F, O’Morain C, et al. Current concepts in the management of Helicobacter pylori infection—the Maastricht 2-2000 Consensus Report. Aliment Pharmacol Ther 2002;16:167-80.

5. Shiotani A, Nurgalieva ZZ, Yamaoka Y, Graham DY. Helicobacter pylori Med Clin North Am 2000;84:1125-36.

6. Meuer LN. Treatment of peptic ulcer disease and nonulcer dyspepsia. J Fam Pract 2001;50:614-9.

7. Greenberg PD, Koch J, Cello JP. Clinical utility of cost effectiveness of Helicobacter pylori testing for patients with duodenal and gastric ulcers. Am J Gastroenterol 1996;91:228-32.

8. Meurer LN, Bower DJ. Management of Helicobacter pylori infection. Am Fam Phys 2002;65:1327-36.

9. Megraud F. Diagnosis of Helicobacter pylori. Scand J Gastroenterol 1996;31(suppl):214-46.

10. Graham DY, Klein PD, Evans DJ, et al. Campylobacter pylori detected noninvasively by the 13C-urea breath test. Lancet 1987;1:1174-7.

11. Klein PD, Malaty HM, Martin RF, Graham KS, Genta RM, Graham DY. Noninvasive detection of Helicobacter pylori infection in clinical practice: the 13C urea breath test. Am J Gastroenterol 1996;91:690-4.

12. Graham DY, Klein PD. Accurate diagnosis of Helicobacter pylori: 13C-urea breath test. Gastroenterol Clin North Am 2000;29:885-93.

13. Graham DY, Runke D, Anderson-SY, Malaty HM, Klein PD. Citric acid as the test meal for the 13C-urea breath test. Am J Gastroenterol 1999;94:1214-7.

14. Braden B, Haisch M, Duan LP, Lembcke B, Caspary WF, Hering P. Clinically feasible stable isotope techniques at a reasonable price: analysis of 13CO2/12CO2-abundance in breath samples with a new isotope selective non-dispersive infrared spectrometer. Z Gastroenterol 1994;32:675-8.

15. Mion F, Ecochard R, Guitton J, Ponchon T. 13CO(2) breath tests: comparison of isotope-ratio mass spectrometry and non-dispersive infrared spectrometry results. Gastroenterol Clin Bio 2001;25:345-9.

16. Sheu BS, Lee SC, Yang HB, et al. Lower-dose 13C-urea breath test to detect Helicobacter pylori infection-comparison between infrared spectrometer and mass spectrometer. Aliment Pharmacol Ther 2000;10:1359-63.

17. Mansfield CD, Rutt HN. The application of infrared spectrometry to breath CO2 isotope-ratio measurements and the risk of spurious results. Phys Med Biol 1998;43:1225-39.

18. Ohara S, Kato M, Asaka M, Toyota T. The UbiT-100 13CO2 infrared analyzer: comparison between infrared spectrometric analysis and mass spectrometric analysis. Helicobacter 1998;3:49-53.

19. Savarino V, Mela GS, Zentilin P, et al. Comparison of isotope-ratio mass spectrometry and non-dispersive isotope-selective infrared spectroscopy for 13C-urea breath test. Am J Gastroenterol 1999;94:1203-8.

20. Goddard AF, Logan RPH. Review article: urea breath tests for detecting Helicobacter pylori. Aliment Pharmacol Ther 1997;11:641-9.

21. Goodwin CS, Mendall MM, Northfield TC. Helicobacter pylori infection. Lancet 1997;349:265-9.

22. Vaira D, Vakil N, Menegatti, et al. The stool antigen test for detection of Helicobacter pylori after eradication therapy. Ann Intern Med 2002;136:280-7.

23. Graham DY, Qureshi WA. Markers of infection. In: Mobley HLT, Mendz GL, Hazell SL, eds. Helicobacter pylori: physiology and genetics. Washington, DC: ASM Press; 2001;499-510.

24. Lopez Penas D, Naranjo Rodriguez A, Munoz Molinero J, et al. Efficacy of fecal detection of Helicobacter pylori with the HpSA technique in patients with upper digestive hemorrhage. Gastroenterol Hepatol 2001;24:5-8.

Article PDF
Author and Disclosure Information

ANTONE R. OPEKUN, PA-C
NAGEEB ABDALLA, MD
FRED M. SUTTON, MD
FADI HAMMOUD, MD
GRACE M. KUO, PHARMD
ELIZABETH TORRES, MD
JEFFREY STEINBAUER, MD
DAVID Y. GRAHAM, MD
Houston and Sugar Land, Texas
From the Departments of Medicine (A.R.O., F.H., D.Y.G.) and Family and Community Medicine (A.R.O., N.A., F.M.S., G.M., J.P.), Baylor College of Medicine and the Veterans Affairs Medical Center (A.R.O., F.H., D.Y.G.), Harris County Hospital District Facilities (A.R.O., N.A., F.M.S.), Houston, TX; and Premier Internal Medicine Associates, Sugar Land, TX (E.T.). This work was supported in part by Cambridge Isotopes, Inc, a Division of Otsuka Pharmaceuticals Ltd, Andover, MA; the Office of Research and Development, Medical Research Service, Department of Veterans Affairs; Public Health Service grants DK-53659 and DK56338; and funds from the Texas Gulf Coast Digestive Diseases Center. David Y. Graham, MD, is a consultant to Otsuka Pharmaceuticals. David Y. Graham, MD, and Antone R. Opekun, PA-C, receive royalties from the sales of the urea breath test in the United States. Address reprint requests to Antone R. Opekun, PA-C, Departments of Medicine and Family and Community Medicine, Baylor College of Medicine, 6550 Fannin Street (SM-1122), Houston, TX 77030-2399. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(12)
Publications
Page Number
1030-1032
Legacy Keywords
,Helicobacter pyloridiagnosispoint-of-care analysisclinical trial (J Fam Pract 2002; 51:1030–1033)
Sections
Author and Disclosure Information

ANTONE R. OPEKUN, PA-C
NAGEEB ABDALLA, MD
FRED M. SUTTON, MD
FADI HAMMOUD, MD
GRACE M. KUO, PHARMD
ELIZABETH TORRES, MD
JEFFREY STEINBAUER, MD
DAVID Y. GRAHAM, MD
Houston and Sugar Land, Texas
From the Departments of Medicine (A.R.O., F.H., D.Y.G.) and Family and Community Medicine (A.R.O., N.A., F.M.S., G.M., J.P.), Baylor College of Medicine and the Veterans Affairs Medical Center (A.R.O., F.H., D.Y.G.), Harris County Hospital District Facilities (A.R.O., N.A., F.M.S.), Houston, TX; and Premier Internal Medicine Associates, Sugar Land, TX (E.T.). This work was supported in part by Cambridge Isotopes, Inc, a Division of Otsuka Pharmaceuticals Ltd, Andover, MA; the Office of Research and Development, Medical Research Service, Department of Veterans Affairs; Public Health Service grants DK-53659 and DK56338; and funds from the Texas Gulf Coast Digestive Diseases Center. David Y. Graham, MD, is a consultant to Otsuka Pharmaceuticals. David Y. Graham, MD, and Antone R. Opekun, PA-C, receive royalties from the sales of the urea breath test in the United States. Address reprint requests to Antone R. Opekun, PA-C, Departments of Medicine and Family and Community Medicine, Baylor College of Medicine, 6550 Fannin Street (SM-1122), Houston, TX 77030-2399. E-mail: [email protected].

Author and Disclosure Information

ANTONE R. OPEKUN, PA-C
NAGEEB ABDALLA, MD
FRED M. SUTTON, MD
FADI HAMMOUD, MD
GRACE M. KUO, PHARMD
ELIZABETH TORRES, MD
JEFFREY STEINBAUER, MD
DAVID Y. GRAHAM, MD
Houston and Sugar Land, Texas
From the Departments of Medicine (A.R.O., F.H., D.Y.G.) and Family and Community Medicine (A.R.O., N.A., F.M.S., G.M., J.P.), Baylor College of Medicine and the Veterans Affairs Medical Center (A.R.O., F.H., D.Y.G.), Harris County Hospital District Facilities (A.R.O., N.A., F.M.S.), Houston, TX; and Premier Internal Medicine Associates, Sugar Land, TX (E.T.). This work was supported in part by Cambridge Isotopes, Inc, a Division of Otsuka Pharmaceuticals Ltd, Andover, MA; the Office of Research and Development, Medical Research Service, Department of Veterans Affairs; Public Health Service grants DK-53659 and DK56338; and funds from the Texas Gulf Coast Digestive Diseases Center. David Y. Graham, MD, is a consultant to Otsuka Pharmaceuticals. David Y. Graham, MD, and Antone R. Opekun, PA-C, receive royalties from the sales of the urea breath test in the United States. Address reprint requests to Antone R. Opekun, PA-C, Departments of Medicine and Family and Community Medicine, Baylor College of Medicine, 6550 Fannin Street (SM-1122), Houston, TX 77030-2399. E-mail: [email protected].

Article PDF
Article PDF

The13C-urea breath test provides accurate, noninvasive diagnosis of active Helicobacter pylori infection and can document posttherapy cure. This study evaluated point-of-care testing with onsite sample analysis with the use of a desktop infrared spectrophotometer. Ambulatory patients (N = 320) underwent 13C-urea breath testing, and breath samples were analyzed immediately by clinic staff with no prior breath testing experience. Duplicate samples were sent to a reference laboratory, and the results of both methods were compared. Point-of-care testing was simple, with an overall agreement of 99.1%. Accurate near-patient 13C-urea breath testing is now practical in the primary care setting even when done by inexperienced personnel.

Helicobacter pylori infection is etiologically associated with chronic gastritis, peptic ulcer disease, and gastric cancer.1-5Helicobacter pylori infection is also a consideration in the evaluation of uninvestigated or undifferentiated dyspepsia.4-7 The steps in the management of H pylori infection include diagnosis, choice of appropriate therapy, and confirmation of cure.4,5,8 Diagnosis and confirmation of cure require diagnostic testing, and the recent consensus has been in the direction of noninvasive testing.4,5

The urea breath test (UBT) is generally considered the clinical, gold standard, noninvasive test for detection of active H pylori infection.9 The intragastric hydrolysis of orally administered urea by H pylori urease produces a change in the isotopic ratio (13CO2/12CO2) in the breath.10 The 13C-UBT has been approved by the US Food and Drug Administration for pre- and posttherapy testing.11,12

Recently the 13C-UBT has been shortened and simplified by the use of a citric acid test meal and elimination of the requirement of a 4-hour fast.13 Use of infrared mass spectrophotometry for measuring 13CO2 enrichment in breath samples makes it theoretically possible to do office-based breath testing with onsite analysis.14-16 Infrared spectrophotometry has been shown in comparative studies to be an accurate method for assessing 13CO2 enrichment in breath.17-19 No comparative studies of the 2 analytical methods have been done in the United States.

This study examined the utility of UBT testing by comparing infrared spectrophotometry with traditional gas isotope ratio mass spectrometry in the primary care environment. Our hypothesis was that the results obtained from the primary care clinics would be as accurate as those obtained from the commercial laboratory or the more experienced hospital-based clinical laboratory.

Methods

This was a multicenter, prospective study designed to compare a new infrared spectrophotometer (UBiTIR300, Otsuka Pharmaceuticals, Tokyo, Japan) for measuring 13CO2 enrichment in breath with gas isotope ratio mass spectrometry (ABCA, Europa Scientific, Cheshire, UK). Subjects were recruited at the offices or clinics of 4 physicians’ including an indigent care primary care clinic, a hospital-based gastroenterology clinic, a private practice internal medicine office, an academic family medicine clinic, and a tertiary care clinical laboratory site. The study was done between July and September 2001. Consecutive patients were enrolled in the study if they expressed interest in participating and met the inclusion and exclusion criteria. Each site was provided with an infrared spectrophotometer, a breath gas transfer device, and commercially available UBT breath test collection kits (Meretek Diagnostics, Nashville, TN). Each kit contained a 13C-urea solution (125 mg in 75 mL of water), test meal pudding, and a specimen return box containing 4 bar-coded, evacuated 10-mL sample tubes. Before enrollment, all study personnel received approximately 1 hour of training in the performance of the test and use of the equipment.

Study procedures

Subjects included in the study were medically stable, ambulatory patients between 18 and 75 years old who were asymptomatic or experiencing dyspepsia. Potential subjects were excluded from study participation if they took bismuth preparations, antibiotics (ie, amoxicillin, tetracycline, metronidazole, clarithromycin, or azithromycin) or any anti-ulcer medication in dosages indicated for ulcer disease (ie, proton pump inhibitors, type 2 histamine blockers, or misoprostol) within 2 weeks before the study breath test. Exclusion criteria also included participation in a drug study within 4 weeks, treatment for eradication of H pylori within 4 weeks of the study breath test, or a history of gastric surgery or vagotomy for ulcer disease except simple closure of a gastric perforation.

The protocol was approved by the local Institutional Review Board for Human Studies, and all subjects provided written informed consent.

Testing began with a minimum 4-hour fast from solid food. Breath samples were obtained in disposable, balloonlike, breath collection bags designed for use with the infrared spectrophotometer. One sample was obtained immediately before ingestion of the 13C-urea test solution, and the second was collected 30 minutes after substrate ingestion. Paired sample aliquots were taken for separate analyses. Results from local infrared instruments were blinded to the central laboratory.

 

 

Statistical analyses

Each primary care site was asked to enroll subjects until 80 positive cases were identified from among all sites. The clinical laboratory site tested a minimum of 30 positive and 30 negative cases. The primary endpoint was the percentage of agreement (overall and within positive and negative cases separately) of results from both methods. Delta-over-baseline (DOB) enrichment values below 2.4 per mil were deemed negative and values greater than or equal to 2.4 per mil were deemed positive. The predicate reference method was gas isotope ratio mass spectrometry. Equivalence was defined by the percentage of agreement for positive and negative cases based on gas isotope ratio mass spectrometry results of at least 95%, and the lower limit of the 95% confidence interval was based on a percentage of agreement of at least 90% for positive cases.

Results

The primary care centers enrolled 258 subjects and the clinical laboratory enrolled 64 subjects, for a total enrollment of 322. The subjects’ mean age was 41.5 years (range, 18–70 years), with 88 black non-Hispanics, 106 Hispanics, 92 whites, 32 Asian/Pacific Islanders, and 4 other ethnic groups. There were 215 women and 107 men. Approximately 18% had active or previous gastrointestinal ailments, including previously diagnosed H pylori infection (11%) and peptic ulcer disease (3%).

There was excellent agreement between methods (Table), with an overall agreement of 99% (95% confidence interval, 97.3–99.7). Two subjects were excluded from the analysis of the primary and secondary endpoints because 1 or both assay values were missing. The data showed close correlation between methods among all sites (Figure). Evaluation by the personnel who performed the tests and analyses indicated that the office procedure was easy and nonintrusive.

There were 3 disagreements between the results obtained with the devices, 1 from the gastroenterology site and 2 from the clinical laboratory site. All results were near the cutoff value.

TABLE
Comparison of results between IRMS and GIRMS*

 GIRMS, %
IRMS, %NegativePositiveTotal
Negative1115116
Positive2022204
Total203117320
*Overall agreement, 99.1% (95% confidence interval, 97.3–99.7); positive agreement, 98.2% (95% confidence interval, 94.2–99.7); negative agreement, 99.5% (95% confidence interval, 97.3–99.9).
Kappa statistic = 0.98.
GIRMS, gas isotope ratio mass spectrometry; IRMS, infrared mass spectrophotometry.

FIGURE
Comparison of results between 2 methods of analysis of 13CO2 breath sample enrichments

Discussion

13C-urea breath testing is an accurate diagnostic method for the detection of active H pylori infection12,19-21 and point-of-care assessment of curative therapy. This study confirmed the hypothesis that the infrared instrument is an easy to operate alternative to the original sendout analyses. Rapid turnaround allows for decisions regarding therapy to be made at the time of care. The currently approved UBT in the United States does not require fasting from solid food for longer than 1 hour.

Noninvasive alternatives to UBT include serology and stool antigen testing. Serology assays cannot discriminate between active and recent past infections. Stool antigen testing requires patient compliance with specimen collection and is a sendout test. In general, although studies using pretreatment stool antigen tests have shown sensitivity and specificity comparable to those of histology or UBT, it has become evident that there can be considerable lot-to-lot variation in stool antigen tests.22 The most likely explanation is that the polyclonal serum used for the capture antibody is obtained from rabbits and thus difficult to standardize.23 Stool antigen testing has also proven to be less reliable when used soon after the end of therapy, and it is now generally recommended that one must wait 6 or 8 weeks after therapy when using the stool antigen test to confirm eradication. For example, a recent study had a false negative rate of 12.5% (95% confidence interval, 1.5–33).24Recent recommendations are that the UBT is preferred where available.4

The US Food and Drug Administration recently cleared the UBiT-IR300 instrument for use with the commercial 13C-UBT. The costs of the UBT are but a fraction of those of endoscopy, not including indirect patient costs. Office-based testing has a separate reimbursement for testing, and overall the costs appear less than those of the stool antigen test. Economic impact studies comparing the tests are planned. Office-based infrared analysis for 13C makes near-patient or point-of-care UBT and analysis practical and should make accurate diagnosis of active H pylori infection readily available.

Acknowledgments

The authors thank the clinical study coordinators, including Ms Rebecca Garza, Ms Flora Godard, and Mr Zachary Patton for their conscientious efforts.

The13C-urea breath test provides accurate, noninvasive diagnosis of active Helicobacter pylori infection and can document posttherapy cure. This study evaluated point-of-care testing with onsite sample analysis with the use of a desktop infrared spectrophotometer. Ambulatory patients (N = 320) underwent 13C-urea breath testing, and breath samples were analyzed immediately by clinic staff with no prior breath testing experience. Duplicate samples were sent to a reference laboratory, and the results of both methods were compared. Point-of-care testing was simple, with an overall agreement of 99.1%. Accurate near-patient 13C-urea breath testing is now practical in the primary care setting even when done by inexperienced personnel.

Helicobacter pylori infection is etiologically associated with chronic gastritis, peptic ulcer disease, and gastric cancer.1-5Helicobacter pylori infection is also a consideration in the evaluation of uninvestigated or undifferentiated dyspepsia.4-7 The steps in the management of H pylori infection include diagnosis, choice of appropriate therapy, and confirmation of cure.4,5,8 Diagnosis and confirmation of cure require diagnostic testing, and the recent consensus has been in the direction of noninvasive testing.4,5

The urea breath test (UBT) is generally considered the clinical, gold standard, noninvasive test for detection of active H pylori infection.9 The intragastric hydrolysis of orally administered urea by H pylori urease produces a change in the isotopic ratio (13CO2/12CO2) in the breath.10 The 13C-UBT has been approved by the US Food and Drug Administration for pre- and posttherapy testing.11,12

Recently the 13C-UBT has been shortened and simplified by the use of a citric acid test meal and elimination of the requirement of a 4-hour fast.13 Use of infrared mass spectrophotometry for measuring 13CO2 enrichment in breath samples makes it theoretically possible to do office-based breath testing with onsite analysis.14-16 Infrared spectrophotometry has been shown in comparative studies to be an accurate method for assessing 13CO2 enrichment in breath.17-19 No comparative studies of the 2 analytical methods have been done in the United States.

This study examined the utility of UBT testing by comparing infrared spectrophotometry with traditional gas isotope ratio mass spectrometry in the primary care environment. Our hypothesis was that the results obtained from the primary care clinics would be as accurate as those obtained from the commercial laboratory or the more experienced hospital-based clinical laboratory.

Methods

This was a multicenter, prospective study designed to compare a new infrared spectrophotometer (UBiTIR300, Otsuka Pharmaceuticals, Tokyo, Japan) for measuring 13CO2 enrichment in breath with gas isotope ratio mass spectrometry (ABCA, Europa Scientific, Cheshire, UK). Subjects were recruited at the offices or clinics of 4 physicians’ including an indigent care primary care clinic, a hospital-based gastroenterology clinic, a private practice internal medicine office, an academic family medicine clinic, and a tertiary care clinical laboratory site. The study was done between July and September 2001. Consecutive patients were enrolled in the study if they expressed interest in participating and met the inclusion and exclusion criteria. Each site was provided with an infrared spectrophotometer, a breath gas transfer device, and commercially available UBT breath test collection kits (Meretek Diagnostics, Nashville, TN). Each kit contained a 13C-urea solution (125 mg in 75 mL of water), test meal pudding, and a specimen return box containing 4 bar-coded, evacuated 10-mL sample tubes. Before enrollment, all study personnel received approximately 1 hour of training in the performance of the test and use of the equipment.

Study procedures

Subjects included in the study were medically stable, ambulatory patients between 18 and 75 years old who were asymptomatic or experiencing dyspepsia. Potential subjects were excluded from study participation if they took bismuth preparations, antibiotics (ie, amoxicillin, tetracycline, metronidazole, clarithromycin, or azithromycin) or any anti-ulcer medication in dosages indicated for ulcer disease (ie, proton pump inhibitors, type 2 histamine blockers, or misoprostol) within 2 weeks before the study breath test. Exclusion criteria also included participation in a drug study within 4 weeks, treatment for eradication of H pylori within 4 weeks of the study breath test, or a history of gastric surgery or vagotomy for ulcer disease except simple closure of a gastric perforation.

The protocol was approved by the local Institutional Review Board for Human Studies, and all subjects provided written informed consent.

Testing began with a minimum 4-hour fast from solid food. Breath samples were obtained in disposable, balloonlike, breath collection bags designed for use with the infrared spectrophotometer. One sample was obtained immediately before ingestion of the 13C-urea test solution, and the second was collected 30 minutes after substrate ingestion. Paired sample aliquots were taken for separate analyses. Results from local infrared instruments were blinded to the central laboratory.

 

 

Statistical analyses

Each primary care site was asked to enroll subjects until 80 positive cases were identified from among all sites. The clinical laboratory site tested a minimum of 30 positive and 30 negative cases. The primary endpoint was the percentage of agreement (overall and within positive and negative cases separately) of results from both methods. Delta-over-baseline (DOB) enrichment values below 2.4 per mil were deemed negative and values greater than or equal to 2.4 per mil were deemed positive. The predicate reference method was gas isotope ratio mass spectrometry. Equivalence was defined by the percentage of agreement for positive and negative cases based on gas isotope ratio mass spectrometry results of at least 95%, and the lower limit of the 95% confidence interval was based on a percentage of agreement of at least 90% for positive cases.

Results

The primary care centers enrolled 258 subjects and the clinical laboratory enrolled 64 subjects, for a total enrollment of 322. The subjects’ mean age was 41.5 years (range, 18–70 years), with 88 black non-Hispanics, 106 Hispanics, 92 whites, 32 Asian/Pacific Islanders, and 4 other ethnic groups. There were 215 women and 107 men. Approximately 18% had active or previous gastrointestinal ailments, including previously diagnosed H pylori infection (11%) and peptic ulcer disease (3%).

There was excellent agreement between methods (Table), with an overall agreement of 99% (95% confidence interval, 97.3–99.7). Two subjects were excluded from the analysis of the primary and secondary endpoints because 1 or both assay values were missing. The data showed close correlation between methods among all sites (Figure). Evaluation by the personnel who performed the tests and analyses indicated that the office procedure was easy and nonintrusive.

There were 3 disagreements between the results obtained with the devices, 1 from the gastroenterology site and 2 from the clinical laboratory site. All results were near the cutoff value.

TABLE
Comparison of results between IRMS and GIRMS*

 GIRMS, %
IRMS, %NegativePositiveTotal
Negative1115116
Positive2022204
Total203117320
*Overall agreement, 99.1% (95% confidence interval, 97.3–99.7); positive agreement, 98.2% (95% confidence interval, 94.2–99.7); negative agreement, 99.5% (95% confidence interval, 97.3–99.9).
Kappa statistic = 0.98.
GIRMS, gas isotope ratio mass spectrometry; IRMS, infrared mass spectrophotometry.

FIGURE
Comparison of results between 2 methods of analysis of 13CO2 breath sample enrichments

Discussion

13C-urea breath testing is an accurate diagnostic method for the detection of active H pylori infection12,19-21 and point-of-care assessment of curative therapy. This study confirmed the hypothesis that the infrared instrument is an easy to operate alternative to the original sendout analyses. Rapid turnaround allows for decisions regarding therapy to be made at the time of care. The currently approved UBT in the United States does not require fasting from solid food for longer than 1 hour.

Noninvasive alternatives to UBT include serology and stool antigen testing. Serology assays cannot discriminate between active and recent past infections. Stool antigen testing requires patient compliance with specimen collection and is a sendout test. In general, although studies using pretreatment stool antigen tests have shown sensitivity and specificity comparable to those of histology or UBT, it has become evident that there can be considerable lot-to-lot variation in stool antigen tests.22 The most likely explanation is that the polyclonal serum used for the capture antibody is obtained from rabbits and thus difficult to standardize.23 Stool antigen testing has also proven to be less reliable when used soon after the end of therapy, and it is now generally recommended that one must wait 6 or 8 weeks after therapy when using the stool antigen test to confirm eradication. For example, a recent study had a false negative rate of 12.5% (95% confidence interval, 1.5–33).24Recent recommendations are that the UBT is preferred where available.4

The US Food and Drug Administration recently cleared the UBiT-IR300 instrument for use with the commercial 13C-UBT. The costs of the UBT are but a fraction of those of endoscopy, not including indirect patient costs. Office-based testing has a separate reimbursement for testing, and overall the costs appear less than those of the stool antigen test. Economic impact studies comparing the tests are planned. Office-based infrared analysis for 13C makes near-patient or point-of-care UBT and analysis practical and should make accurate diagnosis of active H pylori infection readily available.

Acknowledgments

The authors thank the clinical study coordinators, including Ms Rebecca Garza, Ms Flora Godard, and Mr Zachary Patton for their conscientious efforts.

References

1. Levine TS, Price AB. Helicobacter pylori: enough to give anyone an ulcer. Br J Clin Pract 1993;47:328-32.

2. Staat MA, Kruszon-Moran D, McQuillan GM, Kaslow RA. A population-based survey of Helicobacter pylori infection in children and adolescents in the United States. J Infect Dis 1996;174:1120-3.

3. Opekun AR, Gilger MA, Denyes SM, et al. Helicobacter pylori infection in children of Texas. J Pediatr Gastroenterol Nutr 2000;31:405-10.

4. Malfertheiner P, Megraud F, O’Morain C, et al. Current concepts in the management of Helicobacter pylori infection—the Maastricht 2-2000 Consensus Report. Aliment Pharmacol Ther 2002;16:167-80.

5. Shiotani A, Nurgalieva ZZ, Yamaoka Y, Graham DY. Helicobacter pylori Med Clin North Am 2000;84:1125-36.

6. Meuer LN. Treatment of peptic ulcer disease and nonulcer dyspepsia. J Fam Pract 2001;50:614-9.

7. Greenberg PD, Koch J, Cello JP. Clinical utility of cost effectiveness of Helicobacter pylori testing for patients with duodenal and gastric ulcers. Am J Gastroenterol 1996;91:228-32.

8. Meurer LN, Bower DJ. Management of Helicobacter pylori infection. Am Fam Phys 2002;65:1327-36.

9. Megraud F. Diagnosis of Helicobacter pylori. Scand J Gastroenterol 1996;31(suppl):214-46.

10. Graham DY, Klein PD, Evans DJ, et al. Campylobacter pylori detected noninvasively by the 13C-urea breath test. Lancet 1987;1:1174-7.

11. Klein PD, Malaty HM, Martin RF, Graham KS, Genta RM, Graham DY. Noninvasive detection of Helicobacter pylori infection in clinical practice: the 13C urea breath test. Am J Gastroenterol 1996;91:690-4.

12. Graham DY, Klein PD. Accurate diagnosis of Helicobacter pylori: 13C-urea breath test. Gastroenterol Clin North Am 2000;29:885-93.

13. Graham DY, Runke D, Anderson-SY, Malaty HM, Klein PD. Citric acid as the test meal for the 13C-urea breath test. Am J Gastroenterol 1999;94:1214-7.

14. Braden B, Haisch M, Duan LP, Lembcke B, Caspary WF, Hering P. Clinically feasible stable isotope techniques at a reasonable price: analysis of 13CO2/12CO2-abundance in breath samples with a new isotope selective non-dispersive infrared spectrometer. Z Gastroenterol 1994;32:675-8.

15. Mion F, Ecochard R, Guitton J, Ponchon T. 13CO(2) breath tests: comparison of isotope-ratio mass spectrometry and non-dispersive infrared spectrometry results. Gastroenterol Clin Bio 2001;25:345-9.

16. Sheu BS, Lee SC, Yang HB, et al. Lower-dose 13C-urea breath test to detect Helicobacter pylori infection-comparison between infrared spectrometer and mass spectrometer. Aliment Pharmacol Ther 2000;10:1359-63.

17. Mansfield CD, Rutt HN. The application of infrared spectrometry to breath CO2 isotope-ratio measurements and the risk of spurious results. Phys Med Biol 1998;43:1225-39.

18. Ohara S, Kato M, Asaka M, Toyota T. The UbiT-100 13CO2 infrared analyzer: comparison between infrared spectrometric analysis and mass spectrometric analysis. Helicobacter 1998;3:49-53.

19. Savarino V, Mela GS, Zentilin P, et al. Comparison of isotope-ratio mass spectrometry and non-dispersive isotope-selective infrared spectroscopy for 13C-urea breath test. Am J Gastroenterol 1999;94:1203-8.

20. Goddard AF, Logan RPH. Review article: urea breath tests for detecting Helicobacter pylori. Aliment Pharmacol Ther 1997;11:641-9.

21. Goodwin CS, Mendall MM, Northfield TC. Helicobacter pylori infection. Lancet 1997;349:265-9.

22. Vaira D, Vakil N, Menegatti, et al. The stool antigen test for detection of Helicobacter pylori after eradication therapy. Ann Intern Med 2002;136:280-7.

23. Graham DY, Qureshi WA. Markers of infection. In: Mobley HLT, Mendz GL, Hazell SL, eds. Helicobacter pylori: physiology and genetics. Washington, DC: ASM Press; 2001;499-510.

24. Lopez Penas D, Naranjo Rodriguez A, Munoz Molinero J, et al. Efficacy of fecal detection of Helicobacter pylori with the HpSA technique in patients with upper digestive hemorrhage. Gastroenterol Hepatol 2001;24:5-8.

References

1. Levine TS, Price AB. Helicobacter pylori: enough to give anyone an ulcer. Br J Clin Pract 1993;47:328-32.

2. Staat MA, Kruszon-Moran D, McQuillan GM, Kaslow RA. A population-based survey of Helicobacter pylori infection in children and adolescents in the United States. J Infect Dis 1996;174:1120-3.

3. Opekun AR, Gilger MA, Denyes SM, et al. Helicobacter pylori infection in children of Texas. J Pediatr Gastroenterol Nutr 2000;31:405-10.

4. Malfertheiner P, Megraud F, O’Morain C, et al. Current concepts in the management of Helicobacter pylori infection—the Maastricht 2-2000 Consensus Report. Aliment Pharmacol Ther 2002;16:167-80.

5. Shiotani A, Nurgalieva ZZ, Yamaoka Y, Graham DY. Helicobacter pylori Med Clin North Am 2000;84:1125-36.

6. Meuer LN. Treatment of peptic ulcer disease and nonulcer dyspepsia. J Fam Pract 2001;50:614-9.

7. Greenberg PD, Koch J, Cello JP. Clinical utility of cost effectiveness of Helicobacter pylori testing for patients with duodenal and gastric ulcers. Am J Gastroenterol 1996;91:228-32.

8. Meurer LN, Bower DJ. Management of Helicobacter pylori infection. Am Fam Phys 2002;65:1327-36.

9. Megraud F. Diagnosis of Helicobacter pylori. Scand J Gastroenterol 1996;31(suppl):214-46.

10. Graham DY, Klein PD, Evans DJ, et al. Campylobacter pylori detected noninvasively by the 13C-urea breath test. Lancet 1987;1:1174-7.

11. Klein PD, Malaty HM, Martin RF, Graham KS, Genta RM, Graham DY. Noninvasive detection of Helicobacter pylori infection in clinical practice: the 13C urea breath test. Am J Gastroenterol 1996;91:690-4.

12. Graham DY, Klein PD. Accurate diagnosis of Helicobacter pylori: 13C-urea breath test. Gastroenterol Clin North Am 2000;29:885-93.

13. Graham DY, Runke D, Anderson-SY, Malaty HM, Klein PD. Citric acid as the test meal for the 13C-urea breath test. Am J Gastroenterol 1999;94:1214-7.

14. Braden B, Haisch M, Duan LP, Lembcke B, Caspary WF, Hering P. Clinically feasible stable isotope techniques at a reasonable price: analysis of 13CO2/12CO2-abundance in breath samples with a new isotope selective non-dispersive infrared spectrometer. Z Gastroenterol 1994;32:675-8.

15. Mion F, Ecochard R, Guitton J, Ponchon T. 13CO(2) breath tests: comparison of isotope-ratio mass spectrometry and non-dispersive infrared spectrometry results. Gastroenterol Clin Bio 2001;25:345-9.

16. Sheu BS, Lee SC, Yang HB, et al. Lower-dose 13C-urea breath test to detect Helicobacter pylori infection-comparison between infrared spectrometer and mass spectrometer. Aliment Pharmacol Ther 2000;10:1359-63.

17. Mansfield CD, Rutt HN. The application of infrared spectrometry to breath CO2 isotope-ratio measurements and the risk of spurious results. Phys Med Biol 1998;43:1225-39.

18. Ohara S, Kato M, Asaka M, Toyota T. The UbiT-100 13CO2 infrared analyzer: comparison between infrared spectrometric analysis and mass spectrometric analysis. Helicobacter 1998;3:49-53.

19. Savarino V, Mela GS, Zentilin P, et al. Comparison of isotope-ratio mass spectrometry and non-dispersive isotope-selective infrared spectroscopy for 13C-urea breath test. Am J Gastroenterol 1999;94:1203-8.

20. Goddard AF, Logan RPH. Review article: urea breath tests for detecting Helicobacter pylori. Aliment Pharmacol Ther 1997;11:641-9.

21. Goodwin CS, Mendall MM, Northfield TC. Helicobacter pylori infection. Lancet 1997;349:265-9.

22. Vaira D, Vakil N, Menegatti, et al. The stool antigen test for detection of Helicobacter pylori after eradication therapy. Ann Intern Med 2002;136:280-7.

23. Graham DY, Qureshi WA. Markers of infection. In: Mobley HLT, Mendz GL, Hazell SL, eds. Helicobacter pylori: physiology and genetics. Washington, DC: ASM Press; 2001;499-510.

24. Lopez Penas D, Naranjo Rodriguez A, Munoz Molinero J, et al. Efficacy of fecal detection of Helicobacter pylori with the HpSA technique in patients with upper digestive hemorrhage. Gastroenterol Hepatol 2001;24:5-8.

Issue
The Journal of Family Practice - 51(12)
Issue
The Journal of Family Practice - 51(12)
Page Number
1030-1032
Page Number
1030-1032
Publications
Publications
Article Type
Display Headline
Urea breath testing and analysis in the primary care office
Display Headline
Urea breath testing and analysis in the primary care office
Legacy Keywords
,Helicobacter pyloridiagnosispoint-of-care analysisclinical trial (J Fam Pract 2002; 51:1030–1033)
Legacy Keywords
,Helicobacter pyloridiagnosispoint-of-care analysisclinical trial (J Fam Pract 2002; 51:1030–1033)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

Prevalence of overactive bladder and urinary incontinence

Article Type
Changed
Mon, 01/14/2019 - 11:36
Display Headline
Prevalence of overactive bladder and urinary incontinence

We conducted a cross-sectional study in Italy among men at least 50 years old and women at least 40 years old who consecutively visited their general practitioners. Patients were asked about the frequency of symptoms of overactive bladder and urinary incontinence. A total of 9613 men (mean age, 64.8 years; range, 50–98 years) and 13,365 women (mean age, 60.3 years; range, 40–98 years) were identified by 774 general practitioners. The frequencies of overactive bladder were 3.0% (95% confidence interval, 2.7–3.5) in men and 1.1% (95% confidence interval, 0.9–1.3) in women. The corresponding frequencies for urinary incontinence were 8.3% (95% confidence interval, 7.7–8.9) in men and 10.2% (95% confidence interval, 9.6–10.8) in women.

The reported prevalence of urinary incontinence ranges from 10% to 50%.11-9 The differences can be explained in part by the age distribution of the populations considered, with higher rates in studies including more older subjects.2,10 Further, some studies were conducted in selected populations.7,11

With regard to diagnosis, although urinary incontinence should be an objectively proven condition of involuntary loss of urine, the criteria for its definition vary.12 Differences also have been reported in the frequency of various types of urinary incontinence (urge, stress, and mixed incontinences),11,13-16 and data are scant on the frequency of overactive bladder without urinary incontinence.16 We therefore conducted a study in Italian men at least 50 years old and women at least 40 years old to determine the prevalence of overactive bladder and various types of urinary incontinence.14

Materials And Methods

Eligible subjects were consecutive men at least 50 years old and women at least 40 years old who asked to be seen by their general practitioners during the study period. There were no exclusion criteria. Age cutoffs were chosen because of the very low rate of urinary incontinence in men younger than 50 years and in women younger than 40 years.2

Participating general practitioners were invited to take part in the study on the basis of lists of general practitioners specifically interested in epidemiologic studies and affiliated with the Gruppo Interdisciplinare di Studio Incontinenza Urinaria (Interdisciplinary Group for the Study of Incontinence). Each general practitioner could decide when to start recruitment (generally soon after agreeing to participate) and stopped when 50 cases had been identified, or after 5 days. The motivation for participating was purely an interest in the data collection. All participating physicians were general practitioners, without any specific interest in urogynecological problems.

We obtained demographic information from each subject. The subjects were then asked: Have you had any involuntary urinary loss during the past 3 months? On average, do you urinate more than 8 times a day and/or more than once during the night? Have you any urgency symptoms? If the subject answered yes to the first question, that subject was defined as having urinary incontinence; if the subject answered yes to the second and third questions, that subject was considered to have overactive bladder.

Subjects with incontinence were interviewed further with the questionnaire proposed by Wein and colleagues17,18 for diagnosis of type of urinary incontinence: stress incontinence, mixed incontinence, or urge incontinence. The questionnaire provides a presumptive diagnosis of stress and urge incontinence on the basis of the presence or absence of the following symptoms: urgency, frequency with urgency, leaking during physical activity, amount of urinary leakage with each episode of incontinence, ability to reach the toilet in time after an urge to void, and nocturia. The criteria for overactive bladder without urinary incontinence (ie, patients reporting an urge to urinate more than 8 times a day and/or more than once during the night and/or with urgency symptoms) and various types of urinary incontinence were the criteria accepted by the main Italian research groups on urinary incontinence.

Informed consent was obtained from each subject. Participation in the study did not commit the patient to any instrumental examination or laboratory tests. Confidence intervals (CI) of the estimated percentages of frequency of urinary incontinence were based on the Poisson approximation. Statistical differences in the frequency of urinary incontinence among strata of age and sex were analyzed with the standard chi square test comparing observed and expected events and, when appropriate, using the test for trend.

Results

A total of 9858 men and 13,671 women were identified by 774 general practitioners, representing approximately 1.5% of all Italian general practitioners. Of those, 26.6% were in northern Italy, 17.4% were in central Italy, and 56.0% were in southern Italy. Each general practitioner identified a mean of 30 subjects (range, 15–50). A total of 245 men (2.5%) and 306 women (2.2%) refused to enter the study. Thus the present report included information on 9613 men (mean age, 64.8 years; range, 50–98 years) and 13,365 women (mean age, 60.3 years; range, 40–98 years).

 

 

The frequencies of overactive bladder without incontinence were 3.0% (95% CI, 2.7–3.5) in men and 1.1% (95% CI, 0.9–1.3) in women. The corresponding frequencies for urinary incontinence were 8.3% (95% CI, 7.7–8.9) in men and 10.2% (95% CI, 9.6–10.8) in women (Table 1).

The frequencies of overactive bladder and urinary incontinence increased with age in both sexes. For example, the frequencies of overactive bladder were 2.0% and 1.1% in men and women 51 to 60 years but 3.4% and 1.4% in subjects older than 70 years. The frequency of urinary incontinence also increased with age; this trend was statistically significant (chi square with 1 df, adjusted for sex, P < .05).

Table 2 shows the distribution of subjects with urinary problems stratified by sex, age, and type of problem. Mixed incontinence was the most frequent condition in men older than 60 years, and overactive bladder was more frequent in younger men. The relative frequency of stress incontinence tended to decrease with age. In women, stress incontinence and mixed incontinence were the most common causes of urinary incontinence in all age strata. The frequency of stress incontinence decreased with age, whereas that of mixed incontinence increased.

TABLE 1
Frequency of overactive bladder and urinary incontinence according to sex and age

Age, yMen*Women*
No urinary problemOB onlyUINo urinary problemOB onlyUI
40–50   3122 (93.6)23 (0.7)190 (5.7)
51–603160 (93.8)69 (2.0)139 (4.1)323 (90.6)38 (1.1)299 (8.4)
61–703071 (88.8)116 (3.4)273 (7.9)3180 (87.5)52 (1.4)403 (11.1)
>702298 (82.5)100 (3.4)387 (13.6)2320 (82.2)39 (1.4)463 (16.4)
Total8529 (88.7)285 (3.0)799 (8.3)11858 (88.7)150 (1.1)1358 (10.2)
*Data are presented as number (%) of subjects.OB, overactive bladder; UI, urinary incontinence.

TABLE 2
Frequency of overactive bladder and various types of urinary incontinence in strata by sex and age*

Age, yMen†Women†
OB only UGISIMXIOB onlyUGISIMXI
40–50    23 (11.2)31 (15.1)93 (45.4)58 (28.3)
51–6069 (35.5)30 (16.4)33 (18.0)55 (30.1)38 (11.4)59 (17.8)127 (38.3)108 (32.5)
61–70116 (33.1)63 (18.0)37 (10.6)134 (38.3)52 (11.9)72 (16.5)169 (37.8)147 (33.7)
>70100 (24.1)69 (15.7)52 (12.5)198 (47.7)37 (7.6)89 (18.3)143 (29.4)217 (44.7)
Total285 (29.7)162 (16.9)122 (12.7)391 (40.7)150 (10.3)251 (17.2)532 (36.4)530 (36.2)
*These totals are not the same as those in Table 1 due to missing values of subjects with OB or urinary incontinence.
†Data are presented as number (%) of subjects.
MXI, mixed incontinence; OB, overactive bladder; SI, stress incontinence; UGI, urge incontinence.

Discussion

Before discussing the results, the study limitations must be considered. The study population consisted of men at least 50 years and women at least 40 years identified among patients who asked to be seen by their general practitioners during the study period, but not among all patients registered with these physicians. The general practitioners were not randomly identified among all Italian general practitioners, so their patients cannot be formally considered representative of the Italian population. Nevertheless, general practitioners participating in this study were placed throughout the main areas of the country. The strengths of the study included the opportunity to analyze the prevalence of urinary incontinence and overactive bladder in a large series of subjects with the use of standard methods for recording the symptoms and data collection. Further, the interview was conducted by physicians well known to the subjects, which should increase the reliability of diagnosis, particularly of various types of urinary incontinence.

The limitations of a patient’s history in the diagnosis of type of urinary incontinence are widely recognized. In a review of the literature, a clinical history indicating stress or urge incontinence, when compared with a urodynamically based diagnosis, showed sensitivities of 0.9 and 0.4 and specificities of 0.5 and 0.6, respectively, in clinical studies.19 In epidemiologic studies with self-reported information, these values might be lower. Any misclassification of the type of incontinence would tend to reduce the differences in the frequency of different types.

The frequency of urinary incontinence in this population was consistent with that reported in studies conducted in European and North American areas and in Italy in the general population. For example, the prevalences of urinary incontinence in women 50 to 60 years were approximately 18% in Denmark20 and 12% to 17% in the United Kingdom.1,4 An Italian study of 2767 women found a prevalence of 11.8% for urinary incontinence in women 51 to 60 years old.

In contrast, the rates of urinary incontinence reported in this study were slightly lower than those reported by Lagace and colleagues in a similar ambulatory setting in the United States.6 They found prevalences of 35% for urinary incontinence in women 50 to 59 years and 5% in men in the same age group.

Among men the prevalence of urinary incontinence was similar to or slightly lower than that reported in the general population.1 In the study by Bortolotti and associates, the overall prevalence of urinary incontinence in men older than 50 years was 2.3%, a figure somewhat lower than in the present study.14

 

 

An interesting finding of this study was the prevalence of overactive bladder without urinary incontinence, a condition rarely analyzed. It appears less frequently than urinary incontinence, particularly in women, and causes approximately 30% of urinary problems in men but only 10% in women. However, considering the frequencies of overactive bladder without urinary incontinence and urge incontinence, the conditions due to detrusor overactivity likely account for a large proportion of urinary problems.

In conclusion, this study, in a large number of subjects, provided an estimate of the prevalence of overactive bladder and urinary incontinence among people attending their general practitioners in Italy. It also emphasized the importance of detrusor overactivity-related conditions as a cause of urinary problems in this population, in both sexes and all ages.

MEMBERS OF THE GRUPPO INTERDISCIPLINARE DI STUDIO INCONTINENZA URINARIA

Walter Artibani, Cattedra di Urologia, Università di Verona; Francesco Benvenuti, Unità Operativa di Geriatria, Ospedale INRCA c/o Presidio IOT, Firenze; Roberto Carone, Divisione di Urologia, Centro Rieducazionale Funzionale, Torino; Francesco Catanzaro, Divisione di Urologia, Multimedica, Sesto San Giovanni (MI); Claudio Cricelli, SIMG, Firenze; Paolo Di Benedetto, Centro di Riabilitazione, Ospedale Santoro, Trieste; Vincenzo Giambanco, Divisione II di Ostetricia e Ginecologia, Ospedale Civico, Palermo; Gian Battista Massi, Clinica Ostetrica e Ginecologica, Università di Firenze; Rodolfo Milani, Divisione di Ginecologia, Ospedale Bassini, Cinisello Balsamo (MI); and Alberto Zanollo, Divisione di Urologia, Ospedale Civile, Magenta, Milano.

References

1. Thomas TM, Plymat KR, Blannin J, Meade TW. Prevalence of urinary incontinence. BMJ 1980;281:1243-5.

2. Parazzini F, Colli E, Origgi G, et al. Risk factors for urinary incontinence in women. Eur Urol 2000;37:637-43.

3. Sommer P, Bauer T, Nielsen KK, et al. Voiding patterns and prevalence of incontinence in women. A questionnaire survey. Br J Urol 1990;66:12-5.

4. O’Brien J, Austin M, Sethi P, O’Boyle P. Urinary incontinence: prevalence, need for treatment, and effectiveness of intervention by nurse. BMJ 1991;303:1308-12.

5. Kok ALM, Voorhorst FJ, Burger CW, Van Houten P, Kenemans P, Janssens J. Urinary and fecal incontinence in the community analysis of a MORI poll. BMJ 1992;306:832-4.

6. Lagace EA, Hansen W, Hickner JM. Prevalence and severity of urinary incontinence in ambulatory adults: an UPRNet study. J Fam Pract 1993;36:610-4.

7. Thom D. Variation in estimates of urinary incontinence prevalence in the community. Effects of differences in definition, population characteristics, and study type. J Am Geriatr Soc 1998;46:473-80.

8. Schulman C, Claes H, Matthijs J. Urinary incontinence in Belgium: a population-based epidemiological survey. Eur Urol 1997;32:315-20.

9. Hergoz AR, Fultz NH. Prevalence and incidence of urinary incontinence in community dwelling populations. J Am Geriatr Soc 1990;38:273-81.

10. Brieger GM, Yip SK, Hin LY, Chung TKH. The prevalence of urinary dysfunction in Hong Kong Chinese women. Obstet Gynecol 1996;88:1041.

11. Resnick NM, Yalla SV, Laurino E. The pathophysiology of urinary incontinence among institutionalized elderly persons. N Engl J Med 1989;320:1-7.

12. Hampel C, Wienhould D, Benken N, Eggersmann C, Thuroff JW. Definition of overactive bladder and epidemiology of urinary incontinence. Urology 1997;50:4-14.

13. Harrison GL, Memel DS. Urinary incontinence in women: its prevalence and its management in a health promotion clinic. Br J Gen Pract 1994;44:149-52.

14. Bortolotti A, Bernardini B, Colli E, et al. Prevalence and risk factors for urinary incontinence in Italy. Eur Urol 2000;37:30-5.

15. Cheow C, Swan LK, Merriman A, Choon TE, Viegas O. Urinary incontinence among the elderly people of Singapore. Age Ageing 1991;20:262-6.

16. O’Brien J, Austin M, Sethi P, O’Boyle P. Urinary incontinence: prevalence, need for treatment, and effectiveness of intervention by nurse. BMJ 1991;303:1308-12.

17. Wein AJ, Rovner ES. The overactive bladder: an overview for primary care health provides. Int J Fertil 1999;44:56-66.

18. Abrams P, Wein AJ. The Overactive Bladder: A Widespread but Treatable Condition. Stockholm: Sparre Medical Group; 1998.

19. Jensen JK, Nielsen FR, Ostergard DR. The role of patient history in the diagnosis of urinary incontinence. Obstet Gynecol 1994;83:904-10.

20. Sommer P, Bauer T, Nielsen KK, et al. Voiding patterns and prevalence of incontinence in women. A questionnaire survey. Br J Uro 1990;66:12-5.

Article PDF
Author and Disclosure Information

FABIO PARAZZINI, MD
MAURIZIO LAVEZZARI, SCD
WALTER ARTIBANI, MD; ON BEHALF OF THE GRUPPO INTERDISCIPLINARE DI STUDIO INCONTINENZA URINARIA
Milano and Verona, Italy
From the Istituto di Ricerche Farmacologiche “Mario Negri” and the Prima Clinica Ostetrico Ginecologica, Università di Milano (F.P.); the Direzione Medica, Pharmacia and Upjohn (M.L.), Milano; and the Clinica Urologica, Università di Verona, Verona (W.A.), Italy. This study was supported by a grant from Pharmacia Italia, where Maurizio Lavezzari is the director. Fabio Parazzini spoke at the symposium, “Vescica iperattiva ed incontinenza urinaria nella donna,” in Napoli, 14th May, 2002, on behalf of Pharmacia Italia. Address reprint requests to Fabio Parazzini, MD, Istituto di Ricerche Farmacologiche “Mario Negri”, Via Eritrea, 62, 20157 Milano, Italy. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(12)
Publications
Page Number
1072-1075
Legacy Keywords
,Epidemiologyurinary incontinenceoveractive bladder. (J Fam Pract 2002; 51:1072–1075)
Sections
Author and Disclosure Information

FABIO PARAZZINI, MD
MAURIZIO LAVEZZARI, SCD
WALTER ARTIBANI, MD; ON BEHALF OF THE GRUPPO INTERDISCIPLINARE DI STUDIO INCONTINENZA URINARIA
Milano and Verona, Italy
From the Istituto di Ricerche Farmacologiche “Mario Negri” and the Prima Clinica Ostetrico Ginecologica, Università di Milano (F.P.); the Direzione Medica, Pharmacia and Upjohn (M.L.), Milano; and the Clinica Urologica, Università di Verona, Verona (W.A.), Italy. This study was supported by a grant from Pharmacia Italia, where Maurizio Lavezzari is the director. Fabio Parazzini spoke at the symposium, “Vescica iperattiva ed incontinenza urinaria nella donna,” in Napoli, 14th May, 2002, on behalf of Pharmacia Italia. Address reprint requests to Fabio Parazzini, MD, Istituto di Ricerche Farmacologiche “Mario Negri”, Via Eritrea, 62, 20157 Milano, Italy. E-mail: [email protected].

Author and Disclosure Information

FABIO PARAZZINI, MD
MAURIZIO LAVEZZARI, SCD
WALTER ARTIBANI, MD; ON BEHALF OF THE GRUPPO INTERDISCIPLINARE DI STUDIO INCONTINENZA URINARIA
Milano and Verona, Italy
From the Istituto di Ricerche Farmacologiche “Mario Negri” and the Prima Clinica Ostetrico Ginecologica, Università di Milano (F.P.); the Direzione Medica, Pharmacia and Upjohn (M.L.), Milano; and the Clinica Urologica, Università di Verona, Verona (W.A.), Italy. This study was supported by a grant from Pharmacia Italia, where Maurizio Lavezzari is the director. Fabio Parazzini spoke at the symposium, “Vescica iperattiva ed incontinenza urinaria nella donna,” in Napoli, 14th May, 2002, on behalf of Pharmacia Italia. Address reprint requests to Fabio Parazzini, MD, Istituto di Ricerche Farmacologiche “Mario Negri”, Via Eritrea, 62, 20157 Milano, Italy. E-mail: [email protected].

Article PDF
Article PDF

We conducted a cross-sectional study in Italy among men at least 50 years old and women at least 40 years old who consecutively visited their general practitioners. Patients were asked about the frequency of symptoms of overactive bladder and urinary incontinence. A total of 9613 men (mean age, 64.8 years; range, 50–98 years) and 13,365 women (mean age, 60.3 years; range, 40–98 years) were identified by 774 general practitioners. The frequencies of overactive bladder were 3.0% (95% confidence interval, 2.7–3.5) in men and 1.1% (95% confidence interval, 0.9–1.3) in women. The corresponding frequencies for urinary incontinence were 8.3% (95% confidence interval, 7.7–8.9) in men and 10.2% (95% confidence interval, 9.6–10.8) in women.

The reported prevalence of urinary incontinence ranges from 10% to 50%.11-9 The differences can be explained in part by the age distribution of the populations considered, with higher rates in studies including more older subjects.2,10 Further, some studies were conducted in selected populations.7,11

With regard to diagnosis, although urinary incontinence should be an objectively proven condition of involuntary loss of urine, the criteria for its definition vary.12 Differences also have been reported in the frequency of various types of urinary incontinence (urge, stress, and mixed incontinences),11,13-16 and data are scant on the frequency of overactive bladder without urinary incontinence.16 We therefore conducted a study in Italian men at least 50 years old and women at least 40 years old to determine the prevalence of overactive bladder and various types of urinary incontinence.14

Materials And Methods

Eligible subjects were consecutive men at least 50 years old and women at least 40 years old who asked to be seen by their general practitioners during the study period. There were no exclusion criteria. Age cutoffs were chosen because of the very low rate of urinary incontinence in men younger than 50 years and in women younger than 40 years.2

Participating general practitioners were invited to take part in the study on the basis of lists of general practitioners specifically interested in epidemiologic studies and affiliated with the Gruppo Interdisciplinare di Studio Incontinenza Urinaria (Interdisciplinary Group for the Study of Incontinence). Each general practitioner could decide when to start recruitment (generally soon after agreeing to participate) and stopped when 50 cases had been identified, or after 5 days. The motivation for participating was purely an interest in the data collection. All participating physicians were general practitioners, without any specific interest in urogynecological problems.

We obtained demographic information from each subject. The subjects were then asked: Have you had any involuntary urinary loss during the past 3 months? On average, do you urinate more than 8 times a day and/or more than once during the night? Have you any urgency symptoms? If the subject answered yes to the first question, that subject was defined as having urinary incontinence; if the subject answered yes to the second and third questions, that subject was considered to have overactive bladder.

Subjects with incontinence were interviewed further with the questionnaire proposed by Wein and colleagues17,18 for diagnosis of type of urinary incontinence: stress incontinence, mixed incontinence, or urge incontinence. The questionnaire provides a presumptive diagnosis of stress and urge incontinence on the basis of the presence or absence of the following symptoms: urgency, frequency with urgency, leaking during physical activity, amount of urinary leakage with each episode of incontinence, ability to reach the toilet in time after an urge to void, and nocturia. The criteria for overactive bladder without urinary incontinence (ie, patients reporting an urge to urinate more than 8 times a day and/or more than once during the night and/or with urgency symptoms) and various types of urinary incontinence were the criteria accepted by the main Italian research groups on urinary incontinence.

Informed consent was obtained from each subject. Participation in the study did not commit the patient to any instrumental examination or laboratory tests. Confidence intervals (CI) of the estimated percentages of frequency of urinary incontinence were based on the Poisson approximation. Statistical differences in the frequency of urinary incontinence among strata of age and sex were analyzed with the standard chi square test comparing observed and expected events and, when appropriate, using the test for trend.

Results

A total of 9858 men and 13,671 women were identified by 774 general practitioners, representing approximately 1.5% of all Italian general practitioners. Of those, 26.6% were in northern Italy, 17.4% were in central Italy, and 56.0% were in southern Italy. Each general practitioner identified a mean of 30 subjects (range, 15–50). A total of 245 men (2.5%) and 306 women (2.2%) refused to enter the study. Thus the present report included information on 9613 men (mean age, 64.8 years; range, 50–98 years) and 13,365 women (mean age, 60.3 years; range, 40–98 years).

 

 

The frequencies of overactive bladder without incontinence were 3.0% (95% CI, 2.7–3.5) in men and 1.1% (95% CI, 0.9–1.3) in women. The corresponding frequencies for urinary incontinence were 8.3% (95% CI, 7.7–8.9) in men and 10.2% (95% CI, 9.6–10.8) in women (Table 1).

The frequencies of overactive bladder and urinary incontinence increased with age in both sexes. For example, the frequencies of overactive bladder were 2.0% and 1.1% in men and women 51 to 60 years but 3.4% and 1.4% in subjects older than 70 years. The frequency of urinary incontinence also increased with age; this trend was statistically significant (chi square with 1 df, adjusted for sex, P < .05).

Table 2 shows the distribution of subjects with urinary problems stratified by sex, age, and type of problem. Mixed incontinence was the most frequent condition in men older than 60 years, and overactive bladder was more frequent in younger men. The relative frequency of stress incontinence tended to decrease with age. In women, stress incontinence and mixed incontinence were the most common causes of urinary incontinence in all age strata. The frequency of stress incontinence decreased with age, whereas that of mixed incontinence increased.

TABLE 1
Frequency of overactive bladder and urinary incontinence according to sex and age

Age, yMen*Women*
No urinary problemOB onlyUINo urinary problemOB onlyUI
40–50   3122 (93.6)23 (0.7)190 (5.7)
51–603160 (93.8)69 (2.0)139 (4.1)323 (90.6)38 (1.1)299 (8.4)
61–703071 (88.8)116 (3.4)273 (7.9)3180 (87.5)52 (1.4)403 (11.1)
>702298 (82.5)100 (3.4)387 (13.6)2320 (82.2)39 (1.4)463 (16.4)
Total8529 (88.7)285 (3.0)799 (8.3)11858 (88.7)150 (1.1)1358 (10.2)
*Data are presented as number (%) of subjects.OB, overactive bladder; UI, urinary incontinence.

TABLE 2
Frequency of overactive bladder and various types of urinary incontinence in strata by sex and age*

Age, yMen†Women†
OB only UGISIMXIOB onlyUGISIMXI
40–50    23 (11.2)31 (15.1)93 (45.4)58 (28.3)
51–6069 (35.5)30 (16.4)33 (18.0)55 (30.1)38 (11.4)59 (17.8)127 (38.3)108 (32.5)
61–70116 (33.1)63 (18.0)37 (10.6)134 (38.3)52 (11.9)72 (16.5)169 (37.8)147 (33.7)
>70100 (24.1)69 (15.7)52 (12.5)198 (47.7)37 (7.6)89 (18.3)143 (29.4)217 (44.7)
Total285 (29.7)162 (16.9)122 (12.7)391 (40.7)150 (10.3)251 (17.2)532 (36.4)530 (36.2)
*These totals are not the same as those in Table 1 due to missing values of subjects with OB or urinary incontinence.
†Data are presented as number (%) of subjects.
MXI, mixed incontinence; OB, overactive bladder; SI, stress incontinence; UGI, urge incontinence.

Discussion

Before discussing the results, the study limitations must be considered. The study population consisted of men at least 50 years and women at least 40 years identified among patients who asked to be seen by their general practitioners during the study period, but not among all patients registered with these physicians. The general practitioners were not randomly identified among all Italian general practitioners, so their patients cannot be formally considered representative of the Italian population. Nevertheless, general practitioners participating in this study were placed throughout the main areas of the country. The strengths of the study included the opportunity to analyze the prevalence of urinary incontinence and overactive bladder in a large series of subjects with the use of standard methods for recording the symptoms and data collection. Further, the interview was conducted by physicians well known to the subjects, which should increase the reliability of diagnosis, particularly of various types of urinary incontinence.

The limitations of a patient’s history in the diagnosis of type of urinary incontinence are widely recognized. In a review of the literature, a clinical history indicating stress or urge incontinence, when compared with a urodynamically based diagnosis, showed sensitivities of 0.9 and 0.4 and specificities of 0.5 and 0.6, respectively, in clinical studies.19 In epidemiologic studies with self-reported information, these values might be lower. Any misclassification of the type of incontinence would tend to reduce the differences in the frequency of different types.

The frequency of urinary incontinence in this population was consistent with that reported in studies conducted in European and North American areas and in Italy in the general population. For example, the prevalences of urinary incontinence in women 50 to 60 years were approximately 18% in Denmark20 and 12% to 17% in the United Kingdom.1,4 An Italian study of 2767 women found a prevalence of 11.8% for urinary incontinence in women 51 to 60 years old.

In contrast, the rates of urinary incontinence reported in this study were slightly lower than those reported by Lagace and colleagues in a similar ambulatory setting in the United States.6 They found prevalences of 35% for urinary incontinence in women 50 to 59 years and 5% in men in the same age group.

Among men the prevalence of urinary incontinence was similar to or slightly lower than that reported in the general population.1 In the study by Bortolotti and associates, the overall prevalence of urinary incontinence in men older than 50 years was 2.3%, a figure somewhat lower than in the present study.14

 

 

An interesting finding of this study was the prevalence of overactive bladder without urinary incontinence, a condition rarely analyzed. It appears less frequently than urinary incontinence, particularly in women, and causes approximately 30% of urinary problems in men but only 10% in women. However, considering the frequencies of overactive bladder without urinary incontinence and urge incontinence, the conditions due to detrusor overactivity likely account for a large proportion of urinary problems.

In conclusion, this study, in a large number of subjects, provided an estimate of the prevalence of overactive bladder and urinary incontinence among people attending their general practitioners in Italy. It also emphasized the importance of detrusor overactivity-related conditions as a cause of urinary problems in this population, in both sexes and all ages.

MEMBERS OF THE GRUPPO INTERDISCIPLINARE DI STUDIO INCONTINENZA URINARIA

Walter Artibani, Cattedra di Urologia, Università di Verona; Francesco Benvenuti, Unità Operativa di Geriatria, Ospedale INRCA c/o Presidio IOT, Firenze; Roberto Carone, Divisione di Urologia, Centro Rieducazionale Funzionale, Torino; Francesco Catanzaro, Divisione di Urologia, Multimedica, Sesto San Giovanni (MI); Claudio Cricelli, SIMG, Firenze; Paolo Di Benedetto, Centro di Riabilitazione, Ospedale Santoro, Trieste; Vincenzo Giambanco, Divisione II di Ostetricia e Ginecologia, Ospedale Civico, Palermo; Gian Battista Massi, Clinica Ostetrica e Ginecologica, Università di Firenze; Rodolfo Milani, Divisione di Ginecologia, Ospedale Bassini, Cinisello Balsamo (MI); and Alberto Zanollo, Divisione di Urologia, Ospedale Civile, Magenta, Milano.

We conducted a cross-sectional study in Italy among men at least 50 years old and women at least 40 years old who consecutively visited their general practitioners. Patients were asked about the frequency of symptoms of overactive bladder and urinary incontinence. A total of 9613 men (mean age, 64.8 years; range, 50–98 years) and 13,365 women (mean age, 60.3 years; range, 40–98 years) were identified by 774 general practitioners. The frequencies of overactive bladder were 3.0% (95% confidence interval, 2.7–3.5) in men and 1.1% (95% confidence interval, 0.9–1.3) in women. The corresponding frequencies for urinary incontinence were 8.3% (95% confidence interval, 7.7–8.9) in men and 10.2% (95% confidence interval, 9.6–10.8) in women.

The reported prevalence of urinary incontinence ranges from 10% to 50%.11-9 The differences can be explained in part by the age distribution of the populations considered, with higher rates in studies including more older subjects.2,10 Further, some studies were conducted in selected populations.7,11

With regard to diagnosis, although urinary incontinence should be an objectively proven condition of involuntary loss of urine, the criteria for its definition vary.12 Differences also have been reported in the frequency of various types of urinary incontinence (urge, stress, and mixed incontinences),11,13-16 and data are scant on the frequency of overactive bladder without urinary incontinence.16 We therefore conducted a study in Italian men at least 50 years old and women at least 40 years old to determine the prevalence of overactive bladder and various types of urinary incontinence.14

Materials And Methods

Eligible subjects were consecutive men at least 50 years old and women at least 40 years old who asked to be seen by their general practitioners during the study period. There were no exclusion criteria. Age cutoffs were chosen because of the very low rate of urinary incontinence in men younger than 50 years and in women younger than 40 years.2

Participating general practitioners were invited to take part in the study on the basis of lists of general practitioners specifically interested in epidemiologic studies and affiliated with the Gruppo Interdisciplinare di Studio Incontinenza Urinaria (Interdisciplinary Group for the Study of Incontinence). Each general practitioner could decide when to start recruitment (generally soon after agreeing to participate) and stopped when 50 cases had been identified, or after 5 days. The motivation for participating was purely an interest in the data collection. All participating physicians were general practitioners, without any specific interest in urogynecological problems.

We obtained demographic information from each subject. The subjects were then asked: Have you had any involuntary urinary loss during the past 3 months? On average, do you urinate more than 8 times a day and/or more than once during the night? Have you any urgency symptoms? If the subject answered yes to the first question, that subject was defined as having urinary incontinence; if the subject answered yes to the second and third questions, that subject was considered to have overactive bladder.

Subjects with incontinence were interviewed further with the questionnaire proposed by Wein and colleagues17,18 for diagnosis of type of urinary incontinence: stress incontinence, mixed incontinence, or urge incontinence. The questionnaire provides a presumptive diagnosis of stress and urge incontinence on the basis of the presence or absence of the following symptoms: urgency, frequency with urgency, leaking during physical activity, amount of urinary leakage with each episode of incontinence, ability to reach the toilet in time after an urge to void, and nocturia. The criteria for overactive bladder without urinary incontinence (ie, patients reporting an urge to urinate more than 8 times a day and/or more than once during the night and/or with urgency symptoms) and various types of urinary incontinence were the criteria accepted by the main Italian research groups on urinary incontinence.

Informed consent was obtained from each subject. Participation in the study did not commit the patient to any instrumental examination or laboratory tests. Confidence intervals (CI) of the estimated percentages of frequency of urinary incontinence were based on the Poisson approximation. Statistical differences in the frequency of urinary incontinence among strata of age and sex were analyzed with the standard chi square test comparing observed and expected events and, when appropriate, using the test for trend.

Results

A total of 9858 men and 13,671 women were identified by 774 general practitioners, representing approximately 1.5% of all Italian general practitioners. Of those, 26.6% were in northern Italy, 17.4% were in central Italy, and 56.0% were in southern Italy. Each general practitioner identified a mean of 30 subjects (range, 15–50). A total of 245 men (2.5%) and 306 women (2.2%) refused to enter the study. Thus the present report included information on 9613 men (mean age, 64.8 years; range, 50–98 years) and 13,365 women (mean age, 60.3 years; range, 40–98 years).

 

 

The frequencies of overactive bladder without incontinence were 3.0% (95% CI, 2.7–3.5) in men and 1.1% (95% CI, 0.9–1.3) in women. The corresponding frequencies for urinary incontinence were 8.3% (95% CI, 7.7–8.9) in men and 10.2% (95% CI, 9.6–10.8) in women (Table 1).

The frequencies of overactive bladder and urinary incontinence increased with age in both sexes. For example, the frequencies of overactive bladder were 2.0% and 1.1% in men and women 51 to 60 years but 3.4% and 1.4% in subjects older than 70 years. The frequency of urinary incontinence also increased with age; this trend was statistically significant (chi square with 1 df, adjusted for sex, P < .05).

Table 2 shows the distribution of subjects with urinary problems stratified by sex, age, and type of problem. Mixed incontinence was the most frequent condition in men older than 60 years, and overactive bladder was more frequent in younger men. The relative frequency of stress incontinence tended to decrease with age. In women, stress incontinence and mixed incontinence were the most common causes of urinary incontinence in all age strata. The frequency of stress incontinence decreased with age, whereas that of mixed incontinence increased.

TABLE 1
Frequency of overactive bladder and urinary incontinence according to sex and age

Age, yMen*Women*
No urinary problemOB onlyUINo urinary problemOB onlyUI
40–50   3122 (93.6)23 (0.7)190 (5.7)
51–603160 (93.8)69 (2.0)139 (4.1)323 (90.6)38 (1.1)299 (8.4)
61–703071 (88.8)116 (3.4)273 (7.9)3180 (87.5)52 (1.4)403 (11.1)
>702298 (82.5)100 (3.4)387 (13.6)2320 (82.2)39 (1.4)463 (16.4)
Total8529 (88.7)285 (3.0)799 (8.3)11858 (88.7)150 (1.1)1358 (10.2)
*Data are presented as number (%) of subjects.OB, overactive bladder; UI, urinary incontinence.

TABLE 2
Frequency of overactive bladder and various types of urinary incontinence in strata by sex and age*

Age, yMen†Women†
OB only UGISIMXIOB onlyUGISIMXI
40–50    23 (11.2)31 (15.1)93 (45.4)58 (28.3)
51–6069 (35.5)30 (16.4)33 (18.0)55 (30.1)38 (11.4)59 (17.8)127 (38.3)108 (32.5)
61–70116 (33.1)63 (18.0)37 (10.6)134 (38.3)52 (11.9)72 (16.5)169 (37.8)147 (33.7)
>70100 (24.1)69 (15.7)52 (12.5)198 (47.7)37 (7.6)89 (18.3)143 (29.4)217 (44.7)
Total285 (29.7)162 (16.9)122 (12.7)391 (40.7)150 (10.3)251 (17.2)532 (36.4)530 (36.2)
*These totals are not the same as those in Table 1 due to missing values of subjects with OB or urinary incontinence.
†Data are presented as number (%) of subjects.
MXI, mixed incontinence; OB, overactive bladder; SI, stress incontinence; UGI, urge incontinence.

Discussion

Before discussing the results, the study limitations must be considered. The study population consisted of men at least 50 years and women at least 40 years identified among patients who asked to be seen by their general practitioners during the study period, but not among all patients registered with these physicians. The general practitioners were not randomly identified among all Italian general practitioners, so their patients cannot be formally considered representative of the Italian population. Nevertheless, general practitioners participating in this study were placed throughout the main areas of the country. The strengths of the study included the opportunity to analyze the prevalence of urinary incontinence and overactive bladder in a large series of subjects with the use of standard methods for recording the symptoms and data collection. Further, the interview was conducted by physicians well known to the subjects, which should increase the reliability of diagnosis, particularly of various types of urinary incontinence.

The limitations of a patient’s history in the diagnosis of type of urinary incontinence are widely recognized. In a review of the literature, a clinical history indicating stress or urge incontinence, when compared with a urodynamically based diagnosis, showed sensitivities of 0.9 and 0.4 and specificities of 0.5 and 0.6, respectively, in clinical studies.19 In epidemiologic studies with self-reported information, these values might be lower. Any misclassification of the type of incontinence would tend to reduce the differences in the frequency of different types.

The frequency of urinary incontinence in this population was consistent with that reported in studies conducted in European and North American areas and in Italy in the general population. For example, the prevalences of urinary incontinence in women 50 to 60 years were approximately 18% in Denmark20 and 12% to 17% in the United Kingdom.1,4 An Italian study of 2767 women found a prevalence of 11.8% for urinary incontinence in women 51 to 60 years old.

In contrast, the rates of urinary incontinence reported in this study were slightly lower than those reported by Lagace and colleagues in a similar ambulatory setting in the United States.6 They found prevalences of 35% for urinary incontinence in women 50 to 59 years and 5% in men in the same age group.

Among men the prevalence of urinary incontinence was similar to or slightly lower than that reported in the general population.1 In the study by Bortolotti and associates, the overall prevalence of urinary incontinence in men older than 50 years was 2.3%, a figure somewhat lower than in the present study.14

 

 

An interesting finding of this study was the prevalence of overactive bladder without urinary incontinence, a condition rarely analyzed. It appears less frequently than urinary incontinence, particularly in women, and causes approximately 30% of urinary problems in men but only 10% in women. However, considering the frequencies of overactive bladder without urinary incontinence and urge incontinence, the conditions due to detrusor overactivity likely account for a large proportion of urinary problems.

In conclusion, this study, in a large number of subjects, provided an estimate of the prevalence of overactive bladder and urinary incontinence among people attending their general practitioners in Italy. It also emphasized the importance of detrusor overactivity-related conditions as a cause of urinary problems in this population, in both sexes and all ages.

MEMBERS OF THE GRUPPO INTERDISCIPLINARE DI STUDIO INCONTINENZA URINARIA

Walter Artibani, Cattedra di Urologia, Università di Verona; Francesco Benvenuti, Unità Operativa di Geriatria, Ospedale INRCA c/o Presidio IOT, Firenze; Roberto Carone, Divisione di Urologia, Centro Rieducazionale Funzionale, Torino; Francesco Catanzaro, Divisione di Urologia, Multimedica, Sesto San Giovanni (MI); Claudio Cricelli, SIMG, Firenze; Paolo Di Benedetto, Centro di Riabilitazione, Ospedale Santoro, Trieste; Vincenzo Giambanco, Divisione II di Ostetricia e Ginecologia, Ospedale Civico, Palermo; Gian Battista Massi, Clinica Ostetrica e Ginecologica, Università di Firenze; Rodolfo Milani, Divisione di Ginecologia, Ospedale Bassini, Cinisello Balsamo (MI); and Alberto Zanollo, Divisione di Urologia, Ospedale Civile, Magenta, Milano.

References

1. Thomas TM, Plymat KR, Blannin J, Meade TW. Prevalence of urinary incontinence. BMJ 1980;281:1243-5.

2. Parazzini F, Colli E, Origgi G, et al. Risk factors for urinary incontinence in women. Eur Urol 2000;37:637-43.

3. Sommer P, Bauer T, Nielsen KK, et al. Voiding patterns and prevalence of incontinence in women. A questionnaire survey. Br J Urol 1990;66:12-5.

4. O’Brien J, Austin M, Sethi P, O’Boyle P. Urinary incontinence: prevalence, need for treatment, and effectiveness of intervention by nurse. BMJ 1991;303:1308-12.

5. Kok ALM, Voorhorst FJ, Burger CW, Van Houten P, Kenemans P, Janssens J. Urinary and fecal incontinence in the community analysis of a MORI poll. BMJ 1992;306:832-4.

6. Lagace EA, Hansen W, Hickner JM. Prevalence and severity of urinary incontinence in ambulatory adults: an UPRNet study. J Fam Pract 1993;36:610-4.

7. Thom D. Variation in estimates of urinary incontinence prevalence in the community. Effects of differences in definition, population characteristics, and study type. J Am Geriatr Soc 1998;46:473-80.

8. Schulman C, Claes H, Matthijs J. Urinary incontinence in Belgium: a population-based epidemiological survey. Eur Urol 1997;32:315-20.

9. Hergoz AR, Fultz NH. Prevalence and incidence of urinary incontinence in community dwelling populations. J Am Geriatr Soc 1990;38:273-81.

10. Brieger GM, Yip SK, Hin LY, Chung TKH. The prevalence of urinary dysfunction in Hong Kong Chinese women. Obstet Gynecol 1996;88:1041.

11. Resnick NM, Yalla SV, Laurino E. The pathophysiology of urinary incontinence among institutionalized elderly persons. N Engl J Med 1989;320:1-7.

12. Hampel C, Wienhould D, Benken N, Eggersmann C, Thuroff JW. Definition of overactive bladder and epidemiology of urinary incontinence. Urology 1997;50:4-14.

13. Harrison GL, Memel DS. Urinary incontinence in women: its prevalence and its management in a health promotion clinic. Br J Gen Pract 1994;44:149-52.

14. Bortolotti A, Bernardini B, Colli E, et al. Prevalence and risk factors for urinary incontinence in Italy. Eur Urol 2000;37:30-5.

15. Cheow C, Swan LK, Merriman A, Choon TE, Viegas O. Urinary incontinence among the elderly people of Singapore. Age Ageing 1991;20:262-6.

16. O’Brien J, Austin M, Sethi P, O’Boyle P. Urinary incontinence: prevalence, need for treatment, and effectiveness of intervention by nurse. BMJ 1991;303:1308-12.

17. Wein AJ, Rovner ES. The overactive bladder: an overview for primary care health provides. Int J Fertil 1999;44:56-66.

18. Abrams P, Wein AJ. The Overactive Bladder: A Widespread but Treatable Condition. Stockholm: Sparre Medical Group; 1998.

19. Jensen JK, Nielsen FR, Ostergard DR. The role of patient history in the diagnosis of urinary incontinence. Obstet Gynecol 1994;83:904-10.

20. Sommer P, Bauer T, Nielsen KK, et al. Voiding patterns and prevalence of incontinence in women. A questionnaire survey. Br J Uro 1990;66:12-5.

References

1. Thomas TM, Plymat KR, Blannin J, Meade TW. Prevalence of urinary incontinence. BMJ 1980;281:1243-5.

2. Parazzini F, Colli E, Origgi G, et al. Risk factors for urinary incontinence in women. Eur Urol 2000;37:637-43.

3. Sommer P, Bauer T, Nielsen KK, et al. Voiding patterns and prevalence of incontinence in women. A questionnaire survey. Br J Urol 1990;66:12-5.

4. O’Brien J, Austin M, Sethi P, O’Boyle P. Urinary incontinence: prevalence, need for treatment, and effectiveness of intervention by nurse. BMJ 1991;303:1308-12.

5. Kok ALM, Voorhorst FJ, Burger CW, Van Houten P, Kenemans P, Janssens J. Urinary and fecal incontinence in the community analysis of a MORI poll. BMJ 1992;306:832-4.

6. Lagace EA, Hansen W, Hickner JM. Prevalence and severity of urinary incontinence in ambulatory adults: an UPRNet study. J Fam Pract 1993;36:610-4.

7. Thom D. Variation in estimates of urinary incontinence prevalence in the community. Effects of differences in definition, population characteristics, and study type. J Am Geriatr Soc 1998;46:473-80.

8. Schulman C, Claes H, Matthijs J. Urinary incontinence in Belgium: a population-based epidemiological survey. Eur Urol 1997;32:315-20.

9. Hergoz AR, Fultz NH. Prevalence and incidence of urinary incontinence in community dwelling populations. J Am Geriatr Soc 1990;38:273-81.

10. Brieger GM, Yip SK, Hin LY, Chung TKH. The prevalence of urinary dysfunction in Hong Kong Chinese women. Obstet Gynecol 1996;88:1041.

11. Resnick NM, Yalla SV, Laurino E. The pathophysiology of urinary incontinence among institutionalized elderly persons. N Engl J Med 1989;320:1-7.

12. Hampel C, Wienhould D, Benken N, Eggersmann C, Thuroff JW. Definition of overactive bladder and epidemiology of urinary incontinence. Urology 1997;50:4-14.

13. Harrison GL, Memel DS. Urinary incontinence in women: its prevalence and its management in a health promotion clinic. Br J Gen Pract 1994;44:149-52.

14. Bortolotti A, Bernardini B, Colli E, et al. Prevalence and risk factors for urinary incontinence in Italy. Eur Urol 2000;37:30-5.

15. Cheow C, Swan LK, Merriman A, Choon TE, Viegas O. Urinary incontinence among the elderly people of Singapore. Age Ageing 1991;20:262-6.

16. O’Brien J, Austin M, Sethi P, O’Boyle P. Urinary incontinence: prevalence, need for treatment, and effectiveness of intervention by nurse. BMJ 1991;303:1308-12.

17. Wein AJ, Rovner ES. The overactive bladder: an overview for primary care health provides. Int J Fertil 1999;44:56-66.

18. Abrams P, Wein AJ. The Overactive Bladder: A Widespread but Treatable Condition. Stockholm: Sparre Medical Group; 1998.

19. Jensen JK, Nielsen FR, Ostergard DR. The role of patient history in the diagnosis of urinary incontinence. Obstet Gynecol 1994;83:904-10.

20. Sommer P, Bauer T, Nielsen KK, et al. Voiding patterns and prevalence of incontinence in women. A questionnaire survey. Br J Uro 1990;66:12-5.

Issue
The Journal of Family Practice - 51(12)
Issue
The Journal of Family Practice - 51(12)
Page Number
1072-1075
Page Number
1072-1075
Publications
Publications
Article Type
Display Headline
Prevalence of overactive bladder and urinary incontinence
Display Headline
Prevalence of overactive bladder and urinary incontinence
Legacy Keywords
,Epidemiologyurinary incontinenceoveractive bladder. (J Fam Pract 2002; 51:1072–1075)
Legacy Keywords
,Epidemiologyurinary incontinenceoveractive bladder. (J Fam Pract 2002; 51:1072–1075)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

The Stability of Tretinoin in Tretinoin Gel Microsphere 0.1%

Article Type
Changed
Thu, 01/10/2019 - 11:56
Display Headline
The Stability of Tretinoin in Tretinoin Gel Microsphere 0.1%

Article PDF
Author and Disclosure Information

Nyirady J, Lucas C, Yusuf M, Mignone P, Wisniewski S

Issue
Cutis - 70(5)
Publications
Topics
Page Number
295-298
Sections
Author and Disclosure Information

Nyirady J, Lucas C, Yusuf M, Mignone P, Wisniewski S

Author and Disclosure Information

Nyirady J, Lucas C, Yusuf M, Mignone P, Wisniewski S

Article PDF
Article PDF

Issue
Cutis - 70(5)
Issue
Cutis - 70(5)
Page Number
295-298
Page Number
295-298
Publications
Publications
Topics
Article Type
Display Headline
The Stability of Tretinoin in Tretinoin Gel Microsphere 0.1%
Display Headline
The Stability of Tretinoin in Tretinoin Gel Microsphere 0.1%
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

The Spirituality Index of Well-Being: Development and testing of a new measure

Article Type
Changed
Mon, 01/14/2019 - 13:06
Display Headline
The Spirituality Index of Well-Being: Development and testing of a new measure

ABSTRACT

OBJECTIVE: To evaluate the reliability and validity of the Spirituality Index of Well-Being (SIWB) Scale in a patient population.

STUDY DESIGN: Cross-sectional survey.

POPULATION: Community-dwelling elderly individuals (n = 277) recruited from primary care clinic sites in the Kansas City metropolitan area.

OUTCOMES MEASURED: Internal consistency, concurrent construct validity, discriminant validity, and factor analysis with Varimax rotation.

RESULTS: The initial version of the SIWB contained 40 items: 20 from a self-efficacy domain and 20 from a life scheme domain. Factor analysis yielded 6 items loaded most strongly on factor 1 (intrapersonal self-efficacy) and 6 other items loaded strongly on factor 2 (life scheme). The self-efficacy subscale had an of .83 and the life scheme subscale had an of .80; the total 12-item SIWB scale had an of .87. The SIWB had significant and expected correlations with other quality of life measures related to subjective well-being: EuroQol (r = .18), Geriatric Depression Scale (r = -.35), the Physical Functioning Index from the Short Form 36 (r = .28), and the Years of Healthy Life Scale (r = -.35). Religiosity did not correlate significantly with the SIWB (r = .12; P = .056).

CONCLUSIONS: The 12-item SIWB scale is a valid and reliable measure of subjective well-being in an older patient population.

Spirituality and religion are embedded within contemporary American culture1 and have become an increasingly important part of the patient experience of health and illness.2 There is growing interest in examining the association of spirituality, religion, and health-related outcomes in the United States, particularly in the areas of health behavior and promotion3 and psychoneuroimmunology.4 Despite this interest, the absence of operational definitions of spirituality and religion, the contamination of spirituality items with measures of religion, and the lack of valid and reliable instruments that gauge these constructs continue to be major limitations to work in this area.5

Conceptually, religion or religiosity is often viewed in terms of the various organized, individual, and attitudinal manifestations of different faith traditions, and spirituality connotes and expresses a sense of meaning, purpose, or power from within or from a transcendent source.6 There is no shortage of instruments that measure dimensions of either construct, and researchers from the fields of sociology,7 psychology,6 and pastoral theology and chaplaincy8 have developed a variety of scales of religion and spirituality.9 It remains unclear, however, whether these constructs can be extended to health care settings or whether these instruments are applicable and useful as measures of individual or population health. For example, frequency of religious service attendance is often a single-item measure used as an independent variable in studies of health outcomes, such as health status.10 Although service attendance is associated with self-reported health in community-dwelling elderly individuals, the effect of this activity on perceived health disappears when functional status is controlled.10 Therefore, can religious service attendance be considered an independent variable, or is it simply a proxy of functional status within a geriatric population?11

This example highlights the importance of context in the use of any measure of religion or spirituality. It also points to the health-related quality of life field as a useful orientation for conceptualizing spirituality and religion in health care settings. Health-related quality of life, an individual’s or group’s perception of health over time, is predicated on the assumption that a patient’s experiences, beliefs, expectations, and perceptions directly influence the physical, psychological, and social domains of health.12 Spirituality and religion have been proposed as mediators of 1 characteristic of psychological health, subjective well-being, in 4 ways: by ensuring social support and integration within a community, by establishing personal relationships with a divine other, by promoting a salubrious personal lifestyle that is congruent with a personal faith tradition, and by providing systems of meaning and existential coherence.13

To identify and describe elements of spirituality that are linked to subjective well-being, our prior qualitative work explored the patient perspective. We found that patients consider spirituality in predominantly cognitive terms and incorporate the domains of life scheme and positive intentionality, or self-efficacy, as primary components Figure 1.14 In addition to suggesting a dynamic conceptual framework, this research supported the assumption that patients associate spirituality with well-being largely through the provision of systems of meaning and coherence.

The current study builds on this work and describes the development and evaluation of a brief research instrument, the Spirituality Index of Well-Being (SIWB), which is designed to measure the effect of spirituality on subjective well-being. Several assumptions guided our study design and analysis. First, we recognized that there are no global yet parsimonious instruments that capture the complexity and depth of spirituality in any context, health care or otherwise. Second, based on our qualitative work, we viewed spirituality as subsumed within the psychological rather than within the social or physical domain. Third, we considered the SIWB as a health-related quality of life measure, one to be used in studies of individual or population health, rather than as an assessment tool.

 

 

From the cultural and social perspectives, spirituality and religion are especially salient in the lives of minority elderly,15,16 particularly within the settings of serious illness and end-of-life care.17 From a population health perspective, increased life expectancy in the United States highlights the importance of health-related quality of life assessment in the areas of chronic illness, aging, and end-of-life care, and Healthy People 2010 has identified quality of life improvement as a specific public health objective.18 By bridging both perspectives, the SIWB has the potential to add a unique and patient-centered dimension to health-related quality of life research.

Methods

Scale and item development

The SIWB was designed as a research tool to measure the effect of patient-reported spirituality on subjective well-being. Our understanding of spirituality and the stimulus material for the index have been described elsewhere.14 In brief, a congruent, meaningful life scheme and a high degree of positive intentionality or self-efficacy promote personal agency, an intermediary between spirituality and subjective well-being Figure.

Life scheme is similar to the construct of sense of coherence, which was described by Antonovsky as a positive, pervasive way of viewing the world, and one’s life in it, lending elements of comprehensibility, manageability, and meaningfulness.19 Positive intentionality shares characteristics with self-efficacy, which is an individual’s belief in the capacity to organize and perform activities that are required for a prescribed goal.20 Self-efficacy beliefs are domain and task specific, and participants in our focus group study depicted these beliefs within the context of overcoming threatened or actual changes to their functioning.

Forty items, 20 for the life scheme domain and 20 for the self-efficacy domain, were developed by investigators who conducted the qualitative study (T.P.D., B.B.F.). The scale was prefaced by the question, “Which statement best describes your feelings and choices,” and each item was a statement accompanied by a 5-point Likert scale response from “strongly agree” to “strongly disagree,” with the midpoint representing “neither agree nor disagree.” Item content consisted of positive and negative statements regarding life scheme (eg, “I haven’t yet found my life’s purpose”) and personal self-efficacy (eg, “Despite any problem that I may face, I can get through the day”).

Study population

Participants were 65 years or older and enrolled in a cohort study to assess the ability of performance measures to predict future health service use, health status, and functional status. Recruitment for the parent study occurred between April and November 1996 from primary care sites within the Veteran’s Affairs network and a Medicare health management organization serving the Kansas City metropolitan area. The study population represented the cohort 36 months after enrollment.

Measures

Demographic information. Participants had the following demographic information collected: age, sex, race, and education level.

Health and functional status. Subjective health status was measured by the EuroQol, a recognized quality-of-life measure,21 in addition to a single-item measure of global health from the Years of Healthy Life (YOHL) Scale.22 The Physical Functioning Index of the Medical Outcomes Study Short Form 36 was used to assess functional status.23

Mental health status. We measured mental health status with the Geriatric Depression Scale (GDS), a 15-item instrument with a dichotomous (yes/no) response format.24 Items from the fear of death domain of the Death Attitude Profile Scale-Revised (DAP-R) were selected as an additional proxy of psychological well-being.25

Religiosity. Five items derived from questions developed by the National Opinion Research Center26 were preferentially selected according to a previously tested and validated model of religiosity.27 Frequency of religious or spiritual service attendance was used to assess organizational religiosity, and frequency of private prayer or spiritual practice was used to measure nonorganizational religiosity. Three items were used to measure subjective or intrinsic religiosity: self-reported strength of religious or spiritual orientation, closeness to God (or a Higher Force), and frequency of affective spiritual experiences..

Data analysis

Item reduction and reliability testing. The initial 40-item pool was reduced to 20 life scheme items and 14 self-efficacy items based on subject response and feedback during survey administration. Items that subjects could not understand or answer by self-report were removed.

First, internal reliability analyses were conducted for each subscale (life scheme, self-efficacy) and for the SIWB scale with a goal of producing high internal consistency as measured by the Cronbach’s α (eg, > .70). Items that contributed to lower internal reliability were discarded, which removed 1 self-efficacy item and 6 life scheme items from the scale.

To further refine the SIWB and its subscales, the remaining items were subjected to principal components analysis by using Varimax rotation. After rotation, the 2 largest factors were readily interpretable, with items loading as expected: self-efficacy items loading on the first factor and life scheme items loading on the second factor. From each factor, the top 6 items ranked by loading magnitude were selected for inclusion into the final scale.

 

 

Internal reliabilities for the subscales (6 items each) and the SIWB scale (12 items total) were calculated. A maximum likelihood factor analysis with Varimax rotation also was conducted to verify that a 2-factor solution remained for the reduced 12-item scale.

Validity testing. Well-being is conceptually subsumed within the psychological domain of quality of life measures and is comprised of the dimensions of positive affect (affective) and subjective perceptions of general health and life satisfaction (cognitive).12 As a result, we determined concurrent construct validity by correlating the 2 6-item subscale scores and the total SIWB score with summed scores from the fear of death items from the DAP-R, the GDS, YOHL, the Physical Functioning Index from the SF-36, and the EuroQol. We anticipated positive correlations of the SIWB with physical functioning (SF-36) and quality of life (EuroQol) and inverse correlations with fear of death (DAP-R), depression (GDS), and self-reported poor health status (YOHL). Discriminant validity was examined by correlating the SIWB subscale and total scores with the religiosity measure. All analyses were performed with the Statistical Package for the Social Sciences version 9.0 (SPSS, Chicago, IL, 1996).

Results

Study population

Two hundred seventy-seven patients were in the final cohort and participated in the study Table 1. The mean age of the study population was 74 years, with a range of 65 to 90 years. Most participants (66%) were 75 years or younger, and the population was evenly distributed between males and females. Participants were predominantly white (78%), reported a wide range of education levels, and had a mean physical function score (SF-36) of 62.92 and a mean health status score (EuroQol) of 0.77.

Internal consistency and factor analysis

Twelve items, 6 each from the self-efficacy and life scheme subscales, remained from the original 40 items after item reduction; initial reliability testing and factor analysis were performed. This 12-item measure of the SIWB produced a coefficient α of .87, indicating good internal consistency. The 6-item subscales also demonstrated good reliability: .83 for self-efficacy and .80 for life scheme.

Results of factor analysis with individual items and item loadings for the final SIWB scale are presented in Table 2. A confirmatory approach anticipated 2 factors, which was based on our conceptual framework. Factor analysis found that 2 factors, reasonably named self-efficacy and life scheme, accounted for a substantial proportion of the variance in responses. The eigenvalue for the self-efficacy factor was 2.88, accounting for 24.04% of the total variance. The eigenvalue for the life scheme factor was 2.35, accounting for 19.57% of the total variance. A Pearson chi-square goodness of fit test of the difference between the actual and reproduced correlation patterns was not significant (51.72; df = 43; P = .17), which suggested that a 2-factor solution was reasonable. Table 3 contains the descriptive statistics for the SIWB scale and its subscales.

Validity testing

To provide a more consistent and intuitive interpretation of scores and correlations, SIWB total and subscale scores were produced by reverse scoring and summing items. As a result, higher SIWB scores indicated a greater degree of spirituality or its components. Correlations between the summed SIWB and subscale scores and other health-related measures of well-being are presented in Table 4. The SIWB and its subscales had significant and expected correlations in direction and magnitude with other measures related to subjective well-being. Fear of death and depression (GDS) had the highest inverse correlations with the SIWB and its subscales. Subjective perceptions of general health and life satisfaction, as measured by self-reports of poor health status (YOHL), functional quality of life (EuroQol), and physical functioning (SF-36) had significant correlations with the SIWB.

Although the life scheme subscale did have a significant but small correlation with a previously validated measure of religiosity, the total SIWB scale and self-efficacy subscale did not have a significant correlation with religiosity.

Discussion

The purpose of this study was to evaluate a brief research instrument designed to measure the effect of spirituality on subjective well-being in a patient population. Instruments that are developed to measure health-related quality of life are evaluated according to several criteria, most notably their degree of validity and reliability.28 The SIWB demonstrated very good reliability with good internal consistency for the total and subscales as assessed by α coefficient in a geriatric patient population.

The construct spirituality has multiple dimensions and connotations in health-related settings,29 which challenge the validity testing of any spirituality instrument. We chose a qualitative approach, rather than the use of experts or preexisting measures in health services research, pastoral theology and chaplaincy, and the social sciences, to conceptualize how patients understand and define spirituality, in particular as if affects their well-being. This approach also provided stimulus material for SIWB item selection and scale construction.

 

 

In our conceptual framework, spirituality within a health context is a state that is comprised primarily of the domains of life scheme and self-efficacy. Patients who report high self-efficacy beliefs regarding their functioning and who view their lives as purposeful and meaningful should score higher on measures of subjective well-being than those who do not hold such beliefs or attitudes. The use of concurrent construct validity testing allowed us to test this assumption through the correlation of SIWB scores with other established proxies of subjective well-being. Face validity may suggest that the SIWB is a measure of affective or cognitive states (eg, depression) or a proxy for self-efficacy and alienation rather than spirituality. Concurrent construct validity testing provided a means to determine the independence of the SIWB from an accepted measure of depression, the GDS.

Although the pilot version of the SIWB consisted of 40 items with positive and negative statements regarding life scheme and personal self-efficacy, only negative items remained after validity and reliability testing. One explanation for the exclusion of positive statements from the SIWB may involve the predominance of a specific component of subjective well-being in older persons, a low level of negative affect. There are several additional components of subjective well-being (eg, positive affect, satisfaction with work or other domains, and life satisfaction),30 that may not be as salient or as operational in an older population.

However, the SIWB consistently had significant and expected correlations in direction and magnitude with other established measures related to subjective well-being. Spirituality had the highest inverse correlations with fear of death, depression, and perceived health status, which are supportive of affective and cognitive dimensions of subjective well-being in our instrument. A modest correlation with the GDS also suggested that the SIWB is a measure that is independent of depression.

Discriminant validity testing was used to differentiate the SIWB from religiosity. The total SIWB scale did not have a significant correlation with a measure of religiosity that has been used in a geriatric population,27 although the life scheme subscale did have a significant but small (r = .18) correlation. The distinction between conceptualizations of religiosity and spirituality is a major consideration in measurement development,31 and there are other measures of spirituality that have been used in clinical and research settings. Virtually all are contaminated by the inclusion of items that assess religiosity.9 For example, the Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale contains items that measure the comfort and strength derived from religious faith, in addition to a sense of meaning, purpose, and peace in life.32 The Systems of Belief Inventory, which was designed for use in quality of life and psychosocial research examining illness adjustment, measures religious and spiritual beliefs and practices and the social support that accompanies those beliefs and practices.33

The Spiritual Well-Being Scale has been used widely in health care settings and consists of 2 subscales: a religious well-being subscale and an existential well-being subscale.34 Religious well-being is conceptualized as the quality of one’s relationship with God, whereas existential well-being includes characteristics such as life purpose, life satisfaction, and positive and negative life experiences. Scores from the Spiritual Well-Being Scale have been inversely correlated with measures of psychological well-being.

However, much of this unpublished research has been compromised by ceiling effects or an inability to detect differences in those who score high on the scale, particularly in religious populations35 and by a lack of peer review.36

Our study has several limitations. Our conceptualization of spirituality is a new construct based on qualitative research, and the study purpose was to evaluate the psychometric properties of a new instrument to measure this construct. As a result, we did not analyze or report normative data about the SIWB. Spirituality may have conceptual overlap with existing constructs, such as self-efficacy and alienation, and we did not evaluate the independence of our scale against these constructs. The SIWB was embedded in the final cohort of a longitudinal study, and we were unable to perform test-retest reliability to determine the stability and the responsiveness or sensitivity of the instrument over time. Due to subject burden, the parent study limited the inclusion of additional measures and the quality-of-life instruments were selected a priori.

Our cross-sectional design also did not allow us to draw any definitive conclusions about the causal relations of the variables. The study population consisted primarily of predominantly white, older patients with some functional limitations, and the generalizability of our findings to other populations is uncertain. However, good theory development and item construction from prior qualitative studies, a high α coefficient, and factor analysis support the validity and reliability of our measure.

 

 

In summary, the SIWB appears to be a valid and reliable measure of patient subjective well-being, one that is uncontaminated by the inclusion of religiosity. This instrument may be used in observational studies of chronic illness, aging, and end-of-life care that use spirituality as an explanatory or predictor variable of well-being. Future validation studies with multiple, diverse populations and a longitudinal design are needed to refine, modify, or verify the SIWB as an additional, complementary instrument of wellbeing.

Acknowledgments

We thank Lynn Maxwell, Annette Becker, Danielle Sirchak, Donna Clausen, June Belt, Marjoire Frank, and Lisa Rogers for their dedicated service in this study.

References

1. Wuthnow R. After Heaven: Spirituality in America Since 1950. Berkeley: University of California Press; 1998.

2. Furnham A. Explaining health and illness: lay perceptions on current and future health, the causes of illness, and the nature of recovery. Soc Sci Med 1994;39:715-25.

3. Ellison CG, Levin JS. The religion-health connection: evidence, theory, and future directions. Health Educ Behav 1998;25:700-20.

4. Newberg A, D’Aquili EG, Rause V. Why God Won’t Go Away: Brain Science and the Biology of Belief. New York: Ballentine; 2001.

5. Sloan RP, Bagiella E, Powell T. Religion, spirituality, and medicine. Lancet 1999;353:664-7.

6. Wulff DM. Psychology of Religion: Classic and Contemporary. 2nd ed. New York: John Wiley & Sons; 1997.

7. Johnstone RL. Religion in Society: A Sociology of Religion. 5th ed. Upper Saddle River, NJ: Prentice-Hall; 1997.

8. Fitchett G. Selected resources for screening for spiritual risk. Chaplaincy Today 1999;15:13-26.

9. Hill PC, Hood RW, eds. Measures of Religiosity. Birmingham, AL: Religious Education Press; 1999.

10. Musick MA. Religion and subjective health among black and white elders. J Health Soc Behav 1996;37:221-37.

11. Idler EL, Kasl SV. Religion among disabled and nondisabled persons II: attendance at religious service as a predictor of the course of disability. J Gerontol B Psychol Soc Sci 1997;52B:S306-16.

12. Testa MA, Simonson DC. Assessment of quality-of-life outcomes. N Engl J Med 1996;334:835-40.

13. Ellison CG. Religious involvement and subjective well-being. J Health Soc Behav 1991;32:80-99.

14. Daaleman TP, Cobb AK, Frey BB. Spirituality and well-being: an exploratory approach to the patient perspective. Soc Sci Med 2001;53:119-27.

15. Williams DR, Wilson CM. Race, ethnicity, and aging. In: Binstock RH, George LK, eds. Handbook of Aging and the Social Sciences. 5th ed. San Diego: Academic Press; 2001;160-78.

16. Gallup G, Lindsay DM. Surveying the Religious Landscape. Harrisburg, PA: Morehouse Publishing; 1999.

17. Daaleman TP, VandeCreek L. Placing religion and spirituality in end-of-life care. JAMA 2000;284:2514-7.

18. Department of Health and Human Services. Healthy people 2010: understanding and improving health. Available at: http://web.health.gov/healthypeople/. Accessed March 29, 2001.

19. Antonovsky A. Unraveling the Mystery of Health: How People Manage Stress and Stay Well. San Francisco: Jossey-Bass Press; 1987.

20. Bandura A. Self-Efficacy, the Exercise of Control. New York: WH Freeman; 1997.

21. EuroQol Group. EuroQol: a new facility for the measurement of health related quality of life. Health Policy 1990;16:199-208.

22. Erickson P, Wilson R, Shannon I. Years of Healthy Life. Healthy People 2000, Statistical Notes No. 7. Hyattsville, MD: Centers for Disease Control and Prevention/National Center for Health Statistics; April 1995. DHHS publication PHS 95-1237 4-1484. Available at: http://www.cdc.gov/nchs/data/statnt/statnt07.pdf.

23. Stewart AL, Hays RD, Ware JE. The MOS short-form general health survey. Med Care 1988;26:724-5.

24. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a geriatric depression screening scale. J Psychiatr Res 1982;17:37-49.

25. Wong PTP, Reker GT, Gesser G. Death attitude profile-revised: a multidimensional measure of attitudes toward death. In: Neimeyer RA, ed. Death Anxiety Handbook. Washington, DC: Taylor & Francis; 1994;121-48.

26. Davis JA, Smith TW. General Social Surveys, 1972-1985. Chicago: National Opinion Research Center; 1985.

27. Chatters LM, Levin JS, Taylor RJ. Antecedents and dimensions of religious involvement among older black adults. J Gerontol B Psychol Soc Sci 1992;47:S269-78.

28. McSweeny AJ, Creer TL. Health-related quality-of-life assessment in medical care. Dis Mon 1995;16:1-71.

29. Koenig HG, McCullough ME, Larson DB. Handbook of Religion and Health. New York: Oxford University Press; 2001.

30. Diener E. Subjective well-being, the science of happiness and a proposal for a national index. Am Psychol 2000;55:34-43.

31. Fetzer Institute/National Institute on Aging Working Group. Multidimensional Measurement of Religiousness/Spirituality for Use in Health Research. Kalamazoo, MI: John A. Fetzer Institute; 1999.

32. Brady MJ, Peterman AH, Fitchett G, Mo M, Cella D. A case for including spirituality in quality of life measurement in oncology. Psychooncology 1999;8:417-28.

33. Holland JC, Kash KM, Passik S, et al. A brief spiritual beliefs inventory for use in quality-of-life research in life-threatening illness. Psychooncology 1998;7:460-9.

34. Ellison CW. Spiritual well-being: conceptualization and measurement. J Psychol Theol 1983;11:330-40.

35. Ledbetter MF, Smith LA, Vosler-Hunter WL, Fischer JD. An evaluation of the research and clinical usefulness of the spiritual wellbeing scale. J Psychol Theol 1991;19:49-55.

36. Ellison CW, Smith J. Toward an integrative measure of health and well-being. J Psychol Theol 1991;19:35-48.

Address reprint requests to Timothy P. Daaleman, DO, Department of Family Medicine, University of North Carolina at Chapel Hill, CB 7595, Manning Drive, Chapel Hill, NC 27599-7595. E-mail: [email protected].

To submit a letter to the editor on this topic, click here: [email protected].

Article PDF
Author and Disclosure Information

Timothy P. Daaleman, DO
Bruce B. Frey, PhD
Dennis Wallace, PhD
Stephanie A. Studenski, MD, MPH
Kansas City and Lawrence, Kansas and Chapel Hill, North Carolina
From the Department of Family Medicine (T.P.D.) and the Center on Aging (T.P.D.), (S.A.S.), University of Kansas Medical Center, Kansas City, KS; Psychology and Research in Education, School of Education, University of Kansas, Lawrence, KS (B.B.F.); and Rho Inc., Chapel Hill, NC (D.W.). This study was supported by the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program (T.P.D.), the John A. Hartford Foundation (T.P.D.), Merck Research Laboratories (D.W., S.A.S.), and the Kansas Claude D. Pepper Older Americans Independence Center (AG 14635; D.W., S.A.S.). This work was presented at the Annual Meeting of the North American Primary Care Research Group; November 5, 2000; Amelia Island, FL; and the Annual Meeting of the American Geriatrics Society; May 11, 2001; Chicago, IL.

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
1-1
Legacy Keywords
,Quality of lifesubjective well-beingmeasurementspiritualityolder persons. (J Fam Pract 2002; 51:00-00)
Sections
Author and Disclosure Information

Timothy P. Daaleman, DO
Bruce B. Frey, PhD
Dennis Wallace, PhD
Stephanie A. Studenski, MD, MPH
Kansas City and Lawrence, Kansas and Chapel Hill, North Carolina
From the Department of Family Medicine (T.P.D.) and the Center on Aging (T.P.D.), (S.A.S.), University of Kansas Medical Center, Kansas City, KS; Psychology and Research in Education, School of Education, University of Kansas, Lawrence, KS (B.B.F.); and Rho Inc., Chapel Hill, NC (D.W.). This study was supported by the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program (T.P.D.), the John A. Hartford Foundation (T.P.D.), Merck Research Laboratories (D.W., S.A.S.), and the Kansas Claude D. Pepper Older Americans Independence Center (AG 14635; D.W., S.A.S.). This work was presented at the Annual Meeting of the North American Primary Care Research Group; November 5, 2000; Amelia Island, FL; and the Annual Meeting of the American Geriatrics Society; May 11, 2001; Chicago, IL.

Author and Disclosure Information

Timothy P. Daaleman, DO
Bruce B. Frey, PhD
Dennis Wallace, PhD
Stephanie A. Studenski, MD, MPH
Kansas City and Lawrence, Kansas and Chapel Hill, North Carolina
From the Department of Family Medicine (T.P.D.) and the Center on Aging (T.P.D.), (S.A.S.), University of Kansas Medical Center, Kansas City, KS; Psychology and Research in Education, School of Education, University of Kansas, Lawrence, KS (B.B.F.); and Rho Inc., Chapel Hill, NC (D.W.). This study was supported by the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program (T.P.D.), the John A. Hartford Foundation (T.P.D.), Merck Research Laboratories (D.W., S.A.S.), and the Kansas Claude D. Pepper Older Americans Independence Center (AG 14635; D.W., S.A.S.). This work was presented at the Annual Meeting of the North American Primary Care Research Group; November 5, 2000; Amelia Island, FL; and the Annual Meeting of the American Geriatrics Society; May 11, 2001; Chicago, IL.

Article PDF
Article PDF

ABSTRACT

OBJECTIVE: To evaluate the reliability and validity of the Spirituality Index of Well-Being (SIWB) Scale in a patient population.

STUDY DESIGN: Cross-sectional survey.

POPULATION: Community-dwelling elderly individuals (n = 277) recruited from primary care clinic sites in the Kansas City metropolitan area.

OUTCOMES MEASURED: Internal consistency, concurrent construct validity, discriminant validity, and factor analysis with Varimax rotation.

RESULTS: The initial version of the SIWB contained 40 items: 20 from a self-efficacy domain and 20 from a life scheme domain. Factor analysis yielded 6 items loaded most strongly on factor 1 (intrapersonal self-efficacy) and 6 other items loaded strongly on factor 2 (life scheme). The self-efficacy subscale had an of .83 and the life scheme subscale had an of .80; the total 12-item SIWB scale had an of .87. The SIWB had significant and expected correlations with other quality of life measures related to subjective well-being: EuroQol (r = .18), Geriatric Depression Scale (r = -.35), the Physical Functioning Index from the Short Form 36 (r = .28), and the Years of Healthy Life Scale (r = -.35). Religiosity did not correlate significantly with the SIWB (r = .12; P = .056).

CONCLUSIONS: The 12-item SIWB scale is a valid and reliable measure of subjective well-being in an older patient population.

Spirituality and religion are embedded within contemporary American culture1 and have become an increasingly important part of the patient experience of health and illness.2 There is growing interest in examining the association of spirituality, religion, and health-related outcomes in the United States, particularly in the areas of health behavior and promotion3 and psychoneuroimmunology.4 Despite this interest, the absence of operational definitions of spirituality and religion, the contamination of spirituality items with measures of religion, and the lack of valid and reliable instruments that gauge these constructs continue to be major limitations to work in this area.5

Conceptually, religion or religiosity is often viewed in terms of the various organized, individual, and attitudinal manifestations of different faith traditions, and spirituality connotes and expresses a sense of meaning, purpose, or power from within or from a transcendent source.6 There is no shortage of instruments that measure dimensions of either construct, and researchers from the fields of sociology,7 psychology,6 and pastoral theology and chaplaincy8 have developed a variety of scales of religion and spirituality.9 It remains unclear, however, whether these constructs can be extended to health care settings or whether these instruments are applicable and useful as measures of individual or population health. For example, frequency of religious service attendance is often a single-item measure used as an independent variable in studies of health outcomes, such as health status.10 Although service attendance is associated with self-reported health in community-dwelling elderly individuals, the effect of this activity on perceived health disappears when functional status is controlled.10 Therefore, can religious service attendance be considered an independent variable, or is it simply a proxy of functional status within a geriatric population?11

This example highlights the importance of context in the use of any measure of religion or spirituality. It also points to the health-related quality of life field as a useful orientation for conceptualizing spirituality and religion in health care settings. Health-related quality of life, an individual’s or group’s perception of health over time, is predicated on the assumption that a patient’s experiences, beliefs, expectations, and perceptions directly influence the physical, psychological, and social domains of health.12 Spirituality and religion have been proposed as mediators of 1 characteristic of psychological health, subjective well-being, in 4 ways: by ensuring social support and integration within a community, by establishing personal relationships with a divine other, by promoting a salubrious personal lifestyle that is congruent with a personal faith tradition, and by providing systems of meaning and existential coherence.13

To identify and describe elements of spirituality that are linked to subjective well-being, our prior qualitative work explored the patient perspective. We found that patients consider spirituality in predominantly cognitive terms and incorporate the domains of life scheme and positive intentionality, or self-efficacy, as primary components Figure 1.14 In addition to suggesting a dynamic conceptual framework, this research supported the assumption that patients associate spirituality with well-being largely through the provision of systems of meaning and coherence.

The current study builds on this work and describes the development and evaluation of a brief research instrument, the Spirituality Index of Well-Being (SIWB), which is designed to measure the effect of spirituality on subjective well-being. Several assumptions guided our study design and analysis. First, we recognized that there are no global yet parsimonious instruments that capture the complexity and depth of spirituality in any context, health care or otherwise. Second, based on our qualitative work, we viewed spirituality as subsumed within the psychological rather than within the social or physical domain. Third, we considered the SIWB as a health-related quality of life measure, one to be used in studies of individual or population health, rather than as an assessment tool.

 

 

From the cultural and social perspectives, spirituality and religion are especially salient in the lives of minority elderly,15,16 particularly within the settings of serious illness and end-of-life care.17 From a population health perspective, increased life expectancy in the United States highlights the importance of health-related quality of life assessment in the areas of chronic illness, aging, and end-of-life care, and Healthy People 2010 has identified quality of life improvement as a specific public health objective.18 By bridging both perspectives, the SIWB has the potential to add a unique and patient-centered dimension to health-related quality of life research.

Methods

Scale and item development

The SIWB was designed as a research tool to measure the effect of patient-reported spirituality on subjective well-being. Our understanding of spirituality and the stimulus material for the index have been described elsewhere.14 In brief, a congruent, meaningful life scheme and a high degree of positive intentionality or self-efficacy promote personal agency, an intermediary between spirituality and subjective well-being Figure.

Life scheme is similar to the construct of sense of coherence, which was described by Antonovsky as a positive, pervasive way of viewing the world, and one’s life in it, lending elements of comprehensibility, manageability, and meaningfulness.19 Positive intentionality shares characteristics with self-efficacy, which is an individual’s belief in the capacity to organize and perform activities that are required for a prescribed goal.20 Self-efficacy beliefs are domain and task specific, and participants in our focus group study depicted these beliefs within the context of overcoming threatened or actual changes to their functioning.

Forty items, 20 for the life scheme domain and 20 for the self-efficacy domain, were developed by investigators who conducted the qualitative study (T.P.D., B.B.F.). The scale was prefaced by the question, “Which statement best describes your feelings and choices,” and each item was a statement accompanied by a 5-point Likert scale response from “strongly agree” to “strongly disagree,” with the midpoint representing “neither agree nor disagree.” Item content consisted of positive and negative statements regarding life scheme (eg, “I haven’t yet found my life’s purpose”) and personal self-efficacy (eg, “Despite any problem that I may face, I can get through the day”).

Study population

Participants were 65 years or older and enrolled in a cohort study to assess the ability of performance measures to predict future health service use, health status, and functional status. Recruitment for the parent study occurred between April and November 1996 from primary care sites within the Veteran’s Affairs network and a Medicare health management organization serving the Kansas City metropolitan area. The study population represented the cohort 36 months after enrollment.

Measures

Demographic information. Participants had the following demographic information collected: age, sex, race, and education level.

Health and functional status. Subjective health status was measured by the EuroQol, a recognized quality-of-life measure,21 in addition to a single-item measure of global health from the Years of Healthy Life (YOHL) Scale.22 The Physical Functioning Index of the Medical Outcomes Study Short Form 36 was used to assess functional status.23

Mental health status. We measured mental health status with the Geriatric Depression Scale (GDS), a 15-item instrument with a dichotomous (yes/no) response format.24 Items from the fear of death domain of the Death Attitude Profile Scale-Revised (DAP-R) were selected as an additional proxy of psychological well-being.25

Religiosity. Five items derived from questions developed by the National Opinion Research Center26 were preferentially selected according to a previously tested and validated model of religiosity.27 Frequency of religious or spiritual service attendance was used to assess organizational religiosity, and frequency of private prayer or spiritual practice was used to measure nonorganizational religiosity. Three items were used to measure subjective or intrinsic religiosity: self-reported strength of religious or spiritual orientation, closeness to God (or a Higher Force), and frequency of affective spiritual experiences..

Data analysis

Item reduction and reliability testing. The initial 40-item pool was reduced to 20 life scheme items and 14 self-efficacy items based on subject response and feedback during survey administration. Items that subjects could not understand or answer by self-report were removed.

First, internal reliability analyses were conducted for each subscale (life scheme, self-efficacy) and for the SIWB scale with a goal of producing high internal consistency as measured by the Cronbach’s α (eg, > .70). Items that contributed to lower internal reliability were discarded, which removed 1 self-efficacy item and 6 life scheme items from the scale.

To further refine the SIWB and its subscales, the remaining items were subjected to principal components analysis by using Varimax rotation. After rotation, the 2 largest factors were readily interpretable, with items loading as expected: self-efficacy items loading on the first factor and life scheme items loading on the second factor. From each factor, the top 6 items ranked by loading magnitude were selected for inclusion into the final scale.

 

 

Internal reliabilities for the subscales (6 items each) and the SIWB scale (12 items total) were calculated. A maximum likelihood factor analysis with Varimax rotation also was conducted to verify that a 2-factor solution remained for the reduced 12-item scale.

Validity testing. Well-being is conceptually subsumed within the psychological domain of quality of life measures and is comprised of the dimensions of positive affect (affective) and subjective perceptions of general health and life satisfaction (cognitive).12 As a result, we determined concurrent construct validity by correlating the 2 6-item subscale scores and the total SIWB score with summed scores from the fear of death items from the DAP-R, the GDS, YOHL, the Physical Functioning Index from the SF-36, and the EuroQol. We anticipated positive correlations of the SIWB with physical functioning (SF-36) and quality of life (EuroQol) and inverse correlations with fear of death (DAP-R), depression (GDS), and self-reported poor health status (YOHL). Discriminant validity was examined by correlating the SIWB subscale and total scores with the religiosity measure. All analyses were performed with the Statistical Package for the Social Sciences version 9.0 (SPSS, Chicago, IL, 1996).

Results

Study population

Two hundred seventy-seven patients were in the final cohort and participated in the study Table 1. The mean age of the study population was 74 years, with a range of 65 to 90 years. Most participants (66%) were 75 years or younger, and the population was evenly distributed between males and females. Participants were predominantly white (78%), reported a wide range of education levels, and had a mean physical function score (SF-36) of 62.92 and a mean health status score (EuroQol) of 0.77.

Internal consistency and factor analysis

Twelve items, 6 each from the self-efficacy and life scheme subscales, remained from the original 40 items after item reduction; initial reliability testing and factor analysis were performed. This 12-item measure of the SIWB produced a coefficient α of .87, indicating good internal consistency. The 6-item subscales also demonstrated good reliability: .83 for self-efficacy and .80 for life scheme.

Results of factor analysis with individual items and item loadings for the final SIWB scale are presented in Table 2. A confirmatory approach anticipated 2 factors, which was based on our conceptual framework. Factor analysis found that 2 factors, reasonably named self-efficacy and life scheme, accounted for a substantial proportion of the variance in responses. The eigenvalue for the self-efficacy factor was 2.88, accounting for 24.04% of the total variance. The eigenvalue for the life scheme factor was 2.35, accounting for 19.57% of the total variance. A Pearson chi-square goodness of fit test of the difference between the actual and reproduced correlation patterns was not significant (51.72; df = 43; P = .17), which suggested that a 2-factor solution was reasonable. Table 3 contains the descriptive statistics for the SIWB scale and its subscales.

Validity testing

To provide a more consistent and intuitive interpretation of scores and correlations, SIWB total and subscale scores were produced by reverse scoring and summing items. As a result, higher SIWB scores indicated a greater degree of spirituality or its components. Correlations between the summed SIWB and subscale scores and other health-related measures of well-being are presented in Table 4. The SIWB and its subscales had significant and expected correlations in direction and magnitude with other measures related to subjective well-being. Fear of death and depression (GDS) had the highest inverse correlations with the SIWB and its subscales. Subjective perceptions of general health and life satisfaction, as measured by self-reports of poor health status (YOHL), functional quality of life (EuroQol), and physical functioning (SF-36) had significant correlations with the SIWB.

Although the life scheme subscale did have a significant but small correlation with a previously validated measure of religiosity, the total SIWB scale and self-efficacy subscale did not have a significant correlation with religiosity.

Discussion

The purpose of this study was to evaluate a brief research instrument designed to measure the effect of spirituality on subjective well-being in a patient population. Instruments that are developed to measure health-related quality of life are evaluated according to several criteria, most notably their degree of validity and reliability.28 The SIWB demonstrated very good reliability with good internal consistency for the total and subscales as assessed by α coefficient in a geriatric patient population.

The construct spirituality has multiple dimensions and connotations in health-related settings,29 which challenge the validity testing of any spirituality instrument. We chose a qualitative approach, rather than the use of experts or preexisting measures in health services research, pastoral theology and chaplaincy, and the social sciences, to conceptualize how patients understand and define spirituality, in particular as if affects their well-being. This approach also provided stimulus material for SIWB item selection and scale construction.

 

 

In our conceptual framework, spirituality within a health context is a state that is comprised primarily of the domains of life scheme and self-efficacy. Patients who report high self-efficacy beliefs regarding their functioning and who view their lives as purposeful and meaningful should score higher on measures of subjective well-being than those who do not hold such beliefs or attitudes. The use of concurrent construct validity testing allowed us to test this assumption through the correlation of SIWB scores with other established proxies of subjective well-being. Face validity may suggest that the SIWB is a measure of affective or cognitive states (eg, depression) or a proxy for self-efficacy and alienation rather than spirituality. Concurrent construct validity testing provided a means to determine the independence of the SIWB from an accepted measure of depression, the GDS.

Although the pilot version of the SIWB consisted of 40 items with positive and negative statements regarding life scheme and personal self-efficacy, only negative items remained after validity and reliability testing. One explanation for the exclusion of positive statements from the SIWB may involve the predominance of a specific component of subjective well-being in older persons, a low level of negative affect. There are several additional components of subjective well-being (eg, positive affect, satisfaction with work or other domains, and life satisfaction),30 that may not be as salient or as operational in an older population.

However, the SIWB consistently had significant and expected correlations in direction and magnitude with other established measures related to subjective well-being. Spirituality had the highest inverse correlations with fear of death, depression, and perceived health status, which are supportive of affective and cognitive dimensions of subjective well-being in our instrument. A modest correlation with the GDS also suggested that the SIWB is a measure that is independent of depression.

Discriminant validity testing was used to differentiate the SIWB from religiosity. The total SIWB scale did not have a significant correlation with a measure of religiosity that has been used in a geriatric population,27 although the life scheme subscale did have a significant but small (r = .18) correlation. The distinction between conceptualizations of religiosity and spirituality is a major consideration in measurement development,31 and there are other measures of spirituality that have been used in clinical and research settings. Virtually all are contaminated by the inclusion of items that assess religiosity.9 For example, the Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale contains items that measure the comfort and strength derived from religious faith, in addition to a sense of meaning, purpose, and peace in life.32 The Systems of Belief Inventory, which was designed for use in quality of life and psychosocial research examining illness adjustment, measures religious and spiritual beliefs and practices and the social support that accompanies those beliefs and practices.33

The Spiritual Well-Being Scale has been used widely in health care settings and consists of 2 subscales: a religious well-being subscale and an existential well-being subscale.34 Religious well-being is conceptualized as the quality of one’s relationship with God, whereas existential well-being includes characteristics such as life purpose, life satisfaction, and positive and negative life experiences. Scores from the Spiritual Well-Being Scale have been inversely correlated with measures of psychological well-being.

However, much of this unpublished research has been compromised by ceiling effects or an inability to detect differences in those who score high on the scale, particularly in religious populations35 and by a lack of peer review.36

Our study has several limitations. Our conceptualization of spirituality is a new construct based on qualitative research, and the study purpose was to evaluate the psychometric properties of a new instrument to measure this construct. As a result, we did not analyze or report normative data about the SIWB. Spirituality may have conceptual overlap with existing constructs, such as self-efficacy and alienation, and we did not evaluate the independence of our scale against these constructs. The SIWB was embedded in the final cohort of a longitudinal study, and we were unable to perform test-retest reliability to determine the stability and the responsiveness or sensitivity of the instrument over time. Due to subject burden, the parent study limited the inclusion of additional measures and the quality-of-life instruments were selected a priori.

Our cross-sectional design also did not allow us to draw any definitive conclusions about the causal relations of the variables. The study population consisted primarily of predominantly white, older patients with some functional limitations, and the generalizability of our findings to other populations is uncertain. However, good theory development and item construction from prior qualitative studies, a high α coefficient, and factor analysis support the validity and reliability of our measure.

 

 

In summary, the SIWB appears to be a valid and reliable measure of patient subjective well-being, one that is uncontaminated by the inclusion of religiosity. This instrument may be used in observational studies of chronic illness, aging, and end-of-life care that use spirituality as an explanatory or predictor variable of well-being. Future validation studies with multiple, diverse populations and a longitudinal design are needed to refine, modify, or verify the SIWB as an additional, complementary instrument of wellbeing.

Acknowledgments

We thank Lynn Maxwell, Annette Becker, Danielle Sirchak, Donna Clausen, June Belt, Marjoire Frank, and Lisa Rogers for their dedicated service in this study.

ABSTRACT

OBJECTIVE: To evaluate the reliability and validity of the Spirituality Index of Well-Being (SIWB) Scale in a patient population.

STUDY DESIGN: Cross-sectional survey.

POPULATION: Community-dwelling elderly individuals (n = 277) recruited from primary care clinic sites in the Kansas City metropolitan area.

OUTCOMES MEASURED: Internal consistency, concurrent construct validity, discriminant validity, and factor analysis with Varimax rotation.

RESULTS: The initial version of the SIWB contained 40 items: 20 from a self-efficacy domain and 20 from a life scheme domain. Factor analysis yielded 6 items loaded most strongly on factor 1 (intrapersonal self-efficacy) and 6 other items loaded strongly on factor 2 (life scheme). The self-efficacy subscale had an of .83 and the life scheme subscale had an of .80; the total 12-item SIWB scale had an of .87. The SIWB had significant and expected correlations with other quality of life measures related to subjective well-being: EuroQol (r = .18), Geriatric Depression Scale (r = -.35), the Physical Functioning Index from the Short Form 36 (r = .28), and the Years of Healthy Life Scale (r = -.35). Religiosity did not correlate significantly with the SIWB (r = .12; P = .056).

CONCLUSIONS: The 12-item SIWB scale is a valid and reliable measure of subjective well-being in an older patient population.

Spirituality and religion are embedded within contemporary American culture1 and have become an increasingly important part of the patient experience of health and illness.2 There is growing interest in examining the association of spirituality, religion, and health-related outcomes in the United States, particularly in the areas of health behavior and promotion3 and psychoneuroimmunology.4 Despite this interest, the absence of operational definitions of spirituality and religion, the contamination of spirituality items with measures of religion, and the lack of valid and reliable instruments that gauge these constructs continue to be major limitations to work in this area.5

Conceptually, religion or religiosity is often viewed in terms of the various organized, individual, and attitudinal manifestations of different faith traditions, and spirituality connotes and expresses a sense of meaning, purpose, or power from within or from a transcendent source.6 There is no shortage of instruments that measure dimensions of either construct, and researchers from the fields of sociology,7 psychology,6 and pastoral theology and chaplaincy8 have developed a variety of scales of religion and spirituality.9 It remains unclear, however, whether these constructs can be extended to health care settings or whether these instruments are applicable and useful as measures of individual or population health. For example, frequency of religious service attendance is often a single-item measure used as an independent variable in studies of health outcomes, such as health status.10 Although service attendance is associated with self-reported health in community-dwelling elderly individuals, the effect of this activity on perceived health disappears when functional status is controlled.10 Therefore, can religious service attendance be considered an independent variable, or is it simply a proxy of functional status within a geriatric population?11

This example highlights the importance of context in the use of any measure of religion or spirituality. It also points to the health-related quality of life field as a useful orientation for conceptualizing spirituality and religion in health care settings. Health-related quality of life, an individual’s or group’s perception of health over time, is predicated on the assumption that a patient’s experiences, beliefs, expectations, and perceptions directly influence the physical, psychological, and social domains of health.12 Spirituality and religion have been proposed as mediators of 1 characteristic of psychological health, subjective well-being, in 4 ways: by ensuring social support and integration within a community, by establishing personal relationships with a divine other, by promoting a salubrious personal lifestyle that is congruent with a personal faith tradition, and by providing systems of meaning and existential coherence.13

To identify and describe elements of spirituality that are linked to subjective well-being, our prior qualitative work explored the patient perspective. We found that patients consider spirituality in predominantly cognitive terms and incorporate the domains of life scheme and positive intentionality, or self-efficacy, as primary components Figure 1.14 In addition to suggesting a dynamic conceptual framework, this research supported the assumption that patients associate spirituality with well-being largely through the provision of systems of meaning and coherence.

The current study builds on this work and describes the development and evaluation of a brief research instrument, the Spirituality Index of Well-Being (SIWB), which is designed to measure the effect of spirituality on subjective well-being. Several assumptions guided our study design and analysis. First, we recognized that there are no global yet parsimonious instruments that capture the complexity and depth of spirituality in any context, health care or otherwise. Second, based on our qualitative work, we viewed spirituality as subsumed within the psychological rather than within the social or physical domain. Third, we considered the SIWB as a health-related quality of life measure, one to be used in studies of individual or population health, rather than as an assessment tool.

 

 

From the cultural and social perspectives, spirituality and religion are especially salient in the lives of minority elderly,15,16 particularly within the settings of serious illness and end-of-life care.17 From a population health perspective, increased life expectancy in the United States highlights the importance of health-related quality of life assessment in the areas of chronic illness, aging, and end-of-life care, and Healthy People 2010 has identified quality of life improvement as a specific public health objective.18 By bridging both perspectives, the SIWB has the potential to add a unique and patient-centered dimension to health-related quality of life research.

Methods

Scale and item development

The SIWB was designed as a research tool to measure the effect of patient-reported spirituality on subjective well-being. Our understanding of spirituality and the stimulus material for the index have been described elsewhere.14 In brief, a congruent, meaningful life scheme and a high degree of positive intentionality or self-efficacy promote personal agency, an intermediary between spirituality and subjective well-being Figure.

Life scheme is similar to the construct of sense of coherence, which was described by Antonovsky as a positive, pervasive way of viewing the world, and one’s life in it, lending elements of comprehensibility, manageability, and meaningfulness.19 Positive intentionality shares characteristics with self-efficacy, which is an individual’s belief in the capacity to organize and perform activities that are required for a prescribed goal.20 Self-efficacy beliefs are domain and task specific, and participants in our focus group study depicted these beliefs within the context of overcoming threatened or actual changes to their functioning.

Forty items, 20 for the life scheme domain and 20 for the self-efficacy domain, were developed by investigators who conducted the qualitative study (T.P.D., B.B.F.). The scale was prefaced by the question, “Which statement best describes your feelings and choices,” and each item was a statement accompanied by a 5-point Likert scale response from “strongly agree” to “strongly disagree,” with the midpoint representing “neither agree nor disagree.” Item content consisted of positive and negative statements regarding life scheme (eg, “I haven’t yet found my life’s purpose”) and personal self-efficacy (eg, “Despite any problem that I may face, I can get through the day”).

Study population

Participants were 65 years or older and enrolled in a cohort study to assess the ability of performance measures to predict future health service use, health status, and functional status. Recruitment for the parent study occurred between April and November 1996 from primary care sites within the Veteran’s Affairs network and a Medicare health management organization serving the Kansas City metropolitan area. The study population represented the cohort 36 months after enrollment.

Measures

Demographic information. Participants had the following demographic information collected: age, sex, race, and education level.

Health and functional status. Subjective health status was measured by the EuroQol, a recognized quality-of-life measure,21 in addition to a single-item measure of global health from the Years of Healthy Life (YOHL) Scale.22 The Physical Functioning Index of the Medical Outcomes Study Short Form 36 was used to assess functional status.23

Mental health status. We measured mental health status with the Geriatric Depression Scale (GDS), a 15-item instrument with a dichotomous (yes/no) response format.24 Items from the fear of death domain of the Death Attitude Profile Scale-Revised (DAP-R) were selected as an additional proxy of psychological well-being.25

Religiosity. Five items derived from questions developed by the National Opinion Research Center26 were preferentially selected according to a previously tested and validated model of religiosity.27 Frequency of religious or spiritual service attendance was used to assess organizational religiosity, and frequency of private prayer or spiritual practice was used to measure nonorganizational religiosity. Three items were used to measure subjective or intrinsic religiosity: self-reported strength of religious or spiritual orientation, closeness to God (or a Higher Force), and frequency of affective spiritual experiences..

Data analysis

Item reduction and reliability testing. The initial 40-item pool was reduced to 20 life scheme items and 14 self-efficacy items based on subject response and feedback during survey administration. Items that subjects could not understand or answer by self-report were removed.

First, internal reliability analyses were conducted for each subscale (life scheme, self-efficacy) and for the SIWB scale with a goal of producing high internal consistency as measured by the Cronbach’s α (eg, > .70). Items that contributed to lower internal reliability were discarded, which removed 1 self-efficacy item and 6 life scheme items from the scale.

To further refine the SIWB and its subscales, the remaining items were subjected to principal components analysis by using Varimax rotation. After rotation, the 2 largest factors were readily interpretable, with items loading as expected: self-efficacy items loading on the first factor and life scheme items loading on the second factor. From each factor, the top 6 items ranked by loading magnitude were selected for inclusion into the final scale.

 

 

Internal reliabilities for the subscales (6 items each) and the SIWB scale (12 items total) were calculated. A maximum likelihood factor analysis with Varimax rotation also was conducted to verify that a 2-factor solution remained for the reduced 12-item scale.

Validity testing. Well-being is conceptually subsumed within the psychological domain of quality of life measures and is comprised of the dimensions of positive affect (affective) and subjective perceptions of general health and life satisfaction (cognitive).12 As a result, we determined concurrent construct validity by correlating the 2 6-item subscale scores and the total SIWB score with summed scores from the fear of death items from the DAP-R, the GDS, YOHL, the Physical Functioning Index from the SF-36, and the EuroQol. We anticipated positive correlations of the SIWB with physical functioning (SF-36) and quality of life (EuroQol) and inverse correlations with fear of death (DAP-R), depression (GDS), and self-reported poor health status (YOHL). Discriminant validity was examined by correlating the SIWB subscale and total scores with the religiosity measure. All analyses were performed with the Statistical Package for the Social Sciences version 9.0 (SPSS, Chicago, IL, 1996).

Results

Study population

Two hundred seventy-seven patients were in the final cohort and participated in the study Table 1. The mean age of the study population was 74 years, with a range of 65 to 90 years. Most participants (66%) were 75 years or younger, and the population was evenly distributed between males and females. Participants were predominantly white (78%), reported a wide range of education levels, and had a mean physical function score (SF-36) of 62.92 and a mean health status score (EuroQol) of 0.77.

Internal consistency and factor analysis

Twelve items, 6 each from the self-efficacy and life scheme subscales, remained from the original 40 items after item reduction; initial reliability testing and factor analysis were performed. This 12-item measure of the SIWB produced a coefficient α of .87, indicating good internal consistency. The 6-item subscales also demonstrated good reliability: .83 for self-efficacy and .80 for life scheme.

Results of factor analysis with individual items and item loadings for the final SIWB scale are presented in Table 2. A confirmatory approach anticipated 2 factors, which was based on our conceptual framework. Factor analysis found that 2 factors, reasonably named self-efficacy and life scheme, accounted for a substantial proportion of the variance in responses. The eigenvalue for the self-efficacy factor was 2.88, accounting for 24.04% of the total variance. The eigenvalue for the life scheme factor was 2.35, accounting for 19.57% of the total variance. A Pearson chi-square goodness of fit test of the difference between the actual and reproduced correlation patterns was not significant (51.72; df = 43; P = .17), which suggested that a 2-factor solution was reasonable. Table 3 contains the descriptive statistics for the SIWB scale and its subscales.

Validity testing

To provide a more consistent and intuitive interpretation of scores and correlations, SIWB total and subscale scores were produced by reverse scoring and summing items. As a result, higher SIWB scores indicated a greater degree of spirituality or its components. Correlations between the summed SIWB and subscale scores and other health-related measures of well-being are presented in Table 4. The SIWB and its subscales had significant and expected correlations in direction and magnitude with other measures related to subjective well-being. Fear of death and depression (GDS) had the highest inverse correlations with the SIWB and its subscales. Subjective perceptions of general health and life satisfaction, as measured by self-reports of poor health status (YOHL), functional quality of life (EuroQol), and physical functioning (SF-36) had significant correlations with the SIWB.

Although the life scheme subscale did have a significant but small correlation with a previously validated measure of religiosity, the total SIWB scale and self-efficacy subscale did not have a significant correlation with religiosity.

Discussion

The purpose of this study was to evaluate a brief research instrument designed to measure the effect of spirituality on subjective well-being in a patient population. Instruments that are developed to measure health-related quality of life are evaluated according to several criteria, most notably their degree of validity and reliability.28 The SIWB demonstrated very good reliability with good internal consistency for the total and subscales as assessed by α coefficient in a geriatric patient population.

The construct spirituality has multiple dimensions and connotations in health-related settings,29 which challenge the validity testing of any spirituality instrument. We chose a qualitative approach, rather than the use of experts or preexisting measures in health services research, pastoral theology and chaplaincy, and the social sciences, to conceptualize how patients understand and define spirituality, in particular as if affects their well-being. This approach also provided stimulus material for SIWB item selection and scale construction.

 

 

In our conceptual framework, spirituality within a health context is a state that is comprised primarily of the domains of life scheme and self-efficacy. Patients who report high self-efficacy beliefs regarding their functioning and who view their lives as purposeful and meaningful should score higher on measures of subjective well-being than those who do not hold such beliefs or attitudes. The use of concurrent construct validity testing allowed us to test this assumption through the correlation of SIWB scores with other established proxies of subjective well-being. Face validity may suggest that the SIWB is a measure of affective or cognitive states (eg, depression) or a proxy for self-efficacy and alienation rather than spirituality. Concurrent construct validity testing provided a means to determine the independence of the SIWB from an accepted measure of depression, the GDS.

Although the pilot version of the SIWB consisted of 40 items with positive and negative statements regarding life scheme and personal self-efficacy, only negative items remained after validity and reliability testing. One explanation for the exclusion of positive statements from the SIWB may involve the predominance of a specific component of subjective well-being in older persons, a low level of negative affect. There are several additional components of subjective well-being (eg, positive affect, satisfaction with work or other domains, and life satisfaction),30 that may not be as salient or as operational in an older population.

However, the SIWB consistently had significant and expected correlations in direction and magnitude with other established measures related to subjective well-being. Spirituality had the highest inverse correlations with fear of death, depression, and perceived health status, which are supportive of affective and cognitive dimensions of subjective well-being in our instrument. A modest correlation with the GDS also suggested that the SIWB is a measure that is independent of depression.

Discriminant validity testing was used to differentiate the SIWB from religiosity. The total SIWB scale did not have a significant correlation with a measure of religiosity that has been used in a geriatric population,27 although the life scheme subscale did have a significant but small (r = .18) correlation. The distinction between conceptualizations of religiosity and spirituality is a major consideration in measurement development,31 and there are other measures of spirituality that have been used in clinical and research settings. Virtually all are contaminated by the inclusion of items that assess religiosity.9 For example, the Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale contains items that measure the comfort and strength derived from religious faith, in addition to a sense of meaning, purpose, and peace in life.32 The Systems of Belief Inventory, which was designed for use in quality of life and psychosocial research examining illness adjustment, measures religious and spiritual beliefs and practices and the social support that accompanies those beliefs and practices.33

The Spiritual Well-Being Scale has been used widely in health care settings and consists of 2 subscales: a religious well-being subscale and an existential well-being subscale.34 Religious well-being is conceptualized as the quality of one’s relationship with God, whereas existential well-being includes characteristics such as life purpose, life satisfaction, and positive and negative life experiences. Scores from the Spiritual Well-Being Scale have been inversely correlated with measures of psychological well-being.

However, much of this unpublished research has been compromised by ceiling effects or an inability to detect differences in those who score high on the scale, particularly in religious populations35 and by a lack of peer review.36

Our study has several limitations. Our conceptualization of spirituality is a new construct based on qualitative research, and the study purpose was to evaluate the psychometric properties of a new instrument to measure this construct. As a result, we did not analyze or report normative data about the SIWB. Spirituality may have conceptual overlap with existing constructs, such as self-efficacy and alienation, and we did not evaluate the independence of our scale against these constructs. The SIWB was embedded in the final cohort of a longitudinal study, and we were unable to perform test-retest reliability to determine the stability and the responsiveness or sensitivity of the instrument over time. Due to subject burden, the parent study limited the inclusion of additional measures and the quality-of-life instruments were selected a priori.

Our cross-sectional design also did not allow us to draw any definitive conclusions about the causal relations of the variables. The study population consisted primarily of predominantly white, older patients with some functional limitations, and the generalizability of our findings to other populations is uncertain. However, good theory development and item construction from prior qualitative studies, a high α coefficient, and factor analysis support the validity and reliability of our measure.

 

 

In summary, the SIWB appears to be a valid and reliable measure of patient subjective well-being, one that is uncontaminated by the inclusion of religiosity. This instrument may be used in observational studies of chronic illness, aging, and end-of-life care that use spirituality as an explanatory or predictor variable of well-being. Future validation studies with multiple, diverse populations and a longitudinal design are needed to refine, modify, or verify the SIWB as an additional, complementary instrument of wellbeing.

Acknowledgments

We thank Lynn Maxwell, Annette Becker, Danielle Sirchak, Donna Clausen, June Belt, Marjoire Frank, and Lisa Rogers for their dedicated service in this study.

References

1. Wuthnow R. After Heaven: Spirituality in America Since 1950. Berkeley: University of California Press; 1998.

2. Furnham A. Explaining health and illness: lay perceptions on current and future health, the causes of illness, and the nature of recovery. Soc Sci Med 1994;39:715-25.

3. Ellison CG, Levin JS. The religion-health connection: evidence, theory, and future directions. Health Educ Behav 1998;25:700-20.

4. Newberg A, D’Aquili EG, Rause V. Why God Won’t Go Away: Brain Science and the Biology of Belief. New York: Ballentine; 2001.

5. Sloan RP, Bagiella E, Powell T. Religion, spirituality, and medicine. Lancet 1999;353:664-7.

6. Wulff DM. Psychology of Religion: Classic and Contemporary. 2nd ed. New York: John Wiley & Sons; 1997.

7. Johnstone RL. Religion in Society: A Sociology of Religion. 5th ed. Upper Saddle River, NJ: Prentice-Hall; 1997.

8. Fitchett G. Selected resources for screening for spiritual risk. Chaplaincy Today 1999;15:13-26.

9. Hill PC, Hood RW, eds. Measures of Religiosity. Birmingham, AL: Religious Education Press; 1999.

10. Musick MA. Religion and subjective health among black and white elders. J Health Soc Behav 1996;37:221-37.

11. Idler EL, Kasl SV. Religion among disabled and nondisabled persons II: attendance at religious service as a predictor of the course of disability. J Gerontol B Psychol Soc Sci 1997;52B:S306-16.

12. Testa MA, Simonson DC. Assessment of quality-of-life outcomes. N Engl J Med 1996;334:835-40.

13. Ellison CG. Religious involvement and subjective well-being. J Health Soc Behav 1991;32:80-99.

14. Daaleman TP, Cobb AK, Frey BB. Spirituality and well-being: an exploratory approach to the patient perspective. Soc Sci Med 2001;53:119-27.

15. Williams DR, Wilson CM. Race, ethnicity, and aging. In: Binstock RH, George LK, eds. Handbook of Aging and the Social Sciences. 5th ed. San Diego: Academic Press; 2001;160-78.

16. Gallup G, Lindsay DM. Surveying the Religious Landscape. Harrisburg, PA: Morehouse Publishing; 1999.

17. Daaleman TP, VandeCreek L. Placing religion and spirituality in end-of-life care. JAMA 2000;284:2514-7.

18. Department of Health and Human Services. Healthy people 2010: understanding and improving health. Available at: http://web.health.gov/healthypeople/. Accessed March 29, 2001.

19. Antonovsky A. Unraveling the Mystery of Health: How People Manage Stress and Stay Well. San Francisco: Jossey-Bass Press; 1987.

20. Bandura A. Self-Efficacy, the Exercise of Control. New York: WH Freeman; 1997.

21. EuroQol Group. EuroQol: a new facility for the measurement of health related quality of life. Health Policy 1990;16:199-208.

22. Erickson P, Wilson R, Shannon I. Years of Healthy Life. Healthy People 2000, Statistical Notes No. 7. Hyattsville, MD: Centers for Disease Control and Prevention/National Center for Health Statistics; April 1995. DHHS publication PHS 95-1237 4-1484. Available at: http://www.cdc.gov/nchs/data/statnt/statnt07.pdf.

23. Stewart AL, Hays RD, Ware JE. The MOS short-form general health survey. Med Care 1988;26:724-5.

24. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a geriatric depression screening scale. J Psychiatr Res 1982;17:37-49.

25. Wong PTP, Reker GT, Gesser G. Death attitude profile-revised: a multidimensional measure of attitudes toward death. In: Neimeyer RA, ed. Death Anxiety Handbook. Washington, DC: Taylor & Francis; 1994;121-48.

26. Davis JA, Smith TW. General Social Surveys, 1972-1985. Chicago: National Opinion Research Center; 1985.

27. Chatters LM, Levin JS, Taylor RJ. Antecedents and dimensions of religious involvement among older black adults. J Gerontol B Psychol Soc Sci 1992;47:S269-78.

28. McSweeny AJ, Creer TL. Health-related quality-of-life assessment in medical care. Dis Mon 1995;16:1-71.

29. Koenig HG, McCullough ME, Larson DB. Handbook of Religion and Health. New York: Oxford University Press; 2001.

30. Diener E. Subjective well-being, the science of happiness and a proposal for a national index. Am Psychol 2000;55:34-43.

31. Fetzer Institute/National Institute on Aging Working Group. Multidimensional Measurement of Religiousness/Spirituality for Use in Health Research. Kalamazoo, MI: John A. Fetzer Institute; 1999.

32. Brady MJ, Peterman AH, Fitchett G, Mo M, Cella D. A case for including spirituality in quality of life measurement in oncology. Psychooncology 1999;8:417-28.

33. Holland JC, Kash KM, Passik S, et al. A brief spiritual beliefs inventory for use in quality-of-life research in life-threatening illness. Psychooncology 1998;7:460-9.

34. Ellison CW. Spiritual well-being: conceptualization and measurement. J Psychol Theol 1983;11:330-40.

35. Ledbetter MF, Smith LA, Vosler-Hunter WL, Fischer JD. An evaluation of the research and clinical usefulness of the spiritual wellbeing scale. J Psychol Theol 1991;19:49-55.

36. Ellison CW, Smith J. Toward an integrative measure of health and well-being. J Psychol Theol 1991;19:35-48.

Address reprint requests to Timothy P. Daaleman, DO, Department of Family Medicine, University of North Carolina at Chapel Hill, CB 7595, Manning Drive, Chapel Hill, NC 27599-7595. E-mail: [email protected].

To submit a letter to the editor on this topic, click here: [email protected].

References

1. Wuthnow R. After Heaven: Spirituality in America Since 1950. Berkeley: University of California Press; 1998.

2. Furnham A. Explaining health and illness: lay perceptions on current and future health, the causes of illness, and the nature of recovery. Soc Sci Med 1994;39:715-25.

3. Ellison CG, Levin JS. The religion-health connection: evidence, theory, and future directions. Health Educ Behav 1998;25:700-20.

4. Newberg A, D’Aquili EG, Rause V. Why God Won’t Go Away: Brain Science and the Biology of Belief. New York: Ballentine; 2001.

5. Sloan RP, Bagiella E, Powell T. Religion, spirituality, and medicine. Lancet 1999;353:664-7.

6. Wulff DM. Psychology of Religion: Classic and Contemporary. 2nd ed. New York: John Wiley & Sons; 1997.

7. Johnstone RL. Religion in Society: A Sociology of Religion. 5th ed. Upper Saddle River, NJ: Prentice-Hall; 1997.

8. Fitchett G. Selected resources for screening for spiritual risk. Chaplaincy Today 1999;15:13-26.

9. Hill PC, Hood RW, eds. Measures of Religiosity. Birmingham, AL: Religious Education Press; 1999.

10. Musick MA. Religion and subjective health among black and white elders. J Health Soc Behav 1996;37:221-37.

11. Idler EL, Kasl SV. Religion among disabled and nondisabled persons II: attendance at religious service as a predictor of the course of disability. J Gerontol B Psychol Soc Sci 1997;52B:S306-16.

12. Testa MA, Simonson DC. Assessment of quality-of-life outcomes. N Engl J Med 1996;334:835-40.

13. Ellison CG. Religious involvement and subjective well-being. J Health Soc Behav 1991;32:80-99.

14. Daaleman TP, Cobb AK, Frey BB. Spirituality and well-being: an exploratory approach to the patient perspective. Soc Sci Med 2001;53:119-27.

15. Williams DR, Wilson CM. Race, ethnicity, and aging. In: Binstock RH, George LK, eds. Handbook of Aging and the Social Sciences. 5th ed. San Diego: Academic Press; 2001;160-78.

16. Gallup G, Lindsay DM. Surveying the Religious Landscape. Harrisburg, PA: Morehouse Publishing; 1999.

17. Daaleman TP, VandeCreek L. Placing religion and spirituality in end-of-life care. JAMA 2000;284:2514-7.

18. Department of Health and Human Services. Healthy people 2010: understanding and improving health. Available at: http://web.health.gov/healthypeople/. Accessed March 29, 2001.

19. Antonovsky A. Unraveling the Mystery of Health: How People Manage Stress and Stay Well. San Francisco: Jossey-Bass Press; 1987.

20. Bandura A. Self-Efficacy, the Exercise of Control. New York: WH Freeman; 1997.

21. EuroQol Group. EuroQol: a new facility for the measurement of health related quality of life. Health Policy 1990;16:199-208.

22. Erickson P, Wilson R, Shannon I. Years of Healthy Life. Healthy People 2000, Statistical Notes No. 7. Hyattsville, MD: Centers for Disease Control and Prevention/National Center for Health Statistics; April 1995. DHHS publication PHS 95-1237 4-1484. Available at: http://www.cdc.gov/nchs/data/statnt/statnt07.pdf.

23. Stewart AL, Hays RD, Ware JE. The MOS short-form general health survey. Med Care 1988;26:724-5.

24. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a geriatric depression screening scale. J Psychiatr Res 1982;17:37-49.

25. Wong PTP, Reker GT, Gesser G. Death attitude profile-revised: a multidimensional measure of attitudes toward death. In: Neimeyer RA, ed. Death Anxiety Handbook. Washington, DC: Taylor & Francis; 1994;121-48.

26. Davis JA, Smith TW. General Social Surveys, 1972-1985. Chicago: National Opinion Research Center; 1985.

27. Chatters LM, Levin JS, Taylor RJ. Antecedents and dimensions of religious involvement among older black adults. J Gerontol B Psychol Soc Sci 1992;47:S269-78.

28. McSweeny AJ, Creer TL. Health-related quality-of-life assessment in medical care. Dis Mon 1995;16:1-71.

29. Koenig HG, McCullough ME, Larson DB. Handbook of Religion and Health. New York: Oxford University Press; 2001.

30. Diener E. Subjective well-being, the science of happiness and a proposal for a national index. Am Psychol 2000;55:34-43.

31. Fetzer Institute/National Institute on Aging Working Group. Multidimensional Measurement of Religiousness/Spirituality for Use in Health Research. Kalamazoo, MI: John A. Fetzer Institute; 1999.

32. Brady MJ, Peterman AH, Fitchett G, Mo M, Cella D. A case for including spirituality in quality of life measurement in oncology. Psychooncology 1999;8:417-28.

33. Holland JC, Kash KM, Passik S, et al. A brief spiritual beliefs inventory for use in quality-of-life research in life-threatening illness. Psychooncology 1998;7:460-9.

34. Ellison CW. Spiritual well-being: conceptualization and measurement. J Psychol Theol 1983;11:330-40.

35. Ledbetter MF, Smith LA, Vosler-Hunter WL, Fischer JD. An evaluation of the research and clinical usefulness of the spiritual wellbeing scale. J Psychol Theol 1991;19:49-55.

36. Ellison CW, Smith J. Toward an integrative measure of health and well-being. J Psychol Theol 1991;19:35-48.

Address reprint requests to Timothy P. Daaleman, DO, Department of Family Medicine, University of North Carolina at Chapel Hill, CB 7595, Manning Drive, Chapel Hill, NC 27599-7595. E-mail: [email protected].

To submit a letter to the editor on this topic, click here: [email protected].

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
1-1
Page Number
1-1
Publications
Publications
Article Type
Display Headline
The Spirituality Index of Well-Being: Development and testing of a new measure
Display Headline
The Spirituality Index of Well-Being: Development and testing of a new measure
Legacy Keywords
,Quality of lifesubjective well-beingmeasurementspiritualityolder persons. (J Fam Pract 2002; 51:00-00)
Legacy Keywords
,Quality of lifesubjective well-beingmeasurementspiritualityolder persons. (J Fam Pract 2002; 51:00-00)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

Intention-to-treat analysis: Who is in? Who is out?

Article Type
Changed
Mon, 01/14/2019 - 11:38
Display Headline
Intention-to-treat analysis: Who is in? Who is out?

KEY POINTS FOR CLINICIANS

  • Including all randomized subjects when analyzing randomized controlled trials—the “intention-to-treat” principle—is an important factor in minimizing bias.
  • Studies have found that fewer than half of randomized controlled trials reported intention-to-treat analysis.
  • Among studies reporting intention-to-treat analyses, fewer than half actually analyzed all randomized subjects.

The randomized controlled trial (RCT) has become the most important test of therapeutic benefit.1 When evaluating an RCT, readers should determine whether the analysis was by intention to treat (ITT).1-4 ITT analysis, often described as “once randomized, always analyzed,”5 is the practice of attributing all participants to the group to which they were randomized, regardless of what subsequently occurred.2,6,7 ITT analysis avoids the problems created by omitting dropouts and noncompliant patients, which can negate randomization, introduce bias, and overestimate clinical effectiveness.2,8

Surveys of the literature found that ITT analysis was reported in 7% to 48% of RCTs8-10; however, reporting ITT analysis does not guarantee that the analysis was conducted properly or that the results promoted by the authors were derived from the ITT analysis. For articles reporting ITT analysis of an RCT, we specifically examined which participants were included in the analysis.

Methods

We searched MEDLINE for abstracts that included the text words “intention to treat” or “intent to treat,” limiting the results to randomized controlled trials published in English during 1999. We entered the resultant studies in a database (FileMaker Pro 4.0; FileMaker, Inc., Santa Clara, CA), ordered them using the database’s random number function, and reviewed the first 100 eligible studies.

Two of us (in a rotating fashion cycling through each pair-wise combination of the 6 authors) were systematically assigned to review each article. We used a structured form (available on request from the authors) to evaluate each article for the number of subjects randomized, the number in the ITT analysis, the number in the primary analysis, which categories of subjects were in the ITT analysis, and where ITT was defined within the article. We defined the primary analysis as the most prominently featured outcome in the abstract. Two of us (R.L.K., J.J.S.) independently assessed whether articles contained a diagram showing the flow of participants through each stage, a feature strongly recommended in the Consolidated Standards of Reporting Trials (CONSORT).3 All coauthors discussed discrepant results and made final determinations using majority voting. We conducted all analyses using SAS software (SAS system for Windows, Release 8.0; SAS Institute, Cary, NC).

Results

The MEDLINE search identified 335 studies. We reviewed 129 articles to obtain 100 eligible studies (Figure). We excluded articles for the following reasons: the words “intent” or “intention” and “treat” did not refer to a method of analysis (14); the study was not an RCT (5); the study involved randomized groups (eg, villages or hospitals) rather than individuals (4); the study presented a secondary analysis of a previously published study (4); the study used a crossover design (1) or described a trial protocol without results (1). The paired reviewers agreed on all abstracted data elements for 83 articles; there was a disagreement on one or more items for the remainder, which were determined by committee.

Of 100 studies selected, 42 included all randomized subjects in the ITT analysis (Table). Among those studies that excluded randomized patients from analysis, the most common reasons given were that the patients received no follow-up after randomization (16) or received none of the allocated treatment (14). For 13 studies, we could not determine which categories of participants were excluded from the ITT analysis.

We used the number of subjects randomized, the number in the ITT analysis, and the number in the primary analysis to determine the proportion of randomized subjects in both the ITT and primary analyses. Ideally, 100% of randomized subjects should be included in the ITT analysis. The proportion of randomized subjects included in the ITT analysis could be determined for 92 studies, and ranged from 69% to 100%, with a median of 99%. Nineteen of the 92 studies (21%) excluded more than 5% of randomized participants from the ITT analysis, while 10 studies (11%) excluded more than 10%.

We could determine the proportion of randomized participants in the primary analysis in 93 studies; it varied from 49% to 100%, with a median of 98.7%. Ten of the 93 studies (11%) excluded more than 20% of participants from the primary analysis. In 16 of the 93 studies (17%), a non-ITT analysis (eg, “per protocol”) was presented as the primary analysis. In these studies, an average of 80.1% (median, 82.4%; range, 49.0% to 92.4%) of randomized patients were included in the primary analysis.

 

 

Fifty-six studies included a definition of the ITT population, primarily within the methods (38) and results (18) sections. Of the 42 studies where all randomized subjects were analyzed, 20 included definitions of ITT. Diagrams showing the flow of participants through each trial were present in 41 of 100 articles, including 1 on a journal’s web site. An additional 8 articles had diagrams that showed patient flow without giving the number of patients. Presence of a flow diagram was not related to whether or not all randomized subjects were included in the ITT analysis (36% vs 45% respectively, P = .37). Of the 31 articles from journals that participate in CONSORT, 29 included flow diagrams, compared with 12 of the 69 articles from journals that do not participate in CONSORT (P < .0001).

TABLE
Categories of randomized patients excluded from ITT analysis*

CategoryNumber of studies
All randomized subjects were analyzed (true intention to treat)42
Some randomized subjects were excluded58
  Subjects found not to meet entry criteria12
  Subjects who did not receive any of the assigned treatment14
  Subjects who received some but not all of the assigned treatment1
  Subjects with no follow-up after randomization16
  Subjects with some but not all follow-up achieved1
  Subjects who dropped out for selected reasons4
  Subjects with specific protocol violations2
  Subjects with protocol violations but details not given2
  Other9
  Author needs to be contacted to determine who was in the ITT group13
*Reports of 100 randomized trials were analyzed. Studies could have more than one group of excluded subjects. ITT, intention to treat.

Discussion

The hallmark of ITT analysis is that all randomized subjects are analyzed.7 In more than half of the articles we examined, this was not the case. Analysis of only certain subgroups of patients is sometimes appropriate, but an explanation should be provided whenever subjects are left out of any analysis. For example, we examined a report of a trial that was stopped based on the results of an interim analysis, thus excluding subjects who were randomized after the interim analysis.11 This type of exclusion, based on an a priori decision rather than individual characteristics or behavior, is less likely to bias results.

While all the articles in our sample reported analysis by ITT, many authors did not define the term, even when they excluded some randomized subjects from the ITT analysis. In these cases, the reader is left to infer which subjects were excluded based on information given in the text, figures, and tables.

Despite numerous recommendations for detailed reporting of RCT methods,1-4 many articles were vague and lacked detail. We could not determine which categories of participants were excluded from the ITT analysis in 13 articles. In 8 of the 100 articles we examined, we could not determine how many subjects were randomized or included in the ITT or primary analysis. Four of these 8 articles were in journals that endorsed the CONSORT statement. All were published well after the initial CONSORT statement was released in 1996.1

The number of randomized subjects excluded from the ITT analysis was usually small. It is unlikely that excluding up to 1% of subjects had a major effect on the results. In 11% of our sample, however, more than 10% of randomized subjects were excluded. Exclusions of this magnitude have significant potential to alter the findings. When outcome data can’t be determined and the outcome is categorical (eg, alive/dead), it can be helpful to produce best-case and worst-case scenarios in which patients lost to follow-up are arbitrarily ascribed good or bad outcomes. These extremes delimit the potential effect of the exclusions on results.12 Similarly, missing continuous outcomes (eg, weight change) can be assigned specific values to determine the potential impact on the results.

We assessed only articles that mentioned ITT in the abstract, so we probably missed some studies that used ITT analysis; however, we doubt that this caused us to significantly underestimate accurate use of the term ITT. The articles came from a wide spectrum of journals (62), of which 21 were listed in the Abridged Index Medicus subset. The 17 articles requiring a committee vote described the analytic process in terms that were often vague and ambiguous. In these cases, we cannot be certain that we correctly interpreted the authors’ methods; most readers would have similar difficulties.

We found considerable variation in how the term ITT was used in reports of RCTs. Fewer than half of the reports we examined included all randomized subjects in the ITT analysis. While exclusions were negligible in many cases, more than 10% of the subjects were excluded in 10% of the trials. In 7 trials, including some drawn from journals that endorse the CONSORT statement, it was not even possible to determine the number of subjects included in the ITT analysis. These problems highlight the continued need for better reporting of clinical trials.

References

1. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA 1996;276:637-9.

2. Guyatt GH, Sackett DL, Cook DJ. , for the Evidence-Based Medicine Working Group. Users’ guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? JAMA 1993;270:2598-601.

3. Moher D, Schulz KF, Altman D. for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-91.

4. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ 2001;165:1339-41.

5. Hennekens CH, Buring JE, Mayrent SL. Epidemiology in Medicine. 1st ed. Boston: Little, Brown; 1987;207.-

6. Altman DG, Schulz KF, Moher D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001;134:663-94.

7. Lewis JA, Machin D. Intention to treat—who should use ITT? Br J Cancer 1993;68:647-50.

8. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999;319:670-4.

9. Schulz KF, Grimes DA, Altman DG, Hayes RJ. Blinding and exclusions after allocation in randomised controlled trials: survey of published parallel group trials in obstetrics and gynaecology. BMJ 1996;312:742-4.

10. Ruiz-Canela M, Martínez-González MA, de Irala-Estévez J. Intention to treat analysis is related to methodological quality. BMJ 2000;320:1007-8.

11. Interferon-alpha and survival in metastatic renal carcinoma: early results of a randomised controlled trial. Medical Research Council Renal Cancer Collaborators. Lancet 1999;353:14-7.

12. Winnock M, Rancinan C, De Ledinghen V, Couzigou P, Chene G. What hides behind an intention-to-treat analysis? Hepatology 2001;33:1014-5.

Article PDF
Author and Disclosure Information

ROBIN L. KRUSE, PHD
BRIAN S. ALPER, MD, MSPH
CARIN REUST, MD, MSPH
JAMES J. STEVERMER, MD, MSPH
SCOTT SHANNON, MD
RANDY H. WILLIAMS, PHD
Columbia, Hallsville, and Jefferson City, Missouri
From the Department of Family and Community Medicine, University of Missouri-Columbia School of Medicine, Columbia, MO (R.L.K., B.S.A., J.J.S., S.S.); University Physicians-Hallsville, Hallsville, MO (C.R.); and the Missouri Department of Health and Senior Services, Division of Chronic Disease Prevention and Health Promotion, Office of Surveillance, Research and Evaluation, Jefferson City, MO (R.H.W.). The authors report no competing interests. Address reprint requests to Robin L. Kruse, PhD, Department of Family and Community Medicine, University of Missouri-Columbia School of Medicine, M228 Medical Sciences Building, Columbia, MO 65212. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
969-971
Legacy Keywords
,Randomized controlled trialsresearch designrandom allocationintention to treat. (J Fam Pract 2002; 51:969-971)
Sections
Author and Disclosure Information

ROBIN L. KRUSE, PHD
BRIAN S. ALPER, MD, MSPH
CARIN REUST, MD, MSPH
JAMES J. STEVERMER, MD, MSPH
SCOTT SHANNON, MD
RANDY H. WILLIAMS, PHD
Columbia, Hallsville, and Jefferson City, Missouri
From the Department of Family and Community Medicine, University of Missouri-Columbia School of Medicine, Columbia, MO (R.L.K., B.S.A., J.J.S., S.S.); University Physicians-Hallsville, Hallsville, MO (C.R.); and the Missouri Department of Health and Senior Services, Division of Chronic Disease Prevention and Health Promotion, Office of Surveillance, Research and Evaluation, Jefferson City, MO (R.H.W.). The authors report no competing interests. Address reprint requests to Robin L. Kruse, PhD, Department of Family and Community Medicine, University of Missouri-Columbia School of Medicine, M228 Medical Sciences Building, Columbia, MO 65212. E-mail: [email protected].

Author and Disclosure Information

ROBIN L. KRUSE, PHD
BRIAN S. ALPER, MD, MSPH
CARIN REUST, MD, MSPH
JAMES J. STEVERMER, MD, MSPH
SCOTT SHANNON, MD
RANDY H. WILLIAMS, PHD
Columbia, Hallsville, and Jefferson City, Missouri
From the Department of Family and Community Medicine, University of Missouri-Columbia School of Medicine, Columbia, MO (R.L.K., B.S.A., J.J.S., S.S.); University Physicians-Hallsville, Hallsville, MO (C.R.); and the Missouri Department of Health and Senior Services, Division of Chronic Disease Prevention and Health Promotion, Office of Surveillance, Research and Evaluation, Jefferson City, MO (R.H.W.). The authors report no competing interests. Address reprint requests to Robin L. Kruse, PhD, Department of Family and Community Medicine, University of Missouri-Columbia School of Medicine, M228 Medical Sciences Building, Columbia, MO 65212. E-mail: [email protected].

Article PDF
Article PDF

KEY POINTS FOR CLINICIANS

  • Including all randomized subjects when analyzing randomized controlled trials—the “intention-to-treat” principle—is an important factor in minimizing bias.
  • Studies have found that fewer than half of randomized controlled trials reported intention-to-treat analysis.
  • Among studies reporting intention-to-treat analyses, fewer than half actually analyzed all randomized subjects.

The randomized controlled trial (RCT) has become the most important test of therapeutic benefit.1 When evaluating an RCT, readers should determine whether the analysis was by intention to treat (ITT).1-4 ITT analysis, often described as “once randomized, always analyzed,”5 is the practice of attributing all participants to the group to which they were randomized, regardless of what subsequently occurred.2,6,7 ITT analysis avoids the problems created by omitting dropouts and noncompliant patients, which can negate randomization, introduce bias, and overestimate clinical effectiveness.2,8

Surveys of the literature found that ITT analysis was reported in 7% to 48% of RCTs8-10; however, reporting ITT analysis does not guarantee that the analysis was conducted properly or that the results promoted by the authors were derived from the ITT analysis. For articles reporting ITT analysis of an RCT, we specifically examined which participants were included in the analysis.

Methods

We searched MEDLINE for abstracts that included the text words “intention to treat” or “intent to treat,” limiting the results to randomized controlled trials published in English during 1999. We entered the resultant studies in a database (FileMaker Pro 4.0; FileMaker, Inc., Santa Clara, CA), ordered them using the database’s random number function, and reviewed the first 100 eligible studies.

Two of us (in a rotating fashion cycling through each pair-wise combination of the 6 authors) were systematically assigned to review each article. We used a structured form (available on request from the authors) to evaluate each article for the number of subjects randomized, the number in the ITT analysis, the number in the primary analysis, which categories of subjects were in the ITT analysis, and where ITT was defined within the article. We defined the primary analysis as the most prominently featured outcome in the abstract. Two of us (R.L.K., J.J.S.) independently assessed whether articles contained a diagram showing the flow of participants through each stage, a feature strongly recommended in the Consolidated Standards of Reporting Trials (CONSORT).3 All coauthors discussed discrepant results and made final determinations using majority voting. We conducted all analyses using SAS software (SAS system for Windows, Release 8.0; SAS Institute, Cary, NC).

Results

The MEDLINE search identified 335 studies. We reviewed 129 articles to obtain 100 eligible studies (Figure). We excluded articles for the following reasons: the words “intent” or “intention” and “treat” did not refer to a method of analysis (14); the study was not an RCT (5); the study involved randomized groups (eg, villages or hospitals) rather than individuals (4); the study presented a secondary analysis of a previously published study (4); the study used a crossover design (1) or described a trial protocol without results (1). The paired reviewers agreed on all abstracted data elements for 83 articles; there was a disagreement on one or more items for the remainder, which were determined by committee.

Of 100 studies selected, 42 included all randomized subjects in the ITT analysis (Table). Among those studies that excluded randomized patients from analysis, the most common reasons given were that the patients received no follow-up after randomization (16) or received none of the allocated treatment (14). For 13 studies, we could not determine which categories of participants were excluded from the ITT analysis.

We used the number of subjects randomized, the number in the ITT analysis, and the number in the primary analysis to determine the proportion of randomized subjects in both the ITT and primary analyses. Ideally, 100% of randomized subjects should be included in the ITT analysis. The proportion of randomized subjects included in the ITT analysis could be determined for 92 studies, and ranged from 69% to 100%, with a median of 99%. Nineteen of the 92 studies (21%) excluded more than 5% of randomized participants from the ITT analysis, while 10 studies (11%) excluded more than 10%.

We could determine the proportion of randomized participants in the primary analysis in 93 studies; it varied from 49% to 100%, with a median of 98.7%. Ten of the 93 studies (11%) excluded more than 20% of participants from the primary analysis. In 16 of the 93 studies (17%), a non-ITT analysis (eg, “per protocol”) was presented as the primary analysis. In these studies, an average of 80.1% (median, 82.4%; range, 49.0% to 92.4%) of randomized patients were included in the primary analysis.

 

 

Fifty-six studies included a definition of the ITT population, primarily within the methods (38) and results (18) sections. Of the 42 studies where all randomized subjects were analyzed, 20 included definitions of ITT. Diagrams showing the flow of participants through each trial were present in 41 of 100 articles, including 1 on a journal’s web site. An additional 8 articles had diagrams that showed patient flow without giving the number of patients. Presence of a flow diagram was not related to whether or not all randomized subjects were included in the ITT analysis (36% vs 45% respectively, P = .37). Of the 31 articles from journals that participate in CONSORT, 29 included flow diagrams, compared with 12 of the 69 articles from journals that do not participate in CONSORT (P < .0001).

TABLE
Categories of randomized patients excluded from ITT analysis*

CategoryNumber of studies
All randomized subjects were analyzed (true intention to treat)42
Some randomized subjects were excluded58
  Subjects found not to meet entry criteria12
  Subjects who did not receive any of the assigned treatment14
  Subjects who received some but not all of the assigned treatment1
  Subjects with no follow-up after randomization16
  Subjects with some but not all follow-up achieved1
  Subjects who dropped out for selected reasons4
  Subjects with specific protocol violations2
  Subjects with protocol violations but details not given2
  Other9
  Author needs to be contacted to determine who was in the ITT group13
*Reports of 100 randomized trials were analyzed. Studies could have more than one group of excluded subjects. ITT, intention to treat.

Discussion

The hallmark of ITT analysis is that all randomized subjects are analyzed.7 In more than half of the articles we examined, this was not the case. Analysis of only certain subgroups of patients is sometimes appropriate, but an explanation should be provided whenever subjects are left out of any analysis. For example, we examined a report of a trial that was stopped based on the results of an interim analysis, thus excluding subjects who were randomized after the interim analysis.11 This type of exclusion, based on an a priori decision rather than individual characteristics or behavior, is less likely to bias results.

While all the articles in our sample reported analysis by ITT, many authors did not define the term, even when they excluded some randomized subjects from the ITT analysis. In these cases, the reader is left to infer which subjects were excluded based on information given in the text, figures, and tables.

Despite numerous recommendations for detailed reporting of RCT methods,1-4 many articles were vague and lacked detail. We could not determine which categories of participants were excluded from the ITT analysis in 13 articles. In 8 of the 100 articles we examined, we could not determine how many subjects were randomized or included in the ITT or primary analysis. Four of these 8 articles were in journals that endorsed the CONSORT statement. All were published well after the initial CONSORT statement was released in 1996.1

The number of randomized subjects excluded from the ITT analysis was usually small. It is unlikely that excluding up to 1% of subjects had a major effect on the results. In 11% of our sample, however, more than 10% of randomized subjects were excluded. Exclusions of this magnitude have significant potential to alter the findings. When outcome data can’t be determined and the outcome is categorical (eg, alive/dead), it can be helpful to produce best-case and worst-case scenarios in which patients lost to follow-up are arbitrarily ascribed good or bad outcomes. These extremes delimit the potential effect of the exclusions on results.12 Similarly, missing continuous outcomes (eg, weight change) can be assigned specific values to determine the potential impact on the results.

We assessed only articles that mentioned ITT in the abstract, so we probably missed some studies that used ITT analysis; however, we doubt that this caused us to significantly underestimate accurate use of the term ITT. The articles came from a wide spectrum of journals (62), of which 21 were listed in the Abridged Index Medicus subset. The 17 articles requiring a committee vote described the analytic process in terms that were often vague and ambiguous. In these cases, we cannot be certain that we correctly interpreted the authors’ methods; most readers would have similar difficulties.

We found considerable variation in how the term ITT was used in reports of RCTs. Fewer than half of the reports we examined included all randomized subjects in the ITT analysis. While exclusions were negligible in many cases, more than 10% of the subjects were excluded in 10% of the trials. In 7 trials, including some drawn from journals that endorse the CONSORT statement, it was not even possible to determine the number of subjects included in the ITT analysis. These problems highlight the continued need for better reporting of clinical trials.

KEY POINTS FOR CLINICIANS

  • Including all randomized subjects when analyzing randomized controlled trials—the “intention-to-treat” principle—is an important factor in minimizing bias.
  • Studies have found that fewer than half of randomized controlled trials reported intention-to-treat analysis.
  • Among studies reporting intention-to-treat analyses, fewer than half actually analyzed all randomized subjects.

The randomized controlled trial (RCT) has become the most important test of therapeutic benefit.1 When evaluating an RCT, readers should determine whether the analysis was by intention to treat (ITT).1-4 ITT analysis, often described as “once randomized, always analyzed,”5 is the practice of attributing all participants to the group to which they were randomized, regardless of what subsequently occurred.2,6,7 ITT analysis avoids the problems created by omitting dropouts and noncompliant patients, which can negate randomization, introduce bias, and overestimate clinical effectiveness.2,8

Surveys of the literature found that ITT analysis was reported in 7% to 48% of RCTs8-10; however, reporting ITT analysis does not guarantee that the analysis was conducted properly or that the results promoted by the authors were derived from the ITT analysis. For articles reporting ITT analysis of an RCT, we specifically examined which participants were included in the analysis.

Methods

We searched MEDLINE for abstracts that included the text words “intention to treat” or “intent to treat,” limiting the results to randomized controlled trials published in English during 1999. We entered the resultant studies in a database (FileMaker Pro 4.0; FileMaker, Inc., Santa Clara, CA), ordered them using the database’s random number function, and reviewed the first 100 eligible studies.

Two of us (in a rotating fashion cycling through each pair-wise combination of the 6 authors) were systematically assigned to review each article. We used a structured form (available on request from the authors) to evaluate each article for the number of subjects randomized, the number in the ITT analysis, the number in the primary analysis, which categories of subjects were in the ITT analysis, and where ITT was defined within the article. We defined the primary analysis as the most prominently featured outcome in the abstract. Two of us (R.L.K., J.J.S.) independently assessed whether articles contained a diagram showing the flow of participants through each stage, a feature strongly recommended in the Consolidated Standards of Reporting Trials (CONSORT).3 All coauthors discussed discrepant results and made final determinations using majority voting. We conducted all analyses using SAS software (SAS system for Windows, Release 8.0; SAS Institute, Cary, NC).

Results

The MEDLINE search identified 335 studies. We reviewed 129 articles to obtain 100 eligible studies (Figure). We excluded articles for the following reasons: the words “intent” or “intention” and “treat” did not refer to a method of analysis (14); the study was not an RCT (5); the study involved randomized groups (eg, villages or hospitals) rather than individuals (4); the study presented a secondary analysis of a previously published study (4); the study used a crossover design (1) or described a trial protocol without results (1). The paired reviewers agreed on all abstracted data elements for 83 articles; there was a disagreement on one or more items for the remainder, which were determined by committee.

Of 100 studies selected, 42 included all randomized subjects in the ITT analysis (Table). Among those studies that excluded randomized patients from analysis, the most common reasons given were that the patients received no follow-up after randomization (16) or received none of the allocated treatment (14). For 13 studies, we could not determine which categories of participants were excluded from the ITT analysis.

We used the number of subjects randomized, the number in the ITT analysis, and the number in the primary analysis to determine the proportion of randomized subjects in both the ITT and primary analyses. Ideally, 100% of randomized subjects should be included in the ITT analysis. The proportion of randomized subjects included in the ITT analysis could be determined for 92 studies, and ranged from 69% to 100%, with a median of 99%. Nineteen of the 92 studies (21%) excluded more than 5% of randomized participants from the ITT analysis, while 10 studies (11%) excluded more than 10%.

We could determine the proportion of randomized participants in the primary analysis in 93 studies; it varied from 49% to 100%, with a median of 98.7%. Ten of the 93 studies (11%) excluded more than 20% of participants from the primary analysis. In 16 of the 93 studies (17%), a non-ITT analysis (eg, “per protocol”) was presented as the primary analysis. In these studies, an average of 80.1% (median, 82.4%; range, 49.0% to 92.4%) of randomized patients were included in the primary analysis.

 

 

Fifty-six studies included a definition of the ITT population, primarily within the methods (38) and results (18) sections. Of the 42 studies where all randomized subjects were analyzed, 20 included definitions of ITT. Diagrams showing the flow of participants through each trial were present in 41 of 100 articles, including 1 on a journal’s web site. An additional 8 articles had diagrams that showed patient flow without giving the number of patients. Presence of a flow diagram was not related to whether or not all randomized subjects were included in the ITT analysis (36% vs 45% respectively, P = .37). Of the 31 articles from journals that participate in CONSORT, 29 included flow diagrams, compared with 12 of the 69 articles from journals that do not participate in CONSORT (P < .0001).

TABLE
Categories of randomized patients excluded from ITT analysis*

CategoryNumber of studies
All randomized subjects were analyzed (true intention to treat)42
Some randomized subjects were excluded58
  Subjects found not to meet entry criteria12
  Subjects who did not receive any of the assigned treatment14
  Subjects who received some but not all of the assigned treatment1
  Subjects with no follow-up after randomization16
  Subjects with some but not all follow-up achieved1
  Subjects who dropped out for selected reasons4
  Subjects with specific protocol violations2
  Subjects with protocol violations but details not given2
  Other9
  Author needs to be contacted to determine who was in the ITT group13
*Reports of 100 randomized trials were analyzed. Studies could have more than one group of excluded subjects. ITT, intention to treat.

Discussion

The hallmark of ITT analysis is that all randomized subjects are analyzed.7 In more than half of the articles we examined, this was not the case. Analysis of only certain subgroups of patients is sometimes appropriate, but an explanation should be provided whenever subjects are left out of any analysis. For example, we examined a report of a trial that was stopped based on the results of an interim analysis, thus excluding subjects who were randomized after the interim analysis.11 This type of exclusion, based on an a priori decision rather than individual characteristics or behavior, is less likely to bias results.

While all the articles in our sample reported analysis by ITT, many authors did not define the term, even when they excluded some randomized subjects from the ITT analysis. In these cases, the reader is left to infer which subjects were excluded based on information given in the text, figures, and tables.

Despite numerous recommendations for detailed reporting of RCT methods,1-4 many articles were vague and lacked detail. We could not determine which categories of participants were excluded from the ITT analysis in 13 articles. In 8 of the 100 articles we examined, we could not determine how many subjects were randomized or included in the ITT or primary analysis. Four of these 8 articles were in journals that endorsed the CONSORT statement. All were published well after the initial CONSORT statement was released in 1996.1

The number of randomized subjects excluded from the ITT analysis was usually small. It is unlikely that excluding up to 1% of subjects had a major effect on the results. In 11% of our sample, however, more than 10% of randomized subjects were excluded. Exclusions of this magnitude have significant potential to alter the findings. When outcome data can’t be determined and the outcome is categorical (eg, alive/dead), it can be helpful to produce best-case and worst-case scenarios in which patients lost to follow-up are arbitrarily ascribed good or bad outcomes. These extremes delimit the potential effect of the exclusions on results.12 Similarly, missing continuous outcomes (eg, weight change) can be assigned specific values to determine the potential impact on the results.

We assessed only articles that mentioned ITT in the abstract, so we probably missed some studies that used ITT analysis; however, we doubt that this caused us to significantly underestimate accurate use of the term ITT. The articles came from a wide spectrum of journals (62), of which 21 were listed in the Abridged Index Medicus subset. The 17 articles requiring a committee vote described the analytic process in terms that were often vague and ambiguous. In these cases, we cannot be certain that we correctly interpreted the authors’ methods; most readers would have similar difficulties.

We found considerable variation in how the term ITT was used in reports of RCTs. Fewer than half of the reports we examined included all randomized subjects in the ITT analysis. While exclusions were negligible in many cases, more than 10% of the subjects were excluded in 10% of the trials. In 7 trials, including some drawn from journals that endorse the CONSORT statement, it was not even possible to determine the number of subjects included in the ITT analysis. These problems highlight the continued need for better reporting of clinical trials.

References

1. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA 1996;276:637-9.

2. Guyatt GH, Sackett DL, Cook DJ. , for the Evidence-Based Medicine Working Group. Users’ guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? JAMA 1993;270:2598-601.

3. Moher D, Schulz KF, Altman D. for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-91.

4. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ 2001;165:1339-41.

5. Hennekens CH, Buring JE, Mayrent SL. Epidemiology in Medicine. 1st ed. Boston: Little, Brown; 1987;207.-

6. Altman DG, Schulz KF, Moher D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001;134:663-94.

7. Lewis JA, Machin D. Intention to treat—who should use ITT? Br J Cancer 1993;68:647-50.

8. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999;319:670-4.

9. Schulz KF, Grimes DA, Altman DG, Hayes RJ. Blinding and exclusions after allocation in randomised controlled trials: survey of published parallel group trials in obstetrics and gynaecology. BMJ 1996;312:742-4.

10. Ruiz-Canela M, Martínez-González MA, de Irala-Estévez J. Intention to treat analysis is related to methodological quality. BMJ 2000;320:1007-8.

11. Interferon-alpha and survival in metastatic renal carcinoma: early results of a randomised controlled trial. Medical Research Council Renal Cancer Collaborators. Lancet 1999;353:14-7.

12. Winnock M, Rancinan C, De Ledinghen V, Couzigou P, Chene G. What hides behind an intention-to-treat analysis? Hepatology 2001;33:1014-5.

References

1. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA 1996;276:637-9.

2. Guyatt GH, Sackett DL, Cook DJ. , for the Evidence-Based Medicine Working Group. Users’ guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? JAMA 1993;270:2598-601.

3. Moher D, Schulz KF, Altman D. for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-91.

4. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ 2001;165:1339-41.

5. Hennekens CH, Buring JE, Mayrent SL. Epidemiology in Medicine. 1st ed. Boston: Little, Brown; 1987;207.-

6. Altman DG, Schulz KF, Moher D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001;134:663-94.

7. Lewis JA, Machin D. Intention to treat—who should use ITT? Br J Cancer 1993;68:647-50.

8. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999;319:670-4.

9. Schulz KF, Grimes DA, Altman DG, Hayes RJ. Blinding and exclusions after allocation in randomised controlled trials: survey of published parallel group trials in obstetrics and gynaecology. BMJ 1996;312:742-4.

10. Ruiz-Canela M, Martínez-González MA, de Irala-Estévez J. Intention to treat analysis is related to methodological quality. BMJ 2000;320:1007-8.

11. Interferon-alpha and survival in metastatic renal carcinoma: early results of a randomised controlled trial. Medical Research Council Renal Cancer Collaborators. Lancet 1999;353:14-7.

12. Winnock M, Rancinan C, De Ledinghen V, Couzigou P, Chene G. What hides behind an intention-to-treat analysis? Hepatology 2001;33:1014-5.

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
969-971
Page Number
969-971
Publications
Publications
Article Type
Display Headline
Intention-to-treat analysis: Who is in? Who is out?
Display Headline
Intention-to-treat analysis: Who is in? Who is out?
Legacy Keywords
,Randomized controlled trialsresearch designrandom allocationintention to treat. (J Fam Pract 2002; 51:969-971)
Legacy Keywords
,Randomized controlled trialsresearch designrandom allocationintention to treat. (J Fam Pract 2002; 51:969-971)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

Classification of medical errors and preventable adverse events in primary care: A synthesis of the literature

Article Type
Changed
Mon, 01/14/2019 - 10:57
Display Headline
Classification of medical errors and preventable adverse events in primary care: A synthesis of the literature

ABSTRACT

OBJECTIVE: To describe and classify process errors and preventable adverse events that occur from medical care in outpatient primary care settings.

STUDY DESIGN: Systematic review and synthesis of the medical literature.

DATA SOURCES: We searched MEDLINE and the Cochrane Library from 1965 through March 2001 with the MESH term medical errors, modified by adding family practice, primary health care, physicians/family, or ambulatory care and limited the search to English-language publications. Published bibliographies and Web sites from patient safety and primary care organizations were also reviewed for unpublished reports, presentations, and leads to other sites, journals, or investigators with relevant work. Additional papers were identified from the references of the papers reviewed and from seminal papers in the field.

OUTCOMES MEASURED: Process errors and preventable adverse events.

RESULTS: Four original research studies directly studied and described medical errors and adverse events in primary care, and 3 other studies peripherally addressed primary care medical errors. A variety of quantitative and qualitative methods were used in the studies. Extraction of results from the studies led to a classification of 3 main categories of preventable adverse events: diagnosis, treatment, and preventive services. Process errors were classified into 4 categories: clinician, communication, administration, and blunt end.

CONCLUSIONS: Original research on medical errors in the primary care setting consists of a limited number of small studies that offer a rich description of medical errors and preventable adverse events primarily from the physician’s viewpoint. We describe a classification derived from these studies that is based on the actual practice of primary care and provides a starting point for future epidemiologic and interventional research. Missing are studies that have patient, consumer, or other health care provider input.

KEY POINTS FOR CLINICIANS

  • Little is known about medical errors and preventable adverse events in the primary care setting.
  • Preventable adverse events reported from primary care practices include diagnostic, treatment, and preventive care incidents.
  • Process errors reported from primary care practices can be categorized as clinician factors (judgment, decision making, skill execution), communication factors (between clinician and patient and between health care providers), administration factors (office and personnel issues), and blunt end factors (insurance and government regulations).
  • Current knowledge of errors and preventable adverse events in primary care is missing input from patients and other health care providers.

Every primary care clinician in the United States knows the frustration of lost charts, misplaced reports, and messages from patients that should have been answered yesterday. These are some of the common frustrations and failures in day-to-day clinical practice. Many clinicians also know the guilt, shame, and self-doubt that occur when patients suffer a serious complication or die due to a mistake made by the clinician, health care team, or health care system. Between the common frustrations of practice and the rare patient death due to an error lies a large chasm, a rarely explored territory of relationships, causes and effects, and mitigating factors. Looking backward from a catastrophic patient outcome rarely goes beyond blaming the immediate person “at fault.” 1 Looking forward from common charting errors rarely goes beyond a conclusion to be “more careful.”

Hospital-based research has categorized preventable adverse outcomes and some process errors associated with them,2-4 but this has not been done in primary care.5 There are difficulties in studying errors in the primary care setting: care takes place in many locations; involves multiple visits; is provided in person, by phone, by mail, and even by computer; and involves interactions with many health care workers. However, it is important to study errors in primary care6 because it is the location of most health care visits in the United States.7

A classification or taxonomy of errors and preventable adverse events is an important first step in improving patient care. Prevalence and epidemiology studies, clinical and system interventions, and even individual practice group databases of errors and adverse events8 can more easily be developed if there is a beginning classification system. Just as clinicians use a differential diagnostic list for analyzing symptoms or a list of risk factors for assessing disease, so, too, can clinicians use a classification and listing of process errors and preventable adverse events to “diagnose” and “prevent” patient harm from medical care. Many taxonomies of medical error do exist and have been used in hospital accreditation or malpractice contexts for some time.9 These taxonomies have not been generally available for purposes other than their intended use, ie, to help their developers understand the data they were dealing with, and because these data do not originate from primary care practice, it remains unknown how well the taxonomies might meet the needs of family physicians and other primary care researchers.

 

 

The purpose of this study was to use published data from original research to understand and classify process errors and preventable adverse events associated with primary medical care. Through a systematic review and synthesis of the medical literature, we developed a classification of medical errors relevant to primary care.

Methods

To identify eligible published English-language original research articles, we searched MEDLINE and the Cochrane Library from 1965 through March 2001 with the MESH search term medical errors, modified by adding family practice, primary health care, physicians/family, or ambulatory care to the primary term. Published bibliographies from the National Patient Safety Foundation (NPSF) and the Institute for Healthcare Improvement (IHI) were also reviewed. The Web sites of the American Academy of Family Physicians, the American College of Physicians–American Society of Internal Medicine, the Institute of Medicine, the NPSF, and the IHI were also reviewed for unpublished reports, presentations, and leads to other sites, journals, or investigators with relevant work. Additional papers were identified from the references of the papers reviewed, from seminal papers in the field, and from discussion with others working in the field of patient safety or quality improvement in primary care.

We reviewed titles of 379 articles identified by electronic searches for inclusion. We excluded papers if they related to comparisons of different approaches to diagnosis or treatment of specific diseases, the evaluation of teaching or research tools, or exclusively to hospitalized patients. If there was uncertainty as to the appropriateness of an article, we read the abstract. We reviewed complete papers if they appeared from the title and abstract to report original research involving a broad assessment of medical errors and preventable adverse events in primary care. Data relating to topic, study quality, and research results were abstracted from identified papers. Both authors performed independent MEDLINE searches and reviewed citations in the papers. To broaden the search for potential studies, one author searched Web sites and NPSF and IHI bibliographies. Both authors agreed on the inclusion of the chosen studies, appraised them independently, and abstracted key classification components. One author (N.C.E.) initially prepared the classification system presented here; it was then reviewed by both authors and revised after their discussions.

Results

Four original research studies directly studied and described medical errors and preventable adverse events in primary care.10-13 Three other studies peripherally addressed primary care medical errors as part of an investigation with another central focus14-16 (Table 1).

TABLE 1
Primary care studies describing medical error

StudyResearch purposeDefinition of errorMethodPertinent results
Primary care studies directly describing medical error
Bhasale et al10Describe incidents occurring in general practiceAn unintended event, no matter how seemingly trivial or commonplace, that could have harmed or did harm a patientSelf-report by 324 Australian sentinel research network FPs using reporting cards805 incidents reported, 76% preventable; categories were drug management, non-drug management, diagnosis, and equipment; causes included communication, actions of others, and clinical judgment errors
Ely et al12Describe the causes to which family physicians attribute errorsAct or omission for which the physician felt responsible and which had serious consequences for the patient30-min interviews with 53 randomly chosen Iowa FPs53 errors reported: delayed diagnoses, surgical and medical treatment mishaps; causes included physical stressors, process of care factors, patient related factors, and physician characteristics
Dovey et al11Describe medical errors reported by FPsSomething in one’s practice that should not have happened, that was not anticipated, and that makes one say, “I don’t want it to happen again”Self-report by 42 American research network FPs using electronic and reply card reporting330 reported errors, 83% from health care system and 13% from knowledge and skills; subcategories were office administration, investigations, treatments, communication, execution of clinical tasks, misdiagnosis, and wrong treatment decision
Fischer et al13Describe the prevalence of adverse events in a risk management databaseIncidents resulting in, or having the potential for, physical, emotional, or financial liability for the patientReview of incident reports entered by 8 primary care clinics into risk management databasePrevalence of adverse events was 3.7/100,000 clinic visits, 83% were preventable; categories included diagnostic, treatment, and preventive and other errors
Primary care studies peripherally describing medical error
Holden et al15Determine patterns of death and potential preventive factors Formal review of all patient deaths in a group of general practices5.1% of deaths due to preventable FP factors; 2 main categories were delay of diagnosis and treatment and lack of prevention with aspirin therapy
Gandhi et al14communicationEvaluate primary care and specialist inter physician Surveys in academic medical centerMain issues for doctors were lack of timeliness and inadequate content
Britten et al16Describe misunderstandings between patients and FPs Qualitative study using 5 data sources14 categories of misunderstandings were identified
FP, family physician.
 

 

Outcome measures

Bhasale and colleagues10 and Fischer and coworkers13 collected patient outcome data; they specifically examined incidents that had “harmed” patients or had “potential for harm.” Ely and associates12 also studied incidents causing patients harm by investigating possible causes of these incidents. Dovey and colleagues11 reported physician-observed errors regardless of whether they were associated with an adverse event. Britten and coworkers16 analyzed misunderstandings between patients and physicians that had adverse consequences for taking medicines. Gandhi and associates14 described communication between primary care physicians and specialists. Holden and colleagues15 investigated deaths in general practices. All these studies attempted some categorization of medical errors. Bhasale and associates10 and Fischer and colleagues13 defined 4 incident cate gories and then assessed preventability. Dovey and coworkers11 and Ely and associates12 placed medical errors into categories, and Bhasale and colleagues10 listed a number of contributing factors. Britten and coworkers16 and Gandhi and associates14 categorized clinician communication problems. Holden and colleagues15 classified clinician actions that led to preventable deaths.

Due to the multiple methods used in the 7 studies and the descriptive nature of the studies, a standard assessment of quality and quantitative synthesis of data were not possible. Six studies used practicing community-based primary care physicians as their main study group. The study by Gandhi and coworkers, of communication between primary care physicians and specialists,14 was performed in an academic institution.

Classification system

We derived the following classification system (outlined in Tables 2 and 3) from the errors and preventable adverse events reported in these 7 studies.10-16Table 2 defines the three main categories of preventable adverse events related by primary care physicians: diagnosis, treatment, and preventive services. These offer descriptors of what went wrong in the care of the patient but not of the level of harm. For example, a patient who was prescribed and took an incorrect drug has experienced a preventable adverse event. As a consequence, that patient may suffer no ill effects (a near miss), may die from anaphylaxis, or may experience some intermediate outcome (such as a rash).

Table 3 outlines “process errors” that clarify why something went wrong. For example, Why was the patient prescribed an incorrect drug? The answer may lie with a clinician factor (the doctor took an inadequate history), a communication factor (not dealing with a language or cultural barrier), an administrative factor (the medical chart was missing), or a blunt end factor (Medicare regulations). Often, multiple factors may be involved.

TABLE 2
Classification of preventable adverse events in primary care

Diagnosis
Related to symptoms
  Misdiagnosis
    Missed diagnosis
    Delayed diagnosis
Related to prevention
  Misdiagnosis
    Missed diagnosis
    Delayed diagnosis
Treatment
Drug
  Incorrect drug
  Incorrect dose
  Delayed administration
  Omitted administration
Non-drug
  Inappropriate
  Delayed
  Omitted
  Procedural complication
Preventive services
Inappropriate
Delayed
Omitted
Procedural complication

TABLE 3
Classification of process errors in primary care

Clinician factors
Clinical judgment
Procedural skills error
Communication factors
Clinician–patient
Clinician–clinician or health care system personnel
Administration factors
Clinician
Pharmacy
Ancillary providers (physical therapy, occupational therapy, etc)
Office setting
Blunt end factors
Personal and family issues of clinicians and staff
Insurance company regulations
Government regulations
Funding and employers
Physical size and location of practice
General health care system

Discussion

The results of this literature synthesis are important for 3 main reasons. First, they offer a summary of the current state of published research. Second, by synthesizing the results of this small body of literature, we were able to develop a working classification system of preventable adverse events (what went wrong) and process errors (why did it go wrong). Third, this classification may clarify the relations between patient safety, process errors, and preventable adverse events in primary care.

Other published classification systems of medical errors and preventable adverse events range from sparse (3 categories with 19 root causes)17 to dense (80 categories with more than 12,000 branching trees).18They generally derive from studies of safety in non-medical industries17 or from studies emphasizing hospital care.2,18 In a recent review of the medical literature, Wilson and Sheikh noted the lack of a typology of medical errors in primary care and reasoned that the key safety issues in primary care are in the arenas of diagnosis, prescribing, communication, and organizational change.5 Their conclusions are congruent with ours, and our more structured classification system contains these arenas.

The classification in Table 3 was generated from research in primary care settings by using data from practicing family physicians and general practitioners. (A more complete version of Table 3 may be found at http://www.jfponline.com.) If the classification is valid and useful, it should assist clinicians and researchers in understanding how process errors and preventable adverse events happen during the practice of primary care. Models assist us in understanding these relations. Among previously proposed models are the “Swiss Cheese”19 and the “Toxic Cascades.”20 The Swiss Cheese model postulates that barriers exist to prevent adverse events, but they are like slices of Swiss cheese with many holes (or errors) in them. Adverse events happen when the holes in many layers temporarily line up. The Toxic Cascades model conceptualizes 4 levels of threats to patient safety: trickles, which leave little trace of their existence; creeks, which have potential seriousness; rivers, which are the actual errors that harm patients; and torrents, which are errors that lead to a patient’s death or serious injury. From our classification, we can define some of the holes in the Swiss Cheese and name many trickles and creeks in primary care Toxic Cascades.

 

 

However, we found a striking gap in the literature of an absence of discussion of the contribution of patient factors to medical errors, despite a logic suggesting these are important issues.21,22 A new model of patient safety dynamics should incorporate features of these models and add patient issues. Our proposed “Hourglass” model, derived from the classification system, incorporates 4 potential components of preventable adverse events in the primary care setting: 2 relating mainly to the primary health care system (process errors and patient safety factors) and 2 relating mainly to patients (patient risk factors for adverse events and patient-controlled patient safety factors; Figure). At the top of the hourglass, patient encounters enter like pieces of sand that flow through a health care system full of process errors that happen regularly. But, as in the Swiss Cheese model, there are barriers (patient safety factors) stopping these process errors from becoming preventable adverse events. Unfortunately, these barriers sometimes allow errors to slip through and a bad outcome results. Luckily, only a small number of patient encounters likely exits the primary health care system with a preventable adverse event, as demonstrated by the narrow part of the hourglass.

Outside the doctor’s office, factors in the patient’s milieu influence the probability of a preventable adverse event occurring. We postulate an experience analogous to that within the health care system. There are more factors increasing a patient’s likelihood of suffering a preventable adverse event,23 but there are also patient-controlled factors serving as barriers against errors and their consequences. These are not well researched24 but occur, for example, when a patient receives a blue pill from the pharmacy that had been pink in the past. The patient may prevent an adverse event by not taking the pill and double-checking with the clinician and pharmacist.

The order in which various process errors and safety factors interact with each other likely varies with each encounter and episode. Interactions within the classification suggest that, for any episode of disease or preventive care, the hourglass gets shaken and turned over numerous times as the health care system and patient factors interact with each other at multiple levels.

Future research needs

The literature review that led to our classification system and the proposed model of interaction have identified specific areas for future study. These include assessing patients’ perspectives, investigating prevalence and causality, and testing interventions designed to improve patient safety. The current medical literature based primarily on physician reports describes events that are meaningful to the physician half of the dyad between patient and physician. Patients’ opinions about what constitutes error and the role of patients as active participants in error and safety are unknown,24 although preliminary studies are currently underway.25

No published studies to date have explored the prevalence of preventable adverse events and errors in primary care. Physician self-report biases reporting toward remembered events and errors. In addition, medical error studies to date have not directly studied causal links between errors and adverse events.26,27 Observational and epidemiologic studies incorporating multiple methods may be necessary to ascertain and compare all components of the medical error equation: the amount of harm done, the preventable adverse events and near misses, the process errors, and the error-free functioning of the health care system. Although observational studies have assessed adverse events in a hospital setting28 and described primary care practices,29 they have not been used to assess preventable adverse events in the primary care setting.

This literature review and synthesis may have missed some studies that merited inclusion. Only English-language studies were included. Studies pertaining to specific diseases, diagnoses, or treatments or from non-primary care settings may have shed light on the interaction of errors, adverse events, and harm but could not have helped in defining a classification system for primary care errors. The small number of studies available and their small sample sizes also limit the depth and breadth of derived classification components.

Decreasing medical errors and increasing patient safety are important parts of quality health care.30 Currently, the research agenda aiming to identify effective error reduction strategies appears to be based more on ease of study subject or accessibility of patients than on the severity or importance of the problem.31 By categorizing process errors and preventable adverse events and studying their relations more thoroughly and by adding the patient’s perspective, interventions can be designed that address the most common and the most serious of preventable adverse events in primary care.

References

1. Leape LL. Error in medicine. JAMA 1994;272:1851-68.

2. Lesar TS, Briceland L, Stein DS. Factors related to errors in medication prescribing. JAMA 1997;277:312-7.

3. Leape LL, Brennan TA, Laird N, et al. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med 1991;324:377-84.

4. Thomas EJ, Studdert DM, Burstin HR, et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care 2000;38:261-71.

5. Wilson T, Sheikh A. Enhancing public safety in primary care. BMJ 2002;324:584-7.

6. Wilson T, Pringle M, Sheikh A. Promoting patient safety in primary care: Research, action, and leadership are required. BMJ 2001;323:583-4.

7. Green L, Fryer G, Yawn B, Lanier D, Dovey S. The ecology of medical care revisited. N Engl J Med 2001;344:2021-5.

8. Sheikh A, Hurwitz B. Setting up a database of medical error in general practice: conceptual and methodological considerations. Br J Gen Pract 2001;51:57-60.

9. Victoroff MS. The right intentions: errors and accountability. J Fam Pract 1997;45:38-9.

10. Bhasale AL, Miller GC, Reid S, Britt HC. Analysing potential harm in Australian general practice; an incident-monitoring study. Med J Aust 1998;169:73-6.

11. Dovey SM, Meyers DS, Phillips RL Jr, et al. A preliminary taxonomy of medical errors in family practice. Qual Saf Health Care 2002;11:233-8.

12. Ely JW, Levinson W, Elder NC, Mainous AG, III, Vinson DC. Perceived causes of family physicians’ errors. J Fam Pract 1995;40:337-44.

13. Fischer G, Fetters MD, Munro AP, Goldman EB. Adverse events in primary care identified from a risk-management database. J Fam Pract 1997;45:40-6.

14. Gandhi TK, Sittig DF, Franklin M, Sussman AJ, Fairchild DG, Bates DW. Communication breakdown in the outpatient referral process. J Gen Intern Med 2000;15:626-31.

15. Holden J, O’Donnell S, Brindley J, Miles L. Analysis of 1263 deaths in four general practices. Br J Gen Pract 1998;48:1409-12.

16. Britten N, Stevenson FA, Barry CA, Barber N, Bradley CP. Misunderstandings in prescribing decisions in general practice: qualitative study. BMJ 2000;320:484-8.

17. Battles JB, Shea CE. A system of analyzing medical errors to improve GME curricula and programs. Acad Med 2001;76:125-33.

18. Runciman WB, Helps SC, Sexton EJ, Malpass A. A classification for incidents and accidents in the health-care system. J Qual Clin Pract 1998;18:199-211.

19. Reason J. Human error: models and management. BMJ 2000;320:768-70.

20. Toxic: cascades: a comprehensive way to think about medical errors. Am Fam Phys 2000;62-848.

21. Barach P, Moss F. Delivering safe health care. BMJ 2001;232:585-6.

22. Deyo R. A key medical decision maker: the patient. BMJ 2001;323:466-7.

23. Kohn L, Corrigan J, Donaldson M. To Err is Human: Building a Safer Health System. Washington, DC: National Academy Press; 1999.

24. Pizzi L, Goldfarb N, Nash D. Other Practices Related to Patient Participation in Making Health Care Safer: A Critical Analysis of Patient Safety Practices. Rockville, MD: Agency for Healthcare Quality and Research; 2001. AHRQ publication 01-E058.

25. Kuzel A, Woolf S, Engel J, et al. Characterizing medical error in primary care settings. Paper presented at: North American Primary Care Research Group 29th Annual Meeting; 2001; Halifax, Nova Scotia.

26. Hofer TP, Kerr EA, Hayward RA. What is an error? Effect Clin Pract 2000;3:261-9.

27. Brennan TA. The Institute of Medicine report on medical errors— could it do harm? N Engl J Med 2000;342:1123-5.

28. Andrews LB, Stocking C, Krizek T, Gottlieb LKC, Vargish T, Siegler M. An alternative strategy for studying adverse events in medical care. Lancet 1997;349:309-13.

29. Stange KC, Zyzanski SJ, Jaen CR, et al. Illuminating the “black box.” A description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.

30. Committee on Health Care Quality in America. Crossing the Quality Chasm. A New Health System for the 21st Century. Washington, DC: National Academy Press; 2001.

31. Ioannidis J, Lau J. Evidence on interventions to reduce medical errors. J Gen Intern Med 2001;16:325-34.

Article PDF
Author and Disclosure Information

NANCY C. ELDER, MD, MSPH
SUSAN M. DOVEY, PHD
Cincinnati, Ohio, and Washington, DC
From the Department of Family Medicine, University of Cincinnati, Cincinnati, OH (N.C.E.) and the Robert Graham Center for Policy Studies in Family Practice and Primary Care, Washington, DC (S.M.D.). The authors report no competing interests. Address reprint requests to Nancy C. Elder, MD, MSPH, Associate Professor, Department of Family Medicine, University of Cincinnati, PO Box 670582, Eden Avenue and Albert Sabin Way, Cincinnati, OH 45267-0582. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
927-932
Legacy Keywords
,Medical errorprimary care physiciansfamily physicians. (J Fam Pract 2002;51:927–932)
Sections
Author and Disclosure Information

NANCY C. ELDER, MD, MSPH
SUSAN M. DOVEY, PHD
Cincinnati, Ohio, and Washington, DC
From the Department of Family Medicine, University of Cincinnati, Cincinnati, OH (N.C.E.) and the Robert Graham Center for Policy Studies in Family Practice and Primary Care, Washington, DC (S.M.D.). The authors report no competing interests. Address reprint requests to Nancy C. Elder, MD, MSPH, Associate Professor, Department of Family Medicine, University of Cincinnati, PO Box 670582, Eden Avenue and Albert Sabin Way, Cincinnati, OH 45267-0582. E-mail: [email protected].

Author and Disclosure Information

NANCY C. ELDER, MD, MSPH
SUSAN M. DOVEY, PHD
Cincinnati, Ohio, and Washington, DC
From the Department of Family Medicine, University of Cincinnati, Cincinnati, OH (N.C.E.) and the Robert Graham Center for Policy Studies in Family Practice and Primary Care, Washington, DC (S.M.D.). The authors report no competing interests. Address reprint requests to Nancy C. Elder, MD, MSPH, Associate Professor, Department of Family Medicine, University of Cincinnati, PO Box 670582, Eden Avenue and Albert Sabin Way, Cincinnati, OH 45267-0582. E-mail: [email protected].

Article PDF
Article PDF

ABSTRACT

OBJECTIVE: To describe and classify process errors and preventable adverse events that occur from medical care in outpatient primary care settings.

STUDY DESIGN: Systematic review and synthesis of the medical literature.

DATA SOURCES: We searched MEDLINE and the Cochrane Library from 1965 through March 2001 with the MESH term medical errors, modified by adding family practice, primary health care, physicians/family, or ambulatory care and limited the search to English-language publications. Published bibliographies and Web sites from patient safety and primary care organizations were also reviewed for unpublished reports, presentations, and leads to other sites, journals, or investigators with relevant work. Additional papers were identified from the references of the papers reviewed and from seminal papers in the field.

OUTCOMES MEASURED: Process errors and preventable adverse events.

RESULTS: Four original research studies directly studied and described medical errors and adverse events in primary care, and 3 other studies peripherally addressed primary care medical errors. A variety of quantitative and qualitative methods were used in the studies. Extraction of results from the studies led to a classification of 3 main categories of preventable adverse events: diagnosis, treatment, and preventive services. Process errors were classified into 4 categories: clinician, communication, administration, and blunt end.

CONCLUSIONS: Original research on medical errors in the primary care setting consists of a limited number of small studies that offer a rich description of medical errors and preventable adverse events primarily from the physician’s viewpoint. We describe a classification derived from these studies that is based on the actual practice of primary care and provides a starting point for future epidemiologic and interventional research. Missing are studies that have patient, consumer, or other health care provider input.

KEY POINTS FOR CLINICIANS

  • Little is known about medical errors and preventable adverse events in the primary care setting.
  • Preventable adverse events reported from primary care practices include diagnostic, treatment, and preventive care incidents.
  • Process errors reported from primary care practices can be categorized as clinician factors (judgment, decision making, skill execution), communication factors (between clinician and patient and between health care providers), administration factors (office and personnel issues), and blunt end factors (insurance and government regulations).
  • Current knowledge of errors and preventable adverse events in primary care is missing input from patients and other health care providers.

Every primary care clinician in the United States knows the frustration of lost charts, misplaced reports, and messages from patients that should have been answered yesterday. These are some of the common frustrations and failures in day-to-day clinical practice. Many clinicians also know the guilt, shame, and self-doubt that occur when patients suffer a serious complication or die due to a mistake made by the clinician, health care team, or health care system. Between the common frustrations of practice and the rare patient death due to an error lies a large chasm, a rarely explored territory of relationships, causes and effects, and mitigating factors. Looking backward from a catastrophic patient outcome rarely goes beyond blaming the immediate person “at fault.” 1 Looking forward from common charting errors rarely goes beyond a conclusion to be “more careful.”

Hospital-based research has categorized preventable adverse outcomes and some process errors associated with them,2-4 but this has not been done in primary care.5 There are difficulties in studying errors in the primary care setting: care takes place in many locations; involves multiple visits; is provided in person, by phone, by mail, and even by computer; and involves interactions with many health care workers. However, it is important to study errors in primary care6 because it is the location of most health care visits in the United States.7

A classification or taxonomy of errors and preventable adverse events is an important first step in improving patient care. Prevalence and epidemiology studies, clinical and system interventions, and even individual practice group databases of errors and adverse events8 can more easily be developed if there is a beginning classification system. Just as clinicians use a differential diagnostic list for analyzing symptoms or a list of risk factors for assessing disease, so, too, can clinicians use a classification and listing of process errors and preventable adverse events to “diagnose” and “prevent” patient harm from medical care. Many taxonomies of medical error do exist and have been used in hospital accreditation or malpractice contexts for some time.9 These taxonomies have not been generally available for purposes other than their intended use, ie, to help their developers understand the data they were dealing with, and because these data do not originate from primary care practice, it remains unknown how well the taxonomies might meet the needs of family physicians and other primary care researchers.

 

 

The purpose of this study was to use published data from original research to understand and classify process errors and preventable adverse events associated with primary medical care. Through a systematic review and synthesis of the medical literature, we developed a classification of medical errors relevant to primary care.

Methods

To identify eligible published English-language original research articles, we searched MEDLINE and the Cochrane Library from 1965 through March 2001 with the MESH search term medical errors, modified by adding family practice, primary health care, physicians/family, or ambulatory care to the primary term. Published bibliographies from the National Patient Safety Foundation (NPSF) and the Institute for Healthcare Improvement (IHI) were also reviewed. The Web sites of the American Academy of Family Physicians, the American College of Physicians–American Society of Internal Medicine, the Institute of Medicine, the NPSF, and the IHI were also reviewed for unpublished reports, presentations, and leads to other sites, journals, or investigators with relevant work. Additional papers were identified from the references of the papers reviewed, from seminal papers in the field, and from discussion with others working in the field of patient safety or quality improvement in primary care.

We reviewed titles of 379 articles identified by electronic searches for inclusion. We excluded papers if they related to comparisons of different approaches to diagnosis or treatment of specific diseases, the evaluation of teaching or research tools, or exclusively to hospitalized patients. If there was uncertainty as to the appropriateness of an article, we read the abstract. We reviewed complete papers if they appeared from the title and abstract to report original research involving a broad assessment of medical errors and preventable adverse events in primary care. Data relating to topic, study quality, and research results were abstracted from identified papers. Both authors performed independent MEDLINE searches and reviewed citations in the papers. To broaden the search for potential studies, one author searched Web sites and NPSF and IHI bibliographies. Both authors agreed on the inclusion of the chosen studies, appraised them independently, and abstracted key classification components. One author (N.C.E.) initially prepared the classification system presented here; it was then reviewed by both authors and revised after their discussions.

Results

Four original research studies directly studied and described medical errors and preventable adverse events in primary care.10-13 Three other studies peripherally addressed primary care medical errors as part of an investigation with another central focus14-16 (Table 1).

TABLE 1
Primary care studies describing medical error

StudyResearch purposeDefinition of errorMethodPertinent results
Primary care studies directly describing medical error
Bhasale et al10Describe incidents occurring in general practiceAn unintended event, no matter how seemingly trivial or commonplace, that could have harmed or did harm a patientSelf-report by 324 Australian sentinel research network FPs using reporting cards805 incidents reported, 76% preventable; categories were drug management, non-drug management, diagnosis, and equipment; causes included communication, actions of others, and clinical judgment errors
Ely et al12Describe the causes to which family physicians attribute errorsAct or omission for which the physician felt responsible and which had serious consequences for the patient30-min interviews with 53 randomly chosen Iowa FPs53 errors reported: delayed diagnoses, surgical and medical treatment mishaps; causes included physical stressors, process of care factors, patient related factors, and physician characteristics
Dovey et al11Describe medical errors reported by FPsSomething in one’s practice that should not have happened, that was not anticipated, and that makes one say, “I don’t want it to happen again”Self-report by 42 American research network FPs using electronic and reply card reporting330 reported errors, 83% from health care system and 13% from knowledge and skills; subcategories were office administration, investigations, treatments, communication, execution of clinical tasks, misdiagnosis, and wrong treatment decision
Fischer et al13Describe the prevalence of adverse events in a risk management databaseIncidents resulting in, or having the potential for, physical, emotional, or financial liability for the patientReview of incident reports entered by 8 primary care clinics into risk management databasePrevalence of adverse events was 3.7/100,000 clinic visits, 83% were preventable; categories included diagnostic, treatment, and preventive and other errors
Primary care studies peripherally describing medical error
Holden et al15Determine patterns of death and potential preventive factors Formal review of all patient deaths in a group of general practices5.1% of deaths due to preventable FP factors; 2 main categories were delay of diagnosis and treatment and lack of prevention with aspirin therapy
Gandhi et al14communicationEvaluate primary care and specialist inter physician Surveys in academic medical centerMain issues for doctors were lack of timeliness and inadequate content
Britten et al16Describe misunderstandings between patients and FPs Qualitative study using 5 data sources14 categories of misunderstandings were identified
FP, family physician.
 

 

Outcome measures

Bhasale and colleagues10 and Fischer and coworkers13 collected patient outcome data; they specifically examined incidents that had “harmed” patients or had “potential for harm.” Ely and associates12 also studied incidents causing patients harm by investigating possible causes of these incidents. Dovey and colleagues11 reported physician-observed errors regardless of whether they were associated with an adverse event. Britten and coworkers16 analyzed misunderstandings between patients and physicians that had adverse consequences for taking medicines. Gandhi and associates14 described communication between primary care physicians and specialists. Holden and colleagues15 investigated deaths in general practices. All these studies attempted some categorization of medical errors. Bhasale and associates10 and Fischer and colleagues13 defined 4 incident cate gories and then assessed preventability. Dovey and coworkers11 and Ely and associates12 placed medical errors into categories, and Bhasale and colleagues10 listed a number of contributing factors. Britten and coworkers16 and Gandhi and associates14 categorized clinician communication problems. Holden and colleagues15 classified clinician actions that led to preventable deaths.

Due to the multiple methods used in the 7 studies and the descriptive nature of the studies, a standard assessment of quality and quantitative synthesis of data were not possible. Six studies used practicing community-based primary care physicians as their main study group. The study by Gandhi and coworkers, of communication between primary care physicians and specialists,14 was performed in an academic institution.

Classification system

We derived the following classification system (outlined in Tables 2 and 3) from the errors and preventable adverse events reported in these 7 studies.10-16Table 2 defines the three main categories of preventable adverse events related by primary care physicians: diagnosis, treatment, and preventive services. These offer descriptors of what went wrong in the care of the patient but not of the level of harm. For example, a patient who was prescribed and took an incorrect drug has experienced a preventable adverse event. As a consequence, that patient may suffer no ill effects (a near miss), may die from anaphylaxis, or may experience some intermediate outcome (such as a rash).

Table 3 outlines “process errors” that clarify why something went wrong. For example, Why was the patient prescribed an incorrect drug? The answer may lie with a clinician factor (the doctor took an inadequate history), a communication factor (not dealing with a language or cultural barrier), an administrative factor (the medical chart was missing), or a blunt end factor (Medicare regulations). Often, multiple factors may be involved.

TABLE 2
Classification of preventable adverse events in primary care

Diagnosis
Related to symptoms
  Misdiagnosis
    Missed diagnosis
    Delayed diagnosis
Related to prevention
  Misdiagnosis
    Missed diagnosis
    Delayed diagnosis
Treatment
Drug
  Incorrect drug
  Incorrect dose
  Delayed administration
  Omitted administration
Non-drug
  Inappropriate
  Delayed
  Omitted
  Procedural complication
Preventive services
Inappropriate
Delayed
Omitted
Procedural complication

TABLE 3
Classification of process errors in primary care

Clinician factors
Clinical judgment
Procedural skills error
Communication factors
Clinician–patient
Clinician–clinician or health care system personnel
Administration factors
Clinician
Pharmacy
Ancillary providers (physical therapy, occupational therapy, etc)
Office setting
Blunt end factors
Personal and family issues of clinicians and staff
Insurance company regulations
Government regulations
Funding and employers
Physical size and location of practice
General health care system

Discussion

The results of this literature synthesis are important for 3 main reasons. First, they offer a summary of the current state of published research. Second, by synthesizing the results of this small body of literature, we were able to develop a working classification system of preventable adverse events (what went wrong) and process errors (why did it go wrong). Third, this classification may clarify the relations between patient safety, process errors, and preventable adverse events in primary care.

Other published classification systems of medical errors and preventable adverse events range from sparse (3 categories with 19 root causes)17 to dense (80 categories with more than 12,000 branching trees).18They generally derive from studies of safety in non-medical industries17 or from studies emphasizing hospital care.2,18 In a recent review of the medical literature, Wilson and Sheikh noted the lack of a typology of medical errors in primary care and reasoned that the key safety issues in primary care are in the arenas of diagnosis, prescribing, communication, and organizational change.5 Their conclusions are congruent with ours, and our more structured classification system contains these arenas.

The classification in Table 3 was generated from research in primary care settings by using data from practicing family physicians and general practitioners. (A more complete version of Table 3 may be found at http://www.jfponline.com.) If the classification is valid and useful, it should assist clinicians and researchers in understanding how process errors and preventable adverse events happen during the practice of primary care. Models assist us in understanding these relations. Among previously proposed models are the “Swiss Cheese”19 and the “Toxic Cascades.”20 The Swiss Cheese model postulates that barriers exist to prevent adverse events, but they are like slices of Swiss cheese with many holes (or errors) in them. Adverse events happen when the holes in many layers temporarily line up. The Toxic Cascades model conceptualizes 4 levels of threats to patient safety: trickles, which leave little trace of their existence; creeks, which have potential seriousness; rivers, which are the actual errors that harm patients; and torrents, which are errors that lead to a patient’s death or serious injury. From our classification, we can define some of the holes in the Swiss Cheese and name many trickles and creeks in primary care Toxic Cascades.

 

 

However, we found a striking gap in the literature of an absence of discussion of the contribution of patient factors to medical errors, despite a logic suggesting these are important issues.21,22 A new model of patient safety dynamics should incorporate features of these models and add patient issues. Our proposed “Hourglass” model, derived from the classification system, incorporates 4 potential components of preventable adverse events in the primary care setting: 2 relating mainly to the primary health care system (process errors and patient safety factors) and 2 relating mainly to patients (patient risk factors for adverse events and patient-controlled patient safety factors; Figure). At the top of the hourglass, patient encounters enter like pieces of sand that flow through a health care system full of process errors that happen regularly. But, as in the Swiss Cheese model, there are barriers (patient safety factors) stopping these process errors from becoming preventable adverse events. Unfortunately, these barriers sometimes allow errors to slip through and a bad outcome results. Luckily, only a small number of patient encounters likely exits the primary health care system with a preventable adverse event, as demonstrated by the narrow part of the hourglass.

Outside the doctor’s office, factors in the patient’s milieu influence the probability of a preventable adverse event occurring. We postulate an experience analogous to that within the health care system. There are more factors increasing a patient’s likelihood of suffering a preventable adverse event,23 but there are also patient-controlled factors serving as barriers against errors and their consequences. These are not well researched24 but occur, for example, when a patient receives a blue pill from the pharmacy that had been pink in the past. The patient may prevent an adverse event by not taking the pill and double-checking with the clinician and pharmacist.

The order in which various process errors and safety factors interact with each other likely varies with each encounter and episode. Interactions within the classification suggest that, for any episode of disease or preventive care, the hourglass gets shaken and turned over numerous times as the health care system and patient factors interact with each other at multiple levels.

Future research needs

The literature review that led to our classification system and the proposed model of interaction have identified specific areas for future study. These include assessing patients’ perspectives, investigating prevalence and causality, and testing interventions designed to improve patient safety. The current medical literature based primarily on physician reports describes events that are meaningful to the physician half of the dyad between patient and physician. Patients’ opinions about what constitutes error and the role of patients as active participants in error and safety are unknown,24 although preliminary studies are currently underway.25

No published studies to date have explored the prevalence of preventable adverse events and errors in primary care. Physician self-report biases reporting toward remembered events and errors. In addition, medical error studies to date have not directly studied causal links between errors and adverse events.26,27 Observational and epidemiologic studies incorporating multiple methods may be necessary to ascertain and compare all components of the medical error equation: the amount of harm done, the preventable adverse events and near misses, the process errors, and the error-free functioning of the health care system. Although observational studies have assessed adverse events in a hospital setting28 and described primary care practices,29 they have not been used to assess preventable adverse events in the primary care setting.

This literature review and synthesis may have missed some studies that merited inclusion. Only English-language studies were included. Studies pertaining to specific diseases, diagnoses, or treatments or from non-primary care settings may have shed light on the interaction of errors, adverse events, and harm but could not have helped in defining a classification system for primary care errors. The small number of studies available and their small sample sizes also limit the depth and breadth of derived classification components.

Decreasing medical errors and increasing patient safety are important parts of quality health care.30 Currently, the research agenda aiming to identify effective error reduction strategies appears to be based more on ease of study subject or accessibility of patients than on the severity or importance of the problem.31 By categorizing process errors and preventable adverse events and studying their relations more thoroughly and by adding the patient’s perspective, interventions can be designed that address the most common and the most serious of preventable adverse events in primary care.

ABSTRACT

OBJECTIVE: To describe and classify process errors and preventable adverse events that occur from medical care in outpatient primary care settings.

STUDY DESIGN: Systematic review and synthesis of the medical literature.

DATA SOURCES: We searched MEDLINE and the Cochrane Library from 1965 through March 2001 with the MESH term medical errors, modified by adding family practice, primary health care, physicians/family, or ambulatory care and limited the search to English-language publications. Published bibliographies and Web sites from patient safety and primary care organizations were also reviewed for unpublished reports, presentations, and leads to other sites, journals, or investigators with relevant work. Additional papers were identified from the references of the papers reviewed and from seminal papers in the field.

OUTCOMES MEASURED: Process errors and preventable adverse events.

RESULTS: Four original research studies directly studied and described medical errors and adverse events in primary care, and 3 other studies peripherally addressed primary care medical errors. A variety of quantitative and qualitative methods were used in the studies. Extraction of results from the studies led to a classification of 3 main categories of preventable adverse events: diagnosis, treatment, and preventive services. Process errors were classified into 4 categories: clinician, communication, administration, and blunt end.

CONCLUSIONS: Original research on medical errors in the primary care setting consists of a limited number of small studies that offer a rich description of medical errors and preventable adverse events primarily from the physician’s viewpoint. We describe a classification derived from these studies that is based on the actual practice of primary care and provides a starting point for future epidemiologic and interventional research. Missing are studies that have patient, consumer, or other health care provider input.

KEY POINTS FOR CLINICIANS

  • Little is known about medical errors and preventable adverse events in the primary care setting.
  • Preventable adverse events reported from primary care practices include diagnostic, treatment, and preventive care incidents.
  • Process errors reported from primary care practices can be categorized as clinician factors (judgment, decision making, skill execution), communication factors (between clinician and patient and between health care providers), administration factors (office and personnel issues), and blunt end factors (insurance and government regulations).
  • Current knowledge of errors and preventable adverse events in primary care is missing input from patients and other health care providers.

Every primary care clinician in the United States knows the frustration of lost charts, misplaced reports, and messages from patients that should have been answered yesterday. These are some of the common frustrations and failures in day-to-day clinical practice. Many clinicians also know the guilt, shame, and self-doubt that occur when patients suffer a serious complication or die due to a mistake made by the clinician, health care team, or health care system. Between the common frustrations of practice and the rare patient death due to an error lies a large chasm, a rarely explored territory of relationships, causes and effects, and mitigating factors. Looking backward from a catastrophic patient outcome rarely goes beyond blaming the immediate person “at fault.” 1 Looking forward from common charting errors rarely goes beyond a conclusion to be “more careful.”

Hospital-based research has categorized preventable adverse outcomes and some process errors associated with them,2-4 but this has not been done in primary care.5 There are difficulties in studying errors in the primary care setting: care takes place in many locations; involves multiple visits; is provided in person, by phone, by mail, and even by computer; and involves interactions with many health care workers. However, it is important to study errors in primary care6 because it is the location of most health care visits in the United States.7

A classification or taxonomy of errors and preventable adverse events is an important first step in improving patient care. Prevalence and epidemiology studies, clinical and system interventions, and even individual practice group databases of errors and adverse events8 can more easily be developed if there is a beginning classification system. Just as clinicians use a differential diagnostic list for analyzing symptoms or a list of risk factors for assessing disease, so, too, can clinicians use a classification and listing of process errors and preventable adverse events to “diagnose” and “prevent” patient harm from medical care. Many taxonomies of medical error do exist and have been used in hospital accreditation or malpractice contexts for some time.9 These taxonomies have not been generally available for purposes other than their intended use, ie, to help their developers understand the data they were dealing with, and because these data do not originate from primary care practice, it remains unknown how well the taxonomies might meet the needs of family physicians and other primary care researchers.

 

 

The purpose of this study was to use published data from original research to understand and classify process errors and preventable adverse events associated with primary medical care. Through a systematic review and synthesis of the medical literature, we developed a classification of medical errors relevant to primary care.

Methods

To identify eligible published English-language original research articles, we searched MEDLINE and the Cochrane Library from 1965 through March 2001 with the MESH search term medical errors, modified by adding family practice, primary health care, physicians/family, or ambulatory care to the primary term. Published bibliographies from the National Patient Safety Foundation (NPSF) and the Institute for Healthcare Improvement (IHI) were also reviewed. The Web sites of the American Academy of Family Physicians, the American College of Physicians–American Society of Internal Medicine, the Institute of Medicine, the NPSF, and the IHI were also reviewed for unpublished reports, presentations, and leads to other sites, journals, or investigators with relevant work. Additional papers were identified from the references of the papers reviewed, from seminal papers in the field, and from discussion with others working in the field of patient safety or quality improvement in primary care.

We reviewed titles of 379 articles identified by electronic searches for inclusion. We excluded papers if they related to comparisons of different approaches to diagnosis or treatment of specific diseases, the evaluation of teaching or research tools, or exclusively to hospitalized patients. If there was uncertainty as to the appropriateness of an article, we read the abstract. We reviewed complete papers if they appeared from the title and abstract to report original research involving a broad assessment of medical errors and preventable adverse events in primary care. Data relating to topic, study quality, and research results were abstracted from identified papers. Both authors performed independent MEDLINE searches and reviewed citations in the papers. To broaden the search for potential studies, one author searched Web sites and NPSF and IHI bibliographies. Both authors agreed on the inclusion of the chosen studies, appraised them independently, and abstracted key classification components. One author (N.C.E.) initially prepared the classification system presented here; it was then reviewed by both authors and revised after their discussions.

Results

Four original research studies directly studied and described medical errors and preventable adverse events in primary care.10-13 Three other studies peripherally addressed primary care medical errors as part of an investigation with another central focus14-16 (Table 1).

TABLE 1
Primary care studies describing medical error

StudyResearch purposeDefinition of errorMethodPertinent results
Primary care studies directly describing medical error
Bhasale et al10Describe incidents occurring in general practiceAn unintended event, no matter how seemingly trivial or commonplace, that could have harmed or did harm a patientSelf-report by 324 Australian sentinel research network FPs using reporting cards805 incidents reported, 76% preventable; categories were drug management, non-drug management, diagnosis, and equipment; causes included communication, actions of others, and clinical judgment errors
Ely et al12Describe the causes to which family physicians attribute errorsAct or omission for which the physician felt responsible and which had serious consequences for the patient30-min interviews with 53 randomly chosen Iowa FPs53 errors reported: delayed diagnoses, surgical and medical treatment mishaps; causes included physical stressors, process of care factors, patient related factors, and physician characteristics
Dovey et al11Describe medical errors reported by FPsSomething in one’s practice that should not have happened, that was not anticipated, and that makes one say, “I don’t want it to happen again”Self-report by 42 American research network FPs using electronic and reply card reporting330 reported errors, 83% from health care system and 13% from knowledge and skills; subcategories were office administration, investigations, treatments, communication, execution of clinical tasks, misdiagnosis, and wrong treatment decision
Fischer et al13Describe the prevalence of adverse events in a risk management databaseIncidents resulting in, or having the potential for, physical, emotional, or financial liability for the patientReview of incident reports entered by 8 primary care clinics into risk management databasePrevalence of adverse events was 3.7/100,000 clinic visits, 83% were preventable; categories included diagnostic, treatment, and preventive and other errors
Primary care studies peripherally describing medical error
Holden et al15Determine patterns of death and potential preventive factors Formal review of all patient deaths in a group of general practices5.1% of deaths due to preventable FP factors; 2 main categories were delay of diagnosis and treatment and lack of prevention with aspirin therapy
Gandhi et al14communicationEvaluate primary care and specialist inter physician Surveys in academic medical centerMain issues for doctors were lack of timeliness and inadequate content
Britten et al16Describe misunderstandings between patients and FPs Qualitative study using 5 data sources14 categories of misunderstandings were identified
FP, family physician.
 

 

Outcome measures

Bhasale and colleagues10 and Fischer and coworkers13 collected patient outcome data; they specifically examined incidents that had “harmed” patients or had “potential for harm.” Ely and associates12 also studied incidents causing patients harm by investigating possible causes of these incidents. Dovey and colleagues11 reported physician-observed errors regardless of whether they were associated with an adverse event. Britten and coworkers16 analyzed misunderstandings between patients and physicians that had adverse consequences for taking medicines. Gandhi and associates14 described communication between primary care physicians and specialists. Holden and colleagues15 investigated deaths in general practices. All these studies attempted some categorization of medical errors. Bhasale and associates10 and Fischer and colleagues13 defined 4 incident cate gories and then assessed preventability. Dovey and coworkers11 and Ely and associates12 placed medical errors into categories, and Bhasale and colleagues10 listed a number of contributing factors. Britten and coworkers16 and Gandhi and associates14 categorized clinician communication problems. Holden and colleagues15 classified clinician actions that led to preventable deaths.

Due to the multiple methods used in the 7 studies and the descriptive nature of the studies, a standard assessment of quality and quantitative synthesis of data were not possible. Six studies used practicing community-based primary care physicians as their main study group. The study by Gandhi and coworkers, of communication between primary care physicians and specialists,14 was performed in an academic institution.

Classification system

We derived the following classification system (outlined in Tables 2 and 3) from the errors and preventable adverse events reported in these 7 studies.10-16Table 2 defines the three main categories of preventable adverse events related by primary care physicians: diagnosis, treatment, and preventive services. These offer descriptors of what went wrong in the care of the patient but not of the level of harm. For example, a patient who was prescribed and took an incorrect drug has experienced a preventable adverse event. As a consequence, that patient may suffer no ill effects (a near miss), may die from anaphylaxis, or may experience some intermediate outcome (such as a rash).

Table 3 outlines “process errors” that clarify why something went wrong. For example, Why was the patient prescribed an incorrect drug? The answer may lie with a clinician factor (the doctor took an inadequate history), a communication factor (not dealing with a language or cultural barrier), an administrative factor (the medical chart was missing), or a blunt end factor (Medicare regulations). Often, multiple factors may be involved.

TABLE 2
Classification of preventable adverse events in primary care

Diagnosis
Related to symptoms
  Misdiagnosis
    Missed diagnosis
    Delayed diagnosis
Related to prevention
  Misdiagnosis
    Missed diagnosis
    Delayed diagnosis
Treatment
Drug
  Incorrect drug
  Incorrect dose
  Delayed administration
  Omitted administration
Non-drug
  Inappropriate
  Delayed
  Omitted
  Procedural complication
Preventive services
Inappropriate
Delayed
Omitted
Procedural complication

TABLE 3
Classification of process errors in primary care

Clinician factors
Clinical judgment
Procedural skills error
Communication factors
Clinician–patient
Clinician–clinician or health care system personnel
Administration factors
Clinician
Pharmacy
Ancillary providers (physical therapy, occupational therapy, etc)
Office setting
Blunt end factors
Personal and family issues of clinicians and staff
Insurance company regulations
Government regulations
Funding and employers
Physical size and location of practice
General health care system

Discussion

The results of this literature synthesis are important for 3 main reasons. First, they offer a summary of the current state of published research. Second, by synthesizing the results of this small body of literature, we were able to develop a working classification system of preventable adverse events (what went wrong) and process errors (why did it go wrong). Third, this classification may clarify the relations between patient safety, process errors, and preventable adverse events in primary care.

Other published classification systems of medical errors and preventable adverse events range from sparse (3 categories with 19 root causes)17 to dense (80 categories with more than 12,000 branching trees).18They generally derive from studies of safety in non-medical industries17 or from studies emphasizing hospital care.2,18 In a recent review of the medical literature, Wilson and Sheikh noted the lack of a typology of medical errors in primary care and reasoned that the key safety issues in primary care are in the arenas of diagnosis, prescribing, communication, and organizational change.5 Their conclusions are congruent with ours, and our more structured classification system contains these arenas.

The classification in Table 3 was generated from research in primary care settings by using data from practicing family physicians and general practitioners. (A more complete version of Table 3 may be found at http://www.jfponline.com.) If the classification is valid and useful, it should assist clinicians and researchers in understanding how process errors and preventable adverse events happen during the practice of primary care. Models assist us in understanding these relations. Among previously proposed models are the “Swiss Cheese”19 and the “Toxic Cascades.”20 The Swiss Cheese model postulates that barriers exist to prevent adverse events, but they are like slices of Swiss cheese with many holes (or errors) in them. Adverse events happen when the holes in many layers temporarily line up. The Toxic Cascades model conceptualizes 4 levels of threats to patient safety: trickles, which leave little trace of their existence; creeks, which have potential seriousness; rivers, which are the actual errors that harm patients; and torrents, which are errors that lead to a patient’s death or serious injury. From our classification, we can define some of the holes in the Swiss Cheese and name many trickles and creeks in primary care Toxic Cascades.

 

 

However, we found a striking gap in the literature of an absence of discussion of the contribution of patient factors to medical errors, despite a logic suggesting these are important issues.21,22 A new model of patient safety dynamics should incorporate features of these models and add patient issues. Our proposed “Hourglass” model, derived from the classification system, incorporates 4 potential components of preventable adverse events in the primary care setting: 2 relating mainly to the primary health care system (process errors and patient safety factors) and 2 relating mainly to patients (patient risk factors for adverse events and patient-controlled patient safety factors; Figure). At the top of the hourglass, patient encounters enter like pieces of sand that flow through a health care system full of process errors that happen regularly. But, as in the Swiss Cheese model, there are barriers (patient safety factors) stopping these process errors from becoming preventable adverse events. Unfortunately, these barriers sometimes allow errors to slip through and a bad outcome results. Luckily, only a small number of patient encounters likely exits the primary health care system with a preventable adverse event, as demonstrated by the narrow part of the hourglass.

Outside the doctor’s office, factors in the patient’s milieu influence the probability of a preventable adverse event occurring. We postulate an experience analogous to that within the health care system. There are more factors increasing a patient’s likelihood of suffering a preventable adverse event,23 but there are also patient-controlled factors serving as barriers against errors and their consequences. These are not well researched24 but occur, for example, when a patient receives a blue pill from the pharmacy that had been pink in the past. The patient may prevent an adverse event by not taking the pill and double-checking with the clinician and pharmacist.

The order in which various process errors and safety factors interact with each other likely varies with each encounter and episode. Interactions within the classification suggest that, for any episode of disease or preventive care, the hourglass gets shaken and turned over numerous times as the health care system and patient factors interact with each other at multiple levels.

Future research needs

The literature review that led to our classification system and the proposed model of interaction have identified specific areas for future study. These include assessing patients’ perspectives, investigating prevalence and causality, and testing interventions designed to improve patient safety. The current medical literature based primarily on physician reports describes events that are meaningful to the physician half of the dyad between patient and physician. Patients’ opinions about what constitutes error and the role of patients as active participants in error and safety are unknown,24 although preliminary studies are currently underway.25

No published studies to date have explored the prevalence of preventable adverse events and errors in primary care. Physician self-report biases reporting toward remembered events and errors. In addition, medical error studies to date have not directly studied causal links between errors and adverse events.26,27 Observational and epidemiologic studies incorporating multiple methods may be necessary to ascertain and compare all components of the medical error equation: the amount of harm done, the preventable adverse events and near misses, the process errors, and the error-free functioning of the health care system. Although observational studies have assessed adverse events in a hospital setting28 and described primary care practices,29 they have not been used to assess preventable adverse events in the primary care setting.

This literature review and synthesis may have missed some studies that merited inclusion. Only English-language studies were included. Studies pertaining to specific diseases, diagnoses, or treatments or from non-primary care settings may have shed light on the interaction of errors, adverse events, and harm but could not have helped in defining a classification system for primary care errors. The small number of studies available and their small sample sizes also limit the depth and breadth of derived classification components.

Decreasing medical errors and increasing patient safety are important parts of quality health care.30 Currently, the research agenda aiming to identify effective error reduction strategies appears to be based more on ease of study subject or accessibility of patients than on the severity or importance of the problem.31 By categorizing process errors and preventable adverse events and studying their relations more thoroughly and by adding the patient’s perspective, interventions can be designed that address the most common and the most serious of preventable adverse events in primary care.

References

1. Leape LL. Error in medicine. JAMA 1994;272:1851-68.

2. Lesar TS, Briceland L, Stein DS. Factors related to errors in medication prescribing. JAMA 1997;277:312-7.

3. Leape LL, Brennan TA, Laird N, et al. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med 1991;324:377-84.

4. Thomas EJ, Studdert DM, Burstin HR, et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care 2000;38:261-71.

5. Wilson T, Sheikh A. Enhancing public safety in primary care. BMJ 2002;324:584-7.

6. Wilson T, Pringle M, Sheikh A. Promoting patient safety in primary care: Research, action, and leadership are required. BMJ 2001;323:583-4.

7. Green L, Fryer G, Yawn B, Lanier D, Dovey S. The ecology of medical care revisited. N Engl J Med 2001;344:2021-5.

8. Sheikh A, Hurwitz B. Setting up a database of medical error in general practice: conceptual and methodological considerations. Br J Gen Pract 2001;51:57-60.

9. Victoroff MS. The right intentions: errors and accountability. J Fam Pract 1997;45:38-9.

10. Bhasale AL, Miller GC, Reid S, Britt HC. Analysing potential harm in Australian general practice; an incident-monitoring study. Med J Aust 1998;169:73-6.

11. Dovey SM, Meyers DS, Phillips RL Jr, et al. A preliminary taxonomy of medical errors in family practice. Qual Saf Health Care 2002;11:233-8.

12. Ely JW, Levinson W, Elder NC, Mainous AG, III, Vinson DC. Perceived causes of family physicians’ errors. J Fam Pract 1995;40:337-44.

13. Fischer G, Fetters MD, Munro AP, Goldman EB. Adverse events in primary care identified from a risk-management database. J Fam Pract 1997;45:40-6.

14. Gandhi TK, Sittig DF, Franklin M, Sussman AJ, Fairchild DG, Bates DW. Communication breakdown in the outpatient referral process. J Gen Intern Med 2000;15:626-31.

15. Holden J, O’Donnell S, Brindley J, Miles L. Analysis of 1263 deaths in four general practices. Br J Gen Pract 1998;48:1409-12.

16. Britten N, Stevenson FA, Barry CA, Barber N, Bradley CP. Misunderstandings in prescribing decisions in general practice: qualitative study. BMJ 2000;320:484-8.

17. Battles JB, Shea CE. A system of analyzing medical errors to improve GME curricula and programs. Acad Med 2001;76:125-33.

18. Runciman WB, Helps SC, Sexton EJ, Malpass A. A classification for incidents and accidents in the health-care system. J Qual Clin Pract 1998;18:199-211.

19. Reason J. Human error: models and management. BMJ 2000;320:768-70.

20. Toxic: cascades: a comprehensive way to think about medical errors. Am Fam Phys 2000;62-848.

21. Barach P, Moss F. Delivering safe health care. BMJ 2001;232:585-6.

22. Deyo R. A key medical decision maker: the patient. BMJ 2001;323:466-7.

23. Kohn L, Corrigan J, Donaldson M. To Err is Human: Building a Safer Health System. Washington, DC: National Academy Press; 1999.

24. Pizzi L, Goldfarb N, Nash D. Other Practices Related to Patient Participation in Making Health Care Safer: A Critical Analysis of Patient Safety Practices. Rockville, MD: Agency for Healthcare Quality and Research; 2001. AHRQ publication 01-E058.

25. Kuzel A, Woolf S, Engel J, et al. Characterizing medical error in primary care settings. Paper presented at: North American Primary Care Research Group 29th Annual Meeting; 2001; Halifax, Nova Scotia.

26. Hofer TP, Kerr EA, Hayward RA. What is an error? Effect Clin Pract 2000;3:261-9.

27. Brennan TA. The Institute of Medicine report on medical errors— could it do harm? N Engl J Med 2000;342:1123-5.

28. Andrews LB, Stocking C, Krizek T, Gottlieb LKC, Vargish T, Siegler M. An alternative strategy for studying adverse events in medical care. Lancet 1997;349:309-13.

29. Stange KC, Zyzanski SJ, Jaen CR, et al. Illuminating the “black box.” A description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.

30. Committee on Health Care Quality in America. Crossing the Quality Chasm. A New Health System for the 21st Century. Washington, DC: National Academy Press; 2001.

31. Ioannidis J, Lau J. Evidence on interventions to reduce medical errors. J Gen Intern Med 2001;16:325-34.

References

1. Leape LL. Error in medicine. JAMA 1994;272:1851-68.

2. Lesar TS, Briceland L, Stein DS. Factors related to errors in medication prescribing. JAMA 1997;277:312-7.

3. Leape LL, Brennan TA, Laird N, et al. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med 1991;324:377-84.

4. Thomas EJ, Studdert DM, Burstin HR, et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care 2000;38:261-71.

5. Wilson T, Sheikh A. Enhancing public safety in primary care. BMJ 2002;324:584-7.

6. Wilson T, Pringle M, Sheikh A. Promoting patient safety in primary care: Research, action, and leadership are required. BMJ 2001;323:583-4.

7. Green L, Fryer G, Yawn B, Lanier D, Dovey S. The ecology of medical care revisited. N Engl J Med 2001;344:2021-5.

8. Sheikh A, Hurwitz B. Setting up a database of medical error in general practice: conceptual and methodological considerations. Br J Gen Pract 2001;51:57-60.

9. Victoroff MS. The right intentions: errors and accountability. J Fam Pract 1997;45:38-9.

10. Bhasale AL, Miller GC, Reid S, Britt HC. Analysing potential harm in Australian general practice; an incident-monitoring study. Med J Aust 1998;169:73-6.

11. Dovey SM, Meyers DS, Phillips RL Jr, et al. A preliminary taxonomy of medical errors in family practice. Qual Saf Health Care 2002;11:233-8.

12. Ely JW, Levinson W, Elder NC, Mainous AG, III, Vinson DC. Perceived causes of family physicians’ errors. J Fam Pract 1995;40:337-44.

13. Fischer G, Fetters MD, Munro AP, Goldman EB. Adverse events in primary care identified from a risk-management database. J Fam Pract 1997;45:40-6.

14. Gandhi TK, Sittig DF, Franklin M, Sussman AJ, Fairchild DG, Bates DW. Communication breakdown in the outpatient referral process. J Gen Intern Med 2000;15:626-31.

15. Holden J, O’Donnell S, Brindley J, Miles L. Analysis of 1263 deaths in four general practices. Br J Gen Pract 1998;48:1409-12.

16. Britten N, Stevenson FA, Barry CA, Barber N, Bradley CP. Misunderstandings in prescribing decisions in general practice: qualitative study. BMJ 2000;320:484-8.

17. Battles JB, Shea CE. A system of analyzing medical errors to improve GME curricula and programs. Acad Med 2001;76:125-33.

18. Runciman WB, Helps SC, Sexton EJ, Malpass A. A classification for incidents and accidents in the health-care system. J Qual Clin Pract 1998;18:199-211.

19. Reason J. Human error: models and management. BMJ 2000;320:768-70.

20. Toxic: cascades: a comprehensive way to think about medical errors. Am Fam Phys 2000;62-848.

21. Barach P, Moss F. Delivering safe health care. BMJ 2001;232:585-6.

22. Deyo R. A key medical decision maker: the patient. BMJ 2001;323:466-7.

23. Kohn L, Corrigan J, Donaldson M. To Err is Human: Building a Safer Health System. Washington, DC: National Academy Press; 1999.

24. Pizzi L, Goldfarb N, Nash D. Other Practices Related to Patient Participation in Making Health Care Safer: A Critical Analysis of Patient Safety Practices. Rockville, MD: Agency for Healthcare Quality and Research; 2001. AHRQ publication 01-E058.

25. Kuzel A, Woolf S, Engel J, et al. Characterizing medical error in primary care settings. Paper presented at: North American Primary Care Research Group 29th Annual Meeting; 2001; Halifax, Nova Scotia.

26. Hofer TP, Kerr EA, Hayward RA. What is an error? Effect Clin Pract 2000;3:261-9.

27. Brennan TA. The Institute of Medicine report on medical errors— could it do harm? N Engl J Med 2000;342:1123-5.

28. Andrews LB, Stocking C, Krizek T, Gottlieb LKC, Vargish T, Siegler M. An alternative strategy for studying adverse events in medical care. Lancet 1997;349:309-13.

29. Stange KC, Zyzanski SJ, Jaen CR, et al. Illuminating the “black box.” A description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.

30. Committee on Health Care Quality in America. Crossing the Quality Chasm. A New Health System for the 21st Century. Washington, DC: National Academy Press; 2001.

31. Ioannidis J, Lau J. Evidence on interventions to reduce medical errors. J Gen Intern Med 2001;16:325-34.

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
927-932
Page Number
927-932
Publications
Publications
Article Type
Display Headline
Classification of medical errors and preventable adverse events in primary care: A synthesis of the literature
Display Headline
Classification of medical errors and preventable adverse events in primary care: A synthesis of the literature
Legacy Keywords
,Medical errorprimary care physiciansfamily physicians. (J Fam Pract 2002;51:927–932)
Legacy Keywords
,Medical errorprimary care physiciansfamily physicians. (J Fam Pract 2002;51:927–932)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

The relation between methods and recommendations in clinical practice guidelines for hypertension and hyperlipidemia

Article Type
Changed
Mon, 01/14/2019 - 10:57
Display Headline
The relation between methods and recommendations in clinical practice guidelines for hypertension and hyperlipidemia

KEY POINTS FOR CLINICIANS

  • Many guidelines address the same problems, often with conflicting recommendations.
  • There is considerable variation regarding the methods the guideline developers use to make recommendations.
  • Guideline developers who did not use rigorous methods appeared to make more aggressive recommendations for screening and treatment (ie, were more likely to promote interventions).
Clinicians are inundated with clinical practice guidelines. Many guidelines address the same problems, often with conflicting recommendations. The disagreement concerning recommendations often hinges on how aggressive clinicians are in promoting interventions, eg, screening for prostate cancer or prescribe antibiotics for the treatment of sore throat. There is also variation with regard to the methods that guideline developers use to make recommendations. The traditional “GOBSAT”-technique (“Good Old Boys Sat At Table”) has been criticized.1 Several groups have proposed more systematic approaches for guideline development,2-6 and some databases, such as the Guideline Advisory Committee’s Recommended Clinical Practice Guidelines, include only those guidelines that have been assessed for the rigor of the development process.7

There are strong logical arguments for developing guidelines systematically, eg, to ensure that they are based on current best evidence, to protect against bias, and to make the process transparent and open to criticism. However, guideline developers frequently do not adhere to such methods.8-11 We explored the possible association between the methods used in guideline development and the recommendations given in those guidelines. We used guidelines for hypertension and hyperlipidemia in our study. Our hypothesis was that less rigorous methods would, on average, be associated with more aggressive recommendations in these guidelines.

Methods

Inclusion criteria

We defined clinical guidelines as recommendations intended to assist health professionals and patients in making decisions for specific clinical circumstances. To be included a guideline had to address at least 1 of the following issues: threshold for drug treatment of essential hypertension in primary prevention, threshold for drug treatment of hyperlipidemia in primary prevention, or identification of the target population for cholesterol screening. Guidelines were excluded if they did not clearly identify the panel responsible for developing the guideline, identify a sponsoring organization, or include a reference list. We excluded textbooks, editorials, and commentaries. Review articles that were prepared and published as background documents were included with the relevant clinical practice guidelines. We included guidelines published after 1992. When multiple versions were available from the same organization, we used the most recent version.

Search strategy

We searched MEDLINE from 1992 through February 2000 by using hypertension, blood pressure, hyperlipidemia, or cholesterol as the key term, limited to practice guidelines as publication type. In addition, we searched databases of guidelines maintained by several groups around the world. One author (A.F.) reviewed all the citations and reference lists of relevant guidelines, retrieved potentially relevant guidelines, and selected guidelines for inclusion.

Guideline development methods

We used 8 criteria to rate the methodologic quality of the guidelines (Table 1). The criteria were adapted from a guideline appraisal instrument that is being developed and tested by a group of European researchers (the Agree/Biomed collaboration).12 It is based on a British Appraisal Instrument for Clinical Guidelines 13 that has been tested for its validity and reliability and has been characterized as “the most well developed to date” in a recent study.14

Two authors (A.F. and J.W.W.) evaluated each guideline independently. For analytical purposes, all criteria were dichotomized (Table 1). Because fewer than 50% of the guidelines reported sufficient information to determine stakeholder involvement, we supplemented information on authors through an Internet search.

TABLE 1

Criteria used to appraise guideline-development methods

CriterionStandard for fulfillment
Main outcomes identified
Is there an explicit statement of the main outcomes considered when developing the guideline?Explicit statement of the main outcomes considered in developing the guidelines
Key stakeholders involved
Are the essential stakeholders involved in the development group?Inclusion of all “essential” stakeholders (generalist physicians, specialists, and methodologists)*
Systematic search and selection
Has a systematic search for evidence been carried out and are criteria for inclusion and exclusion specified?Search specifying all relevant databases or described “electronic databases” or
 Inclusion–exclusion criteria defined, at least briefly
Recommendations linked to evidence
Is there an explicit link between the evidence and the recommendations given?Grading of strength of recommendations or level of evidence
Benefits and risks considered
Have the health benefits, side effects, and risks been considered?Any quantitative or qualitative weighing of benefits and harms that is incorporated into formulating recommendations
Resources/costs
Has the impact on resources been considered?Economic analysis or qualitative consideration of cost issues that is linked explicitly to the formulation of the recommendations
No industry influence
Is the guideline developed without funding or influence from the pharmaceutical industry?No financial support from a pharmaceutical company and no involvement on the panel
Conflicts of interest stated
Is there an explicit statement of conflict of interest?Statement of potential conflicts of interest for panel members
*Essential stakeholders are generalist physicians, specialists, and methodologists. Optional stakeholders are patients, policy makers/health administrators, nurses, pharmacists, economists, and other physicians.
 

 

Aggressiveness of recommendations

To grade the aggressiveness of the treatment recommendations, we evaluated the threshold for initiating pharmacologic treatment of hypertension (low = aggressive), threshold for initiating pharmacologic treatment of hyperlipidemia (low = aggressive), first-line antihypertensive drug (all drugs = aggressive), and the number of persons eligible for cholesterol screening (high = aggressive).

The thresholds for treatment of hypertension and hyperlipidemia were categorized with 4 clinical scenarios. By applying the scenarios we could determine the recommended thresholds for drug treatment among the various guidelines. We chose these specific scenarios to illustrate the existing variation in recommendations among the guidelines:

  1. A 50-year-old man without a high-risk profile of cardiovascular disease.
  2. A 50-year-old man with a high-risk profile of cardiovascular disease.
  3. A 70-year-old man without a high-risk profile of cardiovascular disease.
  4. A 70-year-old man with a high-risk profile of cardiovascular disease.
High-risk profile was defined separately for treatment of hypertension and treatment of hyperlipidemia. For hypertension, the high-risk scenario was a patient who smoked and was hyperlipidemic (total cholesterol > 310 mg/dL). For hyperlipidemia, the high-risk scenario was a patient who smoked and was hypertensive (blood pressure > 160/100 mm Hg). We assumed that lifestyle interventions had been attempted. For simplicity we used only the systolic value for blood pressure. We dichotomized the aggressiveness of thresholds for lipid-lowering treatment rather than calculating averages because many guidelines gave recommendations other than actual cholesterol values; eg, no treatment or familial hypercholesterolemia. Guidelines were considered nonaggressive if, for 2 or more of the scenarios, the threshold was set at 310 mg/dL or higher or was specified as familial hypercholesterolemia.

Guidelines recommending all common antihypertensive drugs were considered aggressive, and the ones suggesting more restrictive recommendations were considered nonaggressive. In a few guidelines this recommendation depended on the patient’s age. For simplicity we examined the recommendations for 50-year-olds. We graded aggressiveness of recommendations on cholesterol screening by estimating the proportion of the general adult population (in Norway) who would be candidates for screening or by case finding per year, if the guidelines were fully implemented.

Analysis

We qualitatively and quantitatively examined the relation between fulfillment of a methodologic criterion and the aggressiveness of recommendations. The power of our statistical analyses was limited by the available sample size. For hypertension, we averaged the treatment threshold for the 4 clinical scenarios within each guideline and compared the overall mean between guidelines meeting and not meeting the criterion. For the threshold to treat hyperlipidemia and for first-line therapy for hypertension, the degree of aggressiveness was dichotomized. We used the Fisher exact test to calculate P values for the association between the proportion of guidelines fulfilling a methodologic criterion and whether the recommendation was classified as aggressive. For cholesterol screening we found the mean yearly proportion of the adult population eligible for screening among guidelines fulfilling a methodologic criterion, and compared this with the mean for guidelines that did not fulfill the criterion. In addition to examining the association between methods and recommendations, we examined whether the level of stakeholder involvement or sponsorship by specialty societies was associated with fulfillment of the methodologic criteria. We included generalist physicians, specialists, and methodologists as “essential stakeholders” and patients, policy makers/health administrators, nurses, pharmacists, economists, and other physicians as “optional stakeholders.”

Results

We found 12 clinical guidelines for managing hypertension, 12 for hyperlipidemia, 5 for cholesterol screening, and 4 general guidelines for the prevention of coronary heart disease that met our inclusion criteria (references are available from the authors). Because each guideline was appraised according to the 8 methodologic criteria, we ended up with 264 appraisals (8 criteria applied to each of the 33 guidelines). There were 28 disagreements (11%), all of which were easily resolved by discussion. As expected, there was variation among the guidelines regarding fulfillment of methodologic criteria and the aggressiveness of recommendations.

Most guidelines did not meet the majority of the methodologic criteria (Table 2). Only 6 of the 33 guidelines met 5 or more of the 8 criteria. The threshold to start antihypertensive treatment varied in systolic blood pressure from 140 to 180 mm Hg for each of the 4 clinical scenarios we applied to the guidelines (Table 3). For 3 of the scenarios, the threshold to treat hyperlipidemia ranged from a total cholesterol value of 190 mg/dL to more than 310 mg/dL (Table 3). Fifteen guidelines gave recommendations for first-line therapy for hypertension. Three recommended thiazides only; 6 recommended thiazides and β-blockers; 1 recommended thiazides, β-blockers, and angiotensinconverting enzyme inhibitors; and 5 recommended all commonly used drugs. Recommendations for cholesterol screening ranged from no screening to testing the entire adult population every 2 to 5 years.

 

 

TABLE 2

Variability in fulfillment of methodologic criteria

CriterionFulfilled, n (%)Not fulfilled, n (%)No information, n (%)
Main outcomes identified10 (30)23 (70)0 (0)
Key stakeholders involved21 (64)9 (27)3 (9)
Systematic search and selection7 (21)26 (79)0 (0)
Recommendations linked to evidence10 (30)23 (70)0 (0)
Benefits and risks considered21 (64)12 (36)0 (0)
Resources/costs14 (42)19 (58)0 (0)
No industry influence23 (70)2 (6)8 (24)
Conflicts of interest stated4 (12)29 (88)0 (0)
TABLE 3

Variability in guideline recommendations*

 BP threshold (mm Hg) to treat hypertension
Age, clinical scenario140150160170180
50 y, low risk4 (25)1 (6)5 (31)4 (25)2 (13)
50 y, high risk9 (56)2 (13)3 (19)1 (6)1 (6)
70 y, high risk6 (38)1 (6)5 (31)2 (13)2 (13)
70 y, low risk8 (50)1 (6)5 (31)1 (6)1 (6)
 Cholesterol threshold (mg/dL) to treat hyperlipidemia 
 ≤230 ≤270 ≥310 
50 y, low risk3 (19)3 (19)1 (6)9 (56) 
50 y, high risk6 (38)8 (50)02 (13) 
70 y, high risk5 (31)3 (19)2 (13)3 (19) 
70 y, low risk7 (44)5 (31)1 (6)3 (19) 
*Data are presented as number (%) of patients.
Includes the recommendations “no treatment” and “familial hypercholesterolemia.”
BP, blood pressure.

Associations between recommendations and methodologic criteria

The threshold to treat hypertension did not seem to be associated with fulfillment of methodologic criteria. Differences in recommendations for first-line drugs for hypertension were not strongly associated with any of the criteria. Although not statistically significant, there was a trend for guidelines to recommend all commonly available drugs when methodologic criteria were not met (Table W1, available on the JFP web site: http://www.jfponline.com).

For all but 1 quality criteria (main outcomes identified), fulfilling the criteria tended to be associated with a higher threshold to treat hyperlipidemia. Similarly, guidelines meeting quality criteria tended to give less aggressive recommendations for cholesterol screening than did guidelines not fulfilling the criteria. The criterion on stakeholder involvement was the exception, but this criterion was fulfilled by all but 1 of the guidelines (Table 4).

TABLE 4

Guidelines for hyperlipidemia: relation between adherence to methodologic criteria and recommendations given*

 Guidelines recommending a low treatment-threshold
CriterionCriterion fulfilledCriterion not fulfilledP
Main outcomes identified3/4 (75)7/12 (58)1.00
Key stakeholders involved5/10 (50)3/4 (75).58
Systematic search and selection0/310/13 (77).036
Recommendations linked to evidence1/3 (33)9/13 (69).52
Benefits and risks considered4/10 (40)6/6 (100).034
Resources/costs4/8 (50)6/8 (75).61
Conflicts of interest stated0/210/14 (71).13
 Population to screen annually §
Main outcomes identified8 (0–21)11 (5–17)3 (−8.1 to 14)
Key stakeholders involved11 (4.8–16)1−10||,¶
Systematic search and selection4 (0–9.8)12 (6.2–18)8 (−2.3 to 18)
Recommendations linked to evidence6 (0–13)11 (5.3–18)5 (−5.5 to 16)
Benefits and risks considered8 (1.2–15)14 (6.0–21)6 (−4.1 to 15)
Resources/costs6 (0.9–12)14 (5.8–22)8 (−1.2 to 16)
Conflicts of interest stated012 (6.6–16)12 (−1.8 to 25)
*The criterion on industry influence is not included because all the guidelines either fulfilled the criterion or provided insufficient information to assess if the criterion was met
P values assessed with the Fisher exact test.
Guidelines in which the threshold to treat is less than 310 mg/dL for 3 or more of the clinical scenarios described in the text. Data are presented as proportion (%).
§ Data are presented as percentage (95% confidence interval).
|| One guideline did not fulfill this criterion, so confidence intervals could not be calculated.
Difference in percentage (95% confidence interval).

Stakeholder involvement and sponsorship by specialty societies

Guidelines that involved major stakeholders in the development process tended to fulfill the methodologic criteria to a greater extent than did guidelines that did not (Table W2, available on the JFP web site: http://www.jfponline.com). Nine of the 33 guidelines were sponsored by specialty societies. These fulfilled the methodologic criteria less often than did other guidelines (Table 5).

TABLE 5

Relation between specialty society sponsorship and fulfillment of methodologic criteria*

 Guidelines fulfilling criterion
Criterionspecialty by specialty societySponsored by Not sponsored societyP
Main outcomes identified6/9 (67)4/24 (17).010
Key stakeholders involved2/9 (22)19/21 (90).001
Systematic search and selection1/9 (11)6/24 (25).64
Recommendations linked to evidence2/9 (22)8/24 (33).69
Benefits and risks considered3/9 (33)18/24 (75).044
Resources/costs3/9 (33)11/24 (46).70
No industry influence2/4 (50)21/21 (100).02
Conflicts of interest stated2/9 (22)2/24 (8).30
*Data are presented as proportion (%).
Assessed with the Fisher exact test.
We did not take into account the guidelines for which we had insufficient information to assess whether the criterion was met.

Discussion

We found that nonadherence to rigorous methods when developing guidelines for hypertension and hyperlipidemia tends to be associated with more aggressive recommendations. We are not aware of other studies that have investigated the relation between methods and recommendations in clinical practice guidelines. The relatively small number of guidelines that met our inclusion criteria limited the power of our analyses, which rarely reached the conventional level of statistical significance (P

Many articles have assessed the methodologic quality of clinical practice guidelines with the use of similar criteria, all these studies found poor adherence to recommendations for guideline development.8-11 Grilli and colleagues found that “the quality of reporting of practice guidelines produced by specialty societies fell short of acceptable methodology” for the 431 guidelines they assessed.10(p104) Shaneyfelt and colleagues found no difference in methodologic rigor between guidelines published by specialty societies and those published by others but decided that methodologic criteria frequently were not met.11 We also found that methodologic criteria frequently were not met, and that they were met less often for guidelines sponsored by specialty societies than for those sponsored by other groups.

 

 

Stakeholder involvement, as we have defined it, was closely related to panel composition, which has been examined by others. For example, a link was found between panel composition and ratings of the appropriateness of procedures. Those who used a given procedure were more likely to rate it as appropriate than were those who did not use it.15,16 Murphy and coworkers found that “members of a specialty are more likely to advocate techniques that involve their special-ty.”17(p37) Savoie and colleagues, in their critical appraisal of guidelines for cholesterol testing, found that “the greater the involvement of clinical experts in the development process of the clinical practice guidelines, the less the recommendations reflected the research evidence.”9(p76) This is consistent with our finding that broader stakeholder involvement was associated with methodologic criteria being met more often.

In our study, guideline developers that did not use rigorous methods appeared more likely to promote aggressive intervention. This may be true for guidelines for conditions other than hypertension and hyperlipidemia. However, guideline developers also may introduce biases toward less aggressive recommendations, eg, purchasers of health services. The degree to which bias is likely and even the direction sometimes may be difficult to predict.

The quality among the guidelines we assessed was not associated with year of publication or the country where the guidelines were developed. The 6 guidelines fulfilling 5 or more of the quality criteria were not published more recently. The countries of origin for these 6 guidelines were Australia, Canada, France, the United Kingdom, and the United States.

There are strong logical reasons for users of guidelines to consider the methods used by guideline developers. Given the extent of disagreement among guidelines, it is necessary for users to understand the basis of those recommendations. This is only possible if guideline developers employ systematic methods and explicitly report the methods that were used. Our study provides empirical support of skepticism toward guidelines that have been developed without employing systematic methods.

· Acknowledgments ·

Wethank Signe Flottorp and Lena Nordheim for helping with the appraisal of non-English guidelines and Jonathan Lomas and Brian Hutchison who helped develop the idea for this study.

References

1. Miller J, Petrie J. Development of practice guidelines. Lancet 2000;355:82-3.

2. Eddy D. A Manual for Assessing Health Practices and Designing Practice Policies: The Explicit Approach. Philadelphia: American College of Physicians; 1992.

3. Scottish Intercollegiate Guidelines Network. An Introduction to SIGN Methodology for the Development of Evidence-Based Clinical Guidelines. Vol 39. Edinburgh: Scottish Intercollegiate Guidelines Network; 1999.

4. Shekelle PG, Woolf SH, Eccles M, Grimshaw J. Clinical guidelines: developing guidelines. BMJ 1999;318:593-6.

5. Guidelines for Clinical Practice. From Development to Use. Washington, DC: National Academy Press; 1992.

6. Canadian Medical Association. Quality of Care Program: The Guidelines for Canadian Clinical Practice. Ottawa: Canadian Medical Association; 1993.

7. Guideline Advisory Committee. Recommended clinical practice guidelines. Available at: http://www.gacguidelines.ca/ aboutGAC. html. Accessed April 4, 2002.

8. Gibson P. Asthma guidelines and evidence-based medicine. Lancet 1993;342:1305.-

9. Savoie I, Kazanjian A, Bassett K. Do clinical practice guidelines reflect research evidence? J Health Serv Res Policy 2000;5:76-82.

10. Grilli R, Magrini N, Penna A, Mura G, Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet 2000;355:103-6.

11. Shaneyfelt TM, Mayo-Smith MF, Rothwangl J. Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA 1999;281:1900-5.

12. St George’s Hospital Medical School Health Care Evaluation Unit. Appraisal of guidelines research & evaluation. Available at: http://www.sghms.ac.uk/depts/phs/hceu/biomed.htm. Accessed December 12, 2000.

13. Cluzeau FA, Littlejohns P, Grimshaw J, Feder G. Appraisal Instrument for Clinical Guidelines. London: St George’s Hospital Medical School; 1997.

14. Graham ID, Calder LA, Hébert PC, Carter AO, Tetroe JM. A comparison of clinical practice guideline appraisal instruments. Int J Technol Assess Health Care 2000;16:1024-38.

15. Herrin J, Etchason JA, Kahan JP, Brook RH, Ballard DJ. Effect of panel composition on physician ratings of appropriateness of abdominal aortic aneurysm surgery: elucidating differences between multispecialty panel results and specialty society recommendations. Health Policy 1997;42:67-81.

16. Coulter I, Adams A, Shekelle P. Impact of varying panel membership on ratings of appropriateness in consensus panels: a comparison of a multi- and single disciplinary panel. Health Serv Res 1995;30:577-91.

17. Murphy MK, Black NA, Lamping DL, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess 1998;2(3):i-iv,1-88.

Article PDF
Author and Disclosure Information

ATLE FRETHEIM, MD
JOHN W. WILLIAMSJR, MD, MHS
ANDREW D. OXMAN, MD
JEPH HERRIN, PHD
Oslo, Norway; Durham, North Carolina; and Charlottesville, Virginia
From the Department of Health Services Research, Norwegian Directorate for Health and Social Welfare, Oslo, Norway (A.F., A.D.O.); the Department of Medicine, The Center for Health Services Research in Primary Care, Department of Veterans Affairs Medical Center and Duke University Medical Center, Durham, NC (J.W.W.); and Flying Buttress Associates, Charlottesville, VA (J.H.). Preliminary results were presented at a seminar on clinical practice guidelines hosted by the Norwegian Centre for Health Technology Assessment; Oslo, Norway; June 26, 2000. Some of the results also were presented at the Nordic Workshop on Evidence Based Health Care, hosted by the National Institute of Public Health; Oslo, Norway; May 2001. Address reprint requests to Atle Fretheim, MD, Department of Health Services Research, Norwegian Directorate for Health and Social Welfare, PO Box 8054 Dep, N-0031 Oslo, Norway. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
963-968
Legacy Keywords
,Practice guidelineshypertensionhyperlipidemiaevidence-based medicine. (J Fam Pract 2002; 51:963–968)
Sections
Author and Disclosure Information

ATLE FRETHEIM, MD
JOHN W. WILLIAMSJR, MD, MHS
ANDREW D. OXMAN, MD
JEPH HERRIN, PHD
Oslo, Norway; Durham, North Carolina; and Charlottesville, Virginia
From the Department of Health Services Research, Norwegian Directorate for Health and Social Welfare, Oslo, Norway (A.F., A.D.O.); the Department of Medicine, The Center for Health Services Research in Primary Care, Department of Veterans Affairs Medical Center and Duke University Medical Center, Durham, NC (J.W.W.); and Flying Buttress Associates, Charlottesville, VA (J.H.). Preliminary results were presented at a seminar on clinical practice guidelines hosted by the Norwegian Centre for Health Technology Assessment; Oslo, Norway; June 26, 2000. Some of the results also were presented at the Nordic Workshop on Evidence Based Health Care, hosted by the National Institute of Public Health; Oslo, Norway; May 2001. Address reprint requests to Atle Fretheim, MD, Department of Health Services Research, Norwegian Directorate for Health and Social Welfare, PO Box 8054 Dep, N-0031 Oslo, Norway. E-mail: [email protected].

Author and Disclosure Information

ATLE FRETHEIM, MD
JOHN W. WILLIAMSJR, MD, MHS
ANDREW D. OXMAN, MD
JEPH HERRIN, PHD
Oslo, Norway; Durham, North Carolina; and Charlottesville, Virginia
From the Department of Health Services Research, Norwegian Directorate for Health and Social Welfare, Oslo, Norway (A.F., A.D.O.); the Department of Medicine, The Center for Health Services Research in Primary Care, Department of Veterans Affairs Medical Center and Duke University Medical Center, Durham, NC (J.W.W.); and Flying Buttress Associates, Charlottesville, VA (J.H.). Preliminary results were presented at a seminar on clinical practice guidelines hosted by the Norwegian Centre for Health Technology Assessment; Oslo, Norway; June 26, 2000. Some of the results also were presented at the Nordic Workshop on Evidence Based Health Care, hosted by the National Institute of Public Health; Oslo, Norway; May 2001. Address reprint requests to Atle Fretheim, MD, Department of Health Services Research, Norwegian Directorate for Health and Social Welfare, PO Box 8054 Dep, N-0031 Oslo, Norway. E-mail: [email protected].

Article PDF
Article PDF

KEY POINTS FOR CLINICIANS

  • Many guidelines address the same problems, often with conflicting recommendations.
  • There is considerable variation regarding the methods the guideline developers use to make recommendations.
  • Guideline developers who did not use rigorous methods appeared to make more aggressive recommendations for screening and treatment (ie, were more likely to promote interventions).
Clinicians are inundated with clinical practice guidelines. Many guidelines address the same problems, often with conflicting recommendations. The disagreement concerning recommendations often hinges on how aggressive clinicians are in promoting interventions, eg, screening for prostate cancer or prescribe antibiotics for the treatment of sore throat. There is also variation with regard to the methods that guideline developers use to make recommendations. The traditional “GOBSAT”-technique (“Good Old Boys Sat At Table”) has been criticized.1 Several groups have proposed more systematic approaches for guideline development,2-6 and some databases, such as the Guideline Advisory Committee’s Recommended Clinical Practice Guidelines, include only those guidelines that have been assessed for the rigor of the development process.7

There are strong logical arguments for developing guidelines systematically, eg, to ensure that they are based on current best evidence, to protect against bias, and to make the process transparent and open to criticism. However, guideline developers frequently do not adhere to such methods.8-11 We explored the possible association between the methods used in guideline development and the recommendations given in those guidelines. We used guidelines for hypertension and hyperlipidemia in our study. Our hypothesis was that less rigorous methods would, on average, be associated with more aggressive recommendations in these guidelines.

Methods

Inclusion criteria

We defined clinical guidelines as recommendations intended to assist health professionals and patients in making decisions for specific clinical circumstances. To be included a guideline had to address at least 1 of the following issues: threshold for drug treatment of essential hypertension in primary prevention, threshold for drug treatment of hyperlipidemia in primary prevention, or identification of the target population for cholesterol screening. Guidelines were excluded if they did not clearly identify the panel responsible for developing the guideline, identify a sponsoring organization, or include a reference list. We excluded textbooks, editorials, and commentaries. Review articles that were prepared and published as background documents were included with the relevant clinical practice guidelines. We included guidelines published after 1992. When multiple versions were available from the same organization, we used the most recent version.

Search strategy

We searched MEDLINE from 1992 through February 2000 by using hypertension, blood pressure, hyperlipidemia, or cholesterol as the key term, limited to practice guidelines as publication type. In addition, we searched databases of guidelines maintained by several groups around the world. One author (A.F.) reviewed all the citations and reference lists of relevant guidelines, retrieved potentially relevant guidelines, and selected guidelines for inclusion.

Guideline development methods

We used 8 criteria to rate the methodologic quality of the guidelines (Table 1). The criteria were adapted from a guideline appraisal instrument that is being developed and tested by a group of European researchers (the Agree/Biomed collaboration).12 It is based on a British Appraisal Instrument for Clinical Guidelines 13 that has been tested for its validity and reliability and has been characterized as “the most well developed to date” in a recent study.14

Two authors (A.F. and J.W.W.) evaluated each guideline independently. For analytical purposes, all criteria were dichotomized (Table 1). Because fewer than 50% of the guidelines reported sufficient information to determine stakeholder involvement, we supplemented information on authors through an Internet search.

TABLE 1

Criteria used to appraise guideline-development methods

CriterionStandard for fulfillment
Main outcomes identified
Is there an explicit statement of the main outcomes considered when developing the guideline?Explicit statement of the main outcomes considered in developing the guidelines
Key stakeholders involved
Are the essential stakeholders involved in the development group?Inclusion of all “essential” stakeholders (generalist physicians, specialists, and methodologists)*
Systematic search and selection
Has a systematic search for evidence been carried out and are criteria for inclusion and exclusion specified?Search specifying all relevant databases or described “electronic databases” or
 Inclusion–exclusion criteria defined, at least briefly
Recommendations linked to evidence
Is there an explicit link between the evidence and the recommendations given?Grading of strength of recommendations or level of evidence
Benefits and risks considered
Have the health benefits, side effects, and risks been considered?Any quantitative or qualitative weighing of benefits and harms that is incorporated into formulating recommendations
Resources/costs
Has the impact on resources been considered?Economic analysis or qualitative consideration of cost issues that is linked explicitly to the formulation of the recommendations
No industry influence
Is the guideline developed without funding or influence from the pharmaceutical industry?No financial support from a pharmaceutical company and no involvement on the panel
Conflicts of interest stated
Is there an explicit statement of conflict of interest?Statement of potential conflicts of interest for panel members
*Essential stakeholders are generalist physicians, specialists, and methodologists. Optional stakeholders are patients, policy makers/health administrators, nurses, pharmacists, economists, and other physicians.
 

 

Aggressiveness of recommendations

To grade the aggressiveness of the treatment recommendations, we evaluated the threshold for initiating pharmacologic treatment of hypertension (low = aggressive), threshold for initiating pharmacologic treatment of hyperlipidemia (low = aggressive), first-line antihypertensive drug (all drugs = aggressive), and the number of persons eligible for cholesterol screening (high = aggressive).

The thresholds for treatment of hypertension and hyperlipidemia were categorized with 4 clinical scenarios. By applying the scenarios we could determine the recommended thresholds for drug treatment among the various guidelines. We chose these specific scenarios to illustrate the existing variation in recommendations among the guidelines:

  1. A 50-year-old man without a high-risk profile of cardiovascular disease.
  2. A 50-year-old man with a high-risk profile of cardiovascular disease.
  3. A 70-year-old man without a high-risk profile of cardiovascular disease.
  4. A 70-year-old man with a high-risk profile of cardiovascular disease.
High-risk profile was defined separately for treatment of hypertension and treatment of hyperlipidemia. For hypertension, the high-risk scenario was a patient who smoked and was hyperlipidemic (total cholesterol > 310 mg/dL). For hyperlipidemia, the high-risk scenario was a patient who smoked and was hypertensive (blood pressure > 160/100 mm Hg). We assumed that lifestyle interventions had been attempted. For simplicity we used only the systolic value for blood pressure. We dichotomized the aggressiveness of thresholds for lipid-lowering treatment rather than calculating averages because many guidelines gave recommendations other than actual cholesterol values; eg, no treatment or familial hypercholesterolemia. Guidelines were considered nonaggressive if, for 2 or more of the scenarios, the threshold was set at 310 mg/dL or higher or was specified as familial hypercholesterolemia.

Guidelines recommending all common antihypertensive drugs were considered aggressive, and the ones suggesting more restrictive recommendations were considered nonaggressive. In a few guidelines this recommendation depended on the patient’s age. For simplicity we examined the recommendations for 50-year-olds. We graded aggressiveness of recommendations on cholesterol screening by estimating the proportion of the general adult population (in Norway) who would be candidates for screening or by case finding per year, if the guidelines were fully implemented.

Analysis

We qualitatively and quantitatively examined the relation between fulfillment of a methodologic criterion and the aggressiveness of recommendations. The power of our statistical analyses was limited by the available sample size. For hypertension, we averaged the treatment threshold for the 4 clinical scenarios within each guideline and compared the overall mean between guidelines meeting and not meeting the criterion. For the threshold to treat hyperlipidemia and for first-line therapy for hypertension, the degree of aggressiveness was dichotomized. We used the Fisher exact test to calculate P values for the association between the proportion of guidelines fulfilling a methodologic criterion and whether the recommendation was classified as aggressive. For cholesterol screening we found the mean yearly proportion of the adult population eligible for screening among guidelines fulfilling a methodologic criterion, and compared this with the mean for guidelines that did not fulfill the criterion. In addition to examining the association between methods and recommendations, we examined whether the level of stakeholder involvement or sponsorship by specialty societies was associated with fulfillment of the methodologic criteria. We included generalist physicians, specialists, and methodologists as “essential stakeholders” and patients, policy makers/health administrators, nurses, pharmacists, economists, and other physicians as “optional stakeholders.”

Results

We found 12 clinical guidelines for managing hypertension, 12 for hyperlipidemia, 5 for cholesterol screening, and 4 general guidelines for the prevention of coronary heart disease that met our inclusion criteria (references are available from the authors). Because each guideline was appraised according to the 8 methodologic criteria, we ended up with 264 appraisals (8 criteria applied to each of the 33 guidelines). There were 28 disagreements (11%), all of which were easily resolved by discussion. As expected, there was variation among the guidelines regarding fulfillment of methodologic criteria and the aggressiveness of recommendations.

Most guidelines did not meet the majority of the methodologic criteria (Table 2). Only 6 of the 33 guidelines met 5 or more of the 8 criteria. The threshold to start antihypertensive treatment varied in systolic blood pressure from 140 to 180 mm Hg for each of the 4 clinical scenarios we applied to the guidelines (Table 3). For 3 of the scenarios, the threshold to treat hyperlipidemia ranged from a total cholesterol value of 190 mg/dL to more than 310 mg/dL (Table 3). Fifteen guidelines gave recommendations for first-line therapy for hypertension. Three recommended thiazides only; 6 recommended thiazides and β-blockers; 1 recommended thiazides, β-blockers, and angiotensinconverting enzyme inhibitors; and 5 recommended all commonly used drugs. Recommendations for cholesterol screening ranged from no screening to testing the entire adult population every 2 to 5 years.

 

 

TABLE 2

Variability in fulfillment of methodologic criteria

CriterionFulfilled, n (%)Not fulfilled, n (%)No information, n (%)
Main outcomes identified10 (30)23 (70)0 (0)
Key stakeholders involved21 (64)9 (27)3 (9)
Systematic search and selection7 (21)26 (79)0 (0)
Recommendations linked to evidence10 (30)23 (70)0 (0)
Benefits and risks considered21 (64)12 (36)0 (0)
Resources/costs14 (42)19 (58)0 (0)
No industry influence23 (70)2 (6)8 (24)
Conflicts of interest stated4 (12)29 (88)0 (0)
TABLE 3

Variability in guideline recommendations*

 BP threshold (mm Hg) to treat hypertension
Age, clinical scenario140150160170180
50 y, low risk4 (25)1 (6)5 (31)4 (25)2 (13)
50 y, high risk9 (56)2 (13)3 (19)1 (6)1 (6)
70 y, high risk6 (38)1 (6)5 (31)2 (13)2 (13)
70 y, low risk8 (50)1 (6)5 (31)1 (6)1 (6)
 Cholesterol threshold (mg/dL) to treat hyperlipidemia 
 ≤230 ≤270 ≥310 
50 y, low risk3 (19)3 (19)1 (6)9 (56) 
50 y, high risk6 (38)8 (50)02 (13) 
70 y, high risk5 (31)3 (19)2 (13)3 (19) 
70 y, low risk7 (44)5 (31)1 (6)3 (19) 
*Data are presented as number (%) of patients.
Includes the recommendations “no treatment” and “familial hypercholesterolemia.”
BP, blood pressure.

Associations between recommendations and methodologic criteria

The threshold to treat hypertension did not seem to be associated with fulfillment of methodologic criteria. Differences in recommendations for first-line drugs for hypertension were not strongly associated with any of the criteria. Although not statistically significant, there was a trend for guidelines to recommend all commonly available drugs when methodologic criteria were not met (Table W1, available on the JFP web site: http://www.jfponline.com).

For all but 1 quality criteria (main outcomes identified), fulfilling the criteria tended to be associated with a higher threshold to treat hyperlipidemia. Similarly, guidelines meeting quality criteria tended to give less aggressive recommendations for cholesterol screening than did guidelines not fulfilling the criteria. The criterion on stakeholder involvement was the exception, but this criterion was fulfilled by all but 1 of the guidelines (Table 4).

TABLE 4

Guidelines for hyperlipidemia: relation between adherence to methodologic criteria and recommendations given*

 Guidelines recommending a low treatment-threshold
CriterionCriterion fulfilledCriterion not fulfilledP
Main outcomes identified3/4 (75)7/12 (58)1.00
Key stakeholders involved5/10 (50)3/4 (75).58
Systematic search and selection0/310/13 (77).036
Recommendations linked to evidence1/3 (33)9/13 (69).52
Benefits and risks considered4/10 (40)6/6 (100).034
Resources/costs4/8 (50)6/8 (75).61
Conflicts of interest stated0/210/14 (71).13
 Population to screen annually §
Main outcomes identified8 (0–21)11 (5–17)3 (−8.1 to 14)
Key stakeholders involved11 (4.8–16)1−10||,¶
Systematic search and selection4 (0–9.8)12 (6.2–18)8 (−2.3 to 18)
Recommendations linked to evidence6 (0–13)11 (5.3–18)5 (−5.5 to 16)
Benefits and risks considered8 (1.2–15)14 (6.0–21)6 (−4.1 to 15)
Resources/costs6 (0.9–12)14 (5.8–22)8 (−1.2 to 16)
Conflicts of interest stated012 (6.6–16)12 (−1.8 to 25)
*The criterion on industry influence is not included because all the guidelines either fulfilled the criterion or provided insufficient information to assess if the criterion was met
P values assessed with the Fisher exact test.
Guidelines in which the threshold to treat is less than 310 mg/dL for 3 or more of the clinical scenarios described in the text. Data are presented as proportion (%).
§ Data are presented as percentage (95% confidence interval).
|| One guideline did not fulfill this criterion, so confidence intervals could not be calculated.
Difference in percentage (95% confidence interval).

Stakeholder involvement and sponsorship by specialty societies

Guidelines that involved major stakeholders in the development process tended to fulfill the methodologic criteria to a greater extent than did guidelines that did not (Table W2, available on the JFP web site: http://www.jfponline.com). Nine of the 33 guidelines were sponsored by specialty societies. These fulfilled the methodologic criteria less often than did other guidelines (Table 5).

TABLE 5

Relation between specialty society sponsorship and fulfillment of methodologic criteria*

 Guidelines fulfilling criterion
Criterionspecialty by specialty societySponsored by Not sponsored societyP
Main outcomes identified6/9 (67)4/24 (17).010
Key stakeholders involved2/9 (22)19/21 (90).001
Systematic search and selection1/9 (11)6/24 (25).64
Recommendations linked to evidence2/9 (22)8/24 (33).69
Benefits and risks considered3/9 (33)18/24 (75).044
Resources/costs3/9 (33)11/24 (46).70
No industry influence2/4 (50)21/21 (100).02
Conflicts of interest stated2/9 (22)2/24 (8).30
*Data are presented as proportion (%).
Assessed with the Fisher exact test.
We did not take into account the guidelines for which we had insufficient information to assess whether the criterion was met.

Discussion

We found that nonadherence to rigorous methods when developing guidelines for hypertension and hyperlipidemia tends to be associated with more aggressive recommendations. We are not aware of other studies that have investigated the relation between methods and recommendations in clinical practice guidelines. The relatively small number of guidelines that met our inclusion criteria limited the power of our analyses, which rarely reached the conventional level of statistical significance (P

Many articles have assessed the methodologic quality of clinical practice guidelines with the use of similar criteria, all these studies found poor adherence to recommendations for guideline development.8-11 Grilli and colleagues found that “the quality of reporting of practice guidelines produced by specialty societies fell short of acceptable methodology” for the 431 guidelines they assessed.10(p104) Shaneyfelt and colleagues found no difference in methodologic rigor between guidelines published by specialty societies and those published by others but decided that methodologic criteria frequently were not met.11 We also found that methodologic criteria frequently were not met, and that they were met less often for guidelines sponsored by specialty societies than for those sponsored by other groups.

 

 

Stakeholder involvement, as we have defined it, was closely related to panel composition, which has been examined by others. For example, a link was found between panel composition and ratings of the appropriateness of procedures. Those who used a given procedure were more likely to rate it as appropriate than were those who did not use it.15,16 Murphy and coworkers found that “members of a specialty are more likely to advocate techniques that involve their special-ty.”17(p37) Savoie and colleagues, in their critical appraisal of guidelines for cholesterol testing, found that “the greater the involvement of clinical experts in the development process of the clinical practice guidelines, the less the recommendations reflected the research evidence.”9(p76) This is consistent with our finding that broader stakeholder involvement was associated with methodologic criteria being met more often.

In our study, guideline developers that did not use rigorous methods appeared more likely to promote aggressive intervention. This may be true for guidelines for conditions other than hypertension and hyperlipidemia. However, guideline developers also may introduce biases toward less aggressive recommendations, eg, purchasers of health services. The degree to which bias is likely and even the direction sometimes may be difficult to predict.

The quality among the guidelines we assessed was not associated with year of publication or the country where the guidelines were developed. The 6 guidelines fulfilling 5 or more of the quality criteria were not published more recently. The countries of origin for these 6 guidelines were Australia, Canada, France, the United Kingdom, and the United States.

There are strong logical reasons for users of guidelines to consider the methods used by guideline developers. Given the extent of disagreement among guidelines, it is necessary for users to understand the basis of those recommendations. This is only possible if guideline developers employ systematic methods and explicitly report the methods that were used. Our study provides empirical support of skepticism toward guidelines that have been developed without employing systematic methods.

· Acknowledgments ·

Wethank Signe Flottorp and Lena Nordheim for helping with the appraisal of non-English guidelines and Jonathan Lomas and Brian Hutchison who helped develop the idea for this study.

KEY POINTS FOR CLINICIANS

  • Many guidelines address the same problems, often with conflicting recommendations.
  • There is considerable variation regarding the methods the guideline developers use to make recommendations.
  • Guideline developers who did not use rigorous methods appeared to make more aggressive recommendations for screening and treatment (ie, were more likely to promote interventions).
Clinicians are inundated with clinical practice guidelines. Many guidelines address the same problems, often with conflicting recommendations. The disagreement concerning recommendations often hinges on how aggressive clinicians are in promoting interventions, eg, screening for prostate cancer or prescribe antibiotics for the treatment of sore throat. There is also variation with regard to the methods that guideline developers use to make recommendations. The traditional “GOBSAT”-technique (“Good Old Boys Sat At Table”) has been criticized.1 Several groups have proposed more systematic approaches for guideline development,2-6 and some databases, such as the Guideline Advisory Committee’s Recommended Clinical Practice Guidelines, include only those guidelines that have been assessed for the rigor of the development process.7

There are strong logical arguments for developing guidelines systematically, eg, to ensure that they are based on current best evidence, to protect against bias, and to make the process transparent and open to criticism. However, guideline developers frequently do not adhere to such methods.8-11 We explored the possible association between the methods used in guideline development and the recommendations given in those guidelines. We used guidelines for hypertension and hyperlipidemia in our study. Our hypothesis was that less rigorous methods would, on average, be associated with more aggressive recommendations in these guidelines.

Methods

Inclusion criteria

We defined clinical guidelines as recommendations intended to assist health professionals and patients in making decisions for specific clinical circumstances. To be included a guideline had to address at least 1 of the following issues: threshold for drug treatment of essential hypertension in primary prevention, threshold for drug treatment of hyperlipidemia in primary prevention, or identification of the target population for cholesterol screening. Guidelines were excluded if they did not clearly identify the panel responsible for developing the guideline, identify a sponsoring organization, or include a reference list. We excluded textbooks, editorials, and commentaries. Review articles that were prepared and published as background documents were included with the relevant clinical practice guidelines. We included guidelines published after 1992. When multiple versions were available from the same organization, we used the most recent version.

Search strategy

We searched MEDLINE from 1992 through February 2000 by using hypertension, blood pressure, hyperlipidemia, or cholesterol as the key term, limited to practice guidelines as publication type. In addition, we searched databases of guidelines maintained by several groups around the world. One author (A.F.) reviewed all the citations and reference lists of relevant guidelines, retrieved potentially relevant guidelines, and selected guidelines for inclusion.

Guideline development methods

We used 8 criteria to rate the methodologic quality of the guidelines (Table 1). The criteria were adapted from a guideline appraisal instrument that is being developed and tested by a group of European researchers (the Agree/Biomed collaboration).12 It is based on a British Appraisal Instrument for Clinical Guidelines 13 that has been tested for its validity and reliability and has been characterized as “the most well developed to date” in a recent study.14

Two authors (A.F. and J.W.W.) evaluated each guideline independently. For analytical purposes, all criteria were dichotomized (Table 1). Because fewer than 50% of the guidelines reported sufficient information to determine stakeholder involvement, we supplemented information on authors through an Internet search.

TABLE 1

Criteria used to appraise guideline-development methods

CriterionStandard for fulfillment
Main outcomes identified
Is there an explicit statement of the main outcomes considered when developing the guideline?Explicit statement of the main outcomes considered in developing the guidelines
Key stakeholders involved
Are the essential stakeholders involved in the development group?Inclusion of all “essential” stakeholders (generalist physicians, specialists, and methodologists)*
Systematic search and selection
Has a systematic search for evidence been carried out and are criteria for inclusion and exclusion specified?Search specifying all relevant databases or described “electronic databases” or
 Inclusion–exclusion criteria defined, at least briefly
Recommendations linked to evidence
Is there an explicit link between the evidence and the recommendations given?Grading of strength of recommendations or level of evidence
Benefits and risks considered
Have the health benefits, side effects, and risks been considered?Any quantitative or qualitative weighing of benefits and harms that is incorporated into formulating recommendations
Resources/costs
Has the impact on resources been considered?Economic analysis or qualitative consideration of cost issues that is linked explicitly to the formulation of the recommendations
No industry influence
Is the guideline developed without funding or influence from the pharmaceutical industry?No financial support from a pharmaceutical company and no involvement on the panel
Conflicts of interest stated
Is there an explicit statement of conflict of interest?Statement of potential conflicts of interest for panel members
*Essential stakeholders are generalist physicians, specialists, and methodologists. Optional stakeholders are patients, policy makers/health administrators, nurses, pharmacists, economists, and other physicians.
 

 

Aggressiveness of recommendations

To grade the aggressiveness of the treatment recommendations, we evaluated the threshold for initiating pharmacologic treatment of hypertension (low = aggressive), threshold for initiating pharmacologic treatment of hyperlipidemia (low = aggressive), first-line antihypertensive drug (all drugs = aggressive), and the number of persons eligible for cholesterol screening (high = aggressive).

The thresholds for treatment of hypertension and hyperlipidemia were categorized with 4 clinical scenarios. By applying the scenarios we could determine the recommended thresholds for drug treatment among the various guidelines. We chose these specific scenarios to illustrate the existing variation in recommendations among the guidelines:

  1. A 50-year-old man without a high-risk profile of cardiovascular disease.
  2. A 50-year-old man with a high-risk profile of cardiovascular disease.
  3. A 70-year-old man without a high-risk profile of cardiovascular disease.
  4. A 70-year-old man with a high-risk profile of cardiovascular disease.
High-risk profile was defined separately for treatment of hypertension and treatment of hyperlipidemia. For hypertension, the high-risk scenario was a patient who smoked and was hyperlipidemic (total cholesterol > 310 mg/dL). For hyperlipidemia, the high-risk scenario was a patient who smoked and was hypertensive (blood pressure > 160/100 mm Hg). We assumed that lifestyle interventions had been attempted. For simplicity we used only the systolic value for blood pressure. We dichotomized the aggressiveness of thresholds for lipid-lowering treatment rather than calculating averages because many guidelines gave recommendations other than actual cholesterol values; eg, no treatment or familial hypercholesterolemia. Guidelines were considered nonaggressive if, for 2 or more of the scenarios, the threshold was set at 310 mg/dL or higher or was specified as familial hypercholesterolemia.

Guidelines recommending all common antihypertensive drugs were considered aggressive, and the ones suggesting more restrictive recommendations were considered nonaggressive. In a few guidelines this recommendation depended on the patient’s age. For simplicity we examined the recommendations for 50-year-olds. We graded aggressiveness of recommendations on cholesterol screening by estimating the proportion of the general adult population (in Norway) who would be candidates for screening or by case finding per year, if the guidelines were fully implemented.

Analysis

We qualitatively and quantitatively examined the relation between fulfillment of a methodologic criterion and the aggressiveness of recommendations. The power of our statistical analyses was limited by the available sample size. For hypertension, we averaged the treatment threshold for the 4 clinical scenarios within each guideline and compared the overall mean between guidelines meeting and not meeting the criterion. For the threshold to treat hyperlipidemia and for first-line therapy for hypertension, the degree of aggressiveness was dichotomized. We used the Fisher exact test to calculate P values for the association between the proportion of guidelines fulfilling a methodologic criterion and whether the recommendation was classified as aggressive. For cholesterol screening we found the mean yearly proportion of the adult population eligible for screening among guidelines fulfilling a methodologic criterion, and compared this with the mean for guidelines that did not fulfill the criterion. In addition to examining the association between methods and recommendations, we examined whether the level of stakeholder involvement or sponsorship by specialty societies was associated with fulfillment of the methodologic criteria. We included generalist physicians, specialists, and methodologists as “essential stakeholders” and patients, policy makers/health administrators, nurses, pharmacists, economists, and other physicians as “optional stakeholders.”

Results

We found 12 clinical guidelines for managing hypertension, 12 for hyperlipidemia, 5 for cholesterol screening, and 4 general guidelines for the prevention of coronary heart disease that met our inclusion criteria (references are available from the authors). Because each guideline was appraised according to the 8 methodologic criteria, we ended up with 264 appraisals (8 criteria applied to each of the 33 guidelines). There were 28 disagreements (11%), all of which were easily resolved by discussion. As expected, there was variation among the guidelines regarding fulfillment of methodologic criteria and the aggressiveness of recommendations.

Most guidelines did not meet the majority of the methodologic criteria (Table 2). Only 6 of the 33 guidelines met 5 or more of the 8 criteria. The threshold to start antihypertensive treatment varied in systolic blood pressure from 140 to 180 mm Hg for each of the 4 clinical scenarios we applied to the guidelines (Table 3). For 3 of the scenarios, the threshold to treat hyperlipidemia ranged from a total cholesterol value of 190 mg/dL to more than 310 mg/dL (Table 3). Fifteen guidelines gave recommendations for first-line therapy for hypertension. Three recommended thiazides only; 6 recommended thiazides and β-blockers; 1 recommended thiazides, β-blockers, and angiotensinconverting enzyme inhibitors; and 5 recommended all commonly used drugs. Recommendations for cholesterol screening ranged from no screening to testing the entire adult population every 2 to 5 years.

 

 

TABLE 2

Variability in fulfillment of methodologic criteria

CriterionFulfilled, n (%)Not fulfilled, n (%)No information, n (%)
Main outcomes identified10 (30)23 (70)0 (0)
Key stakeholders involved21 (64)9 (27)3 (9)
Systematic search and selection7 (21)26 (79)0 (0)
Recommendations linked to evidence10 (30)23 (70)0 (0)
Benefits and risks considered21 (64)12 (36)0 (0)
Resources/costs14 (42)19 (58)0 (0)
No industry influence23 (70)2 (6)8 (24)
Conflicts of interest stated4 (12)29 (88)0 (0)
TABLE 3

Variability in guideline recommendations*

 BP threshold (mm Hg) to treat hypertension
Age, clinical scenario140150160170180
50 y, low risk4 (25)1 (6)5 (31)4 (25)2 (13)
50 y, high risk9 (56)2 (13)3 (19)1 (6)1 (6)
70 y, high risk6 (38)1 (6)5 (31)2 (13)2 (13)
70 y, low risk8 (50)1 (6)5 (31)1 (6)1 (6)
 Cholesterol threshold (mg/dL) to treat hyperlipidemia 
 ≤230 ≤270 ≥310 
50 y, low risk3 (19)3 (19)1 (6)9 (56) 
50 y, high risk6 (38)8 (50)02 (13) 
70 y, high risk5 (31)3 (19)2 (13)3 (19) 
70 y, low risk7 (44)5 (31)1 (6)3 (19) 
*Data are presented as number (%) of patients.
Includes the recommendations “no treatment” and “familial hypercholesterolemia.”
BP, blood pressure.

Associations between recommendations and methodologic criteria

The threshold to treat hypertension did not seem to be associated with fulfillment of methodologic criteria. Differences in recommendations for first-line drugs for hypertension were not strongly associated with any of the criteria. Although not statistically significant, there was a trend for guidelines to recommend all commonly available drugs when methodologic criteria were not met (Table W1, available on the JFP web site: http://www.jfponline.com).

For all but 1 quality criteria (main outcomes identified), fulfilling the criteria tended to be associated with a higher threshold to treat hyperlipidemia. Similarly, guidelines meeting quality criteria tended to give less aggressive recommendations for cholesterol screening than did guidelines not fulfilling the criteria. The criterion on stakeholder involvement was the exception, but this criterion was fulfilled by all but 1 of the guidelines (Table 4).

TABLE 4

Guidelines for hyperlipidemia: relation between adherence to methodologic criteria and recommendations given*

 Guidelines recommending a low treatment-threshold
CriterionCriterion fulfilledCriterion not fulfilledP
Main outcomes identified3/4 (75)7/12 (58)1.00
Key stakeholders involved5/10 (50)3/4 (75).58
Systematic search and selection0/310/13 (77).036
Recommendations linked to evidence1/3 (33)9/13 (69).52
Benefits and risks considered4/10 (40)6/6 (100).034
Resources/costs4/8 (50)6/8 (75).61
Conflicts of interest stated0/210/14 (71).13
 Population to screen annually §
Main outcomes identified8 (0–21)11 (5–17)3 (−8.1 to 14)
Key stakeholders involved11 (4.8–16)1−10||,¶
Systematic search and selection4 (0–9.8)12 (6.2–18)8 (−2.3 to 18)
Recommendations linked to evidence6 (0–13)11 (5.3–18)5 (−5.5 to 16)
Benefits and risks considered8 (1.2–15)14 (6.0–21)6 (−4.1 to 15)
Resources/costs6 (0.9–12)14 (5.8–22)8 (−1.2 to 16)
Conflicts of interest stated012 (6.6–16)12 (−1.8 to 25)
*The criterion on industry influence is not included because all the guidelines either fulfilled the criterion or provided insufficient information to assess if the criterion was met
P values assessed with the Fisher exact test.
Guidelines in which the threshold to treat is less than 310 mg/dL for 3 or more of the clinical scenarios described in the text. Data are presented as proportion (%).
§ Data are presented as percentage (95% confidence interval).
|| One guideline did not fulfill this criterion, so confidence intervals could not be calculated.
Difference in percentage (95% confidence interval).

Stakeholder involvement and sponsorship by specialty societies

Guidelines that involved major stakeholders in the development process tended to fulfill the methodologic criteria to a greater extent than did guidelines that did not (Table W2, available on the JFP web site: http://www.jfponline.com). Nine of the 33 guidelines were sponsored by specialty societies. These fulfilled the methodologic criteria less often than did other guidelines (Table 5).

TABLE 5

Relation between specialty society sponsorship and fulfillment of methodologic criteria*

 Guidelines fulfilling criterion
Criterionspecialty by specialty societySponsored by Not sponsored societyP
Main outcomes identified6/9 (67)4/24 (17).010
Key stakeholders involved2/9 (22)19/21 (90).001
Systematic search and selection1/9 (11)6/24 (25).64
Recommendations linked to evidence2/9 (22)8/24 (33).69
Benefits and risks considered3/9 (33)18/24 (75).044
Resources/costs3/9 (33)11/24 (46).70
No industry influence2/4 (50)21/21 (100).02
Conflicts of interest stated2/9 (22)2/24 (8).30
*Data are presented as proportion (%).
Assessed with the Fisher exact test.
We did not take into account the guidelines for which we had insufficient information to assess whether the criterion was met.

Discussion

We found that nonadherence to rigorous methods when developing guidelines for hypertension and hyperlipidemia tends to be associated with more aggressive recommendations. We are not aware of other studies that have investigated the relation between methods and recommendations in clinical practice guidelines. The relatively small number of guidelines that met our inclusion criteria limited the power of our analyses, which rarely reached the conventional level of statistical significance (P

Many articles have assessed the methodologic quality of clinical practice guidelines with the use of similar criteria, all these studies found poor adherence to recommendations for guideline development.8-11 Grilli and colleagues found that “the quality of reporting of practice guidelines produced by specialty societies fell short of acceptable methodology” for the 431 guidelines they assessed.10(p104) Shaneyfelt and colleagues found no difference in methodologic rigor between guidelines published by specialty societies and those published by others but decided that methodologic criteria frequently were not met.11 We also found that methodologic criteria frequently were not met, and that they were met less often for guidelines sponsored by specialty societies than for those sponsored by other groups.

 

 

Stakeholder involvement, as we have defined it, was closely related to panel composition, which has been examined by others. For example, a link was found between panel composition and ratings of the appropriateness of procedures. Those who used a given procedure were more likely to rate it as appropriate than were those who did not use it.15,16 Murphy and coworkers found that “members of a specialty are more likely to advocate techniques that involve their special-ty.”17(p37) Savoie and colleagues, in their critical appraisal of guidelines for cholesterol testing, found that “the greater the involvement of clinical experts in the development process of the clinical practice guidelines, the less the recommendations reflected the research evidence.”9(p76) This is consistent with our finding that broader stakeholder involvement was associated with methodologic criteria being met more often.

In our study, guideline developers that did not use rigorous methods appeared more likely to promote aggressive intervention. This may be true for guidelines for conditions other than hypertension and hyperlipidemia. However, guideline developers also may introduce biases toward less aggressive recommendations, eg, purchasers of health services. The degree to which bias is likely and even the direction sometimes may be difficult to predict.

The quality among the guidelines we assessed was not associated with year of publication or the country where the guidelines were developed. The 6 guidelines fulfilling 5 or more of the quality criteria were not published more recently. The countries of origin for these 6 guidelines were Australia, Canada, France, the United Kingdom, and the United States.

There are strong logical reasons for users of guidelines to consider the methods used by guideline developers. Given the extent of disagreement among guidelines, it is necessary for users to understand the basis of those recommendations. This is only possible if guideline developers employ systematic methods and explicitly report the methods that were used. Our study provides empirical support of skepticism toward guidelines that have been developed without employing systematic methods.

· Acknowledgments ·

Wethank Signe Flottorp and Lena Nordheim for helping with the appraisal of non-English guidelines and Jonathan Lomas and Brian Hutchison who helped develop the idea for this study.

References

1. Miller J, Petrie J. Development of practice guidelines. Lancet 2000;355:82-3.

2. Eddy D. A Manual for Assessing Health Practices and Designing Practice Policies: The Explicit Approach. Philadelphia: American College of Physicians; 1992.

3. Scottish Intercollegiate Guidelines Network. An Introduction to SIGN Methodology for the Development of Evidence-Based Clinical Guidelines. Vol 39. Edinburgh: Scottish Intercollegiate Guidelines Network; 1999.

4. Shekelle PG, Woolf SH, Eccles M, Grimshaw J. Clinical guidelines: developing guidelines. BMJ 1999;318:593-6.

5. Guidelines for Clinical Practice. From Development to Use. Washington, DC: National Academy Press; 1992.

6. Canadian Medical Association. Quality of Care Program: The Guidelines for Canadian Clinical Practice. Ottawa: Canadian Medical Association; 1993.

7. Guideline Advisory Committee. Recommended clinical practice guidelines. Available at: http://www.gacguidelines.ca/ aboutGAC. html. Accessed April 4, 2002.

8. Gibson P. Asthma guidelines and evidence-based medicine. Lancet 1993;342:1305.-

9. Savoie I, Kazanjian A, Bassett K. Do clinical practice guidelines reflect research evidence? J Health Serv Res Policy 2000;5:76-82.

10. Grilli R, Magrini N, Penna A, Mura G, Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet 2000;355:103-6.

11. Shaneyfelt TM, Mayo-Smith MF, Rothwangl J. Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA 1999;281:1900-5.

12. St George’s Hospital Medical School Health Care Evaluation Unit. Appraisal of guidelines research & evaluation. Available at: http://www.sghms.ac.uk/depts/phs/hceu/biomed.htm. Accessed December 12, 2000.

13. Cluzeau FA, Littlejohns P, Grimshaw J, Feder G. Appraisal Instrument for Clinical Guidelines. London: St George’s Hospital Medical School; 1997.

14. Graham ID, Calder LA, Hébert PC, Carter AO, Tetroe JM. A comparison of clinical practice guideline appraisal instruments. Int J Technol Assess Health Care 2000;16:1024-38.

15. Herrin J, Etchason JA, Kahan JP, Brook RH, Ballard DJ. Effect of panel composition on physician ratings of appropriateness of abdominal aortic aneurysm surgery: elucidating differences between multispecialty panel results and specialty society recommendations. Health Policy 1997;42:67-81.

16. Coulter I, Adams A, Shekelle P. Impact of varying panel membership on ratings of appropriateness in consensus panels: a comparison of a multi- and single disciplinary panel. Health Serv Res 1995;30:577-91.

17. Murphy MK, Black NA, Lamping DL, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess 1998;2(3):i-iv,1-88.

References

1. Miller J, Petrie J. Development of practice guidelines. Lancet 2000;355:82-3.

2. Eddy D. A Manual for Assessing Health Practices and Designing Practice Policies: The Explicit Approach. Philadelphia: American College of Physicians; 1992.

3. Scottish Intercollegiate Guidelines Network. An Introduction to SIGN Methodology for the Development of Evidence-Based Clinical Guidelines. Vol 39. Edinburgh: Scottish Intercollegiate Guidelines Network; 1999.

4. Shekelle PG, Woolf SH, Eccles M, Grimshaw J. Clinical guidelines: developing guidelines. BMJ 1999;318:593-6.

5. Guidelines for Clinical Practice. From Development to Use. Washington, DC: National Academy Press; 1992.

6. Canadian Medical Association. Quality of Care Program: The Guidelines for Canadian Clinical Practice. Ottawa: Canadian Medical Association; 1993.

7. Guideline Advisory Committee. Recommended clinical practice guidelines. Available at: http://www.gacguidelines.ca/ aboutGAC. html. Accessed April 4, 2002.

8. Gibson P. Asthma guidelines and evidence-based medicine. Lancet 1993;342:1305.-

9. Savoie I, Kazanjian A, Bassett K. Do clinical practice guidelines reflect research evidence? J Health Serv Res Policy 2000;5:76-82.

10. Grilli R, Magrini N, Penna A, Mura G, Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet 2000;355:103-6.

11. Shaneyfelt TM, Mayo-Smith MF, Rothwangl J. Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA 1999;281:1900-5.

12. St George’s Hospital Medical School Health Care Evaluation Unit. Appraisal of guidelines research & evaluation. Available at: http://www.sghms.ac.uk/depts/phs/hceu/biomed.htm. Accessed December 12, 2000.

13. Cluzeau FA, Littlejohns P, Grimshaw J, Feder G. Appraisal Instrument for Clinical Guidelines. London: St George’s Hospital Medical School; 1997.

14. Graham ID, Calder LA, Hébert PC, Carter AO, Tetroe JM. A comparison of clinical practice guideline appraisal instruments. Int J Technol Assess Health Care 2000;16:1024-38.

15. Herrin J, Etchason JA, Kahan JP, Brook RH, Ballard DJ. Effect of panel composition on physician ratings of appropriateness of abdominal aortic aneurysm surgery: elucidating differences between multispecialty panel results and specialty society recommendations. Health Policy 1997;42:67-81.

16. Coulter I, Adams A, Shekelle P. Impact of varying panel membership on ratings of appropriateness in consensus panels: a comparison of a multi- and single disciplinary panel. Health Serv Res 1995;30:577-91.

17. Murphy MK, Black NA, Lamping DL, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess 1998;2(3):i-iv,1-88.

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
963-968
Page Number
963-968
Publications
Publications
Article Type
Display Headline
The relation between methods and recommendations in clinical practice guidelines for hypertension and hyperlipidemia
Display Headline
The relation between methods and recommendations in clinical practice guidelines for hypertension and hyperlipidemia
Legacy Keywords
,Practice guidelineshypertensionhyperlipidemiaevidence-based medicine. (J Fam Pract 2002; 51:963–968)
Legacy Keywords
,Practice guidelineshypertensionhyperlipidemiaevidence-based medicine. (J Fam Pract 2002; 51:963–968)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

Delayed antibiotic prescriptions: What are the experiences and attitudes of physicians and patients?

Article Type
Changed
Mon, 01/14/2019 - 10:57
Display Headline
Delayed antibiotic prescriptions: What are the experiences and attitudes of physicians and patients?

KEY POINTS FOR CLINICIANS

  • Delayed antibiotic prescriptions are effective in decreasing antibiotic use for conditions not clinically warranting antibiotics.
  • Family practitioners valued empowering patients to be more involved in decision making about their health care management more highly than did patients.
  • Family practitioners generally viewed the strategy as giving patients reassurance and meeting their expectations for antibiotics.
  • Both patients and physicians agreed that delayed prescribing is not appropriate for all patients, but currently no consistent criteria have been established.
Family physicians often prescribe antibiotics for common colds despite being aware of their marginal effectiveness for such.1,2 Major contributing factors are overt patient expectation or demand for antibiotics3-5 and the physician’s perception that the patient expects antibiotics.6,7 Detrimental effects of antibiotic overuse include adverse effects on patients, development of antibioti-cresistant bacteria,8,9 and increased health care costs.10-12

Although it is possible to “just say no” to patients’ demands for antibiotics,13 family physicians may be under considerable pressure to prescribe. A strategy to decrease prescribing unnecessary antibiotics without damaging the physician–patient relationship involves giving a delayed (or deferred) prescription, which is a prescription to be filled at a later time if the patient’s condition fails to improve or deteriorates.14 Couchman et al14 reported that 50% of patients given “‘back-up”’ antibiotic prescriptions did not fill them. Cates15 found that delayed prescribing significantly decreased antibiotic use in children with acute otitis media. Results of a randomized controlled trial (RCT) found that 55% of patients with uncomplicated cough did not fill their delayed prescription, although patients demonstrated some dissatisfaction with the strategy.16 Little et al studied its effectiveness in managing sore throat17 and otitis media.18 In our recently published RCT19 we reported that delayed prescribing significantly decreased the filling of antibiotic prescriptions for the common cold.

The use and effectiveness of a new medical intervention is influenced by how the intervention is viewed by both physicians and patients. Delayed prescription use for the common cold has not been assessed in any qualitative study, although the topic of 1 qualitative study was antibiotic prescribing for sore throats.2 The researchers found that although making the diagnosis was not difficult, treatment was a problem because one third of patients expected to be prescribed antibiotics. Our aim was to explore issues and attitudes regarding delayed prescription use from the perspectives of family physicians and patients.

Methods

We used a qualitative approach (1) to explore the complexity of, and relations between, issues identified in delayed prescription use, and (2) to describe the experiences and attitudes of both physicians and patients regarding delayed prescription use. The physicians were recruited from a list of high-prescribers (20 or more delayed prescriptions per month) or low prescribers (1 or fewer delayed prescriptions per month). This list had been prepared for a previous study in which 100 random family physicians had reported their use of delayed prescribing.1 Patients were recruited from the intervention arm of an RCT on delayed prescribing that examined the hypothesis that delayed prescriptions would result in decreased use of antibiotics for the common cold.19 Inclusion criteria comprised both patients receiving delayed prescriptions and parents of children receiving delayed prescriptions. Patients in the RCT had given written consent to subsequent interview for the qualitative study. Approval for our study was granted by the Auckland Ethics Committee.

Thirteen physicians and 13 patients were interviewed by telephone (F.G.-S. served as the interviewer). Purposive sampling was used to deliberately include “outliers”’ with respect to characteristics such as sex, socioeconomic level, and geographic location.20,21 This built sample diversity with respect to different subjects and themes along the main topics of interest (eg, the advantages and disadvantages of delayed prescribing) to improve data robustness. The physicians comprised men and women ranging in age from their 30s to 60s, including both New Zealand–trained physicians and immigrants (from Asia and South Africa) with practice locations ranging from lower to upper middle-class suburbs. Both male and female patients were interviewed, ranging in age from adolescent to elderly (specific ages unavailable). Parents of children receiving delayed prescriptions were included in the patient population. Ethnicity and socioeconomic level included those of European, Moori, and Asian extraction from family backgrounds of differing socioeconomic districts.

The interview data were collected in an iterative process in which themes from the early interviews were specifically checked in later interviews. Interviewing ceased once data saturation had occurred, ie, when no new themes emerged.22-24 Family physicians were paid for their time. Semistructured, open-ended questions were progressively focused into more structured questions. Questions for physicians included their views on delayed prescribing; the duration, frequency, and circumstances of their use of delayed prescribing; and their perceived advantages and disadvantages of delayed prescribing. Questions for patients included their experiences of receiving delayed prescriptions; their preferences for decision making regarding antibiotic use; and their views about delayed prescribing. The interviews were audiotaped, and although the hand-written interview notes were not transcribed, they were checked against the audio recordings. Recording ceased once it was established that concurrent hand-written notes were similar (nearly verbatim) to the recorded versions. Interviews typically lasted between 10 and 20 minutes.

 

 

A general inductive approach, similar to grounded theory, was used. Individual written interview responses were initially analyzed to identify subthemes. Interviews were then collated and analyzed for emerging categories. These were combined into major themes through ongoing discussions with an experienced qualitative researcher (D.T.) and rereading of the transcripts by the first 3 authors until consensus was reached regarding the main themes being expressed. The data were double-coded by an independent researcher (N.K.) as a consistency check, and discrepancies were resolved by negotiation between 2 of the researchers (N.K., F.G.). Patient and physician data sets were coded separately.

Results

A picture emerged of both advantages and disadvantages of delayed prescription use. An associated scenario was the variability of criteria used to decide whether delayed prescribing was considered appropriate or inappropriate. Seven primary themes were identified (Table 1): value judgment of antibiotics, decreased antibiotic use, patient-centered factors, effects on the physician–patient relationship, patient convenience, adverse effects of delayed prescribing, and selectivity for use. Many themes were common to both groups of subjects. Examples of their responses illustrating the primary themes are shown in Table 2.

TABLE 1

Descriptions of themes

ThemeDescription
Value judgment of antibioticsThe perception that antibiotics are either “good and necessary” for people or “bad” for people
Decreased antibiotic useThe desire to decrease unnecessary antibiotic use to avoid patient side-effects, to decrease the drug bill for taxpayers, and to decrease the development of resistance to antibiotics
Patient-centered factorsThe ability of physicians to educate patients and empower them to be more involved in decision making about their health care management
Effects on the physician–patient relationshipThe perception that delayed prescribing might have ither positive effects (eg, reassuring patients and meeting their expectations for antibiotics) or negative effects (eg, negative patient perception of physician competence or increased patient concerns about entitlement from the health system)
Patient convenienceThe time and money patients save
Adverse effects of delayed prescribingThe possible adverse effects, with potential medicolegal ramifications: the missing or masking of serious illness; the physician losing control of the patient’s medical situation; the physician becoming less able to monitor outcomes; the possibility that some patients might still take antibiotic unnecessarily; and/or the possibility that antibiotics might be saved and later used inappropriately by another family member
Selectivity for useThe factors determining who might get a delayed prescription: patient age, education, ability to understand English, transiency to the practice, and other varying criteria for use regarding specific conditions
TABLE 2

Answers from physicians and patients interviewed about their use of delayed prescriptions

ThemeQuotes from physiciansQuotes from patients
Value judgment of antibiotics “I wanted to prove to myself I could get better without antibiotics.”
 “I expect to get antibiotics if I go to the physician with the flu.”
Decreased antibiotic use“Using a delayed prescription means you don’t give unnecessary antibiotics.”“I don’t like putting unnecessary drugs intomy body.”
Patient-centered factors“[Use of delayed prescribing provides] “an opportunity to educate and empower patients, allows them to make decisions for themselves, and offers them convenience of both access and cost. Otherwise they would have to return [to the physician’s office] if they [their condition] deteriorated.”“I like to decide for myself; I know when I need antibiotics.”
 “I prefer the physician to make the decision.”
Effects on the physician–patient relationship“The patient goes out the door with something. It does not damage the physician–patient relationship; the patient does not feel short-changed.”“Some parents panic; it helps to ease their “minds.”
“The patient might think you don’t know “ what you are doing, that you are sitting on the fence … the patient has decided to come to physician for advice and wants to be told what to do, not [be] given more options.”“If I go to the physician, it is because I know I need antibiotics.”
 “Younger physicians these days don’t have the experience; they are too busy to know when something is really wrong.”
Patient convenience“Patient convenience: preventing afterhours office visits saves the patient time and money.”“It saved [me] time and money.”
Adverse effects of delayed prescribing“Some patients will start right away anyway and use antibiotic when they really don’t need it. … [the physician has] no way of knowing whether they take it or not… [patients] may not seek medial attention if they get sicker because they have started the antibiotic and assume that’s all that can be done.”“Some people might take it [antibiotics] unnecessarily.”
“There could be medicolegal problems with a litigious patient.”“Maybe some people need to be told what to do and would get confused.”
Selectivity for use“I would never give [a delayed prescription] to very small children, infants, or even children younger than 3 or 4 [years].”“[Delayed prescribing is] good for me but not necessarily for everybody. Many people … have a very poor understanding of medicines.”
“I mostly use it [delayed prescribing] in children younger than 6 years.“I want the physician to make the decision when it’s my children; I’d rather take them back [to the office if necessary].”
“I don’t use [delayed prescribing with] patients, especially elderly ones, having a past history of chronic illness—such as bronchitis, excessive smoking, sinusitis— that has required antibiotics.” 
 

 

Value judgment of antibiotics

The theme of “value judgment of antibiotics” was evident only among patients. Several expressed the opinion that antibiotics were the necessary treatment to take every time they became ill. Conversely, other patients considered antibiotics bad for them and preferred to use alternatives such as naturopathic medications.

Decreased antibiotic use

The primary motivation for delayed prescribing by physicians was to decrease unnecessary antibiotic use. Benefits include avoiding patient side effects; decreasing the drug bill for taxpayers; and, especially, decreasing the occurrence of antibiotic resistance. Several patients made comments relevant to this theme. None identified decreasing resistance as an important goal, but 3 patients said the strategy could help avoid unwarranted antibiotic use.

Patient-centered factors

“Patient-centered factors” was a strong theme to emerge—especially from high-prescriber physicians. The physicians indicated that delayed prescribing helped them practice more patient-centered medicine—educating patients to take more responsibility for their own health care management and being more receptive to patient needs. Some physicians took into account pending weekends or patients’ travel or work commitments when offering delayed prescriptions. Although some patients mentioned their involvement in decision making, this aspect generally was not a key factor for many of them. Some liked to make the decision for themselves, which included using their “delayed” prescription immediately. Most patients did not wish to have an active role in decision making and preferred their physicians to decide for them. No patients commented on the role of the physician in providing them with education on their health matters.

Effects on the physician–patient relationship

The theme of “effects on the physician–patient relationship” delineated an associated factor for physicians: the strategy of delayed prescribing strengthened physician–patient relationships by helping physicians cope with the pressure they experienced from patients expecting antibiotics for common colds; by reassuring patients; by giving patients something to take home; and by preventing patients from going to a different physician to obtain antibiotics. An alternative view, expressed by one low-prescriber physician, was that delayed prescribing might damage the physician–patient relationship because the patient might consider the physician incompetent.

For a few patients, use of delayed prescriptions was reassuring. Several patients’ expectations that antibiotics were required persisted at the end of the consultation, and they chose to have their prescriptions filled immediately. Presumably, they would have gone elsewhere had they left the consultation empty-handed. Use of delayed prescribing had a potential negative effect on the physician–patient relationship for at least 2 patients. They perceived delayed prescribing as an indication of physician indecisiveness and incompetence or that the physician was trying to hold down costs to the patient at the risk of the patient’s being ill.

Patient convenience

The theme of “patient convenience” and cost savings was a strong theme among physicians and less so among patients. Several patients identified that delayed prescriptions could save them trouble and expense. For 3 patients this was not an issue, but they acknowledged it could be of value to busy working people or low-income patients.

Adverse effects of delayed prescribing

Regarding the theme of “adverse effects of delayed prescribing,” some physicians saw little or no disadvantage if delayed prescriptions were given to the right patients with correct instructions. However, low-prescribers identified a number of possible adverse effects of delayed prescriptions, such as leading to missing or masking serious illness, with possible medicolegal ramifications. Physicians were concerned about being perceived by patients as losing control of the situation and being less able to monitor outcomes. Even using delayed prescribing, some patients might still take antibiotics unnecessarily. The possibility also exists that the antibiotic might be saved and later used inappropriately by another family member.

Patients identified several potential problems, often for people other than themselves. Not only could delayed prescriptions have the potential to be confusing, especially for less-educated people, but 1 patient thought the practice might lead to patients taking antibiotics unnecessarily.

Selectivity for use

Physicians generally were selective about patients for whom they considered delayed prescribing appropriate. Patients who were poorly educated, who had a bad command of English, or who were transient to the practice were identified as poor candidates for receiving delayed prescriptions. Most physicians restricted delayed prescriptions to a particular age range. However, within this category there was considerable variability and inconsistency. Many used delayed prescriptions only for children, with children younger than 6 years being the most suitable group; others used delayed prescribing only for children older than 6 to 8 years. One physician would not use the strategy in very young children, ie, younger than 3 years. There was no consensus regarding circumstances or specific instructions for use. Some used delayed prescribing only with clearly viral illnesses; others employed the strategy in patients with chronic illnesses during which secondary infection was more likely. Instructions varied regarding symptoms to watch out for and how long to wait before filling the prescription.

 

 

Selection of patients was also a dominant theme for patients. Although they thought delayed prescribing might be acceptable for themselves, a number of patients believed that others might not understand or get confused. One patient was happy to make decisions about her own management, but believed the physician should make decisions about her children. Patients did not venture any opinions regarding conditions for which they thought use of delayed prescribing was warranted.

Discussion

Delayed prescribing is a strategy developed primarily to decrease unnecessary antibiotic use in the management of upper respiratory tract infections. Although physicians emphasized the importance of decreasing antimicrobial resistance, patients did not consider this factor. Continued public health education on this issue, including family physicians providing pertinent information to individual patients, could be helpful. Many patients have relatively fixed ideas that antibiotics are either “good” or “bad” for their health without knowing the personal and public health nuances of antibiotic prescribing.

Patients may pressure their physicians for unnecessary antibiotics either by direct request or indirectly by the way they present their complaint.25 Physicians may also incorrectly perceive that patients want antibiotics.6 This study showed that physicians are likely to use delayed prescription as a technique to decrease antibiotic use in patients they perceive as wanting antibiotics regardless of the medications’ appropriateness.

Empowering patients to have more control over their health care management was more important to physicians than patients. Patients held differing views, and whereas some appreciated the option of controlling the decision whether and when to take antibiotics, others expected “the physician to decide.” Perhaps improved physician–patient communication, as well as delayed prescribing, could help patients better understand about antibiotic use.

Many patients in this study had previously received antibiotics for common colds. Most physicians believed that using delayed prescriptions was a compromise strategy that prevented patients from feeling brushed off and offered reassurance, thus protecting the physician–patient relationship. Some patients reciprocated this view. However, a concern expressed by 1 physician that patients might view delayed prescribing as physician incompetence was substantiated by comments from other patients.

The potential adverse effects identified by some of the physicians, such as a serious disorder being masked or missed and physicians having less medical control, could be largely remedied by establishing criteria for suitable patient selection and improved educational resources as suggested above. Physicians and patients both expressed that some patients might automatically have their prescriptions filled and thus take antibiotics unnecessarily. Given that 2 patients had their “delayed” prescriptions filled immediately, this concern appears justified. No safeguard could entirely prevent inappropriate use by other family members.

Both physicians and patients commented that delayed prescribing is not appropriate for all patients. Patients need to understand the explanation of why antibiotics are not currently indicated and the instructions as to when they might be needed. In our opinion, patient comprehension might be greatly assisted by the use of clear handouts explaining, in patient-friendly terms, the management of upper respiratory tract infections.

The use of delayed prescribing in family practice is becoming more common.14,16-19 Considerable inconsistency and contradictory practices were found regarding its use in children and adults. Such diversity in physicians’ views regarding suitable ages raises questions about the optimal use of delayed prescriptions. Similarly, no consensus was found regarding circumstances and instructions under which physicians would use delayed prescribing. The development of more formalized recommendations regarding patient suitability and criteria for delayed prescribing is needed.

Given the concern that some patients might be confused about when to use a delayed prescription, placing the prescription in an envelope with clearly written instructions (ie, when to use and under what conditions) on the outside might ameliorate this difficulty. This practice might serve to further decrease unnecessary antibiotic use. Alternatively, special patient instructions in written form may be warranted, as was done in a controlled before–after study of delayed prescriptions for otitis media.15

In conclusion, previous research has shown delayed prescribing to be an effective means of decreasing antibiotic consumption for conditions not clinically warranting their use.14,15,19 However, not all physicians or patients demonstrated complete satisfaction with the strategy, and both groups agreed that selectivity is required for issuing a delayed prescription. Unlike interventions such as administering new drugs, physicians have spontaneously and independently generated the practice of delayed prescribing. Consequently, the practice varies considerably with respect to which patients, conditions, and instructions are considered appropriate. Formalizing recommendations for patient suitability and instructions for use may be required to ensure safety and consistency. Long-term safety issues will need to be monitored using longitudinal, large-cohort studies.

References

1. Arroll B, Goodyear-Smith F. General practitioner management of upper respiratory tract infections: when are antibiotics prescribed? N Z Med J 2000;13:493-6.

2. Butler CC, Rollnick S, Pill R, Maggs-Rapport F, Stott N. Understanding the culture of prescribing: qualitative study of general practitioners’ and patients’ perceptions of antibiotics for sore throats. Br Med J 1998;317:637-42.

3. Arroll B, Everts N. The common cold: what does the public think and want? N Z Fam Physician 1999;26:51-6.

4. Macfarlane J, Holmes W. Influence of patients’ expectations on antibiotic management of acute lower respiratory tract illness in general practice: questionnaire study. Br Med J 1997;315:1211-4.

5. Palmer DA, Bauchner H. Parents’ and physicians’ views on antibiotics. Pediatrics 1997;99:E6.-

6. Cockburn J, Pit S. Prescribing behaviour in clinical practice: patients’ expectations and physicians’ perceptions of patients’ expectations—a questionnaire study. Br Med J 1997;315:520-3.

7. Britten N. Patients demands for prescriptions in primary care. Br Med J 1995;310:1084-5.

8. Arason VA, Kristinsson KG, Sigurdsson JA, Stefansdottir G, Molstad S, Gudmundsson S. Do antimicrobials increase the carriage rate of penicillin resistant pneumococci in children? Cross sectional prevalence study. Br Med J 1996;313:387-91.

9. Verkatesum P, Innes JA. Antibiotic resistance in common acute respiratory pathogens. Thorax 1995;50:481-3.

10. McCaig LF, Hughes JM. Trends in antimicrobial drug prescribing among office-based physicians in the United States. JAMA 1995;273:214-9.

11. Hueston WJ, Mainous AG, 3rd, Ornstein S, Pan Q, Jenkins R. Antibiotics for upper respiratory tract infections. Follow-up utilization and antibiotic use. Arch Fam Med 1999;8:426-30.

12. Mainous AG, 3rd, Hueston WJ. The cost of antibiotics in treating upper respiratory tract infections in a Medicaid population. Arch Fam Med 1998;7:45-9.

13. Thomas MG, Arroll B. “Just say no”—reducing the use of antibiotics for colds, bronchitis and sinusitis. N Z Med J 2000;113:287-9.

14. Couchman GR, Rascoe TG, Forjuoh SN. Back-up antibiotic prescriptions for common respiratory symptoms. Patient satisfaction and fill rates. J Fam Pract 2000;49:907-13.

15. Cates C. An evidence based approach to reducing antibiotic use in children with acute otitis media: controlled before and after study. Br Med J 1999;318:715-6.

16. Dowell J, Pitkethly M, Bain J, Martin S. A randomised controlled trial of delayed antibiotic prescribing as a strategy for managing uncomplicated respiratory tract infection in primary care. Br J Gen Pract 2001;51:200-5.

17. Little P, Williamson I, Warner G, Gould C, Gantley M, Kinmonth AL. Open randomised trial of prescribing strategies in managing sore throat. Br Med J 1997;314:722-7.

18. Little P, Gould C, Williamson I, Moore M, Warner G, Dunleavey J. Pragmatic randomised controlled trial of two prescribing strategies for childhood acute otitis media. Br Med J 2001;322:336-42.

19. Arroll B, Kenealy T, Kerse N. Does a delayed prescription reduce unnecessary antibiotic use? A randomized controlled trial. J Fam Pract 2002;51:324-8.

20. Curtis S, Gesler W, Smith G, Washburn S. Approaches to sampling and case selection in qualitative research: examples in the geography of health. Soc Sci Med 2000;50:1001-4.

21. Barbour RS. Checklists for improving rigour in qualitative research: a case of the tail wagging the dog? Br Med J 2001;322:1115-7.

22. Guba EG, Lincoln YS. Competing paradigms in qualitative research. In: Denzin NK, Lincoln YS, eds. Handbook of Qualitative Research. Thousand Oaks, Calif: Sage Publications Inc; 1994;105-117.

23. Kuzel AJ, Engel JD, Addison RB, Bogdewic SP. Desirable features of qualitative research. Fam Pract Res J 1994;14:369-78.

24. Strauss A, Corbin J. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1998.

25. Scott JG, Cohen D, DiCicco-Bloom B, Orzano AJ, Jaen CR, Crabtree BF. Antibiotic use in acute respiratory infections and the ways patients pressure physicians for a prescription. J Fam Pract 2001;50:853-8.

Article PDF
Author and Disclosure Information

BRUCE ARROLL, MB, CHB, PHD
FELICITY GOODYEAR-SMITH, MB, CHB, MGP
DAVIDR. THOMAS, BA, MA, PHD
NGAIRE KERSE, MB, CHB, PHD
Auckland, New Zealand
From the Division of General Practice and Primary Health Care (B.A., F.G.-S., N.K.) and the Department of Community Health, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand. The authors report no competing interests. Address reprint requests to Bruce Arroll, MB, ChB, PhD, Division of General Practice and Primary Health Care, Faculty of Medical and Health Sciences, University of Auckland Private Bag 92019, Auckland, New Zealand. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
954-959
Legacy Keywords
,Antibioticsfamily practicequalitative evaluationupper respiratory tract infection. (J Fam Pract 2002; 51:954-959)
Sections
Author and Disclosure Information

BRUCE ARROLL, MB, CHB, PHD
FELICITY GOODYEAR-SMITH, MB, CHB, MGP
DAVIDR. THOMAS, BA, MA, PHD
NGAIRE KERSE, MB, CHB, PHD
Auckland, New Zealand
From the Division of General Practice and Primary Health Care (B.A., F.G.-S., N.K.) and the Department of Community Health, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand. The authors report no competing interests. Address reprint requests to Bruce Arroll, MB, ChB, PhD, Division of General Practice and Primary Health Care, Faculty of Medical and Health Sciences, University of Auckland Private Bag 92019, Auckland, New Zealand. E-mail: [email protected].

Author and Disclosure Information

BRUCE ARROLL, MB, CHB, PHD
FELICITY GOODYEAR-SMITH, MB, CHB, MGP
DAVIDR. THOMAS, BA, MA, PHD
NGAIRE KERSE, MB, CHB, PHD
Auckland, New Zealand
From the Division of General Practice and Primary Health Care (B.A., F.G.-S., N.K.) and the Department of Community Health, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand. The authors report no competing interests. Address reprint requests to Bruce Arroll, MB, ChB, PhD, Division of General Practice and Primary Health Care, Faculty of Medical and Health Sciences, University of Auckland Private Bag 92019, Auckland, New Zealand. E-mail: [email protected].

Article PDF
Article PDF

KEY POINTS FOR CLINICIANS

  • Delayed antibiotic prescriptions are effective in decreasing antibiotic use for conditions not clinically warranting antibiotics.
  • Family practitioners valued empowering patients to be more involved in decision making about their health care management more highly than did patients.
  • Family practitioners generally viewed the strategy as giving patients reassurance and meeting their expectations for antibiotics.
  • Both patients and physicians agreed that delayed prescribing is not appropriate for all patients, but currently no consistent criteria have been established.
Family physicians often prescribe antibiotics for common colds despite being aware of their marginal effectiveness for such.1,2 Major contributing factors are overt patient expectation or demand for antibiotics3-5 and the physician’s perception that the patient expects antibiotics.6,7 Detrimental effects of antibiotic overuse include adverse effects on patients, development of antibioti-cresistant bacteria,8,9 and increased health care costs.10-12

Although it is possible to “just say no” to patients’ demands for antibiotics,13 family physicians may be under considerable pressure to prescribe. A strategy to decrease prescribing unnecessary antibiotics without damaging the physician–patient relationship involves giving a delayed (or deferred) prescription, which is a prescription to be filled at a later time if the patient’s condition fails to improve or deteriorates.14 Couchman et al14 reported that 50% of patients given “‘back-up”’ antibiotic prescriptions did not fill them. Cates15 found that delayed prescribing significantly decreased antibiotic use in children with acute otitis media. Results of a randomized controlled trial (RCT) found that 55% of patients with uncomplicated cough did not fill their delayed prescription, although patients demonstrated some dissatisfaction with the strategy.16 Little et al studied its effectiveness in managing sore throat17 and otitis media.18 In our recently published RCT19 we reported that delayed prescribing significantly decreased the filling of antibiotic prescriptions for the common cold.

The use and effectiveness of a new medical intervention is influenced by how the intervention is viewed by both physicians and patients. Delayed prescription use for the common cold has not been assessed in any qualitative study, although the topic of 1 qualitative study was antibiotic prescribing for sore throats.2 The researchers found that although making the diagnosis was not difficult, treatment was a problem because one third of patients expected to be prescribed antibiotics. Our aim was to explore issues and attitudes regarding delayed prescription use from the perspectives of family physicians and patients.

Methods

We used a qualitative approach (1) to explore the complexity of, and relations between, issues identified in delayed prescription use, and (2) to describe the experiences and attitudes of both physicians and patients regarding delayed prescription use. The physicians were recruited from a list of high-prescribers (20 or more delayed prescriptions per month) or low prescribers (1 or fewer delayed prescriptions per month). This list had been prepared for a previous study in which 100 random family physicians had reported their use of delayed prescribing.1 Patients were recruited from the intervention arm of an RCT on delayed prescribing that examined the hypothesis that delayed prescriptions would result in decreased use of antibiotics for the common cold.19 Inclusion criteria comprised both patients receiving delayed prescriptions and parents of children receiving delayed prescriptions. Patients in the RCT had given written consent to subsequent interview for the qualitative study. Approval for our study was granted by the Auckland Ethics Committee.

Thirteen physicians and 13 patients were interviewed by telephone (F.G.-S. served as the interviewer). Purposive sampling was used to deliberately include “outliers”’ with respect to characteristics such as sex, socioeconomic level, and geographic location.20,21 This built sample diversity with respect to different subjects and themes along the main topics of interest (eg, the advantages and disadvantages of delayed prescribing) to improve data robustness. The physicians comprised men and women ranging in age from their 30s to 60s, including both New Zealand–trained physicians and immigrants (from Asia and South Africa) with practice locations ranging from lower to upper middle-class suburbs. Both male and female patients were interviewed, ranging in age from adolescent to elderly (specific ages unavailable). Parents of children receiving delayed prescriptions were included in the patient population. Ethnicity and socioeconomic level included those of European, Moori, and Asian extraction from family backgrounds of differing socioeconomic districts.

The interview data were collected in an iterative process in which themes from the early interviews were specifically checked in later interviews. Interviewing ceased once data saturation had occurred, ie, when no new themes emerged.22-24 Family physicians were paid for their time. Semistructured, open-ended questions were progressively focused into more structured questions. Questions for physicians included their views on delayed prescribing; the duration, frequency, and circumstances of their use of delayed prescribing; and their perceived advantages and disadvantages of delayed prescribing. Questions for patients included their experiences of receiving delayed prescriptions; their preferences for decision making regarding antibiotic use; and their views about delayed prescribing. The interviews were audiotaped, and although the hand-written interview notes were not transcribed, they were checked against the audio recordings. Recording ceased once it was established that concurrent hand-written notes were similar (nearly verbatim) to the recorded versions. Interviews typically lasted between 10 and 20 minutes.

 

 

A general inductive approach, similar to grounded theory, was used. Individual written interview responses were initially analyzed to identify subthemes. Interviews were then collated and analyzed for emerging categories. These were combined into major themes through ongoing discussions with an experienced qualitative researcher (D.T.) and rereading of the transcripts by the first 3 authors until consensus was reached regarding the main themes being expressed. The data were double-coded by an independent researcher (N.K.) as a consistency check, and discrepancies were resolved by negotiation between 2 of the researchers (N.K., F.G.). Patient and physician data sets were coded separately.

Results

A picture emerged of both advantages and disadvantages of delayed prescription use. An associated scenario was the variability of criteria used to decide whether delayed prescribing was considered appropriate or inappropriate. Seven primary themes were identified (Table 1): value judgment of antibiotics, decreased antibiotic use, patient-centered factors, effects on the physician–patient relationship, patient convenience, adverse effects of delayed prescribing, and selectivity for use. Many themes were common to both groups of subjects. Examples of their responses illustrating the primary themes are shown in Table 2.

TABLE 1

Descriptions of themes

ThemeDescription
Value judgment of antibioticsThe perception that antibiotics are either “good and necessary” for people or “bad” for people
Decreased antibiotic useThe desire to decrease unnecessary antibiotic use to avoid patient side-effects, to decrease the drug bill for taxpayers, and to decrease the development of resistance to antibiotics
Patient-centered factorsThe ability of physicians to educate patients and empower them to be more involved in decision making about their health care management
Effects on the physician–patient relationshipThe perception that delayed prescribing might have ither positive effects (eg, reassuring patients and meeting their expectations for antibiotics) or negative effects (eg, negative patient perception of physician competence or increased patient concerns about entitlement from the health system)
Patient convenienceThe time and money patients save
Adverse effects of delayed prescribingThe possible adverse effects, with potential medicolegal ramifications: the missing or masking of serious illness; the physician losing control of the patient’s medical situation; the physician becoming less able to monitor outcomes; the possibility that some patients might still take antibiotic unnecessarily; and/or the possibility that antibiotics might be saved and later used inappropriately by another family member
Selectivity for useThe factors determining who might get a delayed prescription: patient age, education, ability to understand English, transiency to the practice, and other varying criteria for use regarding specific conditions
TABLE 2

Answers from physicians and patients interviewed about their use of delayed prescriptions

ThemeQuotes from physiciansQuotes from patients
Value judgment of antibiotics “I wanted to prove to myself I could get better without antibiotics.”
 “I expect to get antibiotics if I go to the physician with the flu.”
Decreased antibiotic use“Using a delayed prescription means you don’t give unnecessary antibiotics.”“I don’t like putting unnecessary drugs intomy body.”
Patient-centered factors“[Use of delayed prescribing provides] “an opportunity to educate and empower patients, allows them to make decisions for themselves, and offers them convenience of both access and cost. Otherwise they would have to return [to the physician’s office] if they [their condition] deteriorated.”“I like to decide for myself; I know when I need antibiotics.”
 “I prefer the physician to make the decision.”
Effects on the physician–patient relationship“The patient goes out the door with something. It does not damage the physician–patient relationship; the patient does not feel short-changed.”“Some parents panic; it helps to ease their “minds.”
“The patient might think you don’t know “ what you are doing, that you are sitting on the fence … the patient has decided to come to physician for advice and wants to be told what to do, not [be] given more options.”“If I go to the physician, it is because I know I need antibiotics.”
 “Younger physicians these days don’t have the experience; they are too busy to know when something is really wrong.”
Patient convenience“Patient convenience: preventing afterhours office visits saves the patient time and money.”“It saved [me] time and money.”
Adverse effects of delayed prescribing“Some patients will start right away anyway and use antibiotic when they really don’t need it. … [the physician has] no way of knowing whether they take it or not… [patients] may not seek medial attention if they get sicker because they have started the antibiotic and assume that’s all that can be done.”“Some people might take it [antibiotics] unnecessarily.”
“There could be medicolegal problems with a litigious patient.”“Maybe some people need to be told what to do and would get confused.”
Selectivity for use“I would never give [a delayed prescription] to very small children, infants, or even children younger than 3 or 4 [years].”“[Delayed prescribing is] good for me but not necessarily for everybody. Many people … have a very poor understanding of medicines.”
“I mostly use it [delayed prescribing] in children younger than 6 years.“I want the physician to make the decision when it’s my children; I’d rather take them back [to the office if necessary].”
“I don’t use [delayed prescribing with] patients, especially elderly ones, having a past history of chronic illness—such as bronchitis, excessive smoking, sinusitis— that has required antibiotics.” 
 

 

Value judgment of antibiotics

The theme of “value judgment of antibiotics” was evident only among patients. Several expressed the opinion that antibiotics were the necessary treatment to take every time they became ill. Conversely, other patients considered antibiotics bad for them and preferred to use alternatives such as naturopathic medications.

Decreased antibiotic use

The primary motivation for delayed prescribing by physicians was to decrease unnecessary antibiotic use. Benefits include avoiding patient side effects; decreasing the drug bill for taxpayers; and, especially, decreasing the occurrence of antibiotic resistance. Several patients made comments relevant to this theme. None identified decreasing resistance as an important goal, but 3 patients said the strategy could help avoid unwarranted antibiotic use.

Patient-centered factors

“Patient-centered factors” was a strong theme to emerge—especially from high-prescriber physicians. The physicians indicated that delayed prescribing helped them practice more patient-centered medicine—educating patients to take more responsibility for their own health care management and being more receptive to patient needs. Some physicians took into account pending weekends or patients’ travel or work commitments when offering delayed prescriptions. Although some patients mentioned their involvement in decision making, this aspect generally was not a key factor for many of them. Some liked to make the decision for themselves, which included using their “delayed” prescription immediately. Most patients did not wish to have an active role in decision making and preferred their physicians to decide for them. No patients commented on the role of the physician in providing them with education on their health matters.

Effects on the physician–patient relationship

The theme of “effects on the physician–patient relationship” delineated an associated factor for physicians: the strategy of delayed prescribing strengthened physician–patient relationships by helping physicians cope with the pressure they experienced from patients expecting antibiotics for common colds; by reassuring patients; by giving patients something to take home; and by preventing patients from going to a different physician to obtain antibiotics. An alternative view, expressed by one low-prescriber physician, was that delayed prescribing might damage the physician–patient relationship because the patient might consider the physician incompetent.

For a few patients, use of delayed prescriptions was reassuring. Several patients’ expectations that antibiotics were required persisted at the end of the consultation, and they chose to have their prescriptions filled immediately. Presumably, they would have gone elsewhere had they left the consultation empty-handed. Use of delayed prescribing had a potential negative effect on the physician–patient relationship for at least 2 patients. They perceived delayed prescribing as an indication of physician indecisiveness and incompetence or that the physician was trying to hold down costs to the patient at the risk of the patient’s being ill.

Patient convenience

The theme of “patient convenience” and cost savings was a strong theme among physicians and less so among patients. Several patients identified that delayed prescriptions could save them trouble and expense. For 3 patients this was not an issue, but they acknowledged it could be of value to busy working people or low-income patients.

Adverse effects of delayed prescribing

Regarding the theme of “adverse effects of delayed prescribing,” some physicians saw little or no disadvantage if delayed prescriptions were given to the right patients with correct instructions. However, low-prescribers identified a number of possible adverse effects of delayed prescriptions, such as leading to missing or masking serious illness, with possible medicolegal ramifications. Physicians were concerned about being perceived by patients as losing control of the situation and being less able to monitor outcomes. Even using delayed prescribing, some patients might still take antibiotics unnecessarily. The possibility also exists that the antibiotic might be saved and later used inappropriately by another family member.

Patients identified several potential problems, often for people other than themselves. Not only could delayed prescriptions have the potential to be confusing, especially for less-educated people, but 1 patient thought the practice might lead to patients taking antibiotics unnecessarily.

Selectivity for use

Physicians generally were selective about patients for whom they considered delayed prescribing appropriate. Patients who were poorly educated, who had a bad command of English, or who were transient to the practice were identified as poor candidates for receiving delayed prescriptions. Most physicians restricted delayed prescriptions to a particular age range. However, within this category there was considerable variability and inconsistency. Many used delayed prescriptions only for children, with children younger than 6 years being the most suitable group; others used delayed prescribing only for children older than 6 to 8 years. One physician would not use the strategy in very young children, ie, younger than 3 years. There was no consensus regarding circumstances or specific instructions for use. Some used delayed prescribing only with clearly viral illnesses; others employed the strategy in patients with chronic illnesses during which secondary infection was more likely. Instructions varied regarding symptoms to watch out for and how long to wait before filling the prescription.

 

 

Selection of patients was also a dominant theme for patients. Although they thought delayed prescribing might be acceptable for themselves, a number of patients believed that others might not understand or get confused. One patient was happy to make decisions about her own management, but believed the physician should make decisions about her children. Patients did not venture any opinions regarding conditions for which they thought use of delayed prescribing was warranted.

Discussion

Delayed prescribing is a strategy developed primarily to decrease unnecessary antibiotic use in the management of upper respiratory tract infections. Although physicians emphasized the importance of decreasing antimicrobial resistance, patients did not consider this factor. Continued public health education on this issue, including family physicians providing pertinent information to individual patients, could be helpful. Many patients have relatively fixed ideas that antibiotics are either “good” or “bad” for their health without knowing the personal and public health nuances of antibiotic prescribing.

Patients may pressure their physicians for unnecessary antibiotics either by direct request or indirectly by the way they present their complaint.25 Physicians may also incorrectly perceive that patients want antibiotics.6 This study showed that physicians are likely to use delayed prescription as a technique to decrease antibiotic use in patients they perceive as wanting antibiotics regardless of the medications’ appropriateness.

Empowering patients to have more control over their health care management was more important to physicians than patients. Patients held differing views, and whereas some appreciated the option of controlling the decision whether and when to take antibiotics, others expected “the physician to decide.” Perhaps improved physician–patient communication, as well as delayed prescribing, could help patients better understand about antibiotic use.

Many patients in this study had previously received antibiotics for common colds. Most physicians believed that using delayed prescriptions was a compromise strategy that prevented patients from feeling brushed off and offered reassurance, thus protecting the physician–patient relationship. Some patients reciprocated this view. However, a concern expressed by 1 physician that patients might view delayed prescribing as physician incompetence was substantiated by comments from other patients.

The potential adverse effects identified by some of the physicians, such as a serious disorder being masked or missed and physicians having less medical control, could be largely remedied by establishing criteria for suitable patient selection and improved educational resources as suggested above. Physicians and patients both expressed that some patients might automatically have their prescriptions filled and thus take antibiotics unnecessarily. Given that 2 patients had their “delayed” prescriptions filled immediately, this concern appears justified. No safeguard could entirely prevent inappropriate use by other family members.

Both physicians and patients commented that delayed prescribing is not appropriate for all patients. Patients need to understand the explanation of why antibiotics are not currently indicated and the instructions as to when they might be needed. In our opinion, patient comprehension might be greatly assisted by the use of clear handouts explaining, in patient-friendly terms, the management of upper respiratory tract infections.

The use of delayed prescribing in family practice is becoming more common.14,16-19 Considerable inconsistency and contradictory practices were found regarding its use in children and adults. Such diversity in physicians’ views regarding suitable ages raises questions about the optimal use of delayed prescriptions. Similarly, no consensus was found regarding circumstances and instructions under which physicians would use delayed prescribing. The development of more formalized recommendations regarding patient suitability and criteria for delayed prescribing is needed.

Given the concern that some patients might be confused about when to use a delayed prescription, placing the prescription in an envelope with clearly written instructions (ie, when to use and under what conditions) on the outside might ameliorate this difficulty. This practice might serve to further decrease unnecessary antibiotic use. Alternatively, special patient instructions in written form may be warranted, as was done in a controlled before–after study of delayed prescriptions for otitis media.15

In conclusion, previous research has shown delayed prescribing to be an effective means of decreasing antibiotic consumption for conditions not clinically warranting their use.14,15,19 However, not all physicians or patients demonstrated complete satisfaction with the strategy, and both groups agreed that selectivity is required for issuing a delayed prescription. Unlike interventions such as administering new drugs, physicians have spontaneously and independently generated the practice of delayed prescribing. Consequently, the practice varies considerably with respect to which patients, conditions, and instructions are considered appropriate. Formalizing recommendations for patient suitability and instructions for use may be required to ensure safety and consistency. Long-term safety issues will need to be monitored using longitudinal, large-cohort studies.

KEY POINTS FOR CLINICIANS

  • Delayed antibiotic prescriptions are effective in decreasing antibiotic use for conditions not clinically warranting antibiotics.
  • Family practitioners valued empowering patients to be more involved in decision making about their health care management more highly than did patients.
  • Family practitioners generally viewed the strategy as giving patients reassurance and meeting their expectations for antibiotics.
  • Both patients and physicians agreed that delayed prescribing is not appropriate for all patients, but currently no consistent criteria have been established.
Family physicians often prescribe antibiotics for common colds despite being aware of their marginal effectiveness for such.1,2 Major contributing factors are overt patient expectation or demand for antibiotics3-5 and the physician’s perception that the patient expects antibiotics.6,7 Detrimental effects of antibiotic overuse include adverse effects on patients, development of antibioti-cresistant bacteria,8,9 and increased health care costs.10-12

Although it is possible to “just say no” to patients’ demands for antibiotics,13 family physicians may be under considerable pressure to prescribe. A strategy to decrease prescribing unnecessary antibiotics without damaging the physician–patient relationship involves giving a delayed (or deferred) prescription, which is a prescription to be filled at a later time if the patient’s condition fails to improve or deteriorates.14 Couchman et al14 reported that 50% of patients given “‘back-up”’ antibiotic prescriptions did not fill them. Cates15 found that delayed prescribing significantly decreased antibiotic use in children with acute otitis media. Results of a randomized controlled trial (RCT) found that 55% of patients with uncomplicated cough did not fill their delayed prescription, although patients demonstrated some dissatisfaction with the strategy.16 Little et al studied its effectiveness in managing sore throat17 and otitis media.18 In our recently published RCT19 we reported that delayed prescribing significantly decreased the filling of antibiotic prescriptions for the common cold.

The use and effectiveness of a new medical intervention is influenced by how the intervention is viewed by both physicians and patients. Delayed prescription use for the common cold has not been assessed in any qualitative study, although the topic of 1 qualitative study was antibiotic prescribing for sore throats.2 The researchers found that although making the diagnosis was not difficult, treatment was a problem because one third of patients expected to be prescribed antibiotics. Our aim was to explore issues and attitudes regarding delayed prescription use from the perspectives of family physicians and patients.

Methods

We used a qualitative approach (1) to explore the complexity of, and relations between, issues identified in delayed prescription use, and (2) to describe the experiences and attitudes of both physicians and patients regarding delayed prescription use. The physicians were recruited from a list of high-prescribers (20 or more delayed prescriptions per month) or low prescribers (1 or fewer delayed prescriptions per month). This list had been prepared for a previous study in which 100 random family physicians had reported their use of delayed prescribing.1 Patients were recruited from the intervention arm of an RCT on delayed prescribing that examined the hypothesis that delayed prescriptions would result in decreased use of antibiotics for the common cold.19 Inclusion criteria comprised both patients receiving delayed prescriptions and parents of children receiving delayed prescriptions. Patients in the RCT had given written consent to subsequent interview for the qualitative study. Approval for our study was granted by the Auckland Ethics Committee.

Thirteen physicians and 13 patients were interviewed by telephone (F.G.-S. served as the interviewer). Purposive sampling was used to deliberately include “outliers”’ with respect to characteristics such as sex, socioeconomic level, and geographic location.20,21 This built sample diversity with respect to different subjects and themes along the main topics of interest (eg, the advantages and disadvantages of delayed prescribing) to improve data robustness. The physicians comprised men and women ranging in age from their 30s to 60s, including both New Zealand–trained physicians and immigrants (from Asia and South Africa) with practice locations ranging from lower to upper middle-class suburbs. Both male and female patients were interviewed, ranging in age from adolescent to elderly (specific ages unavailable). Parents of children receiving delayed prescriptions were included in the patient population. Ethnicity and socioeconomic level included those of European, Moori, and Asian extraction from family backgrounds of differing socioeconomic districts.

The interview data were collected in an iterative process in which themes from the early interviews were specifically checked in later interviews. Interviewing ceased once data saturation had occurred, ie, when no new themes emerged.22-24 Family physicians were paid for their time. Semistructured, open-ended questions were progressively focused into more structured questions. Questions for physicians included their views on delayed prescribing; the duration, frequency, and circumstances of their use of delayed prescribing; and their perceived advantages and disadvantages of delayed prescribing. Questions for patients included their experiences of receiving delayed prescriptions; their preferences for decision making regarding antibiotic use; and their views about delayed prescribing. The interviews were audiotaped, and although the hand-written interview notes were not transcribed, they were checked against the audio recordings. Recording ceased once it was established that concurrent hand-written notes were similar (nearly verbatim) to the recorded versions. Interviews typically lasted between 10 and 20 minutes.

 

 

A general inductive approach, similar to grounded theory, was used. Individual written interview responses were initially analyzed to identify subthemes. Interviews were then collated and analyzed for emerging categories. These were combined into major themes through ongoing discussions with an experienced qualitative researcher (D.T.) and rereading of the transcripts by the first 3 authors until consensus was reached regarding the main themes being expressed. The data were double-coded by an independent researcher (N.K.) as a consistency check, and discrepancies were resolved by negotiation between 2 of the researchers (N.K., F.G.). Patient and physician data sets were coded separately.

Results

A picture emerged of both advantages and disadvantages of delayed prescription use. An associated scenario was the variability of criteria used to decide whether delayed prescribing was considered appropriate or inappropriate. Seven primary themes were identified (Table 1): value judgment of antibiotics, decreased antibiotic use, patient-centered factors, effects on the physician–patient relationship, patient convenience, adverse effects of delayed prescribing, and selectivity for use. Many themes were common to both groups of subjects. Examples of their responses illustrating the primary themes are shown in Table 2.

TABLE 1

Descriptions of themes

ThemeDescription
Value judgment of antibioticsThe perception that antibiotics are either “good and necessary” for people or “bad” for people
Decreased antibiotic useThe desire to decrease unnecessary antibiotic use to avoid patient side-effects, to decrease the drug bill for taxpayers, and to decrease the development of resistance to antibiotics
Patient-centered factorsThe ability of physicians to educate patients and empower them to be more involved in decision making about their health care management
Effects on the physician–patient relationshipThe perception that delayed prescribing might have ither positive effects (eg, reassuring patients and meeting their expectations for antibiotics) or negative effects (eg, negative patient perception of physician competence or increased patient concerns about entitlement from the health system)
Patient convenienceThe time and money patients save
Adverse effects of delayed prescribingThe possible adverse effects, with potential medicolegal ramifications: the missing or masking of serious illness; the physician losing control of the patient’s medical situation; the physician becoming less able to monitor outcomes; the possibility that some patients might still take antibiotic unnecessarily; and/or the possibility that antibiotics might be saved and later used inappropriately by another family member
Selectivity for useThe factors determining who might get a delayed prescription: patient age, education, ability to understand English, transiency to the practice, and other varying criteria for use regarding specific conditions
TABLE 2

Answers from physicians and patients interviewed about their use of delayed prescriptions

ThemeQuotes from physiciansQuotes from patients
Value judgment of antibiotics “I wanted to prove to myself I could get better without antibiotics.”
 “I expect to get antibiotics if I go to the physician with the flu.”
Decreased antibiotic use“Using a delayed prescription means you don’t give unnecessary antibiotics.”“I don’t like putting unnecessary drugs intomy body.”
Patient-centered factors“[Use of delayed prescribing provides] “an opportunity to educate and empower patients, allows them to make decisions for themselves, and offers them convenience of both access and cost. Otherwise they would have to return [to the physician’s office] if they [their condition] deteriorated.”“I like to decide for myself; I know when I need antibiotics.”
 “I prefer the physician to make the decision.”
Effects on the physician–patient relationship“The patient goes out the door with something. It does not damage the physician–patient relationship; the patient does not feel short-changed.”“Some parents panic; it helps to ease their “minds.”
“The patient might think you don’t know “ what you are doing, that you are sitting on the fence … the patient has decided to come to physician for advice and wants to be told what to do, not [be] given more options.”“If I go to the physician, it is because I know I need antibiotics.”
 “Younger physicians these days don’t have the experience; they are too busy to know when something is really wrong.”
Patient convenience“Patient convenience: preventing afterhours office visits saves the patient time and money.”“It saved [me] time and money.”
Adverse effects of delayed prescribing“Some patients will start right away anyway and use antibiotic when they really don’t need it. … [the physician has] no way of knowing whether they take it or not… [patients] may not seek medial attention if they get sicker because they have started the antibiotic and assume that’s all that can be done.”“Some people might take it [antibiotics] unnecessarily.”
“There could be medicolegal problems with a litigious patient.”“Maybe some people need to be told what to do and would get confused.”
Selectivity for use“I would never give [a delayed prescription] to very small children, infants, or even children younger than 3 or 4 [years].”“[Delayed prescribing is] good for me but not necessarily for everybody. Many people … have a very poor understanding of medicines.”
“I mostly use it [delayed prescribing] in children younger than 6 years.“I want the physician to make the decision when it’s my children; I’d rather take them back [to the office if necessary].”
“I don’t use [delayed prescribing with] patients, especially elderly ones, having a past history of chronic illness—such as bronchitis, excessive smoking, sinusitis— that has required antibiotics.” 
 

 

Value judgment of antibiotics

The theme of “value judgment of antibiotics” was evident only among patients. Several expressed the opinion that antibiotics were the necessary treatment to take every time they became ill. Conversely, other patients considered antibiotics bad for them and preferred to use alternatives such as naturopathic medications.

Decreased antibiotic use

The primary motivation for delayed prescribing by physicians was to decrease unnecessary antibiotic use. Benefits include avoiding patient side effects; decreasing the drug bill for taxpayers; and, especially, decreasing the occurrence of antibiotic resistance. Several patients made comments relevant to this theme. None identified decreasing resistance as an important goal, but 3 patients said the strategy could help avoid unwarranted antibiotic use.

Patient-centered factors

“Patient-centered factors” was a strong theme to emerge—especially from high-prescriber physicians. The physicians indicated that delayed prescribing helped them practice more patient-centered medicine—educating patients to take more responsibility for their own health care management and being more receptive to patient needs. Some physicians took into account pending weekends or patients’ travel or work commitments when offering delayed prescriptions. Although some patients mentioned their involvement in decision making, this aspect generally was not a key factor for many of them. Some liked to make the decision for themselves, which included using their “delayed” prescription immediately. Most patients did not wish to have an active role in decision making and preferred their physicians to decide for them. No patients commented on the role of the physician in providing them with education on their health matters.

Effects on the physician–patient relationship

The theme of “effects on the physician–patient relationship” delineated an associated factor for physicians: the strategy of delayed prescribing strengthened physician–patient relationships by helping physicians cope with the pressure they experienced from patients expecting antibiotics for common colds; by reassuring patients; by giving patients something to take home; and by preventing patients from going to a different physician to obtain antibiotics. An alternative view, expressed by one low-prescriber physician, was that delayed prescribing might damage the physician–patient relationship because the patient might consider the physician incompetent.

For a few patients, use of delayed prescriptions was reassuring. Several patients’ expectations that antibiotics were required persisted at the end of the consultation, and they chose to have their prescriptions filled immediately. Presumably, they would have gone elsewhere had they left the consultation empty-handed. Use of delayed prescribing had a potential negative effect on the physician–patient relationship for at least 2 patients. They perceived delayed prescribing as an indication of physician indecisiveness and incompetence or that the physician was trying to hold down costs to the patient at the risk of the patient’s being ill.

Patient convenience

The theme of “patient convenience” and cost savings was a strong theme among physicians and less so among patients. Several patients identified that delayed prescriptions could save them trouble and expense. For 3 patients this was not an issue, but they acknowledged it could be of value to busy working people or low-income patients.

Adverse effects of delayed prescribing

Regarding the theme of “adverse effects of delayed prescribing,” some physicians saw little or no disadvantage if delayed prescriptions were given to the right patients with correct instructions. However, low-prescribers identified a number of possible adverse effects of delayed prescriptions, such as leading to missing or masking serious illness, with possible medicolegal ramifications. Physicians were concerned about being perceived by patients as losing control of the situation and being less able to monitor outcomes. Even using delayed prescribing, some patients might still take antibiotics unnecessarily. The possibility also exists that the antibiotic might be saved and later used inappropriately by another family member.

Patients identified several potential problems, often for people other than themselves. Not only could delayed prescriptions have the potential to be confusing, especially for less-educated people, but 1 patient thought the practice might lead to patients taking antibiotics unnecessarily.

Selectivity for use

Physicians generally were selective about patients for whom they considered delayed prescribing appropriate. Patients who were poorly educated, who had a bad command of English, or who were transient to the practice were identified as poor candidates for receiving delayed prescriptions. Most physicians restricted delayed prescriptions to a particular age range. However, within this category there was considerable variability and inconsistency. Many used delayed prescriptions only for children, with children younger than 6 years being the most suitable group; others used delayed prescribing only for children older than 6 to 8 years. One physician would not use the strategy in very young children, ie, younger than 3 years. There was no consensus regarding circumstances or specific instructions for use. Some used delayed prescribing only with clearly viral illnesses; others employed the strategy in patients with chronic illnesses during which secondary infection was more likely. Instructions varied regarding symptoms to watch out for and how long to wait before filling the prescription.

 

 

Selection of patients was also a dominant theme for patients. Although they thought delayed prescribing might be acceptable for themselves, a number of patients believed that others might not understand or get confused. One patient was happy to make decisions about her own management, but believed the physician should make decisions about her children. Patients did not venture any opinions regarding conditions for which they thought use of delayed prescribing was warranted.

Discussion

Delayed prescribing is a strategy developed primarily to decrease unnecessary antibiotic use in the management of upper respiratory tract infections. Although physicians emphasized the importance of decreasing antimicrobial resistance, patients did not consider this factor. Continued public health education on this issue, including family physicians providing pertinent information to individual patients, could be helpful. Many patients have relatively fixed ideas that antibiotics are either “good” or “bad” for their health without knowing the personal and public health nuances of antibiotic prescribing.

Patients may pressure their physicians for unnecessary antibiotics either by direct request or indirectly by the way they present their complaint.25 Physicians may also incorrectly perceive that patients want antibiotics.6 This study showed that physicians are likely to use delayed prescription as a technique to decrease antibiotic use in patients they perceive as wanting antibiotics regardless of the medications’ appropriateness.

Empowering patients to have more control over their health care management was more important to physicians than patients. Patients held differing views, and whereas some appreciated the option of controlling the decision whether and when to take antibiotics, others expected “the physician to decide.” Perhaps improved physician–patient communication, as well as delayed prescribing, could help patients better understand about antibiotic use.

Many patients in this study had previously received antibiotics for common colds. Most physicians believed that using delayed prescriptions was a compromise strategy that prevented patients from feeling brushed off and offered reassurance, thus protecting the physician–patient relationship. Some patients reciprocated this view. However, a concern expressed by 1 physician that patients might view delayed prescribing as physician incompetence was substantiated by comments from other patients.

The potential adverse effects identified by some of the physicians, such as a serious disorder being masked or missed and physicians having less medical control, could be largely remedied by establishing criteria for suitable patient selection and improved educational resources as suggested above. Physicians and patients both expressed that some patients might automatically have their prescriptions filled and thus take antibiotics unnecessarily. Given that 2 patients had their “delayed” prescriptions filled immediately, this concern appears justified. No safeguard could entirely prevent inappropriate use by other family members.

Both physicians and patients commented that delayed prescribing is not appropriate for all patients. Patients need to understand the explanation of why antibiotics are not currently indicated and the instructions as to when they might be needed. In our opinion, patient comprehension might be greatly assisted by the use of clear handouts explaining, in patient-friendly terms, the management of upper respiratory tract infections.

The use of delayed prescribing in family practice is becoming more common.14,16-19 Considerable inconsistency and contradictory practices were found regarding its use in children and adults. Such diversity in physicians’ views regarding suitable ages raises questions about the optimal use of delayed prescriptions. Similarly, no consensus was found regarding circumstances and instructions under which physicians would use delayed prescribing. The development of more formalized recommendations regarding patient suitability and criteria for delayed prescribing is needed.

Given the concern that some patients might be confused about when to use a delayed prescription, placing the prescription in an envelope with clearly written instructions (ie, when to use and under what conditions) on the outside might ameliorate this difficulty. This practice might serve to further decrease unnecessary antibiotic use. Alternatively, special patient instructions in written form may be warranted, as was done in a controlled before–after study of delayed prescriptions for otitis media.15

In conclusion, previous research has shown delayed prescribing to be an effective means of decreasing antibiotic consumption for conditions not clinically warranting their use.14,15,19 However, not all physicians or patients demonstrated complete satisfaction with the strategy, and both groups agreed that selectivity is required for issuing a delayed prescription. Unlike interventions such as administering new drugs, physicians have spontaneously and independently generated the practice of delayed prescribing. Consequently, the practice varies considerably with respect to which patients, conditions, and instructions are considered appropriate. Formalizing recommendations for patient suitability and instructions for use may be required to ensure safety and consistency. Long-term safety issues will need to be monitored using longitudinal, large-cohort studies.

References

1. Arroll B, Goodyear-Smith F. General practitioner management of upper respiratory tract infections: when are antibiotics prescribed? N Z Med J 2000;13:493-6.

2. Butler CC, Rollnick S, Pill R, Maggs-Rapport F, Stott N. Understanding the culture of prescribing: qualitative study of general practitioners’ and patients’ perceptions of antibiotics for sore throats. Br Med J 1998;317:637-42.

3. Arroll B, Everts N. The common cold: what does the public think and want? N Z Fam Physician 1999;26:51-6.

4. Macfarlane J, Holmes W. Influence of patients’ expectations on antibiotic management of acute lower respiratory tract illness in general practice: questionnaire study. Br Med J 1997;315:1211-4.

5. Palmer DA, Bauchner H. Parents’ and physicians’ views on antibiotics. Pediatrics 1997;99:E6.-

6. Cockburn J, Pit S. Prescribing behaviour in clinical practice: patients’ expectations and physicians’ perceptions of patients’ expectations—a questionnaire study. Br Med J 1997;315:520-3.

7. Britten N. Patients demands for prescriptions in primary care. Br Med J 1995;310:1084-5.

8. Arason VA, Kristinsson KG, Sigurdsson JA, Stefansdottir G, Molstad S, Gudmundsson S. Do antimicrobials increase the carriage rate of penicillin resistant pneumococci in children? Cross sectional prevalence study. Br Med J 1996;313:387-91.

9. Verkatesum P, Innes JA. Antibiotic resistance in common acute respiratory pathogens. Thorax 1995;50:481-3.

10. McCaig LF, Hughes JM. Trends in antimicrobial drug prescribing among office-based physicians in the United States. JAMA 1995;273:214-9.

11. Hueston WJ, Mainous AG, 3rd, Ornstein S, Pan Q, Jenkins R. Antibiotics for upper respiratory tract infections. Follow-up utilization and antibiotic use. Arch Fam Med 1999;8:426-30.

12. Mainous AG, 3rd, Hueston WJ. The cost of antibiotics in treating upper respiratory tract infections in a Medicaid population. Arch Fam Med 1998;7:45-9.

13. Thomas MG, Arroll B. “Just say no”—reducing the use of antibiotics for colds, bronchitis and sinusitis. N Z Med J 2000;113:287-9.

14. Couchman GR, Rascoe TG, Forjuoh SN. Back-up antibiotic prescriptions for common respiratory symptoms. Patient satisfaction and fill rates. J Fam Pract 2000;49:907-13.

15. Cates C. An evidence based approach to reducing antibiotic use in children with acute otitis media: controlled before and after study. Br Med J 1999;318:715-6.

16. Dowell J, Pitkethly M, Bain J, Martin S. A randomised controlled trial of delayed antibiotic prescribing as a strategy for managing uncomplicated respiratory tract infection in primary care. Br J Gen Pract 2001;51:200-5.

17. Little P, Williamson I, Warner G, Gould C, Gantley M, Kinmonth AL. Open randomised trial of prescribing strategies in managing sore throat. Br Med J 1997;314:722-7.

18. Little P, Gould C, Williamson I, Moore M, Warner G, Dunleavey J. Pragmatic randomised controlled trial of two prescribing strategies for childhood acute otitis media. Br Med J 2001;322:336-42.

19. Arroll B, Kenealy T, Kerse N. Does a delayed prescription reduce unnecessary antibiotic use? A randomized controlled trial. J Fam Pract 2002;51:324-8.

20. Curtis S, Gesler W, Smith G, Washburn S. Approaches to sampling and case selection in qualitative research: examples in the geography of health. Soc Sci Med 2000;50:1001-4.

21. Barbour RS. Checklists for improving rigour in qualitative research: a case of the tail wagging the dog? Br Med J 2001;322:1115-7.

22. Guba EG, Lincoln YS. Competing paradigms in qualitative research. In: Denzin NK, Lincoln YS, eds. Handbook of Qualitative Research. Thousand Oaks, Calif: Sage Publications Inc; 1994;105-117.

23. Kuzel AJ, Engel JD, Addison RB, Bogdewic SP. Desirable features of qualitative research. Fam Pract Res J 1994;14:369-78.

24. Strauss A, Corbin J. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1998.

25. Scott JG, Cohen D, DiCicco-Bloom B, Orzano AJ, Jaen CR, Crabtree BF. Antibiotic use in acute respiratory infections and the ways patients pressure physicians for a prescription. J Fam Pract 2001;50:853-8.

References

1. Arroll B, Goodyear-Smith F. General practitioner management of upper respiratory tract infections: when are antibiotics prescribed? N Z Med J 2000;13:493-6.

2. Butler CC, Rollnick S, Pill R, Maggs-Rapport F, Stott N. Understanding the culture of prescribing: qualitative study of general practitioners’ and patients’ perceptions of antibiotics for sore throats. Br Med J 1998;317:637-42.

3. Arroll B, Everts N. The common cold: what does the public think and want? N Z Fam Physician 1999;26:51-6.

4. Macfarlane J, Holmes W. Influence of patients’ expectations on antibiotic management of acute lower respiratory tract illness in general practice: questionnaire study. Br Med J 1997;315:1211-4.

5. Palmer DA, Bauchner H. Parents’ and physicians’ views on antibiotics. Pediatrics 1997;99:E6.-

6. Cockburn J, Pit S. Prescribing behaviour in clinical practice: patients’ expectations and physicians’ perceptions of patients’ expectations—a questionnaire study. Br Med J 1997;315:520-3.

7. Britten N. Patients demands for prescriptions in primary care. Br Med J 1995;310:1084-5.

8. Arason VA, Kristinsson KG, Sigurdsson JA, Stefansdottir G, Molstad S, Gudmundsson S. Do antimicrobials increase the carriage rate of penicillin resistant pneumococci in children? Cross sectional prevalence study. Br Med J 1996;313:387-91.

9. Verkatesum P, Innes JA. Antibiotic resistance in common acute respiratory pathogens. Thorax 1995;50:481-3.

10. McCaig LF, Hughes JM. Trends in antimicrobial drug prescribing among office-based physicians in the United States. JAMA 1995;273:214-9.

11. Hueston WJ, Mainous AG, 3rd, Ornstein S, Pan Q, Jenkins R. Antibiotics for upper respiratory tract infections. Follow-up utilization and antibiotic use. Arch Fam Med 1999;8:426-30.

12. Mainous AG, 3rd, Hueston WJ. The cost of antibiotics in treating upper respiratory tract infections in a Medicaid population. Arch Fam Med 1998;7:45-9.

13. Thomas MG, Arroll B. “Just say no”—reducing the use of antibiotics for colds, bronchitis and sinusitis. N Z Med J 2000;113:287-9.

14. Couchman GR, Rascoe TG, Forjuoh SN. Back-up antibiotic prescriptions for common respiratory symptoms. Patient satisfaction and fill rates. J Fam Pract 2000;49:907-13.

15. Cates C. An evidence based approach to reducing antibiotic use in children with acute otitis media: controlled before and after study. Br Med J 1999;318:715-6.

16. Dowell J, Pitkethly M, Bain J, Martin S. A randomised controlled trial of delayed antibiotic prescribing as a strategy for managing uncomplicated respiratory tract infection in primary care. Br J Gen Pract 2001;51:200-5.

17. Little P, Williamson I, Warner G, Gould C, Gantley M, Kinmonth AL. Open randomised trial of prescribing strategies in managing sore throat. Br Med J 1997;314:722-7.

18. Little P, Gould C, Williamson I, Moore M, Warner G, Dunleavey J. Pragmatic randomised controlled trial of two prescribing strategies for childhood acute otitis media. Br Med J 2001;322:336-42.

19. Arroll B, Kenealy T, Kerse N. Does a delayed prescription reduce unnecessary antibiotic use? A randomized controlled trial. J Fam Pract 2002;51:324-8.

20. Curtis S, Gesler W, Smith G, Washburn S. Approaches to sampling and case selection in qualitative research: examples in the geography of health. Soc Sci Med 2000;50:1001-4.

21. Barbour RS. Checklists for improving rigour in qualitative research: a case of the tail wagging the dog? Br Med J 2001;322:1115-7.

22. Guba EG, Lincoln YS. Competing paradigms in qualitative research. In: Denzin NK, Lincoln YS, eds. Handbook of Qualitative Research. Thousand Oaks, Calif: Sage Publications Inc; 1994;105-117.

23. Kuzel AJ, Engel JD, Addison RB, Bogdewic SP. Desirable features of qualitative research. Fam Pract Res J 1994;14:369-78.

24. Strauss A, Corbin J. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1998.

25. Scott JG, Cohen D, DiCicco-Bloom B, Orzano AJ, Jaen CR, Crabtree BF. Antibiotic use in acute respiratory infections and the ways patients pressure physicians for a prescription. J Fam Pract 2001;50:853-8.

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
954-959
Page Number
954-959
Publications
Publications
Article Type
Display Headline
Delayed antibiotic prescriptions: What are the experiences and attitudes of physicians and patients?
Display Headline
Delayed antibiotic prescriptions: What are the experiences and attitudes of physicians and patients?
Legacy Keywords
,Antibioticsfamily practicequalitative evaluationupper respiratory tract infection. (J Fam Pract 2002; 51:954-959)
Legacy Keywords
,Antibioticsfamily practicequalitative evaluationupper respiratory tract infection. (J Fam Pract 2002; 51:954-959)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

Does oral creatine supplementation improve strength? A meta-analysis

Article Type
Changed
Mon, 01/14/2019 - 10:57
Display Headline
Does oral creatine supplementation improve strength? A meta-analysis

KEY POINTS FOR CLINICIANS

  • Oral creatine supplementation combined with resistance training increases maximal weight young men can lift.
  • It is unknown whether this increase in strength translates into improvement in sports performance.
  • Evidence in the existing literature is insufficient to draw conclusions about the effect of creatine in women or older individuals.
  • Because no long-term studies have been performed on the safety of creatine supplementation, its use should not be universally recommended.

Creatine has gained widespread popularity during the past decade as a possible performance-enhancing agent among professional and recreational athletes. It is the most widely used performance-enhancing supplement among youth aged 10 to 17 years,1 with 15% to 30% of high school athletes2,3 and 48% of male Division I college athletes4 reporting creatine use. Considered a nutritional supplement, it is not regulated by the United States Food and Drug Administration nor is it banned by the International Olympic Committee or National Collegiate Athletic Association. Because of the widespread use of creatine, primary care providers must be knowledgeable about its effectiveness and safety.

Oral creatine monohydrate increases skeletal muscle creatine concentration by 16% to 50%,5-7 but whether it is an effective ergogenic aid remains controversial. Multiple studies have investigated this question, but many have been small, often including fewer than 10 subjects, and results have been conflicting. Several reviews8-14 have addressed the effectiveness of creatine, but there has not been a systematic and comprehensive meta-analysis to resolve the uncertainties in the literature or to quantify the magnitude of the effect of creatine. To evaluate whether oral creatine supplementation improves strength and power in healthy adults, and further to quantify the effect, we performed a meta-analysis of randomized and matched controlled trials investigating creatine supplementation and strength.

Methods

Search strategy

To identify possible studies for inclusion, 1 author (M.F.M.) searched the MEDLINE electronic database (1966–2000) using the terms “creatine supplementation” or “creatine” combined with “strength” or “power.” Another MEDLINE search (1966–2000) was independently conducted by another author (R.L.D.) using the term “creatine not kinase” combined with a previously published search strategy to comprehensively identify randomized clinical trials.15 We searched the Cochrane Controlled Trials Register using the term “creatine not kinase.” We manually reviewed bibliographies of identified studies, abstracts from American College of Sports Medicine annual meetings (1999 and 2000), and a reference list distributed by an expert on the subject at the annual meeting of the American Medical Society for Sports Medicine (2000). Titles and available abstracts were screened and relevant articles retrieved. An expert in the field was contacted for sources of unpublished data.

Inclusion and exclusion criteria

Two reviewers independently assessed articles for inclusion. A third reviewer was consulted to resolve discrepancies. We used the following inclusion criteria: (1) the articles reported results of randomized or matched placebo-controlled trials investigating the effect of oral creatine supplementation on strength or power with or without concomitant resistance training; (2) the study subjects were healthy men or women older than 16 years with or without previous athletic training; and (3) the studies were published in any language. Given the general disagreement regarding the time required for muscle creatine concentration to return to presupplementation levels after discontinuing oral creatine,16-18 studies using a crossover design were excluded from the statistical analysis unless data from the first arm, before crossover, could be abstracted or obtained from the original investigator. Outcomes were measures of strength or power of any muscle group, including maximal weight lifted; peak power achieved in maximal (sprint) cycle ergometry; and peak knee flexion/extension torque in isokinetic dynamometer testing. Measurements of endurance, such as time to fatigue on cycle ergometer and number of repetitions achieved in submaximal weight lifting, were excluded. For studies reporting outcome per kilogram of body weight, we contacted investigators to obtain absolute outcome values and excluded studies if uncorrected data were not received. We also excluded articles that evaluated outcomes not investigated in at least 2 other studies. Finally, if we could not extract data in a usable form, we contacted investigators to obtain adequate data.

Quality assessment

Two independent reviewers appraised articles to determine methodological quality with respect to risk of bias under the following categories: method of randomization, allocation concealment, blinding, similarity of study groups, withdrawals and dropouts, and intention-to-treat analysis. Each study that met inclusion criteria was given a quality score, with a maximum possible score of 10, using a tool adapted from the Cochrane Handbook.19 The quality assessment data are presented but were not used to exclude or rank any study.

 

 

Data abstraction and statistical analysis

Two independent reviewers abstracted data, and a third reviewer resolved differences. For studies investigating multiple sprints, data from the first sprint only were included in statistical analysis because the first sprint is when peak power achievement is expected. A weighted mean difference (WMD) between creatine and placebo groups was calculated for each outcome using Review Manager 4.1 software (developed by The Cochrane Collaboration). A fixed effects model was used unless statistical heterogeneity was significant (P < .05), in which case a random effects model was used. Subanalyses were planned on several factors that were anticipated to be sources for variation, including (1) dose and duration of creatine administration, (2) concomitant resistance training, (3) different baseline level of physical training, (4) age, and (5) sex.

Results

Description of studies

After reviewing titles and available abstracts of more than 500 articles, we retrieved 66 potentially relevant studies, 16 of which met inclusion criteria for the analysis.17,20-34 Characteristics of these studies are summarized in the Table. Included studies represented 20 discrete samples and 414 subjects. Two studies20,21 evaluated creatine supplementation in men older than 60 years, whereas all the others studied younger subjects (range, 18–36 years). Only 1 study included women.17 Creatine dosages were similar across included studies (typically 20 g/d for the first 4–7 days of supplementation and 5 g/d thereafter). Studies that evaluated maximal weight lifting performance were more likely to include adjuvant resistance training programs in their protocols than those that evaluated cycle ergometry sprint or isokinetic dynamometer performance. None included cycle ergometry training.

TABLE

Characteristics of included studies

ReferenceNo. subjects (sex)Dose per day and durationTraining levelWeight training during study?Outcome measurementQuality score (out of 10)Comparability of creatine & placebo groups at baseline*
Barnett 199617 (M)280 mg/kg ×4 dActiveNoCP2.5+
Cooke 199512 (M)20 g ×5 dUntrainedNoCP2.5+
Cooke 199780 (M)20 g ×5 dTrained or activeNoCP2+++
Dawson 199518 (M), 22(M)20 g ×5 dActiveNoCP3+++
Jones 199916 (M)20 g ×5 d then 5 g ×10 wkTrainedYesCM3+++
Stone 199920 (M)0.22 g/kg ×35 dTrainedYesCM, BP, S4.5+++
Kelly 199818 (M)20 g ×5 d then 5 g ×26 dTrainedYes3BP2
Noonan 199839 (M)20 g ×5 d then 300 mg/kg ×8 wkTrainedYesBP5.5+++
Peeters 199935 (M)20 g ×3 d then 10 g ×6 wkTrainedYesBP3+++
Vandenberghe 199719 (F)20 g ×4 d then 5 g ×10 wkUntrainedYesBP, S5+++
Pearson 199916 (M)5 g ×10 wkTrainedYesBP, S, PT3+++
Volek 199919 (M)25 g ×7 d then 5 g ×12 wkTrainedYesBP, S4.5+++
Gilliam 200023 (M)20 g ×5 dActive but untrainedNoPT2.5+
Rawson 1999§20 (M)20 g ×10 d then 4 g ×20 dUntrainedNoAF, PT4.5+++
Rawson 2000§17 (M)20 g ×5 dUntrainedNoAF3.5+++
Becque 200023 (M)20 g ×5 d then 2 g ×6 wkTrainedYesAF5+
*Comparability between groups was assessed for age, anthropomorphic measurements, and strength outcomes. +++ = similar for all 3 characteristics; + = similar for strength outcome measurements; – = not comparable at baseline for strength outcome.
Four protocols with 20 subjects each evaluating the same strength outcome measurement reported in Cooke 1997.
Two separate experiments reported in Dawson 1995.
§Included subjects > 60 years old; in all others subjects were < 36 years old.
AF, 1 repetition maximum arm flexor strength; BP, 1 repetition maximum bench press strength; 3BP, 3 repetition maximum bench press strength; CM, cycle ergometer mean peak power; CP, cycle ergometer peak power; PT, isokinetic leg flexion/extension peak torque; S, 1 repetition maximum squat strength.

Methodological quality of included studies

The methodological quality of studies was generally low (Table). The mean quality score was 3.5 ± 1.2 (mean ± SD) out of a possible 10 (range, 2–5.5). None of the studies identified the method of randomization used or specifically reported an intention-to-treat analysis. None specifically reported masking of outcome assessment. In general, these significant flaws in study design would tend to result in overestimation of the benefit of creatine supplementation.

Absolute strength

When 1- to 3-repetition maximum bench press strength measurements were statistically combined (they were homogeneous), the creatine supplementation group showed an absolute strength increase of 6.85 kg (95% confidence interval [CI], 5.24–8.47; n = 143) lifted per repetition greater than that seen with placebo alone (Figure 1). There was no additional advantage in strength performance after 9 to 12 weeks of supplementation (WMD = 6.6 kg; 95% CI, 3.5–9.5) compared with 4 to 8 weeks of supplementation (WMD = 6.6 kg; 95% CI, 4.8–8.4). Subanalysis for an interaction with resistance training, previous training level, age, or sex was not possible because all studies measuring bench press strength except one17 investigated creatine supple mentation in previously trained young men who continued resistance training during supplementation. The 1 study in previously sedentary young women17 did find a trend toward increased bench press strength, although independently this change was not statistically significant.

 

 

There was no significant difference in 1-repetition maximum arm flexor strength with creatine supplementation (WMD = 1.53 kg; 95% CI, –1.07 to 4.13; n = 60; Figure 2). However, 2 trials20,21 of the 3 evaluating this outcome studied subjects older than 60 years and did not employ adjuvant weight training programs. The study that incorporated resistance training and evaluated younger subjects22 found a modest (29.9% vs 16.5%) improvement in 1-repetition maximum arm flexor strength with creatine compared with placebo.

For 1-repetition maximum squat, creatine supplementation resulted in a strength increase of 9.76 kg (95% CI, 3.37–16.15; n = 74) greater than that of placebo (Figure 3). There was no advantage to longer-term supplementation (10.9 kg more than placebo [95% CI, 3.4–18.4] for 5–6 weeks compared with 10.4 kg [95% CI, 3.5–17.2] for 10–12 weeks). Again, in all but 1 study17 measuring squat performance, subjects were previously trained young men engaging in adjuvant resistance training programs, so subanalysis for other variables was not possible. For previously sedentary women, Vandenberghe et al17 found no difference at 5 weeks, but they did find a significant improvement in 1-repetition maximum squat performance with creatine supplementation at 10 weeks. Tests for heterogeneity were nonsignificant for all absolute strength variables.

To evaluate for publication bias, we examined funnel plots of each of the 3 absolute strength outcomes (bench press, arm flexor, and squat exercises). No evidence of publication bias was demonstrated. Figure W1 (available on the JFP Web site: http://www.jfponline.com) depicts a composite funnel plot of all 3 outcomes using a standardized mean difference to allow comparison between these 3 different outcomes.

FIGURE 1 Studies assessing 1- to 3-repetition maximum bench press strength



FIGURE 2 Studies assessing 1-repetition maximum arm flexor strength



FIGURE 3 Studies assessing 1-repetition maximum squat strength


Cycle ergometer peak power

Creatine supplementation had no effect on peak power production during cycle ergometry sprint (Figure 4). Results among studies were widely variable (test for heterogeneity P= .035), so a random effects model was used to pool data. The summary weighted mean difference of 16.79 W (95% CI, –13.26 to 46.84; n = 149) was insignificant, both statistically (test for overall effect P= .3) and clinically, because this represents approximately a 1% change greater than baseline. Two studies23,24 looked at mean peak power across a series of 15-to 30-second sprints and found inconsistent results, with a summary weighted mean difference of 68.61 W (95% CI, –85.74 to 222.97; n = 36). Of note, for the 2 studies24,25 that demonstrated improved performance with creatine, the difference was accentuated by an unexplained but pronounced worsening of performance after supplementation in the placebo groups.

FIGURE 4 Studies assessing cycle ergometer sprint peak power


Dynamometer peak torque

Only 3 studies21,26,27 evaluated peak torque, and all used slightly different outcome assessments. One study26 reported average peak torque across 30 isokinetic leg flexion/extension contractions; 1 study21 reported the sum of peak torque across 5 sets of 30 isokinetic leg flexion/extension contractions; and 1 study27 gave peak torque data for isokinetic leg extension but did not describe precisely how peak torque was determined. There was no difference between creatine and placebo for isokinetic leg flexion/extension peak torque using a standardized mean difference to account for variations in measurement of this outcome. Tests for heterogeneity were nonsignificant for this outcome (P= .19).

Adverse effects

Four studies commented on short-term adverse effects of creatine supplementation. Three studies17,23,28 found no difference between creatine and placebo. One study21 reported gastrointestinal upset, rash, or headache in 3 subjects taking creatine and no adverse effects in subjects taking placebo. None of these studies was designed to evaluate long-term adverse effects of creatine supplementation, and there were no reports of longer-term follow up.

Discussion

This is the first study to report quantitatively the effect of creatine supplementation on strength performance from meta-analysis of the existing literature. We found that oral creatine supplementation improves maximal resistance exercise performance in previously trained young men. There is insufficient evidence that creatine improves other measures of strength, such as cycle ergometry sprint peak power or isokinetic dynamometer peak torque, or that creatine improves strength in women or older individuals. The effect of creatine on endurance, submaximal exercise, or actual “on-field” athletic performance was not addressed.

Creatine’s ergogenic properties may result from allowing increased work during training and decreasing recovery time. If so, creatine must be combined with adjuvant training to increase strength and power. Only studies investigating maximal weight-lifting performance incorporated resistance-training programs specific to the outcome being measured. Three studies included weight training but investigated non–weight-lifting outcomes,23,24,27 and only 1 study24 found a benefit from creatine supplementation. It is unclear whether the lack of effect for non–weight-lifting outcomes means that creatine is not beneficial unless combined with specific adjuvant training or that creatine simply is not ergogenic for outcomes other than maximal weight lifted.

 

 

This meta-analysis has some limitations. Our definition of strength included only “pure strength” or “power” measurements to allow statistical comparisons between similar outcomes. Because muscle strength is related to muscle endurance, researchers may define strength differently. It is not obvious at what point an exercise becomes a test of endurance and not just strength, but there is a physiologic basis for believing that creatine supplementation would more markedly improve performance in maximal or shorter duration exercises (ie, requiring strength and not endurance). The inclusion criteria for this project were determined before study review and selection and were applied consistently across all studies.

The quality and design of identified studies was another limitation. Most were small and did not fully delineate their randomization or blinding strategies. Multiple variations in study protocols made combining results of different studies somewhat problematic. Unfortunately, meta-regression or subanalysis for variables such as concurrent resistance training, previous training level, age, and sex were not possible because too few studies evaluated these variables independently of one another. Almost all of the studies finding a benefit of creatine supplementation were in young, previously trained men who engaged in resistance training concomitantly with supplementation, and the outcome measured was maximal weight lifted. Those studies not finding a difference were generally of less highly trained or older individuals, did not include resistance training, and more often investigated outcomes other than maximal weight lifted. This meta-analysis identifies that it is impossible to conclude from the existing literature which combination of variables is necessary to see a benefit of creatine supplementation.

More information is needed on the safety of creatine supplementation. Although a recent review35 reported no significant short-term adverse effects, no adequate long-term studies have been conducted. Two retrospective trials36,37 reported no adverse effects from longer-term (up to 5 years) creatine supplementation;however, neither study was randomized, blinded, or controlled, and neither had sufficient statistical power to detect uncommon adverse effects. Additionally, the designs of these studies precluded the possibility of detecting serious adverse effects such as death or disability. There have been case reports of renal dysfunction due to creatine38-40 and, as of 1998, the Food and Drug Administration had received 32 adverse event reports including seizures, myopathy, rhabdomyolosis, cardiac arrhythmia, and death.41

Given the popularity of nutritional supplements among all levels of athletes, clinicians cannot avoid questions about the effectiveness and safety of creatine supplementation. This meta-analysis demonstrated that oral creatine does improve performance during maximal resistance exercises in young men. However, we found no benefit for outcomes other than maximal weight lifted, suggesting that creatine may not improve actual performance in more complex movements requiring strength, speed, and coordination of multiple muscle groups. Studies investigating the effect of creatine in actual athletic performance are lacking.

Several important questions remain to be answered about creatine. What are the effects for women and older individuals? Is resistance training necessary to see strength performance improvement? Are these improvements in strength accompanied by improved athletic performance? How long do the effects of creatine remain after discontinuing supplementation? Most importantly, what is the long-term safety profile of creatine? Without further research to answer these questions, we cannot support the use of creatine supplementation for performance enhancement despite evidence for a positive impact on some components of strength.

Drug therapy for prevention and treatment of postmenopausal osteoporosis

Drug (trade name)Indication and dosagePossible side effects (% of patients)Cost per month*
Calcium and vitamin D (generic,Tums,Citracal, and others)Prevention and treatment: 1200 –1500 mg/day calcium and 800 IU/day vitamin DNausea,dyspepsia (uncommon), constipation (10%)$5 (both)
Estrogen †(Premarin,Ogen,Estrace, Estraderm,and others)Prevention: 0.625 mg/day conjugated equine estrogen or the equivalent;0.3 mg/day may be effectiveNausea,breast tenderness, vaginal bleeding, mood alterations, headache, bloating$14 –$28
Alendronate (Fosamax)Prevention:5 mg/day or 35 mg/week Treatment:10 mg/day or 70 mg/wkNausea, dyspepsia, esophageal irritation$67
Risedronate (Actonel)Prevention and treatment: 5 mg/day or 35 mg/weekAbdominal pain, esophageal irritation$67
Raloxifene (Evista)Treatment: 60 mg/dayHot flashes (6%), leg cramps (3%)$70
Calcitonin nasal spray (Miacalcin)Treatment:200 IU/day (1 spray in 1 nostril per day)Rhinitis (5%), epistaxis, sinusitis$66
*Average wholesale cost to the pharmacy for 30 days of therapy; (Drug Topics Red Book. Montvale, NJ; Medical Economics Co., Inc, 2002.)
†Women with a uterus need to take a progestin such as medroxyprogesterone acetate (Provera $30/month, generic $9/month) or a combination estrogen/progestin product (Prempro $33/monh, FemHRT $26/month).

CORRECTIONS

On page 868 of the October issue a name was misspelled; the correct name is Brian S. Alper.

In the table appearing on page 877 of the October issue, the entry for Fosamax inadvertently combined prevention and treatment dosages. The corrected entry is shown below.

· Acknowledgments ·

 

 

This study was supported in part by a Faculty Development in Family Medicine Grant (No. 5D45 PE 55052-09) and a National Research Service Award Grant (No. 1T32 PE 10030-03) from the United States Department of Health and Human Services. The authors thank Craig Young, MD, who assisted in the conception of this project; Chris McLaughlin, who provided editorial assistance; and Veronica Ruleford, who assisted with the preparation of the manuscript.

References

1. USA Today. Survey: More than 1 million kids use sports supplements. USA Today. August 28, 2001. Available at: www.usatoday.com/news/nation/2001/08/28/youth-supplements.htm. Accessed October 8, 2002.

2. McGuine TA, Sullivan JC, Bernhardt DT. Creatine supplementation in high school football players. Clin J Sport Med 2001;11(4):247-53.

3. Ray TR, Eck JC, Covington LA, Murphy RB, Williams R, Knudtson J. Use of oral creatine as an ergogenic aid for increased sports performance: perceptions of adolescent athletes. South Med J 2001;94(6):608-12.

4. LaBotz M, Smith BW. Creatine supplement use in an NCAA Division I athletic program. Clin J Sport Med 1999;9(3):167-9.

5. Harris RC, Soderlund K, Hultman E. Elevation of creatine in resting and exercised muscle of normal subjects by creatine supplementation. Clin Sci 1992;83:367-74.

6. Vandenberghe K, Van Hecke P, Van Leemputte M, Vanstapel F, Hespel P. Phosphocreatine resynthesis is not affected by creatine loading. Med Sci Sports Exerc 1999;31(2):236-42.

7. Hultman E, Soderlund K, Timmons JA, Cederblad G, Greenhaff PL. Muscle creatine loading in men. J Appl Physiol 1996;812:32-7.

8. Terjung RL, Clarkson P, Eichner ER, Greenhaff PL, Hespel PJ, Israel RG, et al. American College of Sports Medicine roundtable. The physiological and health effects of oral creatine supplementation. Med Sci Sports Exerc 2000;32(3):706-17.

9. Kreider RB. Dietary supplements and the promotion of muscle growth with resistance exercise. Sports Med 1999;27(2):97-110.

10. Mujika I, Padilla S. Creatine supplementation as an ergogenic aid for sports performance in highly trained athletes: a critical review. Int J Sports Med 1997;18(7):491-6.

11. Juhn MS, Tarnopolsky M. Oral creatine supplementation and athletic performance: a critical review. Clin J Sport Med 1998;8(4):286-97.

12. Volek JS, Kraemer WJ. Creatine supplementation: its effect on human muscular performance and body composition. J Strength Cond Res 1996;10(3):200-10.

13. Maughan RJ. Creatine supplementation and exercise performance. Int J Sport Nutr 1995;5:94-101.

14. Kraemer WJ, Volek JS. Creatine supplementation. Its role in human performance. Clin Sports Med 1999;18(3):651-66.

15. Dickersin K, Scherer R, Lefebvre C. Systematic reviews: identifying relevant studies for systematic reviews. Br Med J 1994;309(6964):1286-91.

16. Febbraio MA, Flanagan TR, Snow RJ, Zhao S, Carey MF. Effect of creatine supplementation on intramuscular TCr, metabolism and performance during intermittent, supramaximal exercise in humans. Acta Physiol Scand 1995;155(4):387-95.

17. Vandenberghe K, Goris M, Van Hecke P, Van Leemputte M, Vangerven L, Hespel P. Long-term creatine intake is beneficial to muscle performance during resistance training. J Appl Physiol 1997;83:2055-63.

18. Greenhaff PL. Creatine and its application as an ergogenic aid. Int J Sport Nutr 1995;5(suppl):S100-10.

19. The Cochrane Collaboration. The Cochrane Handbook (Online). Available at: http://www.cochrane.dk/cochrane/handbook/hbook CONTENTS__6_ASSESSMENT_OF_STUDY_.htm. Accessed June 2001.

20. Rawson ES, Clarkson PM. Acute creatine supplementation in older men. Int J Sports Med 2000;21(1):71-5.

21. Rawson ES, Wehnert ML, Clarkson PM. Effects of 30 days of creatine ingestion in older men. Eur J Appl Physiol 1999;80(2):139-44.

22. Becque MD, Lochmann JD, Melrose DR. Effects of oral creatine supplementation on muscular strength and body composition. Med Sci Sports Exerc 2000;32(3):654-8.

23. Stone MH, Sanborn K, Smith LL, O’Bryant HS, Hoke T, Utter AC, et al. Effects of in-season (5 weeks) creatine and pyruvate supplementation on anaerobic performance and body composition in American football players. Int J Sport Nutr 1999;9(2):146-65.

24. Jones AM, Atter T, Georg KP. Oral creatine supplementation improves multiple sprint performance in elite ice-hockey players. J Sports Med Phys Fitness 1999;39(3):189-96.

25. Dawson B, Cutler M, Moody A, Lawrence S, Goodman C, Randall N. Effects of oral creatine loading on single and repeated maximal short sprints. Aust J Sci Med Sport 1995;27(3):56-61.

26. Gilliam JD, Hohzorn C, Martin D, Trimble MH. Effect of oral creatine supplementation on isokinetic torque production. Med Sci Sports Exerc 2000;32(5):993-6.

27. Pearson DR, Hamby DG, Russel W, Harris T. Long-term effects of creatine monohydrate on strength and power. J Strength Cond Res 1999;13(3):187-92.

28. Volek JS, Duncan ND, Mazzetti SA, Staron RS, Putukian M, Gomez AL, et al. Performance and muscle fiber adaptations to creatine supplementation and heavy resistance training. Med Sci Sports Exerc 1999;31(8):1147-56.

29. Barnett C, Hinds M, Jenkins DG. Effects of oral creatine supplementation on multiple sprint cycle performance. Aust J Sci Med Sport 1996;28(1):35-9.

30. Cooke WH, Grandjean PW, Barnes WS. Effect of oral creatine supplementation on power output and fatigue during bicycle ergometry. J Appl Physiol 1995;78:670-3.

31. Cooke WH, Barnes WS. The influence of recovery duration on high-intensity exercise performance after oral creatine supplementation. Can J Appl Physiol 1997;22(5):454-67.

32. Kelly VG, Jenkins DG. Effect of oral creatine supplementation on near-maximal strength and repeated sets of high-intensity bench press exercise. J Strength Cond Res 1998;12(2):109-15.

33. Noonan D, Berg K, Latin RW, Wagner JC, Reimers K. Effects of varying dosages of oral creatine relative to fat free body mass on strength and body composition. J Strength Cond Res 1998;12(2):104-8.

34. Peeters BM, Lantz CD, Mayhew JL. Effect of oral creatine monohydrate and creatine phosphate supplementation on maximal strength indices, body composition, and blood pressure. J Strength Cond Res 1999;13(1):3-9.

35. Juhn MS, Tarnopolsky M. Potential side effects of oral creatine supplementation: a critical review. Clin J Sport Med 1998;8(4):298-304.

36. Poortmans JR, Francaux M. Long-term oral creatine supplementation does not impair renal function in healthy athletes. Med Sci Sports Exerc 1999;31(8):1108-10.

37. Schilling BK, Stone MH, Utter A, Kearney JT, Johnson M, Coglianese R, et al. Creatine supplementation and health variables: a retrospective study. Med Sci Sports Exerc 2001;33(2):183-8.

38. Pritchard NR, Kalra PA. Renal dysfunction accompanying oral creatine supplementation [letter]. Lancet 1998;351:125-23.

39. Koshy KM, Griswold E, Schneeberger EE. Interstitial nephritis in a patient taking creatine [letter]. New Engl J Med 1999;340:814-5.

40. Kuehl KS, Goldberg L, Elliot D. Renal insufficiency after creatine supplementation in a college football athlete [abstract]. Med Sci Sports Exerc 1998;30(suppl 5):S235.-

41. United States Food and Drug Administration The Special Nutritionals Adverse Event Monitoring System. Available at: http://vm.cfsan.fda.gov/cgibin/aems.cgi?QUERY= creatine&STYPE =EXACT. Accessed March 3, 2001.

Article PDF
Author and Disclosure Information

RANIA L. DEMPSEY, MD
MICHAEL F. MAZZONE, MD
LINDA N. MEURER, MD, MPH
Milwaukee, Wisconsin
From the Department of Family and Community Medicine, Medical College of Wisconsin, Milwaukee, WI. The authors report no competing interests. Address reprint requests to Rania L. Dempsey, MD, Department of Family and Community Medicine, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226. Email: [email protected].

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
945-952
Legacy Keywords
,Creatinedietary supplementsmeta-analysis. (J Fam Pract 2002; 51:945–952)
Sections
Author and Disclosure Information

RANIA L. DEMPSEY, MD
MICHAEL F. MAZZONE, MD
LINDA N. MEURER, MD, MPH
Milwaukee, Wisconsin
From the Department of Family and Community Medicine, Medical College of Wisconsin, Milwaukee, WI. The authors report no competing interests. Address reprint requests to Rania L. Dempsey, MD, Department of Family and Community Medicine, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226. Email: [email protected].

Author and Disclosure Information

RANIA L. DEMPSEY, MD
MICHAEL F. MAZZONE, MD
LINDA N. MEURER, MD, MPH
Milwaukee, Wisconsin
From the Department of Family and Community Medicine, Medical College of Wisconsin, Milwaukee, WI. The authors report no competing interests. Address reprint requests to Rania L. Dempsey, MD, Department of Family and Community Medicine, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226. Email: [email protected].

Article PDF
Article PDF

KEY POINTS FOR CLINICIANS

  • Oral creatine supplementation combined with resistance training increases maximal weight young men can lift.
  • It is unknown whether this increase in strength translates into improvement in sports performance.
  • Evidence in the existing literature is insufficient to draw conclusions about the effect of creatine in women or older individuals.
  • Because no long-term studies have been performed on the safety of creatine supplementation, its use should not be universally recommended.

Creatine has gained widespread popularity during the past decade as a possible performance-enhancing agent among professional and recreational athletes. It is the most widely used performance-enhancing supplement among youth aged 10 to 17 years,1 with 15% to 30% of high school athletes2,3 and 48% of male Division I college athletes4 reporting creatine use. Considered a nutritional supplement, it is not regulated by the United States Food and Drug Administration nor is it banned by the International Olympic Committee or National Collegiate Athletic Association. Because of the widespread use of creatine, primary care providers must be knowledgeable about its effectiveness and safety.

Oral creatine monohydrate increases skeletal muscle creatine concentration by 16% to 50%,5-7 but whether it is an effective ergogenic aid remains controversial. Multiple studies have investigated this question, but many have been small, often including fewer than 10 subjects, and results have been conflicting. Several reviews8-14 have addressed the effectiveness of creatine, but there has not been a systematic and comprehensive meta-analysis to resolve the uncertainties in the literature or to quantify the magnitude of the effect of creatine. To evaluate whether oral creatine supplementation improves strength and power in healthy adults, and further to quantify the effect, we performed a meta-analysis of randomized and matched controlled trials investigating creatine supplementation and strength.

Methods

Search strategy

To identify possible studies for inclusion, 1 author (M.F.M.) searched the MEDLINE electronic database (1966–2000) using the terms “creatine supplementation” or “creatine” combined with “strength” or “power.” Another MEDLINE search (1966–2000) was independently conducted by another author (R.L.D.) using the term “creatine not kinase” combined with a previously published search strategy to comprehensively identify randomized clinical trials.15 We searched the Cochrane Controlled Trials Register using the term “creatine not kinase.” We manually reviewed bibliographies of identified studies, abstracts from American College of Sports Medicine annual meetings (1999 and 2000), and a reference list distributed by an expert on the subject at the annual meeting of the American Medical Society for Sports Medicine (2000). Titles and available abstracts were screened and relevant articles retrieved. An expert in the field was contacted for sources of unpublished data.

Inclusion and exclusion criteria

Two reviewers independently assessed articles for inclusion. A third reviewer was consulted to resolve discrepancies. We used the following inclusion criteria: (1) the articles reported results of randomized or matched placebo-controlled trials investigating the effect of oral creatine supplementation on strength or power with or without concomitant resistance training; (2) the study subjects were healthy men or women older than 16 years with or without previous athletic training; and (3) the studies were published in any language. Given the general disagreement regarding the time required for muscle creatine concentration to return to presupplementation levels after discontinuing oral creatine,16-18 studies using a crossover design were excluded from the statistical analysis unless data from the first arm, before crossover, could be abstracted or obtained from the original investigator. Outcomes were measures of strength or power of any muscle group, including maximal weight lifted; peak power achieved in maximal (sprint) cycle ergometry; and peak knee flexion/extension torque in isokinetic dynamometer testing. Measurements of endurance, such as time to fatigue on cycle ergometer and number of repetitions achieved in submaximal weight lifting, were excluded. For studies reporting outcome per kilogram of body weight, we contacted investigators to obtain absolute outcome values and excluded studies if uncorrected data were not received. We also excluded articles that evaluated outcomes not investigated in at least 2 other studies. Finally, if we could not extract data in a usable form, we contacted investigators to obtain adequate data.

Quality assessment

Two independent reviewers appraised articles to determine methodological quality with respect to risk of bias under the following categories: method of randomization, allocation concealment, blinding, similarity of study groups, withdrawals and dropouts, and intention-to-treat analysis. Each study that met inclusion criteria was given a quality score, with a maximum possible score of 10, using a tool adapted from the Cochrane Handbook.19 The quality assessment data are presented but were not used to exclude or rank any study.

 

 

Data abstraction and statistical analysis

Two independent reviewers abstracted data, and a third reviewer resolved differences. For studies investigating multiple sprints, data from the first sprint only were included in statistical analysis because the first sprint is when peak power achievement is expected. A weighted mean difference (WMD) between creatine and placebo groups was calculated for each outcome using Review Manager 4.1 software (developed by The Cochrane Collaboration). A fixed effects model was used unless statistical heterogeneity was significant (P < .05), in which case a random effects model was used. Subanalyses were planned on several factors that were anticipated to be sources for variation, including (1) dose and duration of creatine administration, (2) concomitant resistance training, (3) different baseline level of physical training, (4) age, and (5) sex.

Results

Description of studies

After reviewing titles and available abstracts of more than 500 articles, we retrieved 66 potentially relevant studies, 16 of which met inclusion criteria for the analysis.17,20-34 Characteristics of these studies are summarized in the Table. Included studies represented 20 discrete samples and 414 subjects. Two studies20,21 evaluated creatine supplementation in men older than 60 years, whereas all the others studied younger subjects (range, 18–36 years). Only 1 study included women.17 Creatine dosages were similar across included studies (typically 20 g/d for the first 4–7 days of supplementation and 5 g/d thereafter). Studies that evaluated maximal weight lifting performance were more likely to include adjuvant resistance training programs in their protocols than those that evaluated cycle ergometry sprint or isokinetic dynamometer performance. None included cycle ergometry training.

TABLE

Characteristics of included studies

ReferenceNo. subjects (sex)Dose per day and durationTraining levelWeight training during study?Outcome measurementQuality score (out of 10)Comparability of creatine & placebo groups at baseline*
Barnett 199617 (M)280 mg/kg ×4 dActiveNoCP2.5+
Cooke 199512 (M)20 g ×5 dUntrainedNoCP2.5+
Cooke 199780 (M)20 g ×5 dTrained or activeNoCP2+++
Dawson 199518 (M), 22(M)20 g ×5 dActiveNoCP3+++
Jones 199916 (M)20 g ×5 d then 5 g ×10 wkTrainedYesCM3+++
Stone 199920 (M)0.22 g/kg ×35 dTrainedYesCM, BP, S4.5+++
Kelly 199818 (M)20 g ×5 d then 5 g ×26 dTrainedYes3BP2
Noonan 199839 (M)20 g ×5 d then 300 mg/kg ×8 wkTrainedYesBP5.5+++
Peeters 199935 (M)20 g ×3 d then 10 g ×6 wkTrainedYesBP3+++
Vandenberghe 199719 (F)20 g ×4 d then 5 g ×10 wkUntrainedYesBP, S5+++
Pearson 199916 (M)5 g ×10 wkTrainedYesBP, S, PT3+++
Volek 199919 (M)25 g ×7 d then 5 g ×12 wkTrainedYesBP, S4.5+++
Gilliam 200023 (M)20 g ×5 dActive but untrainedNoPT2.5+
Rawson 1999§20 (M)20 g ×10 d then 4 g ×20 dUntrainedNoAF, PT4.5+++
Rawson 2000§17 (M)20 g ×5 dUntrainedNoAF3.5+++
Becque 200023 (M)20 g ×5 d then 2 g ×6 wkTrainedYesAF5+
*Comparability between groups was assessed for age, anthropomorphic measurements, and strength outcomes. +++ = similar for all 3 characteristics; + = similar for strength outcome measurements; – = not comparable at baseline for strength outcome.
Four protocols with 20 subjects each evaluating the same strength outcome measurement reported in Cooke 1997.
Two separate experiments reported in Dawson 1995.
§Included subjects > 60 years old; in all others subjects were < 36 years old.
AF, 1 repetition maximum arm flexor strength; BP, 1 repetition maximum bench press strength; 3BP, 3 repetition maximum bench press strength; CM, cycle ergometer mean peak power; CP, cycle ergometer peak power; PT, isokinetic leg flexion/extension peak torque; S, 1 repetition maximum squat strength.

Methodological quality of included studies

The methodological quality of studies was generally low (Table). The mean quality score was 3.5 ± 1.2 (mean ± SD) out of a possible 10 (range, 2–5.5). None of the studies identified the method of randomization used or specifically reported an intention-to-treat analysis. None specifically reported masking of outcome assessment. In general, these significant flaws in study design would tend to result in overestimation of the benefit of creatine supplementation.

Absolute strength

When 1- to 3-repetition maximum bench press strength measurements were statistically combined (they were homogeneous), the creatine supplementation group showed an absolute strength increase of 6.85 kg (95% confidence interval [CI], 5.24–8.47; n = 143) lifted per repetition greater than that seen with placebo alone (Figure 1). There was no additional advantage in strength performance after 9 to 12 weeks of supplementation (WMD = 6.6 kg; 95% CI, 3.5–9.5) compared with 4 to 8 weeks of supplementation (WMD = 6.6 kg; 95% CI, 4.8–8.4). Subanalysis for an interaction with resistance training, previous training level, age, or sex was not possible because all studies measuring bench press strength except one17 investigated creatine supple mentation in previously trained young men who continued resistance training during supplementation. The 1 study in previously sedentary young women17 did find a trend toward increased bench press strength, although independently this change was not statistically significant.

 

 

There was no significant difference in 1-repetition maximum arm flexor strength with creatine supplementation (WMD = 1.53 kg; 95% CI, –1.07 to 4.13; n = 60; Figure 2). However, 2 trials20,21 of the 3 evaluating this outcome studied subjects older than 60 years and did not employ adjuvant weight training programs. The study that incorporated resistance training and evaluated younger subjects22 found a modest (29.9% vs 16.5%) improvement in 1-repetition maximum arm flexor strength with creatine compared with placebo.

For 1-repetition maximum squat, creatine supplementation resulted in a strength increase of 9.76 kg (95% CI, 3.37–16.15; n = 74) greater than that of placebo (Figure 3). There was no advantage to longer-term supplementation (10.9 kg more than placebo [95% CI, 3.4–18.4] for 5–6 weeks compared with 10.4 kg [95% CI, 3.5–17.2] for 10–12 weeks). Again, in all but 1 study17 measuring squat performance, subjects were previously trained young men engaging in adjuvant resistance training programs, so subanalysis for other variables was not possible. For previously sedentary women, Vandenberghe et al17 found no difference at 5 weeks, but they did find a significant improvement in 1-repetition maximum squat performance with creatine supplementation at 10 weeks. Tests for heterogeneity were nonsignificant for all absolute strength variables.

To evaluate for publication bias, we examined funnel plots of each of the 3 absolute strength outcomes (bench press, arm flexor, and squat exercises). No evidence of publication bias was demonstrated. Figure W1 (available on the JFP Web site: http://www.jfponline.com) depicts a composite funnel plot of all 3 outcomes using a standardized mean difference to allow comparison between these 3 different outcomes.

FIGURE 1 Studies assessing 1- to 3-repetition maximum bench press strength



FIGURE 2 Studies assessing 1-repetition maximum arm flexor strength



FIGURE 3 Studies assessing 1-repetition maximum squat strength


Cycle ergometer peak power

Creatine supplementation had no effect on peak power production during cycle ergometry sprint (Figure 4). Results among studies were widely variable (test for heterogeneity P= .035), so a random effects model was used to pool data. The summary weighted mean difference of 16.79 W (95% CI, –13.26 to 46.84; n = 149) was insignificant, both statistically (test for overall effect P= .3) and clinically, because this represents approximately a 1% change greater than baseline. Two studies23,24 looked at mean peak power across a series of 15-to 30-second sprints and found inconsistent results, with a summary weighted mean difference of 68.61 W (95% CI, –85.74 to 222.97; n = 36). Of note, for the 2 studies24,25 that demonstrated improved performance with creatine, the difference was accentuated by an unexplained but pronounced worsening of performance after supplementation in the placebo groups.

FIGURE 4 Studies assessing cycle ergometer sprint peak power


Dynamometer peak torque

Only 3 studies21,26,27 evaluated peak torque, and all used slightly different outcome assessments. One study26 reported average peak torque across 30 isokinetic leg flexion/extension contractions; 1 study21 reported the sum of peak torque across 5 sets of 30 isokinetic leg flexion/extension contractions; and 1 study27 gave peak torque data for isokinetic leg extension but did not describe precisely how peak torque was determined. There was no difference between creatine and placebo for isokinetic leg flexion/extension peak torque using a standardized mean difference to account for variations in measurement of this outcome. Tests for heterogeneity were nonsignificant for this outcome (P= .19).

Adverse effects

Four studies commented on short-term adverse effects of creatine supplementation. Three studies17,23,28 found no difference between creatine and placebo. One study21 reported gastrointestinal upset, rash, or headache in 3 subjects taking creatine and no adverse effects in subjects taking placebo. None of these studies was designed to evaluate long-term adverse effects of creatine supplementation, and there were no reports of longer-term follow up.

Discussion

This is the first study to report quantitatively the effect of creatine supplementation on strength performance from meta-analysis of the existing literature. We found that oral creatine supplementation improves maximal resistance exercise performance in previously trained young men. There is insufficient evidence that creatine improves other measures of strength, such as cycle ergometry sprint peak power or isokinetic dynamometer peak torque, or that creatine improves strength in women or older individuals. The effect of creatine on endurance, submaximal exercise, or actual “on-field” athletic performance was not addressed.

Creatine’s ergogenic properties may result from allowing increased work during training and decreasing recovery time. If so, creatine must be combined with adjuvant training to increase strength and power. Only studies investigating maximal weight-lifting performance incorporated resistance-training programs specific to the outcome being measured. Three studies included weight training but investigated non–weight-lifting outcomes,23,24,27 and only 1 study24 found a benefit from creatine supplementation. It is unclear whether the lack of effect for non–weight-lifting outcomes means that creatine is not beneficial unless combined with specific adjuvant training or that creatine simply is not ergogenic for outcomes other than maximal weight lifted.

 

 

This meta-analysis has some limitations. Our definition of strength included only “pure strength” or “power” measurements to allow statistical comparisons between similar outcomes. Because muscle strength is related to muscle endurance, researchers may define strength differently. It is not obvious at what point an exercise becomes a test of endurance and not just strength, but there is a physiologic basis for believing that creatine supplementation would more markedly improve performance in maximal or shorter duration exercises (ie, requiring strength and not endurance). The inclusion criteria for this project were determined before study review and selection and were applied consistently across all studies.

The quality and design of identified studies was another limitation. Most were small and did not fully delineate their randomization or blinding strategies. Multiple variations in study protocols made combining results of different studies somewhat problematic. Unfortunately, meta-regression or subanalysis for variables such as concurrent resistance training, previous training level, age, and sex were not possible because too few studies evaluated these variables independently of one another. Almost all of the studies finding a benefit of creatine supplementation were in young, previously trained men who engaged in resistance training concomitantly with supplementation, and the outcome measured was maximal weight lifted. Those studies not finding a difference were generally of less highly trained or older individuals, did not include resistance training, and more often investigated outcomes other than maximal weight lifted. This meta-analysis identifies that it is impossible to conclude from the existing literature which combination of variables is necessary to see a benefit of creatine supplementation.

More information is needed on the safety of creatine supplementation. Although a recent review35 reported no significant short-term adverse effects, no adequate long-term studies have been conducted. Two retrospective trials36,37 reported no adverse effects from longer-term (up to 5 years) creatine supplementation;however, neither study was randomized, blinded, or controlled, and neither had sufficient statistical power to detect uncommon adverse effects. Additionally, the designs of these studies precluded the possibility of detecting serious adverse effects such as death or disability. There have been case reports of renal dysfunction due to creatine38-40 and, as of 1998, the Food and Drug Administration had received 32 adverse event reports including seizures, myopathy, rhabdomyolosis, cardiac arrhythmia, and death.41

Given the popularity of nutritional supplements among all levels of athletes, clinicians cannot avoid questions about the effectiveness and safety of creatine supplementation. This meta-analysis demonstrated that oral creatine does improve performance during maximal resistance exercises in young men. However, we found no benefit for outcomes other than maximal weight lifted, suggesting that creatine may not improve actual performance in more complex movements requiring strength, speed, and coordination of multiple muscle groups. Studies investigating the effect of creatine in actual athletic performance are lacking.

Several important questions remain to be answered about creatine. What are the effects for women and older individuals? Is resistance training necessary to see strength performance improvement? Are these improvements in strength accompanied by improved athletic performance? How long do the effects of creatine remain after discontinuing supplementation? Most importantly, what is the long-term safety profile of creatine? Without further research to answer these questions, we cannot support the use of creatine supplementation for performance enhancement despite evidence for a positive impact on some components of strength.

Drug therapy for prevention and treatment of postmenopausal osteoporosis

Drug (trade name)Indication and dosagePossible side effects (% of patients)Cost per month*
Calcium and vitamin D (generic,Tums,Citracal, and others)Prevention and treatment: 1200 –1500 mg/day calcium and 800 IU/day vitamin DNausea,dyspepsia (uncommon), constipation (10%)$5 (both)
Estrogen †(Premarin,Ogen,Estrace, Estraderm,and others)Prevention: 0.625 mg/day conjugated equine estrogen or the equivalent;0.3 mg/day may be effectiveNausea,breast tenderness, vaginal bleeding, mood alterations, headache, bloating$14 –$28
Alendronate (Fosamax)Prevention:5 mg/day or 35 mg/week Treatment:10 mg/day or 70 mg/wkNausea, dyspepsia, esophageal irritation$67
Risedronate (Actonel)Prevention and treatment: 5 mg/day or 35 mg/weekAbdominal pain, esophageal irritation$67
Raloxifene (Evista)Treatment: 60 mg/dayHot flashes (6%), leg cramps (3%)$70
Calcitonin nasal spray (Miacalcin)Treatment:200 IU/day (1 spray in 1 nostril per day)Rhinitis (5%), epistaxis, sinusitis$66
*Average wholesale cost to the pharmacy for 30 days of therapy; (Drug Topics Red Book. Montvale, NJ; Medical Economics Co., Inc, 2002.)
†Women with a uterus need to take a progestin such as medroxyprogesterone acetate (Provera $30/month, generic $9/month) or a combination estrogen/progestin product (Prempro $33/monh, FemHRT $26/month).

CORRECTIONS

On page 868 of the October issue a name was misspelled; the correct name is Brian S. Alper.

In the table appearing on page 877 of the October issue, the entry for Fosamax inadvertently combined prevention and treatment dosages. The corrected entry is shown below.

· Acknowledgments ·

 

 

This study was supported in part by a Faculty Development in Family Medicine Grant (No. 5D45 PE 55052-09) and a National Research Service Award Grant (No. 1T32 PE 10030-03) from the United States Department of Health and Human Services. The authors thank Craig Young, MD, who assisted in the conception of this project; Chris McLaughlin, who provided editorial assistance; and Veronica Ruleford, who assisted with the preparation of the manuscript.

KEY POINTS FOR CLINICIANS

  • Oral creatine supplementation combined with resistance training increases maximal weight young men can lift.
  • It is unknown whether this increase in strength translates into improvement in sports performance.
  • Evidence in the existing literature is insufficient to draw conclusions about the effect of creatine in women or older individuals.
  • Because no long-term studies have been performed on the safety of creatine supplementation, its use should not be universally recommended.

Creatine has gained widespread popularity during the past decade as a possible performance-enhancing agent among professional and recreational athletes. It is the most widely used performance-enhancing supplement among youth aged 10 to 17 years,1 with 15% to 30% of high school athletes2,3 and 48% of male Division I college athletes4 reporting creatine use. Considered a nutritional supplement, it is not regulated by the United States Food and Drug Administration nor is it banned by the International Olympic Committee or National Collegiate Athletic Association. Because of the widespread use of creatine, primary care providers must be knowledgeable about its effectiveness and safety.

Oral creatine monohydrate increases skeletal muscle creatine concentration by 16% to 50%,5-7 but whether it is an effective ergogenic aid remains controversial. Multiple studies have investigated this question, but many have been small, often including fewer than 10 subjects, and results have been conflicting. Several reviews8-14 have addressed the effectiveness of creatine, but there has not been a systematic and comprehensive meta-analysis to resolve the uncertainties in the literature or to quantify the magnitude of the effect of creatine. To evaluate whether oral creatine supplementation improves strength and power in healthy adults, and further to quantify the effect, we performed a meta-analysis of randomized and matched controlled trials investigating creatine supplementation and strength.

Methods

Search strategy

To identify possible studies for inclusion, 1 author (M.F.M.) searched the MEDLINE electronic database (1966–2000) using the terms “creatine supplementation” or “creatine” combined with “strength” or “power.” Another MEDLINE search (1966–2000) was independently conducted by another author (R.L.D.) using the term “creatine not kinase” combined with a previously published search strategy to comprehensively identify randomized clinical trials.15 We searched the Cochrane Controlled Trials Register using the term “creatine not kinase.” We manually reviewed bibliographies of identified studies, abstracts from American College of Sports Medicine annual meetings (1999 and 2000), and a reference list distributed by an expert on the subject at the annual meeting of the American Medical Society for Sports Medicine (2000). Titles and available abstracts were screened and relevant articles retrieved. An expert in the field was contacted for sources of unpublished data.

Inclusion and exclusion criteria

Two reviewers independently assessed articles for inclusion. A third reviewer was consulted to resolve discrepancies. We used the following inclusion criteria: (1) the articles reported results of randomized or matched placebo-controlled trials investigating the effect of oral creatine supplementation on strength or power with or without concomitant resistance training; (2) the study subjects were healthy men or women older than 16 years with or without previous athletic training; and (3) the studies were published in any language. Given the general disagreement regarding the time required for muscle creatine concentration to return to presupplementation levels after discontinuing oral creatine,16-18 studies using a crossover design were excluded from the statistical analysis unless data from the first arm, before crossover, could be abstracted or obtained from the original investigator. Outcomes were measures of strength or power of any muscle group, including maximal weight lifted; peak power achieved in maximal (sprint) cycle ergometry; and peak knee flexion/extension torque in isokinetic dynamometer testing. Measurements of endurance, such as time to fatigue on cycle ergometer and number of repetitions achieved in submaximal weight lifting, were excluded. For studies reporting outcome per kilogram of body weight, we contacted investigators to obtain absolute outcome values and excluded studies if uncorrected data were not received. We also excluded articles that evaluated outcomes not investigated in at least 2 other studies. Finally, if we could not extract data in a usable form, we contacted investigators to obtain adequate data.

Quality assessment

Two independent reviewers appraised articles to determine methodological quality with respect to risk of bias under the following categories: method of randomization, allocation concealment, blinding, similarity of study groups, withdrawals and dropouts, and intention-to-treat analysis. Each study that met inclusion criteria was given a quality score, with a maximum possible score of 10, using a tool adapted from the Cochrane Handbook.19 The quality assessment data are presented but were not used to exclude or rank any study.

 

 

Data abstraction and statistical analysis

Two independent reviewers abstracted data, and a third reviewer resolved differences. For studies investigating multiple sprints, data from the first sprint only were included in statistical analysis because the first sprint is when peak power achievement is expected. A weighted mean difference (WMD) between creatine and placebo groups was calculated for each outcome using Review Manager 4.1 software (developed by The Cochrane Collaboration). A fixed effects model was used unless statistical heterogeneity was significant (P < .05), in which case a random effects model was used. Subanalyses were planned on several factors that were anticipated to be sources for variation, including (1) dose and duration of creatine administration, (2) concomitant resistance training, (3) different baseline level of physical training, (4) age, and (5) sex.

Results

Description of studies

After reviewing titles and available abstracts of more than 500 articles, we retrieved 66 potentially relevant studies, 16 of which met inclusion criteria for the analysis.17,20-34 Characteristics of these studies are summarized in the Table. Included studies represented 20 discrete samples and 414 subjects. Two studies20,21 evaluated creatine supplementation in men older than 60 years, whereas all the others studied younger subjects (range, 18–36 years). Only 1 study included women.17 Creatine dosages were similar across included studies (typically 20 g/d for the first 4–7 days of supplementation and 5 g/d thereafter). Studies that evaluated maximal weight lifting performance were more likely to include adjuvant resistance training programs in their protocols than those that evaluated cycle ergometry sprint or isokinetic dynamometer performance. None included cycle ergometry training.

TABLE

Characteristics of included studies

ReferenceNo. subjects (sex)Dose per day and durationTraining levelWeight training during study?Outcome measurementQuality score (out of 10)Comparability of creatine & placebo groups at baseline*
Barnett 199617 (M)280 mg/kg ×4 dActiveNoCP2.5+
Cooke 199512 (M)20 g ×5 dUntrainedNoCP2.5+
Cooke 199780 (M)20 g ×5 dTrained or activeNoCP2+++
Dawson 199518 (M), 22(M)20 g ×5 dActiveNoCP3+++
Jones 199916 (M)20 g ×5 d then 5 g ×10 wkTrainedYesCM3+++
Stone 199920 (M)0.22 g/kg ×35 dTrainedYesCM, BP, S4.5+++
Kelly 199818 (M)20 g ×5 d then 5 g ×26 dTrainedYes3BP2
Noonan 199839 (M)20 g ×5 d then 300 mg/kg ×8 wkTrainedYesBP5.5+++
Peeters 199935 (M)20 g ×3 d then 10 g ×6 wkTrainedYesBP3+++
Vandenberghe 199719 (F)20 g ×4 d then 5 g ×10 wkUntrainedYesBP, S5+++
Pearson 199916 (M)5 g ×10 wkTrainedYesBP, S, PT3+++
Volek 199919 (M)25 g ×7 d then 5 g ×12 wkTrainedYesBP, S4.5+++
Gilliam 200023 (M)20 g ×5 dActive but untrainedNoPT2.5+
Rawson 1999§20 (M)20 g ×10 d then 4 g ×20 dUntrainedNoAF, PT4.5+++
Rawson 2000§17 (M)20 g ×5 dUntrainedNoAF3.5+++
Becque 200023 (M)20 g ×5 d then 2 g ×6 wkTrainedYesAF5+
*Comparability between groups was assessed for age, anthropomorphic measurements, and strength outcomes. +++ = similar for all 3 characteristics; + = similar for strength outcome measurements; – = not comparable at baseline for strength outcome.
Four protocols with 20 subjects each evaluating the same strength outcome measurement reported in Cooke 1997.
Two separate experiments reported in Dawson 1995.
§Included subjects > 60 years old; in all others subjects were < 36 years old.
AF, 1 repetition maximum arm flexor strength; BP, 1 repetition maximum bench press strength; 3BP, 3 repetition maximum bench press strength; CM, cycle ergometer mean peak power; CP, cycle ergometer peak power; PT, isokinetic leg flexion/extension peak torque; S, 1 repetition maximum squat strength.

Methodological quality of included studies

The methodological quality of studies was generally low (Table). The mean quality score was 3.5 ± 1.2 (mean ± SD) out of a possible 10 (range, 2–5.5). None of the studies identified the method of randomization used or specifically reported an intention-to-treat analysis. None specifically reported masking of outcome assessment. In general, these significant flaws in study design would tend to result in overestimation of the benefit of creatine supplementation.

Absolute strength

When 1- to 3-repetition maximum bench press strength measurements were statistically combined (they were homogeneous), the creatine supplementation group showed an absolute strength increase of 6.85 kg (95% confidence interval [CI], 5.24–8.47; n = 143) lifted per repetition greater than that seen with placebo alone (Figure 1). There was no additional advantage in strength performance after 9 to 12 weeks of supplementation (WMD = 6.6 kg; 95% CI, 3.5–9.5) compared with 4 to 8 weeks of supplementation (WMD = 6.6 kg; 95% CI, 4.8–8.4). Subanalysis for an interaction with resistance training, previous training level, age, or sex was not possible because all studies measuring bench press strength except one17 investigated creatine supple mentation in previously trained young men who continued resistance training during supplementation. The 1 study in previously sedentary young women17 did find a trend toward increased bench press strength, although independently this change was not statistically significant.

 

 

There was no significant difference in 1-repetition maximum arm flexor strength with creatine supplementation (WMD = 1.53 kg; 95% CI, –1.07 to 4.13; n = 60; Figure 2). However, 2 trials20,21 of the 3 evaluating this outcome studied subjects older than 60 years and did not employ adjuvant weight training programs. The study that incorporated resistance training and evaluated younger subjects22 found a modest (29.9% vs 16.5%) improvement in 1-repetition maximum arm flexor strength with creatine compared with placebo.

For 1-repetition maximum squat, creatine supplementation resulted in a strength increase of 9.76 kg (95% CI, 3.37–16.15; n = 74) greater than that of placebo (Figure 3). There was no advantage to longer-term supplementation (10.9 kg more than placebo [95% CI, 3.4–18.4] for 5–6 weeks compared with 10.4 kg [95% CI, 3.5–17.2] for 10–12 weeks). Again, in all but 1 study17 measuring squat performance, subjects were previously trained young men engaging in adjuvant resistance training programs, so subanalysis for other variables was not possible. For previously sedentary women, Vandenberghe et al17 found no difference at 5 weeks, but they did find a significant improvement in 1-repetition maximum squat performance with creatine supplementation at 10 weeks. Tests for heterogeneity were nonsignificant for all absolute strength variables.

To evaluate for publication bias, we examined funnel plots of each of the 3 absolute strength outcomes (bench press, arm flexor, and squat exercises). No evidence of publication bias was demonstrated. Figure W1 (available on the JFP Web site: http://www.jfponline.com) depicts a composite funnel plot of all 3 outcomes using a standardized mean difference to allow comparison between these 3 different outcomes.

FIGURE 1 Studies assessing 1- to 3-repetition maximum bench press strength



FIGURE 2 Studies assessing 1-repetition maximum arm flexor strength



FIGURE 3 Studies assessing 1-repetition maximum squat strength


Cycle ergometer peak power

Creatine supplementation had no effect on peak power production during cycle ergometry sprint (Figure 4). Results among studies were widely variable (test for heterogeneity P= .035), so a random effects model was used to pool data. The summary weighted mean difference of 16.79 W (95% CI, –13.26 to 46.84; n = 149) was insignificant, both statistically (test for overall effect P= .3) and clinically, because this represents approximately a 1% change greater than baseline. Two studies23,24 looked at mean peak power across a series of 15-to 30-second sprints and found inconsistent results, with a summary weighted mean difference of 68.61 W (95% CI, –85.74 to 222.97; n = 36). Of note, for the 2 studies24,25 that demonstrated improved performance with creatine, the difference was accentuated by an unexplained but pronounced worsening of performance after supplementation in the placebo groups.

FIGURE 4 Studies assessing cycle ergometer sprint peak power


Dynamometer peak torque

Only 3 studies21,26,27 evaluated peak torque, and all used slightly different outcome assessments. One study26 reported average peak torque across 30 isokinetic leg flexion/extension contractions; 1 study21 reported the sum of peak torque across 5 sets of 30 isokinetic leg flexion/extension contractions; and 1 study27 gave peak torque data for isokinetic leg extension but did not describe precisely how peak torque was determined. There was no difference between creatine and placebo for isokinetic leg flexion/extension peak torque using a standardized mean difference to account for variations in measurement of this outcome. Tests for heterogeneity were nonsignificant for this outcome (P= .19).

Adverse effects

Four studies commented on short-term adverse effects of creatine supplementation. Three studies17,23,28 found no difference between creatine and placebo. One study21 reported gastrointestinal upset, rash, or headache in 3 subjects taking creatine and no adverse effects in subjects taking placebo. None of these studies was designed to evaluate long-term adverse effects of creatine supplementation, and there were no reports of longer-term follow up.

Discussion

This is the first study to report quantitatively the effect of creatine supplementation on strength performance from meta-analysis of the existing literature. We found that oral creatine supplementation improves maximal resistance exercise performance in previously trained young men. There is insufficient evidence that creatine improves other measures of strength, such as cycle ergometry sprint peak power or isokinetic dynamometer peak torque, or that creatine improves strength in women or older individuals. The effect of creatine on endurance, submaximal exercise, or actual “on-field” athletic performance was not addressed.

Creatine’s ergogenic properties may result from allowing increased work during training and decreasing recovery time. If so, creatine must be combined with adjuvant training to increase strength and power. Only studies investigating maximal weight-lifting performance incorporated resistance-training programs specific to the outcome being measured. Three studies included weight training but investigated non–weight-lifting outcomes,23,24,27 and only 1 study24 found a benefit from creatine supplementation. It is unclear whether the lack of effect for non–weight-lifting outcomes means that creatine is not beneficial unless combined with specific adjuvant training or that creatine simply is not ergogenic for outcomes other than maximal weight lifted.

 

 

This meta-analysis has some limitations. Our definition of strength included only “pure strength” or “power” measurements to allow statistical comparisons between similar outcomes. Because muscle strength is related to muscle endurance, researchers may define strength differently. It is not obvious at what point an exercise becomes a test of endurance and not just strength, but there is a physiologic basis for believing that creatine supplementation would more markedly improve performance in maximal or shorter duration exercises (ie, requiring strength and not endurance). The inclusion criteria for this project were determined before study review and selection and were applied consistently across all studies.

The quality and design of identified studies was another limitation. Most were small and did not fully delineate their randomization or blinding strategies. Multiple variations in study protocols made combining results of different studies somewhat problematic. Unfortunately, meta-regression or subanalysis for variables such as concurrent resistance training, previous training level, age, and sex were not possible because too few studies evaluated these variables independently of one another. Almost all of the studies finding a benefit of creatine supplementation were in young, previously trained men who engaged in resistance training concomitantly with supplementation, and the outcome measured was maximal weight lifted. Those studies not finding a difference were generally of less highly trained or older individuals, did not include resistance training, and more often investigated outcomes other than maximal weight lifted. This meta-analysis identifies that it is impossible to conclude from the existing literature which combination of variables is necessary to see a benefit of creatine supplementation.

More information is needed on the safety of creatine supplementation. Although a recent review35 reported no significant short-term adverse effects, no adequate long-term studies have been conducted. Two retrospective trials36,37 reported no adverse effects from longer-term (up to 5 years) creatine supplementation;however, neither study was randomized, blinded, or controlled, and neither had sufficient statistical power to detect uncommon adverse effects. Additionally, the designs of these studies precluded the possibility of detecting serious adverse effects such as death or disability. There have been case reports of renal dysfunction due to creatine38-40 and, as of 1998, the Food and Drug Administration had received 32 adverse event reports including seizures, myopathy, rhabdomyolosis, cardiac arrhythmia, and death.41

Given the popularity of nutritional supplements among all levels of athletes, clinicians cannot avoid questions about the effectiveness and safety of creatine supplementation. This meta-analysis demonstrated that oral creatine does improve performance during maximal resistance exercises in young men. However, we found no benefit for outcomes other than maximal weight lifted, suggesting that creatine may not improve actual performance in more complex movements requiring strength, speed, and coordination of multiple muscle groups. Studies investigating the effect of creatine in actual athletic performance are lacking.

Several important questions remain to be answered about creatine. What are the effects for women and older individuals? Is resistance training necessary to see strength performance improvement? Are these improvements in strength accompanied by improved athletic performance? How long do the effects of creatine remain after discontinuing supplementation? Most importantly, what is the long-term safety profile of creatine? Without further research to answer these questions, we cannot support the use of creatine supplementation for performance enhancement despite evidence for a positive impact on some components of strength.

Drug therapy for prevention and treatment of postmenopausal osteoporosis

Drug (trade name)Indication and dosagePossible side effects (% of patients)Cost per month*
Calcium and vitamin D (generic,Tums,Citracal, and others)Prevention and treatment: 1200 –1500 mg/day calcium and 800 IU/day vitamin DNausea,dyspepsia (uncommon), constipation (10%)$5 (both)
Estrogen †(Premarin,Ogen,Estrace, Estraderm,and others)Prevention: 0.625 mg/day conjugated equine estrogen or the equivalent;0.3 mg/day may be effectiveNausea,breast tenderness, vaginal bleeding, mood alterations, headache, bloating$14 –$28
Alendronate (Fosamax)Prevention:5 mg/day or 35 mg/week Treatment:10 mg/day or 70 mg/wkNausea, dyspepsia, esophageal irritation$67
Risedronate (Actonel)Prevention and treatment: 5 mg/day or 35 mg/weekAbdominal pain, esophageal irritation$67
Raloxifene (Evista)Treatment: 60 mg/dayHot flashes (6%), leg cramps (3%)$70
Calcitonin nasal spray (Miacalcin)Treatment:200 IU/day (1 spray in 1 nostril per day)Rhinitis (5%), epistaxis, sinusitis$66
*Average wholesale cost to the pharmacy for 30 days of therapy; (Drug Topics Red Book. Montvale, NJ; Medical Economics Co., Inc, 2002.)
†Women with a uterus need to take a progestin such as medroxyprogesterone acetate (Provera $30/month, generic $9/month) or a combination estrogen/progestin product (Prempro $33/monh, FemHRT $26/month).

CORRECTIONS

On page 868 of the October issue a name was misspelled; the correct name is Brian S. Alper.

In the table appearing on page 877 of the October issue, the entry for Fosamax inadvertently combined prevention and treatment dosages. The corrected entry is shown below.

· Acknowledgments ·

 

 

This study was supported in part by a Faculty Development in Family Medicine Grant (No. 5D45 PE 55052-09) and a National Research Service Award Grant (No. 1T32 PE 10030-03) from the United States Department of Health and Human Services. The authors thank Craig Young, MD, who assisted in the conception of this project; Chris McLaughlin, who provided editorial assistance; and Veronica Ruleford, who assisted with the preparation of the manuscript.

References

1. USA Today. Survey: More than 1 million kids use sports supplements. USA Today. August 28, 2001. Available at: www.usatoday.com/news/nation/2001/08/28/youth-supplements.htm. Accessed October 8, 2002.

2. McGuine TA, Sullivan JC, Bernhardt DT. Creatine supplementation in high school football players. Clin J Sport Med 2001;11(4):247-53.

3. Ray TR, Eck JC, Covington LA, Murphy RB, Williams R, Knudtson J. Use of oral creatine as an ergogenic aid for increased sports performance: perceptions of adolescent athletes. South Med J 2001;94(6):608-12.

4. LaBotz M, Smith BW. Creatine supplement use in an NCAA Division I athletic program. Clin J Sport Med 1999;9(3):167-9.

5. Harris RC, Soderlund K, Hultman E. Elevation of creatine in resting and exercised muscle of normal subjects by creatine supplementation. Clin Sci 1992;83:367-74.

6. Vandenberghe K, Van Hecke P, Van Leemputte M, Vanstapel F, Hespel P. Phosphocreatine resynthesis is not affected by creatine loading. Med Sci Sports Exerc 1999;31(2):236-42.

7. Hultman E, Soderlund K, Timmons JA, Cederblad G, Greenhaff PL. Muscle creatine loading in men. J Appl Physiol 1996;812:32-7.

8. Terjung RL, Clarkson P, Eichner ER, Greenhaff PL, Hespel PJ, Israel RG, et al. American College of Sports Medicine roundtable. The physiological and health effects of oral creatine supplementation. Med Sci Sports Exerc 2000;32(3):706-17.

9. Kreider RB. Dietary supplements and the promotion of muscle growth with resistance exercise. Sports Med 1999;27(2):97-110.

10. Mujika I, Padilla S. Creatine supplementation as an ergogenic aid for sports performance in highly trained athletes: a critical review. Int J Sports Med 1997;18(7):491-6.

11. Juhn MS, Tarnopolsky M. Oral creatine supplementation and athletic performance: a critical review. Clin J Sport Med 1998;8(4):286-97.

12. Volek JS, Kraemer WJ. Creatine supplementation: its effect on human muscular performance and body composition. J Strength Cond Res 1996;10(3):200-10.

13. Maughan RJ. Creatine supplementation and exercise performance. Int J Sport Nutr 1995;5:94-101.

14. Kraemer WJ, Volek JS. Creatine supplementation. Its role in human performance. Clin Sports Med 1999;18(3):651-66.

15. Dickersin K, Scherer R, Lefebvre C. Systematic reviews: identifying relevant studies for systematic reviews. Br Med J 1994;309(6964):1286-91.

16. Febbraio MA, Flanagan TR, Snow RJ, Zhao S, Carey MF. Effect of creatine supplementation on intramuscular TCr, metabolism and performance during intermittent, supramaximal exercise in humans. Acta Physiol Scand 1995;155(4):387-95.

17. Vandenberghe K, Goris M, Van Hecke P, Van Leemputte M, Vangerven L, Hespel P. Long-term creatine intake is beneficial to muscle performance during resistance training. J Appl Physiol 1997;83:2055-63.

18. Greenhaff PL. Creatine and its application as an ergogenic aid. Int J Sport Nutr 1995;5(suppl):S100-10.

19. The Cochrane Collaboration. The Cochrane Handbook (Online). Available at: http://www.cochrane.dk/cochrane/handbook/hbook CONTENTS__6_ASSESSMENT_OF_STUDY_.htm. Accessed June 2001.

20. Rawson ES, Clarkson PM. Acute creatine supplementation in older men. Int J Sports Med 2000;21(1):71-5.

21. Rawson ES, Wehnert ML, Clarkson PM. Effects of 30 days of creatine ingestion in older men. Eur J Appl Physiol 1999;80(2):139-44.

22. Becque MD, Lochmann JD, Melrose DR. Effects of oral creatine supplementation on muscular strength and body composition. Med Sci Sports Exerc 2000;32(3):654-8.

23. Stone MH, Sanborn K, Smith LL, O’Bryant HS, Hoke T, Utter AC, et al. Effects of in-season (5 weeks) creatine and pyruvate supplementation on anaerobic performance and body composition in American football players. Int J Sport Nutr 1999;9(2):146-65.

24. Jones AM, Atter T, Georg KP. Oral creatine supplementation improves multiple sprint performance in elite ice-hockey players. J Sports Med Phys Fitness 1999;39(3):189-96.

25. Dawson B, Cutler M, Moody A, Lawrence S, Goodman C, Randall N. Effects of oral creatine loading on single and repeated maximal short sprints. Aust J Sci Med Sport 1995;27(3):56-61.

26. Gilliam JD, Hohzorn C, Martin D, Trimble MH. Effect of oral creatine supplementation on isokinetic torque production. Med Sci Sports Exerc 2000;32(5):993-6.

27. Pearson DR, Hamby DG, Russel W, Harris T. Long-term effects of creatine monohydrate on strength and power. J Strength Cond Res 1999;13(3):187-92.

28. Volek JS, Duncan ND, Mazzetti SA, Staron RS, Putukian M, Gomez AL, et al. Performance and muscle fiber adaptations to creatine supplementation and heavy resistance training. Med Sci Sports Exerc 1999;31(8):1147-56.

29. Barnett C, Hinds M, Jenkins DG. Effects of oral creatine supplementation on multiple sprint cycle performance. Aust J Sci Med Sport 1996;28(1):35-9.

30. Cooke WH, Grandjean PW, Barnes WS. Effect of oral creatine supplementation on power output and fatigue during bicycle ergometry. J Appl Physiol 1995;78:670-3.

31. Cooke WH, Barnes WS. The influence of recovery duration on high-intensity exercise performance after oral creatine supplementation. Can J Appl Physiol 1997;22(5):454-67.

32. Kelly VG, Jenkins DG. Effect of oral creatine supplementation on near-maximal strength and repeated sets of high-intensity bench press exercise. J Strength Cond Res 1998;12(2):109-15.

33. Noonan D, Berg K, Latin RW, Wagner JC, Reimers K. Effects of varying dosages of oral creatine relative to fat free body mass on strength and body composition. J Strength Cond Res 1998;12(2):104-8.

34. Peeters BM, Lantz CD, Mayhew JL. Effect of oral creatine monohydrate and creatine phosphate supplementation on maximal strength indices, body composition, and blood pressure. J Strength Cond Res 1999;13(1):3-9.

35. Juhn MS, Tarnopolsky M. Potential side effects of oral creatine supplementation: a critical review. Clin J Sport Med 1998;8(4):298-304.

36. Poortmans JR, Francaux M. Long-term oral creatine supplementation does not impair renal function in healthy athletes. Med Sci Sports Exerc 1999;31(8):1108-10.

37. Schilling BK, Stone MH, Utter A, Kearney JT, Johnson M, Coglianese R, et al. Creatine supplementation and health variables: a retrospective study. Med Sci Sports Exerc 2001;33(2):183-8.

38. Pritchard NR, Kalra PA. Renal dysfunction accompanying oral creatine supplementation [letter]. Lancet 1998;351:125-23.

39. Koshy KM, Griswold E, Schneeberger EE. Interstitial nephritis in a patient taking creatine [letter]. New Engl J Med 1999;340:814-5.

40. Kuehl KS, Goldberg L, Elliot D. Renal insufficiency after creatine supplementation in a college football athlete [abstract]. Med Sci Sports Exerc 1998;30(suppl 5):S235.-

41. United States Food and Drug Administration The Special Nutritionals Adverse Event Monitoring System. Available at: http://vm.cfsan.fda.gov/cgibin/aems.cgi?QUERY= creatine&STYPE =EXACT. Accessed March 3, 2001.

References

1. USA Today. Survey: More than 1 million kids use sports supplements. USA Today. August 28, 2001. Available at: www.usatoday.com/news/nation/2001/08/28/youth-supplements.htm. Accessed October 8, 2002.

2. McGuine TA, Sullivan JC, Bernhardt DT. Creatine supplementation in high school football players. Clin J Sport Med 2001;11(4):247-53.

3. Ray TR, Eck JC, Covington LA, Murphy RB, Williams R, Knudtson J. Use of oral creatine as an ergogenic aid for increased sports performance: perceptions of adolescent athletes. South Med J 2001;94(6):608-12.

4. LaBotz M, Smith BW. Creatine supplement use in an NCAA Division I athletic program. Clin J Sport Med 1999;9(3):167-9.

5. Harris RC, Soderlund K, Hultman E. Elevation of creatine in resting and exercised muscle of normal subjects by creatine supplementation. Clin Sci 1992;83:367-74.

6. Vandenberghe K, Van Hecke P, Van Leemputte M, Vanstapel F, Hespel P. Phosphocreatine resynthesis is not affected by creatine loading. Med Sci Sports Exerc 1999;31(2):236-42.

7. Hultman E, Soderlund K, Timmons JA, Cederblad G, Greenhaff PL. Muscle creatine loading in men. J Appl Physiol 1996;812:32-7.

8. Terjung RL, Clarkson P, Eichner ER, Greenhaff PL, Hespel PJ, Israel RG, et al. American College of Sports Medicine roundtable. The physiological and health effects of oral creatine supplementation. Med Sci Sports Exerc 2000;32(3):706-17.

9. Kreider RB. Dietary supplements and the promotion of muscle growth with resistance exercise. Sports Med 1999;27(2):97-110.

10. Mujika I, Padilla S. Creatine supplementation as an ergogenic aid for sports performance in highly trained athletes: a critical review. Int J Sports Med 1997;18(7):491-6.

11. Juhn MS, Tarnopolsky M. Oral creatine supplementation and athletic performance: a critical review. Clin J Sport Med 1998;8(4):286-97.

12. Volek JS, Kraemer WJ. Creatine supplementation: its effect on human muscular performance and body composition. J Strength Cond Res 1996;10(3):200-10.

13. Maughan RJ. Creatine supplementation and exercise performance. Int J Sport Nutr 1995;5:94-101.

14. Kraemer WJ, Volek JS. Creatine supplementation. Its role in human performance. Clin Sports Med 1999;18(3):651-66.

15. Dickersin K, Scherer R, Lefebvre C. Systematic reviews: identifying relevant studies for systematic reviews. Br Med J 1994;309(6964):1286-91.

16. Febbraio MA, Flanagan TR, Snow RJ, Zhao S, Carey MF. Effect of creatine supplementation on intramuscular TCr, metabolism and performance during intermittent, supramaximal exercise in humans. Acta Physiol Scand 1995;155(4):387-95.

17. Vandenberghe K, Goris M, Van Hecke P, Van Leemputte M, Vangerven L, Hespel P. Long-term creatine intake is beneficial to muscle performance during resistance training. J Appl Physiol 1997;83:2055-63.

18. Greenhaff PL. Creatine and its application as an ergogenic aid. Int J Sport Nutr 1995;5(suppl):S100-10.

19. The Cochrane Collaboration. The Cochrane Handbook (Online). Available at: http://www.cochrane.dk/cochrane/handbook/hbook CONTENTS__6_ASSESSMENT_OF_STUDY_.htm. Accessed June 2001.

20. Rawson ES, Clarkson PM. Acute creatine supplementation in older men. Int J Sports Med 2000;21(1):71-5.

21. Rawson ES, Wehnert ML, Clarkson PM. Effects of 30 days of creatine ingestion in older men. Eur J Appl Physiol 1999;80(2):139-44.

22. Becque MD, Lochmann JD, Melrose DR. Effects of oral creatine supplementation on muscular strength and body composition. Med Sci Sports Exerc 2000;32(3):654-8.

23. Stone MH, Sanborn K, Smith LL, O’Bryant HS, Hoke T, Utter AC, et al. Effects of in-season (5 weeks) creatine and pyruvate supplementation on anaerobic performance and body composition in American football players. Int J Sport Nutr 1999;9(2):146-65.

24. Jones AM, Atter T, Georg KP. Oral creatine supplementation improves multiple sprint performance in elite ice-hockey players. J Sports Med Phys Fitness 1999;39(3):189-96.

25. Dawson B, Cutler M, Moody A, Lawrence S, Goodman C, Randall N. Effects of oral creatine loading on single and repeated maximal short sprints. Aust J Sci Med Sport 1995;27(3):56-61.

26. Gilliam JD, Hohzorn C, Martin D, Trimble MH. Effect of oral creatine supplementation on isokinetic torque production. Med Sci Sports Exerc 2000;32(5):993-6.

27. Pearson DR, Hamby DG, Russel W, Harris T. Long-term effects of creatine monohydrate on strength and power. J Strength Cond Res 1999;13(3):187-92.

28. Volek JS, Duncan ND, Mazzetti SA, Staron RS, Putukian M, Gomez AL, et al. Performance and muscle fiber adaptations to creatine supplementation and heavy resistance training. Med Sci Sports Exerc 1999;31(8):1147-56.

29. Barnett C, Hinds M, Jenkins DG. Effects of oral creatine supplementation on multiple sprint cycle performance. Aust J Sci Med Sport 1996;28(1):35-9.

30. Cooke WH, Grandjean PW, Barnes WS. Effect of oral creatine supplementation on power output and fatigue during bicycle ergometry. J Appl Physiol 1995;78:670-3.

31. Cooke WH, Barnes WS. The influence of recovery duration on high-intensity exercise performance after oral creatine supplementation. Can J Appl Physiol 1997;22(5):454-67.

32. Kelly VG, Jenkins DG. Effect of oral creatine supplementation on near-maximal strength and repeated sets of high-intensity bench press exercise. J Strength Cond Res 1998;12(2):109-15.

33. Noonan D, Berg K, Latin RW, Wagner JC, Reimers K. Effects of varying dosages of oral creatine relative to fat free body mass on strength and body composition. J Strength Cond Res 1998;12(2):104-8.

34. Peeters BM, Lantz CD, Mayhew JL. Effect of oral creatine monohydrate and creatine phosphate supplementation on maximal strength indices, body composition, and blood pressure. J Strength Cond Res 1999;13(1):3-9.

35. Juhn MS, Tarnopolsky M. Potential side effects of oral creatine supplementation: a critical review. Clin J Sport Med 1998;8(4):298-304.

36. Poortmans JR, Francaux M. Long-term oral creatine supplementation does not impair renal function in healthy athletes. Med Sci Sports Exerc 1999;31(8):1108-10.

37. Schilling BK, Stone MH, Utter A, Kearney JT, Johnson M, Coglianese R, et al. Creatine supplementation and health variables: a retrospective study. Med Sci Sports Exerc 2001;33(2):183-8.

38. Pritchard NR, Kalra PA. Renal dysfunction accompanying oral creatine supplementation [letter]. Lancet 1998;351:125-23.

39. Koshy KM, Griswold E, Schneeberger EE. Interstitial nephritis in a patient taking creatine [letter]. New Engl J Med 1999;340:814-5.

40. Kuehl KS, Goldberg L, Elliot D. Renal insufficiency after creatine supplementation in a college football athlete [abstract]. Med Sci Sports Exerc 1998;30(suppl 5):S235.-

41. United States Food and Drug Administration The Special Nutritionals Adverse Event Monitoring System. Available at: http://vm.cfsan.fda.gov/cgibin/aems.cgi?QUERY= creatine&STYPE =EXACT. Accessed March 3, 2001.

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
945-952
Page Number
945-952
Publications
Publications
Article Type
Display Headline
Does oral creatine supplementation improve strength? A meta-analysis
Display Headline
Does oral creatine supplementation improve strength? A meta-analysis
Legacy Keywords
,Creatinedietary supplementsmeta-analysis. (J Fam Pract 2002; 51:945–952)
Legacy Keywords
,Creatinedietary supplementsmeta-analysis. (J Fam Pract 2002; 51:945–952)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media

Validating an instrument for selecting interventions to change physician practice patterns

Article Type
Changed
Mon, 01/14/2019 - 10:57
Display Headline
Validating an instrument for selecting interventions to change physician practice patterns

KEY POINTS FOR CLINICIANS

  • One size probably does not fit all when bringing physicians new information that might change their practice.
  • Physicians differ measurably in what they consider credible sources of information, the weight they assign to practical concerns, and their willingness to diverge from group norms in practice.
  • Interventions that bring new knowledge into practice can be tailored to physicians’ perspectives. Further research may show this approach to be more useful to physicians and more likely to succeed than current approaches.

We previously proposed a theoretical framework for selecting the most effective strategies for changing physicians’ practice patterns.1 This framework called for classifying physicians into 4 categories based on how they respond to new information about the effectiveness of clinical practices, then selecting the strategy best suited to each physician’s response style. In this paper we describe the development and validation of a psychometric instrument to classify physicians into the 4 categories. This is one more element in our ongoing effort to answer, rigorously and specifically, basic questions about the adoption of evidence-based practices; for example, how can we increase physicians’ use of proven interventions, such as β-blockers after myocardial infarction or tight blood pressure control for patients with type 2 diabetes? How can we reduce physicians’ use of disproved therapies, such as oral β-agonist tocolytics for preterm labor or antibiotics for viral illnesses?

The literature is rife with examples of singlemode and multimode studies using educational interventions, positive and negative incentives, group and individualized feedback, sanctions, regulations, academic detailing, and patient-demand interventions to bring about changes in physician practice. 2-5 Advocates of these approaches cite published examples of their success in changing clinical practices; in all cases, however, published and unpublished instances of failure exist as well. The lack of a consistent pattern of success or failure has led to a growing recognition that no single strategy will ever be a “magic bullet”5 ; therefore, the selection of practice change strategies must be based on specific situations and settings.6-8 However, it is still not known what characteristics of the setting matter most and which approach will work in a specific setting and situation.

We believe that one key factor in selecting effective strategies is the audience. Businesses learned long ago that market segmentation, in which products are advertised differently to people who have different needs, values, and views, is crucial to success in sales. Similarly, our theoretical framework posits that selecting the most appropriate change strategy requires first classifying clinicians according to how they respond to new information about the effectiveness of clinical strategies. We distinguish 4 classification categories: seekers, receptives, traditionalists, and pragmatists.1

Physician categories and underlying factors

Seekers consider systematically gathered, published data (rather than personal experience or authority) the most reliable source of knowledge. They critically appraise the data themselves and value what they view as correct practice over pragmatic concerns, such as seeing patients quickly and efficiently. Most notably, seekers make evidence-driven practice changes even when the changes are out of step with local medical culture.

Like seekers, receptives are evidence-oriented, but they generally rely on the judgment of respected others for critical appraisal of new information. Receptives are likely to act on information from a scientifically and clinically sound source. Although they do not always hew to local medical culture, receptives generally depart from local practice only when the evidence is sufficiently compelling.

Traditionalists view clinical experience and authority as the most reliable basis for practice, and therefore rely on personal experience and the judgment and teachings of clinical leaders for guidance. The term “traditionalist” is not meant to suggest that the practitioner follows older, more traditional medical practices; rather, it relates to the physician’s traditional view of clinical experience as the ultimate basis of knowledge. The traditionalist may be an early adopter of new technologies if a respected clinical leader suggests them. Traditionalists are not greatly concerned with how their practices fit local medical culture, and are more concerned with practicing correctly than efficiently.

Pragmatists focus on the day-to-day demands of a busy practice. Acutely aware of the many competing claims on their scarce time from patients, colleagues, employees, insurers, and hospitals, pragmatists evaluate calls to change their practice in terms of anticipated impact on time, workload, patient flow, and patient satisfaction rather than scientific validity or congruence with local medical culture. Pragmatists may view either evidence or experience as the most reliable foundation for practice, and may be willing to diverge from local norms when doing so is not disruptive; their primary concern, however, is efficiency.

 

 

As we emphasized in our original formulation, our categorizations refer to trait, not state; that is, the categories describe general response tendencies, not moment-to-moment clinical decision making. It is incorrect to say that a physician responds as a seeker in one instance and a pragmatist in another, or that the same person shows traditionalist responses to one topic and receptive responses to another. (Most actual clinical behavior is, of necessity, pragmatic most of the time.)

We hypothesize that these physician response styles represent various combinations of 3 underlying factors:

  1. Extent to which scientific evidence, rather than clinical experience and authority, is perceived as the best source of knowledge about good practice (evidence vs experience).
  2. Degree of comfort with clinical practices that are out of step with the local community’s practices or the recommendations of leaders (nonconformity).
  3. Importance attached to managing workload and patient flow while maintaining general patient satisfaction (practicality).

Not all possible combinations of the 3 factors exist, and some combinations are behaviorally indistinguishable—that is, they produce the same response style. The manner in which these 3 factors define the 4 types of physicians is shown in Table 1. In this paper we report the results of 3 iterations in the development of a psychometric instrument to measure these factors.

TABLE 1

Hypothesized factor loading by physician type

Physician typeEvidence vs experienceNonconformityPracticality
SeekersExtreme evidence endHighNot high
ReceptivesToward evidence endModerateNot high
TraditionalistsToward experience endVariableNot high
PragmatistsVariableVariableHigh

Methods

To test the hypothesized relationship between physician category and response to practice change interventions, we needed to develop an instrument for assessing physicians on the underlying 3 attributes so that, based on those attributes, we could subsequently place them in the 4 information response categories. We created several questions addressing each of our hypothesized factors and refined them for clarity. The question pool was further refined in consultation with active practitioners serving on commissions and committees of the American Academy of Family Practice, who represented a variety of nonacademic perspectives on clinical practice and learning. An 18-item psychometric instrument was prepared and pilot tested on a convenience sample of 112 family physicians in Iowa and Michigan who were participating in other research projects.

The results of that pilot test were used to prepare a second version, which was tested with 328 physicians at a regional CME conference and 889 physicians with the national Veterans Health Administration system for a total of 1217. The sample comprised 234 family physicians; 848 internists; 29 obstetrician/gynecologists; 27 general practitioners; 24 emergency physicians; and a small number of general surgeons, pediatricians, psychiatrists, and other specialists. The results from the second version guided the preparation of the third (Figure), which was tested on a sample of 64 family physicians at 2 CME events.

Because of the free-choice manner in which the instruments were distributed, it was not possible to calculate an exact response rate; however, the total number of participants equaled slightly more than 75% of the total number of instruments distributed.

To refine the instrument at each iteration, we began with a factor analysis using the principalcomponents method and orthogonal varimax rotation. The eigenvalues from the factor analysis were used to determine the number of factors in the optimum solution. The instrument’s questions were assigned to these factors based on the factor on which they loaded most heavily in the rotated solution. Cronbach α was calculated for each factor scale. At each iteration, questions loading less than 0.35 on all factors in the rotated varimax solution were dropped. Questions loading on 2 factors were revised for clearer wording in the subsequent draft. New questions were added to factor scales on which too few questions were loading. All analyses were performed using Intercooled Stata 7.0 statistical software (Stata Corp, College Station, TX) on a Linux workstation.

The results of the factor analysis were compared with the theory after the second and third iterations. Physicians were scaled on the 3 factors by summing the responses to the items of each scale, with strongly agree (SA) = 5 and strongly disagree (SD) = 1 (reversing the numbers for items phrased in the opposite manner). Normalization (adjusting scores to account for scales that included more items, resulting in larger maximum scores) was considered but rejected, because normalized scores proved more confusing than unequal scales when the results were presented to audiences.

We used the scale scores to classify the physicians into the 4 types (seeker, receptive, traditionalist, and pragmatist). We performed the factor analyses and interpretations as described in Tables2, 3, and 4, then translated the hypothesized relationships in Table 1 into specific calculations as shown in Tables 5 and 6 (for the second and third iterations, respectively). The chosen cutoff points were necessarily somewhat arbitrary; to prove them optimal, we must complete an external validation study of the physicians’ behavior vs their scale scores, which is now underway. The current data address the instrument’s development and internal consistency.

 

 

TABLE 2

Factor analysis solutions

 Eigenvalues by number of factors in solution
Iteration1234
12.881.671.441.23
21.951.200.8090.387
33.352.311.600.821

TABLE 3

Scale interpretations

ScaleInterpretationQuestions (on iteration 3)*
1Evidence–experience1, 3, 9, 12, 16, 17
2Nonconformity2, 5, 7, 11, 13, 15
3Practicality4, 6, 8, 10, 14
*See Figure.

TABLE 4

Scale internal consistencies

 Cronbach α at each iteration
IterationEvidence-experienceNonconformityPracticality
10.630.610.54
20.700.590.48
30.790.740.68

TABLE 5

Scale scores by physician type, second iteration

Physician typeEvidence vs experience (range, 5–25)Nonconformity (range, 4–20)Practicality (range, 4–20)
SeekersExtreme evidence end: ≥20High: >12Not high: ≤14
ReceptivesToward evidence end: ≥15Moderate: ≤12Not high: ≤14
TraditionalistsToward experience end: <15VariableNot high: ≤14
PragmatistsVariableVariableHigh: >14

TABLE 6

Scale scores by physician type, third iteration (depicted in the Figure)

Physician typeEvidence vs experience (range, 6–30)Nonconformity (range, 6–30)Practicality (range, 5–25)
SeekersExtreme evidence end: ≥22High: >18Not high: ≤14
ReceptivesToward evidence end: ≥18Moderate: ≤18Not high: ≤14
TraditionalistsToward experience end: <18VariableNot high: ≤14
PragmatistsVariableVariableHigh: >14

FIGURE Psychometric Instrument


Results

For the first, second, and third iterations, we received 106, 1120, and 61 instruments respectively that were completed in usable form. At every stage of the instrument’s development, factor analysis showed that a 3-factor model fit best. The eigenvalues declined rapidly when there were more than 3 factors (Table 2), showing that additional factors would not improve the solution.

Orthogonal rotation and interpretation of the questions making up each factor produced 3 psychologically meaningful scales (Table 3) corresponding closely to our theoretical model; the same 3 scales emerged at each iteration. The scales are named similarly to the theory above: evidence–experience, practicality, and nonconformity. The Cronbach α for each scale at each iteration is presented in Table 4.

Using the above-described classification scheme (with specific cutoffs detailed in Tables 5 and 6), the 1181 physicians who completed the instrument in the second and third iterations were classified as follows: 2.5% seekers; 57.0% receptives; 12.6% traditionalists; and 27.9% pragmatists. Different cutoff values would yield somewhat different percentages, but seekers are very few using any reasonable value.

Discussion

These results are consistent with the theoretical construct of 3 factors underlying our physician classification scheme and demonstrate that those factors can be measured on scales with reasonable internal consistency. The data are consistent with the theory on which the instrument was developed. Not all possible combinations of the 3 factors exist, which is consistent with the 4-types theory depicted in Table 1. For example, there should be no physicians who are strongly evidence-based and strongly conformist, and that combination does not occur. However, there are physicians who are strongly evidence-based and strongly nonconformist (the seekers). Few physicians selected either extreme for any factor, but with the exception of nonconformity, a broad range existed across all of the factors.

These findings show that physicians differ in their attitudes toward new information about the effectiveness or appropriateness of clinical strategies, and that those differences are measurable and quantifiable. Quantifying those differences was a major step forward in testing our theoretical framework for selecting effective practice change strategies.

The next step is to demonstrate external validity by showing that differences in physician behavior are consistent with demonstrable differences in attitudes. Such a study is underway at this writing. A trial of practice change interventions guided by the categorization scheme should be carried out subsequently.

The categories we propose do not reflect bimodal distributions of attributes; physicians are distributed relatively uniformly all along the 3 scales. The categories are useful descriptors, not absolute pigeonholes.

The results suggest to us that there is fertile ground for applied psychometrics and cognitive science research related to changing clinical practices. Such work may help illuminate the murky results of practice change intervention and guideline implementation studies to date. Further cognitive research about our own theoretical framework is likely to identify factors and complexities that we have not yet addressed.

· Acknowledgments ·

The authors thank Mark Ebell, MD, MS for his assistance in revising the instrument; Judith Zemencuk and Bonnie Boots-Miller of the Ann Arbor Veterans Administration Health Services Research and Development offices for their assistance in distributing the instrument to and collecting data from Veterans Administration physicians; Janice Klos for her help in gathering data from Michigan Academy of Family Practice member physicians; Van Harrison, PhD and his staff for their help in enlisting the participation of physicians at Michigan CME events; and of course, the Veterans Administration, Michigan Academy of Family Practice, and Michigan physicians who graciously completed instruments for this project.

References

1. Wyszewianski L, Green LA. Strategies for changing clinicians’ practice patterns: a new perspective. J Family Pract 2000;49:461-4.

2. Eisenberg JM. Doctors’ Decisions and the Cost of Medical Care. Ann Arbor: Health Administration Press, 1986.

3. Davis D, O’Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA 1999;282:867-74.

4. Wensing M, van der Weijden T, Grol R. Implementing guidelines and innovations in general practice: which interventions are effective? Br J Gen Pract 1998;48:991-7.

5. Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. CMAJ 1995;153:1423-31.

6. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.

7. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.

8. Woolf SH. Changing physician practice behavior: the merits of a diagnostic approach. J Fam Pract 2000;49:126-9.

Article PDF
Author and Disclosure Information

LEE A. GREEN, MD, MPH
DANIEL W. GORENFLO, PHD
LEON WYSZEWIANSKI, PHD
Ann Arbor, Michigan
From the University of Michigan Medical School (L.A.G., D.W.G.) and the University of Michigan School of Public Health (L.W.), Ann Arbor, MI. Presented in part at the North American Primary Care Research Group meeting of 2000. The authors report no competing interests. Address reprint requests to Lee Green, MD, MPH, Department of Family Medicine, 1018 Fuller, Campus 0708, Ann Arbor, MI 48109. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(11)
Publications
Page Number
938-942
Legacy Keywords
,Patternsphysician’s practiceeducationmedicalcontinuingpractice guidelinesdecision makingpsychometric instruments.(J Fam Pract 2002; 51:938–942)
Sections
Author and Disclosure Information

LEE A. GREEN, MD, MPH
DANIEL W. GORENFLO, PHD
LEON WYSZEWIANSKI, PHD
Ann Arbor, Michigan
From the University of Michigan Medical School (L.A.G., D.W.G.) and the University of Michigan School of Public Health (L.W.), Ann Arbor, MI. Presented in part at the North American Primary Care Research Group meeting of 2000. The authors report no competing interests. Address reprint requests to Lee Green, MD, MPH, Department of Family Medicine, 1018 Fuller, Campus 0708, Ann Arbor, MI 48109. E-mail: [email protected].

Author and Disclosure Information

LEE A. GREEN, MD, MPH
DANIEL W. GORENFLO, PHD
LEON WYSZEWIANSKI, PHD
Ann Arbor, Michigan
From the University of Michigan Medical School (L.A.G., D.W.G.) and the University of Michigan School of Public Health (L.W.), Ann Arbor, MI. Presented in part at the North American Primary Care Research Group meeting of 2000. The authors report no competing interests. Address reprint requests to Lee Green, MD, MPH, Department of Family Medicine, 1018 Fuller, Campus 0708, Ann Arbor, MI 48109. E-mail: [email protected].

Article PDF
Article PDF

KEY POINTS FOR CLINICIANS

  • One size probably does not fit all when bringing physicians new information that might change their practice.
  • Physicians differ measurably in what they consider credible sources of information, the weight they assign to practical concerns, and their willingness to diverge from group norms in practice.
  • Interventions that bring new knowledge into practice can be tailored to physicians’ perspectives. Further research may show this approach to be more useful to physicians and more likely to succeed than current approaches.

We previously proposed a theoretical framework for selecting the most effective strategies for changing physicians’ practice patterns.1 This framework called for classifying physicians into 4 categories based on how they respond to new information about the effectiveness of clinical practices, then selecting the strategy best suited to each physician’s response style. In this paper we describe the development and validation of a psychometric instrument to classify physicians into the 4 categories. This is one more element in our ongoing effort to answer, rigorously and specifically, basic questions about the adoption of evidence-based practices; for example, how can we increase physicians’ use of proven interventions, such as β-blockers after myocardial infarction or tight blood pressure control for patients with type 2 diabetes? How can we reduce physicians’ use of disproved therapies, such as oral β-agonist tocolytics for preterm labor or antibiotics for viral illnesses?

The literature is rife with examples of singlemode and multimode studies using educational interventions, positive and negative incentives, group and individualized feedback, sanctions, regulations, academic detailing, and patient-demand interventions to bring about changes in physician practice. 2-5 Advocates of these approaches cite published examples of their success in changing clinical practices; in all cases, however, published and unpublished instances of failure exist as well. The lack of a consistent pattern of success or failure has led to a growing recognition that no single strategy will ever be a “magic bullet”5 ; therefore, the selection of practice change strategies must be based on specific situations and settings.6-8 However, it is still not known what characteristics of the setting matter most and which approach will work in a specific setting and situation.

We believe that one key factor in selecting effective strategies is the audience. Businesses learned long ago that market segmentation, in which products are advertised differently to people who have different needs, values, and views, is crucial to success in sales. Similarly, our theoretical framework posits that selecting the most appropriate change strategy requires first classifying clinicians according to how they respond to new information about the effectiveness of clinical strategies. We distinguish 4 classification categories: seekers, receptives, traditionalists, and pragmatists.1

Physician categories and underlying factors

Seekers consider systematically gathered, published data (rather than personal experience or authority) the most reliable source of knowledge. They critically appraise the data themselves and value what they view as correct practice over pragmatic concerns, such as seeing patients quickly and efficiently. Most notably, seekers make evidence-driven practice changes even when the changes are out of step with local medical culture.

Like seekers, receptives are evidence-oriented, but they generally rely on the judgment of respected others for critical appraisal of new information. Receptives are likely to act on information from a scientifically and clinically sound source. Although they do not always hew to local medical culture, receptives generally depart from local practice only when the evidence is sufficiently compelling.

Traditionalists view clinical experience and authority as the most reliable basis for practice, and therefore rely on personal experience and the judgment and teachings of clinical leaders for guidance. The term “traditionalist” is not meant to suggest that the practitioner follows older, more traditional medical practices; rather, it relates to the physician’s traditional view of clinical experience as the ultimate basis of knowledge. The traditionalist may be an early adopter of new technologies if a respected clinical leader suggests them. Traditionalists are not greatly concerned with how their practices fit local medical culture, and are more concerned with practicing correctly than efficiently.

Pragmatists focus on the day-to-day demands of a busy practice. Acutely aware of the many competing claims on their scarce time from patients, colleagues, employees, insurers, and hospitals, pragmatists evaluate calls to change their practice in terms of anticipated impact on time, workload, patient flow, and patient satisfaction rather than scientific validity or congruence with local medical culture. Pragmatists may view either evidence or experience as the most reliable foundation for practice, and may be willing to diverge from local norms when doing so is not disruptive; their primary concern, however, is efficiency.

 

 

As we emphasized in our original formulation, our categorizations refer to trait, not state; that is, the categories describe general response tendencies, not moment-to-moment clinical decision making. It is incorrect to say that a physician responds as a seeker in one instance and a pragmatist in another, or that the same person shows traditionalist responses to one topic and receptive responses to another. (Most actual clinical behavior is, of necessity, pragmatic most of the time.)

We hypothesize that these physician response styles represent various combinations of 3 underlying factors:

  1. Extent to which scientific evidence, rather than clinical experience and authority, is perceived as the best source of knowledge about good practice (evidence vs experience).
  2. Degree of comfort with clinical practices that are out of step with the local community’s practices or the recommendations of leaders (nonconformity).
  3. Importance attached to managing workload and patient flow while maintaining general patient satisfaction (practicality).

Not all possible combinations of the 3 factors exist, and some combinations are behaviorally indistinguishable—that is, they produce the same response style. The manner in which these 3 factors define the 4 types of physicians is shown in Table 1. In this paper we report the results of 3 iterations in the development of a psychometric instrument to measure these factors.

TABLE 1

Hypothesized factor loading by physician type

Physician typeEvidence vs experienceNonconformityPracticality
SeekersExtreme evidence endHighNot high
ReceptivesToward evidence endModerateNot high
TraditionalistsToward experience endVariableNot high
PragmatistsVariableVariableHigh

Methods

To test the hypothesized relationship between physician category and response to practice change interventions, we needed to develop an instrument for assessing physicians on the underlying 3 attributes so that, based on those attributes, we could subsequently place them in the 4 information response categories. We created several questions addressing each of our hypothesized factors and refined them for clarity. The question pool was further refined in consultation with active practitioners serving on commissions and committees of the American Academy of Family Practice, who represented a variety of nonacademic perspectives on clinical practice and learning. An 18-item psychometric instrument was prepared and pilot tested on a convenience sample of 112 family physicians in Iowa and Michigan who were participating in other research projects.

The results of that pilot test were used to prepare a second version, which was tested with 328 physicians at a regional CME conference and 889 physicians with the national Veterans Health Administration system for a total of 1217. The sample comprised 234 family physicians; 848 internists; 29 obstetrician/gynecologists; 27 general practitioners; 24 emergency physicians; and a small number of general surgeons, pediatricians, psychiatrists, and other specialists. The results from the second version guided the preparation of the third (Figure), which was tested on a sample of 64 family physicians at 2 CME events.

Because of the free-choice manner in which the instruments were distributed, it was not possible to calculate an exact response rate; however, the total number of participants equaled slightly more than 75% of the total number of instruments distributed.

To refine the instrument at each iteration, we began with a factor analysis using the principalcomponents method and orthogonal varimax rotation. The eigenvalues from the factor analysis were used to determine the number of factors in the optimum solution. The instrument’s questions were assigned to these factors based on the factor on which they loaded most heavily in the rotated solution. Cronbach α was calculated for each factor scale. At each iteration, questions loading less than 0.35 on all factors in the rotated varimax solution were dropped. Questions loading on 2 factors were revised for clearer wording in the subsequent draft. New questions were added to factor scales on which too few questions were loading. All analyses were performed using Intercooled Stata 7.0 statistical software (Stata Corp, College Station, TX) on a Linux workstation.

The results of the factor analysis were compared with the theory after the second and third iterations. Physicians were scaled on the 3 factors by summing the responses to the items of each scale, with strongly agree (SA) = 5 and strongly disagree (SD) = 1 (reversing the numbers for items phrased in the opposite manner). Normalization (adjusting scores to account for scales that included more items, resulting in larger maximum scores) was considered but rejected, because normalized scores proved more confusing than unequal scales when the results were presented to audiences.

We used the scale scores to classify the physicians into the 4 types (seeker, receptive, traditionalist, and pragmatist). We performed the factor analyses and interpretations as described in Tables2, 3, and 4, then translated the hypothesized relationships in Table 1 into specific calculations as shown in Tables 5 and 6 (for the second and third iterations, respectively). The chosen cutoff points were necessarily somewhat arbitrary; to prove them optimal, we must complete an external validation study of the physicians’ behavior vs their scale scores, which is now underway. The current data address the instrument’s development and internal consistency.

 

 

TABLE 2

Factor analysis solutions

 Eigenvalues by number of factors in solution
Iteration1234
12.881.671.441.23
21.951.200.8090.387
33.352.311.600.821

TABLE 3

Scale interpretations

ScaleInterpretationQuestions (on iteration 3)*
1Evidence–experience1, 3, 9, 12, 16, 17
2Nonconformity2, 5, 7, 11, 13, 15
3Practicality4, 6, 8, 10, 14
*See Figure.

TABLE 4

Scale internal consistencies

 Cronbach α at each iteration
IterationEvidence-experienceNonconformityPracticality
10.630.610.54
20.700.590.48
30.790.740.68

TABLE 5

Scale scores by physician type, second iteration

Physician typeEvidence vs experience (range, 5–25)Nonconformity (range, 4–20)Practicality (range, 4–20)
SeekersExtreme evidence end: ≥20High: >12Not high: ≤14
ReceptivesToward evidence end: ≥15Moderate: ≤12Not high: ≤14
TraditionalistsToward experience end: <15VariableNot high: ≤14
PragmatistsVariableVariableHigh: >14

TABLE 6

Scale scores by physician type, third iteration (depicted in the Figure)

Physician typeEvidence vs experience (range, 6–30)Nonconformity (range, 6–30)Practicality (range, 5–25)
SeekersExtreme evidence end: ≥22High: >18Not high: ≤14
ReceptivesToward evidence end: ≥18Moderate: ≤18Not high: ≤14
TraditionalistsToward experience end: <18VariableNot high: ≤14
PragmatistsVariableVariableHigh: >14

FIGURE Psychometric Instrument


Results

For the first, second, and third iterations, we received 106, 1120, and 61 instruments respectively that were completed in usable form. At every stage of the instrument’s development, factor analysis showed that a 3-factor model fit best. The eigenvalues declined rapidly when there were more than 3 factors (Table 2), showing that additional factors would not improve the solution.

Orthogonal rotation and interpretation of the questions making up each factor produced 3 psychologically meaningful scales (Table 3) corresponding closely to our theoretical model; the same 3 scales emerged at each iteration. The scales are named similarly to the theory above: evidence–experience, practicality, and nonconformity. The Cronbach α for each scale at each iteration is presented in Table 4.

Using the above-described classification scheme (with specific cutoffs detailed in Tables 5 and 6), the 1181 physicians who completed the instrument in the second and third iterations were classified as follows: 2.5% seekers; 57.0% receptives; 12.6% traditionalists; and 27.9% pragmatists. Different cutoff values would yield somewhat different percentages, but seekers are very few using any reasonable value.

Discussion

These results are consistent with the theoretical construct of 3 factors underlying our physician classification scheme and demonstrate that those factors can be measured on scales with reasonable internal consistency. The data are consistent with the theory on which the instrument was developed. Not all possible combinations of the 3 factors exist, which is consistent with the 4-types theory depicted in Table 1. For example, there should be no physicians who are strongly evidence-based and strongly conformist, and that combination does not occur. However, there are physicians who are strongly evidence-based and strongly nonconformist (the seekers). Few physicians selected either extreme for any factor, but with the exception of nonconformity, a broad range existed across all of the factors.

These findings show that physicians differ in their attitudes toward new information about the effectiveness or appropriateness of clinical strategies, and that those differences are measurable and quantifiable. Quantifying those differences was a major step forward in testing our theoretical framework for selecting effective practice change strategies.

The next step is to demonstrate external validity by showing that differences in physician behavior are consistent with demonstrable differences in attitudes. Such a study is underway at this writing. A trial of practice change interventions guided by the categorization scheme should be carried out subsequently.

The categories we propose do not reflect bimodal distributions of attributes; physicians are distributed relatively uniformly all along the 3 scales. The categories are useful descriptors, not absolute pigeonholes.

The results suggest to us that there is fertile ground for applied psychometrics and cognitive science research related to changing clinical practices. Such work may help illuminate the murky results of practice change intervention and guideline implementation studies to date. Further cognitive research about our own theoretical framework is likely to identify factors and complexities that we have not yet addressed.

· Acknowledgments ·

The authors thank Mark Ebell, MD, MS for his assistance in revising the instrument; Judith Zemencuk and Bonnie Boots-Miller of the Ann Arbor Veterans Administration Health Services Research and Development offices for their assistance in distributing the instrument to and collecting data from Veterans Administration physicians; Janice Klos for her help in gathering data from Michigan Academy of Family Practice member physicians; Van Harrison, PhD and his staff for their help in enlisting the participation of physicians at Michigan CME events; and of course, the Veterans Administration, Michigan Academy of Family Practice, and Michigan physicians who graciously completed instruments for this project.

KEY POINTS FOR CLINICIANS

  • One size probably does not fit all when bringing physicians new information that might change their practice.
  • Physicians differ measurably in what they consider credible sources of information, the weight they assign to practical concerns, and their willingness to diverge from group norms in practice.
  • Interventions that bring new knowledge into practice can be tailored to physicians’ perspectives. Further research may show this approach to be more useful to physicians and more likely to succeed than current approaches.

We previously proposed a theoretical framework for selecting the most effective strategies for changing physicians’ practice patterns.1 This framework called for classifying physicians into 4 categories based on how they respond to new information about the effectiveness of clinical practices, then selecting the strategy best suited to each physician’s response style. In this paper we describe the development and validation of a psychometric instrument to classify physicians into the 4 categories. This is one more element in our ongoing effort to answer, rigorously and specifically, basic questions about the adoption of evidence-based practices; for example, how can we increase physicians’ use of proven interventions, such as β-blockers after myocardial infarction or tight blood pressure control for patients with type 2 diabetes? How can we reduce physicians’ use of disproved therapies, such as oral β-agonist tocolytics for preterm labor or antibiotics for viral illnesses?

The literature is rife with examples of singlemode and multimode studies using educational interventions, positive and negative incentives, group and individualized feedback, sanctions, regulations, academic detailing, and patient-demand interventions to bring about changes in physician practice. 2-5 Advocates of these approaches cite published examples of their success in changing clinical practices; in all cases, however, published and unpublished instances of failure exist as well. The lack of a consistent pattern of success or failure has led to a growing recognition that no single strategy will ever be a “magic bullet”5 ; therefore, the selection of practice change strategies must be based on specific situations and settings.6-8 However, it is still not known what characteristics of the setting matter most and which approach will work in a specific setting and situation.

We believe that one key factor in selecting effective strategies is the audience. Businesses learned long ago that market segmentation, in which products are advertised differently to people who have different needs, values, and views, is crucial to success in sales. Similarly, our theoretical framework posits that selecting the most appropriate change strategy requires first classifying clinicians according to how they respond to new information about the effectiveness of clinical strategies. We distinguish 4 classification categories: seekers, receptives, traditionalists, and pragmatists.1

Physician categories and underlying factors

Seekers consider systematically gathered, published data (rather than personal experience or authority) the most reliable source of knowledge. They critically appraise the data themselves and value what they view as correct practice over pragmatic concerns, such as seeing patients quickly and efficiently. Most notably, seekers make evidence-driven practice changes even when the changes are out of step with local medical culture.

Like seekers, receptives are evidence-oriented, but they generally rely on the judgment of respected others for critical appraisal of new information. Receptives are likely to act on information from a scientifically and clinically sound source. Although they do not always hew to local medical culture, receptives generally depart from local practice only when the evidence is sufficiently compelling.

Traditionalists view clinical experience and authority as the most reliable basis for practice, and therefore rely on personal experience and the judgment and teachings of clinical leaders for guidance. The term “traditionalist” is not meant to suggest that the practitioner follows older, more traditional medical practices; rather, it relates to the physician’s traditional view of clinical experience as the ultimate basis of knowledge. The traditionalist may be an early adopter of new technologies if a respected clinical leader suggests them. Traditionalists are not greatly concerned with how their practices fit local medical culture, and are more concerned with practicing correctly than efficiently.

Pragmatists focus on the day-to-day demands of a busy practice. Acutely aware of the many competing claims on their scarce time from patients, colleagues, employees, insurers, and hospitals, pragmatists evaluate calls to change their practice in terms of anticipated impact on time, workload, patient flow, and patient satisfaction rather than scientific validity or congruence with local medical culture. Pragmatists may view either evidence or experience as the most reliable foundation for practice, and may be willing to diverge from local norms when doing so is not disruptive; their primary concern, however, is efficiency.

 

 

As we emphasized in our original formulation, our categorizations refer to trait, not state; that is, the categories describe general response tendencies, not moment-to-moment clinical decision making. It is incorrect to say that a physician responds as a seeker in one instance and a pragmatist in another, or that the same person shows traditionalist responses to one topic and receptive responses to another. (Most actual clinical behavior is, of necessity, pragmatic most of the time.)

We hypothesize that these physician response styles represent various combinations of 3 underlying factors:

  1. Extent to which scientific evidence, rather than clinical experience and authority, is perceived as the best source of knowledge about good practice (evidence vs experience).
  2. Degree of comfort with clinical practices that are out of step with the local community’s practices or the recommendations of leaders (nonconformity).
  3. Importance attached to managing workload and patient flow while maintaining general patient satisfaction (practicality).

Not all possible combinations of the 3 factors exist, and some combinations are behaviorally indistinguishable—that is, they produce the same response style. The manner in which these 3 factors define the 4 types of physicians is shown in Table 1. In this paper we report the results of 3 iterations in the development of a psychometric instrument to measure these factors.

TABLE 1

Hypothesized factor loading by physician type

Physician typeEvidence vs experienceNonconformityPracticality
SeekersExtreme evidence endHighNot high
ReceptivesToward evidence endModerateNot high
TraditionalistsToward experience endVariableNot high
PragmatistsVariableVariableHigh

Methods

To test the hypothesized relationship between physician category and response to practice change interventions, we needed to develop an instrument for assessing physicians on the underlying 3 attributes so that, based on those attributes, we could subsequently place them in the 4 information response categories. We created several questions addressing each of our hypothesized factors and refined them for clarity. The question pool was further refined in consultation with active practitioners serving on commissions and committees of the American Academy of Family Practice, who represented a variety of nonacademic perspectives on clinical practice and learning. An 18-item psychometric instrument was prepared and pilot tested on a convenience sample of 112 family physicians in Iowa and Michigan who were participating in other research projects.

The results of that pilot test were used to prepare a second version, which was tested with 328 physicians at a regional CME conference and 889 physicians with the national Veterans Health Administration system for a total of 1217. The sample comprised 234 family physicians; 848 internists; 29 obstetrician/gynecologists; 27 general practitioners; 24 emergency physicians; and a small number of general surgeons, pediatricians, psychiatrists, and other specialists. The results from the second version guided the preparation of the third (Figure), which was tested on a sample of 64 family physicians at 2 CME events.

Because of the free-choice manner in which the instruments were distributed, it was not possible to calculate an exact response rate; however, the total number of participants equaled slightly more than 75% of the total number of instruments distributed.

To refine the instrument at each iteration, we began with a factor analysis using the principalcomponents method and orthogonal varimax rotation. The eigenvalues from the factor analysis were used to determine the number of factors in the optimum solution. The instrument’s questions were assigned to these factors based on the factor on which they loaded most heavily in the rotated solution. Cronbach α was calculated for each factor scale. At each iteration, questions loading less than 0.35 on all factors in the rotated varimax solution were dropped. Questions loading on 2 factors were revised for clearer wording in the subsequent draft. New questions were added to factor scales on which too few questions were loading. All analyses were performed using Intercooled Stata 7.0 statistical software (Stata Corp, College Station, TX) on a Linux workstation.

The results of the factor analysis were compared with the theory after the second and third iterations. Physicians were scaled on the 3 factors by summing the responses to the items of each scale, with strongly agree (SA) = 5 and strongly disagree (SD) = 1 (reversing the numbers for items phrased in the opposite manner). Normalization (adjusting scores to account for scales that included more items, resulting in larger maximum scores) was considered but rejected, because normalized scores proved more confusing than unequal scales when the results were presented to audiences.

We used the scale scores to classify the physicians into the 4 types (seeker, receptive, traditionalist, and pragmatist). We performed the factor analyses and interpretations as described in Tables2, 3, and 4, then translated the hypothesized relationships in Table 1 into specific calculations as shown in Tables 5 and 6 (for the second and third iterations, respectively). The chosen cutoff points were necessarily somewhat arbitrary; to prove them optimal, we must complete an external validation study of the physicians’ behavior vs their scale scores, which is now underway. The current data address the instrument’s development and internal consistency.

 

 

TABLE 2

Factor analysis solutions

 Eigenvalues by number of factors in solution
Iteration1234
12.881.671.441.23
21.951.200.8090.387
33.352.311.600.821

TABLE 3

Scale interpretations

ScaleInterpretationQuestions (on iteration 3)*
1Evidence–experience1, 3, 9, 12, 16, 17
2Nonconformity2, 5, 7, 11, 13, 15
3Practicality4, 6, 8, 10, 14
*See Figure.

TABLE 4

Scale internal consistencies

 Cronbach α at each iteration
IterationEvidence-experienceNonconformityPracticality
10.630.610.54
20.700.590.48
30.790.740.68

TABLE 5

Scale scores by physician type, second iteration

Physician typeEvidence vs experience (range, 5–25)Nonconformity (range, 4–20)Practicality (range, 4–20)
SeekersExtreme evidence end: ≥20High: >12Not high: ≤14
ReceptivesToward evidence end: ≥15Moderate: ≤12Not high: ≤14
TraditionalistsToward experience end: <15VariableNot high: ≤14
PragmatistsVariableVariableHigh: >14

TABLE 6

Scale scores by physician type, third iteration (depicted in the Figure)

Physician typeEvidence vs experience (range, 6–30)Nonconformity (range, 6–30)Practicality (range, 5–25)
SeekersExtreme evidence end: ≥22High: >18Not high: ≤14
ReceptivesToward evidence end: ≥18Moderate: ≤18Not high: ≤14
TraditionalistsToward experience end: <18VariableNot high: ≤14
PragmatistsVariableVariableHigh: >14

FIGURE Psychometric Instrument


Results

For the first, second, and third iterations, we received 106, 1120, and 61 instruments respectively that were completed in usable form. At every stage of the instrument’s development, factor analysis showed that a 3-factor model fit best. The eigenvalues declined rapidly when there were more than 3 factors (Table 2), showing that additional factors would not improve the solution.

Orthogonal rotation and interpretation of the questions making up each factor produced 3 psychologically meaningful scales (Table 3) corresponding closely to our theoretical model; the same 3 scales emerged at each iteration. The scales are named similarly to the theory above: evidence–experience, practicality, and nonconformity. The Cronbach α for each scale at each iteration is presented in Table 4.

Using the above-described classification scheme (with specific cutoffs detailed in Tables 5 and 6), the 1181 physicians who completed the instrument in the second and third iterations were classified as follows: 2.5% seekers; 57.0% receptives; 12.6% traditionalists; and 27.9% pragmatists. Different cutoff values would yield somewhat different percentages, but seekers are very few using any reasonable value.

Discussion

These results are consistent with the theoretical construct of 3 factors underlying our physician classification scheme and demonstrate that those factors can be measured on scales with reasonable internal consistency. The data are consistent with the theory on which the instrument was developed. Not all possible combinations of the 3 factors exist, which is consistent with the 4-types theory depicted in Table 1. For example, there should be no physicians who are strongly evidence-based and strongly conformist, and that combination does not occur. However, there are physicians who are strongly evidence-based and strongly nonconformist (the seekers). Few physicians selected either extreme for any factor, but with the exception of nonconformity, a broad range existed across all of the factors.

These findings show that physicians differ in their attitudes toward new information about the effectiveness or appropriateness of clinical strategies, and that those differences are measurable and quantifiable. Quantifying those differences was a major step forward in testing our theoretical framework for selecting effective practice change strategies.

The next step is to demonstrate external validity by showing that differences in physician behavior are consistent with demonstrable differences in attitudes. Such a study is underway at this writing. A trial of practice change interventions guided by the categorization scheme should be carried out subsequently.

The categories we propose do not reflect bimodal distributions of attributes; physicians are distributed relatively uniformly all along the 3 scales. The categories are useful descriptors, not absolute pigeonholes.

The results suggest to us that there is fertile ground for applied psychometrics and cognitive science research related to changing clinical practices. Such work may help illuminate the murky results of practice change intervention and guideline implementation studies to date. Further cognitive research about our own theoretical framework is likely to identify factors and complexities that we have not yet addressed.

· Acknowledgments ·

The authors thank Mark Ebell, MD, MS for his assistance in revising the instrument; Judith Zemencuk and Bonnie Boots-Miller of the Ann Arbor Veterans Administration Health Services Research and Development offices for their assistance in distributing the instrument to and collecting data from Veterans Administration physicians; Janice Klos for her help in gathering data from Michigan Academy of Family Practice member physicians; Van Harrison, PhD and his staff for their help in enlisting the participation of physicians at Michigan CME events; and of course, the Veterans Administration, Michigan Academy of Family Practice, and Michigan physicians who graciously completed instruments for this project.

References

1. Wyszewianski L, Green LA. Strategies for changing clinicians’ practice patterns: a new perspective. J Family Pract 2000;49:461-4.

2. Eisenberg JM. Doctors’ Decisions and the Cost of Medical Care. Ann Arbor: Health Administration Press, 1986.

3. Davis D, O’Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA 1999;282:867-74.

4. Wensing M, van der Weijden T, Grol R. Implementing guidelines and innovations in general practice: which interventions are effective? Br J Gen Pract 1998;48:991-7.

5. Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. CMAJ 1995;153:1423-31.

6. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.

7. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.

8. Woolf SH. Changing physician practice behavior: the merits of a diagnostic approach. J Fam Pract 2000;49:126-9.

References

1. Wyszewianski L, Green LA. Strategies for changing clinicians’ practice patterns: a new perspective. J Family Pract 2000;49:461-4.

2. Eisenberg JM. Doctors’ Decisions and the Cost of Medical Care. Ann Arbor: Health Administration Press, 1986.

3. Davis D, O’Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA 1999;282:867-74.

4. Wensing M, van der Weijden T, Grol R. Implementing guidelines and innovations in general practice: which interventions are effective? Br J Gen Pract 1998;48:991-7.

5. Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. CMAJ 1995;153:1423-31.

6. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.

7. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.

8. Woolf SH. Changing physician practice behavior: the merits of a diagnostic approach. J Fam Pract 2000;49:126-9.

Issue
The Journal of Family Practice - 51(11)
Issue
The Journal of Family Practice - 51(11)
Page Number
938-942
Page Number
938-942
Publications
Publications
Article Type
Display Headline
Validating an instrument for selecting interventions to change physician practice patterns
Display Headline
Validating an instrument for selecting interventions to change physician practice patterns
Legacy Keywords
,Patternsphysician’s practiceeducationmedicalcontinuingpractice guidelinesdecision makingpsychometric instruments.(J Fam Pract 2002; 51:938–942)
Legacy Keywords
,Patternsphysician’s practiceeducationmedicalcontinuingpractice guidelinesdecision makingpsychometric instruments.(J Fam Pract 2002; 51:938–942)
Sections
Article Source

PURLs Copyright

Inside the Article

Article PDF Media