Primary Care Physician Supply and Colorectal Cancer
STUDY DESIGN: We performed an ecologic study of Florida’s 67 counties, using data from the state tumor registry and the American Medical Association physician masterfile.
POPULATION: Florida residents were included.
OUTCOMES MEASURED: We measured age-adjusted colorectal cancer incidence and mortality rates for Florida’s 67 counties during the period 1993 to 1995.
RESULTS: Increasing primary care physician supply was negatively correlated with both colorectal cancer incidence (correlation coefficient [CC] = -0.46; P < .001) and mortality rates (CC = -0.29; P = .02). In linear regression that controlled for other county characteristics, each 1% increase in the proportion of county physicians who were in primary care specialties was associated with a corresponding reduction in colorectal cancer incidence of 0.25 cases per 100,000 (P < .001) and a reduction in colorectal cancer mortality of 0.08 cases per 100,000 (P = .008).
CONCLUSIONS: Incidence and mortality of colorectal cancer decreased in Florida counties that had an increased supply of primary care physicians. This suggests that a balanced work force may achieve better health outcomes.
It was predicted that more than 130,000 Americans would develop colorectal cancer in the year 2000. This is the second leading cause of cancer mortality in the United States, with an estimated 56,300 deaths predicted for 2000.1 In that year, the state of Florida ranked third in the number of colorectal cancer cases (9100) and colorectal cancer deaths (3900).
Earlier diagnosis of colorectal cancer, with subsequently reduced mortality, can be achieved by eliciting and promptly evaluating signs and symptoms of colorectal cancer and by providing recommended screening tests, such as fecal occult blood testing and flexible sigmoidoscopy.2 Also, the provision of screening tests may reduce colorectal cancer incidence by detecting and eliminating precancerous polyps. Annual fecal occult blood testing, for example, has been demonstrated to reduce colorectal cancer incidence by 20%.3 Polyps found by screening sigmoidoscopy would also generally result in surveillance colonoscopy, a procedure which may reduce colorectal cancer incidence by as much as 90%.4
Studies have consistently reported that access to health care and a physician’s recommendation for screening are important predictors of cancer screening.5-10 One would expect, therefore, that the provision of colorectal cancer screening tests would be dependent to some extent on the availability of physician services. Physician specialties may differ, however, in their provision of preventive health services. Stange and colleagues,11 for example, found that family physicians addressed at least one US Preventive Services Task Force recommendation for preventive care in 39% of visits for chronic illness. In contrast, evidence suggests that most specialists are not likely to address health care needs outside their specialty.12
Compared with other cancer screening tests, colorectal cancer screening is less frequently recommended by physicians and is less frequently completed by patients. It is possible, therefore, that the availability of primary care providers has relatively limited impact on colorectal cancer outcomes.13-15 We have previously shown that increasing supplies of primary care physicians were associated with earlier detection of colorectal cancer, while increasing supplies of non–primary care physicians were associated with later-stage diagnosis.16 We hypothesized, therefore, that increasing primary care physician supply would also be associated with lower incidence and mortality rates for colorectal cancer.
Methods
We performed an ecologic study comparing primary care physician supply with colorectal cancer incidence and mortality rates. Colorectal cancer incidence and mortality rates for Florida’s 67 counties were identified using the Florida Cancer Data System (FCDS), a population-based statewide cancer registry. The FCDS is a member of the North American Association of Central Cancer Registries (NAACCR). NAACCR audits have estimated that the completeness of case ascertainment for the period 1990 to 1994 is 99.7%. The FCDS provides age-adjusted incidence and mortality rates by standardizing them to the 1970 US standard population. To account for year-to-year fluctuations, rates were averaged over the 3-year period 1993 to 1995.
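As a rough illustration of the age adjustment and 3-year averaging described above, the following Python sketch applies direct standardization: each age-specific rate is weighted by that age group's share of a standard population, and the annual adjusted rates are then averaged. The function names, age bands, counts, and weights are hypothetical and chosen only for illustration; they are not FCDS data or the actual 1970 US standard population, and the registry's own procedure may differ in detail.

```python
def age_adjusted_rate(cases, population, standard_weights):
    """Directly standardized rate per 100,000.

    cases, population: counts per age group for one county and year.
    standard_weights: standard population counts (or shares) for the
    same age groups; they are normalized here so they sum to 1.
    """
    total_weight = sum(standard_weights)
    rate = 0.0
    for c, n, w in zip(cases, population, standard_weights):
        rate += (w / total_weight) * (c / n)
    return rate * 100_000


def three_year_average(annual_rates):
    """Average annual rates to dampen year-to-year fluctuation."""
    return sum(annual_rates) / len(annual_rates)


# Hypothetical county with two age groups (illustrative numbers only).
rate_one_year = age_adjusted_rate(
    cases=[1, 10],
    population=[10_000, 10_000],
    standard_weights=[3, 1],
)
print(rate_one_year)
print(three_year_average([30.0, 33.0, 36.0]))
```

Averaging over several years matters most for small counties, where a handful of cases in one year can move the annual rate substantially.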
Because distal cancers may be more easily detected with screening tests such as sigmoidoscopy, we also examined incidence rates stratified by proximal versus distal origin of the cancer. We defined proximal cancers as those arising from the cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure. Distal cancers were defined as those arising from the descending colon, sigmoid colon, rectosigmoid juncture, and the rectum. Tumors of the anal canal were excluded because of differing pathology and treatment implications.17
We used the 1990 US census to ascertain other characteristics of Florida counties that might have an impact on colorectal cancer incidence and mortality. In addition to age, colorectal cancer incidence and mortality rates vary by race, socioeconomic status, and marital status. Variables obtained for each county included median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, and percentage who were married.
Data on physician supply were obtained from the 1994 American Medical Association (AMA) physician masterfile, which includes allopathic and osteopathic physicians regardless of AMA membership. County-level population estimates were obtained from the 1990 United States Census. Physician supply variables were created for total physician supply, and for primary care physician supply and non–primary care physician supply. Physicians were classified as primary care if their self-designated specialty was family practice, general practice, obstetrics/gynecology, or general internal medicine.
Physicians who indicated they were engaged in full-time direct patient care were counted as one full-time equivalent (FTE); those who indicated in the masterfile that they were either “semi-retired,” in residency training, or engaged in teaching or research were counted as 0.5 FTE. Physicians who indicated they were no longer involved in direct patient care were excluded. On the basis of this information, we calculated for each county the proportion of all physicians engaged in primary care and used this as our measure of primary care supply.
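The FTE weighting and the resulting supply measure can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the activity labels and the `primary_care_proportion` function are hypothetical stand-ins, and the actual masterfile uses its own field names and codes.

```python
# Specialties classified as primary care in the text.
PRIMARY_CARE = {
    "family practice",
    "general practice",
    "obstetrics/gynecology",
    "general internal medicine",
}

# Hypothetical activity labels mapped to the FTE weights described above.
FTE_WEIGHT = {
    "direct patient care": 1.0,
    "semi-retired": 0.5,
    "residency": 0.5,
    "teaching": 0.5,
    "research": 0.5,
}


def primary_care_proportion(physicians):
    """Return primary care FTEs divided by total FTEs for one county.

    physicians: iterable of (specialty, activity) pairs. Physicians whose
    activity has no FTE weight (no longer in patient care) are excluded.
    Returns None when the county has no countable physicians.
    """
    primary = 0.0
    total = 0.0
    for specialty, activity in physicians:
        weight = FTE_WEIGHT.get(activity)
        if weight is None:
            continue  # excluded from both numerator and denominator
        total += weight
        if specialty in PRIMARY_CARE:
            primary += weight
    return primary / total if total else None


county = [
    ("family practice", "direct patient care"),
    ("cardiology", "direct patient care"),
    ("general practice", "semi-retired"),
    ("cardiology", "inactive"),
]
print(primary_care_proportion(county))
```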
Counties were the unit of analysis for our study. We explored associations between primary care physician supply and colorectal cancer incidence and mortality rates in 2 ways. First, we constructed scatterplots to explore possible linear relationships and to exclude nonlinear associations, and we calculated Pearson correlation coefficients. Second, we used multiple linear regression to explore the multivariable relationship between primary care physician supply and outcomes, controlling for other county-level characteristics.
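For readers unfamiliar with the first step, the Pearson correlation coefficient between two county-level variables can be computed as in the Python sketch below. The function name and the county values are hypothetical and purely illustrative; they are not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: proportion of physicians in primary care and
# age-adjusted incidence per 100,000 for five made-up counties.
primary_care_share = [0.20, 0.30, 0.35, 0.50, 0.70]
incidence = [55.0, 48.0, 50.0, 41.0, 38.0]
print(pearson_r(primary_care_share, incidence))
```

A negative value indicates that counties with a higher primary care share tend to have lower rates; it does not by itself indicate whether the relationship is linear, which is why the scatterplots are inspected as well.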
Parameter estimates were determined using the method of ordinary least squares. Potential confounding variables included in each initial model were median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, percentage who were married, and total physician supply. Final regression models included all variables that remained statistically significant (P < .05), using a backward variable selection algorithm. We also confirmed that all outcomes were normally distributed using graphical methods.
Results
The average physician supply for Florida’s 67 counties (physicians per 100,000 population) was 134.9, with primary care supply at 49.7 and specialist supply at 85.2. The average supply of primary care specialties was as follows: family physicians, 17.5; general practitioners, 10.7; general internists, 13.9; and obstetrician-gynecologists, 7.2. There was substantial variation in physician supply, with some counties having as few as 15 physicians per 100,000 population and other counties having more than 500 physicians per 100,000 population. The average proportion of physicians who were in a primary care specialty was 0.36 across Florida’s 67 counties (standard deviation = 0.19; range = 0.17-1.00).
There was also substantial variation in both incidence and mortality rates across Florida’s 67 counties. Some counties had incidence rates as low as 9.6 cases per 100,000 and others as high as 72 cases per 100,000. Mortality rates varied from a low of 3.8 cases per 100,000 to a high of 26.4 cases per 100,000. Incidence and mortality rates were both higher in men than in women.
Associations between primary care physician supply and colorectal cancer incidence and mortality rates were assessed both graphically and using the Pearson correlation coefficient (Table 1; Figure 1, Figure 2, Figure 3). Primary care physician supply was negatively correlated with colorectal cancer incidence and mortality rates in the 67 counties studied. For colorectal cancer incidence rates, negative correlations were observed for both proximal and distal cancers, and among both men and women. For mortality rates, correlations were stronger for men and did not reach statistical significance among women. Scatter diagrams did not suggest the presence of nonlinear relationships.
Table 2 presents the results of linear regression analyses. Primary care physician supply was a statistically significant predictor of all outcomes examined. Each 1% increase in primary care physician supply was associated with a reduction in overall colorectal cancer incidence of 0.25 cases per 100,000. Each 1% increase in primary care physician supply was similarly associated with a reduction in overall colorectal cancer mortality of 0.08 cases per 100,000. In stratified analysis, primary care physician supply had similar effects for both proximal and distal cancers, with slightly greater effects among men than women. Overall physician supply was not a significant predictor of any of the outcomes examined.
Discussion
We found that an increasing supply of primary care physicians was associated with lower incidence and lower mortality rates of colorectal cancer in Florida counties. Each 1% increase in primary care physician supply was associated with a reduction in colorectal cancer incidence of 0.25 cases per 100,000 and a reduction in mortality of 0.08 cases per 100,000. In contrast, overall physician supply was unrelated to any of the outcomes examined. Findings were similar in men and women and for proximal and distal cancers.
Although there is continued interest in the composition of the United States physician work force,18-25 there have been surprisingly few studies demonstrating the effects of physician supply on health-related outcomes. Some studies have suggested that an oversupply of specialists may contribute to higher health care costs.22,26-28 Primary care physician supply has been correlated with reduced hospitalization rates for ambulatory care–sensitive conditions29,30 and with improved access and overall use of ambulatory health services.31-34
We have previously shown associations between primary care physician supply and earlier detection of breast cancer, colorectal cancer, and malignant melanoma.16,35,36 These findings are consistent with studies showing that patients who have a family physician are more likely to receive a diagnosis of early-stage cancer.37 Our study suggests that increasing supplies of primary care physicians might also be associated with reduced incidence and mortality for some cancers. In contrast, increased overall supplies of physicians have not been associated with improved cancer outcomes, suggesting that a balanced physician work force may be necessary to achieve optimal health outcomes.
Physician specialty choice and practice location are driven by many factors, including the location of training programs at medical schools and residencies, role models in medical school, education debt, lifestyle, and other issues. These factors influence the types of physicians that practice in various locations, and as a result may influence the health care of the population in that area. As the physician work force is studied and policy decisions are made, it will be important to consider measurable health care outcomes in addition to projected demands based on economic forces.38
Limitations
This study has a number of important limitations that should be considered. First, ecologic studies are subject to the ecologic fallacy, in which associations at the population level do not accurately reflect associations at the individual level. We did not have information on individual patients’ actual use of physician services, for example, so patients’ actual access to primary care may have been different than that predicted by county-level measures. Ecologic studies have very limited ability to establish causation, and follow-up studies conducted at the individual patient level (such as case-control or cohort studies) will be necessary to confirm these findings. The exploratory nature of selecting variables for ecologic studies may also increase type 1 statistical errors, falsely concluding that associations exist when they have actually occurred by chance.
We did not have information on other colorectal cancer risk factors, such as dietary patterns, rates of family history, or rates of ulcerative colitis. We also lacked information on rates of detection of precancerous polyps, and the age/sex distribution of physicians, which would have strengthened our study. Because incidence and mortality rates were established according to the patient’s county of residence rather than the location of diagnosis or treatment, we do not believe the associations observed were the result of referral patterns (eg, patients with suspected late-stage disease being referred to areas with higher-specialty physician supply). However, physician supply might be correlated with other unmeasured characteristics of our health care delivery system, which could account for the observed associations. Finally, our study was restricted to colorectal cancer in Florida, which may not be representative of other diseases or other parts of the country.
Conclusions
Both the incidence and mortality of colorectal cancer were decreased in Florida counties that had a greater supply of primary care physicians. Overall physician supply, however, was unrelated to colorectal cancer mortality or incidence. These associations will need to be confirmed with studies conducted at the individual level.
1. Greenlee RT, Murray T, Bolden S, Wingo PA. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7-33.
2. United States Preventive Services Task Force. Guide to clinical preventive services. 2nd ed. Washington, DC: US Department of Health and Human Services; 1996.
3. Mandel JS, Church TR, Bond JH, et al. The effect of fecal occult-blood screening on the incidence of colorectal cancer. N Engl J Med 2000;343:1603-07.
4. Winawer SJ, Zauber AG, Ho MN, et al. Prevention of colorectal cancer by colonoscopic polypectomy: The National Polyp Study Workgroup. N Engl J Med 1993;329:1977-81.
5. Fox SA, Murata PJ, Stein JA. The impact of physician compliance on screening mammography for older women. Arch Intern Med 1991;151:50-56.
6. Fox SA, Siu AL, Stein JA. The importance of physician communication on breast cancer screening of older women. Arch Intern Med 1994;154:2058-68.
7. Breen N, Kessler L. Changes in the use of screening mammography: evidence from the 1987 and 1990 National Health Interview Surveys. Am J Public Health 1994;84:62-67.
8. National Cancer Institute Breast Cancer Screening Consortium. Screening mammography: a missed clinical opportunity? JAMA 1990;264:54-58.
9. Lewis SF, Jensen NM. Screening sigmoidoscopy: factors associated with utilization. J Gen Intern Med 1996;11:542-44.
10. Vernon S. Participation in colorectal cancer screening: a review. J Natl Cancer Inst 1997;89:1406-22.
11. Stange K, Flocke S, Goodwin M. Opportunistic preventive services delivery. J Fam Pract 1998;46:419-24.
12. Rosenblatt RA, Hart LG, Baldwin LM, Chan L, Schneeweiss R. The generalist role of specialty physicians: is there a hidden system of primary care? JAMA 1998;279:1364-70.
13. Brownson RC, Davis JR, Simms SG, Kern TG, Harmon RG. Cancer control knowledge and priorities among primary care physicians. J Cancer Educ 1993;8:35-41.
14. Weisman CS, Celentano DD, Teitelbaum MA, Klassen AC. Cancer screening services for the elderly. Public Health Rep 1989;104:209-14.
15. American Cancer Society. Survey of physicians’ attitudes and practices in early cancer detection. Cancer 1990;40:77-101.
16. Roetzheim RG, Pal N, Gonzalez EC, et al. The effects of physician supply on the early detection of colorectal cancer. J Fam Pract 1999;48:850-88.
17. Laish-Vaturi A, Gutman H. Cancer of the anus. Oncol Rep 1998;5:1525-29.
18. Kindig DA, Cultice JM, Mullan F. The elusive generalist physician: can we reach a 50% goal? JAMA 1993;270:1069-73.
19. Rivo ML, Satcher D. Improving access to health care through physician workforce reform: directions for the 21st century. JAMA 1993;270:1074-78.
20. Rivo ML, Mays HL, Katzoff J, Kindig DA. Managed health care: implications for the physician workforce and medical education. Council on Graduate Medical Education. JAMA 1995;274:712-15.
21. Rosenblatt RA. Specialists or generalists: on whom should we base the American health care system? JAMA 1992;267:1665-66.
22. Schroeder SA, Sandy LG. Specialty distribution of U.S. physicians—the invisible driver of health care costs. N Engl J Med 1993;328:961-63.
23. Weiner JP. Forecasting the effects of health reform on US physician workforce requirement: evidence from HMO staffing patterns. JAMA 1994;272:222-30.
24. Barnett PG, Midtling JE. Public policy and the supply of primary care physicians. JAMA 1989;262:2864-68.
25. Barondess JA. Specialization and the physician workforce: drivers and determinants. JAMA 2000;284:1299-301.
26. Kane R, Friedman B. State variations in Medicare expenditures. Am J Public Health 1997;87:1611-20.
27. Mark DH, Gottlieb MS, Zellner BB, Chetty VK, Midtling JE. Medicare costs in urban areas and the supply of primary care physicians. J Fam Pract 1996;43:33-39.
28. Welch WP, Miller ME, Welch HG, Fisher ES, Wennberg JE. Geographic variation in expenditures for physicians’ services in the United States. N Engl J Med 1993;328:621-27.
29. Parchman ML, Culler S. Primary care physicians and avoidable hospitalizations. J Fam Pract 1994;39:123-28.
30. Krakauer H, Jacoby I, Millman M, Lukomnik JE. Physician impact on hospital admission and on mortality rates in the Medicare population. Health Serv Res 1996;31:191-211.
31. Krishan I, Drummond DC, Naessens JM, Nobrega FT, Smoldt RK. Impact of increased physician supply on use of health services: a longitudinal analysis in rural Minnesota. Public Health Rep 1985;100:379-86.
32. Briggs LW, Rohrer JE, Ludke RL, Hilsenrath PE, Phillips KT. Geographic variation in primary care visits in Iowa. Health Serv Res 1995;30:657-71.
33. Williams AP, Schwartz WB, Newhouse JP, Bennett BW. How many miles to the doctor? N Engl J Med 1983;309:958-63.
34. Allen DI, Kamradt JM. Relationship of infant mortality to the availability of obstetrical care in Indiana. J Fam Pract 1991;33:609-13.
35. Roetzheim RG, Pal N, Van Durme DJ, et al. Increasing supplies of dermatologists and family physicians are associated with earlier stage of melanoma detection. J Am Acad Derm 2000;43:211-18.
36. Ferrante JM, Gonzalez EC, Pal N, Roetzheim RG. The effects of physician supply on the early detection of breast cancer. J Am Board Fam Pract 2000;13:408-14.
37. Samet JM, Hunt WC, Goodwin JS. Determinants of cancer stage: a population-based study of elderly New Mexicans. Cancer 1990;66:1302-07.
38. Greene J. Emerging specialist shortage triggers workforce review. Am Med News 2001;13-14.
STUDY DESIGN: We performed an ecologic study of Florida’s 67 counties, using data from the state tumor registry and the American Medical Association physician masterfile.
POPULATION: Florida residents were included.
OUTCOMES MEASURED: We measured age-adjusted colorectal cancer incidence and mortality rates for Florida’s 67 counties during the period 1993 to 1995.
RESULTS: Increasing primary care physician supply was negatively correlated with both colorectal cancer (CC) incidence (CC = -0.46; P < .001) and mortality rates (CC = -0.29; P =.02). In linear regression that controlled for other county characteristics, each 1% increase in the proportion of county physicians who were in primary care specialties was associated with a corresponding reduction in colorectal cancer incidence of 0.25 cases per 100,000 (P < .001) and a reduction in colorectal cancer mortality of 0.08 cases per 100,000 (P=.008).
CONCLUSIONS: Incidence and mortality of colorectal cancer decreased in Florida counties that had an increased supply of primary care physicians. This suggests that a balanced work force may achieve better health outcomes.
It was predicted that more than 130,000 Americans would develop colorectal cancer in the year 2000. This is the second leading cause of cancer mortality in the United States, with an estimated 56,300 deaths predicted for 2000.1 In that year, the state of Florida ranked third in the number of colorectal cancer cases (9100) and colorectal cancer deaths (3900).
Earlier diagnosis of colorectal cancer, with subsequently reduced mortality, can be achieved by eliciting and promptly evaluating signs and symptoms of colorectal cancer and by providing recommended screening tests, such as fecal occult blood testing and flexible sigmoidoscopy.2 Also, the provision of screening tests may reduce colorectal cancer incidence by detecting and eliminating precancerous polyps. Annual fecal occult blood testing, for example, has been demonstrated to reduce colorectal cancer incidence by 20%.3 Polyps found by screening sigmoidoscopy would also generally result in surveillance colonoscopy, a procedure which may reduce colorectal cancer incidence by as much as 90%.4
Studies have consistently reported that access to health care and a physician’s recommendation for screening are important predictors of cancer screening.5-10 One would expect, therefore, that the provision of colorectal cancer screening tests would be dependent to some extent on the availability of physician services. Physician specialties may differ, however, in their provision of preventive health services. Stange and colleagues,11 for example, found that family physicians addressed at least one US Preventive Services Task Force recommendation for preventive care in 39% of visits for chronic illness. In contrast, evidence suggests that most specialists are not likely to address health care needs outside their specialty.12
Compared with other cancer screening tests, colorectal cancer screening is less frequently recommended by physicians and is less frequently completed by patients. It is possible, therefore, that the availability of primary care providers has relatively limited impact on colorectal cancer outcomes.13-15 We have previously shown that increasing supplies of primary care physicians were associated with earlier detection of colorectal cancer, while increasing supplies of non–primary care physicians were associated with later-stage diagnosis.16 We hypothesized, therefore, that increasing primary care physician supply would also be associated with lower incidence and mortality rates for colorectal cancer.
Methods
We performed an ecologic study comparing primary care physician supply with colorectal cancer incidence and mortality rates. Colorectal cancer incidence and mortality rates for Florida’s 67 counties were identified using the Florida Cancer Data System (FCDS), a population-based statewide cancer registry. The FCDS is a member of the North American Association of Central Cancer Registries (NAACCR). NAACCR audits have estimated that the completeness of case ascertainment for the period 1990 to 1994 is 99.7%. The FCDS provides age-adjusted incidence and mortality rates by standardizing them to the 1970 US standard population. To account for year-to-year fluctuations, rates were averaged over the 3-year period 1993 to 1995.
Because distal cancers may be more easily detected with screening tests such as sigmoidoscopy, we also examined incidence rates stratified by proximal versus distal origin of the cancer. We defined proximal cancers as those arising from the cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure. Distal cancers were defined as those arising from the descending colon, sigmoid colon, rectosigmoid juncture, and the rectum. Tumors of the anal canal were excluded because of differing pathology and treatment implications.17
We used the 1990 US census to ascertain other characteristics of Florida counties that might have an impact on colorectal cancer incidence and mortality. In addition to age, colorectal cancer incidence and mortality rates vary by race, socioeconomic status, and marital status. Variables obtained for each county included median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, and percentage who were married.
Data on physician supply were obtained from the 1994 American Medical Association (AMA) physician masterfile, which includes allopathic and osteopathic physicians regardless of AMA membership. County-level population estimates were obtained from the 1990 United States Census. Physician supply variables were created for total physician supply, and for primary care physician supply and non–primary care physician supply. Physicians were classified as primary care if their self-designated specialty was family practice, general practice, obstetrics/gynecology, or general internal medicine.
Physicians who indicated they were engaged in full-time direct patient care were counted as one full-time equivalent (FTE); those who indicated in the masterfile that they were either “semi-retired,” in residency training, or engaged in teaching or research were counted as 0.5 FTE. Physicians who indicated they were no longer involved in direct patient care were excluded. On the basis of this information, we calculated for each county the proportion of all physicians engaged in primary care and used this as our measure of primary care supply.
Counties were the unit of analysis for our study. We explored associations between primary care physician supply, and colorectal cancer incidence and mortality rates in 2 ways. First we constructed scatterplots to explore possible linear relationships, and to exclude nonlinear associations, and also calculated Pearson correlation coefficients. Second, we used multiple linear regression to explore the multivariable relationship between primary care physician supply and outcomes, controlling for other county-level characteristics.
Parameter estimates were determined using the method of ordinary least squares. Potential confounding variables included in each initial model were median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, percentage who were married, and total physician supply. Final regression models included all variables that remained statistically significant (P < .05), using a backward variable selection algorithm. We also confirmed that all outcomes were normally distributed using graphical methods.
Results
The average physician supply for Florida’s 67 counties (physicians per 100,000 population) was 134.9, with primary care supply at 49.7 and specialist supply at 85.2. The average supply of primary care specialties was as follows: family physicians, 17.5; general practitioners, 10.7; general internists, 13.9; and obstetrician-gynecologists, 7.2. There was substantial variation in physician supply, with some counties having as few as 15 physicians per 100,000 population and other counties having more than 500 physicians per 100,000 population. The average proportion of physicians who were in a primary care specialty was 0.36 across Florida’s 67 counties (standard deviation = 0.19; range = 0.17-1.00).
There was also substantial variation in both incidence and mortality rates across Florida’s 67 counties. Some counties had incidence rates as low as 9.6 cases per 100,000 and others as high as 72 cases per 100,000. Mortality rates varied from a low of 3.8 cases per 100,000 to a high of 26.4 cases per 100,000. Incidence and mortality rates were both higher in men than in women.
Associations between primary care physician supply and colorectal cancer incidence and mortality rates were assessed both graphically and using the Pearson correlation coefficient Table 1.* ( Figure 1, Figure 2, Figure 3) Primary care physician supply was negatively correlated with colorectal cancer incidence and mortality rates in the 67 counties studied. For colorectal cancer incidence rates, negative correlations were observed for both proximal and distal cancers, and among both men and women. For mortality rates, correlations were stronger for men and did not reach statistical significance among women. Scatter diagrams did not suggest the presence of nonlinear relationships.
Table 2 presents the results of linear regression analyses. Primary care physician supply was a statistically significant predictor of all outcomes examined. Each 1% increase in primary care physician supply was associated with a reduction in overall colorectal cancer incidence of 0.25 cases per 100,000. Each 1% increase in primary care physician supply was similarly associated with a reduction in overall colorectal cancer mortality of 0.08 cases per 100,000. In stratified analysis, primary care physician supply had similar effects for both proximal and distal cancers, with slightly greater effects among men than women. Overall physician supply was not a significant predictor of any of the outcomes examined.
Discussion
We found that an increasing supply of primary care physicians was associated with lower incidence and lower mortality rates of colorectal cancer in Florida counties. Each 1% increase in primary care physician supply was associated with a reduction in colorectal cancer incidence of 0.25 cases per 100,000 and a reduction in mortality of 0.08 cases per 100,000. In contrast, overall physician supply was unrelated to any of the outcomes examined. Findings were similar in men and women and for proximal and distal cancers.
Although there is continued interest in the composition of the United States physician work force,18-25 there have been surprisingly few studies demonstrating the effects of physician supply on health-related outcomes. Some studies have suggested that an oversupply of specialists may contribute to higher health care costs.22,26-28 Primary care physician supply has been correlated with reduced hospitalization rates for ambulatory care–sensitive conditions29,30 and with improved access and overall use of ambulatory health services.31-34
We have previously shown associations between primary care physician supply and earlier detection of breast cancer, colorectal cancer, and malignant melanoma.16,35,36 These findings are consistent with studies showing that patients who have a family physician are more likely to receive a diagnosis of early-stage cancer.37 Our study suggests that increasing supplies of primary care physicians might also be associated with reduced incidence and mortality for some cancers. In contrast, increased overall supplies of physicians have not been associated with improved cancer outcomes, suggesting that a balanced physician work force may be necessary to achieve optimal health outcomes.
Physician specialty choice and practice location are driven by many factors, including the location of training programs at medical schools and residencies, role models in medical school, education debt, lifestyle, and other issues. These factors influence the types of physicians that practice in various locations, and as a result may influence the health care of the population in that area. As the physician work force is studied and policy decisions are made, it will be important to consider measurable health care outcomes in addition to projected demands based on economic forces.38
Limitations
This study has a number of important limitations that should be considered. First, ecologic studies are subject to the ecologic fallacy, in which associations at the population level do not accurately reflect associations at the individual level. We did not have information on individual patients’ actual use of physician services, for example, so patients’ actual access to primary care may have been different from that predicted by county-level measures. Ecologic studies have very limited ability to establish causation, and follow-up studies conducted at the individual patient level (such as case-control or cohort studies) will be necessary to confirm these findings. The exploratory nature of selecting variables for ecologic studies may also increase the risk of type 1 statistical errors, that is, falsely concluding that associations exist when they have actually occurred by chance.
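The ecologic fallacy can be illustrated with a small constructed example. The numbers below are invented purely for illustration and are not study data: within each of three hypothetical counties the individual-level correlation between an exposure and an outcome is positive, yet the correlation between county averages is negative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical (exposure, outcome) pairs for individuals in three counties.
counties = {
    "A": [(1, 5), (2, 6), (3, 7)],
    "B": [(4, 3), (5, 4), (6, 5)],
    "C": [(7, 1), (8, 2), (9, 3)],
}

# Individual-level correlation within each county.
within = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in counties.items()
}

# County-level (ecologic) correlation between the county means.
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in counties.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in counties.values()]
between = pearson(mean_x, mean_y)

if __name__ == "__main__":
    print(within)
    print(between)
```

In this toy data set every within-county correlation is close to +1 while the correlation of county means is close to -1, which is why county-level associations such as ours cannot be assumed to hold for individual patients.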
We did not have information on other colorectal cancer risk factors, such as dietary patterns, rates of family history, or rates of ulcerative colitis. We also lacked information on rates of detection of precancerous polyps and on the age/sex distribution of physicians, which would have strengthened our study. Because incidence and mortality rates were established according to the patient’s county of residence rather than the location of diagnosis or treatment, we do not believe the associations observed were the result of referral patterns (eg, patients with suspected late-stage disease being referred to areas with a higher supply of specialist physicians). However, physician supply might be correlated with other unmeasured characteristics of our health care delivery system, which could account for the observed associations. Finally, our study was restricted to colorectal cancer in Florida, which may not be representative of other diseases or other parts of the country.
Conclusions
Both the incidence and mortality of colorectal cancer were decreased in Florida counties that had a greater supply of primary care physicians. Overall physician supply, however, was unrelated to colorectal cancer mortality or incidence. These associations will need to be confirmed with studies conducted at the individual level.
1. Greenlee RT, Murray T, Bolden S, Wingo PA. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7-33.
2. United States Preventive Service Task Force. Guide to clinical preventive services. 2nd ed. Washington, DC: US Department of Health and Human Services; 1996.
3. Mandel JS, Church TR, Bond JH, et al. The effect of fecal occult-blood screening on the incidence of colorectal cancer. N Engl J Med 2000;343:1603-07.
4. Winawer SJ, Zauber AG, Ho MN, et al. Prevention of colorectal cancer by colonoscopic polypectomy: The National Polyp Study Workgroup. N Engl J Med 1993;329:1977-81.
5. Fox SA, Murata PJ, Stein JA. The impact of physician compliance on screening mammography for older women. Arch Intern Med 1991;151:50-56.
6. Fox SA, Siu AL, Stein JA. The importance of physician communication on breast cancer screening of older women. Arch Intern Med 1994;154:2058-68.
7. Breen N, Kessler L. Changes in the use of screening mammography: evidence from the 1987 and 1990 National Health Interview Surveys. Am J Public Health 1994;84:62-67.
8. National Cancer Institute Breast Cancer Screening Consortium. Screening mammography: a missed clinical opportunity? JAMA 1990;264:54-58.
9. Lewis SF, Jensen NM. Screening sigmoidoscopy: factors associated with utilization. J Gen Intern Med 1996;11:542-44.
10. Vernon S. Participation in colorectal cancer screening: a review. J Natl Cancer Inst 1997;89:1406-22.
11. Stange K, Flocke S, Goodwin M. Opportunistic preventive services delivery. J Fam Pract 1998;46:419-24.
12. Rosenblatt RA, Hart LG, Baldwin LM, Chan L, Schneeweiss R. The generalist role of specialty physicians: is there a hidden system of primary care? JAMA 1998;279:1364-70.
13. Brownson RC, Davis JR, Simms SG, Kern TG, Harmon RG. Cancer control knowledge and priorities among primary care physicians. J Cancer Educ 1993;8:35-41.
14. Weisman CS, Celentano DD, Teitelbaum MA, Klassen AC. Cancer screening services for the elderly. Public Health Rep 1989;104:209-14.
15. American Cancer Society. Survey of physicians’ attitudes and practices in early cancer detection. Cancer 1990;40:77-101.
16. Roetzheim RG, Pal N, Gonzalez EC, et al. The effects of physician supply on the early detection of colorectal cancer. J Fam Pract 1999;48:850-88.
17. Laish-Vaturi A, Gutman H. Cancer of the anus. Oncol Rep 1998;5:1525-29.
18. Kindig DA, Cultice JM, Mullan F. The elusive generalist physician: can we reach a 50% goal? JAMA 1993;270:1069-73.
19. Rivo ML, Satcher D. Improving access to health care through physician workforce reform: directions for the 21st century. JAMA 1993;270:1074-78.
20. Rivo ML, Mays HL, Katzoff J, Kindig DA. Managed health care: implications for the physician workforce and medical education. Council on Graduate Medical Education. JAMA 1995;274:712-15.
21. Rosenblatt RA. Specialists or generalists: on whom should we base the American health care system? JAMA 1992;267:1665-66.
22. Schroeder SA, Sandy LG. Specialty distribution of U.S. physicians—the invisible driver of health care costs. N Engl J Med 1993;328:961-63.
23. Weiner JP. Forecasting the effects of health reform on US physician workforce requirement: evidence from HMO staffing patterns. JAMA 1994;272:222-30.
24. Barnett PG, Midtling JE. Public policy and the supply of primary care physicians. JAMA 1989;262:2864-68.
25. Barondess JA. Specialization and the physician workforce: drivers and determinants. JAMA 2000;284:1299-301.
26. Kane R, Friedman B. State variations in medicare expenditures. Am J Public Health 1997;87:1611-20.
27. Mark DH, Gottlieb MS, Zellner BB, Chetty VK, Midtling JE. Medicare costs in urban areas and the supply of primary care physicians. J Fam Pract 1996;43:33-39.
28. Welch WP, Miller ME, Welch HG, Fisher ES, Wennberg JE. Geographic variation in expenditures for physicians’ services in the united states. N Engl J Med 1993;328:621-27.
29. Parchman ML, Culler S. Primary care physicians and avoidable hospitalizations. J Fam Pract 1994;39:123-28.
30. Krakauer H, Jacoby I, Millman M, Lukomnik JE. Physician impact on hospital admission and on mortality rates in the Medicare population. Health Serv Res 1996;31:191-211.
31. Krishan I, Drummond DC, Naessens JM, Nobrega FT, Smoldt RK. Impact of increased physician supply on use of health services: a longitudinal analysis in rural Minnesota. Public Health Rep 1985;100:379-86.
32. Briggs LW, Rohrer JE, Ludke RL, Hilsenrath PE, Phillips KT. Geographic variation in primary care visits in Iowa. Health Serv Res 1995;30:657-71.
33. Williams AP, Schwartz WB, Newhouse JP, Bennett BW. How many miles to the doctor? N Engl J Med 1983;309:958-63.
34. Allen DI, Kamradt JM. Relationship of infant mortality to the availability of obstetrical care in Indiana. J Fam Pract 1991;33:609-13.
35. Roetzheim RG, Pal N, Van Durme DJ, et al. Increasing supplies of dermatologists and family physicians are associated with earlier stage of melanoma detection. J Am Acad Derm 2000;43:211-18.
36. Ferrante JM, Gonzalez EC, Pal N, Roetzheim RG. The effects of physician supply on the early detection of breast cancer. J Am Board Fam Pract 2000;13:408-14.
37. Samet JM, Hunt WC, Goodwin JS. Determinants of cancer stage: a population-based study of elderly New Mexicans. Cancer 1990;66:1302-07.
38. Greene J. Emerging specialist shortage triggers workforce review. Am Med News 2001;13-14.
The Effect of Patient and Visit Characteristics on Diagnosis of Depression in Primary Care
STUDY DESIGN: We used a cross-sectional design.
POPULATION: Data from the 1997 and 1998 National Ambulatory Medical Care Surveys were examined.
OUTCOMES MEASURED: We assessed the association of factors such as age, sex, race, physician specialty, type of insurance, and visit duration with a recorded depression diagnosis during office visits to primary care physicians.
RESULTS: After controlling for symptom presentation, primary care physicians were 56% less likely to record a diagnosis of depression during visits made by elderly patients, 37% less likely to do so during visits by African Americans, and 35% less likely to do so during visits by Medicaid patients. Visits with a depression diagnosis were, on average, 2.9 minutes longer in duration (19.3 vs 16.4 minutes) than visits without a depression diagnosis. Family practice and general practice physicians were 65% more likely to record a diagnosis of depression than internists.
CONCLUSIONS: Many factors were associated with making and recording a depression diagnosis beyond the patient’s reported symptoms. If rates of diagnosis are to improve, interventions that go beyond getting physicians to recognize the symptoms of depression are needed.
- Receipt of a recorded depression diagnosis during office visits to primary care physicians is dependent on patient age, race, and type of insurance.
- Family practice and general practice physicians are more likely than internists to record a depression diagnosis during office visits.
- Many factors beyond the patient’s reported symptoms are associated with making and recording a depression diagnosis.
Characteristics and Depression Diagnosis
Depression is a common disorder that significantly affects quality of life, functioning, and even mortality.1-4 However, as indicated in the Surgeon General’s Report on Mental Health, depression remains under-recognized and underdiagnosed.5 Most studies examining recognition of depression have focused on the role of symptom presentation, the use of screening tools, and physician educational interventions designed to improve symptom recognition.6 However, factors other than clinical presentation may be associated with the likelihood that depression is recognized during a physician visit.7,8 For example, patient age and race, type of insurance, and duration of the visit may increase or decrease the rate at which a depression diagnosis is recorded. Also, diagnostic rates may differ between family or general practice physicians and internists. If differences in diagnostic rates indeed occur because of extraclinical factors and current interventions continue to focus primarily on recognition of patients’ symptoms, certain patient groups will continue to be underdiagnosed and undertreated.
Given this concern about the range of factors possibly associated with receiving a depression diagnosis, we examined data from a nationally representative sample of office visits to physicians, the National Ambulatory Medical Care Survey. More specifically, we examined the independent role of factors such as age, sex, race, type of insurance, and duration of the visit on the probability that depression would be diagnosed during a patient’s visit to a primary care physician. Although the prevalence of depression is greater in women, there should not be a large difference in the likelihood that a depression diagnosis is recorded during an office visit after controlling for the patient’s reason for encounter. Similarly, if primary care physicians are recording diagnoses of depression based solely on the patient’s reasons for encounter, the likelihood that a depression diagnosis is recorded should be similar by age, even though there is a reported lower prevalence of major depression in elderly persons (minor depression is believed to occur more frequently in the elderly).9 Admittedly, however, some of the somatic symptoms associated with depression (eg, fatigue) are more likely to be due to physical illness than to depression in elderly patients. Thus, rates of diagnosis can be expected to be slightly lower among elderly persons. However, because of primary care providers’ lack of confidence in assessing and diagnosing adults with depression1,10 and the tendency for older persons to present depressive symptoms in terms of somatic complaints,11,12 depression diagnoses are expected to be recorded much less frequently during visits by elderly persons, even after controlling for the patient’s reasons for the visit. Also, although African American patients have a lower reported prevalence and incidence rate of depression,13,14 one would expect depression diagnoses to be recorded at rates similar to those for other races after controlling for patient presentation of symptoms. Nevertheless, cultural stereotypes among providers may lead to depression diagnoses being recorded less frequently during these visits.15,16
With regard to practice factors affecting accurate diagnosis, since primary care physicians tend to schedule short patient visits and have many conditions to treat during those visits, we expected that the probability of a depression diagnosis being recorded would increase as the duration of the visit increased. Given competing demands for the physician’s awareness, depression often gets less attention during visits where the patient has a recent medical problem or even several of them.17 Finally, we expected family and general practice physicians to diagnose depression more often than internists. Family practice physicians express more responsibility for treating depression, tend to have more complete knowledge of available treatments, and are more confident in managing a mood disorder.10
Methods
Data
The study used data from the 1997 and 1998 National Ambulatory Medical Care Surveys (NAMCS). The NAMCS, which have been conducted every year since 1989 by the National Center for Health Statistics (NCHS), sample a nationally representative group of visits to physicians in office-based practices. The NCHS included weights in the NAMCS to enable the sample to represent all office visits in the United States. A detailed description of the NAMCS sample and sampling procedure, as well as a description of the survey instrument and survey administration procedures, is provided elsewhere.18,19
There were 24,715 visits sampled in 1997 and 23,339 visits sampled in 1998. For each office visit, the survey provided information on physician specialty, up to 3 diagnoses, and up to 3 patient reasons for the visit. Because there were fewer than 200 visits with a diagnosis of depression sampled in each year, we combined the data from 1997 and 1998 to increase the power of the analysis. We limited our analysis to the 17,058 visits made during this interval by adults 18 years and older to primary care physicians. Primary care physicians included physicians with specialties of family practice, general practice, or internal medicine. Item nonresponse rates in the NAMCS data are low (<5%), and the NCHS provides imputed values for any missing information on demographic variables and duration of the visit in the NAMCS data.19
Diagnostic Groups
Patients were categorized on the basis of diagnoses assigned by providers during the index visit, using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We classified depression visits as those with ICD-9 codes of 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent), 300.4 (neurotic depression), 311 (depressive disorder, not elsewhere classified), and 298.0 (depressive type psychosis).
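The classification rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, and it assumes diagnoses are stored as strings with a decimal point (the NAMCS public-use files may encode them differently).

```python
# ICD-9-CM categories treated as depression in the text above.
DEPRESSION_CODES = ("296.2", "296.3", "300.4", "311", "298.0")


def is_depression_code(code: str) -> bool:
    """Return True if an ICD-9-CM code falls under one of the listed categories."""
    code = code.strip()
    for prefix in DEPRESSION_CODES:
        if code == prefix:
            return True
        if "." in prefix and code.startswith(prefix):
            # Further-digit subcodes, e.g. 296.21 under 296.2
            return True
        if "." not in prefix and code.startswith(prefix + "."):
            # Three-digit category written with a decimal part
            return True
    return False


def is_depression_visit(diagnoses) -> bool:
    """A visit counts as a depression visit if any recorded diagnosis matches.

    `diagnoses` holds up to 3 codes; empty slots may be None or "".
    """
    return any(is_depression_code(c) for c in diagnoses if c)
```

For example, `is_depression_visit(["401.9", "311", None])` is true, while a visit coded only `["296.5"]` (bipolar disorder, depressed) is not counted under this rule.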
Patient and Visit Characteristics
Information on patient age, race, and ethnicity was recorded in the NAMCS survey, as was information on whether the visit was prepaid or fee-for-service and type of insurance coverage (eg, private, Medicaid, Medicare). The duration of the visit was also recorded. The survey reported physician specialty; we classified primary care physicians into 2 groups: family practice/general practice and internal medicine. The survey also indicated whether the physician had seen the patient previously. Information on up to 3 reasons for the visit, according to the patient, was collected in the survey at the time of the visit. Self-reported depressive symptoms were divided into 3 categories: (1) depressed mood, (2) physical symptoms of depression (eg, tiredness, general weakness or ill feeling, weight loss, restlessness, disturbance of sleep, abnormal appetite), and (3) other psychiatric symptoms associated with depression (eg, nervousness, fears and phobias, problems with self-esteem and identity, disturbance of memory, social adjustment problems, intentional self-mutilation, and suicidal ideation). The number of medications prescribed during the visit and the visit’s duration were recorded in the survey and used in the analysis.
Analysis
We sought to examine the role of patient and visit characteristics on the probability that a depression diagnosis was recorded during an office visit to a primary care physician. Specifically, we investigated the independent effect of factors such as age, race, sex, type of insurance, and duration of the visit on the probability of receiving a depression diagnosis, after controlling for patient-reported symptoms of depression, physician specialty, and other patient characteristics. Factors associated with having a depression diagnosis recorded were determined using weighted logistic regression models, and adjusted odds ratios and their 95% confidence intervals were calculated. Statistically significant differences in recognition rates were identified by reducing the sample weights by the proportion needed to downweight the sample to the size of a simple random sample with the same variance.20 Although this method did not address problems caused by clustering within strata, it produced results that tend to overcompensate rather than undercompensate for artifacts produced from stratification.21 Significant differences were identified by testing the coefficients using a χ² test.
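The downweighting step can be illustrated with a short sketch. It assumes the "equivalent sample size" is computed with the common Kish-style approximation (sum of weights squared, divided by the sum of squared weights); the article cites reference 20 for the method but does not state the formula, so this is an assumption, and the function names are hypothetical.

```python
def equivalent_sample_size(weights):
    """Kish-style approximation: (sum of w)^2 / (sum of w^2). Assumed formula."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def downweight(weights):
    """Rescale the weights so they sum to the equivalent sample size.

    Relative weights are preserved, so point estimates are unchanged;
    only the nominal sample size used for significance tests shrinks.
    """
    total = sum(weights)
    n_eff = equivalent_sample_size(weights)
    return [w * n_eff / total for w in weights]
```

With equal weights the equivalent sample size equals the number of records; the more unequal the weights, the smaller it becomes, which makes the significance tests more conservative.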
A sensitivity analysis was performed. We were concerned that patients with multiple medical conditions may be less likely to have a depression diagnosis recorded in the NAMCS because the survey only allows for 3 recorded diagnoses, and because these patients may not be randomly distributed by age, sex, race, type of physician, and so forth. A weighted logistic regression analysis was conducted on the subset of visits that recorded only 1 or 2 diagnoses (N=14,135). This should eliminate visits in which depression was recognized but a diagnosis was not recorded because 3 other conditions were perceived to be more important by the physician. The results of this analysis were then compared with results based on the full sample.
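The subset used in the sensitivity analysis can be sketched as a simple filter. The records below are hypothetical toy data and the field names are assumptions, not the NAMCS variable names; the only figures taken from the text are the two sample sizes.

```python
def recorded_diagnoses(visit):
    """Count the non-empty diagnosis slots (the survey allows up to 3 per visit)."""
    return sum(1 for code in visit["diagnoses"] if code)


def sensitivity_subset(visits):
    """Keep visits with only 1 or 2 recorded diagnoses."""
    return [v for v in visits if 1 <= recorded_diagnoses(v) <= 2]


# Hypothetical toy records, for illustration only.
visits = [
    {"id": 1, "diagnoses": ["401.9", None, None]},
    {"id": 2, "diagnoses": ["250.00", "401.9", "715.90"]},
    {"id": 3, "diagnoses": ["311", "401.9", None]},
]
subset_ids = [v["id"] for v in sensitivity_subset(visits)]

# Share of the full sample retained, using the counts reported in the text.
share_retained = 14135 / 17058
```

Here the visit with all 3 diagnosis slots filled is dropped, and the reported counts imply that roughly 83% of the full sample was retained.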
Results
Of the 17,058 visits made by adults to primary care physicians included in the 1997-1998 NAMCS samples, 358 visits included a diagnosis of depression (Table 1). Using the weights provided by the NCHS, we estimated that there were 20.2 million office visits to primary care physicians with a recorded diagnosis of depression in 1997 and 1998. This represented 2.4% of all visits to primary care physicians. The rate at which depression was diagnosed, however, varied significantly by several patient and visit characteristics, according to results from the multivariate analysis.
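A quick back-of-the-envelope check shows how the sampled counts relate to the weighted estimates. The inputs are the figures reported above; the implied total is derived here from rounded numbers and is not a figure reported by the authors.

```python
sampled_visits = 17_058
sampled_with_depression = 358
weighted_depression_visits_millions = 20.2  # national estimate, 1997 and 1998 combined
weighted_share = 0.024                       # 2.4% of all primary care visits

# Unweighted share of sampled visits with a depression diagnosis.
unweighted_share = sampled_with_depression / sampled_visits

# Total weighted primary care visits implied by the two rounded estimates.
implied_total_visits_millions = weighted_depression_visits_millions / weighted_share

print(f"unweighted share: {unweighted_share:.1%}")                     # 2.1%
print(f"implied total visits: {implied_total_visits_millions:.0f} M")  # 842 M
```

The unweighted share (about 2.1%) differs from the weighted 2.4% because the survey weights adjust for unequal sampling probabilities.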
As we postulated, the data in Table 2 indicate that the probability of a depression diagnosis being recorded during an office visit is significantly related to the patient’s reason for the visit, with depression being diagnosed over 40 times more often during visits where the patient reported depression as a reason for the visit. Also, a depression diagnosis was 3.4 times more likely to be recorded if the patient reported physical symptoms of depression as a reason for the visit and 4.9 times more likely if the patient reported other psychiatric symptoms associated with depression as a reason for the visit. However, even after controlling for the reasons for the visit, significant differences in the rate of depression diagnoses were observed by age, sex, and duration of the visit. Primary care physicians were 56% less likely to diagnose depression during visits made by elderly patients. Depression diagnoses were recorded more frequently during visits made by women, even after controlling for the patient’s reasons for the visit. Although the results are not reported in Table 2, we also examined whether significant interactions of age with sex, race, or ethnicity were evident. We found a significant interaction of age and sex, demonstrating that elderly women were less likely to be considered depressed than elderly men (P=.01). Duration of the visit was also significantly associated with the rate at which depression diagnoses were recorded, with such diagnoses being recorded 1% more often for each additional minute that an office visit lasted. Visits during which a diagnosis of depression was recorded averaged 19.3 minutes, compared with 16.4 minutes for visits in which this diagnosis was not recorded.
Differences in the rate at which depressive diagnoses were recorded were also observed by race and type of insurance coverage, although these differences did not achieve statistical significance at the P < .05 level. A diagnosis of depression was recorded 37% (P=.055) less often during visits by African Americans and 35% (P=.08) less often during visits by Medicaid patients. After controlling for age, a diagnosis of depression was recorded 35% (P=.07) more often during visits by Medicare patients than during visits by patients with private insurance. Large differences in rates at which a depression diagnosis was recorded were also observed by physician specialty. Family practice and general practice physicians were 65% (P <.001) more likely to record a diagnosis of depression than internists. Similar results were observed in the sensitivity analysis performed only on visits with 1 or 2 recorded diagnoses.
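To make the percentages above easier to compare, the sketch below converts them to odds ratios and compounds the per-minute effect of visit duration. It assumes each "X% more/less" figure is the adjusted odds ratio minus 1, times 100; the text does not state this explicitly, and the helper names are hypothetical.

```python
def odds_ratio_from_percent(pct_change):
    """Convert 'X% more (+) or less (-) likely' to an odds ratio (assumed reading)."""
    return 1.0 + pct_change / 100.0


def compounded_odds_ratio(per_unit_or, units):
    """In a logistic model, a per-unit odds ratio compounds multiplicatively."""
    return per_unit_or ** units


elderly_or = odds_ratio_from_percent(-56)          # about 0.44
family_practice_or = odds_ratio_from_percent(65)   # about 1.65

# Odds ratio implied by the 2.9-minute gap between the two reported mean durations.
duration_gap_or = compounded_odds_ratio(odds_ratio_from_percent(1), 19.3 - 16.4)
```

Under this reading, the 2.9-minute difference in mean visit length corresponds to only about 3% higher odds, which suggests that duration alone accounts for a small share of the variation in recorded diagnoses.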
Discussion
Given that the prevalence of depression in epidemiologic studies is reported to approximate 12% to 18% in primary care practice,22,23 one would expect to see a depression diagnosis recorded more frequently than in 2.4% of office visits. Admittedly, depressed patients are likely to see their physicians for reasons other than their depression and may therefore not receive a depression diagnosis during each visit. Although reporting of depressive symptoms as the reason for the visit was an important determinant of whether or not a diagnosis of depression was recorded by the physician, there were several other nonclinical factors that predicted a depression diagnosis during visits to primary care physicians.
These findings show that the rate at which diagnoses of depression are recorded during office visits is influenced by factors other than symptom presentation. Sex and age were significantly associated with a depression diagnosis. Although the prevalence of depression is higher among women,14 the likelihood that a depression diagnosis was recorded should not have varied greatly by sex after controlling for the patient’s reason for the visit. Yet it did. If a man and a woman both present to a primary care physician with the same symptoms, we found that a diagnosis of depression was more likely to be recorded during the visit made by the woman. Similarly, it appears that a diagnosis of depression was less likely to be recorded during visits made by older patients. During office visits by older persons, primary care physicians may simply attribute depressive symptoms to physical ailments or the normal aging process. However, it is also possible that older patients are more likely than younger patients to report depressive symptoms that are actually due to other ailments.
African Americans were less likely to have a depression diagnosis recorded than were non-African Americans during visits to primary care physicians, even after controlling for mood disorder related symptoms. Primary care physicians possibly perceive African American patients to be stigmatized by a depression diagnosis more frequently than non-African American patients and thus choose not to assign them this diagnosis. It is also conceivable that primary care physicians do not assess physical and mood symptoms in African American patients as indicative of depression because of preconceptions about African American patients and their morbidities. The causes of racial differences in diagnosis rates cannot be determined from the NAMCS data set and warrant further study with different research strategies.
The duration of the visit had a significant effect on the probability that a depression diagnosis was recorded. Given that primary care physicians typically treat or monitor several conditions during a relatively short visit, it is not surprising that depression is recognized and diagnosed more often during longer visits. However, it may not be the case that depression was recognized because the visit was longer. It may be that visits by depressed patients simply take longer. It is not possible to determine the causal relationship from these data. Again, further studies of the physician’s diagnostic process are needed.
Finally, a depression diagnosis was much more likely to be recorded during visits to family practice or general practice physicians than to internists. One may speculate that this occurs because the training of family/general practice physicians focuses more extensively on the identification and treatment of psychosocial problems than does the training of physicians who specialize in internal medicine. Only a third of training directors for internal medicine residencies were satisfied with the training received by their residents with regard to depression.24 Additionally, internists are much less likely to consider themselves responsible for treatment of depression than are family physicians.10 Although it is possible that the prevalence of depression is greater among patients treated by family/general practice physicians than among those treated by internists, differences in the true prevalence of depression among physician practices could not be ascertained using these data. However, controlling for patient symptoms should have accounted for much of the difference in prevalence.
Limitations
The study’s findings should be interpreted cautiously because of various limitations of the data set. This analysis was based on a nationally representative sample of physician office visits in which a diagnosis of depression was recorded. The use of diagnoses that primary care physicians coded sets a threshold that is not equivalent to recognition that might be assessed by direct inquiry of the physicians. Also, since the NAMCS only allows for the recording of 3 diagnoses, the physician conceivably recognized depression but did not record it because a higher priority was assigned to 3 other diagnoses. This quite conceivably is occurring with regard to visits by elderly patients, who frequently experience multiple conditions. However, over 80% of visits by all subjects had only 1 or 2 diagnoses recorded during the visit, suggesting that in most cases, a depression diagnosis was not “crowded out.” Additionally, a sensitivity analysis conducted only on visits where 2 or fewer diagnoses were recorded during the visit found the same factors associated with a recorded depression diagnosis. The NAMCS also only allows for the recording of 3 patient reasons for the visit. If a patient had more than 3 reasons for the visit, only the top 3, as identified by the physician, were recorded in the survey. This could lead to important patient symptoms being excluded from the survey. Thus, the analysis could not perfectly control for all the patients’ reasons for the visit, and this limitation should be kept in mind when interpreting these findings. Another limitation of the data is that the NAMCS records no assessment of history of depression, which might be an important clue for primary care physicians.
Conclusions
There are many factors associated with physician recording of a depression diagnosis beyond the patient’s reported symptoms. Therefore, if rates of diagnosis of depression in office-based practice are to more closely approximate the true prevalence of the disorder, interventions are needed that go beyond simply helping physicians to better recognize the symptoms of depression. A recent review found that approximately one fourth of interventions designed to increase recognition and management of depression had no effect on diagnosis and treatment rates.6 Perhaps their effectiveness could be improved by designing more focused interventions that target African American and elderly patients, who presently are assigned low rates of depressive diagnoses in primary care. This is a particularly high priority, since both African American and elderly patients are more likely to seek treatment in the primary care sector than in the mental health specialty sector. Solberg and colleagues25 found that primary care physicians viewed systematic screening unfavorably, but were supportive of alternative approaches, such as external feedback about the care that they provide. Thus, feedback about differences in age- and race-specific rates could possibly provide the impetus needed for primary care physicians to alter their assessment procedures and clinical formulations in these under-recognized groups of patients. Finally, intervention efforts may want to focus on the unique manner in which internists formulate psychiatric diagnoses, since recognition rates for depression are unduly low in this specialty group.
Acknowledgments
This research was supported in part by National Institute of Mental Health grants P30 MH3095, P30 MH52247, R25 MH60473, K01 MH01613, and R01 MH59318.
1. Unutzer J, Katon W, Sullivan M, Miranda J. Treating depressed older adults in primary care: narrowing the gap between efficacy and effectiveness. Milbank Q 1999;77:225-56.
2. Penninx W, Penninx H, Guralnik J, et al. Depressive symptoms and physical decline in community dwelling older persons. JAMA 1998;279:1720-26.
3. Penninx B, Geerlings S, Deeg D, van Eijk J, van Tilburg W, Beekman A. Minor and major depression and the risk of death in older persons. Arch Gen Psychiatry 1999;56:889-95.
4. Rovner B, German P, Brant L, Clark R, Burton L, Folstein M. Depression and mortality in nursing homes. JAMA 1991;265:993-96.
5. US Department of Health and Human Services. Mental health: a report of the surgeon general. Rockville, Md: US Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Center for Mental Health Services, National Institutes of Health, National Institute of Mental Health; 1999.
6. Kroenke K, Taylor-Vaisey A, Dietrich AJ, Oxman TE. Interventions to improve provider diagnosis and treatment of mental disorders in primary care: a critical review of the literature. Psychosomatics 2000;41:39-52.
7. Klinkman M, Coyne J, Gallo S, Schwenk T. False positives, false negatives, and the validity of the diagnosis of major depression in primary care. Arch Fam Med 1998;7:451-61.
8. Rost K, et al. The deliberate misdiagnosis of major depression in primary care. Arch Fam Med 1994;3:333-37.
9. Eaton W, Anthony J, Gallo J, et al. Natural history of Diagnostic Interview Schedule/DSM-IV major depression: the Baltimore Epidemiologic Catchment Area Follow-up. Arch Gen Psychiatry 1997;54:993-99.
10. Williams JW, Rost K, Dietrich AJ, Ciotti MC, Zyzanski SJ, Cornell J. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med 1999;8:58-67.
11. Caine E, Lyness J, King D, Connors L. Clinical and etiological heterogeneity of mood disorders in elderly patients. In: Schneider L, Reynolds C, Lebowitz B, Friedhoff A, eds. Diagnosis and treatment of depression in late life: results of the NIH Consensus Development Conference. Washington, DC: American Psychiatric Association; 1994;21-54.
12. Gallo J, Rabins P, Anthony J. Sadness in older persons: 13-year follow-up of a community sample in Baltimore, Maryland. Psychol Med 1999;29:341-50.
13. Gallo J, Royall D, Anthony J. Risk factors for the onset of major depression in middle age and late life. Soc Psychiatry Psych Epidemiol 1993;28:101-08.
14. Kessler R, McGonagle K, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry 1994;51:8-19.
15. Gallo J, Cooper-Patrick L, Lesikar S. Depressive symptoms of whites and African Americans aged 60 years and older. J Gerontol: Psychol Sci 1998;53B:277-86.
16. Cooper-Patrick L, Gallo J, Gonzalez J, et al. Race, gender, and partnership in the patient-physician relationship. JAMA 1999;37:1034-45.
17. Rost K, Nutting P, Smith J, Coyne JC, Cooper-Patrick L, Rubenstein L. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000;9:150-54.
18. Bryant E, Shimizu I. Sampling design, sampling variance, and estimation procedures for the National Ambulatory Medical Care Survey. Vital Health Stat 2 1988;108:1-39.
19. Woodwell DA. National Ambulatory Medical Care Survey: 1998 summary. Advance data from vital and health statistics. Hyattsville, Md: National Center for Health Statistics; 2000.
20. Potthoff R, Woodbury M, Manton K. ‘Equivalent sample size’ and ‘equivalent degrees of freedom’ refinements for inference using survey weights under superpopulation models. J Am Stat Assoc 1992;87:383-96.
21. Leaf P, Myers J, McEvoy L. Procedures used in the epidemiologic catchment area study. In: Robins L, Regier D, eds. Psychiatric Disorders of America: The Epidemiologic Catchment Area Study. New York, NY: The Free Press; 1991.
22. Brown C, Schulberg HC. Diagnosis and treatment of depression in primary medical care practice: the application of research findings to clinical practice. J Clin Psychol 1998;54:303-14.
23. Olfson M, Shea S, Feder A, et al. Prevalence of anxiety, depression, and substance use disorders in an urban general medicine practice. Arch Fam Med 2000;9:876-83.
24. Sullivan M, Cole S, Gordon G, Hahn S, Kathol R. Psychiatric training in medicine residencies: current needs, practices and satisfaction. Gen Hosp Psychiatry 1996;18:95-101.
25. Solberg L, Korsen N, Oxman T, Fischer L, Bartels S. The need for a system in the care of depression. J Fam Pract 1999;48:973-79.
STUDY DESIGN: We used a cross-sectional design.
POPULATION: Data from the 1997 and 1998 National Ambulatory Medical Care Surveys were examined.
OUTCOMES MEASURED: We assessed the association of factors such as age, sex, race, physician specialty, type of insurance, and visit duration with a recorded depression diagnosis during office visits to primary care physicians.
RESULTS: After controlling for symptom presentation, primary care physicians were 56% less likely to record a diagnosis of depression during visits made by elderly patients, 37% less likely to do so during visits by African Americans, and 35% less likely to do so during visits by Medicaid patients. Visits with a depression diagnosis were, on average, 2.9 minutes longer (19.3 vs 16.4 minutes) than visits without a depression diagnosis. Family practice and general practice physicians were 65% more likely to record a diagnosis of depression than internists.
CONCLUSIONS: Many factors were associated with making and recording a depression diagnosis beyond the patient’s reported symptoms. If rates of diagnosis are to improve, interventions that go beyond getting physicians to recognize the symptoms of depression are needed.
- Receipt of a recorded depression diagnosis during office visits to primary care physicians is dependent on patient age, race, and type of insurance.
- Family practice and general practice physicians are more likely than internists to record a depression diagnosis during office visits.
- Many factors beyond the patient’s reported symptoms are associated with making and recording a depression diagnosis.
Characteristics and Depression Diagnosis
Depression is a common disorder that significantly affects quality of life, functioning, and even mortality.1-4 However, as indicated in the Surgeon General’s Report on Mental Health, depression remains under-recognized and underdiagnosed.5 Most studies examining recognition of depression have focused on the role of symptom presentation, the use of screening tools, and physician educational interventions designed to improve symptom recognition.6 However, factors other than clinical presentation may be associated with the likelihood that depression is recognized during a physician visit.7,8 For example, patient age and race, type of insurance, and duration of the visit may increase or decrease the rate at which a depression diagnosis is recorded. Also, diagnostic rates may differ between family or general practice physicians and internists. If differences in diagnostic rates indeed occur because of extraclinical factors and current interventions continue to focus primarily on recognition of patients’ symptoms, certain patient groups will continue to be underdiagnosed and undertreated.
Given this concern about the range of factors possibly associated with receiving a depression diagnosis, we examined data from a nationally representative sample of office visits to physicians, the National Ambulatory Medical Care Survey. More specifically, we examined the independent role of factors such as age, sex, race, type of insurance, and duration of the visit on the probability that depression would be diagnosed during a patient’s visit to a primary care physician. Although the prevalence of depression is greater in women, there should not be a large difference in the likelihood that a depression diagnosis is recorded during an office visit after controlling for the patient’s reason for encounter. Similarly, if primary care physicians are recording diagnoses of depression based solely on the patient’s reasons for encounter, the likelihood that a depression diagnosis is recorded should be similar by age, even though there is a reported lower prevalence of major depression in elderly persons (minor depression is believed to occur more frequently in the elderly).9 Admittedly, however, some of the somatic symptoms associated with depression (eg, fatigue) are more likely to be due to a physical illness rather than depression in elderly patients. Thus, rates of diagnoses can should be slightly lower among elderly persons. However, because of primary care providers’ lack of confidence in assessing and diagnosing adults with depression1,10 and the tendency for older persons to present depressive symptoms in terms of somatic complaints,11,12 depression diagnoses are expected to be recorded much less frequently during visits by elderly persons, even after controlling for the patient’s reasons for the visit. Also, although African American patients have a lower reported prevalence and incidence rate of depression,13,14 one would expect depression diagnoses to be recorded at rates similar to those for other races after controlling for patient presentation of symptoms. 
Nevertheless, cultural stereotypes among providers may lead to depression diagnoses being recorded less frequently during these visits.15,16
With regard to practice factors affecting accurate diagnosis, since primary care physicians tend to schedule short patient visits and have many conditions to treat during those visits, we expected that the probability of a depression diagnosis being recorded would increase as the duration of the visit increased. Given competing demands for the physician’s awareness, depression often gets less attention during visits where the patient has a recent medical problem or even several of them.17 Finally, we expected family and general practice physicians to diagnose depression more often than internists. Family practice physicians express more responsibility for treating depression, tend to have more complete knowledge of available treatments, and are more confident in managing a mood disorder.10
Methods
Data
The study used data from the 1997 and 1998 National Ambulatory Medical Care Surveys (NAMCS). The NAMCS, which have been conducted every year since 1989 by the National Center for Health Statistics (NCHS), sample a nationally representative group of visits to physicians in office-based practices. The NCHS included weights in the NAMCS to enable the sample to represent all office visits in the United States. A detailed description of the NAMCS sample and sampling procedure, as well as a description of the survey instrument and survey administration procedures, is provided elsewhere.18,19
There were 24,715 visits sampled in 1997 and 23,339 visits sampled in 1998. For each office visit, the survey provided information on physician specialty, up to 3 diagnoses, and up to 3 patient reasons for the visit. Because there were fewer than 200 visits with a diagnosis of depression sampled in each year, we combined the data from 1997 and 1998 to increase the power of the analysis. We limited our analysis to the 17,058 visits made during this interval by adults 18 years and older to primary care physicians. Primary care physicians included physicians with specialties of family practice, general practice, or internal medicine. Item nonresponse rates in the NAMCS data are low (<5%), and the NCHS provides imputed values for any missing information on demographic variables and duration of the visit in the NAMCS data.19
Diagnostic Groups
Patients were categorized on the basis of diagnoses assigned by providers during the index visit, using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We classified depression visits as those with ICD-9 codes of 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent), 300.4 (neurotic depression), 311 (depressive disorder, not elsewhere classified), and 298.0 (depressive type psychosis).
Patient and Visit Characteristics
Information on patient age, race, and ethnicity was recorded in the NAMCS survey, as was information on whether the visit was prepaid or fee-for-service and type of insurance coverage (eg, private, Medicaid, Medicare). The duration of the visit was also recorded. The survey reported physician specialty; we classified primary care physicians into 2 groups: family practice/general practice and internal medicine. The survey also indicated whether the physician had seen the patient previously. Information on up to 3 reasons for the visit, according to the patient, was collected in the survey at the time of the visit. Self-reported depressive symptoms were divided into 3 categories: (1) depressed mood, (2) physical symptoms of depression (eg, tiredness, general weakness or ill feeling, weight loss, restlessness, disturbance of sleep, abnormal appetite), and (3) other psychiatric symptoms associated with depression (eg, nervousness, fears and phobias, problems with self-esteem and identity, disturbance of memory, social adjustment problems, intentional self-mutilation, and suicidal ideation). The number of medications prescribed during the visit and the visit’s duration were recorded in the survey and used in the analysis.
Analysis
We sought to examine the role of patient and visit characteristics on the probability that a depression diagnosis was recorded during an office visit to a primary care physician. Specifically, we investigated the independent effect of factors such as age, race, sex, type of insurance, and duration of the visit on the probability of receiving a depression diagnosis, after controlling for patient-reported symptoms of depression, physician specialty, and other patient characteristics. Factors associated with having a depression diagnosis recorded were determined using weighted logistic regression models, and adjusted odds ratios and their 95% confidence intervals were calculated. Statistically significant differences in recognition rates were identified by reducing the sample weights by the proportion needed to downweight the sample to the size of a simple random sample with the same variance.20 Although this method did not address problems caused by clustering within strata, it produced results that tend to overcompensate rather than undercompensate for artifacts produced from stratification.21 Significant differences were identified by testing the coefficients using a c2 test.
A sensitivity analysis was performed. We were concerned that patients with multiple medical conditions may be less likely to have a depression diagnosis recorded in the NAMCS because the survey only allows for 3 recorded diagnoses, and because these patients may not be randomly distributed by age, sex, race, type of physician, and so forth. A weighted logistic regression analysis was conducted on the subset of visits that recorded only 1 or 2 diagnoses (N=14,135). This should eliminate visits in which depression was recognized but a diagnosis was not recorded because 3 other conditions were perceived to be more important by the physician. The results of this analysis were then compared with results based on the full sample.
Results
Of the 17,058 visits made by adults to primary care physicians included in the 1997-1998 NAMCS samples, 358 visits included a diagnosis of depression (Table 1). Therefore, using the weights provided by the NCHS, we estimated there were 20.2 million office visits to primary care physicians with a recorded diagnosis of depression in 1997 and 1998. This represented 2.4% of all visits to primary care physicians. The rate at which depression was diagnosed, however, varied significantly by several patient and visit characteristics, according to results from the multivariate analysis.
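As a rough check on how the sampled counts relate to the weighted estimates, the arithmetic can be sketched as follows. Only the four input figures come from the text; the derived quantities are implied values, and the total is approximate because 2.4% is itself rounded.

```python
# Figures reported in the text
sampled_visits = 17_058              # adult primary care visits in the sample
depression_visits = 358              # sampled visits with a depression diagnosis
weighted_depression_visits = 20.2e6  # national estimate for 1997 and 1998
weighted_share = 0.024               # 2.4% of all primary care visits

# Derived quantities (implied, not reported by the authors)
unweighted_share = depression_visits / sampled_visits
implied_total_visits = weighted_depression_visits / weighted_share

print(f"Unweighted share of sampled visits: {unweighted_share:.1%}")
print(f"Implied weighted total visits: {implied_total_visits / 1e6:.0f} million")
```

The unweighted share comes out near 2.1%, slightly below the weighted 2.4%, which suggests that visits with a depression diagnosis carried somewhat larger weights on average.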
As we postulated, the data in Table 2 indicate that the probability of a diagnosis of depression being recorded during an office visit is significantly related to the patient’s reason for the visit, with depression being diagnosed over 40 times more often during visits where the patient reported depression as a reason for the visit. Also, a depression diagnosis was 3.4 times more likely to be recorded if the patient reported physical symptoms of depression as a reason for the visit and 4.9 times more likely if the patient reported other psychiatric symptoms associated with depression as a reason for the visit. However, even after controlling for the reasons for the visit, significant differences in the rate of depression diagnoses were observed by age, sex, and duration of the visit. Primary care physicians were 56% less likely to diagnose depression during visits made by elderly patients. Depression diagnoses were recorded more frequently during visits made by women, even after controlling for the patient’s reasons for the visit. Although the results are not reported in Table 2, we also questioned whether significant interactions of age with sex, race, or ethnicity were evident. We found a significant interaction of age and sex, demonstrating that elderly women were less likely to be considered depressed than elderly men (P=.01). Duration of the visit was also significantly associated with the rate at which depression diagnoses were recorded, with such diagnoses being recorded 1% more often for each additional minute that an office visit lasted. Visits during which a diagnosis of depression was recorded averaged 19.3 minutes, compared with 16.4 minutes for visits in which this diagnosis was not recorded.
Differences in the rate at which depressive diagnoses were recorded were also observed by race and type of insurance coverage, although these differences did not achieve statistical significance at the P < .05 level. A diagnosis of depression was recorded 37% (P=.055) less often during visits by African Americans and 35% (P=.08) less often during visits by Medicaid patients. After controlling for age, a diagnosis of depression was recorded 35% (P=.07) more often during visits by Medicare patients than during visits by patients with private insurance. Large differences in rates at which a depression diagnosis was recorded were also observed by physician specialty. Family practice and general practice physicians were 65% (P <.001) more likely to record a diagnosis of depression than internists. Similar results were observed in the sensitivity analysis performed only on visits with 1 or 2 recorded diagnoses.
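The percentage differences reported above can be related to odds ratios with a short sketch. It assumes, as the Analysis section suggests but does not state outright, that each percentage is read from an adjusted odds ratio as (OR − 1) × 100. The helper names are hypothetical. Because a depression diagnosis is rare (about 2.4% of visits), an odds ratio is a close approximation to a ratio of rates here.

```python
def or_from_pct_change(pct):
    """Odds ratio implied by a stated percentage difference (assumed reading)."""
    return 1.0 + pct / 100.0

def pct_change_from_or(odds_ratio):
    """Percentage difference implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def or_over_interval(or_per_unit, units):
    """Compound a per-unit odds ratio over several units (e.g., minutes)."""
    return or_per_unit ** units

elderly_or = or_from_pct_change(-56)       # "56% less likely"
family_practice_or = or_from_pct_change(65)  # "65% more likely"
per_minute_or = or_from_pct_change(1)      # "1% more often" per minute

# Odds ratios compound multiplicatively, so ten extra minutes is
# 1.01 ** 10 rather than 1 + 10 * 0.01.
ten_minute_or = or_over_interval(per_minute_or, 10)
print(f"{pct_change_from_or(ten_minute_or):.1f}% more often over 10 minutes")
```

Under this reading, the per-minute effect is small for any single minute but accumulates over the range of visit lengths seen in practice.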
Discussion
Given that the prevalence of depression in epidemiologic studies is reported to approximate 12% to 18% in primary care practice,22,23 one would expect to see a depression diagnosis recorded more frequently than in 2.4% of office visits. Admittedly, depressed patients are likely to see their physicians for reasons other than their depression and may therefore not receive a depression diagnosis during each visit. Although reporting of depressive symptoms as the reason for the visit was an important determinant of whether or not a diagnosis of depression was recorded by the physician, there were several other nonclinical factors that predicted a depression diagnosis during visits to primary care physicians.
These findings show that the rate at which diagnoses of depression are recorded during office visits is influenced by factors other than symptom presentation. Sex and age were significantly associated with a depression diagnosis. Although the prevalence of depression is higher among women,14 the likelihood that a depression diagnosis was recorded should not have varied greatly by sex after controlling for the patient’s reason for the visit. Yet, this was the case. If a man and a woman both present to a primary care physician with the same symptoms, we found that a diagnosis of depression was more likely to be recorded during the visit made by a woman. Similarly, it appears that a diagnosis of depression was less likely to be recorded during visits made by older patients. During office visits by older persons, primary care physicians may simply attribute depressive symptoms to physical ailments or the normal aging process. However, it is also possible that older patients are more likely to report depressive symptoms that are actually due to other ailments than are younger patients.
African Americans were less likely to have a depression diagnosis recorded than were non-African Americans during visits to primary care physicians, even after controlling for mood disorder related symptoms. Primary care physicians possibly perceive African American patients to be stigmatized by a depression diagnosis more frequently than non-African American patients and thus choose not to assign them this diagnosis. It is also conceivable that primary care physicians do not assess physical and mood symptoms in African American patients as indicative of depression because of preconceptions about African American patients and their morbidities. The causes of racial differences in diagnosis rates cannot be determined from the NAMCS data set and warrant further study with different research strategies.
The duration of the visit was significantly associated with the probability that a depression diagnosis was recorded. Given that primary care physicians typically treat or monitor several conditions during a relatively short visit, it is not surprising that depression is recognized and diagnosed more often during longer visits. However, it may not be the case that depression was recognized because the visit was longer. It may be that visits of depressed patients just take longer. It is not possible to determine the causal relationship with these data. Again, further studies are needed of the physician diagnosis-making process.
Finally, a depression diagnosis was much more likely to be recorded during visits to family practice or general practice physicians than to internists. One may speculate that this occurs because the training of family/general practice physicians focuses more extensively on the identification and treatment of psychosocial problems than does the training of physicians who specialize in internal medicine. Only a third of training directors for internal medicine residencies were satisfied with the training received by their residents with regard to depression.24 Additionally, internists are much less likely to consider themselves responsible for treatment of depression than are family physicians.10 Although it is possible that the prevalence of depression is greater among patients treated by family/general practice physicians than among those treated by internists, differences in the true prevalence of depression among physician practices could not be ascertained using these data. However, controlling for patient symptoms should have accounted for much of the difference in prevalence.
Limitations
The study’s findings should be interpreted cautiously because of various limitations of the dataset. This analysis was based on a nationally representative sample of physician office visits in which a diagnosis of depression was recorded. The use of diagnoses that primary care physicians coded sets a threshold that is not equivalent to recognition that might be assessed by direct inquiry of the physicians. Also, since the NAMCS only allows for the recording of 3 diagnoses, the physician conceivably recognized depression but did not record it because a higher priority was assigned to 3 other diagnoses. This quite conceivably is occurring with regard to visits by elderly patients, who frequently experience multiple conditions. However, over 80% of visits by all subjects had only 1 or 2 diagnoses recorded during the visit, suggesting that in most cases, a depression diagnosis was not “crowded out.” Additionally, a sensitivity analysis conducted only on visits where 2 or fewer diagnoses were recorded during the visit found the same factors associated with a recorded depression diagnosis. The NAMCS data also allow for the recording of only 3 patient reasons for the visit. If a patient had more than 3 reasons for the visit, only the top 3, as identified by the physician, were recorded in the survey. This could lead to important patient symptoms being excluded from the survey. Thus, the analysis could not perfectly control for all the patients’ reasons for the visit, and this limitation should be kept in mind when interpreting these findings. Another limitation of the data is that the NAMCS does not record any history of depression, which might be an important clue for primary care physicians.
Conclusions
There are many factors associated with physician recording of a depression diagnosis beyond the patient’s reported symptoms. Therefore, if rates of diagnosis of depression in office-based practice are to more closely approximate the true prevalence of the disorder, interventions are needed that go beyond simply helping physicians to better recognize the symptoms of depression. A recent review found that approximately one fourth of interventions designed to increase recognition and management of depression had no effect on diagnosis and treatment rates.6 Perhaps their effectiveness could be improved by designing more focused interventions that target African American and elderly patients, who presently are assigned low rates of depressive diagnoses in primary care. This is a particularly high priority, since both African American and elderly patients are more likely to seek treatment in the primary care sector rather than the mental health specialty sector. Solberg and colleagues25 found that primary care physicians viewed systematic screening unfavorably, but were supportive of alternative approaches, such as external feedback about the care that they provide. Thus, feedback about differences in age- and race-specific rates could possibly provide the impetus needed for primary care physicians to alter their assessment procedures and clinical formulations in these under-recognized groups of patients. Finally, intervention efforts may want to focus on the unique manner in which internists formulate psychiatric diagnoses, since recognition rates for depression are unduly low in this specialty group.
Acknowledgments
This research was supported in part by National Institute of Mental Health grants P30 MH3095, P30 MH52247, R25 MH60473, K01 MH01613, and R01 MH59318.
STUDY DESIGN: We used a cross-sectional design.
POPULATION: Data from the 1997 and 1998 National Ambulatory Medical Care Surveys were examined.
OUTCOMES MEASURED: We assessed the association of factors such as age, sex, race, physician specialty, type of insurance, and visit duration with a recorded depression diagnosis during office visits to primary care physicians.
RESULTS: After controlling for symptom presentation, primary care physicians were 56% less likely to record a diagnosis of depression during visits made by elderly patients, 37% less likely to do so during visits by African Americans, and 35% less likely to do so during visits by Medicaid patients. Visits with a depression diagnosis were, on average, 2.9 minutes longer in duration (19.3 vs 16.4 minutes) than visits without a depression diagnosis. Family practice and general practice physicians were 65% more likely to record a diagnosis of depression than internists.
CONCLUSIONS: Many factors were associated with making and recording a depression diagnosis beyond the patient’s reported symptoms. If rates of diagnosis are to improve, interventions that go beyond getting physicians to recognize the symptoms of depression are needed.
- Receipt of a recorded depression diagnosis during office visits to primary care physicians is dependent on patient age, race, and type of insurance.
- Family practice and general practice physicians are more likely than internists to record a depression diagnosis during office visits.
- Many factors beyond the patient’s reported symptoms are associated with making and recording a depression diagnosis.
Characteristics and Depression Diagnosis
Depression is a common disorder that significantly affects quality of life, functioning, and even mortality.1-4 However, as indicated in the Surgeon General’s Report on Mental Health, depression remains under-recognized and underdiagnosed.5 Most studies examining recognition of depression have focused on the role of symptom presentation, the use of screening tools, and physician educational interventions designed to improve symptom recognition.6 However, factors other than clinical presentation may be associated with the likelihood that depression is recognized during a physician visit.7,8 For example, patient age and race, type of insurance, and duration of the visit may increase or decrease the rate at which a depression diagnosis is recorded. Also, diagnostic rates may differ between family or general practice physicians and internists. If differences in diagnostic rates indeed occur because of extraclinical factors and current interventions continue to focus primarily on recognition of patients’ symptoms, certain patient groups will continue to be underdiagnosed and undertreated.
Given this concern about the range of factors possibly associated with receiving a depression diagnosis, we examined data from a nationally representative sample of office visits to physicians, the National Ambulatory Medical Care Survey. More specifically, we examined the independent role of factors such as age, sex, race, type of insurance, and duration of the visit on the probability that depression would be diagnosed during a patient’s visit to a primary care physician. Although the prevalence of depression is greater in women, there should not be a large difference in the likelihood that a depression diagnosis is recorded during an office visit after controlling for the patient’s reason for encounter. Similarly, if primary care physicians are recording diagnoses of depression based solely on the patient’s reasons for encounter, the likelihood that a depression diagnosis is recorded should be similar by age, even though there is a reported lower prevalence of major depression in elderly persons (minor depression is believed to occur more frequently in the elderly).9 Admittedly, however, some of the somatic symptoms associated with depression (eg, fatigue) are more likely to be due to a physical illness rather than depression in elderly patients. Thus, rates of diagnoses should be slightly lower among elderly persons. However, because of primary care providers’ lack of confidence in assessing and diagnosing adults with depression1,10 and the tendency for older persons to present depressive symptoms in terms of somatic complaints,11,12 depression diagnoses are expected to be recorded much less frequently during visits by elderly persons, even after controlling for the patient’s reasons for the visit. Also, although African American patients have a lower reported prevalence and incidence rate of depression,13,14 one would expect depression diagnoses to be recorded at rates similar to those for other races after controlling for patient presentation of symptoms. Nevertheless, cultural stereotypes among providers may lead to depression diagnoses being recorded less frequently during these visits.15,16
With regard to practice factors affecting accurate diagnosis, since primary care physicians tend to schedule short patient visits and have many conditions to treat during those visits, we expected that the probability of a depression diagnosis being recorded would increase as the duration of the visit increased. Given competing demands for the physician’s awareness, depression often gets less attention during visits where the patient has a recent medical problem or even several of them.17 Finally, we expected family and general practice physicians to diagnose depression more often than internists. Family practice physicians express more responsibility for treating depression, tend to have more complete knowledge of available treatments, and are more confident in managing a mood disorder.10
Methods
Data
The study used data from the 1997 and 1998 National Ambulatory Medical Care Surveys (NAMCS). The NAMCS, which have been conducted every year since 1989 by the National Center for Health Statistics (NCHS), sample a nationally representative group of visits to physicians in office-based practices. The NCHS included weights in the NAMCS to enable the sample to represent all office visits in the United States. A detailed description of the NAMCS sample and sampling procedure, as well as a description of the survey instrument and survey administration procedures, is provided elsewhere.18,19
There were 24,715 visits sampled in 1997 and 23,339 visits sampled in 1998. For each office visit, the survey provided information on physician specialty, up to 3 diagnoses, and up to 3 patient reasons for the visit. Because there were fewer than 200 visits with a diagnosis of depression sampled in each year, we combined the data from 1997 and 1998 to increase the power of the analysis. We limited our analysis to the 17,058 visits made during this interval by adults 18 years and older to primary care physicians. Primary care physicians included physicians with specialties of family practice, general practice, or internal medicine. Item nonresponse rates in the NAMCS data are low (<5%), and the NCHS provides imputed values for any missing information on demographic variables and duration of the visit in the NAMCS data.19
Diagnostic Groups
Patients were categorized on the basis of diagnoses assigned by providers during the index visit, using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We classified depression visits as those with ICD-9 codes of 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent), 300.4 (neurotic depression), 311 (depressive disorder, not elsewhere classified), and 298.0 (depressive type psychosis).
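The classification rule above can be sketched as a small helper. This is an illustration rather than the authors' code: it assumes diagnoses arrive as ICD-9-CM strings with a decimal point (for example, "296.20"), whereas the public-use file may store them in a different layout, and the function names are hypothetical.

```python
# Code groups listed in the text; 311 has no decimal subdivision.
DEPRESSION_CODES = {"296.2", "296.3", "300.4", "311", "298.0"}

def is_depression_code(code):
    """True if an ICD-9-CM code falls in one of the listed depression groups.

    Matching uses the 3-digit category plus the first decimal digit, so
    "296.20" and "296.25" both match the 296.2 group.
    """
    category, _, detail = str(code).strip().partition(".")
    if category == "311":
        return True
    return f"{category}.{detail[:1]}" in DEPRESSION_CODES

def is_depression_visit(diagnoses):
    """True if any of the (up to 3) recorded diagnoses is a depression code."""
    return any(is_depression_code(code) for code in diagnoses if code)

# Hypothetical visit records: up to three diagnosis codes each
visits = [
    ["401.9", "296.32", None],   # hypertension plus recurrent major depression
    ["250.00", "272.4", "401.9"],
    ["311"],
]
flags = [is_depression_visit(v) for v in visits]
```

Matching on the category plus one decimal digit keeps fifth-digit severity codes inside their parent group without also capturing neighbors such as 296.4 (bipolar disorder).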
Patient and Visit Characteristics
Information on patient age, race, and ethnicity was recorded in the NAMCS survey, as was information on whether the visit was prepaid or fee-for-service and type of insurance coverage (eg, private, Medicaid, Medicare). The duration of the visit was also recorded. The survey reported physician specialty; we classified primary care physicians into 2 groups: family practice/general practice and internal medicine. The survey also indicated whether the physician had seen the patient previously. Information on up to 3 reasons for the visit, according to the patient, was collected in the survey at the time of the visit. Self-reported depressive symptoms were divided into 3 categories: (1) depressed mood, (2) physical symptoms of depression (eg, tiredness, general weakness or ill feeling, weight loss, restlessness, disturbance of sleep, abnormal appetite), and (3) other psychiatric symptoms associated with depression (eg, nervousness, fears and phobias, problems with self-esteem and identity, disturbance of memory, social adjustment problems, intentional self-mutilation, and suicidal ideation). The number of medications prescribed during the visit and the visit’s duration were recorded in the survey and used in the analysis.
Analysis
We sought to examine the role of patient and visit characteristics on the probability that a depression diagnosis was recorded during an office visit to a primary care physician. Specifically, we investigated the independent effect of factors such as age, race, sex, type of insurance, and duration of the visit on the probability of receiving a depression diagnosis, after controlling for patient-reported symptoms of depression, physician specialty, and other patient characteristics. Factors associated with having a depression diagnosis recorded were determined using weighted logistic regression models, and adjusted odds ratios and their 95% confidence intervals were calculated. Statistically significant differences in recognition rates were identified by reducing the sample weights by the proportion needed to downweight the sample to the size of a simple random sample with the same variance.20 Although this method did not address problems caused by clustering within strata, it produced results that tend to overcompensate rather than undercompensate for artifacts produced from stratification.21 Significant differences were identified by testing the coefficients using a c2 test.
A sensitivity analysis was performed. We were concerned that patients with multiple medical conditions may be less likely to have a depression diagnosis recorded in the NAMCS because the survey only allows for 3 recorded diagnoses, and because these patients may not be randomly distributed by age, sex, race, type of physician, and so forth. A weighted logistic regression analysis was conducted on the subset of visits that recorded only 1 or 2 diagnoses (N=14,135). This should eliminate visits in which depression was recognized but a diagnosis was not recorded because 3 other conditions were perceived to be more important by the physician. The results of this analysis were then compared with results based on the full sample.
Results
Of the 17,058 visits made by adults to primary care physicians included in the 1997-1998 NAMCS samples, 358 visits included a diagnosis of depression Table 1. Therefore, using the weights provided by the NCHS, we estimated there were 20.2 million office visits to primary care physicians with a recorded diagnosis of depression in 1997 and 1998. This represented 2.4% of all visits to primary care physicians. The rate at which depression was diagnosed, however, varied significantly by several patient and visit characteristics, according to results from the multivariate analysis.
As we postulated, the data in Table 2 indicate that the probability of a diagnosis of depression’s being recorded during an office visit is significantly related to the patient’s reason for the visit, with depression being diagnosed over 40 times more often during visits where the patient reported depression as a reason for the visit. Also, a depression diagnosis was 3.4 times more likely to be recorded if the patient reported physical symptoms of depression as a reason for the visit and 4.9 times more likely if the patient reported other psychiatric symptoms associated with depression as a reason for the visit. However, even after controlling for the reasons for the visit, significant differences in the rate of depression diagnoses were observed by age, gender, and duration of the visit. Primary care physicians were 56% less likely to diagnose depression during visits made by elderly patients. Depression diagnoses were recorded more frequently during visits made by women, even after controlling for the patient’s reasons for the visit. Although the results are not reported in Table 2, we also questioned whether significant interactions of age with sex, race, or ethnicity were evident. We found a significant interaction of age and sex, demonstrating that elderly women were less likely to be considered depressed than elderly men (P=.01). Duration of the visit was also significantly associated with the rate at which depression diagnoses were recorded, with such diagnoses being recorded 1% more often for each additional minute that an office visit lasts. Visits during which a diagnosis of depression was recorded averaged 19.3 minutes, compared with 16.4 minutes for visits in which this diagnosis was not reported.
Differences in the rate at which depressive diagnoses were recorded were also observed by race and type of insurance coverage, although these differences did not achieve statistical significance at the P less than .05 level. A diagnosis of depression was recorded 37% (P=.055) less often during visits by African Americans and 35% (P=.08) less often during visits by Medicaid patients. After controlling for age, a diagnosis of depression was recorded 35% (P=.07) more often during visits by Medicare patients than with patients with private insurance. Large differences in rates at which a depression diagnosis was recorded were also observed by physician specialty. Family practice and general practice physicians were 65% (P <.001) more likely to record a diagnosis of depression than internists. Similar results were observed in the sensitivity analysis performed only on visits with 1 or 2 recorded diagnoses.
Discussion
Given that the prevalence of depression in epidemiologic studies is reported to approximate 12% to 18% in primary care practice,22,23 one would expect to see a depression diagnosis recorded more frequently than in 2.4% of office visits. Admittedly, depressed patients are likely to see their physicians for reasons other than their depression and may therefore not receive a depression diagnosis during each visit. Although reporting of depressive symptoms as the reason for the visit was an important determinant of whether or not a diagnosis of depression was recorded by the physician, there were several other nonclinical factors that predicted a depression diagnosis during visits to primary care physicians.
These findings show that the rate at which diagnoses of depression are recorded during office visits is influenced by factors other than symptom presentation. Sex and age were significantly associated with a depression diagnosis. Although the prevalence of depression is higher among women,14 the likelihood that a depression diagnosis was recorded should not have varied greatly by sex after controlling for the patient’s reason for the visit. Yet, this was the case. If a man and a woman both present to a primary care physician with the same symptoms, we found that a diagnosis of depression was more likely to be recorded during the visit made by a woman. Similarly, it appears that a diagnosis of depression was less likely to be recorded during visits made by older patients. During office visits by older persons, primary care physicians may simply attribute depressive symptoms to physical ailments or the normal aging process. However, it is also possible that older patients are more likely to report depressive symptoms that are actually due to other ailments than are younger patients.
African Americans were less likely to have a depression diagnosis recorded than were non-African Americans during visits to primary care physicians, even after controlling for mood disorder related symptoms. Primary care physicians possibly perceive African American patients to be stigmatized by a depression diagnosis more frequently than non-African American patients and thus choose not to assign them this diagnosis. It is also conceivable that primary care physicians do not assess physical and mood symptoms in African American patients as indicative of depression because of preconceptions about African American patients and their morbidities. The causes of racial differences in diagnosis rates cannot be determined from the NAMCS data set and warrant further study with different research strategies.
The duration of the visit had a significant effect on the probability that a depression diagnosis was recorded. Given that primary care physicians typically treat or monitor several conditions during a relatively short visit, it is not surprising that depression is recognized and diagnosed more often during longer visits. However, it may not be the case that depression was recognized because the visit was longer. It may be that visits of depressed patients just take longer. It is not possible to determine the causal relationship with this data. Again, further studies are needed of the physician diagnosis-making process.
Finally, a depression diagnosis was much more likely to be recorded during visits to family practice or general practice physicians than to internists. One may speculate that this occurs because the training of family/general practice physicians focuses more extensively on the identification and treatment of psychosocial problems than does the training of physicians who specialize in internal medicine. Only a third of training directors for internal medicine residencies were satisfied with the training received by their residents with regard to depression.24 Additionally, internists are much less likely to consider themselves responsible for treatment of depression than are family physicians.10 Although it is possible that the prevalence of depression is greater among patients treated by family/general practice physicians than internists, differences in the true prevalence of depression among physician practices could not be ascertained using this data. However, controlling for patient symptoms should have accounted for much of the difference in prevalence.
Limitations
The study’s findings should be interpreted cautiously because of various limitations of the dataset. This analysis was based on a nationally representative sample of physician office visits in which a diagnosis of depression was recorded. The use of diagnoses that primary care physicians coded sets a threshold that is not equivalent to recognition that might be assessed by direct inquiry of the physicians. Also, since the NAMCS only allows for the recording of 3 diagnoses, the physician conceivably recognized depression but did not record it because a higher priority was assigned to 3 other diagnoses. This quite conceivably is occurring with regard to visits by elderly patients who frequently experience multiple conditions. However, over 80% of visits by all subjects only had 1 or 2 diagnoses recorded during the visit, suggesting that in most cases, a depression diagnosis was not “crowded out.” Additionally, a sensitivity analysis conducted only on visits where 2 or fewer diagnoses were recorded during the visit found the same factors associated with a recorded depression diagnosis. The NAMCS data also only allows for the recording of 3 patient reasons for the visit. If a patient had more than 3 reasons for the visit, only the top 3, as identified by the physician, were recorded in the survey. This could lead to important patient symptoms being excluded from the survey. Thus, the analysis could not perfectly control for all the patients’ reasons for the visit, and this limitation should be kept in mind when interpreting these findings. Another limitation of the data is that no assessment of history of depression that might be an important clue for primary care physicians is recorded in the NAMCS survey.
Conclusions
Many factors beyond the patient’s reported symptoms are associated with physician recording of a depression diagnosis. Therefore, if rates of diagnosis of depression in office-based practice are to more closely approximate the true prevalence of the disorder, interventions are needed that go beyond simply helping physicians to better recognize the symptoms of depression. A recent review found that approximately one fourth of interventions designed to increase recognition and management of depression had no effect on diagnosis and treatment rates.6 Perhaps their effectiveness could be improved by designing more focused interventions that target African American and elderly patients, who presently are assigned low rates of depressive diagnoses in primary care. This is a particularly high priority, since both African American and elderly patients are more likely to seek treatment in the primary care sector than in the mental health specialty sector. Solberg and colleagues25 found that primary care physicians viewed systematic screening unfavorably but were supportive of alternative approaches, such as external feedback about the care that they provide. Thus, feedback about differences in age- and race-specific rates could possibly provide the impetus needed for primary care physicians to alter their assessment procedures and clinical formulations in these under-recognized groups of patients. Finally, intervention efforts may want to focus on the unique manner in which internists formulate psychiatric diagnoses, since recognition rates for depression are unduly low in this specialty group.
Acknowledgments
This research was supported in part by National Institute of Mental Health grants P30 MH3095, P30 MH52247, R25 MH60473, K01 MH01613, and R01 MH59318.
1. Unutzer J, Katon W, Sullivan M, Miranda J. Treating depressed older adults in primary care: narrowing the gap between efficacy and effectiveness. Milbank Q 1999;77:225-56.
2. Penninx W, Penninx H, Guralnik J, et al. Depressive symptoms and physical decline in community dwelling older persons. JAMA 1998;279:1720-26.
3. Penninx B, Geerlings S, Deeg D, van Eijk J, van Tilburg W, Beekman A. Minor and major depression and the risk of death in older persons. Arch Gen Psychiatry 1999;56:889-95.
4. Rovner B, German P, Brant L, Clark R, Burton L, Folstein M. Depression and mortality in nursing homes. JAMA 1991;265:993-96.
5. US Department of Health and Human Services. Mental health: a report of the surgeon general. Rockville, Md: US Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Center for Mental Health Services, National Institutes of Health, National Institute of Mental Health; 1999.
6. Kroenke K, Taylor-Vaisey A, Dietrich AJ, Oxman TE. Interventions to improve provider diagnosis and treatment of mental disorders in primary care: a critical review of the literature. Psychosomatics 2000;41:39-52.
7. Klinkman M, Coyne J, Gallo S, Schwenk T. False positives, false negatives, and the validity of the diagnosis of major depression in primary care. Arch Fam Med 1998;7:451-61.
8. Rost K, et al. The deliberate misdiagnosis of major depression in primary care. Arch Fam Med 1994;3:333-37.
9. Eaton W, Anthony J, Gallo J, et al. Natural history of Diagnostic Interview Schedule/DSM-IV major depression: the Baltimore Epidemiologic Catchment Area Follow-up. Arch Gen Psychiatry 1997;54:993-99.
10. Williams JW, Rost K, Dietrich AJ, Ciotti MC, Zyzanski SJ, Cornell J. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med 1999;8:58-67.
11. Caine E, Lyness J, King D, Connors L. Clinical and etiological heterogeneity of mood disorders in elderly patients. In: Schneider L, Reynolds C, Lebowitz B, Friedhoff A, eds. Diagnosis and treatment of depression in late life: results of the NIH Consensus Development Conference. Washington, DC: American Psychiatric Association; 1994;21-54.
12. Gallo J, Rabins P, Anthony J. Sadness in older persons: 13-year follow-up of a community sample in Baltimore, Maryland. Psychol Med 1999;29:341-50.
13. Gallo J, Royall D, Anthony J. Risk factors for the onset of major depression in middle age and late life. Soc Psychiatry Psych Epidemiol 1993;28:101-08.
14. Kessler R, McGonagle K, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry 1994;51:8-19.
15. Gallo J, Cooper-Patrick L, Lesikar S. Depressive symptoms of whites and African Americans aged 60 years and older. J Gerontol: Psychol Sci 1998;53B:277-86.
16. Cooper-Patrick L, Gallo J, Gonzalez J, et al. Race, gender, and partnership in the patient-physician relationship. JAMA 1999;37:1034-45.
17. Rost K, Nutting P, Smith J, Coyne JC, Cooper-Patrick L, Rubenstein L. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000;9:150-54.
18. Bryant E, Shimizu I. Sampling design, sampling variance, and estimation procedures for the National Ambulatory Medical Care Survey. Vital Health Stat 2 1988;108:1-39.
19. Woodwell DA. National Ambulatory Medical Care Survey: 1998 summary. Advance data from vital and health statistics. Hyattsville, Md: National Center for Health Statistics; 2000.
20. Potthoff R, Woodbury M, Manton K. ‘Equivalent sample size’ and ‘equivalent degrees of freedom’ refinements for inference using survey weights under superpopulation models. J Am Stat Assoc 1992;87:383-96.
21. Leaf P, Myers J, McEvoy L. Procedures used in the epidemiologic catchment area study. In: Robins L, Regier D, eds. Psychiatric Disorders of America: The Epidemiologic Catchment Area Study. New York, NY: The Free Press; 1991.
22. Brown C, Schulberg HC. Diagnosis and treatment of depression in primary medical care practice: the application of research findings to clinical practice. J Clin Psychol 1998;54:303-14.
23. Olfson M, Shea S, Feder A, et al. Prevalence of anxiety, depression, and substance use disorders in an urban general medicine practice. Arch Fam Med 2000;9:876-83.
24. Sullivan M, Cole S, Gordon G, Hahn S, Kathol R. Psychiatric training in medicine residencies: current needs, practices and satisfaction. Gen Hosp Psychiatry 1996;18:95-101.
25. Solberg L, Korsen N, Oxman T, Fischer L, Bartels S. The need for a system in the care of depression. J Fam Pract 1999;48:973-79.
Why Some Cancer Patients Choose Complementary and Alternative Medicine Instead of Conventional Treatment
STUDY DESIGN: This was a qualitative interview study.
POPULATION: Fourteen cancer survivors who reported having declined all or part of the recommended conventional treatment (surgery, chemotherapy, or radiation) were included. The participants were a subset from a multi-ethnic (Asian, Native Hawaiian, and white) group of 143 adults with cancer in 1995 or 1996 who were recruited through a population-based tumor registry and interviewed about CAM.
OUTCOMES MEASURED: We performed semistructured interviews regarding experience with conventional cancer treatment and providers, use of CAM, and beliefs about disease.
RESULTS: All participants used 3 or more types of CAM, most commonly herbal or nutritional supplements. Across the board, participants stated that their reason for declining conventional treatment was to avoid damage or harm to the body. The majority of participants also felt that conventional treatment would not make a difference in disease outcome, and some but not all participants perceived an unsatisfactory or alienating relationship with health care providers. Some participants reported that their discovery of CAM contributed to their decision to decline conventional treatment, and participants generally perceived CAM as an effective and less harmful alternative to conventional treatment.
CONCLUSIONS: Cancer patients may benefit from interventions (eg, patient education, improvements in physician-patient communication, and psychological therapy) to facilitate treatment decision making through increased understanding of conventional and CAM treatments and to identify barriers to treatment for individual patients.
- Factors expressed by participants as influencing the decision to decline conventional cancer treatment included: beliefs about harm, possible death and side effects, and the belief in or discovery of CAM as an effective alternative.
- Participants found CAM to be more effective and less harmful than conventional treatment.
- Participants gave sources of evidence for effectiveness of CAM: personal, medical, anecdotal, and belief.
- Participants reported positive or neutral interactions with health care providers regarding their use of CAM.
- Participants reported negative interactions or possible missing communication with health care providers as being factors in their decision to decline conventional treatment.
Although noncompliance or refusal of cancer treatment is a serious concern and has been shown to reduce the effectiveness of treatment and decrease the length of survival after diagnosis,1-4 the phenomenon itself has been scarcely studied. Existing studies report rates of less than 1% for patients refusing all treatment,4 12.5% for patients refusing chemotherapy,5 and 20% for patients refusing treatment for hematologic malignancy.6 Possible reasons for noncompliance have been proposed, including patients’ fear of the adverse side effects of cancer treatment, uncertainty, hopelessness, loss of control, denial of illness, psychiatric disorders, patient-physician relationship and communication issues, and medical systems dysfunctions.4,5,7-10
It has been hypothesized that individuals who choose complementary and alternative medicine (CAM) are more likely to forgo medical treatment than other patients.11 However, studies among noncancer populations have found that only a small percentage (between 3% and 4%) rely primarily on CAM.12-14 The few studies reporting rates of treatment refusal among cancer populations have found higher percentages (between 8% and 20%) of patients using CAM exclusively or ceasing conventional treatment in favor of CAM,15,16 but the reasons for these decisions are unclear. Primary reliance on CAM for a variety of noncancer disorders was found in one study to be associated with distrust of or dissatisfaction with conventional medicine and physicians, as well as the need to seek control over health.12 Some speculate that because of the extreme nature of most standard cancer treatment, patients may decline medical care in favor of CAM therapies that have few or no side effects.15,17,18
In a recent qualitative study of 8 Canadian cancer patients who abandoned biomedical treatment in favor of CAM, Montbriand19 found themes of anger and fear, need for control, belief in CAM as a cure, social support for CAM, cost considerations, and mystical insights into health care. This study provided an initial understanding of the concerns of cancer patients who refuse conventional treatment and choose CAM, but is limited by its small, homogeneous sample. More diverse samples are needed to cross-validate Montbriand’s findings and to uncover additional reasons. In the following study we describe themes that emerged from interviews with a multiethnic group of 14 participants as they discuss their reasons for declining conventional cancer treatment and choosing CAM.
Methods
Recruitment
The participants in this analysis were initially surveyed by mail as part of a larger study investigating ethnic differences in alternative medicine use among cancer patients in 1995 or 1996 in Hawaii and identified through a population-based tumor registry.20 Among those who returned the survey (n=1168), 439 (32%) volunteered to be interviewed. Because we were primarily interested in the diversity of experiences of CAM users, a heterogeneous group of 143 interview subjects was selected on the basis of CAM use, geographic areas, ethnicity, and cancer site. For this analysis, we included only those interview participants (n=14) who reported declining all or part of conventional treatment for cancer while simultaneously using CAM.
The mean age of participants was 52.5 (standard deviation = 14.1; range = 43-92), 9 were women, and 6 were married. The participants were white (9) or Asian or Pacific Islander (5; Chinese, Filipino, Japanese, or Native Hawaiian). Participants were well educated, with the majority having past or present professional, managerial, or technical occupations. Five were retired at the time of the interview. Eight of the participants had breast cancer, and the rest had gastrointestinal cancer (3), prostate cancer (2), or skin cancer (1). Most of the participants had localized disease. The stage of disease was unknown for 4 participants, because they had declined procedures (eg, lymph node excision, exploratory surgery) to determine stage. Six participants reported that they had refused all conventional treatment (3 localized disease and 3 unstaged). Five participants reported undergoing surgery for the cancer but rejected all further treatment. Three participants had surgery and chemotherapy or radiation but reported refusing further treatment (eg, second surgery) that their physician considered necessary.
Procedure
Three human subjects research committees approved the research protocol. One- to 2-hour tape-recorded interviews were conducted in person at the participant’s home or another location in late 1998 or early 1999. All participants were compensated with a $20 gift certificate, and all gave signed informed consent.
Outcome measures
The semistructured interviews covered (a) demographics, (b) satisfaction with health care providers, (c) conventional treatments received for cancer and satisfaction, (d) types of CAM used for cancer and satisfaction, and (e) perceptions about cancer and cancer treatments.
After reading all the interview transcripts, the research team engaged in an iterative process in which we coded the text according to the nature of the information, developed hypotheses, and then translated the coding into categories.21 Responses were coded using NUD*IST 4,22 a software package for qualitative analysis. We assigned codes for: (a) reasons for rejecting conventional treatments, (b) types of CAM used, (c) reasons for choosing CAM, (d) beliefs about CAM’s effectiveness, and (e) communication with physician. We included quantitative data (ie, demographics, disease characteristics, and types of CAM used) from the survey and from the tumor registry as a triangulation technique21 and to aid in describing the sample.
Results
All 14 participants used 3 or more types of CAM (max=14; median=8; Table 1), and all took some herbal or botanical supplement; 11 reported diet changes, and 7 used meditation or relaxation. Two participants attended CAM cancer clinics for intravenous therapy. One participant worked with a native Hawaiian healer, with whom she learned to gather and prepare traditional herbal remedies.
Three broad categories of themes emerged in the analysis: (1) beliefs about conventional treatment, (2) interactions with treatment providers, and (3) beliefs about CAM as an alternative to conventional treatment. Participants’ supporting quotes are shown in Table 2 and Table 2a.
Beliefs About Conventional Treatment
Conventional Treatment Is Harmful. When asked to describe their reasons for declining conventional cancer treatment, participants described many ways that chemotherapy and radiation were harmful, including damaging cells, weakening the immune system, or inhibiting recovery. In the extreme, participants believed that conventional treatment would be fatal for them. Those who declined either a first (n=6) or a second (n=2) surgery commonly expressed concerns about mutilation (being “cut”) and the debilitating effects of surgery. A number of participants mentioned concerns that conventional treatment would increase their risk of future cancer. Participants also mentioned being deterred from conventional treatment by possible side effects, previous negative experience with a treatment, or knowing someone who died from the treatment.
Conventional Treatment Will Not Improve Outcome. Several patients expressed that conventional treatment was not likely to make a difference in disease outcome, either because of limitations inherent in conventional treatment or because of the particular characteristics of their disease. Often, the participants cited their belief that conventional treatment offered no complete guarantee for a cure. Although none of the participants disputed the validity of their cancer diagnosis, a few participants believed that cancer treatment was unnecessary because the cancer had been eliminated by initial treatment. One participant proposed that fate, not treatment, would decide her disease outcome.
Interactions with Treatment Providers
Nearly all participants (12 out of 14) stated that they had informed at least one of their physicians about CAM, and 2 had not. Nine respondents reported that their physicians were either supportive or neutral about their use of CAM. In the context of participants’ decision making about conventional treatment, participants expressed that they felt physicians could not be trusted, that physicians did not listen to their needs, and that medical professionals were hostile or threatening about participants’ treatment choices. Participants’ responses also indicated possible missed chances for communication between patient and physician about both conventional treatment and CAM. A minority of participants described feeling alienated from the medical community.
Beliefs About CAM as Alternative to Conventional Treatment
CAM Contributed to Decision to Decline. The perception that CAM offered a feasible alternative to conventional treatment appeared to assist participants in making the decision to go against their physicians’ recommendations. In 6 cases, the actual decision to refuse conventional treatment appeared to be facilitated by the discovery or knowledge of CAM.
CAM Is Better than Conventional Treatment. In many cases, the CAM choice was perceived to be considerably less aversive than the conventional treatment option or was perceived to make more “intuitive” sense. A common viewpoint expressed by participants was that conventional treatment and CAM have different methods and purposes. Participants pointed out that CAM works with the body’s own resources in a natural way to promote healing, while conventional treatment is short-sighted and merely attacks the symptom without addressing underlying imbalances.
CAM Is Effective. In choosing CAM as an alternative to conventional treatment, the participants stated that they were satisfied with CAM’s effectiveness and described sources of evidence for this, including personal evidence (most frequently cited), medical and anecdotal evidence, and belief. Participants’ personal experience of continuing to be alive, feeling well, or having subjective improvement in symptoms was proof for them that a particular CAM treatment worked. Participants also used medical evidence (eg, PSA tests or mammography) to demonstrate that their condition was improved and attributed this to the CAM. Anecdotal evidence based on others’ reported benefits from CAM was sufficient for at least one participant to state that she felt CAM was effective. A number of participants stated that they did not have any demonstrable evidence of the effectiveness of CAM, such as improved symptoms or medical evidence, but that they nonetheless continued to believe that CAM was working for them. Participants’ reasoning included statements about how the particular CAM made logical sense to them and therefore “must work,” or that they had a long history of belief in the benefits of CAM. Only one participant admitted that she was not sure if CAM had helped her.
Discussion
A predominant theme in our analysis was the finding that participants perceived CAM to be a harmless, natural, and effective alternative to the damaging effects of conventional cancer treatment. In the participants’ views, conventional treatment offered no guarantee of a cure, while guaranteeing almost certain harm and for some, possible death. Participants felt that CAM had a positive effect on their overall health and, with a few exceptions, participants were confident in CAM’s ability to cure their cancer or prevent recurrence. The quality of physician/patient communication was also a factor in the decision of participants to decline conventional treatment. While participants reported both positive and negative experiences with medical staff, the more negative perceptions, including distrust, lack of response, and perceived hostility from health care providers, possibly caused further alienation between participants and the medical community.
A study by Astin reported similar predictors for primary reliance on CAM in the general population (lack of trust and dissatisfaction with conventional treatment and providers, and belief in the efficacy of CAM).12 Astin also observed that CAM was perceived as promoting health, while conventional treatment focused on the illness, a belief expressed by several of our participants. While the desire for control over health was a predictor in Astin’s study, this did not emerge as a theme in our analysis.
Our analysis provides cross-validity evidence with an ethnically diverse sample for several themes observed by Montbriand19 (difficulty in communication with health care providers, previous negative experiences with medical care, belief in a cure from CAM, and lack of hope for a cure offered by biomedical therapies). Montbriand’s themes of expressed stress, the need of patients to take control of treatment, and mystical insights into health care also appear to have some similarities to our results, while the influence of social support and cost considerations on CAM use were not as evident in our analysis. Also, unlike the Montbriand study, our participants reported supportive as well as negative health care interactions regarding CAM use, sources of evidence for CAM’s effectiveness (personal, medical, anecdotal, and belief), and the belief that CAM offered an opportunity to avoid the harmful effects of conventional treatment.
The preceding analysis is qualitative and based on the self-report of a small sample of 14 participants. Generalizability of the findings is therefore limited. However, the use of a qualitative method allowed investigation of a relatively rare population (cancer treatment decliners) that is seldom studied. The results are also limited by the fact that participants were primarily cancer survivors in relatively good health.
Future research should include participants with more advanced cancers, as compliance with treatment may be dependent on the patients’ expectation of the likely progression of their disease.23
Our findings have a number of clinical implications. Given some of the examples of interactions with medical professionals, it is possible that the participants did not fully understand their treatment options, including their chances of experiencing serious or debilitating consequences of conventional treatment, and may have overestimated such consequences. A better understanding of individual patients’ concerns about conventional treatment can guide health care professionals in framing recommendations when talking to patients. While patients should be made as aware as possible of the pros and cons of all options for cancer treatment, including conventional methods, CAM, or no treatment, patient education efforts alone are not sufficient. Our findings, as well as those of Montbriand,19 indicate that fear and anxiety may be issues for patients who decline conventional treatment in favor of CAM. Some patients may require psychological and health behavior interventions aimed at improved adjustment and better coping with cancer, as well as addressing the motivational and emotional barriers to compliance. Finally, treatment decision making is an ongoing process; treatment decliners may choose conventional cancer treatment at a later date if given the adequate support, information, and time necessary to make the decision.23 Even if patients have declined oncologic care, they may continue to see their primary care and family physicians. Patients need to feel that they have not been permanently excluded from the health care system even if they make choices that are contrary to the recommendations of their medical team.
Acknowledgments
We want to thank all participants for taking the time and effort to respond to our questionnaire and to participate in the interviews. The help of Marc Goodman, PhD, and the staff of the Hawaii Tumor Registry is greatly appreciated. We would also like to thank our research team, including Professor Thomas Maretzki, Yvonne Tatsumura, Katsuya Tasaki, Tammy Brown, Carole Prism, and David Henderson, for their help with transcription and analysis. This research was supported by a special study grant from the National Cancer Institute, Surveillance, Epidemiology, and End Results program under contract number N01-PC67001.
1. Hoagland AC, Morrow GR, Bennett JM, Carnrike CL, Jr. Oncologists’ views of cancer patient noncompliance. Am J Clin Onc 1983;6:239-44.
2. Li BD, Brown WA, Ampil FL, Burton GV, Yu H, McDonald JC. Patient compliance is critical for equivalent clinical outcomes for breast cancer treated by breast-conservation therapy. Ann Surg 2000;231:883-89.
3. Bonadonna G, Valagussa P. Dose-response effect of adjuvant chemotherapy in breast cancer. N Engl J Med 1981;304:10-15.
4. Huchcroft SA, Snodgrass T. Cancer patients who refuse treatment. Cancer Causes Cont 1993;4:179-85.
5. Levin M, Mermelstein H, Rigberg C. Factors associated with acceptance or rejection of recommendation for chemotherapy in a community cancer center. Cancer Nurs 1999;22:246-50.
6. Evans SH, Clarke P. When cancer patients fail to get well: flaws in health communication. Beverly Hills, Calif: Sage Publications; 1983;225-48.
7. Richardson JL, Sanchez K. Compliance with cancer treatment. In: Holland JC, ed. Psychoonc. New York, NY: Oxford University Press; 1998;67-77.
8. Kunkel EJ, Woods CM, Rodgers C, Myers RE. Consultations for ‘maladaptive denial of illness’ in patients with cancer: psychiatric disorders that result in noncompliance. Psychoonc 1997;6:139-49.
9. Goldberg RJ. Systematic understanding of cancer patients who refuse treatment. Psychother Psychosom 1983;39:180-89.
10. Appelbaum PS, Roth LH. Patients who refuse treatment in medical hospitals. JAMA 1983;250:1296-301.
11. Lowenthal RM. Alternative cancer treatments. Med J Aust 1996;165:536-37.
12. Astin JA. Why patients use alternative medicine: results of a national study. JAMA 1998;279:1548-53.
13. Eisenberg DM, Kessler RC, Foster C, Norlock FE, Calkins DR, Delbanco TL. Unconventional medicine in the United States: prevalence, costs, and patterns of use. N Engl J Med 1993;328:246-52.
14. Eisenberg DM, Davis RB, Ettner SL, et al. Trends in alternative medicine use in the United States, 1990-1997: results of a follow-up national survey. JAMA 1998;280:1569-75.
15. Cassileth BR, Lusk EJ, Strouse TB, Bodenheimer BJ. Contemporary unorthodox treatments in cancer medicine: a study of patients, treatments, and practitioners. Ann Intern Med 1984;101:105-12.
16. Lerner IJ, Kennedy BJ. The prevalence of questionable methods of cancer treatment in the United States. CA Cancer J Clin 1992;42:181-91.
17. Jenkins CA, Scarfe A, Bruera E. Integration of palliative care with alternative medicine in patients who have refused curative cancer therapy: a report of two cases. J Pall Care 1998;14:55-59.
18. Downer SM, Cody MM, McCluskey P, et al. Pursuit and practice of complementary therapies by cancer patients receiving conventional treatment. BMJ 1994;309:86-89.
19. Montbriand MJ. Abandoning biomedicine for alternate therapies: oncology patients’ stories. Cancer Nurs 1998;21:36-45.
20. Maskarinec G, Shumay DM, Kakai H, Gotay CC. Ethnic differences in complementary and alternative medicine use among cancer patients. J Altern Complement Med 2000;6:531-38.
21. Bogdan R, Biklin S. Qualitative research in education. Boston, Mass: Allyn and Bacon; 1998.
22. Qualitative Solutions and Research Pty Ltd. QSR NUD*IST 4 user guide. Australia: Sage Publications; 1997.
23. Gotay CC, Bultz BD. Patient decision making inside and outside the cancer care system. J Psychosoc Onc 1986;4:105-14.
24. Cassileth BR. The alternative medicine handbook: the complete reference guide to alternative and complementary therapies. New York, NY: W.W. Norton & Company Inc; 1998.
STUDY DESIGN: This was a qualitative interview study.
POPULATION: Fourteen cancer survivors who reported having declined all or part of the recommended conventional treatment (surgery, chemotherapy, or radiation) were included. The participants were a subset of a multiethnic (Asian, Native Hawaiian, and white) group of 143 adults with cancer in 1995 or 1996 who were recruited through a population-based tumor registry and interviewed about complementary and alternative medicine (CAM).
OUTCOMES MEASURED: We performed semistructured interviews regarding experience with conventional cancer treatment and providers, use of CAM, and beliefs about disease.
RESULTS: All participants used 3 or more types of CAM, most commonly herbal or nutritional supplements. Across the board, participants stated that their reason for declining conventional treatment was to avoid damage or harm to the body. The majority of participants also felt that conventional treatment would not make a difference in disease outcome, and some but not all participants perceived an unsatisfactory or alienating relationship with health care providers. Some participants reported that their discovery of CAM contributed to their decision to decline conventional treatment, and participants generally perceived CAM as an effective and less harmful alternative to conventional treatment.
CONCLUSIONS: Cancer patients may benefit from interventions (eg, patient education, improvements in physician-patient communication, and psychological therapy) to facilitate treatment decision making through increased understanding of conventional and CAM treatments and to identify barriers to treatment for individual patients.
- Factors expressed by participants as influencing the decision to decline conventional cancer treatment included: beliefs about harm, possible death and side effects, and the belief in or discovery of CAM as an effective alternative.
- Participants found CAM to be more effective and less harmful than conventional treatment.
- Participants gave sources of evidence for effectiveness of CAM: personal, medical, anecdotal, and belief.
- Participants reported positive or neutral interactions with health care providers regarding their use of CAM.
- Participants reported negative interactions or possible missed communication with health care providers as factors in their decision to decline conventional treatment.
Although noncompliance or refusal of cancer treatment is a serious concern and has been shown to reduce the effectiveness of treatment and decrease the length of survival after diagnosis,1-4 the phenomenon itself has been scarcely studied. Existing studies report rates of less than 1% for patients refusing all treatment,4 12.5% for patients refusing chemotherapy,5 and 20% for patients refusing treatment for hematologic malignancy.6 Possible reasons for noncompliance have been proposed, including patients’ fear of the adverse side effects of cancer treatment, uncertainty, hopelessness, loss of control, denial of illness, psychiatric disorders, patient-physician relationship and communication issues, and medical systems dysfunctions.4,5,7-10
It has been hypothesized that individuals who choose complementary and alternative medicine (CAM) are more likely to forgo medical treatment than other patients.11 However, studies among noncancer populations have found that only a small percentage (between 3% and 4%) rely primarily on CAM.12-14 The few studies reporting rates of treatment refusal among cancer populations have found higher percentages (between 8% and 20%) of patients using CAM exclusively or ceasing conventional treatment in favor of CAM,15,16 but the reasons for these decisions are unclear. Primary reliance on CAM for a variety of noncancer disorders was found in one study to be associated with distrust of or dissatisfaction with conventional medicine and physicians, as well as the need to seek control over health.12 Some speculate that because of the extreme nature of most standard cancer treatment, patients may decline medical care in favor of CAM therapies that have few or no side effects.15,17,18
In a recent qualitative study of 8 Canadian cancer patients who abandoned biomedical treatment in favor of CAM, Montbriand19 found themes of anger and fear, need for control, belief in CAM as a cure, social support for CAM, cost considerations, and mystical insights into health care. This study provided an initial understanding of the concerns of cancer patients who refuse conventional treatment and choose CAM, but is limited by its small, homogeneous sample. More diverse samples are needed to cross-validate Montbriand’s findings and to uncover additional reasons. In the following study we describe themes that emerged from interviews with a multiethnic group of 14 participants as they discuss their reasons for declining conventional cancer treatment and choosing CAM.
Methods
Recruitment
The participants in this analysis were initially surveyed by mail as part of a larger study investigating ethnic differences in alternative medicine use among patients in Hawaii with cancer in 1995 or 1996, who were identified through a population-based tumor registry.20 Among those who returned the survey (n=1168), 439 (32%) volunteered to be interviewed. Because we were primarily interested in the diversity of experiences of CAM users, a heterogeneous group of 143 interview subjects was selected on the basis of CAM use, geographic area, ethnicity, and cancer site. For this analysis, we included only those interview participants (n=14) who reported declining all or part of conventional treatment for cancer while simultaneously using CAM.
The mean age of participants was 52.5 (standard deviation = 14.1; range = 43-92), 9 were women, and 6 were married. The participants were white (9) or Asian or Pacific Islander (5; Chinese, Filipino, Japanese, or Native Hawaiian). Participants were well educated, with the majority having past or present professional, managerial, or technical occupations. Five were retired at the time of the interview. Eight of the participants had breast cancer, and the rest had gastrointestinal cancer (3), prostate cancer (2), or skin cancer (1). Most of the participants had localized disease. The stage of disease was unknown for 4 participants because they had declined procedures (eg, lymph node excision, exploratory surgery) to determine stage. Six participants reported that they had refused all conventional treatment (3 localized disease and 3 unstaged). Five participants reported undergoing surgery for the cancer but rejected all further treatment. Three participants had surgery and chemotherapy or radiation but reported refusing further treatment (eg, second surgery) that their physician considered necessary.
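As a reading aid, the subgroup counts reported above can be checked for internal consistency with a short tally. This is only an illustrative sketch: the counts are transcribed from the preceding paragraph, while the variable names and the helper `check_partition` are assumptions introduced here and are not part of the study's analysis.

```python
# Counts transcribed from the sample description above.
# Names and grouping labels are illustrative, not taken from the study's data files.
N_PARTICIPANTS = 14

cancer_site = {"breast": 8, "gastrointestinal": 3, "prostate": 2, "skin": 1}
ethnicity = {"white": 9, "Asian or Pacific Islander": 5}
treatment_declined = {
    "refused all conventional treatment": 6,
    "surgery only, refused further treatment": 5,
    "surgery plus chemotherapy or radiation, refused further treatment": 3,
}


def check_partition(name, counts, total=N_PARTICIPANTS):
    """Return True when the subgroup counts add up to the sample size."""
    subtotal = sum(counts.values())
    print(f"{name}: {subtotal} of {total}")
    return subtotal == total


results = {
    "cancer site": check_partition("cancer site", cancer_site),
    "ethnicity": check_partition("ethnicity", ethnicity),
    "treatment declined": check_partition("treatment declined", treatment_declined),
}
```

Each grouping accounts for all 14 participants, so the three breakdowns describe the same sample from different angles.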
Procedure
Three human subjects research committees approved the research protocol. One- to 2-hour tape-recorded interviews were conducted in person at the participant’s home or another location in late 1998 or early 1999. All participants were compensated with a $20 gift certificate, and all gave signed informed consent.
Outcome measures
The semistructured interviews covered (a) demographics, (b) satisfaction with health care providers, (c) conventional treatments received for cancer and satisfaction, (d) types of CAM used for cancer and satisfaction, and (e) perceptions about cancer and cancer treatments.
After reading all the interview transcripts, the research team engaged in an iterative process in which we coded the text according to the nature of the information, developed hypotheses, and then translated the coding into categories.21 Responses were coded using NUD*IST 4,22 a software package for qualitative analysis. We assigned codes for (a) reasons for rejecting conventional treatments, (b) types of CAM used, (c) reasons for choosing CAM, (d) beliefs about CAM’s effectiveness, and (e) communication with physician. We included quantitative data (ie, demographics, disease characteristics, and types of CAM used) from the survey and from the tumor registry as a triangulation technique21 and to aid in describing the sample.
Results
All 14 participants used 3 or more types of CAM (max=14; median=8; Table 1), and all took some herbal or botanical supplement; 11 reported diet changes, and 7 used meditation or relaxation. Two participants attended CAM cancer clinics for intravenous therapy. One participant worked with a native Hawaiian healer, with whom she learned to gather and prepare traditional herbal remedies.
Three broad categories of themes emerged in the analysis: (1) beliefs about conventional treatment, (2) interactions with treatment providers, and (3) beliefs about CAM as an alternative to conventional treatment. Participants’ supporting quotes are shown in Table 2 and Table 2a.
Beliefs About Conventional Treatment
Conventional Treatment Is Harmful. When asked to describe their reasons for declining conventional cancer treatment, participants described many ways that chemotherapy and radiation were harmful, including damaging cells, weakening the immune system, or inhibiting recovery. In the extreme, participants believed that conventional treatment would be fatal for them. Those who declined either a first (n=6) or a second (n=2) surgery commonly expressed concerns about mutilation (being “cut”) and the debilitating effects of surgery. A number of participants mentioned concerns that conventional treatment would increase their risk of future cancer. Participants also mentioned being deterred from conventional treatment by possible side effects, previous negative experience with a treatment, or knowing someone who died from the treatment.
Conventional Treatment Will Not Improve Outcome. Several patients expressed that conventional treatment was not likely to make a difference in disease outcome, either because of limitations inherent in conventional treatment or because of the particular characteristics of their disease. Often, the participants cited their belief that conventional treatment offered no complete guarantee for a cure. Although none of the participants disputed the validity of their cancer diagnosis, a few participants believed that cancer treatment was unnecessary because the cancer had been eliminated by initial treatment. One participant proposed that fate, not treatment, would decide her disease outcome.
Interactions with Treatment Providers
Nearly all participants (12 out of 14) stated that they had informed at least one of their physicians about CAM; 2 had not. Nine respondents reported that their physicians were either supportive or neutral about their use of CAM. In the context of their decision making about conventional treatment, participants expressed that they felt physicians could not be trusted, that physicians did not listen to their needs, and that medical professionals were hostile or threatening about participants’ treatment choices. Participants’ responses also indicated possible missed chances for communication between patient and physician about both conventional treatment and CAM. A minority of participants described feeling alienated from the medical community.
Beliefs About CAM as Alternative to Conventional Treatment
CAM Contributed to Decision to Decline. The perception that CAM offered a feasible alternative to conventional treatment appeared to assist participants in making the decision to go against their physicians’ recommendations. In 6 cases, the actual decision to refuse conventional treatment appeared to be facilitated by the discovery or knowledge of CAM.
CAM Is Better than Conventional Treatment. In many cases, the CAM choice was perceived to be considerably less aversive than the conventional treatment option or was perceived to make more “intuitive” sense. A common viewpoint expressed by participants was that conventional treatment and CAM have different methods and purposes. Participants pointed out that CAM works with the body’s own resources in a natural way to promote healing, while conventional treatment is short-sighted and merely attacks the symptom without addressing underlying imbalances.
CAM Is Effective. In choosing CAM as an alternative to conventional treatment, the participants stated that they were satisfied with CAM’s effectiveness and described sources of evidence for this, including personal evidence (most frequently cited), medical and anecdotal evidence, and belief. Participants’ personal experience of continuing to be alive, feeling well, or having subjective improvement in symptoms was proof for them that a particular CAM treatment worked. Participants also used medical evidence (eg, PSA tests or mammography) to demonstrate that their condition had improved and attributed this to the CAM. Anecdotal evidence based on others’ reported benefits from CAM was sufficient for at least one participant to state that she felt CAM was effective. A number of participants stated that they did not have any demonstrable evidence of the effectiveness of CAM, such as improved symptoms or medical evidence, but that they nonetheless continued to believe that CAM was working for them. Participants’ reasoning included statements about how the particular CAM made logical sense to them and therefore “must work,” or that they had a long history of belief in the benefits of CAM. Only one participant admitted that she was not sure if CAM had helped her.
Discussion
A predominant theme in our analysis was that participants perceived CAM to be a harmless, natural, and effective alternative to the damaging effects of conventional cancer treatment. In the participants’ views, conventional treatment offered no guarantee of a cure, while guaranteeing almost certain harm and, for some, possible death. Participants felt that CAM had a positive effect on their overall health and, with a few exceptions, were confident in CAM’s ability to cure their cancer or prevent recurrence. The quality of physician-patient communication was also a factor in the decision of participants to decline conventional treatment. While participants reported both positive and negative experiences with medical staff, the more negative perceptions, including distrust, lack of response, and perceived hostility from health care providers, possibly caused further alienation between participants and the medical community.
A study by Astin reported similar predictors for primary reliance on CAM in the general population (lack of trust and dissatisfaction with conventional treatment and providers, and belief in the efficacy of CAM).12 Astin also observed that CAM was perceived as promoting health, while conventional treatment focused on the illness, a belief expressed by several of our participants. While the desire for control over health was a predictor in Astin’s study, this did not emerge as a theme in our analysis.
Our analysis provides cross-validity evidence with an ethnically diverse sample for several themes observed by Montbriand19 (difficulty in communication with health care providers, previous negative experiences with medical care, belief in a cure from CAM, and lack of hope for a cure offered by biomedical therapies). Montbriand’s themes of expressed stress, the need of patients to take control of treatment, and mystical insights into health care also appear to have some similarities to our results, while the influence of social support and cost considerations on CAM use were not as evident in our analysis. Also, unlike the Montbriand study, our participants reported supportive as well as negative health care interactions regarding CAM use, sources of evidence for CAM’s effectiveness (personal, medical, anecdotal, and belief), and the belief that CAM offered an opportunity to avoid the harmful effects of conventional treatment.
The preceding analysis is qualitative and based on the self-report of a small sample of 14 participants. Generalizability of the findings is therefore limited. However, the use of a qualitative method allowed investigation of a relatively rare population (cancer treatment decliners) that is seldom studied. The results are also limited by the fact that participants were primarily cancer survivors in relatively good health.
Future research should include participants with more advanced cancers, as compliance with treatment may be dependent on the patients’ expectation of the likely progression of their disease.23
Our findings have a number of clinical implications. Given some of the examples of interactions with medical professionals, it is possible that the participants did not fully understand their treatment options, including their chances of experiencing serious or debilitating consequences of conventional treatment, and may have overestimated such consequences. A better understanding of individual patients’ concerns about conventional treatment can guide health care professionals in framing recommendations when talking to patients. While patients should be made as aware as possible of the pros and cons of all options for cancer treatment, including conventional methods, CAM, or no treatment, patient education efforts alone are not sufficient. Our findings, as well as those of Montbriand,19 indicate that fear and anxiety may be issues for patients who decline conventional treatment in favor of CAM. Some patients may require psychological and health behavior interventions aimed at improved adjustment and better coping with cancer, as well as at addressing the motivational and emotional barriers to compliance. Finally, treatment decision making is an ongoing process; treatment decliners may choose conventional cancer treatment at a later date if given the support, information, and time necessary to make the decision.23 Even if patients have declined oncologic care, they may continue to see their primary care and family physicians. Patients need to feel that they have not been permanently excluded from the health care system even if they make choices that are contrary to the recommendations of their medical team.
Acknowledgments
We want to thank all participants for taking the time and effort to respond to our questionnaire and to participate in the interviews. The help of Marc Goodman, PhD, and the staff of the Hawaii Tumor Registry is greatly appreciated. We would also like to thank our research team, including Professor Thomas Maretzki, Yvonne Tatsumura, Katsuya Tasaki, Tammy Brown, Carole Prism, and David Henderson, for their help with transcription and analysis. This research was supported by a special study grant from the National Cancer Institute, Surveillance, Epidemiology, and End Results program under contract number N01-PC67001.
STUDY DESIGN: This was a qualitative interview study.
POPULATION: Fourteen cancer survivors who reported having declined all or part of the recommended conventional treatment (surgery, chemotherapy, or radiation) were included. The participants were a subset from a multi-ethnic (Asian, Native Hawaiian, and white) group of 143 adults with cancer in 1995 or 1996 who were recruited through a population-based tumor registry and interviewed about CAM.
OUTCOMES MEASURED: We performed semistructured interviews regarding experience with conventional cancer treatment and providers, use of CAM, and beliefs about disease.
RESULTS: All participants used 3 or more types of CAM, most commonly herbal or nutritional supplements. Across the board, participants stated that their reason for declining conventional treatment was to avoid damage or harm to the body. The majority of participants also felt that conventional treatment would not make a difference in disease outcome, and some but not all participants perceived an unsatisfactory or alienating relationship with health care providers. Some participants reported that their discovery of CAM contributed to their decision to decline conventional treatment, and participants generally perceived CAM as an effective and less harmful alternative to conventional treatment.
CONCLUSIONS: Cancer patients may benefit from interventions (eg, patient education, improvements in physician-patient communication, and psychological therapy) to facilitate treatment decision making through increased understanding of conventional and CAM treatments and to identify barriers to treatment for individual patients.
- Factors expressed by participants as influencing the decision to decline conventional cancer treatment included: beliefs about harm, possible death and side effects, and the belief in or discovery of CAM as an effective alternative.
- Participants found CAM to be more effective and less harmful than conventional treatment.
- Participants gave sources of evidence for effectiveness of CAM: personal, medical, anecdotal, and belief.
- Participants reported positive or neutral interactions with health care providers regarding their use of CAM.
- Participants reported negative interactions or possible missing communication with health care providers as being factors in their decision to decline conventional treatment.
Although noncompliance or refusal of cancer treatment is a serious concern and has been shown to reduce the effectiveness of treatment and decrease the length of survival after diagnosis,1-4 the phenomenon itself has been scarcely studied. Existing studies report rates of less than 1% for patients refusing all treatment,4 12.5% for patients refusing chemotherapy,5 and 20% for patients refusing treatment for hematologic malignancy.6 Possible reasons for noncompliance have been proposed, including patients’ fear of the adverse side effects of cancer treatment, uncertainty, hopelessness, loss of control, denial of illness, psychiatric disorders, patient-physician relationship and communication issues, and medical systems dysfunctions.4,5,7-10
It has been hypothesized that individuals who choose complementary and alternative medicine (CAM) are more likely to forgo medical treatment than other patients.11 However, studies among noncancer populations have found that only a small percentage (between 3% and 4%) rely primarily on CAM.12-14 The few studies reporting rates of treatment refusal among cancer populations have found higher percentages (between 8% to 20%) of patients using CAM exclusively or ceasing conventional treatment in favor of CAM,15,16 but reasons for these decisions are unclear. Primary reliance on CAM for a variety of noncancer disorders was found in one study to be associated with distrust or dissatisfaction with conventional medicine and physicians, as well as the need to seek control over health.12 Some speculate that because of the extreme nature of most standard cancer treatment, patients may decline medical care in favor of CAM therapies that have few or no side effects.15,17,18
In a recent qualitative study of 8 Canadian cancer patients who abandoned biomedical treatment in favor of CAM, Montbriand19 found themes of anger and fear, need for control, belief in CAM as a cure, social support for CAM, cost considerations, and mystical insights into health care. This study provided an initial understanding of the concerns of cancer patients who refuse conventional treatment and choose CAM, but is limited by its small, homogeneous sample. More diverse samples are needed to cross-validate Montbriand’s findings and to uncover additional reasons. In the following study we describe themes that emerged from interviews with a multiethnic group of 14 participants as they discuss their reasons for declining conventional cancer treatment and choosing CAM.
Methods
Recruitment
The participants in this analysis were initially surveyed by mail as part of a larger study investigating ethnic differences in alternative medicine use among cancer patients in 1995 or 1996 in Hawaii and identified through a population-based tumor registry.20 Among those who returned the survey (n=1168), 439 (32%) volunteered to be interviewed. Because we were primarily interested in the diversity of experiences of CAM users, a heterogeneous group of 143 interview subjects was selected on the basis of CAM use, geographic areas, ethnicity, and cancer site. For this analysis, we included only those interview participants (n=14) who reported declining all or part of conventional treatment for cancer while simultaneously using CAM.
The mean age of participants was 52.5 (standard deviation = 14.1; range = 43-92), 9 were women, and 6 were married. The participants were white (9); Asian or Pacific Islander (5); Chinese; Filipino; Japanese; or Native Hawaiian). Participants were well educated, with the majority having past or present professional, managerial, or technical occupations. Five were retired at the time of the interview. Eight of the participants had breast cancer, and the rest had gastrointestinal cancer (3), prostate cancer (2), or skin cancer (1). Most of the participants had localized disease. The stage of disease was unknown for 4 participants, because they had declined procedures (eg, lymph node excision; exploratory surgery) to determine stage. Six participants reported that they had refused all conventional treatment (3 localized disease and 3 unstaged). Five participants reported undergoing surgery for the cancer but rejected all further treatment. Three participants had surgery and chemotherapy or radiation but reported refusing further treatment (eg, second surgery) that their physician considered necessary.
Procedure
Three human subjects research committees approved the research protocol. One- to 2-hour tape-recorded interviews were conducted in person at the participant’s home or another location in late 1998 or early 1999. All participants were compensated with a $20 gift certificate, and all gave signed informed consent.
Outcome measures
The semistructured interviews covered (a) demographics, (b) satisfaction with health care providers, (c) conventional treatments received for cancer and satisfaction, (d) types of CAM used for cancer and satisfaction, and (e) perceptions about cancer and cancer treatments.
After reading all the interview transcripts, the research team engaged in an iterative process in which we coded the text according to the nature of information, developed hypotheses and then translated the coding into categories.21 Responses were coded using NUD*IST 4,22 a software package for qualitative analysis. We assigned coding for: (a) reasons for rejecting conventional treatments, (b) types of CAM used, (c) reasons for choosing CAM, (d) beliefs CAM’s effectiveness, and (e) communication with physician. We included quantitative data (ie, demographics, disease characteristics, and types of CAM used) from the survey and from the tumor registry as a triangulation technique21 and to aid in describing the sample.
Results
All 14 participants used 3 or more types of CAM (max=14; median=8; Table 1), and all took some herbal or botanical supplement; 11 reported diet changes, and 7 used meditation or relaxation. Two participants attended CAM cancer clinics for intravenous therapy. One participant worked with a native Hawaiian healer, with whom she learned to gather and prepare traditional herbal remedies.
Three broad categories of themes emerged in the analysis: (1) beliefs about conventional treatment, (2) interactions with treatment providers, and (3) beliefs about CAM as an alternative to conventional treatment. Participants’ supporting quotes are shown in (Table 2, Table 2a)
Beliefs About Conventional Treatment
Conventional Treatment Is Harmful.. When asked to describe their reasons for declining conventional cancer treatment, participants described many ways that chemotherapy and radiation were harmful, including damaging cells, weakening the immune system, or inhibiting recovery. In the extreme, participants believed that conventional treatment would be fatal for them. Those who declined either a first (n=6) or a second (n=2) surgery commonly expressed concerns about mutilation (being “cut”) and the debilitating effects of surgery. A number of participants mentioned concerns that conventional treatment would increase their risk of future cancer. Participants also mentioned being deterred from conventional treatment by possible side effects, previous negative experience with a treatment, or knowing someone who died from the treatment.
Conventional Treatment Will Not Improve Outcome. Several patients expressed that conventional treatment was not likely to make a difference in disease outcome, either because of limitations inherent in conventional treatment or because of the particular characteristics of their disease. Often, the participants cited their belief that conventional treatment offered no complete guarantee for a cure. Although none of the participants disputed the validity of their cancer diagnosis, a few participants believed that cancer treatment was unnecessary because the cancer had been eliminated by initial treatment. One participant proposed that fate, not treatment, would decide her disease outcome.
Interactions with Treatment Providers
Nearly all participants (12 out of 14) stated that they had informed at least one of their physicians about CAM, and 2 had not. Nine respondents reported that their physicians were either supportive or neutral about their use of CAM. In the context of participants’ decision making about conventional treatment, participants expressed that they felt physicians could not be trusted, that physcians did not listen to their needs, and that medical professionals were hostile or threatening about participants’ treatment choices. Participants’ responses also indicated possible missed chances for communication between patient and physician about both conventional treatment and CAM. A minority of participants described feeling alienated from the medical community.
Beliefs About CAM as Alternative to Conventional Treatment
CAM Contributed to Decision to Decline. The perception that CAM offered a feasible alternative to conventional treatment appeared to assist participants in making the decision to go against their physicians’ recommendations. In 6 cases, the actual decision to refuse conventional treatment appeared to be facilitated by the discovery or knowledge of CAM.
CAM Is Better than Conventional Treatment. In many cases, the CAM choice was perceived to be considerably less aversive than the conventional treatment option or was perceived to make more “intuitive” sense. A common viewpoint expressed by participants was that conventional treatment and CAM have different methods and purposes. Participants pointed out that CAM works with the body’s own resources in a natural way to promote healing, while conventional treatment is short-sighted and merely attacks the symptom without addressing underlying imbalances.
CAM Is Effective. In choosing CAM as an alternative to conventional treatment, the participants stated that they were satisfied with CAM’s effectiveness and described sources of evidence for this, including personal evidence (most frequently cited), medical and anecdotal evidence, and belief. Participants’ personal experience of continuing to be alive, feeling well, or having subjective improvement in symptoms was proof for them that a particular CAM treatment worked. Participants also used medical evidence (eg, PSA tests or mammography) to demonstrate that their condition was improved and attributed this to the CAM. Anecdotal evidence based on others’ reported benefits from CAM was sufficient for at least one participant to state that she felt CAM was effective. A number of participants stated that they did not have any demonstrable evidence of the effectiveness of CAM, such as improved symptoms or medical evidence, but that they nonetheless continued to believe that CAM was working for them. Participants’ reasoning included statements about how the particular CAM made logical sense to them and therefore “must work,” or that they had a long history of belief in the benefits of CAM. Only one participant admitted that she was not sure if CAM had helped her.
Discussion
A predominant theme in our analysis was the finding that participants perceived CAM to be a harmless, natural, and effective alternative to the damaging effects of conventional cancer treatment. In the participants’ views, conventional treatment offered no guarantee of a cure, while guaranteeing almost certain harm and for some, possible death. Participants felt that CAM had a positive effect on their overall health and, with a few exceptions, participants were confident in CAM’s ability to cure their cancer or prevent recurrence. The quality of physician/patient communication was also a factor in the decision of participants to decline conventional treatment. While participants reported both positive and negative experiences with medical staff, the more negative perceptions, including distrust, lack of response, and perceived hostility from health care providers, possibly caused further alienation between participants and the medical community.
A study by Astin reported similar predictors for primary reliance on CAM in the general population (lack of trust and dissatisfaction with conventional treatment and providers, and belief in the efficacy of CAM).12 Astin also observed that CAM was perceived as promoting health, while conventional treatment focused on the illness, a belief expressed by several of our participants. While the desire for control over health was a predictor in Astin’s study, this did not emerge as a theme in our analysis.
Our analysis provides cross-validity evidence with an ethnically diverse sample for several themes observed by Montbriand19 (difficulty in communication with health care providers, previous negative experiences with medical care, belief in a cure from CAM, and lack of hope for a cure offered by biomedical therapies). Montbriand’s themes of expressed stress, the need of patients to take control of treatment, and mystical insights into health care also appear to have some similarities to our results, while the influence of social support and cost considerations on CAM use were not as evident in our analysis. Also, unlike the Montbriand study, our participants reported supportive as well as negative health care interactions regarding CAM use, sources of evidence for CAM’s effectiveness (personal, medical, anecdotal, and belief), and the belief that CAM offered an opportunity to avoid the harmful effects of conventional treatment.
The preceding analysis is qualitative and based on the self-report of a small sample of 14 participants. Generalizability of the findings is therefore limited. However, the use of a qualitative method allowed investigation of a relatively rare population (cancer treatment decliners) that is seldom studied. The results are also limited by the fact that participants were primarily cancer survivors in relatively good health.
Future research should include participants with more advanced cancers, as compliance with treatment may be dependent on the patients’ expectation of the likely progression of their disease.23
Our findings have a number of clinical implications. Given some of the examples of interactions with medical professionals, it is possible that the participants did not fully understand their treatment options, including their chances of experiencing serious or debilitating consequences of conventional treatment, and may have overestimated such consequences. A better understanding of individual patients’ concerns about conventional treatment can guide health care professionals in framing recommendations when talking to patients. While patients should be made as aware as possible of the pros and cons of all options for cancer treatment, including conventional methods, CAM, or no treatment, patient education efforts alone are not sufficient. Our findings, as well as those of Montbriand,19 indicate that fear and anxiety may be issues for patients who decline conventional treatment in favor of CAM. Some patients may require psychological and health behavior interventions aimed at improved adjustment and better coping with cancer, as well as addressing the motivational and emotional barriers to compliance. Finally, treatment decision making is an ongoing process; treatment decliners may choose conventional cancer treatment at a later date if given the adequate support, information, and time necessary to make the decision.23 Even if patients have declined oncologic care, they may continue to see their primary care and family physicians. Patients need to feel that they have not been permanently excluded from the health care system even if they make choices that are contrary to the recommendations of their medical team.
Acknowledgments
We want to thank all participants for taking the time and effort to respond to our questionnaire and to participate in the interviews. The help of Marc Goodman, PhD, and the staff of the Hawaii Tumor registry is greatly appreciated. We would also like to thank our research team, including Professor Thomas Maretzki, Yvonne Tatsumura, Katsuya Tasaki, Tammy Brown, Carole Prism, and David Henderson for their help with transcription and analysis. This research was supported by a special study grant from the National Cancer Institute, Surveillance, Epidemiology, and End Results program under contract number N01-PC67001.
1. Hoagland AC, Morrow GR, Bennett JM, Carnrike CL, Jr. Oncologists’ views of cancer patient noncompliance. Am J Clin Onc 1983;6:239-44.
2. Li BD, Brown WA, Ampil FL, Burton GV, Yu H, McDonald JC. Patient compliance is critical for equivalent clinical outcomes for breast cancer treated by breast-conservation therapy. Ann Surg 2000;231:883-89.
3. Bonadonna G, Valagussa P. Dose-response effect of adjuvant chemotherapy in breast cancer. N Engl J Med 1981;304:10-15.
4. Huchcroft SA, Snodgrass T. Cancer patients who refuse treatment. Cancer Causes Cont 1993;4:179-85.
5. Levin M, Mermelstein H, Rigberg C. Factors associated with acceptance or rejection of recommendation for chemotherapy in a community cancer center. Cancer Nurs 1999;22:246-50.
6. Evans SH, Clarke P. When cancer patients fail to get well: flaws in health communication. Beverly Hills, Calif: Sage Publications; 1983;225-48.
7. Richardson JL, Sanchez K. Compliance with cancer treatment. In: Holland JC, ed. Psychoonc. New York, NY: Oxford University Press; 1998;67-77.
8. Kunkel EJ, Woods CM, Rodgers C, Myers RE. Consultations for ‘maladaptive denial of illness’ in patients with cancer: psychiatric disorders that result in noncompliance. Psychoonc 1997;6:139-49.
9. Goldberg RJ. Systematic understanding of cancer patients who refuse treatment. Psychother Psychosom 1983;39:180-89.
10. Appelbaum PS, Roth LH. Patients who refuse treatment in medical hospitals. JAMA 1983;250:1296-301.
11. Lowenthal RM. Alternative cancer treatments. Med J Aust 1996;165:536-37.
12. Astin JA. Why patients use alternative medicine: results of a national study. JAMA 1998;279:1548-53.
13. Eisenberg DM, Kessler RC, Foster C, Norlock FE, Calkins DR, Delbanco TL. Unconventional medicine in the United States: prevalence, costs, and patterns of use. N Engl J Med 1993;328:246-52.
14. Eisenberg DM, Davis RB, Ettner SL, et al. Trends in alternative medicine use in the United States, 1990-1997: results of a follow-up national survey. JAMA 1998;280:1569-75.
15. Cassileth BR, Lusk EJ, Strouse TB, Bodenheimer BJ. Contemporary unorthodox treatments in cancer medicine: a study of patients, treatments, and practitioners. Ann Intern Med 1984;101:105-12.
16. Lerner IJ, Kennedy BJ. The prevalence of questionable methods of cancer treatment in the United States. CA Cancer J Clin 1992;42:181-91.
17. Jenkins CA, Scarfe A, Bruera E. Integration of palliative care with alternative medicine in patients who have refused curative cancer therapy: a report of two cases. J Pall Care 1998;14:55-59.
18. Downer SM, Cody MM, McCluskey P, et al. Pursuit and practice of complementary therapies by cancer patients receiving conventional treatment. BMJ 1994;309:86-89.
19. Montbriand MJ. Abandoning biomedicine for alternate therapies: oncology patients’ stories. Cancer Nursing 1998;21:36-45.
20. Maskarinec G, Shumay DM, Kakai H, Gotay CC. Ethnic differences in complementary and alternative medicine use among cancer patients. J Altern Complement Med 2000;6:531-38.
21. Bogdan R, Biklin S. Qualitative research in education. Boston, Mass: Allyn and Bacon; 1998.
22. Qualitative Solutions and Research Pty Ltd. QSR NUD*IST 4 user guide. Australia: Sage Publications, 1997.
23. Gotay CC, Bultz BD. Patient decision making inside and outside the cancer care system. J Psychosoc Onc 1986;4:105-14.
24. Cassileth BR. The alternative medicine handbook: The complete reference guide to alternative and complementary therapies. New York, NY: W.W. Norton & Company Inc, 1998.
Assessing Guidelines for Use in Family Practice
With more than 1000 new guidelines produced annually over the past decade, it is impossible for the practicing family physician to determine which ones should be adapted into their clinical practice. The Ontario Ministry of Health and Long-Term Care and the Ontario Medical Association formed the Guideline Advisory Committee (GAC) in 1997 to assess and disseminate guidelines that would improve the quality and utilization of health care services in the province. Over the past 3 years the GAC has developed a strategy to identify important topics, to rank guidelines published on these topics based on the quality of their development, and to reformat guidelines as necessary to make them user-friendly for implementation in clinical practice. The GAC is currently assessing a number of strategies to enhance the dissemination of selected guidelines to improve the quality of care delivered in the province.
Key points for clinicians
A method of selecting, reviewing, and endorsing clinical practice guidelines has been established in the province of Ontario, Canada. Recommended guideline summaries are posted on a Web site with links to full text for easy access by practicing physicians (www.gacguidelines.ca).
Strategies for the successful implementation and impact evaluation of recommended guidelines are currently in development.
Clinical practice guidelines are statements that are systematically developed to assist physician and patient decisions about appropriate health care for specific clinical circumstances.1 Published guidelines have become widely available through Internet technology; it has been estimated that more than 2500 exist. Most are produced by specific interest groups (eg, national societies and pharmaceutical companies), disseminated by publication in a medical journal or traditional mail, and seldom demonstrate any effect on clinical practice.2 Such a large volume of guidelines creates confusion for clinicians, who often do not follow any of them because of the time required to assess their quality.3
With this dilemma in mind, the GAC was formed with members representing the Ontario Medical Association (OMA), the Ministry of Health and Long-Term Care (MOHLTC) in the province of Ontario, and one ex-officio member of the Institute for Clinical Evaluative Sciences (ICES). The GAC determined its first priority was to identify the best-quality guidelines available for clinicians on selected topics and to then promote their dissemination across the province. The purpose of our paper is to describe the methods that have been developed over the last 3 years to identify high-quality guidelines and some of the strategies being proposed for their dissemination, implementation, and evaluation. We also identify the best-quality guidelines for 10 common conditions.
Methods to assess the development of clinical practice guidelines
Topic Selection
Using a number of parameters, the GAC initially produced a grid as an assessment tool to identify priority areas for guideline review. Table 1 shows the basic grid incorporating provincial utilization and cost data, outcomes research, feedback from clinicians or health care organizations, and a previously published list of common and important problems in family practice.4 Feedback from the OMA sections indicated considerable confusion resulting from conflicting advice in specific areas as to appropriate practice (eg, screening for osteoporosis and diabetes). Utilization data from the MOHLTC demonstrated that the use of numerous procedures had rapidly increased over previous years; for example, diagnostic ultrasound utilization increased 65% in 1998. Practicing physicians also identified areas where there was a need for guidelines to be developed because of a lack of evidence or unknown best practice. The committee took all these factors into account when producing a list of priority topics for guideline assessment (Table 2).
Guideline Assessment and Recommendation
Once a topic was chosen for assessment, a literature search was conducted by University of Toronto librarians to find all guidelines published in English over the past 10 years on that specific topic. The search strategy included databases such as MEDLINE and HealthStar, and guideline Web sites such as the National Guideline Clearinghouse and the Canadian Medical Association’s Clinical Practice Guideline Infobase. Copies of all guidelines identified in the search were then obtained. A survey of associations and interest groups in Ontario was also made to determine whether there were any unpublished guidelines that we had not identified in this process.
Initially, members of the committee carried out a literature search to determine if there were any publications about scoring the quality of the process used to produce the guidelines. Our search found some processes, but none that directly suited our needs. As a result, the GAC embarked on the development of a guideline-scoring instrument. After a year of work we realized that it would likely take 2 to 3 more years to adequately validate the instrument, and thus a decision was made to adopt the Appraisal Instrument for Clinical Guidelines5 (available at: www.sghms.ac.uk/phs/hceu/form.htm) to help determine quality guidelines in each clinical area, supplemented by the tool developed by the committee. The Appraisal Instrument consists of 37 items addressing 3 dimensions (Table 3). The classification system the committee is using to choose top-scoring guidelines after appraisal is as follows. An excellent guideline is one in which the majority of the dimensions (rigor of development, context and content, application) are well addressed by the guideline producers with minimal omission. The evidence is linked to the major recommendations, and the development process is robust. These types of guidelines are highly recommended.
A very good guideline is one in which many of the dimensions are addressed, and some of the recommendations are linked to evidence levels. Objectives and rationale for development are often clearly defined but may be lacking in other areas, such as application (eg, outcome measures, targets, risks, and benefits). These are generally well produced and useful for practicing clinicians and are recommended.
In a fair guideline, some of the dimensions are addressed, but there are some major omissions, often in terms of levels of evidence, literature search strategy, clarity, risks, and benefits. Often these documents are local adaptations of other guidelines. Information can sometimes be used as a general reference if user-friendly materials are incorporated but are generally not very useful as guidelines. These guidelines are recommended under special circumstances.
A poor guideline is one in which most of the dimensions are not well addressed, if at all. Often, it is unclear who produced these documents, and there is no description of the individuals involved. Levels of evidence and literature search strategy are rarely included, and there is no description of the methods used to formulate the recommendations. These guidelines are of little use to practicing clinicians and are not recommended.
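The four-level classification described above can be pictured as a mapping from dimension-level appraisal results to a recommendation category. The following Python sketch is illustrative only. The article describes the categories qualitatively and does not publish numeric cut-offs, so the `classify_guideline` function, the 0-to-1 score scale, and the two thresholds are assumptions made for this example rather than the committee's actual rules.

```python
from typing import Dict

# The three dimensions named in the Appraisal Instrument.
DIMENSIONS = ("rigor of development", "context and content", "application")

# Hypothetical cut-offs (not from the article): fraction of a dimension's
# items that must be satisfied for it to count as "well addressed" or
# merely "addressed".
WELL_ADDRESSED = 0.75
ADDRESSED = 0.50

# Recommendation wording taken from the category descriptions in the text.
RECOMMENDATION = {
    "excellent": "highly recommended",
    "very good": "recommended",
    "fair": "recommended under special circumstances",
    "poor": "not recommended",
}


def classify_guideline(scores: Dict[str, float]) -> str:
    """Map per-dimension scores (0 to 1) to one of the four categories."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")

    well = sum(scores[d] >= WELL_ADDRESSED for d in DIMENSIONS)
    addressed = sum(scores[d] >= ADDRESSED for d in DIMENSIONS)

    if well >= 2:          # majority of dimensions well addressed
        return "excellent"
    if addressed >= 2:     # many dimensions addressed
        return "very good"
    if addressed >= 1:     # some dimensions addressed, major omissions
        return "fair"
    return "poor"          # most dimensions not well addressed


if __name__ == "__main__":
    example = {
        "rigor of development": 0.9,
        "context and content": 0.8,
        "application": 0.4,
    }
    category = classify_guideline(example)
    print(category, "->", RECOMMENDATION[category])
```

In practice the committee's judgment also weighs whether evidence is linked to the major recommendations, which a threshold rule like this one does not capture.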
Recognizing that recommending guidelines based on the quality of the process by which they were produced and the evidence used in their development would be controversial, we felt it was extremely important to develop a rigorous and objective scoring methodology. Fellows from the Department of Family and Community Medicine at the University of Toronto and community-based family physician volunteers from the OMA were brought together in 5 workshops. Each workshop included approximately 20 participants and consisted of a half-day session on the objectives of the GAC, a detailed review of the Appraisal Instrument, and a hands-on session where all participants evaluated the same guideline. Scores were then openly declared, and a discussion held on discrepancies identified in the assessments in an attempt to standardize the process. At the end of the session, interested participants were provided with an additional 5 guidelines to assess in the subsequent 2 weeks. The resulting appraisals were evaluated for consistency and inter-rater reliability (results indicate that using the instrument as an initial filter to determine the best-quality guidelines in each clinical area is a valid approach). To date, 45 assessors have been trained and are reviewing guidelines on an ongoing basis. Each guideline is evaluated a total of 3 times by independent assessors. Those guidelines that have been selected for recommendation in a particular clinical area are then reviewed for clinical relevance and applicability to the Ontario context. More than 250 published guidelines have been identified and distributed to physician assessors in the clinical areas shown in Table 2.
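Because each guideline is appraised 3 times by independent assessors, the results have to be combined and checked for agreement in some way. The Python sketch below shows one simple possibility. The article does not state how scores were pooled or which reliability statistic was used, so the `pool_appraisals` function, the 0-to-1 score scale, averaging as the pooling rule, and the `max_spread` disagreement threshold are all assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def pool_appraisals(
    appraisals: List[Dict[str, float]],
    max_spread: float = 0.25,  # hypothetical tolerance for assessor disagreement
) -> Tuple[Dict[str, float], List[str]]:
    """Average per-dimension scores and flag dimensions with wide disagreement."""
    if len(appraisals) != 3:
        raise ValueError("expected exactly 3 independent appraisals")

    pooled: Dict[str, float] = {}
    flagged: List[str] = []
    for dimension in appraisals[0]:
        values = [appraisal[dimension] for appraisal in appraisals]
        pooled[dimension] = mean(values)
        # A large gap between the highest and lowest score suggests the
        # assessors should discuss the discrepancy before a category is set.
        if max(values) - min(values) > max_spread:
            flagged.append(dimension)
    return pooled, flagged


if __name__ == "__main__":
    scores = [
        {"rigor of development": 0.8, "application": 0.2},
        {"rigor of development": 0.9, "application": 0.6},
        {"rigor of development": 0.7, "application": 0.4},
    ]
    pooled, flagged = pool_appraisals(scores)
    print(pooled)
    print("needs discussion:", flagged)
```

Flagging wide spreads mirrors the workshop practice of openly declaring scores and discussing discrepancies; a formal analysis would use an established inter-rater reliability statistic instead of a raw range.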
Reformatting
The GAC is in the process of determining the user-friendliness of recommended guidelines. Not infrequently, guidelines that are found to be the most evidence-based and objective are hundreds of pages in length and would be extremely burdensome for the average family physician to use. We anticipate that guidelines found to be of excellent quality but not convenient for use in clinical practice will need to be reformatted into user-friendly summaries. Volunteer physicians from the community will be asked to evaluate such summaries and provide feedback for improvement.
Dissemination
Once the best-quality guideline(s) on a topic are identified and reformatted as necessary, we intend to mount them on the GAC Web site (www.gacguidelines.ca) for use by the profession and the general public. Table 4 shows the results of the guideline selection process for the first 10 clinical areas. The process for choosing guidelines is transparent so that practicing physicians can determine for themselves the usefulness and applicability of the recommendations. Only the most rigorously developed guidelines will be posted on the Web site in the form of structured summaries, although interested clinicians can obtain the outcome of nonrecommended guideline appraisals on request.
Continuing medical education literature on dissemination strategies indicates that a single method, such as posting information on a Web site or mailing guidelines to clinicians, has a minimal effect on changing medical practice.6 The GAC is currently considering a number of options to enhance the dissemination of the best available guidelines. Since Ontario health data on diagnostic testing, hospitalization records, and office visits are collected provincially, it could be possible to measure clinical outcomes following the dissemination of evidence-based guidelines. We are currently working with provincial groups to disseminate guidelines through medical school continuing medical education (CME) division programs, peer presenter programs, small group CME programs, outreach facilitation programs, and a peer assessment program run by the provincial licensing body.
Conclusions
Over the past 3 years the GAC has developed a method to identify relevant guideline topics and assess the quality of the process by which the guidelines were developed. Clinically excellent guidelines may require some reformatting to make them user-friendly for implementation in clinical practice. The initial product of this process has been posted on the GAC Web site for access by the profession. The GAC is currently assessing and developing a number of strategies to more effectively disseminate guideline information and measure the impact of these interventions on the quality of medical care delivered to the people of Ontario. The GAC will report on the impact of these interventions to facilitate the exchange of successful implementation strategies across jurisdictions.
Acknowledgments
We thank the Physician Services Committee and the members of the Ontario Medical Association and the Ministry of Health and Long-Term Care for their support of this initiative. Conflict of Interest Statement: Dr Rosser and Dr Davis receive stipends for participation on the Guideline Advisory Committee. Ms Gilbart is employed full-time by the Committee through a grant from the Ministry of Health and Long-Term Care. Dr Rosser was a member of the CANMAT Depression Working Group which developed the top-scoring guideline in depression as chosen through the GAC assessment process.
1. Committee to Advise the Public Health Service on Clinical Practice Guidelines, Institute of Medicine. Field MJ, Lohr KN, eds. Clinical practice guidelines: directions for a new program. Washington, DC: National Academy Press; 1990.
2. Worrall G, Chaulk P, Freake D. The effects of clinical practice guidelines on patient outcomes in primary care: a systematic review. CMAJ 1997;156:1705-12.
3. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
4. Rosser WW, Beaulieu M. Institutional objectives for medical education that relates to the community. CMAJ 1984;130:683-89.
5. Cluzeau F, Littlejohns P, Grimshaw J, Feder G, Moran S. Development and application of a generic methodology to assess the quality of clinical guidelines. Int J Qual Health Care 1999;11:21-28.
6. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
With more than 1000 new guidelines produced annually over the past decade, it is impossible for the practicing family physician to determine which ones should be adapted into their clinical practice. The Ontario Ministry of Health and Long-Term Care and the Ontario Medical Association formed the Guideline Advisory Committee (GAC) in 1997 to assess and disseminate guidelines that would improve the quality and utilization of health care services in the province. Over the past 3 years the GAC has developed a strategy to identify important topics, to rank guidelines published on these topics based on the quality of their development, and to reformat guidelines as necessary to make them user-friendly for implementation in clinical practice. The GAC is currently assessing a number of strategies to enhance the dissemination of selected guidelines to improve the quality of care delivered in the province.
Key points for clinicians
A method of selecting, reviewing, and endorsing clinical practice guidelines has been established in the province of Ontario, Canada. Recommended guideline summaries are posted on a Web site with links to full text for easy access by practicing physicians (www.gacguidelines.ca).
Strategies for the successful implementation and impact evaluation of recommended guidelines are currently in development.
Clinical practice guidelines are statements that are systematically developed to assist physisican and patient decisions about appropriate health care for specific clinical circumstances.1 Published guidelines have become widely available through Internet technology; it has been estimated that more than 2500 exist. Most are produced by specific interest groups (eg, national societies and pharmaceutical companies), disseminated by publication in a medical journal or traditional mail, and seldom demonstrate any effect on clinical practice.2 Such a large volume of guidelines creates confusion for clinicians who often do not follow any of them because of the time required to assess their quality.3
With this dilemma in mind, the GAC was formed with members representing the Ontario Medical Association (OMA), the Ministry of Health and Long-Term Care (MOHLTC) in the province of Ontario, and one ex-officio member of the Institute for Clinical Evaluative Sciences (ICES). The GAC determined its first priority was to identify the best-quality guidelines available for clinicians on selected topics and to then promote their dissemination across the province. The purpose of our paper is to describe the methods that have been developed over the last 3 years to identify high-quality guidelines and some of the strategies being proposed for their dissemination, implementation, and evaluation. We also identify the best-quality guidelines for 10 common conditions.
Methods to assess the development of clinical practice guidelines
Topic Selection
Using a number of parameters, the GAC initially produced a grid as an assessment tool to identify priority areas for guideline review. Table 1 shows the basic grid incorporating provincial utilization and cost data, outcomes research, feedback from clinicians or health care organizations, and a previously published list of common and important problems in family practice.4 Feedback from the OMA sections indicated considerable confusion resulting from conflicting advice in specific areas as to appropriate practice (eg, screening for osteoporosis and diabetes). Utilization data from the MOHLTC demonstrated that the use of numerous procedures had rapidly increased over previous years; for example, diagnostic ultrasound utilization increased 65% in 1998. Practicing physicians also identified areas where there was a need for guidelines to be developed because of a lack of evidence or unknown best practice. The committee took all these factors into account when producing a list of priority topics for guideline assessment Table 2.
Guideline Assessment and Recommendation
Once a topic was chosen for assessment, a literature search was conducted by University of Toronto librarians to find all guidelines published in English over the past 10 years on that specific topic. The search strategy included databases such as MEDLINE and HealthStar, and guideline Web sites such as the National Guideline Clearinghouse and the Canadian Medical Association’s Clinical Practice Guideline Infobase. Copies of all guidelines identified in the search were then obtained. A survey of associations and interest groups in Ontario was also made to determine whether there were any unpublished guidelines that we had not identified in this process.
Initially, members of the committee carried out a literature search to determine if there were any publications about scoring the quality of the process used to produce the guidelines. Our search found some processes, but none that directly suited our needs. As a result, the GAC embarked on the development of a guideline-scoring instrument. After a year of work we realized that it would likely take 2 to 3 more years to adequately validate the instrument, and thus a decision was made to adopt the Appraisal Instrument for Clinical Guidelines5 (available at: www.sghms.ac.uk/phs/hceu/form.htm) to help determine quality guidelines in each clinical area, supplemented by the tool developed by the committee. The Appraisal Instrument consists of 37 items addressing 3 dimensions (Table 3). The classification system the committee is using to choose top-scoring guidelines after appraisal is as follows. An excellent guideline is one in which the majority of the dimensions (rigor of development, context and content, application) are well addressed by the guideline producers with minimal omission. The evidence is linked to the major recommendations, and the development process is robust. These types of guidelines are highly recommended.
A very good guideline is one in which many of the dimensions are addressed, and some of the recommendations are linked to evidence levels. Objectives and rationale for development are often clearly defined but may be lacking in other areas, such as application (eg, outcome measures, targets, risks, and benefits). These are generally well produced and useful for practicing clinicians and are recommended.
In a fair guideline, some of the dimensions are addressed, but there are some major omissions, often in terms of levels of evidence, literature search strategy, clarity, risks, and benefits. Often these documents are local adaptations of other guidelines. The information can sometimes be used as a general reference if user-friendly materials are incorporated, but these documents are generally not very useful as guidelines. These guidelines are recommended under special circumstances.
A poor guideline is one in which most of the dimensions are not well addressed, if at all. Often, it is unclear who produced these documents, and there is no description of the individuals involved. Levels of evidence and literature search strategy are rarely included, and there is no description of the methods used to formulate the recommendations. These guidelines are of little use to practicing clinicians and are not recommended.
Recognizing that recommending guidelines based on the quality of the process by which they were produced and the evidence used in their development would be controversial, we felt it was extremely important to develop a rigorous and objective scoring methodology. Fellows from the Department of Family and Community Medicine at the University of Toronto and community-based family physician volunteers from the OMA were brought together in 5 workshops. Each workshop included approximately 20 participants and consisted of a half-day session on the objectives of the GAC, a detailed review of the Appraisal Instrument, and a hands-on session where all participants evaluated the same guideline. Scores were then openly declared, and a discussion held on discrepancies identified in the assessments in an attempt to standardize the process. At the end of the session, interested participants were provided with an additional 5 guidelines to assess in the subsequent 2 weeks. The resulting appraisals were evaluated for consistency and inter-rater reliability (results indicate that using the instrument as an initial filter to determine the best-quality guidelines in each clinical area is a valid approach). To date, 45 assessors have been trained and are reviewing guidelines on an ongoing basis. Each guideline is evaluated a total of 3 times by independent assessors. Those guidelines that have been selected for recommendation in a particular clinical area are then reviewed for clinical relevance and applicability to the Ontario context. More than 250 published guidelines have been identified and distributed to physician assessors in the clinical areas shown in Table 2.
Reformatting
The GAC is in the process of determining the user-friendliness of recommended guidelines. Not infrequently, guidelines that are found to be the most evidence-based and objective are hundreds of pages in length and would be extremely burdensome for the average family physician to use. We anticipate that guidelines found to be of excellent quality but not convenient for use in clinical practice will need to be reformatted into user-friendly summaries. Volunteer physicians from the community will be asked to evaluate such summaries and provide feedback for improvement.
Dissemination
Once the best-quality guideline(s) on a topic are identified and reformatted as necessary, we intend to mount them on the GAC Web site (www.gacguidelines.ca) for use by the profession and the general public. Table 4 shows the results of the guideline selection process for the first 10 clinical areas. The process for choosing guidelines is transparent so that practicing physicians can determine for themselves the usefulness and applicability of the recommendations. Only the most rigorously developed guidelines will be posted on the Web site in the form of structured summaries, although interested clinicians can obtain the outcome of nonrecommended guideline appraisals on request.
Continuing medical education literature on dissemination strategies indicates that a single method, such as posting information on a Web site or mailing guidelines to clinicians, has a minimal effect on changing medical practice.6 The GAC is currently considering a number of options to enhance the dissemination of the best available guidelines. Since Ontario health data on diagnostic testing, hospitalization records, and office visits are collected provincially, it could be possible to measure clinical outcomes following the dissemination of evidence-based guidelines. We are currently working with provincial groups to disseminate guidelines through medical school continuing medical education (CME) division programs, peer presenter programs, small group CME programs, outreach facilitation programs, and a peer assessment program run by the provincial licensing body.
Conclusions
Over the past 3 years the GAC has developed a method to identify relevant guideline topics and assess the quality of the process by which the guidelines were developed. Clinically excellent guidelines may require some reformatting to make them user-friendly for implementation in clinical practice. The initial product of this process has been posted on the GAC Web site for access by the profession. The GAC is currently assessing and developing a number of strategies to more effectively disseminate guideline information and measure the impact of these interventions on the quality of medical care delivered to the people of Ontario. The GAC will report on the impact of these interventions to facilitate the exchange of successful implementation strategies across jurisdictions.
Acknowledgments
We thank the Physician Services Committee and the members of the Ontario Medical Association and the Ministry of Health and Long-Term Care for their support of this initiative. Conflict of Interest Statement: Dr Rosser and Dr Davis receive stipends for participation on the Guideline Advisory Committee. Ms Gilbart is employed full-time by the Committee through a grant from the Ministry of Health and Long-Term Care. Dr Rosser was a member of the CANMAT Depression Working Group which developed the top-scoring guideline in depression as chosen through the GAC assessment process.
With more than 1000 new guidelines produced annually over the past decade, it is impossible for the practicing family physician to determine which ones should be adapted into their clinical practice. The Ontario Ministry of Health and Long-Term Care and the Ontario Medical Association formed the Guideline Advisory Committee (GAC) in 1997 to assess and disseminate guidelines that would improve the quality and utilization of health care services in the province. Over the past 3 years the GAC has developed a strategy to identify important topics, to rank guidelines published on these topics based on the quality of their development, and to reformat guidelines as necessary to make them user-friendly for implementation in clinical practice. The GAC is currently assessing a number of strategies to enhance the dissemination of selected guidelines to improve the quality of care delivered in the province.
Key points for clinicians
A method of selecting, reviewing, and endorsing clinical practice guidelines has been established in the province of Ontario, Canada. Recommended guideline summaries are posted on a Web site with links to full text for easy access by practicing physicians (www.gacguidelines.ca).
Strategies for the successful implementation and impact evaluation of recommended guidelines are currently in development.
Clinical practice guidelines are statements that are systematically developed to assist physician and patient decisions about appropriate health care for specific clinical circumstances.1 Published guidelines have become widely available through Internet technology; it has been estimated that more than 2500 exist. Most are produced by specific interest groups (eg, national societies and pharmaceutical companies), disseminated by publication in a medical journal or traditional mail, and seldom demonstrate any effect on clinical practice.2 Such a large volume of guidelines creates confusion for clinicians, who often do not follow any of them because of the time required to assess their quality.3
1. Committee to Advise the Public Health Service on Clinical Practice Guidelines, Institute of Medicine. Field MJ, Lohr KN, eds. Clinical practice guidelines: directions for a new program. Washington, DC: National Academy Press; 1990.
2. Worrall G, Chaulk P, Freake D. The effects of clinical practice guidelines on patient outcomes in primary care: a systematic review. CMAJ 1997;156:1705-12.
3. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
4. Rosser WW, Beaulieu M. Institutional objectives for medical education that relates to the community. CMAJ. 1984;130:683-89.
5. Cluzeau F, Littlejohns P, Grimshaw J, Feder G, Moran S. Development and application of a generic methodology to assess the quality of clinical guidelines. Int J Qual Health Care 1999;11:21-28.
6. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
Answering Family Physicians’ Clinical Questions Using Electronic Medical Databases
STUDY DESIGN: Two family physicians attempted to answer 20 questions with each of the databases evaluated. The adequacy of the answers was determined by the 2 physician searchers, and an arbitration panel of 3 family physicians was used if there was disagreement.
DATA SOURCE: We identified 38 databases through nominations from national groups of family physicians, medical informaticians, and medical librarians; 14 of these databases met predetermined eligibility criteria.
OUTCOME MEASURED: The primary outcome was the proportion of questions adequately answered by each database and by combinations of databases. We also measured mean and median times to obtain adequate answers for individual databases.
RESULTS: The agreement between family physician searchers regarding the adequacy of answers was excellent (κ=0.94). Five individual databases (STAT!Ref, MDConsult, DynaMed, MAXX, and MDChoice.com) answered at least half of the clinical questions. Some combinations of databases answered 75% or more. The average time to obtain an adequate answer ranged from 2.4 to 6.5 minutes.
CONCLUSIONS: Several current electronic medical databases could answer most of a group of 20 clinical questions derived from family physicians during office practice. However, point-of-care searching is not yet fast enough to address most clinical questions identified during routine clinical practice.
Family physicians and general internists report an average of 6 questions for each half-day of office practice,1-3 and 70% of these questions remain unanswered. The 2 factors that significantly predict whether a physician will attempt to answer a clinical question are the physician’s belief that a definitive answer exists and the urgency of the patient’s problem.4
Gorman and colleagues3 reported that medical librarians found clear answers for 46% of 60 randomly selected questions from family physicians; 51% would affect practice. The medical librarians searched for an average of 43 minutes per question. In a second study,5 medical librarians used MEDLINE and textbooks to answer 86 questions from family physicians. The MEDLINE searches took a mean of 27 minutes, and textbook searches took a mean of 6 minutes. Search results answered 54% of the clinical questions completely or nearly completely. Physicians estimated that the answers would have a “major” or “fairly major” impact on practice for 35% of their questions. MEDLINE searches provided answers to 43% of the questions, while textbook searches provided answers for an additional 11%.
Many physicians do not have the searching skills or access to the range of knowledge resources that librarians use. Even if they did, they do not take the time to conduct such searches during patient care. One study1 found that physicians spent less than 2 minutes on average seeking an answer to a question. Thus, most clinical questions remain unanswered.
Electronic medical databases that provide answers directly (not just reference citations) may make it easier for clinicians to obtain answers at the point of care. We found no systematic evaluation of the capacity of such databases to answer clinical questions. We conducted this study to determine the extent to which current electronic medical databases can answer family physicians’ point-of-care clinical questions.
Methods
Database Selection
We solicited nominations for potentially suitable databases from multiple E-mail lists (including communities of family physicians [Family-L], medical informaticians [FAM-MED] and medical librarians [MEDLIB-L, MCMLA-L]) and through Web searches. A selection team consisting of 3 family physicians (J.S., D.W., B.E.) and a medical librarian (none of whom had financial relationships with any databases) determined whether the nominated databases met our inclusion criteria (Table 1).
Clinical Questions
More than 1200 clinical questions had been previously collected from observations of family physicians during office practice.1,5 These questions had been classified by typology (eg, Is test X indicated in situation Y?) and by topic (eg, dermatology).1 We selected questions from these sources that were categorized among the most common typologies (8 of 68 typologies covering 50% of the questions) and the most common topics (7 of 62 topics covering 43% of the questions). These combinations of typologies and topics accounted for 272 (23%) of the 1204 questions.
If necessary, each question was translated by 2 physicians (B.A. and D.W. working together) to meet the following criteria: (1) clear enough to imagine an applicable clinical scenario, (2) answerable (ie, the question could theoretically be answered using clinical references without further patient data regardless of whether an answer was known to exist), (3) clinically relevant, and (4) true to the original question (ie, containing the information need and the modifying factors of the original question).
Each question was then independently proofread by at least 2 other physicians and translated again if necessary. Thirteen questions (5%) that did not meet these criteria after a second translation were dropped. Forty-seven questions (17%) that referred to information needs that could be adequately answered using the Physicians’ Desk Reference6 were dropped (eg, Are Paxil tablets scored?). The remaining 212 questions represented 8 typologies.1 Two or 3 questions were randomly selected from each typology for a total of 20 questions (Table 2).
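The question-selection arithmetic above can be checked with a short sketch. The counts come from the text; the sampling helper is a hypothetical illustration only. The article says that 2 or 3 questions were drawn at random from each of 8 typologies for a total of 20, but it does not say how the typologies receiving a third question were chosen, so choosing them at random here is an assumption.

```python
import random

# Counts reported in the text.
TOTAL_COLLECTED = 1204
SELECTED = 272          # questions in the most common typologies and topics
DROPPED_UNCLEAR = 13    # still failed the criteria after a second translation
DROPPED_PDR = 47        # answerable from the Physicians' Desk Reference
REMAINING = SELECTED - DROPPED_UNCLEAR - DROPPED_PDR


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * part / whole)


def sample_questions(by_typology, total=20, seed=0):
    """Draw `total` questions spread as evenly as possible across typologies.

    Assumption: the typologies that receive one extra question are picked
    at random; the article does not describe this step.
    """
    rng = random.Random(seed)
    names = sorted(by_typology)
    base, extra = divmod(total, len(names))
    bonus = set(rng.sample(names, extra))
    return {
        name: rng.sample(by_typology[name], base + (1 if name in bonus else 0))
        for name in names
    }


if __name__ == "__main__":
    print(REMAINING, pct(SELECTED, TOTAL_COLLECTED))
    # Placeholder pool: 8 typologies with made-up question labels.
    pool = {f"typology_{i}": [f"q{i}_{j}" for j in range(25)] for i in range(8)}
    picked = sample_questions(pool)
    print({name: len(qs) for name, qs in picked.items()})
```

With 8 typologies and a target of 20, this allocation gives four typologies 3 questions and four typologies 2 questions.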
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed using computers with Pentium III processors with a 100 megabyte-per-second network connection to the Internet and server-mounted CD-ROMs.
Each searcher used the same 20 questions to evaluate each database. The order of evaluation of the databases was at the discretion of the searchers, but the testing of a database was completed before starting the testing of another database. Searchers became familiar with each database before testing it by using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate. An answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate, because it included clinically useful recommendations: History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate. For example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time it took to obtain it rounded to the nearest number of minutes (1-10).
Scoring and Arbitration
The 2 physician searchers judged the adequacy of the answers to each question for each database. If the searchers both found adequate answers, the result was accepted as adequate, and the average time required to find and interpret the answer was recorded. If neither searcher found an adequate answer, then the answer was deemed inadequate. If only one searcher found an adequate answer, the second searcher evaluated that answer. If the answer was acceptable to the second searcher, it was considered an adequate answer, and the time for the first searcher was recorded.
When searchers disagreed on the adequacy of identified answers, an arbitration panel consisting of 3 family physicians who were not affiliated with any of the databases met independently from the searchers to determine the adequacy of the answers by consensus.
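The scoring and arbitration rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' procedure as implemented: the names `SearchResult`, `adjudicate`, `peer_accepts`, and `panel_accepts` are hypothetical, and the time recorded when the arbitration panel accepts an answer is assumed to be the finder's time, which the article does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SearchResult:
    adequate: bool   # did this searcher find an adequate answer?
    minutes: int     # recorded search time, 1 to 10


def adjudicate(
    a: SearchResult,
    b: SearchResult,
    peer_accepts: Optional[bool] = None,
    panel_accepts: Optional[bool] = None,
) -> Tuple[bool, Optional[float]]:
    """Return (adequate, recorded_minutes) for one question in one database."""
    if a.adequate and b.adequate:
        # Both found adequate answers: record the average time.
        return True, (a.minutes + b.minutes) / 2
    if not a.adequate and not b.adequate:
        return False, None

    # Only one searcher found an answer; the other reviews it.
    finder = a if a.adequate else b
    if peer_accepts is None:
        raise ValueError("peer review outcome is required")
    if peer_accepts:
        return True, float(finder.minutes)

    # The searchers disagree: the 3-physician panel decides by consensus.
    if panel_accepts is None:
        raise ValueError("arbitration outcome is required")
    # Assumption: the finder's time is kept when the panel accepts the answer.
    return panel_accepts, (float(finder.minutes) if panel_accepts else None)


if __name__ == "__main__":
    print(adjudicate(SearchResult(True, 2), SearchResult(True, 4)))
    print(adjudicate(SearchResult(True, 3), SearchResult(False, 10), peer_accepts=True))
```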
Analysis
Our primary outcome was the proportion of questions adequately answered by each database. We calculated 95% confidence limits for the proportions of adequate answers.7 Means and medians were determined for the time to reach adequate answers for each database. We calculated the κ statistic for the independent findings of the 2 searchers and for the results after the searchers reviewed each other’s searches.8 We combined the results of individual databases to determine the proportion of questions answered by all combinations of 2, 3, and 4 databases. We considered the question adequately answered if any of the individual databases adequately answered the question.
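A minimal sketch of two of the statistics named above follows. The agreement statistic is computed as Cohen's kappa for 2 raters making adequate/inadequate judgments. The article cites a reference for its confidence limits without naming the method, so the Wilson score interval used here is an assumption, and the function names are hypothetical.

```python
import math


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving binary (True/False) ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(bool(a) for a in rater_a) / n   # share rated adequate by rater A
    p_b = sum(bool(b) for b in rater_b) / n   # share rated adequate by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method; z=1.96 for 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Made-up ratings for illustration only; these are not study data.
    a = [True, True, False, False]
    b = [True, False, False, False]
    print(cohen_kappa(a, b))
    print(wilson_interval(10, 20))
```

With only 20 questions per database, any such interval is wide, which is worth keeping in mind when comparing databases whose proportions differ by one or two questions.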
Results
Thirty-eight databases were nominated, and 24 did not meet our inclusion criteria (Table W1).* Fourteen databases met the inclusion criteria (Table 3) and were evaluated with the set of 20 questions (280 answer assessments) by 2 searchers. The Figure summarizes the process of evaluating the answers. The initial agreement between searchers was good (κ=0.69). Discussion between the searchers resolved 21 (52.5%) of the 40 discrepant answer assessments. These were due to inadequate searching or timing out (searching for 10 minutes) by one searcher, who agreed with the adequacy of the answer found by the other searcher. The agreement between searchers at this stage was excellent (κ=0.94).
The remaining 19 discrepant assessments (for which the searchers had different opinions regarding the adequacy of the answers identified) were referred to the arbitration panel for determination of the final results. Ten of these were deemed adequate.
Results for individual databases in rank order of proportion of questions answered followed by average time to identify adequate answers are reported in Table 3. The combination of STAT!Ref and MDConsult could answer 85% of our set of 20 questions. Four combinations of 2 databases (STAT!Ref and either MAXX, MDChoice.com, Primary Care Guidelines, or Medscape) could answer 80% of our questions. Two combinations of 3 databases (STAT!Ref, MDConsult, and either DynaMed or MAXX) could answer 90% of our questions. Combinations of 4 databases answered the most sample questions (95%, 19/20). These combinations consisted of STAT!Ref, DynaMed, MAXX, and either MDConsult or American Family Physician.
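The combination analysis amounts to taking the union of the questions answered by each database in a group and finding the groups with the largest union. A minimal sketch follows. The database names "A" to "D" and their answered question sets are hypothetical placeholders, not the study's results, and `best_combinations` is an illustrative name.

```python
from itertools import combinations


def best_combinations(answered, size, total_questions):
    """Find the database groups of a given size that cover the most questions.

    answered maps a database name to the set of question ids it adequately
    answered. A question counts as answered by a group if any member
    answered it. Returns (list of best groups, proportion covered).
    """
    best, best_count = [], -1
    for combo in combinations(sorted(answered), size):
        covered = set().union(*(answered[name] for name in combo))
        if len(covered) > best_count:
            best, best_count = [combo], len(covered)
        elif len(covered) == best_count:
            best.append(combo)   # keep ties, as several groups may do equally well
    return best, best_count / total_questions


# Hypothetical example with 6 questions and 4 databases.
answered = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 6},
}
combos, share = best_combinations(answered, size=2, total_questions=6)
```

Exhaustive enumeration is practical here because the number of groups is small (for 14 databases, 91 pairs, 364 triples, and 1001 groups of 4).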
We also evaluated combinations of databases that were available at no cost. The combination of the 2 no-cost databases that answered the largest proportion of questions (75%) was DynaMed and American Family Physician. The greatest proportion of clinical questions that could be answered using the freely available sources was 80%, and this required the use of 3 databases (DynaMed, MDChoice.com, and American Family Physician).
Discussion
Our study suggests that individual databases can answer a considerable proportion of family physicians’ clinical questions. Combinations of currently available databases can answer 75% or more. The searches in this study were based on the combination of efforts of 2 experienced physician searchers. These results may not be replicable in the practice setting but do provide an objective best-case scenario assessment of the content of these databases.
The time required to obtain answers, while much less than searching for original articles, is still longer than the 2-minute average time spent by family physicians in the study by Ely and colleagues.1 Our time estimates are not precise, as time was not the primary focus of our study. Time was only recorded in 1-minute intervals, so searches that took 10 seconds were recorded as 1 minute. Even so, the existence of median times to obtain adequate answers greater than 2 minutes suggests that these databases may require more time than most physicians will take to pursue answers during patient care.
This is the first study to systematically evaluate how many questions can be answered by electronic medical databases. The strengths of this study include the use of a standard set of common questions asked by family physicians, testing by 2 experienced family physician searchers, and a systematic replicable approach to the evaluation. The only similar study we identified was one in which Graber and coworkers9 used 10 clinical questions and tested a commercial site, 2 medical meta-lists, 4 general search engines, and 9 medicine-specific search engines to determine the efficiency of answering clinical questions on the Web. Different approaches answered from 0 to 6 of the 10 questions, but that study looked primarily at sites that were not generally designed for use in clinical practice.
Limitations
Our study was limited by the relatively small number of questions, causing wide confidence intervals. Some answers were present in the databases but not found despite the use of 2 searchers. For example, a database manager identified 2 answers that were not found but would have been considered adequate.
We accepted answers as adequate if, in our judgment, they offered a practical course of action. We did not attempt to determine whether the individual asking the question believed that the answer was adequate nor did we attempt to validate the accuracy or currency of answers using independent standards. Many of the answers were based on sources that were several years old, and few were based on explicit evidence-based criteria. Although we determined the adequacy of answers for clinical practice through formal mechanisms, an in vivo study in which the clinicians asking the questions determined the adequacy of their findings during patient care activities would provide a more accurate assessment.
Our study presents a static evaluation of a dynamic field. Over time, answers may be lost because of lack of maintenance of resource links or may be gained by addition of new materials. Our use of questions gathered several years ago may not accurately reflect the ability of databases to answer current questions, which may be more likely to reflect new tests and treatments.
Many of the databases were designed for purposes other than meeting clinical information needs at the point of care. Performance in this study does not reflect the capacity of these databases to address their stated purposes. For example, the Translating Research Into Practice (TRIP) database is an excellent resource for searches of a large collection of evidence-based resources. These resources are generally limited to summaries of studies with the highest methodologic quality. The TRIP database did not perform well in our study partly because most of our test questions (consistent with questions in clinical practice) cannot currently be answered using studies of the highest methodologic quality. Another example is Medical Matrix, which provides a search engine and annotated summaries for exploring the entire medical Internet and not just clinical reference information.
We did not study the costs involved in using the databases we evaluated, and these costs may have changed since our study was conducted. Most of the databases we included were free to use at the time of the study and at the time of this report. The 3 collections of textbooks required access fees. STAT!Ref, which scored the highest in our study, did so because we used the complete collection available to us through our institutional library. This collection would cost an individual $2189 annually at the time of our study. A starter library was available for $199 annually and would only answer 40% of the questions.
Context
Family physicians and other primary care providers treat patients who have a wide variety of syndromes and symptoms. Because of the scope and breadth of primary care, it is nearly impossible for a clinician to keep up with rapidly changing medical information.10
Connelly and colleagues11 surveyed 126 family physicians and found they used the Physicians’ Desk Reference and colleagues much more often than Index Medicus or computer-based bibliographic retrieval systems. Research literature was used infrequently and rated among the lowest in terms of credibility, availability, searchability, understandability, and applicability. Physicians preferred sources that had low cost and were relevant to specific patient problems over sources that had higher quality.
Conclusions
Current databases can answer a considerable proportion of clinical questions but have not reached their potential for efficiency. It is our hope that as electronic medical databases mature, they will be able to bridge this gap and bring the research literature to the point of care in useful and practical ways. This study provides a snapshot of how far we have come and how far we need to go to meet these needs.
Acknowledgments
Funding for our study was provided by a grant from the American Academy of Family Physicians to support the Center for Family Medicine Science and from 2 Bureau of Health Professions Awards (DHHS 1-D14-HP-00029-01, DHHS 5 T32 HP10038) from the Health Resources and Services Administration to the Department of Family and Community Medicine at University of Missouri-Columbia. The authors would like to acknowledge Erik Lindbloom, MD, MSPH, for assisting with the database testing as a substitute searcher for B.A.; E. Diane Johnson, MLS, for assisting with the selection of databases for study inclusion; Robert Phillips, Jr., MD, MSPH, for arbitration of questions and answers for which the searchers did not reach agreement along with B.E. and J.S.; David Cravens, Erik Lindbloom, Kevin Kane, Jim Brillhart, and Mark Ebell for proofreading the questions for clarity, answerability, and clinical relevance; John Ely and Lee Chambliss for providing clinical questions from their observations; Mark Ebell, John Ely, Erik Lindbloom, Jerry Osheroff, Lee Chambliss, David Mehr, Robin Kruse, John Smucny, and many others for constructive criticism in the design of this study; and Steve Zweig for editorial review.
1. Ely JW, Osheroff JA, Ebell MH, et al. Analysis of questions asked by family doctors regarding patient care. BMJ 1999;319:358-61.
2. Covell DG, Uman GC, Manning PR. Information needs in office practice: are they being met? Ann Intern Med 1985;103:596-99.
3. Gorman PN, Ash J, Wykoff L. Can primary care physicians’ questions be answered using the medical journal literature? Bull Med Lib Assoc 1994;82:140-46.
4. Gorman PN, Helfand M. Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Med Decis Mak 1995;15:113-19.
5. Chambliss ML, Conley J. Answering clinical questions. J Fam Pract 1996;43:140-44.
6. Medical Economics Physicians’ desk reference. 54th ed. Oradell, NJ: Medical Economics Company; 2000.
7. Pagano M, Gauvreau K. Inference on proportions. Principles of biostatistics. Belmont, Calif: Duxbury Press; 1993;297-298.
8. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. The clinical examination. Clinical epidemiology: a basic science for clinical medicine. Boston, Mass: Little, Brown and Company; 1991;29-30.
9. Graber MA, Bergus GR, York C. Using the World Wide Web to answer clinical questions: how efficient are different methods of information retrieval? J Fam Pract 1999;48:520-24.
10. Dickinson WP, Stange KC, Ebell MH, Ewigman BG, Green LA. Involving all family physicians and family medicine faculty members in the use and generation of new knowledge. Fam Med 2000;32:480-90.
11. Connelly DP, Rich EC, Curley SP, Kelly JT. Knowledge resource preferences of family physicians. J Fam Pract 1990;30:353-59.
STUDY DESIGN: Two family physicians attempted to answer 20 questions with each of the databases evaluated. The adequacy of the answers was determined by the 2 physician searchers, and an arbitration panel of 3 family physicians was used if there was disagreement.
DATA SOURCE: We identified 38 databases through nominations from national groups of family physicians, medical informaticians, and medical librarians; 14 of these databases met predetermined eligibility criteria.
OUTCOME MEASURED: The primary outcome was the proportion of questions adequately answered by each database and by combinations of databases. We also measured mean and median times to obtain adequate answers for individual databases.
RESULTS: The agreement between family physician searchers regarding the adequacy of answers was excellent (κ=0.94). Five individual databases (STAT!Ref, MDConsult, DynaMed, MAXX, and MDChoice.com) answered at least half of the clinical questions. Some combinations of databases answered 75% or more. The average time to obtain an adequate answer ranged from 2.4 to 6.5 minutes.
CONCLUSIONS: Several current electronic medical databases could answer most of a group of 20 clinical questions derived from family physicians during office practice. However, point-of-care searching is not yet fast enough to address most clinical questions identified during routine clinical practice.
Family physicians and general internists report an average of 6 questions for each half-day of office practice,1-3 and 70% of these questions remain unanswered. The 2 factors that significantly predict whether a physician will attempt to answer a clinical question are the physician’s belief that a definitive answer exists and the urgency of the patient’s problem.4
Gorman and colleagues3 reported that medical librarians found clear answers for 46% of 60 randomly selected questions from family physicians; 51% would affect practice. The medical librarians searched for an average of 43 minutes per question. In a second study,5 medical librarians used MEDLINE and textbooks to answer 86 questions from family physicians. The MEDLINE searches took a mean of 27 minutes, and textbook searches took a mean of 6 minutes. Search results answered 54% of the clinical questions completely or nearly completely. Physicians estimated that the answers would have a “major” or “fairly major” impact on practice for 35% of their questions. MEDLINE searches provided answers to 43% of the questions, while textbook searches provided answers for an additional 11%.
Many physicians do not have the searching skills or access to the range of knowledge resources that librarians use. Even if they did, they do not take the time to conduct such searches during patient care. One study1 found that physicians spent less than 2 minutes on average seeking an answer to a question. Thus, most clinical questions remain unanswered.
Electronic medical databases that provide answers directly (not just reference citations) may make it easier for clinicians to obtain answers at the point of care. We found no systematic evaluation of the capacity of such databases to answer clinical questions. We conducted this study to determine the extent to which current electronic medical databases can answer family physicians’ point-of-care clinical questions.
Methods
Database Selection
We solicited nominations for potentially suitable databases from multiple E-mail lists (including communities of family physicians [Family-L], medical informaticians [FAM-MED], and medical librarians [MEDLIB-L, MCMLA-L]) and through Web searches. A selection team consisting of 3 family physicians (J.S., D.W., B.E.) and a medical librarian (none of whom had financial relationships with any databases) determined whether the nominated databases met our inclusion criteria (Table 1).
Clinical Questions
More than 1200 clinical questions had been previously collected from observations of family physicians during office practice.1,5 These questions had been classified by typology (eg, Is test X indicated in situation Y?) and by topic (eg, dermatology).1 We selected questions from these sources that were categorized among the most common typologies (8 of 68 typologies covering 50% of the questions) and the most common topics (7 of 62 topics covering 43% of the questions). These combinations of typologies and topics accounted for 272 (23%) of the 1204 questions.
If necessary, each question was translated by 2 physicians (B.A. and D.W. working together) to meet the following criteria: (1) clear enough to imagine an applicable clinical scenario, (2) answerable (ie, the question could theoretically be answered using clinical references without further patient data regardless of whether an answer was known to exist), (3) clinically relevant, and (4) true to the original question (ie, containing the information need and the modifying factors of the original question).
Each question was then independently proofread by at least 2 other physicians and translated again if necessary. Thirteen questions (5%) that did not meet these criteria after a second translation were dropped. Forty-seven questions (17%) that referred to information needs that could be adequately answered using the Physicians’ Desk Reference6 were dropped (eg, Are Paxil tablets scored?). The remaining 212 questions represented 8 typologies.1 Two or 3 questions were randomly selected from each typology for a total of 20 questions (Table 2).
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed using computers with Pentium III processors with a 100 megabyte-per-second network connection to the Internet and server-mounted CD-ROMs.
Each searcher used the same 20 questions to evaluate each database. The order of evaluation of the databases was at the discretion of the searchers, but the testing of a database was completed before starting the testing of another database. Searchers became familiar with each database before testing it by using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate. An answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate, because it included clinically useful recommendations: History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate. For example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time it took to obtain it rounded to the nearest number of minutes (1-10).
STUDY DESIGN: Two family physicians attempted to answer 20 questions with each of the databases evaluated. The adequacy of the answers was determined by the 2 physician searchers, and an arbitration panel of 3 family physicians was used if there was disagreement.
DATA SOURCE: We identified 38 databases through nominations from national groups of family physicians, medical informaticians, and medical librarians; 14 of these databases met predetermined eligibility criteria.
OUTCOME MEASURED: The primary outcome was the proportion of questions adequately answered by each database and by combinations of databases. We also measured mean and median times to obtain adequate answers for individual databases.
RESULTS: The agreement between family physician searchers regarding the adequacy of answers was excellent (k=0.94). Five individual databases (STAT! Ref, MDConsult, DynaMed, MAXX, and MDChoice.com) answered at least half of the clinical questions. Some combinations of databases answered 75% or more. The average time to obtain an adequate answer ranged from 2.4 to 6.5 minutes.
CONCLUSIONS: Several current electronic medical databases could answer most of a group of 20 clinical questions derived from family physicians during office practice. However, point-of-care searching is not yet fast enough to address most clinical questions identified during routine clinical practice.
Family physicians and general internists report an average of 6 questions for each half-day of office practice,1-3 and 70% of these questions remain unanswered. The 2 factors that significantly predict whether a physician will attempt to answer a clinical question are the physician’s belief that a definitive answer exists and the urgency of the patient’s problem.4
Gorman and colleagues3 reported that medical librarians found clear answers for 46% of 60 randomly selected questions from family physicians; 51% would affect practice. The medical librarians searched for an average of 43 minutes per question. In a second study,5 medical librarians used MEDLINE and textbooks to answer 86 questions from family physicians. The MEDLINE searches took a mean of 27 minutes, and textbook searches took a mean of 6 minutes. Search results answered 54% of the clinical questions completely or nearly completely. Physicians estimated that the answers would have a “major” or “fairly major” impact on practice for 35% of their questions. MEDLINE searches provided answers to 43% of the questions, while textbook searches provided answers for an additional 11%.
Many physicians do not have the searching skills or access to the range of knowledge resources that librarians use. Even if they did, they do not take the time to conduct such searches during patient care. One study1 found that physicians spent less than 2 minutes on average seeking an answer to a question. Thus, most clinical questions remain unanswered.
Electronic medical databases that provide answers directly (not just reference citations) may make it easier for clinicians to obtain answers at the point of care. We found no systematic evaluation of the capacity of such databases to answer clinical questions. We conducted this study to determine the extent to which current electronic medical databases can answer family physicians’ point-of-care clinical questions.
Methods
Database Selection
We solicited nominations for potentially suitable databases from multiple E-mail lists (including communities of family physicians [Family-L], medical informaticians [FAM-MED], and medical librarians [MEDLIB-L, MCMLA-L]) and through Web searches. A selection team consisting of 3 family physicians (J.S., D.W., B.E.) and a medical librarian (none of whom had financial relationships with any databases) determined whether the nominated databases met our inclusion criteria (Table 1).
Clinical Questions
More than 1200 clinical questions had been previously collected from observations of family physicians during office practice.1,5 These questions had been classified by typology (eg, Is test X indicated in situation Y?) and by topic (eg, dermatology).1 We selected questions from these sources that were categorized among the most common typologies (8 of 68 typologies covering 50% of the questions) and the most common topics (7 of 62 topics covering 43% of the questions). These combinations of typologies and topics accounted for 272 (23%) of the 1204 questions.
If necessary, each question was translated by 2 physicians (B.A. and D.W. working together) to meet the following criteria: (1) clear enough to imagine an applicable clinical scenario, (2) answerable (ie, the question could theoretically be answered using clinical references without further patient data regardless of whether an answer was known to exist), (3) clinically relevant, and (4) true to the original question (ie, containing the information need and the modifying factors of the original question).
Each question was then independently proofread by at least 2 other physicians and translated again if necessary. Thirteen questions (5%) that did not meet these criteria after a second translation were dropped. Forty-seven questions (17%) that referred to information needs that could be adequately answered using the Physicians’ Desk Reference6 were dropped (eg, Are Paxil tablets scored?). The remaining 212 questions represented 8 typologies.1 Two or 3 questions were randomly selected from each typology for a total of 20 questions (Table 2).
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed using computers with Pentium III processors with a 100 megabyte-per-second network connection to the Internet and server-mounted CD-ROMs.
Each searcher used the same 20 questions to evaluate each database. The order of evaluation of the databases was at the discretion of the searchers, but the testing of a database was completed before starting the testing of another database. Searchers became familiar with each database before testing it by using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate. An answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate, because it included clinically useful recommendations: History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate. For example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time it took to obtain it rounded to the nearest number of minutes (1-10).
Scoring and Arbitration
The 2 physician searchers judged the adequacy of the answers to each question for each database. If the searchers both found adequate answers, the result was accepted as adequate, and the average time required to find and interpret the answer was recorded. If neither searcher found an adequate answer, then the answer was deemed inadequate. If only one searcher found an adequate answer, the second searcher evaluated that answer. If the answer was acceptable to the second searcher, it was considered an adequate answer, and the time for the first searcher was recorded.
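The scoring rules above can be read as a small decision procedure. The following Python sketch is one possible encoding of them; the function name, argument names, and status labels are illustrative assumptions and do not come from the study.

```python
from typing import Optional, Tuple


def adjudicate(found_a: bool, found_b: bool,
               time_a: Optional[float], time_b: Optional[float],
               other_accepts: Optional[bool] = None) -> Tuple[str, Optional[float]]:
    """Combine two searchers' results for one question in one database.

    Returns (status, recorded_time_in_minutes). Status labels are hypothetical.
    """
    if found_a and found_b:
        # Both found an adequate answer: record the average time.
        return "adequate", (time_a + time_b) / 2
    if not found_a and not found_b:
        # Neither found an adequate answer.
        return "inadequate", None
    # Exactly one searcher found an answer; the other must review it.
    finder_time = time_a if found_a else time_b
    if other_accepts is None:
        return "needs_review", None
    if other_accepts:
        # Accepted on review: record the time of the searcher who found it.
        return "adequate", finder_time
    # Searchers disagree on adequacy: refer to the arbitration panel.
    return "arbitration", None
```

For example, `adjudicate(True, False, 3, None, other_accepts=False)` returns the `"arbitration"` status, which corresponds to the cases referred to the panel described in the next paragraph.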
When searchers disagreed on the adequacy of identified answers, an arbitration panel consisting of 3 family physicians who were not affiliated with any of the databases met independently from the searchers to determine the adequacy of the answers by consensus.
Analysis
Our primary outcome was the proportion of questions adequately answered by each database. We calculated 95% confidence limits for the proportions of adequate answers.7 Means and medians were determined for the time to reach adequate answers for each database. We calculated the κ statistic for the independent findings of the 2 searchers and for the results after the searchers reviewed each other’s searches.8 We combined the results of individual databases to determine the proportion of questions answered by all combinations of 2, 3, and 4 databases. We considered the question adequately answered if any of the individual databases adequately answered the question.
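As an illustration of the confidence limits for a proportion, the sketch below uses the normal-approximation (Wald) interval. The text does not say which interval formula was applied, so this choice is an assumption, and the function name is hypothetical. With only 20 questions the approximation is rough, which fits the wide intervals noted under Limitations.

```python
import math
from typing import Tuple


def proportion_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence limits for a proportion.

    z = 1.96 gives approximately 95% limits. Limits are clipped to [0, 1].
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical example: 14 of 20 questions adequately answered.
low, high = proportion_ci(14, 20)
print(f"{14 / 20:.0%} (95% CI, {low:.0%}-{high:.0%})")
```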
Results
Thirty-eight databases were nominated, and 24 did not meet our inclusion criteria (Table W1).* Fourteen databases met the inclusion criteria (Table 3) and were evaluated with the set of 20 questions (280 answer assessments) by 2 searchers. The Figure summarizes the process of evaluating the answers. The initial agreement between searchers was good (κ=0.69). Discussion between the searchers resolved 21 (52.5%) of the 40 discrepant answer assessments. These were due to inadequate searching or timing out (searching for 10 minutes) by one searcher, who agreed with the adequacy of the answer found by the other searcher. The agreement between searchers at this stage was excellent (κ=0.94).
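For readers unfamiliar with the κ statistic, the following Python sketch shows how Cohen's κ is computed for 2 raters making adequate/inadequate judgments. The cell counts in the example are hypothetical, because the underlying 2×2 table is not reported here, and the function name is an assumption.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary (adequate/inadequate) rating."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("no assessments")
    observed = (both_yes + both_no) / n        # observed agreement
    a_yes = (both_yes + a_only) / n            # rater A's rate of "adequate"
    b_yes = (both_yes + b_only) / n            # rater B's rate of "adequate"
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not the study's data.
print(round(cohen_kappa(both_yes=40, a_only=10, b_only=10, both_no=40), 2))
```

Because κ discounts the agreement expected by chance, it is always at or below the raw proportion of matching assessments.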
The remaining 19 discrepant assessments (for which the searchers had different opinions regarding the adequacy of the answers identified) were referred to the arbitration panel for determination of the final results. Ten of these were deemed adequate.
Results for individual databases in rank order of proportion of questions answered followed by average time to identify adequate answers are reported in Table 3. The combination of STAT!Ref and MDConsult could answer 85% of our set of 20 questions. Four combinations of 2 databases (STAT!Ref and either MAXX, MDChoice.com, Primary Care Guidelines, or Medscape) could answer 80% of our questions. Two combinations of 3 databases (STAT!Ref, MDConsult, and either DynaMed or MAXX) could answer 90% of our questions. Combinations of 4 databases answered the most sample questions (95%, 19/20). These combinations consisted of STAT!Ref, DynaMed, MAXX, and either MDConsult or American Family Physician.
We also evaluated combinations of databases that were available at no cost. The combination of the 2 no-cost databases that answered the largest proportion of questions (75%) was DynaMed and American Family Physician. The greatest proportion of clinical questions that could be answered using the freely available sources was 80%, and this required the use of 3 databases (DynaMed, MDChoice.com, and American Family Physician).
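The combination analysis treats a question as answered if any database in the combination answered it, which amounts to taking the union of per-database answer sets. A minimal Python sketch of that search follows; the database labels and question numbers in the example are placeholders, since per-question results are not listed in the text.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def best_combinations(answered: Dict[str, Set[int]], size: int,
                      n_questions: int) -> Tuple[List[Tuple[str, ...]], float]:
    """Find the combinations of `size` databases that answer the most questions.

    `answered` maps a database name to the set of question numbers it answered.
    Returns (list of tied best combinations, proportion of questions answered).
    """
    best: List[Tuple[str, ...]] = []
    best_count = -1
    for combo in combinations(sorted(answered), size):
        # A question counts as answered if any database in the combo answered it.
        covered = set().union(*(answered[name] for name in combo))
        if len(covered) > best_count:
            best, best_count = [combo], len(covered)
        elif len(covered) == best_count:
            best.append(combo)
    return best, best_count / n_questions


# Placeholder data: 3 hypothetical databases and 5 questions.
example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 2}}
print(best_combinations(example, size=2, n_questions=5))
```

With 14 databases, there are 91 pairs, 364 triples, and 1001 sets of 4, so exhaustive enumeration is inexpensive.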
Discussion
Our study suggests that individual databases can answer a considerable proportion of family physicians’ clinical questions. Combinations of currently available databases can answer 75% or more. The searches in this study were based on the combination of efforts of 2 experienced physician searchers. These results may not be replicable in the practice setting but do provide an objective best-case scenario assessment of the content of these databases.
The time required to obtain answers, while much less than searching for original articles, is still longer than the 2-minute average time spent by family physicians in the study by Ely and colleagues.1 Our time estimates are not precise, as time was not the primary focus of our study. Time was only recorded in 1-minute intervals, so searches that took 10 seconds were recorded as 1 minute. Even so, the existence of median times to obtain adequate answers greater than 2 minutes suggests that these databases may require more time than most physicians will take to pursue answers during patient care.
This is the first study to systematically evaluate how many questions can be answered by electronic medical databases. The strengths of this study include the use of a standard set of common questions asked by family physicians, testing by 2 experienced family physician searchers, and a systematic replicable approach to the evaluation. The only similar study we identified was one in which Graber and coworkers9 used 10 clinical questions and tested a commercial site, 2 medical meta-lists, 4 general search engines, and 9 medicine-specific search engines to determine the efficiency of answering clinical questions on the Web. Different approaches answered from 0 to 6 of the 10 questions, but that study looked primarily at sites that were not generally designed for use in clinical practice.
Limitations
Our study was limited by the relatively small number of questions, causing wide confidence intervals. Some answers were present in the databases but not found despite the use of 2 searchers. For example, a database manager identified 2 answers that were not found but would have been considered adequate.
We accepted answers as adequate if, in our judgment, they offered a practical course of action. We did not attempt to determine whether the individual asking the question believed that the answer was adequate nor did we attempt to validate the accuracy or currency of answers using independent standards. Many of the answers were based on sources that were several years old, and few were based on explicit evidence-based criteria. Although we determined the adequacy of answers for clinical practice through formal mechanisms, an in vivo study in which the clinicians asking the questions determined the adequacy of their findings during patient care activities would provide a more accurate assessment.
Our study presents a static evaluation of a dynamic field. Over time, answers may be lost because of lack of maintenance of resource links or may be gained by addition of new materials. Our use of questions gathered several years ago may not accurately reflect the ability of databases to answer current questions, which may be more likely to reflect new tests and treatments.
Many of the databases were designed for purposes other than meeting clinical information needs at the point of care. Performance in this study does not reflect the capacity of these databases to address their stated purposes. For example, the Translating Research Into Practice (TRIP) database is an excellent resource for searches of a large collection of evidence-based resources. These resources are generally limited to summaries of studies with the highest methodologic quality. The TRIP database did not perform well in our study partly because most of our test questions (consistent with questions in clinical practice) cannot currently be answered using studies of the highest methodologic quality. Another example is Medical Matrix, which provides a search engine and annotated summaries for exploring the entire medical Internet and not just clinical reference information.
We did not study the costs involved in using the databases we evaluated, and these costs may have changed since our study was conducted. Most of the databases we included were free to use at the time of the study and at the time of this report. The 3 collections of textbooks required access fees. STAT!Ref, which scored the highest in our study, did so because we used the complete collection available to us through our institutional library. This collection would cost an individual $2189 annually at the time of our study. A starter library was available for $199 annually and would only answer 40% of the questions.
Context
Family physicians and other primary care providers treat patients who have a wide variety of syndromes and symptoms. Because of the scope and breadth of primary care, it is nearly impossible for a clinician to keep up with rapidly changing medical information.10
Connelly and colleagues11 surveyed 126 family physicians and found they used the Physicians’ Desk Reference and colleagues much more often than Index Medicus or computer-based bibliographic retrieval systems. Research literature was used infrequently and rated among the lowest in terms of credibility, availability, searchability, understandability, and applicability. Physicians preferred sources that had low cost and were relevant to specific patient problems over sources that had higher quality.
Conclusions
Current databases can answer a considerable proportion of clinical questions but have not reached their potential for efficiency. It is our hope that as electronic medical databases mature, they will be able to bridge this gap and bring the research literature to the point of care in useful and practical ways. This study provides a snapshot of how far we have come and how far we need to go to meet these needs.
Acknowledgments
Funding for our study was provided by a grant from the American Academy of Family Physicians to support the Center for Family Medicine Science and from 2 Bureau of Health Professions Awards (DHHS 1-D14-HP-00029-01, DHHS 5 T32 HP10038) from the Health Resources and Services Administration to the Department of Family and Community Medicine at University of Missouri-Columbia. The authors would like to acknowledge Erik Lindbloom, MD, MSPH, for assisting with the database testing as a substitute searcher for B.A.; E. Diane Johnson, MLS, for assisting with the selection of databases for study inclusion; Robert Phillips, Jr., MD, MSPH, for arbitration of questions and answers for which the searchers did not reach agreement along with B.E. and J.S.; David Cravens, Erik Lindbloom, Kevin Kane, Jim Brillhart, and Mark Ebell for proofreading the questions for clarity, answerability, and clinical relevance; John Ely and Lee Chambliss for providing clinical questions from their observations; Mark Ebell, John Ely, Erik Lindbloom, Jerry Osheroff, Lee Chambliss, David Mehr, Robin Kruse, John Smucny, and many others for constructive criticism in the design of this study; and Steve Zweig for editorial review.
1. Ely JW, Osheroff JA, Ebell MH, et al. Analysis of questions asked by family doctors regarding patient care. BMJ 1999;319:358-61.
2. Covell DG, Uman GC, Manning PR. Information needs in office practice: are they being met? Ann Intern Med 1985;103:596-99.
3. Gorman PN, Ash J, Wykoff L. Can primary care physicians’ questions be answered using the medical journal literature? Bull Med Lib Assoc 1994;82:140-46.
4. Gorman PN, Helfand M. Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Med Decis Mak 1995;15:113-19.
5. Chambliss ML, Conley J. Answering clinical questions. J Fam Pract 1996;43:140-44.
6. Medical Economics Physicians’ desk reference. 54th ed. Oradell, NJ: Medical Economics Company; 2000.
7. Pagano M, Gauvreau K. Inference on proportions. Principles of biostatistics. Belmont, Calif: Duxbury Press; 1993;297-298.
8. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. The clinical examination. Clinical epidemiology: a basic science for clinical medicine. Boston, Mass: Little, Brown and Company; 1991;29-30.
9. Graber MA, Bergus GR, York C. Using the World Wide Web to answer clinical questions: how efficient are different methods of information retrieval? J Fam Pract 1999;48:520-24.
10. Dickinson WP, Stange KC, Ebell MH, Ewigman BG, Green LA. Involving all family physicians and family medicine faculty members in the use and generation of new knowledge. Fam Med 2000;32:480-90.
11. Connelly DP, Rich EC, Curley SP, Kelly JT. Knowledge resource preferences of family physicians. J Fam Pract 1990;30:353-59.
Are β2-agonists Effective Treatment for Acute Bronchitis or Acute Cough in Patients Without Underlying Pulmonary Disease? A Systematic Review
STUDY DESIGN: We performed a systematic review including meta-analysis.
DATA SOURCES: We included randomized controlled trials comparing β2-agonists with placebo or alternative therapies identified from the Cochrane Library, MEDLINE, EMBASE, conference proceedings, Science Citation Index, the System for Information on Grey Literature in Europe, and letters to manufacturers of β2-agonists.
OUTCOME MEASURED: We measured duration, persistence, severity or frequency of cough, productive cough, and night cough; duration of activity limitations; and adverse effects.
RESULTS: Two trials in children with cough and no obvious airway obstruction did not find any benefits from β2-agonists. Five trials in adults with cough and with or without airway obstruction had mixed results, but summary statistics did not reveal any significant benefits from β2-agonists. Studies that enrolled more wheezing patients were more likely to show benefits from β2-agonists, and in one study only patients with evidence of airflow limitation were more likely to benefit. Patients given β2-agonists were more likely to report tremor, shakiness, or nervousness than those in the control groups.
CONCLUSIONS: There is no evidence to support using β2-agonists in children with acute cough and no evidence of airflow obstruction. There is little evidence that the routine use of β2-agonists for adults with acute cough is helpful. These agents may reduce symptoms, including cough, in patients with evidence of airflow obstruction, but this potential benefit is not well-supported by the available data and must be weighed against the adverse effects associated with β2-agonists.
Acute bronchitis is characterized by cough associated with other symptoms of upper respiratory infection. Although this condition is self-limited, most patients feel ill, and many do not perform their usual activities. The optimal treatment for this common condition in patients who do not have underlying pulmonary disease is not clear. Clinicians often prescribe antibiotics,1,2 in spite of the fact that they are of little overall benefit.3,4 It is important to examine the effectiveness of alternative approaches.
β2-agonists have been proposed, because healthy patients have impaired airflow when infected with pathogens known to cause acute bronchitis.5-7 Also, cough is the primary symptom for some patients who have asthma,8 and many of these patients benefit from β2-agonists.9 β2-agonists are effective in reducing cough due to other causes, such as bronchoscopy10 and intravenous fentanyl.11
We conducted this systematic review to determine whether β2-agonists are effective for patients who have acute bronchitis without underlying pulmonary disease. If β2-agonists are effective, then they should be more widely used; only a minority of US family physicians currently prescribe them for acute bronchitis.2,12
Methods
We attempted to locate all controlled trials that compared β2-agonists with placebo or an alternative treatment in patients older than 2 years who presented with acute bronchitis or acute cough without a clear etiology (eg, pneumonia, pertussis, or sinusitis). We included patients with acute cough, because the clinical definition of acute bronchitis is not standardized. Textbooks13-15 and clinician studies16,17 instruct that cough in association with an acute respiratory infection is required for a diagnosis; otherwise, there are differing criteria regarding the need for other symptoms and signs, such as dyspnea, abnormal chest findings, and sputum.
We searched MEDLINE (1966-2000), EMBASE (1974-2000), and The Cochrane Library (through August 2000) using the key words “bronchitis” or “cough”, together with the terms “adrenergic beta-agonist (exp),” “bronchodilator agents (exp),” “sympathomimetic (exp),” “albuterol,” “salbutamol,” “bitolterol,” “isoetharine,” “metaproterenol,” “pirbuterol,” “salmeterol,” “terbutaline,” “fenoterol,” “formoterol,” or “procaterol” (note that albuterol and salbutamol are the same compound). We also searched conference proceedings databases (Inside Conferences, 1993-99; Conference Papers Index, 1973-99); the System for Information on Grey Literature in Europe database (1980-2000); the reference lists of retrieved articles, review articles, and textbooks; and the Science Citation Index (1990-2000). Finally, we wrote to all US manufacturers of brand name β2-agonists. There were no language restrictions in our search.
Two investigators (C.F., J.S.) independently reviewed all the retrieved titles and abstracts. Studies selected by either investigator as possibly meeting the inclusion criteria were retrieved in their entirety. One investigator (J.S.) then deleted the journal of publication, title, authors, affiliations, and results sections of each study that met the inclusion criteria, and compiled a list of all the reported outcomes. The list of outcomes was forwarded to the other 3 investigators who independently, and then through discussion, determined which outcomes would be included in our review. The main criterion for selection was that the outcome should be directly important to patients. The same 3 investigators then graded the quality of each study using the 5-point Jadad scale, with points given for method of randomization (0-2), adequacy of blinding (0-2), and description of withdrawals (0-1).18 The Jadad scale is a validated, well-accepted, and frequently used quality assessment scale. Agreement on quality was assessed with a κ score, and disagreements were resolved by discussion. Trials were excluded if all investigators agreed that the trial did not meet our inclusion criteria. The remaining articles in their entirety were then distributed to all investigators, each of whom independently extracted data for the selected outcomes. Disagreements were resolved by discussion. We attempted to contact authors to obtain missing data.
Summary statistics were calculated using Review Manager 4.1 software (Update Software, Oxford, England). We used fixed effects models for outcomes without statistically significant heterogeneity (at P <.10) and random effects models for outcomes with significant heterogeneity. For dichotomous outcomes, we reported relative risks (RRs), absolute risk reductions, and numbers needed to treat (NNTs), and for continuous outcomes, standardized mean differences (SMD). We considered a level of P less than .05 to be statistically significant.
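To make the dichotomous effect measures concrete, the sketch below computes the relative risk, absolute risk reduction, and number needed to treat from the event counts of a single 2-arm trial. It is an illustration only: it does not reproduce the pooling performed by Review Manager, the log-scale confidence interval is a standard large-sample approximation chosen here as an assumption, and the function name and counts are hypothetical.

```python
import math
from typing import Dict


def risk_summary(events_treated: int, n_treated: int,
                 events_control: int, n_control: int) -> Dict[str, float]:
    """RR, ARR, and NNT for one trial, with an approximate 95% CI for the RR."""
    if events_treated <= 0 or events_control <= 0:
        raise ValueError("this simple sketch needs at least 1 event in each arm")
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    arr = risk_control - risk_treated          # absolute risk reduction
    nnt = math.inf if arr == 0 else 1 / abs(arr)
    # Standard error of log(RR), large-sample approximation.
    se_log_rr = math.sqrt(1 / events_treated - 1 / n_treated
                          + 1 / events_control - 1 / n_control)
    return {
        "rr": rr,
        "rr_low": math.exp(math.log(rr) - 1.96 * se_log_rr),
        "rr_high": math.exp(math.log(rr) + 1.96 * se_log_rr),
        "arr": arr,
        "nnt": nnt,
    }


# Hypothetical counts: 20 of 50 treated and 30 of 50 control patients still coughing.
print(risk_summary(20, 50, 30, 50))
```

When the confidence interval for the RR includes 1, as in this hypothetical example, the difference is not statistically significant at the .05 level.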
Results
Included Studies
The major characteristics of the trials are shown in Table 1. We included 6 controlled trials comparing β2-agonists and placebo,19-24 and one trial comparing a β2-agonist with erythromycin.25 A trial comparing a β2-agonist with placebo in children26 was excluded because all participants had recurrent cough and the mean duration of cough (8 weeks) was much longer than the maximum of 30 days used in the other trials.
All trials enrolled patients who presented to primary care settings. The stated diagnoses were “acute bronchitis,”21,22,25 “acute cough,”19,20 and “acute transient cough.”23,24 Both trials in children excluded participants with abnormal lung examinations19 or “with bronchial obstruction needing bronchodilating medication.”23 None of the adult trials excluded patients with wheezing; the percentage with wheezing ranged from 20% to 44% in the 4 trials that mentioned it. All adult trials included both smokers and nonsmokers.
The only trial that mentioned how well patients adhered to study medications25 reported more than 95% compliance for both groups. Regarding co-interventions, 3 trials prohibited other antitussives19,23,24; 3 trials allowed them and recorded their use as an outcome20,21,25; and one trial did not mention co-interventions.22 One trial prohibited the use of antibiotics24; other trials comparing β2-agonists to placebo allowed the use of antibiotics at the discretion of the clinician (except as noted for the 1994 study by Hueston21). No trials were clearly sponsored by pharmaceutical manufacturers, but the medications were supplied free of charge by manufacturers in 3 studies.19,22,24
The quality of the trials varied from 2 to 4 on the Jadad scale (Table 1). The κ score for reviewers’ quality scores was 0.27, indicating only fair agreement. The majority of the disagreements related to different initial interpretations of the adequacy of blinding and description of withdrawals. These differences were resolved with further discussion.
Data Analysis
The clinical heterogeneity of the trials was so great that examining them as a single group did not seem reasonable. Therefore, we initially examined the trials as follows: (1) those in children, (2) those in adults comparing β2-agonists with placebo, and (3) those in adults comparing β2-agonists with erythromycin. We then combined the data from the trial that compared a β2-agonist with erythromycin with those from the other trials in adults in a secondary analysis.
Trials in Children
Neither trial involving children demonstrated any benefits from albuterol (Table 2). Combining the daily cough scores for days 1 to 3 for these trials revealed a trend toward worse scores in the group receiving albuterol (Table 3). The results from the 2 trials were homogeneous.
Trials in Adults Comparing β2-agonists with Placebo
The results of the placebo-controlled trials in adults were mixed; one trial found no benefit from β2-agonists, and 3 found at least one benefit. Combining the daily cough severity scores for the 3 trials that included this outcome20,22,24 showed a small nonsignificant trend toward improvement on all days. The results from the individual trials were heterogeneous for day 1 and homogeneous for the other days.
Combining data from the trials that examined persistence of symptoms after a full 7 days of treatment20-22 yielded no significant difference in presence of cough or night cough (Table 4). Combined data also do not show a difference regarding the presence of a productive cough after 7 days or a difference regarding whether patients were working after 4 days. There was significant heterogeneity for 3 of the 4 dichotomous outcomes: cough, productive cough, and return to work.
Trials in Adults Comparing β2-agonists with Erythromycin
In the 1994 Hueston study,21 patients given albuterol were less likely to have a cough or a productive cough after 7 days than those given erythromycin, but there were no differences in the presence of night cough after 7 days or in mean days until improvement in cough, well-being, or return to work or normal activities. When the data from this study are combined with those from the other adult trials, there are no significant differences regarding presence after 7 days of cough (RR=0.77; 95% confidence interval [CI], 0.54-1.09), productive cough (RR=0.66; 95% CI, 0.35-1.25), or night cough (RR=0.85; 95% CI, 0.57-1.26).
Adverse Effects
In the trials in children, 11% of the patients given albuterol had shaking or tremor versus 0% given placebo or only dextromethorphan (RR=6.76; 95% CI, 0.86-53.18; NNT=9; 95% CI, 5-100); the results were homogeneous. There were no differences regarding other adverse effects in the trials in children. In the adult trials, patients given β2-agonists were more likely to report tremor, shaking, or nervousness; the percentage of patients having these side effects in the 3 trials that reported specific side effects ranged from 35% to 67% versus control rates of 0% to 23% (RR=7.94; 95% CI, 1.17-53.94; NNT=2.3; 95% CI, 2-3). These data are from the trials that used inhaled fenoterol and oral albuterol.20,22,25 However, in the 1991 Hueston study,25 only 9% of the patients given inhaled albuterol reported any side effects; therefore, there is considerable heterogeneity among the results of the individual trials. There were no significant differences regarding other adverse effects between the β2-agonist group and control groups as a whole, but the trial comparing albuterol with erythromycin noted more gastrointestinal side effects in the erythromycin group (NNT=3; 95% CI, 2-8).
Subgroup Analyses
In the study by Melbye and colleagues,22 the subgroup of patients with evidence of airway obstruction (defined as wheezing on initial examination, a forced expiratory volume in 1 second of <80% predicted, or a positive response to a methacholine challenge test) who were given fenoterol had lower symptom scores beginning at day 2 than those in this subgroup who were given placebo. This was also true for the smaller subgroup that just had wheezing, but no difference was noted for patients with a normal lung examination. No other trial did a subgroup analysis limited to patients with evidence of airflow obstruction. The 1994 Hueston study21 reported that among patients given albuterol, those with wheezing were slightly less likely to be coughing after 7 days than those without wheezing, but the difference was not statistically significant.
Melbye and coworkers22 found that patients who smoked or had also received antibiotics had greater reductions in total symptom scores on day 7 if given fenoterol. Smokers had similar responses to nonsmokers in the studies by Hueston.21,25 Littenberg and colleagues20 found that patients given erythromycin trended toward lower cough severity scores if given albuterol instead of placebo, and patients not given erythromycin showed a trend toward higher scores if given albuterol. The 1994 Hueston study21 reported that the differences between the groups given and not given albuterol persisted after stratification by erythromycin use.
Discussion
The findings from our review do not support the routine use of β2-agonists for patients who do not have underlying pulmonary disease and present with an acute cough or acute bronchitis. These results must be interpreted in light of the patients who were enrolled in the trials. In particular, because the 2 trials in children excluded patients who were wheezing, the utility of β2-agonists in children with acute cough and evidence of airway obstruction is unknown. β2-agonists do lead to modest short-term improvements in clinical scores in children younger than 2 years who have bronchiolitis.27
The discordant results seen in the trials of adults may reflect different patient populations. Although the inclusion criteria were similar in these trials, more patients were wheezing on initial examination in the Hueston studies21,25 than in the studies by Littenberg and coworkers20 or Melbye and colleagues.22 Wheezing in unforced expiration is a specific finding for airflow obstruction28; therefore, more patients in the Hueston trials21,25 were likely to have had obstruction than in Littenberg and coworkers’ study20 (and since the lungs were auscultated in forced expiration in the latter trial, the actual number with airflow obstruction may have been even lower than indicated). The fact that only the subgroup with airway obstruction improved with β2-agonists in the trial by Melbye and colleagues22 reflects the possible importance of this baseline characteristic.
Limitations
Our review has some limitations. Although it includes all of the available data regarding the effectiveness of β2-agonists for patients with acute bronchitis or acute cough, the number of studies and total number of patients included are small. Therefore, our review has limited power to detect differences between patients who were and were not given β2-agonists. In the combined data of trials in adults, there was a trend toward improvements regarding cough, productive cough, night cough, and return to work, but these differences did not reach statistical significance. The midpoint estimates for the relative risk reductions range from 14% to 24% for these outcomes, but all overlap 0. There was also a clinically minor and statistically nonsignificant trend toward lower daily cough severity scores in patients randomized to the β2-agonists.
The studies were also all of a short duration. There is no information as to whether treatment with β2-agonists would alter outcomes beyond 3 to 7 days. This is an important omission, because many patients in these studies were still bothered by symptoms at the end of the trials.
Only 2 studies evaluated inhaled β2-agonists, which would currently be the most likely formulation used in adults and older children. Neither of these studies used spacing devices. The delivery of the medicine may have been suboptimal and resulted in less benefit than might have been seen had spacers been used.
Overall, the quality of the trials was fair to good. There may have been additional biases, however, because most of the trials had unequal distribution of co-interventions and did not record compliance with study medications. Also, even though the studies were all double-blinded, the fact that the majority of the patients in one trial knew which study medication they had been given indicates that the blinding may not have been adequate in these studies because of the taste or side effects of the study medications.
Conclusions
Our review highlights the gaps in evidence regarding the utility of β2-agonists in the treatment of acute cough and acute bronchitis in patients without underlying pulmonary disease. Although there is a possibility that these agents may be useful, additional data demonstrating benefit are required before they can be routinely recommended. There is a particular need for identifying clinical characteristics that can predict which patients might benefit. For example, there is a complete lack of data in children older than 2 years who have signs of airway obstruction. More evidence on the risk-benefit ratio of β2-agonists in adults with clinical signs of airflow limitation is also necessary. Additional areas of useful research would be in evaluating long-acting β2-agonists (because of ease of adherence), in evaluating the benefits of inhaled β2-agonists with spacing devices, and in comparing β2-agonists with other symptomatic treatments.
Acknowledgments
We thank Bill Hueston, Ben Littenberg, Hasse Melbye, and Peter Rowe for providing unpublished information; Bill Grant for assistance with statistics; and Ron D’Souza and Steve MacDonald of the Cochrane Collaboration and Bette Jean Ingui for assistance with database searches.
1. Gonzales R, Steiner JF, Sande MA. Antibiotic prescribing for adults with colds, upper respiratory tract infections, and bronchitis by ambulatory care physicians. JAMA 1997;278:901-04.
2. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Treatment of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1998;46:469-75.
3. Smucny JJ, Becker LA, Glazier RH, McIsaac W. Are antibiotics effective treatment for acute bronchitis? A meta-analysis. J Fam Pract 1998;47:453-60.
4. Bent S, Saint S, Vittinghoff E, Grady D. Antibiotics in acute bronchitis: a meta-analysis. Am J Med 1999;107:62-67.
5. Hahn D, Dodge R, Golubjatnikov R. Association of Chlamydia pneumoniae (strain TWAR) infection with wheezing, asthmatic bronchitis, and adult-onset asthma. JAMA 1991;266:225-30.
6. Melbye H, Kongerud J, Vorland L. Reversible airflow limitation in adults with respiratory infection. Eur Resp J 1994;7:1239-45.
7. Williamson H. Pulmonary function tests in acute bronchitis: evidence for reversible airway obstruction. J Fam Pract 1987;25:251-56.
8. Johnston D, Osborn LM. Cough variant asthma: a review of the clinical literature. J Asthma 1991;28:85-90.
9. Ellul-Micallef R. Effect of terbutaline sulphate in chronic “allergic” cough. BMJ 1983;287:940-43.
10. Vesco D, Kleisbauer JP, Orehek J. Attenuation of bronchofiberoscopy-induced cough by an inhaled beta2-adrenergic agonist, fenoterol. Am Rev Resp Dis 1988;138:805-06.
11. Lui PW, Hsing CH, Chu YC. Terbutaline inhalation suppresses fentanyl-induced coughing. Can J Anaesth 1996;43:1216-19.
12. Mainous AG, Zoorob RJ, Hueston WJ. Current management of acute bronchitis in ambulatory care: the use of antibiotics and bronchodilators. Arch Fam Med 1996;5:79-83.
13. Stern RC. Bronchitis. In: Behrman RE, Kliegman RM, Arvin AM, Nelson WE, eds. Nelson textbook of pediatrics. 15th ed. Philadelphia, Pa: W.B. Saunders; 1996:1210.
14. Weller KA. Bronchitis. In: Rakel RE, ed. Saunders manual of medical practice. Philadelphia, Pa: W.B. Saunders; 1996;120-21.
15. Marrie TJ. Acute bronchitis and community-acquired pneumonia. In: Fishman AP, Elias JA, eds. Fishman’s pulmonary diseases and disorders. 3rd ed. New York, NY: McGraw-Hill; 1998:1985.
16. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
17. Hueston WJ, Mainous AG, Dacus EN, Hopper JE. Does acute bronchitis really exist? J Fam Pract 2000;49:401-06.
18. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clin Trials 1996;17:1-12.
19. Bernard DW, Goepp JG, Duggan AK, Serwint JR, Rowe PC. Is oral albuterol effective for acute cough in non-asthmatic children? Acta Pediatr 1999;88:465-67.
20. Littenberg B, Wheeler M, Smith D. A randomized controlled trial of oral albuterol in acute cough. J Fam Pract 1996;42:49-53.
21. Hueston W. Albuterol delivered by metered-dose inhaler to treat acute bronchitis: a placebo-controlled double-blind study. J Fam Pract 1994;39:437-40.
22. Melbye H, Aasebo U, Straume B. Symptomatic effect of inhaled fenoterol in acute bronchitis: a placebo-controlled double-blind study. Fam Pract 1991;8:216-22.
23. Korppi M, Pietikainen M, Laurikainen K, Silvasti M. Antitussives in the treatment of acute transient cough in children. Acta Pediatr Scand 1991;80:969-71.
24. Tukiainen J, Karttunen P, Silvasti M, et al. The treatment of acute transient cough: a placebo-controlled comparison of dextromethorphan and dextromethorphan-beta2-sympathomimetic combination. Eur J Resp Dis 1986;69:95-99.
25. Hueston W. A comparison of albuterol and erythromycin for the treatment of acute bronchitis. J Fam Pract 1991;33:476-80.
26. Chang AB, Phelan PD, Carlin JB, Sawyer SM, Robertson CF. A randomised, placebo controlled trial of inhaled salbutamol and beclomethasone for recurrent cough. Arch Dis Child 1998;79:6-11.
27. Kellner JD, Ohlsson A, Gadomski AM, Wang EEL. Efficacy of bronchodilator therapy in bronchiolitis. Arch Pediatr Adolesc Med 1996;150:1166-72.
28. Holleman DR, Jr, Simel DL. Does the clinical examination predict airflow limitation? JAMA 1995;273:313-19.
29. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1977.
STUDY DESIGN: We performed a systematic review including meta-analysis.
DATA SOURCES: We included randomized controlled trials comparing β2-agonists with placebo or alternative therapies identified from the Cochrane Library, MEDLINE, EMBASE, conference proceedings, Science Citation Index, the System for Information on Grey Literature in Europe, and letters to manufacturers of β2-agonists.
OUTCOMES MEASURED: We measured duration, persistence, severity or frequency of cough, productive cough, and night cough; duration of activity limitations; and adverse effects.
RESULTS: Two trials in children with cough and no obvious airway obstruction did not find any benefits from β2-agonists. Five trials in adults with cough and with or without airway obstruction had mixed results, but summary statistics did not reveal any significant benefits from β2-agonists. Studies that enrolled more wheezing patients were more likely to show benefits from β2-agonists, and in one study only patients with evidence of airflow limitation were more likely to benefit. Patients given β2-agonists were more likely to report tremor, shakiness, or nervousness than those in the control groups.
CONCLUSIONS: There is no evidence to support using β2-agonists in children with acute cough and no evidence of airflow obstruction. There is little evidence that the routine use of β2-agonists for adults with acute cough is helpful. These agents may reduce symptoms, including cough, in patients with evidence of airflow obstruction, but this potential benefit is not well-supported by the available data and must be weighed against the adverse effects associated with β2-agonists.
Acute bronchitis is characterized by cough associated with other symptoms of upper respiratory infection. Although this condition is self-limited, most patients feel ill, and many do not perform their usual activities. The optimal treatment for this common condition in patients who do not have underlying pulmonary disease is not clear. Clinicians often prescribe antibiotics,1,2 in spite of the fact that they are of little overall benefit.3,4 It is important to examine the effectiveness of alternative approaches.
β2-agonists have been proposed, because healthy patients have impaired airflow when infected with pathogens known to cause acute bronchitis.5-7 Also, cough is the primary symptom for some patients who have asthma,8 and many of these patients benefit from β2-agonists.9 β2-agonists are effective in reducing cough due to other causes, such as bronchoscopy10 and intravenous fentanyl.11
We conducted this systematic review to determine whether β2-agonists are effective for patients who have acute bronchitis without underlying pulmonary disease. If β2-agonists are effective, then they should be more widely used; only a minority of US family physicians currently prescribe them for acute bronchitis.2,12
Methods
We attempted to locate all controlled trials that compared β2-agonists with placebo or an alternative treatment in patients older than 2 years who presented with acute bronchitis or acute cough without a clear etiology (eg, pneumonia, pertussis, or sinusitis). We included patients with acute cough, because the clinical definition of acute bronchitis is not standardized. Textbooks13-15 and clinician studies16,17 indicate that cough in association with an acute respiratory infection is required for a diagnosis; otherwise, there are differing criteria regarding the need for other symptoms and signs, such as dyspnea, abnormal chest findings, and sputum.
We searched MEDLINE (1966-2000), EMBASE (1974-2000), and The Cochrane Library (through August 2000) using the key words “bronchitis” or “cough,” together with the terms “adrenergic beta-agonist (exp),” “bronchodilator agents (exp),” “sympathomimetic (exp),” “albuterol,” “salbutamol,” “bitolterol,” “isoetharine,” “metaproterenol,” “pirbuterol,” “salmeterol,” “terbutaline,” “fenoterol,” “formoterol,” or “procaterol” (note that albuterol and salbutamol are the same compound). We also searched conference proceedings databases (Inside Conferences, 1993-99; Conference Papers Index, 1973-99); the System for Information on Grey Literature in Europe database (1980-2000); the reference lists of retrieved articles, review articles, and textbooks; and the Science Citation Index (1990-2000). Finally, we wrote to all US manufacturers of brand name β2-agonists. There were no language restrictions in our search.
Two investigators (C.F., J.S.) independently reviewed all the retrieved titles and abstracts. Studies selected by either investigator as possibly meeting the inclusion criteria were retrieved in their entirety. One investigator (J.S.) then deleted the journal of publication, title, authors, affiliations, and results sections of each study that met the inclusion criteria, and compiled a list of all the reported outcomes. The list of outcomes was forwarded to the other 3 investigators, who independently, and then through discussion, determined which outcomes would be included in our review. The main criterion for selection was that the outcome should be directly important to patients. The same 3 investigators then graded the quality of each study using the 5-point Jadad scale, with points given for method of randomization (0-2), adequacy of blinding (0-2), and description of withdrawals (0-1).18 The Jadad scale is a validated, well-accepted, and frequently used quality assessment scale. Agreement on quality was assessed with a κ score, and disagreements were resolved by discussion. Trials were excluded if all investigators agreed that the trial did not meet our inclusion criteria. The remaining articles in their entirety were then distributed to all investigators, each of whom independently extracted data for the selected outcomes. Disagreements were resolved by discussion. We attempted to contact authors to obtain missing data.
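As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of a chance-corrected agreement score for two raters (Cohen's kappa). The text does not state which kappa variant was used for the 3 graders, so the two-rater form is an assumption made for simplicity, and the Jadad scores in the example are hypothetical rather than taken from the review.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of studies")
    n = len(ratings_a)
    # Proportion of studies on which the two raters gave the same score
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Jadad scores (0-5) from two raters for 7 trials
rater_1 = [2, 3, 4, 3, 2, 4, 3]
rater_2 = [2, 3, 3, 3, 2, 4, 4]
kappa = cohen_kappa(rater_1, rater_2)
```

In this hypothetical example the raters agree on 5 of 7 trials, and the score is lower than the raw agreement because some of that agreement would be expected by chance alone.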
Summary statistics were calculated using Review Manager 4.1 software (Update Software, Oxford, England). We used fixed effects models for outcomes without statistically significant heterogeneity (at P <.10) and random effects models for outcomes with significant heterogeneity. For dichotomous outcomes, we reported relative risks (RRs), absolute risk reductions, and numbers needed to treat (NNTs), and for continuous outcomes, standardized mean differences (SMD). We considered a level of P less than .05 to be statistically significant.
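The effect measures for dichotomous outcomes and the rule for choosing a model can be sketched as follows. This is a single-trial illustration under stated assumptions, not the pooling performed by Review Manager: the function names are hypothetical, the confidence interval uses the standard log-scale approximation for one 2x2 table, and the counts are invented for the example.

```python
import math


def risk_summary(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk with a log-scale 95% CI, absolute risk reduction, and NNT."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR) for a single 2x2 table
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    ci_low = math.exp(math.log(rr) - z * se_log_rr)
    ci_high = math.exp(math.log(rr) + z * se_log_rr)
    arr = risk_control - risk_treated  # absolute risk reduction
    nnt = math.inf if arr == 0 else 1 / abs(arr)
    return {"rr": rr, "ci": (ci_low, ci_high), "arr": arr, "nnt": nnt}


def choose_model(heterogeneity_p, threshold=0.10):
    """Fixed effects unless heterogeneity is significant at the stated P < .10."""
    return "random" if heterogeneity_p < threshold else "fixed"


# Hypothetical counts, not from any included trial:
# 20 of 50 treated patients still coughing versus 30 of 50 controls.
summary = risk_summary(20, 50, 30, 50)
model = choose_model(heterogeneity_p=0.04)
```

With these invented counts the relative risk is about 0.67 and the NNT is 5, yet the 95% CI still includes 1, which mirrors how a sizable point estimate can fall short of statistical significance in small trials.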
Results
Included Studies
The major characteristics of the trials are shown in Table 1. We included 6 controlled trials comparing β2-agonists and placebo,19-24 and one trial comparing a β2-agonist with erythromycin.25 A trial comparing a β2-agonist with placebo in children26 was excluded because all participants had recurrent cough and the mean duration of cough (8 weeks) was much longer than the maximum of 30 days used in the other trials.
All trials enrolled patients who presented to primary care settings. The stated diagnoses were “acute bronchitis,”21,22,25 “acute cough,”19,20 and “acute transient cough.”23,24 Both trials in children excluded participants with abnormal lung examinations19 or “with bronchial obstruction needing bronchodilating medication.”23 None of the adult trials excluded patients with wheezing; the percentage with wheezing ranged from 20% to 44% in the 4 trials that mentioned it. All adult trials included both smokers and nonsmokers.
The only trial that mentioned how well patients adhered to study medications25 reported more than 95% compliance for both groups. Regarding co-interventions, 3 trials prohibited other antitussives19,23,24; 3 trials allowed them and recorded their use as an outcome20,21,25; and one trial did not mention co-interventions.22 One trial prohibited the use of antibiotics24; other trials comparing β2-agonists to placebo allowed the use of antibiotics at the discretion of the clinician (except as noted for the 1994 study by Hueston21). No trials were clearly sponsored by pharmaceutical manufacturers, but the medications were supplied free of charge by manufacturers in 3 studies.19,22,24
The quality of the trials varied from 2 to 4 on the Jadad scale (Table 1). The κ score for reviewers’ quality scores was 0.27, indicating only fair agreement. The majority of the disagreements related to different initial interpretations of the adequacy of blinding and description of withdrawals. These differences were resolved with further discussion.
Data Analysis
The clinical heterogeneity of the trials was so great that examining them as a single group did not seem reasonable. Therefore, we initially examined the trials as follows: (1) those in children, (2) those in adults comparing β2-agonists with placebo, and (3) those in adults comparing β2-agonists with erythromycin. We then combined the data from the trial that compared a β2-agonist with erythromycin with those from the other trials in adults in a secondary analysis.
Trials in Children
Neither trial involving children demonstrated any benefits from albuterol (Table 2). Combining the daily cough scores for days 1 to 3 for these trials revealed a trend toward worse scores in the group receiving albuterol (Table 3). The results from the 2 trials were homogeneous.
Trials in Adults Comparing β2-agonists with Placebo
The results of the placebo-controlled trials in adults were mixed; one trial found no benefit from β2-agonists, and 3 found at least one benefit. Combining the daily cough severity scores for the 3 trials that included this outcome20,22,24 showed a small nonsignificant trend toward improvement on all days. The results from the individual trials were heterogeneous for day 1 and homogeneous for the other days.
Combining data from the trials that examined persistence of symptoms after a full 7 days of treatment20-22 yielded no significant difference in presence of cough or night cough (Table 4). Combined data also do not show a difference regarding the presence of a productive cough after 7 days or a difference regarding whether patients were working after 4 days. There was significant heterogeneity for 3 of the 4 dichotomous outcomes: cough, productive cough, and return to work.
Trials in Adults Comparing β2-agonists with Erythromycin
In the 1991 Hueston study,25 patients given albuterol were less likely to have a cough or a productive cough after 7 days than those given erythromycin, but there were no differences in the presence of night cough after 7 days or in mean days until improvement in cough, well-being, or return to work or normal activities. When the data from this study are combined with those from the other adult trials, there are no significant differences regarding presence after 7 days of cough (RR=0.77; 95% confidence interval [CI], 0.54-1.09), productive cough (RR=0.66; 95% CI, 0.35-1.25), or night cough (RR=0.85; 95% CI, 0.57-1.26).
STUDY DESIGN: We performed a systematic review including meta-analysis.
DATA SOURCES: We included randomized controlled trials comparing b2-agonists with placebo or alternative therapies identified from the Cochrane Library, MEDLINE, EMBASE, conference proceedings, Science Citation Index, the System for Information on Grey Literature in Europe, and letters to manufacturers of b2-agonists.
OUTCOME MEASURED: We measured duration, persistence, severity or frequency of cough, productive cough, and night cough; duration of activity limitations; and adverse effects.
RESULTS: Two trials in children with cough and no obvious airway obstruction did not find any benefits from b2-agonists. Five trials in adults with cough and with or without airway obstruction had mixed results, but summary statistics did not reveal any significant benefits from b2-agonists. Studies that enrolled more wheezing patients were more likely to show benefits from b2-agonists, and in one study only patients with evidence of airflow limitation were more likely to benefit. Patients given b2-agonists were more likely to report tremor, shakiness, or nervousness than those in the control groups.
CONCLUSIONS: There is no evidence to support using b2-agonists in children with acute cough and no evidence of airflow obstruction. There is little evidence that the routine use of b2-agonists for adults with acute cough is helpful. These agents may reduce symptoms, including cough, in patients with evidence of airflow obstruction, but this potential benefit is not well-supported by the available data and must be weighed against the adverse effects associated with b2-agonists.
Acute bronchitis is characterized by cough associated with other symptoms of upper respiratory infection. Although this condition is self-limited, most patients feel ill, and many do not perform their usual activities. The optimal treatment for this common condition in patients who do not have underlying pulmonary disease is not clear. Clinicians often prescribe antibiotics,1,2 in spite of the fact that they are of little overall benefit.3,4 It is important to examine the effectiveness of alternative approaches.
b2-agonists have been proposed, because healthy patients have impaired airflow when infected with pathogens known to cause acute bronchitis.5-7 Also, cough is the primary symptom for some patients who have asthma,8 and many of these patients benefit from b2-agonists.9 b2-agonists are effective in reducing cough due to other causes, such as bronchoscopy10 and intravenous fentany1,11
We conducted this systematic review to determine whether b2-agonists are effective for patients who have acute bronchitis without underlying pulmonary disease. If b2-agonists are effective, then they should be more widely used; only a minority of US family physicians currently prescribe them for acute bronchitis.2,12
Methods
We attempted to locate all controlled trials that compared b2-agonists with placebo or an alternative treatment in patients older than 2 years who presented with acute bronchitis or acute cough without a clear etiology (eg, pneumonia, pertussis, or sinusitis). We included patients with acute cough, because the clinical definition of acute bronchitis is not standardized. Textbooks13-15 and clinician studies16,17 instruct that cough in association with an acute respiratory infection is required for a diagnosis; otherwise, there are differing criteria regarding the need for other symptoms and signs, such as dyspnea, abnormal chest findings, and sputum.
We searched MEDLINE (1966-2000), EMBASE (1974-2000), and The Cochrane Library (through August 2000) using the key words “bronchitis” or “cough”, together with the terms “adrenergic beta-agonist (exp),” “bronchodilator agents (exp),” “sympathomimetic (exp),” “albuterol,” “salbutamol,” “bitolterol,” “isoetharine,” “metaproterenol,” “pirbuterol,” “salmeterol,” “terbutaline,” “fenoterol,” “formoterol,” or “procaterol” (note that albuterol and salbutamol are the same compound). We also searched conference proceedings databases (Inside Conferences, 1993-99; Conference Papers Index, 1973-99); the System for Information on Grey Literature in Europe database (1980-2000); the reference lists of retrieved articles, review articles, and textbooks; and the Science Citation Index (1990-2000). Finally, we wrote to all US manufacturers of brand name b2-agonists. There were no language restrictions in our search.
Two investigators (C.F., J.S.) independently reviewed all the retrieved titles and abstracts. Studies selected by either investigator as possibly meeting the inclusion criteria were retrieved in their entirety. One investigator (J.S.) then deleted the journal of publication, title, authors, affiliations, and results sections of each study that met the inclusion criteria, and compiled a list of all the reported outcomes. The list of outcomes was forwarded to the other 3 investigators who independently, and then through discussion, determined which outcomes would be included in our review. The main criterion for selection was that the outcome should be directly important to patients. The same 3 investigators then graded the quality of each study using the 5-point Jadad scale, with points given for method of randomization (0-2), adequacy of blinding (0-2), and description of withdrawals (0-1).18 The Jadad scale is a validated, well-accepted, and frequently used quality assessment scale. Agreement on quality was assessed with a k score, and disagreements were resolved by discussion. Trials were excluded if all investigators agreed that the trial did not meet our inclusion criteria. The remaining articles in their entirety were then distributed to all investigators, each of whom independently extracted data for the selected outcomes. Disagreements were resolved by discussion. We attempted to contact authors to obtain missing data.
Summary statistics were calculated using Review Manager 4.1 software (Update Software, Oxford, England). We used fixed effects models for outcomes without statistically significant heterogeneity (at P <.10) and random effects models for outcomes with significant heterogeneity. For dichotomous outcomes, we reported relative risks (RRs), absolute risk reductions, and numbers needed to treat (NNTs), and for continuous outcomes, standardized mean differences (SMD). We considered a level of P less than .05 to be statistically significant.
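The dichotomous summary measures named above follow from simple arithmetic on event counts. The sketch below is only an illustration of those definitions, not the Review Manager implementation; the function name `risk_summary` and the patient counts are hypothetical, and no pooling across trials or confidence interval is attempted.

```python
import math


def risk_summary(events_treated, n_treated, events_control, n_control):
    """Return (relative risk, absolute risk reduction, number needed to treat).

    Assumes a single two-arm comparison; counts are events / patients per arm.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    # Relative risk: event risk with treatment divided by event risk with control.
    rr = math.inf if risk_control == 0 else risk_treated / risk_control
    # Absolute risk reduction: how much lower the event risk is with treatment.
    arr = risk_control - risk_treated
    # Number needed to treat: reciprocal of the absolute risk difference.
    nnt = math.inf if arr == 0 else 1 / abs(arr)
    return rr, arr, nnt


# Hypothetical counts (not from any included trial):
# 30 of 100 treated patients still coughing vs 40 of 100 controls.
rr, arr, nnt = risk_summary(30, 100, 40, 100)
print(f"RR={rr:.2f}, ARR={arr:.2f}, NNT={nnt:.1f}")
```

A relative risk below 1 here means the outcome (for example, persistent cough) was less frequent in the treated group.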
Results
Included Studies
The major characteristics of the trials are shown in Table 1. We included 6 controlled trials comparing β2-agonists and placebo,19-24 and one trial comparing a β2-agonist with erythromycin.25 A trial comparing a β2-agonist with placebo in children26 was excluded because all participants had recurrent cough and the mean duration of cough (8 weeks) was much longer than the maximum of 30 days used in the other trials.
All trials enrolled patients who presented to primary care settings. The stated diagnoses were “acute bronchitis,”21,22,25 “acute cough,”19,20 and “acute transient cough.”23,24 Both trials in children excluded participants with abnormal lung examinations19 or “with bronchial obstruction needing bronchodilating medication.”23 None of the adult trials excluded patients with wheezing; the percentage with wheezing ranged from 20% to 44% in the 4 trials that mentioned it. All adult trials included both smokers and nonsmokers.
The only trial that mentioned how well patients adhered to study medications25 reported more than 95% compliance for both groups. Regarding co-interventions, 3 trials prohibited other antitussives19,23,24; 3 trials allowed them and recorded their use as an outcome20,21,25; and one trial did not mention co-interventions.22 One trial prohibited the use of antibiotics24; other trials comparing β2-agonists to placebo allowed the use of antibiotics at the discretion of the clinician (except as noted for the 1994 study by Hueston21). No trials were clearly sponsored by pharmaceutical manufacturers, but the medications were supplied free of charge by manufacturers in 3 studies.19,22,24
The quality of the trials varied from 2 to 4 on the Jadad scale (Table 1). The κ score for reviewers’ quality scores was 0.27, indicating only fair agreement. The majority of the disagreements related to different initial interpretations of the adequacy of blinding and description of withdrawals. These differences were resolved with further discussion.
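For readers unfamiliar with the agreement statistic, the following minimal sketch shows how a chance-corrected agreement score can be computed. It assumes the simplest case, Cohen's kappa for two raters; the review had 3 raters and does not state which multi-rater variant was used, so this is not a reconstruction of the reported 0.27. The function name and the example Jadad scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items (assumed two-rater case)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's marginal score frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Jadad scores from two raters for 7 trials (illustration only).
rater_1 = [2, 3, 3, 4, 2, 3, 4]
rater_2 = [2, 3, 4, 4, 3, 3, 2]
print(round(cohen_kappa(rater_1, rater_2), 3))
```

Kappa is 0 when agreement is no better than chance and 1 when agreement is perfect, which is why a value such as 0.27 is read as only fair agreement.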
Data Analysis
The clinical heterogeneity of the trials was so great that examining them as a single group did not seem reasonable. Therefore, we initially examined the trials as follows: (1) those in children, (2) those in adults comparing β2-agonists with placebo, and (3) those in adults comparing β2-agonists with erythromycin. We then combined the data from the trial that compared a β2-agonist with erythromycin with that from the other trials in adults in a secondary analysis.
Trials in Children
Neither trial involving children demonstrated any benefits from albuterol (Table 2). Combining the daily cough scores for days 1 to 3 for these trials revealed a trend toward worse scores in the group receiving albuterol (Table 3). The results from the 2 trials were homogeneous.
Trials in Adults Comparing β2-agonists with Placebo
The results of the placebo-controlled trials in adults were mixed; one trial found no benefit from β2-agonists, and 3 found at least one benefit. Combining the daily cough severity scores for the 3 trials that included this outcome20,22,24 showed a small nonsignificant trend toward improvement on all days. The results from the individual trials were heterogeneous for day 1 and homogeneous for the other days.
Combining data from the trials that examined persistence of symptoms after a full 7 days of treatment20-22 yielded no significant difference in presence of cough or night cough (Table 4). Combined data also do not show a difference regarding the presence of a productive cough after 7 days or a difference regarding whether patients were working after 4 days. There was significant heterogeneity for 3 of the 4 dichotomous outcomes: cough, productive cough, and return to work.
Trials in Adults Comparing β2-agonists with Erythromycin
In the 1994 Hueston study,21 patients given albuterol were less likely to have a cough or a productive cough after 7 days than those given erythromycin, but there were no differences in the presence of night cough after 7 days or in mean days until improvement in cough, well-being, or return to work or normal activities. When the data from this study are combined with that from the other adult trials, there are no significant differences regarding presence after 7 days of cough (RR=0.77; 95% confidence interval [CI], 0.54-1.09), productive cough (RR=0.66; 95% CI, 0.35-1.25), or night cough (RR=0.85; 95% CI, 0.57-1.26).
Adverse Effects
In the trials in children, 11% of the patients given albuterol had shaking or tremor versus 0% given placebo or only dextromethorphan (RR=6.76; 95% CI, 0.86-53.18; NNT=9; 95% CI, 5-100); the results were homogeneous. There were no differences regarding other adverse effects in the trials in children. In the adult trials, patients given β2-agonists were more likely to report tremor, shaking, or nervousness; the percentage of patients having these side effects in the 3 trials that reported specific side effects ranged from 35% to 67% versus control rates of 0% to 23% (RR=7.94; 95% CI, 1.17-53.94; NNT=2.3; 95% CI, 2-3). These data are from the trials that used inhaled fenoterol and oral albuterol.20,22,25 However, in the 1991 Hueston study,25 only 9% of the patients given inhaled albuterol reported any side effects; therefore, there is considerable heterogeneity among the results of the individual trials. There were no significant differences regarding other adverse effects between the β2-agonist group and control groups as a whole, but the trial comparing albuterol with erythromycin noted more gastrointestinal side effects in the erythromycin group (NNT=3; 95% CI, 2-8).
Subgroup Analyses
In the study by Melbye and colleagues,22 the subgroup of patients with evidence of airway obstruction (defined as wheezing on initial examination, a forced expiratory volume in 1 second of <80% predicted, or a positive response to a methacholine challenge test) who were given fenoterol had lower symptom scores beginning at day 2 than those in this subgroup who were given placebo. This was also true for the smaller subgroup that just had wheezing, but no difference was noted for patients with a normal lung examination. No other trial did a subgroup analysis limited to patients with evidence of airflow obstruction. The 1994 Hueston study21 reported that among patients given albuterol, those with wheezing were slightly less likely to be coughing after 7 days than those without wheezing, but the difference was not statistically significant.
Melbye and coworkers22 found that patients who smoked or had also received antibiotics had greater reductions in total symptom scores on day 7 if given fenoterol. Smokers had similar responses to nonsmokers in the studies by Hueston.21,25 Littenberg and colleagues20 found that patients given erythromycin trended toward lower cough severity scores if given albuterol instead of placebo, and patients not given erythromycin showed a trend toward higher scores if given albuterol. The 1994 Hueston study21 reported that the differences between the groups given and not given albuterol persisted after stratification by erythromycin use.
Discussion
The findings from our review do not support the routine use of β2-agonists for patients who do not have underlying pulmonary disease and present with an acute cough or acute bronchitis. These results must be interpreted in light of the patients that were enrolled in the trials. In particular, because the 2 trials in children excluded patients who were wheezing, the utility of β2-agonists in children with acute cough and evidence of airway obstruction is unknown. β2-Agonists do lead to modest short-term improvements in clinical scores in children younger than 2 years who have bronchiolitis.27
The discordant results seen in the trials of adults may reflect different patient populations. Although the inclusion criteria were similar in these trials, more patients were wheezing on initial examination in the Hueston studies21,25 than in the studies by Littenberg and coworkers20 or Melbye and colleagues.22 Wheezing in unforced expiration is a specific finding for airflow obstruction28; and therefore, more patients in the Hueston trials21,25 were likely to have had obstruction than in Littenberg and coworkers’ study20 (and since the lungs were auscultated in forced expiration in the latter trial, the actual number with airflow obstruction may have been even less than indicated). The fact that only the subgroup with airway obstruction improved with β2-agonists in the trial by Melbye and colleagues22 reflects the possible importance of this baseline characteristic.
Limitations
Our review has some limitations. Although it includes all of the available data regarding the effectiveness of β2-agonists for patients with acute bronchitis or acute cough, the number of studies and total number of patients included are small. Therefore, our review has limited power to detect differences between patients who were and were not given β2-agonists. In the combined data of trials in adults, there was a trend toward improvements regarding cough, productive cough, night cough, and return to work, but these differences did not reach statistical significance. The midpoint estimates for the relative risk reductions range from 14% to 24% for these outcomes, but all overlap 0. There was also a clinically minor and statistically nonsignificant trend toward lower daily cough severity scores in patients randomized to the β2-agonists.
The studies were also all of a short duration. There is no information as to whether treatment with b2-agonists would alter outcomes beyond 3 to 7 days. This is an important omission, because many patients in these studies were still bothered by symptoms at the end of the trials.
Only 2 studies evaluated inhaled β2-agonists, which would currently be the most likely formulation used in adults and older children. Neither of these studies used spacing devices. The delivery of the medicine may have been suboptimal and resulted in less benefit than might have been seen had spacers been used.
Overall, the quality of the trials was fair to good. There may have been additional biases, however, because most of the trials had unequal distribution of co-interventions and did not record compliance with study medications. Also, even though the studies were all double-blinded, the fact that the majority of the patients in one trial knew which study medication they had been given indicates that the blinding may not have been adequate in these studies because of the taste or side effects of the study medications.
Conclusions
Our review highlights the gaps in evidence regarding the utility of β2-agonists in the treatment of acute cough and acute bronchitis in patients without underlying pulmonary disease. Although there is a possibility that these agents may be useful, additional data demonstrating benefit is required before they can be routinely recommended. There is a particular need for identifying clinical characteristics that can predict which patients might benefit. For example, there is a complete lack of data in children older than 2 years who have signs of airway obstruction. More evidence on the risk-benefit ratio of β2-agonists in adults with clinical signs of airflow limitation is also necessary. Additional areas of useful research would be in evaluating long-acting β2-agonists (because of ease of adherence), in evaluating the benefits of inhaled β2-agonists with spacing devices, and in comparing β2-agonists with other symptomatic treatments.
Acknowledgments
We thank Bill Hueston, Ben Littenberg, Hasse Melbye, and Peter Rowe for providing unpublished information; Bill Grant for assistance with statistics; and Ron D’Souza and Steve MacDonald of the Cochrane Collaboration and Bette Jean Ingui for assistance with database searches.
1. Gonzales R, Steiner JF, Sande MA. Antibiotic prescribing for adults with colds, upper respiratory tract infections, and bronchitis by ambulatory care physicians. JAMA 1997;278:901-04.
2. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Treatment of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1998;46:469-75.
3. Smucny JJ, Becker LA, Glazier RH, McIsaac W. Are antibiotics effective treatment for acute bronchitis? A meta-analysis. J Fam Pract 1998;47:453-60.
4. Bent S, Saint S, Vittinghoff E, Grady D. Antibiotics in acute bronchitis: a meta-analysis. Am J Med 1999;107:62-67.
5. Hahn D, Dodge R, Golubjatnikov R. Association of Chlamydia pneumoniae (strain TWAR) infection with wheezing, asthmatic bronchitis, and adult-onset asthma. JAMA 1991;266:225-30.
6. Melbye H, Kongerud J, Vorland L. Reversible airflow limitation in adults with respiratory infection. Eur Resp J 1994;7:1239-45.
7. Williamson H. Pulmonary function tests in acute bronchitis: evidence for reversible airway obstruction. J Fam Pract 1987;25:251-56.
8. Johnston D, Osborn LM. Cough variant asthma: a review of the clinical literature. J Asthma 1991;28:85-90.
9. Ellul-Micallef R. Effect of terbutaline sulphate in chronic “allergic” cough. BMJ 1983;287:940-43.
10. Vesco D, Kleisbauer JP, Orehek J. Attenuation of bronchofiberoscopy-induced cough by an inhaled beta2-adrenergic agonist, fenoterol. Am Rev Resp Dis 1988;138:805-06.
11. Lui PW, Hsing CH, Chu YC. Terbutaline inhalation suppresses fentanyl-induced coughing. Can J Anaesth 1996;43:1216-19.
12. Mainous AG, Zoorob RJ, Hueston WJ. Current management of acute bronchitis in ambulatory care: the use of antibiotics and bronchodilators. Arch Fam Med 1996;5:79-83.
13. Stern RC. Bronchitis. In: Behrman RE, Kliegman RM, Arvin AM, Nelson WE, eds. Nelson textbook of pediatrics. 15th ed. Philadelphia, Pa: W.B. Saunders; 1996:1210.
14. Weller KA. Bronchitis. In: Rakel RE, ed. Saunders manual of medical practice. Philadelphia, Pa: W.B. Saunders; 1996:120-21.
15. Marrie TJ. Acute bronchitis and community-acquired pneumonia. In: Fishman AP, Elias JA, eds. Fishman’s pulmonary diseases and disorders. 3rd ed. New York, NY: McGraw-Hill; 1998:1985.
16. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
17. Hueston WJ, Mainous AG, Dacus EN, Hopper JE. Does acute bronchitis really exist? J Fam Pract 2000;49:401-06.
18. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clin Trials 1996;17:1-12.
19. Bernard DW, Goepp JG, Duggan AK, Serwint JR, Rowe PC. Is oral albuterol effective for acute cough in non-asthmatic children? Acta Pediatr 1999;88:465-67.
20. Littenberg B, Wheeler M, Smith D. A randomized controlled trial of oral albuterol in acute cough. J Fam Pract 1996;42:49-53.
21. Hueston W. Albuterol delivered by metered-dose inhaler to treat acute bronchitis: a placebo-controlled double-blind study. J Fam Pract 1994;39:437-40.
22. Melbye H, Aasebo U, Straume B. Symptomatic effect of inhaled fenoterol in acute bronchitis: a placebo-controlled double-blind study. Fam Pract 1991;8:216-22.
23. Korppi M, Pietikainen M, Laurikainen K, Silvasti M. Antitussives in the treatment of acute transient cough in children. Acta Pediatr Scand 1991;80:969-71.
24. Tukiainen J, Karttunen P, Silvasti M, et al. The treatment of acute transient cough: a placebo-controlled comparison of dextromethorphan and dextromethorphan-beta2-sympathomimetic combination. Eur J Resp Dis 1986;69:95-99.
25. Hueston W. A comparison of albuterol and erythromycin for the treatment of acute bronchitis. J Fam Pract 1991;33:476-80.
26. Chang AB, Phelan PD, Carlin JB, Sawyer SM, Robertson CF. A randomised, placebo controlled trial of inhaled salbutamol and beclomethasone for recurrent cough. Arch Dis Child 1998;79:6-11.
27. Kellner JD, Ohlsson A, Gadomski AM, Wang EEL. Efficacy of bronchodilator therapy in bronchiolitis. Arch Pediatr Adolesc Med 1996;150:1166-72.
28. Holleman DR, Jr, Simel DL. Does the clinical examination predict airflow limitation? JAMA 1995;273:313-19.
29. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press, 1977.
The Accuracy of Physical Diagnostic Tests for Assessing Meniscal Lesions of the Knee: A Meta-Analysis
SEARCH STRATEGY: We performed a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) with additional reference tracking.
SELECTION CRITERIA: Articles written in English, French, German, or Dutch that addressed the accuracy of at least one physical diagnostic test for meniscus injury with arthrotomy, arthroscopy, or magnetic resonance imaging as the gold standard were included.
DATA COLLECTION and ANALYSIS: Two reviewers independently selected studies, assessed the methodologic quality, and abstracted data using a standardized protocol.
MAIN RESULTS: Thirteen studies (of 402) met the inclusion criteria. The results of the index and reference tests were assessed independently (blindly) of each other in only 2 studies, and in all studies verification bias seemed to be present. The study results were highly heterogeneous. The summary receiver operating characteristic curves of the assessment of joint effusion, the McMurray test, and joint line tenderness indicated little discriminative power for these tests. Only the predictive value of a positive McMurray test was favorable.
CONCLUSIONS: The methodologic quality of studies addressing the diagnostic accuracy of meniscal tests was poor, and the results were highly heterogeneous. The poor characteristics indicate that these tests are of little value for clinical practice.
Various physical diagnostic tests are available to assess meniscal lesions, such as assessment of joint effusion and joint line tenderness (JLT), the McMurray test, and the Apley compression test.1-4 Many meniscal tests, however, are not easy to perform and seem to be prone to errors.1,2,4 Also, the diagnostic accuracy of the various meniscal tests has been questioned,3-5 and conflicting results regarding that accuracy have been reported.6 Therefore, we systematically reviewed the medical literature to summarize the available evidence about the diagnostic accuracy of physical diagnostic tests for assessing meniscal lesions of the knee and to combine the results of individual studies when possible. We focused on the most common meniscal tests: the assessment of joint effusion, the McMurray test, JLT, and the Apley compression test.
Methods
Selection of Studies
We conducted a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) to identify articles written in English, French, German, or Dutch. The Medical Subject Headings (MeSH) terms “knee injuries,” “knee joint,” “knee,” and “menisci tibial,” and the text words “knee” and “effusion” were used. The results of this strategy were combined with a validated search strategy for the identification of diagnostic studies using the MeSH terms “sensitivity and specificity” (exploded), “physical examination” and “not (animal not (human and animal))” and the text words “sensitivity,” “specificity,” “false positive,” “false negative,” “accuracy,” and “screening,”7 supplemented with the text words “physical examination” and “clinical examination.” Also, the cited references of relevant publications were examined.
Studies were eligible for inclusion if they addressed the accuracy of at least one physical diagnostic test for the assessment of meniscal lesions of the knee and used arthrotomy, arthroscopy, or magnetic resonance imaging (MRI) as the gold standard. Studies were excluded if no reference group (nondiseased group or subjects with lesions other than the lesion of study) had been included, if only test-positives had been included, if the study pertained to cadavers only, or if only physical examination under anesthesia was considered.
The studies were selected by 2 reviewers independently. A preliminary selection of each study was made by checking the title, the abstract, or both. A definite selection was made by reading the complete article. During a consensus meeting disagreements regarding the selection of studies were discussed, and a definite selection was made. If disagreement persisted, a third reviewer made the final decision.
Assessment of Methodologic Quality and Data Abstraction
The methodologic quality of the selected studies was assessed, and data were abstracted by 2 reviewers independently. A checklist adapted from Irwig and colleagues8 and the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests9 was used for quality assessment. This checklist consisted of 6 criteria for study validity, 5 criteria relevant to the clinical applicability of the results, and 5 items pertaining to the index test (Table W1, Table W1a).* In a subsequent consensus meeting, both assessors discussed each criterion on which they initially disagreed. If disagreement persisted, a third reviewer made the final decision.
Statistical Analysis
Statistical analysis was performed according to a strategy adapted from Midgette and colleagues10 (Figure W1).** For each study, the sensitivity and specificity of each index test were calculated. The χ2 test was used to assess the homogeneity of the sensitivity and the specificity among studies. If homogeneity of both sensitivity and specificity was not rejected (P >.10), summary estimates of sensitivity and specificity were calculated.10 Heterogeneity of sensitivity and specificity might be caused by differences between studies in how clinicians define a positive test result.8 In that case, the pairs of sensitivity and specificity will be negatively correlated, as indicated by a negative Spearman rank correlation coefficient (Rs). When the pairs of sensitivity and specificity are negatively correlated, these pairs can be considered to be originating from a common receiver operating characteristic (ROC) curve, and a summary ROC (SROC) curve was estimated by meta-regression.8,10,11 The better the diagnostic accuracy of the test, the larger the area under the curve.
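The steps in this paragraph can be sketched in code. The example below is an assumption-laden illustration rather than the authors' analysis: it computes a Spearman rank correlation between sensitivity and specificity and then fits a summary ROC curve with one commonly used unweighted linear meta-regression (regressing D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR)). Whether the review used exactly this model or any weighting is not stated in the text, and the four sensitivity/specificity pairs are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def logit(p):
    return math.log(p / (1 - p))


def fit_sroc(pairs):
    """Unweighted least-squares fit of D = a + b*S over (sensitivity, specificity) pairs."""
    d = [logit(se) - logit(1 - sp) for se, sp in pairs]
    s = [logit(se) + logit(1 - sp) for se, sp in pairs]
    ms, md = sum(s) / len(s), sum(d) / len(d)
    b = sum((si - ms) * (di - md) for si, di in zip(s, d)) / sum((si - ms) ** 2 for si in s)
    a = md - b * ms
    return a, b


def sroc_sensitivity(a, b, specificity):
    """Sensitivity on the fitted curve at a given specificity (requires b != 1)."""
    logit_tpr = a / (1 - b) + (1 + b) / (1 - b) * logit(1 - specificity)
    return 1 / (1 + math.exp(-logit_tpr))


# Hypothetical (sensitivity, specificity) pairs from four studies.
pairs = [(0.30, 0.90), (0.50, 0.75), (0.70, 0.55), (0.85, 0.35)]
rs = spearman([se for se, _ in pairs], [sp for _, sp in pairs])
a, b = fit_sroc(pairs)
print(rs, a, b, sroc_sensitivity(a, b, 0.80))
```

In this sketch the intercept `a` is the log diagnostic odds ratio at S = 0, so a curve with `a` near 0 lies close to the diagonal and has little discriminative power.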
Differences between study characteristics are another potential source of heterogeneity of sensitivity and specificity.8 Those other sources of heterogeneity were assessed by adding the following characteristics to the meta-regression model: study validity items (most valid category of each item vs other categories), setting (primary care vs other), the spectrum of the diseased and the nondiseased (broad spectrum vs small spectrum), the prevalence of meniscal lesions, and the year of publication. When a significant subgroup was identified (P <.05), separate analyses were performed for each subgroup.
The summary estimates of sensitivity and specificity were used to calculate the predictive value of a positive (PV+) and negative (PV-) test result for circumstances with varying prevalences of meniscal lesions. When the sensitivities or specificities were heterogeneous between studies, however, the summary estimate of sensitivity was used for calculating predictive values with the accompanying specificity, estimated from the SROC curve.
Results
Selection of Studies
The literature search revealed a total of 402 potentially eligible studies, of which 10 were selected for inclusion.12-21 Three other studies were found by reference tracking.22-24 Thus, 13 studies met the selection criteria. The reply to a letter to the editor to one of the studies contained additional information and was also considered for analysis.17,25,26
Methodologic Quality and Study Characteristics
The index test and reference standard had been measured independently (blindly) of each other in only 2 studies.16,21 Verification bias seemed to be present in all studies (patients with an abnormal physical test result were more likely to undergo the gold standard test, inflating the sensitivity and decreasing the specificity). Nine studies applied arthroscopy as the gold standard,12-14,16,17,19-21,24 and 1 study used MRI.15 No study was performed in a primary care setting. In 7 studies a broad spectrum of knee lesions was reported,12-15,17,20,21 and in 4 studies the spectrum was not specified (Table 1).18,19,22,23 A broad spectrum of conditions in the reference group (nondiseased) was present in 8 studies,12-15,17,20,22,23 while in 4 studies the spectrum was not specified.18,19,21,24 Details regarding the index tests were poorly reported, except in 2 studies.17,21 In all studies that addressed the McMurray test, the experience of a “thud” or “click” was used for designating a test as positive.12,13,15-19,22 Only 2 studies mentioned assessment of the index test independent of knowledge of other clinical information (including the results of other meniscal tests)17,21 (Table W2).* The age and sex distribution of the patients and the duration of complaints are presented in Table 1.
Accuracy of Meniscal Tests
The accuracy of the assessment of joint effusion was determined in 4 studies, the McMurray test in 11, JLT in 10, the Apley compression test in 3, and 5 studies addressed various other tests. No data were presented in or could be derived from 1 study pertaining to joint effusion, 3 studies regarding the McMurray test,14,23,24 and 1 study on JLT,24 while from 1 study pertaining to both the McMurray test and JLT only the point estimates of the various test characteristics were reported without the original number of patients in the various categories.15 Of the study of Evans and coworkers,17,26 who presented data of an inexperienced and experienced researcher, only the latter results were used. Of the study of Abdon and colleagues,14 who made a distinction between tenderness of the medial and posterior part of the joint line, only the data of the medial part were considered. It should be noted that 2 studies incorporated a very small number of nondiseased subjects.23,24 Also, one of those studies presented results from individual knees instead of subjects.24 Part of their results pertained to both knees of the same subject, which violates the assumption of (statistical) independence of the observations. Therefore, this study was excluded from further analysis. Finally, some studies did not make a distinction between medial and lateral meniscal lesions,13,17,19,22,23 while others presented the results for medial and lateral meniscal lesions separately.12,14,15,18,20 Of the latter studies, only the results of medial meniscal tests were used for statistical analysis.
The diagnostic accuracy of assessment of joint effusion and the various meniscal tests is shown in Table 2. There was significant heterogeneity of sensitivity and specificity of all tests, except for specificity of the Apley compression test (P=.89).
Sensitivity and specificity were negatively correlated for joint effusion (Rs = -1.0), the McMurray test (Rs = -0.43), and JLT (Rs = -0.62). This means that as one increased, the other decreased, which is to be expected. The SROC curves (Figure 1) indicate little discriminative power of those meniscal tests. No significant subgroups were detected for both tests. The power of meta-regression analysis, however, was low because of the small number of available studies.
Sensitivity and specificity of the Apley compression test were not correlated (Rs = 0.0) and no SROC curve was estimated. Sources of heterogeneity could not be identified. Only 3 studies, however, addressed this test.
Figure 2 shows the positive predictive value (PV+) and negative predictive value (PV-) for the assessment of joint effusion, the McMurray test, and JLT, according to varying prevalences of meniscal lesions. The summary estimate of sensitivity and accompanying specificity (derived from the SROC curve) were used for joint effusion (0.43 and 0.70), the McMurray test (0.48 and 0.86), and JLT (0.77 and 0.41). Only the McMurray test had a favorable estimated PV+. The PV+ of joint effusion and JLT exceeded the presumed prevalences only slightly, indicating poor additional diagnostic value. The PV- of all tests was poor.
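The predictive values in Figure 2 follow from Bayes' theorem applied to a sensitivity, a specificity, and a prevalence. The following is a minimal Python sketch of that calculation, using the summary sensitivity and specificity pairs reported above. The prevalence grid (0.10, 0.30, 0.50) and the function name `predictive_values` are illustrative assumptions and are not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    pv_pos = true_pos / (true_pos + false_pos)
    pv_neg = true_neg / (true_neg + false_neg)
    return pv_pos, pv_neg


# Summary (sensitivity, specificity) pairs as reported in the text.
SUMMARY_ESTIMATES = {
    "joint effusion": (0.43, 0.70),
    "McMurray test": (0.48, 0.86),
    "JLT": (0.77, 0.41),
}

# Hypothetical prevalences chosen for illustration only.
for prevalence in (0.10, 0.30, 0.50):
    for name, (se, sp) in SUMMARY_ESTIMATES.items():
        pv_pos, pv_neg = predictive_values(se, sp, prevalence)
        print(f"prevalence={prevalence:.2f} {name}: "
              f"PV+={pv_pos:.2f} PV-={pv_neg:.2f}")
```

At an assumed prevalence of 0.50, for example, this arithmetic gives a PV+ of about 0.77 for the McMurray test, against roughly 0.59 for joint effusion and 0.57 for JLT.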
Discussion
Our goal was to summarize the available evidence on the accuracy of various physical diagnostic tests for assessing meniscal lesions of the knee. The accuracy of those tests seems to be poor, and only a positive McMurray test result seems to be of some diagnostic significance.
However, because of the small number and poor quality of the studies found, we have significant concerns about the application of these results. Because of the methodologic flaws, the estimates of the various parameters of test accuracy probably will be biased, and the results of this meta-analysis should be interpreted with care. In view of the presence of review bias and verification bias in the various studies, the sensitivity of the various meniscal tests will be overestimated. The effect of those biases on specificity estimates, however, is less clear: Those specificities could be either overestimated or underestimated. Therefore, a rigorous conclusion regarding the diagnostic accuracy of the various meniscal tests cannot be made. Also, analysis of the influence of other potential sources of bias (like the type of gold standard, setting, and spectrum) was impeded by the low number of studies or the lack of information from studies.
The various physical diagnostic meniscal tests do not seem to be very helpful in guiding clinical decision making, and physicians should be aware of the very limited value of those tests. In the clinical determination of a meniscal lesion, however, meniscal tests are, of course, not applied in isolation. Combining the results of the various tests might improve the accuracy of diagnosing a meniscal lesion, and including other characteristics as well (eg, elements of history-taking) may improve the diagnosis further. Those characteristics may even have more diagnostic power than the meniscal tests. Abdon and coworkers14 performed a discriminant analysis and addressed the McMurray test, JLT, and various other signs and symptoms jointly. Of the meniscal tests, only JLT resulted in some additional discriminative power (apart from various elements of history-taking). The results of their analysis, however, are not readily understandable, and the contribution of the individual items to the ability to diagnose meniscal lesions correctly remains obscure. Reanalysis of their results by multiple logistic regression might give results that are more directly applicable in clinical practice.
Because no study has been performed in primary care, and test characteristics are influenced by referral filters,27 one can only speculate what the effect will be of extrapolating the observed results to a primary care setting. If family physicians, who will be less experienced in performing those meniscal tests, apply a low threshold for interpreting a test result as positive, the sensitivity of those tests will be higher, but the specificity will be lower. The predictive value of a negative test result will be affected only slightly, but the predictive value of a positive test result will decrease. On the other hand, if family physicians apply a high threshold for test positivity, sensitivity decreases and specificity increases, resulting in an increased predictive value of a positive test result. Because of the case mix of patients with traumatic knee problems in primary care (ranging from vague minor knee disorders to clear-cut meniscal lesions), the prior probability (or prevalence) of having a meniscal lesion will be low in primary care, which means that the diagnostic gain will also be low (Figure 2).
Recommendations For Future Research
Methodologically sound research on the diagnostic accuracy of the various physical diagnostic tests (determined both for each test separately and for all tests jointly) in combination with patient characteristics (eg, age, physical fitness, and functional demands) and elements of the medical history (eg, the type of trauma and the nature of the complaints) is needed. Such research will be more relevant to clinical practice and patient care if the effect of a correct early diagnosis on the functional outcome of the patient is assessed as well.
Recommendations For Clinical Practice
For the time being, there is little evidence that the diagnosis of meniscal lesions of the knee can be improved by applying the assessment of joint effusion, the McMurray test, JLT, or the Apley compression test. The need for applying more advanced diagnostic methods (eg, MRI) or referral for surgical treatment can be based only on the severity of the patient’s complaints.
1. McMurray TP. The semilunar cartilages. Br J Surg 1942;29:407-14.
2. Apley AG. The diagnosis of meniscus injuries. J Bone Joint Surg 1947;29:78-84.
3. Nicholas JA, Hershman EB, eds. The lower extremity and spine in sports medicine. Vol 1. 2nd ed. St. Louis, Mo: Mosby; 1995;814-15.
4. Resnick D, ed. Diagnosis of bone and joint disorders. Vol 5. 3rd ed. Philadelphia, Pa: Saunders; 1995;3076.
5. Stratford PW, Binkley J. A review of the McMurray test: definition, interpretation, and clinical usefulness. J Orthop Sports Phys Ther 1995;22:116-20.
6. Plas CG van der, Dingjan RA, Hamel A, et al. [Dutch College of General Practitioners practice guidelines regarding traumatic knee problems]. [Dutch]. Huisarts en Wetenschap 1998;41:296-300.
7. Devillé WLJM, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65-69.
8. Irwig L, Macaskill P, Glaziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119-30.
9. Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: recommended methods. Updated June 6, 1996. Available at som.flinders.edu.au/fusa/cochrane/.
10. Midgette AS, Stukel TA, Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making 1993;13:253-57.
11. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12:1293-316.
12. Steinbruck K, Wiehmann JC. [Examination of the knee joint. The value of clinical findings in arthroscopic control]. [German]. Z Orthop Ihre Grenzgeb 1988;126:289-95.
13. Fowler PJ, Lubliner JA. The predictive value of five clinical signs in the evaluation of meniscal pathology. Arthroscopy 1989;5:184-86.
14. Abdon P, Lindstrand A, Thorngren KG. Statistical evaluation of the diagnostic criteria for meniscal tears. Int Orthop 1990;14:341-45.
15. Boeree NR, Ackroyd CE. Assessment of the menisci and cruciate ligaments: an audit of clinical practice. Injury 1991;22:291-94.
16. Saengnipanthkul S, Sirichativapee W, Kowsuwon W, Rojviroj S. The effects of medial patellar plica on clinical diagnosis of medial meniscal lesion. J Med Assoc Thai 1992;75:704-08.
17. Evans PJ, Bell GD, Frank C. Prospective evaluation of the McMurray test. Am J Sports Med 1993;21:604-08.
18. Corea JR, Moussa M, al Othman A. McMurray’s test tested. Knee Surg Sports Traumatol Arthroscop 1994;2:70-72.
19. Grifka J, Richter J, Gumtau M. [Clinical and sonographic meniscus diagnosis]. [German]. Orthopade 1994;23:102-11.
20. Shelbourne KD, Martini DJ, McCarroll JR, VanMeter CD. Correlation of joint line tenderness and meniscal lesions in patients with acute anterior cruciate ligament tears. Am J Sports Med 1995;23:166-69.
21. Mariani PP, Adriani E, Maresca G, Mazzola CG. A prospective evaluation of a test for lateral meniscus tears. Knee Surg Sports Traumatol Arthroscop 1996;4:22-26.
22. Noble J, Erat K. In defence of the meniscus: a prospective study of 200 meniscectomy patients. J Bone Joint Surg 1980;62-B:7-11.
23. Barry OCD, Smith H, McManus F, MacAuley P. Clinical assessment of suspected meniscal tears. Ir J Med Sci 1983;152:149-51.
24. Anderson AF, Lipscomb AB. Clinical diagnosis of meniscal tears: description of a new manipulative test. Am J Sports Med 1986;14:291-93.
25. Stratford PW. Prospective evaluation of the McMurray test. Am J Sports Med 1994;22:567-68.
26. Evans PJ. Authors’ response. Am J Sports Med 1994;22:568.
27. Knottnerus JA, Leffers P. The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol 1992;45:1143-54.
SEARCH STRATEGY: We performed a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) with additional reference tracking.
SELECTION CRITERIA: Articles written in English, French, German, or Dutch that addressed the accuracy of at least one physical diagnostic test for meniscus injury with arthrotomy, arthroscopy, or magnetic resonance imaging as the gold standard were included.
DATA COLLECTION and ANALYSIS: Two reviewers independently selected studies, assessed the methodologic quality, and abstracted data using a standardized protocol.
MAIN RESULTS: Thirteen studies (of 402) met the inclusion criteria. The results of the index and reference tests were assessed independently (blindly) of each other in only 2 studies, and in all studies verification bias seemed to be present. The study results were highly heterogeneous. The summary receiver operating characteristic curves of the assessment of joint effusion, the McMurray test, and joint line tenderness indicated little discriminative power for these tests. Only the predictive value of a positive McMurray test was favorable.
CONCLUSIONS: The methodologic quality of studies addressing the diagnostic accuracy of meniscal tests was poor, and the results were highly heterogeneous. The poor characteristics indicate that these tests are of little value for clinical practice.
Various physical diagnostic tests are available to assess meniscal lesions, such as assessment of joint effusion and joint line tenderness (JLT), the McMurray test, and the Apley compression test.1-4 Many meniscal tests, however, are not easy to perform and seem to be prone to errors.1,2,4 Also, the diagnostic accuracy of the various meniscal tests has been questioned,3-5 and conflicting results regarding that accuracy have been reported.6 Therefore, we systematically reviewed the medical literature to summarize the available evidence about the diagnostic accuracy of physical diagnostic tests for assessing meniscal lesions of the knee and to combine the results of individual studies when possible. We focused on the most common meniscal tests: the assessment of joint effusion, the McMurray test, JLT, and the Apley compression test.
Methods
Selection of Studies
We conducted a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) to identify articles written in English, French, German, or Dutch. The Medical Subject Headings (MeSH) terms “knee injuries,” “knee joint,” “knee,” and “menisci tibial,” and the text words “knee” and “effusion” were used. The results of this strategy were combined with a validated search strategy for the identification of diagnostic studies using the MeSH terms “sensitivity and specificity” (exploded), “physical examination” and “not (animal not (human and animal))” and the text words “sensitivity,” “specificity,” “false positive,” “false negative,” “accuracy,” and “screening,”7 supplemented with the text words “physical examination” and “clinical examination.” Also, the cited references of relevant publications were examined.
Studies were eligible for inclusion if they addressed the accuracy of at least one physical diagnostic test for the assessment of meniscal lesions of the knee and used arthrotomy, arthroscopy, or magnetic resonance imaging (MRI) as the gold standard. Studies were excluded if no reference group (nondiseased group or subjects with lesions other than the lesion of study) had been included, if only test-positives had been included, if the study pertained to cadavers only, or if only physical examination under anesthesia was considered.
The studies were selected by 2 reviewers independently. A preliminary selection of each study was made by checking the title, the abstract, or both. A definite selection was made by reading the complete article. During a consensus meeting disagreements regarding the selection of studies were discussed, and a definite selection was made. If disagreement persisted, a third reviewer made the final decision.
Assessment of Methodologic Quality and Data Abstraction
The methodologic quality of the selected studies was assessed, and data were abstracted by 2 reviewers independently. A checklist adapted from Irwig and colleagues8 and the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests9 was used for quality assessment. This checklist consisted of 6 criteria for study validity, 5 criteria relevant to the clinical applicability of the results, and 5 items pertaining to the index test (Table W1, Table W1a).* In a subsequent consensus meeting, both assessors discussed each criterion on which they initially disagreed. If disagreement persisted, a third reviewer made the final decision.
Statistical Analysis
Statistical analysis was performed according to a strategy adapted from Midgette and colleagues (Figure W1).10** For each study, the sensitivity and specificity of each index test were calculated. The χ2 test was used to assess the homogeneity of the sensitivity and the specificity among studies. If homogeneity of both sensitivity and specificity was not rejected (P >.10), summary estimates of sensitivity and specificity were calculated.10 Heterogeneity of sensitivity and specificity might be caused by differences between studies in how clinicians define a positive test result.8 In that case, the pairs of sensitivity and specificity will be negatively correlated, as indicated by a negative Spearman rank correlation coefficient (Rs). When the pairs of sensitivity and specificity were negatively correlated, these pairs were considered to originate from a common receiver operating characteristic (ROC) curve, and a summary ROC (SROC) curve was estimated by meta-regression.8,10,11 The better the diagnostic accuracy of the test, the larger the area under the curve.
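The correlation check and the SROC step described above can be sketched in a few lines of Python. This is an illustration only. It assumes the logit-difference regression of Moses and colleagues11 (D = a + bS, with D = logit(sensitivity) - logit(1 - specificity) and S their sum), which may differ in detail from the adapted strategy actually used. The study pairs in `HYPOTHETICAL_PAIRS` are invented for the example and are not data from the review, and all function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def ranks(values):
    """Rank values from 1..n (this sketch assumes there are no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation coefficient (Rs) for untied data."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def fit_sroc(pairs):
    """Least-squares fit of D = a + b*S over (sensitivity, specificity) pairs."""
    d = [logit(se) - logit(1 - sp) for se, sp in pairs]
    s = [logit(se) + logit(1 - sp) for se, sp in pairs]
    mean_s, mean_d = sum(s) / len(s), sum(d) / len(d)
    sxx = sum((si - mean_s) ** 2 for si in s)
    sxy = sum((si - mean_s) * (di - mean_d) for si, di in zip(s, d))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_sensitivity(a, b, specificity):
    """Sensitivity on the fitted SROC curve at a given specificity."""
    # Rearranging D = a + b*S gives
    # logit(sens) = a / (1 - b) + (1 + b) / (1 - b) * logit(1 - spec).
    x = a / (1 - b) + (1 + b) / (1 - b) * logit(1 - specificity)
    return 1 / (1 + math.exp(-x))


# Invented (sensitivity, specificity) pairs, for illustration only.
HYPOTHETICAL_PAIRS = [(0.30, 0.90), (0.45, 0.80), (0.60, 0.65), (0.75, 0.50)]

rs = spearman([se for se, _ in HYPOTHETICAL_PAIRS],
              [sp for _, sp in HYPOTHETICAL_PAIRS])
a, b = fit_sroc(HYPOTHETICAL_PAIRS)
print(f"Rs={rs:.2f}, a={a:.3f}, b={b:.3f}")
print(f"sensitivity at specificity 0.70: {sroc_sensitivity(a, b, 0.70):.2f}")
```

A negative Rs, as in this invented example, is the pattern that would justify fitting a common curve. The fitted curve then supplies the specificity that accompanies a chosen sensitivity, or the reverse.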
Differences between study characteristics are another potential source of heterogeneity of sensitivity and specificity.8 Those other sources of heterogeneity were assessed by adding the following characteristics to the meta-regression model: study validity items (most valid category of each item vs other categories), setting (primary care vs other), the spectrum of the diseased and the nondiseased (broad spectrum vs small spectrum), the prevalence of meniscal lesions, and the year of publication. When a significant subgroup was identified (P <.05), separate analyses were performed for each subgroup.
The summary estimates of sensitivity and specificity were used to calculate the predictive value of a positive (PV+) and negative (PV-) test result for circumstances with varying prevalences of meniscal lesions. When the sensitivities or specificities were heterogeneous between studies, however, the summary estimate of sensitivity was used for calculating predictive values with the accompanying specificity, estimated from the SROC curve.
Results
Selection of Studies
The literature search revealed a total of 402 potentially eligible studies, of which 10 were selected for inclusion.12-21 Three other studies were found by reference tracking.22-24 Thus, 13 studies met the selection criteria. The reply to a letter to the editor to one of the studies contained additional information and was also considered for analysis.17,25,26
Methodologic Quality and Study Characteristics
The index test and reference standard had been measured independently (blindly) of each other in only 2 studies.16,21 Verification bias seemed to be present in all studies (patients with an abnormal physical test result were more likely to undergo the gold standard test, inflating the sensitivity and decreasing the specificity). Nine studies applied arthroscopy as the gold standard,12-14,16,17,19-21,24 and 1 study used MRI.15 No study was performed in a primary care setting. In 7 studies a broad spectrum of knee lesions was reported,12-15,17,20,21 and in 4 studies the spectrum was not specified (Table 1).18,19,22,23 A broad spectrum of conditions in the reference group (nondiseased) was present in 8 studies,12-15,17,20,22,23 while in 4 studies the spectrum was not specified.18,19,21,24 Details regarding the index tests were poorly reported, except in 2 studies.17,21 In all studies that addressed the McMurray test, the experience of a “thud” or “click” was used for designating a test as positive.12,13,15-19,22 Only 2 studies mentioned assessment of the index test independent of knowledge of other clinical information (including the results of other meniscal tests; Table W2*).17,21 The age and sex distribution of the patients and the duration of complaints are presented in Table 1.
SEARCH STRATEGY: We performed a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) with additional reference tracking.
SELECTION CRITERIA: Articles written in English, French, German, or Dutch that addressed the accuracy of at least one physical diagnostic test for meniscus injury with arthrotomy, arthroscopy, or magnetic resonance imaging as the gold standard were included.
DATA COLLECTION and ANALYSIS: Two reviewers independently selected studies, assessed the methodologic quality, and abstracted data using a standardized protocol.
MAIN RESULTS: Thirteen studies (of 402) met the inclusion criteria. The results of the index and reference tests were assessed independently (blindly) of each other in only 2 studies, and in all studies verification bias seemed to be present. The study results were highly heterogeneous. The summary receiver operating characteristic curves of the assessment of joint effusion, the McMurray test, and joint line tenderness indicated little discriminative power for these tests. Only the predictive value of a positive McMurray test was favorable.
CONCLUSIONS: The methodologic quality of studies addressing the diagnostic accuracy of meniscal tests was poor, and the results were highly heterogeneous. The poor characteristics indicate that these tests are of little value for clinical practice.
Various physical diagnostic tests are available to assess meniscal lesions, such as assessment of joint effusion and joint line tenderness (JLT), the McMurray test, and the Apley compression test.1-4 Many meniscal tests, however, are not easy to perform and seem to be prone to errors.1,2,4 Also, the diagnostic accuracy of the various meniscal tests has been questioned,3-5 and conflicting results regarding that accuracy have been reported.6 Therefore, we systematically reviewed the medical literature to summarize the available evidence about the diagnostic accuracy of physical diagnostic tests for assessing meniscal lesions of the knee and to combine the results of individual studies when possible. We focused on the most common meniscal tests: the assessment of joint effusion, the McMurray test, JLT, and the Apley compression test.
Methods
Selection of Studies
We conducted a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) to identify articles written in English, French, German, or Dutch. The Medical Subject Headings (MeSH) terms “knee injuries,” “knee joint,” “knee,” and “menisci tibial,” and the text words “knee” and “effusion” were used. The results of this strategy were combined with a validated search strategy for the identification of diagnostic studies using the MeSH terms “sensitivity and specificity” (exploded), “physical examination” and “not (animal not (human and animal))” and the text words “sensitivity,” “specificity,” “false positive,” “false negative,” “accuracy,” and “screening,”7 supplemented with the text words “physical examination” and “clinical examination.” Also, the cited references of relevant publications were examined.
Studies were eligible for inclusion if they addressed the accuracy of at least one physical diagnostic test for the assessment of meniscal lesions of the knee and used arthrotomy, arthroscopy, or magnetic resonance imaging (MRI) as the gold standard. Studies were excluded if no reference group (nondiseased group or subjects with lesions other than the lesion of study) had been included, if only test-positives had been included, if the study pertained to cadavers only, or if only physical examination under anesthesia was considered.
The studies were selected by 2 reviewers independently. A preliminary selection of each study was made by checking the title, the abstract, or both. A definite selection was made by reading the complete article. During a consensus meeting disagreements regarding the selection of studies were discussed, and a definite selection was made. If disagreement persisted, a third reviewer made the final decision.
Assessment of Methodologic Quality and Data Abstraction
The methodologic quality of the selected studies was assessed, and data were abstracted by 2 reviewers independently. A checklist adapted from Irwig and colleagues8 and the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests9 was used for quality assessment. This checklist consisted of 6 criteria for study validity, 5 criteria relevant to the clinical applicability of the results, and 5 items pertaining to the index Table w1, Table w1a test.* In a subsequent consensus meeting, both assessors discussed each criterion on which they initially disagreed. If disagreement persisted, a third reviewer made the final decision.
Statistical Analysis
Statistical analysis was performed according to a strategy adapted from Midgette and colleagues. Figure W1 10** For each study, the sensitivity and specificity of each index test were calculated. The c2 test was used to assess the homogeneity of the sensitivity and the specificity among studies. If homogeneity of both sensitivity and specificity was not rejected (P >.10), summary estimates of sensitivity and specificity were calculated.10 Heterogeneity of sensitivity and specificity might be caused by differences between studies in how clinicians define a positive test result.8 In that case, the pairs of sensitivity and specificity will be negatively correlated, as indicated by a negative Spearman rank correlation coefficient (Rs). When the pairs of sensitivity and specificity are negatively correlated, these pairs can be considered to be originating from a common receiver operating characteristic (ROC) curve, and a summary ROC (SROC) curve was estimated by meta-regression.8,10,11 The better the diagnostic accuracy of the test, the larger the area under the curve.
Differences between study characteristics are another potential source of heterogeneity of sensitivity and specificity.8 Those other sources of heterogeneity were assessed by adding the following characteristics to the meta-regression model: study validity items (most valid category of each item vs other categories), setting (primary care vs other), the spectrum of the diseased and the nondiseased (broad spectrum vs small spectrum), the prevalence of meniscal lesions, and the year of publication. When a significant subgroup was identified (P <.05), separate analyses were performed for each subgroup.
The summary estimates of sensitivity and specificity were used to calculate the predictive value of a positive (PV+) and negative (PV-) test result for circumstances with varying prevalences of meniscal lesions. When the sensitivities or specificities were heterogeneous between studies, however, the summary estimate of sensitivity was used for calculating predictive values with the accompanying specificity, estimated from the SROC curve.
Results
Selection of Studies
The literature search revealed a total of 402 potentially eligible studies, of which 10 were selected for inclusion.12-21 Three other studies were found by reference tracking.22-24 Thus, 13 studies met the selection criteria. The reply to a letter to the editor to one of the studies contained additional information and was also considered for analysis.17,25,26
Methodologic Quality and Study Characteristics
The index test and reference standard had been measured independently (blindly) of each other in only 2 studies.16,21 Verification bias seemed to be present in all studies (patients with an abnormal physical test result were more likely to undergo the gold standard test, inflating the sensitivity and decreasing the specificity). Nine studies applied arthroscopy as the gold standard,12-14,16,17,19-21,24 and 1 study used MRI.15 No study was performed in a primary care setting. In 7 studies a broad spectrum of knee lesions was reported,12-15,17,20,21 and in 4 studies the spectrum was not specified (Table 1).18,19,22,23 A broad spectrum of conditions in the reference group (nondiseased) was present in 8 studies,12-15,17,20,22,23 while in 4 studies the spectrum was not specified.18,19,21,24 Details regarding the index tests were poorly reported, except in 2 studies.17,21 In all studies that addressed the McMurray test, the experience of a “thud” or “click” was used for designating a test as positive.12,13,15-19,22 Only 2 studies mentioned assessment of the index test independent of knowledge of other clinical information (including the results of other meniscal tests) (Table W2).17,21 The age and sex distribution of the patients and the duration of complaints are presented in Table 1.
Accuracy of Meniscal Tests
The accuracy of the assessment of joint effusion was determined in 4 studies, the McMurray test in 11, JLT in 10, the Apley compression test in 3, and 5 studies addressed various other tests. No data were presented in or could be derived from 1 study pertaining to joint effusion, 3 studies regarding the McMurray test,14,23,24 and 1 study on JLT,24 while from 1 study pertaining to both the McMurray test and JLT only the point estimates of the various test characteristics were reported without the original number of patients in the various categories.15 Of the study of Evans and coworkers,17,26 who presented data of an inexperienced and experienced researcher, only the latter results were used. Of the study of Abdon and colleagues,14 who made a distinction between tenderness of the medial and posterior part of the joint line, only the data of the medial part were considered. It should be noted that 2 studies incorporated a very small number of nondiseased subjects.23,24 Also, one of those studies presented results from individual knees instead of subjects.24 Part of their results pertained to both knees of the same subject, which violates the assumption of (statistical) independence of the observations. Therefore, this study was excluded from further analysis. Finally, some studies did not make a distinction between medial and lateral meniscal lesions,13,17,19,22,23 while others presented the results for medial and lateral meniscal lesions separately.12,14,15,18,20 Of the latter studies, only the results of medial meniscal tests were used for statistical analysis.
The diagnostic accuracy of assessment of joint effusion and the various meniscal tests is shown in Table 2. There was significant heterogeneity of sensitivity and specificity of all tests, except for specificity of the Apley compression test (P=.89).
Sensitivity and specificity were negatively correlated for joint effusion (Rs = -1.0), the McMurray test (Rs = -0.43), and JLT (Rs = -0.62). This means that as one increased, the other decreased, which is to be expected. The SROC curves (Figure 1) indicate little discriminative power of those meniscal tests. No significant subgroups were detected for either test. The power of meta-regression analysis, however, was low because of the small number of available studies.
Sensitivity and specificity of the Apley compression test were not correlated (Rs = 0.0) and no SROC curve was estimated. Sources of heterogeneity could not be identified. Only 3 studies, however, addressed this test.
Figure 2 shows the positive predictive value (PV+) and negative predictive value (PV-) for the assessment of joint effusion, the McMurray test, and JLT, according to varying prevalences of meniscal lesions. The summary estimate of sensitivity and accompanying specificity (derived from the SROC curve) were used for joint effusion (0.43 and 0.70), the McMurray test (0.48 and 0.86), and JLT (0.77 and 0.41). Only the McMurray test had a favorable estimated PV+. The PV+ of joint effusion and JLT exceeded the presumed prevalences only slightly, indicating poor additional diagnostic value. The PV- of all tests was poor.
Discussion
Our goal was to summarize the available evidence on the accuracy of various physical diagnostic tests for assessing meniscal lesions of the knee. The accuracy of those tests seems to be poor, and only a positive McMurray test result seems to be of some diagnostic significance.
However, because of the small number and poor quality of the studies found, we have significant concerns about the application of these results. Because of the methodologic flaws, the estimates of the various parameters of test accuracy probably will be biased, and the results of this meta-analysis should be interpreted with care. In view of the presence of review bias and verification bias in the various studies, the sensitivity of the various meniscal tests will be overestimated. The effect of those biases on specificity estimates, however, is less clear: Those specificities could be either overestimated or underestimated. Therefore, a rigorous conclusion regarding the diagnostic accuracy of the various meniscal tests cannot be made. Also, analysis of the influence of other potential sources of bias (like the type of gold standard, setting, and spectrum) was impeded by the low number of studies or the lack of information from studies.
The various physical diagnostic meniscal tests do not seem to be very helpful in guiding clinical decision making, and physicians should be aware of the very limited value of those tests. In the clinical determination of a meniscal lesion, however, meniscal tests are, of course, not applied in isolation. Combining the results of the various tests might improve accurate diagnosis of a meniscal lesion, and including other characteristics as well (eg, elements of history-taking) will further improve the diagnosis. Those characteristics may even have more diagnostic power than the meniscal tests. Abdon and coworkers14 performed a discriminant analysis and addressed the McMurray test, JLT, and various other signs and symptoms jointly. Of the meniscal tests, only JLT resulted in some additional discriminative power (apart from various elements of history-taking). The results of their analysis, however, are not readily understandable, and the contribution of the individual items to improve the ability to diagnose meniscal lesions correctly remains obscure. Reanalysis of their results by multiple logistic regression might give results that are more directly applicable in clinical practice.
Because no study has been performed in primary care, and test characteristics are influenced by referral filters,27 one can only speculate what the effect will be of extrapolating the observed results to a primary care setting. If family physicians, who will be less experienced in performing those meniscal tests, apply a low threshold for interpreting a test result as positive, the sensitivity of those tests will be higher, but the specificity will be lower. The predictive value of a negative test result will be affected only slightly, but the predictive value of a positive test result will decrease. On the other hand, if family physicians apply a high threshold for test positivity, sensitivity decreases and specificity increases, resulting in an increased predictive value of a positive test result. Because of the case mix of patients with traumatic knee problems in primary care (ranging from vague minor knee disorders to clear-cut meniscal lesions), the prior probability (or prevalence) of having a meniscal lesion will be low in primary care, which means that the diagnostic gain will also be low (Figure 2).
Recommendations For Future Research
Methodologically sound research on the diagnostic accuracy of the various physical diagnostic tests (determined both for each test separately and for all tests jointly) in combination with patient characteristics (eg, age, physical fitness, and functional demands) and elements of the medical history (eg, the type of trauma and the nature of the complaints) is needed. Such research will be more relevant to clinical practice and patient care if the effect of a correct early diagnosis on the functional outcome of the patient is assessed as well.
Recommendations For Clinical Practice
For the time being, there is little evidence that the diagnosis of meniscal lesions of the knee can be improved by applying the assessment of joint effusion, the McMurray test, JLT, or the Apley compression test. The need for applying more advanced diagnostic methods (eg, MRI) or referral for surgical treatment can be based only on the severity of the patient’s complaints.
1. McMurray TP. The semilunar cartilages. Br J Surg 1942;29:407-14.
2. Apley AG. The diagnosis of meniscus injuries. J Bone Joint Surg 1947;29:78-84.
3. Nicholas JA, Hershman EB, eds. The lower extremity and spine in sports medicine. Vol 1. 2nd ed. St. Louis, Mo: Mosby; 1995;814-15.
4. Resnick D, ed. Diagnosis of bone and joint disorders. Vol 5. 3rd ed. Philadelphia, Pa: Saunders; 1995;3076.
5. Stratford PW, Binkley J. A review of the McMurray test: definition, interpretation, and clinical usefulness. J Orthop Sports Phys Ther 1995;22:116-20.
6. Plas CG van der, Dingjan RA, Hamel A, et al. [Dutch College of General Practitioners practice guidelines regarding traumatic knee problems]. [Dutch]. Huisarts en Wetenschap 1998;41:296-300.
7. Devillé WLJM, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65-69.
8. Irwig L, Macaskill P, Glaziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119-30.
9. Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: recommended methods updated June 6, 1996 Available at som.flinders.edu.au/fusa/cochrane/.
10. Midgette AS, Stukel TA, Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making 1993;13:253-57.
11. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12:1293-316.
12. Steinbruck K, Wiehmann JC. [Examination of the knee joint. The value of clinical findings in arthroscopic control]. [German]. Z Orthop Ihre Grenzgeb 1988;126:289-95.
13. Fowler PJ, Lubliner JA. The predictive value of five clinical signs in the evaluation of meniscal pathology. Arthroscopy 1989;5:184-86.
14. Abdon P, Lindstrand A, Thorngren KG. Statistical evaluation of the diagnostic criteria for meniscal tears. Int Orthop 1990;14:341-45.
15. Boeree NR, Ackroyd CE. Assessment of the menisci and cruciate ligaments: an audit of clinical practice. Injury 1991;22:291-94.
16. Saengnipanthkul S, Sirichativapee W, Kowsuwon W, Rojviroj S. The effects of medial patellar plica on clinical diagnosis of medial meniscal lesion. J Med Assoc Thai 1992;75:704-08.
17. Evans PJ, Bell GD, Frank C. Prospective evaluation of the McMurray test. Am J Sports Med 1993;21:604-08.
18. Corea JR, Moussa M, al Othman A. McMurray’s test tested. Knee Surg Sports Traumatol Arthroscop 1994;2:70-72.
19. Grifka J, Richter J, Gumtau M. [Clinical and sonographic meniscus diagnosis]. [German]. Orthopade 1994;23:102-11.
20. Shelbourne KD, Martini DJ, McCarroll JR, VanMeter CD. Correlation of joint line tenderness and meniscal lesions in patients with acute anterior cruciate ligament tears. Am J Sports Med 1995;23:166-69.
21. Mariani PP, Adriani E, Maresca G, Mazzola CG. A prospective evaluation of a test for lateral meniscus tears. Knee Surg Sports Traumatol Arthroscop 1996;4:22-26.
22. Noble J, Erat K. In defence of the meniscus: a prospective study of 200 meniscectomy patients. J Bone Joint Surg 1980;62-B:7-11.
23. Barry OCD, Smith H, McManus F, MacAuley P. Clinical assessment of suspected meniscal tears. Ir J Med Sci 1983;152:149-51.
24. Anderson AF, Lipscomb AB. Clinical diagnosis of meniscal tears: description of a new manipulative test. Am J Sports Med 1986;14:291-93.
25. Stratford PW. Prospective evaluation of the McMurray test. Am J Sports Med 1994;22:567-68.
26. Evans PJ. Authors’ response. Am J Sports Med 1994;22:568.
27. Knottnerus JA, Leffers P. The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol 1992;45:1143-54.
Clinical Findings Associated with Radiographic Pneumonia in Nursing Home Residents
STUDY DESIGN: This was a prospective cohort study.
POPULATION: The residents of 36 nursing homes in central Missouri and the St. Louis area with signs or symptoms suggesting a lower respiratory infection were included.
OUTCOME MEASURED: We compared evaluation findings by project nurses with findings reported from chest radiographs.
RESULTS: Among 2334 episodes of illness in 1474 nursing home residents, 45% of the radiograph reports suggested pneumonia (possible = 12%; probable or definite = 33%). In 80% of pneumonia episodes, subjects had 3 or fewer respiratory or general symptoms. Eight variables were significant independent predictors of pneumonia (increased pulse, respiratory rate ≥30, temperature ≥38°C, somnolence or decreased alertness, presence of acute confusion, lung crackles on auscultation, absence of wheezes, and increased white blood count). A simple score (range = -1 to 8) on the basis of these variables identified 33% of subjects (score ≥3) with more than 50% probability of pneumonia and an additional 24% (score of 2) with 44% probability of pneumonia.
CONCLUSIONS: Pneumonia in nursing home residents is usually associated with few symptoms. Nonetheless, a simple clinical prediction rule can identify residents at very high risk for pneumonia. If validated in other studies, physicians could consider treating such residents without obtaining a chest radiograph.
Pneumonia is a leading cause of morbidity, mortality, and hospitalization of nursing home residents.1-8 Atypical presentations and fewer presenting signs and symptoms in older patients complicate diagnosis.9,10 Also, clinician (physician, nurse practitioner, and physician assistant) visits to nursing homes are often sporadic, and radiology facilities are rarely on the premises. As a consequence, residents are commonly sent to emergency departments for evaluation,4,11,12 which undoubtedly contributes to a high hospitalization rate.
Clinicians who periodically see nursing home residents could benefit from a simple clinical tool to identify pneumonia. No large studies of community nursing home residents have systematically studied findings associated with pneumonia. As part of the Missouri LRI Project, we examined how well clinical findings predict radiographic pneumonia.
Methods
The Missouri LRI Project was a prospective observational study in 36 nursing homes in Central Missouri and St. Louis designed to investigate predictors of 2 outcomes of lower respiratory infections (LRIs): mortality and functional decline. Potential cases were identified from August 15, 1995, through September 29, 1998; however, all facilities were not involved until fall 1997. Study facilities were similar in size, ownership, and occupancy to national estimates from the 1995 National Nursing Home Survey (data available on request).13
We trained nursing home staff to report ill residents with any of 6 respiratory symptoms (eg, cough, dyspnea, sputum production) or 6 general symptoms (eg, fever, decline in mobility, mental status changes). Project nurses called and visited facilities frequently to reinforce reporting. Under a physician-authorized protocol, ill residents with a possible LRI received a standardized evaluation by a trained project nurse and usually a chest radiograph, complete blood count, and a chemistry panel. Complete criteria for triggering an evaluation are listed in Table 1. For this paper, we were concerned with the 90% of evaluated residents who received a chest radiograph. Criteria for excluding residents from evaluation are summarized in Figure 1.
The nurse evaluation included an inventory of current symptoms, a review of important chronic conditions (eg, congestive heart failure), and a targeted physical examination. The examination included vital signs and the following body areas or systems: ears, nose, and throat; cardiac; abdominal; neurologic; extremities; skin; and a detailed lung examination. Most project nurses had advanced practice training; the remainder had extensive clinical experience and training in physical assessment. All received an individualized training session with a project geriatrician. Project nurses had substantially more experience than the nursing home staff, who usually report clinical findings to physicians.
Results of the evaluation were reported to the attending physician, who made all treatment decisions. Since the evaluations were clinically appropriate care authorized by individual attending physicians, the institutional review boards that reviewed the project allowed us to substantially simplify the consent process to a simple acceptance or refusal of the evaluation. In 9.2% of evaluations the resident was transferred to the hospital before project nurses could complete a physical assessment. In these instances, we obtained vital sign and clinical examination data from hospital records.
Radiographic Classification
Since all subjects had at least one illness symptom, for this analysis we classified the presence or absence of pneumonia on the basis of reported radiographic findings. Using defined criteria, 2 clinicians independently separated radiology reports into 3 categories: (a) negative, (b) possible, or (c) probable or definite for pneumonia (hereafter, probable pneumonia). For example, a report describing “new left lower lobe infiltrate suggestive of pneumonia” would have been rated as probable, while a report indicating “possible infiltrate” or “infiltrate suggestive of pneumonia or congestive heart failure” would have been rated as possible. As radiologists rarely provide completely unequivocal readings, we did not separate probable and definite pneumonia. In St. Louis 2 clinicians evaluated the reports, and in central Missouri 2 of 4 clinicians considered each report. Where there was disagreement, all 6 raters from the 2 sites independently reviewed the reports and then attempted to reach consensus. For 13% of radiographs, the project radiologist independently interpreted the actual films. This occurred when: (1) consensus could not be achieved; or (2) consensus was possible pneumonia, but probable pneumonia was needed to qualify the episode as an LRI under the project definition.
Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
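The imputation rule described above can be sketched as follows. The function names are hypothetical, missing values are assumed to be represented by `None`, and the authors' actual implementation (in SAS) is not given in the text.

```python
from collections import Counter

def impute_mean(values):
    """Replace missing continuous values (None) with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def impute_largest_category(values):
    """Replace missing dichotomous values (None) with the most frequent category."""
    observed = [v for v in values if v is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]
```

Each function leaves observed values untouched and fills only the gaps, so the number of episodes available for modeling is unchanged.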
Illness episodes were then randomly assigned to a two thirds model-development and a one third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rate above 140 was set equal to 140.
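Two of the data-preparation steps above, the random two-thirds/one-third split and the capping of extreme values, can be sketched briefly. The function names and the fixed random seed are illustrative assumptions; only the cap of 140 for pulse rate comes from the text.

```python
import random

def cap_pulse(pulse, upper=140):
    """Limit the influence of outliers: pulse rates above `upper` are set to `upper`."""
    return min(pulse, upper)

def split_episodes(episode_ids, seed=0):
    """Randomly assign episodes to a two-thirds development sample
    and a one-third validation sample."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(episode_ids)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]
```

Note that this sketch splits at the episode level, as the text describes, so episodes from the same resident may fall in both samples.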
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (α=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates among all possible pairs of episodes whether those with higher predicted risk are more likely to have radiographic pneumonia. The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration (agreement between observed and predicted probability of pneumonia over the range of predicted risk), we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
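The c-statistic can be computed directly from its pairwise definition. The sketch below is a plain illustration with a hypothetical function name, not the authors' SAS code; it counts, over all pairs made up of one episode with the outcome (here, radiographic pneumonia) and one without, how often the episode with the outcome received the higher predicted risk, with ties counted as one half.

```python
def c_statistic(predicted, observed):
    """Concordance statistic (area under the ROC curve).

    predicted: predicted probabilities, one per episode.
    observed: 1 if the outcome was present, 0 if absent.
    """
    with_outcome = [p for p, y in zip(predicted, observed) if y == 1]
    without_outcome = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = 0.0
    for p in with_outcome:
        for q in without_outcome:
            if p > q:
                concordant += 1.0   # correctly ordered pair
            elif p == q:
                concordant += 0.5   # tied pair counts as half
    return concordant / (len(with_outcome) * len(without_outcome))
```

A value of 1.0 means every such pair is correctly ordered, and 0.5 means the ordering is no better than chance.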
Results
Project nurses performed 2592 evaluations. In 90% (2337), residents received chest x-rays either in the nursing home or on hospital transfer. In 3 additional cases crucial information was missing from nursing home records. This left for final analysis 2334 episodes in 1474 individuals (Figure 1).
Fifty-five percent of radiographs were interpreted as negative, 12% showed possible pneumonia, and 33% showed probable pneumonia. Most nursing home residents with pneumonia had few presenting symptoms; 80% had 3 or fewer respiratory or general symptoms. However, only 7.5% of subjects evaluated had no respiratory symptoms. Table 2 shows the relationship of selected variables to radiographic findings of absent, possible, or probable pneumonia. Though a few signs and symptoms are more common in those with positive (possible or probable pneumonia) than negative chest x-ray results, most did not discriminate at all. Fever (temperature ≥38°C) was present in 44.4% of positives but only 28.5% of negatives (P=.001).
Multivariable Analysis and Prediction Score
Our GEE model to predict radiographic pneumonia includes 3 vital sign abnormalities (fever, rapid pulse, and rapid respiratory rate), 2 lung findings (presence of crackles and absence of wheezes), 2 potential indicators of delirium (somnolence or decreased alertness and acute confusion), and elevated white blood count. Table 3 reports GEE estimates for the entire sample. Though only exhibiting fair overall performance, the model did well at distinguishing subjects with a high probability of pneumonia. In the 20% of subjects with the highest predicted risks, more than two thirds had pneumonia.
For the full range of values, the model derived on the development sample showed a c-statistic of 0.672, which reduced to 0.632 in the validation sample. A value of 1.0 would indicate perfect discrimination between those who did and did not have radiographic pneumonia, while a value of 0.5 would indicate no better than chance discrimination. Model calibration was not acceptable in the validation sample (Hosmer-Lemeshow goodness-of-fit statistic, P=.008). Inspection suggested the disagreement between predicted and observed probability of pneumonia was primarily with lower-risk estimates.
Because the model performed relatively well at distinguishing subjects very likely to have pneumonia, we created a simple point system aimed at identifying such high-risk individuals. Table 4 shows the scoring system. For 33% of subjects (score ≥3), there was a 56% or higher probability of radiographic pneumonia. An additional 24% of subjects (score of 2) had 44% probability of radiographic pneumonia. However, even those with the lowest scores (-1 to 0, 15% of subjects) still had a 24% probability of pneumonia. The relationship between the score and the probability of radiographic evidence of pneumonia is shown in Figure W1.
Discussion
In a large community-based sample, we considered presenting symptoms, signs, and laboratory findings associated with radiographic pneumonia. Individual findings discriminated poorly, and we could not separate out a very-low-risk group. However, our simple scoring system identified approximately one third to slightly more than one half with high probability of pneumonia, individuals who might be treated without a confirmatory chest x-ray. If our data are confirmed, they suggest a simple clinical strategy in patients with respiratory or general symptoms (Table 1) that might suggest pneumonia: (1) if there are no respiratory symptoms, consider other conditions, such as a urinary tract infection, that might fully explain the symptoms; (2) obtain information to apply our symptom score (Table 4); (3) for those with scores of 2 or higher (some might choose 3 instead), treat for pneumonia; (4) for those with scores of -1, 0, or 1, obtain a chest radiograph as a guide to treatment.
Considering individual findings, fever was significantly more common in pneumonia, but only 43% of those with possible or probable pneumonia had a temperature of at least 38°C. This reaffirms common wisdom and previous findings that fever is frequently absent in elderly people with pneumonia.9,19 We also confirmed that few signs or symptoms are the norm for nursing home-acquired pneumonia.
Chest examination findings also do not adequately distinguish patients with and without pneumonia (Table 2). Also, even expert physicians frequently differ on lung examination findings.20 Nonetheless, presence of crackles and absence of wheezing contribute to our scoring system. Both findings are seen with multiple conditions, but in our data crackles are slightly more associated with pneumonia, while wheezing is more strongly associated with other diseases.
The other components of our scoring system are clinical factors commonly associated with pneumonia. Though none individually discriminates well between those with and without pneumonia (Table 2), several combined serve to identify a high-risk group.
Four previous studies from emergency department or outpatient settings developed clinical prediction rules to identify pneumonia.21-24 Criteria for identifying subjects varied substantially, and each rule has limited accuracy in predicting radiographic pneumonia.20 We had adequate data to evaluate 3 of the rules.21-23 As is usually the case when transporting a prediction rule to a new sample, none performed any better than our rule (data not shown). Our sample created the very difficult challenge for any prediction rule of a very high overall prevalence of pneumonia (45%). That made it unlikely that we could identify a low-risk group in whom x-ray studies could be readily forgone, but we were able to identify a high-risk group.
Limitations
Our findings are subject to several limitations. All facilities in our study were located in central or eastern Missouri, and not all physicians or eligible residents in those facilities participated. Compared with national data, we studied an unusually representative sample of nursing home residents from 36 facilities, including rural and urban locations. Also, in episodes excluded because of physician nonparticipation, residents were very similar to included residents in age, vital signs, and presenting symptoms (data available on request). More important, we lack an independent validation sample from a different cohort. Clinical prediction rules usually do not perform as well in independent samples. This is exemplified by the poor performance of the 3 rules we considered from other settings. Overall, our logistic model was only modest in discriminating and was not well calibrated for low-risk episodes in our reserved validation sample. Although we have developed a promising scoring system to identify residents with high probability of radiographic pneumonia, it needs to be validated in other samples of nursing home residents to determine its ultimate usefulness. For all these reasons, our results may not generalize.
Also, although we identified residents prospectively, project nurses were unable to evaluate 9.2% of residents before transfer to a hospital. Clinical findings abstracted from medical records, such as lung findings, may not have been complete. It is also possible that project nurses could have missed some important findings. However, our staff provided a higher level of expertise than is typically available in nursing homes. In fact, this may limit application of our findings. Nursing home staff vary widely in their ability to accurately examine residents or even identify illness. In many instances, facility staff had not obtained vital signs at the point when we identified a resident as ill enough to qualify for an evaluation.25 Therefore, in many nursing homes, physicians may lack confidence to apply our rule without an evaluation by a physician, advanced practice nurse, or physician assistant.
Finally, determining whether subjects had pneumonia primarily depended on our classification of radiographic reports. Though radiographs generally included 2 views, many were portable films of variable quality, and frequently there was no previous radiograph for comparison. In some subjects with pneumonia, radiographic infiltrates might not yet have developed. Also, even under ideal conditions, radiologists commonly disagree on the presence of pneumonia.26 Some subjects may have been misclassified. However, unless radiographic technique or interpretation was specifically related to clinical predictors, misclassification would simply diminish the relationship of predictors to pneumonia rather than creating a bias. We reviewed reports rather than radiographs, because that is the information usually available to clinicians faced with diagnosis and treatment decisions. We also paid special attention to avoiding any bias in the interpretations. All data were recorded before interpreting radiology reports and the interpretations were performed independent of clinical data. We also made special efforts to assure consistency in labeling radiology reports as possible, probable, or negative for pneumonia. When lack of agreement persisted, the study radiologist reinterpreted the actual films.
Conclusions
Most nursing home residents with pneumonia have few symptoms. We created a simple scoring system to identify nursing home residents who have a high probability of radiographic pneumonia. If our results are confirmed, physicians might consider initiating treatment without an x-ray in such residents. Low scores do not rule out pneumonia, and most physicians would want to press for further diagnosis or treatment in this group.
Acknowledgments
This study was supported by the Agency for Healthcare Research and Quality (grant HS08551) and Dr Mehr’s Robert Wood Johnson Foundation Generalist Physician Faculty Scholars award. Dr Kruse was partially supported by an Institutional National Research Service Award (PE10038) from the Health Resources and Services Administration. Our project would not have been possible without the support of the many attending physicians, administrators, and staff of the involved nursing homes. Dr Clive Levine re-read more than 200 radiographs; Karen Davenport provided crucial administrative support; and Karen Madrone, MPA, assisted with manuscript preparation. Many other unnamed project staff also contributed.
1. Irvine PW, Van Buren N, Crossley K. Causes for hospitalization of nursing home residents: the role of infection. J Am Geriatr Soc 1984;32:103-07.
2. Murtaugh CM, Freiman MP. Nursing home residents at risk of hospitalization and the characteristics of their hospital stays. Gerontologist 1995;35:35-43.
3. Jackson MM, Fierer J, Barrett-Connor E, et al. Intensive surveillance for infections in a three-year study of nursing home patients. Am J Epidemiol 1992;135:685-96.
4. Brooks S, Warshaw G, Hasse L, Kues JR. The physician decision-making process in transferring nursing home patients to the hospital. Arch Intern Med 1994;154:902-08.
5. Fried TR, Gillick MR, Lipsitz LA. Whether to transfer? Factors associated with hospitalization and outcome of elderly long-term care patients with pneumonia. J Gen Intern Med 1995;10:246-50.
6. Degelau J, Guay D, Straub K, Luxenberg MG. Effectiveness of oral antibiotic treatment in nursing home-acquired pneumonia. J Am Geriatr Soc 1995;43:245-51.
7. Muder RR, Brennen C, Swenson DL, Wagener M. Pneumonia in a long-term care facility: a prospective study of outcome. Arch Intern Med 1996;156:2365-70.
8. Medina-Walpole AM, Katz PR. Nursing home-acquired pneumonia. J Am Geriatr Soc 1999;47:1005-15.
9. Harper C, Newton P. Clinical aspects of pneumonia in the elderly veteran. J Am Geriatr Soc 1989;37:867-72.
10. Metlay JP, Schulz R, Li YH, Singer DE, Marrie TJ, Coley CM, et al. Influence of age on symptoms at presentation in patients with community-acquired pneumonia. Arch Intern Med 1997;157:1453-59.
11. Kayser-Jones JS, Wiener CL, Barbaccia JC. Factors contributing to the hospitalization of nursing home residents. Gerontologist 1989;29:502-10.
12. Scott HD, Logan M, Waters WJ, Jr, et al. Medical practice variation in the management of acute medical events in nursing homes: a pilot study. R I Med J 1988;71:69-74.
13. Gabrel CS, Jones A. The National Nursing Home Survey: 1997 summary. Vital Health Stat-series 13: data from the National Health Survey 2000;147:1-121.
14. Harrell FE, Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
15. Preisser JS, Koch GG. Categorical data analysis in public health. Annu Rev Public Health 1997;18:51-82.
16. SAS Institute Inc. The SAS System for Windows. Version 6.1. Cary, NC: SAS Institute, Inc; 1996.
17. D’Agostino RB, Sr, Griffith JL, Schmid CH, Terrin N. Measures for evaluating model performance. In: Proceedings of the biometrics section, 1997. Alexandria, Va: American Statistical Association. Biometrics section; 1998;253-58.
18. Hosmer DW Jr, Lemeshow S. Applied logistic regression. New York, NY: Wiley; 1989.
19. Marrie TJ, Haldane EV, Faulkner RS, Durant H, Kwan C. Community-acquired pneumonia requiring hospitalization: is it different in the elderly? J Am Geriatr Soc 1985;33:671-80.
20. Metlay JP, Kapoor WN, Fine MJ. Does this patient have community-acquired pneumonia? Diagnosing pneumonia by history and physical examination. JAMA 1997;278:1440-45.
21. Heckerling PS, Tape TG, Wigton RS, et al. Clinical prediction rule for pulmonary infiltrates. Ann Intern Med 1990;113:664-70.
22. Singal BM, Hedges JR, Radack KL. Decision rules and clinical prediction of pneumonia: evaluation of low-yield criteria. Ann Emerg Med 1989;18:13-20.
23. Gennis P, Gallagher J, Falvo C, Baker S, Than W. Clinical criteria for the detection of pneumonia in adults: guidelines for ordering chest roentgenograms in the emergency department. J Emerg Med 1989;7:263-68.
24. Diehr P, Wood RW, Bushyhead J, Krueger L, Wolcott B, Tompkins RK. Prediction of pneumonia in outpatients with acute cough—a statistical approach. J Chronic Dis 1984;37:215.-
25. Barry CR, Brown K, Esker D, Denning MD, Kruse RL, Binder EF. Nursing assessment of ill nursing home residents. In press.
26. Albaum MN, Hill LC, Murphy M, et al. Interobserver reliability of the chest radiograph in community-acquired pneumonia: PORT Investigators. Chest 1996;110:343-50.
STUDY DESIGN: This was a prospective cohort study.
POPULATION: The residents of 36 nursing homes in central Missouri and the St. Louis area with signs or symptoms suggesting a lower respiratory infection were included.
OUTCOME MEASURED: We compared evaluation findings by project nurses with findings reported from chest radiographs.
RESULTS: Among 2334 episodes of illness in 1474 nursing home residents, 45% of the radiograph reports suggested pneumonia (possible = 12%; probable or definite = 33%). In 80% of pneumonia episodes, subjects had 3 or fewer respiratory or general symptoms. Eight variables were significant independent predictors of pneumonia (increased pulse, respiratory rate ≥30, temperature ≥38°C, somnolence or decreased alertness, presence of acute confusion, lung crackles on auscultation, absence of wheezes, and increased white blood count). A simple score (range = -1 to 8) on the basis of these variables identified 33% of subjects (score ≥3) with more than 50% probability of pneumonia and an additional 24% (score of 2) with 44% probability of pneumonia.
CONCLUSIONS: Pneumonia in nursing home residents is usually associated with few symptoms. Nonetheless, a simple clinical prediction rule can identify residents at very high risk for pneumonia. If validated in other studies, physicians could consider treating such residents without obtaining a chest radiograph.
Pneumonia is a leading cause of morbidity, mortality, and hospitalization of nursing home residents.1-8 Atypical presentations and fewer presenting signs and symptoms in older patients complicate diagnosis.9,10 Also, clinician (physician, nurse practitioner, and physician assistant) visits to nursing homes are often sporadic, and radiology facilities are rarely on the premises. As a consequence, residents are commonly sent to emergency departments for evaluation,4,11,12 which undoubtedly contributes to a high hospitalization rate.
Clinicians who periodically see nursing home residents could benefit from a simple clinical tool to identify pneumonia. No large studies of community nursing home residents have systematically studied findings associated with pneumonia. As part of the Missouri LRI Project, we examined how well clinical findings predict radiographic pneumonia.
Methods
The Missouri LRI Project was a prospective observational study in 36 nursing homes in central Missouri and St. Louis designed to investigate predictors of 2 outcomes of lower respiratory infections (LRIs): mortality and functional decline. Potential cases were identified from August 15, 1995, through September 29, 1998; however, not all facilities were involved until fall 1997. Study facilities were similar in size, ownership, and occupancy to national estimates from the 1995 National Nursing Home Survey (data available on request).13
We trained nursing home staff to report ill residents with any of 6 respiratory symptoms (eg, cough, dyspnea, sputum production) or 6 general symptoms (eg, fever, decline in mobility, mental status changes). Project nurses called and visited facilities frequently to reinforce reporting. Under a physician-authorized protocol, ill residents with a possible LRI received a standardized evaluation by a trained project nurse and usually a chest radiograph, complete blood count, and a chemistry panel. Complete criteria for triggering an evaluation are listed in Table 1. For this paper, we were concerned with the 90% of evaluated residents who received a chest radiograph. Criteria for excluding residents from evaluation are summarized in Figure 1.
The nurse evaluation included an inventory of current symptoms, a review of important chronic conditions (eg, congestive heart failure), and a targeted physical examination. The examination included vital signs and the following body areas or systems: ears, nose, and throat; cardiac; abdominal; neurologic; extremities; skin; and a detailed lung examination. Most project nurses had advanced practice training; the remainder had extensive clinical experience and training in physical assessment. All received an individualized training session with a project geriatrician. Project nurses had substantially more experience than the nursing home staff, who usually report clinical findings to physicians.
Results of the evaluation were reported to the attending physician, who made all treatment decisions. Since the evaluations were clinically appropriate care authorized by individual attending physicians, the institutional review boards that reviewed the project allowed us to substantially simplify the consent process to a simple acceptance or refusal of the evaluation. In 9.2% of evaluations the resident was transferred to the hospital before project nurses could complete a physical assessment. In these instances, we obtained vital sign and clinical examination data from hospital records.
Radiographic Classification
Since all subjects had at least one illness symptom, for this analysis we classified the presence or absence of pneumonia on the basis of reported radiographic findings. Using defined criteria, 2 clinicians independently separated radiology reports into 3 categories: (a) negative, (b) possible, or (c) probable or definite for pneumonia (hereafter, probable pneumonia). For example, a report describing “new left lower lobe infiltrate suggestive of pneumonia” would have been rated as probable, while a report indicating “possible infiltrate” or “infiltrate suggestive of pneumonia or congestive heart failure” would have been rated as possible. As radiologists rarely provide completely unequivocal readings, we did not separate probable and definite pneumonia. In St. Louis 2 clinicians evaluated the reports, and in central Missouri 2 of 4 clinicians considered each report. Where there was disagreement, all 6 raters from the 2 sites independently reviewed the reports and then attempted to reach consensus. For 13% of radiographs, the project radiologist independently interpreted the actual films. This occurred when: (1) consensus could not be achieved; or (2) consensus was possible pneumonia, but probable pneumonia was needed to qualify the episode as an LRI under the project definition.
Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
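The imputation step described above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in SAS); the function names are hypothetical, missing values are assumed to be coded as `None`, and a tie between categories is broken by whichever value appears first, which is an arbitrary choice.

```python
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) in a continuous variable with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_largest_category(values):
    """Replace missing entries (None) in a dichotomous variable with the most common category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical data: pulse rates and a 0/1 crackles indicator with gaps.
pulse = impute_mean([80, None, 100])
crackles = impute_largest_category([1, 0, None, 1])
```

Single-value imputation of this kind keeps every episode in the model but does not reflect the uncertainty in the filled-in values.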
Illness episodes were then randomly assigned to a two thirds model-development and a one third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rate above 140 was set equal to 140.
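Two of the data-preparation steps in this paragraph, the random two thirds and one third split and the capping of extreme values, can be sketched in Python. The helper names, the fixed seed, and the rounding of the cut point are assumptions made for illustration; the paper does not report how the random assignment was implemented.

```python
import random


def split_episodes(episode_ids, development_fraction=2 / 3, seed=0):
    """Randomly assign episodes to development and validation samples."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(episode_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


def cap(value, upper):
    """Limit a value to an upper bound, eg, pulse rate above 140 set equal to 140."""
    return min(value, upper)


development, validation = split_episodes(range(12))
capped_pulse = [cap(p, 140) for p in [96, 152, 140]]
```

As in the paper, the split here is by episode rather than by resident, so the same resident can contribute episodes to both samples.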
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (α=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates among all possible pairs of individuals whether those with higher predicted risk are more likely to have radiographic pneumonia. The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration (agreement between observed and predicted probability of pneumonia over the range of predicted risk), we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
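The pairwise definition of the c-statistic can be made concrete with a short Python sketch. It assumes a generic binary outcome flag and counts tied predictions as one half, a common convention that the text does not specify; the function name is hypothetical and the data in the example are invented.

```python
def c_statistic(predicted_risk, has_outcome):
    """Proportion of (outcome, no-outcome) pairs in which the episode with
    the outcome received the higher predicted risk; ties count as 0.5."""
    with_outcome = [p for p, y in zip(predicted_risk, has_outcome) if y]
    without_outcome = [p for p, y in zip(predicted_risk, has_outcome) if not y]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one episode with and one without the outcome")

    concordant = 0.0
    for p_yes in with_outcome:
        for p_no in without_outcome:
            if p_yes > p_no:
                concordant += 1.0
            elif p_yes == p_no:
                concordant += 0.5
    return concordant / (len(with_outcome) * len(without_outcome))


# Invented example: 3 of the 4 possible pairs are ordered correctly.
example = c_statistic([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

A value of 1.0 means every pair is ordered correctly and 0.5 means the ordering is no better than chance, which matches the interpretation of the area under the receiver operating characteristic curve.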
Results
Project nurses performed 2592 evaluations. In 90% (2337), residents received chest x-rays either in the nursing home or on hospital transfer. In 3 additional cases crucial information was missing from nursing home records. This left for final analysis 2334 episodes in 1474 individuals (Figure 1).
Fifty-five percent of radiographs were interpreted as negative, 12% showed possible pneumonia, and 33% showed probable pneumonia. Most nursing home residents with pneumonia had few presenting symptoms; 80% had 3 or fewer respiratory or general symptoms. However, only 7.5% of subjects evaluated had no respiratory symptoms. Table 2 shows the relationship of selected variables to radiographic findings of absent, possible, or probable pneumonia. Though a few signs and symptoms are more common in those with positive (possible or probable pneumonia) than negative chest x-ray results, most did not discriminate at all. Fever (temperature ≥38°C) was present in 44.4% of positives but only 28.5% of negatives (P=.001).
Multivariable Analysis and Prediction Score
Our GEE model to predict radiographic pneumonia includes 3 vital sign abnormalities (fever, rapid pulse, and rapid respiratory rate), 2 lung findings (presence of crackles and absence of wheezes), 2 potential indicators of delirium (somnolence or decreased alertness and acute confusion), and elevated white blood count. Table 3 reports GEE estimates for the entire sample. Though only exhibiting fair overall performance, the model did well at distinguishing subjects with a high probability of pneumonia. In the 20% of subjects with the highest predicted risks, more than two thirds had pneumonia.
For the full range of values, the model derived on the development sample showed a c-statistic of 0.672, which reduced to 0.632 in the validation sample. A value of 1.0 would indicate perfect discrimination between those who did and did not have radiographic pneumonia, while a value of 0.5 would indicate no better than chance discrimination. Model calibration was not acceptable in the validation sample (Hosmer-Lemeshow goodness-of-fit statistic, P=.008). Inspection suggested the disagreement between predicted and observed probability of pneumonia was primarily with lower-risk estimates.
Because the model performed relatively well at distinguishing subjects very likely to have pneumonia, we created a simple point system aimed at identifying such high-risk individuals. Table 4 shows the scoring system. For 33% of subjects (score ≥3), there was a 56% or higher probability of radiographic pneumonia. An additional 24% of subjects (score of 2) had 44% probability of radiographic pneumonia. However, even those with the lowest scores (-1 to 0, 15% of subjects) still had a 24% probability of pneumonia. The relationship between the score and the probability of radiographic evidence of pneumonia is shown in Figure W1.*
STUDY DESIGN: This was a prospective cohort study.
POPULATION: The residents of 36 nursing homes in central Missouri and the St. Louis area with signs or symptoms suggesting a lower respiratory infection were included.
OUTCOME MEASURED: We compared evaluation findings by project nurses with findings reported from chest radiographs.
RESULTS: Among 2334 episodes of illness in 1474 nursing home residents, 45% of the radiograph reports suggested pneumonia (possible=12%; probable or definite = 33%). In 80% of pneumonia episodes, subjects had 3 or fewer respiratory or general symptoms. Eight variables were significant independent predictors of pneumonia (increased pulse, respiratory rate Ž30, temperature Ž38°C, somnolence or decreased alertness, presence of acute confusion, lung crackles on auscultation, absence of wheezes, and increased white blood count). A simple score (range = -1 to 8) on the basis of these variables identified 33% of subjects (score Ž3) with more than 50% probability of pneumonia and an additional 24% (score of 2) with 44% probability of pneumonia.
CONCLUSIONS: Pneumonia in nursing home residents is usually associated with few symptoms. Nonetheless, a simple clinical prediction rule can identify residents at very high risk for pneumonia. If validated in other studies, physicians could consider treating such residents without obtaining a chest radiograph.
Pneumonia is a leading cause of morbidity, mortality, and hospitalization of nursing home residents.1-8 Atypical presentations and fewer presenting signs and symptoms in older patients complicate diagnosis.9,10 Also, clinician (physician, nurse practitioner, and physician assistant) visits to nursing homes are often sporadic, and radiology facilities are rarely on the premises. As a consequence, residents are commonly sent to emergency departments for evaluation,4,11,12 which undoubtedly contributes to a high hospitalization rate.
Clinicians who periodically see nursing home residents could benefit from a simple clinical tool to identify pneumonia. No large studies of community nursing home residents have systematically studied findings associated with pneumonia. As part of the Missouri LRI Project, we examined how well clinical findings predict radiographic pneumonia.
Methods
The Missouri LRI Project was a prospective observational study in 36 nursing homes in Central Missouri and St. Louis designed to investigate predictors of 2 outcomes of lower respiratory infections (LRIs): mortality and functional decline. Potential cases were identified from August 15, 1995, through September 29, 1998; however, all facilities were not involved until fall 1997. Study facilities were similar in size, ownership, and occupancy to national estimates from the 1995 National Nursing Home Survey (data available on request).13
We trained nursing home staff to report ill residents with any of 6 respiratory symptoms (eg, cough, dyspnea, sputum production) or 6 general symptoms (eg, fever, decline in mobility, mental status changes). Project nurses called and visited facilities frequently to reinforce reporting. Under a physician-authorized protocol, ill residents with a possible LRI received a standardized evaluation by a trained project nurse and usually a chest radiograph, complete blood count, and a chemistry panel. Complete criteria for triggering an evaluation are listed in Table 1. For this paper, we were concerned with the 90% of evaluated residents who received a chest radiograph. Criteria for excluding residents from evaluation are summarized in the Figure 1.
The nurse evaluation included an inventory of current symptoms, a review of important chronic conditions (eg, congestive heart failure), and a targeted physical examination. The examination included vital signs and the following body areas or systems: ears, nose, and throat; cardiac; abdominal; neurologic; extremities; skin; and a detailed lung examination. Most project nurses had advanced practice training; the remainder had extensive clinical experience and training in physical assessment. All received an individualized training session with a project geriatrician. Project nurses had substantially more experience than the nursing home staff, who usually report clinical findings to physicians.
Results of the evaluation were reported to the attending physician, who made all treatment decisions. Since the evaluations were clinically appropriate care authorized by individual attending physicians, the institutional review boards that reviewed the project allowed us to substantially simplify the consent process to a simple acceptance or refusal of the evaluation. In 9.2% of evaluations the resident was transferred to the hospital before project nurses could complete a physical assessment. In these instances, we obtained vital sign and clinical examination data from hospital records.
Radiographic Classification
Since all subjects had at least one illness symptom, for this analysis we classified the presence or absence of pneumonia on the basis of reported radiographic findings. Using defined criteria, 2 clinicians independently separated radiology reports into 3 categories: (a) negative, (b) possible, or (c) probable or definite for pneumonia (hereafter, probable pneumonia). For example, a report describing “new left lower lobe infiltrate suggestive of pneumonia” would have been rated as probable, while a report indicating “possible infiltrate” or “infiltrate suggestive of pneumonia or congestive heart failure” would have been rated as possible. As radiologists rarely provide completely unequivocal readings, we did not separate probable and definite pneumonia. In St. Louis 2 clinicians evaluated the reports, and in central Missouri 2 of 4 clinicians considered each report. Where there was disagreement, all 6 raters from the 2 sites independently reviewed the reports and then attempted to reach consensus. For 13% of radiographs, the project radiologist independently interpreted the actual films. This occurred when: (1) consensus could not be achieved; or (2) consensus was possible pneumonia, but probable pneumonia was needed to qualify the episode as an LRI under the project definition.
Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
Illness episodes were then randomly assigned to a two thirds model-development and a one third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rate above 140 was set equal to 140.
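The data preparation steps described above (imputing missing values, capping extreme values, and randomly splitting episodes) can be illustrated with a minimal Python sketch. This is an illustration only, not the authors' SAS code: the function names, the example pulse values, and the random seed are assumptions introduced here. Only the rules themselves (mean imputation for continuous data, largest category for dichotomous data, a cap of 140 for pulse, and a two thirds to one third split) come from the text.

```python
import random
from statistics import mean

def impute_mean(values):
    """Replace missing (None) continuous values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_largest_category(flags):
    """Replace missing (None) dichotomous values with the most frequent observed category."""
    observed = [f for f in flags if f is not None]
    fill = max(set(observed), key=observed.count)
    return [fill if f is None else f for f in flags]

def cap(values, upper):
    """Set any value above `upper` equal to `upper` to limit the influence of outliers."""
    return [min(v, upper) for v in values]

def split_two_thirds(n_episodes, seed=0):
    """Randomly assign episode indices to development (2/3) and validation (1/3) samples."""
    rng = random.Random(seed)  # hypothetical seed, used only to make the sketch repeatable
    indices = list(range(n_episodes))
    rng.shuffle(indices)
    cut = (2 * n_episodes) // 3
    return indices[:cut], indices[cut:]

# Hypothetical pulse rates for four episodes, one of them missing.
pulse = cap(impute_mean([88, None, 152, 60]), 140)
crackles = impute_largest_category([1, None, 0, 1])
development, validation = split_two_thirds(9)
```

In this sketch, imputation happens before the split, matching the order given in the text.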
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (α=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates, among all possible pairs of episodes in which one had radiographic pneumonia and the other did not, whether the episode with pneumonia had the higher predicted risk. The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration (agreement between observed and predicted probability of pneumonia over the range of predicted risk), we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
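As a rough illustration of the c-statistic, the sketch below computes it by direct pairwise comparison for a binary outcome such as radiographic pneumonia. The function name and the example predictions are hypothetical, and counting tied predictions as one half is a common convention assumed here rather than something stated in the text.

```python
def c_statistic(predicted, outcome):
    """Proportion of (positive, negative) pairs in which the positive episode
    has the higher predicted risk; tied predictions count as one half."""
    positives = [p for p, y in zip(predicted, outcome) if y == 1]
    negatives = [p for p, y in zip(predicted, outcome) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative episode")
    concordant = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                concordant += 1.0
            elif pos == neg:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Hypothetical predicted risks and observed outcomes (1 = pneumonia on x-ray).
example = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

A value of 1.0 corresponds to perfect separation of positive and negative episodes, and 0.5 to predictions that carry no information.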
Results
Project nurses performed 2592 evaluations. In 90% (2337), residents received chest x-rays either in the nursing home or on hospital transfer. In 3 additional cases crucial information was missing from nursing home records. This left for final analysis 2334 episodes in 1474 individuals (Figure 1).
Fifty-five percent of radiographs were interpreted as negative, 12% showed possible pneumonia, and 33% showed probable pneumonia. Most nursing home residents with pneumonia had few presenting symptoms; 80% had 3 or fewer respiratory or general symptoms. However, only 7.5% of subjects evaluated had no respiratory symptoms. Table 2 shows the relationship of selected variables to radiographic findings of absent, possible, or probable pneumonia. Though a few signs and symptoms are more common in those with positive (possible or probable pneumonia) than negative chest x-ray results, most did not discriminate at all. Fever (temperature ≥38°C) was present in 44.4% of positives but only 28.5% of negatives (P=.001).
Multivariable Analysis and Prediction Score
Our GEE model to predict radiographic pneumonia includes 3 vital sign abnormalities (fever, rapid pulse, and rapid respiratory rate), 2 lung findings (presence of crackles and absence of wheezes), 2 potential indicators of delirium (somnolence or decreased alertness and acute confusion), and elevated white blood count. Table 3 reports GEE estimates for the entire sample. Though only exhibiting fair overall performance, the model did well at distinguishing subjects with a high probability of pneumonia. In the 20% of subjects with the highest predicted risks, more than two thirds had pneumonia.
For the full range of values, the model derived on the development sample showed a c-statistic of 0.672, which reduced to 0.632 in the validation sample. A value of 1.0 would indicate perfect discrimination between those who did and did not have radiographic pneumonia, while a value of 0.5 would indicate no better than chance discrimination. Model calibration was not acceptable in the validation sample (Hosmer-Lemeshow goodness-of-fit statistic, P=.008). Inspection suggested the disagreement between predicted and observed probability of pneumonia was primarily with lower-risk estimates.
Because the model performed relatively well at distinguishing subjects very likely to have pneumonia, we created a simple point system aimed at identifying such high-risk individuals. Table 4 shows the scoring system. For 33% of subjects (score ≥3), there was a 56% or higher probability of radiographic pneumonia. An additional 24% of subjects (score of 2) had 44% probability of radiographic pneumonia. However, even those with the lowest scores (-1 to 0, 15% of subjects) still had a 24% probability of pneumonia. The relationship between the score and the probability of radiographic evidence of pneumonia is shown in Figure W1.*
Discussion
In a large community-based sample, we considered presenting symptoms, signs, and laboratory findings associated with radiographic pneumonia. Individual findings discriminated poorly, and we could not separate out a very-low-risk group. However, our simple scoring system identified approximately one third to slightly more than one half with high probability of pneumonia; these are individuals who might be treated without a confirmatory chest x-ray. If our data are confirmed, they suggest a simple clinical strategy in patients with respiratory or general symptoms (Table 1) that might suggest pneumonia: (1) if there are no respiratory symptoms, consider other conditions, such as a urinary tract infection, that might fully explain the symptoms; (2) obtain information to apply our symptom score (Table 4); (3) for those with scores of 2 or higher (some might choose 3 instead), treat for pneumonia; (4) for those with scores of -1, 0, or 1, obtain a chest radiograph as a guide to treatment.
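The four-step strategy above can be written as a small decision function. The sketch below is only a restatement of the text in Python, not a validated clinical tool: the function name, argument names, and returned wording are assumptions, and the score itself must come from Table 4, which is not reproduced here.

```python
def suggest_actions(score, has_respiratory_symptoms, treat_threshold=2):
    """Return the steps suggested in the text for a given symptom score.

    `score` is the Table 4 point total (computed elsewhere).
    `treat_threshold` defaults to 2; the text notes that some might choose 3.
    """
    actions = []
    if not has_respiratory_symptoms:
        # Step 1: look for another condition that might fully explain the symptoms.
        actions.append("consider other conditions")
    if score >= treat_threshold:
        # Step 3: high probability of radiographic pneumonia.
        actions.append("treat for pneumonia")
    else:
        # Step 4: lower scores do not rule out pneumonia.
        actions.append("obtain chest radiograph")
    return actions

# Hypothetical resident with respiratory symptoms and a score of 1.
example = suggest_actions(1, True)
```

Making the threshold a parameter reflects the text's remark that a cutoff of 3 is a reasonable alternative to 2.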
Considering individual findings, fever was significantly more common in pneumonia, but only 43% of those with possible or probable pneumonia had a temperature of at least 38°C. This reaffirms common wisdom and previous findings that fever is frequently absent in elderly people with pneumonia.9,19 We also confirmed that few signs or symptoms are the norm for nursing home-acquired pneumonia.
Chest examination findings also do not adequately distinguish patients with and without pneumonia (Table 2). Also, even expert physicians frequently differ on lung examination findings.20 Nonetheless, presence of crackles and absence of wheezing contribute to our scoring system. Both findings are seen with multiple conditions, but in our data crackles are slightly more associated with pneumonia, while wheezing is more strongly associated with other diseases.
The other components of our scoring system are clinical factors commonly associated with pneumonia. Though none individually discriminates well between those with and without pneumonia (Table 2), several combined serve to identify a high-risk group.
Four previous studies from emergency department or outpatient settings developed clinical prediction rules to identify pneumonia.21-24 Criteria for identifying subjects varied substantially, and each rule has limited accuracy in predicting radiographic pneumonia.20 We had adequate data to evaluate 3 of the rules.21-23 As is usually the case when transporting a prediction rule to a new sample, none performed any better than our rule (data not shown). Our sample created the very difficult challenge for any prediction rule of a very high overall prevalence of pneumonia (45%). That made it unlikely that we could identify a low-risk group in whom x-ray studies could be readily forgone, but we were able to identify a high-risk group.
Limitations
Our findings are subject to several limitations. All facilities in our study were located in central or eastern Missouri, and not all physicians or eligible residents in those facilities participated. Compared with national data, we studied an unusually representative sample of nursing home residents from 36 facilities, including rural and urban locations. Also, in episodes excluded because of physician nonparticipation, residents were very similar to included residents in age, vital signs, and presenting symptoms (data available on request). More important, we lack an independent validation sample from a different cohort. Clinical prediction rules usually do not perform as well in independent samples. This is exemplified by the poor performance of the 3 rules we considered from other settings. Overall, our logistic model was only modest in discriminating and was not well calibrated for low-risk episodes in our reserved validation sample. Although we have developed a promising scoring system to identify residents with high probability of radiographic pneumonia, it needs to be validated in other samples of nursing home residents to determine its ultimate usefulness. For all these reasons, our results may not generalize.
Also, although we identified residents prospectively, project nurses were unable to evaluate 9.2% of residents before transfer to a hospital. Clinical findings abstracted from medical records, such as lung findings, may not have been complete. It is also possible that project nurses could have missed some important findings. However, our staff provided a higher level of expertise than is typically available in nursing homes. In fact, this may limit application of our findings. Nursing home staff vary widely in their ability to accurately examine residents or even identify illness. In many instances, facility staff had not obtained vital signs at the point when we identified a resident as ill enough to qualify for an evaluation.25 Therefore, in many nursing homes, physicians may lack confidence to apply our rule without an evaluation by a physician, advanced practice nurse, or physician assistant.
Finally, determining whether subjects had pneumonia primarily depended on our classification of radiographic reports. Though radiographs generally included 2 views, many were portable films of variable quality, and frequently there was no previous radiograph for comparison. In some subjects with pneumonia, radiographic infiltrates might not yet have developed. Also, even under ideal conditions, radiologists commonly disagree on the presence of pneumonia.26 Some subjects may have been misclassified. However, unless radiographic technique or interpretation was specifically related to clinical predictors, misclassification would simply diminish the relationship of predictors to pneumonia rather than creating a bias. We reviewed reports rather than radiographs, because that is the information usually available to clinicians faced with diagnosis and treatment decisions. We also paid special attention to avoiding any bias in the interpretations. All data were recorded before interpreting radiology reports and the interpretations were performed independent of clinical data. We also made special efforts to assure consistency in labeling radiology reports as possible, probable, or negative for pneumonia. When lack of agreement persisted, the study radiologist reinterpreted the actual films.
Conclusions
Most nursing home residents with pneumonia have few symptoms. We created a simple scoring system to identify nursing home residents who have a high probability of radiographic pneumonia. If our results are confirmed, physicians might consider initiating treatment without an x-ray in such residents. Low scores do not rule out pneumonia, and most physicians would want to press for further diagnosis or treatment in this group.
Acknowledgments
This study was supported by the Agency for Healthcare Research and Quality (grant HS08551) and Dr Mehr’s Robert Wood Johnson Foundation Generalist Physician Faculty Scholars award. Dr Kruse was partially supported by an Institutional National Research Service Award (PE10038) from the Health Resources and Services Administration. Our project would not have been possible without the support of the many attending physicians, administrators, and staff of the involved nursing homes. Dr Clive Levine re-read more than 200 radiographs; Karen Davenport provided crucial administrative support; and Karen Madrone, MPA, assisted with manuscript preparation. Many other unnamed project staff also contributed.
1. Irvine PW, Van Buren N, Crossley K. Causes for hospitalization of nursing home residents: the role of infection. J Am Geriatr Soc 1984;32:103-07.
2. Murtaugh CM, Freiman MP. Nursing home residents at risk of hospitalization and the characteristics of their hospital stays. Gerontologist 1995;35:35-43.
3. Jackson MM, Fierer J, Barrett-Connor E, et al. Intensive surveillance for infections in a three-year study of nursing home patients. Am J Epidemiol 1992;135:685-96.
4. Brooks S, Warshaw G, Hasse L, Kues JR. The physician decision-making process in transferring nursing home patients to the hospital. Arch Intern Med 1994;154:902-08.
5. Fried TR, Gillick MR, Lipsitz LA. Whether to transfer? Factors associated with hospitalization and outcome of elderly long-term care patients with pneumonia. J Gen Intern Med 1995;10:246-50.
6. Degelau J, Guay D, Straub K, Luxenberg MG. Effectiveness of oral antibiotic treatment in nursing home-acquired pneumonia. J Am Geriatr Soc 1995;43:245-51.
7. Muder RR, Brennen C, Swenson DL, Wagener M. Pneumonia in a long-term care facility: a prospective study of outcome. Arch Intern Med 1996;156:2365-70.
8. Medina-Walpole AM, Katz PR. Nursing home-acquired pneumonia. J Am Geriatr Soc 1999;47:1005-15.
9. Harper C, Newton P. Clinical aspects of pneumonia in the elderly veteran. J Am Geriatr Soc 1989;37:867-72.
10. Metlay JP, Schulz R, Li YH, Singer DE, Marrie TJ, Coley CM, et al. Influence of age on symptoms at presentation in patients with community-acquired pneumonia. Arch Intern Med 1997;157:1453-59.
11. Kayser-Jones JS, Wiener CL, Barbaccia JC. Factors contributing to the hospitalization of nursing home residents. Gerontologist 1989;29:502-10.
12. Scott HD, Logan M, Waters WJ, Jr, et al. Medical practice variation in the management of acute medical events in nursing homes: a pilot study. R I Med J 1988;71:69-74.
13. Gabrel CS, Jones A. The National Nursing Home Survey: 1997 summary. Vital Health Stat-series 13: data from the National Health Survey 2000;147:1-121.
14. Harrell FE, Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
15. Preisser JS, Koch GG. Categorical data analysis in public health. Annu Rev Public Health 1997;18:51-82.
16. SAS Institute Inc. The SAS System for Windows. Version 6.1. Cary, NC: SAS Institute, Inc; 1996.
17. D’Agostino RB, Sr, Griffith JL, Schmid CH, Terrin N. Measures for evaluating model performance. In: Proceedings of the biometrics section, 1997. Alexandria, Va: American Statistical Association. Biometrics section; 1998;253-58.
18. Hosmer DW Jr, Lemeshow S. Applied logistic regression. New York, NY: Wiley; 1989.
19. Marrie TJ, Haldane EV, Faulkner RS, Durant H, Kwan C. Community-acquired pneumonia requiring hospitalization: is it different in the elderly? J Am Geriatr Soc 1985;33:671-80.
20. Metlay JP, Kapoor WN, Fine MJ. Does this patient have community-acquired pneumonia? Diagnosing pneumonia by history and physical examination. JAMA 1997;278:1440-45.
21. Heckerling PS, Tape TG, Wigton RS, et al. Clinical prediction rule for pulmonary infiltrates. Ann Intern Med 1990;113:664-70.
22. Singal BM, Hedges JR, Radack KL. Decision rules and clinical prediction of pneumonia: evaluation of low-yield criteria. Ann Emerg Med 1989;18:13-20.
23. Gennis P, Gallagher J, Falvo C, Baker S, Than W. Clinical criteria for the detection of pneumonia in adults: guidelines for ordering chest roentgenograms in the emergency department. J Emerg Med 1989;7:263-68.
24. Diehr P, Wood RW, Bushyhead J, Krueger L, Wolcott B, Tompkins RK. Prediction of pneumonia in outpatients with acute cough—a statistical approach. J Chronic Dis 1984;37:215.-
25. Barry CR, Brown K, Esker D, Denning MD, Kruse RL, Binder EF. Nursing assessment of ill nursing home residents. In press.
26. Albaum MN, Hill LC, Murphy M, et al. Interobserver reliability of the chest radiograph in community-acquired pneumonia: PORT Investigators. Chest 1996;110:343-50.
Rate of Case Reporting, Physician Compliance, and Practice Volume in a Practice-Based Research Network Study
STUDY DESIGN: This was a prospective observational cohort study of participants in a practice-based research network who submitted data on 231 patients with dyspepsia from a total of 45,337 patient encounters over a 53-week period. Reporting of individual cases involved use of a relatively high-burden data instrument. Outcome measures were compared using rank correlation.
POPULATION: We included 18 physicians in a Wisconsin research network study on initial management of dyspepsia in primary care settings.
OUTCOMES MEASURED: The outcomes were the rate of dyspepsia visits, average weekly patient volume, and self-reported compliance with the study protocol for each physician.
RESULTS: A significant negative correlation existed between physician patient volume and the reported rate of dyspepsia visits. Self-reported compliance with the protocol was negatively correlated with patient volume and positively correlated with the reported rate of dyspepsia visits.
CONCLUSIONS: Practice volume may influence the results in practice-based research. Investigators using practice-based research networks need to consider the complexity of their protocols and should be cognizant of compliance-sensitive measures.
Common medical problems, especially those that are self-limited or in their early phases, can be best studied in community practice settings where they are usually diagnosed and managed. Practice-based research provides one method to conduct studies of these problems. Often practice-based research physicians are linked together in practice-based research networks (PBRNs), thus forming, in effect, laboratories of community practices.1-3
The methodologic limitations of these laboratories are of concern and have not been extensively explored. Although it has been adequately demonstrated that the patient populations and the problems addressed in participating practices are comparable to patients and problems in the general population,4-6 the question of the selection bias of the clinicians has been raised.4
As research involvement can be a costly endeavor for the individual physician,7 participation in a research protocol—to some extent—may be related to the intensity of practice (ie, the volume of patients seen and services provided). It has been shown that high-volume practices differ from low-volume practices8 in that high-volume practices provide lower rates of preventive services and generate lower patient satisfaction. One may anticipate that physicians with more discretionary time (ie, fewer patients) may be better able to fully participate in research activities. There have been no direct studies of the impact of practice volume on the reporting of medical problems and compliance in research studies. This study, conducted as part of a larger Wisconsin Research Network (WReN) study of dyspepsia in primary care settings, is a first step in that direction.
Methods
Eighteen family physicians, making up the Practice-Based Research Group of WReN practices, volunteered to participate in a study of the initial management of dyspepsia in primary care.9 As part of the study protocol, participants were requested to record the number of adult patients presenting with dyspepsia and the total number of patients seen in their clinic for each week of the 12-month study. Dyspepsia was defined as pain in the upper abdomen lasting for at least 2 weeks and not attributable to cardiac or pulmonary disease or trauma. Data were collected for both initial and follow-up visits. Participants were instructed to complete a 1-page data instrument for each dyspeptic patient at the time of the visit. Each instrument contained 68 data elements and took up to 5 minutes to complete. Data forms were mailed to the study coordinator on a monthly basis. Data collection began on January 30, 1995, and continued through February 2, 1996.
An average weekly patient volume was calculated for each physician, as was the reported rate of dyspepsia visits in their practice. The patient volume was estimated for each physician by summing the weekly patient totals and dividing by the number of weeks during which the physician saw patients in the clinic and participated in the study. The reported rate of dyspepsia visits for each physician was estimated as the total number of patient visits reported meeting the study criteria for dyspepsia divided by the total number of patients seen during the study period.
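The two per-physician measures defined above are simple ratios, sketched here in Python. The function names and the weekly totals are hypothetical. The rate example uses the study-wide totals reported in the abstract (231 dyspepsia visits among 45,337 encounters), which gives a pooled figure that need not equal an average of per-physician rates.

```python
def average_weekly_volume(weekly_totals):
    """Mean patients per week over the weeks a physician saw patients and participated."""
    return sum(weekly_totals) / len(weekly_totals)

def dyspepsia_rate_per_1000(dyspepsia_visits, total_patients):
    """Reported dyspepsia visits per 1000 patients seen during the study period."""
    return 1000 * dyspepsia_visits / total_patients

# Hypothetical weekly patient totals for one physician over three participating weeks.
volume = average_weekly_volume([60, 70, 50])

# Pooled rate from the totals given in the abstract.
pooled_rate = dyspepsia_rate_per_1000(231, 45337)
```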
Following completion of primary data collection, a demographic questionnaire was sent out to all 18 participants. The questionnaire distribution occurred approximately 4 months after data collection and during a chart review phase of the primary study. The chart review was performed by a research assistant and did not involve the participating physicians. One question, included to assess compliance with the study protocol, asked, “On a 10-point scale, how compliant were you at recording data for all qualifying dyspepsia patients during the weeks that you were involved with this study?” Responses were circled on a scale from 1 (poor) to 10 (perfect). Type of practice (solo, group multispecialty, or academic) was also obtained. Seventeen of the 18 questionnaires were completed and returned.
MINITAB was used for statistical analyses. Descriptive statistics were calculated for the outcome variables. Because data for reported rate of dyspepsia visits and compliance were not normally distributed, Spearman rank correlation (α = 0.05) was used to test the hypotheses that practice volume, protocol compliance, and reported rate of dyspepsia visits were correlated. The one solo practitioner was placed with the group practice physicians because of a high level of similarity in all outcome variables. Because differences were noted among the practice types, the Kruskal-Wallis test was used to assess differences in patient volume, compliance, and reported rate of dyspepsia visits.
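For readers unfamiliar with the Spearman rank correlation, the sketch below computes it as the Pearson correlation of the ranks, with tied values given their average rank. It is a plain Python illustration, not the MINITAB procedure used in the study, and the function names and example data are assumptions.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)

# Hypothetical data: weekly volume rising while the reported rate falls.
rs = spearman([40, 55, 70, 90], [12.0, 9.5, 6.0, 3.1])
```

Because the statistic depends only on rank order, it suits measures such as compliance scores that are not normally distributed.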
Results
The average participant in this study was a 46-year-old male physician who had been in practice for 17 years and saw 61.5 patients per week (Table W1). Eight physicians were located in group practices, while 5 were in multispecialty and 3 were in academic practices. The mean reported rate of dyspepsia visits was 7.7 cases per 1000 patient visits. Initial dyspepsia visits accounted for 118 of the 231 reported visits for dyspepsia (51%), with a total of 45,337 patient visits recorded by participating physicians.
The average participant recorded visits over 43.2 weeks of the possible 53-week study (81.5% overall participation rate). The average self-reported compliance with the study protocol was 6.7 on a 10-point scale but with a very wide range (from 1 to 10). Significant differences among practice types were found in patient volume, reported rate of dyspepsia visits, and self-reported compliance (Table 2). Participants from group practices had the highest patient volumes but the lowest rate of dyspepsia visits and compliance. Academic physicians saw the fewest patients but had the highest reported rate of dyspepsia visits and compliance.
Significant negative rank correlations were found to exist between patient volume and reported rate of dyspepsia visits (Figure 1: rs = -0.548; P < .05) and between patient volume and compliance with protocol (Figure 2: rs = -0.490; P < .05). A significant positive rank correlation was found between compliance with protocol and rate of dyspepsia visits (Figure 3: rs = 0.551; P < .05). No significant correlation existed between the number of weeks of participation and patient volume (rs = -0.303), rate of dyspepsia visits (rs = 0.065), or compliance with protocol (rs = 0.415).
Discussion
Practice volume can have a significant effect on physicians’ reporting rates in practice-based studies. The rate of dyspepsia visits, as measured by the identification of patients meeting study criteria and having a completed data form, was negatively related to the number of patients seen per week by the physician. Practice volume appears to be linked to reporting by way of compliance. As an extension, it appears that physicians are generally accurate in self-assessment of their compliance with a protocol.
Although previous evaluations of PBRNs have demonstrated high levels of accuracy within reported data,10 the results reported here are somewhat disturbing. If other studies show similar results, the idea that PBRNs can assess prevalence of medical conditions could be called into question. Also, there may be a bias in the higher-volume practices for patients with more severe symptoms to be reported in preference to those with less “attention getting” symptoms, or in low-volume practices to seek out problems for which the patient did not seek attention. Consequently, even when a medical problem is identified, there may be patient selection bias toward those with more or less severe symptoms.
Additional burden and lack of practice support were common reasons for withdrawing from participation in PBRNs.11 Overall participation and compliance with a research protocol, therefore, are likely related to the complexity of that protocol. While the reported rate of dyspepsia visits was negatively related to practice volume, the simple reporting of a weekly tally of patients seen in clinic was not. Consequently, compliance-sensitive measurements (eg, prevalence) may need simple, time-efficient protocols. For example, full compliance with the protocol for the approximately 1050 physicians currently involved in the Centers for Disease Control and Prevention US Influenza Sentinel Physician Surveillance Network requires less than 3 minutes per week. This surveillance network for monitoring prevalence of influenza-like illness is a highly accurate, timely, and valued component of influenza surveillance.12 Other enhancements for study protocols may include decreased periods for data gathering, use of intermittent reporting, and use of other office staff for case identification.
Limitations
This study is limited by a potential lack of generalizability. It is an observational study of physician behavior around a complex and relatively high-burden data collection instrument. There were no true standards regarding prevalence of dyspepsia at any location, thus allowing for the possibility that patient populations differed significantly among sites. Self-reported compliance with the research protocol was based on recall 4 months after the end of the data collection period. Also, some of the effect attributable to patient volume could alternatively result from the types of physicians involved in this study.
Academic physicians, with low practice volumes, may be more likely to be compliant with research protocols in general, regardless of their practice volumes. Because of the small sample size, however, this alternate hypothesis cannot be examined independently. With the exclusion of the academic physicians, relationships between the variables demonstrated the same trends, but the Spearman rank correlations were no longer significant (n = 14; patient volume vs rate: rs = -0.345; patient volume vs compliance: rs = -0.187; compliance vs rate: rs = 0.379).
This study does, however, challenge other investigators using PBRNs to revisit suitable data to determine similar patterns. Also, a simple assessment of participant compliance might prove to be an essential enhancement of future practice-based research.
Conclusions
Even encumbered with potential methodologic dilemmas, practice-based research studies may be the only way to approach many common medical issues in the context of the communities in which they occur.1-3 For example, while selection bias in reporting of dyspepsia is clearly a problem in this example, the selection bias is still far less severe than it would be in the gastrointestinal specialty clinic of a referral center. Likewise, if nonreferred conditions are to be tracked over extensive periods of time, the use of community settings is essential, as was done with a recent longitudinal study of depression.13
Acknowledgments
Funding for this study was provided through a grant from the American Academy of Family Physicians. We thank the following participants of the WReN Practice-Based Research Group: R. Baldwin, E. Barr, D. Baumgardner, A. Berlage, M. Chin, D. Erickson, R. Erickson, G. Gay, M. Grajewski, D. Hahn, T. Hankey, D. Madlon-Kay, A. Marquis, E. Ott, D. Pine, and L. Radant.
1. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-88.
2. Nutting PA. Practice-based research networks: building the infrastructure of primary care research. J Fam Pract 1996;42:199-203.
3. Nutting PA, Green LA. Practice-based research networks: reuniting practice and research around the problems most of the people have most of the time. J Fam Pract 1994;38:335-36.
4. Nutting PA, Baier M, Werner JJ, Cutter G, Reed FM, Orzano J. Practice patterns of family physicians in practice-based research networks: a report from ASPN. J Am Board Fam Pract 1999;12:78-84.
5. Green LA, Miller RS, Reed FM, Iverson DC, Barley GE. How representative of typical practice are practice-based research networks? A report from the Ambulatory Sentinel Practice Network Inc (ASPN). Arch Fam Med 1993;2:939-49.
6. Hahn DL, Beasley JW. Diagnosed and possible undiagnosed asthma: a Wisconsin Research Network (WReN) study. J Fam Pract 1994;38:373-79.
7. Hahn DL. Physician opportunity costs for performing practice-based research. J Fam Pract 2000;49:983-84.
8. Zyzanski SJ, Stange KC, Langa D, Flocke SA. Trade-offs in high-volume primary care practice. J Fam Pract 1998;46:397-402.
9. Temte JL, Hankey T. Initial management of dyspepsia in primary care settings: the WReN practice-based research group dyspepsia study. Wis Med J 1998;97:48-49.
10. Green LA, Hames CG, Sr, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994;38:400-06.
11. Green LA, Niebauer LJ, Miller RS, Lutz LJ. An analysis of reasons for discontinuing participation in a practice-based research network. Fam Med 1991;23:447-49.
12. Buffington J, Chapman LE, Schmeltz LM, Kendal AP. Do family physicians make good sentinels for influenza? Arch Fam Med 1993;2:859-64.
13. van Weel-Baumgarten E, van den Bosch W, van den Hoogen H, Zitman FG. Ten-year follow-up of depression after diagnosis in general practice. Br J Gen Pract 1998;48:1643-46.
STUDY DESIGN: This was a prospective observational cohort study of participants in a practice-based research network who submitted data on 231 patients with dyspepsia from a total of 45,337 patient encounters over a 53-week period. Reporting of individual cases involved use of a relatively high-burden data instrument. Outcome measures were compared using rank correlation.
POPULATION: We included 18 physicians in a Wisconsin research network study on initial management of dyspepsia in primary care settings.
OUTCOMES MEASURED: The outcomes were the rate of dyspepsia visits, average weekly patient volume, and self-reported compliance with the study protocol for each physician.
RESULTS: A significant negative correlation existed between physician patient volume and the reported rate of dyspepsia visits. Self-reported compliance with the protocol was negatively correlated with patient volume and positively correlated with the reported rate of dyspepsia visits.
CONCLUSIONS: Practice volume may influence the results in practice-based research. Investigators using practice-based research networks need to consider the complexity of their protocols and should be cognizant of compliance-sensitive measures.
Common medical problems, especially those that are self-limited or in their early phases, can be best studied in community practice settings where they are usually diagnosed and managed. Practice-based research provides one method to conduct studies of these problems. Often practice-based research physicians are linked together in practice-based research networks (PBRNs), thus forming, in effect, laboratories of community practices.1-3
The methodologic limitations of these laboratories are of concern and have not been extensively explored. Although it has been adequately demonstrated that the patient populations and the problems addressed in participating practices are comparable to patients and problems in the general population,4-6 the question of the selection bias of the clinicians has been raised.4
As research involvement can be a costly endeavor for the individual physician,7 participation in a research protocol—to some extent—may be related to the intensity of practice (ie, the volume of patients seen and services provided). It has been shown that high-volume practices differ from low-volume practices8 in that high-volume practices provide lower rates of preventive services and generate lower patient satisfaction. One may anticipate that physicians with more discretionary time (ie, fewer patients) may be better able to fully participate in research activities. There have been no direct studies of the impact of practice volume on the reporting of medical problems and compliance in research studies. This study, conducted as part of a larger Wisconsin Research Network (WReN) study of dyspepsia in primary care settings, is a first step in that direction.
Methods
Eighteen family physicians, making up the Practice-Based Research Group of WReN practices, volunteered to participate in a study of the initial management of dyspepsia in primary care.9 As part of the study protocol, participants were requested to record the number of adult patients presenting with dyspepsia and the total number of patients seen in their clinic for each week of the 12-month study. Dyspepsia was defined as pain in the upper abdomen lasting for at least 2 weeks and not attributable to cardiac or pulmonary disease or trauma. Data were collected for both initial and follow-up visits. Participants were instructed to complete a 1-page data instrument for each dyspeptic patient at the time of the visit. Each instrument contained 68 data elements and took up to 5 minutes to complete. Data forms were mailed to the study coordinator on a monthly basis. Data collection began on January 30, 1995, and continued through February 2, 1996.
An average weekly patient volume was calculated for each physician, as was the reported rate of dyspepsia visits in their practice. The patient volume was estimated for each physician by summing the weekly patient totals and dividing by the number of weeks during which the physician saw patients in the clinic and participated in the study. The reported rate of dyspepsia visits for each physician was estimated as the total number of patient visits reported meeting the study criteria for dyspepsia divided by the total number of patients seen during the study period.
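The two per-physician estimates described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' software; the function names and the four-week log are hypothetical. The last lines use the study totals reported in the abstract (231 dyspepsia visits among 45,337 encounters) to show the pooled rate.

```python
def average_weekly_volume(weekly_patient_totals):
    """Sum of weekly patient totals divided by the number of weeks
    in which the physician saw patients and participated."""
    return sum(weekly_patient_totals) / len(weekly_patient_totals)


def dyspepsia_rate_per_1000(dyspepsia_visits, weekly_patient_totals):
    """Reported dyspepsia visits divided by all patients seen,
    expressed per 1000 patient visits."""
    return 1000 * dyspepsia_visits / sum(weekly_patient_totals)


# Hypothetical physician (NOT study data): 4 participating weeks, 2 reported cases.
weeks = [60, 55, 65, 70]
print(average_weekly_volume(weeks))        # 62.5 patients per week
print(dyspepsia_rate_per_1000(2, weeks))   # 8.0 per 1000 visits

# Pooled rate from the study totals: 231 visits in 45,337 encounters.
pooled = 1000 * 231 / 45337
print(round(pooled, 1))                    # 5.1 per 1000 visits
```

Note that a pooled rate weights each physician by patient volume, whereas a mean of per-physician rates weights every physician equally, so the two figures need not agree.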
Following completion of primary data collection, a demographic questionnaire was sent to all 18 participants. The questionnaire was distributed approximately 4 months after data collection, during a chart review phase of the primary study. The chart review was performed by a research assistant and did not involve the participating physicians. One question, included to assess compliance with the study protocol, asked, “On a 10-point scale, how compliant were you at recording data for all qualifying dyspepsia patients during the weeks that you were involved with this study?” Responses were circled on a scale from 1 (poor) to 10 (perfect). Type of practice (solo, group, multispecialty, or academic) was also obtained. Seventeen of the 18 questionnaires were completed and returned.
1. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-88.
2. Nutting PA. Practice-based research networks: building the infrastructure of primary care research. J Fam Pract 1996;42:199-203.
3. Nutting PA, Green LA. Practice-based research networks: reuniting practice and research around the problems most of the people have most of the time. J Fam Pract 1994;38:335-36.
4. Nutting PA, Baier M, Werner JJ, Cutter G, Reed FM, Orzano J. Practice patterns of family physicians in practice-based research networks: a report from ASPN. J Am Board Fam Pract 1999;12:78-84.
5. Green LA, Miller RS, Reed FM, Iverson DC, Barley GE. How representative of typical practice are practice-based research networks? A report from the Ambulatory Sentinel Practice Network Inc (ASPN). Arch Fam Med 1993;2:939-49.
6. Hahn DL, Beasley JW. Diagnosed and possible undiagnosed asthma: a Wisconsin Research Network (WReN) study. J Fam Pract 1994;38:373-79.
7. Hahn DL. Physician opportunity costs for performing practice-based research. J Fam Pract 2000;49:983-84.
8. Zyzanski SJ, Stange KC, Langa D, Flocke SA. Trade-offs in high-volume primary care practice. J Fam Pract 1998;46:397-402.
9. Temte JL, Hankey T. Initial management of dyspepsia in primary care settings: the WReN practice-based research group dyspepsia study. Wis Med J 1998;97:48-49.
10. Green LA, Hames CG, Sr, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994;38:400-06.
11. Green LA, Niebauer LJ, Miller RS, Lutz LJ. An analysis of reasons for discontinuing participation in a practice-based research network. Fam Med 1991;23:447-49.
12. Buffington J, Chapman LE, Schmeltz LM, Kendal AP. Do family physicians make good sentinels for influenza? Arch Fam Med 1993;2:859-64.
13. van Weel-Baumgarten E, van den Bosch W, van den Hoogen H, Zitman FG. Ten-year follow-up of depression after diagnosis in general practice. Br J Gen Pract 1998;48:1643-46.