Answering Family Physicians’ Clinical Questions Using Electronic Medical Databases
STUDY DESIGN: Two family physicians attempted to answer 20 questions with each of the databases evaluated. The adequacy of the answers was determined by the 2 physician searchers, and an arbitration panel of 3 family physicians was used if there was disagreement.
DATA SOURCE: We identified 38 databases through nominations from national groups of family physicians, medical informaticians, and medical librarians; 14 of these databases met predetermined eligibility criteria.
OUTCOME MEASURED: The primary outcome was the proportion of questions adequately answered by each database and by combinations of databases. We also measured mean and median times to obtain adequate answers for individual databases.
RESULTS: The agreement between family physician searchers regarding the adequacy of answers was excellent (κ=0.94). Five individual databases (STAT!Ref, MDConsult, DynaMed, MAXX, and MDChoice.com) answered at least half of the clinical questions. Some combinations of databases answered 75% or more. The average time to obtain an adequate answer ranged from 2.4 to 6.5 minutes.
CONCLUSIONS: Several current electronic medical databases could answer most of a group of 20 clinical questions derived from family physicians during office practice. However, point-of-care searching is not yet fast enough to address most clinical questions identified during routine clinical practice.
Family physicians and general internists report an average of 6 questions for each half-day of office practice,1-3 and 70% of these questions remain unanswered. The 2 factors that significantly predict whether a physician will attempt to answer a clinical question are the physician’s belief that a definitive answer exists and the urgency of the patient’s problem.4
Gorman and colleagues3 reported that medical librarians found clear answers for 46% of 60 randomly selected questions from family physicians; 51% would affect practice. The medical librarians searched for an average of 43 minutes per question. In a second study,5 medical librarians used MEDLINE and textbooks to answer 86 questions from family physicians. The MEDLINE searches took a mean of 27 minutes, and textbook searches took a mean of 6 minutes. Search results answered 54% of the clinical questions completely or nearly completely. Physicians estimated that the answers would have a “major” or “fairly major” impact on practice for 35% of their questions. MEDLINE searches provided answers to 43% of the questions, while textbook searches provided answers for an additional 11%.
Many physicians do not have the searching skills or access to the range of knowledge resources that librarians use. Even if they did, they do not take the time to conduct such searches during patient care. One study1 found that physicians spent less than 2 minutes on average seeking an answer to a question. Thus, most clinical questions remain unanswered.
Electronic medical databases that provide answers directly (not just reference citations) may make it easier for clinicians to obtain answers at the point of care. We found no systematic evaluation of the capacity of such databases to answer clinical questions. We conducted this study to determine the extent to which current electronic medical databases can answer family physicians’ point-of-care clinical questions.
Methods
Database Selection
We solicited nominations for potentially suitable databases from multiple E-mail lists (including communities of family physicians [Family-L], medical informaticians [FAM-MED], and medical librarians [MEDLIB-L, MCMLA-L]) and through Web searches. A selection team consisting of 3 family physicians (J.S., D.W., B.E.) and a medical librarian (none of whom had financial relationships with any databases) determined whether the nominated databases met our inclusion criteria (Table 1).
Clinical Questions
More than 1200 clinical questions had been previously collected from observations of family physicians during office practice.1,5 These questions had been classified by typology (eg, Is test X indicated in situation Y?) and by topic (eg, dermatology).1 We selected questions from these sources that were categorized among the most common typologies (8 of 68 typologies covering 50% of the questions) and the most common topics (7 of 62 topics covering 43% of the questions). These combinations of typologies and topics accounted for 272 (23%) of the 1204 questions.
If necessary, each question was translated by 2 physicians (B.A. and D.W. working together) to meet the following criteria: (1) clear enough to imagine an applicable clinical scenario, (2) answerable (ie, the question could theoretically be answered using clinical references without further patient data regardless of whether an answer was known to exist), (3) clinically relevant, and (4) true to the original question (ie, containing the information need and the modifying factors of the original question).
Each question was then independently proofread by at least 2 other physicians and translated again if necessary. Thirteen questions (5%) that did not meet these criteria after a second translation were dropped. Forty-seven questions (17%) that referred to information needs that could be adequately answered using the Physicians’ Desk Reference6 were dropped (eg, Are Paxil tablets scored?). The remaining 212 questions represented 8 typologies.1 Two or 3 questions were randomly selected from each typology for a total of 20 questions (Table 2).
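The question-selection counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names and the `pct` helper are illustrative and not part of the study.

```python
# Counts reported in the text above.
total_collected = 1204   # questions in the two source collections
in_common_cells = 272    # questions in the common typology/topic combinations
dropped_unclear = 13     # still failed the criteria after a second translation
dropped_pdr = 47         # answerable from the Physicians' Desk Reference

# Questions left in the pool from which the 20 test questions were drawn.
remaining = in_common_cells - dropped_unclear - dropped_pdr


def pct(part, whole):
    """Percentage rounded to the nearest whole number (illustrative helper)."""
    return round(100 * part / whole)


print(remaining)
print(pct(in_common_cells, total_collected))
print(pct(dropped_unclear, in_common_cells))
print(pct(dropped_pdr, in_common_cells))
```

The rounded percentages agree with the 23%, 5%, and 17% figures given in the text, and the remainder matches the 212 questions reported.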
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed using computers with Pentium III processors with a 100 megabyte-per-second network connection to the Internet and server-mounted CD-ROMs.
Each searcher used the same 20 questions to evaluate each database. The order of evaluation of the databases was at the discretion of the searchers, but the testing of a database was completed before starting the testing of another database. Searchers became familiar with each database before testing it by using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate. An answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate, because it included clinically useful recommendations: History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate. For example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time it took to obtain it rounded to the nearest number of minutes (1-10).
Scoring and Arbitration
The 2 physician searchers judged the adequacy of the answers to each question for each database. If the searchers both found adequate answers, the result was accepted as adequate, and the average time required to find and interpret the answer was recorded. If neither searcher found an adequate answer, then the answer was deemed inadequate. If only one searcher found an adequate answer, the second searcher evaluated that answer. If the answer was acceptable to the second searcher, it was considered an adequate answer, and the time for the first searcher was recorded.
When searchers disagreed on the adequacy of identified answers, an arbitration panel consisting of 3 family physicians who were not affiliated with any of the databases met independently from the searchers to determine the adequacy of the answers by consensus.
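The scoring rules described in this section can be sketched as a small decision function. This is a simplified reading of the text, not the authors' instrument: the function name, argument names, and return labels are assumptions, and the sketch covers only the case in which each searcher either did or did not find an answer he or she judged adequate.

```python
from typing import Optional, Tuple


def score_answer(
    found_a: bool,
    found_b: bool,
    minutes_a: Optional[int] = None,
    minutes_b: Optional[int] = None,
    other_accepts: Optional[bool] = None,
) -> Tuple[str, Optional[float]]:
    """Return (result, recorded_minutes) for one question in one database.

    found_a / found_b: whether each searcher found an answer judged adequate.
    other_accepts: when only one searcher found an answer, whether the other
    searcher accepted it after review (hypothetical parameter name).
    """
    if found_a and found_b:
        # Both found adequate answers: record the average of the two times.
        return "adequate", (minutes_a + minutes_b) / 2
    if not found_a and not found_b:
        return "inadequate", None
    # Exactly one searcher found an answer; the other searcher reviews it.
    if other_accepts is None:
        raise ValueError("the second searcher's review is required")
    finder_minutes = minutes_a if found_a else minutes_b
    if other_accepts:
        # Accepted on review: record the time of the searcher who found it.
        return "adequate", float(finder_minutes)
    # Still in disagreement: refer to the 3-physician arbitration panel.
    return "arbitration", None


print(score_answer(True, True, 3, 5))
print(score_answer(True, False, 4, None, other_accepts=True))
print(score_answer(False, True, None, 6, other_accepts=False))
```

Returning an explicit "arbitration" label keeps the panel's decision outside the function, which mirrors the separation between searchers and arbitrators described above.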
Analysis
Our primary outcome was the proportion of questions adequately answered by each database. We calculated 95% confidence limits for the proportions of adequate answers.7 Means and medians were determined for the time to reach adequate answers for each database. We calculated the κ statistic for the independent findings of the 2 searchers and for the results after the searchers reviewed each other’s searches.8 We combined the results of individual databases to determine the proportion of questions answered by all combinations of 2, 3, and 4 databases. We considered the question adequately answered if any of the individual databases adequately answered the question.
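The combination analysis amounts to taking the union of the questions answered by each database in a combination. The following is a minimal sketch of that idea; the database names (`db_a` to `db_d`) and the sets of answered question numbers are invented for illustration and do not correspond to any database in the study.

```python
from itertools import combinations

N_QUESTIONS = 20

# Hypothetical data: question numbers each database answered adequately.
answered = {
    "db_a": set(range(1, 11)),   # questions 1-10
    "db_b": set(range(8, 16)),   # questions 8-15
    "db_c": set(range(14, 19)),  # questions 14-18
    "db_d": {1, 2, 19},
}


def coverage(names):
    """Proportion of questions answered by at least one named database."""
    covered = set().union(*(answered[name] for name in names))
    return len(covered) / N_QUESTIONS


def best_combinations(size):
    """Return the top coverage and every combination of `size` reaching it."""
    scored = [(coverage(combo), combo)
              for combo in combinations(sorted(answered), size)]
    top = max(score for score, _ in scored)
    return top, [combo for score, combo in scored if score == top]


for size in (2, 3, 4):
    print(size, best_combinations(size))
```

With 14 databases there are only 91 pairs, 364 triples, and 1001 quadruples, so exhaustive enumeration of this kind is cheap.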
Results
Thirty-eight databases were nominated, and 24 did not meet our inclusion criteria (Table W1). Fourteen databases met the inclusion criteria (Table 3) and were evaluated with the set of 20 questions (280 answer assessments) by 2 searchers. The Figure summarizes the process of evaluating the answers. The initial agreement between searchers was good (κ=0.69). Discussion between the searchers resolved 21 (52.5%) of the 40 discrepant answer assessments. These were due to inadequate searching or timing out (searching for 10 minutes) by one searcher, who agreed with the adequacy of the answer found by the other searcher. The agreement between searchers at this stage was excellent (κ=0.94).
The remaining 19 discrepant assessments (for which the searchers had different opinions regarding the adequacy of the answers identified) were referred to the arbitration panel for determination of the final results. Ten of these were deemed adequate.
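For readers unfamiliar with the agreement statistic reported here, the sketch below computes Cohen's kappa from a 2×2 table of paired adequate/inadequate ratings. Only the totals in the example (280 assessments, 40 of them discordant) come from the text; the split of those totals across the four cells is hypothetical, because the article does not report the full table, and the function name is an assumption.

```python
def cohen_kappa(both_yes, a_only, b_only, both_no):
    """Cohen's kappa for two raters making yes/no judgments.

    both_yes: both raters said adequate
    a_only:   only rater A said adequate
    b_only:   only rater B said adequate
    both_no:  both raters said inadequate
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance, given each rater's marginal rates.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# 280 assessments with 40 discordant pairs, as in the text; the cell counts
# themselves are a hypothetical split chosen only to match those totals.
print(round(cohen_kappa(both_yes=81, a_only=20, b_only=20, both_no=159), 2))
```

Because kappa depends on each searcher's marginal rate of adequate ratings, a different hypothetical split with the same totals would give a somewhat different value.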
Results for individual databases in rank order of proportion of questions answered followed by average time to identify adequate answers are reported in Table 3. The combination of STAT!Ref and MDConsult could answer 85% of our set of 20 questions. Four combinations of 2 databases (STAT!Ref and either MAXX, MDChoice.com, Primary Care Guidelines, or Medscape) could answer 80% of our questions. Two combinations of 3 databases (STAT!Ref, MDConsult, and either DynaMed or MAXX) could answer 90% of our questions. Combinations of 4 databases answered the most sample questions (95%, 19/20). These combinations consisted of STAT!Ref, DynaMed, MAXX, and either MDConsult or American Family Physician.
We also evaluated combinations of databases that were available at no cost. The combination of the 2 no-cost databases that answered the largest proportion of questions (75%) was DynaMed and American Family Physician. The greatest proportion of clinical questions that could be answered using the freely available sources was 80%, and this required the use of 3 databases (DynaMed, MDChoice.com, and American Family Physician).
Discussion
Our study suggests that individual databases can answer a considerable proportion of family physicians’ clinical questions. Combinations of currently available databases can answer 75% or more. The searches in this study were based on the combination of efforts of 2 experienced physician searchers. These results may not be replicable in the practice setting but do provide an objective best-case scenario assessment of the content of these databases.
The time required to obtain answers, while much less than searching for original articles, is still longer than the 2-minute average time spent by family physicians in the study by Ely and colleagues.1 Our time estimates are not precise, as time was not the primary focus of our study. Time was only recorded in 1-minute intervals, so searches that took 10 seconds were recorded as 1 minute. Even so, the existence of median times to obtain adequate answers greater than 2 minutes suggests that these databases may require more time than most physicians will take to pursue answers during patient care.
This is the first study to systematically evaluate how many questions can be answered by electronic medical databases. The strengths of this study include the use of a standard set of common questions asked by family physicians, testing by 2 experienced family physician searchers, and a systematic replicable approach to the evaluation. The only similar study we identified was one in which Graber and coworkers9 used 10 clinical questions and tested a commercial site, 2 medical meta-lists, 4 general search engines, and 9 medicine-specific search engines to determine the efficiency of answering clinical questions on the Web. Different approaches answered from 0 to 6 of the 10 questions, but that study looked primarily at sites that were not generally designed for use in clinical practice.
Limitations
Our study was limited by the relatively small number of questions, causing wide confidence intervals. Some answers were present in the databases but not found despite the use of 2 searchers. For example, a database manager identified 2 answers that were not found but would have been considered adequate.
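To illustrate how wide the confidence limits are with only 20 questions, the sketch below computes a 95% interval for a database that answers 10 of 20 questions. The article cites a biostatistics text for its confidence limits without naming the method, so the Wilson score interval used here is an assumption chosen for illustration, not necessarily the authors' calculation.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(10, 20)
print(f"{low:.2f} to {high:.2f}")
```

Under this assumption the interval for 10 of 20 spans roughly 30% to 70%, which shows why differences of a few questions between databases should be interpreted cautiously.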
We accepted answers as adequate if, in our judgment, they offered a practical course of action. We did not attempt to determine whether the individual asking the question believed that the answer was adequate nor did we attempt to validate the accuracy or currency of answers using independent standards. Many of the answers were based on sources that were several years old, and few were based on explicit evidence-based criteria. Although we determined the adequacy of answers for clinical practice through formal mechanisms, an in vivo study in which the clinicians asking the questions determined the adequacy of their findings during patient care activities would provide a more accurate assessment.
Our study presents a static evaluation of a dynamic field. Over time, answers may be lost because of lack of maintenance of resource links or may be gained by addition of new materials. Our use of questions gathered several years ago may not accurately reflect the ability of databases to answer current questions, which may be more likely to reflect new tests and treatments.
Many of the databases were designed for purposes other than meeting clinical information needs at the point of care. Performance in this study does not reflect the capacity of these databases to address their stated purposes. For example, the Translating Research Into Practice (TRIP) database is an excellent resource for searches of a large collection of evidence-based resources. These resources are generally limited to summaries of studies with the highest methodologic quality. The TRIP database did not perform well in our study partly because most of our test questions (consistent with questions in clinical practice) cannot currently be answered using studies of the highest methodologic quality. Another example is Medical Matrix, which provides a search engine and annotated summaries for exploring the entire medical Internet and not just clinical reference information.
We did not study the costs involved in using the databases we evaluated, and these costs may have changed since our study was conducted. Most of the databases we included were free to use at the time of the study and at the time of this report. The 3 collections of textbooks required access fees. STAT!Ref, which scored the highest in our study, did so because we used the complete collection available to us through our institutional library. This collection would cost an individual $2189 annually at the time of our study. A starter library was available for $199 annually and would only answer 40% of the questions.
Context
Family physicians and other primary care providers treat patients who have a wide variety of syndromes and symptoms. Because of the scope and breadth of primary care, it is nearly impossible for a clinician to keep up with rapidly changing medical information.10
Connelly and colleagues11 surveyed 126 family physicians and found they used the Physicians’ Desk Reference and colleagues much more often than Index Medicus or computer-based bibliographic retrieval systems. Research literature was used infrequently and rated among the lowest in terms of credibility, availability, searchability, understandability, and applicability. Physicians preferred sources that had low cost and were relevant to specific patient problems over sources that had higher quality.
Conclusions
Current databases can answer a considerable proportion of clinical questions but have not reached their potential for efficiency. It is our hope that as electronic medical databases mature, they will be able to bridge this gap and bring the research literature to the point of care in useful and practical ways. This study provides a snapshot of how far we have come and how far we need to go to meet these needs.
Acknowledgments
Funding for our study was provided by a grant from the American Academy of Family Physicians to support the Center for Family Medicine Science and from 2 Bureau of Health Professions Awards (DHHS 1-D14-HP-00029-01, DHHS 5 T32 HP10038) from the Health Resources and Services Administration to the Department of Family and Community Medicine at University of Missouri-Columbia. The authors would like to acknowledge Erik Lindbloom, MD, MSPH, for assisting with the database testing as a substitute searcher for B.A.; E. Diane Johnson, MLS, for assisting with the selection of databases for study inclusion; Robert Phillips, Jr., MD, MSPH, for arbitration of questions and answers for which the searchers did not reach agreement along with B.E. and J.S.; David Cravens, Erik Lindbloom, Kevin Kane, Jim Brillhart, and Mark Ebell for proofreading the questions for clarity, answerability, and clinical relevance; John Ely and Lee Chambliss for providing clinical questions from their observations; Mark Ebell, John Ely, Erik Lindbloom, Jerry Osheroff, Lee Chambliss, David Mehr, Robin Kruse, John Smucny, and many others for constructive criticism in the design of this study; and Steve Zweig for editorial review.
1. Ely JW, Osheroff JA, Ebell MH, et al. Analysis of questions asked by family doctors regarding patient care. BMJ 1999;319:358-61.
2. Covell DG, Uman GC, Manning PR. Information needs in office practice: are they being met? Ann Intern Med 1985;103:596-99.
3. Gorman PN, Ash J, Wykoff L. Can primary care physicians’ questions be answered using the medical journal literature? Bull Med Lib Assoc 1994;82:140-46.
4. Gorman PN, Helfand M. Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Med Decis Mak 1995;15:113-19.
5. Chambliss ML, Conley J. Answering clinical questions. J Fam Pract 1996;43:140-44.
6. Medical Economics. Physicians’ desk reference. 54th ed. Oradell, NJ: Medical Economics Company; 2000.
7. Pagano M, Gauvreau K. Inference on proportions. Principles of biostatistics. Belmont, Calif: Duxbury Press; 1993;297-298.
8. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. The clinical examination. Clinical epidemiology: a basic science for clinical medicine. Boston, Mass: Little, Brown and Company; 1991;29-30.
9. Graber MA, Bergus GR, York C. Using the World Wide Web to answer clinical questions: how efficient are different methods of information retrieval? J Fam Pract 1999;48:520-24.
10. Dickinson WP, Stange KC, Ebell MH, Ewigman BG, Green LA. Involving all family physicians and family medicine faculty members in the use and generation of new knowledge. Fam Med 2000;32:480-90.
11. Connelly DP, Rich EC, Curley SP, Kelly JT. Knowledge resource preferences of family physicians. J Fam Pract 1990;30:353-59.
STUDY DESIGN: Two family physicians attempted to answer 20 questions with each of the databases evaluated. The adequacy of the answers was determined by the 2 physician searchers, and an arbitration panel of 3 family physicians was used if there was disagreement.
DATA SOURCE: We identified 38 databases through nominations from national groups of family physicians, medical informaticians, and medical librarians; 14 of these databases met predetermined eligibility criteria.
OUTCOME MEASURED: The primary outcome was the proportion of questions adequately answered by each database and by combinations of databases. We also measured mean and median times to obtain adequate answers for individual databases.
RESULTS: The agreement between family physician searchers regarding the adequacy of answers was excellent (k=0.94). Five individual databases (STAT! Ref, MDConsult, DynaMed, MAXX, and MDChoice.com) answered at least half of the clinical questions. Some combinations of databases answered 75% or more. The average time to obtain an adequate answer ranged from 2.4 to 6.5 minutes.
CONCLUSIONS: Several current electronic medical databases could answer most of a group of 20 clinical questions derived from family physicians during office practice. However, point-of-care searching is not yet fast enough to address most clinical questions identified during routine clinical practice.
Family physicians and general internists report an average of 6 questions for each half-day of office practice,1-3 and 70% of these questions remain unanswered. The 2 factors that significantly predict whether a physician will attempt to answer a clinical question are the physician’s belief that a definitive answer exists and the urgency of the patient’s problem.4
Gorman and colleagues3 reported that medical librarians found clear answers for 46% of 60 randomly selected questions from family physicians; 51% would affect practice. The medical librarians searched for an average of 43 minutes per question. In a second study,5 medical librarians used MEDLINE and textbooks to answer 86 questions from family physicians. The MEDLINE searches took a mean of 27 minutes, and textbook searches took a mean of 6 minutes. Search results answered 54% of the clinical questions completely or nearly completely. Physicians estimated that the answers would have a “major” or “fairly major” impact on practice for 35% of their questions. MEDLINE searches provided answers to 43% of the questions, while textbook searches provided answers for an additional 11%.
Many physicians do not have the searching skills or access to the range of knowledge resources that librarians use. Even if they did, they do not take the time to conduct such searches during patient care. One study1 found that physicians spent less than 2 minutes on average seeking an answer to a question. Thus, most clinical questions remain unanswered.
Electronic medical databases that provide answers directly (not just reference citations) may make it easier for clinicians to obtain answers at the point of care. We found no systematic evaluation of the capacity of such databases to answer clinical questions. We conducted this study to determine the extent to which current electronic medical databases can answer family physicians’ point-of-care clinical questions.
Methods
Database Selection
We solicited nominations for potentially suitable databases from multiple E-mail lists (including communities of family physicians [Family-L], medical informaticians [FAM-MED] and medical librarians [MEDLIB-L, MCMLA-L]) and through Web searches. A selection team consisting of 3 family physicians (J.S., D.W., B.E.) and a medical librarian (none of whom had financial relationships with any databases) determined whether the nominated databases met our inclusion criteria Table 1.
Clinical Questions
More than 1200 clinical questions had been previously collected from observations of family physicians during office practice.1,5 These questions had been classified by typology (eg, Is test X indicated in situation Y?) and by topic (eg, dermatology).1 We selected questions from these sources that were categorized among the most common typologies (8 of 68 typologies covering 50% of the questions) and the most common topics (7 of 62 topics covering 43% of the questions). These combinations of typologies and topics accounted for 272 (23%) of the 1204 questions.
If necessary, each question was translated by 2 physicians (B.A. and D.W. working together) to meet the following criteria: (1) clear enough to imagine an applicable clinical scenario, (2) answerable (ie, the question could theoretically be answered using clinical references without further patient data regardless of whether an answer was known to exist), (3) clinically relevant, and (4) true to the original question (ie, containing the information need and the modifying factors of the original question).
Each question was then independently proofread by at least 2 other physicians and translated again if necessary. Thirteen questions (5%) that did not meet these criteria after a second translation were dropped. Forty-seven questions (17%) that referred to information needs that could be adequately answered using the Physicians’ Desk Reference6 were dropped (eg, Are Paxil tablets scored?). The remaining 212 questions represented 8 typologies.1 Two or 3 questions were randomly selected from each typology for a total of 20 questions Table 2.
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed using computers with Pentium III processors with a 100 megabyte-per-second network connection to the Internet and server-mounted CD-ROMs.
Each searcher used the same 20 questions to evaluate each database. The order of evaluation of the databases was at the discretion of the searchers, but the testing of a database was completed before starting the testing of another database. Searchers became familiar with each database before testing it by using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate. An answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate, because it included clinically useful recommendations: History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate. For example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time it took to obtain it rounded to the nearest number of minutes (1-10).
Scoring and Arbitration
The 2 physician searchers judged the adequacy of the answers to each question for each database. If the searchers both found adequate answers, the result was accepted as adequate, and the average time required to find and interpret the answer was recorded. If neither searcher found an adequate answer, then the answer was deemed inadequate. If only one searcher found an adequate answer, the second searcher evaluated that answer. If the answer was acceptable to the second searcher, it was considered an adequate answer, and the time for the first searcher was recorded.
When searchers disagreed on the adequacy of identified answers, an arbitration panel consisting of 3 family physicians who were not affiliated with any of the databases met independently from the searchers to determine the adequacy of the answers by consensus.
Analysis
Our primary outcome was the proportion of questions adequately answered by each database. We calculated 95% confidence limits for the proportions of adequate answers.7 Means and medians were determined for the time to reach adequate answers for each database. We calculated the k statistic for the independent findings of the 2 searchers and for the results after the searchers reviewed each other’s searches.8 We combined the results of individual databases to determine the proportion of questions answered by all combinations of 2, 3, and 4 databases. We considered the question adequately answered if any of the individual databases adequately answered the question.
Results
Thirty-eight databases were nominated, and 24 did not meet our inclusion criteria Table W1.* Fourteen databases met the inclusion criteria Table 3 and were evaluated with the set of 20 questions (280 answer assessments) by 2 searchers. The Figure summarizes the process of evaluating the answers. The initial agreement between searchers was good k=0.69). Discussion between the searchers resolved 21 (52.5%) of the 40 discrepant answer assessments. These were due to inadequate searching or timing out (searching for 10 minutes) by one searcher, who agreed with the adequacy of the answer found by the other searcher. The agreement between searchers at this stage was excellent (k= 0.94).
The remaining 19 discrepant assessments (for which the searchers had different opinions regarding the adequacy of the answers identified) were referred to the arbitration panel for determination of the final results. Ten of these were deemed adequate.
Results for individual databases in rank order of proportion of questions answered followed by average time to identify adequate answers are reported in Table 3. The combination of STAT!Ref and MDConsult could answer 85% of our set of 20 questions. Four combinations of 2 databases (STAT!Ref and either MAXX, MDChoice.com, Primary Care Guidelines, or Medscape) could answer 80% of our questions. Two combinations of 3 databases (STAT!Ref, MDConsult, and either DynaMed or MAXX) could answer 90% of our questions. Combinations of 4 databases answered the most sample questions (95%, 19/20). These combinations consisted of STAT!Ref, DynaMed, MAXX, and either MDConsult or American Family Physician.
We also evaluated combinations of databases that were available at no cost. The combination of the 2 no-cost databases that answered the largest proportion of questions (75%) was DynaMed and American Family Physician. The greatest proportion of clinical questions that could be answered using the freely available sources was 80%, and this required the use of 3 databases (DynaMed, MDChoice.com, and American Family Physician).
Discussion
Our study suggests that individual databases can answer a considerable proportion of family physicians’ clinical questions. Combinations of currently available databases can answer 75% or more. The searches in this study were based on the combination of efforts of 2 experienced physician searchers. These results may not be replicable in the practice setting but do provide an objective best-case scenario assessment of the content of these databases.
The time required to obtain answers, while much less than searching for original articles, is still longer than the 2-minute average time spent by family physicians in the study by Ely and colleagues.1 Our time estimates are not precise, as time was not the primary focus of our study. Time was only recorded in 1-minute intervals, so searches that took 10 seconds were recorded as 1 minute. Even so, the existence of median times to obtain adequate answers greater than 2 minutes suggests that these databases may require more time than most physicians will take to pursue answers during patient care.
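The effect of recording time in whole minutes can be illustrated with a short sketch. The durations below are hypothetical, and the rounding rule (nearest minute, with a floor of 1 and a ceiling of 10) is an assumption based on the description of the time recording in this study.

```python
from statistics import mean, median


def recorded_minutes(seconds):
    """Round a search duration to the nearest whole minute, clamped to 1-10."""
    nearest = int(seconds / 60 + 0.5)  # round half up
    return min(10, max(1, nearest))


# Hypothetical search durations in seconds (not study data).
durations = [10, 45, 100, 200, 400]
recorded = [recorded_minutes(s) for s in durations]

print(recorded)  # [1, 1, 2, 3, 7]
print("median recorded:", median(recorded), "min")
print("median actual:", round(median(durations) / 60, 2), "min")
print("mean recorded:", mean(recorded), "min")
print("mean actual:", round(mean(durations) / 60, 2), "min")
```

In this illustration the recorded values overstate the very short searches, so recorded means and medians are coarse estimates of the true search times.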
This is the first study to systematically evaluate how many questions can be answered by electronic medical databases. The strengths of this study include the use of a standard set of common questions asked by family physicians, testing by 2 experienced family physician searchers, and a systematic replicable approach to the evaluation. The only similar study we identified was one in which Graber and coworkers9 used 10 clinical questions and tested a commercial site, 2 medical meta-lists, 4 general search engines, and 9 medicine-specific search engines to determine the efficiency of answering clinical questions on the Web. Different approaches answered from 0 to 6 of the 10 questions, but that study looked primarily at sites that were not generally designed for use in clinical practice.
Limitations
Our study was limited by the relatively small number of questions, causing wide confidence intervals. Some answers were present in the databases but not found despite the use of 2 searchers. For example, a database manager identified 2 answers that were not found but would have been considered adequate.
We accepted answers as adequate if, in our judgment, they offered a practical course of action. We did not attempt to determine whether the individual asking the question believed that the answer was adequate nor did we attempt to validate the accuracy or currency of answers using independent standards. Many of the answers were based on sources that were several years old, and few were based on explicit evidence-based criteria. Although we determined the adequacy of answers for clinical practice through formal mechanisms, an in vivo study in which the clinicians asking the questions determined the adequacy of their findings during patient care activities would provide a more accurate assessment.
Our study presents a static evaluation of a dynamic field. Over time, answers may be lost because of lack of maintenance of resource links or may be gained by addition of new materials. Our use of questions gathered several years ago may not accurately reflect the ability of databases to answer current questions, which may be more likely to reflect new tests and treatments.
Many of the databases were designed for purposes other than meeting clinical information needs at the point of care. Performance in this study does not reflect the capacity of these databases to address their stated purposes. For example, the Translating Research Into Practice (TRIP) database is an excellent resource for searches of a large collection of evidence-based resources. These resources are generally limited to summaries of studies with the highest methodologic quality. The TRIP database did not perform well in our study partly because most of our test questions (consistent with questions in clinical practice) cannot currently be answered using studies of the highest methodologic quality. Another example is Medical Matrix, which provides a search engine and annotated summaries for exploring the entire medical Internet and not just clinical reference information.
We did not study the costs involved in using the databases we evaluated, and these costs may have changed since our study was conducted. Most of the databases we included were free to use at the time of the study and at the time of this report. The 3 collections of textbooks required access fees. STAT!Ref, which scored the highest in our study, did so because we used the complete collection available to us through our institutional library. This collection would cost an individual $2189 annually at the time of our study. A starter library was available for $199 annually and would only answer 40% of the questions.
Context
Family physicians and other primary care providers treat patients who have a wide variety of syndromes and symptoms. Because of the scope and breadth of primary care, it is nearly impossible for a clinician to keep up with rapidly changing medical information.10
Connelly and colleagues11 surveyed 126 family physicians and found they used the Physicians’ Desk Reference and colleagues much more often than Index Medicus or computer-based bibliographic retrieval systems. Research literature was used infrequently and rated among the lowest in terms of credibility, availability, searchability, understandability, and applicability. Physicians preferred sources that had low cost and were relevant to specific patient problems over sources that had higher quality.
Conclusions
Current databases can answer a considerable proportion of clinical questions but have not reached their potential for efficiency. It is our hope that as electronic medical databases mature, they will be able to bridge this gap and bring the research literature to the point of care in useful and practical ways. This study provides a snapshot of how far we have come and how far we need to go to meet these needs.
Acknowledgments
Funding for our study was provided by a grant from the American Academy of Family Physicians to support the Center for Family Medicine Science and from 2 Bureau of Health Professions Awards (DHHS 1-D14-HP-00029-01, DHHS 5 T32 HP10038) from the Health Resources and Services Administration to the Department of Family and Community Medicine at University of Missouri-Columbia. The authors would like to acknowledge Erik Lindbloom, MD, MSPH, for assisting with the database testing as a substitute searcher for B.A.; E. Diane Johnson, MLS, for assisting with the selection of databases for study inclusion; Robert Phillips, Jr., MD, MSPH, for arbitration of questions and answers for which the searchers did not reach agreement along with B.E. and J.S.; David Cravens, Erik Lindbloom, Kevin Kane, Jim Brillhart, and Mark Ebell for proofreading the questions for clarity, answerability, and clinical relevance; John Ely and Lee Chambliss for providing clinical questions from their observations; Mark Ebell, John Ely, Erik Lindbloom, Jerry Osheroff, Lee Chambliss, David Mehr, Robin Kruse, John Smucny, and many others for constructive criticism in the design of this study; and Steve Zweig for editorial review.
Many physicians do not have the searching skills or access to the range of knowledge resources that librarians use. Even if they did, they do not take the time to conduct such searches during patient care. One study1 found that physicians spent less than 2 minutes on average seeking an answer to a question. Thus, most clinical questions remain unanswered.
Electronic medical databases that provide answers directly (not just reference citations) may make it easier for clinicians to obtain answers at the point of care. We found no systematic evaluation of the capacity of such databases to answer clinical questions. We conducted this study to determine the extent to which current electronic medical databases can answer family physicians’ point-of-care clinical questions.
Methods
Database Selection
We solicited nominations for potentially suitable databases from multiple E-mail lists (including communities of family physicians [Family-L], medical informaticians [FAM-MED] and medical librarians [MEDLIB-L, MCMLA-L]) and through Web searches. A selection team consisting of 3 family physicians (J.S., D.W., B.E.) and a medical librarian (none of whom had financial relationships with any databases) determined whether the nominated databases met our inclusion criteria (Table 1).
Clinical Questions
More than 1200 clinical questions had been previously collected from observations of family physicians during office practice.1,5 These questions had been classified by typology (eg, Is test X indicated in situation Y?) and by topic (eg, dermatology).1 We selected questions from these sources that were categorized among the most common typologies (8 of 68 typologies covering 50% of the questions) and the most common topics (7 of 62 topics covering 43% of the questions). These combinations of typologies and topics accounted for 272 (23%) of the 1204 questions.
If necessary, each question was translated by 2 physicians (B.A. and D.W. working together) to meet the following criteria: (1) clear enough to imagine an applicable clinical scenario, (2) answerable (ie, the question could theoretically be answered using clinical references without further patient data regardless of whether an answer was known to exist), (3) clinically relevant, and (4) true to the original question (ie, containing the information need and the modifying factors of the original question).
Each question was then independently proofread by at least 2 other physicians and translated again if necessary. Thirteen questions (5%) that did not meet these criteria after a second translation were dropped. Forty-seven questions (17%) that referred to information needs that could be adequately answered using the Physicians’ Desk Reference6 were dropped (eg, Are Paxil tablets scored?). The remaining 212 questions represented 8 typologies.1 Two or 3 questions were randomly selected from each typology for a total of 20 questions (Table 2).
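The question-selection counts reported above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in the text; the variable names are ours.

```python
total_collected = 1204   # questions in the source collections
selected = 272           # common typology and topic combinations
dropped_unclear = 13     # failed criteria after a second translation
dropped_pdr = 47         # answerable from the Physicians' Desk Reference

remaining = selected - dropped_unclear - dropped_pdr
print(remaining)                                  # 212
print(round(100 * selected / total_collected))    # 23 (% of 1204)
print(round(100 * dropped_unclear / selected))    # 5  (% of 272)
print(round(100 * dropped_pdr / selected))        # 17 (% of 272)

# Drawing 2 or 3 questions from each of 8 typologies to reach 20 questions
# means x typologies contribute 3 and (8 - x) contribute 2: 3x + 2(8 - x) = 20.
typologies, target = 8, 20
with_three = target - 2 * typologies
print(with_three, typologies - with_three)        # 4 4
```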
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed using computers with Pentium III processors with a 100 megabyte-per-second network connection to the Internet and server-mounted CD-ROMs.
Each searcher used the same 20 questions to evaluate each database. The order of evaluation of the databases was at the discretion of the searchers, but the testing of a database was completed before starting the testing of another database. Searchers became familiar with each database before testing it by using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate. An answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate, because it included clinically useful recommendations: History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate. For example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time it took to obtain it rounded to the nearest number of minutes (1-10).
Scoring and Arbitration
The 2 physician searchers judged the adequacy of the answers to each question for each database. If the searchers both found adequate answers, the result was accepted as adequate, and the average time required to find and interpret the answer was recorded. If neither searcher found an adequate answer, then the answer was deemed inadequate. If only one searcher found an adequate answer, the second searcher evaluated that answer. If the answer was acceptable to the second searcher, it was considered an adequate answer, and the time for the first searcher was recorded.
When searchers disagreed on the adequacy of identified answers, an arbitration panel consisting of 3 family physicians who were not affiliated with any of the databases met independently from the searchers to determine the adequacy of the answers by consensus.
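The scoring rules above can be summarized as a small decision function. This is a sketch of our reading of those rules, not software used in the study; the function name, the `Outcome` labels, and the convention of passing `None` when a searcher found no adequate answer are all assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    ADEQUATE = "adequate"
    INADEQUATE = "inadequate"
    ARBITRATION = "referred to arbitration panel"


def score(time_a, time_b, reviewer_accepts=None):
    """Combine two searchers' results for one question in one database.

    time_a, time_b: minutes to an adequate answer, or None if none was found.
    reviewer_accepts: when only one searcher found an answer, whether the
    other searcher judged that answer adequate.
    Returns (outcome, recorded time in minutes or None).
    """
    found = [t for t in (time_a, time_b) if t is not None]
    if len(found) == 2:
        return Outcome.ADEQUATE, sum(found) / 2   # average of both times
    if not found:
        return Outcome.INADEQUATE, None
    if reviewer_accepts is None:
        raise ValueError("the other searcher must review the single answer")
    if reviewer_accepts:
        return Outcome.ADEQUATE, found[0]         # time of the finder
    return Outcome.ARBITRATION, None


print(score(2, 4))
print(score(5, None, reviewer_accepts=True))
print(score(None, 5, reviewer_accepts=False))
```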
Analysis
Our primary outcome was the proportion of questions adequately answered by each database. We calculated 95% confidence limits for the proportions of adequate answers.7 Means and medians were determined for the time to reach adequate answers for each database. We calculated the κ statistic for the independent findings of the 2 searchers and for the results after the searchers reviewed each other’s searches.8 We combined the results of individual databases to determine the proportion of questions answered by all combinations of 2, 3, and 4 databases. We considered the question adequately answered if any of the individual databases adequately answered the question.
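To show how wide 95% confidence limits are for a proportion based on 20 questions, the following sketch computes a Wilson score interval. The Wilson method is our choice for illustration; the text does not state which interval method was used, so these limits may differ from the published ones.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# A database answering 10 of 20 questions, and a combination answering 19 of 20.
for answered in (10, 19):
    low, high = wilson_interval(answered, 20)
    print(f"{answered}/20: {low:.2f} to {high:.2f}")
```

For 10 of 20 questions the interval spans roughly 0.30 to 0.70, which illustrates why a 20-question set gives imprecise estimates for individual databases.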
1. Ely JW, Osheroff JA, Ebell MH, et al. Analysis of questions asked by family doctors regarding patient care. BMJ 1999;319:358-61.
2. Covell DG, Uman GC, Manning PR. Information needs in office practice: are they being met? Ann Intern Med 1985;103:596-99.
3. Gorman PN, Ash J, Wykoff L. Can primary care physicians’ questions be answered using the medical journal literature? Bull Med Lib Assoc 1994;82:140-46.
4. Gorman PN, Helfand M. Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Med Decis Mak 1995;15:113-19.
5. Chambliss ML, Conley J. Answering clinical questions. J Fam Pract 1996;43:140-44.
6. Medical Economics Physicians’ desk reference. 54th ed. Oradell, NJ: Medical Economics Company; 2000.
7. Pagano M, Gauvreau K. Inference on proportions. Principles of biostatistics. Belmont, Calif: Duxbury Press; 1993;297-298.
8. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. The clinical examination. Clinical epidemiology: a basic science for clinical medicine. Boston, Mass: Little, Brown and Company; 1991;29-30.
9. Graber MA, Bergus GR, York C. Using the World Wide Web to answer clinical questions: how efficient are different methods of information retrieval? J Fam Pract 1999;48:520-24.
10. Dickinson WP, Stange KC, Ebell MH, Ewigman BG, Green LA. Involving all family physicians and family medicine faculty members in the use and generation of new knowledge. Fam Med 2000;32:480-90.
11. Connelly DP, Rich EC, Curley SP, Kelly JT. Knowledge resource preferences of family physicians. J Fam Pract 1990;30:353-59.
Are β2-agonists Effective Treatment for Acute Bronchitis or Acute Cough in Patients Without Underlying Pulmonary Disease? A Systematic Review
STUDY DESIGN: We performed a systematic review including meta-analysis.
DATA SOURCES: We included randomized controlled trials comparing β2-agonists with placebo or alternative therapies identified from the Cochrane Library, MEDLINE, EMBASE, conference proceedings, Science Citation Index, the System for Information on Grey Literature in Europe, and letters to manufacturers of β2-agonists.
OUTCOME MEASURED: We measured duration, persistence, severity or frequency of cough, productive cough, and night cough; duration of activity limitations; and adverse effects.
RESULTS: Two trials in children with cough and no obvious airway obstruction did not find any benefits from β2-agonists. Five trials in adults with cough and with or without airway obstruction had mixed results, but summary statistics did not reveal any significant benefits from β2-agonists. Studies that enrolled more wheezing patients were more likely to show benefits from β2-agonists, and in one study only patients with evidence of airflow limitation were more likely to benefit. Patients given β2-agonists were more likely to report tremor, shakiness, or nervousness than those in the control groups.
CONCLUSIONS: There is no evidence to support using β2-agonists in children with acute cough and no evidence of airflow obstruction. There is little evidence that the routine use of β2-agonists for adults with acute cough is helpful. These agents may reduce symptoms, including cough, in patients with evidence of airflow obstruction, but this potential benefit is not well-supported by the available data and must be weighed against the adverse effects associated with β2-agonists.
Acute bronchitis is characterized by cough associated with other symptoms of upper respiratory infection. Although this condition is self-limited, most patients feel ill, and many do not perform their usual activities. The optimal treatment for this common condition in patients who do not have underlying pulmonary disease is not clear. Clinicians often prescribe antibiotics,1,2 in spite of the fact that they are of little overall benefit.3,4 It is important to examine the effectiveness of alternative approaches.
β2-agonists have been proposed, because healthy patients have impaired airflow when infected with pathogens known to cause acute bronchitis.5-7 Also, cough is the primary symptom for some patients who have asthma,8 and many of these patients benefit from β2-agonists.9 β2-agonists are effective in reducing cough due to other causes, such as bronchoscopy10 and intravenous fentanyl.11
We conducted this systematic review to determine whether β2-agonists are effective for patients who have acute bronchitis without underlying pulmonary disease. If β2-agonists are effective, then they should be more widely used; only a minority of US family physicians currently prescribe them for acute bronchitis.2,12
Methods
We attempted to locate all controlled trials that compared β2-agonists with placebo or an alternative treatment in patients older than 2 years who presented with acute bronchitis or acute cough without a clear etiology (eg, pneumonia, pertussis, or sinusitis). We included patients with acute cough, because the clinical definition of acute bronchitis is not standardized. Textbooks13-15 and clinician studies16,17 state that cough in association with an acute respiratory infection is required for a diagnosis; otherwise, there are differing criteria regarding the need for other symptoms and signs, such as dyspnea, abnormal chest findings, and sputum.
We searched MEDLINE (1966-2000), EMBASE (1974-2000), and The Cochrane Library (through August 2000) using the key words “bronchitis” or “cough,” together with the terms “adrenergic beta-agonist (exp),” “bronchodilator agents (exp),” “sympathomimetic (exp),” “albuterol,” “salbutamol,” “bitolterol,” “isoetharine,” “metaproterenol,” “pirbuterol,” “salmeterol,” “terbutaline,” “fenoterol,” “formoterol,” or “procaterol” (note that albuterol and salbutamol are the same compound). We also searched conference proceedings databases (Inside Conferences, 1993-99; Conference Papers Index, 1973-99); the System for Information on Grey Literature in Europe database (1980-2000); the reference lists of retrieved articles, review articles, and textbooks; and the Science Citation Index (1990-2000). Finally, we wrote to all US manufacturers of brand-name β2-agonists. There were no language restrictions in our search.
Two investigators (C.F., J.S.) independently reviewed all the retrieved titles and abstracts. Studies selected by either investigator as possibly meeting the inclusion criteria were retrieved in their entirety. One investigator (J.S.) then deleted the journal of publication, title, authors, affiliations, and results sections of each study that met the inclusion criteria, and compiled a list of all the reported outcomes. The list of outcomes was forwarded to the other 3 investigators, who independently, and then through discussion, determined which outcomes would be included in our review. The main criterion for selection was that the outcome should be directly important to patients. The same 3 investigators then graded the quality of each study using the 5-point Jadad scale, with points given for method of randomization (0-2), adequacy of blinding (0-2), and description of withdrawals (0-1).18 The Jadad scale is a validated, well-accepted, and frequently used quality assessment scale. Agreement on quality was assessed with a κ score, and disagreements were resolved by discussion. Trials were excluded if all investigators agreed that the trial did not meet our inclusion criteria. The remaining articles in their entirety were then distributed to all investigators, each of whom independently extracted data for the selected outcomes. Disagreements were resolved by discussion. We attempted to contact authors to obtain missing data.
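The point ranges given for the Jadad scale can be sketched in a few lines of Python. This is only an illustration of how the three components add up to a 0-5 total; the class name `JadadRating` and its field names are hypothetical and do not come from the review or from reference 18.

```python
from dataclasses import dataclass


@dataclass
class JadadRating:
    """One reviewer's rating of one trial (hypothetical structure)."""
    randomization: int  # method of randomization, 0-2 points
    blinding: int       # adequacy of blinding, 0-2 points
    withdrawals: int    # description of withdrawals, 0-1 point

    def total(self) -> int:
        # Upper bounds follow the ranges stated in the Methods text.
        limits = {"randomization": 2, "blinding": 2, "withdrawals": 1}
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name} must be between 0 and {upper}")
        return self.randomization + self.blinding + self.withdrawals


# Illustrative rating only, not taken from any included trial.
example = JadadRating(randomization=2, blinding=1, withdrawals=1)
print(example.total())
```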
Summary statistics were calculated using Review Manager 4.1 software (Update Software, Oxford, England). We used fixed effects models for outcomes without statistically significant heterogeneity (at P <.10) and random effects models for outcomes with significant heterogeneity. For dichotomous outcomes, we reported relative risks (RRs), absolute risk reductions, and numbers needed to treat (NNTs), and for continuous outcomes, standardized mean differences (SMD). We considered a level of P less than .05 to be statistically significant.
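As a rough illustration of the dichotomous summary measures named above, the following Python sketch computes a relative risk with a 95% confidence interval, the absolute risk reduction, and the number needed to treat for a single trial. It is not the Review Manager algorithm: the function name, the log-based (Katz) interval, and the example counts are all assumptions made for illustration, and pooling across trials is not shown.

```python
import math


def dichotomous_summary(events_treated, n_treated, events_control, n_control, z=1.96):
    """RR with a log-based 95% CI, plus ARR and NNT, for one 2x2 table.

    Zero cells are rejected rather than corrected; a real analysis would
    need a continuity correction or another method for such tables.
    """
    if events_treated == 0 or events_control == 0:
        raise ValueError("zero event counts are not handled in this sketch")
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR), Katz method.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    ci_low = math.exp(math.log(rr) - z * se_log_rr)
    ci_high = math.exp(math.log(rr) + z * se_log_rr)
    arr = risk_control - risk_treated          # absolute risk reduction
    nnt = math.inf if arr == 0 else 1 / arr    # number needed to treat
    return {"rr": rr, "ci": (ci_low, ci_high), "arr": arr, "nnt": nnt}


# Hypothetical counts: 30 of 100 treated and 40 of 100 control patients
# still coughing at follow-up.
print(dichotomous_summary(30, 100, 40, 100))
```

With these made-up counts the interval spans 1, which is the same pattern as the nonsignificant pooled relative risks reported in the Results.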
Results
Included Studies
The major characteristics of the trials are shown in Table 1. We included 6 controlled trials comparing β2-agonists and placebo,19-24 and one trial comparing a β2-agonist with erythromycin.25 A trial comparing a β2-agonist with placebo in children26 was excluded because all participants had recurrent cough and the mean duration of cough (8 weeks) was much longer than the maximum of 30 days used in the other trials.
All trials enrolled patients who presented to primary care settings. The stated diagnoses were “acute bronchitis,”21,22,25 “acute cough,”19,20 and “acute transient cough.”23,24 Both trials in children excluded participants with abnormal lung examinations19 or “with bronchial obstruction needing bronchodilating medication.”23 None of the adult trials excluded patients with wheezing; the percentage with wheezing ranged from 20% to 44% in the 4 trials that mentioned it. All adult trials included both smokers and nonsmokers.
The only trial that mentioned how well patients adhered to study medications25 reported more than 95% compliance for both groups. Regarding co-interventions, 3 trials prohibited other antitussives19,23,24; 3 trials allowed them and recorded their use as an outcome20,21,25; and one trial did not mention co-interventions.22 One trial prohibited the use of antibiotics24; other trials comparing β2-agonists to placebo allowed the use of antibiotics at the discretion of the clinician (except as noted for the 1994 study by Hueston21). No trials were clearly sponsored by pharmaceutical manufacturers, but the medications were supplied free of charge by manufacturers in 3 studies.19,22,24
The quality of the trials varied from 2 to 4 on the Jadad scale (Table 1). The κ score for reviewers’ quality scores was 0.27, indicating only fair agreement. The majority of the disagreements related to different initial interpretations of the adequacy of blinding and description of withdrawals. These differences were resolved with further discussion.
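For readers unfamiliar with agreement statistics, the sketch below shows how a chance-corrected agreement score can be computed for two raters (Cohen's κ). The review does not say which κ variant was applied to its 3 graders, so the two-rater form here is an assumption, and the function name and scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical Jadad totals from two reviewers for 7 trials.
reviewer_1 = [2, 3, 3, 4, 2, 3, 4]
reviewer_2 = [2, 3, 4, 4, 3, 3, 2]
print(round(cohen_kappa(reviewer_1, reviewer_2), 2))
```

Under the commonly used Landis and Koch labels, values from 0.21 to 0.40 are described as fair agreement, which is the band the reported 0.27 falls in.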
Data Analysis
The clinical heterogeneity of the trials was so great that examining them as a single group did not seem reasonable. Therefore, we initially examined the trials as follows: (1) those in children, (2) those in adults comparing β2-agonists with placebo, and (3) those in adults comparing β2-agonists with erythromycin. We then combined the data from the trial that compared a β2-agonist with erythromycin with those from the other trials in adults in a secondary analysis.
Trials in Children
Neither trial involving children demonstrated any benefits from albuterol (Table 2). Combining the daily cough scores for days 1 to 3 for these trials revealed a trend toward worse scores in the group receiving albuterol (Table 3). The results from the 2 trials were homogeneous.
Trials in Adults Comparing β2-Agonists with Placebo
The results of the placebo-controlled trials in adults were mixed; one trial found no benefit from β2-agonists, and 3 found at least one benefit. Combining the daily cough severity scores for the 3 trials that included this outcome20,22,24 showed a small nonsignificant trend toward improvement on all days. The results from the individual trials were heterogeneous for day 1 and homogeneous for the other days.
Combining data from the trials that examined persistence of symptoms after a full 7 days of treatment20-22 yielded no significant difference in presence of cough or night cough (Table 4). Combined data also do not show a difference regarding the presence of a productive cough after 7 days or a difference regarding whether patients were working after 4 days. There was significant heterogeneity for 3 of the 4 dichotomous outcomes: cough, productive cough, and return to work.
Trials in Adults Comparing β2-Agonists with Erythromycin
In the 1991 Hueston study,25 patients given albuterol were less likely to have a cough or a productive cough after 7 days than those given erythromycin, but there were no differences in the presence of night cough after 7 days or in mean days until improvement in cough, well-being, or return to work or normal activities. When the data from this study are combined with those from the other adult trials, there are no significant differences regarding presence after 7 days of cough (RR=0.77; 95% confidence interval [CI], 0.54-1.09), productive cough (RR=0.66; 95% CI, 0.35-1.25), or night cough (RR=0.85; 95% CI, 0.57-1.26).
Adverse Effects
In the trials in children, 11% of the patients given albuterol had shaking or tremor versus 0% given placebo or only dextromethorphan (RR=6.76; 95% CI, 0.86-53.18; NNT=9; 95% CI, 5-100); the results were homogeneous. There were no differences regarding other adverse effects in the trials in children. In the adult trials, patients given β2-agonists were more likely to report tremor, shaking, or nervousness; the percentage of patients having these side effects in the 3 trials that reported specific side effects ranged from 35% to 67% versus control rates of 0% to 23% (RR=7.94; 95% CI, 1.17-53.94; NNT=2.3; 95% CI, 2-3). These data are from the trials that used inhaled fenoterol and oral albuterol.20,22,25 However, in the 1994 Hueston study,21 only 9% of the patients given inhaled albuterol reported any side effects; therefore, there is considerable heterogeneity among the results of the individual trials. There were no significant differences regarding other adverse effects between the β2-agonist group and control groups as a whole, but the trial comparing albuterol with erythromycin noted more gastrointestinal side effects in the erythromycin group (NNT=3; 95% CI, 2-8).
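The NNT of 9 reported for tremor in children can be checked by hand from the rounded percentages in the text. In the worked line below, the symbols p_albuterol and p_control are introduced only for illustration; because the outcome is an adverse effect, the risk difference is an increase, and the figure is what some authors call a number needed to harm.

```latex
\[
\text{risk difference} = p_{\text{albuterol}} - p_{\text{control}} = 0.11 - 0 = 0.11,
\qquad
\mathrm{NNT} = \frac{1}{\text{risk difference}} = \frac{1}{0.11} \approx 9
\]
```

Read this way, about 1 additional child reports shaking or tremor for every 9 given albuterol, although the wide confidence interval (5-100) shows how imprecise that estimate is.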
Subgroup Analyses
In the study by Melbye and colleagues,22 the subgroup of patients with evidence of airway obstruction (defined as wheezing on initial examination, a forced expiratory volume in 1 second of <80% predicted, or a positive response to a methacholine challenge test) who were given fenoterol had lower symptom scores beginning at day 2 than those in this subgroup who were given placebo. This was also true for the smaller subgroup that just had wheezing, but no difference was noted for patients with a normal lung examination. No other trial did a subgroup analysis limited to patients with evidence of airflow obstruction. The 1994 Hueston study21 reported that among patients given albuterol, those with wheezing were slightly less likely to be coughing after 7 days than those without wheezing, but the difference was not statistically significant.
Melbye and coworkers22 found that patients who smoked or had also received antibiotics had greater reductions in total symptom scores on day 7 if given fenoterol. Smokers had similar responses to nonsmokers in the studies by Hueston.21,25 Littenberg and colleagues20 found that patients given erythromycin trended toward lower cough severity scores if given albuterol instead of placebo, and patients not given erythromycin showed a trend toward higher scores if given albuterol. The 1994 Hueston study21 reported that the differences between the groups given and not given albuterol persisted after stratification by erythromycin use.
Discussion
The findings from our review do not support the routine use of β2-agonists for patients who do not have underlying pulmonary disease and present with an acute cough or acute bronchitis. These results must be interpreted in light of the patients who were enrolled in the trials. In particular, because the 2 trials in children excluded patients who were wheezing, the utility of β2-agonists in children with acute cough and evidence of airway obstruction is unknown. β2-agonists do lead to modest short-term improvements in clinical scores in children younger than 2 years who have bronchiolitis.27
The discordant results seen in the trials of adults may reflect different patient populations. Although the inclusion criteria were similar in these trials, more patients were wheezing on initial examination in the Hueston studies21,25 than in the studies by Littenberg and coworkers20 or Melbye and colleagues.22 Wheezing in unforced expiration is a specific finding for airflow obstruction28; therefore, more patients in the Hueston trials21,25 were likely to have had obstruction than in Littenberg and coworkers’ study20 (and because the lungs were auscultated in forced expiration in the latter trial, the actual number with airflow obstruction may have been even lower than indicated). The fact that only the subgroup with airway obstruction improved with β2-agonists in the trial by Melbye and colleagues22 reflects the possible importance of this baseline characteristic.
Limitations
Our review has some limitations. Although it includes all of the available data regarding the effectiveness of β2-agonists for patients with acute bronchitis or acute cough, the number of studies and total number of patients included are small. Therefore, our review has limited power to detect differences between patients who were and were not given β2-agonists. In the combined data of trials in adults, there was a trend toward improvements regarding cough, productive cough, night cough, and return to work, but these differences did not reach statistical significance. The midpoint estimates for the relative risk reductions range from 14% to 24% for these outcomes, but the confidence intervals for all of them include 0. There was also a clinically minor and statistically nonsignificant trend toward lower daily cough severity scores in patients randomized to the β2-agonists.
The studies were also all of short duration. There is no information as to whether treatment with β2-agonists would alter outcomes beyond 3 to 7 days. This is an important omission, because many patients in these studies were still bothered by symptoms at the end of the trials.
Only 2 studies evaluated inhaled β2-agonists, which would currently be the most likely formulation used in adults and older children. Neither of these studies used spacing devices. The delivery of the medicine may have been suboptimal and resulted in less benefit than might have been seen had spacers been used.
Overall, the quality of the trials was fair to good. There may have been additional biases, however, because most of the trials had unequal distribution of co-interventions and did not record compliance with study medications. Also, even though the studies were all double-blinded, the fact that the majority of the patients in one trial knew which study medication they had been given indicates that the blinding may not have been adequate in these studies because of the taste or side effects of the study medications.
Conclusions
Our review highlights the gaps in evidence regarding the utility of β2-agonists in the treatment of acute cough and acute bronchitis in patients without underlying pulmonary disease. Although there is a possibility that these agents may be useful, additional data demonstrating benefit are required before they can be routinely recommended. There is a particular need for identifying clinical characteristics that can predict which patients might benefit. For example, there is a complete lack of data in children older than 2 years who have signs of airway obstruction. More evidence on the risk-benefit ratio of β2-agonists in adults with clinical signs of airflow limitation is also necessary. Additional areas of useful research would be in evaluating long-acting β2-agonists (because of ease of adherence), in evaluating the benefits of inhaled β2-agonists with spacing devices, and in comparing β2-agonists with other symptomatic treatments.
Acknowledgments
We thank Bill Hueston, Ben Littenberg, Hasse Melbye, and Peter Rowe for providing unpublished information; Bill Grant for assistance with statistics; and Ron D’Souza and Steve MacDonald of the Cochrane Collaboration and Bette Jean Ingui for assistance with database searches.
References
1. Gonzales R, Steiner JF, Sande MA. Antibiotic prescribing for adults with colds, upper respiratory tract infections, and bronchitis by ambulatory care physicians. JAMA 1997;278:901-04.
2. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Treatment of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1998;46:469-75.
3. Smucny JJ, Becker LA, Glazier RH, McIsaac W. Are antibiotics effective treatment for acute bronchitis? A meta-analysis. J Fam Pract 1998;47:453-60.
4. Bent S, Saint S, Vittinghoff E, Grady D. Antibiotics in acute bronchitis: a meta-analysis. Am J Med 1999;107:62-67.
5. Hahn D, Dodge R, Golubjatnikov R. Association of Chlamydia pneumoniae (strain TWAR) infection with wheezing, asthmatic bronchitis, and adult-onset asthma. JAMA 1991;266:225-30.
6. Melbye H, Kongerud J, Vorland L. Reversible airflow limitation in adults with respiratory infection. Eur Resp J 1994;7:1239-45.
7. Williamson H. Pulmonary function tests in acute bronchitis: evidence for reversible airway obstruction. J Fam Pract 1987;25:251-56.
8. Johnston D, Osborn LM. Cough variant asthma: a review of the clinical literature. J Asthma 1991;28:85-90.
9. Ellul-Micallef R. Effect of terbutaline sulphate in chronic “allergic” cough. BMJ 1983;287:940-43.
10. Vesco D, Kleisbauer JP, Orehek J. Attenuation of bronchofiberoscopy-induced cough by an inhaled beta2-adrenergic agonist, fenoterol. Am Rev Resp Dis 1988;138:805-06.
11. Lui PW, Hsing CH, Chu YC. Terbutaline inhalation suppresses fentanyl-induced coughing. Can J Anaesth 1996;43:1216-19.
12. Mainous AG, Zoorob RJ, Hueston WJ. Current management of acute bronchitis in ambulatory care: the use of antibiotics and bronchodilators. Arch Fam Med 1996;5:79-83.
13. Stern RC. Bronchitis. In: Behrman RE, Kliegman RM, Arvin AM, Nelson WE, eds. Nelson textbook of pediatrics. 15th ed. Philadelphia, Pa: W.B. Saunders; 1996:1210.
14. Weller KA. Bronchitis. In: Rakel RE, ed. Saunders manual of medical practice. Philadelphia, Pa: W.B. Saunders; 1996:120-21.
15. Marrie TJ. Acute bronchitis and community-acquired pneumonia. In: Fishman AP, Elias JA, eds. Fishman’s pulmonary diseases and disorders. 3rd ed. New York, NY: McGraw-Hill; 1998:1985.
16. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
17. Hueston WJ, Mainous AG, Dacus EN, Hopper JE. Does acute bronchitis really exist? J Fam Pract 2000;49:401-06.
18. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clin Trials 1996;17:1-12.
19. Bernard DW, Goepp JG, Duggan AK, Serwint JR, Rowe PC. Is oral albuterol effective for acute cough in non-asthmatic children? Acta Paediatr 1999;88:465-67.
20. Littenberg B, Wheeler M, Smith D. A randomized controlled trial of oral albuterol in acute cough. J Fam Pract 1996;42:49-53.
21. Hueston W. Albuterol delivered by metered-dose inhaler to treat acute bronchitis: a placebo-controlled double-blind study. J Fam Pract 1994;39:437-40.
22. Melbye H, Aasebo U, Straume B. Symptomatic effect of inhaled fenoterol in acute bronchitis: a placebo-controlled double-blind study. Fam Pract 1991;8:216-22.
23. Korppi M, Pietikainen M, Laurikainen K, Silvasti M. Antitussives in the treatment of acute transient cough in children. Acta Paediatr Scand 1991;80:969-71.
24. Tukiainen J, Karttunen P, Silvasti M, et al. The treatment of acute transient cough: a placebo-controlled comparison of dextromethorphan and dextromethorphan-beta2-sympathomimetic combination. Eur J Resp Dis 1986;69:95-99.
25. Hueston W. A comparison of albuterol and erythromycin for the treatment of acute bronchitis. J Fam Pract 1991;33:476-80.
26. Chang AB, Phelan PD, Carlin JB, Sawyer SM, Robertson CF. A randomised, placebo controlled trial of inhaled salbutamol and beclomethasone for recurrent cough. Arch Dis Child 1998;79:6-11.
27. Kellner JD, Ohlsson A, Gadomski AM, Wang EEL. Efficacy of bronchodilator therapy in bronchiolitis. Arch Pediatr Adolesc Med 1996;150:1166-72.
28. Holleman DR, Jr, Simel DL. Does the clinical examination predict airflow limitation? JAMA 1995;273:313-19.
29. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press, 1977.
CONCLUSIONS: There is no evidence to support using b2-agonists in children with acute cough and no evidence of airflow obstruction. There is little evidence that the routine use of b2-agonists for adults with acute cough is helpful. These agents may reduce symptoms, including cough, in patients with evidence of airflow obstruction, but this potential benefit is not well-supported by the available data and must be weighed against the adverse effects associated with b2-agonists.
Acute bronchitis is characterized by cough associated with other symptoms of upper respiratory infection. Although this condition is self-limited, most patients feel ill, and many do not perform their usual activities. The optimal treatment for this common condition in patients who do not have underlying pulmonary disease is not clear. Clinicians often prescribe antibiotics,1,2 in spite of the fact that they are of little overall benefit.3,4 It is important to examine the effectiveness of alternative approaches.
b2-agonists have been proposed, because healthy patients have impaired airflow when infected with pathogens known to cause acute bronchitis.5-7 Also, cough is the primary symptom for some patients who have asthma,8 and many of these patients benefit from b2-agonists.9 b2-agonists are effective in reducing cough due to other causes, such as bronchoscopy10 and intravenous fentany1,11
We conducted this systematic review to determine whether b2-agonists are effective for patients who have acute bronchitis without underlying pulmonary disease. If b2-agonists are effective, then they should be more widely used; only a minority of US family physicians currently prescribe them for acute bronchitis.2,12
Methods
We attempted to locate all controlled trials that compared b2-agonists with placebo or an alternative treatment in patients older than 2 years who presented with acute bronchitis or acute cough without a clear etiology (eg, pneumonia, pertussis, or sinusitis). We included patients with acute cough, because the clinical definition of acute bronchitis is not standardized. Textbooks13-15 and clinician studies16,17 instruct that cough in association with an acute respiratory infection is required for a diagnosis; otherwise, there are differing criteria regarding the need for other symptoms and signs, such as dyspnea, abnormal chest findings, and sputum.
We searched MEDLINE (1966-2000), EMBASE (1974-2000), and The Cochrane Library (through August 2000) using the key words “bronchitis” or “cough”, together with the terms “adrenergic beta-agonist (exp),” “bronchodilator agents (exp),” “sympathomimetic (exp),” “albuterol,” “salbutamol,” “bitolterol,” “isoetharine,” “metaproterenol,” “pirbuterol,” “salmeterol,” “terbutaline,” “fenoterol,” “formoterol,” or “procaterol” (note that albuterol and salbutamol are the same compound). We also searched conference proceedings databases (Inside Conferences, 1993-99; Conference Papers Index, 1973-99); the System for Information on Grey Literature in Europe database (1980-2000); the reference lists of retrieved articles, review articles, and textbooks; and the Science Citation Index (1990-2000). Finally, we wrote to all US manufacturers of brand name b2-agonists. There were no language restrictions in our search.
Two investigators (C.F., J.S.) independently reviewed all the retrieved titles and abstracts. Studies selected by either investigator as possibly meeting the inclusion criteria were retrieved in their entirety. One investigator (J.S.) then deleted the journal of publication, title, authors, affiliations, and results sections of each study that met the inclusion criteria, and compiled a list of all the reported outcomes. The list of outcomes was forwarded to the other 3 investigators who independently, and then through discussion, determined which outcomes would be included in our review. The main criterion for selection was that the outcome should be directly important to patients. The same 3 investigators then graded the quality of each study using the 5-point Jadad scale, with points given for method of randomization (0-2), adequacy of blinding (0-2), and description of withdrawals (0-1).18 The Jadad scale is a validated, well-accepted, and frequently used quality assessment scale. Agreement on quality was assessed with a k score, and disagreements were resolved by discussion. Trials were excluded if all investigators agreed that the trial did not meet our inclusion criteria. The remaining articles in their entirety were then distributed to all investigators, each of whom independently extracted data for the selected outcomes. Disagreements were resolved by discussion. We attempted to contact authors to obtain missing data.
Summary statistics were calculated using Review Manager 4.1 software (Update Software, Oxford, England). We used fixed effects models for outcomes without statistically significant heterogeneity (at P <.10) and random effects models for outcomes with significant heterogeneity. For dichotomous outcomes, we reported relative risks (RRs), absolute risk reductions, and numbers needed to treat (NNTs), and for continuous outcomes, standardized mean differences (SMD). We considered a level of P less than .05 to be statistically significant.
Results
Included Studies
The major characteristics of the trials are shown in Table 1. We included 6 controlled trials comparing b2-agonists and placebo,19-24 and one trial comparing a b2-agonist with erythromycin.25 A trial comparing a b2-agonist with placebo in children26 was excluded because all participants had recurrent cough and the mean duration of cough (8 weeks) was much longer than the maximum of 30 days used in the other trials.
All trials enrolled patients that presented to primary care settings. The stated diagnoses were “acute bronchitis,”21,22,25 “acute cough,”19,20 and “acute transient cough.”23,24 Both trials in children excluded participants with abnormal lung examinations19 or “with bronchial obstruction needing bronchodilating medication.”23 None of the adult trials excluded patients with wheezing; the percentage with wheezing ranged from 20% to 44% in the 4 trials that mentioned it. All adult trials included both smokers an nonsmokers.
The only trial that mentioned how well patients adhered to study medications25 reported more than 95% compliance for both groups. Regarding co-interventions, 3 trials prohibited other antitussives19,23,24 ; 3 trials allowed them and recorded their use as an outcome20,21,25 ; and one trial did not mention co-interventions. 22 One trial prohibited the use of antibiotics24 ; other trials comparing b2-agonists to placebo allowed the use of antibiotics at the discretion of the clinician (except as noted for the 1994 study by Hueston21). No trials were clearly sponsored by pharmaceutical manufacturers, but the medications were supplied free of charge by manufacturers in 3 studies.19,22,24
The quality of the trials varied from 2 to 4 on the Jadad scale Table 1. The k score for reviewers’ quality scores was 0.27, indicating only fair agreement. The majority of the disagreements related to different initial interpretations of the adequacy of blinding and description of withdrawals. These differences were resolved with further discussion.
Data Analysis
The clinical heterogeneity of the trials was so great that examining them as a single group did not seem reasonable. Therefore, we initially examined the trials as follows: (1) those in children, (2) those in adults comparing b2-agonists with placebo, and (3) those in adults comparing b2-agonists with erythromycin. We then combined the data from the trial that compared a b2-agonist with erythromycin with that from the other trials in adults in a secondary analysis.
Trials in Children
Neither trial involving children demonstrated any benefits from albuterol Table 2. Combining the daily cough scores for days 1 to 3 for these trials revealed a trend toward worse scores in the group receiving albuterol Table 3. The results from the 2 trials were homogeneous.
Trials in Adults Comparing b2-agonists with Placebo
The results of the placebo-controlled trials in adults were mixed; one trial found no benefit from b2-agonists, and 3 found at least one benefit. Combining the daily cough severity scores for the 3 trials that included this outcome20,22,24showed a small nonsignificant trend toward improvement on all days. The results from the individual trials were heterogeneous for day 1 and homogeneous for the other days.
Combining data from the trials that examined persistence of symptoms after a full 7 days of treatment20-22 yielded no significant difference in presence of cough or night cough Table 4. Combined data also do not show a difference regarding the presence of a productive cough after 7 days or a difference regarding whether patients were working after 4 days. There was significant heterogeneity for 3 of the 4 dichotomous outcomes: cough, productive cough, and return to work.
Trials in Adults Comparing b2-agonists with Erythromycin
In the 1994 Hueston study,21 patients given albuterol were less likely to have a cough or a productive cough after 7 days than those given erythromycin, but there were no differences in the presence of night cough after 7 days or in mean days until improvement in cough, well-being, or return to work or normal activities. When the data from this study are combined with that from the other adult trials, there are no significant differences regarding presence after 7 days of cough (RR=0.77; 95% confidence interval [CI], 0.54-1.09), productive cough (RR=0.66; 95% CI, 0.35-1.25), or night cough (RR=0.85; 95% CI, 0.57-1.26).
Adverse Effects
In the trials in children, 11% of the patients given albuterol had shaking or tremor versus 0% given placebo or only dextromethorphan (RR=6.76; 95% CI, 0.86-53.18; NNT=9; 95% CI, 5-100); the results were homogeneous. There were no differences regarding other adverse effects in the trials in children. In the adult trials, patients given b2-agonists were more likely to report tremor, shaking, or nervousness; the percentage of patients having these side effects in the 3 trials that reported specific side effects ranged from 35% to 67% versus control rates of 0% to 23% (RR=7.94; 95% CI, 1.17-53.94; NNT=2.3; 95% CI, 2-3). These data are from the trials that used inhaled fenoterol and oral albuterol.20,22,25 However, in the 1991 Hueston study,25 only 9% of the patients given inhaled albuterol reported any side effects; therefore, there is considerable heterogeneity among the results of the individual trials. There were no significant differences regarding other adverse effects between the b2-agonist group and control groups as a whole, but the trial comparing albuterol with erythromycin noted more gastrointestinal side effects in the erythromycin group (NNT=3; 95% CI, 2-8).
Subgroup Analyses
In the study by Melbye and colleagues,22 the subgroup of patients with evidence of airway obstruction (defined as wheezing on initial examination, a forced expiratory volume in 1 second of <80% predicted, or a positive response to a methacholine challenge test) who were given fenoterol had lower symptom scores beginning at day 2 than those in this subgroup who were given placebo. This was also true for the smaller subgroup that just had wheezing, but no difference was noted for patients with a normal lung examination. No other trial did a subgroup analysis limited to patients with evidence of airflow obstruction. The 1994 Hueston study21 reported that among patients given albuterol, those with wheezing were slightly less likely to be coughing after 7 days than those without wheezing, but the difference was not statistically significant.
Melbye and coworkers22 found that patients who smoked or had also received antibiotics had greater reductions in total symptom scores on day 7 if given fenoterol. Smokers had similar responses to nonsmokers in the studies by Hueston.21,25 Littenberg and colleagues20 found that patients given erythromycin trended toward lower cough severity scores if given albuterol instead of placebo, and patients not given erythromycin showed a trend toward higher scores if given albuterol. The 1994 Hueston study21 reported that the differences between the groups given and not given albuterol persisted after stratification by erythromycin use.
Discussion
The findings from our review do not support the routine use of b2-agonists for patients who do not have underlying pulmonary disease and present with an acute cough or acute bronchitis. These results must be interpreted in light of the patients that were enrolled in the trials. In particular, because the 2 trials in children excluded patients who were wheezing, the utility of b2-agonists in children with acute cough and evidence of airway obstruction is unknown. b2-agonists do lead to modest short-term improvements in clinical scores in children younger than 2 years who have bronchiolitis.27
The discordant results seen in the trials of adults may reflect different patient populations. Although the inclusion criteria were similar in these trials, more patients were wheezing on initial examination in the Hueston studies21,25 than in the studies by Littenberg and coworkers20 or Melbye and colleagues.22 Wheezing in unforced expiration is a specific finding for airflow obstruction28; therefore, more patients in the Hueston trials21,25 were likely to have had obstruction than in Littenberg and coworkers’ study20 (and since the lungs were auscultated in forced expiration in the latter trial, the actual number with airflow obstruction may have been even lower than indicated). The fact that only the subgroup with airway obstruction improved with b2-agonists in the trial by Melbye and colleagues22 reflects the possible importance of this baseline characteristic.
Limitations
Our review has some limitations. Although it includes all of the available data regarding the effectiveness of b2-agonists for patients with acute bronchitis or acute cough, the number of studies and total number of patients included are small. Therefore, our review has limited power to detect differences between patients who were and were not given b2-agonists. In the combined data of trials in adults, there was a trend toward improvements regarding cough, productive cough, night cough, and return to work, but these differences did not reach statistical significance. The midpoint estimates for the relative risk reductions range from 14% to 24% for these outcomes, but the confidence intervals all include 0. There was also a clinically minor and statistically nonsignificant trend toward lower daily cough severity scores in patients randomized to the b2-agonists.
The studies were also all of a short duration. There is no information as to whether treatment with b2-agonists would alter outcomes beyond 3 to 7 days. This is an important omission, because many patients in these studies were still bothered by symptoms at the end of the trials.
Only 2 studies evaluated inhaled b2-agonists, which would currently be the most likely formulation used in adults and older children. Neither of these studies used spacing devices. The delivery of the medicine may have been suboptimal and resulted in less benefit than might have been seen had spacers been used.
Overall, the quality of the trials was fair to good. There may have been additional biases, however, because most of the trials had unequal distribution of co-interventions and did not record compliance with study medications. Also, even though the studies were all double-blinded, the fact that the majority of the patients in one trial knew which study medication they had been given indicates that the blinding may not have been adequate in these studies because of the taste or side effects of the study medications.
Conclusions
Our review highlights the gaps in evidence regarding the utility of b2-agonists in the treatment of acute cough and acute bronchitis in patients without underlying pulmonary disease. Although there is a possibility that these agents may be useful, additional data demonstrating benefit are required before they can be routinely recommended. There is a particular need for identifying clinical characteristics that can predict which patients might benefit. For example, there is a complete lack of data in children older than 2 years who have signs of airway obstruction. More evidence on the risk-benefit ratio of b2-agonists in adults with clinical signs of airflow limitation is also necessary. Additional areas of useful research would be in evaluating long-acting b2-agonists (because of ease of adherence), in evaluating the benefits of inhaled b2-agonists with spacing devices, and in comparing b2-agonists with other symptomatic treatments.
Acknowledgments
We thank Bill Hueston, Ben Littenberg, Hasse Melbye, and Peter Rowe for providing unpublished information; Bill Grant for assistance with statistics; and Ron D’Souza and Steve MacDonald of the Cochrane Collaboration and Bette Jean Ingui for assistance with database searches.
STUDY DESIGN: We performed a systematic review including meta-analysis.
DATA SOURCES: We included randomized controlled trials comparing b2-agonists with placebo or alternative therapies identified from the Cochrane Library, MEDLINE, EMBASE, conference proceedings, Science Citation Index, the System for Information on Grey Literature in Europe, and letters to manufacturers of b2-agonists.
OUTCOME MEASURED: We measured duration, persistence, severity or frequency of cough, productive cough, and night cough; duration of activity limitations; and adverse effects.
RESULTS: Two trials in children with cough and no obvious airway obstruction did not find any benefits from b2-agonists. Five trials in adults with cough and with or without airway obstruction had mixed results, but summary statistics did not reveal any significant benefits from b2-agonists. Studies that enrolled more wheezing patients were more likely to show benefits from b2-agonists, and in one study only patients with evidence of airflow limitation were more likely to benefit. Patients given b2-agonists were more likely to report tremor, shakiness, or nervousness than those in the control groups.
CONCLUSIONS: There is no evidence to support using b2-agonists in children with acute cough and no evidence of airflow obstruction. There is little evidence that the routine use of b2-agonists for adults with acute cough is helpful. These agents may reduce symptoms, including cough, in patients with evidence of airflow obstruction, but this potential benefit is not well-supported by the available data and must be weighed against the adverse effects associated with b2-agonists.
Acute bronchitis is characterized by cough associated with other symptoms of upper respiratory infection. Although this condition is self-limited, most patients feel ill, and many do not perform their usual activities. The optimal treatment for this common condition in patients who do not have underlying pulmonary disease is not clear. Clinicians often prescribe antibiotics,1,2 in spite of the fact that they are of little overall benefit.3,4 It is important to examine the effectiveness of alternative approaches.
b2-agonists have been proposed because healthy patients have impaired airflow when infected with pathogens known to cause acute bronchitis.5-7 Also, cough is the primary symptom for some patients who have asthma,8 and many of these patients benefit from b2-agonists.9 b2-agonists are effective in reducing cough due to other causes, such as bronchoscopy10 and intravenous fentanyl.11
We conducted this systematic review to determine whether b2-agonists are effective for patients who have acute bronchitis without underlying pulmonary disease. If b2-agonists are effective, then they should be more widely used; only a minority of US family physicians currently prescribe them for acute bronchitis.2,12
Methods
We attempted to locate all controlled trials that compared b2-agonists with placebo or an alternative treatment in patients older than 2 years who presented with acute bronchitis or acute cough without a clear etiology (eg, pneumonia, pertussis, or sinusitis). We included patients with acute cough, because the clinical definition of acute bronchitis is not standardized. Textbooks13-15 and clinician studies16,17 instruct that cough in association with an acute respiratory infection is required for a diagnosis; otherwise, there are differing criteria regarding the need for other symptoms and signs, such as dyspnea, abnormal chest findings, and sputum.
We searched MEDLINE (1966-2000), EMBASE (1974-2000), and The Cochrane Library (through August 2000) using the key words “bronchitis” or “cough”, together with the terms “adrenergic beta-agonist (exp),” “bronchodilator agents (exp),” “sympathomimetic (exp),” “albuterol,” “salbutamol,” “bitolterol,” “isoetharine,” “metaproterenol,” “pirbuterol,” “salmeterol,” “terbutaline,” “fenoterol,” “formoterol,” or “procaterol” (note that albuterol and salbutamol are the same compound). We also searched conference proceedings databases (Inside Conferences, 1993-99; Conference Papers Index, 1973-99); the System for Information on Grey Literature in Europe database (1980-2000); the reference lists of retrieved articles, review articles, and textbooks; and the Science Citation Index (1990-2000). Finally, we wrote to all US manufacturers of brand name b2-agonists. There were no language restrictions in our search.
Two investigators (C.F., J.S.) independently reviewed all the retrieved titles and abstracts. Studies selected by either investigator as possibly meeting the inclusion criteria were retrieved in their entirety. One investigator (J.S.) then deleted the journal of publication, title, authors, affiliations, and results sections of each study that met the inclusion criteria, and compiled a list of all the reported outcomes. The list of outcomes was forwarded to the other 3 investigators who independently, and then through discussion, determined which outcomes would be included in our review. The main criterion for selection was that the outcome should be directly important to patients. The same 3 investigators then graded the quality of each study using the 5-point Jadad scale, with points given for method of randomization (0-2), adequacy of blinding (0-2), and description of withdrawals (0-1).18 The Jadad scale is a validated, well-accepted, and frequently used quality assessment scale. Agreement on quality was assessed with a k score, and disagreements were resolved by discussion. Trials were excluded if all investigators agreed that the trial did not meet our inclusion criteria. The remaining articles in their entirety were then distributed to all investigators, each of whom independently extracted data for the selected outcomes. Disagreements were resolved by discussion. We attempted to contact authors to obtain missing data.
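As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of Cohen's k for two raters. The review does not state which variant was used for its 3 raters, so the two-rater form is an assumption made for simplicity, and the function name and the scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions per score.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical Jadad scores (0-5) from two reviewers for seven trials.
reviewer_1 = [2, 3, 4, 3, 2, 4, 3]
reviewer_2 = [2, 3, 3, 3, 4, 4, 2]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Values near 0 indicate agreement no better than chance and values near 1 indicate near-perfect agreement.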
Summary statistics were calculated using Review Manager 4.1 software (Update Software, Oxford, England). We used fixed effects models for outcomes without statistically significant heterogeneity (at P <.10) and random effects models for outcomes with significant heterogeneity. For dichotomous outcomes, we reported relative risks (RRs), absolute risk reductions, and numbers needed to treat (NNTs), and for continuous outcomes, standardized mean differences (SMD). We considered a level of P less than .05 to be statistically significant.
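The dichotomous summary measures named above can be sketched for a single trial as follows. This is an illustration under stated assumptions, not the Review Manager implementation: the function name and the counts are hypothetical, the confidence interval uses the standard log-scale approximation, and every cell is assumed to contain at least one event.

```python
import math


def risk_summary(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk with a 95% CI, absolute risk reduction, and NNT."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    rr = risk_trt / risk_ctl
    # Standard error of log(RR); requires nonzero event counts in both arms.
    se_log_rr = math.sqrt(
        1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    )
    ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))
    # Absolute risk reduction is control risk minus treatment risk.
    arr = risk_ctl - risk_trt
    nnt = 1 / arr if arr != 0 else math.inf
    return {"rr": rr, "rr_ci": ci, "arr": arr, "nnt": nnt}


# Hypothetical trial: 30 of 100 treated and 40 of 100 control patients
# are still coughing after 7 days.
result = risk_summary(30, 100, 40, 100)
print(result)
```

A confidence interval for the RR that includes 1 corresponds to a difference that is not statistically significant at the P<.05 level used in the review.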
Results
Included Studies
The major characteristics of the trials are shown in Table 1. We included 6 controlled trials comparing b2-agonists and placebo,19-24 and one trial comparing a b2-agonist with erythromycin.25 A trial comparing a b2-agonist with placebo in children26 was excluded because all participants had recurrent cough and the mean duration of cough (8 weeks) was much longer than the maximum of 30 days used in the other trials.
All trials enrolled patients who presented to primary care settings. The stated diagnoses were “acute bronchitis,”21,22,25 “acute cough,”19,20 and “acute transient cough.”23,24 Both trials in children excluded participants with abnormal lung examinations19 or “with bronchial obstruction needing bronchodilating medication.”23 None of the adult trials excluded patients with wheezing; the percentage with wheezing ranged from 20% to 44% in the 4 trials that mentioned it. All adult trials included both smokers and nonsmokers.
The only trial that mentioned how well patients adhered to study medications25 reported more than 95% compliance for both groups. Regarding co-interventions, 3 trials prohibited other antitussives19,23,24; 3 trials allowed them and recorded their use as an outcome20,21,25; and one trial did not mention co-interventions.22 One trial prohibited the use of antibiotics24; other trials comparing b2-agonists to placebo allowed the use of antibiotics at the discretion of the clinician (except as noted for the 1994 study by Hueston21). No trials were clearly sponsored by pharmaceutical manufacturers, but the medications were supplied free of charge by manufacturers in 3 studies.19,22,24
The quality of the trials varied from 2 to 4 on the Jadad scale (Table 1). The k score for reviewers’ quality scores was 0.27, indicating only fair agreement. The majority of the disagreements related to different initial interpretations of the adequacy of blinding and description of withdrawals. These differences were resolved with further discussion.
Data Analysis
The clinical heterogeneity of the trials was so great that examining them as a single group did not seem reasonable. Therefore, we initially examined the trials as follows: (1) those in children, (2) those in adults comparing b2-agonists with placebo, and (3) those in adults comparing b2-agonists with erythromycin. We then combined the data from the trial that compared a b2-agonist with erythromycin with that from the other trials in adults in a secondary analysis.
Trials in Children
Neither trial involving children demonstrated any benefits from albuterol (Table 2). Combining the daily cough scores for days 1 to 3 for these trials revealed a trend toward worse scores in the group receiving albuterol (Table 3). The results from the 2 trials were homogeneous.
Trials in Adults Comparing b2-agonists with Placebo
The results of the placebo-controlled trials in adults were mixed; one trial found no benefit from b2-agonists, and 3 found at least one benefit. Combining the daily cough severity scores for the 3 trials that included this outcome20,22,24 showed a small nonsignificant trend toward improvement on all days. The results from the individual trials were heterogeneous for day 1 and homogeneous for the other days.
Combining data from the trials that examined persistence of symptoms after a full 7 days of treatment20-22 yielded no significant difference in presence of cough or night cough (Table 4). Combined data also did not show a difference regarding the presence of a productive cough after 7 days or a difference regarding whether patients were working after 4 days. There was significant heterogeneity for 3 of the 4 dichotomous outcomes: cough, productive cough, and return to work.
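The choice between fixed and random effects models described in the Methods can be sketched as follows. The pooling method used inside Review Manager 4.1 is not described here, so the inverse-variance weights, Cochran's Q test, and the DerSimonian-Laird variance estimate below are assumptions swapped in for illustration; the function names and the trial values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2
    # Start from the closed form for df=1 (odd) or df=2 (even), then step up.
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def pool_log_rr(log_rrs, ses, alpha=0.10):
    """Pool log relative risks; switch to random effects if Q test P < alpha."""
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    df = len(log_rrs) - 1
    p = chi2_sf(q, df)
    if p >= alpha:
        return {"model": "fixed", "log_rr": fixed,
                "se": math.sqrt(1 / sum(w)), "p_heterogeneity": p}
    # DerSimonian-Laird estimate of the between-trial variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    return {"model": "random", "log_rr": pooled,
            "se": math.sqrt(1 / sum(w_re)), "p_heterogeneity": p}


# Hypothetical log relative risks and standard errors from three trials.
similar = pool_log_rr([-0.20, -0.25, -0.22], [0.20, 0.25, 0.30])
discordant = pool_log_rr([-1.0, 0.5, -0.1], [0.2, 0.2, 0.2])
print(similar["model"], discordant["model"])
```

With the random effects model, the between-trial variance is added to each trial's own variance, which widens the pooled confidence interval when the trials disagree.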
Trials in Adults Comparing b2-agonists with Erythromycin
In the 1994 Hueston study,21 patients given albuterol were less likely to have a cough or a productive cough after 7 days than those given erythromycin, but there were no differences in the presence of night cough after 7 days or in mean days until improvement in cough, well-being, or return to work or normal activities. When the data from this study are combined with those from the other adult trials, there are no significant differences regarding presence after 7 days of cough (RR=0.77; 95% confidence interval [CI], 0.54-1.09), productive cough (RR=0.66; 95% CI, 0.35-1.25), or night cough (RR=0.85; 95% CI, 0.57-1.26).
1. Gonzales R, Steiner JF, Sande MA. Antibiotic prescribing for adults with colds, upper respiratory tract infections, and bronchitis by ambulatory care physicians. JAMA 1997;278:901-04.
2. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Treatment of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1998;46:469-75.
3. Smucny JJ, Becker LA, Glazier RH, McIsaac W. Are antibiotics effective treatment for acute bronchitis? A meta-analysis. J Fam Pract 1998;47:453-60.
4. Bent S, Saint S, Vittinghoff E, Grady D. Antibiotics in acute bronchitis: a meta-analysis. Am J Med 1999;107:62-67.
5. Hahn D, Dodge R, Golubjatnikov R. Association of Chlamydia pneumoniae (strain TWAR) infection with wheezing, asthmatic bronchitis, and adult-onset asthma. JAMA 1991;266:225-30.
6. Melbye H, Kongerud J, Vorland L. Reversible airflow limitation in adults with respiratory infection. Eur Resp J 1994;7:1239-45.
7. Williamson H. Pulmonary function tests in acute bronchitis: evidence for reversible airway obstruction. J Fam Pract 1987;25:251-56.
8. Johnston D, Osborn LM. Cough variant asthma: a review of the clinical literature. J Asthma 1991;28:85-90.
9. Ellul-Micallef R. Effect of terbutaline sulphate in chronic “allergic” cough. BMJ 1983;287:940-43.
10. Vesco D, Kleisbauer JP, Orehek J. Attenuation of bronchofiberoscopy-induced cough by an inhaled beta2-adrenergic agonist, fenoterol. Am Rev Resp Dis 1988;138:805-06.
11. Lui PW, Hsing CH, Chu YC. Terbutaline inhalation suppresses fentanyl-induced coughing. Can J Anaesth 1996;43:1216-19.
12. Mainous AG, Zoorob RJ, Hueston WJ. Current management of acute bronchitis in ambulatory care: the use of antibiotics and bronchodilators. Arch Fam Med 1996;5:79-83.
13. Stern RC. Bronchitis. In: Behrman RE, Kliegman RM, Arvin AM, Nelson WE, eds. Nelson textbook of pediatrics. 15th ed. Philadelphia, Pa: W.B. Saunders; 1996:1210.
14. Weller KA. Bronchitis. In: Rakel RE, ed. Saunders manual of medical practice. Philadelphia, Pa: W.B. Saunders; 1996:120-21.
15. Marrie TJ. Acute bronchitis and community-acquired pneumonia. In: Fishman AP, Elias JA, eds. Fishman’s pulmonary diseases and disorders. 3rd ed. New York, NY: McGraw-Hill; 1998:1985.
16. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
17. Hueston WJ, Mainous AG, Dacus EN, Hopper JE. Does acute bronchitis really exist? J Fam Pract 2000;49:401-06.
18. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clin Trials 1996;17:1-12.
19. Bernard DW, Goepp JG, Duggan AK, Serwint JR, Rowe PC. Is oral albuterol effective for acute cough in non-asthmatic children? Acta Paediatr 1999;88:465-67.
20. Littenberg B, Wheeler M, Smith D. A randomized controlled trial of oral albuterol in acute cough. J Fam Pract 1996;42:49-53.
21. Hueston W. Albuterol delivered by metered-dose inhaler to treat acute bronchitis: a placebo-controlled double-blind study. J Fam Pract 1994;39:437-40.
22. Melbye H, Aasebo U, Straume B. Symptomatic effect of inhaled fenoterol in acute bronchitis: a placebo-controlled double-blind study. Fam Pract 1991;8:216-22.
23. Korppi M, Pietikainen M, Laurikainen K, Silvasti M. Antitussives in the treatment of acute transient cough in children. Acta Paediatr Scand 1991;80:969-71.
24. Tukiainen J, Karttunen P, Silvasti M, et al. The treatment of acute transient cough: a placebo-controlled comparison of dextromethorphan and dextromethorphan-beta2-sympathomimetic combination. Eur J Resp Dis 1986;69:95-99.
25. Hueston W. A comparison of albuterol and erythromycin for the treatment of acute bronchitis. J Fam Pract 1991;33:476-80.
26. Chang AB, Phelan PD, Carlin JB, Sawyer SM, Robertson CF. A randomised, placebo controlled trial of inhaled salbutamol and beclomethasone for recurrent cough. Arch Dis Child 1998;79:6-11.
27. Kellner JD, Ohlsson A, Gadomski AM, Wang EEL. Efficacy of bronchodilator therapy in bronchiolitis. Arch Pediatr Adolesc Med 1996;150:1166-72.
28. Holleman DR, Jr, Simel DL. Does the clinical examination predict airflow limitation? JAMA 1995;273:313-19.
29. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press, 1977.
1. Gonzales R, Steiner JF, Sande MA. Antibiotic prescribing for adults with colds, upper respiratory tract infections, and bronchitis by ambulatory care physicians. JAMA 1997;278:901-04.
2. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Treatment of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1998;46:469-75.
3. Smucny JJ, Becker LA, Glazier RH, McIsaac W. Are antibiotics effective treatment for acute bronchitis? A meta-analysis. J Fam Pract 1998;47:453-60.
4. Bent S, Saint S, Vittinghoff E, Grady D. Antibiotics in acute bronchitis: a meta-analysis. Am J Med 1999;107:62-67.
5. Hahn D, Dodge R, Golubjatnikov R. Association Chlamydia pneumoniae (strain TWAR) infection with wheezing, asthmatic bronchitis, and adult-onset asthma. JAMA 1991;266:225-30.
6. Melbye H, Kongerud J, Vorland L. Reversible airflow limitation in adults with respiratory infection. Eur Resp J 1994;7:1239-45.
7. Williamson H. Pulmonary function tests in acute bronchitis: evidence for reversible airway obstruction. J Fam Pract 1987;25:251-56.
8. Johnston D, Osborn LM. Cough variant asthma: a review of the clinical literature. J Asthma 1991;28:85-90.
9. Ellul-Micallef R. Effect of terbutaline sulphate in chronic “allergic” cough. BMJ 1983;287:940-43.
10. Vesco D, Kleisbauer JP, Orehek J. Attenuation of bronchofiberoscopy-induced cough by an inhaled beta2-adrenergic agonist, fenoterol. Am Rev Resp Dis 1988;138:805-06.
11. Lui PW, Hsing CH, Chu YC. Terbutaline inhalation suppresses fentanyl-induced coughing. Can J Anaesth 1996;43:1216-19.
12. Mainous AG, Zoorab RJ, Hueston WJ. Current management of acute bronchitis in ambulatory care: the use of antibiotics and bronchodilators. Arch Fam Med 1996;5:79-83.
13. Stern RC. Bronchitis. In: Berhman RE, Kliegman RM, Arvin AM, Nelson WE, eds. Nelson textbook of pediatrics. 15th ed. Philadelphia, Pa: W.B. Saunders; 1996;1210.
14. Weller KA. Bronchitis. In: Rakel RE, ed. Saunders manual of medical practice. Philadelphia, Pa: W.B. Saunders; 1996;120-21.
15. Marrie TJ. Acute bronchitis and community-acquired pneumonia. In: Fishman AP, Elias JA, eds. Fishman’s pulmonary diseases and disorders. 3rd ed. New York, NY: McGraw-Hill; 1998:1985.
16. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
17. Hueston WJ, Mainous AG, Dacus EN, Hopper JE. Does acute bronchitis really exist? J Fam Pract 2000;49:401-06.
18. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clin Trials 1996;17:1-12.
19. Bernard DW, Goepp JG, Duggan AK, Serwint JR, Rowe PC. Is oral albuterol effective for acute cough in non-asthmatic children? Acta Pediatr 1999;88:465-67.
20. Littenberg B, Wheeler M, Smith D. A randomized controlled trial of oral albuterol in acute cough. J Fam Pract 1996;42:49-53.
21. Hueston W. Albuterol delivered by metered-dose inhaler to treat acute bronchitis: a placebo-controlled double-blind study. J Fam Pract 1994;39:437-40.
22. Melbye H, Aasebo U, Straume B. Symptomatic effect of inhaled fenoterol in acute bronchitis: a placebo-controlled double-blind study. Fam Pract 1991;8:216-22.
23. Korppi M, Pietikainen M, Laurikainen K, Silvasti M. Antitussives in the treatment of acute transient cough in children. Acta Pediatr Scand 1991;80:969-71.
24. Tukiainen J, Karttunen P, Silvasti M, et al. The treatment of acute transient cough: a placebo-controlled comparison of dextromethorphan and dextromethorphan-beta2-sympathomimetic combination. Eur J Resp Dis 1986;69:95-99.
25. Hueston W. A comparison of albuterol and erythromycin for the treatment of acute bronchitis. J Fam Pract 1991;33:476-80.
26. Chang AB, Phelan PD, Carlin JB, Sawyer SM, Robertson CF. A randomised, placebo controlled trial of inhaled salbutamol and beclomethasone for recurrent cough. Arch Dis Child 1998;79:6-11.
27. Kellner JD, Ohlsson A, Gadomski AM, Wang EEL. Efficacy of bronchodilator therapy in bronchiolitis. Arch Pediatr Adolesc Med 1996;150:1166-72.
28. Holleman DR, Jr, Simel DL. Does the clinical examination predict airflow limitation? JAMA 1995;273:313-19.
29. Cohen J. Statistical power analysis for the behavioral sciences. New York, NY: Academic Press; 1977.
The Accuracy of Physical Diagnostic Tests for Assessing Meniscal Lesions of the Knee: A Meta-Analysis
SEARCH STRATEGY: We performed a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) with additional reference tracking.
SELECTION CRITERIA: Articles written in English, French, German, or Dutch that addressed the accuracy of at least one physical diagnostic test for meniscus injury with arthrotomy, arthroscopy, or magnetic resonance imaging as the gold standard were included.
DATA COLLECTION and ANALYSIS: Two reviewers independently selected studies, assessed the methodologic quality, and abstracted data using a standardized protocol.
MAIN RESULTS: Thirteen studies (of 402) met the inclusion criteria. The results of the index and reference tests were assessed independently (blindly) of each other in only 2 studies, and in all studies verification bias seemed to be present. The study results were highly heterogeneous. The summary receiver operating characteristic curves of the assessment of joint effusion, the McMurray test, and joint line tenderness indicated little discriminative power for these tests. Only the predictive value of a positive McMurray test was favorable.
CONCLUSIONS: The methodologic quality of studies addressing the diagnostic accuracy of meniscal tests was poor, and the results were highly heterogeneous. The poor characteristics indicate that these tests are of little value for clinical practice.
Various physical diagnostic tests are available to assess meniscal lesions, such as assessment of joint effusion and joint line tenderness (JLT), the McMurray test, and the Apley compression test.1-4 Many meniscal tests, however, are not easy to perform and seem to be prone to errors.1,2,4 Also, the diagnostic accuracy of the various meniscal tests has been questioned,3-5 and conflicting results regarding that accuracy have been reported.6 Therefore, we systematically reviewed the medical literature to summarize the available evidence about the diagnostic accuracy of physical diagnostic tests for assessing meniscal lesions of the knee and to combine the results of individual studies when possible. We focused on the most common meniscal tests: the assessment of joint effusion, the McMurray test, JLT, and the Apley compression test.
Methods
Selection of Studies
We conducted a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) to identify articles written in English, French, German, or Dutch. The Medical Subject Headings (MeSH) terms “knee injuries,” “knee joint,” “knee,” and “menisci tibial,” and the text words “knee” and “effusion” were used. The results of this strategy were combined with a validated search strategy for the identification of diagnostic studies using the MeSH terms “sensitivity and specificity” (exploded), “physical examination” and “not (animal not (human and animal))” and the text words “sensitivity,” “specificity,” “false positive,” “false negative,” “accuracy,” and “screening,”7 supplemented with the text words “physical examination” and “clinical examination.” Also, the cited references of relevant publications were examined.
Studies were eligible for inclusion if they addressed the accuracy of at least one physical diagnostic test for the assessment of meniscal lesions of the knee and used arthrotomy, arthroscopy, or magnetic resonance imaging (MRI) as the gold standard. Studies were excluded if no reference group (nondiseased group or subjects with lesions other than the lesion of study) had been included, if only test-positives had been included, if the study pertained to cadavers only, or if only physical examination under anesthesia was considered.
The studies were selected by 2 reviewers independently. A preliminary selection of each study was made by checking the title, the abstract, or both. A definite selection was made by reading the complete article. During a consensus meeting disagreements regarding the selection of studies were discussed, and a definite selection was made. If disagreement persisted, a third reviewer made the final decision.
Assessment of Methodologic Quality and Data Abstraction
The methodologic quality of the selected studies was assessed, and data were abstracted by 2 reviewers independently. A checklist adapted from Irwig and colleagues8 and the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests9 was used for quality assessment. This checklist consisted of 6 criteria for study validity, 5 criteria relevant to the clinical applicability of the results, and 5 items pertaining to the index test (Table W1, Table W1a).* In a subsequent consensus meeting, both assessors discussed each criterion on which they initially disagreed. If disagreement persisted, a third reviewer made the final decision.
Statistical Analysis
Statistical analysis was performed according to a strategy adapted from Midgette and colleagues10 (Figure W1).** For each study, the sensitivity and specificity of each index test were calculated. The χ2 test was used to assess the homogeneity of the sensitivity and the specificity among studies. If homogeneity of both sensitivity and specificity was not rejected (P >.10), summary estimates of sensitivity and specificity were calculated.10 Heterogeneity of sensitivity and specificity might be caused by differences between studies in how clinicians define a positive test result.8 In that case, the pairs of sensitivity and specificity will be negatively correlated, as indicated by a negative Spearman rank correlation coefficient (Rs). Negatively correlated pairs can be considered to originate from a common receiver operating characteristic (ROC) curve; for such pairs, a summary ROC (SROC) curve was estimated by meta-regression.8,10,11 The better the diagnostic accuracy of the test, the larger the area under the curve.
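The two steps described above (checking the rank correlation between sensitivity and specificity, then fitting a summary ROC curve) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the linear logit-difference model commonly attributed to Moses and colleagues (D = a + bS, fitted by unweighted least squares), and the study values, function names, and variable names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman Rs: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def sroc_fit(sensitivities, specificities):
    """Least-squares fit of D = a + b*S (assumed model), where
    D = logit(TPR) - logit(FPR) and S = logit(TPR) + logit(FPR)."""
    d = [logit(se) - logit(1.0 - sp) for se, sp in zip(sensitivities, specificities)]
    s = [logit(se) + logit(1.0 - sp) for se, sp in zip(sensitivities, specificities)]
    n = len(d)
    ms, md = sum(s) / n, sum(d) / n
    b = sum((si - ms) * (di - md) for si, di in zip(s, d)) / sum((si - ms) ** 2 for si in s)
    a = md - b * ms
    return a, b


def sroc_sensitivity(a, b, specificity):
    """Sensitivity on the fitted curve at a given specificity.
    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b)."""
    fpr = 1.0 - specificity
    logit_tpr = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-logit_tpr))


# Hypothetical per-study estimates, for illustration only.
study_sens = [0.30, 0.45, 0.60, 0.75]
study_spec = [0.90, 0.80, 0.70, 0.55]

rs = spearman(study_sens, study_spec)
a, b = sroc_fit(study_sens, study_spec)
print(f"Rs = {rs:.2f}, intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"Sensitivity on the curve at specificity 0.80: {sroc_sensitivity(a, b, 0.80):.2f}")
```

A negative Rs supports treating the studies as points on one curve; the fitted intercept and slope then define that curve, and `sroc_sensitivity` reads off the sensitivity that accompanies any chosen specificity.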
Differences between study characteristics are another potential source of heterogeneity of sensitivity and specificity.8 Those other sources of heterogeneity were assessed by adding the following characteristics to the meta-regression model: study validity items (most valid category of each item vs other categories), setting (primary care vs other), the spectrum of the diseased and the nondiseased (broad spectrum vs small spectrum), the prevalence of meniscal lesions, and the year of publication. When a significant subgroup was identified (P <.05), separate analyses were performed for each subgroup.
The summary estimates of sensitivity and specificity were used to calculate the predictive value of a positive (PV+) and negative (PV-) test result for circumstances with varying prevalences of meniscal lesions. When the sensitivities or specificities were heterogeneous between studies, however, the summary estimate of sensitivity was used for calculating predictive values with the accompanying specificity, estimated from the SROC curve.
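For reference, the predictive values follow from sensitivity, specificity, and prevalence by Bayes' theorem. The symbols Se (sensitivity), Sp (specificity), and p (prevalence of meniscal lesions) are notation introduced here for illustration; they do not appear in the original text.

```latex
\[
\mathrm{PV}^{+} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)},
\qquad
\mathrm{PV}^{-} = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
\]
```

A test adds diagnostic information only to the extent that PV+ exceeds p and PV- exceeds 1 - p.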
Results
Selection of Studies
The literature search revealed a total of 402 potentially eligible studies, of which 10 were selected for inclusion.12-21 Three other studies were found by reference tracking.22-24 Thus, 13 studies met the selection criteria. The authors' reply to a letter to the editor about one of the studies contained additional information and was also considered for analysis.17,25,26
Methodologic Quality and Study Characteristics
The index test and reference standard had been measured independently (blindly) of each other in only 2 studies.16,21 Verification bias seemed to be present in all studies (patients with an abnormal physical test result were more likely to undergo the gold standard test, inflating the sensitivity and decreasing the specificity). Nine studies applied arthroscopy as the gold standard,12-14,16,17,19-21,24 and 1 study used MRI.15 No study was performed in a primary care setting. In 7 studies a broad spectrum of knee lesions was reported,12-15,17,20,21 and in 4 studies the spectrum was not specified (Table 1).18,19,22,23 A broad spectrum of conditions in the reference group (nondiseased) was present in 8 studies,12-15,17,20,22,23 while in 4 studies the spectrum was not specified.18,19,21,24 Details regarding the index tests were poorly reported, except in 2 studies.17,21 In all studies that addressed the McMurray test, the experience of a “thud” or “click” was used for designating a test as positive.12,13,15-19,22 Only 2 studies mentioned assessment of the index test independent of knowledge of other clinical information (including the results of other meniscal tests)17,21 (Table W2).* The age and sex distribution of the patients and the duration of complaints are presented in Table 1.
Accuracy of Meniscal Tests
The accuracy of the assessment of joint effusion was determined in 4 studies, the McMurray test in 11, JLT in 10, the Apley compression test in 3, and 5 studies addressed various other tests. No data were presented in or could be derived from 1 study pertaining to joint effusion, 3 studies regarding the McMurray test,14,23,24 and 1 study on JLT,24 while from 1 study pertaining to both the McMurray test and JLT only the point estimates of the various test characteristics were reported without the original number of patients in the various categories.15 Of the study of Evans and coworkers,17,26 who presented data of an inexperienced and experienced researcher, only the latter results were used. Of the study of Abdon and colleagues,14 who made a distinction between tenderness of the medial and posterior part of the joint line, only the data of the medial part were considered. It should be noted that 2 studies incorporated a very small number of nondiseased subjects.23,24 Also, one of those studies presented results from individual knees instead of subjects.24 Part of their results pertained to both knees of the same subject, which violates the assumption of (statistical) independence of the observations. Therefore, this study was excluded from further analysis. Finally, some studies did not make a distinction between medial and lateral meniscal lesions,13,17,19,22,23 while others presented the results for medial and lateral meniscal lesions separately.12,14,15,18,20 Of the latter studies, only the results of medial meniscal tests were used for statistical analysis.
The diagnostic accuracy of assessment of joint effusion and the various meniscal tests is shown in Table 2. There was significant heterogeneity of sensitivity and specificity of all tests, except for specificity of the Apley compression test (P=.89).
Sensitivity and specificity were negatively correlated for joint effusion (Rs = -1.0), the McMurray test (Rs = -0.43), and JLT (Rs = -0.62). This means that as one increased, the other decreased, which is to be expected. The SROC curves (Figure 1) indicate little discriminative power of those meniscal tests. No significant subgroups were detected for both tests. The power of meta-regression analysis, however, was low because of the small number of available studies.
Sensitivity and specificity of the Apley compression test were not correlated (Rs = 0.0) and no SROC curve was estimated. Sources of heterogeneity could not be identified. Only 3 studies, however, addressed this test.
Figure 2 shows the positive predictive value (PV+) and negative predictive value (PV-) for the assessment of joint effusion, the McMurray test, and JLT, according to varying prevalences of meniscal lesions. The summary estimate of sensitivity and accompanying specificity (derived from the SROC curve) were used for joint effusion (0.43 and 0.70), the McMurray test (0.48 and 0.86), and JLT (0.77 and 0.41). Only the McMurray test had a favorable estimated PV+. The PV+ of joint effusion and JLT exceeded the presumed prevalences only slightly, indicating poor additional diagnostic value. The PV- of all tests was poor.
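The calculation behind these predictive values can be reproduced with a short sketch. The sensitivity and specificity pairs are the summary estimates quoted above; the prevalences are illustrative assumptions, not necessarily those plotted in Figure 2, and the function and variable names are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) for a test applied at a given prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Summary sensitivity and accompanying specificity reported in the text.
SUMMARY_ESTIMATES = {
    "joint effusion": (0.43, 0.70),
    "McMurray test": (0.48, 0.86),
    "JLT": (0.77, 0.41),
}

# Assumed prevalences of meniscal lesions, chosen only for illustration.
ILLUSTRATIVE_PREVALENCES = (0.10, 0.30, 0.50, 0.70)

for test_name, (se, sp) in SUMMARY_ESTIMATES.items():
    for p in ILLUSTRATIVE_PREVALENCES:
        ppv, npv = predictive_values(se, sp, p)
        print(f"{test_name:15s} prevalence={p:.2f}  PV+={ppv:.2f}  PV-={npv:.2f}")
```

At a prevalence of 0.50, for instance, this gives a PV+ of about 0.77 for the McMurray test but only about 0.57 for JLT, which is consistent with the pattern described above.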
Discussion
Our goal was to summarize the available evidence on the accuracy of various physical diagnostic tests for assessing meniscal lesions of the knee. The accuracy of those tests seems to be poor, and only a positive McMurray test result seems to be of some diagnostic significance.
However, because of the small number and poor quality of the studies found, we have significant concerns about the application of these results. Because of the methodologic flaws, the estimates of the various parameters of test accuracy probably will be biased, and the results of this meta-analysis should be interpreted with care. In view of the presence of review bias and verification bias in the various studies, the sensitivity of the various meniscal tests will be overestimated. The effect of those biases on specificity estimates, however, is less clear: Those specificities could be either overestimated or underestimated. Therefore, a rigorous conclusion regarding the diagnostic accuracy of the various meniscal tests cannot be made. Also, analysis of the influence of other potential sources of bias (like the type of gold standard, setting, and spectrum) was impeded by the low number of studies or the lack of information from studies.
The various physical diagnostic meniscal tests do not seem to be very helpful in guiding clinical decision making, and physicians should be aware of the very limited value of those tests. In the clinical determination of a meniscal lesion, however, meniscal tests are, of course, not applied in isolation. Combining the results of the various tests might improve the accuracy of diagnosing a meniscal lesion, and including other characteristics as well (eg, elements of history-taking) will further improve diagnosis. Those characteristics may even have more diagnostic power than the meniscal tests. Abdon and coworkers14 performed a discriminant analysis and addressed the McMurray test, JLT, and various other signs and symptoms jointly. Of the meniscal tests, only JLT resulted in some additional discriminative power (apart from various elements of history-taking). The results of their analysis, however, are not readily understandable, and the contribution of the individual items to improve the ability to diagnose meniscal lesions correctly remains obscure. Reanalysis of their results by multiple logistic regression might give results that are more directly applicable in clinical practice.
Because no study has been performed in primary care, and test characteristics are influenced by referral filters,27 one can only speculate what the effect will be of extrapolating the observed results to a primary care setting. If family physicians, who will be less experienced in performing those meniscal tests, apply a low threshold for interpreting a test result as positive, the sensitivity of those tests will be higher, but the specificity will be lower. The predictive value of a negative test result will be affected only slightly, but the predictive value of a positive test result will decrease. On the other hand, if family physicians apply a high threshold for test positivity, sensitivity decreases and specificity increases, resulting in an increased predictive value of a positive test result. Because of the case mix of patients with traumatic knee problems in primary care (ranging from vague minor knee disorders to clear-cut meniscal lesions), the prior probability (or prevalence) of having a meniscal lesion will be low in primary care, which means that the diagnostic gain will also be low (Figure 2).
Recommendations For Future Research
Methodologically sound research on the diagnostic accuracy of the various physical diagnostic tests (determined both for each test separately and for all tests jointly) in combination with patient characteristics (eg, age, physical fitness, and functional demands) and elements of the medical history (eg, the type of trauma and the nature of the complaints) is needed. Such research will be more relevant to clinical practice and patient care if the effect of a correct early diagnosis on the functional outcome of the patient is assessed as well.
Recommendations For Clinical Practice
For the time being, there is little evidence that the diagnosis of meniscal lesions of the knee can be improved by applying the assessment of joint effusion, the McMurray test, JLT, or the Apley compression test. The need for applying more advanced diagnostic methods (eg, MRI) or referral for surgical treatment can be based only on the severity of the patient’s complaints.
1. McMurray TP. The semilunar cartilages. Br J Surg 1942;29:407-14.
2. Apley AG. The diagnosis of meniscus injuries. J Bone Joint Surg 1947;29:78-84.
3. Nicholas JA, Hershman EB, eds. The lower extremity and spine in sports medicine. Vol 1. 2nd ed. St. Louis, Mo: Mosby; 1995:814-15.
4. Resnick D, ed. Diagnosis of bone and joint disorders. Vol 5. 3rd ed. Philadelphia, Pa: Saunders; 1995:3076.
5. Stratford PW, Binkley J. A review of the McMurray test: definition, interpretation, and clinical usefulness. J Orthop Sports Phys Ther 1995;22:116-20.
6. Plas CG van der, Dingjan RA, Hamel A, et al. [Dutch College of General Practitioners practice guidelines regarding traumatic knee problems]. [Dutch]. Huisarts en Wetenschap 1998;41:296-300.
7. Devillé WLJM, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65-69.
8. Irwig L, Macaskill P, Glaziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119-30.
9. Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: recommended methods. Updated June 6, 1996. Available at som.flinders.edu.au/fusa/cochrane/.
10. Midgette AS, Stukel TA, Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making 1993;13:253-57.
11. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12:1293-316.
12. Steinbruck K, Wiehmann JC. [Examination of the knee joint. The value of clinical findings in arthroscopic control]. [German]. Z Orthop Ihre Grenzgeb 1988;126:289-95.
13. Fowler PJ, Lubliner JA. The predictive value of five clinical signs in the evaluation of meniscal pathology. Arthroscopy 1989;5:184-86.
14. Abdon P, Lindstrand A, Thorngren KG. Statistical evaluation of the diagnostic criteria for meniscal tears. Int Orthop 1990;14:341-45.
15. Boeree NR, Ackroyd CE. Assessment of the menisci and cruciate ligaments: an audit of clinical practice. Injury 1991;22:291-94.
16. Saengnipanthkul S, Sirichativapee W, Kowsuwon W, Rojviroj S. The effects of medial patellar plica on clinical diagnosis of medial meniscal lesion. J Med Assoc Thai 1992;75:704-08.
17. Evans PJ, Bell GD, Frank C. Prospective evaluation of the McMurray test. Am J Sports Med 1993;21:604-08.
18. Corea JR, Moussa M, al Othman A. McMurray’s test tested. Knee Surg Sports Traumatol Arthroscop 1994;2:70-72.
19. Grifka J, Richter J, Gumtau M. [Clinical and sonographic meniscus diagnosis]. [German]. Orthopade 1994;23:102-11.
20. Shelbourne KD, Martini DJ, McCarroll JR, VanMeter CD. Correlation of joint line tenderness and meniscal lesions in patients with acute anterior cruciate ligament tears. Am J Sports Med 1995;23:166-69.
21. Mariani PP, Adriani E, Maresca G, Mazzola CG. A prospective evaluation of a test for lateral meniscus tears. Knee Surg Sports Traumatol Arthroscop 1996;4:22-26.
22. Noble J, Erat K. In defence of the meniscus: a prospective study of 200 meniscectomy patients. J Bone Joint Surg 1980;62-B:7-11.
23. Barry OCD, Smith H, McManus F, MacAuley P. Clinical assessment of suspected meniscal tears. Ir J Med Sci 1983;152:149-51.
24. Anderson AF, Lipscomb AB. Clinical diagnosis of meniscal tears: description of a new manipulative test. Am J Sports Med 1986;14:291-93.
25. Stratford PW. Prospective evaluation of the McMurray test. Am J Sports Med 1994;22:567-68.
26. Evans PJ. Authors’ response. Am J Sports Med 1994;22:568.
27. Knottnerus JA, Leffers P. The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol 1992;45:1143-54.
Because no study has been performed in primary care, and test characteristics are influenced by referral filters,27 one can only speculate what the effect will be of extrapolating the observed results to a primary care setting. If family physicians, who will be less experienced in performing those meniscal tests, apply as low a threshold for interpreting a test result as positive, the sensitivity of those tests will be higher, but the specificity will be lower. The predictive value of a negative test result will be affected only slightly, but the predictive value of a positive test result will decrease. On the other hand, when family physicians would apply a high threshold for test positivity, sensitivity decreases and specificity increases, resulting in an increased predictive value of a positive test result. Because of the case mix of patients with traumatic knee problems in primary care (ranging from vague minor knee disorders to clear-cut meniscal lesions), the prior probability (or prevalence) of having a meniscal lesion will be low in primary care, which means that the diagnostic gain will be low also Figure 2.
Recommendations For Future Research
Methodologically sound research on the diagnostic accuracy of the various physical diagnostic tests (determined both for each test separately and for all tests jointly) in combination with patient characteristics (eg, age, physical fitness, and functional demands) and elements of the medical history (eg, the type of trauma and the nature of the complaints) is needed. Such research will be more relevant to clinical practice and patient care if the effect of a correct early diagnosis on the functional outcome of the patient is assessed as well.
Recommendations For Clinical Practice
For the time being, there is little evidence that the diagnosis of meniscal lesions of the knee can be improved by applying the assessment of joint effusion, the McMurray test, JLT, or the Apley compression test. The need for applying more advanced diagnostic methods (eg, MRI) or referral for surgical treatment can be based only on the severity of the patient’s complaints.
SEARCH STRATEGY: We performed a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) with additional reference tracking.
SELECTION CRITERIA: Articles written in English, French, German, or Dutch that addressed the accuracy of at least one physical diagnostic test for meniscus injury with arthrotomy, arthroscopy, or magnetic resonance imaging as the gold standard were included.
DATA COLLECTION and ANALYSIS: Two reviewers independently selected studies, assessed the methodologic quality, and abstracted data using a standardized protocol.
MAIN RESULTS: Thirteen studies (of 402) met the inclusion criteria. The results of the index and reference tests were assessed independently (blindly) of each other in only 2 studies, and in all studies verification bias seemed to be present. The study results were highly heterogeneous. The summary receiver operating characteristic curves of the assessment of joint effusion, the McMurray test, and joint line tenderness indicated little discriminative power for these tests. Only the predictive value of a positive McMurray test was favorable.
CONCLUSIONS: The methodologic quality of studies addressing the diagnostic accuracy of meniscal tests was poor, and the results were highly heterogeneous. The poor characteristics indicate that these tests are of little value for clinical practice.
Various physical diagnostic tests are available to assess meniscal lesions, such as assessment of joint effusion and joint line tenderness (JLT), the McMurray test, and the Apley compression test.1-4 Many meniscal tests, however, are not easy to perform and seem to be prone to errors.1,2,4 Also, the diagnostic accuracy of the various meniscal tests has been questioned,3-5 and conflicting results regarding that accuracy have been reported.6 Therefore, we systematically reviewed the medical literature to summarize the available evidence about the diagnostic accuracy of physical diagnostic tests for assessing meniscal lesions of the knee and to combine the results of individual studies when possible. We focused on the most common meniscal tests: the assessment of joint effusion, the McMurray test, JLT, and the Apley compression test.
Methods
Selection of Studies
We conducted a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) to identify articles written in English, French, German, or Dutch. The Medical Subject Headings (MeSH) terms “knee injuries,” “knee joint,” “knee,” and “menisci tibial,” and the text words “knee” and “effusion” were used. The results of this strategy were combined with a validated search strategy for the identification of diagnostic studies using the MeSH terms “sensitivity and specificity” (exploded), “physical examination” and “not (animal not (human and animal))” and the text words “sensitivity,” “specificity,” “false positive,” “false negative,” “accuracy,” and “screening,”7 supplemented with the text words “physical examination” and “clinical examination.” Also, the cited references of relevant publications were examined.
Studies were eligible for inclusion if they addressed the accuracy of at least one physical diagnostic test for the assessment of meniscal lesions of the knee and used arthrotomy, arthroscopy, or magnetic resonance imaging (MRI) as the gold standard. Studies were excluded if no reference group (nondiseased group or subjects with lesions other than the lesion of study) had been included, if only test-positives had been included, if the study pertained to cadavers only, or if only physical examination under anesthesia was considered.
The studies were selected by 2 reviewers independently. A preliminary selection of each study was made by checking the title, the abstract, or both. A definite selection was made by reading the complete article. During a consensus meeting disagreements regarding the selection of studies were discussed, and a definite selection was made. If disagreement persisted, a third reviewer made the final decision.
Assessment of Methodologic Quality and Data Abstraction
The methodologic quality of the selected studies was assessed, and data were abstracted by 2 reviewers independently. A checklist adapted from Irwig and colleagues8 and the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests9 was used for quality assessment. This checklist consisted of 6 criteria for study validity, 5 criteria relevant to the clinical applicability of the results, and 5 items pertaining to the index test (Table w1, Table w1a).* In a subsequent consensus meeting, both assessors discussed each criterion on which they initially disagreed. If disagreement persisted, a third reviewer made the final decision.
Statistical Analysis
Statistical analysis was performed according to a strategy adapted from Midgette and colleagues10 (Figure W1).** For each study, the sensitivity and specificity of each index test were calculated. The χ2 test was used to assess the homogeneity of the sensitivity and the specificity among studies. If homogeneity of both sensitivity and specificity was not rejected (P >.10), summary estimates of sensitivity and specificity were calculated.10 Heterogeneity of sensitivity and specificity might be caused by differences between studies in how clinicians define a positive test result.8 In that case, the pairs of sensitivity and specificity will be negatively correlated, as indicated by a negative Spearman rank correlation coefficient (Rs). When the pairs of sensitivity and specificity were negatively correlated, these pairs were considered to originate from a common receiver operating characteristic (ROC) curve, and a summary ROC (SROC) curve was estimated by meta-regression.8,10,11 The better the diagnostic accuracy of the test, the larger the area under the curve.
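As an illustration of the approach described above, the following Python sketch computes the Spearman rank correlation between study-level sensitivities and specificities and fits a summary ROC curve. It assumes the unweighted linear model of Moses and colleagues11 (regressing D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR)); the text does not spell out the exact meta-regression model used, so this is one plausible reading rather than the authors' procedure. The function names and the study values are hypothetical.

```python
import math

def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman Rs: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def fit_sroc(sens, spec):
    """Unweighted least-squares fit of D = a + b * S across studies."""
    d = [logit(se) - logit(1 - sp) for se, sp in zip(sens, spec)]
    s = [logit(se) + logit(1 - sp) for se, sp in zip(sens, spec)]
    n = len(d)
    ms, md = sum(s) / n, sum(d) / n
    b = (sum((si - ms) * (di - md) for si, di in zip(s, d))
         / sum((si - ms) ** 2 for si in s))
    a = md - b * ms
    return a, b

def sroc_sensitivity(a, b, specificity):
    """Sensitivity on the fitted curve at a given specificity."""
    fpr_logit = logit(1 - specificity)
    tpr_logit = (a + (1 + b) * fpr_logit) / (1 - b)
    return 1 / (1 + math.exp(-tpr_logit))

# Hypothetical study-level estimates, NOT data from the review.
example_sens = [0.30, 0.45, 0.60, 0.75]
example_spec = [0.90, 0.80, 0.70, 0.55]

rs = spearman(example_sens, example_spec)
a, b = fit_sroc(example_sens, example_spec)
print("Rs =", round(rs, 2))
print("Sensitivity on SROC at specificity 0.80:",
      round(sroc_sensitivity(a, b, 0.80), 2))
```

Reading the accompanying specificity for a given summary sensitivity works the same way with the roles of the two quantities swapped.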
Differences between study characteristics are another potential source of heterogeneity of sensitivity and specificity.8 Those other sources of heterogeneity were assessed by adding the following characteristics to the meta-regression model: study validity items (most valid category of each item vs other categories), setting (primary care vs other), the spectrum of the diseased and the nondiseased (broad spectrum vs small spectrum), the prevalence of meniscal lesions, and the year of publication. When a significant subgroup was identified (P <.05), separate analyses were performed for each subgroup.
The summary estimates of sensitivity and specificity were used to calculate the predictive value of a positive (PV+) and negative (PV-) test result for circumstances with varying prevalences of meniscal lesions. When the sensitivities or specificities were heterogeneous between studies, however, the summary estimate of sensitivity was used for calculating predictive values with the accompanying specificity, estimated from the SROC curve.
Results
Selection of Studies
The literature search revealed a total of 402 potentially eligible studies, of which 10 were selected for inclusion.12-21 Three other studies were found by reference tracking.22-24 Thus, 13 studies met the selection criteria. The reply to a letter to the editor to one of the studies contained additional information and was also considered for analysis.17,25,26
Methodologic Quality and Study Characteristics
The index test and reference standard had been measured independently (blindly) of each other in only 2 studies.16,21 Verification bias seemed to be present in all studies (patients with an abnormal physical test result were more likely to undergo the gold standard test, inflating the sensitivity and decreasing the specificity). Nine studies applied arthroscopy as the gold standard,12-14,16,17,19-21,24 and 1 study used MRI.15 No study was performed in a primary care setting. In 7 studies a broad spectrum of knee lesions was reported,12-15,17,20,21 and in 4 studies the spectrum was not specified (Table 1).18,19,22,23 A broad spectrum of conditions in the reference group (nondiseased) was present in 8 studies,12-15,17,20,22,23 while in 4 studies the spectrum was not specified.18,19,21,24 Details regarding the index tests were poorly reported, except in 2 studies.17,21 In all studies that addressed the McMurray test, the experience of a “thud” or “click” was used for designating a test as positive.12,13,15-19,22 Only 2 studies mentioned assessment of the index test independent of knowledge of other clinical information (including the results of other meniscal tests) (Table w2).17,21* The age and sex distribution of the patients and the duration of complaints are presented in Table 1.
Accuracy of Meniscal Tests
The accuracy of the assessment of joint effusion was determined in 4 studies, the McMurray test in 11, JLT in 10, the Apley compression test in 3, and 5 studies addressed various other tests. No data were presented in or could be derived from 1 study pertaining to joint effusion, 3 studies regarding the McMurray test,14,23,24 and 1 study on JLT,24 while from 1 study pertaining to both the McMurray test and JLT only the point estimates of the various test characteristics were reported without the original number of patients in the various categories.15 Of the study of Evans and coworkers,17,26 who presented data of an inexperienced and experienced researcher, only the latter results were used. Of the study of Abdon and colleagues,14 who made a distinction between tenderness of the medial and posterior part of the joint line, only the data of the medial part were considered. It should be noted that 2 studies incorporated a very small number of nondiseased subjects.23,24 Also, one of those studies presented results from individual knees instead of subjects.24 Part of their results pertained to both knees of the same subject, which violates the assumption of (statistical) independence of the observations. Therefore, this study was excluded from further analysis. Finally, some studies did not make a distinction between medial and lateral meniscal lesions,13,17,19,22,23 while others presented the results for medial and lateral meniscal lesions separately.12,14,15,18,20 Of the latter studies, only the results of medial meniscal tests were used for statistical analysis.
The diagnostic accuracy of assessment of joint effusion and the various meniscal tests is shown in Table 2. There was significant heterogeneity of sensitivity and specificity of all tests, except for specificity of the Apley compression test (P=.89).
Sensitivity and specificity were negatively correlated for joint effusion (Rs = -1.0), the McMurray test (Rs = -0.43), and JLT (Rs = -0.62). This means that as one increased, the other decreased, which is to be expected. The SROC curves (Figure 1) indicate little discriminative power of those meniscal tests. No significant subgroups were detected for either test. The power of meta-regression analysis, however, was low because of the small number of available studies.
Sensitivity and specificity of the Apley compression test were not correlated (Rs = 0.0) and no SROC curve was estimated. Sources of heterogeneity could not be identified. Only 3 studies, however, addressed this test.
Figure 2 shows the positive predictive value (PV+) and negative predictive value (PV-) for the assessment of joint effusion, the McMurray test, and JLT, according to varying prevalences of meniscal lesions. The summary estimate of sensitivity and accompanying specificity (derived from the SROC curve) were used for joint effusion (0.43 and 0.70), the McMurray test (0.48 and 0.86), and JLT (0.77 and 0.41). Only the McMurray test had a favorable estimated PV+. The PV+ of joint effusion and JLT exceeded the presumed prevalences only slightly, indicating poor additional diagnostic value. The PV- of all tests was poor.
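The dependence of predictive values on prevalence can be sketched in Python using Bayes' theorem and the summary sensitivity and specificity pairs reported above. The prevalence values in the loop are assumptions chosen for illustration, not the values plotted in Figure 2, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    pv_pos = true_pos / (true_pos + false_pos)
    pv_neg = true_neg / (true_neg + false_neg)
    return pv_pos, pv_neg

# Summary (sensitivity, specificity) pairs as reported in the text.
SUMMARY_ESTIMATES = {
    "joint effusion": (0.43, 0.70),
    "McMurray test": (0.48, 0.86),
    "JLT": (0.77, 0.41),
}

# Assumed prevalences of meniscal lesions, for illustration only.
for prevalence in (0.10, 0.30, 0.50):
    for test_name, (sens, spec) in SUMMARY_ESTIMATES.items():
        pv_pos, pv_neg = predictive_values(sens, spec, prevalence)
        print(f"prevalence={prevalence:.2f} {test_name}: "
              f"PV+={pv_pos:.2f} PV-={pv_neg:.2f}")
```

Comparing each PV+ with the prevalence it was computed from shows how much a positive result raises the probability of a lesion beyond the prior probability.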
Discussion
Our goal was to summarize the available evidence on the accuracy of various physical diagnostic tests for assessing meniscal lesions of the knee. The accuracy of those tests seems to be poor, and only a positive McMurray test result seems to be of some diagnostic significance.
However, because of the small number and poor quality of the studies found, we have significant concerns about the application of these results. Because of the methodologic flaws, the estimates of the various parameters of test accuracy probably will be biased, and the results of this meta-analysis should be interpreted with care. In view of the presence of review bias and verification bias in the various studies, the sensitivity of the various meniscal tests will be overestimated. The effect of those biases on specificity estimates, however, is less clear: Those specificities could be either overestimated or underestimated. Therefore, a rigorous conclusion regarding the diagnostic accuracy of the various meniscal tests cannot be made. Also, analysis of the influence of other potential sources of bias (like the type of gold standard, setting, and spectrum) was impeded by the low number of studies or the lack of information from studies.
The various physical diagnostic meniscal tests do not seem to be very helpful in guiding clinical decision making, and physicians should be aware of the very limited value of those tests. In the clinical determination of a meniscal lesion, however, meniscal tests are, of course, not applied in isolation. Combining the results of the various tests might improve accurate diagnosis of a meniscal lesion, and including other characteristics as well (eg, elements of history-taking) may further improve the diagnosis. Those characteristics may even have more diagnostic power than the meniscal tests. Abdon and coworkers14 performed a discriminant analysis and addressed the McMurray test, JLT, and various other signs and symptoms jointly. Of the meniscal tests, only JLT resulted in some additional discriminative power (apart from various elements of history-taking). The results of their analysis, however, are not readily understandable, and the contribution of the individual items to improve the ability to diagnose meniscal lesions correctly remains obscure. Reanalysis of their results by multiple logistic regression might give results that are more directly applicable in clinical practice.
Because no study has been performed in primary care, and test characteristics are influenced by referral filters,27 one can only speculate what the effect will be of extrapolating the observed results to a primary care setting. If family physicians, who will be less experienced in performing those meniscal tests, apply a low threshold for interpreting a test result as positive, the sensitivity of those tests will be higher, but the specificity will be lower. The predictive value of a negative test result will be affected only slightly, but the predictive value of a positive test result will decrease. On the other hand, if family physicians apply a high threshold for test positivity, sensitivity decreases and specificity increases, resulting in an increased predictive value of a positive test result. Because of the case mix of patients with traumatic knee problems in primary care (ranging from vague minor knee disorders to clear-cut meniscal lesions), the prior probability (or prevalence) of having a meniscal lesion will be low in primary care, which means that the diagnostic gain will also be low (Figure 2).
Recommendations For Future Research
Methodologically sound research on the diagnostic accuracy of the various physical diagnostic tests (determined both for each test separately and for all tests jointly) in combination with patient characteristics (eg, age, physical fitness, and functional demands) and elements of the medical history (eg, the type of trauma and the nature of the complaints) is needed. Such research will be more relevant to clinical practice and patient care if the effect of a correct early diagnosis on the functional outcome of the patient is assessed as well.
Recommendations For Clinical Practice
For the time being, there is little evidence that the diagnosis of meniscal lesions of the knee can be improved by applying the assessment of joint effusion, the McMurray test, JLT, or the Apley compression test. The need for applying more advanced diagnostic methods (eg, MRI) or referral for surgical treatment can be based only on the severity of the patient’s complaints.
1. McMurray TP. The semilunar cartilages. Br J Surg 1942;29:407-14.
2. Apley AG. The diagnosis of meniscus injuries. J Bone Joint Surg 1947;29:78-84.
3. Nicholas JA, Hershman EB, eds. The lower extremity and spine in sports medicine. Vol 1. 2nd ed. St. Louis, Mo: Mosby; 1995;814-15.
4. Resnick D, ed. Diagnosis of bone and joint disorders. Vol 5. 3rd ed. Philadelphia, Pa: Saunders; 1995;3076.
5. Stratford PW, Binkley J. A review of the McMurray test: definition, interpretation, and clinical usefulness. J Orthop Sports Phys Ther 1995;22:116-20.
6. Plas CG van der, Dingjan RA, Hamel A, et al. [Dutch College of General Practitioners practice guidelines regarding traumatic knee problems]. [Dutch]. Huisarts en Wetenschap 1998;41:296-300.
7. Devillé WLJM, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65-69.
8. Irwig L, Macaskill P, Glaziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119-30.
9. Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: recommended methods updated June 6, 1996 Available at som.flinders.edu.au/fusa/cochrane/.
10. Midgette AS, Stukel TA, Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making 1993;13:253-57.
11. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12:1293-316.
12. Steinbruck K, Wiehmann JC. [Examination of the knee joint. The value of clinical findings in arthroscopic control]. [German]. Z Orthop Ihre Grenzgeb 1988;126:289-95.
13. Fowler PJ, Lubliner JA. The predictive value of five clinical signs in the evaluation of meniscal pathology. Arthroscopy 1989;5:184-86.
14. Abdon P, Lindstrand A, Thorngren KG. Statistical evaluation of the diagnostic criteria for meniscal tears. Int Orthop 1990;14:341-45.
15. Boeree NR, Ackroyd CE. Assessment of the menisci and cruciate ligaments: an audit of clinical practice. Injury 1991;22:291-94.
16. Saengnipanthkul S, Sirichativapee W, Kowsuwon W, Rojviroj S. The effects of medial patellar plica on clinical diagnosis of medial meniscal lesion. J Med Assoc Thai 1992;75:704-08.
17. Evans PJ, Bell GD, Frank C. Prospective evaluation of the McMurray test. Am J Sports Med 1993;21:604-08.
18. Corea JR, Moussa M, al Othman A. McMurray’s test tested. Knee Surg Sports Traumatol Arthroscop 1994;2:70-72.
19. Grifka J, Richter J, Gumtau M. [Clinical and sonographic meniscus diagnosis]. [German]. Orthopade 1994;23:102-11.
20. Shelbourne KD, Martini DJ, McCarroll JR, VanMeter CD. Correlation of joint line tenderness and meniscal lesions in patients with acute anterior cruciate ligament tears. Am J Sports Med 1995;23:166-69.
21. Mariani PP, Adriani E, Maresca G, Mazzola CG. A prospective evaluation of a test for lateral meniscus tears. Knee Surg Sports Traumatol Arthroscop 1996;4:22-26.
22. Noble J, Erat K. In defence of the meniscus: a prospective study of 200 meniscectomy patients. J Bone Joint Surg 1980;62-B:7-11.
23. Barry OCD, Smith H, McManus F, MacAuley P. Clinical assessment of suspected meniscal tears. Ir J Med Sci 1983;152:149-51.
24. Anderson AF, Lipscomb AB. Clinical diagnosis of meniscal tears: description of a new manipulative test. Am J Sports Med 1986;14:291-93.
25. Stratford PW. Prospective evaluation of the McMurray test. Am J Sports Med 1994;22:567-68.
26. Evans PJ. Authors’ response. Am J Sports Med 1994;22:568.
27. Knottnerus JA, Leffers P. The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol 1992;45:1143-54.
Clinical Findings Associated with Radiographic Pneumonia in Nursing Home Residents
STUDY DESIGN: This was a prospective cohort study.
POPULATION: The residents of 36 nursing homes in central Missouri and the St. Louis area with signs or symptoms suggesting a lower respiratory infection were included.
OUTCOME MEASURED: We compared evaluation findings by project nurses with findings reported from chest radiographs.
RESULTS: Among 2334 episodes of illness in 1474 nursing home residents, 45% of the radiograph reports suggested pneumonia (possible = 12%; probable or definite = 33%). In 80% of pneumonia episodes, subjects had 3 or fewer respiratory or general symptoms. Eight variables were significant independent predictors of pneumonia (increased pulse, respiratory rate ≥30, temperature ≥38°C, somnolence or decreased alertness, presence of acute confusion, lung crackles on auscultation, absence of wheezes, and increased white blood count). A simple score (range = -1 to 8) on the basis of these variables identified 33% of subjects (score ≥3) with more than 50% probability of pneumonia and an additional 24% (score of 2) with 44% probability of pneumonia.
CONCLUSIONS: Pneumonia in nursing home residents is usually associated with few symptoms. Nonetheless, a simple clinical prediction rule can identify residents at very high risk for pneumonia. If validated in other studies, physicians could consider treating such residents without obtaining a chest radiograph.
Pneumonia is a leading cause of morbidity, mortality, and hospitalization of nursing home residents.1-8 Atypical presentations and fewer presenting signs and symptoms in older patients complicate diagnosis.9,10 Also, clinician (physician, nurse practitioner, and physician assistant) visits to nursing homes are often sporadic, and radiology facilities are rarely on the premises. As a consequence, residents are commonly sent to emergency departments for evaluation,4,11,12 which undoubtedly contributes to a high hospitalization rate.
Clinicians who periodically see nursing home residents could benefit from a simple clinical tool to identify pneumonia. No large studies of community nursing home residents have systematically studied findings associated with pneumonia. As part of the Missouri LRI Project, we examined how well clinical findings predict radiographic pneumonia.
Methods
The Missouri LRI Project was a prospective observational study in 36 nursing homes in central Missouri and St. Louis designed to investigate predictors of 2 outcomes of lower respiratory infections (LRIs): mortality and functional decline. Potential cases were identified from August 15, 1995, through September 29, 1998; however, not all facilities were involved until fall 1997. Study facilities were similar in size, ownership, and occupancy to national estimates from the 1995 National Nursing Home Survey (data available on request).13
We trained nursing home staff to report ill residents with any of 6 respiratory symptoms (eg, cough, dyspnea, sputum production) or 6 general symptoms (eg, fever, decline in mobility, mental status changes). Project nurses called and visited facilities frequently to reinforce reporting. Under a physician-authorized protocol, ill residents with a possible LRI received a standardized evaluation by a trained project nurse and usually a chest radiograph, complete blood count, and a chemistry panel. Complete criteria for triggering an evaluation are listed in Table 1. For this paper, we were concerned with the 90% of evaluated residents who received a chest radiograph. Criteria for excluding residents from evaluation are summarized in Figure 1.
The nurse evaluation included an inventory of current symptoms, a review of important chronic conditions (eg, congestive heart failure), and a targeted physical examination. The examination included vital signs and the following body areas or systems: ears, nose, and throat; cardiac; abdominal; neurologic; extremities; skin; and a detailed lung examination. Most project nurses had advanced practice training; the remainder had extensive clinical experience and training in physical assessment. All received an individualized training session with a project geriatrician. Project nurses had substantially more experience than the nursing home staff, who usually report clinical findings to physicians.
Results of the evaluation were reported to the attending physician, who made all treatment decisions. Since the evaluations were clinically appropriate care authorized by individual attending physicians, the institutional review boards that reviewed the project allowed us to substantially simplify the consent process to a simple acceptance or refusal of the evaluation. In 9.2% of evaluations the resident was transferred to the hospital before project nurses could complete a physical assessment. In these instances, we obtained vital sign and clinical examination data from hospital records.
Radiographic Classification
Since all subjects had at least one illness symptom, for this analysis we classified the presence or absence of pneumonia on the basis of reported radiographic findings. Using defined criteria, 2 clinicians independently separated radiology reports into 3 categories: (a) negative, (b) possible, or (c) probable or definite for pneumonia (hereafter, probable pneumonia). For example, a report describing “new left lower lobe infiltrate suggestive of pneumonia” would have been rated as probable, while a report indicating “possible infiltrate” or “infiltrate suggestive of pneumonia or congestive heart failure” would have been rated as possible. As radiologists rarely provide completely unequivocal readings, we did not separate probable and definite pneumonia. In St. Louis 2 clinicians evaluated the reports, and in central Missouri 2 of 4 clinicians considered each report. Where there was disagreement, all 6 raters from the 2 sites independently reviewed the reports and then attempted to reach consensus. For 13% of radiographs, the project radiologist independently interpreted the actual films. This occurred when: (1) consensus could not be achieved; or (2) consensus was possible pneumonia, but probable pneumonia was needed to qualify the episode as an LRI under the project definition.
Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
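The imputation step described above can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and are not taken from the study data; the sketch only assumes that missing values are marked as `None`.

```python
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Replace None in a continuous variable with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_largest_category(values):
    """Replace None in a dichotomous variable with its most frequent observed category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical illustrative values, not study data.
pulse = [80, None, 100, 90]
crackles = [1, 0, None, 0, 0]
print(impute_mean(pulse))
print(impute_largest_category(crackles))
```

Single-value imputation of this kind keeps every episode in the model, at the cost of slightly understating variability in the imputed variables.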
Illness episodes were then randomly assigned to a two thirds model-development and a one third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rate above 140 was set equal to 140.
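As a rough illustration of two steps in this paragraph, the sketch below caps an extreme value and randomly splits episodes into two thirds and one third. The helper names, the seed, and the episode identifiers are assumptions made for the example; the study itself used SAS, and its exact randomization procedure is not described here.

```python
import random


def cap(value, upper):
    """Limit a value to an upper bound, eg, pulse rate above 140 is set equal to 140."""
    return min(value, upper)


def split_episodes(episode_ids, development_fraction=2 / 3, seed=0):
    """Randomly assign episodes to development and validation samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    shuffled = list(episode_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical episode identifiers, not study data.
development, validation = split_episodes(range(9))
print(len(development), len(validation))
print(cap(155, 140), cap(88, 140))
```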
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (α=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates among all possible pairs of episodes whether those with higher predicted risk are more likely to have the outcome (here, radiographic pneumonia). The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration (agreement between observed and predicted probability of pneumonia over the range of predicted risk), we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
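The pairwise definition of the c-statistic can be made concrete with a short sketch. It assumes the common convention that only pairs with one positive and one negative outcome are compared and that tied predictions count as half; the function name and sample values are hypothetical.

```python
def c_statistic(predicted, outcome):
    """Proportion of (positive, negative) pairs in which the positive has the higher predicted risk."""
    positives = [p for p, y in zip(predicted, outcome) if y == 1]
    negatives = [p for p, y in zip(predicted, outcome) if y == 0]
    concordant = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5  # ties count as half (assumed convention)
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted risks and outcomes, not study data.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

A value of 1.0 from this function corresponds to perfect separation and 0.5 to chance-level discrimination.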
Results
Project nurses performed 2592 evaluations. In 90% (2337), residents received chest x-rays either in the nursing home or on hospital transfer. In 3 additional cases crucial information was missing from nursing home records. This left for final analysis 2334 episodes in 1474 individuals (Figure 1).
Fifty-five percent of radiographs were interpreted as negative, 12% showed possible pneumonia, and 33% showed probable pneumonia. Most nursing home residents with pneumonia had few presenting symptoms; 80% had 3 or fewer respiratory or general symptoms. However, only 7.5% of subjects evaluated had no respiratory symptoms. Table 2 shows the relationship of selected variables to radiographic findings of absent, possible, or probable pneumonia. Though a few signs and symptoms are more common in those with positive (possible or probable pneumonia) than negative chest x-ray results, most did not discriminate at all. Fever (temperature ≥38°C) was present in 44.4% of positives but only 28.5% of negatives (P=.001).
Multivariable Analysis and Prediction Score
Our GEE model to predict radiographic pneumonia includes 3 vital sign abnormalities (fever, rapid pulse, and rapid respiratory rate), 2 lung findings (presence of crackles and absence of wheezes), 2 potential indicators of delirium (somnolence or decreased alertness and acute confusion), and elevated white blood count. Table 3 reports GEE estimates for the entire sample. Though only exhibiting fair overall performance, the model did well at distinguishing subjects with a high probability of pneumonia. In the 20% of subjects with the highest predicted risks, more than two thirds had pneumonia.
For the full range of values, the model derived on the development sample showed a c-statistic of 0.672, which reduced to 0.632 in the validation sample. A value of 1.0 would indicate perfect discrimination between those who did and did not have radiographic pneumonia, while a value of 0.5 would indicate no better than chance discrimination. Model calibration was not acceptable in the validation sample (Hosmer-Lemeshow goodness-of-fit statistic, P=.008). Inspection suggested the disagreement between predicted and observed probability of pneumonia was primarily with lower-risk estimates.
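For readers unfamiliar with the calibration test, the sketch below computes a Hosmer-Lemeshow-type statistic by grouping episodes on predicted risk and comparing observed with expected counts. It is an illustration under stated assumptions, not the SAS procedure used in the study: the grouping rule, function name, and sample values are hypothetical, and the P value (usually taken from a chi-square distribution with groups minus 2 degrees of freedom) is not computed.

```python
def hosmer_lemeshow(predicted, outcome, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p))."""
    # Sort by predicted risk only; Python's sort is stable, so ties keep input order.
    pairs = sorted(zip(predicted, outcome), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of positives
        observed = sum(y for _, y in chunk)  # observed number of positives
        mean_p = expected / size
        # Assumes 0 < mean_p < 1 in every group; otherwise the term is undefined.
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic


# Hypothetical predictions that are badly calibrated in both groups.
print(hosmer_lemeshow([0.25, 0.25, 0.75, 0.75], [1, 1, 0, 0], groups=2))
```

Larger values indicate greater disagreement between predicted and observed frequencies.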
Because the model performed relatively well at distinguishing subjects very likely to have pneumonia, we created a simple point system aimed at identifying such high-risk individuals. Table 4 shows the scoring system. For 33% of subjects (score ≥3), there was a 56% or higher probability of radiographic pneumonia. An additional 24% of subjects (score of 2) had 44% probability of radiographic pneumonia. However, even those with the lowest scores (-1 to 0, 15% of subjects) still had a 24% probability of pneumonia. The relationship between the score and the probability of radiographic evidence of pneumonia is shown in Figure W1.*
Discussion
In a large community-based sample, we considered presenting symptoms, signs, and laboratory findings associated with radiographic pneumonia. Individual findings discriminated poorly, and we could not separate out a very-low-risk group. However, our simple scoring system identified approximately one third to slightly more than one half of subjects as having a high probability of pneumonia; these are individuals who might be treated without a confirmatory chest x-ray. If our data are confirmed, they suggest a simple clinical strategy in patients with respiratory or general symptoms (Table 1) that might suggest pneumonia: (1) if there are no respiratory symptoms, consider other conditions, such as a urinary tract infection, that might fully explain the symptoms; (2) obtain information to apply our symptom score (Table 4); (3) for those with scores of 2 or higher (some might choose 3 instead), treat for pneumonia; (4) for those with scores of -1, 0, or 1, obtain a chest radiograph as a guide to treatment.
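The proposed strategy can be written out as a small decision sketch. It is illustrative only and assumes the score has already been computed from Table 4 (the point values are not reproduced here); the function name, the wording of the returned steps, and the `treat_threshold` parameter are hypothetical, and the score range of -1 to 8 is taken from the abstract.

```python
def suggested_steps(score, has_respiratory_symptoms, treat_threshold=2):
    """Return the steps of the proposed strategy for one episode of illness."""
    if not -1 <= score <= 8:
        raise ValueError("score is expected to lie between -1 and 8")
    steps = []
    if not has_respiratory_symptoms:
        # Step 1: look for another condition that might fully explain the symptoms.
        steps.append("consider other conditions (eg, urinary tract infection)")
    if score >= treat_threshold:
        # Step 3: high probability of radiographic pneumonia.
        steps.append("treat for pneumonia")
    else:
        # Step 4: low scores do not rule out pneumonia.
        steps.append("obtain chest radiograph")
    return steps


print(suggested_steps(3, True))
print(suggested_steps(2, True, treat_threshold=3))
```

Setting `treat_threshold=3` reflects the more conservative cutoff that some clinicians might prefer.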
Considering individual findings, fever was significantly more common in pneumonia, but only 43% of those with possible or probable pneumonia had a temperature of at least 38°C. This reaffirms common wisdom and previous findings that fever is frequently absent in elderly people with pneumonia.9,19 We also confirmed that few signs or symptoms are the norm for nursing home-acquired pneumonia.
Chest examination findings also do not adequately distinguish patients with and without pneumonia (Table 2). Also, even expert physicians frequently differ on lung examination findings.20 Nonetheless, presence of crackles and absence of wheezing contribute to our scoring system. Both findings are seen with multiple conditions, but in our data crackles are slightly more associated with pneumonia, while wheezing is more strongly associated with other diseases.
The other components of our scoring system are clinical factors commonly associated with pneumonia. Though none individually discriminates well between those with and without pneumonia (Table 2), several combined serve to identify a high-risk group.
Four previous studies from emergency department or outpatient settings developed clinical prediction rules to identify pneumonia.21-24 Criteria for identifying subjects varied substantially, and each rule has limited accuracy in predicting radiographic pneumonia.20 We had adequate data to evaluate 3 of the rules.21-23 As is usually the case when transporting a prediction rule to a new sample, none performed any better than our rule (data not shown). Our sample created the very difficult challenge for any prediction rule of a very high overall prevalence of pneumonia (45%). That made it unlikely that we could identify a low-risk group in whom x-ray studies could be readily forgone, but we were able to identify a high-risk group.
Limitations
Our findings are subject to several limitations. All facilities in our study were located in central or eastern Missouri, and not all physicians or eligible residents in those facilities participated. Compared with national data, we studied an unusually representative sample of nursing home residents from 36 facilities, including rural and urban locations. Also, in episodes excluded because of physician nonparticipation, residents were very similar to included residents in age, vital signs, and presenting symptoms (data available on request). More important, we lack an independent validation sample from a different cohort. Clinical prediction rules usually do not perform as well in independent samples. This is exemplified by the poor performance of the 3 rules we considered from other settings. Overall, our logistic model was only modest in discriminating and was not well calibrated for low-risk episodes in our reserved validation sample. Although we have developed a promising scoring system to identify residents with high probability of radiographic pneumonia, it needs to be validated in other samples of nursing home residents to determine its ultimate usefulness. For all these reasons, our results may not generalize.
Also, although we identified residents prospectively, project nurses were unable to evaluate 9.2% of residents before transfer to a hospital. Clinical findings abstracted from medical records, such as lung findings, may not have been complete. It is also possible that project nurses could have missed some important findings. However, our staff provided a higher level of expertise than is typically available in nursing homes. In fact, this may limit application of our findings. Nursing home staff vary widely in their ability to accurately examine residents or even identify illness. In many instances, facility staff had not obtained vital signs at the point when we identified a resident as ill enough to qualify for an evaluation.25 Therefore, in many nursing homes, physicians may lack confidence to apply our rule without an evaluation by a physician, advanced practice nurse, or physician assistant.
Finally, determining whether subjects had pneumonia primarily depended on our classification of radiographic reports. Though radiographs generally included 2 views, many were portable films of variable quality, and frequently there was no previous radiograph for comparison. In some subjects with pneumonia, radiographic infiltrates might not yet have developed. Also, even under ideal conditions, radiologists commonly disagree on the presence of pneumonia.26 Some subjects may have been misclassified. However, unless radiographic technique or interpretation was specifically related to clinical predictors, misclassification would simply diminish the relationship of predictors to pneumonia rather than creating a bias. We reviewed reports rather than radiographs, because that is the information usually available to clinicians faced with diagnosis and treatment decisions. We also paid special attention to avoiding any bias in the interpretations. All data were recorded before interpreting radiology reports and the interpretations were performed independent of clinical data. We also made special efforts to assure consistency in labeling radiology reports as possible, probable, or negative for pneumonia. When lack of agreement persisted, the study radiologist reinterpreted the actual films.
Conclusions
Most nursing home residents with pneumonia have few symptoms. We created a simple scoring system to identify nursing home residents who have a high probability of radiographic pneumonia. If our results are confirmed, physicians might consider initiating treatment without an x-ray in such residents. Low scores do not rule out pneumonia, and most physicians would want to press for further diagnosis or treatment in this group.
Acknowledgments
This study was supported by the Agency for Healthcare Research and Quality (grant HS08551) and Dr Mehr’s Robert Wood Johnson Foundation Generalist Physician Faculty Scholars award. Dr Kruse was partially supported by an Institutional National Research Service Award (PE10038) from the Health Resources and Services Administration. Our project would not have been possible without the support of the many attending physicians, administrators, and staff of the involved nursing homes. Dr Clive Levine re-read more than 200 radiographs; Karen Davenport provided crucial administrative support; and Karen Madrone, MPA, assisted with manuscript preparation. Many other unnamed project staff also contributed.
1. Irvine PW, Van Buren N, Crossley K. Causes for hospitalization of nursing home residents: the role of infection. J Am Geriatr Soc 1984;32:103-07.
2. Murtaugh CM, Freiman MP. Nursing home residents at risk of hospitalization and the characteristics of their hospital stays. Gerontologist 1995;35:35-43.
3. Jackson MM, Fierer J, Barrett-Connor E, et al. Intensive surveillance for infections in a three-year study of nursing home patients. Am J Epidemiol 1992;135:685-96.
4. Brooks S, Warshaw G, Hasse L, Kues JR. The physician decision-making process in transferring nursing home patients to the hospital. Arch Intern Med 1994;154:902-08.
5. Fried TR, Gillick MR, Lipsitz LA. Whether to transfer? Factors associated with hospitalization and outcome of elderly long-term care patients with pneumonia. J Gen Intern Med 1995;10:246-50.
6. Degelau J, Guay D, Straub K, Luxenberg MG. Effectiveness of oral antibiotic treatment in nursing home-acquired pneumonia. J Am Geriatr Soc 1995;43:245-51.
7. Muder RR, Brennen C, Swenson DL, Wagener M. Pneumonia in a long-term care facility: a prospective study of outcome. Arch Intern Med 1996;156:2365-70.
8. Medina-Walpole AM, Katz PR. Nursing home-acquired pneumonia. J Am Geriatr Soc 1999;47:1005-15.
9. Harper C, Newton P. Clinical aspects of pneumonia in the elderly veteran. J Am Geriatr Soc 1989;37:867-72.
10. Metlay JP, Schulz R, Li YH, Singer DE, Marrie TJ, Coley CM, et al. Influence of age on symptoms at presentation in patients with community-acquired pneumonia. Arch Intern Med 1997;157:1453-59.
11. Kayser-Jones JS, Wiener CL, Barbaccia JC. Factors contributing to the hospitalization of nursing home residents. Gerontologist 1989;29:502-10.
12. Scott HD, Logan M, Waters WJ, Jr, et al. Medical practice variation in the management of acute medical events in nursing homes: a pilot study. R I Med J 1988;71:69-74.
13. Gabrel CS, Jones A. The National Nursing Home Survey: 1997 summary. Vital Health Stat-series 13: data from the National Health Survey 2000;147:1-121.
14. Harrell FE, Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
15. Preisser JS, Koch GG. Categorical data analysis in public health. Annu Rev Public Health 1997;18:51-82.
16. SAS Institute Inc. The SAS System for Windows. Version 6.1. Cary, NC: SAS Institute, Inc; 1996.
17. D’Agostino RB, Sr, Griffith JL, Schmid CH, Terrin N. Measures for evaluating model performance. In: Proceedings of the biometrics section, 1997. Alexandria, Va: American Statistical Association. Biometrics section; 1998;253-58.
18. Hosmer DW Jr, Lemeshow S. Applied logistic regression. New York, NY: Wiley; 1989.
19. Marrie TJ, Haldane EV, Faulkner RS, Durant H, Kwan C. Community-acquired pneumonia requiring hospitalization: is it different in the elderly? J Am Geriatr Soc 1985;33:671-80.
20. Metlay JP, Kapoor WN, Fine MJ. Does this patient have community-acquired pneumonia? Diagnosing pneumonia by history and physical examination. JAMA 1997;278:1440-45.
21. Heckerling PS, Tape TG, Wigton RS, et al. Clinical prediction rule for pulmonary infiltrates. Ann Intern Med 1990;113:664-70.
22. Singal BM, Hedges JR, Radack KL. Decision rules and clinical prediction of pneumonia: evaluation of low-yield criteria. Ann Emerg Med 1989;18:13-20.
23. Gennis P, Gallagher J, Falvo C, Baker S, Than W. Clinical criteria for the detection of pneumonia in adults: guidelines for ordering chest roentgenograms in the emergency department. J Emerg Med 1989;7:263-68.
24. Diehr P, Wood RW, Bushyhead J, Krueger L, Wolcott B, Tompkins RK. Prediction of pneumonia in outpatients with acute cough—a statistical approach. J Chronic Dis 1984;37:215.-
25. Barry CR, Brown K, Esker D, Denning MD, Kruse RL, Binder EF. Nursing assessment of ill nursing home residents. In press.
26. Albaum MN, Hill LC, Murphy M, et al. Interobserver reliability of the chest radiograph in community-acquired pneumonia: PORT Investigators. Chest 1996;110:343-50.
STUDY DESIGN: This was a prospective cohort study.
POPULATION: The residents of 36 nursing homes in central Missouri and the St. Louis area with signs or symptoms suggesting a lower respiratory infection were included.
OUTCOME MEASURED: We compared evaluation findings by project nurses with findings reported from chest radiographs.
RESULTS: Among 2334 episodes of illness in 1474 nursing home residents, 45% of the radiograph reports suggested pneumonia (possible = 12%; probable or definite = 33%). In 80% of pneumonia episodes, subjects had 3 or fewer respiratory or general symptoms. Eight variables were significant independent predictors of pneumonia (increased pulse, respiratory rate ≥30, temperature ≥38°C, somnolence or decreased alertness, presence of acute confusion, lung crackles on auscultation, absence of wheezes, and increased white blood count). A simple score (range = -1 to 8) on the basis of these variables identified 33% of subjects (score ≥3) with more than 50% probability of pneumonia and an additional 24% (score of 2) with 44% probability of pneumonia.
CONCLUSIONS: Pneumonia in nursing home residents is usually associated with few symptoms. Nonetheless, a simple clinical prediction rule can identify residents at very high risk for pneumonia. If validated in other studies, physicians could consider treating such residents without obtaining a chest radiograph.
Pneumonia is a leading cause of morbidity, mortality, and hospitalization of nursing home residents.1-8 Atypical presentations and fewer presenting signs and symptoms in older patients complicate diagnosis.9,10 Also, clinician (physician, nurse practitioner, and physician assistant) visits to nursing homes are often sporadic, and radiology facilities are rarely on the premises. As a consequence, residents are commonly sent to emergency departments for evaluation,4,11,12 which undoubtedly contributes to a high hospitalization rate.
Clinicians who periodically see nursing home residents could benefit from a simple clinical tool to identify pneumonia. No large studies of community nursing home residents have systematically studied findings associated with pneumonia. As part of the Missouri LRI Project, we examined how well clinical findings predict radiographic pneumonia.
Methods
The Missouri LRI Project was a prospective observational study in 36 nursing homes in Central Missouri and St. Louis designed to investigate predictors of 2 outcomes of lower respiratory infections (LRIs): mortality and functional decline. Potential cases were identified from August 15, 1995, through September 29, 1998; however, all facilities were not involved until fall 1997. Study facilities were similar in size, ownership, and occupancy to national estimates from the 1995 National Nursing Home Survey (data available on request).13
We trained nursing home staff to report ill residents with any of 6 respiratory symptoms (eg, cough, dyspnea, sputum production) or 6 general symptoms (eg, fever, decline in mobility, mental status changes). Project nurses called and visited facilities frequently to reinforce reporting. Under a physician-authorized protocol, ill residents with a possible LRI received a standardized evaluation by a trained project nurse and usually a chest radiograph, complete blood count, and a chemistry panel. Complete criteria for triggering an evaluation are listed in Table 1. For this paper, we were concerned with the 90% of evaluated residents who received a chest radiograph. Criteria for excluding residents from evaluation are summarized in the Figure 1.
The nurse evaluation included an inventory of current symptoms, a review of important chronic conditions (eg, congestive heart failure), and a targeted physical examination. The examination included vital signs and the following body areas or systems: ears, nose, and throat; cardiac; abdominal; neurologic; extremities; skin; and a detailed lung examination. Most project nurses had advanced practice training; the remainder had extensive clinical experience and training in physical assessment. All received an individualized training session with a project geriatrician. Project nurses had substantially more experience than the nursing home staff, who usually report clinical findings to physicians.
Results of the evaluation were reported to the attending physician, who made all treatment decisions. Since the evaluations were clinically appropriate care authorized by individual attending physicians, the institutional review boards that reviewed the project allowed us to substantially simplify the consent process to a simple acceptance or refusal of the evaluation. In 9.2% of evaluations the resident was transferred to the hospital before project nurses could complete a physical assessment. In these instances, we obtained vital sign and clinical examination data from hospital records.
Radiographic Classification
Since all subjects had at least one illness symptom, for this analysis we classified the presence or absence of pneumonia on the basis of reported radiographic findings. Using defined criteria, 2 clinicians independently separated radiology reports into 3 categories: (a) negative, (b) possible, or (c) probable or definite for pneumonia (hereafter, probable pneumonia). For example, a report describing “new left lower lobe infiltrate suggestive of pneumonia” would have been rated as probable, while a report indicating “possible infiltrate” or “infiltrate suggestive of pneumonia or congestive heart failure” would have been rated as possible. As radiologists rarely provide completely unequivocal readings, we did not separate probable and definite pneumonia. In St. Louis 2 clinicians evaluated the reports, and in central Missouri 2 of 4 clinicians considered each report. Where there was disagreement, all 6 raters from the 2 sites independently reviewed the reports and then attempted to reach consensus. For 13% of radiographs, the project radiologist independently interpreted the actual films. This occurred when: (1) consensus could not be achieved; or (2) consensus was possible pneumonia, but probable pneumonia was needed to quality the episode as an LRI under the project definition.
Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
Illness episodes were then randomly assigned to a two thirds model-development and a one third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rate above 140 was set equal to 140.
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (a=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates among all possible pairs of individuals whether those with higher predicted risk are more likely to die. The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration—agreement between observed and predicted mortality over the range of predicted risk—we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
Results
Project nurses performed 2592 evaluations. In 90% (2337), residents received chest x-rays either in the nursing home or on hospital transfer. In 3 additional cases crucial information was missing from nursing home records. This left for final analysis 2334 episodes in 1474 individuals Figure 1.
Fifty-five percent of radiographs were interpreted as negative, 12% showed possible pneumonia, and 33% showed probable pneumonia. Most nursing home residents with pneumonia had few presenting symptoms; 80% had 3 or fewer respiratory or general symptoms. However, only 7.5% of subjects evaluated had no respiratory symptoms. Table 2 shows the relationship of selected variables to radiographic findings of absent, possible, or probable pneumonia. Though a few signs and symptoms are more common in those with positive (possible or probable pneumonia) than negative chest x-ray results, most did not discriminate at all. Fever (temperature Ž38°C) was present in 44.4% of positives but only 28.5% of negatives (P=.001).
Multivariable Analysis and Prediction Score
Our GEE model to predict radiographic pneumonia includes 3 vital sign abnormalities (fever, rapid pulse, and rapid respiratory rate), 2 lung findings (presence of crackles and absence of wheezes), 2 potential indicators of delirium (somnolence or decreased alertness and acute confusion), and elevated white blood count. Table 3 reports GEE estimates for the entire sample. Though only exhibiting fair overall performance, the model did well at distinguishing subjects with a high probability of pneumonia. In the 20% of subjects with the highest predicted risks, more than two thirds had pneumonia.
For the full range of values, the model derived on the development sample showed a c-statistic of 0.672, which reduced to 0.632 in the validation sample. A value of 1.0 would indicate perfect discrimination between those who did and did not have radiographic pneumonia, while a value of 0.5 would indicate no better than chance discrimination. Model calibration was not acceptable in the validation sample (Hosmer-Lemeshow goodness-of-fit statistic, P=.008). Inspection suggested the disagreement between predicted and observed probability of pneumonia was primarily with lower-risk estimates.
Because the model performed relatively well at distinguishing subjects very likely to have pneumonia, we created a simple point system aimed at identifying such high-risk individuals. Table 4 shows the scoring system. For 33% of subjects (score ≥3), there was a 56% or higher probability of radiographic pneumonia. An additional 24% of subjects (score of 2) had a 44% probability of radiographic pneumonia. However, even those with the lowest scores (-1 to 0, 15% of subjects) still had a 24% probability of pneumonia. The relationship between the score and the probability of radiographic evidence of pneumonia is shown in Figure W1.*
Discussion
In a large community-based sample, we considered presenting symptoms, signs, and laboratory findings associated with radiographic pneumonia. Individual findings discriminated poorly, and we could not separate out a very-low-risk group. However, our simple scoring system identified approximately one third to slightly more than one half of subjects with high probability of pneumonia, individuals who might be treated without a confirmatory chest x-ray. If our data are confirmed, they suggest a simple clinical strategy in patients with respiratory or general symptoms (Table 1) that might suggest pneumonia: (1) if there are no respiratory symptoms, consider other conditions, such as a urinary tract infection, that might fully explain the symptoms; (2) obtain information to apply our symptom score (Table 4); (3) for those with scores of 2 or higher (some might choose 3 instead), treat for pneumonia; (4) for those with scores of -1, 0, or 1, obtain a chest radiograph as a guide to treatment.
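The four-step strategy above can be written out as a short decision sketch. This is only an illustration of the logic as stated in the text, not a validated clinical tool. The function name `suggested_steps`, its parameters, and the returned strings are hypothetical. The point values that produce the score come from Table 4 and are not reproduced here, so the score is taken as an input. Applying the score even when respiratory symptoms are absent, and sending every score below an alternative threshold of 3 to radiography, are assumptions.

```python
def suggested_steps(score, has_respiratory_symptoms, treat_threshold=2):
    """Return the steps suggested by the strategy for one illness episode.

    score: value of the additive score (reported range -1 to 8).
    treat_threshold: 2 in the text; "some might choose 3 instead".
    """
    if not -1 <= score <= 8:
        raise ValueError("score is expected to lie between -1 and 8")

    steps = []
    # Step 1: without respiratory symptoms, look for another explanation.
    if not has_respiratory_symptoms:
        steps.append("consider other conditions that might fully explain the symptoms")
    # Steps 3 and 4: act on the score (step 2 is obtaining the score itself).
    if score >= treat_threshold:
        steps.append("treat for pneumonia")
    else:
        steps.append("obtain a chest radiograph as a guide to treatment")
    return steps


print(suggested_steps(2, True))
print(suggested_steps(2, True, treat_threshold=3))
print(suggested_steps(0, False))
```

Keeping the threshold as a parameter reflects the text's remark that some clinicians might prefer a cutoff of 3 rather than 2.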
Considering individual findings, fever was significantly more common in pneumonia, but only 43% of those with possible or probable pneumonia had a temperature of at least 38°C. This reaffirms common wisdom and previous findings that fever is frequently absent in elderly people with pneumonia.9,19 We also confirmed that few signs or symptoms are the norm for nursing home-acquired pneumonia.
Chest examination findings also do not adequately distinguish patients with and without pneumonia (Table 2). Also, even expert physicians frequently differ on lung examination findings.20 Nonetheless, presence of crackles and absence of wheezing contribute to our scoring system. Both findings are seen with multiple conditions, but in our data crackles are slightly more associated with pneumonia, while wheezing is more strongly associated with other diseases.
The other components of our scoring system are clinical factors commonly associated with pneumonia. Though none individually discriminates well between those with and without pneumonia (Table 2), several combined serve to identify a high-risk group.
Four previous studies from emergency department or outpatient settings developed clinical prediction rules to identify pneumonia.21-24 Criteria for identifying subjects varied substantially, and each rule has limited accuracy in predicting radiographic pneumonia.20 We had adequate data to evaluate 3 of the rules.21-23 As is usually the case when transporting a prediction rule to a new sample, none performed any better than our rule (data not shown). Our sample created the very difficult challenge for any prediction rule of a very high overall prevalence of pneumonia (45%). That made it unlikely that we could identify a low-risk group in whom x-ray studies could be readily forgone, but we were able to identify a high-risk group.
Limitations
Our findings are subject to several limitations. All facilities in our study were located in central or eastern Missouri, and not all physicians or eligible residents in those facilities participated. Nonetheless, compared with national data, we studied an unusually representative sample of nursing home residents from 36 facilities, including rural and urban locations. Also, in episodes excluded because of physician nonparticipation, residents were very similar to included residents in age, vital signs, and presenting symptoms (data available on request). More important, we lack an independent validation sample from a different cohort. Clinical prediction rules usually do not perform as well in independent samples. This is exemplified by the poor performance of the 3 rules we considered from other settings. Overall, our logistic model was only modest in discriminating and was not well calibrated for low-risk episodes in our reserved validation sample. Although we have developed a promising scoring system to identify residents with high probability of radiographic pneumonia, it needs to be validated in other samples of nursing home residents to determine its ultimate usefulness. For all these reasons, our results may not generalize.
Also, although we identified residents prospectively, project nurses were unable to evaluate 9.2% of residents before transfer to a hospital. Clinical findings abstracted from medical records, such as lung findings, may not have been complete. It is also possible that project nurses could have missed some important findings. However, our staff provided a higher level of expertise than is typically available in nursing homes. In fact, this may limit application of our findings. Nursing home staff vary widely in their ability to accurately examine residents or even identify illness. In many instances, facility staff had not obtained vital signs at the point when we identified a resident as ill enough to qualify for an evaluation.25 Therefore, in many nursing homes, physicians may lack confidence to apply our rule without an evaluation by a physician, advanced practice nurse, or physician assistant.
Finally, determining whether subjects had pneumonia primarily depended on our classification of radiographic reports. Though radiographs generally included 2 views, many were portable films of variable quality, and frequently there was no previous radiograph for comparison. In some subjects with pneumonia, radiographic infiltrates might not yet have developed. Also, even under ideal conditions, radiologists commonly disagree on the presence of pneumonia.26 Some subjects may have been misclassified. However, unless radiographic technique or interpretation was specifically related to clinical predictors, misclassification would simply diminish the relationship of predictors to pneumonia rather than creating a bias. We reviewed reports rather than radiographs, because that is the information usually available to clinicians faced with diagnosis and treatment decisions. We also paid special attention to avoiding any bias in the interpretations. All data were recorded before interpreting radiology reports and the interpretations were performed independent of clinical data. We also made special efforts to assure consistency in labeling radiology reports as possible, probable, or negative for pneumonia. When lack of agreement persisted, the study radiologist reinterpreted the actual films.
Conclusions
Most nursing home residents with pneumonia have few symptoms. We created a simple scoring system to identify nursing home residents who have a high probability of radiographic pneumonia. If our results are confirmed, physicians might consider initiating treatment without an x-ray in such residents. Low scores do not rule out pneumonia, and most physicians would want to press for further diagnosis or treatment in this group.
Acknowledgments
This study was supported by the Agency for Healthcare Research and Quality (grant HS08551) and Dr Mehr’s Robert Wood Johnson Foundation Generalist Physician Faculty Scholars award. Dr Kruse was partially supported by an Institutional National Research Service Award (PE10038) from the Health Resources and Services Administration. Our project would not have been possible without the support of the many attending physicians, administrators, and staff of the involved nursing homes. Dr Clive Levine re-read more than 200 radiographs; Karen Davenport provided crucial administrative support; and Karen Madrone, MPA, assisted with manuscript preparation. Many other unnamed project staff also contributed.
STUDY DESIGN: This was a prospective cohort study.
POPULATION: The residents of 36 nursing homes in central Missouri and the St. Louis area with signs or symptoms suggesting a lower respiratory infection were included.
OUTCOME MEASURED: We compared evaluation findings by project nurses with findings reported from chest radiographs.
RESULTS: Among 2334 episodes of illness in 1474 nursing home residents, 45% of the radiograph reports suggested pneumonia (possible = 12%; probable or definite = 33%). In 80% of pneumonia episodes, subjects had 3 or fewer respiratory or general symptoms. Eight variables were significant independent predictors of pneumonia (increased pulse, respiratory rate ≥30, temperature ≥38°C, somnolence or decreased alertness, presence of acute confusion, lung crackles on auscultation, absence of wheezes, and increased white blood count). A simple score (range = -1 to 8) on the basis of these variables identified 33% of subjects (score ≥3) with more than 50% probability of pneumonia and an additional 24% (score of 2) with 44% probability of pneumonia.
CONCLUSIONS: Pneumonia in nursing home residents is usually associated with few symptoms. Nonetheless, a simple clinical prediction rule can identify residents at very high risk for pneumonia. If validated in other studies, physicians could consider treating such residents without obtaining a chest radiograph.
Pneumonia is a leading cause of morbidity, mortality, and hospitalization of nursing home residents.1-8 Atypical presentations and fewer presenting signs and symptoms in older patients complicate diagnosis.9,10 Also, clinician (physician, nurse practitioner, and physician assistant) visits to nursing homes are often sporadic, and radiology facilities are rarely on the premises. As a consequence, residents are commonly sent to emergency departments for evaluation,4,11,12 which undoubtedly contributes to a high hospitalization rate.
Clinicians who periodically see nursing home residents could benefit from a simple clinical tool to identify pneumonia. No large studies of community nursing home residents have systematically studied findings associated with pneumonia. As part of the Missouri LRI Project, we examined how well clinical findings predict radiographic pneumonia.
Methods
The Missouri LRI Project was a prospective observational study in 36 nursing homes in central Missouri and St. Louis designed to investigate predictors of 2 outcomes of lower respiratory infections (LRIs): mortality and functional decline. Potential cases were identified from August 15, 1995, through September 29, 1998; however, all facilities were not involved until fall 1997. Study facilities were similar in size, ownership, and occupancy to national estimates from the 1995 National Nursing Home Survey (data available on request).13
We trained nursing home staff to report ill residents with any of 6 respiratory symptoms (eg, cough, dyspnea, sputum production) or 6 general symptoms (eg, fever, decline in mobility, mental status changes). Project nurses called and visited facilities frequently to reinforce reporting. Under a physician-authorized protocol, ill residents with a possible LRI received a standardized evaluation by a trained project nurse and usually a chest radiograph, complete blood count, and a chemistry panel. Complete criteria for triggering an evaluation are listed in Table 1. For this paper, we were concerned with the 90% of evaluated residents who received a chest radiograph. Criteria for excluding residents from evaluation are summarized in Figure 1.
The nurse evaluation included an inventory of current symptoms, a review of important chronic conditions (eg, congestive heart failure), and a targeted physical examination. The examination included vital signs and the following body areas or systems: ears, nose, and throat; cardiac; abdominal; neurologic; extremities; skin; and a detailed lung examination. Most project nurses had advanced practice training; the remainder had extensive clinical experience and training in physical assessment. All received an individualized training session with a project geriatrician. Project nurses had substantially more experience than the nursing home staff, who usually report clinical findings to physicians.
Results of the evaluation were reported to the attending physician, who made all treatment decisions. Since the evaluations were clinically appropriate care authorized by individual attending physicians, the institutional review boards that reviewed the project allowed us to substantially simplify the consent process to a simple acceptance or refusal of the evaluation. In 9.2% of evaluations the resident was transferred to the hospital before project nurses could complete a physical assessment. In these instances, we obtained vital sign and clinical examination data from hospital records.
Radiographic Classification
Since all subjects had at least one illness symptom, for this analysis we classified the presence or absence of pneumonia on the basis of reported radiographic findings. Using defined criteria, 2 clinicians independently separated radiology reports into 3 categories: (a) negative, (b) possible, or (c) probable or definite for pneumonia (hereafter, probable pneumonia). For example, a report describing “new left lower lobe infiltrate suggestive of pneumonia” would have been rated as probable, while a report indicating “possible infiltrate” or “infiltrate suggestive of pneumonia or congestive heart failure” would have been rated as possible. As radiologists rarely provide completely unequivocal readings, we did not separate probable and definite pneumonia. In St. Louis 2 clinicians evaluated the reports, and in central Missouri 2 of 4 clinicians considered each report. Where there was disagreement, all 6 raters from the 2 sites independently reviewed the reports and then attempted to reach consensus. For 13% of radiographs, the project radiologist independently interpreted the actual films. This occurred when: (1) consensus could not be achieved; or (2) consensus was possible pneumonia, but probable pneumonia was needed to qualify the episode as an LRI under the project definition.
Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
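The imputation step described above (mean for missing continuous values, most frequent category for missing dichotomous values) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' SAS code: the function `impute`, the variable names `pulse` and `crackles`, and the toy records are hypothetical, and missing values are assumed to be coded as `None`.

```python
from collections import Counter
from statistics import mean


def impute(records, continuous, dichotomous):
    """Return copies of the records with missing (None) values filled in."""
    filled = [dict(r) for r in records]

    # Continuous variables: substitute the mean of the observed values.
    for var in continuous:
        observed = [r[var] for r in records if r[var] is not None]
        fill = mean(observed)
        for r in filled:
            if r[var] is None:
                r[var] = fill

    # Dichotomous variables: substitute the largest (most frequent) category.
    for var in dichotomous:
        observed = [r[var] for r in records if r[var] is not None]
        fill = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[var] is None:
                r[var] = fill

    return filled


# Hypothetical illness episodes; None marks a missing value.
episodes = [
    {"pulse": 80, "crackles": 0},
    {"pulse": None, "crackles": 1},
    {"pulse": 100, "crackles": 0},
    {"pulse": 90, "crackles": None},
]
completed = impute(episodes, continuous=["pulse"], dichotomous=["crackles"])
print(completed)
```

The original records are left unchanged, so the count of imputed values per variable can still be reported afterward.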
Illness episodes were then randomly assigned to a two thirds model-development and a one third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rate above 140 was set equal to 140.
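Two of the data-preparation steps in this paragraph, the random two thirds / one third split and the capping of extreme values, lend themselves to a brief sketch. The helper names `split_episodes` and `cap`, the fixed random seed, and the example pulse values are assumptions made for illustration. Only the 140 pulse-rate ceiling and the split proportions come from the text, and the count of 2334 episodes is used simply as a convenient sample size.

```python
import random


def cap(value, upper):
    """Limit a value to an upper bound, eg, pulse above 140 is set to 140."""
    return min(value, upper)


def split_episodes(episodes, dev_fraction=2 / 3, seed=0):
    """Randomly assign episodes to development and validation samples."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(episodes)
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


# Hypothetical pulse rates, including one extreme value.
pulses = [72, 88, 156, 140, 95, 110]
capped = [cap(p, 140) for p in pulses]

# Episode identifiers stand in for full episode records.
development, validation = split_episodes(range(2334))
print(capped, len(development), len(validation))
```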
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (α=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
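Dropping episodes beyond a fixed number per individual can be illustrated as follows. This sketch covers only that bookkeeping step, not the GEE fit itself. The function `limit_episodes`, the field names `resident_id` and `episode_date`, and the example data are hypothetical, and keeping each resident's earliest episodes is an assumption, since the text does not state how episodes were ordered.

```python
from collections import defaultdict


def limit_episodes(episodes, max_per_resident):
    """Keep each resident's first max_per_resident episodes.

    Returns (kept, dropped). Episodes are ordered by date within resident,
    which is an assumption about how "the 5th" episode is defined.
    """
    seen = defaultdict(int)
    kept, dropped = [], []
    ordered = sorted(episodes, key=lambda e: (e["resident_id"], e["episode_date"]))
    for episode in ordered:
        seen[episode["resident_id"]] += 1
        if seen[episode["resident_id"]] <= max_per_resident:
            kept.append(episode)
        else:
            dropped.append(episode)
    return kept, dropped


# Hypothetical data: resident "A" has 7 episodes, resident "B" has 2.
episodes = [
    {"resident_id": "A", "episode_date": f"1996-{month:02d}-01"}
    for month in range(1, 8)
] + [
    {"resident_id": "B", "episode_date": "1996-03-15"},
    {"resident_id": "B", "episode_date": "1997-01-20"},
]
kept, dropped = limit_episodes(episodes, max_per_resident=5)
print(len(kept), len(dropped))
```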
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates among all possible pairs of individuals whether those with higher predicted risk are more likely to have radiographic pneumonia. The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration (agreement between the observed and predicted probability of pneumonia over the range of predicted risk), we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
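The pairwise definition of the c-statistic can be made concrete with a small sketch. It compares every episode that had the outcome with every episode that did not and counts how often the one with the outcome received the higher predicted risk, with ties counted as one half (a common convention, assumed here). The function name `c_statistic` and the example risks and outcomes are hypothetical and are not study data.

```python
def c_statistic(predicted, outcome):
    """Proportion of outcome/no-outcome pairs ranked correctly by predicted risk."""
    with_outcome = [p for p, y in zip(predicted, outcome) if y == 1]
    without_outcome = [p for p, y in zip(predicted, outcome) if y == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one episode with and one without the outcome")

    concordant = 0.0
    for p in with_outcome:
        for q in without_outcome:
            if p > q:
                concordant += 1.0  # higher risk assigned to the true case
            elif p == q:
                concordant += 0.5  # tie counts as half
    return concordant / (len(with_outcome) * len(without_outcome))


# Hypothetical predicted risks and observed outcomes (1 = outcome present).
risk = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
observed = [1, 1, 0, 1, 0, 0]
c = c_statistic(risk, observed)
print(round(c, 3))
```

In this toy example 8 of the 9 pairs are ranked correctly. A model that ranks every pair correctly yields 1.0, and one that assigns the same risk to everyone yields 0.5.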
1. Irvine PW, Van Buren N, Crossley K. Causes for hospitalization of nursing home residents: the role of infection. J Am Geriatr Soc 1984;32:103-07.
2. Murtaugh CM, Freiman MP. Nursing home residents at risk of hospitalization and the characteristics of their hospital stays. Gerontologist 1995;35:35-43.
3. Jackson MM, Fierer J, Barrett-Connor E, et al. Intensive surveillance for infections in a three-year study of nursing home patients. Am J Epidemiol 1992;135:685-96.
4. Brooks S, Warshaw G, Hasse L, Kues JR. The physician decision-making process in transferring nursing home patients to the hospital. Arch Intern Med 1994;154:902-08.
5. Fried TR, Gillick MR, Lipsitz LA. Whether to transfer? Factors associated with hospitalization and outcome of elderly long-term care patients with pneumonia. J Gen Intern Med 1995;10:246-50.
6. Degelau J, Guay D, Straub K, Luxenberg MG. Effectiveness of oral antibiotic treatment in nursing home-acquired pneumonia. J Am Geriatr Soc 1995;43:245-51.
7. Muder RR, Brennen C, Swenson DL, Wagener M. Pneumonia in a long-term care facility: a prospective study of outcome. Arch Intern Med 1996;156:2365-70.
8. Medina-Walpole AM, Katz PR. Nursing home-acquired pneumonia. J Am Geriatr Soc 1999;47:1005-15.
9. Harper C, Newton P. Clinical aspects of pneumonia in the elderly veteran. J Am Geriatr Soc 1989;37:867-72.
10. Metlay JP, Schulz R, Li YH, Singer DE, Marrie TJ, Coley CM, et al. Influence of age on symptoms at presentation in patients with community-acquired pneumonia. Arch Intern Med 1997;157:1453-59.
11. Kayser-Jones JS, Wiener CL, Barbaccia JC. Factors contributing to the hospitalization of nursing home residents. Gerontologist 1989;29:502-10.
12. Scott HD, Logan M, Waters WJ, Jr, et al. Medical practice variation in the management of acute medical events in nursing homes: a pilot study. R I Med J 1988;71:69-74.
13. Gabrel CS, Jones A. The National Nursing Home Survey: 1997 summary. Vital Health Stat-series 13: data from the National Health Survey 2000;147:1-121.
14. Harrell FE, Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
15. Preisser JS, Koch GG. Categorical data analysis in public health. Annu Rev Public Health 1997;18:51-82.
16. SAS Institute Inc. The SAS System for Windows. Version 6.1. Cary, NC: SAS Institute, Inc; 1996.
17. D’Agostino RB, Sr, Griffith JL, Schmid CH, Terrin N. Measures for evaluating model performance. In: Proceedings of the biometrics section, 1997. Alexandria, Va: American Statistical Association. Biometrics section; 1998;253-58.
18. Hosmer DW Jr, Lemeshow S. Applied logistic regression. New York, NY: Wiley; 1989.
19. Marrie TJ, Haldane EV, Faulkner RS, Durant H, Kwan C. Community-acquired pneumonia requiring hospitalization: is it different in the elderly? J Am Geriatr Soc 1985;33:671-80.
20. Metlay JP, Kapoor WN, Fine MJ. Does this patient have community-acquired pneumonia? Diagnosing pneumonia by history and physical examination. JAMA 1997;278:1440-45.
21. Heckerling PS, Tape TG, Wigton RS, et al. Clinical prediction rule for pulmonary infiltrates. Ann Intern Med 1990;113:664-70.
22. Singal BM, Hedges JR, Radack KL. Decision rules and clinical prediction of pneumonia: evaluation of low-yield criteria. Ann Emerg Med 1989;18:13-20.
23. Gennis P, Gallagher J, Falvo C, Baker S, Than W. Clinical criteria for the detection of pneumonia in adults: guidelines for ordering chest roentgenograms in the emergency department. J Emerg Med 1989;7:263-68.
24. Diehr P, Wood RW, Bushyhead J, Krueger L, Wolcott B, Tompkins RK. Prediction of pneumonia in outpatients with acute cough—a statistical approach. J Chronic Dis 1984;37:215.-
25. Barry CR, Brown K, Esker D, Denning MD, Kruse RL, Binder EF. Nursing assessment of ill nursing home residents. In press.
26. Albaum MN, Hill LC, Murphy M, et al. Interobserver reliability of the chest radiograph in community-acquired pneumonia: PORT Investigators. Chest 1996;110:343-50.
Rate of Case Reporting, Physician Compliance, and Practice Volume in a Practice-Based Research Network Study
STUDY DESIGN: This was a prospective observational cohort study of participants in a practice-based research network who submitted data on 231 patients with dyspepsia from a total of 45,337 patient encounters over a 53-week period. Reporting of individual cases involved use of a relatively high-burden data instrument. Outcome measures were compared using rank correlation.
POPULATION: We included 18 physicians in a Wisconsin research network study on initial management of dyspepsia in primary care settings.
OUTCOMES MEASURED: The outcomes were the rate of dyspepsia visits, average weekly patient volume, and self-reported compliance with the study protocol for each physician.
RESULTS: A significant negative correlation existed between physician patient volume and the reported rate of dyspepsia visits. Self-reported compliance with the protocol was negatively correlated with patient volume and positively correlated with the reported rate of dyspepsia visits.
CONCLUSIONS: Practice volume may influence the results in practice-based research. Investigators using practice-based research networks need to consider the complexity of their protocols and should be cognizant of compliance-sensitive measures.
Common medical problems, especially those that are self-limited or in their early phases, can be best studied in community practice settings where they are usually diagnosed and managed. Practice-based research provides one method to conduct studies of these problems. Often practice-based research physicians are linked together in practice-based research networks (PBRNs), thus forming, in effect, laboratories of community practices.1-3
The methodologic limitations of these laboratories are of concern and have not been extensively explored. Although it has been adequately demonstrated that the patient populations and the problems addressed in participating practices are comparable to patients and problems in the general population,4-6 the question of the selection bias of the clinicians has been raised.4
As research involvement can be a costly endeavor for the individual physician,7 participation in a research protocol—to some extent—may be related to the intensity of practice (ie, the volume of patients seen and services provided). It has been shown that high-volume practices differ from low-volume practices8 in that high-volume practices provide lower rates of preventive services and generate lower patient satisfaction. One may anticipate that physicians with more discretionary time (ie, fewer patients) may be better able to fully participate in research activities. There have been no direct studies of the impact of practice volume on the reporting of medical problems and compliance in research studies. This study, conducted as part of a larger Wisconsin Research Network (WReN) study of dyspepsia in primary care settings, is a first step in that direction.
Methods
Eighteen family physicians, making up the Practice-Based Research Group of WReN practices, volunteered to participate in a study of the initial management of dyspepsia in primary care.9 As part of the study protocol, participants were asked to record the number of adult patients presenting with dyspepsia and the total number of patients seen in their clinic for each week of the 12-month study. Dyspepsia was defined as pain in the upper abdomen lasting for at least 2 weeks and not attributable to cardiac or pulmonary disease or trauma. Data were collected for both initial and follow-up visits. Participants were instructed to complete a 1-page data instrument for each dyspeptic patient at the time of the visit. Each instrument contained 68 data elements and took up to 5 minutes to complete. Data forms were mailed to the study coordinator monthly. Data collection began on January 30, 1995, and continued through February 2, 1996.
An average weekly patient volume was calculated for each physician, as was the reported rate of dyspepsia visits in their practice. The patient volume was estimated for each physician by summing the weekly patient totals and dividing by the number of weeks during which the physician saw patients in the clinic and participated in the study. The reported rate of dyspepsia visits for each physician was estimated as the total number of patient visits reported meeting the study criteria for dyspepsia divided by the total number of patients seen during the study period.
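The two per-physician estimates described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the study's actual procedure: the function name, the use of None to mark weeks without participation, and the sample numbers are all assumptions introduced here.

```python
def weekly_volume_and_rate(weekly_totals, dyspepsia_visits):
    """Return (average weekly volume, dyspepsia visits per 1000 patient visits).

    weekly_totals: patient counts per week; None marks a week in which the
    physician did not see patients or did not participate (an assumed
    convention, not taken from the study).
    dyspepsia_visits: number of reported visits meeting the study criteria.
    """
    active = [n for n in weekly_totals if n is not None]
    total_patients = sum(active)
    if not active or total_patients == 0:
        raise ValueError("no patient visits recorded")
    # Volume: sum of weekly totals divided by the number of participating weeks.
    volume = total_patients / len(active)
    # Rate: qualifying dyspepsia visits divided by all patients seen.
    rate_per_1000 = 1000 * dyspepsia_visits / total_patients
    return volume, rate_per_1000


# Hypothetical physician: 4 participating weeks and 1 non-participating week.
volume, rate = weekly_volume_and_rate([60, 55, None, 70, 65], 2)
print(f"{volume:.1f} patients/week, {rate:.1f} dyspepsia visits per 1000")
```

Because the denominator for volume counts only participating weeks, a physician who skipped weeks is not assigned an artificially low volume.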
Following completion of primary data collection, a demographic questionnaire was sent to all 18 participants. The questionnaire was distributed approximately 4 months after data collection and during a chart review phase of the primary study. The chart review was performed by a research assistant and did not involve the participating physicians. One question, included to assess compliance with the study protocol, asked, “On a 10-point scale, how compliant were you at recording data for all qualifying dyspepsia patients during the weeks that you were involved with this study?” Responses were circled on a scale from 1 (poor) to 10 (perfect). Type of practice (solo, group, multispecialty, or academic) was also obtained. Seventeen of the 18 questionnaires were completed and returned.
MINITAB was used for statistical analyses. Descriptive statistics were calculated for the outcome variables. Because data for reported rate of dyspepsia visits and compliance were not normally distributed, Spearman rank correlation (α = 0.05) was used to test the hypotheses that practice volume, protocol compliance, and reported rate of dyspepsia visits were correlated. The one solo practitioner was placed with the group practice physicians because of a high level of similarity in all outcome variables. Because differences were noted among the practice types, the Kruskal-Wallis test was used to assess differences in patient volume, compliance, and reported rate of dyspepsia visits.
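For readers unfamiliar with the statistic, Spearman's rank correlation is the ordinary Pearson correlation computed on ranks, with tied values sharing the mean of their ranks. The sketch below uses only the Python standard library; it is not the MINITAB routine used in the study, and the function names and sample data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rs is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: weekly patient volume and self-rated compliance (1-10).
volume = [40, 55, 61, 70, 85, 90]
compliance = [9, 10, 7, 7, 4, 2]
print(round(spearman_rs(volume, compliance), 3))
```

Working on ranks is what makes the method suitable for the skewed rate and compliance data described above: only the ordering of the physicians matters, not the spacing between their values.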
Results
The average participant in this study was a 46-year-old male physician who had been in practice for 17 years and saw 61.5 patients per week (Table 1). Eight physicians were located in group practices, while 5 were in multispecialty and 3 were in academic practices. The mean reported rate of dyspepsia visits was 7.7 cases per 1000 patient visits. Initial dyspepsia visits accounted for 118 of the 231 reported visits for dyspepsia (51%), with a total of 45,337 patient visits recorded by participating physicians.
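The counts in this paragraph can be checked directly: 118 of 231 is about 51% of reported dyspepsia visits, and 231 of 45,337 gives a pooled rate of about 5.1 per 1000 patient visits. The pooled figure is not reported in the article. One plausible reading, offered here as an assumption, is that the 7.7 per 1000 figure is the unweighted mean of per-physician rates, which exceeds the pooled rate when lower-volume physicians report higher rates. The helper function and the two-physician example are hypothetical.

```python
initial_visits = 118
dyspepsia_visits = 231
patient_visits = 45_337

# Share of reported dyspepsia visits that were initial visits.
initial_share = initial_visits / dyspepsia_visits
# Pooled rate: all reported dyspepsia visits over all patient visits.
pooled_rate = 1000 * dyspepsia_visits / patient_visits

print(f"initial visits: {initial_share:.1%} of reported dyspepsia visits")
print(f"pooled rate: {pooled_rate:.1f} per 1000 patient visits")


def mean_of_rates(counts, totals):
    """Unweighted mean of per-physician rates per 1000 patient visits."""
    rates = [1000 * c / t for c, t in zip(counts, totals)]
    return sum(rates) / len(rates)


# Hypothetical pair: a low-volume physician reporting 10 of 500 visits and a
# high-volume physician reporting 5 of 2000 visits.
counts, totals = [10, 5], [500, 2000]
print(mean_of_rates(counts, totals))       # mean of per-physician rates
print(1000 * sum(counts) / sum(totals))    # pooled rate for the same pair
```

In the hypothetical pair the mean of rates is nearly twice the pooled rate, the same direction of difference as between 7.7 and 5.1.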
The average participant recorded visits over 43.2 weeks of the possible 53-week study (81.5% overall participation rate). The average self-reported compliance with the study protocol was 6.7 on a 10-point scale but with a very wide range (from 1 to 10). Significant differences among practice types were found in patient volume, reported rate of dyspepsia visits, and self-reported compliance (Table 2). Participants from group practices had the highest patient volumes but the lowest rate of dyspepsia visits and compliance. Academic physicians saw the fewest patients but had the highest reported rate of dyspepsia visits and compliance.
Significant negative rank correlations were found between patient volume and reported rate of dyspepsia visits (Figure 1: rs = -0.548; P < .05) and between patient volume and compliance with protocol (Figure 2: rs = -0.490; P < .05). A significant positive rank correlation was found between compliance with protocol and rate of dyspepsia visits (Figure 3: rs = 0.551; P < .05). No significant correlation existed between the number of weeks of participation and patient volume (rs = -0.303), rate of dyspepsia visits (rs = 0.065), or compliance with protocol (rs = 0.415).
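A rough way to see why coefficients near 0.5 reach significance while those near 0.4 do not is the common t approximation for a rank correlation, t = rs * sqrt((n - 2) / (1 - rs^2)) with n - 2 degrees of freedom. The sketch below assumes n = 17 (the number of returned questionnaires); the article does not state n for each correlation, and the approximation is not necessarily the test MINITAB applied. The critical values are standard two-sided 5% table entries.

```python
import math

# Two-sided 5% critical values of Student's t (standard table values),
# keyed by degrees of freedom.
T_CRIT_05 = {12: 2.179, 15: 2.131, 16: 2.120}


def spearman_t(rs, n):
    """Approximate t statistic for H0: no rank correlation (df = n - 2)."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))


def is_significant(rs, n):
    """True if |t| exceeds the tabulated two-sided 5% critical value."""
    return abs(spearman_t(rs, n)) > T_CRIT_05[n - 2]


n = 17  # assumption: one physician did not return the questionnaire
reported = [
    ("volume vs rate", -0.548),
    ("volume vs compliance", -0.490),
    ("compliance vs rate", 0.551),
    ("weeks vs volume", -0.303),
    ("weeks vs rate", 0.065),
    ("weeks vs compliance", 0.415),
]
for label, rs in reported:
    verdict = "significant" if is_significant(rs, n) else "not significant"
    print(f"{label}: t = {spearman_t(rs, n):.2f}, {verdict}")
```

Under these assumptions the three correlations reported as significant exceed the critical value and the three correlations with weeks of participation do not, which agrees with the pattern stated above. With samples this small, the coefficient for volume versus compliance sits close to the threshold.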
Discussion
Practice volume can have a significant effect on physicians’ reporting rates in practice-based studies. The rate of dyspepsia visits, as measured by the identification of patients meeting study criteria and having a completed data form, was negatively related to the number of patients seen per week by the physician. Practice volume appears to be linked to reporting by way of compliance. As an extension, it appears that physicians are generally accurate in self-assessment of their compliance with a protocol.
Although previous evaluations of PBRNs have demonstrated high levels of accuracy within reported data,10 the results reported here are somewhat disturbing. If other studies show similar results, the idea that PBRNs can assess prevalence of medical conditions could be called into question. Also, there may be a bias in the higher-volume practices for patients with more severe symptoms to be reported in preference to those with less “attention-getting” symptoms, or in low-volume practices to seek out problems for which the patient did not seek attention. Consequently, even when a medical problem is identified, there may be patient selection bias toward those with more or less severe symptoms.
Additional burden and lack of practice support were common reasons for withdrawing from participation in PBRNs.11 Overall participation and compliance with a research protocol, therefore, are likely related to the complexity of that protocol. While the reported rate of dyspepsia visits was negatively related to practice volume, the simple reporting of a weekly tally of patients seen in clinic was not. Consequently, compliance-sensitive measurements (eg, prevalence) may need simple, time-efficient protocols. For example, full compliance with the protocol for the approximately 1050 physicians currently involved in the Centers for Disease Control and Prevention US Influenza Sentinel Physician Surveillance Network requires less than 3 minutes per week. This surveillance network for monitoring prevalence of influenza-like illness is a highly accurate, timely, and valued component of influenza surveillance.12 Other enhancements for study protocols may include decreased periods for data gathering, use of intermittent reporting, and use of other office staff for case identification.
Limitations
This study is limited by a potential lack of generalizability. It is an observational study of physician behavior around a complex and relatively high-burden data collection instrument. There were no true standards regarding prevalence of dyspepsia at any location, thus allowing for the possibility that patient populations differed significantly among sites. Self-reported compliance with the research protocol was based on recall 4 months after the end of the data collection period. Also, some of the effect attributable to patient volume could alternatively result from the types of physicians involved in this study.
Academic physicians, with low practice volumes, may be more likely to be compliant with research protocols in general, regardless of their practice volumes. Because of the small sample size, however, this alternate hypothesis cannot be examined independently. With the exclusion of the academic physicians, relationships between the variables demonstrated the same trends, but the Spearman rank correlations were no longer significant (n = 14; patient volume vs rate: rs = -0.345; patient volume vs compliance: rs = -0.187; compliance vs rate: rs = 0.379).
This study does, however, challenge other investigators using PBRNs to revisit suitable data to determine similar patterns. Also, a simple assessment of participant compliance might prove to be an essential enhancement of future practice-based research.
Conclusions
Even encumbered with potential methodologic dilemmas, practice-based research studies may be the only way to approach many common medical issues in the context of the communities in which they occur.1-3 For example, while selection bias in reporting of dyspepsia is clearly a problem in this example, the selection bias is still far less severe than it would be in the gastrointestinal specialty clinic of a referral center. Likewise, if nonreferred conditions are to be tracked over extensive periods of time, the use of community settings is essential, as was done with a recent longitudinal study of depression.13
Acknowledgments
Funding for this study was provided through a grant from the American Academy of Family Physicians. We thank the following participants of the WReN Practice-Based Research Group: R. Baldwin, E. Barr, D. Baumgardner, A. Berlage, M. Chin, D. Erickson, R. Erickson, G. Gay, M. Grajewski, D. Hahn, T. Hankey, D. Madlon-Kay, A. Marquis, E. Ott, D. Pine, and L. Radant.
1. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-88.
2. Nutting PA. Practice-based research networks: building the infrastructure of primary care research. J Fam Pract 1996;42:199-203.
3. Nutting PA, Green LA. Practice-based research networks: reuniting practice and research around the problems most of the people have most of the time. J Fam Pract 1994;38:335-36.
4. Nutting PA, Baier M, Werner JJ, Cutter G, Reed FM, Orzano J. Practice patterns of family physicians in practice-based research networks: a report from ASPN. J Am Board Fam Pract 1999;12:78-84.
5. Green LA, Miller RS, Reed FM, Iverson DC, Barley GE. How representative of typical practice are practice-based research networks? A report from the Ambulatory Sentinel Practice Network Inc (ASPN). Arch Fam Med 1993;2:939-49.
6. Hahn DL, Beasley JW. Diagnosed and possible undiagnosed asthma: a Wisconsin Research Network (WReN) study. J Fam Pract 1994;38:373-79.
7. Hahn DL. Physician opportunity costs for performing practice-based research. J Fam Pract 2000;49:983-84.
8. Zyzanski SJ, Stange KC, Langa D, Flocke SA. Trade-offs in high-volume primary care practice. J Fam Pract 1998;46:397-402.
9. Temte JL, Hankey T. Initial management of dyspepsia in primary care settings: the WReN practice-based research group dyspepsia study. Wis Med J 1998;97:48-49.
10. Green LA, Hames CG, Sr, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994;38:400-06.
11. Green LA, Niebauer LJ, Miller RS, Lutz LJ. An analysis of reasons for discontinuing participation in a practice-based research network. Fam Med 1991;23:447-49.
12. Buffington J, Chapman LE, Schmeltz LM, Kendal AP. Do family physicians make good sentinels for influenza? Arch Fam Med 1993;2:859-64.
13. van Weel-Baumgarten E, van den Bosch W, van den Hoogen H, Zitman FG. Ten-year follow-up of depression after diagnosis in general practice. Br J Gen Pract 1998;48:1643-46.
STUDY DESIGN: This was a prospective observational cohort study of participants in a practice-based research network who submitted data on 231 patients with dyspepsia from a total of 45,337 patient encounters over a 53-week period. Reporting of individual cases involved use of a relatively high-burden data instrument. Outcome measures were compared using rank correlation.
POPULATION: We included 18 physicians in a Wisconsin research network study on initial management of dyspepsia in primary care settings.
OUTCOMES MEASURED: The outcomes were the rate of dyspepsia visits, average weekly patient volume, and self-reported compliance with the study protocol for each physician.
RESULTS: A significant negative correlation existed between physician patient volume and the reported rate of dyspepsia visits. Self-reported compliance with the protocol was negatively correlated with patient volume and positively correlated with the reported rate of dyspepsia visits.
CONCLUSIONS: Practice volume may influence the results in practice-based research. Investigators using practice-base research networks need to consider the complexity of their protocols and should be cognizant of compliance-sensitive measures.
Common medical problems, especially those that are self-limited or in their early phases, can be best studied in community practice settings where they are usually diagnosed and managed. Practice-based research provides one method to conduct studies of these problems. Often practice-based research physicians are linked together in practice-based research networks (PBRNs), thus forming, in effect, laboratories of community practices.1-3
The methodologic limitations of these laboratories are of concern and have not been extensively explored. Although it has been adequately demonstrated that the patient populations and the problems addressed in participating practices are comparable to patients and problems in the general population,4-6 the question of the selection bias of the clinicians has been raised.4
As research involvement can be a costly endeavor for the individual physician,7 participation in a research protocol—to some extent—may be related to the intensity of practice (ie, the volume of patients seen and services provided). It has been shown that high-volume practices differ from low-volume practices8 in that high-volume practices provide lower rates of preventive services and generate lower patient satisfaction. One may anticipate that physicians with more discretionary time (ie, fewer patients) may be better able to fully participate in research activities. There have been no direct studies of the impact of practice volume on the reporting of medical problems and compliance in research studies. This study, conducted as part of a larger Wisconsin Research Network (WReN) study of dyspepsia in primary care settings, is a first step in that direction.
Methods
Eighteen family physicians, making up the Practice-Based Research Group of WReN practices, volunteered to participate in a study of the initial management of dyspepsia in primary care.9 As part of the study protocol, participants were requested to record the number of adult patients presenting with dyspepsia and the total number of patients seen in their clinic for each week of the 12-month study. Dyspepsia was defined as pain in the upper abdomen lasting for at least 2 weeks and not attributable to cardiac or pulmonary disease or trauma. Data was collected for both initial and follow-up visits. Participants were instructed to complete a 1-page data instrument for each dyspeptic patient at the time of the visit. Each instrument contained 68 data elements and took up to 5 minutes to complete. Data forms were mailed to the study coordinator on a monthly basis. Data collection began on January 30, 1995, and continued through February 2, 1996.
An average weekly patient volume was calculated for each physician, as was the reported rate of dyspepsia visits in their practice. The patient volume was estimated for each physician by summing the weekly patient totals and dividing by the number of weeks during which the physician saw patients in the clinic and participated in the study. The reported rate of dyspepsia visits for each physician was estimated as the total number of patient visits reported meeting the study criteria for dyspepsia divided by the total number of patients seen during the study period.
Following completion of primary data collection, a demographic questionnaire was sent out to all 18 participants. The questionnaire distribution occurred approximately 4 months after data collection and during a chart review phase of the primary study. The chart review was performed by a research assistant and did not involve the participating physicians. One question, included to assess compliance with the study protocol, asked, “On a 10-point scale, how compliant were you at recording data for all qualifying dyspepsia patients during the weeks that you were involved with this study?” Responses were circled on a scale from 1 (poor) to 10 (perfect). Type of practice (solo, group multispecialty, or academic) was also obtained. Seventeen of the 18 questionnaires were completed and returned.
MINITAB was used for statistical analyses. Descriptive statistics were calculated for the outcome variables. Because data for reported rate of dyspepsia visits and compliance were not normally distributed, Spearman rank correlation (“ = 0.05) was used to test the hypotheses that practice volume, protocol compliance, and reported rate of dyspepsia visits were correlated. The one solo practitioner was placed with the group practice physicians because of a high level of similarity in all outcome variables. Because differences were noted among the practice types, the Kruskal-Wallis test was used to assess differences in patient volume, compliance, and reported rate of dyspepsia visits.
Results
The average participant in this study was a 46-year-old male physician who had been in practice for 17 years and saw 61.5 patients per week Table w1. Eight physicians were located in group practices, while 5 were in multispecialty and 3 were in academic practices. The mean reported rate of dyspepsia visits was 7.7 cases per 1000 patient visits. Initial dyspepsia visits accounted for 118 of the 231 reported visits for dyspepsia (0.51%), with a total of 45,337 patient visits recorded by participating physicians.
The average participant recorded visits over 43.2 weeks of the possible 53-week study (81.5% overall participation rate). The average self-reported compliance with the study protocol was 6.7 on a 10-point scale but with a very wide range (from 1 to 10). Significant differences among practice types were found in patient volume, reported rate of dyspepsia visits, and self-reported compliance Table 2. Participants from group practices had the highest patient volumes but the lowest rate of dyspepsia visits and compliance. Academic physicians saw the least number of patients but had the highest reported rate of dyspepsia visits and compliance.
Significant negative rank correlations were found to exist between patient volume and reported rate of dyspepsia visits (Figure 1: rs = -0.548; P .05) and between patient volume and compliance with protocol (Figure 2: rs = -0.490; P .05). A significant positive rank correlation was found between compliance with protocol and rate of dyspepsia visits (Figure 3 (: rs = 0.551; P .05). No significant correlation existed between the number of weeks of participation and patient volume (rs = -0.303), rate of dyspepsia visits (rs = 0.065), or compliance with protocol (rs = 0.415).
Discussion
Practice volume can have a significant effect on physicians’ reporting rates in practice-based studies. The rate of dyspepsia visits, as measured by the identification of patients meeting study criteria and having a completed data form, was negatively related to the number of patients seen per week by the physician. Practice volume appears to be linked to reporting by way of compliance. As an extension, it appears that physicians are generally accurate in self-assessment of their compliance with a protocol.
Although previous evaluations of PBRNs have demonstrated high levels of accuracy within reported data,10 the results reported here are somewhat disturbing. If other studies show similar results, the idea that PBRNs can assess prevalence of medical conditions could be called into question. Also, there may be a bias in the higher? volume practices for patients with more severe symptoms to be reported in preference to those with less “attention getting” symptoms, or in low-volume practices to seek out problems for which the patient did not seek attention. Consequently, even when a medical problem is identified, there may be patient selection bias toward those with more or less severe symptoms.
Additional burden and lack of practice support were common reasons for withdrawing from participation in PBRNs.11 Overall participation and compliance with a research protocol, therefore, is likely related to the complexity of that protocol. While the reported rate of dyspepsia visits was negatively related to practice volume, the simple reporting of a weekly tally of patients seen in clinic was not. Consequently, compliance-sensitive measurements (eg, prevalence) may need simple time-efficient protocols. For example, full compliance with the protocol for the approximately 1050 physicians currently involved in the Centers for Disease Control and Prevention US Influenza Sentinel Physician Surveillance Network requires less than 3 minutes per week. This surveillance network for monitoring prevalence of influenza-like illness is a highly accurate, timely, and valued component of influenza surveillance.12 Other enhancements for study protocols may include decreased periods for data gathering, use of intermittent reporting, and use of other office staff for case identification.
Limitations
This study is limited by a potential lack of generalizability. It is an observational study of physician behavior around a complex and relatively high-burden data collection instrument. There were no true standards regarding prevalence of dyspepsia at any location, thus allowing for the possibility that patient populations differed significantly among sites. Self-reported compliance with the research protocol was based on recall 4 months after the end of the data collection period. Also, some of the effect attributable to patient volume could alternatively result from the types of physicians involved in this study.
Academic physicians, with low practice volumes, may be more likely to be compliant with research protocols in general, regardless of their practice volumes. Because of the small sample size, however, this alternate hypothesis cannot be examined independently. With the exclusion of the academic physicians, relationships between the variables demonstrated the same trends, but the Spearman rank correlations were no longer significant (n = 14; patient volume vs rate: rs = -0.345; patient volume vs compliance: rs = -0.187; compliance vs rate: rs = 0.379).
This study does, however, challenge other investigators using PBRNs to revisit suitable data to determine similar patterns. Also, a simple assessment of participant compliance might prove to be an essential enhancement of future practice-based research.
Conclusions
Even encumbered with potential methodologic dilemmas, practice-based research studies may be the only way to approach many common medical issues in the context of the communities in which they occur.1-3 For example, while selection bias in reporting of dyspepsia is clearly a problem in this example, the selection bias is still far less severe than it would be in the gastrointestinal specialty clinic of a referral center. Likewise, if nonreferred conditions are to be tracked over extensive periods of time, the use of community settings is essential, as was done with a recent longitudinal study of depression.13
Acknowledgments
Funding for this study was provided through a grant from the American Academy of Family Physicians. We thank the following participants of the WReN Practice-Based Research Group: R. Baldwin, E. Barr, D. Baumgardner, A. Berlage, M. Chin, D. Erickson, R. Erickson, G. Gay, M. Grajewski, D. Hahn, T. Hankey, D. Madlon-Kay, A. Marquis, E. Ott, D. Pine, and L. Radant.
STUDY DESIGN: This was a prospective observational cohort study of participants in a practice-based research network who submitted data on 231 patients with dyspepsia from a total of 45,337 patient encounters over a 53-week period. Reporting of individual cases involved use of a relatively high-burden data instrument. Outcome measures were compared using rank correlation.
POPULATION: We included 18 physicians in a Wisconsin research network study on initial management of dyspepsia in primary care settings.
OUTCOMES MEASURED: The outcomes were the rate of dyspepsia visits, average weekly patient volume, and self-reported compliance with the study protocol for each physician.
RESULTS: A significant negative correlation existed between physician patient volume and the reported rate of dyspepsia visits. Self-reported compliance with the protocol was negatively correlated with patient volume and positively correlated with the reported rate of dyspepsia visits.
CONCLUSIONS: Practice volume may influence the results in practice-based research. Investigators using practice-based research networks need to consider the complexity of their protocols and should be cognizant of compliance-sensitive measures.
Common medical problems, especially those that are self-limited or in their early phases, can be best studied in community practice settings where they are usually diagnosed and managed. Practice-based research provides one method to conduct studies of these problems. Often practice-based research physicians are linked together in practice-based research networks (PBRNs), thus forming, in effect, laboratories of community practices.1-3
The methodologic limitations of these laboratories are of concern and have not been extensively explored. Although it has been adequately demonstrated that the patient populations and the problems addressed in participating practices are comparable to patients and problems in the general population,4-6 the question of the selection bias of the clinicians has been raised.4
As research involvement can be a costly endeavor for the individual physician,7 participation in a research protocol—to some extent—may be related to the intensity of practice (ie, the volume of patients seen and services provided). It has been shown that high-volume practices differ from low-volume practices8 in that high-volume practices provide lower rates of preventive services and generate lower patient satisfaction. One may anticipate that physicians with more discretionary time (ie, fewer patients) may be better able to fully participate in research activities. There have been no direct studies of the impact of practice volume on the reporting of medical problems and compliance in research studies. This study, conducted as part of a larger Wisconsin Research Network (WReN) study of dyspepsia in primary care settings, is a first step in that direction.
Methods
Eighteen family physicians, making up the Practice-Based Research Group of WReN practices, volunteered to participate in a study of the initial management of dyspepsia in primary care.9 As part of the study protocol, participants were requested to record the number of adult patients presenting with dyspepsia and the total number of patients seen in their clinic for each week of the 12-month study. Dyspepsia was defined as pain in the upper abdomen lasting for at least 2 weeks and not attributable to cardiac or pulmonary disease or trauma. Data were collected for both initial and follow-up visits. Participants were instructed to complete a 1-page data instrument for each dyspeptic patient at the time of the visit. Each instrument contained 68 data elements and took up to 5 minutes to complete. Data forms were mailed to the study coordinator on a monthly basis. Data collection began on January 30, 1995, and continued through February 2, 1996.
An average weekly patient volume was calculated for each physician, as was the reported rate of dyspepsia visits in their practice. The patient volume was estimated for each physician by summing the weekly patient totals and dividing by the number of weeks during which the physician saw patients in the clinic and participated in the study. The reported rate of dyspepsia visits for each physician was estimated as the total number of patient visits reported meeting the study criteria for dyspepsia divided by the total number of patients seen during the study period.
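The two per-physician estimates described above can be illustrated with a minimal sketch. The `PhysicianLog` class, the function names, and the sample numbers are hypothetical and are not taken from the study data; the sketch only assumes that each physician has one weekly patient total per week of participation and a count of reported dyspepsia visits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhysicianLog:
    """Hypothetical record of one physician's weekly tallies."""
    weekly_patient_totals: List[int]  # one entry per week of participation
    dyspepsia_visits: int             # visits reported as meeting study criteria


def average_weekly_volume(log: PhysicianLog) -> float:
    """Sum of weekly patient totals divided by the number of weeks of participation."""
    return sum(log.weekly_patient_totals) / len(log.weekly_patient_totals)


def dyspepsia_rate_per_1000(log: PhysicianLog) -> float:
    """Reported dyspepsia visits per 1000 patient visits over the study period."""
    return 1000 * log.dyspepsia_visits / sum(log.weekly_patient_totals)


# Illustrative values only: 4 weeks of participation, 3 reported dyspepsia visits.
example = PhysicianLog(weekly_patient_totals=[60, 50, 70, 60], dyspepsia_visits=3)
volume = average_weekly_volume(example)
rate = dyspepsia_rate_per_1000(example)
```

Weeks in which a physician did not participate are simply left out of the list, which matches the description of dividing by the number of weeks during which the physician saw patients and took part in the study.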
Following completion of primary data collection, a demographic questionnaire was sent out to all 18 participants. The questionnaire distribution occurred approximately 4 months after data collection and during a chart review phase of the primary study. The chart review was performed by a research assistant and did not involve the participating physicians. One question, included to assess compliance with the study protocol, asked, “On a 10-point scale, how compliant were you at recording data for all qualifying dyspepsia patients during the weeks that you were involved with this study?” Responses were circled on a scale from 1 (poor) to 10 (perfect). Type of practice (solo, group multispecialty, or academic) was also obtained. Seventeen of the 18 questionnaires were completed and returned.
MINITAB was used for statistical analyses. Descriptive statistics were calculated for the outcome variables. Because data for reported rate of dyspepsia visits and compliance were not normally distributed, Spearman rank correlation (α = 0.05) was used to test the hypotheses that practice volume, protocol compliance, and reported rate of dyspepsia visits were correlated. The one solo practitioner was placed with the group practice physicians because of a high level of similarity in all outcome variables. Because differences were noted among the practice types, the Kruskal-Wallis test was used to assess differences in patient volume, compliance, and reported rate of dyspepsia visits.
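For readers unfamiliar with the statistic, a Spearman rank correlation can be sketched as a Pearson correlation computed on ranks, with tied values given their average rank. This is an illustrative pure-Python version, not the MINITAB procedure used in the study; the function names and the sample data are assumptions made for the example.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Illustrative data only: weekly volume rises while the reported rate falls.
weekly_volume = [40, 55, 70, 90]
reported_rate = [12.0, 9.5, 6.0, 3.5]
rs = spearman_rs(weekly_volume, reported_rate)
```

Because only the ordering of the values matters, the statistic does not require normally distributed data, which is the reason given above for choosing it.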
Results
The average participant in this study was a 46-year-old male physician who had been in practice for 17 years and saw 61.5 patients per week (Table 1). Eight physicians were located in group practices, while 5 were in multispecialty and 3 were in academic practices. The mean reported rate of dyspepsia visits was 7.7 cases per 1000 patient visits. Initial dyspepsia visits accounted for 118 of the 231 reported visits for dyspepsia (0.51%), with a total of 45,337 patient visits recorded by participating physicians.
The average participant recorded visits over 43.2 weeks of the possible 53-week study (81.5% overall participation rate). The average self-reported compliance with the study protocol was 6.7 on a 10-point scale but with a very wide range (from 1 to 10). Significant differences among practice types were found in patient volume, reported rate of dyspepsia visits, and self-reported compliance (Table 2). Participants from group practices had the highest patient volumes but the lowest rate of dyspepsia visits and compliance. Academic physicians saw the fewest patients but had the highest reported rate of dyspepsia visits and compliance.
Significant negative rank correlations were found to exist between patient volume and reported rate of dyspepsia visits (Figure 1: rs = -0.548; P < .05) and between patient volume and compliance with protocol (Figure 2: rs = -0.490; P < .05). A significant positive rank correlation was found between compliance with protocol and rate of dyspepsia visits (Figure 3: rs = 0.551; P < .05). No significant correlation existed between the number of weeks of participation and patient volume (rs = -0.303), rate of dyspepsia visits (rs = 0.065), or compliance with protocol (rs = 0.415).
Discussion
Practice volume can have a significant effect on physicians’ reporting rates in practice-based studies. The rate of dyspepsia visits, as measured by the identification of patients meeting study criteria and having a completed data form, was negatively related to the number of patients seen per week by the physician. Practice volume appears to be linked to reporting by way of compliance. As an extension, it appears that physicians are generally accurate in self-assessment of their compliance with a protocol.
Although previous evaluations of PBRNs have demonstrated high levels of accuracy within reported data,10 the results reported here are somewhat disturbing. If other studies show similar results, the idea that PBRNs can assess prevalence of medical conditions could be called into question. Also, there may be a bias in the higher-volume practices for patients with more severe symptoms to be reported in preference to those with less “attention getting” symptoms, or in low-volume practices to seek out problems for which the patient did not seek attention. Consequently, even when a medical problem is identified, there may be patient selection bias toward those with more or less severe symptoms.
Additional burden and lack of practice support were common reasons for withdrawing from participation in PBRNs.11 Overall participation and compliance with a research protocol, therefore, is likely related to the complexity of that protocol. While the reported rate of dyspepsia visits was negatively related to practice volume, the simple reporting of a weekly tally of patients seen in clinic was not. Consequently, compliance-sensitive measurements (eg, prevalence) may need simple time-efficient protocols. For example, full compliance with the protocol for the approximately 1050 physicians currently involved in the Centers for Disease Control and Prevention US Influenza Sentinel Physician Surveillance Network requires less than 3 minutes per week. This surveillance network for monitoring prevalence of influenza-like illness is a highly accurate, timely, and valued component of influenza surveillance.12 Other enhancements for study protocols may include decreased periods for data gathering, use of intermittent reporting, and use of other office staff for case identification.
Limitations
This study is limited by a potential lack of generalizability. It is an observational study of physician behavior around a complex and relatively high-burden data collection instrument. There were no true standards regarding prevalence of dyspepsia at any location, thus allowing for the possibility that patient populations differed significantly among sites. Self-reported compliance with the research protocol was based on recall 4 months after the end of the data collection period. Also, some of the effect attributable to patient volume could alternatively result from the types of physicians involved in this study.
Academic physicians, with low practice volumes, may be more likely to be compliant with research protocols in general, regardless of their practice volumes. Because of the small sample size, however, this alternate hypothesis cannot be examined independently. With the exclusion of the academic physicians, relationships between the variables demonstrated the same trends, but the Spearman rank correlations were no longer significant (n = 14; patient volume vs rate: rs = -0.345; patient volume vs compliance: rs = -0.187; compliance vs rate: rs = 0.379).
This study does, however, challenge other investigators using PBRNs to revisit suitable data to determine similar patterns. Also, a simple assessment of participant compliance might prove to be an essential enhancement of future practice-based research.
Conclusions
Even encumbered with potential methodologic dilemmas, practice-based research studies may be the only way to approach many common medical issues in the context of the communities in which they occur.1-3 For example, while selection bias in reporting of dyspepsia is clearly a problem in this example, the selection bias is still far less severe than it would be in the gastrointestinal specialty clinic of a referral center. Likewise, if nonreferred conditions are to be tracked over extensive periods of time, the use of community settings is essential, as was done with a recent longitudinal study of depression.13
Acknowledgments
Funding for this study was provided through a grant from the American Academy of Family Physicians. We thank the following participants of the WReN Practice-Based Research Group: R. Baldwin, E. Barr, D. Baumgardner, A. Berlage, M. Chin, D. Erickson, R. Erickson, G. Gay, M. Grajewski, D. Hahn, T. Hankey, D. Madlon-Kay, A. Marquis, E. Ott, D. Pine, and L. Radant.
1. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-88.
2. Nutting PA. Practice-based research networks: building the infrastructure of primary care research. J Fam Pract 1996;42:199-203.
3. Nutting PA, Green LA. Practice-based research networks: reuniting practice and research around the problems most of the people have most of the time. J Fam Pract 1994;38:335-36.
4. Nutting PA, Baier M, Werner JJ, Cutter G, Reed FM, Orzano J. Practice patterns of family physicians in practice-based research networks: a report from ASPN. J Am Board Fam Pract 1999;12:78-84.
5. Green LA, Miller RS, Reed FM, Iverson DC, Barley GE. How representative of typical practice are practice-based research networks? A report from the Ambulatory Sentinel Practice Network Inc (ASPN). Arch Fam Med 1993;2:939-49.
6. Hahn DL, Beasley JW. Diagnosed and possible undiagnosed asthma: a Wisconsin Research Network (WReN) study. J Fam Pract 1994;38:373-79.
7. Hahn DL. Physician opportunity costs for performing practice-based research. J Fam Pract 2000;49:983-84.
8. Zyzanski SJ, Stange KC, Langa D, Flocke SA. Trade-offs in high-volume primary care practice. J Fam Pract 1998;46:397-402.
9. Temte JL, Hankey T. Initial management of dyspepsia in primary care settings: the WReN practice-based research group dyspepsia study. Wis Med J 1998;97:48-49.
10. Green LA, Hames CG, Sr, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994;38:400-06.
11. Green LA, Niebauer LJ, Miller RS, Lutz LJ. An analysis of reasons for discontinuing participation in a practice-based research network. Fam Med 1991;23:447-49.
12. Buffington J, Chapman LE, Schmeltz LM, Kendal AP. Do family physicians make good sentinels for influenza? Arch Fam Med 1993;2:859-64.
13. van Weel-Baumgarten E, van den Bosch W, van den Hoogen H, Zitman FG. Ten-year follow-up of depression after diagnosis in general practice. Br J Gen Pract 1998;48:1643-46.
Participation and Successful Patient Recruitment in a Randomized Clinical Trial of Dyspepsia Treatment in Primary Care
STUDY DESIGN: A survey was used.
POPULATION: A total of 165 FPs participated in a combined randomized clinical trial/cohort study on drug treatment of dyspepsia in the Netherlands.
OUTCOMES MEASURED: We surveyed FPs about personal and practice characteristics and their motivation for participation in the project. These data were then related to the number of patients recruited. Univariate associations were calculated; relevant factors were entered into a logistic model that predicted patient recruitment.
RESULTS: Data on 128 FPs could be analyzed (80% response rate); these FPs recruited 793 patients in the cohort study (mean = 6.3 per FP) and 527 in the clinical trial (mean = 4.2 per FP). The main reasons for participation were the research topic (59%) and the participation of an academic research group in the study (63%). Many FPs felt that participation was a professional obligation (39%); the financial incentive played a minor role (15%). The number of recruited patients was only independently associated with the participation of an academic research group.
CONCLUSIONS: Successful patient recruitment in primary care research is determined more by motivation driven by the research group than by financial incentives, the research topic, or research experience.
Research in primary care is a growing field; the need for research on new diagnostic and therapeutic methods, the prognostic value of clinical signs and symptoms, and the effectiveness of clinical strategies in the population where these clinical contributions will be applied is well recognized.1-5 The number and scale of research projects in primary care is expanding continuously, and research networks are being established.6-8 This development, however, is putting increasing pressure on family physicians (FP) to actively participate in research. From an FP’s perspective there is a delicate balance between active research participation and efficient clinical practice.9 Proper planning to minimize paperwork and the delegation of research logistics to practice assistants can avoid many practical obstacles.7,10,11
Motivating and recruiting FPs to participate is an essential step in conducting primary care research. Factors such as the clinical relevance of the research subject, personal interest in the topic, ownership, personal contact with the research group, and the required time investment have been shown to influence the participation of FPs.5,11-15
Earlier studies have shown that higher-qualified physicians (with more continuing medical education training or a degree in research), physicians involved in under- or postgraduate training, FPs with research experience, and physicians working in well-organized practices (with more practice assistants and management protocols) are more interested in participating in research. Sex, the number of physicians per practice, and wages do not influence participation.15 The actual impact of financial incentives, however, is unclear. One study reported that incentives raised participation,16 while others could not confirm this.12,15 Some have also drawn attention to the ethical and methodologic aspects of payment for research.5,17
After successfully contracting with FPs to recruit patients for the study, the next task for a research group is to attain the maximal recruitment of patients. Unfortunately, less than half of the participating physicians actually recruit patients in the research project,9,12,18 and often only a minority (20%)18 of the eligible patients are actually included. Our objective in this study was to identify practice and physician characteristics determining successful patient recruitment among FPs participating in a combined randomized clinical trial/cohort study on drug treatment of dyspepsia in the Netherlands.
Methods
Setting
Our data were compiled from a practice-based research network (PBRN) used for a primary care study of dyspepsia, the CIRANO study (Cisapride or Ranitidine in NonOrganic dyspepsia), which was conducted from 1996-1998 in the Netherlands. The CIRANO project consisted of 2 parts: a cohort study, in which dyspeptic patients were included and followed up for 1 year, and a randomized clinical trial in which patients selected from the cohort were treated with either an H2 blocker or a prokinetic drug.19 It was designed by the dyspepsia group of the Julius Centre, a group of primary care researchers who have been involved in gastrointestinal research and guideline development in the Netherlands for a number of years. Janssen Cilag Inc. was the sole sponsor of the project. The protocol was approved by the academic ethical committee, and data monitoring was done in accordance with the GCP guidelines (good clinical practice, a government protocol for conducting drug trials).
In the cohort study the workload for the participating physicians was small. After identification and inclusion by the FP, patients had to complete a validated dyspepsia symptom score, a quality of life questionnaire (COOP/WONCA chart), and a mental health state check list (GHQ 12). Also, the practice assistant performed a Helicobacter pylori whole-blood test. Follow-up of the patient was done by the research group. The FP workload in the clinical trial was heavier. After inclusion, the patients were randomized to one of the treatment arms, treated for 4 weeks, and followed up for 3 months. Patients were seen by the FP at inclusion and after 1 and 3 months.
Monitoring, data recording, and verification were performed by a clinical research organization under supervision of the research group. Data were stored and analyzed by the group.
FPs were given a financial incentive for each patient they included. It comprised a reimbursement for the extra practice time spent completing the research protocol. The estimated overall time investment for FPs was 2 hours with an additional 5 minutes for each patient included in the cohort and 1 additional hour for each patient included in the randomized clinical trial. Since the workload differed significantly, the reimbursement was higher for the patients included in the clinical trial ($25 per patient in the cohort, $70 for each patient in the randomized clinical trial). Also, project bulletins were distributed regularly during the course of the project.
All FPs in the academic network of the Utrecht University (2000 FPs, one third of all Dutch FPs) were invited by the academic research group to participate in the CIRANO study. Two hundred of them expressed interest in the study and asked for documentation. A total of 165 FPs finally signed the research contract.
Questionnaire
Five months after the CIRANO project was completed, all the participating physicians were sent an anonymous questionnaire containing 4 sections: (1) demographic and practice data, (2) initial motivation to participate, (3) evaluation of the logistics of the project, and (4) motivation to participate in future projects. The evaluation questions were Likert-type (a scale of 4 answer categories); motivation was analyzed by asking respondents to indicate the 3 most important reasons from a list of 8 (Table 1). A reminder was sent 1 month after the first mailing. Data from the questionnaire, as well as the number of patients included in the study, were entered in SPSS for Windows software version 8.5 for analysis (SPSS, Inc; Chicago, Ill). Questionnaires that were not fully completed were excluded.
Analysis
To calculate odds ratios the number of recruited patients per FP was dichotomized, with cut-off points at the 25th percentile of inclusion: 0 to 4 versus 5 or more patients for inclusion in the cohort study, and 0 or 1 patient versus 2 or more for inclusion in the clinical trial. The association between demographic data, motivating factors and the number of recruited patients was calculated and expressed as an odds ratio (with 95% confidence interval [CI]). Factors that were thus associated with recruitment at a P level of less than 0.25 (so as not to exclude potentially important variables), together with 7 factors known from the literature to be relevant (sex, list size, years in practice, practice location, research experience, high specialization, and financial incentive-driven motivation) were entered in a logistic regression model.
Using a stepwise-backward procedure, determinants of maximal inclusion were analyzed and reported as adjusted odds ratio (with 95% CI). As the workload and the financial incentive for the clinical trial and cohort study differed significantly, we analyzed inclusion in the 2 parts of the research project separately.
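The univariate step described above, dichotomizing recruitment and expressing an association as an odds ratio with a 95% CI, can be sketched as follows. The text does not state how the intervals were computed; this sketch assumes the common log-odds (Woolf) approximation, and the function names and cell counts are hypothetical. It covers unadjusted odds ratios only, not the stepwise logistic regression.

```python
from math import exp, log, sqrt
from typing import Tuple


def is_high_recruiter(patients_recruited: int, cutoff: int) -> bool:
    """Dichotomize recruitment: True when the FP recruited at least `cutoff` patients."""
    return patients_recruited >= cutoff


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and approximate 95% CI from a 2x2 table.

    a: factor present, high recruiter    b: factor present, low recruiter
    c: factor absent, high recruiter     d: factor absent, low recruiter
    All four cells must be nonzero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only; they are not taken from the CIRANO data.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
```

With the cohort cut-off described above, `is_high_recruiter(n, cutoff=5)` separates FPs who recruited 0 to 4 patients from those who recruited 5 or more; the clinical trial would use `cutoff=2`.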
Results
Of the 165 participating FPs, 132 returned the questionnaire; the response rate was 80%. Since 4 of the questionnaires were incomplete, data from 128 family physicians could be used for analysis. Most responders were men (87%), and half had been in practice for more than 5 years, primarily in semi-urban areas of the Netherlands (68%). Most (77%) were involved in other professional activities, such as vocational training, continuing medical education (CME), or activities of the College of Family Physicians (CFP) at the district level. Half the responders worked in a group practice, and more than 60% of the practices were “specialized” (defined as performing at least 4 of the following clinical activities in routine daily practice: minor surgery, Doppler studies, electrocardiography, cervical screening, intrauterine device insertion, diabetes protocol, or spirometry). Also, 57% of the participants had previous research experience.
The initial motivation for participation in the project varied (Table 1). For the majority of the participants, the research topic and the participation of our academic research group were the most important factors. One third of the respondents considered participation a professional obligation, were attracted by the personal appeal by the research group, or were intrigued by the presentation of the project. The involvement of the sponsor and the financial incentive were important for only a minority in their decision to participate.
In general, the project was well evaluated; 80% of the respondents stated that the project had met their expectations (56% fully, 24% partially), and 60% noted that they would consider participation in a similar research project in the future. More specifically, the participants were satisfied with the quality of the correspondence, the newsletter, and the monitoring of the project (Table 2). However, 47% of the participating FPs thought that the overall time investment in the project was too burdensome, and one third mentioned the negative impact the application of the GCP guidelines had on the workload of the project.
From September 1996 until January 1998 these 128 physicians recruited 793 patients in the cohort phase of the study (average = 6.3 per FP; standard deviation [SD]=6.6) and 527 in the clinical trial (average = 4.2 per FP; SD=4.9). A total of 15% of the FPs recruited no patients in the cohort study, while 59% recruited 4 or more patients. In comparison, 21% of the FPs did not recruit any patients in the clinical trial, and 65% recruited 2 or more patients.
In the univariate analysis only the factors “active in CME/CFP” and “motivation by the academic research group” were associated with the number of recruited patients. These associations occurred in both the cohort study and the randomized clinical trial (Table 3). After these 2 factors were entered in a logistic model, together with 7 factors earlier reported as relevant in the literature, multivariate analysis indicated that only the factor “motivation by the participation of the academic research group” predicted the number of patients recruited in the cohort study (adjusted odds ratio [OR] = 3.5; 95% CI, 1.4-9.0) and the clinical trial (adjusted OR = 2.9; 95% CI, 1.2-6.9).
Discussion
The combination of participation in research and daily clinical practice requires a major time investment. Even though FPs consider their participation thoroughly before a research project commences, the actual numbers of patients they recruit are often disappointing.
In this large randomized trial on dyspepsia in primary care we showed that those FPs whose motivation was driven by the participation of an academic department of family medicine in the research group recruited the most patients. They were 3 times more likely than their colleagues to recruit at least 2 patients for the clinical trial or 5 patients for the cohort. Other factors such as list (practice) size, involvement in professional CME/CFP activities, research experience, and financial incentive may have played a role in the FPs’ decision to participate in the research project but were not associated with the actual number of patients they recruited.
As in most projects, a minority of the participating physicians did not manage to recruit any patients. These colleagues either underestimated the time investment required, had second thoughts about the acceptability for their patients, or were disappointed by the planning and paperwork of the project.
Factors determining patient recruitment in primary care research have hardly been studied. Busy schedules, forgetfulness, poor patient compliance, and FP involvement in too many projects are a few of the reasons given to explain poor recruitment.9,18
One third of all Dutch FPs were invited to participate in the dyspepsia study. Even though bias might have been caused by either the subject (dyspepsia physicians), the fact that half of the FPs had experience in research, or the fact that the participants were generally very active in numerous professional activities (active physicians), we think the results of our study can be generalized to the primary care setting in the Netherlands. Our conclusions may, however, require modification in other countries because of differences in practice organization and research climate in primary care.
Although there were various reasons for participation in the CIRANO study, they are consistent with earlier reports. FPs who participated in our dyspepsia project were mainly motivated by the subject and by the fact that the project was affiliated with our academic primary care research group. The motivation was not a matter of personal acquaintance, since most of the participants were not known to the members of the research group.
A substantial number of the participating colleagues also felt that participation was a professional obligation. This perception might have been induced by the fact that during the introduction of the project, special emphasis was put on the evidence missing from certain paragraphs of the Dutch guidelines on dyspepsia and on the need for primary care-based research to fill this gap. Although the research group felt that this was an important aspect of motivation, the evaluation showed that while it was an important reason to participate, it was not independently associated with patient recruitment. Only 10% of the participants stated that the financial incentive was a major reason to participate. Although this could be an unrealistic subjective statement, multivariate analysis confirmed that incentive-driven motivation was not related to the number of patients recruited. The fact that the results were the same for both the cohort study and the clinical trial might also be an indication that the amount of the incentive played a minor role in patient recruitment. This confirms earlier reports12,13 that FPs probably do not participate in research for the money, although they do want a proper reimbursement for the time invested.
The fact that the majority of our study group were involved in professional education or organization confirms once again15 that active colleagues are the ones most motivated for research. Interest in research, however, obviously does not guarantee successful inclusion. Although a high level of practice organization and a high specialization in clinical activities have also reportedly been associated with optimal recruitment, we could not confirm this with our data.
Conclusions
Collaborators for primary care research projects should primarily be sought among the colleagues who are already active in different professional fields, and who have a strong affiliation with academic research. Successful participation is mainly determined by the initial motivation of the FP: Those who are motivated by the presence of an academic research group in the study recruit best. The research topic, the amount of the financial incentive, research experience, and other factors often suggested to influence patient recruitment are probably less important.
1. Gray DP. Research in general practice: law of inverse opportunity. BMJ 1991;302:1380-82.
2. Mold JW, Green AL. Primary care research: revisiting its definition and rationale. J Fam Pract 2000;49:206-08.
3. Wallace P, Drage S, Jackson N. Linking education, research and service in general practice. BMJ 1998;316:323.
4. Olesen F. Research in general practice is needed to develop family medicine, not get embroiled in defining it. BMJ 1998;316:324.
5. Foy R, Parry J, McAvoy B. Clinical trials in primary care. BMJ 1998;317:1168-69.
6. Smith LFP, Carter YH, Cox J. Accrediting research practices. Br J Gen Pract 1998;48(433):1464-65.
7. Bell Seyer SEM, Klaber Moffett JA. Recruiting patients to randomized trials in primary care; principles and case study. Fam Pract 2000;17:187-91.
8. Smith LFP. Research in general practice: what, who and why? Br J Gen Pract 1997;47:83-86.
9. Tognoni G, Alli C, Avanzini F, et al. Randomized clinical trials in general practice: lessons from a failure. BMJ 1991;303:969-71.
10. Murphy E, Spiegal N, Kinmonth A. Will you help me with my research? Gaining access to primary care settings and subjects. Br J Gen Pract 1992;42:162-65.
11. Ward J. General practitioners’ experience of research. Fam Pract 1994;11:418-23.
12. Kuyvenhoven MM, Dagnelie CF, de Melker RA. Recruitment of general practitioners in a sore throat study. Br J Gen Pract 1997;47:126-27.
13. Kocken RJJ, Prenger-Duchateau A, Smeets-Rinkens PELM, Knottnerus JA. Het oordeel van huisartsen over deelname aan wetenschappelijk onderzoek. Huisarts Wet 1992;35:32-34.
14. Borgiel AEM, Dunn EV, Lamont CT, et al. Recruiting family physicians as participants in research. Fam Pract 1989;6:168-72.
15. Silagy SA, Carson NE. Factors affecting the level of interest and activity in primary care research among general practitioners. Fam Pract 1989;6:173-76.
16. Deehan A, Templeton L, Taylor C, Drummond C, Strang J. The effect of cash and other financial inducements on the response rate of general practitioners in a national postal survey. Br J Gen Pract 1997;46:87-90.
17. Ferguson C. Payment of financial incentives to GP’s may invalidate informed consent process. BMJ 1998;316:75-76.
18. Peto V, Coulter A, Bond A. Factors affecting general practitioners’ recruitment of patients into a prospective study. Fam Pract 1993;10:207-11.
19. Quartero AO, Numans ME, de Melker RA, Hoes AW, de Wit NJ. Dyspepsia in primary care; prokinetic therapy or acid suppression, a randomized clinical trial. Scand J Gastroenterol 2001. In press.
STUDY DESIGN: A survey was used.
POPULATION: A total of 165 FPs participated in a combined randomized clinical trial/cohort study on drug treatment of dyspepsia in the Netherlands.
OUTCOMES MEASURED: We surveyed FPs about personal and practice characteristics and their motivation for participation in the project. These data were then related to the number of patients recruited. Univariate associations were calculated; relevant factors were entered into a logistic model that predicted patient recruitment.
RESULTS: Data on 128 FPs could be analyzed (80% response rate); these FPs recruited 793 patients in the cohort study (mean = 6.3 per FP) and 527 in the clinical trial (mean = 4.2 per FP). The main reasons for participation were the research topic (59%) and the participation of an academic research group in the study (63%). Many FPs felt that participation was a professional obligation (39%); the financial incentive played a minor role (15%). The number of recruited patients was independently associated only with the participation of an academic research group.
CONCLUSIONS: Successful patient recruitment in primary care research is determined more by motivation driven by the research group than by financial incentives, the research topic, or research experience.
Research in primary care is a growing field; the need for research on new diagnostic and therapeutic methods, the prognostic value of clinical signs and symptoms, and the effectiveness of clinical strategies in the population where these clinical contributions will be applied is well recognized.1-5 The number and scale of research projects in primary care are expanding continuously, and research networks are being established.6-8 This development, however, is putting increasing pressure on family physicians (FPs) to actively participate in research. From an FP’s perspective there is a delicate balance between active research participation and efficient clinical practice.9 Proper planning to minimize paperwork and the delegation of research logistics to practice assistants can avoid many practical obstacles.7,10,11
Motivating and recruiting FPs to participate is an essential step in conducting primary care research. Factors such as the clinical relevance of the research subject, personal interest in the topic, ownership, personal contact with the research group, and the required time investment have been shown to influence the participation of FPs.5,11-15
Earlier studies have shown that higher-qualified physicians (with more continuing medical education training or a degree in research), physicians involved in undergraduate or postgraduate training, FPs with research experience, and physicians working in well-organized practices (with more practice assistants and management protocols) are more interested in participating in research. Sex, the number of physicians per practice, and wages do not influence participation.15 The actual impact of financial incentives, however, is unclear. One study reported that incentives raised participation,16 while others could not confirm this.12,15 Some have also drawn attention to the ethical and methodologic aspects of payment for research.5,17
After successfully contracting with FPs to recruit patients for the study, the next task for a research group is to attain the maximal recruitment of patients. Unfortunately, less than half of the participating physicians actually recruit patients in the research project,9,12,18 and often only a minority (20%)18 of the eligible patients are actually included. Our objective in this study was to identify practice and physician characteristics determining successful patient recruitment among FPs participating in a combined randomized clinical trial/cohort study on drug treatment of dyspepsia in the Netherlands.
Methods
Setting
Our data were compiled from a practice-based research network (PBRN) used for a primary care study of dyspepsia, the CIRANO study (Cisapride or Ranitidine in NonOrganic dyspepsia), which was conducted from 1996 to 1998 in the Netherlands. The CIRANO project consisted of 2 parts: a cohort study, in which dyspeptic patients were included and followed up for 1 year, and a randomized clinical trial in which patients selected from the cohort were treated with either an H2 blocker or a prokinetic drug.19 It was designed by the dyspepsia group of the Julius Centre, a group of primary care researchers who have been involved in gastrointestinal research and guideline development in the Netherlands for a number of years. Janssen Cilag Inc. was the sole sponsor of the project. The protocol was approved by the academic ethical committee, and data monitoring was done in accordance with the GCP guidelines (good clinical practice, a government protocol for conducting drug trials).
In the cohort study the workload for the participating physicians was small. After identification and inclusion by the FP, patients had to complete a validated dyspepsia symptom score, a quality of life questionnaire (COOP/WONCA chart), and a mental health state check list (GHQ-12). Also, the practice assistant performed a Helicobacter pylori whole-blood test. Follow-up of the patient was done by the research group. The FP workload in the clinical trial was heavier. After inclusion, the patients were randomized to one of the treatment arms, treated for 4 weeks, and followed up for 3 months. Patients were seen by the FP at inclusion and after 1 and 3 months.
Monitoring, data recording, and verification were performed by a clinical research organization under supervision of the research group. Data were stored and analyzed by the group.
FPs were given a financial incentive for each patient they included. It comprised a reimbursement for the extra practice time spent completing the research protocol. The estimated overall time investment for FPs was 2 hours with an additional 5 minutes for each patient included in the cohort and 1 additional hour for each patient included in the randomized clinical trial. Since the workload differed significantly, the reimbursement was higher for the patients included in the clinical trial ($25 per patient in the cohort, $70 for each patient in the randomized clinical trial). Also, project bulletins were distributed regularly during the course of the project.
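The time and reimbursement estimates above reduce to simple arithmetic. The sketch below is illustrative only: the function names and the example FP (6 cohort patients, 4 trial patients) are hypothetical, and it assumes that the cohort and trial amounts are simply added for whatever patient counts are passed in, which the text does not spell out.

```python
# Figures are taken from the text; how they combine is an assumption.
BASE_MINUTES = 120     # estimated overall time investment (2 hours)
COHORT_MINUTES = 5     # additional minutes per cohort patient
TRIAL_MINUTES = 60     # additional minutes per clinical trial patient
COHORT_FEE = 25        # dollars per cohort patient
TRIAL_FEE = 70         # dollars per clinical trial patient


def estimated_minutes(n_cohort, n_trial):
    """Estimated FP time in minutes for the given patient counts."""
    return BASE_MINUTES + COHORT_MINUTES * n_cohort + TRIAL_MINUTES * n_trial


def reimbursement(n_cohort, n_trial):
    """Reimbursement in dollars for the given patient counts."""
    return COHORT_FEE * n_cohort + TRIAL_FEE * n_trial


# Hypothetical FP: 6 cohort patients and 4 clinical trial patients.
minutes = estimated_minutes(6, 4)   # 120 + 30 + 240 = 390
dollars = reimbursement(6, 4)       # 150 + 280 = 430
print(f"{minutes / 60:.1f} hours, ${dollars}")  # prints "6.5 hours, $430"
```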
All FPs in the academic network of Utrecht University (2000 FPs, one third of all Dutch FPs) were invited by the academic research group to participate in the CIRANO study. Two hundred of them expressed interest in the study and asked for documentation. A total of 165 FPs finally signed the research contract.
Questionnaire
Five months after the CIRANO project was completed, all the participating physicians were sent an anonymous questionnaire containing 4 sections: (1) demographic and practice data, (2) initial motivation to participate, (3) evaluation of the logistics of the project, and (4) motivation to participate in future projects. The evaluation questions were Likert-type (a scale of 4 answer categories); motivation was analyzed by asking respondents to indicate the 3 most important reasons from a list of 8 (Table 1). A reminder was sent 1 month after the first mailing. Data from the questionnaire, as well as the number of patients included in the study, were entered in SPSS for Windows software version 8.5 for analysis (SPSS, Inc; Chicago, Ill). Questionnaires that were not fully completed were excluded.
Analysis
To calculate odds ratios, the number of recruited patients per FP was dichotomized, with cut-off points at the 25th percentile of inclusion: 0 to 4 versus 5 or more patients for inclusion in the cohort study, and 0 or 1 patient versus 2 or more for inclusion in the clinical trial. The association between demographic data, motivating factors, and the number of recruited patients was calculated and expressed as an odds ratio (with 95% confidence interval [CI]). Factors that were thus associated with recruitment at a P level of less than 0.25 (so as not to exclude potentially important variables), together with 7 factors known from the literature to be relevant (sex, list size, years in practice, practice location, research experience, high specialization, and financial incentive-driven motivation), were entered in a logistic regression model.
Using a stepwise-backward procedure, determinants of maximal inclusion were analyzed and reported as adjusted odds ratio (with 95% CI). As the workload and the financial incentive for the clinical trial and cohort study differed significantly, we analyzed inclusion in the 2 parts of the research project separately.
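The univariate step described above can be sketched as follows. This is a minimal illustration, not the study's SPSS procedure: the function names are hypothetical, the 2x2 counts are invented for the example and are not study data, and the confidence interval uses the standard log-scale (Woolf) approximation, which is an assumption about how such intervals are typically obtained.

```python
import math


def dichotomize(counts, cutoff):
    """Flag each FP as a high recruiter (True) if they recruited >= cutoff patients."""
    return [c >= cutoff for c in counts]


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a log-scale (Woolf) confidence interval.

    a: factor present, high recruiter   b: factor present, low recruiter
    c: factor absent, high recruiter    d: factor absent, low recruiter
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Cohort cut-off from the text: 0 to 4 versus 5 or more patients.
flags = dichotomize([0, 4, 5, 9], cutoff=5)

# Hypothetical 2x2 counts, NOT taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(a=30, b=10, c=20, d=20)
print(flags)
print(f"OR = {odds_ratio:.1f} (95% CI, {lower:.1f}-{upper:.1f})")
```

An unadjusted odds ratio of this kind only screens candidate factors; the adjusted odds ratios reported in the Results come from the logistic regression model, which this sketch does not reproduce.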
Results
Of the 165 participating FPs, 132 returned the questionnaire; the response rate was 80%. Since 4 of the questionnaires were incomplete, data from 128 family physicians could be used for analysis. Most responders were men (87%), and half had been in practice for more than 5 years, primarily in semi-urban areas of the Netherlands (68%). Most (77%) were involved in other professional activities, such as vocational training, continuing medical education (CME), or activities of the College of Family Physicians (CFP) at the district level. Half the responders worked in a group practice, and more than 60% of the practices were “specialized” (defined as performing at least 4 of the following clinical activities in routine daily practice: minor surgery, Doppler studies, electrocardiography, cervical screening, intrauterine device insertion, diabetes protocol, or spirometry). Also, 57% of the participants had previous research experience.
The initial motivation for participation in the project varied (Table 1). For the majority of the participants, the research topic and the participation of our academic research group were the most important factors. One third of the respondents considered participation a professional obligation, were attracted by the personal appeal by the research group, or were intrigued by the presentation of the project. The involvement of the sponsor and the financial incentive were important for only a minority in their decision to participate.
In general, the project was well evaluated; 80% of the respondents stated that the project had met their expectations (56% fully, 24% partially), and 60% noted that they would consider participation in a similar research project in the future. More specifically, the participants were satisfied with the quality of the correspondence, the newsletter, and the monitoring of the project (Table 2). However, 47% of the participating FPs thought that the overall time investment in the project was too burdensome, and one third mentioned the negative impact the application of the GCP guidelines had on the workload of the project.
From September 1996 until January 1998 these 128 physicians recruited 793 patients in the cohort phase of the study (average = 6.3 per FP; standard deviation [SD]=6.6) and 527 in the clinical trial (average = 4.2 per FP; SD=4.9). A total of 15% of the FPs recruited no patients in the cohort study, while 59% recruited 4 or more patients. In comparison, 21% of the FPs did not recruit any patients in the clinical trial, and 65% recruited 2 or more patients.
In the univariate analysis only the factors “active in CME/CFP” and “motivation by the academic research group” were associated with the number of recruited patients. These associations occurred in both the cohort study and the randomized clinical trial (Table 3). After these 2 factors were entered in a logistic model, together with 7 factors earlier reported as relevant in the literature, multivariate analysis indicated that only the factor “motivation by the participation of the academic research group” predicted the number of patients recruited in the cohort study (adjusted odds ratio [OR] = 3.5; 95% CI, 1.4-9.0) and in the clinical trial (adjusted OR = 2.9; 95% CI, 1.2-6.9).
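The reported intervals can be read on the log scale. Assuming they are Wald-type intervals (symmetric around the log odds ratio, which is an assumption here rather than something the text states), the geometric mean of the two bounds should fall close to the point estimate, and the width gives the implied standard error of the log odds ratio. The helper name below is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Geometric midpoint and implied log-scale standard error of a CI."""
    midpoint = math.sqrt(lower * upper)
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return midpoint, se_log


# Bounds as reported in the Results.
cohort_mid, cohort_se = log_scale_summary(1.4, 9.0)
trial_mid, trial_se = log_scale_summary(1.2, 6.9)

print(f"cohort: midpoint {cohort_mid:.2f}, SE(log OR) {cohort_se:.2f}")
print(f"trial:  midpoint {trial_mid:.2f}, SE(log OR) {trial_se:.2f}")
```

Under that assumption, both midpoints land within rounding distance of the reported adjusted odds ratios (3.5 and 2.9), so the published estimates and intervals are internally consistent.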
Discussion
The combination of participation in research and daily clinical practice requires a major time investment. Even though FPs consider their participation thoroughly before a research project commences, the actual numbers of patients they recruit are often disappointing.
In this large randomized trial on dyspepsia in primary care we showed that those FPs whose motivation was driven by the participation of an academic department of family medicine in the research group recruited the most patients. They were 3 times more likely than their colleagues to recruit at least 2 patients for the clinical trial or 5 patients for the cohort. Other factors such as list (practice) size, involvement in professional CME/CFP activities, research experience, and financial incentive may have played a role in the FPs’ decision to participate in the research project but were not associated with the actual number of patients they recruited.
As in most projects, a minority of the participating physicians did not manage to recruit any patients. These colleagues either underestimated the time investment required, had second thoughts about the acceptability for their patients, or were disappointed by the planning and paperwork of the project.
Factors determining patient recruitment in primary care research have hardly been studied. Busy schedules, forgetfulness, poor patient compliance, and FP involvement in too many projects are a few of the reasons given to explain poor recruitment.9,18
One third of all Dutch FPs were invited to participate in the dyspepsia study. Even though bias might have been caused by either the subject (dyspepsia physicians), the fact that half of the FPs had experience in research, or the fact that the participants were generally very active in numerous professional activities (active physicians), we think the results of our study can be generalized to the primary care setting in the Netherlands. Our conclusions may, however, require modification in other countries because of differences in practice organization and research climate in primary care.
Although there were various reasons for participation in the CIRANO study, they are consistent with earlier reports. FPs who participated in our dyspepsia project were mainly motivated by the subject and by the fact that the project was affiliated with our academic primary care research group. The motivation was not a matter of personal acquaintance, since most of the participants were not known to the members of the research group.
A substantial number of the participating colleagues also felt that participation was a professional obligation. This perception might have been induced by the fact that during the introduction of the project, special emphasis was put on the evidence missing from certain paragraphs of the Dutch guidelines on dyspepsia and on the need for primary care-based research to fill this gap. Although the research group felt that this was an important aspect of motivation, the evaluation showed that while it was an important reason to participate, it was not independently associated with patient recruitment. Only 10% of the participants stated that the financial incentive was a major reason to participate. Although this could be an unrealistic subjective statement, multivariate analysis confirmed that incentive-driven motivation was not related to the number of patients recruited. The fact that the results were the same for both the cohort study and the clinical trial might also be an indication that the amount of the incentive played a minor role in patient recruitment. This confirms earlier reports12,13 that FPs probably do not participate in research for the money, although they do want a proper reimbursement for the time invested.
The fact that the majority of our study group were involved in professional education or organization confirms once again15 that active colleagues are the ones most motivated for research. Interest in research, however, obviously does not guarantee successful inclusion. Although a high level of practice organization and a high specialization in clinical activities have also reportedly been associated with optimal recruitment, we could not confirm this with our data.
Conclusions
Collaborators for primary care research projects should primarily be sought among the colleagues who are already active in different professional fields, and who have a strong affiliation with academic research. Successful participation is mainly determined by the initial motivation of the FP: Those who are motivated by the presence of an academic research group in the study recruit best. The research topic, the amount of the financial incentive, research experience, and other factors often suggested to influence patient recruitment are probably less important.
The fact that the majority of our study group were involved in professional education or organization confirms once again15 that active colleagues are the ones most motivated for research. Interest in research, however, obviously does not guarantee successful inclusion. Although a high level of practice organization and a high specialization in clinical activities have also reportedly been associated with optimal recruitment, we could not confirm this with our data.
Conclusions
Collaborators for primary care research projects should primarily be sought among the colleagues who are already active in different professional fields, and who have a strong affiliation with academic research. Successful participation is mainly determined by the initial motivation of the FP: Those who are motivated by the presence of an academic research group in the study recruit best. The research topic, the amount of the financial incentive, research experience, and other factors often suggested to influence patient recruitment are probably less important.
1. Gray DP. Research in general practice: law of inverse opportunity. BMJ 1991;302:1380-82.
2. Mold JW, Green AL. Primary care research: revisiting its definition and rationale. J Fam Pract 2000;49:206-08.
3. Wallace P, Drage S, Jackson N. Linking education, research and service in general practice. BMJ 1998;316:323.
4. Olesen F. Research in general practice is needed to develop family medicine, not get embroiled in defining it. BMJ 1998;316:324.
5. Foy R, Parry J, McAvoy B. Clinical trials in primary care. BMJ 1998;317:1168-69.
6. Smith LFP, Carter YH, Cox J. Accrediting research practices. Br J Gen Pract 1998;48(433):1464-65.
7. Bell Seyer SEM, Klaber Moffett JA. Recruiting patients to randomized trials in primary care; principles and case study. Fam Pract 2000;17:187-91.
8. Smith LFP. Research in general practice: what, who and why? Br J Gen Pract 1997;47:83-86.
9. Tognoni G, Alli C, Avanzini F, et al. Randomized clinical trials in general practice: lessons from a failure. BMJ 1991;303:969-71.
10. Murphy E, Spiegal N, Kinmonth A. Will you help me with my research? Gaining access to primary care settings and subjects. Br J Gen Pract 1992;42:162-65.
11. Ward J. General practitioners’ experience of research. Fam Pract 1994;11:418-23.
12. Kuyvenhoven MM, Dagnelie CF, de Melker RA. Recruitment of general practitioners in a sore throat study. Br J Gen Pract 1997;47:126-27.
13. Kocken RJJ, Prenger-Duchateau A, Smeets-Rinkens PELM, Knottnerus JA. Het oordeel van huisartsen over deelname aan wetenschappelijk onderzoek. Huisarts Wet 1992;35:32-34.
14. Borgiel AEM, Dunn EV, Lamont CT, et al. Recruiting family physicians as participants in research. Fam Pract 1989;6:168-72.
15. Silagy SA, Carson NE. Factors affecting the level of interest and activity in primary care research among general practitioners. Fam Pract 1989;6:173-76.
16. Deehan A, Templeton L, Taylor C, Drummond C, Strang J. The effect of cash and other financial inducements on the response rate of general practitioners in a national postal survey. Br J Gen Pract 1997;46:87-90.
17. Ferguson C. Payment of financial incentives to GP’s may invalidate informed consent process. BMJ 1998;316:75-76.
18. Peto V, Coulter A, Bond A. Factors affecting general practitioners’ recruitment of patients into a prospective study. Fam Pract 1993;10:207-11.
19. Quartero AO, Numans ME, de Melker RA, Hoes AW, de Wit NJ. Dyspepsia in primary care; prokinetic therapy or acid suppression, a randomized clinical trial. Scand J Gastroenterol 2001. In press.
Understanding Practice from the Ground Up
METHODS: Eighteen practices were purposefully drawn from a random sample of Nebraska family practices that had earlier participated in a study of preventive service delivery. Each practice was studied intensely over a 4- to 12-week period using a comparative case study design that included extended direct observation of the practice environment and clinical encounters, formal and informal interviews of clinicians and staff, and medical record review.
DESIGN: This multimethod assessment process (MAP) provided insights into a wide range of practice activities ranging from descriptions of the organization and patient care activities to quantitative documentation of physician- and practice-level delivery of a variety of evidence-based preventive services. Initial insights guided subsequent data collection and analysis and led to the integration of complexity science concepts into the design. In response to the needs and wishes of the participants, practice meetings were initiated to provide feedback, resulting in a more collaborative model of practice-based research.
CONCLUSIONS: Our MAP provided rich data for describing multiple aspects of primary care practice, testing a priori hypotheses, discovering new insights grounded in the actual experience of practice participants, and fostering collaborative practice change.
Clinicians, researchers, and policymakers now recognize that multiple competing demands1 and opportunities2 are simultaneously affecting the physicians, staff, and patients within primary care practices. Our current understanding of outpatient practice is largely based on administrative databases, national surveys, and medical record reviews, with additional insights from surveys of patients or clinicians. These data generally are not designed to capture the richness of the content and context that is needed to better understand the realities and complexities of practice.3-6 The underlying premise of The Prevention and Competing Demands in Primary Care (P&CD) Study is that efforts to change practice should be preceded by efforts to understand it.2,7 The explicit goal of this study is to understand practice structure and process, including details of patients, physicians, staff, and clinical encounters; the practice as an organization; and its relationship to the larger community and health system.
In this paper we describe a dynamic observational multimethod assessment process (MAP) that can be used to understand the complex reality of primary care practice. MAP is based on a multimethod comparative case study design8,9 that integrates elements of epidemiology with methods derived from the qualitative traditions of anthropology and sociology and relies most heavily on qualitative observation and interviewing methods. Studies of this type require an iterative data collection and analysis approach that evolves over time so that new methods can be introduced as the investigators gain a better understanding of important issues. A major strength of our study design was that it allowed hypotheses and insights gained from participants and from ongoing analyses to be integrated into the ongoing investigation.
The study’s primary research questions related to how practice characteristics affect preventive service delivery. Thus, the research design included: (1) an examination of the organizational contexts that support preventive services, (2) an examination of the competing demands imposed by carrying out clinical prevention and illness care in clinical encounters and in the practice, (3) a comparison of the approaches used by practices with high versus low intensity of preventive services delivered to eligible patients, and (4) an examination of approaches used to deliver different types of preventive services. Although the particular focus was on preventive services, the rich MAP allowed pursuit of other research topics that are presented in this issue of JFP.
This article describes the evolutionary methods of the P&CD study, focusing on how data were collected to ensure that sufficient details were available to understand a practice’s values, structures, and processes.*
Emergent research design
The P&CD study was conceived in 1994 to be an in-depth follow-up of insights from the Direct Observation of Primary Care (DOPC) Study that was just getting under way in northeastern Ohio.5 The DOPC Study provided a largely quantitative assessment of patients, physicians, encounters, and practices using patient questionnaires, physician surveys, medical record audits, and direct observation of clinical encounters using the Davis Observation Code.10 That study’s initial findings were presented in the May 1998 theme issue of JFP, and the study processes have recently been described.11 Details of the DOPC methods have been published elsewhere.4,5,12
Although the initial design allowed the DOPC research nurses to collect brief observational notes, the intensity of the quantitative data collection limited the scope of the study’s qualitative data for understanding details of the practice’s organization and the competing demands within clinical encounters. As a consequence, the P&CD study was designed to provide more in-depth description and understanding of the competing demands of family practice, and in particular, to evaluate factors affecting preventive services delivery using a comparative case study design and a MAP.
A key feature of the P&CD study design was an openness to the integration of emerging insights into the data collection protocol. For example, preliminary analyses of the DOPC data13 and other ongoing studies14 led to the discovery that complexity science was valuable for explaining the dynamics of office systems6 and needed to be incorporated into the design. (Complexity science is the study of systems that are characterized by nonlinear dynamics and emergent properties; it emphasizes the need to understand the interrelationships of the whole system and not just collect data about the parts.15) The investigators also developed new ways to display the relationships among physicians and staff in the practices using “practice genograms.”16 The practice genogram is a diagram of the functional and interpersonal relationships among the clinicians, support staff, and other people and organizations interacting with the practice. Throughout the project and consistent with the standards of qualitative research design,17 there were continued modifications and enhancements in the data collection and analysis strategies in response to insights that were emerging from ongoing analyses and interpretation of the data.
An important feature of the project was the development of an advisory committee of consultants and co-investigators that convened annually to provide multidisciplinary input, review results, and provide feedback. The advisory committee included academic representatives with expertise in nursing, health education, women’s health, minority health, and public policy. Two additional members were added to the project to provide expertise in the study of organizations as complex systems. The annual reviews by the advisory committee led to significant changes in the research design while the study was ongoing.
Practice Sample
Beginning in late 1996, we drew from 91 practices in Nebraska that had been randomly selected to participate in an earlier study on tobacco prevention and cessation.18 Initially a sample of 10 practices was purposefully selected19,20 using an iterative process to represent a range in size (small and large), geographic location (urban, suburban, and rural), and rate of delivery of tobacco-related preventive services. Preliminary analyses of these 10 initial practices provided a summary of preventive health delivery strategies in primary care practices and a description of competing demands that enhanced or limited these strategies. To confirm or refute the emerging insights from the original 10 practices, 8 additional practices were selected for further data collection during the second and third years of the study. The sampling strategy in years 2 and 3 ensured that at least 2 practices each from several major regional hospital health systems were included and allowed us to assess emerging hypotheses about the importance of health system context for understanding community practices.
The practices were recruited by contacting one of the physicians to solicit participation; only those in which all family physicians in the practice agreed to participate were included in the study. Twenty-three practices were contacted; all physicians in 18 agreed to participate (78%).
Core Data Collection Methods
Data were collected by trained field researchers who spent 4 weeks or more taking notes at each practice while observing the practice and clinical encounters, conducting informal key informant interviews of staff, collecting office documents, and auditing charts of patients whose encounters were observed. Within each practice, data collection occurred in stages, with a short break after the initial week or 2 of observation to allow preliminary analyses to inform additional data collection.
Observations at the practice level were recorded in a combination of structured observational checklists, unstructured dictated field notes,21-23 and key informant interviews.24 Detailed floor plans of the practice were used to identify where particular activities occurred and where individual practice participants worked. Each day at the practice, the field researcher took short notes or “jottings” and dictated expanded field notes in the evening.23 A template of topics was used periodically to ensure that important aspects of the practice were not being overlooked. The template included lists of features of the community, practice, staff, and patients that the researchers saw as important (Figure W1).*13 A 3-page structured practice environment checklist was adapted from earlier work on the DOPC project11 and included a wide range of practice characteristics and functions, including items such as the number and training of staff, counseling options offered, and management of telephone calls and referrals (Figure W2a). This checklist also served as a detailed reminder to the research nurse of topics to be included in the field notes. Throughout the time the field researchers were recording field notes and filling in the checklist, they opportunistically asked clinicians or staff informal key informant questions for confirmation or clarification.
Consecutive patients for each clinician in a practice were approached with a goal of recruiting 30 patients who would consent to have the field researcher observe their visit. This generally required approaching 35 to 40 patients. Because some clinicians worked part-time or were not consistently in the practice, it was not always possible to observe 30 visits. After explaining the study and gaining signed informed consent from patients who agreed to participate, the field researchers observed the outpatient visit as unobtrusively as possible. A 1-page structured encounter checklist (Figure W3) that was also modified from the DOPC study11 provided blanks for noting the reason for the visit, chief complaints, and final diagnoses, and for indicating whether any of approximately 100 preventive services were ordered or delivered. Space was provided at the bottom of the form for recording notes that were later used to dictate a re-creation of the encounter. At a later time, a chart audit was done on each observed patient’s medical record using a structured chart audit form (Figure W4).
After the initial observational data were transcribed, a preliminary practice genogram16 was drawn, and an initial practice summary was written. The genogram of practice participants, roles, and relationships was initially diagrammed on a white board by a transdisciplinary research team by interviewing the field researcher about the current and past practice clinicians and staff and about the health system and community. The demographics of individuals were recorded, including age, sex, years with the practice, percentage of work effort, and job responsibilities. Additional details included functional and emotional relationships observed in the practice, such as who worked together and any obvious conflicts among members of the practice or with health system or community affiliations. This process enabled the investigators to identify areas of incomplete data so the field researchers could return to fill in missing details.
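As a rough illustration of the information a practice genogram holds, the sketch below represents members and their relationships as a small data structure and lists the details that are still unrecorded, mirroring the step of identifying incomplete data. The class names, field names, and example entries are all hypothetical; the study itself drew genograms by hand on a white board, not in software.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Member:
    name: str
    role: Optional[str] = None                  # job responsibilities
    age: Optional[int] = None
    sex: Optional[str] = None
    years_with_practice: Optional[float] = None
    percent_effort: Optional[int] = None        # percentage of work effort


@dataclass
class Genogram:
    members: dict = field(default_factory=dict)        # name -> Member
    relationships: list = field(default_factory=list)  # (name_a, name_b, kind)

    def add_member(self, member):
        self.members[member.name] = member

    def relate(self, a, b, kind):
        # kind is a free-text label such as "works with" or "conflict".
        if a not in self.members or b not in self.members:
            raise KeyError("both members must be added before relating them")
        self.relationships.append((a, b, kind))

    def missing_details(self):
        """Map each member name to the fields that are still unrecorded."""
        gaps = {}
        for name, member in self.members.items():
            empty = [f.name for f in fields(member)
                     if getattr(member, f.name) is None]
            if empty:
                gaps[name] = empty
        return gaps


# Hypothetical practice with one fully described and one sparsely described member.
g = Genogram()
g.add_member(Member("Physician A", role="family physician", age=48, sex="F",
                    years_with_practice=12, percent_effort=100))
g.add_member(Member("Nurse B", role="office nurse"))
g.relate("Physician A", "Nurse B", "works with")
print(g.missing_details())
```

The gaps reported for "Nurse B" correspond to the kind of missing detail that a field researcher would be asked to collect on a return visit.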
In addition to the key informant interviews done as part of the observation activities, more formal semistructured individual depth interviews were arranged with each clinician and many of the office staff.25 These interviews consisted of a 30-minute to 1-hour narrative interview in which the respondent was asked open-ended questions designed to elicit in-depth responses (Figure W5). Although the major focus of these interviews was on the delivery of preventive services, more general questions were included to understand perspectives on practice process. For example, respondents were asked: “Could you describe for me a typical day for you in this practice?” and “If you believed a change was needed regarding some specific delivery of a service within this office, could you describe the process you would go through to try to get it implemented?” These interviews were audiotaped and transcribed verbatim.
To supplement the observational and interview data, field researchers gathered existing documents and artifacts from the practice. Items such as blank charts and flow sheets, patient schedules, personnel lists, samples of patient education materials and handouts, mission and vision statements, and annual reports were collected and compiled in binders. In some practices, particularly those affiliated with hospital health systems, materials were also available from Web sites. All transcribed interviews and dictated field notes from the practice and encounter observations were imported into FolioViews 4.11 (NextPage, Provo, Utah), a text-base management software program that facilitates coding, searching, and retrieval of large computerized text files.26
Emerging Design Decisions
Midway through data collection at the first practice, the advisory committee met to review the data and discuss any concerns. The advisory committee identified a number of emerging hypotheses related to complexity science concepts that were used to guide subsequent data collection and management. It was deemed particularly important to identify “attractors,” that is, factors in the practice and in the larger environment that influenced the structure and function of the practice as an organization.6 For example, an attractor might be a particular burning interest of one of the physicians, an expectation of the local hospital systems, or a dominant demographic characteristic of the patients being served.13 An expanded systemic model of primary care (Figure 1) was articulated that characterized 6 core areas for data collection: (1) patient perceptions and behavior, (2) physician perceptions and behavior, (3) encounter structures and processes, (4) practice structures and processes, (5) community characteristics, and (6) the larger health system. This model identified the need for additional data on the community context and patient experience. Checklists were revised and field researchers were asked to spend more time gathering data about the community. It was also apparent that accurate calculation of certain preventive service delivery rates would require patient input and a larger sample size. For example, a patient exit card (Figure W6) was developed to ascertain self-reported tobacco use status (for all patients) and use of obstetrics and gynecology services and history of hysterectomy (for women). These data were used to determine a patient’s eligibility for tobacco cessation counseling and Papanicolaou tests, respectively. Observation of 30 encounters with each clinician was done to increase the stability for calculating rates for common preventive service recommendations.
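The way exit card answers can feed into a delivery rate may be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: the record layout, the function names, and the example visits are hypothetical, and the exact eligibility rule for Papanicolaou tests is not given in the text, so the rule coded here (women without a hysterectomy who do not receive obstetrics and gynecology care elsewhere) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ObservedVisit:
    uses_tobacco: bool                 # self-reported on the exit card
    is_female: bool
    had_hysterectomy: bool = False     # exit card, women only
    sees_obgyn: bool = False           # exit card, women only
    cessation_counseled: bool = False  # from the encounter checklist
    pap_addressed: bool = False        # from the encounter checklist


def eligible_for_cessation(visit):
    return visit.uses_tobacco


def eligible_for_pap(visit):
    # ASSUMPTION: a hysterectomy or gynecologic care received elsewhere
    # removes eligibility; the source does not spell out the rule.
    return (visit.is_female
            and not visit.had_hysterectomy
            and not visit.sees_obgyn)


def delivery_rate(visits, is_eligible, was_delivered):
    """Share of eligible visits in which the service was delivered."""
    eligible = [v for v in visits if is_eligible(v)]
    if not eligible:
        return None  # rate is undefined when nobody is eligible
    return sum(1 for v in eligible if was_delivered(v)) / len(eligible)


# Hypothetical visits for one clinician (not study data).
visits = [
    ObservedVisit(uses_tobacco=True, is_female=False, cessation_counseled=True),
    ObservedVisit(uses_tobacco=True, is_female=True, had_hysterectomy=True),
    ObservedVisit(uses_tobacco=False, is_female=True, pap_addressed=True),
    ObservedVisit(uses_tobacco=False, is_female=True, sees_obgyn=True),
]
print(delivery_rate(visits, eligible_for_cessation, lambda v: v.cessation_counseled))
print(delivery_rate(visits, eligible_for_pap, lambda v: v.pap_addressed))
```

Restricting the denominator to eligible patients is what makes patient input necessary: without the exit card, a clinician who sees few tobacco users would appear to deliver little counseling. With only a handful of eligible visits the rate is unstable, which is the motivation for observing 30 encounters per clinician.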
A larger issue emerged with the discovery that, after contributing data over the course of weeks or months, members of the practices desired feedback in a timely manner. Practice clinicians and staff were very interested in how they were doing and asked when they would be receiving a report of our results. They did not want to wait 3 years for the completion of the study. Although ongoing analyses were anticipated, these had primarily been designed to ensure completeness of data and to provide feedback to field researchers on areas where clarifications were needed. In response to the emergent desire for feedback, the team generated rapid-turnaround summary reports for each practice. A summary report template was designed to present the descriptive details of the practice, including the practice genogram and a summary of the strengths and weaknesses of the practice’s prevention approach. The final page in the report provided the practice with a series of questions or points for self-reflection that often included process questions, such as “How can this organization become a team?” or “How can this practice deliver preventive services more consistently?” These reports were shared interactively with practices at a debriefing meeting within 2 to 3 months of completing data collection at each practice.
The feedback meetings provided an important opportunity to check the validity of the researchers’ analyses by comparing them with the perspectives of the practice participants. In all the practices, the response was a strong overall validation of the research team’s interpretation of the practice and its structures and processes. During the feedback presentations, the practice physicians and staff consistently made comments like, “Wow, did you ever get us,” or “This is like looking in a mirror.” In a number of sessions, the participants mentioned that the report raised issues about which they were vaguely aware and that the findings were stimulating considerable self-reflection. In several practices, the physicians disclosed that they would be taking actions in the future to modify some of the deficiencies the reports uncovered.
The next modification came as the data were being collected simultaneously in the second and third practices. We realized that despite our efforts to be nondisruptive, participating in the project required extra effort on the part of the practice. Each practice therefore received partial compensation in the form of a $500 certificate for the purchase of books or equipment.
After completing data collection at several practices, it was apparent that patients’ perspectives were still under-represented and that this was limiting the understanding of the practice. To gain further insights into patients’ experiences, beginning with the sixth practice we adopted the patient path approach described by Pommerenke and Dietrich,27 following patients from the time they walk into the practice until the time they leave, using a patient path form for recording activities at different stages of the visit (Figure W7). Additional brief open-ended interviews were conducted in the waiting room or examination rooms while patients were waiting.
Although we had asked the research nurses to be more thorough in their descriptions of the community, data from the community and health system were still incomplete. This became even more pronounced when studying practices that were part of health systems. Therefore, beginning with the sixth practice, we included in-depth interviews with individuals from health systems (eg, regional managers and medical directors). A further refinement came with the use of community key informant trees, a systematic process of identifying and interviewing members of the local community surrounding the practice.24,28 These interviews of patients, church leaders, and other individuals from the community began with the ninth practice.
Once all the modifications were incorporated, the final case study design provided data at each of the 6 levels, as shown in Table 1. Particularly detailed data were available at the clinician, encounter, and practice levels. For example, at the clinician level the data included perceptions of roles as ascertained through the in-depth interviews, as well as actual behaviors recorded in the encounter field notes and chart audits. Insights on the structures and process of the practice were obtained through unstructured observations of the practice, structured checklists, written documents, and interviews. Supporting data were collected on patients’ perspectives, the community, and the health system that provided contexts for the practice case studies.
Discussion
The complexity of primary care practices is best understood from multiple perspectives,29 a principle that guided the initial selection of a multimethod comparative case study design for this investigation. The MAP that emerged from this comparative case study design has a number of strengths and weaknesses. A particular asset of the design was our ability to investigate specific phenomena within their context rather than isolated from it.29 The design also encouraged the investigators to pursue emerging insights, thus informing multiple perspectives that might not have otherwise been considered, although this may be somewhat limited by the purposeful sampling strategy that focuses on maximizing information about a particular topic.
Limitations
A limitation for broader implementation of this research design is the intensity of data collection and analysis, which are difficult to accomplish without considerable resources and a research team with diverse skills. There might also be some concerns about the ways that the data collection process alters practice behavior; however, the prolonged observational time frame and the multiple data sources for “triangulation” are designed to limit any potential Hawthorne effect. That is, by collecting and systematically comparing data from multiple sources, including direct observation, different forms of interviews, and existing documents, the investigators were able to identify inconsistencies in patterns of behavior.17,19
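The logic of triangulation can be illustrated with a short sketch that compares what different data sources say about the same item and flags disagreements for follow-up. The function name, the source labels, and the example records are hypothetical; in the study this comparison was carried out by the research team during qualitative analysis, not by a program.

```python
def find_inconsistencies(records):
    """Flag items on which the data sources disagree.

    records maps an item to {source name: True, False, or None},
    where None means the source is silent on that item.
    """
    flagged = {}
    for item, by_source in records.items():
        # Ignore sources that say nothing about this item.
        stated = {src: val for src, val in by_source.items() if val is not None}
        if len(set(stated.values())) > 1:
            flagged[item] = stated
    return flagged


# Hypothetical records for one encounter (not study data).
records = {
    "tobacco counseling delivered": {
        "direct observation": True,
        "chart audit": False,
        "clinician interview": True,
    },
    "Pap test ordered": {
        "direct observation": True,
        "chart audit": True,
        "clinician interview": None,
    },
}
for item, sources in find_inconsistencies(records).items():
    print(item, sources)
```

Here only the counseling item is flagged, because it was observed but not charted; a disagreement of this kind is a prompt for further inquiry, not evidence that any one source is wrong.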
Although the data collection, analysis, and feedback process appeared to increase a practice’s self-reflection, our study limited the input of patient and practice participants in the design, analysis, and interpretation and thus does not approach the participatory research paradigm espoused by McCauley and colleagues.30 Still, our study moved from being primarily observational descriptive research into a more collaborative and interventional project, in part at the request of the participants. This suggests a model and method for future research in the arena of health care process and outcome improvement with the practices as collaborators. The MAP characterized in this paper offers a means for simultaneously describing, understanding, and improving the richly complex and varied processes and outcomes of primary care. By more actively engaging the practices in the research process, the MAP also points toward a new more collaborative model of practice-based research.
Conclusions
The comprehensive data in the P&CD study provide a unique opportunity to understand and describe multiple perspectives from the clinician, patient, encounter, practice, community, and health system spheres. Each of the papers in this issue of JFP used some of these comprehensive data to study one or more of these spheres. For example, the encounter field notes were the primary source of data for exploring how “family” presents in encounters.31 Several of the authors used subsets of patients, including patients presenting with acute respiratory tract infections,33 smokers,34 and frequent attenders.35 These authors each supplemented the encounter field notes with data from the medical record reviews, medical record field notes, and patient exit cards. The complete data set, including practice field notes, practice genograms, physician and staff interviews, office environment checklists, and encounter field notes, was used to describe staff training, roles, and functions.36 This is only a part of the research adventure available in this type of data. We hope many others will join in the excitement.
Acknowledgments
Our study was supported by a grant from the Agency for Healthcare Research and Quality (R01 HS08776) and a Research Center grant from the American Academy of Family Physicians. Drs Crabtree, Miller and Stange are associated with the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio. We are grateful to the physicians, staff, and patients from the 18 practices, without whose participation this study would not have been possible. We also wish to thank Connie Gibbs and Jen Rouse, who spent many hours in the practices collecting data, and Diane Dodendorf and Jason Lebsack, who spent countless hours coordinating transcription and data management activities, for their dedicated work. We are especially indebted to Mary McAndrews, who transcribed hundreds of taped interviews and dictated field notes. The ongoing analyses that ensured the quality and comprehensiveness of the data were made possible through the dedicated work of Helen McIlvain, PhD; Jeffrey Susman, MD; Virginia Aita, PhD; Kristine McVea, MD; Elisabeth Backer, MD; Paul Turner, PhD; and Louis Pol, PhD. Finally, we thank the members of the advisory committee: Valerie Gilchrist, MD; Paul Nutting, MD, MPH; Carlos Jaén, MD, PhD; Kurt Stange, MD, PhD; William Miller, MD, MA; Reuben McDaniel, PhD; and Ruth Anderson, RN, PhD.
1. Jaén CR, Stange KC, Nutting PA. Competing demands of primary care: a model for the delivery of clinical preventive services. J Fam Pract 1994;38:166-71.
2. Stange KC, Jaén CR, Flocke SA, Miller WL, Crabtree BF, Zyzanski SJ. The value of a family physician. J Fam Pract 1998;46:363-68.
3. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.
4. Stange KC, Zyzanski SJ, Smith TF, et al. How valid are medical records and patient questionnaires for physician profiling and health services research? A comparison with direct observation of patients visits. Med Care 1998;36:851-67.
5. Stange KC, Zyzanski SJ, Jaén CR, et al. Illuminating the ‘black box’: a description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.
6. Miller WL, Crabtree BF, McDaniel R, Stange KC. Understanding change in primary care practice using complexity theory. J Fam Pract 1998;46:369-76.
7. Stange KC. One size doesn’t fit all: multimethod research yields new insights into interventions to increase prevention in family practice. J Fam Pract 1996;43:358-60.
8. Stake RE. The art of case study research. Thousand Oaks, Calif: Sage Publications; 1995:xv,175.
9. Crabtree BF, Miller W. Researching practice settings: a case study approach. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;293-312.
10. Callahan EJ, Bertakis KD. Development and validation of the Davis Observation Code. Fam Med 1991;23:19-24.
11. DOPC Writing Group. The direct observation of primary care study: insights from the process of conducting multimethod, transdisciplinary research in community family practice. J Fam Pract 2001;50:345-52.
12. Stange KC, Flocke SA, Goodwin MA, Kelly RB, Zyzanski SJ. Direct observation of rates of preventive service delivery in community family practice. Prev Med 2000;31:167-76.
13. Crabtree BF, Miller WL, Aita VA, Flocke SA, Stange KC. Primary care practice organization and preventive services delivery: a qualitative analysis. J Fam Pract 1998;46:403-09.
14. McVea K, Crabtree BF, Medder JD, et al. An ounce of prevention? Evaluation of the “Put Prevention into Practice” program. J Fam Pract 1996;43:361-69.
15. McDaniel R, Driebe DJ. Complexity science and health care management. Adv Health Care Manage 2001;2:11-36.
16. McIlvain H, Crabtree B, Medder J, Stange KC, Miller WL. Using practice genograms to understand and describe practice configurations. Fam Med 1998;30:490-96.
17. Lincoln YS, Guba EG. Naturalistic inquiry. Beverly Hills, Calif: Sage Publications; 1985;416.
18. McIlvain HE, Crabtree BF, Backer EL, Turner PD. Use of office-based smoking cessation activities in family practices. J Fam Pract 2000;49:1025-29.
19. Patton MQ. Qualitative evaluation and research methods. 2nd ed. Newbury Park, Calif: Sage Publications; 1990.
20. Kuzel A. Sampling in qualitative inquiry. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;33-45.
21. Jorgensen DL. Participant observation. Newbury Park, Calif: Sage Publications; 1989.
22. Spradley JP. Participant observation. New York, NY: Harcourt Brace Jovanovich College Publishers; 1980.
23. Bogdewic SP. Participant observation. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;47-69.
24. Gilchrist VJ, Williams RL. Key informant interviews. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;71-88.
25. Miller WL, Crabtree BF. Depth interviewing. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;89-107.
26. Meadows L, Dodendorf D. Data management & interpretation using computers to assist. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;195-218.
27. Pommerenke FA, Dietrich AJ. Improving and maintaining preventive services. Part 1: supplying the patient model. J Fam Pract 1992;34:86-91.
28. Williams RL, Snider R, Ryan M. A key informant ‘tree’ as a tool for community oriented primary care. Fam Pract Res J 1994;14:277-84.
29. Stange KC, Miller WL, McWhinney I. Developing the knowledge base of family practice. Fam Med 2001;33:286-97.
30. McCauley A, Commanda L, Freeman W. Participatory research maximises community and lay involvement. BMJ 1999;319:774-78.
31. Main DS, Holcomb S, Dickinson P, Stange KC, Crabtree BF. The role of the family in medical encounters. J Fam Pract 2001;50:888.
32. Robinson D, Prest L, Susman JL, Rasmussen D, Rouse J, Crabtree BF. Technician, detective, friend and healer: understanding mental health management in family practice. J Fam Pract 2001;50:864-70.
33. Scott J, DiCicco-Bloom B, Cohen D, et al. Antibiotic use in the treatment of URI. J Fam Pract 2001;50:853-58.
34. Jaén CR, McIlvain H, Pol L, Phillips RL, Flocke SA, Crabtree BF. Tailoring tobacco counseling to the competing demands in the clinical encounter. J Fam Pract 2001;50:859-63.
35. Smucker D, Zink T, Susman JL, Crabtree BF. Caring for patients who make frequent visits to family practices. J Fam Pract 2001;50:847-52.
36. Aita VA, Dodendorf D, Lebsack J, Tallia AF, Crabtree BF. Patient care staffing patterns and roles in community-based family practices. J Fam Pract 2001;50:889.
METHODS: Eighteen practices were purposefully drawn from a random sample of Nebraska family practices that had earlier participated in a study of preventive service delivery. Each practice was studied intensely over a 4- to 12-week period using a comparative case study design that included extended direct observation of the practice environment and clinical encounters, formal and informal interviews of clinicians and staff, and medical record review.
DESIGN: This multimethod assessment process (MAP) provided insights into a wide range of practice activities ranging from descriptions of the organization and patient care activities to quantitative documentation of physician- and practice-level delivery of a variety of evidence-based preventive services. Initial insights guided subsequent data collection and analysis and led to the integration of complexity science concepts into the design. In response to the needs and wishes of the participants, practice meetings were initiated to provide feedback, resulting in a more collaborative model of practice-based research.
CONCLUSIONS: Our MAP provided rich data for describing multiple aspects of primary care practice, testing a priori hypotheses, discovering new insights grounded in the actual experience of practice participants, and fostering collaborative practice change.
Clinicians, researchers, and policymakers now recognize that multiple competing demands1 and opportunities2 are simultaneously affecting the physicians, staff, and patients within primary care practices. Our current understanding of outpatient practice is largely based on administrative databases, national surveys, and medical record reviews, with additional insights from surveys of patients or clinicians. These data generally are not designed to capture the richness of the content and context that is needed to better understand the realities and complexities of practice.3-6 The underlying premise of The Prevention and Competing Demands in Primary Care (P&CD) Study is that efforts to change practice should be preceded by efforts to understand it.2,7 The explicit goal of this study is to understand practice structure and process, including details of patients, physicians, staff, and clinical encounters; the practice as an organization; and its relationship to the larger community and health system.
In this paper we describe a dynamic observational multimethod assessment process (MAP) that can be used to understand the complex reality of primary care practice. MAP is based on a multimethod comparative case study design8,9 that integrates elements of epidemiology with methods derived from the qualitative traditions of anthropology and sociology and relies most heavily on qualitative observation and interviewing methods. Studies of this type require an iterative data collection and analysis approach that evolves over time so that new methods can be introduced as the investigators gain a better understanding of important issues. A major strength of our study design was that it allowed hypotheses and insights gained from participants and from ongoing analyses to be integrated into the ongoing investigation.
The study’s primary research questions related to how practice characteristics affect preventive service delivery. Thus, the research design included: (1) an examination of the organizational contexts that support preventive services, (2) an examination of the competing demands imposed by carrying out clinical prevention and illness care in clinical encounters and in the practice, (3) a comparison of the approaches used by practices with high versus low intensity of preventive services delivered to eligible patients, and (4) an examination of approaches used to deliver different types of preventive services. Although the particular focus was on preventive services, the rich MAP allowed pursuit of other research topics that are presented in this issue of JFP.
This article describes the evolutionary methods of the P&CD study, focusing on how data were collected to ensure that sufficient details were available to understand a practice’s values, structures, and processes.*
Emergent research design
The P&CD study was conceived in 1994 to be an in-depth follow-up of insights from the Direct Observation of Primary Care (DOPC) Study that was just getting under way in northeastern Ohio.5 The DOPC Study provided a largely quantitative assessment of patients, physicians, encounters, and practices using patient questionnaires, physician surveys, medical record audits, and direct observation of clinical encounters using the Davis Observation Code.10 That study’s initial findings were presented in the May 1998 theme issue of JFP, and the study processes have recently been described.11 Details of the DOPC methods have been published elsewhere.4,5,12
Although the initial design allowed the DOPC research nurses to collect brief observational notes, the intensity of the quantitative data collection limited the scope of the study’s qualitative data for understanding details of the practice’s organization and the competing demands within clinical encounters. As a consequence, the P&CD study was designed to provide more in-depth description and understanding of the competing demands of family practice, and in particular, to evaluate factors affecting preventive services delivery using a comparative case study design and a MAP.
A key feature of the P&CD study design was an openness to the integration of emerging insights into the data collection protocol. For example, preliminary analyses of the DOPC data13 and other ongoing studies14 led to the discovery that complexity science was valuable for explaining the dynamics of office systems6 and needed to be incorporated into the design. (Complexity science is the study of systems that are characterized by nonlinear dynamics and emergent properties; it emphasizes the need to understand the interrelationships of the whole system and not just collect data about the parts.15) The investigators also developed new ways to display the relationships among physicians and staff in the practices using “practice genograms.”16 The practice genogram is a diagram of the functional and interpersonal relationships among the clinicians, support staff, and other people and organizations interacting with the practice. Throughout the project and consistent with the standards of qualitative research design,17 there were continued modifications and enhancements in the data collection and analysis strategies in response to insights that were emerging from ongoing analyses and interpretation of the data.
An important feature of the project was the development of an advisory committee of consultants and co-investigators that convened annually to provide multidisciplinary input, review results, and provide feedback. The advisory committee included academic representatives with expertise in nursing, health education, women’s health, minority health, and public policy. Two additional members were added to the project to provide expertise in the study of organizations as complex systems. The annual reviews by the advisory committee led to significant changes in the research design while the study was ongoing.
Practice Sample
Beginning in late 1996, we drew from 91 practices in Nebraska that had been randomly selected to participate in an earlier study on tobacco prevention and cessation.18 Initially a sample of 10 practices was purposefully selected19,20 using an iterative process to represent a range in size (small and large), geographic location (urban, suburban, and rural), and rate of delivery of tobacco-related preventive services. Preliminary analyses of these 10 initial practices provided a summary of preventive health delivery strategies in primary care practices and a description of competing demands that enhanced or limited these strategies. To confirm or refute the emerging insights from the original 10 practices, 8 additional practices were selected for further data collection during the second and third years of the study. The sampling strategy in years 2 and 3 ensured that at least 2 practices each from several major regional hospital health systems were included and allowed us to assess emerging hypotheses about the importance of health system context for understanding community practices.
The practices were recruited by contacting one of the physicians to solicit participation; only those in which all family physicians in the practice agreed to participate were included in the study. Twenty-three practices were contacted; all physicians in 18 agreed to participate (78%).
Core Data Collection Methods
Data were collected by trained field researchers who spent 4 weeks or more taking notes at each practice while observing the practice and clinical encounters, conducting informal key informant interviews of staff, collecting office documents, and auditing charts of patients whose encounters were observed. Within each practice, data collection occurred in stages, with a short break after the initial week or 2 of observation to allow preliminary analyses to inform additional data collection.
Observations at the practice level were recorded in a combination of structured observational checklists, unstructured dictated field notes,21-23 and key informant interviews.24 Detailed floor plans of the practice were used to identify where particular activities occurred and where individual practice participants worked. Each day at the practice, the field researcher took short notes or “jottings” and dictated expanded field notes in the evening.23 A template of topics was used periodically to ensure that important aspects of the practice were not being overlooked. The template included lists of features of the community, practice, staff, and patients that the researchers saw as important (Figure W1*).13 A 3-page structured practice environment checklist was adapted from earlier work on the DOPC project11 and included a wide range of practice characteristics and functions, including items such as the number and training of staff, counseling options offered, and management of telephone calls and referrals (Figure W2a). This checklist also served as a detailed reminder to the research nurse of topics to be included in the field notes. Throughout the time the field researchers were recording field notes and filling in the checklist, they opportunistically asked clinicians or staff informal key informant questions for confirmation or clarification.
Consecutive patients for each clinician in a practice were approached with a goal of recruiting 30 patients who would consent to have the field researcher observe their visit. This generally required approaching 35 to 40 patients. Because some clinicians worked part-time or were not consistently in the practice, it was not always possible to observe 30 visits. After explaining the study and gaining signed informed consent from patients who agreed to participate, the field researchers observed the outpatient visit as unobtrusively as possible. A 1-page structured encounter checklist (Figure W3) that was also modified from the DOPC study11 provided blanks for noting the reason for the visit, chief complaints, and final diagnoses, and for indicating whether any of approximately 100 preventive services were ordered or delivered. Space was provided at the bottom of the form for recording notes that were later used to dictate a re-creation of the encounter. At a later time, a chart audit was done on each observed patient’s medical record using a structured chart audit form (Figure W4).
After the initial observational data were transcribed, a preliminary practice genogram16 was drawn, and an initial practice summary was written. The genogram of practice participants, roles, and relationships was initially diagrammed on a white board by a transdisciplinary research team by interviewing the field researcher about the current and past practice clinicians and staff and about the health system and community. The demographics of individuals were recorded, including age, sex, years with the practice, percentage of work effort, and job responsibilities. Additional details included functional and emotional relationships observed in the practice, such as who worked together and any obvious conflicts among members of the practice or with health system or community affiliations. This process enabled the investigators to identify areas of incomplete data so the field researchers could return to fill in missing details.
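As one way to picture the genogram as data, the sketch below records each practice member with the demographic fields listed above and flags members whose records are incomplete, much as the team identified gaps for the field researchers to fill. It is a minimal illustration only: the class names, field names, relationship labels, and sample members are hypothetical and do not come from the study's materials.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Member:
    """One person on the genogram; all field names are hypothetical."""
    name: str
    role: Optional[str] = None              # job responsibilities
    age: Optional[int] = None
    sex: Optional[str] = None
    years_with_practice: Optional[float] = None
    percent_effort: Optional[int] = None    # percentage of work effort


@dataclass
class Genogram:
    members: dict = field(default_factory=dict)
    # Each relationship is (name_a, name_b, kind), e.g. "works_with" or "conflict".
    relationships: list = field(default_factory=list)

    def add_member(self, member: Member) -> None:
        self.members[member.name] = member

    def relate(self, a: str, b: str, kind: str) -> None:
        if a not in self.members or b not in self.members:
            raise KeyError("both people must be on the genogram first")
        self.relationships.append((a, b, kind))

    def missing_details(self) -> dict:
        """Map each member to the fields that are still unrecorded (None)."""
        gaps = {}
        for name, m in self.members.items():
            empty = [f.name for f in fields(m) if getattr(m, f.name) is None]
            if empty:
                gaps[name] = empty
        return gaps


# Invented example data, not taken from any study practice.
g = Genogram()
g.add_member(Member("Dr. A", role="physician", age=52, sex="F",
                    years_with_practice=12, percent_effort=100))
g.add_member(Member("Nurse B", role="nurse", sex="M"))
g.relate("Dr. A", "Nurse B", "works_with")
print(g.missing_details())
```

Here the gap report lists the fields a field researcher would need to fill in for "Nurse B" on a return visit.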
In addition to the key informant interviews done as part of the observation activities, more formal semistructured individual depth interviews were arranged with each clinician and many of the office staff.25 These interviews consisted of a 30-minute to 1-hour narrative interview in which the respondent was asked open-ended questions designed to elicit in-depth responses (Figure W5). Although the major focus of these interviews was on the delivery of preventive services, more general questions were included to understand perspectives on practice process. For example, respondents were asked: “Could you describe for me a typical day for you in this practice?” and “If you believed a change was needed regarding some specific delivery of a service within this office, could you describe the process you would go through to try to get it implemented?” These interviews were audiotaped and transcribed verbatim.
To supplement the observational and interview data, field researchers gathered existing documents and artifacts from the practice. Items such as blank charts and flow sheets, patient schedules, personnel lists, samples of patient education materials and handouts, mission and vision statements, and annual reports were collected and compiled in binders. In some practices, particularly those affiliated with hospital health systems, materials were also available from Web sites. All transcribed interviews and dictated field notes from the practice and encounter observations were imported into FolioViews 4.11 (NextPage, Provo, Utah), a text-base management software program that facilitates coding, searching, and retrieval of large computerized text files.26
Emerging Design Decisions
Midway through data collection at the first practice, the advisory committee met to review the data and discuss any concerns. The advisory committee identified a number of emerging hypotheses related to complexity science concepts that were used to guide subsequent data collection and management. It was deemed particularly important to identify “attractors”—factors in the practice and in the larger environment that influenced the structure and function of the practice as an organization.6 For example, an attractor might be a particular burning interest of one of the physicians, an expectation of the local hospital systems, or a dominant demographic characteristic of the patients being served.13 An expanded systemic model of primary care (Figure 1) was articulated that characterized 6 core areas for data collection: (1) patient perceptions and behavior, (2) physician perceptions and behavior, (3) encounter structures and processes, (4) practice structures and processes, (5) community characteristics, and (6) the larger health system. This model identified the need for additional data on the community context and patient experience. Checklists were revised and field researchers were asked to spend more time gathering data about the community. It was also apparent that accurate calculation of certain preventive service delivery rates would require patient input and a larger sample size. For example, a patient exit card (Figure W6) was developed to ascertain self-reported tobacco use status (for all patients) and use of obstetrics and gynecology services and history of hysterectomy (for women). These data were used to determine a patient’s eligibility for tobacco cessation counseling and Papanicolaou tests, respectively. Observation of 30 encounters with each clinician was done to increase the stability for calculating rates for common preventive service recommendations.
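The role of the exit card in rate calculation can be illustrated with a short sketch: a delivery rate is only meaningful among patients eligible for the service, so eligibility is established first and then used as the denominator. The eligibility rules below are simplifying assumptions made for illustration (the study's actual rules, which also drew on use of obstetrics and gynecology services, are not spelled out here), and the record layout, function names, and sample visits are hypothetical.

```python
def eligible_for_tobacco_counseling(card: dict) -> bool:
    # Assumption: any self-reported tobacco user is eligible.
    return bool(card.get("tobacco_user"))


def eligible_for_pap(card: dict) -> bool:
    # Assumption: women with no history of hysterectomy are eligible.
    return card.get("sex") == "F" and not card.get("hysterectomy", False)


def delivery_rate(visits, is_eligible, service):
    """Share of eligible visits in which `service` was delivered.

    Returns None when no observed patient was eligible.
    """
    eligible = [v for v in visits if is_eligible(v["exit_card"])]
    if not eligible:
        return None
    delivered = sum(1 for v in eligible if service in v["delivered"])
    return delivered / len(eligible)


# Invented visits for one clinician; not study data.
visits = [
    {"exit_card": {"sex": "F", "tobacco_user": True, "hysterectomy": False},
     "delivered": {"tobacco_counseling"}},
    {"exit_card": {"sex": "M", "tobacco_user": True},
     "delivered": set()},
    {"exit_card": {"sex": "F", "tobacco_user": False, "hysterectomy": True},
     "delivered": set()},
    {"exit_card": {"sex": "F", "tobacco_user": False, "hysterectomy": False},
     "delivered": {"pap_test"}},
]

tobacco_rate = delivery_rate(visits, eligible_for_tobacco_counseling,
                             "tobacco_counseling")
pap_rate = delivery_rate(visits, eligible_for_pap, "pap_test")
print(tobacco_rate, pap_rate)
```

With only a handful of eligible patients per clinician, a single visit shifts the rate substantially, which is consistent with the stated aim of observing 30 encounters per clinician to stabilize the estimates.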
A larger issue emerged with the discovery that, after contributing data over the course of weeks or months, members of the practices desired feedback in a timely manner. Practice clinicians and staff were very interested in how they were doing and asked when they would be receiving a report of our results. They did not want to wait 3 years for the completion of the study. Although ongoing analyses were anticipated, these had primarily been designed to ensure completeness of data and to provide feedback to field researchers on areas where clarifications were needed. In response to the emergent desire for feedback, the team generated rapid-turnaround summary reports for each practice. A summary report template was designed to present the descriptive details of the practice, including the practice genogram and a summary of the strengths and weaknesses of the practice’s prevention approach. The final page in the report provided the practice with a series of questions or points for self-reflection that often included process questions, such as “How can this organization become a team?” or “How can this practice deliver preventive services more consistently?” These reports were shared interactively with practices at a debriefing meeting within 2 to 3 months of completing data collection at each practice.
The feedback meetings provided an important opportunity to check the validity of the researchers’ analyses by comparing them with the perspectives of the practice participants. In all the practices, the response was a strong overall validation of the research team’s interpretation of the practice and its structures and processes. During the feedback presentations, the practice physicians and staff consistently made comments like, “Wow, did you ever get us,” or “This is like looking in a mirror.” In a number of sessions, the participants mentioned that the report raised issues about which they were vaguely aware and that the findings were stimulating considerable self-reflection. In several practices, the physicians disclosed that they would be taking actions in the future to modify some of the deficiencies the reports uncovered.
The next modification came as the data were being collected simultaneously in the second and third practices. We realized that despite our efforts to be nondisruptive, participating in the project required extra effort on the part of the practice. Each practice therefore received partial compensation in the form of a $500 certificate for the purchase of books or equipment.
After completing data collection at several practices, it was apparent that patients’ perspectives were still under-represented and that this was limiting the understanding of the practice. To gain further insights into patients’ experiences, beginning with the sixth practice we adopted the patient path approach described by Pommerenke and Dietrich,27 following patients from the time they walk into the practice until the time they leave, using a patient path form for recording activities at different stages of the visit (Figure W7). Additional brief open-ended interviews were conducted in the waiting room or examination rooms while patients were waiting.
Although we had asked the research nurses to be more thorough in their descriptions of the community, data from the community and health system were still incomplete. This became even more pronounced when studying practices that were part of health systems. Therefore, beginning with the sixth practice, we included in-depth interviews with individuals from health systems (eg, regional managers and medical directors). A further refinement came with the use of community key informant trees, a systematic process of identifying and interviewing members of the local community surrounding the practice.24,28 These interviews of patients, church leaders, and other individuals from the community began with the ninth practice.
Once all the modifications were incorporated, the final case study design provided data at each of the 6 levels, as shown in Table 1. Particularly detailed data were available at the clinician, encounter, and practice levels. For example, at the clinician level the data included perceptions of roles as ascertained through the in-depth interviews, as well as actual behaviors recorded in the encounter field notes and chart audits. Insights on the structures and process of the practice were obtained through unstructured observations of the practice, structured checklists, written documents, and interviews. Supporting data were collected on patients’ perspectives, the community, and the health system that provided contexts for the practice case studies.
METHODS: Eighteen practices were purposefully drawn from a random sample of Nebraska family practices that had earlier participated in a study of preventive service delivery. Each practice was studied intensely over a 4- to 12-week period using a comparative case study design that included extended direct observation of the practice environment and clinical encounters, formal and informal interviews of clinicians and staff, and medical record review.
DESIGN: This multimethod assessment process (MAP) provided insights into a wide range of practice activities ranging from descriptions of the organization and patient care activities to quantitative documentation of physician- and practice-level delivery of a variety of evidence-based preventive services. Initial insights guided subsequent data collection and analysis and led to the integration of complexity science concepts into the design. In response to the needs and wishes of the participants, practice meetings were initiated to provide feedback, resulting in a more collaborative model of practice-based research.
CONCLUSIONS: Our MAP provided rich data for describing multiple aspects of primary care practice, testing a priori hypotheses, discovering new insights grounded in the actual experience of practice participants, and fostering collaborative practice change.
Clinicians, researchers, and policymakers now recognize that multiple competing demands1 and opportunities2 are simultaneously affecting the physicians, staff, and patients within primary care practices. Our current understanding of outpatient practice is largely based on administrative databases, national surveys, and medical record reviews, with additional insights from surveys of patients or clinicians. These data generally are not designed to capture the richness of the content and context that is needed to better understand the realities and complexities of practice.3-6 The underlying premise of The Prevention and Competing Demands in Primary Care (P&CD) Study is that efforts to change practice should be preceded by efforts to understand it.2,7 The explicit goal of this study is to understand practice structure and process, including details of patients, physicians, staff, and clinical encounters; the practice as an organization; and its relationship to the larger community and health system.
In this paper we describe a dynamic observational multimethod assessment process (MAP) that can be used to understand the complex reality of primary care practice. MAP is based on a multimethod comparative case study design8,9 that integrates elements of epidemiology with methods derived from the qualitative traditions of anthropology and sociology and relies most heavily on qualitative observation and interviewing methods. Studies of this type require an iterative data collection and analysis approach that evolves over time so that new methods can be introduced as the investigators gain a better understanding of important issues. A major strength of our study design was that it allowed hypotheses and insights gained from participants and from ongoing analyses to be integrated into the ongoing investigation.
The study’s primary research questions related to how practice characteristics affect preventive service delivery. Thus, the research design included: (1) an examination of the organizational contexts that support preventive services, (2) an examination of the competing demands imposed by carrying out clinical prevention and illness care in clinical encounters and in the practice, (3) a comparison of the approaches used by practices with high versus low intensity of preventive services delivered to eligible patients, and (4) an examination of approaches used to deliver different types of preventive services. Although the particular focus was on preventive services, the rich MAP allowed pursuit of other research topics that are presented in this issue of JFP.
This article describes the evolutionary methods of the P&CD study, focusing on how data were collected to ensure that sufficient details were available to understand a practice’s values, structures, and processes.*
Emergent research design
The P&CD study was conceived in 1994 to be an in-depth follow-up of insights from the Direct Observation of Primary Care (DOPC) Study that was just getting under way in northeastern Ohio.5 The DOPC Study provided a largely quantitative assessment of patients, physicians, encounters, and practices using patient questionnaires, physician surveys, medical record audits, and direct observation of clinical encounters using the Davis Observation Code.10 That study’s initial findings were presented in the May 1998 theme issue of JFP, and the study processes have recently been described.11 Details of the DOPC methods have been published elsewhere.4,5,12
Although the initial design allowed the DOPC research nurses to collect brief observational notes, the intensity of the quantitative data collection limited the scope of the study’s qualitative data for understanding details of the practice’s organization and the competing demands within clinical encounters. As a consequence, the P&CD study was designed to provide more in-depth description and understanding of the competing demands of family practice, and in particular, to evaluate factors affecting preventive services delivery using a comparative case study design and a MAP.
A key feature of the P&CD study design was an openness to the integration of emerging insights into the data collection protocol. For example, preliminary analyses of the DOPC data13 and other ongoing studies14 led to the discovery that complexity science was valuable for explaining the dynamics of office systems6 and needed to be incorporated into the design. (Complexity science is the study of systems that are characterized by nonlinear dynamics and emergent properties; it emphasizes the need to understand the interrelationships of the whole system and not just collect data about the parts.15) The investigators also developed new ways to display the relationships among physicians and staff in the practices using “practice genograms.”16 The practice genogram is a diagram of the functional and interpersonal relationships among the clinicians, support staff, and other people and organizations interacting with the practice. Throughout the project and consistent with the standards of qualitative research design,17 there were continued modifications and enhancements in the data collection and analysis strategies in response to insights that were emerging from ongoing analyses and interpretation of the data.
An important feature of the project was the development of an advisory committee of consultants and co-investigators that convened annually to provide multidisciplinary input, review results, and provide feedback. The advisory committee included academic representatives with expertise in nursing, health education, women’s health, minority health, and public policy. Two additional members were added to the project to provide expertise in the study of organizations as complex systems. The annual reviews by the advisory committee led to significant changes in the research design while the study was ongoing.
Practice Sample
Beginning in late 1996, we drew from 91 practices in Nebraska that had been randomly selected to participate in an earlier study on tobacco prevention and cessation.18 Initially a sample of 10 practices was purposefully selected19,20 using an iterative process to represent a range in size (small and large), geographic location (urban, suburban, and rural), and rate of delivery of tobacco-related preventive services. Preliminary analyses of these 10 initial practices provided a summary of preventive health delivery strategies in primary care practices and a description of competing demands that enhanced or limited these strategies. To confirm or refute the emerging insights from the original 10 practices, 8 additional practices were selected for further data collection during the second and third years of the study. The sampling strategy in years 2 and 3 ensured that at least 2 practices each from several major regional hospital health systems were included and allowed us to assess emerging hypotheses about the importance of health system context for understanding community practices.
The practices were recruited by contacting one of the physicians to solicit participation; only those in which all family physicians in the practice agreed to participate were included in the study. Twenty-three practices were contacted; all physicians in 18 agreed to participate (78%).
Core Data Collection Methods
Data were collected by trained field researchers who spent 4 weeks or more taking notes at each practice while observing the practice and clinical encounters, conducting informal key informant interviews of staff, collecting office documents, and auditing charts of patients whose encounters were observed. Within each practice, data collection occurred in stages, with a short break after the initial week or 2 of observation to allow preliminary analyses to inform additional data collection.
Observations at the practice level were recorded in a combination of structured observational checklists, unstructured dictated field notes,21-23 and key informant interviews.24 Detailed floor plans of the practice were used to identify where particular activities occurred and where individual practice participants worked. Each day at the practice, the field researcher took short notes or “jottings” and dictated expanded field notes in the evening.23 A template of topics was used periodically to ensure that important aspects of the practice were not being overlooked. The template included lists of features of the community, practice, staff, and patients that the researchers saw as important (Figure W1*).13 A 3-page structured practice environment checklist was adapted from earlier work on the DOPC project11 and included a wide range of practice characteristics and functions, including items such as the number and training of staff, counseling options offered, and management of telephone calls and referrals (Figure W2a). This checklist also served as a detailed reminder to the research nurse of topics to be included in the field notes. Throughout the time the field researchers were recording field notes and filling in the checklist, they opportunistically asked clinicians or staff informal key informant questions for confirmation or clarification.
Consecutive patients for each clinician in a practice were approached with a goal of recruiting 30 patients who would consent to have the field researcher observe their visit. This generally required approaching 35 to 40 patients. Because some clinicians worked part-time or were not consistently in the practice, it was not always possible to observe 30 visits. After explaining the study and gaining signed informed consent from patients who agreed to participate, the field researchers observed the outpatient visit as unobtrusively as possible. A 1-page structured encounter checklist (Figure W3) that was also modified from the DOPC study11 provided blanks for noting the reason for the visit, chief complaints, and final diagnoses, and for indicating whether any of approximately 100 preventive services were ordered or delivered. Space was provided at the bottom of the form for recording notes that were later used to dictate a re-creation of the encounter. At a later time, a chart audit was done on each observed patient’s medical record using a structured chart audit form (Figure W4).
After the initial observational data were transcribed, a preliminary practice genogram16 was drawn, and an initial practice summary was written. The genogram of practice participants, roles, and relationships was initially diagrammed on a white board by a transdisciplinary research team by interviewing the field researcher about the current and past practice clinicians and staff and about the health system and community. The demographics of individuals were recorded, including age, sex, years with the practice, percentage of work effort, and job responsibilities. Additional details included functional and emotional relationships observed in the practice, such as who worked together and any obvious conflicts among members of the practice or with health system or community affiliations. This process enabled the investigators to identify areas of incomplete data so the field researchers could return to fill in missing details.
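The genogram described above is essentially a small graph of people and relationships, and a sketch of one possible representation may make the process easier to picture. The following is a minimal illustration only, not the investigators' tool (the genograms were drawn by hand on a white board). The class names, field names, relationship labels, and example values are all hypothetical; the fields simply mirror the demographics listed in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One person in a practice genogram (hypothetical fields based on the text)."""
    name: str
    job_responsibilities: str
    age: int
    sex: str
    years_with_practice: float
    percent_effort: int  # percentage of work effort


@dataclass
class Genogram:
    """People plus functional and emotional relationships among them."""
    members: dict = field(default_factory=dict)
    # Each relationship is (person_a, person_b, kind); the kinds are assumptions.
    relationships: list = field(default_factory=list)

    def add_member(self, member):
        self.members[member.name] = member

    def relate(self, a, b, kind):
        if a not in self.members or b not in self.members:
            raise KeyError("both people must be added before relating them")
        self.relationships.append((a, b, kind))

    def missing_details(self):
        """Members with no recorded relationship: a cue for follow-up observation."""
        linked = {p for a, b, _ in self.relationships for p in (a, b)}
        return sorted(name for name in self.members if name not in linked)


# Invented example data, for illustration only.
example = Genogram()
example.add_member(Member("Physician A", "family physician", 52, "F", 12.0, 100))
example.add_member(Member("Nurse B", "rooming and immunizations", 38, "F", 4.5, 80))
example.add_member(Member("Receptionist C", "front desk and scheduling", 29, "M", 1.0, 100))
example.relate("Physician A", "Nurse B", "works_with")
print(example.missing_details())
```

The `missing_details` helper corresponds loosely to the step in which the team identified areas of incomplete data so that field researchers could return to the practice.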
In addition to the key informant interviews done as part of the observation activities, more formal semistructured individual depth interviews were arranged with each clinician and many of the office staff.25 These interviews consisted of a 30-minute to 1-hour narrative interview in which the respondent was asked open-ended questions designed to elicit in-depth responses (Figure W5). Although the major focus of these interviews was on the delivery of preventive services, more general questions were included to understand perspectives on practice process. For example, respondents were asked: “Could you describe for me a typical day for you in this practice?” and “If you believed a change was needed regarding some specific delivery of a service within this office, could you describe the process you would go through to try to get it implemented?” These interviews were audiotaped and transcribed verbatim.
To supplement the observational and interview data, field researchers gathered existing documents and artifacts from the practice. Items such as blank charts and flow sheets, patient schedules, personnel lists, samples of patient education materials and handouts, mission and vision statements, and annual reports were collected and compiled in binders. In some practices, particularly those affiliated with hospital health systems, materials were also available from Web sites. All transcribed interviews and dictated field notes from the practice and encounter observations were imported into FolioViews 4.11 (NextPage, Provo, Utah), a text-base management software program that facilitates coding, searching, and retrieval of large computerized text files.26
Emerging Design Decisions
Midway through data collection at the first practice, the advisory committee met to review the data and discuss any concerns. The advisory committee identified a number of emerging hypotheses related to complexity science concepts that were used to guide subsequent data collection and management. It was deemed particularly important to identify “attractors,” that is, factors in the practice and in the larger environment that influenced the structure and function of the practice as an organization.6 For example, an attractor might be a particular burning interest of one of the physicians, an expectation of the local hospital systems, or a dominant demographic characteristic of the patients being served.13 An expanded systemic model of primary care (Figure 1) was articulated that characterized 6 core areas for data collection: (1) patient perceptions and behavior, (2) physician perceptions and behavior, (3) encounter structures and processes, (4) practice structures and processes, (5) community characteristics, and (6) the larger health system. This model identified the need for additional data on the community context and patient experience. Checklists were revised and field researchers were asked to spend more time gathering data about the community. It was also apparent that accurate calculation of certain preventive service delivery rates would require patient input and a larger sample size. For example, a patient exit card (Figure W6) was developed to ascertain self-reported tobacco use status (for all patients) and use of obstetrics and gynecology services and history of hysterectomy (for women). These data were used to determine a patient’s eligibility for tobacco cessation counseling and Papanicolaou tests, respectively. Observation of 30 encounters with each clinician was done to increase the stability for calculating rates for common preventive service recommendations.
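To illustrate how exit card responses could feed a delivery rate, the sketch below restricts the denominator to eligible patients. It is a hedged example, not the study's actual algorithm: the eligibility rules are assumptions (in particular, treating women who use obstetrics and gynecology services or who have had a hysterectomy as not eligible for a Papanicolaou test in the family practice), and all field names and data are invented.

```python
def eligible_services(card):
    """Services a patient is eligible for, given exit-card answers (assumed rules)."""
    services = []
    if card["tobacco_user"]:
        services.append("tobacco_cessation_counseling")
    # Assumption: Pap eligibility requires a woman with no hysterectomy
    # who does not receive this care from an obstetrics-gynecology service.
    if card["sex"] == "F" and not card["hysterectomy"] and not card["sees_obgyn"]:
        services.append("pap_test")
    return services


def delivery_rate(encounters, service):
    """Fraction of eligible patients who received `service`; None if no one was eligible."""
    eligible = [e for e in encounters if service in eligible_services(e["card"])]
    if not eligible:
        return None
    delivered = sum(1 for e in eligible if service in e["delivered"])
    return delivered / len(eligible)


# Invented encounters, for illustration only.
encounters = [
    {"card": {"sex": "F", "tobacco_user": True, "hysterectomy": False, "sees_obgyn": False},
     "delivered": {"tobacco_cessation_counseling"}},
    {"card": {"sex": "M", "tobacco_user": True, "hysterectomy": False, "sees_obgyn": False},
     "delivered": set()},
    {"card": {"sex": "F", "tobacco_user": False, "hysterectomy": True, "sees_obgyn": False},
     "delivered": set()},
]

print(delivery_rate(encounters, "tobacco_cessation_counseling"))
print(delivery_rate(encounters, "pap_test"))
```

With only a handful of eligible patients per clinician such a rate is unstable, which is consistent with the stated reason for observing 30 encounters with each clinician.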
A larger issue emerged with the discovery that, after contributing data over the course of weeks or months, members of the practices desired feedback in a timely manner. Practice clinicians and staff were very interested in how they were doing and asked when they would be receiving a report of our results. They did not want to wait 3 years for the completion of the study. Although ongoing analyses were anticipated, these had primarily been designed to ensure completeness of data and to provide feedback to field researchers on areas where clarifications were needed. In response to the emergent desire for feedback, the team generated rapid-turnaround summary reports for each practice. A summary report template was designed to present the descriptive details of the practice, including the practice genogram and a summary of the strengths and weaknesses of the practice’s prevention approach. The final page in the report provided the practice with a series of questions or points for self-reflection that often included process questions, such as “How can this organization become a team?” or “How can this practice deliver preventive services more consistently?” These reports were shared interactively with practices at a debriefing meeting within 2 to 3 months of completing data collection at each practice.
The feedback meetings provided an important opportunity to check the validity of the researchers’ analyses by comparing them with the perspectives of the practice participants. In all the practices, the response was a strong overall validation of the research team’s interpretation of the practice and its structures and processes. During the feedback presentations, the practice physicians and staff consistently made comments like, “Wow, did you ever get us,” or “This is like looking in a mirror.” In a number of sessions, the participants mentioned that the report raised issues about which they were vaguely aware and that the findings were stimulating considerable self-reflection. In several practices, the physicians disclosed that they would be taking actions in the future to modify some of the deficiencies the reports uncovered.
The next modification came as the data were being collected simultaneously in the second and third practices. We realized that despite our efforts to be nondisruptive, participating in the project required extra effort on the part of the practice. Each practice therefore received partial compensation in the form of a $500 certificate for the purchase of books or equipment.
After completing data collection at several practices, it was apparent that patients’ perspectives were still under-represented and that this was limiting the understanding of the practice. To gain further insights into patients’ experiences, beginning with the sixth practice we adopted the patient path approach described by Pommerenke and Dietrich,27 following patients from the time they walk into the practice until the time they leave, using a patient path form for recording activities at different stages of the visit (Figure W7). Additional brief open-ended interviews were conducted in the waiting room or examination rooms while patients were waiting.
Although we had asked the research nurses to be more thorough in their descriptions of the community, data from the community and health system were still incomplete. This became even more pronounced when studying practices that were part of health systems. Therefore, beginning with the sixth practice, we included in-depth interviews with individuals from health systems (eg, regional managers and medical directors). A further refinement came with the use of community key informant trees, a systematic process of identifying and interviewing members of the local community surrounding the practice.24,28 These interviews of patients, church leaders, and other individuals from the community began with the ninth practice.
Once all the modifications were incorporated, the final case study design provided data at each of the 6 levels as shown in Table 1. Particularly detailed data were available at the clinician, encounter, and practice levels. For example, at the clinician level the data included perceptions of roles as ascertained through the in-depth interviews, as well as actual behaviors recorded in the encounter field notes and chart audits. Insights on the structures and process of the practice were obtained through unstructured observations of the practice, structured checklists, written documents, and interviews. Supporting data were collected on patients’ perspectives, the community, and the health system that provided contexts for the practice case studies.
Discussion
The complexity of primary care practices is best understood from multiple perspectives,29 a principle that guided the initial selection of a multimethod comparative case study design for this investigation. The MAP that emerged from this comparative case study design has a number of strengths and weaknesses. A particular asset of the design was our ability to investigate specific phenomena within their context rather than isolated from it.29 The design also encouraged the investigators to pursue emerging insights, thus informing multiple perspectives that might not have otherwise been considered, although this may be somewhat limited by the purposeful sampling strategy that focuses on maximizing information about a particular topic.
Limitations
A limitation for broader implementation of this research design is the intensity of data collection and analysis, which are difficult to accomplish without considerable resources and a research team with diverse skills. There might also be some concerns about the ways that the data collection process alters practice behavior; however, the prolonged observational time frame and the multiple data sources for “triangulation” are designed to limit any potential Hawthorne effect. That is, by collecting and systematically comparing data from multiple sources, including direct observation, different forms of interviews, and existing documents, the investigators were able to identify inconsistencies in patterns of behavior.17,19
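As a rough illustration of the triangulation idea, the sketch below compares what different data sources record about the same encounter and flags any disagreement. This is an assumption-laden simplification: the study's comparisons were largely qualitative, and the source names, encounter identifiers, and values here are invented.

```python
def find_inconsistencies(records):
    """Return encounter ids whose data sources disagree.

    `records` maps an encounter id to {source_name: bool}, where the bool
    says whether that source recorded a given service as delivered.
    """
    return sorted(
        encounter_id
        for encounter_id, by_source in records.items()
        if len(set(by_source.values())) > 1
    )


# Invented data: did each source record tobacco counseling at the visit?
records = {
    "enc-01": {"observation": True, "chart_audit": True},
    "enc-02": {"observation": True, "chart_audit": False},
    "enc-03": {"observation": False, "chart_audit": False},
}

print(find_inconsistencies(records))
```

Flagged encounters would then be candidates for a closer reading of the field notes rather than evidence of error in any one source.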
Although the data collection, analysis, and feedback process appeared to increase a practice’s self-reflection, our study limited the input of patient and practice participants in the design, analysis, and interpretation and thus does not approach the participatory research paradigm espoused by McCauley and colleagues.30 Still, our study moved from being primarily observational descriptive research into a more collaborative and interventional project, in part at the request of the participants. This suggests a model and method for future research in the arena of health care process and outcome improvement with the practices as collaborators. The MAP characterized in this paper offers a means for simultaneously describing, understanding, and improving the richly complex and varied processes and outcomes of primary care. By more actively engaging the practices in the research process, the MAP also points toward a new more collaborative model of practice-based research.
Conclusions
The comprehensive data in the P&CD study provide a unique opportunity to understand and describe multiple perspectives from the clinician, patient, encounter, practice, community, and health system spheres. Each of the papers in this issue of JFP used some of these comprehensive data to study one or more of these spheres. For example, the encounter field notes were the primary source of data for exploring how “family” presents in encounters.31 Several of the authors used subsets of patients, including patients presenting with acute respiratory tract infections,33 smokers,34 and frequent attenders.35 These authors each supplemented the encounter field notes with data from the medical record reviews, medical record field notes, and patient exit cards. The complete data set, including practice field notes, practice genograms, physician and staff interviews, office environment checklists, and encounter field notes, was used to describe staff training, roles, and functions.36 This is only a part of the research adventure available in this type of data. We hope many others will join in the excitement.
Acknowledgments
Our study was supported by a grant from the Agency for Healthcare Research and Quality (R01 HS08776) and a Research Center grant from the American Academy of Family Physicians. Drs Crabtree, Miller and Stange are associated with the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio. We are grateful to the physicians, staff, and patients from the 18 practices, without whose participation this study would not have been possible. We also wish to thank Connie Gibbs and Jen Rouse, who spent many hours in the practices collecting data, and Diane Dodendorf and Jason Lebsack, who spent countless hours coordinating transcription and data management activities, for their dedicated work. We are especially indebted to Mary McAndrews, who transcribed hundreds of taped interviews and dictated field notes. The ongoing analyses that ensured the quality and comprehensiveness of the data were made possible through the dedicated work of Helen McIlvain, PhD; Jeffrey Susman, MD; Virginia Aita, PhD; Kristine McVea, MD; Elisabeth Backer, MD; Paul Turner, PhD; and Louis Pol, PhD. Finally, we thank the members of the advisory committee: Valerie Gilchrist, MD; Paul Nutting, MD, MPH; Carlos Jaén, MD, PhD; Kurt Stange, MD, PhD; William Miller, MD, MA; Reuben McDaniel, PhD; and Ruth Anderson, RN, PhD.
1. Jaén CR, Stange KC, Nutting PA. Competing demands of primary care: a model for the delivery of clinical preventive services. J Fam Pract 1994;38:166-71.
2. Stange KC, Jaén CR, Flocke SA, Miller WL, Crabtree BF, Zyzanski SJ. The value of a family physician. J Fam Pract 1998;46:363-68.
3. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.
4. Stange KC, Zyzanski SJ, Smith TF, et al. How valid are medical records and patient questionnaires for physician profiling and health services research? A comparison with direct observation of patients visits. Med Care 1998;36:851-67.
5. Stange KC, Zyzanski SJ, Jaén CR, et al. Illuminating the ‘black box’: a description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.
6. Miller WL, Crabtree BF, McDaniel R, Stange KC. Understanding change in primary care practice using complexity theory. J Fam Pract 1998;46:369-76.
7. Stange KC. One size doesn’t fit all: multimethod research yields new insights into interventions to increase prevention in family practice. J Fam Pract 1996;43:358-60.
8. Stake RE. The art of case study research. Thousand Oaks, Calif: Sage Publications; 1995:xv,175.
9. Crabtree BF, Miller W. Researching practice settings: a case study approach. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;293-312.
10. Callahan EJ, Bertakis KD. Development and validation of the Davis Observation Code. Fam Med 1991;23:19-24.
11. DOPC Writing Group. The direct observation of primary care study: insights from the process of conducting multimethod, transdisciplinary research in community family practice. J Fam Pract 2001;50:345-52.
12. Stange KC, Flocke SA, Goodwin MA, Kelly RB, Zyzanski SJ. Direct observation of rates of preventive service delivery in community family practice. Prev Med 2000;31:167-76.
13. Crabtree BF, Miller WL, Aita VA, Flocke SA, Stange KC. Primary care practice organization and preventive services delivery: a qualitative analysis. J Fam Pract 1998;46:403-09.
14. McVea K, Crabtree BF, Medder JD, et al. An ounce of prevention? Evaluation of the “Put Prevention into Practice” program. J Fam Pract 1996;43:361-69.
15. McDaniel R, Driebe DJ. Complexity science and health care management. Adv Health Care Manage 2001;2:11-36.
16. McIlvain H, Crabtree B, Medder J, Stange KC, Miller WL. Using practice genograms to understand and describe practice configurations. Fam Med 1998;30:490-96.
17. Lincoln YS, Guba EG. Naturalistic inquiry. Beverly Hills, Calif: Sage Publications; 1985:416.
18. McIlvain HE, Crabtree BF, Backer EL, Turner PD. Use of office-based smoking cessation activities in family practices. J Fam Pract 2000;49:1025-29.
19. Patton MQ. Qualitative evaluation and research methods. 2nd ed. Newbury Park, Calif: Sage Publications; 1990.
20. Kuzel A. Sampling in qualitative inquiry. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;33-45.
21. Jorgensen DL. Participant observation. Newbury Park, Calif: Sage Publications; 1989.
22. Spradley JP. Participant observation. New York, NY: Harcourt Brace Jovanovich College Publishers; 1980.
23. Bogdewic SP. Participant observation. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;47-69.
24. Gilchrist VJ, Williams RL. Key informant interviews. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;71-88.
25. Miller WL, Crabtree BF. Depth interviewing. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;89-107.
26. Meadows L, Dodendorf D. Data management & interpretation using computers to assist. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;195-218.
27. Pommerenke FA, Dietrich AJ. Improving and maintaining preventive services. Part 1: supplying the patient model. J Fam Pract 1992;34:86-91.
28. Williams RL, Snider R, Ryan M. A key informant ‘tree’ as a tool for community oriented primary care. Fam Pract Res J 1994;14:277-84.
29. Stange KC, Miller WL, McWhinney I. Developing the knowledge base of family practice. Fam Med 2001;33:286-97.
30. McCauley A, Commanda L, Freeman W. Participatory research maximises community and lay involvement. BMJ 1999;319:774-78.
31. Main DS, Holcomb S, Dickinson P, Stange KC, Crabtree BF. The role of the family in medical encounters. J Fam Pract 2001;50:888.
32. Robinson D, Prest L, Susman JL, Rasmussen D, Rouse J, Crabtree BF. Technician, detective, friend and healer: understanding mental health management in family practice. J Fam Pract 2001;50:864-70.
33. Scott J, DiCicco-Bloom B, Cohen D, et al. Antibiotic use in the treatment of URI. J Fam Pract 2001;50:853-58.
34. Jaén CR, McIlvain H, Pol L, Phillips RL, Flocke SA, Crabtree BF. Tailoring tobacco counseling to the competing demands in the clinical encounter. J Fam Pract 2001;50:859-63.
35. Smucker D, Zink T, Susman JL, Crabtree BF. Caring for patients who make frequent visits to family practices. J Fam Pract 2001;50:847-52.
36. Aita VA, Dodendorf D, Lebsack J, Tallia AF, Crabtree BF. Patient care staffing patterns and roles in community-based family practices. J Fam Pract 2001;50:889.
Practice Jazz: Understanding Variation in Family Practices Using Complexity Science
Our emerging understanding conceptualizes family practices as local professional complex adaptive systems. These systems exist for the purpose of seeing patients for everyday health concerns and assisting them in getting on with their daily lives. Each family practice is unique because of history and initial conditions, particular agents (eg, physicians, staff, patients, systems), nonlinear interactions among agents, the local ecology, and regional and global influences. How all these factors manifest in a particular practice can be understood using 3 complexity science properties: self-organization, emergence, and co-evolution. The concepts of sensemaking and improvisation can be used to understand how practices deal with variation.
We conclude that complexity science concepts can provide a useful framework for understanding variation and change in family practices. The challenge is to differentiate error from relational variation and to improve practices’ sensemaking and improvisational skills. Future efforts to improve practice should focus on optimizing a practice’s care as a whole and enhancing reflective practice and relationship-centered care.
One major focus of health services research and quality improvement efforts is to identify and reduce variation.1-4 Standardization is the approach usually offered to minimize variation, thus reducing errors and increasing quality.5,6 These interventions are often based on an industrial quality improvement paradigm7 using linear interventions that assume that inputs reliably lead to proportionate responses.8 These interventions include re-engineering and expanded information systems.9,10 If the application of linear Newtonian views is correct, then standardization is the key to quality improvement, and effective practices will look much alike. The search for and attempt to implement best practice guidelines11-14 are examples of efforts to bring practices into conformity and to establish process standards for best behavior. However, the search for simple, easily transportable interventions has not been as successful as traditional logic might suggest.15-19
Emerging views of organizations derived from complexity science bring the key understanding that practices are more than commodity-delivering businesses—they are complex adaptive systems.20 These systems involve connected participants interacting in ways that generate the spontaneous emergence of new structures and behaviors. In complex adaptive systems, we expect to see variation in practice patterns, even when the outcomes of practices are similar.
In a previous issue of JFP,21 we proposed a model of primary care practices as complex adaptive systems and suggested implications and strategies for change. Since then we have begun applying this theoretical framework to other studies designed to understand and advance generalist practice. Our present purpose is to advance the application of complexity science to understanding and improving primary care practices and their co-evolving health care systems.
The theory application process
The 3 studies that this theory application builds on began with the Direct Observation of Primary Care (DOPC) study, a 3-year (1994-1997) multimethod descriptive investigation of the content of 4454 patient visits to 138 family physicians in 84 family practices.22,23 One of the outcomes of this study was a model for understanding change in family practice based on complexity science.21 Subsequently, the Prevention and Competing Demands in Primary Care Study (PCDPC) was a 3-year (1996-1999) in-depth descriptive case study of 1637 outpatient visits to 56 clinicians in 18 family practices purposefully sampled to include diversity in geographic location, size of practice, and intensity of delivery of preventive services.24 We also implemented a 4-year (1997-2001) multimethod clinical trial, the Study to Enhance Prevention by Understanding and Practice (STEP-UP), to understand and improve the delivery of preventive health services in 77 family practices.25
We conducted an explicit theory application and refinement process consisting of 10 meetings to analyze data from DOPC, PCDPC, and STEP-UP, informed by a literature review of complexity science. The developing model was explicitly tested in 3 Nebraska family practices in 1999 in an effort to improve diabetes care.26 The resulting specific theory application to family practices was then evaluated using 2 cases from these studies.
Application of complexity science to family practice
Synthesizing our observations of the family practices and the literature review, we developed the following theoretical model. Family practices are local professional complex adaptive systems with the primary purpose of seeing patients for everyday health concerns to assist “them” in getting on with their daily lives. “Them” refers to the patients and their families, the clinicians, and the office staff. Practice leaders and managers usually describe their practices in terms of efficiency, productivity, adherence to standards of care, and patient satisfaction. Increasingly, practices function in interactive ways with managers or owners from local health care systems. Still, these practices behaved more like complex adaptive systems operating within a professional milieu than like businesses.27
Complex adaptive systems are like a family reunion; they are dynamic-bounded webs of diverse agents interacting nonlinearly.28 Dynamic refers to the continual presence of multiple interactions and their accompanying surprises, challenges, and responses, both within the system and between the system (eg, practices) and its environment. Bounded refers to the defining purpose or intent of the system (eg, to deliver health care to local patients). The metaphor of a web characterizes the multiple interconnections of the system. The agents in practices include clinicians, office staff, and patients, and can also include pharmaceutical representatives, health care system administrators, and others. Agents have the capacity to exchange information, learn, and adjust their behavior. No individual agent can ever know or understand everything that is occurring. The nonlinear relationships among the agents are the result of ongoing feedback loops and mean that small changes can lead to large effects and big changes can lead to small effects. For example, the introduction of a small medical record stamp to identify smokers in one practice leads to a dramatic increase in smoking counseling, while a major quality improvement program in another practice results in minimal change.
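The nonlinearity described above can be illustrated with a toy threshold model. This sketch is not drawn from the DOPC, PCDPC, or STEP-UP data; the function name `final_adopters`, the two hypothetical practices, and all threshold values are assumptions chosen only to show how feedback among agents can make a small change spread widely in one setting while a large change stalls in another.

```python
def final_adopters(thresholds, seeded=0):
    """Count how many agents have adopted a behavior once feedback settles.

    thresholds[i] is the number of adopters agent i must see before adopting.
    The first `seeded` agents adopt unconditionally (the intervention).
    """
    adopted = [i < seeded for i in range(len(thresholds))]
    while True:
        count = sum(adopted)
        # Feedback loop: each agent reacts to the current number of adopters.
        updated = [a or t <= count for a, t in zip(adopted, thresholds)]
        if updated == adopted:
            return count
        adopted = updated


# Hypothetical practice A: a graded chain of thresholds (1, 1, 2, 3, ..., 99).
practice_a = [1] + list(range(1, 100))
# Hypothetical practice B: every agent waits for 60 others to adopt first.
practice_b = [60] * 100

print(final_adopters(practice_a, seeded=0))   # 0: nothing happens unprompted
print(final_adopters(practice_a, seeded=1))   # 100: one adopter tips everyone
print(final_adopters(practice_b, seeded=30))  # 30: a large push goes no further
```

In practice A, a single seeded adopter sets off a chain that reaches all 100 agents, whereas in practice B seeding 30 agents changes no one else. Real practices are far richer than this model, but the qualitative pattern matches the smoking stamp and quality improvement examples above.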
What makes an organization “professional” is the application of specialized values and expertise to address difficult problems and uncertainty. These values and skills are acquired through specialized training and are created by the larger social context.29 It is the socially defined professional values and expertise of the family physician that are applied in their daily activities. A co-evolving professional world of the health system and payer manager is increasingly interacting with the physician professional value system.
Each family practice is unique because of 5 features:
- History and initial conditions, including any explicit or implicit mission and the underlying priorities for the practice.
- Particular agents and their unique styles and interests.
- The pattern of nonlinear interactions among agents.
- The local fitness landscape (ie, the practice’s ecological niche) and its particular expectations, community values, competitive issues, and ecology.
- Regional and global influences, such as larger health care systems, finances and regulations, and culture.
The local fitness landscape, a complexity science term from evolutionary biology, specifically refers to the local terrain and all the many complex adaptive systems, from microbes to organizations, that seek their own purpose and niche within that terrain. Biological evolution and technological evolution are processes attempting to optimize systems riddled with conflicting constraints.30 Each family practice must evolve by attempting to optimize the entire package of services it delivers. This evolution must account for all the other competing and cooperating health care services and local resources, such as economic conditions, availability of insurance types, and the particular local disease and illness epidemiology.
How all of these factors manifest in a particular family practice at any given time and over time can be understood using 3 complexity science properties: self-organization, emergence, and co-evolution. Self-organization refers to the spontaneous development of structures and forms of behavior in systems characterized by multiple feedback loops and nonlinear dynamics. These structures are a function of the patterns of relationships among agents. Everything changes in response to and as a result of everything else, with each complex adaptive system seeking a better position in its local fitness landscape—a niche where it can prosper and survive.31 In this setting of ongoing co-evolution, both competition and collaboration become strategies for workable solutions. As the agents of any complex adaptive system interact, novelty and surprise continuously emerge in unpredictable ways. For example, a new successful approach to scheduling is introduced by an unassuming receptionist. This emergence creates a system that is greater than the sum of its parts; it is what cannot be understood through a reductionist (one problem at a time) examination of the practice.32
A particular family practice is the unique self-organized system that emerges when particular physicians and staff (agents) come together in particular ways with particular goals, preferences, and priorities (initial conditions) within a particular community setting (local fitness landscape) given specific regional and global influences. At any future point, this practice is the unique self-organized system that has emerged through co-evolution with all the other systems in the local fitness landscape.
The result is much variation between and within family practices. Practices have much in common, however, because of their common goal of seeing patients to assist them with their everyday health problems in a shared cultural and historical context. From that perspective, variation in family practices is inevitable and a powerful source of creative possibility, value, and good clinical practice. Practices use 2 strategies for successfully enhancing that creativity: sensemaking and improvisation. Sensemaking is a social activity that requires interaction among agents.33,34 People must come to have some notion of “Who am I,” “Why am I here,” and “What is going on around me?” Improvisation is a strategy for dealing with surprise in complex adaptive systems. Improvisation can be described as intuition guiding action in a spontaneous way.35 Intuition is not a random guess at what to do, but the result of using high levels of expertise to act in the moment.27,36
Among the many practices in the studies we are currently analyzing, multiple areas of variation are observed. These include differences in charting systems, clinical care decisions, scheduling, billing and coding procedures, staff relationships, and management and clinical styles. Sometimes these variations provide an adaptive advantage, but often not, and it is seldom clear in advance which will be true. Inflexible standardization, however, is often poorly responsive to the needs of different practices’ diverse agents and to the almost constant situations of uncertainty, contextual uniqueness, and surprise that occur in the practices.
Case studies
To illustrate the application of complexity science–based sensemaking to family practice, we present 2 case studies.
We selected 2 practices that had high quality of care as measured by delivery of preventive health services and patient satisfaction. One case takes advantage of the longitudinal data from DOPC and STEP-UP, and the other makes use of the more in-depth cross-sectional data from PCDPC. The cases were also selected to ensure maximum variation in location and affiliation, and homogeneity in practice size (4-8 clinicians). The names have been changed to protect confidentiality.
Franchise Family Practice
History/Initial Conditions
Franchise is one of several primary care offices created by the Health Salute Corporation in affluent suburban areas of intense competition for market share growth. The corporate intent for this practice is to be productive and profitable. Two family physicians, a pediatrician, and a nurse practitioner were brought in from other practices, and several of their staff members followed. They all agree that their mission is to be the best practice in the Health Salute Corporation. Their identity is to capture market share through better efficiency, a mechanistic approach (scientific and standardized), and a friendly and caring attitude.
Agents and Patterns of Interaction
The practice manager’s daily attire in stockings and heels sets the tone for interactions, which are formal and professional. A small core of staff is dedicated to this practice, but there is often temporary help from other Health Salute offices during busy times. The patient population of predominantly mobile, insured, 2-working-parent families tends to value convenience over relationships. The physicians seem to have little emotional investment in this particular practice, place, or each other. Conflicts are minimized and usually covered over with humor.
Local Fitness Landscape
There is no clear sense of community in this new suburb. Franchise is located in the heart of “minivan land,” an unrolling suburban carpet. The 2 competing systems are a major threat and are constantly being discussed. It is very clear that the survival of Franchise is dependent on success in the marketplace as determined by Health Salute.
Regional/Global Influences
Managed care has a strong presence, with much pressure to implement multiple practice guidelines, frequent chart audits, and different formularies.
Self-Organization
In many respects, Franchise Family Practice comes close to fulfilling its mission. Franchise is friendly, fun loving, and clean. It is a high performer at delivering preventive health services and is full of glitz and protocols. There are multiple systems in place for all phases of practice operation, and the manager sees they are working.
Emergence
Despite this managed order, surprises, problems, uncertainty, and complexities keep arising on a daily basis. Occasionally individuals respond creatively, but more often they stick to the protocols and generate even more trouble. There are frequent staff meetings where common problems are discussed. Many different solutions emerge in these discussions, but the final resolution is usually based on what the practice management thinks Health Salute would want. Even in this intensely structured practice, multiple competing demands, power distributions, and interpersonal battles are being simultaneously worked out on a daily basis.
Co-Evolution
As the suburbs grew, more practices were opened. The original practice in the area was soon challenged by Franchise and then by another competitor. Each of the 3 practices often acted or reacted in response to the others. Approximately a year after our research ended, Franchise Family Practice was closed by Health Salute because of inadequate profitability, and within a few months the second competitor also closed its practice.
Dusty Garden Family Practice
History/Initial Conditions
Dusty Garden began as a pioneering model for community-oriented primary care in an economically impoverished urban area. The practice was created with a focus on the patient in this underserved community. Envisioned by its founding family physician and practice manager, this practice was established in close collaboration with a community board. Survival is dependent on the ability to obtain funding for many poorly reimbursed services.
Agents and Patterns of Interaction
Dusty Garden has a dense and diverse web of complex interdependence. During the first few years of our research, most practice staff members came from the community. The practice grew from 4 to 6 family physicians and 2 nurse practitioners, and there was also much staff turnover. Dusty Garden was often a stepping stone for some clinicians, a chance to work in an “idealistic place” before going on to other things. However, the leadership has remained stable.
Local Fitness Landscape
The health care needs in the community were great, and the practice responded by growing rapidly, at times exceeding resources. At the same time, the local market consolidated into 2 competing hospital systems, with nearly all practices officially aligned with one of them. Necessary external funding also became more difficult to obtain.
Regional/Global Influences
Patients were represented by a mix of insurers. Uninsured or underinsured patients were cared for under a sliding-scale reimbursement scheme. There was also a perceived need to demonstrate successes and quality of care, and to place more emphasis on productivity and efficiency.
Self-Organization
The original practice was located in a dusty and cluttered building. It was difficult to tell who was responsible for what, but a shared sense of purpose gave the practice a family feel. Conflicts were evident but quickly resolved by frank discussions and a shared commitment to the practice mission. Schedules were constantly being disrupted by responding to patients and staff members’ diverse needs. In spite of this seeming chaos, Dusty Garden was an exemplar at delivering preventive services. This was accomplished by several dedicated clinicians and practice systems that involved the active participation of multiple personnel.
Co-Evolution
Significant change was occurring, and the practice was pushed to divide or grow in response to increasing patient demand and community need. They chose to grow and move into a much larger, newer, and more functional building down the street, resulting in a set of unanticipated consequences as both they and the fitness landscape changed. The new facility was more accessible and visible to a demographically different set of patients. Because of the changes in the local health care system, Dusty Garden also felt compelled to develop a relationship with the academic hospital system.
Emergence
What emerged was an organization where staff were isolated in large functionally differentiated spaces. The greater practice size demanded more specialization, and patterns of relationships dramatically changed. The change altered the number and specifics of the agents; many of the community-based staff left; and there was frequent turnover among newly hired and overwhelmed front office staff. The change also altered the number and character of the interactions per agent. Still, the vision remains strong, and many meetings are occurring to restore a “new” sense of practice community. Preventive service delivery rates remain high. The leadership initially responded to these changes with efforts aimed at greater standardization, but these are now being balanced by paying more attention to what solutions are being improvised “on the ground.”
Case study analysis
A comparative analysis of these 2 cases provides the following insights based on a complexity science view of the world:
- Each practice was performing well, using the delivery of preventive health services and patient satisfaction as proxies for total practice performance.
- The practices differed from each other in critical ways that seem to be at odds with traditional “best practices” thinking.
- The practices were similar in that they had each organized themselves, co-evolved, and emerged as a function of the nonlinear interdependencies among agents and the local fitness landscape, not solely as a function of some externally imposed script.
- Each practice engaged in sensemaking activity to understand its unfolding world.
- Each practice engaged in improvisational behavior as a strategy for developing strategic and tactical responses to its unfolding world.
- Variation was often a source of strength, not a sign of bad practice.
Discussion
The traditional,2-4 largely unsubstantiated,37-40 view is that the best way to improve care is to eliminate variation. A view of family practice informed by complexity science suggests otherwise. In complex adaptive systems, agents in the practices create responses to changing circumstances—they improvise, or play practice jazz. Jazz players are often seen as role models of sensemaking and improvisational behavior.28 They know a general musical structure, and within that they create jazz. Bad jazz occurs when one person plays what the others cannot make sense of and build on. All the players have an interdependent responsibility to create good jazz. When good jazz players hear something unexpected, they make sense of it and improvise. Dealing with the uncertain nature of complex adaptive systems involves thinking in terms of making sense of what is emerging. How can I improvise to use whatever happens to further the system’s development? It involves building on emergent characteristics of the complex adaptive system to develop patterns of social interaction41 among agents that give them confidence in each other, lead to small wins, and enhance the capacity to learn from unpredicted events.42
Nevertheless, differentiating desirable from undesirable variation is an opportunity to learn from our history, and an opportunity to improve our practice jazz.37,41,43 Small changes can have large results in some settings, while large efforts may lead to meager results in others. Complexity theory offers a framework for understanding these phenomena in family practice, and lays the groundwork for future research. On the basis of the proposed theoretical model, we hypothesize that it is critical to differentiate the variations that are sources of error from the variations due to the dynamics of relationships. From the perspective of complexity science, relational variation is linked to diversity among agents and represents constructive and adaptive variations and emergent behavior within an ever-changing and unpredictable local fitness landscape. From this perspective, the goals are to eliminate error through development of better systems of operation and to reduce confusion and poor judgment by improving sensemaking and communication. It is also important to enhance desirable variation by developing the skills of the relationship-centered clinical method,44-46 improvisation, and reflective practice.47-50
Sensemaking may be enhanced by considering the 4 ways of knowing health and health care,51 which include understanding: (1) the clinician; (2) the patient, family, and community; (3) systems; and (4) scientific evidence about disease and treatment. Judgments about the variation can be made within each way of knowing. Desirable variation due to clinician factors should build on each clinician’s unique skills and values, and compensate for or improve weaknesses. Local adaptation of objective evidence and the development of unique approaches to meeting the needs of patients in their personal and local context represent potentially desirable sources of variation. Evidence-based medicine provides a basis for reducing variation on the basis of scientific knowledge developed from studies of groups of individuals. Systems that integrate scientific evidence with the unique needs of patients, families, and communities and the specific talents of clinicians represent an opportunity for interventions that both reduce variation from known effective health care approaches and increase variability that personalizes care. Complexity science can help us to look for the inter-relationships among these different ways of knowing and to recognize what is not knowable or controllable. Yet, complexity science represents only a partial answer to efforts to integrate these diverse perspectives. There is a need for additional theoretical work to develop approaches that both include and transcend current ways of thinking.52,53
Family practices are systems co-evolving within fitness landscapes where there is a continual need for sensemaking and improvisation. This is particularly true during the current period of rapid change and co-evolution of practices with a rapidly changing health care system. Excessive standardization with the goal of trying to maximize each part is as potentially problematic51 as variation from scientific evidence. Like a healthy body, a healthy practice represents a balance of the generalizable and the particular. The result is tension between the local, the regulatory, and the universal, and between patient, professional, societal, and ecological expectations. We believe the principles of complexity science explain why linear quality improvement interventions (one disease at a time) often have limited effect and poor transportability.15,16,19,54-56 These principles may also explain why countries with a higher proportion of primary care services have better population health status36,57,58 despite the repeated observation that specialists do better at following disease guidelines and improving disease-specific outcomes.59-61 It is never just about the specific; it is about the specific in relation to the whole, and the whole is always more than the sum of the specifics. Good primary care serves all members of the community well with the resources available. Further application of complexity science to understand these paradoxes will require more high-quality longitudinal data in multiple practices and broad integrative measures of the process and outcomes of care.51,62
Conclusions
Family physicians are told to implement guidelines, to diagnose and treat in specific ways, and to eliminate variation in practice. Our study using complexity science suggests that this is only part of the story. Family practices are systems that self-organize, reveal emergent behavior, and co-evolve. Successful practices are those that minimize errors, make good sense of what is happening, and effectively improvise to make good practice jazz. Seeking to eliminate error by dampening all variation through the imposition of excessive standardization and external controls is unlikely to be sustainably effective and is likely to have long-term negative consequences. We encourage all family practice staff members to become knowledgeable of practice guidelines and evidence-based practice; these are some of the core skills of good patient care.63 Using these core skills to implement flexible, locally meaningful systems may reduce error. Also, efforts to change and improve future practice are best served by focusing on improving care as a whole and on developing the skills of reflective practice and relationship-centered care.51 We encourage policymakers to acknowledge the potential benefits of some kinds of variation and to support its healthy evolution.
Acknowledgments
The data used in our paper came from studies supported by a grant (1RO1 HS08776) from the Agency for Health Care Policy and Research (now the Agency for Healthcare Research and Quality) and grants from the National Cancer Institute (1RO1 CA60862 and 2RO1 CA60862). A grant from the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio, supported communications, analyses, and writing. We are grateful to the practices participating in the research on which our paper was based.
1. McPherson K, Wennberg JE, Hovind OB, Clifford P. Small-area variations in the use of common surgical procedures: an international comparison of New England, England, and Norway. N Engl J Med 1982;307:1310-14.
2. DeMott K. Healthcare practices vary widely from town to town: regional Dartmouth Atlas. Health Syst Lead 1997;4:2-3.
3. Fineberg HV, Funkhouser AR, Marks H. Variation in medical practice: a review of the literature. Ind Health Care 1985;2:143-68.
4. Volinn E, Diehr P, Ciol MA, Loeser JD. Why does geographic variation in health care practices matter? (And seven questions to ask in evaluating studies on geographic variation). Spine 1994;19:2092S-100S.
5. Carnett WG. Clinical practice guidelines: a tool to improve care. Qual Manag Health Care 1999;8:13-21.
6. Weiner JP, Parente ST, Garnick DW, Fowles J, Lawthers AG, Palmer RH. Variation in office-based quality: a claims-based profile of care provided to Medicare patients with diabetes. JAMA 1995;273:1503-08.
7. Laffel G, Blumenthal D. The case for using industrial quality management science in health care organizations. JAMA 1989;262:2869-73.
8. Kaplan D, Glass L. Understanding nonlinear dynamics. New York, NY: Springer-Verlag; 1995.
9. Shortell SM, Gillies RR, Anderson DA. Remaking health care in America. 2nd ed. San Francisco, Calif: Jossey-Bass; 2000.
10. Kaegi L. AMA clinical quality improvement forum ties it all together: from guidelines to measurement to analysis and back to guidelines. Jt Comm J Qual Improve 1999;25:95-106.
11. Burns LR, Denton M, Goldfein S, Warrick L, Morenz B, Sales B. The use of continuous quality improvement methods in the development and dissemination of medical practice guidelines. Qual Rev Bull 1992;18:434-39.
12. Gottlieb LK, Sokol HN, Murrey KO, Schoenbaum SC. Algorithm-based clinical quality improvement: clinical guidelines and continuous quality improvement. HMO Pract 1992;6:5-12.
13. Woolf SH. Practice guidelines: a new reality in medicine. Arch Intern Med 1990;150:1811-18.
14. Kamerow DB. Before and after guidelines. J Fam Pract 1997;44:344-46.
15. Solberg L, Kottke T, Brekke M. Failure of a trial of continuous quality improvement and systems intervention to increase the delivery of clinical preventive services. Effect Clin Pract 2000;3:105-15.
16. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.
17. Davis DA, Thomson MA, Oxman AD, Haynes RB. Changing physician performance: a systematic review of the effect of continuing medical education strategies. JAMA 1995;274:700-05.
18. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.
19. Davis DA, Taylor-Vaisey A. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
20. Committee on Quality of Health Care in America. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academy Press; 2001.
21. Miller WL, Crabtree BF, McDaniel RR, Jr, Stange KC. Understanding change in primary care practice using complexity theory. J Fam Pract 1998;46:369-76.
22. Stange KC, Zyzanski SJ, Jaén CR, et al. Illuminating the ‘black box:’ a description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.
23. The DOPC Writing Group. Conducting the Direct Observation of Primary Care Study: insights from the process of conducting multimethod transdisciplinary research in community practice. J Fam Pract 2001;50:345-52.
24. Crabtree BF, Miller WL, Stange KC. Understanding practice from the ground up. J Fam Pract 2001;50:881-87.
25. Goodwin MA, Zyzanski SJ, Zronek S, et al. A clinical trial of tailored office systems for preventive service delivery: the Study To Enhance Prevention by Understanding Practice (STEP-UP). Am J Prev Med 2001;21:2-28.
26. Helseth LD. Using the complexity model to enhance diabetes management in three family medicine practices: a qual
27. Anderson RA, McDaniel RR, Jr. Managing healthcare organizations: where professionalism meets complexity science. Health Care Manage Rev 2000;25:83-92.
28. McDaniel Jr RR, Driebe DJ. Complexity science and health care management. Adv Strat Manage 2001;2:11-36.
29. Freidson E. Professionalism reborn: theory, prophecy, and policy. Chicago, Ill: The University of Chicago Press; 1994.
30. Smith TS. Nonlinear dynamics and the micro-macro bridge. In: Eve RA, Horsfall S, Lee ME, eds. Chaos, complexity, and sociology: myths, models, and theories. Thousand Oaks, Calif: Sage Publications; 1997:52-63.
31. Kauffman S. At home in the universe: the search for the laws of self-organization and complexity. New York, NY: Oxford University Press; 1995.
32. Newman DV. Emergence and strange attractors. Phil Sci 1996;63:245-61.
33. Weick KE. Sensemaking in organizations. In: Whetten D, ed. Foundations for organizational science. Thousand Oaks, Calif: Sage Publications; 1995.
34. Thomas JB, Clark SM, Gioia D. Strategic sensemaking and organizational performance: linkages among scanning, interpretation, action and outcomes. Acad Manage J 1993;36:239-70.
35. Crossan M, Sorrenti M. Making sense of improvisation. Adv Strat Manage 1997;14:155-80.
36. Tanenbaum SJ. Evidence and expertise: the challenge of the outcomes movement to medical professionalism. Acad Med 1999;74:757-63.
37. Casparie AF. The ambiguous relationship between practice variation and appropriateness of care: an agenda for further research. Health Policy 1996;35:247-65.
38. Chassin MR. Explaining geographic variations: the enthusiasm hypothesis. Med Care 1993;31:YS37-44.
39. Leape LL, Park RE, Solomon DH, Chassin MR, Kosecoff J, Brook RH. Does inappropriate use explain small-area variations in the use of health care services? JAMA 1990;263:669-72.
40. Stano M. Evaluating the policy role of the small area variations and physician practice style hypotheses. Health Policy 1993;24:9-17.
41. Westert GP, Groenewegen PP. Medical practice variations: changing the theoretical approach. Scand J Public Health 1999;27:173-80.
42. McDaniel RR, Jr. Strategic leadership: a view from quantum and chaos theories. Health Care Manage Rev 1997;22:21-37.
43. Fertig A, Roland M, King H, Moore T. Understanding variation in rates of referral among general practitioners: are inappropriate referrals important and would guidelines help to reduce rates? BMJ 1993;307:1467-70.
44. Stewart M, Weston WW, Brown JB, McWhinney IR, McWilliam CL, Freeman TR. Patient-centered medicine: transforming the clinical method. Thousand Oaks, Calif: Sage Publications; 1995.
45. Stewart M, Brown JB, Donner A, et al. The impact of patient-centered care on outcomes. J Fam Pract 2000;49:796-804.
46. Roter D. The enduring and evolving nature of the patient-physician relationship. Pat Ed Counsel 2000;39:5-15.
47. Novack DH, Suchman AL, Clark W, Epstein RM, Najberg E, Kaplan C. Calibrating the physician: physician personal awareness and effective patient care. JAMA 1997;278:502-09.
48. Clark R, Croft P. Critical reading for the reflective practitioner: a guide for primary care. Oxford, England: Butterworth-Heinemann; 1998.
49. Epstein R. Mindful practice. JAMA 1999;282:833-39.
50. Bolton G. Reflective practice: writing and professional development. London, England: Sage Publications; 2001.
51. Stange KC, Miller WL, McWhinney I. Developing the knowledge base of family practice. Fam Med 2001;33:286-97.
52. Wilber K. A brief theory of everything. Boston, Mass: Shambhala Publications, Inc; 2000.
53. Wilber K. Sex, ecology, spirituality. Boston, Mass: Shambhala Publications, Inc; 1995, 2000.
54. Davis P, Gribben B, Scott A, Lay-Yee R. The “supply hypothesis” and medical practice variation in primary care: testing economic and clinical models of inter-practitioner variation. Soc Sci Med 2000;50:407-18.
55. O’Connell DL, Henry D, Tomlins R. Randomised controlled trial of effect of feedback on general practitioners’ prescribing in Australia. BMJ 1999;318:507-11.
56. Salisbury C, Bosanquet N, Wilkinson E, Bosanquet A, Hasler J. The implementation of evidence-based medicine in general practice prescribing. Br J Gen Pract 1998;48:1849-52.
57. Starfield B. Primary care: balancing health needs, services, and technology. New York, NY: Oxford University Press; 1998.
58. Starfield B. Is US health really the best in the world? JAMA 2000;284:483-85.
59. Ayanian JZ, Guadagnoli E, McNeil BA, Cleary PD. Treatment and outcomes of acute myocardial infarction among patients of cardiologists and generalist physicians. Arch Intern Med 1997;157:2570-76.
60. Harrold L, Field T, Gurwitz J. Knowledge, patterns of care, and outcomes of care for generalists and specialists. J Gen Intern Med 1999;14:499-511.
61. MacLean CH, Louie R, Leake B, et al. Quality of care for patients with rheumatoid arthritis. JAMA 2000;284:984-92.
62. Longo DR. Patient practice variation: a call for research. Med Care 1993;31:YS81-5.
63. Shaughnessy AF, Slawson DC, Becker L. Clinical jazz: Harmonizing clinical experience and evidence-based medicine. J Fam Pract 1998;47:425-28.
Our emerging understanding conceptualizes family practices as local professional complex adaptive systems. These systems exist for the purpose of seeing patients for everyday health concerns and assisting them in getting on with their daily lives. Each family practice is unique because of history and initial conditions, particular agents (eg, physicians, staff, patients, systems), nonlinear interactions among agents, the local ecology, and regional and global influences. How all these factors manifest in a particular practice can be understood using 3 complexity science properties: self-organization, emergence, and co-evolution. The concepts of sensemaking and improvisation can be used to understand how practices deal with variation.
We conclude that complexity science concepts can provide a useful framework for understanding variation and change in family practices. The challenge is to differentiate error from relational variation and to improve practices’ sensemaking and improvisational skills. Future efforts to improve practice should focus on optimizing a practice’s care as a whole and enhancing reflective practice and relationship-centered care.
One major focus of health services research and quality improvement efforts is to identify and reduce variation.1-4 Standardization is the approach usually offered to minimize variation, thus reducing errors and increasing quality.5,6 These interventions are often based on an industrial quality improvement paradigm7 using linear interventions that assume that inputs reliably lead to proportionate responses.8 These interventions include re-engineering and expanded information systems.9,10 If the application of linear Newtonian views is correct, then standardization is the key to quality improvement, and effective practices will look much alike. The search for and attempt to implement best practice guidelines11-14 are examples of efforts to bring practices into conformity and to establish process standards for best behavior. However, the search for simple, easily transportable interventions has not been as successful as traditional logic might suggest.15-19
Emerging views of organizations derived from complexity science bring the key understanding that practices are more than commodity-delivering businesses—they are complex adaptive systems.20 These systems involve connected participants interacting in ways that generate the spontaneous emergence of new structures and behaviors. In complex adaptive systems, we expect to see variation in practice patterns, even when the outcomes of practices are similar.
In a previous issue of JFP,21 we proposed a model of primary care practices as complex adaptive systems and suggested implications and strategies for change. Since then we have begun applying this theoretical framework to other studies designed to understand and advance generalist practice. Our present purpose is to advance the application of complexity science to understanding and improving primary care practices and their co-evolving health care systems.
The theory application process
The 3 studies that this theory application builds on began with the Direct Observation of Primary Care (DOPC) study, a 3-year (1994-1997) multimethod descriptive investigation of the content of 4454 patient visits to 138 family physicians in 84 family practices.22,23 One of the outcomes of this study was a model for understanding change in family practice based on complexity science.21 Subsequently, the Prevention and Competing Demands in Primary Care (PCDPC) study was a 3-year (1996-1999) in-depth descriptive case study of 1637 outpatient visits to 56 clinicians in 18 family practices purposefully sampled to include diversity in geographic location, size of practice, and intensity of delivery of preventive services.24 We also implemented a 4-year (1997-2001) multimethod clinical trial, the Study To Enhance Prevention by Understanding Practice (STEP-UP), to understand and improve the delivery of preventive health services in 77 family practices.25
We conducted an explicit theory application and refinement process consisting of 10 meetings to analyze data from DOPC, PCDPC, and STEP-UP, informed by a literature review of complexity science. The developing model was explicitly tested in 3 Nebraska family practices in 1999 in an effort to improve diabetes care.26 The resulting specific theory application to family practices was then evaluated using 2 cases from these studies.
Application of complexity science to family practice
Synthesizing our observations of the family practices and the literature review, we developed the following theoretical model. Family practices are local professional complex adaptive systems with the primary purpose of seeing patients for everyday health concerns to assist “them” in getting on with their daily lives. “Them” refers to the patients and their families, the clinicians, and the office staff. Practice leaders and managers usually describe their practices in terms of efficiency, productivity, adherence to standards of care, and patient satisfaction. Increasingly, practices function in interactive ways with managers or owners from local health care systems. Still, these practices behaved more like complex adaptive systems operating within a professional milieu than like businesses.27
Complex adaptive systems are like a family reunion; they are dynamic-bounded webs of diverse agents interacting nonlinearly.28 Dynamic refers to the continual presence of multiple interactions and their accompanying surprises, challenges, and responses, both within the system and between the system (eg, practices) and its environment. Bounded refers to the defining purpose or intent of the system (eg, to deliver health care to local patients). The metaphor of a web characterizes the multiple interconnections of the system. The agents in practices include clinicians, office staff, and patients, and can also include pharmaceutical representatives, health care system administrators, and others. Agents have the capacity to exchange information, learn, and adjust their behavior. No individual agent can ever know or understand everything that is occurring. The nonlinear relationships among the agents are the result of ongoing feedback loops and mean that small changes can lead to large effects and big changes can lead to small effects. For example, the introduction of a small medical record stamp to identify smokers in one practice leads to a dramatic increase in smoking counseling, while a major quality improvement program in another practice results in minimal change.
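The contrast between proportionate (linear) and nonlinear responses can be illustrated with a toy numerical sketch. This is not a model of a family practice and is not drawn from our data; it assumes the logistic map, a standard textbook example of a nonlinear feedback loop, and a simple damped linear rule for comparison. All function names and parameter values are hypothetical choices made for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x), a textbook nonlinear feedback loop."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def linear_trajectory(x0, gain=0.9, steps=50):
    """Iterate x -> gain * x, where output stays proportionate to input."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gain * xs[-1])
    return xs


def max_gap(a, b):
    """Largest step-by-step difference between two trajectories."""
    return max(abs(p - q) for p, q in zip(a, b))


# Nudge the starting condition by one part in a million (illustrative value).
small_change = 1e-6
start = 0.2

nonlinear_gap = max_gap(
    logistic_trajectory(start), logistic_trajectory(start + small_change)
)
linear_gap = max_gap(
    linear_trajectory(start), linear_trajectory(start + small_change)
)

print("largest divergence, nonlinear rule:", nonlinear_gap)
print("largest divergence, linear rule:   ", linear_gap)
```

Under the linear rule the two trajectories never differ by more than the original nudge, whereas under the nonlinear rule the same nudge grows until the trajectories bear little resemblance to each other. The sketch illustrates only the general property of sensitivity to small changes; it says nothing about which changes in a real practice will have large effects.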
What makes an organization “professional” is the application of specialized values and expertise to address difficult problems and uncertainty. These values and skills are acquired through specialized training and are created by the larger social context.29 It is the socially defined professional values and expertise of the family physician that are applied in their daily activities. A co-evolving professional world of the health system and payer manager is increasingly interacting with the physician professional value system.
Each family practice is unique because of 5 features:
- History and initial conditions, including any explicit or implicit mission and the underlying priorities for the practice.
- Particular agents and their unique styles and interests.
- The pattern of nonlinear interactions among agents.
- The local fitness landscape (ie, the practice’s ecological niche) and its particular expectations, community values, competitive issues, and ecology.
- Regional and global influences, such as larger health care systems, finances and regulations, and culture.
The local fitness landscape, a complexity science term from evolutionary biology, specifically refers to the local terrain and all the many complex adaptive systems, from microbes to organizations, that seek their own purpose and niche within that terrain. Biological evolution and technological evolution are processes attempting to optimize systems riddled with conflicting constraints.30 Each family practice must evolve by attempting to optimize the entire package of services it delivers. This evolution must account for all the other competing and cooperating health care services and local resources, such as economic conditions, availability of insurance types, and the particular local disease and illness epidemiology.
How all of these factors manifest in a particular family practice at any given time and over time can be understood using 3 complexity science properties: self-organization, emergence, and co-evolution. Self-organization refers to the spontaneous development of structures and forms of behavior in systems characterized by multiple feedback loops and nonlinear dynamics. These structures are a function of the patterns of relationships among agents. Everything changes in response to and as a result of everything else, with each complex adaptive system seeking a better position in its local fitness landscape—a niche where it can prosper and survive.31 In this setting of ongoing co-evolution, both competition and collaboration become strategies for workable solutions. As the agents of any complex adaptive system interact, novelty and surprise continuously emerge in unpredictable ways. For example, a new successful approach to scheduling is introduced by an unassuming receptionist. This emergence creates a system that is greater than the sum of its parts; it is what cannot be understood through a reductionist (one problem at a time) examination of the practice.32
A particular family practice is the unique self-organized system that emerges when particular physicians and staff (agents) come together in particular ways with particular goals, preferences, and priorities (initial conditions) within a particular community setting (local fitness landscape) given specific regional and global influences. At any future point, this practice is the unique self-organized system that has emerged through co-evolution with all the other systems in the local fitness landscape.
The result is much variation between and within family practices. Practices have much in common, however, because of their common goal of seeing patients to assist them with their everyday health problems in a shared cultural and historical context. From that perspective, variation in family practices is inevitable and a powerful source of creative possibility, value, and good clinical practice. Practices use 2 strategies for successfully enhancing that creativity: sensemaking and improvisation. Sensemaking is a social activity that requires interaction among agents.33,34 People must come to have some notion of “Who am I,” “Why am I here,” and “What is going on around me?” Improvisation is a strategy for dealing with surprise in complex adaptive systems. Improvisation can be described as intuition guiding action in a spontaneous way.35 Intuition is not a random guess at what to do, but the result of using high levels of expertise to act in the moment.27,36
Among the many practices in the studies we are currently analyzing, multiple areas of variation are observed. These include differences in charting systems, clinical care decisions, scheduling, billing and coding procedures, staff relationships, and management and clinical styles. Sometimes these variations provide an adaptive advantage, but often not, and it is seldom clear in advance which will be true. Inflexible standardization, however, is often poorly responsive to the needs of different practices’ diverse agents and to the almost constant situations of uncertainty, contextual uniqueness, and surprise that occur in the practices.
Case studies
To illustrate the application of complexity science–based sensemaking to family practice, we present 2 case studies.
We selected 2 practices that had high quality of care as measured by delivery of preventive health services and patient satisfaction. One case takes advantage of the longitudinal data from DOPC and STEP-UP, and the other makes use of the more in-depth cross-sectional data from PCDPC. The cases were also selected to assure maximum variation in location and affiliation, and homogeneity in practice size (4-8 clinicians). The names have been changed to protect confidentiality.
Franchise Family Practice
History/Initial Conditions
Franchise is one of several primary care offices created by the Health Salute Corporation in affluent suburban areas of intense competition for market share growth. The corporate intent for this practice is to be productive and profitable. Two family physicians, a pediatrician, and a nurse practitioner were brought in from other practices, and several of their staff members followed. They all agree that their mission is to be the best practice in the Health Salute Corporation. Their identity is to capture market share through better efficiency, a mechanistic approach (scientific and standardized), and a friendly and caring attitude.
Agents and Patterns of Interaction
The practice manager’s daily attire in stockings and heels sets the tone for interactions, which are formal and professional. A small core of staff is dedicated to this practice, but there is often temporary help from other Health Salute offices during busy times. The patient population of predominantly mobile, insured families with 2 working parents tends to value convenience over relationships. The physicians seem to have little emotional investment in this particular practice, place, or each other. Conflicts are minimized and usually covered over with humor.
Local Fitness Landscape
There is no clear sense of community in this new suburb. Franchise is located in the heart of “minivan land,” an unrolling suburban carpet. The 2 competing systems are a major threat and are constantly being discussed. It is very clear that the survival of Franchise is dependent on success in the marketplace as determined by Health Salute.
Regional/Global Influences
Managed care has a strong presence, with much pressure to implement multiple practice guidelines, frequent chart audits, and different formularies.
Self-Organization
In many respects, Franchise Family Practice comes close to fulfilling its mission. Franchise is friendly, fun loving, and clean. It is a high performer at delivering preventive health services and is full of glitz and protocols. There are multiple systems in place for all phases of practice operation, and the manager sees they are working.
Emergence
Despite this managed order, surprises, problems, uncertainty, and complexities keep arising on a daily basis. Occasionally individuals respond creatively, but more often they stick to the protocols and generate even more trouble. There are frequent staff meetings where common problems are discussed. Many different solutions emerge in these discussions, but the final resolution is usually based on what the practice management thinks Health Salute would want. Even in this intensely structured practice, multiple competing demands, power distributions, and interpersonal battles are being simultaneously worked out on a daily basis.
Co-Evolution
As the suburbs grew, more practices were opened. The original practice in the area was soon challenged by Franchise and then by another competitor. Each of the 3 practices often acted or reacted in response to the others. Approximately a year after our research ended, Franchise Family Practice was closed by Health Salute because of inadequate profitability, and within a few months the second competitor also closed its practice.
Dusty Garden Family Practice
History/Initial Conditions
Dusty Garden began as a pioneering model for community-oriented primary care in an economically impoverished urban area. The practice was created with a focus on the patient in this underserved community. Envisioned by its founding family physician and practice manager, this practice was established in close collaboration with a community board. Survival is dependent on the ability to obtain funding for many poorly reimbursed services.
Agents and Patterns of Interaction
Dusty Garden has a dense and diverse web of complex interdependence. During the first few years of our research, most practice staff members came from the community. The practice grew from 4 to 6 family physicians and 2 nurse practitioners, and there was also much staff turnover. Dusty Garden was often a stepping stone for some clinicians, a chance to work in an “idealistic place” before going on to other things. However, the leadership has remained stable.
Local Fitness Landscape
The health care needs in the community were great, and the practice responded by growing rapidly, at times exceeding resources. At the same time, the local market consolidated into 2 competing hospital systems, with nearly all practices officially aligned with one of them. Necessary external funding also became more difficult to obtain.
Regional/Global Influences
Patients were represented by a mix of insurers. Uninsured or underinsured patients were cared for under a sliding-scale reimbursement scheme. There was also a perceived need to demonstrate successes and quality of care, and to place more emphasis on productivity and efficiency.
Self-Organization
The original practice was located in a dusty and cluttered building. It was difficult to tell who was responsible for what, but a shared sense of purpose gave the practice a family feel. Conflicts were evident but quickly resolved by frank discussions and a shared commitment to the practice mission. Schedules were constantly being disrupted by responding to patients and staff members’ diverse needs. In spite of this seeming chaos, Dusty Garden was an exemplar at delivering preventive services. This was accomplished by several dedicated clinicians and practice systems that involved the active participation of multiple personnel.
Co-Evolution
Significant change was occurring, and the practice was pushed to divide or grow in response to increasing patient demand and community need. They chose to grow and move into a much larger, newer, and more functional building down the street, resulting in a set of unanticipated consequences as both they and the fitness landscape changed. The new facility was more accessible and visible to a demographically different set of patients. Because of the changes in the local health care system, Dusty Garden also felt compelled to develop a relationship with the academic hospital system.
Emergence
What emerged was an organization where staff were isolated in large functionally differentiated spaces. The greater practice size demanded more specialization, and patterns of relationships dramatically changed. The change altered the number and specifics of the agents; many of the community-based staff left; and there was frequent turnover among newly hired and overwhelmed front office staff. The change also altered the number and character of the interactions per agent. Still, the vision remains strong, and many meetings are occurring to restore a “new” sense of practice community. Preventive service delivery rates remain high. The leadership initially responded to these changes with efforts aimed at greater standardization, but these are now being balanced by paying more attention to what solutions are being improvised “on the ground.”
Case study analysis
A comparative analysis of these 2 cases provides the following insights based on a complexity science view of the world:
- Each practice was performing well, using the delivery of preventive health services and patient satisfaction as proxies for total practice performance.
- The practices differed from each other in critical ways that seem to be at odds with traditional “best practices” thinking.
- The practices were similar in that they had each organized themselves, co-evolved, and emerged as a function of the nonlinear interdependencies among agents and the local fitness landscape, not solely as a function of some externally imposed script.
- Each practice engaged in sensemaking activity to understand its unfolding world.
- Each practice engaged in improvisational behavior as a strategy for developing strategic and tactical responses to its unfolding world.
- Variation was often a source of strength, not a sign of bad practice.
Discussion
The traditional,2-4 largely unsubstantiated,37-40 view is that the best way to improve care is to eliminate variation. A view of family practice informed by complexity science suggests otherwise. In complex adaptive systems, agents in the practices create responses to changing circumstances—they improvise, or play practice jazz. Jazz players are often seen as role models of sensemaking and improvisational behavior.28 They know a general musical structure, and within that they create jazz. Bad jazz occurs when one person plays what the others cannot make sense of and build on. All the players have an interdependent responsibility to create good jazz. When good jazz players hear something unexpected, they make sense of it and improvise. Dealing with the uncertain nature of complex adaptive systems involves thinking in terms of making sense of what is emerging. How can I improvise to use whatever happens to further the system’s development? It involves building on emergent characteristics of the complex adaptive system to develop patterns of social interaction41 among agents that give them confidence in each other, lead to small wins, and enhance the capacity to learn from unpredicted events.42
Nevertheless, differentiating desirable from undesirable variation is an opportunity to learn from our history, and an opportunity to improve our practice jazz.37,41,43 Small changes can have large results in some settings, while large efforts may lead to meager results in others. Complexity theory offers a framework for understanding these phenomena in family practice, and lays the groundwork for future research. On the basis of the proposed theoretical model, we hypothesize that it is critical to differentiate the variations that are sources of error from the variations due to the dynamics of relationships. From the perspective of complexity science, relational variation is linked to diversity among agents and represents constructive and adaptive variations and emergent behavior within an ever-changing and unpredictable local fitness landscape. From this perspective, the goals are to eliminate error through development of better systems of operation and to reduce confusion and poor judgment by improving sensemaking and communication. It is also important to enhance desirable variation by developing the skills of the relationship-centered clinical method,44-46 improvisation, and reflective practice.47-50
Sensemaking may be enhanced by considering the 4 ways of knowing health and health care,51 which include understanding: (1) the clinician; (2) the patient, family, and community; (3) systems; and (4) scientific evidence about disease and treatment. Judgments about the variation can be made within each way of knowing. Desirable variation due to clinician factors should build on each clinician’s unique skills and values, and compensate for or improve weaknesses. Local adaptation of objective evidence and the development of unique approaches to meeting the needs of patients in their personal and local context represent potentially desirable sources of variation. Evidence-based medicine provides a basis for reducing variation on the basis of scientific knowledge developed from studies of groups of individuals. Systems that integrate scientific evidence with the unique needs of patients, families, and communities and the specific talents of clinicians represent an opportunity for interventions that both reduce variation from known effective health care approaches and increase variability that personalizes care. Complexity science can help us to look for the inter-relationships among these different ways of knowing and to recognize what is not knowable or controllable. Yet, complexity science represents only a partial answer to efforts to integrate these diverse perspectives. There is a need for additional theoretical work to develop approaches that both include and transcend current ways of thinking.52,53
Family practices are systems co-evolving within fitness landscapes where there is a continual need for sensemaking and improvisation. This is particularly true during the current period of rapid change and co-evolution of practices with a rapidly changing health care system. Excessive standardization with the goal of trying to maximize each part is as potentially problematic51 as variation from scientific evidence. Like a healthy body, a healthy practice represents a balance of the generalizable and the particular. The result is tension between the local, the regulatory, and the universal—and between patient, professional, societal, and ecological expectations. We believe the principles of complexity science explain why linear quality improvement interventions (one disease at a time) often have limited effect and poor transportability.15,16,19,54-56 These principles may also explain why countries with a higher proportion of primary care services have better population health status36,57,58 despite the repeated observation that specialists do better at following disease guidelines and improving disease specific outcomes.59-61 It is never just about the specific; it is about the specific in relation to the whole, and the whole is always more than the sum of the specifics. Good primary care serves all members of the community well with the resources available. Further application of complexity science to understand these paradoxes will require more quality longitudinal data in multiple practices and broad integrative measures of the process and outcomes of care.51,62
Conclusions
Family physicians are told to implement guidelines, to diagnose and treat in specific ways, and to eliminate variation in practice. Our study using complexity science suggests that this is only part of the story. Family practices are systems that self-organize, reveal emergent behavior, and co-evolve. Successful practices are those that minimize errors, make good sense of what is happening, and effectively improvise to make good practice jazz. Seeking to eliminate error by dampening all variation through the imposition of excessive standardization and external controls is unlikely to be sustainably effective and is likely to have long-term negative consequences. We encourage all family practice staff members to become knowledgeable of practice guidelines and evidence-based practice; these are some of the core skills of good patient care.63 Using these core skills to implement flexible, locally meaningful systems may reduce error. Also, efforts to change and improve future practice are best served by focusing on improving care as a whole and on developing the skills of reflective practice and relationship-centered care.51 We encourage policymakers to acknowledge the potential benefits of some kinds of variation and to support its healthy evolution.
Acknowledgments
The data used in our paper came from studies supported by a grant (1RO1 HS08776) from the Agency for Health Care Policy and Research (now the Agency for Healthcare Research and Quality) and grants from the National Cancer Institute (1RO1 CA60862 and 2RO1 CA60862). A A grant from the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio, supported communications, analyses, and writing. We are grateful to the practices participating in the research on which our paper was based.
RESULTS: Our emerging understanding conceptualizes family practices as local professional complex adaptive systems. These systems exist for the purpose of seeing patients for everyday health concerns and assisting them in getting on with their daily lives. Each family practice is unique because of history and initial conditions, particular agents (eg, physicians, staff, patients, systems), nonlinear interactions among agents, the local ecology, and regional and global influences. How all these factors manifest in a particular practice can be understood using 3 complexity science properties: self-organization, emergence, and co-evolution. The concepts of sensemaking and improvisation can be used to understand how practices deal with variation.
CONCLUSIONS: We conclude that complexity science concepts can provide a useful framework for understanding variation and change in family practices. The challenge is to differentiate error from relational variation and to improve practices’ sensemaking and improvisational skills. Future efforts to improve practice should focus on optimizing a practice’s care as a whole and enhancing reflective practice and relationship-centered care.
One major focus of health services research and quality improvement efforts is to identify and reduce variation.1-4 Standardization is the approach usually offered to minimize variation, thus reducing errors and increasing quality.5,6 These interventions are often based on an industrial quality improvement paradigm7 using linear interventions that assume that inputs reliably lead to proportionate responses.8 These interventions include re-engineering and expanded information systems.9,10 If the application of linear Newtonian views is correct, then standardization is the key to quality improvement, and effective practices will look much alike. The search for and attempt to implement best practice guidelines11-14 are examples of efforts to bring practices into conformity and to establish process standards for best behavior. However, the search for simple, easily transportable interventions has not been as successful as traditional logic might suggest.15-19
Emerging views of organizations derived from complexity science bring the key understanding that practices are more than commodity-delivering businesses—they are complex adaptive systems.20 These systems involve connected participants interacting in ways that generate the spontaneous emergence of new structures and behaviors. In complex adaptive systems, we expect to see variation in practice patterns, even when the outcomes of practices are similar.
In a previous issue of JFP,21 we proposed a model of primary care practices as complex adaptive systems and suggested implications and strategies for change. Since then we have begun applying this theoretical framework to other studies designed to understand and advance generalist practice. Our present purpose is to advance the application of complexity science to understanding and improving primary care practices and their co-evolving health care systems.
The theory application process
The 3 studies that this theory application builds on began with the Direct Observation of Primary Care (DOPC) study, a 3-year (1994-1997) multimethod descriptive investigation of the content of 4454 patient visits to 138 family physicians in 84 family practices.22,23 One of the outcomes of this study was a model for understanding change in family practice based on complexity science.21 Subsequently, the Prevention and Competing Demands in Primary Care Study (PCDPC) was a 3-year (1996-1999) in-depth descriptive case study of 1637 outpatient visits to 56 clinicians in 18 family practices purposefully sampled to include diversity in geographic location, size of practice, and intensity of delivery of preventive services.24 We also implemented a 4-year (1997-2001) multimethod clinical trial, the Study to Enhance Prevention by Understanding and Practice (STEP-UP), to understand and improve the delivery of preventive health services in 77 family practices.25
We conducted an explicit theory application and refinement process consisting of 10 meetings to analyze data from DOPC, PCDPC, and STEP-UP, informed by a literature review of complexity science. The developing model was explicitly tested in 3 Nebraska family practices in 1999 in an effort to improve diabetes care.26 The resulting specific theory application to family practices was then evaluated using 2 cases from these studies.
Application of complexity science to family practice
Synthesizing our observations of the family practices and the literature review, we developed the following theoretical model. Family practices are local professional complex adaptive systems with the primary purpose of seeing patients for everyday health concerns to assist “them” in getting on with their daily lives. “Them” refers to the patients and their families, the clinicians, and the office staff. Practice leaders and managers usually describe their practices in terms of efficiency, productivity, adherence to standards of care, and patient satisfaction. Increasingly, practices function in interactive ways with managers or owners from local health care systems. Still, these practices behaved more like complex adaptive systems operating within a professional milieu than like businesses.27
Complex adaptive systems are like a family reunion; they are dynamic, bounded webs of diverse agents interacting nonlinearly.28 Dynamic refers to the continual presence of multiple interactions and their accompanying surprises, challenges, and responses, both within the system and between the system (eg, practices) and its environment. Bounded refers to the defining purpose or intent of the system (eg, to deliver health care to local patients). The metaphor of a web characterizes the multiple interconnections of the system. The agents in practices include clinicians, office staff, and patients, and can also include pharmaceutical representatives, health care system administrators, and others. Agents have the capacity to exchange information, learn, and adjust their behavior. No individual agent can ever know or understand everything that is occurring. The nonlinear relationships among the agents are the result of ongoing feedback loops and mean that small changes can lead to large effects and big changes can lead to small effects. For example, the introduction of a small medical record stamp to identify smokers in one practice leads to a dramatic increase in smoking counseling, while a major quality improvement program in another practice results in minimal change.
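The asymmetry between the size of a change and the size of its effect can be made concrete with a toy model. The following Python sketch is an illustration only and is not drawn from the study data. It assumes a single activity level (for example, the share of visits that include smoking counseling) that reinforces itself once it passes a hypothetical tipping point. The function names, the Hill-type feedback curve, and every parameter value are assumptions chosen for clarity.

```python
def feedback(x, k=0.5, n=4):
    """Hypothetical reinforcement curve: activity above the tipping point k
    tends to sustain itself, activity below k tends to fade."""
    return x**n / (x**n + k**n)


def simulate(start, push, steps=200, rate=0.2):
    """Apply a one-time push to the starting level, then let feedback act."""
    x = min(1.0, max(0.0, start + push))
    for _ in range(steps):
        # Move a fraction of the way toward the level the feedback supports.
        x += rate * (feedback(x) - x)
    return x


# Practice A sits just below its tipping point and receives a small nudge.
small_push_result = simulate(start=0.48, push=0.05)

# Practice B sits far below its tipping point and receives a large push
# that still leaves it short of the threshold.
big_push_result = simulate(start=0.10, push=0.30)

print(f"small push near the tipping point -> {small_push_result:.2f}")
print(f"large push far from the tipping point -> {big_push_result:.2f}")
```

In this sketch the outcome depends more on where each practice sits relative to its tipping point than on the size of the intervention, which is the sense of nonlinearity described above.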
What makes an organization “professional” is the application of specialized values and expertise to address difficult problems and uncertainty. These values and skills are acquired through specialized training and are created by the larger social context.29 It is the socially defined professional values and expertise of the family physician that are applied in their daily activities. A co-evolving professional world of the health system and payer manager is increasingly interacting with the physician professional value system.
Each family practice is unique because of 5 features:
- History and initial conditions, including any explicit or implicit mission and the underlying priorities for the practice.
- Particular agents and their unique styles and interests.
- The pattern of nonlinear interactions among agents.
- The local fitness landscape (ie, the practice’s ecological niche) and its particular expectations, community values, competitive issues, and ecology.
- Regional and global influences, such as larger health care systems, finances and regulations, and culture.
The local fitness landscape, a complexity science term from evolutionary biology, specifically refers to the local terrain and all the many complex adaptive systems, from microbes to organizations, that seek their own purpose and niche within that terrain. Biological evolution and technological evolution are processes attempting to optimize systems riddled with conflicting constraints.30 Each family practice must evolve by attempting to optimize the entire package of services it delivers. This evolution must account for all the other competing and cooperating health care services and local resources, such as economic conditions, availability of insurance types, and the particular local disease and illness epidemiology.
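One common way to make the idea of a fitness landscape with conflicting constraints concrete is an NK-style toy model, in which the payoff of each choice depends on a few other choices. The sketch below is a hypothetical illustration, not a model fitted to any practice. The 8 binary "practice choices," the random payoff tables, the seeds, and the function names are all assumptions.

```python
import random
from itertools import product

N, K = 8, 2  # 8 yes/no practice choices, each coupled to 2 others (assumed sizes)


def make_landscape(seed):
    """Build a random fitness function in which choices constrain one another."""
    rng = random.Random(seed)
    # One payoff table per choice, keyed by its own state and K neighbours' states.
    tables = [
        {bits: rng.random() for bits in product((0, 1), repeat=K + 1)}
        for _ in range(N)
    ]

    def fitness(config):
        total = 0.0
        for i in range(N):
            key = tuple(config[(i + j) % N] for j in range(K + 1))
            total += tables[i][key]
        return total / N

    return fitness


def neighbours(config):
    """All configurations that differ from config in exactly one choice."""
    for i in range(N):
        flipped = list(config)
        flipped[i] = 1 - flipped[i]
        yield tuple(flipped)


def hill_climb(fitness, start):
    """Change one choice at a time for as long as overall fitness improves."""
    current = start
    while True:
        best = max(neighbours(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current  # local peak: no single change helps
        current = best


fitness = make_landscape(seed=0)
rng = random.Random(1)
starts = [tuple(rng.randint(0, 1) for _ in range(N)) for _ in range(20)]
peaks = {hill_climb(fitness, s) for s in starts}

for peak in sorted(peaks, key=fitness, reverse=True):
    print(peak, round(fitness(peak), 3))
```

Every configuration the search returns is only locally best. Practices that begin from different initial conditions can therefore settle on different configurations, none of which can be improved by changing a single choice in isolation.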
How all of these factors manifest in a particular family practice at any given time and over time can be understood using 3 complexity science properties: self-organization, emergence, and co-evolution. Self-organization refers to the spontaneous development of structures and forms of behavior in systems characterized by multiple feedback loops and nonlinear dynamics. These structures are a function of the patterns of relationships among agents. Everything changes in response to and as a result of everything else, with each complex adaptive system seeking a better position in its local fitness landscape—a niche where it can prosper and survive.31 In this setting of ongoing co-evolution, both competition and collaboration become strategies for workable solutions. As the agents of any complex adaptive system interact, novelty and surprise continuously emerge in unpredictable ways. For example, a new successful approach to scheduling is introduced by an unassuming receptionist. This emergence creates a system that is greater than the sum of its parts; it is what cannot be understood through a reductionist (one problem at a time) examination of the practice.32
A particular family practice is the unique self-organized system that emerges when particular physicians and staff (agents) come together in particular ways with particular goals, preferences, and priorities (initial conditions) within a particular community setting (local fitness landscape) given specific regional and global influences. At any future point, this practice is the unique self-organized system that has emerged through co-evolution with all the other systems in the local fitness landscape.
The result is much variation between and within family practices. Practices have much in common, however, because of their common goal of seeing patients to assist them with their everyday health problems in a shared cultural and historical context. From that perspective, variation in family practices is inevitable and a powerful source of creative possibility, value, and good clinical practice. Practices use 2 strategies for successfully enhancing that creativity: sensemaking and improvisation. Sensemaking is a social activity that requires interaction among agents.33,34 People must come to have some notion of “Who am I,” “Why am I here,” and “What is going on around me?” Improvisation is a strategy for dealing with surprise in complex adaptive systems. Improvisation can be described as intuition guiding action in a spontaneous way.35 Intuition is not a random guess at what to do, but the result of using high levels of expertise to act in the moment.27,36
Among the many practices in the studies we are currently analyzing, multiple areas of variation are observed. These include differences in charting systems, clinical care decisions, scheduling, billing and coding procedures, staff relationships, and management and clinical styles. Sometimes these variations provide an adaptive advantage, but often not, and it is seldom clear in advance which will be true. Inflexible standardization, however, is often poorly responsive to the needs of different practices’ diverse agents and to the almost constant situations of uncertainty, contextual uniqueness, and surprise that occur in the practices.
Case studies
To illustrate the application of complexity science–based sensemaking to family practice, we present 2 case studies.
We selected 2 practices that had high quality of care as measured by delivery of preventive health services and patient satisfaction. One case takes advantage of the longitudinal data from DOPC and STEP-UP, and the other makes use of the more in-depth cross-sectional data from PCDPC. The cases were also selected to assure maximum variation in location and affiliation, and homogeneity in practice size (4-8 clinicians). The names have been changed to protect confidentiality.
Franchise Family Practice
History/Initial Conditions
Franchise is one of several primary care offices created by the Health Salute Corporation in affluent suburban areas of intense competition for market share growth. The corporate intent for this practice is to be productive and profitable. Two family physicians, a pediatrician, and a nurse practitioner were brought in from other practices, and several of their staff members followed. They all agree that their mission is to be the best practice in the Health Salute Corporation. Their identity is to capture market share through better efficiency, a mechanistic approach (scientific and standardized), and a friendly and caring attitude.
Agents and Patterns of Interaction
The practice manager’s daily attire in stockings and heels sets the tone for interactions, which are formal and professional. A small core of staff is dedicated to this practice, but there is often temporary help from other Health Salute offices during busy times. The patient population of predominantly mobile, insured, 2-working-parent families tends to value convenience over relationships. The physicians seem to have little emotional investment in this particular practice, place, or each other. Conflicts are minimized and usually covered over with humor.
Local Fitness Landscape
There is no clear sense of community in this new suburb. Franchise is located in the heart of “minivan land,” an unrolling suburban carpet. The 2 competing systems are a major threat and are constantly being discussed. It is very clear that the survival of Franchise is dependent on success in the marketplace as determined by Health Salute.
Regional/Global Influences
Managed care has a strong presence, with much pressure to implement multiple practice guidelines, frequent chart audits, and different formularies.
Self-Organization
In many respects, Franchise Family Practice comes close to fulfilling its mission. Franchise is friendly, fun loving, and clean. It is a high performer at delivering preventive health services and is full of glitz and protocols. There are multiple systems in place for all phases of practice operation, and the manager sees they are working.
Emergence
Despite this managed order, surprises, problems, uncertainty, and complexities keep arising on a daily basis. Occasionally individuals respond creatively, but more often they stick to the protocols and generate even more trouble. There are frequent staff meetings where common problems are discussed. Many different solutions emerge in these discussions, but the final resolution is usually based on what the practice management thinks Health Salute would want. Even in this intensely structured practice, multiple competing demands, power distributions, and interpersonal battles are being simultaneously worked out on a daily basis.
Co-Evolution
As the suburbs grew, more practices were opened. The original practice in the area was soon challenged by Franchise and then by another competitor. Each of the 3 practices often acted or reacted in response to the others. Approximately a year after our research ended, Franchise Family Practice was closed by Health Salute because of inadequate profitability, and within a few months the second competitor also closed its practice.
Dusty Garden Family Practice
History/Initial Conditions
Dusty Garden began as a pioneering model for community-oriented primary care in an economically impoverished urban area. The practice was created with a focus on the patient in this underserved community. Envisioned by its founding family physician and practice manager, this practice was established in close collaboration with a community board. Survival is dependent on the ability to obtain funding for many poorly reimbursed services.
Agents and Patterns of Interaction
Dusty Garden has a dense and diverse web of complex interdependence. During the first few years of our research, most practice staff members came from the community. The practice grew from 4 to 6 family physicians and 2 nurse practitioners, and there was also much staff turnover. Dusty Garden was often a stepping stone for some clinicians, a chance to work in an “idealistic place” before going on to other things. However, the leadership has remained stable.
Local Fitness Landscape
The health care needs in the community were great, and the practice responded by growing rapidly, at times exceeding resources. At the same time, the local market consolidated into 2 competing hospital systems, with nearly all practices officially aligned with one of them. Necessary external funding also became more difficult to obtain.
Regional/Global Influences
Patients were represented by a mix of insurers. Uninsured or underinsured patients were cared for under a sliding-scale reimbursement scheme. There was also a perceived need to demonstrate successes and quality of care, and to place more emphasis on productivity and efficiency.
Self-Organization
The original practice was located in a dusty and cluttered building. It was difficult to tell who was responsible for what, but a shared sense of purpose gave the practice a family feel. Conflicts were evident but quickly resolved by frank discussions and a shared commitment to the practice mission. Schedules were constantly being disrupted by responding to patients and staff members’ diverse needs. In spite of this seeming chaos, Dusty Garden was an exemplar at delivering preventive services. This was accomplished by several dedicated clinicians and practice systems that involved the active participation of multiple personnel.
Co-Evolution
Significant change was occurring, and the practice was pushed to divide or grow in response to increasing patient demand and community need. They chose to grow and move into a much larger, newer, and more functional building down the street, resulting in a set of unanticipated consequences as both they and the fitness landscape changed. The new facility was more accessible and visible to a demographically different set of patients. Because of the changes in the local health care system, Dusty Garden also felt compelled to develop a relationship with the academic hospital system.
Emergence
What emerged was an organization where staff were isolated in large functionally differentiated spaces. The greater practice size demanded more specialization, and patterns of relationships dramatically changed. The change altered the number and specifics of the agents; many of the community-based staff left; and there was frequent turnover among newly hired and overwhelmed front office staff. The change also altered the number and character of the interactions per agent. Still, the vision remains strong, and many meetings are occurring to restore a “new” sense of practice community. Preventive service delivery rates remain high. The leadership initially responded to these changes with efforts aimed at greater standardization, but these are now being balanced by paying more attention to what solutions are being improvised “on the ground.”
Case study analysis
A comparative analysis of these 2 cases provides the following insights based on a complexity science view of the world:
- Each practice was performing well using the delivery of preventive health and patient satisfaction as proxies for total practice performance.
- The practices differed from each other in critical ways that seem to be at odds with traditional “best practices” thinking.
- The practices were similar in that they had each organized themselves, co-evolved, and emerged as a function of the nonlinear interdependencies among agents and the local fitness landscape, not solely as a function of some externally imposed script.
- Each practice engaged in sensemaking activity to understand its unfolding world.
- Each practice engaged in improvisational behavior as a strategy for developing strategic and tactical responses to its unfolding world.
- Variation was often a source of strength, not a sign of bad practice.
Discussion
The traditional,2-4 largely unsubstantiated,37-40 view is that the best way to improve care is to eliminate variation. A view of family practice informed by complexity science suggests otherwise. In complex adaptive systems, agents in the practices create responses to changing circumstances—they improvise, or play practice jazz. Jazz players are often seen as role models of sensemaking and improvisational behavior.28 They know a general musical structure, and within that they create jazz. Bad jazz occurs when one person plays what the others cannot make sense of and build on. All the players have an interdependent responsibility to create good jazz. When good jazz players hear something unexpected, they make sense of it and improvise. Dealing with the uncertain nature of complex adaptive systems involves thinking in terms of making sense of what is emerging. How can I improvise to use whatever happens to further the system’s development? It involves building on emergent characteristics of the complex adaptive system to develop patterns of social interaction41 among agents that give them confidence in each other, lead to small wins, and enhance the capacity to learn from unpredicted events.42
Nevertheless, differentiating desirable from undesirable variation is an opportunity to learn from our history, and an opportunity to improve our practice jazz.37,41,43 Small changes can have large results in some settings, while large efforts may lead to meager results in others. Complexity theory offers a framework for understanding these phenomena in family practice, and lays the groundwork for future research. On the basis of the proposed theoretical model, we hypothesize that it is critical to differentiate the variations that are sources of error from the variations due to the dynamics of relationships. From the perspective of complexity science, relational variation is linked to diversity among agents and represents constructive and adaptive variations and emergent behavior within an ever-changing and unpredictable local fitness landscape. From this perspective, the goals are to eliminate error through development of better systems of operation and to reduce confusion and poor judgment by improving sensemaking and communication. It is also important to enhance desirable variation by developing the skills of the relationship-centered clinical method,44-46 improvisation, and reflective practice.47-50
Sensemaking may be enhanced by considering the 4 ways of knowing health and health care,51 which include understanding: (1) the clinician; (2) the patient, family, and community; (3) systems; and (4) scientific evidence about disease and treatment. Judgments about the variation can be made within each way of knowing. Desirable variation due to clinician factors should build on each clinician’s unique skills and values, and compensate for or improve weaknesses. Local adaptation of objective evidence and the development of unique approaches to meeting the needs of patients in their personal and local context represent potentially desirable sources of variation. Evidence-based medicine provides a basis for reducing variation on the basis of scientific knowledge developed from studies of groups of individuals. Systems that integrate scientific evidence with the unique needs of patients, families, and communities and the specific talents of clinicians represent an opportunity for interventions that both reduce variation from known effective health care approaches and increase variability that personalizes care. Complexity science can help us to look for the inter-relationships among these different ways of knowing and to recognize what is not knowable or controllable. Yet, complexity science represents only a partial answer to efforts to integrate these diverse perspectives. There is a need for additional theoretical work to develop approaches that both include and transcend current ways of thinking.52,53
Family practices are systems co-evolving within fitness landscapes where there is a continual need for sensemaking and improvisation. This is particularly true during the current period of rapid change and co-evolution of practices with a rapidly changing health care system. Excessive standardization with the goal of trying to maximize each part is as potentially problematic51 as variation from scientific evidence. Like a healthy body, a healthy practice represents a balance of the generalizable and the particular. The result is tension between the local, the regulatory, and the universal—and between patient, professional, societal, and ecological expectations. We believe the principles of complexity science explain why linear quality improvement interventions (one disease at a time) often have limited effect and poor transportability.15,16,19,54-56 These principles may also explain why countries with a higher proportion of primary care services have better population health status36,57,58 despite the repeated observation that specialists do better at following disease guidelines and improving disease specific outcomes.59-61 It is never just about the specific; it is about the specific in relation to the whole, and the whole is always more than the sum of the specifics. Good primary care serves all members of the community well with the resources available. Further application of complexity science to understand these paradoxes will require more quality longitudinal data in multiple practices and broad integrative measures of the process and outcomes of care.51,62
Conclusions
Family physicians are told to implement guidelines, to diagnose and treat in specific ways, and to eliminate variation in practice. Our study using complexity science suggests that this is only part of the story. Family practices are systems that self-organize, reveal emergent behavior, and co-evolve. Successful practices are those that minimize errors, make good sense of what is happening, and effectively improvise to make good practice jazz. Seeking to eliminate error by dampening all variation through the imposition of excessive standardization and external controls is unlikely to be sustainably effective and is likely to have long-term negative consequences. We encourage all family practice staff members to become knowledgeable of practice guidelines and evidence-based practice; these are some of the core skills of good patient care.63 Using these core skills to implement flexible, locally meaningful systems may reduce error. Also, efforts to change and improve future practice are best served by focusing on improving care as a whole and on developing the skills of reflective practice and relationship-centered care.51 We encourage policymakers to acknowledge the potential benefits of some kinds of variation and to support its healthy evolution.
Acknowledgments
The data used in our paper came from studies supported by a grant (1RO1 HS08776) from the Agency for Health Care Policy and Research (now the Agency for Healthcare Research and Quality) and grants from the National Cancer Institute (1RO1 CA60862 and 2RO1 CA60862). A grant from the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio, supported communications, analyses, and writing. We are grateful to the practices participating in the research on which our paper was based.
1. McPherson K, Wennberg JE, Hovind OB, Clifford P. Small-area variations in the use of common surgical procedures: an international comparison of New England, England, and Norway. N Engl J Med 1982;307:1310-14.
2. DeMott K. Healthcare practices vary widely from town to town: regional Dartmouth Atlas. Health Syst Lead 1997;4:2-3.
3. Fineberg HV, Funkhouser AR, Marks H. Variation in medical practice: a review of the literature. Ind Health Care 1985;2:143-68.
4. Volinn E, Diehr P, Ciol MA, Loeser JD. Why does geographic variation in health care practices matter? (And seven questions to ask in evaluating studies on geographic variation). Spine 1994;19:2092S-100S.
5. Carnett WG. Clinical practice guidelines: a tool to improve care. Qual Manag Health Care 1999;8:13-21.
6. Weiner JP, Parente ST, Garnick DW, Fowles J, Lawthers AG, Palmer RH. Variation in office-based quality: a claims-based profile of care provided to Medicare patients with diabetes. JAMA 1995;273:1503-08.
7. Laffel G, Blumenthal D. The case for using industrial quality management science in health care organizations. JAMA 1989;262:2869-73.
8. Kaplan D, Glass L. Understanding nonlinear dynamics. New York, NY: Springer-Verlag; 1995.
9. Shortell SM, Gillies RR, Anderson DA. Remaking health care in America. 2nd ed. San Francisco, Calif: Jossey-Bass; 2000.
10. Kaegi L. AMA clinical quality improvement forum ties it all together: from guidelines to measurement to analysis and back to guidelines. Jt Comm J Qual Improve 1999;25:95-106.
11. Burns LR, Denton M, Goldfein S, Warrick L, Morenz B, Sales B. The use of continuous quality improvement methods in the development and dissemination of medical practice guidelines. Qual Rev Bull 1992;18:434-39.
12. Gottlieb LK, Sokol HN, Murrey KO, Schoenbaum SC. Algorithm-based clinical quality improvement: clinical guidelines and continuous quality improvement. HMO Pract 1992;6:5-12.
13. Woolf SH. Practice guidelines: a new reality in medicine. Arch Intern Med 1990;150:1811-18.
14. Kamerow DB. Before and after guidelines. J Fam Pract 1997;44:344-46.
15. Solberg L, Kottke T, Brekke M. Failure of a trial of continuous quality improvement and systems intervention to increase the delivery of clinical preventive services. Effect Clin Pract 2000;3:105-15.
16. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.
17. Davis DA, Thomson MA, Oxman AD, Haynes RB. Changing physician performance: a systematic review of the effect of continuing medical education strategies. JAMA 1995;274:700-05.
18. Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.
19. Davis DA, Taylor-Vaisey A. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
20. Committee on Quality of Health Care in America. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academy Press; 2001.
21. Miller WL, Crabtree BF, McDaniel RR, Jr, Stange KC. Understanding change in primary care practice using complexity theory. J Fam Pract 1998;46:369-76.
22. Stange KC, Zyzanski SJ, Jaén CR, et al. Illuminating the ‘black box:’ a description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.
23. The DOPC Writing Group. Conducting the Direct Observation of Primary Care Study: insights from the process of conducting multimethod transdisciplinary research in community practice. J Fam Pract 2001;50:345-52.
24. Crabtree BF, Miller WL, Stange KC. Understanding practice from the ground up. J Fam Pract 2001;50:881-87.
25. Goodwin MA, Zyzanski SJ, Zronek S, et al. A clinical trial of tailored office systems for preventive service delivery: the Study To Enhance Prevention by Understanding Practice (STEP-UP). Am J Prev Med 2001;21:2-28.
26. Helseth LD. Using the complexity model to enhance diabetes management in three family medicine practices: a qual
27. Anderson RA, McDaniel RR, Jr. Managing healthcare organizations: where professionalism meets complexity science. Health Care Manage Rev 2000;25:83-92.
28. McDaniel Jr RR, Driebe DJ. Complexity science and health care management. Adv Strat Manage 2001;2:11-36.
29. Freidson E. Professionalism reborn: theory, prophecy, and policy. Chicago, Ill: The University of Chicago Press; 1994.
30. Smith TS. Nonlinear dynamics and the micro-macro bridge. In: Eve RA, Horsfall S, Lee ME, eds. Chaos, complexity, and sociology: myths, models, and theories. Thousand Oaks, Calif: Sage Publications; 1997;52-63.
31. Kauffman S. At home in the universe: the search for the laws of self-organization and complexity. New York, NY: Oxford University Press; 1995.
32. Newman DV. Emergence and strange attractors. Phil Sci 1996;63:245-61.
33. Weick KE. Sensemaking in organizations. In: Whetten D, ed. Foundations for organizational science. Thousand Oaks, Calif: Sage Publications; 1995.
34. Thomas JB, Clark SM, Gioia D. Strategic sensemaking and organizational performance: linkages among scanning, interpretation, action and outcomes. Acad Manage J 1993;36:239-70.
35. Crossan M, Sorrenti M. Making sense of improvisation. Adv Strat Manage 1997;14:155-80.
36. Tanenbaum SJ. Evidence and expertise: the challenge of the outcomes movement to medical professionalism. Acad Med 1999;74:757-63.
37. Casparie AF. The ambiguous relationship between practice variation and appropriateness of care: an agenda for further research. Health Policy 1996;35:247-65.
38. Chassin MR. Explaining geographic variations: the enthusiasm hypothesis. Med Care 1993;31:YS37-44.
39. Leape LL, Park RE, Solomon DH, Chassin MR, Kosecoff J, Brook RH. Does inappropriate use explain small-area variations in the use of health care services? JAMA 1990;263:669-72.
40. Stano M. Evaluating the policy role of the small area variations and physician practice style hypotheses. Health Policy 1993;24:9-17.
41. Westert GP, Groenewegen PP. Medical practice variations: changing the theoretical approach. Scand J Public Health 1999;27:173-80.
42. McDaniel RR, Jr. Strategic leadership: a view from quantum and chaos theories. Health Care Manage Rev 1997;22:21-37.
43. Fertig A, Roland M, King H, Moore T. Understanding variation in rates of referral among general practitioners: are inappropriate referrals important and would guidelines help to reduce rates? BMJ 1993;307:1467-70.
44. Stewart M, Weston WW, Brown JB, McWhinney IR, McWilliam CL, Freeman TR. Patient-centered medicine: transforming the clinical method. Thousand Oaks, Calif: Sage Publications; 1995.
45. Stewart M, Brown JB, Donner A, et al. The impact of patient-centered care on outcomes. J Fam Pract 2000;49:796-804.
46. Roter D. The enduring and evolving nature of the patient-physician relationship. Pat Ed Counsel 2000;39:5-15.
47. Novack DH, Suchman AL, Clark W, Epstein RM, Najberg E, Kaplan C. Calibrating the physician: physician personal awareness and effective patient care. JAMA 1997;278:502-09.
48. Clark R, Croft P. Critical reading for the reflective practitioner: a guide for primary care. Oxford, England: Butterworth-Heinemann; 1998.
49. Epstein R. Mindful practice. JAMA 1999;282:833-39.
50. Bolton G. Reflective practice: writing and professional development. London, England: Sage Publications; 2001.
51. Stange KC, Miller WL, McWhinney I. Developing the knowledge base of family practice. Fam Med 2001;33:286-97.
52. Wilber K. A brief theory of everything. Boston, Mass: Shambhala Publications, Inc; 2000.
53. Wilber K. Sex, ecology, spirituality. Boston, Mass: Shambhala Publications, Inc; 1995, 2000.
54. Davis P, Gribben B, Scott A, Lay-Yee R. The “supply hypothesis” and medical practice variation in primary care: testing economic and clinical models of inter-practitioner variation. Soc Sci Med 2000;50:407-18.
55. O’Connell DL, Henry D, Tomlins R. Randomised controlled trial of effect of feedback on general practitioners’ prescribing in Australia. BMJ 1999;318:507-11.
56. Salisbury C, Bosanquet N, Wilkinson E, Bosanquet A, Hasler J. The implementation of evidence-based medicine in general practice prescribing. Br J Gen Pract 1998;48:1849-52.
57. Starfield B. Primary care: balancing health needs, services, and technology. New York, NY: Oxford University Press; 1998.
58. Starfield B. Is US health really the best in the world? JAMA 2000;284:483-85.
59. Ayanian JZ, Guadagnoli E, McNeil BA, Cleary PD. Treatment and outcomes of acute myocardial infarction among patients of cardiologists and generalist physicians. Arch Intern Med 1997;157:2570-76.
60. Harrold L, Field T, Gurwitz J. Knowledge, patterns of care, and outcomes of care for generalists and specialists. J Gen Intern Med 1999;14:499-511.
61. MacLean CH, Louie R, Leake B, et al. Quality of care for patients with rheumatoid arthritis. JAMA 2000;284:984-92.
62. Longo DR. Patient practice variation: a call for research. Med Care 1993;31:YS81-5.
63. Shaughnessy AF, Slawson DC, Becker L. Clinical jazz: Harmonizing clinical experience and evidence-based medicine. J Fam Pract 1998;47:425-28.
Technician, Friend, Detective, and Healer: Family Physicians’ Responses to Emotional Distress
STUDY DESIGN: We used a multimethod comparative case study design of 18 family practices that included detailed descriptive field notes from direct observation of 1637 outpatient visits. An immersion/crystallization approach was used to explore physicians’ responses to emotional distress and apparent mental health issues.
POPULATION: A total of 379 outpatient encounters were reviewed from a purposeful sample of 13 family physicians from the 57 clinicians observed.
OUTCOMES MEASURED: Descriptive field notes of outpatient visits were examined for emotional content and physicians’ responses to emotional distress.
RESULTS: Analyses revealed a 3-phase process by which physicians responded to emotional distress: recognition, triage, and management. The analyses also uncovered a 4-quadrant typology of management based on the physician’s philosophy (biomedical vs holistic) and skill level (basic vs more advanced).
CONCLUSIONS: Physicians appear to manage mental health issues by using 1 of 4 approaches based on their philosophy and core set of skills. Physician education and practice improvement should be tailored to build on physicians’ natural philosophical proclivity and psychosocial skills.
Primary care practices have been called America’s de facto mental health network, with more than two thirds of mental health disorders treated in the primary care sector.1 Up to 40% of primary care patients have a mental health problem,2 and 19% of outpatients report significant emotional distress during the previous 4 weeks.3 However, the detection and treatment rates of these problems are low.3-6
Thus, although the clinical philosophy of primary care professionals suggests that mental health care is an integral part of practice,7-9 there is an apparent discrepancy between these espoused ideals and usual clinical practice.3,5,10-11 Explanations of these findings include the reluctance of primary care physicians to label their patients and their use of observation and informal counseling as initial treatment efforts.11-13 The competing demands of practice, lack of resources, inadequate reimbursement, and various organizational factors such as mental health carve-outs also profoundly influence management.14-16 Using cluster analysis, Roter and colleagues17 found 5 distinct communication patterns between patients with ongoing medical problems and their physicians, ranging from narrowly biomedical to consumerist.17 Robinson and Roter18-19 found that patients are likely to respond to direct inquiry by physicians about psychosocial distress and that physicians often briefly counsel their patients in return. Callahan and coworkers3 demonstrated that recent emotional distress and mental health problems have an important impact on encounter activities (eg, more time on history taking and counseling). Despite these investigations, a robust model of physicians’ response to emotional distress remains incompletely characterized.
We sought to develop a typology of physicians’ reactions to and management of patients’ mental health problems and emotional distress. Our findings can help clinicians identify their own style and consider ways of meeting particular patient needs that may be better suited to an alternative approach.
Methods
Detailed descriptive field notes of outpatient visits were collected as part of a large multimethod comparative case study of 18 midwestern family practices. Trained field researchers spent 4 weeks or more in each practice and directly observed the practice environment and 30 outpatient visits with each clinician in the practice. While observing the outpatient visits, the field researcher took chronological notes of what was occurring during the encounters. These notes were later used to dictate detailed descriptions of each encounter. Although there were differences in the style of reporting among the observers, the quality of data was consistent. Details of the design and data collection can be found elsewhere in this issue.20
Two family physician researchers, 3 family therapists, and a medical anthropologist reviewed encounters from a purposeful sample of family physicians. Initially, encounters from 3 physicians representing diverse practice approaches (as assessed globally by a research nurse collecting the primary data) were reviewed. The goals were to understand the depth and detail of the data and to develop initial hypotheses, an organizational schema, and a crude overview of the presentation of and physician response to mental health issues. The management of mental health issues and emotional distress was then explored in a purposeful sample of physicians selected to maximize variation in sex, type, and location of practice; ethnicity; and age. By the nature of this qualitative study (without access to an independent gold standard for diagnosis of mental disorders), a broad definition of mental health problems was used, encompassing emotional distress and psychological problems. On the basis of the preliminary review of field note data, the research group identified that patients were presenting with emotional issues when they found a reported change in affect, a verbal report of an emotional issue, a somatic complaint often associated with emotional distress, or a follow-up visit for an expressed mental health issue (eg, refill of an antidepressant). This working definition was reached in the preliminary phase of the study, and through discussion a consensus was reached on the mental health aspects of each encounter.
Physicians were our unit of analysis, and the authors reviewed every outpatient visit available from each of the 13 physicians selected from the larger sample. The research team members used an editing organizing style for analysis,21 individually highlighted text they believed to be relevant, and made interpretive notes or observations in the margins.22 The research team then engaged in detailed discussions of the encounter transcripts. Particular attention was given to the total context of the encounter, recognizing other potential competing demands within the visit. The goal of this lengthy process was to reach consensus about what was important and how it should be interpreted. After discussing every encounter of a given physician, a summary case narrative was prepared and consensus reached about key themes for that physician.
After completing this initial review, matrices (eg, variations in patient management by practice location, physician age, and sex) were constructed to visualize other emergent patterns and facilitate comparisons across cases.23 Additional physicians were reviewed to search for confirming and disconfirming evidence (eg, did management vary by physician ethnicity?) until saturation was reached (ie, until no further novel information or themes were identified). This required the review of outpatient visits from 13 physicians. One of the primary research nurses who conducted the participant observation provided input that ensured a full diversity of physicians was considered. She also served as an additional check on interpretation of the primary data. Finally, overall themes common to all physicians were identified and important variations in management noted. Thus, we began by looking at individual physicians’ responses within each encounter, developed a coherent description of each physician’s modus operandi, and then identified overarching themes describing broad approaches to emotional distress and mental health issues.
Results
The 379 patient visits to 13 physicians represented a diverse sample of practice and encounter types (Table 1, Table 2). Although the chief complaints of many patients did not overtly appear to relate to a mental health condition or emotional distress, many patients’ emotional concerns presented within the context of an acute or chronic medical condition. All physicians had many encounters in which both overt and more covert emotional concerns and mental health issues emerged.
Physician Responses Within Encounters
The research team noted a wide range of physician reactions to patients presenting with emotional distress or potential mental health problems. During the physician-patient interaction, physicians apparently either recognized the emotional component of the encounter or did not. If emotional distress was recognized, physicians appeared to either actively ignore this problem, gloss over or triage it, or actively manage the distress. These phenomena are illustrated in Figure 1 and will be described in more detail.
Recognition
Not all emotional and mental health issues were apparently recognized. Such missed opportunities were identified with all the participant physicians, even among physicians who were consistently more attentive to addressing mental health problems. For example, one follow-up visit involved a middle-aged man with abdominal tenderness in whom a computed tomography scan had disclosed a renal mass. The patient’s wife asked numerous questions about possible depression and anxiety in her alcohol-using husband. The physician did not pursue any of these concerns.
However, a minority of physicians actively asked about mental health problems. This “active case finding” often capitalized on the physician’s previous knowledge of the patient’s social situation or personal issues. In one encounter focusing on breast cancer follow-up, the physician asked a woman how she was interacting with her spouse after a mastectomy. In instances such as this, active case finding was part of the chatting that opened or ended an encounter, particularly among physicians and patients who were familiar with each other. Physicians in this sample neither used screening instruments (eg, the Primary Care Evaluation of Mental Disorders or the Zung mental health scales) nor routinely inquired about suicidal ideation, even in their seemingly most severely depressed patients.
Gloss-Over/Triage
In some instances, the physician apparently understood the impact of a situation but seemed to gloss over the issues. During a health care maintenance visit, a woman reported that she had a miscarriage 3 months earlier. The physician asked, “Is this a good thing or a sad thing?” The patient stated that it was a sad thing, because they were looking forward to the birth. There was no further probing into how the patient and family were dealing with the miscarriage.
In other encounters, physicians clearly seemed to recognize the psychologic implications of an encounter but chose to postpone management. These physicians appeared to triage certain cases based on time, competing demands, or perhaps their own ability to weather another challenging patient. For example, in the case of a patient with arm pain seeking workers’ compensation, the patient noted, “the pain (after doing some minor chores) was simply not worth it.” The physician did not pursue this cue further, but rather concentrated on the scheduling of magnetic resonance imaging, an electromyogram, and a follow-up appointment. However, the physician later related to the nurse researcher her understanding of the impact of this problem and acknowledged the patient’s discomfort. Thus, this physician apparently “triaged” this issue to a later date.
Management
Although only a minority of physicians actively sought cases of emotional distress in these encounters, most actively managed mental health problems. Prompted by the patient’s presentation, physicians followed up on “leads” to potential mental health issues, including a mother who discussed the death of her daughter, a woman with menstrual irregularity, and patients with marital and financial stress. Such encounters demonstrated physicians being sensitive to the underlying psychosocial issues in their patients’ lives.
The management response appeared to be predicated on the physician’s philosophy (biomedical vs holistic) and skill level (basic vs more advanced). In some instances, physicians appeared to spend considerable time on mental health issues with patients but apparently ran out of tools to deal with their problems effectively. This situation was most evident with patients who were substance abusers, had chronic pain, were seeking workers’ compensation, or had vague or multiple somatic symptoms.
A 4-Quadrant Typology of Physicians
A 4-quadrant typology of physicians emerged based on their philosophy and skill, as ascertained from the patient encounters (Figure 2). Philosophically, physicians were on a continuum of being biomedically to biopsychosocially inclined, with each exhibiting a discernible dominant philosophy. Biomedically oriented physicians concentrated on the medical aspects of care and minimally explored the psychosocial milieu of the patient. Biopsychosocially oriented physicians addressed the patients’ emotional, physical, social, and sometimes spiritual wellbeing. Regardless of management approach (biomedical vs biopsychosocial), physicians demonstrated varying levels of competence in dealing with emotional distress.
Most physicians used “basic” skills—empathy, encouragement, small talk, use of silence, direct advice giving, and superficial education—to address their patients’ mental health problems. In some encounters the use of simple strategies was seemingly appropriate and effective; only occasionally were more advanced skills used. Such advanced skills ranged from effectively setting an agenda and soliciting the patient’s perspective to the use of more challenging interviewing skills, such as confrontation, implementing behavioral prescriptions, navigating referrals for skeptical patients, and mental health referrals that were part of a carefully developed treatment plan.
By combining the philosophy and skill dimensions, a 4-quadrant typology of physicians was apparent: the Technician, the Friend, the Detective, and the Healer. The Technician was medically oriented, dispensing medications and direct advice. Encounters were problem focused, and at times the physician appeared to be abrupt, ignorant of clear emotional distress, and not patient centered. In an encounter for follow-up of anxiety, one Technician told a patient complaining of neurologic symptoms that they might be stress related but still referred her to a neurologist. When she said, “This is really a frustrating way to feel,” he responded with, “Well, a neurologist deals with this,” and gave her samples of paroxetine, checked her for a sinus infection, and ended the encounter. Another patient seeing this physician for a complaint of depressive symptoms was managed without any discussion of underlying psychosocial issues; fluoxetine was dispensed in an encounter lasting less than 5 minutes.
The Friend was a biopsychosocially oriented physician with basic skills. One Friend extensively explored the patient’s background, concerns, and spiritual dimensions of illness. Encounters were long and tangential. A diverse array of topics was explored in a patient-centered fashion. However, only very basic counseling and management skills were ever observed with this physician. Direct advice was common, and conflict appeared to be avoided. A metaphor emerged of friends having coffee together.
Friends did not always appear to deliver care that optimally managed mental health issues. In some instances so many issues were discussed that the physician appeared to have difficulty setting an agenda for the visit and prioritizing problems. For example, for a patient just discharged after hospitalization for severe depression, there was no explicit discussion of depressive symptoms or suicidal ideation, despite a lengthy encounter.
The Detective was usually biomedically focused, but when the occasion warranted, this type of physician demonstrated an impressive breadth of detective skills. For example, one Detective appeared most comfortable providing focused, snappy, medically oriented care. But she was alert to cues of emotional distress and demonstrated appropriate use of self-disclosure and confrontation in managing a patient with depression. In short, she was usually able to provide solutions for each case while focused on more biomedical issues.
The Healer used a full breadth of biopsychosocial skills, integrated most aspects of care seamlessly, and appeared comfortable with both strictly biomedical and psychosocial dimensions of care. One Healer regularly sought signs of emotional distress and exhibited an impressive range of skills in dealing with such problems as substance abuse and pain syndromes. For example, he astutely linked a patient’s stressful lifestyle with current somatic symptoms. In another encounter, with a woman with high blood pressure and weight gain, he assessed the possible biopsychosocial causes of the problem (etiologic stressors, sleep habits, relationship issues, diet changes, and depression) and probed about any anniversaries of a major stressor. However, even this Healer appeared to occasionally consciously temporize or triage emotional and mental health issues, such as when working with a patient with low back pain who was resistant to the treatment plan. During another encounter, he appeared to avoid the emotional implications of a diagnosis of venereal disease.
Thus, physicians addressed psychological problems in a variety of ways—from a strictly biomedical model to a more holistic fashion. Physicians also demonstrated a wide range of skills—from very basic to quite advanced, and applied these skills differently with different patients in different situations. Although a given provider’s performance often varied among encounters, most physicians appeared to have a preferred practice philosophy and singular skill set that they regularly used during patient visits.
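The 2 dimensions of the typology can be written down as a simple lookup. The sketch below is only an illustration of how philosophy and skill level combine into the 4 quadrants. The names `Philosophy`, `Skill`, `TYPOLOGY`, and `physician_type` are hypothetical and not part of the study. Placing the Detective and the Healer in the advanced-skill quadrants is our reading of the descriptions above, and the study treats philosophy as a continuum with a dominant pole rather than a strict binary.

```python
from enum import Enum


class Philosophy(Enum):
    """Dominant practice philosophy (treated here as 2 poles of a continuum)."""
    BIOMEDICAL = "biomedical"
    BIOPSYCHOSOCIAL = "biopsychosocial"


class Skill(Enum):
    """Observed skill level in dealing with emotional distress."""
    BASIC = "basic"
    ADVANCED = "advanced"


# Hypothetical lookup table: (philosophy, skill) -> quadrant label.
# The quadrant assignments are inferred from the narrative descriptions.
TYPOLOGY = {
    (Philosophy.BIOMEDICAL, Skill.BASIC): "Technician",
    (Philosophy.BIOPSYCHOSOCIAL, Skill.BASIC): "Friend",
    (Philosophy.BIOMEDICAL, Skill.ADVANCED): "Detective",
    (Philosophy.BIOPSYCHOSOCIAL, Skill.ADVANCED): "Healer",
}


def physician_type(philosophy: Philosophy, skill: Skill) -> str:
    """Return the quadrant label for a dominant philosophy and skill level."""
    return TYPOLOGY[(philosophy, skill)]
```

A label produced this way describes a physician's predominant style only; as noted above, a given physician's performance often varied among encounters.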
Discussion
As in previous studies,10-15 we found that not all physicians appeared comfortable, trained, adept, or motivated to make sense of the emotional distress presented by patients. Parallel to the findings of Roter and colleagues17 with regard to general communication patterns of primary care physicians, a typology of physician responses to emotional distress emerged from our data. The framework of encounters (recognition, triage, and management) and 4-quadrant physician typology that surfaced from this study helps clarify how physicians respond to emotional distress. Each of the approaches in this typology is likely to have pros and cons for meeting different patient needs for mental health and general medical care.
Understanding physicians’ predominant styles based on their philosophy and skill set can have 2 important uses. First, physicians can reflect and seek feedback on their own style. Patient needs that may be less well met by this style can then be identified and alternate ways of meeting these needs pursued. Second, clinicians and continuing medical education providers can use this typology to design educational approaches. This education should focus on expanding clinician flexibility and increasing insight into when to use what approach. The outcomes and tradeoffs in effectiveness, efficiency, and integration of care remain important areas for future research.24
Given the constraints on time, personal energy, and apparent competition between chronic physical and mental health problems, physician behaviors can be viewed as an understandable adaptation to the realities of a busy family practice.11-15,24-28 Although we have documented significant variation in counseling skills among family physicians, there are no data to suggest that expansion of these skills would necessarily improve patient outcomes.29 The effect of a long-term relationship and its quality between patients and a family physician on patient mental health outcomes remains unexplored and is a fruitful area for further research. Also, it is important to recognize that physicians are not homogeneous in their personality, philosophy, and skills and that patients self-select the kind of physician that best fits their own personality and style. Different approaches are likely to be functional for diverse clinicians with varied patients and situations.24
Limitations
Our study has important limitations, including its sampling, design, and lack of a reference standard for mental health conditions. This qualitative research, by its very nature, is not based on a random population sample and is therefore not generalizable in the traditional quantitative sense. Its generalizability lies in the resonance it generates among primary care physicians and patients who recognize these patterns from their own experiences. Also, the findings are consistent with our existing understanding of competing demands9,28,30 and physician communication strategies.17-19 To the extent that midwestern physicians and patients do not reflect the ethnic and socioeconomic diversity of other parts of the country, these findings may also be limited. Future research should attempt to include more diverse populations. Patients’ emotional distress may be communicated in other ways besides speech or may not be communicated at all, so the direct observation approach we used cannot always correctly infer patients’ unexpressed mental health needs or physicians’ assessment of the situation. Because the data were cross-sectional, it is not possible to determine what had occurred in previous visits in a longitudinal management strategy. Nevertheless, the richness of the field note data provided an excellent detailed view of a large sample of visits. Finally, the lack of a reference standard for diagnosing mental health conditions does not alter the main findings of this study: a typology of physicians’ responses to emotional distress within their practices.
In trying to understand and improve the treatment of mental health issues, many previous researchers have focused on improving physician knowledge and dissemination of guidelines; such efforts have been disappointing when used alone.31,32 Other investigators have sought to improve the interviewing skills of physicians, and while modestly successful, these studies have been limited in scope, length of follow-up, and ability to be replicated widely.18,19,33,34 Other approaches have included collaborative management and quality improvement efforts; while successful, such interventions may be difficult to replicate in the usual physician practice setting without substantial external resources.35-38
Conclusions
The chasm between ideal care of mental health disorders and actual practice may be narrower than mental health professionals would have us believe, and it is certainly bridgeable. It is possible to have better outcomes for medical conditions, improved patient and provider satisfaction, and reduced costs of care.39,40 By studying the exemplary physicians found in real world practices—as found in this study and others—we might better understand that combination of inclination, skill, and setting that promotes quality cost-effective care. We found that mental health care, while sporadically and diversely attended to in outpatient visits, is often integrated with care of the diverse medical, social, and family problems that constitute primary care. Irrespective of differences in philosophy, training, or interest, however, structural and economic issues still appear to severely limit the ability of even willing family physicians to practice coherent integrated primary care.41 It is therefore important for the field as a whole to provide feasible strategies for promoting recognition and treatment of mental health issues by diverse clinicians and patients in usual practice settings.
Acknowledgments
Our study was supported by a grant from the Agency for Healthcare Research and Quality (R01 HS08776) and a research center grant from the American Academy of Family Physicians. The authors are grateful to the physicians, staff, and patients from the 18 practices, without whose participation this study would not have been possible. We also wish to acknowledge the dedicated work of Connie Gibbs and Jen Rouse, who spent countless hours collecting data; Diane Dodendorf and Jason Lebsack, who coordinated transcription and data management; and Mary McAndrews, who transcribed hundreds of taped interviews and dictated field notes. We would also like to thank Kurt C. Stange, MD, PhD, for reviewing earlier drafts of our manuscript.
1. Regier DA, Goldberg ID, Taube CA. The de facto US mental health services system: a public health perspective. Arch Gen Psychiatry 1978;35:685-93.
2. Goldman LS, Nielsen NH, Champion HC, et al. Awareness, diagnosis, and treatment of depression. J Gen Intern Med 1999;14:569-80.
3. Callahan EJ, Jaen CR, Crabtree BF, et al. The impact of recent emotional distress and diagnosis of depression or anxiety on the physician-patient encounter in family practice. J Fam Pract 1998;46:410-18.
4. DeGruy FV. Mental healthcare in the primary care setting: a paradigm problem. Fam Syst Health 1997;15:3-23.
5. Schulberg HC, Block MR, Madonia MJ, et al. Treating major depression in primary care practice: eight-month clinical outcomes. Arch Gen Psychiatry 1996;53:913-19.
6. Coyne JC, Klinkman MS, Gallo SM, Schwenk TL. Short-term outcomes of detected and undetected depressed primary care patients and depressed psychiatry outpatients. Gen Hosp Psychiatry 1997;19:333-43.
7. DeGruy F. Mental health care in the primary care setting. Institute of Medicine Committee on the Future of Primary Care. Primary care: America’s health in a new era. Washington, DC: National Academy Press; 1996.
8. Frey J. The clinical philosophy of family medicine. Am J Med 1998;104:327-29.
9. Williams JW. Competing demands: does care for depression fit in primary care? J Gen Intern Med 1998;13:137-39.
10. Williams JW, Rost K, Dietrich AJ, et al. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med 1999;8:58-67.
11. Rost K, Humphrey J, Kelleher K. Physician management preferences and barriers to care for rural patients with depression. Arch Fam Med 1994;3:409-14.
12. Susman JL, Crabtree BF, Essink G, et al. Depression in rural family practice: easy to recognize, difficult to diagnose. Arch Fam Med 1995;4:427-31.
13. Carney PA, Rhodes LA, Eliassen MS, et al. Variations in approaching the diagnosis of depression: a guided focus group study. J Fam Pract 1998;46:73-82.
14. Klinkman MS. Competing demands in psychosocial care: a model for the identification and treatment of depressive disorders in primary care. Gen Hosp Psychiatry 1997;19:98-111.
15. Susman JL. Mental health problems within primary care: shooting first and then asking questions? J Fam Pract 1995;41:540-42.
16. Solberg L, Korsen N, Oxman T, et al. Depression care: a problem in need of a system. J Fam Pract 1999;48:973-79.
17. Roter DL, Stewart M, Putnam SM, et al. Communication patterns of primary care physicians. JAMA 1997;277:350-56.
18. Robinson JW, Roter DL. Counseling by primary care physicians of patients who disclose psychosocial problems. J Fam Pract 1999;48:698-705.
19. Robinson JW, Roter DL. Psychosocial problem disclosure by primary care patients. Soc Sci Med 1999;48:1352-62.
20. Crabtree BF, Miller WL, Stange KC. Understanding practice from the ground up. J Fam Pract 2001;50:881-87.
21. Miller WL, Crabtree BF. The dance of interpretation. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999.
22. Addison RB. A grounded hermeneutic editing approach. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999.
23. Miles MB, Huberman AM. Qualitative data analysis: an expanded sourcebook. 2nd ed. Newbury Park, Calif: Sage Publications; 1994.
24. Stange KC, Miller WL, McWhinney I. Developing the knowledge base of family practice. Fam Med 2001;33:286-97.
25. Main DS, Lutz LJ, Barrett JE, Matthew J, Miller RS. The role of primary care clinician attitudes, beliefs, and training in the diagnosis and treatment of depression: a report from the Ambulatory Sentinel Practice Network Inc. Arch Fam Med 1993;2:1061-66.
26. Rost K, Nutting P, Smith J, et al. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000;9:150-54.
27. Nutting PA, Rost K, Smith J, et al. Competing demands from physical problems: effect on initiating and completing depression care over 6 months. Arch Fam Med 2000;9:1059-64.
28. Jaen CR, Stange KC, Nutting PA. Competing demands of primary care: a model for the delivery of clinical preventive services. J Fam Pract 1994;38:166-71.
29. Tiemens BG, Ormel J, Jenner JA, et al. Training primary-care physicians to recognize, diagnose, and manage depression: does it improve patient outcomes? Psychol Med 1999;29:833-45.
30. Stange KC, Jaen CR, Flocke SA, Miller WL, Crabtree BF, Zyzanski SJ. The value of a family physician. J Fam Pract 1998;46:363-68.
31. Lin EH, Katon WJ, Simon GE, et al. Achieving guidelines for the treatment of depression in primary care: is physician education enough? Med Care 1997;35:831-42.
32. Feldman EL, Jaffe A, Galambos N, et al. Clinical practice guidelines on depression: awareness, attitudes, and content knowledge among family physicians in New York. Arch Fam Med 1998;7:58-62.
33. Marvel MK, Epstein RM, Flowers K, Beckman HB. Soliciting the patient’s agenda: have we improved? JAMA 1999;281:283-87.
34. Hulsman RL, Ros WJ, Winnubst JA, et al. Teaching clinically experienced physicians communication skills: review of evaluation studies. Med Educ 1999;33:665-68.
35. Katon WM, Von Korff M, Lin E, et al. Collaborative management to achieve treatment guidelines: impact on depression in primary care. JAMA 1995;273:1026-31.
36. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality improvement programs for depression in managed primary care: a randomized controlled trial. JAMA 2000;283:212-20.
37. Brown JB, Shye D, McFarland BH, et al. Controlled trials of CQI and academic detailing to implement a clinical practice guideline for depression. J Qual Improvement 2000;26:39-54.
38. Law D, Crane D. The influence of marital and family therapy on health care utilization in a health maintenance organization. J Marital Fam Ther 2000;26:281-91.
39. Campbell TL, Franks P, Fiscella K, et al. Do physicians who diagnose more mental health disorders generate lower health care costs? J Fam Pract 2000;49:305-10.
40. Katon W. Collaborative care: patient satisfaction, outcome, and medical cost-offset. Fam Syst Med 1995;13:351-65.
41. DeGruy FV. Mental health diagnoses and the costs of primary care. J Fam Pract 2000;49:311-13.
STUDY DESIGN: We used a multimethod comparative case study design of 18 family practices that included detailed descriptive field notes from direct observation of 1637 outpatient visits. An immersion/crystallization approach was used to explore physicians’ responses to emotional distress and apparent mental health issues.
POPULATION: A total of 379 outpatient encounters were reviewed from a purposeful sample of 13 family physicians from the 57 clinicians observed.
OUTCOMES MEASURED: Descriptive field notes of outpatient visits were examined for emotional content and physicians’ responses to emotional distress.
RESULTS: Analyses revealed a 3-phase process by which physicians responded to emotional distress: recognition, triage, and management. The analyses also uncovered a 4-quadrant typology of management based on the physician’s philosophy (biomedical vs holistic) and skill level (basic vs more advanced).
CONCLUSIONS: Physicians appear to manage mental health issues by using 1 of 4 approaches based on their philosophy and core set of skills. Physician education and practice improvement should be tailored to build on physicians’ natural philosophical proclivity and psychosocial skills.
Primary care practices have been called America’s de facto mental health network, with more than two thirds of mental health disorders treated in the primary care sector.1 Up to 40% of primary care patients have a mental health problem,2 and 19% of outpatients report significant emotional distress during the previous 4 weeks.3 However, the detection and treatment rates of these problems are low.3-6
Thus, although the clinical philosophy of primary care professionals suggests that mental health care is an integral part of practice,7-9 there is an apparent discrepancy between these espoused ideals and usual clinical practice.3,5,10,11 Explanations of these findings include the reluctance of primary care physicians to label their patients and their use of observation and informal counseling as initial treatment efforts.11-13 The competing demands of practice, lack of resources, inadequate reimbursement, and various organizational factors such as mental health carve-outs also profoundly influence management.14-16 Using cluster analysis, Roter and colleagues17 found 5 distinct communication patterns between patients with ongoing medical problems and their physicians, ranging from narrowly biomedical to consumerist. Robinson and Roter18,19 found that patients are likely to respond to direct inquiry by physicians about psychosocial distress and that physicians often briefly counsel their patients in return. Callahan and coworkers3 demonstrated that recent emotional distress and mental health problems have an important impact on encounter activities (eg, more time on history taking and counseling). Despite these investigations, physicians’ responses to emotional distress remain incompletely characterized.
We sought to develop a typology of physicians’ reactions to and management of patients’ mental health problems and emotional distress. Our findings can help clinicians identify their own style and consider ways of meeting particular patient needs that may be better suited to an alternative approach.
Methods
Detailed descriptive field notes of outpatient visits were collected as part of a large multimethod comparative case study of 18 midwestern family practices. Trained field researchers spent 4 weeks or more in each practice and directly observed the practice environment and 30 outpatient visits with each clinician in the practice. While observing the outpatient visits, the field researcher took chronological notes of what was occurring during the encounters. These notes were later used to dictate detailed descriptions of each encounter. Although there were differences in the style of reporting among the observers, the quality of data was consistent. Details of the design and data collection can be found elsewhere in this issue.20
Two family physician researchers, 3 family therapists, and a medical anthropologist reviewed encounters from a purposeful sample of family physicians. Initially, encounters from 3 physicians representing diverse practice approaches (as assessed globally by a research nurse collecting the primary data) were reviewed. The goals were to understand the depth and detail of the data and to develop initial hypotheses, an organizational schema, and a crude overview of the presentation of and physician response to mental health issues. The management of mental health issues and emotional distress was then explored in a purposeful sample of physicians selected to maximize variation in sex, type, and location of practice; ethnicity; and age. By the nature of this qualitative study (without access to an independent gold standard for diagnosis of mental disorders), a broad definition of mental health problems was used, encompassing emotional distress and psychological problems. On the basis of the preliminary review of field note data, the research group identified that patients were presenting with emotional issues when they found a reported change in affect, a verbal report of an emotional issue, a somatic complaint often associated with emotional distress, or a follow-up visit for an expressed mental health issue (eg, refill of an antidepressant). This working definition was reached in the preliminary phase of the study, and through discussion a consensus was reached on the mental health aspects of each encounter.
Physicians were our unit of analysis, and the authors reviewed every outpatient visit available from each of the 13 physicians selected from the larger sample. The research team members used an editing organizing style for analysis,21 individually highlighted text they believed to be relevant, and made interpretive notes or observations in the margins.22 The research team then engaged in detailed discussions of the encounter transcripts. Particular attention was given to the total context of the encounter, recognizing other potential competing demands within the visit. The goal of this lengthy process was to reach consensus about what was important and how it should be interpreted. After discussing every encounter of a given physician, a summary case narrative was prepared and consensus reached about key themes for that physician.
After completing this initial review, matrices (eg, variations in patient management by practice location, physician age, and sex) were constructed to visualize other emergent patterns and facilitate comparisons across cases.23 Additional physicians were reviewed to search for confirming and disconfirming evidence (eg, did management vary by physician ethnicity?) until saturation was reached (ie, until no further novel information or themes were identified). This required the review of outpatient visits from 13 physicians. One of the primary research nurses who conducted the participant observation provided input that ensured a full diversity of physicians was considered. She also served as an additional check on interpretation of the primary data. Finally, overall themes common to all physicians were identified and important variations in management noted. Thus, we began by looking at individual physicians’ responses within each encounter, developed a coherent description of each physician’s modus operandi, and then identified overarching themes describing broad approaches to emotional distress and mental health issues.
Results
The 379 patient visits to 13 physicians represented a diverse sample of practice and encounter types (Table 1, Table 2). Although the chief complaints of many patients did not overtly appear to relate to a mental health condition or emotional distress, many patients’ emotional concerns presented within the context of an acute or chronic medical condition. All physicians had many encounters in which both overt and more covert emotional concerns and mental health issues emerged.
Physician Responses Within Encounters
The research team noted a wide range of physician reactions to patients presenting with emotional distress or potential mental health problems. During the physician-patient interaction, physicians apparently either recognized the emotional component of the encounter or did not. If emotional distress was recognized, physicians appeared to either actively ignore this problem, gloss over or triage it, or actively manage the distress. These phenomena are illustrated in Figure 1 and will be described in more detail.
Recognition
Not all emotional and mental health issues were apparently recognized. Such missed opportunities were identified with all the participant physicians, even among physicians who were consistently more attentive to addressing mental health problems. For example, during a follow-up visit with a middle-aged man with abdominal tenderness in whom a computed tomography scan had disclosed a renal mass, the patient’s wife asked numerous questions about possible depression and anxiety in her alcohol-using husband. The physician did not pursue any of these concerns.
However, a minority of physicians actively asked about mental health problems. This “active case finding” often capitalized on the physician’s previous knowledge of the patient’s social situation or personal issues. In one encounter focusing on breast cancer follow-up, the physician asked a woman how she was interacting with her spouse after a mastectomy. In instances such as this, active case finding was part of the chatting that opened or ended an encounter, particularly among physician and patients who were familiar with each other. Physicians in this sample neither used screening instruments (eg, the Primary Care Evaluation of Mental Disorders or the Zung mental health scales) nor routinely inquired about suicidal ideation, even in their seemingly most severely depressed patients.
Gloss-Over/Triage
In some instances, the physician apparently understood the impact of a situation but seemed to gloss over the issues. During a health care maintenance visit, a woman reported that she had a miscarriage 3 months earlier. The physician asked, “Is this a good thing or a sad thing?” The patient stated that it was a sad thing, because they were looking forward to the birth. There was no further probing into how the patient and family were dealing with the miscarriage.
In other encounters, physicians clearly seemed to recognize the psychologic implications of an encounter but chose to postpone management. These physicians appeared to triage certain cases based on time, competing demands, or perhaps their own ability to weather another challenging patient. For example, in the case of a patient with arm pain seeking workers’ compensation, the patient noted, “the pain (after doing some minor chores) was simply not worth it.” The physician did not pursue this cue further, but rather concentrated on the scheduling of magnetic resonance imaging, an electromyogram, and a follow-up appointment. However, the physician later related to the nurse researcher her understanding of the impact of this problem and acknowledged the patient’s discomfort. Thus, this physician apparently “triaged” this issue to a later date.
Management
Although only a minority of physicians actively sought cases of emotional distress in these encounters, most actively managed mental health problems. Prompted by the patient’s presentation, physicians followed up on “leads” to potential mental health issues, including: a mother who discussed the death of her daughter, a woman with menstrual irregularity, and marital and financial stress. Such encounters demonstrated physicians being sensitive to the underlying psychosocial issues in their patients’ lives.
The management response appeared to be predicated on the physician’s philosophy (biomedical vs holistic) and skill level (basic vs more advanced). In some instances, physicians appeared to spend considerable time on mental health issues with patients but apparently ran out of tools to deal with their problems effectively. This situation was most evident with patients who were substance abusers, who had chronic pain, those seeking workers’ compensation, and individuals with vague or multiple somatic symptoms.
A 4-Quadrant Typology of Physicians
A 4-quadrant typology of physicians emerged based on their philosophy and skill, as ascertained from the patient encounters Figure 2. Philosophically, physicians were on a continuum of being biomedically to biopsychosocially inclined, with each exhibiting a discernable dominant philosophy. Biomedically oriented physicians concentrated on the medical aspects of care and minimally explored the psychosocial milieu of the patient. Biopsychosocially oriented physicians addressed the patients’ emotional, physical, social, and sometimes spiritual wellbeing. Regardless of management approach (biomedical vs biopsychosocial), physicians demonstrated varying levels of competence in dealing with emotional distress.
Most physicians used “basic” skills—empathy, encouragement, small talk, use of silence, direct advice giving, and superficial education—to address their patients’ mental health problems. In some encounters the use of simple strategies was seemingly appropriate and effective; only occasionally were more advanced skills used. Such advanced skills ranged from effectively setting an agenda and soliciting the patient’s perspective to the use of more challenging interviewing skills, such as confrontation, implementing behavioral prescriptions, navigating referrals for skeptical patients, and mental health referrals that were part of a carefully developed treatment plan.
By combining the philosophy and skill dimensions, a 4-quadrant typology of physicians was apparent: the Technician, the Friend, the Detective, and the Healer. The Technician was medically oriented, dispensing medications and direct advice. Encounters were problem focused, and at times the physician appeared to be abrupt, ignorant of clear emotional distress, and not patient centered. In an encounter for follow-up of anxiety, one Technician told a patient complaining of neurologic symptoms that they might be stress related but still referred her to a neurologist. When she said, “This is really a frustrating way to feel,” he responded with, “Well, a neurologist deals with this,” and gave her samples of paroxetine, checked her for a sinus infection, and ended the encounter. For another patient who saw this physician with a complaint of depressive symptoms, the problem was identified without any discussion of underlying psychosocial issues; fluoxetine was dispensed in an encounter lasting less than 5 minutes.
The Friend was a biopsychosocially oriented physician with basic skills. One Friend extensively explored the patient’s background, concerns, and spiritual dimensions of illness. Encounters were long and tangential. A diverse array of topics was explored in a patient-centered fashion. However, only very basic counseling and management skills were ever observed with this physician. Direct advice was common, and conflict appeared to be avoided. A metaphor emerged of friends having coffee together.
Friends did not always appear to deliver care that optimally managed mental health issues. In some instances so many issues were discussed that the physician appeared to have difficulty setting an agenda for the visit and prioritizing problems. For example, for a patient just discharged after hospitalization for severe depression, there was no explicit discussion of depressive symptoms or suicidal ideation, despite a lengthy encounter.
The Detective was usually biomedically focused but when the occasion warranted, this type of physician demonstrated an impressive breadth of detective skills. For example, one Detective appeared most comfortable providing focused, snappy, medically oriented care. But she was alert to cues of emotional distress and demonstrated appropriate use of self-disclosure and confrontation in managing a patient with depression. In short, she was usually able to provide solutions for each case while focused on more biomedical issues.
The Healer used a full breadth of biopsychosocial skills, integrated most aspects of care seamlessly, and appeared comfortable with both strictly biomedical and psychosocial dimensions of care. One Healer regularly sought signs of emotional distress and exhibited an impressive range of skills in dealing with such problems as substance abuse and pain syndromes. For example, he astutely linked a patient’s stressful lifestyle with current somatic symptoms. In another encounter, with a woman with high blood pressure and weight gain, he assessed the possible biopsychosocial causes of the problem (etiologic stressors, sleep habits, relationship issues, diet changes, and depression) and probed about any anniversaries of a major stressor. However, even this Healer appeared to occasionally consciously temporize or triage emotional and mental health issues, such as when working with a patient with low back pain who was resistant to the treatment plan. During another encounter, he appeared to avoid the emotional implications of a diagnosis of venereal disease.
Thus, physicians addressed psychological problems in a variety of ways, from a strictly biomedical model to a more holistic fashion. Physicians also demonstrated a wide range of skills, from very basic to quite advanced, and applied these skills differently with different patients in different situations. Although a given provider’s performance often varied among encounters, most physicians appeared to have a preferred practice philosophy and singular skill set that they regularly used during patient visits.
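The 2 dimensions of the typology can be written out as a small lookup table. The following is only an illustrative sketch: the names `Philosophy`, `Skill`, `TYPOLOGY`, and `physician_type` are hypothetical and were not part of the study, and the placement of the Technician in the basic-skill quadrant is our reading of the descriptions above. The sketch also treats each dimension as a 2-level category, whereas the observed physicians fell along a continuum with a dominant style.

```python
from enum import Enum


class Philosophy(Enum):
    """Dominant practice philosophy (simplified to 2 levels)."""
    BIOMEDICAL = "biomedical"
    BIOPSYCHOSOCIAL = "biopsychosocial"


class Skill(Enum):
    """Skill level in dealing with emotional distress (simplified to 2 levels)."""
    BASIC = "basic"
    ADVANCED = "advanced"


# Hypothetical encoding of the 4 quadrants described in the text.
TYPOLOGY = {
    (Philosophy.BIOMEDICAL, Skill.BASIC): "Technician",
    (Philosophy.BIOPSYCHOSOCIAL, Skill.BASIC): "Friend",
    (Philosophy.BIOMEDICAL, Skill.ADVANCED): "Detective",
    (Philosophy.BIOPSYCHOSOCIAL, Skill.ADVANCED): "Healer",
}


def physician_type(philosophy: Philosophy, skill: Skill) -> str:
    """Return the quadrant label for a dominant philosophy and skill level."""
    return TYPOLOGY[(philosophy, skill)]


if __name__ == "__main__":
    for (philosophy, skill), label in TYPOLOGY.items():
        print(f"{philosophy.value:16s} {skill.value:9s} {label}")
```

Because a given physician's performance varied among encounters, such a label describes a predominant style rather than a fixed classification of every visit.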
Discussion
As in previous studies,10-15 we found that not all physicians appeared comfortable, trained, adept, or motivated to make sense of the emotional distress presented by patients. Parallel to the findings of Roter and colleagues17 with regard to general communication patterns of primary care physicians, a typology of physician responses to emotional distress emerged from our data. The framework of encounters (recognition, triage, and management) and the 4-quadrant physician typology that surfaced from this study help clarify how physicians respond to emotional distress. Each of the approaches in this typology is likely to have pros and cons for meeting different patient needs for mental health and general medical care.
Understanding physicians’ predominant styles based on their philosophy and skill set can have 2 important uses. First, physicians can reflect and seek feedback on their own style. Patient needs that may be less well met by this style can then be identified and alternate ways of meeting these needs pursued. Second, clinicians and continuing medical education providers can use this typology to design educational approaches. This education should focus on expanding clinician flexibility and increasing insight into when to use what approach. The outcomes and tradeoffs in effectiveness, efficiency, and integration of care remain important areas for future research.24
Given the constraints on time, personal energy, and apparent competition between chronic physical and mental health problems, physician behaviors can be viewed as an understandable adaptation to the realities of a busy family practice.11-15,24-28 Although we have documented significant variation in counseling skills among family physicians, there are no data to suggest that expansion of these skills would necessarily improve patient outcomes.29 The effect of a long-term relationship and its quality between patients and a family physician on patient mental health outcomes remains unexplored and is a fruitful area for further research. Also, it is important to recognize that physicians are not homogeneous in their personality, philosophy, and skills and that patients self-select the kind of physician that best fits their own personality and style. Different approaches are likely to be functional for diverse clinicians with varied patients and situations.24
Limitations
Our study has important limitations, including its sampling, design, and lack of a reference standard for mental health conditions. This qualitative research, by its very nature, is not based on a random population sample and is therefore not generalizable in the traditional quantitative sense. Its generalizability lies in the resonance it generates among primary care physicians and patients who recognize these patterns from their own experiences. Also, the findings are consistent with our existing understanding of competing demands9,28,30 and physician communication strategies.17-19 To the extent that midwestern physicians and patients do not reflect the ethnic and socioeconomic diversity of other parts of the country, these findings may also be limited. Future research should attempt to include more diverse samples. Patients’ emotional distress may be communicated in other ways besides speech or may not be communicated at all, so the direct observation approach we used cannot always correctly infer patients’ unexpressed mental health needs or physicians’ assessment of the situation. Because the data were cross-sectional, it is not possible to determine what had occurred in previous visits in a longitudinal management strategy. Nevertheless, the richness of the field note data provided an excellent detailed view of a large sample of visits. Finally, the lack of a reference standard for diagnosing mental health conditions does not alter the main finding of this study: a typology of physicians’ responses to emotional distress within their practices.
In trying to understand and improve the treatment of mental health issues, many previous researchers have focused on improving physician knowledge and dissemination of guidelines; such efforts have been disappointing when used alone.31,32 Other investigators have sought to improve the interviewing skills of physicians, and while modestly successful, these studies have been limited in scope, length of follow-up, and ability to be replicated widely.18,19,33,34 Other approaches have included collaborative management and quality improvement efforts; while successful, such interventions may be difficult to replicate in the usual physician practice setting without substantial external resources.35-38
Conclusions
The chasm between ideal care of mental health disorders and actual practice may be narrower than mental health professionals would have us believe, and it is certainly bridgeable. It is possible to have better outcomes for medical conditions, improved patient and provider satisfaction, and reduced costs of care.39,40 By studying the exemplary physicians found in real world practices—as found in this study and others—we might better understand that combination of inclination, skill, and setting that promotes quality cost-effective care. We found that mental health care, while sporadically and diversely attended to in outpatient visits, is often integrated with care of the diverse medical, social, and family problems that constitute primary care. Irrespective of differences in philosophy, training, or interest, however, structural and economic issues still appear to severely limit the ability of even willing family physicians to practice coherent integrated primary care.41 It is therefore important for the field as a whole to provide feasible strategies for promoting recognition and treatment of mental health issues by diverse clinicians and patients in usual practice settings.
Acknowledgments
Our study was supported by a grant from the Agency for Healthcare Research and Quality (R01 HS08776) and a research center grant from the American Academy of Family Physicians. The authors are grateful to the physicians, staff, and patients from the 18 practices, without whose participation this study would not have been possible. We also wish to acknowledge the dedicated work of Connie Gibbs and Jen Rouse, who spent countless hours collecting data; Diane Dodendorf and Jason Lebsack, who coordinated transcription and data management; and Mary McAndrews, who transcribed hundreds of taped interviews and dictated field notes. We would also like to thank Kurt C. Stange, MD, PhD, for reviewing earlier drafts of our manuscript.
STUDY DESIGN: We used a multimethod comparative case study design of 18 family practices that included detailed descriptive field notes from direct observation of 1637 outpatient visits. An immersion/crystallization approach was used to explore physicians’ responses to emotional distress and apparent mental health issues.
POPULATION: A total of 379 outpatient encounters were reviewed from a purposeful sample of 13 family physicians from the 57 clinicians observed.
OUTCOMES MEASURED: Descriptive field notes of outpatient visits were examined for emotional content and physicians’ responses to emotional distress.
RESULTS: Analyses revealed a 3-phase process by which physicians responded to emotional distress: recognition, triage, and management. The analyses also uncovered a 4-quadrant typology of management based on the physician’s philosophy (biomedical vs holistic) and skill level (basic vs more advanced).
CONCLUSIONS: Physicians appear to manage mental health issues by using 1 of 4 approaches based on their philosophy and core set of skills. Physician education and practice improvement should be tailored to build on physicians’ natural philosophical proclivity and psychosocial skills.
Primary care practices have been called America’s de facto mental health network, with more than two thirds of mental health disorders treated in the primary care sector.1 Up to 40% of primary care patients have a mental health problem,2 and 19% of outpatients report significant emotional distress during the previous 4 weeks.3 However, the detection and treatment rates of these problems are low.3-6
Thus, although the clinical philosophy of primary care professionals suggests that mental health care is an integral part of practice,7-9 there is an apparent discrepancy between these espoused ideals and usual clinical practice.3,5,10,11 Explanations of these findings include the reluctance of primary care physicians to label their patients and their use of observation and informal counseling as initial treatment efforts.11-13 The competing demands of practice, lack of resources, inadequate reimbursement, and various organizational factors such as mental health carve-outs also profoundly influence management.14-16 Using cluster analysis, Roter and colleagues17 found 5 distinct communication patterns between patients with ongoing medical problems and their physicians, ranging from narrowly biomedical to consumerist. Robinson and Roter18,19 found that patients are likely to respond to direct inquiry by physicians about psychosocial distress and that physicians often briefly counsel their patients in return. Callahan and coworkers3 demonstrated that recent emotional distress and mental health problems have an important impact on encounter activities (eg, more time on history taking and counseling). Despite these investigations, physicians’ responses to emotional distress remain incompletely characterized, and a robust model is lacking.
We sought to develop a typology of physicians’ reactions to and management of patients’ mental health problems and emotional distress. Our findings can help clinicians identify their own style and consider ways of meeting particular patient needs that may be better suited to an alternative approach.
Methods
Detailed descriptive field notes of outpatient visits were collected as part of a large multimethod comparative case study of 18 midwestern family practices. Trained field researchers spent 4 weeks or more in each practice and directly observed the practice environment and 30 outpatient visits with each clinician in the practice. While observing the outpatient visits, the field researcher took chronological notes of what was occurring during the encounters. These notes were later used to dictate detailed descriptions of each encounter. Although there were differences in the style of reporting among the observers, the quality of data was consistent. Details of the design and data collection can be found elsewhere in this issue.20
Two family physician researchers, 3 family therapists, and a medical anthropologist reviewed encounters from a purposeful sample of family physicians. Initially, encounters from 3 physicians representing diverse practice approaches (as assessed globally by a research nurse collecting the primary data) were reviewed. The goals were to understand the depth and detail of the data and to develop initial hypotheses, an organizational schema, and a crude overview of the presentation of and physician response to mental health issues. The management of mental health issues and emotional distress was then explored in a purposeful sample of physicians selected to maximize variation in sex, type, and location of practice; ethnicity; and age. By the nature of this qualitative study (without access to an independent gold standard for diagnosis of mental disorders), a broad definition of mental health problems was used, encompassing emotional distress and psychological problems. On the basis of the preliminary review of field note data, the research group identified that patients were presenting with emotional issues when they found a reported change in affect, a verbal report of an emotional issue, a somatic complaint often associated with emotional distress, or a follow-up visit for an expressed mental health issue (eg, refill of an antidepressant). This working definition was reached in the preliminary phase of the study, and through discussion a consensus was reached on the mental health aspects of each encounter.
Physicians were our unit of analysis, and the authors reviewed every outpatient visit available from each of the 13 physicians selected from the larger sample. The research team members used an editing organizing style for analysis,21 individually highlighted text they believed to be relevant, and made interpretive notes or observations in the margins.22 The research team then engaged in detailed discussions of the encounter transcripts. Particular attention was given to the total context of the encounter, recognizing other potential competing demands within the visit. The goal of this lengthy process was to reach consensus about what was important and how it should be interpreted. After discussing every encounter of a given physician, a summary case narrative was prepared and consensus reached about key themes for that physician.
After completing this initial review, matrices (eg, variations in patient management by practice location, physician age, and sex) were constructed to visualize other emergent patterns and facilitate comparisons across cases.23 Additional physicians were reviewed to search for confirming and disconfirming evidence (eg, did management vary by physician ethnicity?) until saturation was reached (ie, until no further novel information or themes were identified). This required the review of outpatient visits from 13 physicians. One of the primary research nurses who conducted the participant observation provided input that ensured a full diversity of physicians was considered. She also served as an additional check on interpretation of the primary data. Finally, overall themes common to all physicians were identified and important variations in management noted. Thus, we began by looking at individual physicians’ responses within each encounter, developed a coherent description of each physician’s modus operandi, and then identified overarching themes describing broad approaches to emotional distress and mental health issues.
Results
The 379 patient visits to 13 physicians represented a diverse sample of practice and encounter types (Table 1, Table 2). Although the chief complaints of many patients did not overtly appear to relate to a mental health condition or emotional distress, many patients’ emotional concerns presented within the context of an acute or chronic medical condition. All physicians had many encounters in which both overt and more covert emotional concerns and mental health issues emerged.
Physician Responses Within Encounters
The research team noted a wide range of physician reactions to patients presenting with emotional distress or potential mental health problems. During the physician-patient interaction, physicians apparently either recognized the emotional component of the encounter or did not. If emotional distress was recognized, physicians appeared to either actively ignore this problem, gloss over or triage it, or actively manage the distress. These phenomena are illustrated in Figure 1 and will be described in more detail.
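The branching just described can be sketched as a simple decision function. This is only an illustration of the narrative, not the coding procedure used in the analysis (which relied on consensus discussion of field notes); the names `Response` and `classify_response` and the action codes `"ignore"`, `"triage"`, and `"manage"` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Response(Enum):
    """Possible physician responses to emotional distress in an encounter."""
    NOT_RECOGNIZED = "not recognized"
    IGNORED = "actively ignored"
    GLOSSED_OVER_OR_TRIAGED = "glossed over or triaged"
    MANAGED = "actively managed"


# Hypothetical action codes for encounters in which distress was recognized.
_ACTIONS = {
    "ignore": Response.IGNORED,
    "triage": Response.GLOSSED_OVER_OR_TRIAGED,
    "manage": Response.MANAGED,
}


def classify_response(recognized: bool, action: Optional[str] = None) -> Response:
    """Map an encounter onto the recognition, triage, and management pathway."""
    if not recognized:
        # Missed opportunity: the emotional component was apparently not noticed.
        return Response.NOT_RECOGNIZED
    if action not in _ACTIONS:
        raise ValueError(f"unknown action for a recognized problem: {action!r}")
    return _ACTIONS[action]


if __name__ == "__main__":
    print(classify_response(False).value)
    print(classify_response(True, "triage").value)
```

In the observed encounters the distinction between these branches was an inference from field notes, so any such coding would carry the same uncertainty as the word "apparently" in the description above.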
Recognition
Not all emotional and mental health issues were apparently recognized. Such missed opportunities were identified with all the participant physicians, even among physicians who were consistently more attentive to addressing mental health problems. For example, during a follow-up visit with a middle-aged man with abdominal tenderness in whom a computed tomography scan had disclosed a renal mass, the patient’s wife asked numerous questions about possible depression and anxiety in her alcohol-using husband. The physician did not pursue any of these concerns.
However, a minority of physicians actively asked about mental health problems. This “active case finding” often capitalized on the physician’s previous knowledge of the patient’s social situation or personal issues. In one encounter focusing on breast cancer follow-up, the physician asked a woman how she was interacting with her spouse after a mastectomy. In instances such as this, active case finding was part of the chatting that opened or ended an encounter, particularly among physicians and patients who were familiar with each other. Physicians in this sample neither used screening instruments (eg, the Primary Care Evaluation of Mental Disorders or the Zung mental health scales) nor routinely inquired about suicidal ideation, even in their seemingly most severely depressed patients.
Gloss-Over/Triage
In some instances, the physician apparently understood the impact of a situation but seemed to gloss over the issues. During a health care maintenance visit, a woman reported that she had a miscarriage 3 months earlier. The physician asked, “Is this a good thing or a sad thing?” The patient stated that it was a sad thing, because they were looking forward to the birth. There was no further probing into how the patient and family were dealing with the miscarriage.
In other encounters, physicians clearly seemed to recognize the psychologic implications of an encounter but chose to postpone management. These physicians appeared to triage certain cases based on time, competing demands, or perhaps their own ability to weather another challenging patient. For example, in the case of a patient with arm pain seeking workers’ compensation, the patient noted, “the pain (after doing some minor chores) was simply not worth it.” The physician did not pursue this cue further, but rather concentrated on the scheduling of magnetic resonance imaging, an electromyogram, and a follow-up appointment. However, the physician later related to the nurse researcher her understanding of the impact of this problem and acknowledged the patient’s discomfort. Thus, this physician apparently “triaged” this issue to a later date.
Management
1. Regier DA, Goldberg ID, Taube CA. The de facto US mental health services system: a public health perspective. Arch Gen Psychiatry 1978;35:685-93.
2. Goldman LS, Nielsen NH, Champion HC, et al. Awareness, diagnosis, and treatment of depression. J Gen Intern Med 1999;14:569-80.
3. Callahan EJ, Jaen CR, Crabtree BF, et al. The impact of recent emotional distress and diagnosis of depression or anxiety on the physician-patient encounter in family practice. J Fam Pract 1998;46:410-18.
4. DeGruy FV. Mental healthcare in the primary care setting: a paradigm problem. Fam Syst Health 1997;15:3-23.
5. Schulberg HC, Block MR, Madonia MJ, et al. Treating major depression in primary care practice: eight-month clinical outcomes. Arch Gen Psychiatry 1996;53:913-19.
6. Coyne JC, Klinkman MS, Gallo SM, Schwenk TL. Short-term outcomes of detected and undetected depressed primary care patients and depressed psychiatry outpatients. Gen Hosp Psychiatry 1997;19:333-43.
7. DeGruy F. Mental health care in the primary care setting. Institute of Medicine Committee on the Future of Primary Care. Primary care: America’s health in a new era. Washington, DC: National Academy Press; 1996.
8. Frey J. The clinical philosophy of family medicine. Am J Med 1998;104:327-29.
9. Williams JW. Competing demands: does care for depression fit in primary care? J Gen Intern Med 1998;13:137-39.
10. Williams JW, Rost K, Dietrich AJ, et al. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med 1999;8:58-67.
11. Rost K, Humphrey J, Kelleher K. Physician management preferences and barriers to care for rural patients with depression. Arch Fam Med 1994;3:409-14.
12. Susman JL, Crabtree BF, Essink G, et al. Depression in rural family practice: easy to recognize, difficult to diagnose. Arch Fam Med 1995;4:427-31.
13. Carney PA, Rhodes LA, Eliassen MS, et al. Variations in approaching the diagnosis of depression: a guided focus group study. J Fam Pract 1998;46:73-82.
14. Klinkman MS. Competing demands in psychosocial care: a model for the identification and treatment of depressive disorders in primary care. Gen Hosp Psychiatry 1997;19:98-111.
15. Susman JL. Mental health problems within primary care: shooting first and then asking questions? J Fam Pract 1995;41:540-42.
16. Solberg L, Korsen N, Oxman T, et al. Depression care: a problem in need of a system. J Fam Pract 1999;48:973-79.
17. Roter DL, Stewart M, Putnam SM, et al. Communication patterns of primary care physicians. JAMA 1997;277:350-56.
18. Robinson JW, Roter DL. Counseling by primary care physicians of patients who disclose psychosocial problems. J Fam Pract 1999;48:698-705.
19. Robinson JW, Roter DL. Psychosocial problem disclosure by primary care patients. Soc Sci Med 1999;48:1352-62.
20. Crabtree BF, Miller WL, Stange KC. Understanding practice from the ground up. J Fam Pract 2001;50:881-87.
21. Miller WL, Crabtree BF. The dance of interpretation. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999.
22. Addison RB. A grounded hermeneutic editing approach. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999.
23. Miles MB, Huberman AM. Qualitative data analysis: an expanded sourcebook. 2nd ed. Newbury Park, Calif: Sage Publications; 1994.
24. Stange KC, Miller WL, McWhinney I. Developing the knowledge base of family practice. Fam Med 2001;33:286-97.
25. Main DS, Lutz LJ, Barrett JE, Matthew J, Miller RS. The role of primary care clinician attitudes, beliefs, and training in the diagnosis and treatment of depression: a report from the Ambulatory Sentinel Practice Network Inc. Arch Fam Med 1993;2:1061-66.
26. Rost K, Nutting P, Smith J, et al. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000;9:150-54.
27. Nutting PA, Rost K, Smith J, et al. Competing demands from physical problems: effect on initiating and completing depression care over 6 months. Arch Fam Med 2000;9:1059-64.
28. Jaen CR, Stange KC, Nutting PA. Competing demands of primary care: a model for the delivery of clinical preventive services. J Fam Pract 1994;38:166-71.
29. Tiemens BG, Ormel J, Jenner JA, et al. Training primary-care physicians to recognize, diagnose, and manage depression: does it improve patient outcomes? Psychol Med 1999;29:833-45.
30. Stange KC, Jaen CR, Flocke SA, Miller WL, Crabtree BF, Zyzanski SJ. The value of a family physician. J Fam Pract 1998;46:363-68.
31. Lin EH, Katon WJ, Simon GE, et al. Achieving guidelines for the treatment of depression in primary care: is physician education enough? Med Care 1997;35:831-42.
32. Feldman EL, Jaffe A, Galambos N, et al. Clinical practice guidelines on depression: awareness, attitudes, and content knowledge among family physicians in New York. Arch Fam Med 1998;7:58-62.
33. Marvel MK, Epstein RM, Flowers K, Beckman HB. Soliciting the patient’s agenda: have we improved? JAMA 1999;281:283-87.
34. Hulsman RL, Ros WJ, Winnubst JA, et al. Teaching clinically experienced physicians communication skills: review of evaluation studies. Med Educ 1999;33:665-68.
35. Katon WM, Von Korff M, Lin E, et al. Collaborative management to achieve treatment guidelines: impact on depression in primary care. JAMA 1995;273:1026-31.
36. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality improvement programs for depression in managed primary care: a randomized controlled trial. JAMA 2000;283:212-20.
37. Brown JB, Shye D, McFarland BH, et al. Controlled trials of CQI and academic detailing to implement a clinical practice guideline for depression. J Qual Improvement 2000;26:39-54.
38. Law D, Crane D. The influence of marital and family therapy on health care utilization in a health maintenance organization. J Marital Fam Ther 2000;26:281-91.
39. Campbell TL, Franks P, Fiscella K, et al. Do physicians who diagnose more mental health disorders generate lower health care costs? J Fam Pract 2000;49:305-10.
40. Katon W. Collaborative care: patient satisfaction, outcome, and medical cost-offset. Fam Syst Med 1995;13:351-65.
41. DeGruy FV. Mental health diagnoses and the costs of primary care. J Fam Pract 2000;49:311-13.
Antibiotic Use in Acute Respiratory Infections and the Ways Patients Pressure Physicians for a Prescription
STUDY DESIGN: A multimethod comparative case study was performed including descriptive field notes of outpatient visits.
POPULATION: We included patients (children and adults) and clinicians in 18 purposefully selected family practices in a midwestern state. A total of 298 outpatient visits for acute respiratory tract (ART) infections were selected for analysis from more than 1600 encounters observed.
OUTCOMES MEASURED: Unnecessary antibiotic use and patterns of physician-patient communication were measured.
RESULTS: Antibiotics were prescribed in 68% of the ART infection visits, and of those, 80% were determined to be unnecessary according to Centers for Disease Control and Prevention guidelines. Patients were observed to pressure physicians for medication. The types of patterns identified were direct request, candidate diagnosis (a diagnosis suggested by the patient), implied candidate diagnosis (a set of symptoms specifically indexing a particular diagnosis), portraying severity of illness, appealing to life-world circumstances, and previous use of antibiotics. Also, clinicians were observed to rationalize their antibiotic prescriptions by reporting medically acceptable reasons and diagnoses to patients.
CONCLUSIONS: Patients strongly influence the antibiotic prescribing of physicians by using a number of different behaviors. To decrease antibiotic use for ART infections, patients should be educated about the dangers and limited benefits of such use, and clinicians should consider appropriate responses to these different patient pressures to prescribe antibiotics.
Acute respiratory tract (ART) infections, such as common cold, bronchitis, pharyngitis, sinusitis, and otitis media, are among the most common problems seen in primary care practice.1 Unnecessary use of antibiotics for these infections is a major worldwide problem both in terms of cost2 and as a contributor to the development of antibiotic-resistant bacteria.3
Although there is some evidence that physicians misdiagnose many viral infections as bacterial,4,5 recent studies suggest that the reasons for unnecessary antibiotic prescribing are more complex, having as much or more to do with patient and physician expectations as with physicians’ diagnostic skills.6-8 These studies are limited to describing perceptions of behavior rather than actual behavior, because of their use of interview and focus group data. Consequently, we do not know what actually happens during outpatient visits for ART infections that leads to antibiotic prescribing.
Two studies by Stivers9,10 underscore the importance of directly observing what transpires during encounters with pediatric ART infection patients. Stivers’ examination of videotaped visits found that, in some cases, parental pressure for antibiotics influenced the physician’s decision to prescribe. This finding has not been replicated, however, in family practice settings, where both adults and children are seen. We used direct observation of outpatient visits to family physicians for ART infections to analyze the effects of physician-patient communication on unnecessary antibiotic prescribing. By understanding the ways these communication patterns influence prescribing behavior, practicing family physicians can develop strategies to deliver more appropriate care for ART infections.
Methods
These data were collected as part of the Prevention & Competing Demands in Primary Care Study, which was an in-depth observational study begun in October 1996 and completed in August 1999 that examined the organizational and clinical structures and process of community-based family practices. Each of 18 purposefully selected practices was studied using a multimethod comparative case study design that involved extensive direct observation of clinical encounters and office systems by field researchers who spent 4 weeks or more in each practice. Field researchers directly observed and dictated descriptions of approximately 30 patient encounters with each of the more than 50 clinicians. Details of the sampling and data collection are available elsewhere in this issue.11
Data Analysis and Interpretation
Encounters related to ART infection were identified in the database using search terms for symptoms and diagnoses including: sore throat, runny nose, congestion, cough, drainage, postnasal drainage, earache, cold, upper respiratory infection, pharyngitis, sinusitis, bronchitis, and otitis. ART infection was identified as the principal or associated diagnosis in 316 outpatient visits of a total of 1637 observed encounters; 298 had sufficiently rich data for analysis. The encounters were first coded for antibiotic use or nonuse.
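The term-based identification step described above can be illustrated with a minimal sketch. The search terms are those listed in the text; the matching rules, the function names, and the idea of treating each field note as a plain string are assumptions made for illustration, since the article does not describe the database or software that was actually used.

```python
import re

# Search terms exactly as listed in the text. How the original database
# matched them (case, word boundaries, stemming) is NOT described, so the
# rules below are assumptions.
ART_TERMS = [
    "sore throat", "runny nose", "congestion", "cough", "drainage",
    "postnasal drainage", "earache", "cold", "upper respiratory infection",
    "pharyngitis", "sinusitis", "bronchitis", "otitis",
]

# A leading word boundary keeps "cold" from matching inside "scolded",
# while still letting "cough" match "coughing".
ART_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in ART_TERMS) + r")",
    re.IGNORECASE,
)


def matched_terms(note: str) -> list[str]:
    """Return the distinct search terms found in one field note, sorted."""
    return sorted({m.group(0).lower() for m in ART_PATTERN.finditer(note)})


def is_candidate_art_encounter(note: str) -> bool:
    """Flag a note for manual review if any search term appears in it."""
    return bool(matched_terms(note))
```

A screen of this kind only produces candidates. In the study, the principal or associated diagnosis and the richness of the data were still judged for each encounter before it was included in the analysis.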
Before any qualitative analysis began, visits during which antibiotics were prescribed were further characterized as appropriate or unnecessary according to guidelines by the Centers for Disease Control and Prevention (CDC) for judicious use of antibiotics for children12 and adults13 (Table W1).* Two family physicians assigned appropriate/unnecessary codes independently. Inter-rater reliability was good (k=0.71). All disagreements were resolved by discussion.
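For readers unfamiliar with the agreement statistic, the following sketch shows how a kappa value is computed for 2 raters, assuming the reported k is Cohen's kappa (the article does not name the variant). The function name and the toy codes are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy codes (A = appropriate, U = unnecessary); NOT study data.
physician_1 = ["U", "U", "U", "A", "A", "U", "A", "U", "U", "A"]
physician_2 = ["U", "U", "A", "A", "A", "U", "A", "U", "U", "U"]
print(round(cohens_kappa(physician_1, physician_2), 2))
```

In the toy example the raters agree on 8 of 10 visits, but because 52% agreement would be expected by chance from their marginal frequencies, kappa is lower than the raw agreement rate.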
Subsequently, the text for each outpatient visit was read independently by 2 family physicians, a medical anthropologist, a nurse, and a communication specialist. This research team discussed individual encounters as a group to identify emerging patterns of physician-patient interaction.
Results
Women made up 59% of the study sample; 64% were 16 years or older (and classified as adults). Antibiotics were prescribed in 204 of the 298 ART infection encounters (68%). Antibiotic use was unnecessary according to the CDC guidelines in 164 of these (80%). Adults were more likely than children to receive unnecessary antibiotics (Table 1).
Our analysis identified 6 different types of patient behaviors that advocated for medication, particularly antibiotics (Table 2). These behaviors fell into 3 broad categories: explicit requests, presentation of chief complaint, and appeals to life-world circumstances. Multiple pressures were noted in many encounters.
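The 2 percentages above use different denominators, which the short check below makes explicit. The counts come from the text; the variable names are illustrative, and the third figure (unnecessary prescriptions as a share of all ART infection visits) is derived here by arithmetic and is not reported in the article.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


ART_VISITS = 298   # ART infection encounters analyzed
PRESCRIBED = 204   # encounters in which an antibiotic was prescribed
UNNECESSARY = 164  # prescriptions judged unnecessary under CDC guidelines

print(percent(PRESCRIBED, ART_VISITS))   # share of visits with an antibiotic
print(percent(UNNECESSARY, PRESCRIBED))  # share of prescriptions that were unnecessary
print(percent(UNNECESSARY, ART_VISITS))  # derived: share of all visits (not in the article)
```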
While patients occasionally made direct requests for antibiotics, they much more frequently positioned themselves indirectly for receiving antibiotic treatment by the way they presented the chief complaint. Four distinct approaches were identified: symptoms only,9 candidate diagnosis,9 implied candidate diagnosis,9 and portraying the severity and inability to shake the illness.
A second category of indirect approach used life-world circumstances10 (eg, an upcoming family vacation) or a past history with successful antibiotic treatment to formulate appeals for antibiotics in the current encounter. In those cases in which antibiotics were clearly unnecessary, physicians often rationalized their prescribing practices by finding symptoms or assigning diagnoses to justify antibiotic use. Each of these patient pressures, as well as the physician-rationalizing behavior, is illustrated with sample visits. The samples are taken directly from transcribed field notes, but the names have been altered to protect the identity of patients and clinicians.
Explicit Request
Explicit requests for antibiotics were observed in only 6% of cases (n=15). For example:
Claire asked the patient, “How are you doing?” and she said, “Well, I’m coughing up phlegm, I ache and I have chills and a sore throat.” Claire said, “You have bronchoconstriction, and 3 times a day, if you need to, you should use proventil.” The patient asked if she could have an antibiotic for her cold; cephalexin has worked in the past. Claire said that she would get her cephalexin and also some samples of an inhaler.
Presentation of the Chief Complaint
Patients frequently put pressure on the physician for treatment during the presentation of the chief complaint, the exception being the symptoms-only presentation. This is different from the other indirect pressures, which usually occurred during different parts of the medical encounter.
Symptoms-only presentation (eg, “I have a cough and a sore throat.”) In the symptoms-only approach (n=15), the patient reports his or her symptoms with little embellishment. This approach does not pressure physicians for antibiotic treatment.
Candidate diagnosis (eg, “I think I’ve got strep throat.”) In contrast, patients also presented their chief complaint to the physician by offering a candidate diagnosis (n=18). As shown in the following example, the patient responds by offering a diagnosis. This is a way of indirectly advocating for antibiotic treatment.
A 21-year-old white woman went to see Dr Maxwell with an acute problem of congestion. Dr Maxwell said, “Well, how are you doing?” The patient said, “It sounds like bronchitis. It started about 4 days ago.”
Implied candidate diagnosis (eg, “My throat hurts; it’s red; and it has white spots.”) The implied candidate diagnosis is a hybrid of the symptoms-only and the candidate diagnosis approaches (n=48). When presenting their chief complaint, patients reported very specific symptoms that indexed a particular diagnosis. For example:
A 29-year-old woman went to see Dr Redmond with swollen glands, congestion, and white spots on her throat. When Dr Redmond and I went into the examination room, the patient had a pink paper top on, and Dr Redmond told her that her throat culture was negative.
The patient reports that she has swollen glands, congestion, and white spots on her throat. The symptoms specifically index a particular condition (strep throat). The patient’s presentation of symptoms clearly implies a diagnosis of strep throat, and the physician ordered a strep culture before seeing the patient.
Candidate diagnoses and implied candidate diagnoses delicately assert that the nature of the patient’s problem is already known. The reason for the medical visit is to seek treatment for the patient’s already known condition. When candidate and implied candidate diagnoses point to a condition the patient believes to be treatable (eg, bronchitis, strep throat, ear infection), this way of presenting the chief complaint looks directly ahead to a treatment involving a prescription for an antibiotic and thus indirectly pressures the physician to prescribe one.
Portraying the severity of one’s illness (eg, “I can’t shake this, Doc.”) The most common strategy was for patients to subtly pressure physicians for medication by portraying the severity of their condition and their inability to shake the illness on their own (n=99). For example:
The patient was sitting up on the table, and right away he told Dr Lamont, “I just can’t shake it. I feel like the back of my throat has raw hamburger hanging in it.” Dr Lamont checked the patient’s throat well, and the patient said, “This has lasted 4 days and it has been getting worse today.” Dr Lamont checked the patient’s ears, glands, and lungs. “I’m going to give you a shot of penicillin, slow release. It’s some kind of an infection. It may be a virus.”
Portraying the severity of one’s illness may not in and of itself advocate for medication; however, portrayals of the severity of one’s condition were usually accompanied by other actions implicating the need for medication. By opening the encounter with the announcement “I just can’t shake it,” the patient implies that he needs help in getting well. This subtly suggests the need for a prescription medication to alleviate his sore throat. At the end of this visit, the patient receives an antibiotic shot.
Appeals to Nonmedical Circumstances
Patients also used nonmedical circumstances to advocate for medication. These behaviors tended to occur after the problem presentation in the encounter and either centered on some important event, such as a big examination or a trip out of town (n=16), or focused on a previous positive experience with antibiotics for themselves or a family member (n=39).
Appealing to life-world circumstance (eg, “But I’m going to Disney World.”) This patient uses an upcoming trip to make an appeal to the clinician to prescribe medication:
The patient is a 33-year-old man coming in with an acute problem of a sore throat. The patient stated that he had been trying to manage this on his own, but he was taking his wife and 2 children to Disney World at the end of the week and was becoming worried that he was still going to be sick and not able to enjoy a trip that they had saved so long for. He also told Dr Liam: “I know we’ll just get to Florida, and the kids will get sick, and then we’ll all be sick again.” Dr Liam said, “Well, we can have you bring them in, but then we’d be treating them for something that they haven’t gotten. Let me think about this a bit.” He does the rapid strep test, and it’s negative. Dr Liam reported the news of a negative strep test and said, “Many times we get a 50% false-negative, so I’m gonna go ahead and put you on an antibiotic and see if we can’t get you feeling better.” With this the patient said, “Well, what do you think I should do about my kids?” Dr Liam asked if the kids were seen in this clinic, and the patient responded that they had never been seen there before. Dr Liam said, “Well, I’ll go ahead and give you a script for erythromycin in case these kids get sick down in Florida. If they do, go ahead and give them the medicine; if they don’t, throw away the prescription.”
This case is interesting because once he is treated with an antibiotic, the patient uses the same argument to make an appeal for antibiotics for his children (both of whom have never been seen by this physician).
Previous positive experience with antibiotics (eg, “I got an antibiotic for this before.”). Patients also appealed to other nonmedical contingencies to advocate for antibiotic treatment. For example:
Our next patient was a 51-year-old woman complaining of a cold and laryngitis. The doctor asked the patient about her symptoms. The patient responded, saying that she had been taking medication during the end of December for the same symptoms; they had cleared after taking antibiotics, and now they were back again.
The patient indirectly makes an appeal for antibiotic treatment by stating that she received antibiotics in the past for the same symptoms that she has now.
Patients used several variations of this approach. These included stating that another physician prescribed an antibiotic for this illness in the past; that others in the family are sick with an illness for which they received antibiotics; that they have a history of illness for which antibiotics are regularly prescribed; and that they were recently taking an antibiotic for an illness that has not improved (with the idea that an antibiotic is needed again).
Effectiveness of Patient Pressures
Physicians prescribed an antibiotic unnecessarily in 80% of the encounters in which some patient pressure was observed. They seemed able to resist certain types of pressures better than others. Unnecessary antibiotics were prescribed for a smaller percentage of implied candidate diagnoses and candidate diagnoses and for a larger percentage of direct patient requests and previous positive experiences with antibiotics (Table 3).
Physicians’ Response to Prescribing an Unnecessary Antibiotic
When physicians prescribed an antibiotic unnecessarily, they often rationalized this practice by finding symptoms or assigning diagnoses that, to them, justified prescribing antibiotics. Physicians used various rationales, such as red throat or enlarged tonsils; severe, prolonged, or productive cough; yellow or green mucus; sinus tenderness on palpation; associated chronic disease; history of previous infection; and the desire to “cover” the patient “just in case.” None of these rationales are supported by evidence as correlating with bacterial infection. An example of this kind of rationalization follows:
This is a 20-year-old woman coming in with a complaint of a worsening cough. She said that her chest had a prickly, burning sensation, and it hurt to breathe. Dr Hart asked if she was able to bring anything up. She said that she really couldn’t. It was just a really terrible barky cough. Following the physical examination, the physician told the patient that her lungs basically sounded clear, but she could certainly hear some rough bronchial sounds. With this, she said, “What I think is happening here with your cold is that it is probably ending up in a bronchitis-type situation, and probably what we should do is put you on an antibiotic and order a decongestant.”
Discussion
This investigation, in agreement with the pediatric studies of Stivers,9,10 suggests that the connection between patient diagnosis and physician prescribing is highly complex, involving patient presentation and physician-patient communication as much as, if not more than, physician diagnostic skills. Also, these data suggest that physicians are better able to resist patient pressures that are framed in medical terms such as candidate diagnoses or implied candidate diagnoses but are much less able to resist pressures that are not medicalized, such as portraying severity of illness and use of life-world circumstances. Thus, it is not surprising that past interventions designed to increase physician knowledge regarding when to prescribe antibiotics have had limited success.14,15 Physicians appear to be trying to maximize patient satisfaction by giving antibiotic-seeking patients what they want. Our findings show the need to modify current thinking about the diagnostic and treatment process to reduce the use of antibiotics. Rather than thinking of these processes as physician controlled, the powerful role patients play in this interaction must be considered.
Our study has important implications for future research. From a methodologic standpoint, our findings illustrate the importance of qualitative evaluation of directly observed medical encounters. The patterns of patient behavior observed could not have been discerned using survey, interview, or focus group data.
Limitations
Because these data were collected by field researchers who were unaware that ART infection would be a focus of our study, it is possible that there were other patient symptoms and behavior related to ART infection, as well as physician behaviors related to antibiotic prescribing, that were not recorded. The data were sufficiently rich, however, to easily and reliably apply the CDC guidelines for appropriate use of antibiotics. Any unrecorded behaviors might add to, but not substantially change, our conclusions that patients indirectly pressure their physicians for treatment, and physicians respond by giving antibiotics. Studies using videotaped encounters might uncover such additional important patient and physician behaviors. Since the patient population studied was limited to a single midwestern state, it is possible that other populations with a different ethnic or racial mix might behave differently. Future research in this area should attempt to include such populations. Finally, too few encounters per physician were observed in this study to evaluate whether particular physicians were high or low prescribers (such a pattern has been reported by De Sutter and colleagues16).
Conclusions
Physicians should be educated about the subtle approaches patients use to pressure them for antibiotic treatment and should be shown techniques for responding to these pressures without prescribing antibiotics unnecessarily. Our findings also suggest the need to increase patients’ awareness both of the dangers and lack of effectiveness of using antibiotics for ART infections and of the amount of influence that patients have on antibiotic prescribing. Macfarlane and coworkers17 have shown that use of patient education materials reduces visits for ART infection. Additional approaches to decreasing patient pressure for antibiotic prescriptions are needed to diminish antibiotic overuse and its public health consequences.
Acknowledgments
Our study was funded by the Agency for Healthcare Research and Quality Grant R01 HS08776. Dr Scott is a postdoctoral fellow supported by the Health Resources and Services Administration (HRSA) PE1011 and the Agency for Healthcare Research and Quality (AHRQ) HS09788. Analysis of these data was supported by a Research Center grant from the American Academy of Family Physicians (Center for Research in Family Practice and Primary Care). Drs Jaen and Crabtree are associated with the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio. The authors wish to thank the family physicians of Nebraska who were willing to open their practices to us. We also thank Kurt C. Stange, MD, PhD, for his thoughtful comments on drafts of this manuscript.
Related Resources
U.S. Centers for Disease Control and Prevention—Promoting Appropriate Antibiotic Use in the Community http://www.cdc.gov/antibioticresistance/tools.htm
A vast collection of patient education resources.
1. Woodwell DA. National Ambulatory Medical Care Survey: 1996 summary. Adv Data 1997;305:1-25.
2. Mainous AG, 3rd, Hueston WJ. The cost of antibiotics in treating upper respiratory tract infections in a Medicaid population. Arch Fam Med 1998;7:45-49.
3. Seaton RA, Steinke DT, Phillips G, MacDonald T, Davey PG. Community antibiotic therapy, hospitalization and subsequent respiratory tract isolation of Haemophilus influenzae resistant to amoxicillin: a nested case-control study. J Antimicrob Chemother 2000;46:307-09.
4. Hueston WJ, Eberlein C, Johnson D, Mainous AG, 3rd. Criteria used by clinicians to differentiate sinusitis from viral upper respiratory tract infection. J Fam Pract 1998;46:487-92.
5. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
6. Britten N, Ukoumunne O. The influence of patients’ hopes of receiving a prescription on doctors’ perceptions and the decision to prescribe: a questionnaire survey. BMJ 1997;315:1506-10.
7. Macfarlane J, Holmes W, Macfarlane R, Britten N. Influence of patients’ expectations on antibiotic management of acute lower respiratory tract illness in general practice: questionnaire study. BMJ 1997;315:1211-14.
8. Mangione-Smith R, McGlynn EA, Elliott MN, Krogstad P, Brook RH. The relationship between perceived parental expectations and pediatrician antimicrobial prescribing behavior. Pediatrics 1999;103:711-18.
9. Stivers T. ‘Symptoms only’ versus ‘candidate diagnosis’ presentations: presenting the problem in pediatric encounters. Health Comm. In press.
10. Stivers T. Participating in decisions about treatment: overt parent pressure for antibiotic medication in pediatric encounters. Soc Sci Med. Submitted.
11. Crabtree BF, Miller WL, Stange KC. Understanding practice from the ground up. J Fam Pract 2001;50:880-87.
12. Dowell SF, Marcy SM, Phillips WR, Gerber MA, Schwartz B. Principles of judicious use of antimicrobial agents for pediatric upper respiratory tract infections. Pediatrics 1998;101:163-65.
13. Gonzales R, Bartlett JG, Besser RE, et al. Principles of appropriate antibiotic use for treatment of acute respiratory tract infections in adults: background, specific aims, and methods. Ann Intern Med 2001;134:479-86.
14. Mainous AG, 3rd, Hueston WJ, Love MM, Evans ME, Finger R. An evaluation of statewide strategies to reduce antibiotic overuse. Fam Med 2000;32:22-29.
15. Poses RM, Cebul RD, Wigton RS. You can lead a horse to water—improving physicians’ knowledge of probabilities may not affect their decisions. Med Decis Making 1995;15:65-75.
16. De Sutter AI, De Meyere MJ, De Maeseneer JM, Peersman WP. Antibiotic prescribing in acute infections of the nose or sinuses: a matter of personal habit? Fam Pract 2001;18:209-13.
17. Macfarlane JT, Holmes WF, Macfarlane RM. Reducing reconsultations for acute lower respiratory tract illness with an information leaflet: a randomized controlled study of patients in primary care. Br J Gen Pract 1997;47:719-22.
STUDY DESIGN: A multimethod comparative case study was performed including descriptive field notes of outpatient visits.
POPULATION: We included patients (children and adults) and clinicians in 18 purposefully selected family practices in a midwestern state. A total of 298 outpatient visits for acute respiratory tract (ART) infections were selected for analysis from more than 1600 encounters observed.
OUTCOMES MEASURED: Unnecessary antibiotic use and patterns of physician-patient communication were measured.
RESULTS: Antibiotics were prescribed in 68% of the ART infection visits, and of those, 80% were determined to be unnecessary according to Centers for Disease Control and Prevention guidelines. Patients were observed to pressure physicians for medication. The types of patterns identified were direct request, candidate diagnosis (a diagnosis suggested by the patient), implied candidate diagnosis (a set of symptoms specifically indexing a particular diagnosis), portraying severity of illness, appealing to life-world circumstances, and previous use of antibiotics. Also, clinicians were observed to rationalize their antibiotic prescriptions by reporting medically acceptable reasons and diagnoses to patients.
CONCLUSIONS: Patients strongly influence the antibiotic prescribing of physicians by using a number of different behaviors. To decrease antibiotic use for ART infections, patients should be educated about the dangers and limited benefits of such use, and clinicians should consider appropriate responses to these different patient pressures to prescribe antibiotics.
Acute respiratory tract (ART) infections, such as common cold, bronchitis, pharyngitis, sinusitis, and otitis media, are among the most common problems seen in primary care practice.1 Unnecessary use of antibiotics for these infections is a major worldwide problem both in terms of cost2 and as a contributor to the development of antibiotic-resistant bacteria.3
Although there is some evidence that physicians misdiagnose many viral infections as bacterial,4,5 recent studies suggest that the reasons for unnecessary antibiotic prescribing are more complex, having as much or more to do with patient and physician expectations as with physicians’ diagnostic skills.6-8 These studies are limited to describing perceptions of behavior rather than actual behavior, because of their use of interview and focus group data. Consequently, we do not know what actually happens during outpatient visits for ART infections that leads to antibiotic prescribing.
Two studies by Stivers9,10 underscore the importance of directly observing what transpires during encounters with pediatric ART infection patients. Stivers’ examination of videotaped visits found that, in some cases, parental pressure for antibiotics influenced the physician’s decision to prescribe. This finding has not been replicated, however, in family practice settings, where both adults and children are seen. We used direct observation of outpatient visits to family physicians for ART infections to analyze the effects of physician-patient communication on unnecessary antibiotic prescribing. By understanding the ways these communication patterns influence prescribing behavior, practicing family physicians can develop strategies to deliver more appropriate care for ART infections.
Methods
These data were collected as part of the Prevention & Competing Demands in Primary Care Study, which was an in-depth observational study begun in October 1996 and completed in August 1999 that examined the organizational and clinical structures and process of community-based family practices. Each of 18 purposefully selected practices was studied using a multimethod comparative case study design that involved extensive direct observation of clinical encounters and office systems by field researchers who spent 4 weeks or more in each practice. Field researchers directly observed and dictated descriptions of approximately 30 patient encounters with each of the more than 50 clinicians. Details of the sampling and data collection are available elsewhere in this issue.11
Data Analysis and Interpretation
Encounters related to ART infection were identified in the database using search terms for symptoms and diagnoses including: sore throat, runny nose, congestion, cough, drainage, postnasal drainage, earache, cold, upper respiratory infection, pharyngitis, sinusitis, bronchitis, and otitis. ART infection was identified as the principal or associated diagnosis in 316 outpatient visits of a total of 1637 observed encounters; 298 had sufficiently rich data for analysis. The encounters were first coded for antibiotic use or nonuse.
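The keyword screen described above can be illustrated with a minimal sketch. It assumes the field notes are available as plain-text strings; the function name `find_art_terms` and the sample notes are hypothetical and are not taken from the study database. Whole-word matching is an assumption made here so that a term such as "cold" does not match inside an unrelated word.

```python
import re

# Search terms listed in the text above.
ART_TERMS = [
    "sore throat", "runny nose", "congestion", "cough", "drainage",
    "postnasal drainage", "earache", "cold", "upper respiratory infection",
    "pharyngitis", "sinusitis", "bronchitis", "otitis",
]

# Longer terms come first so "postnasal drainage" is preferred over "drainage".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(ART_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_art_terms(note):
    """Return the sorted, lowercased search terms found in one encounter note."""
    return sorted({m.group(0).lower() for m in _PATTERN.finditer(note)})


# Hypothetical notes, for illustration only.
notes = [
    "Patient reports a sore throat and cough.",
    "Follow-up for hypertension; patient scolded for missed doses.",
]
flagged = [n for n in notes if find_art_terms(n)]
```

A screen like this only flags candidate encounters; in the study, each flagged visit was still reviewed to confirm ART infection as the principal or associated diagnosis.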
Before any qualitative analysis began, visits during which antibiotics were prescribed were further characterized as appropriate or unnecessary according to guidelines by the Centers for Disease Control and Prevention (CDC) for judicious use of antibiotics for children12 and adults13 (Table W1*). Two family physicians assigned appropriate/unnecessary codes independently. Inter-rater reliability was good (κ=0.71). All disagreements were resolved by discussion.
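For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of Cohen's kappa for two raters, assuming that is the statistic behind the reported value (the text does not name the variant). The ratings below are hypothetical and are not the study's data; the function name `cohens_kappa` is chosen for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of visits.")
    n = len(rater_a)
    # Proportion of visits on which the two raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical code throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical codes for 10 visits (U = unnecessary, A = appropriate).
rater_1 = ["U", "U", "U", "U", "U", "U", "A", "A", "A", "A"]
rater_2 = ["U", "U", "U", "U", "U", "A", "A", "A", "A", "U"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this hypothetical example the raters agree on 8 of 10 visits, but because some of that agreement would be expected by chance, kappa is lower than the raw agreement of 0.80.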
Subsequently, the text for each outpatient visit was read independently by 2 family physicians, a medical anthropologist, a nurse, and a communication specialist. This research team discussed individual encounters as a group to identify emerging patterns of physician-patient interaction.
Results
Women made up 59% of the study sample; 64% were 16 years or older (and classified as adults). Antibiotics were prescribed in 204 of the 298 ART infection encounters (68%). Antibiotic use was unnecessary according to the CDC guidelines in 164 of these (80%). Adults were more likely than children to receive unnecessary antibiotics (Table 1).
Our analysis identified 6 different types of patient behaviors that advocated for medication, particularly antibiotics (Table 2). These behaviors fell into 3 broad categories: explicit requests, presentation of chief complaint, and appeals to life-world circumstances. Multiple pressures were noted in many encounters.
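The percentages above follow directly from the reported counts. The short sketch below reproduces them; the helper name `percent` is chosen for illustration, and the final figure (unnecessary prescriptions as a share of all ART infection visits) is derived here from the reported counts rather than stated in the text.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


art_visits = 298    # ART infection encounters analyzed
prescribed = 204    # encounters in which an antibiotic was prescribed
unnecessary = 164   # prescriptions judged unnecessary under the CDC guidelines

prescribed_pct = percent(prescribed, art_visits)       # share of visits with a prescription
unnecessary_pct = percent(unnecessary, prescribed)     # share of prescriptions judged unnecessary
unnecessary_of_all = percent(unnecessary, art_visits)  # derived: share of all ART visits
```

Note that the 80% figure uses prescriptions, not visits, as its denominator; taken over all 298 visits, unnecessary prescribing occurred in roughly 55% of encounters.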
While patients occasionally made direct requests for antibiotics, they much more frequently positioned themselves indirectly for receiving antibiotic treatment by the way they presented the chief complaint. Four distinct approaches were identified: symptoms only,9 candidate diagnosis,9 implied candidate diagnosis,9 and portraying the severity and inability to shake the illness.
A second category of indirect approach used life-world circumstances10 (eg, an upcoming family vacation) or a past history with successful antibiotic treatment to formulate appeals for antibiotics in the current encounter. In those cases in which antibiotics were clearly unnecessary, physicians often rationalized their prescribing practices by finding symptoms or assigning diagnoses to justify antibiotic use. Each of these patient pressures, as well as the physician-rationalizing behavior, is illustrated with sample visits. The samples are taken directly from transcribed field notes, but the names have been altered to protect the identity of patients and clinicians.
Explicit Request
Explicit requests for antibiotics were observed in only 6% of cases (n=15). For example:
Claire asked the patient, “How are you doing?” and she said, “Well, I’m coughing up phlegm, I ache and I have chills and a sore throat.” Claire said, “You have bronchoconstriction, and 3 times a day, if you need to, you should use proventil.” The patient asked if she could have an antibiotic for her cold; cephalexin has worked in the past. Claire said that she would get her cephalexin and also some samples of an inhaler.
Presentation of the Chief Complaint
Patients frequently put pressure on the physician for treatment during the presentation of the chief complaint, the exception being the symptoms-only presentation. This is different from the other indirect pressures, which usually occurred during different parts of the medical encounter.
Symptoms-only presentation (eg, “I have a cough and a sore throat.”) In the symptoms-only approach (n=15), the patient reports his or her symptoms with little embellishment. This approach does not pressure physicians for antibiotic treatment.
Candidate diagnosis (eg, “I think I’ve got strep throat.”) In contrast, patients also presented their chief complaint to the physician by offering a candidate diagnosis (n=18). As shown in the following example, the patient responds by offering a diagnosis. This is a way of indirectly advocating for antibiotic treatment.
A 21-year-old white woman went to see Dr Maxwell with an acute problem of congestion. Dr Maxwell said, “Well, how are you doing?” The patient said, “It sounds like bronchitis. It started about 4 days ago.”
Implied candidate diagnosis (eg, “My throat hurts; it’s red; and it has white spots.”) The implied candidate diagnosis is a hybrid of the symptoms-only and the candidate diagnosis approaches (n=48). When presenting their chief complaint, patients reported very specific symptoms that indexed a particular diagnosis. For example:
A 29-year-old woman went to see Dr Redmond with swollen glands, congestion, and white spots on her throat. When Dr Redmond and I went into the examination room, the patient had a pink paper top on, and Dr Redmond told her that her throat culture was negative.
The patient reports that she has swollen glands, congestion, and white spots on her throat. The symptoms specifically index a particular condition (strep throat). The patient’s presentation of symptoms clearly implies a diagnosis of strep throat, and the physician ordered a strep culture before seeing the patient.
Candidate diagnoses and implied candidate diagnoses delicately assert that the nature of the patient’s problem is already known. The reason for the medical visit is to seek treatment for the patient’s already known condition. When candidate and implied candidate diagnoses point to a condition the patient believes to be treatable (eg, bronchitis, strep throat, ear infection), this way of presenting the chief complaint looks directly ahead to a treatment involving a prescription for an antibiotic and thus indirectly pressures the physician to prescribe one.
Portraying the severity of one’s illness (eg, “I can’t shake this, Doc.”) The most common strategy was for patients to subtly pressure physicians for medication by portraying the severity of their condition and their inability to shake the illness on their own (n=99). For example:
The patient was sitting up on the table, and right away he told Dr Lamont, “I just can’t shake it. I feel like the back of my throat has raw hamburger hanging in it.” Dr Lamont checked the patient’s throat well, and the patient said, “This has lasted 4 days and it has been getting worse today.” Dr Lamont checked the patient’s ears, glands, and lungs. “I’m going to give you a shot of penicillin, slow release. It’s some kind of an infection. It may be a virus.”
Portraying the severity of one’s illness may not in and of itself advocate for medication; however, portrayals of the severity of one’s condition were usually accompanied by other actions implicating the need for medication. By opening the encounter with the announcement “I just can’t shake it,” the patient implies that he needs help in getting well. This subtly suggests the need for a prescription medication to alleviate his sore throat. At the end of this visit, the patient receives an antibiotic shot.
Appeals to Nonmedical Circumstances
Patients also used nonmedical circumstances to advocate for medication. These behaviors tended to occur after the problem presentation in the encounter and either centered on some important event, such as a big examination or a trip out of town (n=16), or focused on a previous positive experience with antibiotics for themselves or a family member (n=39).
Appealing to life-world circumstance (eg, “But I’m going to Disney World.”) This patient uses an upcoming trip to make an appeal to the clinician to prescribe medication:
The patient is a 33-year-old man coming in with an acute problem of a sore throat. The patient stated that he had been trying to manage this on his own, but he was taking his wife and 2 children to Disney World at the end of the week and was becoming worried that he was still going to be sick and not able to enjoy a trip that they had saved so long for. He also told Dr Liam: “I know we’ll just get to Florida, and the kids will get sick, and then we’ll all be sick again.” Dr Liam said, “Well, we can have you bring them in, but then we’d be treating them for something that they haven’t gotten. Let me think about this a bit.” He does the rapid strep test, and it’s negative. Dr Liam reported the news of a negative strep test and said, “Many times we get a 50% false-negative, so I’m gonna go ahead and put you on an antibiotic and see if we can’t get you feeling better.” With this the patient said, “Well, what do you think I should do about my kids?” Dr Liam asked if the kids were seen in this clinic, and the patient responded that they had never been seen there before. Dr Liam said, “Well, I’ll go ahead and give you a script for erythromycin in case these kids get sick down in Florida. If they do, go ahead and give them the medicine; if they don’t, throw away the prescription.”
This case is interesting because once he is treated with an antibiotic, the patient uses the same argument to make an appeal for antibiotics for his children (neither of whom has ever been seen by this physician).
Previous positive experience with antibiotics (eg, “I got an antibiotic for this before.”). Patients also appealed to other nonmedical contingencies to advocate for antibiotic treatment. For example:
Our next patient was a 51-year-old woman complaining of a cold and laryngitis. The doctor asked the patient about her symptoms. The patient responded, saying that she had been taking medication during the end of December for the same symptoms; they had cleared after taking antibiotics, and now they were back again.
The patient indirectly makes an appeal for antibiotic treatment by stating that she received antibiotics in the past for the same symptoms that she has now.
Patients used several variations of this approach. These included stating that another physician prescribed an antibiotic for this illness in the past; that others in the family are sick with an illness for which they received antibiotics; that they have a history of illness for which antibiotics are regularly prescribed; and that they were recently taking an antibiotic for an illness that has not improved (with the idea that an antibiotic is needed again).
Effectiveness of Patient Pressures
Physicians prescribed an antibiotic unnecessarily in 80% of the encounters in which some patient pressure was observed. They seemed able to resist certain types of pressures better than others. Unnecessary antibiotics were prescribed for a smaller percentage of implied candidate diagnoses and candidate diagnoses and for a larger percentage of direct patient requests and previous positive experiences with antibiotics (Table 3).
STUDY DESIGN: A multimethod comparative case study was performed including descriptive field notes of outpatient visits.
POPULATION: We included patients (children and adults) and clinicians in 18 purposefully selected family practices in a midwestern state. A total of 298 outpatient visits for acute respiratory tract (ART) infections were selected for analysis from more than 1600 encounters observed.
OUTCOMES MEASURED: Unnecessary antibiotic use and patterns of physician-patient communication were measured.
RESULTS: Antibiotics were prescribed in 68% of the ART infection visits, and of those, 80% were determined to be unnecessary according to Centers for Disease Control and Prevention guidelines. Patients were observed to pressure physicians for medication. The types of patterns identified were direct request, candidate diagnosis (a diagnosis suggested by the patient), implied candidate diagnosis (a set of symptoms specifically indexing a particular diagnosis), portraying severity of illness, appealing to life-world circumstances, and previous use of antibiotics. Also, clinicians were observed to rationalize their antibiotic prescriptions by reporting medically acceptable reasons and diagnoses to patients.
CONCLUSIONS: Patients strongly influence the antibiotic prescribing of physicians by using a number of different behaviors. To decrease antibiotic use for ART infections, patients should be educated about the dangers and limited benefits of such use, and clinicians should consider appropriate responses to these different patient pressures to prescribe antibiotics.
Acute respiratory tract (ART) infections, such as common cold, bronchitis, pharyngitis, sinusitis, and otitis media, are among the most common problems seen in primary care practice.1 Unnecessary use of antibiotics for these infections is a major worldwide problem both in terms of cost2 and as a contributor to the development of antibiotic-resistant bacteria.3
Although there is some evidence that physicians misdiagnose many viral infections as bacterial,4,5 recent studies suggest that the reasons for unnecessary antibiotic prescribing are more complex, having as much or more to do with patient and physician expectations as with physicians’ diagnostic skills.6-8 These studies are limited to describing perceptions of behavior rather than actual behavior, because of their use of interview and focus group data. Consequently, we do not know what actually happens during outpatient visits for ART infections that leads to antibiotic prescribing.
Two studies by Stivers9,10 underscore the importance of directly observing what transpires during encounters with pediatric ART infection patients. Stivers’ examination of videotaped visits found that, in some cases, parental pressure for antibiotics influenced the physician’s decision to prescribe. This finding has not been replicated, however, in family practice settings, where both adults and children are seen. We used direct observation of outpatient visits to family physicians for ART infections to analyze the effects of physician-patient communication on unnecessary antibiotic prescribing. By understanding the ways these communication patterns influence prescribing behavior, practicing family physicians can develop strategies to deliver more appropriate care for ART infections.
Methods
These data were collected as part of the Prevention & Competing Demands in Primary Care Study, which was an in-depth observational study begun in October 1996 and completed in August 1999 that examined the organizational and clinical structures and process of community-based family practices. Each of 18 purposefully selected practices was studied using a multimethod comparative case study design that involved extensive direct observation of clinical encounters and office systems by field researchers who spent 4 weeks or more in each practice. Field researchers directly observed and dictated descriptions of approximately 30 patient encounters with each of the more than 50 clinicians. Details of the sampling and data collection are available elsewhere in this issue.11
Data Analysis and Interpretation
Encounters related to ART infection were identified in the database using search terms for symptoms and diagnoses including: sore throat, runny nose, congestion, cough, drainage, postnasal drainage, earache, cold, upper respiratory infection, pharyngitis, sinusitis, bronchitis, and otitis. ART infection was identified as the principal or associated diagnosis in 316 outpatient visits of a total of 1637 observed encounters; 298 had sufficiently rich data for analysis. The encounters were first coded for antibiotic use or nonuse.
Before any qualitative analysis began, visits during which antibiotics were prescribed were further characterized as appropriate or unnecessary according to guidelines by the Centers for Disease Control and Prevention (CDC) for judicious use of antibiotics for children12 and adults13 (Table W1).* Two family physicians assigned appropriate/unnecessary codes independently. Inter-rater reliability was good (k=0.71). All disagreements were resolved by discussion.
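For readers unfamiliar with the kappa statistic reported above, the following is a minimal sketch of how Cohen's kappa is computed for two raters. The function name and the four example codes are hypothetical; the study's per-visit codes are not given in the text, so this example does not reproduce the reported value.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four visits (not data from the study).
rater_1 = ["unnecessary", "unnecessary", "appropriate", "appropriate"]
rater_2 = ["unnecessary", "appropriate", "appropriate", "appropriate"]
kappa = cohen_kappa(rater_1, rater_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage when one code is much more common than the other.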
Subsequently, the text for each outpatient visit was read independently by 2 family physicians, a medical anthropologist, a nurse, and a communication specialist. This research team discussed individual encounters as a group to identify emerging patterns of physician-patient interaction.
Results
Women made up 59% of the study sample; 64% were 16 years or older (and classified as adults). Antibiotics were prescribed in 204 of the 298 ART infection encounters (68%). Antibiotic use was unnecessary according to the CDC guidelines in 164 of these (80%). Adults were more likely than children to receive unnecessary antibiotics (Table 1).
Our analysis identified 6 different types of patient behaviors that advocated for medication, particularly antibiotics (Table 2). These behaviors fell into 3 broad categories: explicit requests, presentation of chief complaint, and appeals to life-world circumstances. Multiple pressures were noted in many encounters.
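The two percentages above use different denominators, which is easy to misread. A short check of the arithmetic, using only the counts reported in the text (the variable names are illustrative, and the last figure is derived here rather than reported by the authors):

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * part / whole)


analyzed = 298      # ART infection encounters with sufficiently rich data
prescribed = 204    # encounters in which an antibiotic was prescribed
unnecessary = 164   # prescriptions judged unnecessary under CDC guidelines

pct_prescribed = percent(prescribed, analyzed)                  # denominator: all analyzed encounters
pct_unnecessary_of_prescribed = percent(unnecessary, prescribed)  # denominator: prescriptions only
# Derived here, not reported in the text: unnecessary prescriptions
# as a share of all analyzed encounters.
pct_unnecessary_of_all = percent(unnecessary, analyzed)
```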
While patients occasionally made direct requests for antibiotics, they much more frequently positioned themselves indirectly for receiving antibiotic treatment by the way they presented the chief complaint. Four distinct approaches were identified: symptoms only,9 candidate diagnosis,9 implied candidate diagnosis,9 and portraying the severity and inability to shake the illness.
A second category of indirect approach used life-world circumstances10 (eg, an upcoming family vacation) or a past history with successful antibiotic treatment to formulate appeals for antibiotics in the current encounter. In those cases in which antibiotics were clearly unnecessary, physicians often rationalized their prescribing practices by finding symptoms or assigning diagnoses to justify antibiotic use. Each of these patient pressures, as well as the physician-rationalizing behavior, is illustrated with sample visits. The samples are taken directly from transcribed field notes, but the names have been altered to protect the identity of patients and clinicians.
Explicit Request
Explicit requests for antibiotics were observed in only 6% of cases (n=15). For example:
Claire asked the patient, “How are you doing?” and she said, “Well, I’m coughing up phlegm, I ache and I have chills and a sore throat.” Claire said, “You have bronchoconstriction, and 3 times a day, if you need to, you should use proventil.” The patient asked if she could have an antibiotic for her cold; cephalexin has worked in the past. Claire said that she would get her cephalexin and also some samples of an inhaler.
Presentation of the Chief Complaint
Patients frequently put pressure on the physician for treatment during the presentation of the chief complaint, the exception being the symptoms-only presentation. This sets these pressures apart from the other indirect pressures, which usually occurred during other parts of the medical encounter.
Symptoms-only presentation (eg, “I have a cough and a sore throat.”) In the symptoms-only approach (n=15), the patient reports his or her symptoms with little embellishment. This approach does not pressure physicians for antibiotic treatment.
Candidate diagnosis (eg, “I think I’ve got strep throat.”) In contrast, patients also presented their chief complaint to the physician by offering a candidate diagnosis (n=18). As shown in the following example, the patient responds by offering a diagnosis. This is a way of indirectly advocating for antibiotic treatment.
A 21-year-old white woman went to see Dr Maxwell with an acute problem of congestion. Dr Maxwell said, “Well, how are you doing?” The patient said, “It sounds like bronchitis. It started about 4 days ago.”
Implied candidate diagnosis (eg, “My throat hurts; it’s red; and it has white spots.”) The implied candidate diagnosis is a hybrid of the symptoms-only and the candidate diagnosis approaches (n=48). When presenting their chief complaint, patients reported very specific symptoms that indexed a particular diagnosis. For example:
A 29-year-old woman went to see Dr Redmond with swollen glands, congestion, and white spots on her throat. When Dr Redmond and I went into the examination room, the patient had a pink paper top on, and Dr Redmond told her that her throat culture was negative.
The patient reports that she has swollen glands, congestion, and white spots on her throat. The symptoms specifically index a particular condition (strep throat). The patient’s presentation of symptoms clearly implies a diagnosis of strep throat, and the physician ordered a strep culture before seeing the patient.
Candidate diagnoses and implied candidate diagnoses delicately assert that the nature of the patient’s problem is already known. The reason for the medical visit is to seek treatment for the patient’s already known condition. When candidate and implied candidate diagnoses point to a condition the patient believes to be treatable (eg, bronchitis, strep throat, ear infection), this way of presenting the chief complaint looks directly ahead to a treatment involving a prescription for an antibiotic and thus indirectly pressures the physician to prescribe one.
Portraying the severity of one’s illness (eg, “I can’t shake this, Doc.”) The most common strategy was for patients to subtly pressure physicians for medication by portraying the severity of their condition and their inability to shake the illness on their own (n=99). For example:
The patient was sitting up on the table, and right away he told Dr Lamont, “I just can’t shake it. I feel like the back of my throat has raw hamburger hanging in it.” Dr Lamont checked the patient’s throat well, and the patient said, “This has lasted 4 days and it has been getting worse today.” Dr Lamont checked the patient’s ears, glands, and lungs. “I’m going to give you a shot of penicillin, slow release. It’s some kind of an infection. It may be a virus.”
Portraying the severity of one’s illness may not in and of itself advocate for medication; however, portrayals of the severity of one’s condition were usually accompanied by other actions implicating the need for medication. By opening the encounter with the announcement “I just can’t shake it,” the patient implies that he needs help in getting well. This subtly suggests the need for a prescription medication to alleviate his sore throat. At the end of this visit, the patient receives an antibiotic shot.
Appeals to Nonmedical Circumstances
Patients also used nonmedical circumstances to advocate for medication. These behaviors tended to occur after the problem presentation in the encounter and either centered on some important event, such as a big examination or a trip out of town (n=16), or focused on a previous positive experience with antibiotics for themselves or a family member (n=39).
Appealing to life-world circumstance (eg, “But I’m going to Disney World.”) This patient uses an upcoming trip to make an appeal to the clinician to prescribe medication:
The patient is a 33-year-old man coming in with an acute problem of a sore throat. The patient stated that he had been trying to manage this on his own, but he was taking his wife and 2 children to Disney World at the end of the week and was becoming worried that he was still going to be sick and not able to enjoy a trip that they had saved so long for. He also told Dr Liam: “I know we’ll just get to Florida, and the kids will get sick, and then we’ll all be sick again.” Dr Liam said, “Well, we can have you bring them in, but then we’d be treating them for something that they haven’t gotten. Let me think about this a bit.” He does the rapid strep test, and it’s negative. Dr Liam reported the news of a negative strep test and said, “Many times we get a 50% false-negative, so I’m gonna go ahead and put you on an antibiotic and see if we can’t get you feeling better.” With this the patient said, “Well, what do you think I should do about my kids?” Dr Liam asked if the kids were seen in this clinic, and the patient responded that they had never been seen there before. Dr Liam said, “Well, I’ll go ahead and give you a script for erythromycin in case these kids get sick down in Florida. If they do, go ahead and give them the medicine; if they don’t, throw away the prescription.”
This case is interesting because once he is treated with an antibiotic, the patient uses the same argument to make an appeal for antibiotics for his children (both of whom have never been seen by this physician).
Previous positive experience with antibiotics (eg, “I got an antibiotic for this before.”) Patients also appealed to other nonmedical contingencies to advocate for antibiotic treatment. For example:
Our next patient was a 51-year-old woman complaining of a cold and laryngitis. The doctor asked the patient about her symptoms. The patient responded, saying that she had been taking medication during the end of December for the same symptoms; they had cleared after taking antibiotics, and now they were back again.
The patient indirectly makes an appeal for antibiotic treatment by stating that she received antibiotics in the past for the same symptoms that she has now.
Patients used several variations of this approach. These included stating that another physician prescribed an antibiotic for this illness in the past; that others in the family are sick with an illness for which they received antibiotics; that they have a history of illness for which antibiotics are regularly prescribed; and that they were recently taking an antibiotic for an illness that has not improved (with the idea that an antibiotic is needed again).
Effectiveness of Patient Pressures
Physicians prescribed an antibiotic unnecessarily in 80% of the encounters in which some patient pressure was observed. They seemed able to resist certain types of pressures better than others. Unnecessary antibiotics were prescribed for a smaller percentage of implied candidate diagnoses and candidate diagnoses and for a larger percentage of direct patient requests and previous positive experiences with antibiotics (Table 3).
Physicians’ Response to Prescribing an Unnecessary Antibiotic
When physicians prescribed an antibiotic unnecessarily, they often rationalized this practice by finding symptoms or assigning diagnoses that, to them, justified prescribing antibiotics. Physicians used various rationales, such as red throat or enlarged tonsils; severe, prolonged, or productive cough; yellow or green mucus; sinus tenderness on palpation; associated chronic disease; history of previous infection; and the desire to “cover” the patient “just in case.” None of these rationales are supported by evidence as correlating with bacterial infection. An example of this kind of rationalization follows:
This is a 20-year-old woman coming in with a complaint of a worsening cough. She said that her chest had a prickly, burning sensation, and it hurt to breathe. Dr Hart asked if she was able to bring anything up. She said that she really couldn’t. It was just a really terrible barky cough. Following the physical examination, the physician told the patient that her lungs basically sounded clear, but she could certainly hear some rough bronchial sounds. With this, she said, “What I think is happening here with your cold is that it is probably ending up in a bronchitis-type situation, and probably what we should do is put you on an antibiotic and order a decongestant.”
Discussion
This investigation, in agreement with the pediatric studies of Stivers,9,10 suggests that the connection between patient diagnosis and physician prescribing is highly complex, involving patient presentation and physician-patient communication as much as, if not more than, physician diagnostic skills. Also, these data suggest that physicians are better able to resist patient pressures that are framed in medical terms such as candidate diagnoses or implied candidate diagnoses but are much less able to resist pressures that are not medicalized, such as portraying severity of illness and use of life-world circumstances. Thus, it is not surprising that past interventions designed to increase physician knowledge regarding when to prescribe antibiotics have had limited success.14,15 Physicians appear to be trying to maximize patient satisfaction by giving antibiotic-seeking patients what they want. Our findings show the need to modify current thinking about the diagnostic and treatment process to reduce the use of antibiotics. Rather than thinking of these processes as physician controlled, the powerful role patients play in this interaction must be considered.
Our study has important implications for future research. From a methodologic standpoint, our findings illustrate the importance of qualitative evaluation of directly observed medical encounters. The patterns of patient behavior observed could not have been discerned using survey, interview, or focus group data.
Limitations
Because these data were collected by field researchers who were unaware that ART infection would be a focus of our study, it is possible that there were other patient symptoms and behavior related to ART infection, as well as physician behaviors related to antibiotic prescribing, that were not recorded. The data were sufficiently rich, however, to easily and reliably apply the CDC guidelines for appropriate use of antibiotics. Any unrecorded behaviors might add to, but not substantially change, our conclusions that patients indirectly pressure their physicians for treatment, and physicians respond by giving antibiotics. Studies using videotaped encounters might uncover such additional important patient and physician behaviors. Since the patient population studied was limited to a single midwestern state, it is possible that other populations with a different ethnic or racial mix might behave differently. Future research in this area should attempt to include such populations. Finally, too few encounters per physician were observed in this study to evaluate whether particular physicians were high or low prescribers (such a pattern has been reported by De Sutter and colleagues16).
Conclusions
Physicians should be educated about the subtle approaches patients use to pressure them for antibiotic treatment and should be shown techniques for responding to these pressures without prescribing antibiotics unnecessarily. Our findings also suggest the need to increase patients’ awareness both of the dangers and lack of effectiveness of using antibiotics for ART infections and of the amount of influence that patients have on antibiotic prescribing. Macfarlane and coworkers17 have shown that use of patient education materials reduces visits for ART infection. Additional approaches to decreasing patient pressure for antibiotic prescriptions are needed to diminish antibiotic overuse and its public health consequences.
Acknowledgments
Our study was funded by the Agency for Healthcare Research and Quality Grant R01 HS08776. Dr Scott is a postdoctoral fellow supported by the Health Resources and Services Administration (HRSA) PE1011 and the Agency for Healthcare Research and Quality (AHRQ) HS09788. Analysis of these data was supported by a Research Center grant from the American Academy of Family Physicians (Center for Research in Family Practice and Primary Care). Drs Jaen and Crabtree are associated with the Center for Research in Family Practice and Primary Care, Cleveland, New Brunswick, Allentown, and San Antonio. The authors wish to thank the family physicians of Nebraska who were willing to open their practices to us. We also thank Kurt C. Stange, MD, PhD, for his thoughtful comments on drafts of this manuscript.
Related Resources
U.S. Centers for Disease Control and Prevention—Promoting Appropriate Antibiotic Use in the Community http://www.cdc.gov/antibioticresistance/tools.htm
A large collection of patient education resources.
1. Woodwell DA. National Ambulatory Medical Care Survey: 1996 summary. Adv Data 1997;305:1-25.
2. Mainous AG, 3rd, Hueston WJ. The cost of antibiotics in treating upper respiratory tract infections in a Medicaid population. Arch Fam Med 1998;7:45-49.
3. Seaton RA, Steinke DT, Phillips G, MacDonald T, Davey PG. Community antibiotic therapy, hospitalization and subsequent respiratory tract isolation of Haemophilus influenzae resistant to amoxicillin: a nested case-control study. J Antimicrob Chemother 2000;46:307-09.
4. Hueston WJ, Eberlein C, Johnson D, Mainous AG, 3rd. Criteria used by clinicians to differentiate sinusitis from viral upper respiratory tract infection. J Fam Pract 1998;46:487-92.
5. Oeffinger KC, Snell LM, Foster BM, Panico KG, Archer RK. Diagnosis of acute bronchitis in adults: a national survey of family physicians. J Fam Pract 1997;45:402-09.
6. Britten N, Ukoumunne O. The influence of patients’ hopes of receiving a prescription on doctors’ perceptions and the decision to prescribe: a questionnaire survey. BMJ 1997;315:1506-10.
7. Macfarlane J, Holmes W, Macfarlane R, Britten N. Influence of patients’ expectations on antibiotic management of acute lower respiratory tract illness in general practice: questionnaire study. BMJ 1997;315:1211-14.
8. Mangione-Smith R, McGlynn EA, Elliott MN, Krogstad P, Brook RH. The relationship between perceived parental expectations and pediatrician antimicrobial prescribing behavior. Pediatrics 1999;103:711-18.
9. Stivers T. ‘Symptoms only’ versus ‘candidate diagnosis’ presentations: presenting the problem in pediatric encounters. Health Comm. In press.
10. Stivers T. Participating in decisions about treatment: overt parent pressure for antibiotic medication in pediatric encounters. Soc Sci Med. Submitted.
11. Crabtree BF, Miller WL, Stange KC. Understanding practice from the ground up. J Fam Pract 2001;50:880-87.
12. Dowell SF, Marcy SM, Phillips WR, Gerber MA, Schwartz B. Principles of judicious use of antimicrobial agents for pediatric upper respiratory tract infections. Pediatrics 1998;101:163-65.
13. Gonzales R, Bartlett JG, Besser RE, et al. Principles of appropriate antibiotic use for treatment of acute respiratory tract infections in adults: background, specific aims, and methods. Ann Intern Med 2001;134:479-86.
14. Mainous AG, 3rd, Hueston WJ, Love MM, Evans ME, Finger R. An evaluation of statewide strategies to reduce antibiotic overuse. Fam Med 2000;32:22-29.
15. Poses RM, Cebul RD, Wigton RS. You can lead a horse to water—improving physicians’ knowledge of probabilities may not affect their decisions. Med Decis Making 1995;15:65-75.
16. De Sutter AI, De Meyere MJ, De Maeseneer JM, Peersman WP. Antibiotic prescribing in acute infections of the nose or sinuses: a matter of personal habit? Fam Pract 2001;18:209-13.
17. Macfarlane JT, Holmes WF, Macfarlane RM. Reducing reconsultations for acute lower respiratory tract illness with an information leaflet: a randomized controlled study of patients in primary care. Br J Gen Pract 1997;47:719-22.