Sleeping Position: Change in Practice, Advice, and Opinion in the Newborn Nursery
STUDY DESIGN: We conducted telephone interviews with the head nurses in all of the newborn nurseries in Missouri.
POPULATION: In 1992 there were 79 hospitals in Missouri with newborn nurseries; in 1999 that number had decreased to 75.
OUTCOMES MEASURED: During the interviews, we solicited nursery infant sleep position policy and practice, head nurses’ opinions about the supine sleep recommendation, and nurses’ advice to parents regarding sleep position.
RESULTS: In 1992, 32% of the nurseries used the prone position for sleep, and 58% of the head nurses interviewed disagreed with the recommendations of the American Academy of Pediatrics (AAP). By 1999, all newborn nurseries in Missouri placed infants on their backs or sides for sleep. The rate of disagreement with the AAP recommendation had decreased, with 25% of respondents indicating that they disagreed.
CONCLUSIONS: From 1992 to 1999 nurseries in Missouri changed from predominantly using prone and lateral positioning to using lateral and supine positioning for newborns. Some nurses continue to voice concern about placing infants on their backs and express a willingness to place babies prone. Since there is agreement between nurses’ usual infant positioning and the advice given to parents, and because both are important influences on infant positioning by parents, future campaigns to decrease SIDS should emphasize correcting nurses’ positioning behavior and advising parents to increase infant supine positioning.
Sudden infant death syndrome (SIDS) is the leading cause of postneonatal infant mortality in the United States, accounting for approximately one third of all such deaths.1 Between 1992 and 1996 the rate of SIDS deaths in the United States declined from 1.2 per 1000 live births to 0.74 per 1000 live births, and this decline accounted for 75% of the decline in the postneonatal infant death rate.2
By 1990 a strong association between the prone sleeping position and SIDS had been established,3 and in 1992 the American Academy of Pediatrics (AAP) Task Force on Infant Positioning and SIDS recommended that all full-term infants be placed in the lateral or supine position for sleep.4,5 The Back to Sleep campaign was launched in 1994 by the US Department of Health and Human Services and other partners to help disseminate the message that back sleeping can reduce the risk of SIDS and save lives.2,6 Studies in other countries indicated that SIDS rates declined concurrent with decreases in the prevalence of prone sleeping.3 Between 1992 and 1995 the SIDS rate in the United States declined 30%, while the prevalence of prone sleeping decreased from 78% to 43%.1 Since the initial AAP recommendations, additional evidence has accumulated supporting supine or side positioning7,8 and, more recently, showing that side positioning carries a higher risk for SIDS than supine positioning.7
The recommendation of a health care professional and observation of sleep position in the hospital have both been shown to be important determinants in parents’ decision making about infant sleep position.2,9,10 Although there are several studies of sleep positioning by caregivers and of sleep position recommendations by health care providers, there is only one study of nursery nursing staff regarding infant sleep position.2,9,11,12 The National Institute of Child Health and Human Development has conducted a survey of health professionals regarding infant sleep position since 1993, but the results have not been published.9 Our study was initiated to assess the infant sleep position policies and practices of newborn nurseries in Missouri and evaluate nursery staff opinions of the AAP recommendation shortly after the recommendation was released in 1992 and again in 1999.
Methods
Study Design
A nurse interviewer conducted a telephone survey of all newborn nurseries in Missouri in 1992 and 1999. The Missouri Department of Health hospital profile database was used to identify hospitals with newborn nurseries before both surveys.13 An experienced obstetric nurse clinician contacted each hospital newborn nursery, spoke to the head nurse or charge nurse, and invited that nurse to participate in a short survey on infant sleep position.
After agreeing to participate, respondents were asked 10 questions about the policy and practice for infant sleep positioning in the nursery, what position they advise parents to use on discharge, and their opinion of the AAP recommendation. To maintain consistency between the 1992 and 1999 surveys, the 1999 survey was modified only to update date references and to delete a question about any recent changes to the nursery’s sleep position policy.
The opinion question was recorded as a narrative response during the interview. For data analysis, the responses were coded into 4 categories: agree, disagree, no opinion, or other (Table 1). Three of the authors (J.E.D., R.L.P., P.G.S.) independently coded the opinion statements from the 1992 and 1999 surveys, and any discrepancies were resolved by consensus. The same coding criteria were used for both surveys. The data were analyzed using SAS software (SAS Institute, Cary, NC).
Our study was granted an exemption by the institutional review board at the University of Missouri–Columbia.
Results
In 1992 there were 79 hospitals with newborn nurseries in Missouri, and by 1999 this number had decreased to 75; however, the average number of nursery beds per hospital remained relatively constant with 16 in 1992 (median=12) and 15 in 1999 (median=12). For both surveys all hospitals with newborn nurseries in Missouri were contacted and agreed to participate.
In 1992, 92% of the head nurses were aware of the AAP recommendation for back or side sleeping position. By 1999, all were aware of the recommendation; however, the percentage of nurseries with an infant sleep position policy decreased from 98% to 95%.
Marked changes occurred in the infant sleep position used in these nurseries in the 7 years between surveys. In 1992, 32% of the survey respondents reported that their usual practice or policy was to use the prone position exclusively or to use either the prone or the side position for sleep. In 1999, none of the respondents reported using the prone position as usual practice (Table 2). Reported use of the supine position and of both supine and lateral position increased dramatically between 1992 and 1999, while exclusive use of side positioning decreased. Several of the nurses stated that they still place some babies on their stomach in the nursery and justified this by stating that they do not tell the parents or they watch the babies closely.
The positioning advice given to parents changed from 1992 to 1999, with more hospitals advising use of supine position exclusively or both the supine and the lateral position. In 1999, no nursery staff advised parents to place their babies prone. The position respondents stated that they used in the nursery and that they advised parents at discharge were highly correlated in 1999 but less so in 1992. The percentage agreement between the position used in the nursery and the advice given to parents in 1992 was 68% (κ=0.52) and had increased to 75% (κ=0.57) in 1999.
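To illustrate how a percentage agreement and a kappa statistic of this kind are related, the following minimal sketch computes both from a cross-tabulation of position used against position advised. It assumes the unweighted Cohen's kappa, which the text does not specify, and the counts are hypothetical values chosen only for illustration; they are not the study's data, and the function name `agreement_and_kappa` is our own.

```python
def agreement_and_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square table.

    table[i][j] = number of nurseries using position i in the nursery
    and advising position j to parents at discharge.
    """
    size = len(table)
    n = sum(sum(row) for row in table)

    # Observed agreement: share of nurseries on the diagonal.
    p_observed = sum(table[i][i] for i in range(size)) / n

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / n**2

    # Kappa is undefined when chance agreement equals 1.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# HYPOTHETICAL counts, not taken from the study.
# Categories (rows and columns): supine only, supine or side, side only.
example_table = [
    [10, 4, 1],
    [5, 35, 5],
    [1, 4, 10],
]

p_observed, kappa = agreement_and_kappa(example_table)
print(f"percentage agreement = {p_observed:.0%}, kappa = {kappa:.2f}")
```

Kappa is lower than the raw percentage agreement because it discounts the agreement that would be expected by chance given how often each position is used and advised.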
The proportion of respondents who agreed with the AAP recommendation increased between 1992 and 1999, while the proportion disagreeing decreased. Even so, in 1999 25% of the respondents stated opinions disagreeing with the recommendation.
The nurses who disagreed with the AAP recommendation in 1992 were more likely to state use of the prone position was their usual practice than respondents who agreed (68% vs 0%). In 1999 none of the respondents reported that prone position was their usual practice, and no respondent who disagreed with the recommendation used the supine position exclusively. Conversely, 36% of 1999 respondents who agreed with the recommendation placed babies exclusively on their backs (data not shown).
Discussion
This is the first published study that reports how newborn nurseries have implemented the AAP recommendations for infant sleep positioning. Our 2 surveys of head nurses in all newborn nurseries in Missouri show a significant change in nursery practice and nursing advice since the publication of the AAP statement on infant sleep positioning in 1992.
We were surprised to learn that 4 nurseries reported no infant sleep position policies or standard practice, but were encouraged by the overall decline in prone and exclusively side positioning. In 1992, 84% of Missouri nurseries were routinely placing babies prone or on their side. Both of these positions are of concern: Side positioning is the least stable, and both prone and side positioning are associated with an increased risk of SIDS.2,7,8 In 1999, 78% of head nurses reported back positioning, or back or side positioning, as their standard practice, and only 23% still used side positioning as their standard practice. Head nurses report that prone positioning is no longer a standard practice in any Missouri nursery, but some nurses indicate a willingness to make exceptions for “spitty” babies and immediately after feeding.
Nursery advice to parents about positioning has also changed. In 1992, 80% of head nurses advised parents to place their babies either on their sides or prone. In 1999, 8% still advised side; 20% recommended supine; and 72% recommended either back or side. No head nurses reported that placing babies prone for sleep was standard advice.
The overall change in head nurse opinions of the AAP recommendations is also encouraging. Between 1992 and 1999, head nurses who disagreed with the AAP recommendations declined from 63% to 25%. This opinion conversion is critical, because nurses who disagreed with the AAP recommendations did not use supine positioning as their standard practice. In 1999 there was 75% agreement between what nurses did and what they advised.
Changing behavior is difficult, but public policies can lead to change.14 There has been considerable research devoted to the effect that clinical guidelines have on practitioner behavior.15 Several barriers to behavioral change have been identified,16 including lack of familiarity with, awareness of, and agreement with the guideline. Our study demonstrates the effectiveness of a clinical policy and a public campaign to change clinical behavior and identifies targets for future educational programs. We believe the Back to Sleep Program was successful due to a multifaceted approach that included the general press, professional society outreach, nursing and physician involvement, and education of parents. The diversity of influences applied in this campaign offers a model for eliciting specific behavioral change by clinicians and patients. Even so, our findings suggest that some clinicians may still cling to established behaviors, and changing these behaviors may require specific targeted actions.
Newborn nursery nurses have an important role in influencing infant sleep positioning at home. There is increasing evidence that the advice parents receive and the infant sleep positioning they observe while in the hospital are important for what they do at home. A study of inner city mothers found that the most important determinant of intended and actual home sleep positioning was the mothers’ observation of the sleep position used in the hospital. These mothers observed their babies in prone positions 14% to 17% of the time in the newborn nursery, despite hospital policies regarding side or supine positioning in all 3 participating hospitals.9 This finding is of concern, because in our study nurses who disagreed with AAP recommendations made exceptions and positioned some babies prone. So, even though prone positioning is no longer standard practice, it is still used in some nurseries and may be witnessed by parents.
Lesko and colleagues17 found that advice from a health care professional had the most important influence on a mother’s decision to use nonprone sleep positions at 1 month. Gibson and coworkers18 found that nearly half of parents in suburban and inner city clinics reported that health professional advice influenced how they positioned their infants. The Centers for Disease Control and Prevention (CDC)19 recently cited this evidence in recommending that outreach programs to influence infant sleep position should consider the role of advice from health professionals. This should reinforce the important role that family physicians have in recommending supine positioning over all other sleep positions. We believe that the change in advice given to parents demonstrated in our study has an important effect on home infant sleep positioning. The strong correlation found between position used and advice given may indicate that nurses who make exceptions for prone or side positioning may also bias their advice.
Limitations and Strengths
Our study had several limitations. We relied solely on the head nurses’ reporting of the conditions within their institutions. We did not conduct any observations of nursery practices, contact parents of new infants to corroborate their experience with the responses from head nurses, or otherwise validate survey responses. It is possible that actual practice differs from what was reported. We were also unable to match hospitals from the 1992 and 1999 data because of the way the 1992 hospital data were collected. The individual surveys completed in 1992 did not include the name of the hospital on the data form, which made it impossible to compare them with the individual hospitals in 1999.
Despite these limitations, this study has several strengths. We contacted and interviewed head nurses at every hospital newborn nursery in Missouri shortly after the AAP infant sleep position recommendation was released and again 7 years later. Also, the same obstetrical nurse clinician conducted the interviews using identical questions, which provided consistency between surveys.
Conclusions
Important change has occurred in nursery practice, opinion, and advice to parents since the announcement of the AAP recommendation on infant sleep position in 1992; however, some head nurses still disagree with this recommendation, and this may affect the nursery positioning practice and the advice given to mothers. Infant sleep position and advice from newborn nursery nurses should be consistent with current AAP recommendations and hospital policy. Our study further supports the CDC recommendation that outreach programs to influence infant sleep position should consider the role of advice from health professionals19 and emphasizes the importance of family physicians in parental choice for infant sleep position. Our study should remind all health care professionals of the impact of their advice to parents regarding infant sleep position. With the overwhelming evidence supporting the supine position, increased educational efforts focused on influencing nursery staff practice and advice may be necessary to increase infant supine sleep positioning. These educational efforts should include the family physician’s role in influencing nursery staff practices.
Acknowledgments
Support for our study was provided by the Center for Family Medicine Science in the Department of Family and Community Medicine at the University of Missouri–Columbia. The Center is funded in part by the American Academy of Family Physicians. We would like to thank Sharon Cornelison, RNC, for her diligent data collection, Darla Horman, MA, for management of the 1992 survey data, and Mirra Smith for data abstraction.
1. National Center for Health Statistics. Births and deaths: United States, 1995. Hyattsville, Md: US Department of Health and Human Services, Public Health Service, Center for Disease Control; 1996.
2. Willinger M, Hoffman HJ, Wu KT, et al. Factors associated with the transition to nonprone sleep positions of infants in the United States: the National Infant Sleep Position Study. JAMA 1998;280:329-35.
3. Willinger M. SIDS prevention. Pediatr Ann 1995;24:358-64.
4. American Academy of Pediatrics Task Force on Infant Positioning and SIDS. Positioning and SIDS. Pediatrics 1992;89:1120-26.
5. Willinger M, Hoffman HJ, Hartford RB. Infant sleep position and risk for sudden infant death syndrome: report of meeting held January 13 and 14, 1994, National Institutes of Health, Bethesda, Md. Pediatrics 1994;93:814-19.
6. Clinton Administration announces expanded Back to Sleep campaign: Tipper Gore to lead new effort. Rockville, Md, 1997. January 12, 2000. National Institute of Child Health and Human Development. February 9, 2000. Available at: www.nichd.nih.gov/sids/clinton.htm.
7. Fleming PJ, Blair PS, Bacon C, et al. Environment of infants during sleep and risk of the sudden infant death syndrome: results of 1993-5 case-control study for confidential inquiry into stillbirths and deaths in infancy. Confidential enquiry into stillbirths and deaths regional coordinators and researchers. BMJ 1996;313:191-95.
8. Mitchell EA, Tuohy PG, Brunt JM, et al. Risk factors for sudden infant death syndrome following the prevention campaign in New Zealand: a prospective study. Pediatrics 1997;100:835-40.
9. Brenner RA, Simons-Morton BG, Bhaskar B, et al. Prevalence and predictors of the prone sleep position among inner-city infants. JAMA 1998;280:341-46.
10. Willinger M, Ko C, Hoffman HJ, et al. Factors associated with caregivers’ choice of infant sleep position, 1994-1998. JAMA 2000;283:2135-42.
11. Peeke K, Herschberger CM, Kuehn D, Levett J. Infant sleep position nursing practice and knowledge. Am J Matern Child Nurs 1999;24:301-04.
12. Scheidt P, Willinger M, Hoffman H, et al. Recommended infant sleep positions for reduction of SIDS risk. Am J Dis Children 1993;147:462.
13. Missouri Department of Health. Missouri hospital profiles 1997. Jefferson City, Mo: Missouri Department of Health, Center for Health Information Management and Epidemiology, State Center for Health Statistics; 1998.
14. Longo DR, Brownson RC, Johnson JC, et al. Hospital smoking bans and employee smoking behavior: results of a national survey. JAMA 1996;275:1252-57.
15. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet 1993;342:1317-22.
16. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.
17. Lesko SM, Corwin MJ, Vezina RM, et al. Changes in sleep position during infancy: a prospective longitudinal assessment. JAMA 1998;280:336-40.
18. Gibson E, Cullen JA, Spinner S, Rankin K, Spitzer AR. Infant sleep position following new AAP guidelines: American Academy of Pediatrics. Pediatrics 1995;96:t-72.
19. Assessment of infant sleeping position—selected states 1996. MMWR Morb Mortal Wkly Rep 1998;47:873-77.
The recommendation of a health care professional and observation of sleep position in the hospital have both been shown to be important determinants in parents’ decision making about infant sleep position.2,9,10 Although there are several studies of sleep positioning by caregivers and of sleep position recommendations by health care providers, there is only one study of nursery nursing staff regarding infant sleep position.2,9,11,12 The National Institute of Child Health and Human Development has conducted a survey of health professionals regarding infant sleep position since 1993, but the results have not been published.9 Our study was initiated to assess the infant sleep position policies and practices of newborn nurseries in Missouri and evaluate nursery staff opinions of the AAP recommendation shortly after the recommendation was released in 1992 and again in 1999.
Methods
Study Design
A nurse interviewer conducted a telephone survey of all newborn nurseries in Missouri in 1992 and 1999. The Missouri Department of Health hospital profile database was used to identify hospitals with newborn nurseries before both surveys.13 An experienced obstetric nurse clinician contacted each hospital newborn nursery, spoke to the head nurse or charge nurse, and invited that nurse to participate in a short survey on infant sleep position.
After agreeing to participate, respondents were asked 10 questions about the policy and practice for infant sleep positioning in the nursery, what position they advise parents to use on discharge, and their opinion of the AAP recommendation. To maintain consistency between the 1992 and 1999 surveys, modifications to the 1999 survey were limited to updating date references and deleting a question about any recent changes to the nursery’s sleep position policy.
The opinion question was recorded as a narrative response during the interview. For data analysis, the responses were coded into 4 categories: agree, disagree, no opinion, or other (Table 1). Three of the authors (J.E.D., R.L.P., P.G.S.) independently coded the opinion statements from the 1992 and 1999 surveys, and any discrepancies were resolved by consensus. The same coding criteria were used for both surveys. The data were analyzed using SAS software (SAS Institute, Cary, NC).
Our study was granted an exemption by the institutional review board at the University of Missouri–Columbia.
Results
In 1992 there were 79 hospitals with newborn nurseries in Missouri, and by 1999 this number had decreased to 75; however, the average number of nursery beds per hospital remained relatively constant with 16 in 1992 (median=12) and 15 in 1999 (median=12). For both surveys all hospitals with newborn nurseries in Missouri were contacted and agreed to participate.
In 1992, 92% of the head nurses were aware of the AAP recommendation for back or side sleeping position. By 1999, all were aware of the recommendation; however, the percentage of nurseries with an infant sleep position policy decreased from 98% to 95%.
Marked changes occurred in the infant sleep position used in these nurseries in the 7 years between surveys. In 1992, 32% of the survey respondents reported that their usual practice or policy was to place infants prone exclusively or either prone or on the side for sleep. In 1999, none of the respondents reported using the prone position as usual practice (Table 2). Reported use of the supine position and of both supine and lateral positions increased dramatically between 1992 and 1999, while exclusive use of side positioning decreased. Several of the nurses stated that they still place some babies on their stomach in the nursery and justified this by stating that they do not tell the parents or they watch the babies closely.
The positioning advice given to parents changed from 1992 to 1999, with more hospitals advising use of supine position exclusively or both the supine and the lateral position. In 1999, no nursery staff advised parents to place their babies prone. The position that respondents reported using in the nursery and the position they advised parents to use at discharge were highly correlated in 1999 but less so in 1992. The percentage agreement between the position used in the nursery and the advice given to parents in 1992 was 68% (κ=0.52) and had increased to 75% (κ=0.57) in 1999.
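For readers unfamiliar with the two agreement measures reported above, the following minimal sketch shows how percentage agreement and Cohen's kappa are computed from a cross-tabulation of position used against position advised. The function name and the counts in the example table are hypothetical and are not the study's data; the underlying cross-tabulation is not reported in the text.

```python
from typing import Sequence


def percent_agreement_and_kappa(table: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for a square count table.

    table[i][j] = number of nurseries using position i and advising position j.
    """
    n = sum(sum(row) for row in table)
    size = len(table)
    # Observed agreement: share of nurseries on the diagonal.
    p_o = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance alone, computed from the marginal totals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    # Kappa rescales observed agreement so that chance agreement maps to 0.
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


# Hypothetical counts (rows: position used; columns: position advised).
# Categories: supine only, supine or side, side only. NOT the study's data.
example = [
    [20, 5, 0],
    [5, 30, 5],
    [0, 5, 5],
]
p_o, kappa = percent_agreement_and_kappa(example)
print(f"agreement={p_o:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than raw agreement because it discounts the matches that would be expected from the marginal distributions alone, which is why 75% agreement can correspond to a kappa of only 0.57.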
The proportion of respondents who agreed with the AAP recommendation increased between 1992 and 1999, while the proportion disagreeing decreased. Even so, in 1999 25% of the respondents stated opinions disagreeing with the recommendation.
The nurses who disagreed with the AAP recommendation in 1992 were more likely to state use of the prone position was their usual practice than respondents who agreed (68% vs 0%). In 1999 none of the respondents reported that prone position was their usual practice, and no respondent who disagreed with the recommendation used the supine position exclusively. Conversely, 36% of 1999 respondents who agreed with the recommendation placed babies exclusively on their backs (data not shown).
Discussion
This is the first published study that reports how newborn nurseries have implemented the AAP recommendations for infant sleep positioning. Our 2 surveys of head nurses in all newborn nurseries in Missouri show a significant change in nursery practice and nursing advice since the publication of the AAP statement on infant sleep positioning in 1992.
We were surprised to learn that 4 nurseries reported no infant sleep position policies or standard practice, but were encouraged by the overall decline in prone and exclusively side positioning. In 1992, 84% of Missouri nurseries were routinely placing babies prone or on their side. Both of these positions are of concern: Side positioning is the least stable, and both prone and side positioning are associated with an increased risk of SIDS.2,7,8 In 1999, 78% of head nurses reported either back positioning or back or side positioning as their standard practice, and only 23% still used side positioning as their standard practice. Head nurses report that prone positioning is no longer a standard practice in any Missouri nursery, but some nurses indicate a willingness to make exceptions for “spitty” babies and immediately after feeding.
Nursery advice to parents about positioning has also changed. In 1992, 80% of head nurses advised parents to place their babies either on their sides or prone. In 1999, 8% still advised side; 20% recommended supine; and 72% recommended either back or side. No head nurses reported that placing babies prone for sleep was standard advice.
The overall change in head nurse opinions of the AAP recommendations is also encouraging. Between 1992 and 1999, head nurses who disagreed with the AAP recommendations declined from 63% to 25%. This opinion conversion is critical, because nurses who disagreed with the AAP recommendations did not use supine positioning as their standard practice. In 1999 there was 75% agreement between what nurses did and what they advised.
Changing behavior is difficult, but public policies can lead to change.14 There has been considerable research devoted to the effect that clinical guidelines have on practitioner behavior.15 Several barriers to behavioral change have been identified16 including familiarity, awareness, and agreement with the guideline. Our study demonstrates the effectiveness of a clinical policy and a public campaign to change clinical behavior and identifies targets for future educational programs. We believe the Back to Sleep Program was successful due to a multifaceted approach that included the general press, professional society outreach, nursing and physician involvement, and education of parents. The diversity of influences applied in this campaign offers a model for eliciting specific behavioral change by clinicians and patients. Even so, our findings suggest that some clinicians may still cling to behaviors, and changing these behaviors may require specific targeted actions.
1. National Center for Health Statistics. Births and deaths: United States, 1995. Hyattsville, Md: US Department of Health and Human Services, Public Health Service, Center for Disease Control; 1996.
2. Willinger M, Hoffman HJ, Wu KT, et al. Factors associated with the transition to nonprone sleep positions of infants in the United States: the National Infant Sleep Position Study. JAMA 1998;280:329-35.
3. Willinger M. SIDS prevention. Pediatr Ann 1995;24:358-64.
4. American Academy of Pediatrics AAP Task Force on Infant Positioning and SIDS. Positioning and SIDS. Pediatrics 1992;89:1120-26.
5. Willinger M, Hoffman HJ, Hartford RB. Infant sleep position and risk for sudden infant death syndrome: report of meeting held January 13 and 14, 1994, National Institutes of Health, Bethesda, Md. Pediatrics 1994;93:814-19.
6. Clinton Administration announces expanded Back to Sleep campaign: Tipper Gore to lead new effort Rockville, Md, 1997. January 12, 2000. National Institute of Child Health and Human Development. February 9, 2000. Available at: www.nichd.nih.gov/sids/clinton.htm.
7. Fleming PJ, Blair PS, Bacon C, et al. Environment of infants during sleep and risk of the sudden infant death syndrome: results of 1993-5 case-control study for confidential inquiry into stillbirths and deaths in infancy. Confidential enquiry into stillbirths and deaths regional coordinators and researchers. BMJ 1996;313:191-95.
8. Mitchell EA, Tuohy PG, Brunt JM, et al. Risk factors for sudden infant death syndrome following the prevention campaign in New Zealand: a prospective study. Pediatrics 1997;100:835-40.
9. Brenner RA, Simons-Morton BG, Bhaskar B, et al. Prevalence and predictors of the prone sleep position among inner-city infants. JAMA 1998;280:341-46.
10. Willinger M, Ko C, Hoffman HJ, et al. Factors associated with caregivers’ choice of infant sleep position, 1994-1998. JAMA 2000;283:2135-42.
11. Peeke K, Herschberger CM, Kuehn D, Levett J. Infant sleep position nursing practice and knowledge. Am J Matern Child Nurs 1999;24:301-04.
12. Scheidt P, Willinger M, Hoffman H, et al. Recommended infant sleep positions for reduction of SIDS risk. Am J Dis Children 1993;147:462.
13. Missouri Department of Health. Missouri hospital profiles 1997. Jefferson City, Mo: Missouri Department of Health, Center for Health Information Management and Epidemiology, State Center for Health Statistics; 1998.
14. Longo DR, Brownson RC, Johnson JC, et al. Hospital smoking bans and employee smoking behavior: results of a national survey. JAMA 1996;275:1252-57.
15. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet 1993;342:1317-22.
16. Cabana MD, Rand CS, Powe NR, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.
17. Lesko SM, Corwin MJ, Vezina RM, et al. Changes in sleep position during infancy: a prospective longitudinal assessment. JAMA 1998;280:336-40.
18. Gibson E, Cullen JA, Spinner S, Rankin K, Spitzer AR. Infant sleep position following new AAP guidelines: American Academy of Pediatrics. Pediatrics 1995;96:t-72.
19. Assessment of infant sleeping position—selected states 1996. MMWR Morb Mortal Wkly Rep 1998;47:873-77.
The Effects of Hypnosis on the Labor Processes and Birth Outcomes of Pregnant Adolescents
Hypnosis has been used to control pain during labor and delivery for more than a century, but the introduction of chemo-anesthesia and inhalation anesthesia during the late 19th century led to the decline of its use.1,2 Recently there has been a resurgence of this technique in obstetrics.3-7 Hypnotherapy has been found to be effective in providing pain relief,8,9 reducing the need for chemical anesthesia,8 and reducing anxiety, fear, and pain related to childbirth.1,2,7,10,11 Hypnosis has also been helpful in both managing various complications of pregnancy (such as premature labor5,12-14) and reducing the likelihood of premature labor and birth in high-risk patients.12 It has also been effective in the treatment of hyperemesis gravidarum,15-16 acute hypertension associated with pregnancy,17 and conversion of breech to vertex presentation.18
One promising application of hypnosis is in the area of labor and delivery.1,5,6,19 The use of hypnosis in preparing the patient for labor and delivery is based on the premise that such preparation reduces anxiety, improves pain tolerance (lowering the need for medication), reduces birth complications, and promotes a rapid recovery process.1,2,5 The key aspect of this treatment is involvement of the patient before labor begins, to promote her active participation and sense of control in the labor and delivery process. This is accomplished through educating the patient about this process and teaching her alternate ways to produce hypno-analgesia and anesthesia.1,2 Hypnotic preparation thus provides the expectant mother with a sense of control for managing her anxiety and physical discomfort.
Although there have been numerous reports suggesting the value of hypnosis in obstetrics, our study is one of the first to report a randomized controlled evaluation of childbirth preparation incorporating hypnotic techniques on labor processes and birth outcomes.
Methods
Our subjects were teenage patients (18 years or younger at the time of conception) who entered prenatal treatment with normal pregnancies at a Florida county public health department before the end of their 24th week. The clinic nursing director performed a chart review and identified 47 patients meeting the criteria. These patients were randomly assigned to either the treatment group or the control group. The treatment group received childbirth preparation in self-hypnosis that incorporated information on labor and delivery (the detailed protocol is described in a previous publication1). The control group received supportive counseling designed to control for interpersonal contact and social support and to provide an opportunity for discussion about pregnancy issues of concern to the patient. Patients in the treatment and control groups had the same number of visits.
We obtained institutional review board approval and informed consent from individual patients. The subjects were told that the study was an attempt to provide support for pregnant adolescents in addition to the routine prenatal care provided by the public health department and that they would be randomly assigned to 1 of the 2 groups, their intervention session would coincide with scheduled clinic appointments and would not interrupt their medical treatment in any way, and their participation was voluntary.
Both groups of patients received the standard prenatal treatment protocol from the medical staff, nurse practitioners, and hospital staff, all of whom were blind to group assignments. All patients were delivered at the local teaching hospital by obstetrics department staff who were blind to the study. The study interventions were begun with individual meetings with patients during regular clinic visits between 20 and 24 weeks’ gestation. Continuing clinic visits were scheduled for all patients on a biweekly basis, making the time span of the 4-session experimental conditions approximately 8 weeks. The study counselor (the primary author) provided hypnosis preparation training for the treatment group; a nurse midwife provided the supportive contact with the control group. Both interventions were completed before delivery; no prompting occurred during the labor and delivery process.
The 2 groups of patients were compared on medication use (Pitocin, anesthetic, and postpartum medication), complications and surgical intervention during delivery, and length of hospital stay for mothers and neonatal intensive care unit (NICU) admission for the infants. Complications fell into 36 categories of events (eg, multiple pregnancies, preeclampsia, vacuum-assisted delivery) that were entered in subjects’ records by obstetric staff who were unaware of the study. Statistical analysis was based on a simple count of the presence or absence of complications in the medical record by researchers (the researchers were not blinded to the patient’s study assignment).
Results
Of the 47 patients, 3 moved out of the geographic area before delivery, and 2 patients (1 in each group) did not complete the research protocol and were not included in the research. Results were thus obtained for 22 patients in the hypnosis group and 20 in the control group, resulting in a total of 42 subjects. A two-tailed Fisher exact analysis at the .05 level was used to test for significance.
Only one patient in the hypnosis group had a hospital stay of more than 2 days compared with 8 patients in the control group (P=.008). None of the 22 patients in the hypnosis group experienced surgical intervention compared with 12 of the 20 patients in the control group (P<.001). Twelve patients in the hypnosis group experienced complications compared with 17 in the control group (P=.047). Although consistently fewer patients in the hypnosis group used anesthesia (10 vs 14), Pitocin (2 vs 6), or postpartum medication (7 vs 11), and fewer had infants admitted to the NICU (1 vs 5), these differences were not statistically significant (Figure 1, Figure 2).
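The comparisons above can be checked from the reported counts alone. The sketch below is one way to compute a two-tailed Fisher exact P value for a 2x2 table; it assumes the common convention of summing the probabilities of all tables that are no more likely than the observed one, which may or may not be the exact procedure the authors' software used. The function and variable names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(k: int) -> int:
        # Unnormalised probability (exact integer) of k events in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / total


# Counts from the text: (hypnosis yes, hypnosis no, control yes, control no),
# with 22 patients in the hypnosis group and 20 in the control group.
outcomes = {
    "hospital stay > 2 days": (1, 21, 8, 12),
    "surgical intervention": (0, 22, 12, 8),
    "complications": (12, 10, 17, 3),
}
p_values = {name: fisher_exact_two_sided(*cells) for name, cells in outcomes.items()}
for name, p in p_values.items():
    print(f"{name}: P = {p:.5f}")
```

Under this convention the sketch gives values that round to .008 for hospital stay and .047 for complications, matching the reported figures, and a value well below .001 for surgical intervention. Comparing exact integer weights rather than floating-point probabilities avoids tie-breaking errors when two tables are equally likely.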
Discussion
We focused on the educational preparation of the patient while in hypnosis to create the expectation of a normal labor and delivery, develop a conditioned response of comfort and confidence, and facilitate an increased sense of control in achieving a healthy delivery.
The subjects in the treatment group received a 4-session sequence of standard hypnotic interventions incorporating childbirth preparation information (ie, the hypnoreflexogenous method1,2,20) in which they were instructed in the methods and benefits of focused relaxation and imagery to increase the likelihood of a safe and relatively pain-free delivery. The sessions provided an opportunity to experience and practice hypnotic induction and deep relaxation. The suggestions directed toward the expectant mothers during the hypnotic state focused on the conceptualization of pregnancy and childbirth as a healthy natural process. Suggestions were also given to help the patient respond to possible complications, in the event they might occur.1 These suggestions were designed to increase the patient’s sense of trust in her physician and her confidence in her own ability to manage anxiety and discomfort. Hypnotic inductions also included ego-strengthening techniques and suggestions for a relatively discomfort-free delivery and suggestions for the application of the hypnotic techniques to other stressful periods in their lives. In each session the patients were given the opportunity to ask any questions of concern regarding the method or the pregnancy.
The main limitations of our study are the relatively small number of subjects and the fact that these patients were adolescent women, which affects the generalizability of results.
Future Research
Future research should involve a larger subject pool including adults, have a control group receiving traditional prenatal care with no added intervention, and provide an analysis of cost-saving benefits.
Conclusions
Our study provides support for the use of hypnosis to aid in preparation of obstetric patients for labor and delivery. The reductions in complications, surgery, and hospital stay show direct medical benefit to mother and child and suggest the potential for a corresponding cost-saving benefit.
Acknowledgments
We would like to acknowledge the pioneering work on the use of hypnosis in obstetrics by the late William Werner, MD, and express appreciation for his assistance in designing the intervention protocol. We would also like to thank Maury Nation, PhD, for his assistance with statistical analysis and Poorti Karve Riley, MD, for her comments on a previous version of this manuscript.
1. Schauble PG, Werner WEF, Rai SH, Martin A. Childbirth preparation through hypnosis: the hypnoreflexogenous protocol. Am J Clin Hypnosis 1998;40:273-83.
2. Werner WEF, Schauble PG, Knudson MS. An argument for the revival of hypnosis in obstetrics. Am J Clin Hypnosis 1982;24:149-71.
3. Dillenburger K, Keenan M. Obstetric hypnosis: an experience. Contemp Hypnosis 1996;13:202-04.
4. Baram DA. Hypnosis in reproductive health care: a review and case report. Birth 1995;22:37-42.
5. Goldman L. The use of hypnosis in obstetrics. Psychiatric Med 1992;10:59-67.
6. Harmon TM, Hynan MT, Tyre TE. Improved obstetric outcomes using hypnotic analgesia and skill mastery combined with childbirth education. J Consult Clin Psychol 1990;58:525-30.
7. Kroger WS. Hypnoanesthesia in obstetrics. In: Davis CH, ed. Gynecology and obstetrics. Hagerstown, Md: Harper & Row; 1960.
8. Mairs DAE. Hypnosis and pain in childbirth. Contemp Hypnosis 1995;12:111-18.
9. Hilgard ER, Hilgard JR. Hypnosis in the relief of pain. Revised ed. New York, NY: Brunner/Mazel; 1994.
10. Martin J. Hypnosis gains legitimacy, respect, in diverse clinical specialties. J Am Med Assoc 1983;249:319-21.
11. Oster MI. Psychological preparation for labor and delivery using hypnosis. Am J Clin Hypnosis 1994;37:12-21.
12. Cheek DB. The early use of psychotherapy in prevention of pre-term labor: the application of hypnosis and ideomotor techniques with women carrying twin pregnancies. Pre Peri Natal Psychol J 1995;10:5-19.
13. Omer H. A hypnotic relaxation technique for the treatment of premature labor. Am J Clin Hypnosis 1987;29:206-14.
14. Peterson G. Prenatal bonding, prenatal communication, and the prevention of prematurity. Pre Peri Natal Psychol J 1987;2:87-92.
15. Iancu I, Kotler M, Spivak B, Radwan M, Weizaman A. Psychiatric aspects of hyperemesis gravidarum. Psychother Psychosom 1994;61:143-49.
16. Torem MS. Hypnotherapeutic techniques in the treatment of hyperemesis gravidarum. Am J Clin Hypnosis 1994;37:1-11.
17. Smith CH. Acute pregnancy-associated hypertension treated with hypnosis: a case report. Am J Clin Hypnosis 1989;31:209-11.
18. Mehl LE. Hypnosis and conversion of the breech to the vertex presentation. Arch Fam Med 1994;3:881-87.
19. Jenkins MW, Pritchard MH. Hypnosis: practical applications and theoretical considerations in normal labor. Br J Obstet Gynecol 1993;100:221-26.
20. Roig-Garcia S. The hypnoreflexogenous method: a new procedure in obstetrical psychoanalgesia. Am J Clin Hypnosis 1961;4:14-21.
Hypnosis has been used to control pain during labor and delivery for more than a century, but the introduction of chemo-anesthesia and inhalation anesthesia during the late 19th century led to the decline of its use.1,2 Recently there has been a resurgence of this technique in obstetrics.3-7 Hypnotherapy has been found to be effective in providing pain relief,8,9 reducing the need for chemical anesthesia,8 and reducing anxiety, fear, and pain related to childbirth.1,2,7,10,11 Hypnosis has also been helpful in both managing various complications of pregnancy (such as premature labor5,12-14) and reducing the likelihood of premature labor and birth in high-risk patients.12 It has also been effective in the treatment of hyperemesis gravidarum,15,16 acute hypertension associated with pregnancy,17 and conversion of breech to vertex presentation.18
One promising application of hypnosis is in the area of labor and delivery.1,5,6,19 The use of hypnosis in preparing the patient for labor and delivery is based on the premise that such preparation reduces anxiety, improves pain tolerance (lowering the need for medication), reduces birth complications, and promotes a rapid recovery process.1,2,5 The key aspect of this treatment is involvement of the patient before labor begins, to promote her active participation and sense of control in the labor and delivery process. This is accomplished through educating the patient about this process and teaching her alternate ways to produce hypno-analgesia and anesthesia.1,2 Hypnotic preparation thus provides the expectant mother with a sense of control for managing her anxiety and physical discomfort.
Although there have been numerous reports suggesting the value of hypnosis in obstetrics, our study is one of the first to report a randomized controlled evaluation of childbirth preparation incorporating hypnotic techniques on labor processes and birth outcomes.
Methods
Our subjects were teenage patients (18 years or younger at the time of conception) who entered prenatal treatment with normal pregnancies at a Florida county public health department before the end of their 24th week. The clinic nursing director performed a chart review and identified 47 patients meeting the criteria. These patients were randomly assigned to either the treatment group or the control group. The treatment group received childbirth preparation in self-hypnosis that incorporated information on labor and delivery (the detailed protocol is described in a previous publication1). The control group received supportive counseling designed to control for interpersonal contact and social support and to provide an opportunity for discussion about pregnancy issues of concern to the patient. Patients in the treatment and control groups had the same number of visits.
We obtained institutional review board approval and informed consent from individual patients. The subjects were told that the study was an attempt to provide support for pregnant adolescents in addition to the routine prenatal care provided by the public health department, that they would be randomly assigned to 1 of the 2 groups, that their intervention sessions would coincide with scheduled clinic appointments and would not interrupt their medical treatment in any way, and that their participation was voluntary.
Both groups of patients received the standard prenatal treatment protocol from the medical staff, nurse practitioners, and hospital staff, all of whom were blind to group assignments. All patients were delivered at the local teaching hospital by obstetrics department staff who were blind to the study. The study interventions were begun with individual meetings with patients during regular clinic visits between 20 and 24 weeks’ gestation. Continuing clinic visits were scheduled for all patients on a biweekly basis, making the time span of the 4-session experimental conditions approximately 8 weeks. The study counselor (the primary author) provided hypnosis preparation training for the treatment group; a nurse midwife provided the supportive contact with the control group. Both interventions were completed before delivery; no prompting occurred during the labor and delivery process.
The 2 groups of patients were compared on medication use (Pitocin, anesthetic, and postpartum medication), complications and surgical intervention during delivery, and length of hospital stay for mothers and neonatal intensive care unit (NICU) admission for the infants. Complications fell into 36 categories of events (eg, multiple pregnancies, preeclampsia, vacuum-assisted delivery) that were entered in subjects’ records by obstetric staff who were unaware of the study. Statistical analysis was based on a simple count of the presence or absence of complications in the medical record by researchers (the researchers were not blinded to the patient’s study assignment).
Results
Of the 47 patients, 3 moved out of the geographic area before delivery, and 2 patients (1 in each group) did not complete the research protocol and were not included in the research. Results were thus obtained for 22 patients in the hypnosis group and 20 in the control group, resulting in a total of 42 subjects. A two-tailed Fisher exact analysis at the .05 level was used to test for significance.
Conducting the Direct Observation of Primary Care Study: Insights from the Process of Conducting Multimethod Transdisciplinary Research in Community Practice
CONSENSUS PROCESS: The study participants (academic investigators, clinicians, and research nurses) met in groups. By reflecting on the study process, these groups identified insights that may be useful to other investigators planning or conducting primary care research.
LESSONS: The story of the DOPC study is one of collaboration leading to innovation and the development of ongoing relationships and a persistent research trajectory. Six factors were identified as important to the success of the primary care research process: (1) a generalist perspective; (2) involvement of community practices and practicing clinicians as research partners; (3) commitment to a transdisciplinary team process; (4) a multimethod approach; (5) openness to emerging insights; and (6) thinking big, but starting small.
CONCLUSIONS: A multimethod research process that involves collaboration between practicing clinicians, methodologists, and content experts can simultaneously test a priori hypotheses and discover important new insights about primary care practice.
The Direct Observation of Primary Care (DOPC) Study has contributed to the understanding of family practice and has fostered the development of new primary care research methods1-6 and theoretical perspectives.7-10 The study’s findings have important implications for improving patient care11-23 and developing policies9,24-31 that maximize the impact of a generalist patient-centered approach toward the health of individuals, families, and communities.9 This study has spawned a large portfolio of related inquiry, including an in-depth qualitative study of family practices, a multimethod community practice intervention trial, and a new family practice research center.
The DOPC Study story represents a unique confluence of ideas, people, and opportunities. However, we hope that readers might glean insights relevant to their lines of inquiry and that this article will stimulate the continued development of a unique primary care and family practice research agenda.*
The DOPC Story
Concept Development
In 1988, family practice researchers Kurt Stange and Stephen Zyzanski collaborated on a paper about the benefits of integrating quantitative and qualitative research methods.32 While writing a second manuscript on the topic, they invited Benjamin Crabtree and William Miller (emerging experts on the application of qualitative research methods to primary care) to collaborate.33
At the same time, a group of family physicians and researchers affiliated with Case Western Reserve University in Cleveland, Ohio, was attempting to develop an innovative approach to improving clinical preventive service delivery in practice. With a grant from the Ohio Academy of Family Physicians Foundation, they conducted a survey on family physician agreement with United States Preventive Services Task Force recommendations.34-36 The survey findings, conversations with respondents, and a review of the literature led to the conclusion that current approaches to improving clinical preventive service delivery were limited by a lack of understanding of the true nature of family practice37 and that efforts to improve practice should be preceded by efforts to understand practice.38,39 Before designing an intervention study, further insight into the “black box” of real world family practice was needed.2,40
The research group, which now included family practice academicians, clinicians, and methodologists, began exploring the development of a research network backward from the traditionally successful models used by the Ambulatory Sentinel Practice Network (ASPN) and other networks.41-44 Rather than developing the infrastructure around a network of clinicians that performs research by gathering data, this new network was developed around a large descriptive study of the content and context of family practice. Funding for a specific study would be easier to obtain than start-up costs for a practice-based research infrastructure. Also, given the demands of clinical practice,45 a well-funded study in a regional network could collect more extensive data than a study conducted by individual practices, providing opportunities for spinoff studies and other clinician-initiated inquiries. The research team decided to explore research opportunities with national funding agencies, with close communications and extensive input from local practicing family physicians.
During this time, a series of primary care research conferences sponsored by the Agency for Health Care Policy and Research (AHCPR) provided fertile ground for exploring research ideas and methods from a multidisciplinary perspective. At one conference, research team member Carlos Jaén (an epidemiologist and family physician), team leader Kurt Stange, and Paul Nutting (director of primary care research at AHCPR at the time) discussed the research network and project. The recognition emerged that many worthwhile primary care activities, including preventive service delivery, are not carried out during patient visits because of the competing demands imposed by other activities. This competing demands model17 of preventive service delivery and primary care provided an important initial theoretical framework for what would become the DOPC Study.
Research Design
The research team began refining study questions and developing methods. A critical event occurred during a discussion of methods for measuring the content of outpatient family practice visits. Jason Chao, a family practice academician, enumerated these methods: “…chart review, patient questionnaire, billing data. One could do direct observation, but you can’t do that.” As everyone nodded agreement, his colleague Robert Kelly interrupted, “Why not? Why can’t you do direct observation?” The group listed many good reasons: intrusiveness, unacceptability to patients and clinicians, expense, and the potential to bias behavior. However, the question “Why not?” remained and created a shared sense that direct observation of real world family practices represented an opportunity to make a unique contribution. The group decided to include direct observation as a major measurement technique and to add a methodologic goal of establishing the validity and reliability of nonobservational techniques for assessing the content of outpatient medical practice. An additional advance occurred with the publication of the Davis Observation Code (DOC),46 which classified patient visits into 20 different behavioral codes measured in 15-second intervals. Lead author Edward Callahan agreed to become a collaborator.
Limited existing research on the content of community primary care practice meant that the group would have difficulty in anticipating all content areas worth measuring and questions worth asking before immersing themselves in community family practice settings. Therefore, Drs Crabtree and Miller were asked to join the team to design a multimethod approach that integrated quantitative and qualitative methods.47 Project design was pursued further in research team meetings, telephone conversations, and interactions with out-of-town collaborators during national professional meetings. These face-to-face meetings were essential to developing the trust, communication, and shared vision necessary for a transdisciplinary multimethod study.
Conversations with local family physicians soon revealed that preventive service delivery, although an important aspect of family practice, was not a sufficiently compelling research question to engage a new practice-based research network. A broader focus, such as the content of family practice, would engage the largest number of clinicians and be less likely to bias clinician behavior during direct observation. At the suggestion of practicing family physician Michael Rabovsky, in whose office the protocol was being pilot tested, the study was expanded to address the Medicare Resource-Based Relative Value System (RBRVS)-based billing system. Health economist Daniel Dunn, who helped develop the RBRVS,48 was invited to participate.
Based on discussions with practicing family physicians, a strategy was developed for recruiting practice-based research network members. Members of the Ohio Academy of Family Physicians (OAFP) in Northeast Ohio were targeted to facilitate easy meeting of practices and travel of study teams to practice sites.45 A letter describing the study and proposed network was sent to all 531 OAFP active members in the area. A total of 138 physicians responded and formed the fledgling Research Association of Practicing Physicians (RAPP) network. A working relationship was established with the NorthEast Ohio Network (NEON), a practice-based research network of 6 community residency training sites affiliated with the NorthEast Ohio Universities College of Medicine, directed by William Gillanders (and later Valerie Gilchrist). NEON physicians were trained in National Ambulatory Medical Care Survey (NAMCS)49 data collection techniques and provided the opportunity to evaluate the validity of the NAMCS methods compared with direct observation. These development activities were supported by considerable in-kind contributions of investigator time from the participants’ institutions.
Pursuit of Funding
A research concept paper was sent to the AHCPR for feedback. The response indicated that intervention studies were more compatible with funding priorities than the proposed observational study. The critique also pointed out “fatal flaws” engendered by direct observation methods and expressed skepticism that community physicians would allow such observation of their practices. These concerns were addressed with pilot data and a strengthened argument about the need for efforts to understand practice before attempting to change it. An investigator-initiated (R01) grant application was submitted to the AHCPR. A secondary assignment to the National Cancer Institute (NCI) was requested because of the clinical preventive service delivery focus and the important potential of understanding family practice and competing demands for the subsequent design of interventions to enhance cancer prevention and control.
The initial application was favorably reviewed and received a priority score near the funding line. In response to advice from NCI and AHCPR program officers, the research team allowed the application to be considered for funding during 3 upcoming NCI council meetings. Regular letters to research network members kept them informed of the funding status. After 1 year of narrowly missing the funding line, the grant application was revised and resubmitted in response to the scientific review committee’s critique, with increased emphasis on the implications of the study for cancer prevention and control. It was funded by the NCI, with an additional grant from the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program, to develop communication and clinician-initiated research in RAPP and additional methodologic and descriptive aims.
Planning and Conduct of Fieldwork
In 1994, more than 3 years after the idea was conceived, the board of directors of the 138-member research network (RAPP) was formally activated. The board’s 14 network volunteer members, several of whom helped develop the study and the network, were active in planning the practical implementation and refinement of the study protocol. When a board member suggested that they review the details of the direct observation measures before the study, the board concluded that as study participants they should not be involved in planning study measures, to avoid biasing their behavior during direct observation.
With funding, 2 logistic aspects of the study gained importance. First, it had been 2 years since the research network was formed. Whether physicians had retained their commitment to participate was unknown. That concern was laid to rest, however, when the vast majority of physicians expressed continued interest in the study. In retrospect, the 2-year struggle to obtain funding helped bond the network and create a sense of ownership and allegiance to the project.
The second major logistic issue was the need to recruit 8 research nurses. Job requirements included excellent interpersonal skills, sensitivity to the demands of real world community family practice, attention to detail in collecting reliable and valid quantitative data from multiple measures, an open-minded observational ability to simultaneously collect qualitative data, willingness to drive to multiple and sometimes distant sites, and interest in a 1-year job at a university salary. Hiring 8 nurses who could meet these requirements and start the 2-month training process on the same date seemed unrealistic at best. Yet, because of word of mouth advertising, the excitement generated by the study, the recent termination of another research project at the university, and the excellent reputation of the department, 8 highly qualified research nurses were found.
During their 8-week training, the research nurses were enlisted as true partners. They helped refine the research protocol and instruments, and items were added to the measures to reflect their interests. Using videotaped encounters, Dr Callahan instructed the nurses in applying the DOC, and they took the lead in adapting it for the study. As the immensity of the quantitative data collection requirements grew, Drs Miller and Crabtree scaled back the qualitative data collection protocol. They trained the research nurses in observational techniques and in how to dictate ethnographic field notes to record unanticipated findings, provide rich descriptions of quantitatively measured variables, and critique the study methods’ accuracy in capturing the phenomena under study.50
Details of the data collection procedure have been reported elsewhere.1,2 Briefly, teams of 2 research nurses spent 1 day observing patient care by each of the 138 participating RAPP members. One nurse obtained verbal informed consent from patients in the waiting room and distributed patient exit questionnaires. The other nurse accompanied the physician, directly observing consecutive visits by consenting patients and recording observations using the DOC and a direct observation checklist. Typically, the nurses exchanged roles after lunch. They returned on a subsequent day to perform medical record reviews for each other’s observed visits and to collect billing data. On the basis of observation and brief interviews with key informants, the research nurses completed a practice environment checklist. They dictated ethnographic field notes immediately after leaving the practice.10
During the course of the fieldwork, research team meetings were held every other week to coordinate logistics and assess and recalibrate inter-rater reliability using videotaped visits and medical records that were not part of the larger study. The high degree of inter-rater reliability achieved with this approach has been reported previously.1
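As an illustration only, chance-corrected agreement between 2 observers coding the same videotaped visit could be computed along the following lines. The text above does not state which reliability statistic the study used; Cohen's kappa is assumed here as one common choice, and the behavior labels and the function name `cohen_kappa` are hypothetical placeholders rather than actual DOC codes or study software.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same sequence of intervals.

    Each argument is a list with one categorical code per observation
    interval (for example, one code per 15-second interval).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of intervals")

    n = len(rater_a)
    # Proportion of intervals on which the two raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two nurses to six consecutive intervals.
nurse_1 = ["chatting", "history", "history", "exam", "planning", "exam"]
nurse_2 = ["chatting", "history", "exam", "exam", "planning", "exam"]

kappa = cohen_kappa(nurse_1, nurse_2)
print(round(kappa, 3))
```

A statistic of this kind, recomputed at each biweekly meeting, would show whether observers were drifting apart and needed recalibration.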
After data were collected from each physician, the board of directors met to review study progress and reassess the study protocol. The academic research team, including all consultants, also met to refine the protocol and plan the second round of physician visits. Initial plans called for ongoing analysis of the ethnographic field notes, but this proved to be infeasible because of their large volume and the study demands. However, at the study midpoint, Drs Crabtree and Miller independently analyzed the field notes using an immersion crystallization technique.51 Based on the richness of the information, they developed a template52 for gathering field notes during the second round of physician visits.
Data collection procedures were repeated, and each physician was visited a second time. The 4 months (on average) between visits helped assure that seasonal variations in health problems did not unduly affect the characterization of patient care. After the second data collection visit, physicians completed a detailed questionnaire.
Data Analysis and Production of Scholarly Output
The data were entered by optical scanner and manually verified. Quantitative data analyses were performed by Cleveland research team members in response to the initial research aims and additional questions raised by the research team and research network board. Qualitative data analyses were subcontracted to the University of Nebraska, with additional grant support from the American Academy of Family Physicians for more in-depth analyses.
Multiple papers were begun with diverse lead authorship. Preference in determining paper topics was given to methodologic manuscripts, topics with timely policy implications, and papers for which individual team members had a particular passion. In response to a call from the editorial office of JFP, a proposal for a theme issue on the DOPC Study was made and accepted. The opportunity to publish early scholarly output in one place greatly increased the potential for papers on diverse topics that would help cohesively describe several aspects of the value of family practice. The deadline for the theme issue also made the paper writing a high priority. Of 14 manuscripts accepted after going through peer review, 10 were included in the May 1998 issue of JFP,2,8-11,14,15,17,18,25,40 with one paper published in each of 4 subsequent issues.13,16,26,27 Other analyses and papers have focused on the original research themes, new topics, more complex analyses, and expansion into the non–family practice literature.
Opportunities to propose paper topics have been extended to all study participants, including the academic research team, consultants, and RAPP members. Proposed topics are reviewed for feasibility and potential conflicts with other papers. The data set has spawned 2 master’s theses16,19 and one doctoral dissertation3,4,12,24 and has led to new collaborations with complementary content experts.
Related Research Initiatives
Concurrent with the DOPC study, Dr Crabtree and his colleagues in Nebraska conducted a series of related inquiries.3-5,10,12,24,39,53-56 These studies have provided complementary information and advanced multimethod approaches for studying primary care practice. Close collaboration and open information sharing among the research teams and collaborators have greatly facilitated the discovery of new methods and insights into family practice and have furthered the research trajectory of the collaborating groups. These collaborations spawned the Center for Research in Family Practice and Primary Care, a multisite consortium funded by the American Academy of Family Physicians.
DOPC Study collaborations have led to other research initiatives as well. For example, a desire for more in-depth qualitative data led to a comparative case study of a smaller number of purposively selected practices in Nebraska, funded by the AHCPR with Dr Crabtree as principal investigator. In addition, after reviewing the initial findings of the first round of DOPC data, the RAPP board of directors developed a study of competing demands outside the examination room, which has led to related inquiries.
Based on emerging insights from the DOPC Study on the competing demands of family practice, a competing continuation application was funded by the NCI for a trial to improve clinical preventive service delivery. The Study to Enhance Prevention by Understanding Practice (STEP-UP) was developed with input from the research team and the RAPP board of directors, with collaboration from family practice researchers at Dartmouth, led by Allen Dietrich.57,58 Building on complexity theory–based insights from the DOPC Study,8 STEP-UP uses a multimethod practice assessment to understand the unique attributes of family practices and tailor intervention strategies. This approach increased preventive service delivery rates59 and led to a more comprehensive assessment and improvement strategy that is being evaluated in the delayed intervention group. The participants include DOPC Study practices and new RAPP members.
Continuing efforts to develop the RAPP network have included free continuing medical education conferences for participants in practice-based research and quality improvement projects. An ongoing research network newsletter periodically publishes a 1-page Research Prospectus Worksheet* to encourage research ideas from RAPP members. The Cleveland research team provides rapid turnaround methodological consultation for study proposals, and those involving multiple practices are reviewed by the RAPP board of directors. In addition, RAPP members are encouraged to serve as authors on DOPC papers, and approximately half have provided internal peer review before submission of papers.
Several RAPP members have received external funding for their own research projects. These include studies of causes of bilateral leg edema in family practice,60,61 an evaluation of a family-centered approach to diagnosis and treatment of respiratory infection, a clinical trial of therapeutic touch for carpal tunnel syndrome, and development of practical new methods for community-oriented primary care.62 A recent RAPP study, in collaboration with the NEON network, used a card study methodology to describe the “oh, by the way” phenomenon in which patients raise issues after the clinician thinks the outpatient visit is finished. In addition, the discovery of high rates of care of a secondary patient11 in the DOPC study led to an ASPN card study to elucidate the content of care provided to family members other than the identified patient for an outpatient visit.63 An additional ASPN collaboration, using the Components of Primary Care Instrument3,4 that was developed as part of the DOPC Study, examined the effect of different aspects of managed care on the delivery of 10 elements of quality primary care.64
Lessons Learned from the DOPC Process
Some of the lessons learned from the process of conducting the DOPC Study are summarized in Table 1. These lessons can be grouped into 6 categories, as follows.
A generalist perspective. A generalist perspective that places research questions in the context of the competing opportunities and complexity of family practice is needed for true family practice and primary care research.65 Although this perspective is essential if we are to diminish the current chasm between discovery and practice, it has not been supported by those who fund research. One strategy for addressing this funding issue is to identify topics and multimethod approaches that allow simultaneous pursuit of both categorical and generalist perspectives.
Involvement. The involvement of community practices and practicing clinicians as partners is essential for research about primary care practice.66,67 New knowledge from discoveries in the settings in which most people get most of their medical care will help end the dichotomy between research and dissemination. Practice-based research networks can help bridge this gap by asking and answering questions from the perspective and setting in which the findings will be applied.68,69 (It is worth noting, however, that most successful research networks are built around a group of clinicians who are committed to conducting research in their practices. Developing a network around a particular study, as with the RAPP network, requires attention to fostering clinician ideas and nurturing relationships that extend beyond the initial study.) Greater involvement of nonclinician health care professionals, patients, and communities can also increase the relevance of research to meet the population’s health care needs.67,70
Transdisciplinary team process. A transdisciplinary team process in which diverse specialized expertise is integrated toward a common goal can be a tremendous resource for innovation and productivity. Development of a transdisciplinary team is a long-term process that requires trust, shared vision, open leadership, idea sharing, and group meetings. In addition, team members with particular expertise must be willing to commit to creating new knowledge that transcends their disciplinary perspectives.71 Such collaboration creates the mentality of a bigger pie in which the size of each participant’s piece is increased, rather than a mentality of finite resources in which a bigger piece for one member creates a shortage for another.72
Multimethodology. A multimethod approach in which quantitative and qualitative methods are integrated creates the opportunity to generate new methods, assure rigor, and maximize the efficiency of new discovery.6,32,33,47 Multimethod approaches allow testing of a priori hypotheses while creating new understanding.
Openness. Openness to emerging insights is fostered by the generalist perspective, by participatory multimethod research approaches, and by building the project from pilot data and knowledge of previous work. In the DOPC study, openness to new methods led to the “Eureka!” moment of deciding to do direct observation. The involvement of clinician and nurse perspectives in study design and conduct and the inductive use of qualitative data to discover the relevance of complexity science to understanding and enhancing primary care practice also reflected the study’s openness to new approaches.
Thinking big, but starting small. This creates a larger vision that can guide and inspire individual decisions and creates an overall research trajectory built on incremental steps. The DOPC Study began with a large idea of improving practice. Grounding in real world practice led to development of innovative new methods to try to understand primary care practice and ongoing efforts to improve practice. These major undertakings, however, were built on a foundation of small pilot studies and multiple interactions among researchers and practicing family physicians.
Applying these insights to other studies may help to advance the generation of new knowledge about family practice and primary care.73
Acknowledgments
This research was supported by grants from the National Cancer Institute (1R01 CA60962, 2R01 CA60962 and K24 CA81931), the Agency for Health Care Policy and Research (1R01 HS08776), the Ohio Academy of Family Physicians, the American Academy of Family Physicians, Generalist Physician Faculty Scholar Awards to Drs Stange and Jaén from the Robert Wood Johnson Foundation, and a Family Practice Research Center Grant from the American Academy of Family Physicians. The authors are grateful to the RAPP physicians, other clinicians, office staffs, and patients, without whose participation our study would not have been possible. We are also indebted to the many people who have participated and continue to participate in the genesis of related ideas and scholarly output that continues to emerge from the original study. Members of the DOPC Writing group also include: Authors from the Academic Research Team: Stephen J. Zyzanski, PhD, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Benjamin F. Crabtree, PhD, Department of Family Medicine, UMDNJ-RWJ Medical School, New Brunswick, NJ; William L. Miller, MD, MA, Department of Family Practice, Lehigh Valley Hospital, Allentown, Pa; Carlos Roberto Jaén, MD, PhD, Center for Urban Research in Primary Care, SUNY, Buffalo, NY; Susan A. Flocke, PhD, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Robert B. Kelly, MD, MS, Department of Family Practice, MetroHealth Medical Center, Cleveland, Ohio; William R. Gillanders, MD, Family Practice Residency Program, Sutter Health, Sacramento, Calif; Valerie Gilchrist, MD, Department of Family Practice, NorthEast Ohio Universities College of Medicine, Rootstown, Ohio; Jason Chao, MD, MS, Department of Family Medicine, Case Western Reserve University; J. Christopher Shank, MD, Methodist/Indiana University Family Practice Residency, Indianapolis, Ind; Daniel L. Dunn, PhD, Integrated Health Care Information Service, Cambridge, Mass; Jack H. Medalie, MD, MPH, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Doreen Langa, BA, American University School of Law, Washington, DC; Virginia Aita, PhD, Department of Family Practice, University of Nebraska Medical Center, Omaha; Meredith A. Goodwin, MS, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; and Robin S. Gotler, MA, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio. Research Nurse Team Authors: Lisa B. Ballou, RN, FNP; Catherine M. Corrigan, RN; Luzmaria Jaén, RN; Sherry Patzke, RN; Frances F. Powers, RN; Kathleen L. Schneeberger, RN; Kelly Warner, RN; and Susan Zronek, RN. Authors from the RAPP Board of Directors: Robert Blankfield, MD; Henry Bloom, MD; Valerie Gilchrist, MD; Gwen Haas, MD; Patricia Kellner, MD; Sa Koo Lee, MD; Conrad Lindes, MD; Dennis McCluskey, MD; Thomas Mettee, MD; Albert Miller, MD; Michael Rabovsky, MD; and Archie Wilkinson, MD.
1. Stange KC, Zyzanski SJ, Smith TF, et al. How valid are medical records and patient questionnaires for physician profiling and health services research? A comparison with direct observation of patient visits. Med Care 1998;36:851-67.
2. Stange KC, Zyzanski SJ, Flocke SA, et al. Illuminating the black box: a description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.
3. Flocke SA. Measuring attributes of primary care: development of a new instrument. J Fam Pract 1997;45:64-74.
4. Flocke SA. Primary care instrument. J Fam Pract 1998;46:12.
5. McIlvain H, Crabtree BF, Medder J, et al. Using ‘practice genograms’ to understand and describe practice configurations. Fam Med 1998;30:490-96.
6. Crabtree BF, Miller WL. Doing qualitative research. 2nd ed. Newbury Park, Calif: Sage Publications; 1999.
7. Jaén CR, Stange KC, Nutting PA. The competing demands of primary care: a model for the delivery of clinical preventive services. J Fam Pract 1994;38:166-71.
8. Miller WL, Crabtree BF, McDaniel R, Stange KC. Understanding change in primary care practice using complexity theory. J Fam Pract 1998;46:369-76.
9. Stange KC, Jaén CR, Flocke SA, Miller WL, Crabtree BF, Zyzanski SJ. The value of a family physician. J Fam Pract 1998;46:363-68.
10. Crabtree BF, Miller WL, Aita V, Flocke SA, Stange KC. Primary care practice organization and preventive services delivery: a qualitative analysis. J Fam Pract 1998;46:403-09.
11. Flocke SA, Goodwin MA, Stange KC. The effect of a secondary patient on the family practice visit. J Fam Pract 1998;46:429-34.
12. Flocke SA, Stange KC, Zyzanski SJ. The association of attributes of primary care with preventive service delivery. Med Care 1998;36:AS21-30.
13. Flocke SA, Stange KC, Goodwin MA. Patient and visit characteristics associated with opportunistic preventive services delivery. J Fam Pract 1998;47:202-08.
14. Medalie JH, Zyzanski SJ, Langa DM, Stange KC. The family in family practice: is it a reality? Results of a multi-faceted study. J Fam Pract 1998;46:390-96.
15. Callahan EJ, Jaén CR, Goodwin MA, Crabtree BF, Stange KC. The impact of recent emotional distress and diagnosis of depression or anxiety on the physician-patient encounter in family practice. J Fam Pract 1998;46:410-18.
16. Gross DA, Stange KC, Zyzanski SJ, Cebul R, Borawski E. Patient satisfaction with time spent with them by their family physician. J Fam Pract 1998;47:133-37.
17. Jaén CR, Crabtree BF, Zyzanski SJ, Stange KC. Making time for tobacco cessation counseling. J Fam Pract 1998;46:425-28.
18. Stange KC, Flocke SA, Goodwin MA. Opportunistic preventive service delivery: are time limitations and patient satisfaction barriers? J Fam Pract 1998;46:419-24.
19. Goodwin MA, Flocke SA, Borawski EA, Zyzanski SJ, Stange KC. Direct observation of preventive service delivery to adolescents seen in community family practice. Arch Pediatr Adolesc Med 1999;153:367-73.
20. Podl T, Goodwin MA, Kikano GE, Stange KC. Direct observation of exercise counseling in community family practice. Am J Prev Med 1999;17:207-10.
21. Medalie JH, Zyzanski SJ, Goodwin MA, Stange KC. Two physician styles of focusing on the family: their relationship to patient outcomes and process of care. J Fam Pract 2000;49:209-15.
22. Stange KC, Flocke SA, Goodwin MA, Kelly R, Zyzanski SJ. Direct observation of preventive service delivery in community family practice. Prev Med 2000;31:167-76.
23. Gotler RS, Flocke SA, Goodwin MA, Zyzanski SJ, Murray T, Stange KC. Facilitating participatory decision-making: what happens in real-world community family practice? Med Care 2000;38:1200-09.
24. Flocke SA, Stange KC, Zyzanski SJ. The impact of insurance type and forced discontinuity on the delivery of primary care. J Fam Pract 1997;45:129-35.
25. Zyzanski SJ, Langa DM, Flocke SA, Stange KC. Trade-offs in high volume primary care practice. J Fam Pract 1998;46:397-402.
26. Chao J, Gillanders WR, Goodwin MA, Stange KC. Billing for physician services: a comparison of actual billing with CPT codes assigned by direct observation. J Fam Pract 1998;47:28-32.
27. Kikano GE, Goodwin MA, Stange KC. Physician employment status and patterns of care. J Fam Pract 1998;46:499-505.
28. Frank SH, Stange KC, Langa DM, Workings M. Direct observation of community-based ambulatory encounters involving medical students. JAMA 1997;278:712-16.
29. Kikano GE, Goodwin MA, Stange KC. Evaluation and management services: a comparison of medical record documentation with actual billing in community family practice. Arch Fam Med 2000;9:68-71.
30. Aita VA, Crabtree BF. Historical reflections on current preventive practice. Prev Med 2000;30:5-16.
31. Acheson LS, Goodwin MA, Wiesner G, Stange KC. Familial screening for cancer risk by community family physicians. Genetic Med 2000;2:180-85.
32. Stange KC, Zyzanski SJ. The integrated use of quantitative and qualitative research methods. Fam Med 1989;21:448-51.
33. Stange KC, Miller WL, Crabtree BF, O’Connor P, Zyzanski SJ. Multimethod research: approaches for integrating qualitative and quantitative methods. J Gen Intern Med 1994;9:278-82.
34. Stange KC, Kelly RB, Chao JC, et al. Physician agreement with US Preventive Services Task Force recommendations. J Fam Pract 1991;34:409-16.
35. Zyzanski SJ, Stange KC, Kelly RB, et al. Family physicians’ disagreements with the US Preventive Services Task Force recommendations. J Fam Pract 1994;39:140-47.
36. Flocke SA, Stange KC, Fedirko T. Dissemination of the US preventive service task force guidelines. Arch Fam Med 1994;3:1006-08.
37. Stange K. Engaging providers and patients in prevention program design, implementation and operation. In: St. Peter R, Heiser N, eds. Delivering women’s preventive services under managed care. Washington, DC: Mathematica Policy Research, Inc; 1996;19-25.
38. Stange KC. “One size doesn’t fit all”: multimethod research yields new insights into interventions to improve preventive service delivery in family practice. J Fam Pract 1996;43:358-60.
39. McVea K, Crabtree BF, Medder JD, Susman JL, Lukas L, McIlvain HE, Davis CM, Gilbert CS, Hawver M. An ounce of prevention? Evaluation of the put prevention into practice program. J Fam Pract 1996;43:361-69.
40. Nutting PA. New knowledge, new tools: a look inside the ‘black box’ of family practice. J Fam Pract 1998;46:361.
41. Green LA, Hames CG, Nutting PA. Potential of practice based research networks: Experiences from ASPN. J Fam Pract 1994;38:400-06.
42. Nutting PA, Green LA. And the evidence continues to establish the feasibility of practice-based research. Fam Med 1993;25:434-36.
43. Nutting PA, Green LA. Practice-based research networks: reuniting practice and research around the problems most of the people have most of the time. J Fam Pract 1994;38:335-36.
44. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-88.
45. Stange KC. Practice-based research networks: their current level of validity, generalizability, and potential for wider application. Arch Fam Med 1993;2:921-23.
46. Callahan EJ, Bertakis KD. Development and validation of the Davis Observation Code. Fam Med 1991;23:19-24.
47. Crabtree BF, Miller WL. Doing qualitative research. Newbury Park, Calif: Sage Publications; 1992.
48. Hsiao WC, Braun P, Dunn DL, et al. An overview of the development and refinement of the resource-based relative value scale. Med Care 1992;30:NS1-12.
49. Gilchrist V, Miller RS, Gillanders WL, et al. Does family practice at residency teaching sites reflect community practice? J Fam Pract 1993;37:555-63.
50. Miller WL, Crabtree BF. Clinical research: a multi-method typology and qualitative roadmap. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;3-30.
51. Borkan J. Immersion/crystallization. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;179-94.
52. Crabtree BF, Miller WL. Using codes and code manuals: a template organizing style of interpretation. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;163-77.
53. Crabtree BF, Miller WL. Researching practice settings: a case study approach. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;293-312.
54. McIlvain HE, Crabtree BF, Gilbert C, Havranek R, Backer E. Current trends in tobacco prevention and cessation in Nebraska’s physician’s offices. J Fam Pract 1997;44:193-202.
55. McIlvain HE, Susman JL, Davis C, Gilbert C. Physician counseling for smoking cessation: Is the glass half empty? J Fam Pract 1995;40:148-52.
56. Medder JD, Susman JL, Gilbert C, et al. Dissemination and implementation of put prevention into practice: success or failure? Am J Prev Med 1997;13:345-51.
57. Dietrich AJ, O’Connor GT, Keller A, Carney PA, Levy D. Cancer: improving early detection and prevention. A community practice randomised trial. BMJ 1992;304:687-91.
58. Carney PA, Dietrich AJ, Keller A, Landgraf J, O’Connor GT. Tools, teamwork, and tenacity: an office system for cancer prevention. J Fam Pract 1992;35:388-94.
59. Goodwin MA, Zyzanski SJ, Zronek S, et al. A clinical trial of tailored office systems for preventive service delivery: the Study to Enhance Prevention by Understanding Practice (STEP-UP). Am J Prev Med. In press.
60. Blankfield RP, Finkelhor RS, Alexander J, et al. Etiology and diagnosis of bilateral leg edema in primary care. Am J Med 1998;105:192-97.
61. Blankfield RP, Hudgel DW, Tapolyai AA, Zyzanski SJ. Bilateral leg edema, pulmonary hypertension, and obstructive sleep apnea. Arch Intern Med 2000;160:2357-62.
62. Mettee TM, Martin KB, Williams RL. Tools for community-oriented primary care: a process for linking practice and community data. J Am Board Fam Pract 1998;11:28-33.
63. Orzano AJ, Gregory PM, Nutting PA, Werner JJ, Flocke SA, Stange KC. Care of the secondary patient in family practice: a report from ASPN. J Fam Pract 2001;50:113-16.
64. Flocke SA, Orzano AJ, Selinger A, et al. Does managed care restrictiveness affect the perceived quality of care? A report from ASPN. J Fam Pract 1999;48:762-68.
65. Stange KC. Primary care research: barriers and opportunities. J Fam Pract 1996;42:192-98.
66. Thesen J, Kuzel A. Participatory inquiry. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;269-90.
67. Macaulay AC, Gibson N, Commanda L, McCabe M, Robbins C, Twohig P. Responsible research with communities: participatory research in primary care. Available at: views.vcu.edu/views/fap/napcrg.html. North American Primary Care Research Group; 1998.
68. Nutting PA, Baier M, Werner JJ, Cutter G, Reed FM, Orzano AJ. Practice patterns of family physicians in practice-based research networks: a report from ASPN. Ambulatory Sentinel Practice Network. J Am Board Fam Pract 1999;12:278-84.
69. Nutting PA, Stange KC. Practice-based research: the opportunity to create a learning discipline. In: Rakel RE, ed. The textbook of family practice. St Louis, Mo: WB Saunders; 2001.
70. Macaulay AC, Commanda L, Freeman W, et al. Participatory research maximises community and lay involvement. BMJ 1999;319:774-78.
71. Crabtree BF, Miller WL, Addison RB, Gilchrist VJ, Kuzel A. Exploring collaborative research in primary care. Thousand Oaks, Calif: Sage Publications; 1994.
72. Covey S. The seven habits of highly effective people. New York, NY: Simon & Schuster, Inc; 1989.
73. Stange KC, Miller WL, McWhinney IR. Developing the knowledge base of family practice. Fam Med. In press.
CONSENSUS PROCESS: The study participants (academic investigators, clinicians, and research nurses) met in groups. By reflecting on the study process, these groups identified insights that may be useful to other investigators planning or conducting primary care research.
LESSONS: The story of the DOPC study is one of collaboration leading to innovation and the development of ongoing relationships and a persistent research trajectory. Six factors were identified as important to the success of the primary care research process: (1) A generalist perspective; (2) involvement of community practices and practicing clinicians as research partners; (3) commitment to a transdisciplinary team process; (4) a multimethod approach; (5) openness to emerging insights; and (6) thinking big, but starting small.
CONCLUSIONS: A multimethod research process that involves collaboration between practicing clinicians, methodologists, and content experts can simultaneously test a priori hypotheses and discover important new insights about primary care practice.
The Direct Observation of Primary Care (DOPC) Study has contributed to the understanding of family practice and has fostered the development of new primary care research methods1-6 and theoretical perspectives.7-10 The study’s findings have important implications for improving patient care11-23 and developing policies9,24-31 that maximize the impact of a generalist patient-centered approach toward the health of individuals, families, and communities.9 This study has spawned a large portfolio of related inquiry, including an in-depth qualitative study of family practices, a multimethod community practice intervention trial, and a new family practice research center.
The DOPC Study story represents a unique confluence of ideas, people, and opportunities. However, we hope that readers might glean insights relevant to their lines of inquiry and that this article will stimulate the continued development of a unique primary care and family practice research agenda.*
The DOPC Story
Concept Development
In 1988, family practice researchers Kurt Stange and Stephen Zyzanski collaborated on a paper about the benefits of integrating quantitative and qualitative research methods.32 While writing a second manuscript on the topic, they invited Benjamin Crabtree and William Miller (emerging experts on the application of qualitative research methods to primary care) to collaborate.33
At the same time, a group of family physicians and researchers affiliated with Case Western Reserve University in Cleveland, Ohio, was attempting to develop an innovative approach to improving clinical preventive service delivery in practice. With a grant from the Ohio Academy of Family Physicians Foundation, they conducted a survey on family physician agreement with United States Preventive Services Task Force recommendations.34-36 The survey findings, conversations with respondents, and a review of the literature led to the conclusion that current approaches to improving clinical preventive service delivery were limited by a lack of understanding of the true nature of family practice37 and that efforts to improve practice should be preceded by efforts to understand practice.38,39 Before designing an intervention study, further insight into the “black box” of real world family practice was needed.2,40
The research group, which now included family practice academicians, clinicians, and methodologists, began exploring how to develop a research network in the reverse order from the traditionally successful models used by the Ambulatory Sentinel Practice Network (ASPN) and other networks.41-44 Rather than developing the infrastructure around a network of clinicians that performs research by gathering data, this new network was developed around a large descriptive study of the content and context of family practice. Funding for a specific study would be easier to obtain than start-up costs for a practice-based research infrastructure. Also, given the demands of clinical practice,45 a well-funded study in a regional network could collect more extensive data than a study conducted by individual practices, providing opportunities for spinoff studies and other clinician-initiated inquiries. The research team decided to explore research opportunities with national funding agencies, with close communication and extensive input from local practicing family physicians.
During this time, a series of primary care research conferences sponsored by the Agency for Health Care Policy and Research (AHCPR) provided fertile ground for exploring research ideas and methods from a multidisciplinary perspective. At one conference, research team member Carlos Jaén (an epidemiologist and family physician), team leader Kurt Stange, and Paul Nutting (director of primary care research at AHCPR at the time) discussed the research network and project. The recognition emerged that many worthwhile primary care activities, including preventive service delivery, are not carried out during patient visits because of the competing demands imposed by other activities. This competing demands model17 of preventive service delivery and primary care provided an important initial theoretical framework for what would become the DOPC Study.
Research Design
The research team began refining study questions and developing methods. A critical event occurred during a discussion of methods for measuring the content of outpatient family practice visits. Jason Chao, a family practice academician, enumerated these methods: “…chart review, patient questionnaire, billing data. One could do direct observation, but you can’t do that.” As everyone nodded agreement, his colleague Robert Kelly interrupted, “Why not? Why can’t you do direct observation?” The group listed many good reasons: intrusiveness, unacceptability to patients and clinicians, expense, and the potential to bias behavior. However, the question “Why not?” remained and created a shared sense that direct observation of real world family practices represented an opportunity to make a unique contribution. The group decided to include direct observation as a major measurement technique and to add a methodologic goal of establishing the validity and reliability of nonobservational techniques for assessing the content of outpatient medical practice. An additional advance occurred with the publication of the Davis Observation Code (DOC)46 that classified patient visits into 20 different behavioral codes measured in 15-second intervals. Lead author Edward Callahan agreed to become a collaborator.
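The interval coding described above can be illustrated with a minimal sketch. It is not the published DOC instrument: it assumes that each 15-second interval of a visit is recorded as a set of behavior labels (so more than one may be active at once), and the label names (`history_taking`, `health_education`, `chatting`) and the function `summarize_visit` are hypothetical stand-ins rather than the actual 20 DOC codes.

```python
from collections import Counter

INTERVAL_SECONDS = 15  # interval length stated in the text


def summarize_visit(intervals):
    """Return (visit length in seconds, share of intervals in which each code occurred).

    `intervals` holds one entry per 15-second interval; each entry is a set
    of behavior labels observed during that interval (an assumption made
    for this sketch, not a detail taken from the DOC itself).
    """
    counts = Counter()
    for codes in intervals:
        counts.update(codes)
    n = len(intervals)
    shares = {code: count / n for code, count in counts.items()} if n else {}
    return n * INTERVAL_SECONDS, shares


# Hypothetical one-minute visit (four intervals); labels are illustrative only.
visit = [
    {"history_taking"},
    {"history_taking", "chatting"},
    {"health_education"},
    {"health_education"},
]
duration, shares = summarize_visit(visit)
```

Summarizing each visit as a share of intervals per code, rather than as raw counts, would make visits of different lengths comparable.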
Limited existing research on the content of community primary care practice meant that the group would have difficulty in anticipating all content areas worth measuring and questions worth asking before immersing themselves in community family practice settings. Therefore, Drs Crabtree and Miller were asked to join the team to design a multimethod approach that integrated quantitative and qualitative methods.47 Project design was pursued further in research team meetings, telephone conversations, and interactions with out-of-town collaborators during national professional meetings. These face-to-face meetings were essential to developing the trust, communication, and shared vision necessary for a transdisciplinary multimethod study.
Conversations with local family physicians soon revealed that preventive service delivery, although an important aspect of family practice, was not a sufficiently compelling research question to engage a new practice-based research network. A broader focus, such as the content of family practice, would engage the largest number of clinicians and be less likely to bias clinician behavior during direct observation. At the suggestion of practicing family physician Michael Rabovsky, in whose office the protocol was being pilot tested, the study was expanded to address the Medicare Resource-Based Relative Value System (RBRVS)-based billing system. Health economist Daniel Dunn, who helped develop the RBRVS,48 was invited to participate.
Based on discussions with practicing family physicians, a strategy was developed for recruiting practice-based research network members. Members of the Ohio Academy of Family Physicians (OAFP) in Northeast Ohio were targeted to facilitate easy meeting of practices and travel of study teams to practice sites.45 A letter describing the study and proposed network was sent to all 531 OAFP active members in the area. A total of 138 physicians responded and formed the fledgling Research Association of Practicing Physicians (RAPP) network. A working relationship was established with the NorthEast Ohio Network (NEON), a practice-based research network of 6 community residency training sites affiliated with the NorthEast Ohio Universities Colleges of Medicine, directed by William Gillanders (and later Valerie Gilchrist). NEON physicians were trained in National Ambulatory Medical Care Survey (NAMCS)49 data collection techniques and provided the opportunity to evaluate the validity of the NAMCS methods compared with direct observation. These development activities were supported by considerable in-kind contributions of investigator time from the participants’ institutions.
Pursuit of Funding
A research concept paper was sent to the AHCPR for feedback. The response indicated that intervention studies were more compatible with funding priorities than the proposed observational study. The critique also pointed out “fatal flaws” engendered by direct observation methods and expressed skepticism that community physicians would allow such observation of their practices. These concerns were addressed with pilot data and a strengthened argument about the need for efforts to understand practice before attempting to change it. An investigator-initiated (R01) grant application was submitted to the AHCPR. A secondary assignment to the National Cancer Institute (NCI) was requested because of the clinical preventive service delivery focus and the important potential of understanding family practice and competing demands for the subsequent design of interventions to enhance cancer prevention and control.
The initial application was favorably reviewed and received a priority score near the funding line. In response to advice from NCI and AHCPR program officers, the research team allowed the application to be considered for funding during 3 upcoming NCI council meetings. Regular letters to research network members kept them informed of the funding status. After 1 year of narrowly missing the funding line, the grant application was revised and resubmitted in response to the scientific review committee’s critique, with increased emphasis on the implications of the study for cancer prevention and control. It was funded by the NCI, with an additional grant from the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program, to develop communication and clinician-initiated research in RAPP and additional methodologic and descriptive aims.
Planning and Conduct of Fieldwork
In 1994, more than 3 years after the idea was conceived, the board of directors of the 138-member research network (RAPP) was formally activated. The board’s 14 network volunteer members, several of whom helped develop the study and the network, were active in planning the practical implementation and refinement of the study protocol. When a board member suggested that they review the details of the direct observation measures before the study, the board concluded that as study participants they should not be involved in planning study measures, to avoid biasing their behavior during direct observation.
With funding, 2 logistic aspects of the study gained importance. First, it had been 2 years since the research network was formed. Whether physicians had retained their commitment to participate was unknown. That concern was laid to rest, however, when the vast majority of physicians expressed continued interest in the study. In retrospect, the 2-year struggle to obtain funding helped bond the network and create a sense of ownership and allegiance to the project.
The second major logistic issue was the need to recruit 8 research nurses. Job requirements included excellent interpersonal skills, sensitivity to the demands of real world community family practice, attention to detail in collecting reliable and valid quantitative data from multiple measures, an open-minded observational ability to simultaneously collect qualitative data, willingness to drive to multiple and sometimes distant sites, and interest in a 1-year job at a university salary. Hiring 8 nurses who could meet these requirements and start the 2-month training process on the same date seemed unrealistic at best. Yet, because of word of mouth advertising, the excitement generated by the study, the recent termination of another research project at the university, and the excellent reputation of the department, 8 highly qualified research nurses were found.
During their 8-week training, the research nurses were enlisted as true partners. They helped refine the research protocol and instruments, and items were added to the measures to reflect their interests. Using videotaped encounters, Dr Callahan instructed the nurses in applying the DOC, and they took the lead in adapting it for the study. As the immensity of the quantitative data collection requirements grew, Drs Miller and Crabtree scaled back the qualitative data collection protocol. They trained the research nurses in observational techniques and in how to dictate ethnographic field notes to record unanticipated findings, provide rich descriptions of quantitatively measured variables, and critique the study methods’ accuracy in capturing the phenomena under study.50
Details of the data collection procedure have been reported elsewhere.1,2 Briefly, teams of 2 research nurses spent 1 day observing patient care by each of the 138 participating RAPP members. One nurse obtained verbal informed consent from patients in the waiting room and distributed patient exit questionnaires. The other nurse accompanied the physician, directly observing consecutive visits by consenting patients and recording observations using the DOC and a direct observation checklist. Typically, the nurses exchanged roles after lunch. They returned on a subsequent day to perform medical record reviews for each other’s observed visits and to collect billing data. On the basis of observation and brief interviews with key informants, the research nurses completed a practice environment checklist. They dictated ethnographic field notes immediately after leaving the practice.10
During the course of the fieldwork, research team meetings were held every other week to coordinate logistics and assess and recalibrate inter-rater reliability using videotaped visits and medical records that were not part of the larger study. The high degree of inter-rater reliability achieved with this approach has been reported previously.1
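As an illustration of how agreement between two observers can be quantified, here is a minimal sketch of Cohen's kappa. The choice of statistic is an assumption: the text does not say which reliability measure the DOPC team used. The function `cohen_kappa` and the ratings below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two nurses to eight videotaped intervals.
nurse_1 = ["exam", "exam", "history", "history", "exam", "history", "exam", "history"]
nurse_2 = ["exam", "exam", "history", "exam", "exam", "history", "exam", "history"]
kappa = cohen_kappa(nurse_1, nurse_2)  # 7 of 8 agree; chance agreement is 0.5
```

Unlike raw percent agreement, kappa discounts the agreement two raters would reach by chance, which matters when a few codes dominate the observations.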
After data were collected from each physician, the board of directors met to review study progress and reassess the study protocol. The academic research team, including all consultants, also met to refine the protocol and plan the second round of physician visits. Initial plans called for ongoing analysis of the ethnographic field notes, but this proved to be infeasible because of their large volume and the study demands. However, at the study midpoint, Drs Crabtree and Miller independently analyzed the field notes using an immersion crystallization technique.51 Based on the richness of the information, they developed a template52 for gathering field notes during the second round of physician visits.
Data collection procedures were repeated, and each physician was visited a second time. The 4 months (on average) between visits helped assure that seasonal variations in health problems did not unduly affect the characterization of patient care. After the second data collection visit, physicians completed a detailed questionnaire.
Data Analysis and Production of Scholarly Output
The data were entered by optical scanner and manually verified. Quantitative data analyses were performed by Cleveland research team members in response to the initial research aims and additional questions raised by the research team and research network board. Qualitative data analyses were subcontracted to the University of Nebraska, with additional grant support from the American Academy of Family Physicians for more in-depth analyses.
Multiple papers were begun with diverse lead authorship. Preference in determining paper topics was given to methodologic manuscripts, topics with timely policy implications, and papers for which individual team members had a particular passion. In response to a call from the editorial office of JFP, a proposal for a theme issue on the DOPC Study was made and accepted. The opportunity to publish early scholarly output in one place greatly increased the potential for papers on diverse topics that would help cohesively describe several aspects of the value of family practice. The deadline for the theme issue also made the paper writing a high priority. Of 14 manuscripts accepted after going through peer review, 10 were included in the May 1998 issue of JFP,2,8-11,14,15,17,18,25,40 with one paper published in each of 4 subsequent issues.13,16,26,27 Other analyses and papers have focused on the original research themes, new topics, more complex analyses, and expansion into the non–family practice literature.
Opportunities to propose paper topics have been extended to all study participants, including the academic research team, consultants, and RAPP members. Proposed topics are reviewed for feasibility and potential conflicts with other papers. The data set has spawned 2 master’s theses16,19 and one doctoral dissertation3,4,12,24 and has led to new collaborations with complementary content experts.
Related Research Initiatives
Concurrent with the DOPC study, Dr Crabtree and his colleagues in Nebraska conducted a series of related inquiries.3-5,10,12,24,39,53-56 These studies have provided complementary information and advanced multimethod approaches for studying primary care practice. Close collaboration and open information sharing among the research teams and collaborators have greatly facilitated the discovery of new methods and insights into family practice and have furthered the research trajectory of the collaborating groups. These collaborations spawned the Center for Research in Family Practice and Primary Care, a multisite consortium funded by the American Academy of Family Physicians.
DOPC Study collaborations have led to other research initiatives as well. For example, a desire for more in-depth qualitative data led to a comparative case study of a smaller number of purposively selected practices in Nebraska, funded by the AHCPR with Dr Crabtree as principal investigator. In addition, after reviewing the initial findings of the first round of DOPC data, the RAPP board of directors developed a study of competing demands outside the examination room, which has led to related inquiries.
Based on emerging insights from the DOPC Study on the competing demands of family practice, a competing continuation application was funded by the NCI for a trial to improve clinical preventive service delivery. The Study to Enhance Prevention by Understanding Practice (STEP-UP) was developed with input from the research team and the RAPP board of directors, with collaboration from family practice researchers at Dartmouth, led by Allen Dietrich.57,58 Building on complexity theory–based insights from the DOPC Study,8 STEP-UP uses a multimethod practice assessment to understand the unique attributes of family practices and tailor intervention strategies. This approach increased preventive service delivery rates59 and led to a more comprehensive assessment and improvement strategy that is being evaluated in the delayed intervention group. The participants include DOPC Study practices and new RAPP members.
Continuing efforts to develop the RAPP network have included free continuing medical education conferences for participants in practice-based research and quality improvement projects. An ongoing research network newsletter periodically publishes a 1-page Research Prospectus Worksheet* to encourage research ideas from RAPP members. The Cleveland research team provides rapid turnaround methodological consultation for study proposals, and those involving multiple practices are reviewed by the RAPP board of directors. In addition, RAPP members are encouraged to serve as authors on DOPC papers, and approximately half have provided internal peer review before submission of papers.
Several RAPP members have received external funding for their own research projects. These include studies of causes of bilateral leg edema in family practice,60,61 an evaluation of a family-centered approach to diagnosis and treatment of respiratory infection, a clinical trial of therapeutic touch for carpal tunnel syndrome, and development of practical new methods for community-oriented primary care.62 A recent RAPP study, in collaboration with the NEON network, used a card study methodology to describe the “oh, by the way” phenomenon in which patients raise issues after the clinician thinks the outpatient visit is finished. In addition, the discovery of high rates of care of a secondary patient11 in the DOPC study led to an ASPN card study to elucidate the content of care provided to family members other than the identified patient for an outpatient visit.63 An additional ASPN collaboration, using the Components of Primary Care Instrument3,4 that was developed as part of the DOPC Study, examined the effect of different aspects of managed care on the delivery of 10 elements of quality primary care.64
Lessons Learned from the DOPC Process
Some of the lessons learned from the process of conducting the DOPC study are summarized in Table 1. These lessons can be grouped into 6 categories, as follows.
A generalist perspective. A generalist perspective that places research questions in the context of the competing opportunities and complexity of family practice is needed for true family practice and primary care research.65 Although this perspective is essential if we are to diminish the current chasm between discovery and practice, it has not been supported by those who fund research. One strategy for addressing this funding issue is to identify topics and multimethod approaches that allow simultaneous pursuit of both categorical and generalist perspectives.
Involvement. The involvement of community practices and practicing clinicians as partners is essential for research about primary care practice.66,67 New knowledge from discoveries in the settings in which most people get most of their medical care will help end the dichotomy between research and dissemination. Practice-based research networks can help bridge this gap by asking and answering questions from the perspective and setting in which the findings will be applied.68,69 (It is worth noting, however, that most successful research networks are built around a group of clinicians who are committed to conducting research in their practices. Developing a network around a particular study, as with the RAPP network, requires attention to fostering clinician ideas and nurturing relationships that extend beyond the initial study.) Greater involvement of nonclinician health care professionals, patients, and communities can also increase the relevance of research to meet the population’s health care needs.67,70
Transdisciplinary team process. A transdisciplinary team process in which diverse specialized expertise is integrated toward a common goal can be a tremendous resource for innovation and productivity. Development of a transdisciplinary team is a long-term process that requires trust, shared vision, open leadership, idea sharing, and group meetings. In addition, team members with particular expertise must be willing to commit to creating new knowledge that transcends their disciplinary perspectives.71 Such collaboration creates the mentality of a bigger pie in which the size of each participant’s piece is increased, rather than a mentality of finite resources in which a bigger piece for one member creates a shortage for another.72
Multimethodology. A multimethod approach in which quantitative and qualitative methods are integrated creates the opportunity to generate new methods, assure rigor, and maximize the efficiency of new discovery.6,32,33,47 Multimethod approaches allow testing of a priori hypotheses while creating new understanding.
Openness. Openness to emerging insights is fostered by the generalist perspective, by participatory multimethod research approaches, and by building the project from pilot data and knowledge of previous work. In the DOPC study, openness to new methods led to the “Eureka!” moment of deciding to do direct observation. The involvement of clinician and nurse perspectives in study design and conduct and the inductive use of qualitative data to discover the relevance of complexity science to understanding and enhancing primary care practice also reflected the study’s openness to new approaches.
Thinking big, but starting small. This approach creates a larger vision that can guide and inspire individual decisions, and it creates an overall research trajectory built on incremental steps. The DOPC Study began with a large idea of improving practice. Grounding in real world practice led to development of innovative new methods to try to understand primary care practice and ongoing efforts to improve practice. These major undertakings, however, were built on a foundation of small pilot studies and multiple interactions among researchers and practicing family physicians.
Applying these insights to other studies may help to advance the generation of new knowledge about family practice and primary care.73
Acknowledgments
This research was supported by grants from the National Cancer Institute (1R01 CA60962, 2R01 CA60962 and K24 CA81931), the Agency for Health Care Policy and Research (1R01 HS08776), the Ohio Academy of Family Physicians, the American Academy of Family Physicians, Generalist Physician Faculty Scholar Awards to Drs Stange and Jaén from the Robert Wood Johnson Foundation, and a Family Practice Research Center Grant from the American Academy of Family Physicians. The authors are grateful to the RAPP physicians, other clinicians, office staffs, and patients, without whose participation our study would not have been possible. We are also indebted to the many people who have participated and continue to participate in the genesis of related ideas and scholarly output that continues to emerge from the original study. Members of the DOPC Writing group also include: Authors from the Academic Research Team: Stephen J. Zyzanski, PhD, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Benjamin F. Crabtree, PhD, Department of Family Medicine, UMDNJ-RWJ Medical School, New Brunswick, NJ; William L. Miller, MD, MA, Department of Family Practice, Lehigh Valley Hospital, Allentown, Pa; Carlos Roberto Jaén, MD, PhD, Center for Urban Research in Primary Care, SUNY, Buffalo, NY; Susan A. Flocke, PhD, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Robert B. Kelly, MD, MS, Department of Family Practice, MetroHealth Medical Center, Cleveland, Ohio; William R. Gillanders, MD, Family Practice Residency Program, Sutter Health, Sacramento, Calif; Valerie Gilchrist, MD, Department of Family Practice, NorthEast Ohio Universities College of Medicine, Rootstown, Ohio; Jason Chao, MD, MS, Department of Family Medicine, Case Western Reserve University; J. Christopher Shank, MD, Methodist/Indiana University Family Practice Residency, Indianapolis, Ind; Daniel L. Dunn, PhD, Integrated Health Care Information Service, Cambridge, Mass; Jack H. Medalie, MD, MPH, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Doreen Langa, BA, American University School of Law, Washington, DC; Virginia Aita, PhD, Department of Family Practice, University of Nebraska Medical Center, Omaha; Meredith A. Goodwin, MS, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; and Robin S. Gotler, MA, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio. Research Nurse Team Authors: Lisa B. Ballou, RN, FNP; Catherine M. Corrigan, RN; Luzmaria Jaén, RN; Sherry Patzke, RN; Frances F. Powers, RN; Kathleen L. Schneeberger, RN; Kelly Warner, RN; and Susan Zronek, RN. Authors from the RAPP Board of Directors: Robert Blankfield, MD; Henry Bloom, MD; Valerie Gilchrist, MD; Gwen Haas, MD; Patricia Kellner, MD; Sa Koo Lee, MD; Conrad Lindes, MD; Dennis McCluskey, MD; Thomas Mettee, MD; Albert Miller, MD; Michael Rabovsky, MD; and Archie Wilkinson, MD.
CONSENSUS PROCESS: The study participants (academic investigators, clinicians, and research nurses) met in groups. By reflecting on the study process, these groups identified insights that may be useful to other investigators planning or conducting primary care research.
LESSONS: The story of the DOPC study is one of collaboration leading to innovation and the development of ongoing relationships and a persistent research trajectory. Six factors were identified as important to the success of the primary care research process: (1) A generalist perspective; (2) involvement of community practices and practicing clinicians as research partners; (3) commitment to a transdisciplinary team process; (4) a multimethod approach; (5) openness to emerging insights; and (6) thinking big, but starting small.
CONCLUSIONS: A multimethod research process that involves collaboration between practicing clinicians, methodologists, and content experts can simultaneously test a priori hypotheses and discover important new insights about primary care practice.
The Direct Observation of Primary Care (DOPC) Study has contributed to the understanding of family practice and has fostered the development of new primary care research methods1-6 and theoretical perspectives.7-10 The study’s findings have important implications for improving patient care11-23 and developing policies9,24-31 that maximize the impact of a generalist patient-centered approach toward the health of individuals, families, and communities.9 This study has spawned a large portfolio of related inquiry, including an in-depth qualitative study of family practices, a multimethod community practice intervention trial, and a new family practice research center.
The DOPC Study story represents a unique confluence of ideas, people, and opportunities. However, we hope that readers might glean insights relevant to their lines of inquiry and that this article will stimulate the continued development of a unique primary care and family practice research agenda.*
The DOPC Story
Concept Development
In 1988, family practice researchers Kurt Stange and Stephen Zyzanski collaborated on a paper about the benefits of integrating quantitative and qualitative research methods.32 While writing a second manuscript on the topic, they invited Benjamin Crabtree and William Miller (emerging experts on the application of qualitative research methods to primary care) to collaborate.33
At the same time, a group of family physicians and researchers affiliated with Case Western Reserve University in Cleveland, Ohio, was attempting to develop an innovative approach to improving clinical preventive service delivery in practice. With a grant from the Ohio Academy of Family Physicians Foundation, they conducted a survey on family physician agreement with United States Preventive Services Task Force recommendations.34-36 The survey findings, conversations with respondents, and a review of the literature led to the conclusion that current approaches to improving clinical preventive service delivery were limited by a lack of understanding of the true nature of family practice37 and that efforts to improve practice should be preceded by efforts to understand practice.38,39 Before designing an intervention study, further insight into the “black box” of real world family practice was needed.2,40
The research group, which now included family practice academicians, clinicians, and methodologists, began exploring the development of a research network backward from the traditionally successful models used by the Ambulatory Sentinel Practice Network (ASPN) and other networks.41-44 Rather than developing the infrastructure around a network of clinicians that performs research by gathering data, this new network was developed around a large descriptive study of the content and context of family practice. Funding for a specific study would be easier to obtain than start-up costs for a practice-based research infrastructure. Also, given the demands of clinical practice,45 a well-funded study in a regional network could collect more extensive data than a study conducted by individual practices, providing opportunities for spinoff studies and other clinician-initiated inquiries. The research team decided to explore research opportunities with national funding agencies, with close communications and extensive input from local practicing family physicians.
During this time, a series of primary care research conferences sponsored by the Agency for Health Care Policy and Research (AHCPR) provided fertile ground for exploring research ideas and methods from a multidisciplinary perspective. At one conference, research team member Carlos Jaén (an epidemiologist and family physician), team leader Kurt Stange, and Paul Nutting (director of primary care research at AHCPR at the time) discussed the research network and project. The recognition emerged that many worthwhile primary care activities, including preventive service delivery, are not carried out during patient visits because of the competing demands imposed by other activities. This competing demands model7 of preventive service delivery and primary care provided an important initial theoretical framework for what would become the DOPC Study.
Research Design
The research team began refining study questions and developing methods. A critical event occurred during a discussion of methods for measuring the content of outpatient family practice visits. Jason Chao, a family practice academician, enumerated these methods: “…chart review, patient questionnaire, billing data. One could do direct observation, but you can’t do that.” As everyone nodded agreement, his colleague Robert Kelly interrupted, “Why not? Why can’t you do direct observation?” The group listed many good reasons: intrusiveness, unacceptability to patients and clinicians, expense, and the potential to bias behavior. However, the question “Why not?” remained and created a shared sense that direct observation of real world family practices represented an opportunity to make a unique contribution. The group decided to include direct observation as a major measurement technique and to add a methodologic goal of establishing the validity and reliability of nonobservational techniques for assessing the content of outpatient medical practice. An additional advance occurred with the publication of the Davis Observation Code (DOC)46 that classified patient visits into 20 different behavioral codes measured in 15-second intervals. Lead author Edward Callahan agreed to become a collaborator.
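To illustrate how interval-based coding of this kind can be summarized, the sketch below tallies the time attributed to each behavior code across a visit's 15-second intervals. It is a minimal illustration, not the DOC's published scoring procedure: the function name `seconds_per_code`, the code labels, and the assumption that a single interval may carry more than one code are all hypothetical.

```python
from collections import Counter

INTERVAL_SECONDS = 15  # interval length described for the DOC


def seconds_per_code(interval_codes):
    """Return total seconds per behavior code.

    interval_codes: one collection of code labels per 15-second interval.
    Assumption (not from the source): an interval may carry several codes,
    and each code present is credited with the full interval.
    """
    totals = Counter()
    for codes in interval_codes:
        for code in set(codes):  # count a code at most once per interval
            totals[code] += INTERVAL_SECONDS
    return dict(totals)


# Hypothetical one-minute excerpt of an observed visit (labels are placeholders).
visit = [{"chatting"}, {"history"}, {"history", "exam"}, {"exam"}]
totals = seconds_per_code(visit)
```

Dividing each total by the visit length would give the share of the visit in which each behavior was observed.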
Limited existing research on the content of community primary care practice meant that the group would have difficulty in anticipating all content areas worth measuring and questions worth asking before immersing themselves in community family practice settings. Therefore, Drs Crabtree and Miller were asked to join the team to design a multimethod approach that integrated quantitative and qualitative methods.47 Project design was pursued further in research team meetings, telephone conversations, and interactions with out-of-town collaborators during national professional meetings. These face-to-face meetings were essential to developing the trust, communication, and shared vision necessary for a transdisciplinary multimethod study.
Conversations with local family physicians soon revealed that preventive service delivery, although an important aspect of family practice, was not a sufficiently compelling research question to engage a new practice-based research network. A broader focus, such as the content of family practice, would engage the largest number of clinicians and be less likely to bias clinician behavior during direct observation. At the suggestion of practicing family physician Michael Rabovsky, in whose office the protocol was being pilot tested, the study was expanded to address the Medicare Resource-Based Relative Value System (RBRVS)-based billing system. Health economist Daniel Dunn, who helped develop the RBRVS,48 was invited to participate.
Based on discussions with practicing family physicians, a strategy was developed for recruiting practice-based research network members. Members of the Ohio Academy of Family Physicians (OAFP) in Northeast Ohio were targeted to facilitate easy meeting of practices and travel of study teams to practice sites.45 A letter describing the study and proposed network was sent to all 531 OAFP active members in the area. A total of 138 physicians responded and formed the fledgling Research Association of Practicing Physicians (RAPP) network. A working relationship was established with the NorthEast Ohio Network (NEON), a practice-based research network of 6 community residency training sites affiliated with the NorthEast Ohio Universities College of Medicine, directed by William Gillanders (and later Valerie Gilchrist). NEON physicians were trained in National Ambulatory Medical Care Survey (NAMCS)49 data collection techniques and provided the opportunity to evaluate the validity of the NAMCS methods compared with direct observation. These development activities were supported by considerable in-kind contributions of investigator time from the participants’ institutions.
Pursuit of Funding
A research concept paper was sent to the AHCPR for feedback. The response indicated that intervention studies were more compatible with funding priorities than the proposed observational study. The critique also pointed out “fatal flaws” engendered by direct observation methods and expressed skepticism that community physicians would allow such observation of their practices. These concerns were addressed with pilot data and a strengthened argument about the need for efforts to understand practice before attempting to change it. An investigator-initiated (R01) grant application was submitted to the AHCPR. A secondary assignment to the National Cancer Institute (NCI) was requested because of the clinical preventive service delivery focus and the important potential of understanding family practice and competing demands for the subsequent design of interventions to enhance cancer prevention and control.
The initial application was favorably reviewed and received a priority score near the funding line. In response to advice from NCI and AHCPR program officers, the research team allowed the application to be considered for funding during 3 upcoming NCI council meetings. Regular letters to research network members kept them informed of the funding status. After 1 year of narrowly missing the funding line, the grant application was revised and resubmitted in response to the scientific review committee’s critique, with increased emphasis on the implications of the study for cancer prevention and control. It was funded by the NCI, with an additional grant from the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program, to develop communication and clinician-initiated research in RAPP and additional methodologic and descriptive aims.
Planning and Conduct of Fieldwork
In 1994, more than 3 years after the idea was conceived, the board of directors of the 138-member research network (RAPP) was formally activated. The board’s 14 network volunteer members, several of whom helped develop the study and the network, were active in planning the practical implementation and refinement of the study protocol. When a board member suggested that they review the details of the direct observation measures before the study, the board concluded that as study participants they should not be involved in planning study measures, to avoid biasing their behavior during direct observation.
With funding, 2 logistic aspects of the study gained importance. First, it had been 2 years since the research network was formed. Whether physicians had retained their commitment to participate was unknown. That concern was laid to rest, however, when the vast majority of physicians expressed continued interest in the study. In retrospect, the 2-year struggle to obtain funding helped bond the network and create a sense of ownership and allegiance to the project.
The second major logistic issue was the need to recruit 8 research nurses. Job requirements included excellent interpersonal skills, sensitivity to the demands of real world community family practice, attention to detail in collecting reliable and valid quantitative data from multiple measures, an open-minded observational ability to simultaneously collect qualitative data, willingness to drive to multiple and sometimes distant sites, and interest in a 1-year job at a university salary. Hiring 8 nurses who could meet these requirements and start the 2-month training process on the same date seemed unrealistic at best. Yet, because of word of mouth advertising, the excitement generated by the study, the recent termination of another research project at the university, and the excellent reputation of the department, 8 highly qualified research nurses were found.
During their 8-week training, the research nurses were enlisted as true partners. They helped refine the research protocol and instruments, and items were added to the measures to reflect their interests. Using videotaped encounters, Dr Callahan instructed the nurses in applying the DOC, and they took the lead in adapting it for the study. As the immensity of the quantitative data collection requirements grew, Drs Miller and Crabtree scaled back the qualitative data collection protocol. They trained the research nurses in observational techniques and in how to dictate ethnographic field notes to record unanticipated findings, provide rich descriptions of quantitatively measured variables, and critique the study methods’ accuracy in capturing the phenomena under study.50
Details of the data collection procedure have been reported elsewhere.1,2 Briefly, teams of 2 research nurses spent 1 day observing patient care by the 138 participating RAPP members. One nurse obtained verbal informed consent from patients in the waiting room and distributed patient exit questionnaires. The other nurse accompanied the physician, directly observing consecutive visits by consenting patients and recording observations using the DOC and a direct observation checklist. Typically, the nurses exchanged roles after lunch. They returned on a subsequent day to perform medical record reviews for each other’s observed visits and to collect billing data. On the basis of observation and brief interviews with key informants, the research nurses completed a practice environment checklist. They dictated ethnographic field notes immediately after leaving the practice.10
During the course of the fieldwork, research team meetings were held every other week to coordinate logistics and assess and recalibrate inter-rater reliability using videotaped visits and medical records that were not part of the larger study. The high degree of inter-rater reliability achieved with this approach has been reported previously.1
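One common way to quantify agreement between two observers is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The source does not say which reliability statistic the study used, so the sketch below is only an illustration; the function name `cohen_kappa`, the category labels, and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two nurses to the same six videotaped intervals.
nurse_1 = ["history", "exam", "exam", "counseling", "history", "exam"]
nurse_2 = ["history", "exam", "counseling", "counseling", "history", "exam"]
kappa = cohen_kappa(nurse_1, nurse_2)
```

Here the raters agree on 5 of 6 intervals and chance agreement is 1/3, so kappa is 0.75. Recomputing such a statistic at each team meeting is one way drift between observers could be detected and corrected.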
After data were collected from each physician, the board of directors met to review study progress and reassess the study protocol. The academic research team, including all consultants, also met to refine the protocol and plan the second round of physician visits. Initial plans called for ongoing analysis of the ethnographic field notes, but this proved to be infeasible because of their large volume and the study demands. However, at the study midpoint, Drs Crabtree and Miller independently analyzed the field notes using an immersion crystallization technique.51 Based on the richness of the information, they developed a template52 for gathering field notes during the second round of physician visits.
Data collection procedures were repeated, and each physician was visited a second time. The 4 months (on average) between visits helped assure that seasonal variations in health problems did not unduly affect the characterization of patient care. After the second data collection visit, physicians completed a detailed questionnaire.
Data Analysis and Production of Scholarly Output
The data were entered by optical scanner and manually verified. Quantitative data analyses were performed by Cleveland research team members in response to the initial research aims and additional questions raised by the research team and research network board. Qualitative data analyses were subcontracted to the University of Nebraska, with additional grant support from the American Academy of Family Physicians for more in-depth analyses.
Multiple papers were begun with diverse lead authorship. Preference in determining paper topics was given to methodologic manuscripts, topics with timely policy implications, and papers for which individual team members had a particular passion. In response to a call from the editorial office of JFP, a proposal for a theme issue on the DOPC Study was made and accepted. The opportunity to publish early scholarly output in one place greatly increased the potential for papers on diverse topics that would help cohesively describe several aspects of the value of family practice. The deadline for the theme issue also made the paper writing a high priority. Of 14 manuscripts accepted after going through peer review, 10 were included in the May 1998 issue of JFP,2,8-11,14,15,17,18,25,40 with one paper published in each of 4 subsequent issues.13,16,26,27 Other analyses and papers have focused on the original research themes, new topics, more complex analyses, and expansion into the non–family practice literature.
Opportunities to propose paper topics have been extended to all study participants, including the academic research team, consultants, and RAPP members. Proposed topics are reviewed for feasibility and potential conflicts with other papers. The data set has spawned 2 master’s theses16,19 and one doctoral dissertation3,4,12,24 and has led to new collaborations with complementary content experts.
Related Research Initiatives
Concurrent with the DOPC study, Dr Crabtree and his colleagues in Nebraska conducted a series of related inquiries.3-5,10,12,24,39,53-56 These studies have provided complementary information and advanced multimethod approaches for studying primary care practice. Close collaboration and open information sharing among the research teams and collaborators have greatly facilitated the discovery of new methods and insights into family practice and have furthered the research trajectory of the collaborating groups. These collaborations spawned the Center for Research in Family Practice and Primary Care, a multisite consortium funded by the American Academy of Family Physicians.
DOPC Study collaborations have led to other research initiatives as well. For example, a desire for more in-depth qualitative data led to a comparative case study of a smaller number of purposively selected practices in Nebraska, funded by the AHCPR with Dr Crabtree as principal investigator. In addition, after reviewing the initial findings of the first round of DOPC data, the RAPP board of directors developed a study of competing demands outside the examination room, which has led to related inquiries.
Based on emerging insights from the DOPC Study on the competing demands of family practice, a competing continuation application was funded by the NCI for a trial to improve clinical preventive service delivery. The Study to Enhance Prevention by Understanding Practice (STEP-UP) was developed with input from the research team and the RAPP board of directors, with collaboration from family practice researchers at Dartmouth, led by Allen Dietrich.57,58 Building on complexity theory–based insights from the DOPC Study,8 STEP-UP uses a multimethod practice assessment to understand the unique attributes of family practices and tailor intervention strategies. This approach increased preventive service delivery rates59 and led to a more comprehensive assessment and improvement strategy that is being evaluated in the delayed intervention group. The participants include DOPC Study practices and new RAPP members.
Continuing efforts to develop the RAPP network have included free continuing medical education conferences for participants in practice-based research and quality improvement projects. An ongoing research network newsletter periodically publishes a 1-page Research Prospectus Worksheet* to encourage research ideas from RAPP members. The Cleveland research team provides rapid turnaround methodological consultation for study proposals, and those involving multiple practices are reviewed by the RAPP board of directors. In addition, RAPP members are encouraged to serve as authors on DOPC papers, and approximately half have provided internal peer review before submission of papers.
Several RAPP members have received external funding for their own research projects. These include studies of causes of bilateral leg edema in family practice,60,61 an evaluation of a family-centered approach to diagnosis and treatment of respiratory infection, a clinical trial of therapeutic touch for carpal tunnel syndrome, and development of practical new methods for community-oriented primary care.62 A recent RAPP study, in collaboration with the NEON network, used a card study methodology to describe the “oh, by the way” phenomenon in which patients raise issues after the clinician thinks the outpatient visit is finished. In addition, the discovery of high rates of care of a secondary patient11 in the DOPC study led to an ASPN card study to elucidate the content of care provided to family members other than the identified patient for an outpatient visit.63 An additional ASPN collaboration, using the Components of Primary Care Instrument3,4 that was developed as part of the DOPC Study, examined the effect of different aspects of managed care on the delivery of 10 elements of quality primary care.64
Lessons Learned from the DOPC Process
Some of the lessons learned from the process of conducting the DOPC study are summarized in Table 1. These lessons can be grouped into 6 categories, as follows.
A generalist perspective. A generalist perspective that places research questions in the context of the competing opportunities and complexity of family practice is needed for true family practice and primary care research.65 Although this perspective is essential if we are to diminish the current chasm between discovery and practice, it has not been supported by those who fund research. One strategy for addressing this funding issue is to identify topics and multimethod approaches that allow simultaneous pursuit of both categorical and generalist perspectives.
Involvement. The involvement of community practices and practicing clinicians as partners is essential for research about primary care practice.66,67 New knowledge from discoveries in the settings in which most people get most of their medical care will help end the dichotomy between research and dissemination. Practice-based research networks can help bridge this gap by asking and answering questions from the perspective and setting in which the findings will be applied.68,69 (It is worth noting, however, that most successful research networks are built around a group of clinicians who are committed to conducting research in their practices. Developing a network around a particular study, as with the RAPP network, requires attention to fostering clinician ideas and nurturing relationships that extend beyond the initial study.) Greater involvement of nonclinician health care professionals, patients, and communities can also increase the relevance of research to meet the population’s health care needs.67,70
Transdisciplinary team process. A transdisciplinary team process in which diverse specialized expertise is integrated toward a common goal can be a tremendous resource for innovation and productivity. Development of a transdisciplinary team is a long-term process that requires trust, shared vision, open leadership, idea sharing, and group meetings. In addition, team members with particular expertise must be willing to commit to creating new knowledge that transcends their disciplinary perspectives.71 Such collaboration creates the mentality of a bigger pie in which the size of each participant’s piece is increased, rather than a mentality of finite resources in which a bigger piece for one member creates a shortage for another.72
Multimethodology. A multimethod approach in which quantitative and qualitative methods are integrated creates the opportunity to generate new methods, assure rigor, and maximize the efficiency of new discovery.6,32,33,47 Multimethod approaches allow testing of a priori hypotheses while creating new understanding.
Openness. Openness to emerging insights is fostered by the generalist perspective, by participatory multimethod research approaches, and by building the project from pilot data and knowledge of previous work. In the DOPC study, openness to new methods led to the “Eureka!” moment of deciding to do direct observation. The involvement of clinician and nurse perspectives in study design and conduct and the inductive use of qualitative data to discover the relevance of complexity science to understanding and enhancing primary care practice also reflected the study’s openness to new approaches.
Thinking big, but starting small. Thinking big while starting small creates a larger vision that can guide and inspire individual decisions, and it builds an overall research trajectory from incremental steps. The DOPC Study began with a large idea of improving practice. Grounding in real world practice led to development of innovative new methods to try to understand primary care practice and ongoing efforts to improve practice. These major undertakings, however, were built on a foundation of small pilot studies and multiple interactions among researchers and practicing family physicians.
Applying these insights to other studies may help to advance the generation of new knowledge about family practice and primary care.73
Acknowledgments
This research was supported by grants from the National Cancer Institute (1R01 CA60962, 2R01 CA60962 and K24 CA81931), the Agency for Health Care Policy and Research (1R01 HS08776), the Ohio Academy of Family Physicians, the American Academy of Family Physicians, Generalist Physician Faculty Scholar Awards to Drs Stange and Jaén from the Robert Wood Johnson Foundation, and a Family Practice Research Center Grant from the American Academy of Family Physicians. The authors are grateful to the RAPP physicians, other clinicians, office staffs, and patients, without whose participation our study would not have been possible. We are also indebted to the many people who have participated and continue to participate in the genesis of related ideas and scholarly output that continues to emerge from the original study. Members of the DOPC Writing group also include: Authors from the Academic Research Team: Stephen J. Zyzanski, PhD, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Benjamin F. Crabtree, PhD, Department of Family Medicine, UMDNJ-RWJ Medical School, New Brunswick, NJ; William L. Miller, MD, MA, Department of Family Practice, Lehigh Valley Hospital, Allentown, Pa; Carlos Roberto Jaén, MD, PhD, Center for Urban Research in Primary Care, SUNY, Buffalo, NY; Susan A. Flocke, PhD, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Robert B. Kelly, MD, MS, Department of Family Practice, MetroHealth Medical Center, Cleveland, Ohio; William R. Gillanders, MD, Family Practice Residency Program, Sutter Health, Sacramento, Calif; Valerie Gilchrist, MD, Department of Family Practice, NorthEast Ohio Universities College of Medicine, Rootstown, Ohio; Jason Chao, MD, MS, Department of Family Medicine, Case Western Reserve University; J. Christopher Shank, MD, Methodist/Indiana University Family Practice Residency, Indianapolis, Ind; Daniel L. Dunn, PhD, Integrated Health Care Information Service, Cambridge, Mass; Jack H. Medalie, MD, MPH, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; Doreen Langa, BA, American University School of Law, Washington, DC; Virginia Aita, PhD, Department of Family Practice, University of Nebraska Medical Center, Omaha; Meredith A. Goodwin, MS, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio; and Robin S. Gotler, MA, Department of Family Medicine, Case Western Reserve University, Cleveland, Ohio. Research Nurse Team Authors: Lisa B. Ballou, RN, FNP; Catherine M. Corrigan, RN; Luzmaria Jaén, RN; Sherry Patzke, RN; Frances F. Powers, RN; Kathleen L. Schneeberger, RN; Kelly Warner, RN; and Susan Zronek, RN. Authors from the RAPP Board of Directors: Robert Blankfield, MD; Henry Bloom, MD; Valerie Gilchrist, MD; Gwen Haas, MD; Patricia Kellner, MD; Sa Koo Lee, MD; Conrad Lindes, MD; Dennis McCluskey, MD; Thomas Mettee, MD; Albert Miller, MD; Michael Rabovsky, MD; and Archie Wilkinson, MD.
1. Stange KC, Zyzanski SJ, Smith TF, et al. How valid are medical records and patient questionnaires for physician profiling and health services research? A comparison with direct observation of patient visits. Med Care 1998;36:851-67.
2. Stange KC, Zyzanski SJ, Flocke SA, et al. Illuminating the black box: a description of 4454 patient visits to 138 family physicians. J Fam Pract 1998;46:377-89.
3. Flocke SA. Measuring attributes of primary care: development of a new instrument. J Fam Pract 1997;45:64-74.
4. Flocke SA. Primary care instrument. J Fam Pract 1998;46:12.
5. McIlvain H, Crabtree BF, Medder J, et al. Using ‘practice genograms’ to understand and describe practice configurations. Fam Med 1998;30:490-96.
6. Crabtree BF, Miller WL. Doing qualitative research. 2nd ed. Newbury Park, Calif: Sage Publications; 1999.
7. Jaén CR, Stange KC, Nutting PA. The competing demands of primary care: a model for the delivery of clinical preventive services. J Fam Pract 1994;38:166-71.
8. Miller WL, Crabtree BF, McDaniel R, Stange KC. Understanding change in primary care practice using complexity theory. J Fam Pract 1998;46:369-76.
9. Stange KC, Jaén CR, Flocke SA, Miller WL, Crabtree BF, Zyzanski SJ. The value of a family physician. J Fam Pract 1998;46:363-68.
10. Crabtree BF, Miller WL, Aita V, Flocke SA, Stange KC. Primary care practice organization and preventive services delivery: a qualitative analysis. J Fam Pract 1998;46:403-09.
11. Flocke SA, Goodwin MA, Stange KC. The effect of a secondary patient on the family practice visit. J Fam Pract 1998;46:429-34.
12. Flocke SA, Stange KC, Zyzanski SJ. The association of attributes of primary care with preventive service delivery. Med Care 1998;36:AS21-30.
13. Flocke SA, Stange KC, Goodwin MA. Patient and visit characteristics associated with opportunistic preventive services delivery. J Fam Pract 1998;47:202-08.
14. Medalie JH, Zyzanski SJ, Langa DM, Stange KC. The family in family practice: is it a reality? Results of a multi-faceted study. J Fam Pract 1998;46:390-96.
15. Callahan EJ, Jaén CR, Goodwin MA, Crabtree BF, Stange KC. The impact of recent emotional distress and diagnosis of depression or anxiety on the physician-patient encounter in family practice. J Fam Pract 1998;46:410-18.
16. Gross DA, Stange KC, Zyzanski SJ, Cebul R, Borawski E. Patient satisfaction with time spent with them by their family physician. J Fam Pract 1998;46:133-37.
17. Jaén CR, Crabtree BF, Zyzanski SJ, Stange KC. Making time for tobacco cessation counseling. J Fam Pract 1998;46:425-28.
18. Stange KC, Flocke SA, Goodwin MA. Opportunistic preventive service delivery: are time limitations and patient satisfaction barriers? J Fam Pract 1998;46:419-24.
19. Goodwin MA, Flocke SA, Borawski EA, Zyzanski SJ, Stange KC. Direct observation of preventive service delivery to adolescents seen in community family practice. Arch Pediatr Adolesc Med 1999;153:367-73.
20. Podl T, Goodwin MA, Kikano GE, Stange KC. Direct observation of exercise counseling in community family practice. Am J Prev Med 1999;17:207-10.
21. Medalie JH, Zyzanski SJ, Goodwin MA, Stange KC. Two physician styles of focusing on the family: their relationship to patient outcomes and process of care. J Fam Pract 2000;49:209-15.
22. Stange KC, Flocke SA, Goodwin MA, Kelly R, Zyzanski SJ. Direct observation of preventive service delivery in community family practice. Prev Med 2000;31:167-76.
23. Gotler RS, Flocke SA, Goodwin MA, Zyzanski SJ, Murray T, Stange KC. Facilitating participatory decision-making: what happens in real-world community family practice? Med Care 2000;38:1200-09.
24. Flocke SA, Stange KC, Zyzanski SJ. The impact of insurance type and forced discontinuity on the delivery of primary care. J Fam Pract 1997;45:129-35.
25. Zyzanski SJ, Langa DM, Flocke SA, Stange KC. Trade-offs in high volume primary care practice. J Fam Pract 1998;46:397-402.
26. Chao J, Gillanders WR, Goodwin MA, Stange KC. Billing for physician services: a comparison of actual billing with CPT codes assigned by direct observation. J Fam Pract 1998;47:28-32.
27. Kikano GE, Goodwin MA, Stange KC. Physician employment status and patterns of care. J Fam Pract 1998;46:499-505.
28. Frank SH, Stange KC, Langa DM, Workings M. Direct observation of community-based ambulatory encounters involving medical students. JAMA 1997;278:712-16.
29. Kikano GE, Goodwin MA, Stange KC. Evaluation and management services: a comparison of medical record documentation with actual billing in community family practice. Arch Fam Med 2000;9:68-71.
30. Aita VA, Crabtree BF. Historical reflections on current preventive practice. Prev Med 2000;30:5-16.
31. Acheson LS, Goodwin MA, Wiesner G, Stange KC. Familial screening for cancer risk by community family physicians. Genetic Med 2000;2:180-85.
32. Stange KC, Zyzanski SJ. The integrated use of quantitative and qualitative research methods. Fam Med 1989;21:448-51.
33. Stange KC, Miller WL, Crabtree BF, O’Connor P, Zyzanski SJ. Multimethod research: approaches for integrating qualitative and quantitative methods. J Gen Intern Med 1994;9:278-82.
34. Stange KC, Kelly RB, Chao JC, et al. Physician agreement with US Preventive Services Task Force recommendations. J Fam Pract 1991;34:409-16.
35. Zyzanski SJ, Stange KC, Kelly RB, et al. Family physicians’ disagreements with the US Preventive Services Task Force recommendations. J Fam Pract 1994;39:140-47.
36. Flocke SA, Stange KC, Fedirko T. Dissemination of the US preventive service task force guidelines. Arch Fam Med 1994;3:1006-08.
37. Stange K. Engaging providers and patients in prevention program design, implementation and operation. In: St. Peter R, Heiser N, eds. Delivering women’s preventive services under managed care. Washington, DC: Mathematica Policy Research, Inc; 1996;19-25.
38. Stange KC. “One size doesn’t fit all”: multimethod research yields new insights into interventions to improve preventive service delivery in family practice. J Fam Pract 1996;43:358-60.
39. McVea K, Crabtree BF, Medder JD, Susman JL, Lukas L, McIlvain HE, Davis CM, Gilbert CS, Hawver M. An ounce of prevention? Evaluation of the put prevention into practice program. J Fam Pract 1996;43:361-69.
40. Nutting PA. New knowledge, new tools: a look inside the ‘black box’ of family practice. J Fam Pract 1998;46:361.
41. Green LA, Hames CG, Nutting PA. Potential of practice based research networks: Experiences from ASPN. J Fam Pract 1994;38:400-06.
42. Nutting PA, Green LA. And the evidence continues to establish the feasibility of practice-based research. Fam Med 1993;25:434-36.
43. Nutting PA, Green LA. Practice-based research networks: reuniting practice and research around the problems most of the people have most of the time. J Fam Pract 1994;38:335-36.
44. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-88.
45. Stange KC. Practice-based research networks: their current level of validity, generalizability, and potential for wider application. Arch Fam Med 1993;2:921-23.
46. Callahan EJ, Bertakis KD. Development and validation of the Davis Observation Code. Fam Med 1991;23:19-24.
47. Crabtree BF, Miller WL. Doing qualitative research. Newbury Park, Calif: Sage Publications; 1992.
48. Hsiao WC, Braun P, Dunn DL, et al. An overview of the development and refinement of the resource-based relative value scale. Med Care 1992;30:NS1-12.
49. Gilchrist V, Miller RS, Gillanders WL, et al. Does family practice at residency teaching sites reflect community practice? J Fam Pract 1993;37:555-63.
50. Miller WL, Crabtree BF. Clinical research: a multi-method typology and qualitative roadmap. In: Crabtree B F, Miller W L, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;3-30.
51. Borkan J. Immersion/crystallization. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;179-94.
52. Crabtree BF, Miller WL. Using codes and code manuals: a template organizing style of interpretation. In: Crabtree B F, Miller W L, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;163-77.
53. Crabtree BF, Miller WL. Researching practice settings: a case study approach. In: Crabtree BF, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;293-312.
54. McIlvain HE, Crabtree BF, Gilbert C, Havranek R, Backer E. Current trends in tobacco prevention and cessation in Nebraska’s physician’s offices. J Fam Pract 1997;44:193-202.
55. McIlvain HE, Susman JL, Davis C, Gilbert C. Physician counseling for smoking cessation: Is the glass half empty? J Fam Pract 1995;40:148-52.
56. Medder JD, Susman JL, Gilbert C, et al. Dissemination and implementation of put prevention into practice: success or failure? Am J Prev Med 1997;13:345-51.
57. Dietrich AJ, O’Connor GT, Keller A, Carney PA, Levy D. Cancer: improving early detection and prevention. A community practice randomised trial. BMJ 1992;304:687-91.
58. Carney PA, Dietrich AJ, Keller A, Landgraf J, O’Conner GT. Tools, teamwork, and tenacity: an office system for cancer prevention. J Fam Pract 1992;35:388-94.
59. Goodwin MA, Zyzanski SJ, Zronek S, et al. A clinical trial of tailored office systems for preventive service delivery: the Study to Enhance Prevention by Understanding Practice (STEP-UP). Am J Prev Med. In press.
60. Blankfield RP, Finkelhor RS, Alexander J, et al. Etiology and diagnosis of bilateral leg edema in primary care. Am J Med 1998;105:192-97.
61. Blankfield RP, Hudgel DW, Tapolyai AA, Zyzanski SJ. Bilateral leg edema, pulmonary hypertension, and obstructive sleep apnea. Arch Intern Med 2000;160:2357-62.
62. Mettee TM, Martin KB, Williams RL. Tools for community-oriented primary care: a process for linking practice and community data. J Am Board Fam Pract 1998;11:28-33.
63. Orzano AJ, Gregory PM, Nutting PA, Werner JJ, Flocke SA, Stange KC. Care of the secondary patient in family practice: a report from ASPN. J Fam Pract 2001;50:113-16.
64. Flocke SA, Orzano AJ, Selinger A, et al. Does managed care restrictiveness affect the perceived quality of care? A report from ASPN. J Fam Pract 1999;48:762-68.
65. Stange KC. Primary care research: barriers and opportunities. J Fam Pract 1996;42:192-98.
66. Thesen J, Kuzel A. Participatory inquiry. In: Crabtree B F, Miller W L, eds. Doing qualitative research. 2nd ed. Thousand Oaks, Calif: Sage Publications; 1999;269-90.
67. Macaulay AC, Gibson N, Commanda L, McCabe M, Robbins C, Twohig P. Responsible research with communities: participatory research in primary care. Available at: views.vcu.edu/views/fap/napcrg.html. North American Primary Care Research Group; 1998.
68. Nutting PA, Baier M, Werner JJ, Cutter G, Reed FM, Orzano AJ. Practice patterns of family physicians in practice-based research networks: a report from ASPN. Ambulatory Sentinel Practice Network. J Am Board Fam Pract 1999;12:278-84.
69. Nutting PA, Stange KC. Practice-based research: the opportunity to create a learning discipline. In: Rakel RE, ed. The textbook of family practice. St Louis, Mo: WB Saunders; 2001.
70. Macaulay AC, Commanda L, Freeman W, et al. Participatory research maximises community and lay involvement. BMJ 1999;319:774-78.
71. Crabtree BF, Miller WL, Adison RB, Gilchrist VJ, Kuzel A. Exploring collaborative research in primary care. Thousand Oaks, Calif: Sage Publications; 1994.
72. Covey S. The seven habits of highly effective people. New York, NY: Simon & Schuster, Inc; 1989.
73. Stange KC, Miller WL, McWhinney IR. Developing the knowledge base of family practice. Fam Med. In press.
The Factors Associated with Disclosure of Intimate Partner Abuse to Clinicians
STUDY DESIGN: We conducted telephone interviews with a random sample of ethnically diverse abused women.
POPULATION: We included a total of 375 African American, Latina, and non-Latina white women aged 18 to 46 years with histories of intimate partner abuse who attended 1 of 3 public primary care clinics in San Francisco, California, in 1997.
OUTCOMES MEASURED: We measured the prevalence and determinants of past communication with clinicians about abuse and barriers to communication.
RESULTS: Forty-two percent (159) of the patients reported having communicated with a clinician about abuse. Significant independent predictors of communication were direct clinician questioning about abuse (odds ratio [OR]=4.6; 95% confidence interval [CI], 3.2-6.6) and African American ethnicity (OR=1.8; 95% CI, 1.1-2.9). Factors associated with lack of communication about abuse included immigrant status (OR=0.6; 95% CI, 0.3-1.0) and patient concerns about confidentiality (OR=0.7; 95% CI, 0.5-0.9). Barriers significantly associated with lack of communication were patients’ perceptions that clinicians did not ask directly about abuse, beliefs that clinicians lack time and interest in discussing abuse, fears about involving police and courts, and concerns about confidentiality.
CONCLUSIONS: Clinician inquiry appears to be one of the strongest determinants of communication with patients about partner abuse. Other factors that need to be addressed include patient perceptions regarding clinicians’ time and interest in discussing abuse, fear of police or court involvement, and patient concerns about confidentiality.
It is estimated that intimate partner abuse (IPA) occurs in 4 to 6 million relationships each year in the United States1,2 and that many health care interactions involve abused patients in primary care settings.3,4 Clinicians are therefore well placed to identify IPA and to provide appropriate care and referrals. However, in spite of its high prevalence and the existence of published guidelines and recommendations for routine clinician screening,5 the majority of abused women patients are not identified in the medical system and do not receive needed assistance.6,7 Estimates of the prevalence of clinician-patient communication about IPA range from 10% to slightly less than one third of all abused women.2,8
Previous studies have shown that the low rates of clinician-patient communication about IPA result in part from a lack of direct questioning by many clinicians and in part from the fact that women rarely volunteer information about abuse without being asked. Fewer than 15% of women patients in primary care settings report being asked about abuse by health care professionals.2,4,6,7,9 A recent statewide study of primary care clinicians in California found that only 10% reported routine screening for abuse among new patients, and 9% reported such screening at periodic checkups.10 Yet the majority of women patients report that they favor direct questioning by clinicians about IPA and would reveal abuse histories if asked directly.6,7
Despite these studies there is much that remains unknown about abuse-related communication patterns and patient attitudes about communication in the medical setting. We examined the prevalence and determinants of clinician-patient communication about intimate partner abuse by interviewing an ethnically diverse group of abused women primary care patients to determine whether differences in disclosure of abuse were related to any of the following: age, ethnicity, education, language, and immigrant status of the patient, as well as clinician sex and ethnicity and the presence of an established clinician-patient relationship. We also looked into patients’ perceived barriers to communication about IPA, including lack of direct clinician questioning about abuse, perceptions about clinicians’ lack of time or interest in discussing abuse, fears about involving the police and courts, embarrassment, concerns about confidentiality, fear of shaming the family, and fear that the patient’s partner might hurt or kill her.
Methods
Study Population
Our sample consisted of women seen at 3 primary care outpatient clinics at San Francisco General Hospital in California.11 Each year these family medicine, general internal medicine, and obstetrics/gynecology clinics serve nearly 100,000 ethnically and socioeconomically diverse women aged 18 to 45 years. During the 3-year period preceding our study, many staff members at the 3 clinics received training to encourage identification and management of IPA in the medical setting. The training incorporated lectures and continuing medical education.
We selected the sample from a computerized patient utilization database for the 3 clinics during 1997. Selection criteria included: (1) female sex; (2) race/ethnicity African American, non-Latina white, or Latina; (3) age 18 to 45 years; and (4) receipt of care in 1 of the 3 primary care clinics in the previous 6 months. Women were selected for participation in this study because they are much more likely to have been abused by an intimate partner than are men. Only women who reported histories of abuse were included in this analysis.
Patients were considered eligible if they met all the selection criteria, spoke English or Spanish, had verifiable phone numbers, and were mentally and physically capable of completing the survey.
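The two-stage screening described above (database selection criteria, then eligibility for interview) can be illustrated with a minimal sketch. The record layout and field names below are hypothetical, since the structure of the clinic utilization database is not described; only the criteria themselves come from the text.

```python
from dataclasses import dataclass

# Hypothetical record layout: the actual database schema is not described in the study.
@dataclass
class PatientRecord:
    sex: str
    ethnicity: str
    age: int
    months_since_last_visit: float
    language: str
    has_verifiable_phone: bool
    can_complete_survey: bool  # mentally and physically capable of completing the survey

TARGET_ETHNICITIES = {"African American", "non-Latina white", "Latina"}
SURVEY_LANGUAGES = {"English", "Spanish"}

def meets_selection_criteria(p: PatientRecord) -> bool:
    """Criteria (1)-(4) applied when sampling from the utilization database."""
    return (
        p.sex == "female"
        and p.ethnicity in TARGET_ETHNICITIES
        and 18 <= p.age <= 45
        and p.months_since_last_visit <= 6
    )

def is_eligible(p: PatientRecord) -> bool:
    """Selection criteria plus language, phone, and capability requirements."""
    return (
        meets_selection_criteria(p)
        and p.language in SURVEY_LANGUAGES
        and p.has_verifiable_phone
        and p.can_complete_survey
    )
```

Under these assumptions, a patient who meets all four selection criteria but has no verifiable phone number is selected from the database yet counted as ineligible.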
Survey Instrument
We developed the survey instrument through a review of the literature, including the results from some of the authors’ previous qualitative research, consultation with domestic violence researchers and advocates, and discussions with a focus group of 6 abused women. Final survey modifications were made following expert review and pilot testing with 75 women, 25 from each target ethnic group. The instrument included questions about patients’ social, health, and demographic characteristics; clinic and medical clinician utilization; and IPA experiences. Women who indicated histories of IPA were questioned about their experiences in obtaining abuse-related help in the medical system, the barriers to IPA communication with medical clinicians, and clinician demographics. The questionnaire was prepared in English and translated into Spanish using standard translation methods.12
Questions about abuse were adapted from the 4-question Abuse Assessment Screen, which has been validated in multiethnic populations.13 These questions asked whether the participants had ever experienced physical, sexual, or psychological abuse. For each positive response, women indicated whether the abuse had occurred in the past 12 months (recent abuse) or in the more distant past.
Prevalence of communication with clinicians about abuse was assessed by asking participants if they had ever mentioned or discussed abuse with a physician: (1) in response to direct clinician questioning or (2) in the absence of direct clinician questioning.
Data Collection
The survey was administered to the sample by computer-assisted telephone interview between October 1997 and March 1998. An introductory letter was mailed to the homes of all potential participants (to ensure safety the topic of abuse was not mentioned in this letter). Following this, trained women interviewers contacted potential participants by telephone. After confirming eligibility, privacy, and safety and obtaining verbal consent, interviews lasting approximately 25 minutes were conducted in English or Spanish. The study protocol was approved by the Committee for Human Research at the University of California, San Francisco.
Data Analyses
We analyzed the data using SPSS statistical software.14 IPA was defined as having ever been exposed as an adult to physical abuse, sexual abuse, or threats/fear of abuse. The principal outcome variable was previous communication with a medical clinician about IPA experiences. Predictor variables included age, ethnicity, birthplace, language, employment and medical insurance status, and education, as well as clinician sex and ethnicity, direct clinician questioning about abuse, and presence of an established relationship with a clinician (regular clinician). Additional predictor variables included patients’ perceived barriers to communication.
We used multiple logistic regression analysis to estimate crude and adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) for the factors associated with clinician-patient communication about abuse. Our final model includes variables of primary interest to our study (patient age, ethnicity, education, and presence of a regular clinician), as well as those variables that significantly influenced abuse-related communication (birthplace, direct questioning by a clinician, perceptions that clinicians lack time and interest in discussing abuse, and concerns about confidentiality). For cross tabulations, statistical significance was determined using the Pearson chi-square test. Statistical significance was defined as P less than .05.
Results
Sample Description
Of the 1390 patients selected from the database, 992 (71%) met the eligibility criteria. Of the 398 ineligible women, 315 (23%) did not have verifiable phone numbers, and 83 (6%) either did not speak English or Spanish, were incapable of completing the survey, or did not meet the original selection criteria. The overall participation rate was 74% (734/992) of the available eligible participants. Of the women interviewed, 51% (375) reported having ever been abused by an intimate partner as an adult. Further descriptive analyses are reported elsewhere.11 Among the 375 participants who reported a history of abuse, 88% (328) reported physical abuse, 33% (122) reported sexual abuse, and 66% (246) reported threats or fear of IPA. There was substantial overlap between abuse categories for most participants, and almost all women reporting a history of sexual abuse also reported a history of physical abuse. However, 7% (28) of the participants reported previous threats or fear of IPA in the absence of physical or sexual abuse.
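The recruitment percentages above follow directly from the reported counts. The short sketch below recomputes them; the variable names are illustrative, and all counts are taken from the text.

```python
# Counts reported in the Sample Description; variable names are illustrative.
selected = 1390          # patients drawn from the utilization database
eligible = 992           # met all eligibility criteria
no_phone = 315           # ineligible: no verifiable phone number
other_ineligible = 83    # ineligible: language, capability, or selection criteria
interviewed = 734        # completed the telephone interview
abused = 375             # reported IPA as an adult

def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

ineligible = selected - eligible
flow = {
    "eligible / selected": pct(eligible, selected),
    "no phone / selected": pct(no_phone, selected),
    "other ineligible / selected": pct(other_ineligible, selected),
    "interviewed / eligible": pct(interviewed, eligible),
    "abused / interviewed": pct(abused, interviewed),
}

for label, value in flow.items():
    print(f"{label}: {value}%")
```

Note that the 23% and 6% figures for the two ineligible groups are expressed relative to all 1390 selected patients, not to the 398 ineligible women.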
Sample characteristics of all study participants with histories of abuse are summarized in Table 1. The mean age was 34.3 years (standard deviation [SD]=7.3 years). The study participants were primarily of lower socioeconomic status. Years of education ranged from none to postgraduate, with a mean of 11.9 years (SD=3.5 years).
Prevalence of Clinician-Patient Communication About Abuse
Summary prevalence data relating to clinician-patient communication are provided in Table 2. Among the 375 abused participants, 42% (159) reported communicating with a clinician about IPA. Among the 347 participants with a history of physical or sexual abuse, 45% (155) reported communicating with a clinician about IPA. Communication rates were significantly lower, however, among the 7% (28) of the participants who reported threats or fear of IPA in the absence of physical or sexual abuse (P <.05). Only 14% of the participants in this group reported having ever communicated with a clinician about abuse.
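The comparison between the two abuse groups can be reproduced approximately with a Pearson chi-square test on a 2 × 2 table. The sketch below assumes an uncorrected test (no continuity correction) and derives the count of 4 communicators in the threats-only group as 159 − 155; the original analysis was run in SPSS, so this is an illustration rather than the authors' exact computation.

```python
import math

def pearson_chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns the statistic and its P value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: abuse group; columns: communicated with a clinician (yes, no).
physical_or_sexual = (155, 347 - 155)   # 155 of 347 communicated
threats_only = (159 - 155, 28 - 4)      # 4 of 28 communicated (derived, an assumption)

chi2, p = pearson_chi_square_2x2(*physical_or_sexual, *threats_only)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

With these counts the statistic is roughly 9.8 on 1 degree of freedom, which is consistent with the reported P <.05.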
Overall, 28% of the participants reported direct questioning by a medical clinician about abuse. Of those who were questioned, 85% reported that they had disclosed the abuse when directly asked by their physicians. In the absence of direct questioning, only 25% of participants reported disclosing abuse to a physician. Rates of clinician inquiry about IPA did not vary significantly across ethnic groups.
There were no significant differences in frequency of communication between women reporting abuse in the past 12 months and women reporting abuse in the more distant past. Other variables not significantly associated with communication included employment, language, medical insurance status, primary care clinic, and clinician’s sex or ethnicity. In addition, having been asked directly about abuse by a clinician was not associated with age, ethnicity, birthplace, education, insurance status, or primary clinic site. However, on bivariate analysis, having been asked was significantly associated with having a regular physician (33% vs 21%, P=.02) and having been married (36% vs 23%, P=.01).
Barriers to Communication
Barriers that hindered patients’ desire to communicate included beliefs that clinicians do not ask directly about IPA and that clinicians lack time for and interest in discussing abuse. Participants were also asked whether their communication with clinicians was hindered by any of the following factors: concerns about confidentiality, fear of involving the police and courts, embarrassment, fear of shaming family, and fear that their partners would hurt or kill them.
Table 3 lists each of the perceived barriers by frequency of agreement according to participants’ abuse communication status (never communicated vs ever communicated). All of the factors (with the exception of one) were reported with greater frequency among women who had never disclosed abuse to a medical clinician than among those who had.
To determine if there were significant differences in the frequency of reported barriers according to communication status, we conducted cross-tabulations and determined statistical significance using the Pearson chi-square test. Statistical significance was defined as P less than .05. We obtained significant differences for each of the following barriers: beliefs that clinicians do not ask directly (P <.001), concerns about confidentiality (P <.001), beliefs that clinicians lack time for (P=.002) and interest in (P=.001) discussing abuse, and fear of involving the police and courts (P=.042).
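For readers who want to see the mechanics of the Pearson chi-square test on a 2×2 cross-tabulation, the following is a minimal sketch. The cell counts are an assumption: they are back-calculated from rounded figures (375 abused participants, 159 of whom had ever communicated, and confidentiality concerns in about 37% of those who had never communicated vs 21% of those who had, as given in the Discussion). They are not the actual Table 3 counts, so the output only approximates the published result.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# ASSUMED counts, reconstructed from rounded percentages (not the study data).
never_total, ever_total = 375 - 159, 159
never_yes = round(0.37 * never_total)  # never communicated, concerned about confidentiality
ever_yes = round(0.21 * ever_total)    # ever communicated, concerned about confidentiality

chi2, p = pearson_chi2_2x2(never_yes, never_total - never_yes,
                           ever_yes, ever_total - ever_yes)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

With these approximate counts the P value falls below .001, in line with the significance level reported for concerns about confidentiality.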
Among the 108 abused Latina patients, 34% identified language barriers, and 21% reported concerns about the immigration authorities.
Predictors of Communication
To better understand the variables associated with clinician-patient communication about abuse, we used multivariate logistic regression (Table 4). We found that the most significant predictor of communication was the presence of direct clinician questioning about abuse. Women who had been directly asked about abuse were much more likely to discuss it than were those who were not asked directly (OR=4.53; 95% CI, 3.20-6.40). Ethnicity also had an important effect on communication, with African American women more likely to communicate about abuse than white women (OR=1.77; 95% CI, 1.08-2.92). Immigrant status was also an important predictor. Patients born outside the United States were less likely than US-born women to have communicated about abuse (OR=0.57; 95% CI, 0.33-0.99). Also, women with concerns about confidentiality were less likely to discuss abuse with medical clinicians (OR=0.68; 95% CI, 0.48-0.94). Although age, formal education, regular clinician status, and perceptions about clinicians’ time and interest in discussing abuse had some impact on communication outcomes, none of these variables reached statistical significance.
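The odds ratios above can be related back to the logistic regression scale. The sketch below assumes Wald-type confidence intervals (an assumption; the text does not state how the intervals were computed), in which case the coefficient is the natural log of the OR, the interval is symmetric on the log scale, and the standard error can be recovered from the interval width. The dictionary labels are illustrative.

```python
import math

# Adjusted odds ratios and 95% CIs as quoted in the text: (OR, lower, upper).
reported = {
    "asked directly by clinician": (4.53, 3.20, 6.40),
    "African American vs white": (1.77, 1.08, 2.92),
    "born outside the United States": (0.57, 0.33, 0.99),
    "concerns about confidentiality": (0.68, 0.48, 0.94),
}

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

for label, (odds_ratio, lower, upper) in reported.items():
    beta = math.log(odds_ratio)                            # log-odds coefficient
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)  # implied standard error
    midpoint = math.sqrt(lower * upper)                    # geometric mean of the CI bounds
    print(f"{label}: beta={beta:.2f}, SE={se:.2f}, CI midpoint={midpoint:.2f}")
```

If the Wald assumption holds, each CI midpoint should sit close to the reported OR, which gives a quick consistency check on the quoted figures.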
Although each of the attitudinal barriers had an influence on the likelihood of communicating about abuse, only concerns about confidentiality reached statistical significance in the multivariate model.
Discussion
Our study is one of the first to quantitatively examine the patterns of IPA communication between an ethnically diverse group of abused women and their medical clinicians. Overall, the prevalence of IPA communication in our study (42%) was substantially higher than we had anticipated. In spite of this, most of the women (58%) had never disclosed abuse to a medical clinician. This suggests that improved efforts to identify and reduce barriers to IPA communication in the medical setting are still needed.
We found important differences in communication patterns between participants who had experienced threats or fear of IPA only (in the absence of physical or sexual abuse) and participants who had experienced physical or sexual abuse (14% vs 45%, respectively). Given the significant effects on physical health associated with psychological abuse, these findings suggest a need for greater clinician inquiry about psychological forms of IPA in addition to physical and sexual IPA. Our findings also underscore the importance of direct clinician questioning about IPA.6,7 In our study, less than one third (28%) of all participants reported having ever been directly questioned by a clinician about abuse. Among those who had been directly questioned, 85% had disclosed their abuse to a clinician, compared with only 25% of those who had never been directly questioned by a clinician. These findings support current recommendations for direct clinician inquiry about intimate partner abuse.5
We also found that birthplace is an important determinant of clinician-patient communication about abuse. In our study, women born outside the United States were much less likely to have disclosed abuse to a medical clinician than women born in the United States. Overall, 32% of immigrant participants reported previous communication with clinicians about abuse, compared with 46% of US-born participants. Low levels of communication among immigrant women (most of whom were Latina) may arise because foreign-born women and Latinas face numerous barriers to seeking medical help and communicating with clinicians. These barriers include low levels of acculturation,15 discrimination, and language.16 It is clear that there is a need for special efforts to encourage communication about abuse among immigrant and Latina patients.17 Increased use of interpreters might be one means of addressing these barriers,18 in addition to greater sensitivity and attention to sociocultural and sociopolitical differences between patients and clinicians.19 These findings underscore the importance of cultural and linguistic competency when caring for the Latina population.
We identified a number of important barriers to clinician-patient communication. One is the belief that clinicians lack the time to discuss abuse. Fifty-three percent of the participants in our study who had never discussed abuse with a clinician felt that clinicians do not have time to discuss abuse (compared with 40% of women who had previously discussed abuse). This is consistent with previous research in which physicians noted time constraints as one of the deterrents to IPA communication with patients.20 One means of eliminating this barrier might involve delegating responsibility for abuse screenings to other medical professionals, such as nurses and physician assistants. Another barrier identified was patients’ fear of involving the police and courts. This finding is also consistent with previous research19 and reiterates questions about the utility of mandatory IPA reporting requirements.21,22
We also found that patients’ perceptions that clinicians lack interest in discussing abuse and concerns about confidentiality pose significant barriers to communication. Specifically among women who had never communicated with a medical clinician about abuse, 38% believed that clinicians lack interest in discussing it (compared with 25% of women who had previously communicated), and 37% had concerns about confidentiality (compared with 21% of women who had previously communicated with a clinician). This suggests the need for mechanisms to reduce these barriers during the abuse screening process. Even though clinician education about intimate partner abuse has been found to improve IPA screening practices,10,23-25 the most effective training modalities and follow-up mechanisms have not been identified.
We note that our findings indicate no effect of clinician sex or ethnicity on communication, suggesting that these demographic differences may be less important than other factors in facilitating abuse-related communication.
Limitations
Our findings are subject to limitations. The sample consisted primarily of low-income women in an urban setting, and therefore our results may not apply to all ethnically diverse abused women attending primary care clinics. Also, our study did not include any women from Asian ethnic groups. We relied on self-reporting of an extremely sensitive issue, which may have led to underidentification of IPA and inaccurate reporting of communication patterns because of recall bias and desirability effects. We were also unable to compare the degree of communication or reported barriers with other measures, such as clinician report or documentation in the medical record. Although our study had a very good response rate, we were unable to sample patients who did not have telephones, and resultant unrecognized selection bias may have occurred.
One final limitation pertains to the high rates of clinician-patient communication obtained in this study. Our findings may be disproportionately high because of greater-than-average levels of awareness about IPA among clinicians at the 3 clinics involved in this study. Many of these clinicians received training related to the detection of IPA before the study began. As a result, our findings may not accurately reflect the frequency of communication among demographically similar populations of abused women patients in other medical settings.
Suggestions for Future Research
Although our findings support the need for direct clinician inquiry about IPA among all women patients in the medical setting, there is a need for more information about how to most effectively screen patients, particularly among demographically diverse populations. There is also a need for clarification around the meaning of “routine screening” and for information about the extent to which differences in screening practices might affect communication outcomes. These differences include factors such as the type of clinician doing the screening and the frequency of screenings (ie, screenings at every visit vs annually vs only if the patient is in a new relationship).
Relatively little is known about clinician-patient communication patterns among different immigrant groups in the United States. Although our study examined the general influence of birthplace on communication outcomes, most of the immigrant women in our study were from Spanish-speaking countries, and immigration was not a focus of our study. Future research might look specifically at determinants of communication among various immigrant groups in the United States, in particular, Asian women, about whom relatively little is known regarding abuse-related communication.
Finally, we were unable to specifically examine the determinants of decreased IPA communication among immigrant women. It is possible that decreased communication within this population may have resulted from less contact with the medical system or from differential treatment by medical clinicians. Future research might look more closely at this issue.
Acknowledgements
Our research was supported by the Commonwealth Foundation and by a grant under the Resource Centers for Minority Aging Research Program by the National Institute on Aging, the National Institute of Nursing Research, and the Office of Research on Minority Health, National Institutes of Health, grant # 1 P30 AG15272. Dr Rodriguez was a Picker/Commonwealth Scholar when this work was completed. We wish to thank Drs Kevin Grumbach and Elizabeth McLoughlin for assistance with study design, Dr Liza Pressor for data collection, and Gregory Nah for data management. In addition, we thank the many San Francisco advocates against domestic violence for their input into the survey content and design, and we thank the women who participated in our study.
Related Resources
- National Domestic Violence Hotline http://www.ndvh.org, 1-800-799-SAFE (7233), 1-800-787-3224 (TDD)
- The Family Violence Prevention Fund http://www.fvpf.org/
- Intimate Partner Violence and Sexual Assault: A Guide to Training Materials and Programs for Healthcare Providers http://www.cdc.gov/ncipc/pub-res/pdf/newguide.pdf
- American Medical Association Violence Prevention Website http://www.ama-assn.org/ama/pub/category/3242.html
References
1. Straus M, Gelles R. Societal change and change in family violence from 1975 to 1985 as revealed by two national surveys. J Marriage Fam 1986;48:465-79.
2. Plichta SB, Duncan MM, Plichta L. Spouse abuse, patient-physician communication, and patient satisfaction. Am J Prev Med 1996;12:297-303.
3. McCauley J, Kern DE, Kolodner K, et al. The ‘battering syndrome’: prevalence and clinical characteristics of domestic violence in primary care internal medicine practices. Ann Intern Med 1995;123:737-46.
4. Hamberger LK, Saunders DG, Hovey M. Prevalence of domestic violence in community practice and rate of physician inquiry. Fam Med 1992;24:283-87.
5. Council on Scientific Affairs. American Medical Association. Violence against women: relevance for medical practitioners. JAMA 1992;267:3184-89.
6. Friedman LS, Samet JH, Roberts MS, et al. Inquiry about victimization experiences: a survey of patient preferences and physician practices. Arch Intern Med 1992;152:1186-90.
7. Caralis PV, Musialowski R. Women’s experiences with domestic violence and their attitudes and expectations regarding medical care of abuse victims. South Med J 1997;90:1075-80.
8. Gin NE, Rucker L, Frayne S, et al. Prevalence of domestic violence among patients in three ambulatory care internal medicine clinics. J Gen Intern Med 1991;6:317-22.
9. Straus MA, Smith C. Family patterns and primary prevention of family violence. Trends in health care, law & ethics 1993;8:17-26.
10. Rodríguez MA, Bauer HM, McLoughlin E, Grumbach K. Screening and intervention for intimate partner abuse: practices and attitudes of primary care physicians. JAMA 1999;282:468-74.
11. Bauer HM, Rodríguez MA, Pérez-Stable EJ. Prevalence and determinants of intimate partner abuse among public hospital primary care patients. J Gen Intern Med. In press.
12. Brislin RW. Back-translation for cross-cultural research. J Cross-Cultural Psych 1970;1:185-216.
13. Soeken K, Parker B, McFarlane J, et al. The abuse assessment screen: a clinical instrument to measure frequency, severity, and perpetrator of abuse against women. In: Campbell JC, ed. Empowering survivors of abuse: health care for battered women and their children. Thousand Oaks, Calif: Sage Publications; 1998.
14. SPSS. Version 8.0 for Windows. Chicago, Ill: SPSS, Inc; 1998.
15. West CM, Kantor GK, Jasinski JL. Sociodemographic predictors and cultural barriers to help-seeking behavior by Latina and Anglo American battered women. Violence Victims 1998;13:361-75.
16. Bauer HM, Rodríguez MA, Quiroga SS, Flores-Ortiz YG. Barriers to health care for abused Latina and Asian immigrant women. J Health Care Poor Underserved 1999;11:33-44.
17. Morales LS, Cunningham WE, Brown JA, et al. Are Latinos less satisfied with communication by health care providers? J Gen Intern Med 1999;14:409-17.
18. Baker DW, Parker RM, Williams MV, et al. Use and effectiveness of interpreters in an emergency department. JAMA 1996;275:783-88.
19. Rodríguez MA, Quiroga SS, Bauer HM. Breaking the silence: battered women’s perspectives on medical care. Arch Fam Med 1996;5:153-58.
20. Sugg NK, Inui T. Primary care physicians’ response to domestic violence: opening Pandora’s box. JAMA 1992;267:3157-60.
21. Rodríguez MA, McLoughlin E, Bauer HM, et al. Mandatory reporting of intimate partner violence to police: views of physicians in California. Am J Public Health 1999;89:575-78.
22. Gerbert B, Caspers N, Bronstone A, et al. A qualitative analysis of how physicians with expertise in domestic violence approach the identification of victims. Ann Intern Med 1999;131:578-84.
23. Parsons LH, Zaccaro D, Wells B, Stovall TG. Methods of and attitudes toward screening obstetrics and gynecology patients for domestic violence. Am J Obstet Gynecol 1995;173:381-87.
24. Tilden VP, Schmidt TA, Limandri BJ, et al. Factors that influence clinicians’ assessment and management of family violence. Am J Public Health 1994;84:628-33.
25. Harwell TS, Casten RJ, Armstrong KA, et al. Results of a domestic violence training program offered to the staff of urban community health centers. Am J Prev Med 1998;15:235-41.
STUDY DESIGN: We conducted telephone interviews with a random sample of ethnically diverse abused women.
POPULATION: We included a total of 375 African American, Latina, and non-Latina white women aged 18 to 46 years with histories of intimate partner abuse who attended 1 of 3 public primary care clinics in San Francisco, California, in 1997.
OUTCOMES MEASURED: We measured the prevalence and determinants of past communication with clinicians about abuse and barriers to communication.
RESULTS: Forty-two percent (159) of the patients reported having communicated with a clinician about abuse. Significant independent predictors of communication were direct clinician questioning about abuse (odds ratio [OR] =4.6; 95% confidence interval [CI], 3.2-6.6), and African American ethnicity (OR=1.8; 95% CI, 1.1-2.9). Factors associated with lack of communication about abuse included immigrant status (OR=0.6; 95% CI, 0.3-1.0) and patient concerns about confidentiality (OR=0.7; 95% CI, 0.5-0.9). Barriers significantly associated with lack of communication were patients’ perceptions that clinicians did not ask directly about abuse, beliefs that clinicians lack time and interest in discussing abuse, fears about involving police and courts, and concerns about confidentiality.
CONCLUSIONS: Clinician inquiry appears to be one of the strongest determinants of communication with patients about partner abuse. Other factors that need to be addressed include patient perceptions regarding clinicians’ time and interest in discussing abuse, fear of police or court involvement, and patient concerns about confidentiality.
It is estimated that intimate partner abuse (IPA) occurs in 4 to 6 million relationships each year in the United States1,2 and that many health care interactions involve abused patients in primary care settings.3,4 Clinicians are therefore well placed to identify IPA and to provide appropriate care and referrals. However, in spite of its high prevalence and the existence of published guidelines and recommendations for routine clinician screening,5 the majority of abused women patients are not identified in the medical system and do not receive needed assistance.6,7 Estimates of the prevalence of clinician-patient communication about IPA range from 10% to slightly less than one third of all abused women.2,8
Previous studies have shown that the low rates of clinician-patient communication about IPA result in part from a lack of direct questioning by many clinicians and in part from the fact that women rarely volunteer information about abuse without being asked. Less than 15% of women patients in primary care settings report being asked about abuse by health care professionals.2,4,6,7,9 A recent statewide study of primary care clinicians in California found that only 10% reported routine screening for abuse among new patients, and 9% reported such screening at periodic checkups.10 Yet the majority of women patients report that they favor direct questioning by clinicians about IPA and would reveal abuse histories if asked directly.6,7
Despite these studies there is much that remains unknown about abuse-related communication patterns and patient attitudes about communication in the medical setting. We examined the prevalence and determinants of clinician-patient communication about intimate partner abuse by interviewing an ethnically diverse group of abused women primary care patients to determine whether differences in disclosure of abuse were related to any of the following: age, ethnicity, education, language, and immigrant status of the patient, as well as clinician sex and ethnicity and the presence of an established clinician-patient relationship. We also looked into patients’ perceived barriers to communication about IPA, including lack of direct clinician questioning about abuse, perceptions about clinicians’ lack of time or interest in discussing abuse, fears about involving the police and courts, embarrassment, concerns about confidentiality, fear of shaming the family, and fear that the patient’s partner might hurt or kill her.
Methods
Study Population
Our sample consisted of women seen at 3 primary care outpatient clinics at San Francisco General Hospital in California.11 Each year these family medicine, general internal medicine, and obstetrics/gynecology clinics serve nearly 100,000 ethnically and socioeconomically diverse women aged 18 to 45 years. During the 3-year period preceding our study, many staff members at the 3 clinics received training to encourage identification and management of IPA in the medical setting. The training incorporated lectures and continuing medical education.
We selected the sample from a computerized patient utilization database for the 3 clinics during 1997. Selection criteria included: (1) female sex; (2) race/ethnicity African American, non-Latina white, or Latina; (3) age 18 to 45 years; and (4) receipt of care in 1 of the 3 primary care clinics in the previous 6 months. Women were selected for participation in this study because they are much more likely to have been abused by an intimate partner than are men. Only women who reported histories of abuse were included in this analysis.
Patients were considered eligible if they met all the selection criteria, spoke English or Spanish, had verifiable phone numbers, and were mentally and physically capable of completing the survey.
Survey Instrument
We developed the survey instrument through a review of the literature, including the results from some of the authors’ previous qualitative research, consultation with domestic violence researchers and advocates, and discussions with a focus group of 6 abused women. Final survey modifications were made following expert review and pilot testing with 75 women, 25 from each target ethnic group. The instrument included questions about patients’ social, health, and demographic characteristics; clinic and medical clinician utilization; and IPA experiences. Women who indicated histories of IPA were questioned about their experiences in obtaining abuse-related help in the medical system, the barriers to IPA communication with medical clinicians, and clinician demographics. The questionnaire was prepared in English and translated to Spanish using standard translation methods.12
Questions about abuse were adapted from the 4-question Abuse Assessment Screen, which has been validated in multiethnic populations.13 These questions asked whether the participants had ever experienced physical, sexual, or psychological abuse. For each positive response, women indicated whether the abuse had occurred in the past 12 months (recent abuse) or in the more distant past.
Prevalence of communication with clinicians about abuse was assessed by asking participants if they had ever mentioned or discussed abuse with a physician: (1) in response to direct clinician questioning or (2) in the absence of direct clinician questioning.
Data Collection
The survey was administered to the sample by computer-assisted telephone interview between October 1997 and March 1998. An introductory letter was mailed to the homes of all potential participants (to ensure safety the topic of abuse was not mentioned in this letter). Following this, trained women interviewers contacted potential participants by telephone. After confirming eligibility, privacy, and safety and obtaining verbal consent, interviews lasting approximately 25 minutes were conducted in English or Spanish. The study protocol was approved by the Committee for Human Research at the University of California, San Francisco.
Data Analyses
We analyzed the data using SPSS statistical software.14 IPA was defined as having ever been exposed as an adult to physical abuse, sexual abuse, or threats/fear of abuse. The principal outcome variable was previous communication with a medical clinician about IPA experiences. Predictor variables included age, ethnicity, birthplace, language, employment and medical insurance status, and education, as well as clinician sex and ethnicity, direct clinician questioning about abuse, and presence of an established relationship with a clinician (regular clinician). Additional predictor variables included patients’ perceived barriers to communication.
We used multiple logistic regression analysis to estimate crude and adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) for the factors associated with clinician-patient communication about abuse. Our final model includes variables of primary interest to our study (patient age, ethnicity, education, and presence of a regular clinician), as well as those variables that significantly influenced abuse-related communication (birthplace, direct questioning by a clinician, perceptions that clinicians lack time and interest in discussing abuse, and concerns about confidentiality). For cross tabulations, statistical significance was determined using the Pearson chi-square test. Statistical significance was defined as P less than .05.
Results
Sample Description
Of the 1390 patients selected from the database, 992 (71%) met the eligibility criteria. Of the 398 ineligible women, 315 (23%) did not have verifiable phone numbers, and 83 (6%) either did not speak English or Spanish, were incapable of completing the survey, or did not meet the original selection criteria. The overall collaboration rate was 74% (734/992) of the available eligible participants. Of the women interviewed, 51% (375) reported having ever been abused by an intimate partner as an adult. Further descriptive analyses are reported elsewhere.11 Among the 375 participants who reported a history of abuse: 88% (328) reported having experienced physical abuse; 33% (122) reported having experienced sexual abuse; and 66% (246) reported having experienced threats or fear of IPA. There was substantial overlap between abuse categories for most participants, and almost all women reporting a history of sexual abuse also reported a history of physical abuse. However, 7% (28) of the participants reported previous threats or fear of IPA in the absence of physical or sexual abuse.
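The sample-flow percentages in this paragraph can be reproduced directly from the counts given. The short sketch below is an illustrative check only; the helper name `pct` and the variable names are not from the study.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded half up to a whole number."""
    return int(100 * part / whole + 0.5)

selected, eligible, interviewed, abused = 1390, 992, 734, 375
no_phone, other_ineligible = 315, 83

# The two ineligible groups should account for every excluded patient (398).
ineligible = selected - eligible
print("ineligible:", ineligible, "=", no_phone + other_ineligible)

print("eligible among selected:", pct(eligible, selected), "%")
print("no verifiable phone:", pct(no_phone, selected), "%")
print("other ineligible:", pct(other_ineligible, selected), "%")
print("interviewed among eligible:", pct(interviewed, eligible), "%")
print("reported abuse among interviewed:", pct(abused, interviewed), "%")
```

Each computed percentage matches the figure quoted in the text (71%, 23%, 6%, 74%, and 51%).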
Sample characteristics of all study participants with histories of abuse are summarized in Table 1. The mean age was 34.3 years (standard deviation [SD]=7.3 years). The study participants were primarily of lower socioeconomic status. Years of education ranged from none to postgraduate, with a mean of 11.9 years (SD=3.5 years).
Prevalence of Clinician-Patient Communication About Abuse
Summary prevalence data relating to clinician-patient communication are provided in Table 2. Among the 375 abused participants, 42% (159) reported communicating with a clinician about IPA. Among the 347 participants with a history of physical or sexual abuse, 45% (155) reported communicating with a clinician about IPA. Communication rates were significantly lower, however, among the 7% (28) of the participants who reported threats or fear of IPA in the absence of physical or sexual abuse (P <.05). Only 14% of the participants in this group reported having ever communicated with a clinician about abuse.
Overall, 28% of the participants reported direct questioning by a medical clinician about abuse; however, 85% of those who were questioned reported that they had disclosed the abuse when directly asked by their physicians. In the absence of direct questioning, only 25% of participants reported disclosing abuse to a physician. Rates of clinician inquiry about IPA did not vary significantly across ethnic groups.
There were no significant differences in frequency of communication between women reporting abuse in the past 12 months and women reporting abuse in the more distant past. Other variables not significantly associated with communication included employment, language, medical insurance status, primary care clinic, and clinician’s sex or ethnicity. In addition, having been asked directly about abuse by a clinician was not associated with age, ethnicity, birthplace, education, insurance status, or primary clinic site. However, on bivariate analysis, having been asked was significantly associated with having a regular physician (33% vs 21%, P=.02) and having been married (36% vs 23%, P=.01).
Barriers to Communication
Barriers that hindered patients’ desire to communicate included beliefs that clinicians do not ask directly about IPA and that clinicians lack time for and interest in discussing abuse. Participants were also asked whether their communication with clinicians was hindered by any of the following factors: concerns about confidentiality, fear of involving the police and courts, embarrassment, fear of shaming family, and fear that their partners would hurt or kill them.
Table 3 lists each of the perceived barriers by frequency of agreement according to participants’ abuse communication status (never communicated vs ever communicated). All of the factors (with the exception of one) were reported with greater frequency among women who had never disclosed abuse to a medical clinician than among those who had.
To determine if there were significant differences in the frequency of reported barriers according to communication status, we conducted cross-tabulations and determined statistical significance using the Pearson chi-square test. Statistical significance was defined as P less than .05. We obtained significant differences for each of the following barriers: beliefs that clinicians do not ask directly (P <.001), concerns about confidentiality (P <.001), beliefs that clinicians lack time for (P=.002) and interest in (P=.001) discussing abuse, and fear of involving the police and courts (P=.042).
Among the 108 abused Latina patients, 34% identified language barriers, and 21% reported concerns about the immigration authorities.
Predictors of Communication
To better understand the variables associated with clinician-patient communication about abuse, we used multivariate logistic regression Table 4. We found that the most significant predictor of communication was the presence of direct clinician questioning about abuse. Women who had been directly asked about abuse were much more likely to discuss it than were those who were not asked directly (OR=4.53; 95% CI, 3.20-6.40). Ethnicity also had an important effect on communication, with African American women more likely to communicate about abuse than white women (OR=1.77; 95% CI, 1.08-2.92). Immigrant status was also an important predictor. Patients born outside the United States were less likely than US-born women to have communicated about abuse (OR=0.57; 95% CI, 0.33-0.99). Also, women with concerns about confidentiality were less likely to discuss abuse with medical clinicians (OR=0.68; 95% CI, 0.48-0.94). Although age, formal education, regular clinician status, and perceptions about clinicians’ time and interest in discussing abuse had some impact on communication outcomes, none of these variables reached statistical significance.
Although each of the attitudinal barriers had an influence on the likelihood of communicating about abuse, only concerns about confidentiality reached statistical significance in the multivariate model.
Discussion
Our study is one of the first to quantitatively examine the patterns of IPA communication between an ethnically diverse group of abused women and their medical clinicians. Overall, the prevalence of IPA communication in our study (42%) was substantially higher than we had anticipated. In spite of this, most of the women (58%) had never disclosed abuse to a medical clinician. This suggests that improved efforts to identify and reduce barriers to IPA communication in the medical setting are still needed.
We found important differences in communication patterns between participants who had experienced threats or fear of IPA only (in the absence of physical or sexual abuse) and participants who had experienced physical or sexual abuse (14% vs 45%, respectively). Given the significant effects on physical health associated with psychological abuse, these findings suggest a need for greater clinician inquiry about psychological forms of IPA in addition to physical and sexual IPA. Our findings also underscore the importance of direct clinician questioning about IPA.6,7 In our study, less than one third (28%) of all participants reported having ever been directly questioned by a clinician about abuse. Among those who had been directly questioned, 85% had disclosed their abuse to a clinician, compared with only 25% of those who had never been directly questioned by a clinician. These findings support current recommendations for direct clinician inquiry about intimate partner abuse.5
We also found that birthplace is an important determinant of clinician-patient communication about abuse. In our study, women born outside the United States were much less likely to have disclosed abuse to a medical clinician than women born in the United States. Overall, 32% of immigrant participants reported previous communication with clinicians about abuse, compared with 46% of US-born participants. Low levels of communication among immigrant women (most of whom were Latina) may reflect the numerous barriers that foreign-born women and Latinas face in seeking medical help and communicating with clinicians. These barriers include low levels of acculturation,15 discrimination, and language.16 There is a clear need for special efforts to encourage communication about abuse among immigrant and Latina patients.17 Increased use of interpreters might be one means of addressing these barriers,18 in addition to greater sensitivity and attention to sociocultural and sociopolitical differences between patients and clinicians.19 These findings underscore the importance of cultural and linguistic competency when caring for the Latina population.
We identified a number of important barriers to clinician-patient communication. One is the belief that clinicians lack the time to discuss abuse. Fifty-three percent of the participants in our study felt that clinicians do not have time to discuss abuse (compared with 40% of women who had previously discussed abuse). This is consistent with previous research in which physicians noted time constraints as one of the deterrents to IPA communication with patients.20 One means of eliminating this barrier might involve delegating responsibility for abuse screenings to other medical professionals, such as nurses and physician assistants. Another barrier identified was patients’ fear of involving the police and courts. This finding is also consistent with previous research19 and reiterates questions about the utility of mandatory IPA reporting requirements.21,22
We also found that patients’ perceptions that clinicians lack interest in discussing abuse and concerns about confidentiality pose significant barriers to communication. Specifically, among women who had never communicated with a medical clinician about abuse, 38% believed that clinicians lack interest in discussing it (compared with 25% of women who had previously communicated), and 37% had concerns about confidentiality (compared with 21% of women who had previously communicated with a clinician). This suggests the need for mechanisms to reduce these barriers during the abuse screening process. Even though clinician education about intimate partner abuse has been found to improve IPA screening practices,10,23-25 the most effective training modalities and follow-up mechanisms have not been identified.
We note that our findings indicate no effect of clinician sex or ethnicity, suggesting that these demographic characteristics may be less important than other factors in facilitating abuse-related communication.
Limitations
Our findings are subject to limitations. The sample consisted primarily of low-income women in an urban setting, and therefore our results may not apply to all ethnically diverse abused women attending primary care clinics. Also, our study did not include any women from Asian ethnic groups. We relied on self-reporting of an extremely sensitive issue, which may have led to underidentification of IPA and inaccurate reporting of communication patterns because of recall bias and social desirability effects. We were also unable to compare the degree of communication or reported barriers with other measures, such as clinician report or documentation in the medical record. Although our study had a very good response rate, we were unable to sample patients who did not have telephones, and unrecognized selection bias may have resulted.
One final limitation pertains to the high rates of clinician-patient communication obtained in this study. Our findings may be disproportionately high because of greater-than-average levels of awareness about IPA among clinicians at the 3 clinics involved in this study. Many of these clinicians received training related to the detection of IPA before the study began. As a result, our findings may not accurately reflect the frequency of communication among demographically similar populations of abused women patients in other medical settings.
Suggestions for Future Research
Although our findings support the need for direct clinician inquiry about IPA among all women patients in the medical setting, there is a need for more information about how to most effectively screen patients, particularly among demographically diverse populations. There is also a need for clarification around the meaning of “routine screening” and for information about the extent to which differences in screening practices might affect communication outcomes. These differences include factors such as the type of clinician doing the screening and the frequency of screenings (ie, screenings at every visit vs annually vs only if the patient is in a new relationship).
Relatively little is known about clinician-patient communication patterns among different immigrant groups in the United States. Although our study examined the general influence of birthplace on communication outcomes, most of the immigrant women in our study were from Spanish-speaking countries, and immigration was not a focus of our study. Future research might look specifically at determinants of communication among various immigrant groups in the United States, in particular, Asian women, about whom relatively little is known regarding abuse-related communication.
Finally, we were unable to specifically examine the determinants of decreased IPA communication among immigrant women. It is possible that decreased communication within this population may have resulted from less contact with the medical system or from differential treatment by medical clinicians. Future research might look more closely at this issue.
Acknowledgements
Our research was supported by the Commonwealth Foundation and by a grant under the Resource Centers for Minority Aging Research Program by the National Institute on Aging, the National Institute of Nursing Research, and the Office of Research on Minority Health, National Institutes of Health, grant # 1 P30 AG15272. Dr Rodriguez was a Picker/Commonwealth Scholar when this work was completed. We wish to thank Drs Kevin Grumbach and Elizabeth McLoughlin for assistance with study design, Dr Liza Pressor for data collection, and Gregory Nah for data management. In addition, we thank the many San Francisco advocates against domestic violence for their input into the survey content and design, and we thank the women who participated in our study.
Related resources
- National Domestic Violence Hotline http://www.ndvh.org, 1-800-799-SAFE (7233), 1-800-787-3224 (TDD)
- The Family Violence Prevention Fund http://www.fvpf.org/
- Intimate Partner Violence and Sexual Assault: A Guide to Training Materials and Programs for Healthcare Providers http://www.cdc.gov/ncipc/pub-res/pdf/newguide.pdf
- American Medical Association Violence Prevention Website http://www.ama-assn.org/ama/pub/category/3242.html
STUDY DESIGN: We conducted telephone interviews with a random sample of ethnically diverse abused women.
POPULATION: We included a total of 375 African American, Latina, and non-Latina white women aged 18 to 45 years with histories of intimate partner abuse who attended 1 of 3 public primary care clinics in San Francisco, California, in 1997.
OUTCOMES MEASURED: We measured the prevalence and determinants of past communication with clinicians about abuse and barriers to communication.
RESULTS: Forty-two percent (159) of the patients reported having communicated with a clinician about abuse. Significant independent predictors of communication were direct clinician questioning about abuse (odds ratio [OR]=4.6; 95% confidence interval [CI], 3.2-6.6) and African American ethnicity (OR=1.8; 95% CI, 1.1-2.9). Factors associated with lack of communication about abuse included immigrant status (OR=0.6; 95% CI, 0.3-1.0) and patient concerns about confidentiality (OR=0.7; 95% CI, 0.5-0.9). Barriers significantly associated with lack of communication were patients’ perceptions that clinicians did not ask directly about abuse, beliefs that clinicians lack time and interest in discussing abuse, fears about involving police and courts, and concerns about confidentiality.
CONCLUSIONS: Clinician inquiry appears to be one of the strongest determinants of communication with patients about partner abuse. Other factors that need to be addressed include patient perceptions regarding clinicians’ time and interest in discussing abuse, fear of police or court involvement, and patient concerns about confidentiality.
It is estimated that intimate partner abuse (IPA) occurs in 4 to 6 million relationships each year in the United States1,2 and that many health care interactions involve abused patients in primary care settings.3,4 Clinicians are therefore well placed to identify IPA and to provide appropriate care and referrals. However, in spite of its high prevalence and the existence of published guidelines and recommendations for routine clinician screening,5 the majority of abused women patients are not identified in the medical system and do not receive needed assistance.6,7 Estimates of the prevalence of clinician-patient communication about IPA range from 10% to slightly less than one third of all abused women.2,8
Previous studies have shown that the low rates of clinician-patient communication about IPA result in part from a lack of direct questioning by many clinicians and from the fact that women rarely volunteer information about abuse without being asked. Fewer than 15% of women patients in primary care settings report being asked about abuse by health care professionals.2,4,6,7,9 A recent statewide study of primary care clinicians in California found that only 10% reported routine screening for abuse among new patients, and 9% reported such screening at periodic checkups.10 Yet the majority of women patients report that they favor direct questioning by clinicians about IPA and would reveal abuse histories if asked directly.6,7
Despite these studies, much remains unknown about abuse-related communication patterns and patient attitudes about communication in the medical setting. We examined the prevalence and determinants of clinician-patient communication about intimate partner abuse by interviewing an ethnically diverse group of abused women primary care patients to determine whether differences in disclosure of abuse were related to any of the following: age, ethnicity, education, language, and immigrant status of the patient, as well as clinician sex and ethnicity and the presence of an established clinician-patient relationship. We also looked into patients’ perceived barriers to communication about IPA, including lack of direct clinician questioning about abuse, perceptions about clinicians’ lack of time or interest in discussing abuse, fears about involving the police and courts, embarrassment, concerns about confidentiality, fear of shaming the family, and fear that the patient’s partner might hurt or kill her.
Methods
Study Population
Our sample consisted of women seen at 3 primary care outpatient clinics at San Francisco General Hospital in California.11 Each year these family medicine, general internal medicine, and obstetrics/gynecology clinics serve nearly 100,000 ethnically and socioeconomically diverse women aged 18 to 45 years. During the 3-year period preceding our study, many staff members at the 3 clinics received training to encourage identification and management of IPA in the medical setting. The training incorporated lectures and continuing medical education.
We selected the sample from a computerized patient utilization database for the 3 clinics during 1997. Selection criteria included: (1) female sex; (2) race/ethnicity African American, non-Latina white, or Latina; (3) age 18 to 45 years; and (4) receipt of care in 1 of the 3 primary care clinics in the previous 6 months. Women were selected for participation in this study because they are much more likely to have been abused by an intimate partner than are men. Only women who reported histories of abuse were included in this analysis.
Patients were considered eligible if they met all the selection criteria, spoke English or Spanish, had verifiable phone numbers, and were mentally and physically capable of completing the survey.
Survey Instrument
We developed the survey instrument through a review of the literature, including the results from some of the authors’ previous qualitative research, consultation with domestic violence researchers and advocates, and discussions with a focus group of 6 abused women. Final survey modifications were made following expert review and pilot testing with 75 women, 25 from each target ethnic group. The instrument included questions about patients’ social, health, and demographic characteristics; clinic and medical clinician utilization; and IPA experiences. Women who indicated histories of IPA were questioned about their experiences in obtaining abuse-related help in the medical system, the barriers to IPA communication with medical clinicians, and clinician demographics. The questionnaire was prepared in English and translated to Spanish using standard translation methods.12
Questions about abuse were adapted from the 4-question Abuse Assessment Screen, which has been validated in multiethnic populations.13 These questions asked whether the participants had ever experienced physical, sexual, or psychological abuse. For each positive response, women indicated whether the abuse had occurred in the past 12 months (recent abuse) or in the more distant past.
Prevalence of communication with clinicians about abuse was assessed by asking participants if they had ever mentioned or discussed abuse with a physician: (1) in response to direct clinician questioning or (2) in the absence of direct clinician questioning.
Data Collection
The survey was administered to the sample by computer-assisted telephone interview between October 1997 and March 1998. An introductory letter was mailed to the homes of all potential participants (to ensure safety, the topic of abuse was not mentioned in this letter). Following this, trained women interviewers contacted potential participants by telephone. After confirming eligibility, privacy, and safety and obtaining verbal consent, interviews lasting approximately 25 minutes were conducted in English or Spanish. The study protocol was approved by the Committee for Human Research at the University of California, San Francisco.
Data Analyses
We analyzed the data using SPSS statistical software.14 IPA was defined as having ever been exposed as an adult to physical abuse, sexual abuse, or threats/fear of abuse. The principal outcome variable was previous communication with a medical clinician about IPA experiences. Predictor variables included age, ethnicity, birthplace, language, employment and medical insurance status, and education, as well as clinician sex and ethnicity, direct clinician questioning about abuse, and presence of an established relationship with a clinician (regular clinician). Additional predictor variables included patients’ perceived barriers to communication.
We used multiple logistic regression analysis to estimate crude and adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) for the factors associated with clinician-patient communication about abuse. Our final model included variables of primary interest to our study (patient age, ethnicity, education, and presence of a regular clinician), as well as those variables that significantly influenced abuse-related communication (birthplace, direct questioning by a clinician, perceptions that clinicians lack time and interest in discussing abuse, and concerns about confidentiality). For cross tabulations, statistical significance was determined using the Pearson chi-square test. Statistical significance was defined as P less than .05.
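To make the notion of a crude (unadjusted) odds ratio concrete, the sketch below computes one from a single 2x2 table, with a 95% CI from the standard log-odds (Woolf) approximation. It is not the authors' SPSS procedure, and the counts are hypothetical, chosen only to illustrate the arithmetic; adjusted ORs additionally require fitting the full logistic model.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR and Woolf-type 95% CI for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts, not taken from the study:
# 40 of 100 exposed and 20 of 100 unexposed women have the outcome.
or_, lo, hi = crude_odds_ratio(40, 60, 20, 80)
print(f"crude OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to statistical significance at the P less than .05 threshold used here.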
Results
Sample Description
Of the 1390 patients selected from the database, 992 (71%) met the eligibility criteria. Of the 398 ineligible women, 315 (23%) did not have verifiable phone numbers, and 83 (6%) either did not speak English or Spanish, were incapable of completing the survey, or did not meet the original selection criteria. The overall cooperation rate was 74% (734/992) of the available eligible participants. Of the women interviewed, 51% (375) reported having ever been abused by an intimate partner as an adult. Further descriptive analyses are reported elsewhere.11 Among the 375 participants who reported a history of abuse, 88% (328) reported having experienced physical abuse, 33% (122) reported having experienced sexual abuse, and 66% (246) reported having experienced threats or fear of IPA. There was substantial overlap between abuse categories for most participants, and almost all women reporting a history of sexual abuse also reported a history of physical abuse. However, 7% (28) of the participants reported previous threats or fear of IPA in the absence of physical or sexual abuse.
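The sample flow described above can be retraced with a few lines of arithmetic. The Python sketch below uses only the counts quoted in the paragraph; the variable names and the choice of denominator for each percentage (the 1390 selected patients for the eligibility figures) are our reading of the text, not something the authors spell out.

```python
# Counts quoted in the Sample Description paragraph.
selected = 1390          # patients drawn from the utilization database
eligible = 992           # met all eligibility criteria
no_phone = 315           # ineligible: no verifiable phone number
other_ineligible = 83    # ineligible: language, capacity, or selection criteria
interviewed = 734        # eligible women who completed the interview
abused = 375             # interviewed women reporting IPA as an adult


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# The two ineligible groups should account for everyone who was not eligible.
ineligible = selected - eligible

flow = {
    "eligible / selected": pct(eligible, selected),
    "no phone / selected": pct(no_phone, selected),
    "other ineligible / selected": pct(other_ineligible, selected),
    "interviewed / eligible": pct(interviewed, eligible),
    "abused / interviewed": pct(abused, interviewed),
}

print("ineligible:", ineligible, "=", no_phone, "+", other_ineligible)
for step, value in flow.items():
    print(f"{step}: {value}%")
```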
Sample characteristics of all study participants with histories of abuse are summarized in Table 1. The mean age was 34.3 years (standard deviation [SD]=7.3 years). The study participants were primarily of lower socioeconomic status. Years of education ranged from none to postgraduate, with a mean of 11.9 years (SD=3.5 years).
Prevalence of Clinician-Patient Communication About Abuse
Summary prevalence data relating to clinician-patient communication are provided in Table 2. Among the 375 abused participants, 42% (159) reported communicating with a clinician about IPA. Among the 347 participants with a history of physical or sexual abuse, 45% (155) reported communicating with a clinician about IPA. Communication rates were significantly lower, however, among the 7% (28) of the participants who reported threats or fear of IPA in the absence of physical or sexual abuse (P <.05). Only 14% of the participants in this group reported having ever communicated with a clinician about abuse.
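The 14% figure for the threats-or-fear-only group follows from the other counts in this paragraph. The sketch below derives it by subtraction, assuming (as the text implies but does not state outright) that the 347 women with physical or sexual abuse and the 28 women with threats or fear only are non-overlapping and together make up the 375 abused participants.

```python
# Counts quoted in the prevalence paragraph.
abused_total = 375
communicated_total = 159
physical_or_sexual = 347
communicated_physical_or_sexual = 155

# Assumption: the remaining women are exactly the threats-or-fear-only group.
threats_only = abused_total - physical_or_sexual
communicated_threats_only = communicated_total - communicated_physical_or_sexual


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


rates = {
    "all abused participants": pct(communicated_total, abused_total),
    "physical or sexual abuse": pct(communicated_physical_or_sexual, physical_or_sexual),
    "threats or fear only": pct(communicated_threats_only, threats_only),
}

for group, rate in rates.items():
    print(f"{group}: {rate}% communicated with a clinician")
```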
Overall, only 28% of the participants reported direct questioning by a medical clinician about abuse; of those who were questioned, however, 85% reported that they had disclosed the abuse when directly asked by their physicians. In the absence of direct questioning, only 25% of participants reported disclosing abuse to a physician. Rates of clinician inquiry about IPA did not vary significantly across ethnic groups.
There were no significant differences in frequency of communication between women reporting abuse in the past 12 months and women reporting abuse in the more distant past. Other variables not significantly associated with communication included employment, language, medical insurance status, primary care clinic, and clinician’s sex or ethnicity. In addition, having been asked directly about abuse by a clinician was not associated with age, ethnicity, birthplace, education, insurance status, or primary clinic site. However, on bivariate analysis, having been asked was significantly associated with having a regular physician (33% vs 21%, P=.02) and having been married (36% vs 23%, P=.01).
Barriers to Communication
Barriers that hindered patients’ desire to communicate included beliefs that clinicians do not ask directly about IPA and that clinicians lack time for and interest in discussing abuse. Participants were also asked whether their communication with clinicians was hindered by any of the following factors: concerns about confidentiality, fear of involving the police and courts, embarrassment, fear of shaming family, and fear that their partners would hurt or kill them.
Table 3 lists each of the perceived barriers by frequency of agreement according to participants’ abuse communication status (never communicated vs ever communicated). All but one of the factors were reported with greater frequency among women who had never disclosed abuse to a medical clinician than among those who had.
To determine if there were significant differences in the frequency of reported barriers according to communication status, we conducted cross-tabulations and determined statistical significance using the Pearson chi-square test. Statistical significance was defined as P less than .05. We obtained significant differences for each of the following barriers: beliefs that clinicians do not ask directly (P <.001), concerns about confidentiality (P <.001), beliefs that clinicians lack time for (P=.002) and interest in (P=.001) discussing abuse, and fear of involving the police and courts (P=.042).
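For readers unfamiliar with the test, the sketch below shows a Pearson chi-square computation for one 2x2 cross-tabulation (barrier reported or not, by communication status). The counts are not the Table 3 values, which are not reproduced in this text; they are rounded reconstructions from percentages quoted in the Discussion (37% of the 216 women who never communicated and 21% of the 159 who did reported confidentiality concerns) and should be read as illustrative assumptions. No continuity correction is applied.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value (1 degree of freedom) for a 2x2 table.

    Rows are groups; columns are barrier reported (a, c) vs not reported (b, d).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 is a squared standard normal, so the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts reconstructed from rounded percentages, not taken from Table 3.
never_yes, never_no = 80, 136   # about 37% of 216 women who never communicated
ever_yes, ever_no = 33, 126     # about 21% of 159 women who ever communicated

chi2, p = pearson_chi_square_2x2(never_yes, never_no, ever_yes, ever_no)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```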
Among the 108 abused Latina patients, 34% identified language barriers, and 21% reported concerns about the immigration authorities.
1. Straus M, Gelles R. Societal change and change in family violence from 1975 to 1985 as revealed by two national surveys. J Marriage Fam 1986;48:465-79.
2. Plichta SB, Duncan MM, Plichta L. Spouse abuse, patient-physician communication, and patient satisfaction. Am J Prev Med 1996;12:297-303.
3. McCauley J, Kern DE, Kolodner K, et al. The ‘battering syndrome’: prevalence and clinical characteristics of domestic violence in primary care internal medicine practices. Ann Intern Med 1995;123:737-46.
4. Hamberger LK, Saunders DG, Hovey M. Prevalence of domestic violence in community practice and rate of physician inquiry. Fam Med 1992;24:283-87.
5. Council on Scientific Affairs. American Medical Association. Violence against women: relevance for medical practitioners. JAMA 1992;267:3184-89.
6. Friedman LS, Samet JH, Roberts MS, et al. Inquiry about victimization experiences: a survey of patient preferences and physician practices. Arch Intern Med 1992;152:1186-90.
7. Caralis PV, Musialowski R. Women’s experiences with domestic violence and their attitudes and expectations regarding medical care of abuse victims. South Med J 1997;90:1075-80.
8. Gin NE, Rucker L, Frayne S, et al. Prevalence of domestic violence among patients in three ambulatory care internal medicine clinics. J Gen Intern Med 1991;6:317-22.
9. Straus MA, Smith C. Family patterns and primary prevention of family violence. Trends in health care, law & ethics 1993;8:17-26.
10. Rodríguez MA, Bauer HM, McLoughlin E, Grumbach K. Screening and intervention for intimate partner abuse: practices and attitudes of primary care physicians. JAMA 1999;282:468-74.
11. Bauer HM, Rodríguez MA, Pérez-Stable EJ. Prevalence and determinants of intimate partner abuse among public hospital primary care patients. J Gen Intern Med. In press.
12. Brislin RW. Back-translation for cross-cultural research. J Cross-Cultural Psych 1970;1:185-216.
13. Soeken K, Parker B, McFarlane J, et al. The abuse assessment screen: a clinical instrument to measure frequency, severity, and perpetrator of abuse against women. In: Campbell JC, ed. Empowering survivors of abuse: health care for battered women and their children. Thousand Oaks, Calif: Sage Publications; 1998.
14. SPSS. Version 8.0 for Windows. Chicago, Ill: SPSS, Inc; 1998.
15. West CM, Kantor GK, Jasinski JL. Sociodemographic predictors and cultural barriers to help-seeking behavior by Latina and Anglo American battered women. Violence Victims 1998;13:361-75.
16. Bauer HM, Rodríguez MA, Quiroga SS, Flores-Ortiz YG. Barriers to health care for abused Latina and Asian immigrant women. J Health Care Poor Underserved 1999;11:33-44.
17. Morales LS, Cunningham WE, Brown JA, et al. Are Latinos less satisfied with communication by health care providers? J Gen Intern Med 1999;14:409-17.
18. Baker DW, Parker RM, Williams MV, et al. Use and effectiveness of interpreters in an emergency department. JAMA 1996;275:783-88.
19. Rodríguez MA, Quiroga SS, Bauer HM. Breaking the silence: battered women’s perspectives on medical care. Arch Fam Med 1996;5:153-58.
20. Sugg NK, Inui T. Primary care physicians’ response to domestic violence: opening Pandora’s box. JAMA 1992;267:3157-60.
21. Rodríguez MA, McLoughlin E, Bauer HM, et al. Mandatory reporting of intimate partner violence to police: views of physicians in California. Am J Public Health 1999;89:575-78.
22. Gerbert B, Caspers N, Bronstone A, et al. A qualitative analysis of how physicians with expertise in domestic violence approach the identification of victims. Ann Intern Med 1999;131:578-84.
23. Parsons LH, Zaccaro D, Wells B, Stovall TG. Methods of and attitudes toward screening obstetrics and gynecology patients for domestic violence. Am J Obstet Gynecol 1995;173:381-87.
24. Tilden VP, Schmidt TA, Limandri BJ, et al. Factors that influence clinicians’ assessment and management of family violence. Am J Public Health 1994;84:628-33.
25. Harwell TS, Casten RJ, Armstrong KA, et al. Results of a domestic violence training program offered to the staff of urban community health centers. Am J Prev Med 1998;15:235-41.
The Association Between Perineal Trauma and Spontaneous Perineal Tears
DESIGN: Retrospective cohort study.
POPULATION: We included data from 1895 women who had their first and second deliveries at Saint-Sacrement Hospital, Quebec City, Canada, between 1985 and 1994. Our study was restricted to women who gave birth vaginally to a single living neonate at their first 2 deliveries and who did not have an episiotomy at the second delivery. We extracted the data from the Department of Obstetrics computerized database.
OUTCOMES MEASURED: Spontaneous perineal tears (of second degree or higher) at the second delivery.
RESULTS: Having a perineal trauma at the first delivery more than tripled the risk (relative risk=3.3; 95% confidence interval, 2.6-4.2) of spontaneous perineal tears at the second delivery. The risk of spontaneous perineal tears at the second delivery increased with the severity of previous perineal trauma at birth.
CONCLUSIONS: Our results show that the risk of spontaneous perineal tears at subsequent deliveries increases with the presence and the severity of perineal trauma at the first delivery.
Women frequently incur perineal trauma at delivery. Such trauma is associated with perineal pain that may still be present 3 months postpartum,1-4 dyspareunia,1,2 perineal infection,4,5 and, following severe lacerations, fistula and incontinence of flatus and feces.6,7 Episiotomy accounts for a large proportion of perineal trauma. Wide variations exist in the use of episiotomy according to country,5 hospital,8 or birth attendant.1,9-12 There is no evidence that it is effective in preventing severe lacerations13,14 or pelvic floor relaxation1,7,13 or that recovery is more rapid and morbidity less than that following spontaneous tears.5,7 Also, median episiotomy increases the risk of severe perineal lacerations,14 particularly in primiparous women.10
Even in the absence of episiotomy, from 35%15 to 75%16,17 of women suffer a perineal trauma while giving birth. Risk factors include nulliparity,17,18 use of stirrups for delivery,19 second stage of labor of at least 1 hour,6,18,20-22 shoulder dystocia,21 forceps delivery,18,19,22-24 and excessive birth weight.20,21,23
Very few studies have assessed whether perineal trauma experienced during a first delivery is a risk factor for spontaneous tears at the next delivery. Observations from the West Berkshire randomized perineal management trial25 suggested that this could be the case, even though the results were not statistically significant. Of the 1000 women enrolled in that trial, 67% completed a questionnaire 3 years later, and 40% of the respondents had a second delivery in the interval. The women assigned to the “liberal” group (instruction to try to prevent a tear) tended to carry a higher risk of perineal tears at the next delivery than those assigned to the “restricted” group (to restrict episiotomy to fetal indications) [46% and 40%, respectively; P=0.3]. Two recent studies26,27 have reported an increased risk of severe perineal lacerations (third- and fourth-degree tears) in women who sustained such lacerations at their previous delivery, but these studies did not provide data on the whole range of perineal trauma.
Our objective for this retrospective cohort study was to assess whether the presence and severity of perineal trauma (defined as a spontaneous tear or an episiotomy with and without extension) at the first delivery are related to the risk of spontaneous perineal tears of the second degree or more in women who subsequently deliver vaginally without an episiotomy.
Methods
We included women who gave birth to their first and second baby at Saint-Sacrement Hospital of the Centre hospitalier affilié universitaire de Québec, Canada, between January 1, 1985, and December 31, 1994. Those who had a cesarean delivery, a multiple birth, or a stillbirth at either the first or second delivery were excluded, as were those who had an episiotomy at the second delivery.
We abstracted the data from the computerized database maintained by the Department of Obstetrics since 1985. The delivery physician routinely recorded data about labor and delivery on a standard form after the delivery. The attending physician indicated on the standard form whether the woman had no tear, a first-degree tear (limited to the fourchette, the perineal skin, and the vaginal mucosa), a second-degree tear (extending to perineal muscles but sparing the anal sphincter), a third-degree tear (involving muscles of the central nucleus and the anal sphincter with anal mucosa remaining intact), a fourth-degree tear (complete rupture of the anal sphincter through the mucosa),28 or an episiotomy (with or without a third- or fourth-degree extension). Except for some first-degree tears, all trauma required a surgical repair. The decision to cut an episiotomy was left to the discretion of the physician. The standard form also included information on factors potentially related to perineal tears, such as maternal age, epidural use, use of forceps or vacuum, shoulder dystocia, fetal presentation, gestational age (based on last menstrual period, or on ultrasound dating if the 2 estimates differed by more than 10 days), birth weight, head circumference of the newborn, and the training (obstetrics and gynecology or general or family medicine) and identity of the birth attendant. Data from these forms were computerized periodically by one obstetrician (J.J.P.). Incomplete or inconsistent data were checked using the medical records.
Women who gave birth with an intact perineum or a first-degree tear at the first delivery made up the “unexposed” group. Those who experienced a perineal trauma (an episiotomy or a spontaneous perineal tear of the second degree or higher) at the first delivery were considered “exposed.” Exposure was also categorized according to the severity of the trauma: second-degree spontaneous tear, episiotomy without extension, third- or fourth-degree spontaneous tear, and third- or fourth-degree extension of an episiotomy. All episiotomies were median. The dependent variable was defined as the presence of a second-, third- or fourth-degree spontaneous tear at the second delivery (indicated yes or no).
The association between perineal tears at the second delivery and a history of perineal trauma was primarily measured by the relative risk (RR). The precision of the estimate was given by the 95% confidence interval (95% CI). We also performed unconditional logistic regression to evaluate the influence of potential confounders on the association.29 Unadjusted and adjusted odds ratios (ORs) were obtained. Because the outcome was relatively frequent, the OR overestimated the RR. Trends in proportions were tested with the χ² test for trend.30 Comparisons of proportions were based on Pearson χ² tests.
Results
Of the 3769 women who had their first and second deliveries in the same hospital during the study period, we excluded 579 (15.4%) who had a cesarean delivery, 66 who had a multiple pregnancy, and 31 who had a stillbirth. We also excluded 1198 women who underwent an episiotomy at their second delivery. Among the 1895 secondiparous women left in the analysis, 462 (24.4%) had an intact perineum or a first-degree tear at the first delivery, 333 (17.6%) had a spontaneous second-degree (302) or third- or fourth-degree tear (31); and 1100 (58.0%) had an episiotomy without (911) or with (189) a third- or fourth-degree extension. At the second delivery 1196 (63.1%) delivered with an intact perineum or had a first-degree tear, while 699 (36.9%) had a tear of the second (686) or third or fourth degree (13).
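The cohort flow in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names and the `pct` helper are illustrative, not part of the study's own analysis.

```python
# Cohort flow, using the counts reported in the Results paragraph.
initial = 3769
exclusions = {
    "cesarean delivery": 579,
    "multiple pregnancy": 66,
    "stillbirth": 31,
    "episiotomy at second delivery": 1198,
}
analyzed = initial - sum(exclusions.values())

# Perineal status at the first delivery among the women analyzed.
first_delivery = {
    "intact or first-degree tear": 462,
    "spontaneous second- to fourth-degree tear": 302 + 31,
    "episiotomy with or without extension": 911 + 189,
}

# Perineal status at the second delivery among the women analyzed.
second_delivery = {
    "intact or first-degree tear": 1196,
    "second- to fourth-degree tear": 686 + 13,
}


def pct(count, total):
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100 * count / total, 1)


print("analyzed:", analyzed)
for label, count in first_delivery.items():
    print(f"first delivery, {label}: {count} ({pct(count, analyzed)}%)")
for label, count in second_delivery.items():
    print(f"second delivery, {label}: {count} ({pct(count, analyzed)}%)")
```

Both breakdowns sum to the 1895 women analyzed, and the rounded percentages agree with those given in the text.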
Risk factors for spontaneous perineal tears at the second delivery are presented in Table 1. The unadjusted risk increased with maternal age, gestational age at delivery, birth weight, and fetal head circumference (all tests for trend, P≤.003). The risk was also higher in nonvertex than vertex presentations and in vacuum- or forceps-assisted deliveries than in spontaneous vaginal deliveries. Epidural analgesia, shoulder dystocia, and the training, experience (years since graduation), and identity (data not shown) of the birth attendant were not related to the risk of spontaneous tears in secondiparous women.
Among women who previously gave birth with an intact perineum or a first-degree laceration, 13.4% had a spontaneous tear of the second degree or more at the next delivery (Table 2). The risk did not differ between women with a history of intact perineum (13.0%) and those with a first-degree tear (14.1%; χ²=0.1, P=.7). In contrast, among women previously exposed to perineal trauma (episiotomy or second-, third-, or fourth-degree tear), 44.5% sustained a spontaneous tear of the second degree or higher at the subsequent delivery. Thus, the overall risk of spontaneous tears (second, third, or fourth degree) was 3.3 times higher in women with a history of perineal trauma at the first delivery than in those without (RR=3.3; 95% CI, 2.6-4.2). The risk of perineal tears at the second delivery increased with the severity of the trauma at the first delivery (test for trend, P<.001). The risk was higher in women with a previous episiotomy without extension (44.8%) than in women with a spontaneous second-degree laceration (36.1%). Severe lacerations at the first delivery yielded the highest risk (54.5%), but the risk was similar whether previous severe lacerations were spontaneous (54.8%) or secondary to an extension of episiotomy (54.5%). Among the 220 women who suffered a third- or fourth-degree perineal tear at their first birth, 2 (0.9%) were delivered with such a tear at their second birth, while among the 1675 who did not suffer a third- or fourth-degree perineal tear at their first delivery, 11 (0.7%) were delivered with such trauma (RR=1.4; 95% CI, 0.3-6.3).
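The overall relative risk can be reproduced approximately from the reported group sizes and percentages. The sketch below rests on two assumptions that are not stated in the text: the event counts (62 of 462 unexposed women and 637 of 1433 exposed women) are reconstructed from the reported 13.4% and 44.5%, and the confidence interval uses the common log-based (Katz) method, which may not be the method the authors used.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with a log-based (Katz) confidence interval."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Group sizes are from the text: 462 unexposed; 333 + 1100 = 1433 exposed.
# Event counts are reconstructed from the reported percentages (assumption).
n_unexposed, tears_unexposed = 462, 62    # 62/462 is about 13.4%
n_exposed, tears_exposed = 1433, 637      # 637/1433 is about 44.5%

rr, lower, upper = relative_risk(tears_exposed, n_exposed, tears_unexposed, n_unexposed)
print(f"RR = {rr:.1f} (95% CI, {lower:.1f}-{upper:.1f})")
```

The reconstructed counts sum to 699, the number of second- to fourth-degree tears reported at the second delivery, and the result rounds to the published RR of 3.3 (95% CI, 2.6-4.2).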
To verify that the risk factors shown in Table 1 did not confound the associations, we carried out an unconditional logistic regression analysis with simultaneous adjustment for maternal age, birth weight, length of gestation, head circumference, fetal presentation, and mode of delivery. This analysis yielded an adjusted OR of 5.3 (95% CI, 3.9-7.1) for the relation of a history of perineal trauma at the first delivery and the risk of perineal tears (second-, third-, or fourth-degree tears) at the second delivery. As the adjusted OR is similar to the unadjusted OR (5.2; 95% CI, 3.9-6.9), this indicates that the risk factors entered into the model did not confound the association. When the severity of perineal trauma at the first delivery was categorized as in Table 2, the regression analysis again suggested there was no confounding (data not shown).
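The gap between the unadjusted OR (5.2) and the RR (3.3) follows from how frequent the outcome is. The sketch below illustrates this with the same reconstructed 2x2 counts (62 of 462 and 637 of 1433, inferred from the reported percentages rather than stated in the text) and a Woolf-type log interval for the OR, which is an assumed method. For a crude 2x2 table the identity RR = OR / (1 - p0 + p0 x OR), where p0 is the risk in the unexposed group, links the two measures exactly.

```python
import math

# Reconstructed 2x2 table (assumption: counts inferred from reported percentages).
tears_exposed, n_exposed = 637, 1433
tears_unexposed, n_unexposed = 62, 462
no_tear_exposed = n_exposed - tears_exposed        # 796
no_tear_unexposed = n_unexposed - tears_unexposed  # 400

# Unadjusted odds ratio with a log-based (Woolf) 95% confidence interval.
odds_ratio = (tears_exposed * no_tear_unexposed) / (no_tear_exposed * tears_unexposed)
se_log_or = math.sqrt(
    1 / tears_exposed + 1 / no_tear_exposed
    + 1 / tears_unexposed + 1 / no_tear_unexposed
)
or_lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
or_upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

# Relative risk computed directly, and again from the OR via the baseline risk p0.
p0 = tears_unexposed / n_unexposed
rr_direct = (tears_exposed / n_exposed) / p0
rr_from_or = odds_ratio / (1 - p0 + p0 * odds_ratio)

print(f"OR = {odds_ratio:.1f} (95% CI, {or_lower:.1f}-{or_upper:.1f})")
print(f"RR = {rr_direct:.1f}; RR recovered from OR = {rr_from_or:.1f}")
```

With a baseline risk of about 13% and an exposed-group risk of about 44%, the OR overstates the RR considerably, which is why the RR is the primary measure and the ORs serve only to compare unadjusted with adjusted estimates.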
To estimate the influence of the exclusion of the 1198 women who had an episiotomy at their second delivery, we reanalyzed our data to include these women. At their first delivery: 65 of them had no tear or a first-degree tear; 70 had a second-degree spontaneous tear; 13 had a spontaneous third- or fourth-degree tear; 789 had an episiotomy without extension; and 261 had a third- or fourth-degree extension of an episiotomy. The risk of trauma at the second delivery in the 3093 women was 24.1% for those without a history of perineal trauma and 69% for those who had (RR=2.9; 95% CI, 2.5-3.3). Also, spontaneous tears at the second delivery were 2.1 times (95% CI, 1.7-2.7) more frequent in women who had a previous perineal trauma.
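This reanalysis can be retraced in the same way. The sketch below assumes that every woman with an episiotomy at the second delivery counts as having trauma at that delivery, and it reuses the spontaneous tear counts reconstructed from the reported percentages (62 and 637); both are inferences from the text, and the log-based confidence interval is an assumed method.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Risk ratio with a log-based (Katz) confidence interval."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    se = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# Women with an episiotomy at the second delivery, by first-delivery status (from the text).
epis_unexposed = 65
epis_exposed = 70 + 13 + 789 + 261  # 1133

# Main-analysis groups; tear counts are reconstructed from percentages (assumption).
main_unexposed_n, main_unexposed_tears = 462, 62
main_exposed_n, main_exposed_tears = 1433, 637

n_unexposed = main_unexposed_n + epis_unexposed  # 527
n_exposed = main_exposed_n + epis_exposed        # 2566

# Any trauma at the second delivery = spontaneous tear (second degree or more) or episiotomy.
any_unexposed = main_unexposed_tears + epis_unexposed
any_exposed = main_exposed_tears + epis_exposed

any_rr = risk_ratio(any_exposed, n_exposed, any_unexposed, n_unexposed)
spont_rr = risk_ratio(main_exposed_tears, n_exposed, main_unexposed_tears, n_unexposed)

print(f"any trauma: {100 * any_unexposed / n_unexposed:.1f}% vs "
      f"{100 * any_exposed / n_exposed:.1f}%")
print("any trauma: RR = %.1f (95%% CI, %.1f-%.1f)" % any_rr)
print("spontaneous tears: RR = %.1f (95%% CI, %.1f-%.1f)" % spont_rr)
```

Under these assumptions the figures round to those reported: 24.1% versus 69%, RR 2.9 (2.5-3.3) for any trauma, and 2.1 (1.7-2.7) for spontaneous tears.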
Discussion
Our results indicate that the risk of spontaneous perineal lacerations (second-, third- or fourth-degree tears) at the second delivery increases with the presence and severity of perineal trauma at the previous delivery. To our knowledge this study is the first to demonstrate that association.
An increased risk of severe perineal lacerations (third- and fourth-degree tears) has been reported in women who sustained such lacerations at their previous delivery in studies by Payne and colleagues (unadjusted OR=3.4; 95% CI, 1.8-6.4)26 and by Peleg and coworkers (OR=2.5; 95% CI, 1.8-3.4).27 In these studies, many women gave birth with a median episiotomy, a known risk factor for severe perineal tears. However, the association persisted after adjustment for episiotomy26 or in the subset of spontaneous births (RR=6.5; 95% CI, 2.0-21.2).27 In our study, a similar trend was observed but was not statistically significant.
Our results support the view that the prevention of perineal trauma in first deliveries could benefit women in subsequent deliveries. Prenatal perineal massage constitutes a simple and valuable approach for doing so.31 Recent randomized controlled trials indicate that perineal massage during pregnancy increases the likelihood for primiparous women of delivering with an intact perineum.32,33 Avoiding episiotomy, in addition to increasing the rate of intact perineum, reduces the severity of perineal trauma. In a previous study10 we reported a 3-fold increase of third- and fourth-degree perineal tears associated with median episiotomy in primiparous women. In that study, while the episiotomy rate declined from 77.7% in 1985-1987 to 56.2% in 1991-1993, the rate of severe perineal lacerations fell from 17.2% to 12.6% during the same period. Finally, restricting forceps birth also enhances perineal integrity.31
Limitations
We studied a large cohort of women who delivered twice at the same hospital. The exclusion of women who had their second delivery in a different hospital raises the possibility of a selection bias. It was not possible to estimate the number of these exclusions. This phenomenon, however, is not frequent, and we see no reason that women who had a history of perineal trauma and changed hospitals would be more or less likely to have a perineal tear at their second delivery than women included in the analysis.
Our study was restricted to women who did not have an episiotomy at the second delivery. We did this because we were interested in estimating the likelihood of a spontaneous tear, which cannot be determined in women undergoing an episiotomy. The decision to undertake an episiotomy was at the discretion of the physician, and policies regarding the use of episiotomy varied between physicians as well as over the study period. Some physicians may have been more likely to undertake an episiotomy if they noticed the presence of a perineal scar. If this were the case, the exclusion of women who underwent an episiotomy at the second delivery could have resulted in an underestimation of the strength of the association between perineal trauma at the first delivery and spontaneous tears at the second delivery. However, if the reasons women had an episiotomy at their second delivery were independent of the state of the perineum at the first delivery, the influence on the association is unpredictable and would depend on underlying unknown risk of spontaneous tear in women with and without episiotomy. Reanalyzing our data to include the women who had an episiotomy at their second delivery showed that the strength of the association decreases, but history of perineal trauma remains a clinically and statistically significant risk factor for such trauma at the second delivery.
Our analysis took into account most of the variables known to be related to the risk of perineal trauma. However, we did not have information on the duration of the second stage, the use of oxytocin in the second stage of labor, and the delivery position. These variables would confound our results if they were related to a history of perineal trauma and independent risk factors for perineal tears in subsequent deliveries. Confounding by these variables cannot be eliminated but appears unlikely, since stronger risk factors such as excessive birth weight and shoulder dystocia did not introduce any confounding in our data. Another possible explanation for the observed association is that some perinea might be inherently more prone to tearing than others, possibly because of genetic factors.
Conclusions
Our study shows that the risk of spontaneous perineal tears at the second delivery increases with the presence and the severity of perineal trauma at the first delivery. These results support arguments for the prevention of perineal trauma at the first delivery and the selective use of episiotomy.
Acknowledgements
Dr Marcoux holds a National Health Research Scholarship from Health Canada.
Related resources
FOR PATIENTS:
- ParentsPlace.com - Perineal Massage: Your How-to Guide http://www.parentsplace.com/pregnancy/labor/qa/0,3105,13778,00.html
- Childbirth.org - Perineal Massage http://www.childbirth.org/articles/massage.html
FOR FAMILY PHYSICIANS:
- Obstetric Myths Versus Research Realities http://www.efn.org/~djz/birth/obmyth/epis.html
- Woolley RJ. Benefits and risks of episiotomy: A review of the English-language literature since 1980 http://www.gentlebirth.org/format/woolley.html (the best one on episiotomy)
- Carroli G, Belizan J. Episiotomy for vaginal birth (Cochrane Review) http://www.update-software.com/abstracts/ab000081.htm (Only the abstract is available free.)
1. Klein MC, Gauthier RJ, Robbins JM, et al. Relationship of episiotomy to perineal trauma and morbidity, sexual dysfunction, and pelvic floor relaxation. Am J Obstet Gynecol 1994;171:591-98.
2. Weijmar Schultz WCM, van de Wiel HBM, Heidemann R, Aarnoudse JG, Huisjes HJ. Perineal pain and dyspareunia after uncomplicated primiparous delivery. J Psychosom Obstet Gynecol 1990;11:119-27.
3. Harrison RF, Brennan M, North PM, Reed JV, Wickham EA. Is routine episiotomy necessary? Br Med J 1984;288:1971-75.
4. Larsson PG, Platz-Christensen JJ, Bergman B, Wallstersson G. Advantage or disadvantage of episiotomy compared with spontaneous perineal laceration. Gynecol Obstet Invest 1991;31:213-16.
5. Thacker SB, Banta HD. Benefits and risks of episiotomy: an interpretive review of the English language literature, 1860-1980. Obstet Gynecol Surv 1983;38:322-38.
6. Hordnes K, Bergsjo P. Severe lacerations after childbirth. Acta Obstet Gynecol Scand 1993;72:413-22.
7. Woolley RJ. Benefits and risks of episiotomy: a review of the English-language literature since 1980. Part II. Obstet Gynecol Surv 1995;50:821-35.
8. Rockner G, Olund A. The use of episiotomy in primiparas in Sweden: a descriptive study with particular focus on two hospitals. Acta Obstet Gynecol Scand 1991;70:325-30.
9. Henriksen TB, Moller Bek K, Hedegaard M, Secher NJ. Methods and consequences of changes in use of episiotomy. BMJ 1994;309:1255-58.
10. Labrecque M, Baillargeon L, Dallaire M, Tremblay A, Pinault JJ, Gingras S. Association between median episiotomy and severe perineal lacerations in primiparous women. Can Med Assoc J 1997;156:797-802.
11. Hueston WJ, Rudy M. Differences in labor and delivery experience in family physician- and obstetrician-supervised teaching services. Fam Med 1995;27:182-87.
12. Ruderman J, Carroll JC, Reid AJ, Murray MA. Episiotomy: differences in practice between family physicians and obstetricians. Can Fam Phys 1992;38:2583-89.
13. Lede RL, Belizan JM, Carroli G. Is routine use of episiotomy justified? Am J Obstet Gynecol 1996;174:1399-402.
14. Woolley RJ. Benefits and risks of episiotomy: a review of the English-language literature since 1980. Part I. Obstet Gynecol Surv 1995;50:806-20.
15. Gass MS, Dunn C, Stys SJ. Effect of episiotomy on the frequency of vaginal outlet lacerations. J Reprod Med 1986;31:240-44.
16. Klein MC, Gauthier RJ, Jorgensen SH, et al. Does episiotomy prevent perineal trauma and pelvic floor relaxation? Online J Curr Clin Trials 1992;2(doc no. 10).
17. Walker MPR, Farine D, Rolbin SH, Ritchie JWK. Epidural anesthesia, episiotomy, and obstetric laceration. Obstet Gynecol 1991;77:668-71.
18. Wilcox LS, Strobino DM, Baruffi G, Dellinger W. Episiotomy and its role in the incidence of perineal lacerations in a maternity center and a tertiary hospital obstetric service. Am J Obstet Gynecol 1989;160:1047-52.
19. Borgatta L, Piening SL, Cohen WR. Association of episiotomy and delivery position with deep perineal laceration during spontaneous delivery in nulliparous women. Am J Obstet Gynecol 1989;160:294-97.
20. Green JR, Soohoo SL. Factors associated with rectal injury in spontaneous deliveries. Obstet Gynecol 1989;73:732-38.
21. Moller Bek K, Laurberg S. Intervention during labor: risk factors associated with complete tear of the anal sphincter. Acta Obstet Gynecol Scand 1992;71:520-24.
22. Donnelly V, Fynes M, Campbell D, Johnson H, O’Connell PR, O’Herlihy C. Obstetric events leading to anal sphincter damage. Obstet Gynecol 1998;92:955-61.
23. Shiono P, Klebanoff MA, Carey JC. Midline episiotomies: more harm than good? Obstet Gynecol 1990;75:765-70.
24. Combs CA, Robertson PA, Laros RK. Risk factors for third-degree and fourth-degree lacerations in forceps and vacuum deliveries. Am J Obstet Gynecol 1990;163:100-04.
25. Sleep J, Grant A. West Berkshire perineal management trial: three year follow up. BMJ 1987;295:749-51.
26. Payne TN, Carey JC, Rayburn WF. Prior third- or fourth-degree perineal tears and recurrence risks. Int J Gynaecol Obstet 1999;64:55-57.
27. Peleg D, Kennedy CM, Merrill D, Zlatnik FJ. Risk of repetition of a severe perineal laceration. Obstet Gynecol 1999;93:1021-24.
28. Pritchard JA, Macdonald PC. Williams Obstetrics. 15th ed. New York, NY: Appleton-Century-Crofts 1989;345-50.
29. Hosmer DW, Lemeshow S. Applied logistic regression. Toronto, Canada: John Wiley and Sons; 1989.
30. Mantel N. Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. J Am Stat Assoc 1963;58:690-700.
31. Eason E, Labrecque M, Wells G, Feldman P. Preventing perineal trauma during childbirth: a systematic review. Obstet Gynecol 2000;95:464-71.
32. Shipman M, Boniface D, Tefft M, McCloghty F. Antenatal perineal massage and subsequent perineal outcomes: a randomised controlled trial. Br J Obstet Gynecol 1997;104:787-91.
33. Labrecque M, Eason E, Marcoux S, et al. Randomized controlled trial of prevention of perineal trauma by perineal massage during pregnancy. Am J Obstet Gynecol 1999;180:593-600.
DESIGN: Retrospective cohort study.
POPULATION: We included data from 1895 women who had their first and second deliveries at Saint-Sacrement Hospital, Quebec City, Canada, between 1985 and 1994. Our study was restricted to women who gave birth vaginally to a single living neonate at their first 2 deliveries and who did not have an episiotomy at the second delivery. We extracted the data from the Department of Obstetrics computerized database.
OUTCOMES MEASURED: Spontaneous perineal tears (of second degree or higher) at the second delivery.
RESULTS: Having a perineal trauma at the first delivery more than tripled the risk (relative risk=3.3; 95% confidence interval, 2.6-4.2) of spontaneous perineal tears at the second delivery. The risk of spontaneous perineal tears at the second delivery increased with the severity of previous perineal trauma at birth.
CONCLUSIONS: Our results show that the risk of spontaneous perineal tears at subsequent deliveries increases with the presence and the severity of perineal trauma at the first delivery.
Women frequently incur perineal trauma at delivery. Such trauma is associated with perineal pain that may still be present 3 months postpartum,1-4 dyspareunia,1,2 perineal infection,4,5 and following severe lacerations, fistula and incontinence of flatus and feces.6,7 Episiotomy accounts for a large proportion of perineal trauma. Wide variations exist in the use of episiotomy according to country,5 hospital,8 or birth attendant.1,9-12 There is no evidence that it is effective in preventing severe lacerations13,14 and pelvic floor relaxation1,7,13 or that recovery is more rapid and morbidity less than that following spontaneous tears.5,7 Also, median episiotomy increases the risk of severe perineal lacerations,14 particularly in primiparous women.10
Even in the absence of episiotomy, from 35%15 to 75%16,17 of women suffer a perineal trauma while giving birth. Risk factors include nulliparity,17,18 use of stirrups for delivery,19 second stage of labor of at least 1 hour,6,18,20-22 shoulder dystocia,21 forceps delivery,18,19,22-24 and excessive birth weight.20,21,23
Very few studies have assessed whether perineal trauma experienced during a first delivery is a risk factor for spontaneous tears at the next delivery. Observations from the West Berkshire randomized perineal management trial25 suggested that this could be the case, even though the results were not statistically significant. Of the 1000 women enrolled in that trial, 67% completed a questionnaire 3 years later, and 40% of the respondents had a second delivery in the interval. The women assigned to the “liberal” group (instruction to try to prevent a tear) tended to carry a higher risk of perineal tears at the next delivery than those assigned to the “restricted” group (to restrict episiotomy to fetal indications) [46% and 40%, respectively; P=0.3]. Two recent studies26,27 have reported an increased risk of severe perineal lacerations (third- and fourth-degree tears) in women who sustained such lacerations at their previous delivery, but these studies did not provide data on the whole range of perineal trauma.
Our objective for this retrospective cohort study was to assess whether the presence and severity of perineal trauma (defined as a spontaneous tear or an episiotomy with and without extension) at the first delivery are related to the risk of spontaneous perineal tears of the second degree or more in women who subsequently deliver vaginally without an episiotomy.
Methods
We included women who gave birth to their first and second baby at Saint-Sacrement Hospital of the Centre hospitalier affilié universitaire de Québec, Canada, between January 1, 1985, and December 31, 1994. Those who had a cesarean delivery, a multiple birth, or a stillbirth at either the first or second delivery were excluded, as were those who had an episiotomy at the second delivery.
We abstracted the data from the computerized database maintained by the Department of Obstetrics since 1985. The delivery physician routinely recorded data about labor and delivery on a standard form after the delivery. The attending physician indicated on the standard form whether the woman had no tear, a first-degree tear (limited to the fourchette, the perineal skin, and the vaginal mucosa), a second-degree tear (extending to perineal muscles but sparing the anal sphincter), a third-degree tear (involving muscles of the central nucleus and the anal sphincter with anal mucosa remaining intact), a fourth-degree tear (complete rupture of the anal sphincter through the mucosa),28 or an episiotomy (with or without a third- or fourth-degree extension). Except for some first-degree tears, all other trauma required a surgical repair. The decision to cut an episiotomy was left to the discretion of the physician. The standard form also included information on factors potentially related to perineal tears, such as maternal age, epidural use, use of forceps or vacuum, shoulder dystocia, fetal presentation, gestational age (based on last menstrual period or on ultrasound dating, if the 2 estimations differed by more than 10 days), birth weight, head circumference of the newborn, and the training (obstetrics and gynecology or general or family medicine) and identity of the birth attendant. Data from these forms were computerized periodically by one obstetrician (J.J.P.). Incomplete or inconsistent data were checked using the medical records.
Women who gave birth with an intact perineum or a first-degree tear at the first delivery made up the “unexposed” group. Those who experienced a perineal trauma (an episiotomy or a spontaneous perineal tear of the second degree or higher) at the first delivery were considered “exposed.” Exposure was also categorized according to the severity of the trauma: second-degree spontaneous tear, episiotomy without extension, third- or fourth-degree spontaneous tear, and third- or fourth-degree extension of an episiotomy. All episiotomies were median. The dependent variable was defined as the presence of a second-, third- or fourth-degree spontaneous tear at the second delivery (indicated yes or no).
The association between perineal tears at the second delivery and a history of perineal trauma was primarily measured by the relative risk (RR). The precision of the estimate was given by the 95% confidence interval (95% CI). We also performed unconditional logistic regression to evaluate the influence of potential confounders on the association.29 Unadjusted and adjusted odds ratios (ORs) were obtained. Because the outcome was relatively frequent, the OR overestimated the RR. Trends in proportions were tested with the χ² test for trend.30 Comparisons of proportions were based on Pearson χ² tests.
Results
Of the 3769 women who had their first and second deliveries in the same hospital during the study period, we excluded 579 (15.4%) who had a cesarean delivery, 66 who had a multiple pregnancy, and 31 who had a stillbirth. We also excluded 1198 women who underwent an episiotomy at their second delivery. Among the 1895 secondiparous women left in the analysis, 462 (24.4%) had an intact perineum or a first-degree tear at the first delivery, 333 (17.6%) had a spontaneous second-degree (302) or third- or fourth-degree tear (31); and 1100 (58.0%) had an episiotomy without (911) or with (189) a third- or fourth-degree extension. At the second delivery 1196 (63.1%) delivered with an intact perineum or had a first-degree tear, while 699 (36.9%) had a tear of the second (686) or third or fourth degree (13).
Risk factors for spontaneous perineal tears at the second delivery are presented in Table 1. The unadjusted risk increased with maternal age, gestational age at delivery, birth weight, and fetal head circumference (all tests for trend, P≤.003). The risk was also higher in nonvertex than vertex presentations and in vacuum- or forceps-assisted deliveries than in spontaneous vaginal deliveries. Epidural analgesia, shoulder dystocia, and the training, experience (years since graduation), and identity (data not shown) of the birth attendant were not related to the risk of spontaneous tears in secondiparous women.
Among women who previously gave birth with an intact perineum or a first-degree laceration, 13.4% had a spontaneous tear of the second degree or more at the next delivery (Table 2). The risk did not differ whether women had a history of intact perineum (13.0%) or of first-degree tear (14.1% [χ²=0.1, P=.7]). In contrast, among women previously exposed to perineal trauma (episiotomy; second-, third-, or fourth-degree tear) 44.5% underwent a spontaneous tear of the second degree or higher at the subsequent delivery. Thus, the overall risk of spontaneous tears (second, third, or fourth degree) was 3.3 times higher in women with a history of perineal trauma at the first delivery than in those without (RR=3.3; 95% CI, 2.6-4.2). The risk of perineal tears at the second delivery increased with the severity of the trauma at the first delivery (test for trend, P<.001). The risk was higher in women with a previous episiotomy without extension (44.8%) than in women with a spontaneous second-degree laceration (36.1%). Severe lacerations at the first delivery yielded the highest risk (54.5%), but the risk was similar whether previous severe lacerations were spontaneous (54.8%) or secondary to an extension of episiotomy (54.5%). Among the 220 women who suffered a third- or fourth-degree perineal tear at their first birth, 2 (0.9%) were delivered with such a tear at their second birth, while among the 1675 who did not suffer a third- or fourth-degree perineal tear at their first delivery, 11 (0.7%) were delivered with such trauma (RR=1.4; 95% CI, 0.3-6.3).
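The relative risk for a repeat severe tear can be recomputed from the counts reported above (2 of 220 versus 11 of 1675). The following is a minimal sketch in Python, assuming a Wald interval on the log scale. The article does not state which interval method was used, so the bounds may differ slightly from the published 0.3-6.3. The function name `relative_risk` is hypothetical and chosen only for illustration.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return (RR, lower, upper) with a Wald interval computed on the log scale.

    Assumption: the interval method is illustrative; the article does not
    say how its confidence intervals were calculated.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts taken from the text: severe tear at the second delivery among women
# with (2/220) and without (11/1675) a severe tear at the first delivery.
rr, lower, upper = relative_risk(2, 220, 11, 1675)
print(f"RR = {rr:.1f} (95% CI, {lower:.1f}-{upper:.1f})")
```

With only 13 events in total, the interval is wide and includes 1, which is consistent with the article's statement that this association was not statistically significant.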
To verify that the risk factors shown in Table 1 did not confound the associations, we carried out an unconditional logistic regression analysis with simultaneous adjustment for maternal age, birth weight, length of gestation, head circumference, fetal presentation, and mode of delivery. This analysis yielded an adjusted OR of 5.3 (95% CI, 3.9-7.1) for the relation of a history of perineal trauma at the first delivery and the risk of perineal tears (second-, third-, or fourth-degree tears) at the second delivery. As the adjusted OR is similar to the unadjusted OR (5.2; 95% CI, 3.9-6.9), this indicates that the risk factors entered into the model did not confound the association. When the severity of perineal trauma at the first delivery was categorized as in Table 2, the regression analysis again suggested there was no confounding (data not shown).
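The gap between the OR (5.2) and the RR (3.3) follows from how common the outcome was. As a rough check, the sketch below derives both measures from the two risks reported in the Results (13.4% in unexposed women and 44.5% in exposed women). It is an illustration only, not the authors' analysis, and the function names are hypothetical.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of the two risks (probabilities of the outcome)."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the two odds, where odds = p / (1 - p)."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Risks of a second-degree or higher spontaneous tear at the second delivery,
# as reported in the text (rounded percentages, so results are approximate).
p_exposed = 0.445    # history of perineal trauma at the first delivery
p_unexposed = 0.134  # intact perineum or first-degree tear at the first delivery

print(f"RR = {risk_ratio(p_exposed, p_unexposed):.1f}")
print(f"OR = {odds_ratio(p_exposed, p_unexposed):.1f}")
```

The OR approximates the RR only when the outcome is rare in both groups. Here nearly half of the exposed women had the outcome, so the odds in that group (about 0.8) are far larger than the risk (0.445), which inflates the OR.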
To estimate the influence of the exclusion of the 1198 women who had an episiotomy at their second delivery, we reanalyzed our data to include these women. At their first delivery: 65 of them had no tear or a first-degree tear; 70 had a second-degree spontaneous tear; 13 had a spontaneous third- or fourth-degree tear; 789 had an episiotomy without extension; and 261 had a third- or fourth-degree extension of an episiotomy. The risk of trauma at the second delivery in the 3093 women was 24.1% for those without a history of perineal trauma and 69% for those who had (RR=2.9; 95% CI, 2.5-3.3). Also, spontaneous tears at the second delivery were 2.1 times (95% CI, 1.7-2.7) more frequent in women who had a previous perineal trauma.
Discussion
Our results indicate that the risk of spontaneous perineal lacerations (second-, third- or fourth-degree tears) at the second delivery increases with the presence and severity of perineal trauma at the previous delivery. To our knowledge this study is the first to demonstrate that association.
An increased risk of severe perineal lacerations (third- and fourth-degree tears) has been reported in women who sustained such lacerations at their previous delivery in studies by Payne and colleagues (unadjusted OR=3.4; 95% CI, 1.8-6.4)26 and by Peleg and coworkers (OR=2.5; 95% CI, 1.8-3.4).27 In these studies, many women gave birth with a median episiotomy, a known risk factor for severe perineal tears. However, the association persisted after adjustment for episiotomy26 or in the subset of spontaneous births (RR=6.5; 95% CI, 2.0-21.2).27 In our study, a similar trend was observed but was not statistically significant.
Our results support the view that the prevention of perineal trauma in first deliveries could benefit women in subsequent deliveries. Prenatal perineal massage constitutes a simple and valuable approach for doing so.31 Recent randomized controlled trials indicate that perineal massage during pregnancy increases the likelihood for primiparous women of delivering with an intact perineum.32,33 Avoiding episiotomy, in addition to increasing the rate of intact perineum, reduces the severity of perineal trauma. In a previous study10 we reported a 3-fold increase of third- and fourth-degree perineal tears associated with median episiotomy in primiparous women. In that study, while the episiotomy rate declined from 77.7% in 1985-1987 to 56.2% in 1991-1993, the rate of severe perineal lacerations fell from 17.2% to 12.6% during the same period. Finally, restricting forceps birth also enhances perineal integrity.31
Limitations
We studied a large cohort of women who delivered twice at the same hospital. The exclusion of women who had their second delivery in a different hospital raises the possibility of a selection bias. It was not possible to estimate the number of these exclusions. This phenomenon, however, is not frequent, and we see no reason that women who had a history of perineal trauma and changed hospitals would be more or less likely to have a perineal tear at their second delivery than women included in the analysis.
Our study was restricted to women who did not have an episiotomy at the second delivery. We did this because we were interested in estimating the likelihood of a spontaneous tear, which cannot be determined in women undergoing an episiotomy. The decision to undertake an episiotomy was at the discretion of the physician, and policies regarding the use of episiotomy varied between physicians as well as over the study period. Some physicians may have been more likely to undertake an episiotomy if they noticed the presence of a perineal scar. If this were the case, the exclusion of women who underwent an episiotomy at the second delivery could have resulted in an underestimation of the strength of the association between perineal trauma at the first delivery and spontaneous tears at the second delivery. However, if the reasons women had an episiotomy at their second delivery were independent of the state of the perineum at the first delivery, the influence on the association is unpredictable and would depend on the underlying, unknown risk of spontaneous tear in women with and without episiotomy. Reanalyzing our data to include the women who had an episiotomy at their second delivery showed that the strength of the association decreases, but a history of perineal trauma remains a clinically and statistically significant risk factor for such trauma at the second delivery.
Our analysis took into account most of the variables known to be related to the risk of perineal trauma. However, we did not have information on the duration of the second stage, the use of oxytocin in the second stage of labor, or the delivery position. These variables would confound our results if they were both related to a history of perineal trauma and independent risk factors for perineal tears in subsequent deliveries. Confounding by these variables cannot be ruled out but appears unlikely, since stronger risk factors such as excessive birth weight and shoulder dystocia did not introduce any confounding in our data. Another possible explanation for the observed association is that some perinea might be inherently more prone to tearing than others, possibly because of genetic factors.
Conclusions
Our study shows that the risk of spontaneous perineal tears at the second delivery increases with the presence and the severity of perineal trauma at the first delivery. These results support arguments for the prevention of perineal trauma at the first delivery and the selective use of episiotomy.
Acknowledgements
Dr Marcoux holds a National Health Research Scholarship from Health Canada.
Related resources
FOR PATIENTS:
- ParentsPlace.com - Perineal Massage: Your How-to Guide http://www.parentsplace.com/pregnancy/labor/qa/0,3105,13778,00.html
- Childbirth.org - Perineal Massage http://www.childbirth.org/articles/massage.html
FOR FAMILY PHYSICIANS:
- Obstetric Myths Versus Research Realities http://www.efn.org/~djz/birth/obmyth/epis.html
- Woolley RJ. Benefits and risks of episiotomy: A review of the English-language literature since 1980 http://www.gentlebirth.org/format/woolley.html (the best one on episiotomy)
- Carroli G, Belizan J. Episiotomy for vaginal birth (Cochrane Review) http://www.update-software.com/abstracts/ab000081.htm (Only the abstract is available free.)
DESIGN: Retrospective cohort study.
POPULATION: We included data from 1895 women who had their first and second deliveries at Saint-Sacrement Hospital, Quebec City, Canada, between 1985 and 1994. Our study was restricted to women who gave birth vaginally to a single living neonate at their first 2 deliveries and who did not have an episiotomy at the second delivery. We extracted the data from the Department of Obstetrics computerized database.
OUTCOMES MEASURED: Spontaneous perineal tears (of second degree or higher) at the second delivery.
RESULTS: Having a perineal trauma at the first delivery more than tripled the risk (relative risk=3.3; 95% confidence interval, 2.6-4.2) of spontaneous perineal tears at the second delivery. The risk of spontaneous perineal tears at the second delivery increased with the severity of previous perineal trauma at birth.
CONCLUSIONS: Our results show that the risk of spontaneous perineal tears at subsequent deliveries increases with the presence and the severity of perineal trauma at the first delivery.
1. Klein MC, Gauthier RJ, Robbins JM, et al. Relationship of episiotomy to perineal trauma and morbidity, sexual dysfunction, and pelvic floor relaxation. Am J Obstet Gynecol 1994;171:591-98.
2. Weijmar Schultz WCM, van de Wiel HBM, Heidemann R, Aarnoudse JG, Huisjes HJ. Perineal pain and dyspareunia after uncomplicated primiparous delivery. J Psychosom Obstet Gynecol 1990;11:119-27.
3. Harrison RF, Brennan M, North PM, Reed JV, Wickham EA. Is routine episiotomy necessary? Br Med J 1984;288:1971-75.
4. Larsson PG, Platz-Christensen JJ, Bergman B, Wallstersson G. Advantage or disadvantage of episiotomy compared with spontaneous perineal laceration. Gynecol Obstet Invest 1991;31:213-16.
5. Thacker SB, Banta HD. Benefits and risks of episiotomy: an interpretive review of the English language literature, 1860-1980. Obstet Gynecol Surv 1983;38:322-38.
6. Hordnes K, Bergsjo P. Severe lacerations after childbirth. Acta Obstet Gynecol Scand 1993;72:413-22.
7. Woolley RJ. Benefits and risks of episiotomy: a review of the English-language literature since 1980. Part II. Obstet Gynecol Surv 1995;50:821-35.
8. Rockner G, Olund A. The use of episiotomy in primiparas in Sweden: a descriptive study with particular focus on two hospitals. Acta Obstet Gynecol Scand 1991;70:325-30.
9. Henriksen TB, Moller Bek K, Hedegaard M, Secher NJ. Methods and consequences of changes in use of episiotomy. BMJ 1994;309:1255-58.
10. Labrecque M, Baillargeon L, Dallaire M, Tremblay A, Pinault JJ, Gingras S. Association between median episiotomy and severe perineal lacerations in primiparous women. Can Med Assoc J 1997;156:797-802.
11. Hueston WJ, Rudy M. Differences in labor and delivery experience in family physician- and obstetrician-supervised teaching services. Fam Med 1995;27:182-87.
12. Ruderman J, Carroll JC, Reid AJ, Murray MA. Episiotomy: differences in practice between family physicians and obstetricians. Can Fam Phys 1992;38:2583-89.
13. Lede RL, Belizan JM, Carroli G. Is routine use of episiotomy justified? Am J Obstet Gynecol 1996;174:1399-402.
14. Woolley RJ. Benefits and risks of episiotomy: a review of the English-language literature since 1980. Part I. Obstet Gynecol Surv 1995;50:806-20.
15. Gass MS, Dunn C, Stys SJ. Effect of episiotomy on the frequency of vaginal outlet lacerations. J Reprod Med 1986;31:240-44.
16. Klein MC, Gauthier RJ, Jorgensen SH, et al. Does episiotomy prevent perineal trauma and pelvic floor relaxation? Online J Curr Clin Trials 1992;2(doc no.10).
17. Walker MPR, Farine D, Rolbin SH, Ritchie JWK. Epidural anesthesia, episiotomy, and obstetric laceration. Obstet Gynecol 1991;77:668-71.
18. Wilcox LS, Strobino DM, Baruffi G, Dellinger W. Episiotomy and its role in the incidence of perineal lacerations in a maternity center and a tertiary hospital obstetric service. Am J Obstet Gynecol 1989;160:1047-52.
19. Borgatta L, Piening SL, Cohen WR. Association of episiotomy and delivery position with deep perineal laceration during spontaneous delivery in nulliparous women. Am J Obstet Gynecol 1989;160:294-97.
20. Green JR, Soohoo SL. Factors associated with rectal injury in spontaneous deliveries. Obstet Gynecol 1989;73:732-38.
21. Moller Bek K, Laurberg S. Intervention during labor: risk factors associated with complete tear of the anal sphincter. Acta Obstet Gynecol Scand 1992;71:520-24.
22. Donnelly V, Fynes M, Campbell D, Johnson H, O’Connell PR, O’Herlihy C. Obstetric events leading to anal sphincter damage. Obstet Gynecol 1998;92:955-61.
23. Shiono P, Klebanoff MA, Carey JC. Midline episiotomies: more harm than good? Obstet Gynecol 1990;75:765-70.
24. Combs CA, Robertson PA, Laros RK. Risk factors for third-degree and fourth-degree lacerations in forceps and vacuum deliveries. Am J Obstet Gynecol 1990;163:100-04.
25. Sleep J, Grant A. West Berkshire perineal management trial: three year follow up. BMJ 1987;295:749-51.
26. Payne TN, Carey JC, Rayburn WF. Prior third- or fourth-degree perineal tears and recurrence risks. Int J Gynaecol Obstet 1999;64:55-57.
27. Peleg D, Kennedy CM, Merrill D, Zlatnik FJ. Risk of repetition of a severe perineal laceration. Obstet Gynecol 1999;93:1021-24.
28. Pritchard JA, Macdonald PC. Williams Obstetrics. 15th ed. New York, NY: Appleton-Century-Crofts 1989;345-50.
29. Hosmer DW, Lemeshow S. Applied logistic regression. Toronto, Canada: John Wiley and Sons; 1989.
30. Mantel N. Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. J Am Stat Assoc 1963;58:690-700.
31. Eason E, Labrecque M, Wells G, Feldman P. Preventing perineal trauma during childbirth: a systematic review. Obstet Gynecol 2000;95:464-71.
32. Shipman M, Boniface D, Tefft M, McCloghty F. Antenatal perineal massage and subsequent perineal outcomes: a randomised controlled trial. Br J Obstet Gynaecol 1997;104:787-91.
33. Labrecque M, Eason E, Marcoux S, et al. Randomized controlled trial of prevention of perineal trauma by perineal massage during pregnancy. Am J Obstet Gynecol 1999;180:593-600.
Physician Behaviors that Predict Patient Trust
STUDY DESIGN AND POPULATION: Patients (N=414) enrolled from 20 community-based family practices rated 18 physician behaviors and completed the Trust in Physician Scale immediately after their visits. Trust was also measured at 1 and 6 months after the visit. The association between physician behaviors and trust was examined in regard to patient sex, age, and length of relationship with the physician.
RESULTS: All behaviors were significantly associated with trust (P <.0001), with Pearson correlation coefficients (r) ranging from 0.46 to 0.64. Being comforting and caring, demonstrating competency, encouraging and answering questions, and explaining were associated with trust among all groups. However, referring to a specialist if needed was strongly associated with trust only among women (r=0.61), more established patients (r=0.62), and younger patients (r=0.63). The behaviors least important for trust were gentleness during the examination, discussing options/asking opinions, looking in the eye, and treating as an equal.
CONCLUSIONS: Caring and comfort, technical competency, and communication are the physician behaviors most strongly associated with patient trust. Further research is needed to test the hypothesis that changes in identified physician behaviors can lead to changes in the level of patient trust.
The physician-patient relationship is recognized as having an essential role in the process of medical care, providing the context in which caring and healing can occur.1-3 Patient trust in the physician has been proposed as a key feature of this relationship.1,4-6 There are several potential benefits to patient trust, including increased satisfaction, adherence to treatment, and continuity of care.6,7 Trust may also be associated with lower transaction costs,8 such as those incurred by a need to reassure patients (eg, ordering additional tests and referrals) or by inefficiencies due to incomplete disclosure of information by the patient.
Despite the apparent importance of patient trust, relatively little is known about what physician behaviors are most strongly associated with it. A previous study,1 using patient focus groups, identified 7 categories of physician behaviors that increased patients’ trust: thoroughly evaluating problems, indicating an understanding of the patient’s experience, expressing care for the patient, providing appropriate and effective treatment, communicating clearly and completely, building partnership, and demonstrating honesty and respect. The qualitative nature of the focus group data does not allow for the assessment of the relative importance of specific types of physician behaviors in predicting subsequent patient trust. Ascertaining the association between physician behaviors and patient trust is important both on a theoretical level, for what it may reveal about the nature of patient trust, and on a practical level, for guiding interventions to improve trust through physician education and training.
The goal for our study was to assess the relative importance of physician behaviors on patient trust immediately following the visit, after 1 month, and after 6 months. The behaviors chosen for measurement had been previously identified as promoting trust in patients in focus groups.1 The measurement of trust 3 times made it possible to ascertain if the physician behaviors most associated with trust immediately following a visit are those most associated with future trust. Also, the relative importance of physician behaviors for trust was explored in 3 patient subgroups: men and women patients, younger and older patients, and newer and more established patients.
Methods
Study Design and Subject Recruitment
This was a 6-month prospective study. Consecutive eligible patients were enrolled from the practices of 20 family physicians recruited by mail from a single geographic area based on their interest in practice-based research and physician-patient communication.6,9 The patients were recruited by a research assistant who approached them in the waiting room after they had checked in and before they were brought to an examination room. Patients younger than 18 years, those unable to complete the questionnaire, and those in acute distress were excluded. In addition, patients with no previous visits to the study physician or who did not identify the study physician as their primary care physician were excluded. All patients signed an informed consent form at the time of enrollment.
Measures
Each physician provided demographic and practice characteristic data. Measures obtained from patients in the waiting room or examination room at the time of their enrollment (the previsit questionnaire) included: demographics, length of relationship with physician, number and type of chronic medical conditions, and health status (measured by the Medical Outcomes Study Short Form-36).10 Following the office visit, patients completed a postvisit questionnaire concerning the physician’s interpersonal behavior during the visit, their trust in the physician, and their satisfaction with the visit (measured by a subset of 13 questions from the Consumer Satisfaction Survey).6,11 Approximately 80% of the patients completed this form in the waiting room after the visit; the remaining 20% completed it within 24 hours and returned it by mail. The 18 items pertaining to physician interpersonal behavior (Table W1*) were chosen to assess the physician behaviors identified in the previous patient focus groups as affecting patient trust.1 Fourteen items were taken from the 23-item version of the Humanistic Behaviors Questionnaire developed by the American Board of Internal Medicine.12 This questionnaire was chosen because it had items pertaining to most of the behaviors identified in the focus groups as affecting patient trust, including receptive and expressive communication (listening and explaining), treating patients with warmth and respect, gentleness, honesty, partnership, and willingness to refer to a specialist. Four items were added to assess additional behaviors identified from the focus groups: finding out all the reasons for the visit, respecting opinions and feelings, caring and concern, and demonstrating competency to diagnose and treat. Patients rated physician performance of each behavior item on a 5-point Likert-type scale, from poor to excellent. The questionnaire was piloted for clarity and acceptability.
Patient trust was measured using a slightly modified version of the 11-item Trust in Physician Scale developed by Anderson and Dedrick,13 as previously described.6 At the time of the study, the Trust in Physician Scale was the only published measure of patient trust. One item, “My doctor is a real expert in taking care of medical problems like mine,” was modified to read “My doctor is well qualified to manage (diagnose and treat or make an appropriate referral) medical problems like mine,” to be appropriate for the primary care setting. On the basis of a pilot study of the scale, response labels were changed from (1=strongly disagree; 2=disagree; 3=uncertain; 4=agree; 5=strongly agree) to (1=totally disagree; 2=disagree; 3=neutral; 4=agree; 5=totally agree).6 The scale was scored by transforming the mean response score (calculated after reverse coding the negative items) to a 0 to 100 scale. Patient trust was assessed again at 1 and 6 months after the enrollment visit by mail survey.
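The scoring rule described above can be illustrated with a short Python sketch. It is not the authors' scoring program: the set of negatively worded items (`negative_items`) is a hypothetical placeholder, because the text does not list them, and the linear rescaling of the 1-to-5 mean onto 0 to 100 is an assumed reading of “transforming the mean response score.”

```python
def trust_score(responses, negative_items, low=1, high=5):
    """Return a 0-100 trust score for one patient.

    responses: dict mapping item number -> rating on the 1..5 agreement scale.
    negative_items: item numbers that are negatively worded (hypothetical;
        the article does not say which items these are).
    """
    coded = []
    for item, rating in responses.items():
        if not low <= rating <= high:
            raise ValueError(f"item {item}: rating {rating} is outside {low}..{high}")
        # Reverse code negative items so that a higher value always means more trust.
        coded.append(low + high - rating if item in negative_items else rating)

    mean = sum(coded) / len(coded)
    # Assumed linear rescaling: a mean of 1 maps to 0 and a mean of 5 maps to 100.
    return (mean - low) / (high - low) * 100
```

Under these assumptions, a patient who answers “neutral” (3) on every item scores 50, and total disagreement with a negatively worded item counts the same as total agreement with a positively worded one.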
Statistical Analysis
The association between specific physician behaviors and level of trust was assessed using Pearson correlation coefficients. Behaviors were ranked on the basis of the relative strength of the correlations.
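As an illustration of this analysis, the sketch below computes a Pearson correlation coefficient for each behavior against the trust score and then ranks the behaviors by the size of the coefficient. The function names and the input layout (one list of ratings per behavior, aligned patient by patient with the trust scores) are assumptions made for the example, not details taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def rank_behaviors(ratings_by_behavior, trust_scores):
    """Return (behavior, r) pairs ordered from strongest to weakest correlation.

    ratings_by_behavior: dict mapping a behavior label to the list of patient
        ratings (1..5) for that behavior (hypothetical layout).
    trust_scores: list of 0-100 trust scores in the same patient order.
    """
    correlations = {
        name: pearson_r(ratings, trust_scores)
        for name, ratings in ratings_by_behavior.items()
    }
    return sorted(correlations.items(), key=lambda pair: pair[1], reverse=True)
```

Calling `rank_behaviors` once with the postvisit trust scores and again with the 1-month and 6-month scores would give three rankings that can be compared over time, which is the kind of comparison reported in the Results.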
Results
Of 803 consecutive patients, 561 were eligible for enrollment. Of those, 414 (74%) enrolled and completed the previsit and postvisit questionnaires at the time of the index visit, 52 (9%) refused, 15 (3%) saw the physician and left before being approached by the research assistant, and 74 (13%) enrolled but failed to complete both questionnaires.
Among the 414 enrolled patients, 334 (81%) completed the 1-month and 343 (83%) completed the 6-month follow-up questionnaires. Patients who did not complete the 1-month and 6-month questionnaires were compared with those who did with respect to age, length of the relationship with the physician, sex, race, education, and self-reported health status. Those who did not complete the questionnaires at 1 month were virtually the same as those who did with respect to these characteristics. Those who did not complete the questionnaires at 6 months were slightly younger (45 vs 48 years), had been seeing their physician for a slightly shorter time at study enrollment (mean = 37 months vs 43 months), were more likely to be men (45% vs 36%), and were more likely to be nonwhite (41% vs 31%) than those who did complete the questionnaire, but they were virtually identical with respect to education and health status. None of these differences reached statistical significance.
The average age of the physicians was 47 years (range=34-73 years), with an average of 16 years in practice (range=8-44 years). Physicians were predominately men (85%) and white (70%), and most were in group practice (70%). Patients also had a mean age of 47 years and were predominately women (62%). Approximately two thirds (68%) of the patients were white, and 81% had graduated from high school. More than half (55%) reported at least one chronic medical condition, and almost half (45%) reported their health as being less than very good.
The correlation between physician behaviors during the visit, as rated by the patient, and trust immediately following the visit ranged from 0.46 to 0.64 (Table 1). The 5 behaviors that were most strongly associated with trust immediately after the visit were: (1) being comforting and caring, (2) demonstrating competency, (3) encouraging and answering questions, (4) explaining what they were doing, and (5) referring to a specialist if needed. Behaviors least important for trust were: (1) gentleness during examination, (2) discussing options/asking opinions, (3) making eye contact, and (4) treating as an equal. Correlations between specific behaviors and trust decreased over time, with a range of 0.38 to 0.58 at 1 month and 0.27 to 0.46 at 6 months after the initial visit. The pattern of the strength of associations between trust and specific behaviors remained essentially stable over time, with the exceptions of “being available when needed” and “working to adjust treatment.” Being available when needed was one of the behaviors least associated with trust at the index visit (ranked 16th out of 18) but moved up to be ranked 12th at 1 month and 6th at 6 months. Working to adjust treatment was also less important at the initial visit (ranked 14th out of 18), compared with 1 month (ranked 7th) and 6 months (ranked 8th).
The associations between specific behaviors and trust at the time of the enrollment visit were examined for the following subgroups of patients: men versus women, aged younger than 45 years versus 45 years and older, and length of relationship 2 years or less versus more than 2 years (Table W2*). These subgroups were selected a priori for exploration and generation of hypotheses but without any particular hypothesis regarding the pattern of associations between behaviors and trust within each subgroup. As shown in Table 2, being comforting and caring, demonstrating competency in diagnosis and treatment, and expressive communication (encouraging and answering questions, explaining, and checking understanding) were among the behaviors most strongly associated with trust for all groups. Letting the whole story be told or finding out all the reasons for the visits (2 receptive communication behaviors) were strongly associated with trust in most of the groups. Referring to a specialist if needed was one of the behaviors most strongly associated with trust among women, younger patients, and established patients. Respecting feelings and opinions was among the most strongly associated behaviors only for younger patients, and checking progress was among the most strongly associated behaviors only for women.
Identical analyses were performed examining the association between specific physician behaviors and patient satisfaction following the enrollment visit. All physician behaviors were more highly correlated with patient satisfaction than with patient trust, ranging from 0.59 to 0.75 for satisfaction. In general, the pattern of correlation between behaviors and satisfaction was very similar to the pattern for trust. For the total sample, the 4 behaviors most strongly associated with trust were among the 5 behaviors most strongly associated with satisfaction. As with trust, there was relatively little variation in the associations with behaviors by patient subgroups.
Discussion
The strength of the association between key physician behaviors during the office visit and subsequent patient trust in the physician was assessed. The behaviors assessed were previously identified as affecting patient trust in a study using patient focus groups.1 This study found that the behaviors assessed were predictive of patient trust up to 6 months after the initial visit, though the strength of the association decreased over time. There were relatively modest differences in the strength of the associations between behaviors and trust among the patient subgroups examined. Being comforting and caring, demonstrating competency, and explaining and listening were most strongly associated with trust in all, or virtually all, the subgroups. For women, referral to a specialist if needed and checking progress were also strongly associated with trust. The relative importance of referrals among women may reflect a concern for seeing a specialist for reproductive-related conditions. Referral was also more strongly associated with trust among more established patients, perhaps because these patients were more likely to have experienced a need for referral from their current physician at some time. For younger patients, willingness to refer and respect for feelings, opinions, and self-knowledge were among the most important behaviors, possibly reflecting differences in expectations for physician behaviors among younger versus older patients.
Interestingly, treating the patient as being on the same level and asking the patient’s opinion, while significantly associated with trust, were among the physician behaviors least associated with trust. This finding does not mean that equality and partnership are unimportant. The degree to which patients want to be involved in making decisions about their care varies,14 and patients may choose to stay with physicians whose practice style fits their preferences for involvement in their care. Bedside manners, such as gentleness during the examination, greeting warmly, and making eye contact, while significantly associated with trust, were also among the least strongly associated. These behaviors, while desirable, may be less important to establishing trust.
It was also found that to a large extent the same physician behaviors most associated with trust are also most associated with satisfaction, though the associations are stronger with satisfaction. A previous paper has reported data indicating that patient trust is somewhat separate from satisfaction, predicting continuity and self-reported adherence to treatment independently of satisfaction.6 One possible interpretation of this finding is that physician behaviors that lead to satisfaction in a single visit also help build trust, but trust is more dependent on factors in addition to physician behaviors during the office visit. No previous studies could be located that reported on the association of physician behaviors with patient trust. However, studies of physician behaviors and patient satisfaction have found that interpersonal competence (similar to comforting and caring in this study), communication, and technical competency were all significantly associated with satisfaction, a result confirmed in our study.15-17
Limitations
Patients’ ratings of physician behaviors may reflect their overall positive feelings toward the physician. Thus it is not possible to conclude that differences in the specific physician behaviors cause differences in trust. However, identifying the behaviors most strongly associated with trust may help to focus future intervention studies on these behaviors.
Conclusions
The results suggest that caring and comfort are as important as technical competency in predicting patient trust. Expressive and receptive communication skills, which have been shown to be strongly related to patient satisfaction, are also important predictors of trust. Although the relative importance of a few other behaviors differed between subgroups, these differences were relatively modest, suggesting that the listed behaviors are of general importance to patient trust. Further work is needed to test the hypothesis that changes in identified physician behaviors can modify levels of patient trust.
Acknowledgements
This study was supported in part by grants from the Picker/Commonwealth Fund (#94-130) and the Bayer Institute for Health Care Communication (#94-181). The author thanks Barbara Elspas, MPH, for her fine work as study coordinator. The participating Stanford Trust Study Physicians were: William G. Broad, MD, (Palo Alto, Calif); Lawrence J. Bruguera, MD, (Half Moon Bay, Calif); David R. Ehrenberger, MD, (Mountain View, Calif); Larry A. Freeman, MD, (Palo Alto, Calif); Robert J. Fuss, MD, (Milpitas, Calif); H. Wallace Greig, MD, (San Jose, Calif); Mary P. Hufty, MD, (Palo Alto, Calif); Carlos F. Inocencio, MD, (Los Altos, Calif); Steven R. Lane, MD, MPH, (Palo Alto, Calif); Jas P. Lockhart, MD, (Menlo Park, Calif); Jeffrey S. McClanahan, MD, (Cupertino, Calif); Catherine A. Owen, MD, (Half Moon Bay, Calif); William E. Page, MD, (Palo Alto, Calif); Kuljeet S. Rai, MD, (San Jose, Calif); Daljeet S. Rai, MD, (San Jose, Calif); Paulita R. Ramos, MD, (San Jose, Calif); William E. Straw, MD, (Los Altos, Calif); William S. Warshal, MD, (Campbell, Calif); Roger W. Washington, MD, (Mountain View, Calif); and Andrew W. White, MD, (Menlo Park, Calif).
Related resources
- Picker Institute: non-profit organization that offers products and services aimed at improving health care “through the eyes of the patient.” http://www.picker.org
- American Academy on Physician and Patient: professional society dedicated to research, education, and professional standards in doctor-patient communication. www.physicianpatient.org
- The Program in Communication and Medicine at Northwestern University www.pcm.northwestern.edu
1. Thom DH, Campbell B. Patient-physician trust: an exploratory study. J Fam Pract 1997;44:169-76.
2. Peabody FW. The care of the patient. J Am Med Assoc 1927;88:877-82.
3. Brody H. Relationship-centered care: beyond finishing school. J Am Board Fam Pract 1995;8:416-18.
4. Leopold N, Cooper M, Clancy C. Sustained partnership in primary care. J Fam Pract 1996;42:129-37.
5. Mechanic D, Schlesinger M. The impact of managed care on patients’ trust in medical care and their physicians. J Am Med Assoc 1996;275:1693-97.
6. Thom DH, Ribisl KM, Stewart AL, Luke DA. Validation of a measure of patients’ trust in their physician: the Trust in Physician Scale. Med Care 1999;37:510-17.
7. Safran DG, Taira DA, Rogers WH, Kosinski M, Ware JE, Tarlov AR. Linking primary care performance to outcomes of care. J Fam Pract 1998;47:213-20.
8. Creed WED, Miles R. Trust in organizations. In: Kramer RM, Tyler TR, eds. Trust in organizations: frontiers of theory and research. Thousand Oaks, Calif: Sage Publications; 1996;26-27.
9. Thom DH. Training physicians to increase patient trust. J Eval Clin Pract 2000;6:249-55.
10. McHorney CA, Ware JE, Lu JFR, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994;32:40-66.
11. Davis AR, Ware JE. GHAA’s consumer satisfaction survey. Washington, DC: Group Health Association of America; 1991.
12. American Board of Internal Medicine. Guide to awareness and evaluation of humanistic qualities in the internist. Philadelphia, Pa: American Board of Internal Medicine; 1992.
13. Anderson LA, Dedrick RF. Development of the trust in physician scale: a measure to assess interpersonal trust in patient-physician relationships. Psych Rep 1990;67:1091-100.
14. Brody DS, Miller SM, Lerman CE, Smith DG, Caputo GC. Patient perception of involvement in medical care: relationship to illness attitudes and outcomes. J Gen Intern Med 1989;506-11.
15. Hall JA, Roter DL, Katz NR. Meta-analysis of correlates of provider behavior in medical encounters. Med Care 1988;26:657-75.
16. Lochman JE. Factors related to patients’ satisfaction with their medical care. J Comm Health 1983;9:91-109.
17. DiMatteo MR, Hays R. The significance of patients’ perceptions of physician conduct. J Comm Health 1980;6:18-34.
STUDY DESIGN AND POPULATION: Patients (N=414) enrolled from 20 community-based family practices rated 18 physician behaviors and completed the Trust in Physician Scale immediately after their visits. Trust was also measured at 1 and 6 months after the visit. The association between physician behaviors and trust was examined in regard to patient sex, age, and length of relationship with the physician.
RESULTS: All behaviors were significantly associated with trust (P <.0001), with Pearson correlation coefficients (r) ranging from 0.46 to 0.64. Being comforting and caring, demonstrating competency, encouraging and answering questions, and explaining were associated with trust among all groups. However, referring to a specialist if needed was strongly associated with trust only among women (r=0.61), more established patients (r=0.62), and younger patients (r=0.63). The behaviors least important for trust were gentleness during the examination, discussing options/asking opinions, looking in the eye, and treating as an equal.
CONCLUSIONS: Caring and comfort, technical competency, and communication are the physician behaviors most strongly associated with patient trust. Further research is needed to test the hypothesis that changes in identified physician behaviors can lead to changes in the level of patient trust.
The physician-patient relationship is recognized as having an essential role in the process of medical care, providing the context in which caring and healing can occur.1-3 Patient trust in the physician has been proposed as a key feature of this relationship.1,4-6 There are several potential benefits to patient trust, including increased satisfaction, adherence to treatment, and continuity of care.6,7 Trust may also be associated with lower transaction costs,8 such as those incurred by a need to reassure patients (eg, ordering additional tests and referrals) or by inefficiencies due to incomplete disclosure of information by the patient.
Despite the apparent importance of patient trust, relatively little is known about what physician behaviors are most strongly associated with it. A previous study,1 using patient focus groups, identified 7 categories of physician behaviors that increased patients’ trust: thoroughly evaluating problems, indicating an understanding of the patient’s experience, expressing care for the patient, providing appropriate and effective treatment, communicating clearly and completely, building partnership, and demonstrating honesty and respect. The qualitative nature of the focus group data does not allow for the assessment of the relative importance of specific types of physician behaviors in predicting subsequent patient trust. Ascertaining the association between physician behaviors and patient trust is important both on a theoretical level, for what it may reveal about the nature of patient trust, and on a practical level, for guiding interventions to improve trust through physician education and training.
The goal of our study was to assess the relative importance of physician behaviors for patient trust immediately following the visit, after 1 month, and after 6 months. The behaviors chosen for measurement had been previously identified as promoting trust in patients in focus groups.1 Measuring trust at 3 time points made it possible to ascertain whether the physician behaviors most associated with trust immediately following a visit are those most associated with future trust. Also, the relative importance of physician behaviors for trust was explored in 3 patient subgroups: men and women patients, younger and older patients, and newer and more established patients.
Methods
Study Design and Subject Recruitment
This was a 6-month prospective study. Consecutive eligible patients were enrolled from the practices of 20 family physicians recruited by mail from a single geographic area based on their interest in practice-based research and physician-patient communication.6,9 The patients were recruited by a research assistant who approached them in the waiting room after they had checked in and before they were brought to an examination room. Patients younger than 18 years, those unable to complete the questionnaire, and those in acute distress were excluded. In addition, patients with no previous visits to the study physician or who did not identify the study physician as their primary care physician were excluded. All patients signed an informed consent form at the time of enrollment.
Measures
Each physician provided demographic and practice characteristic data. Measures obtained from patients in the waiting room or examination room at the time of their enrollment (the previsit questionnaire) included: demographics, length of relationship with physician, number and type of chronic medical conditions, and health status (measured by the Medical Outcomes Study Short Form-36).10 Following the office visit, patients completed a postvisit questionnaire concerning the physician’s interpersonal behavior during the visit, their trust in the physician, and their satisfaction with the visit (measured by a subset of 13 questions from the Consumer Satisfaction Survey).6,11 Approximately 80% of the patients completed this form in the waiting room after the visit; the remaining 20% completed it within 24 hours and returned it by mail. The 18 items pertaining to physician interpersonal behavior (Table W1*) were chosen to assess the physician behaviors identified in the previous patient focus groups as affecting patient trust.1 Fourteen items were taken from the 23-item version of the Humanistic Behaviors Questionnaire developed by the American Board of Internal Medicine.12 This questionnaire was chosen because it had items pertaining to most of the behaviors identified in the focus groups as affecting patient trust, including receptive and expressive communication (listening and explaining), treating patients with warmth and respect, gentleness, honesty, partnership, and willingness to refer to a specialist. Four items were added to assess additional behaviors identified from the focus groups: finding out all the reasons for the visit, respecting opinions and feelings, caring and concern, and demonstrating competency to diagnose and treat. Patients rated physician performance of each behavior item on a 5-point Likert-type scale, from poor to excellent. The questionnaire was piloted for clarity and acceptability.
Patient trust was measured using a slightly modified version of the 11-item Trust in Physician Scale developed by Anderson and Dedrick13 as previously described.6 At the time of the study, the Trust in Physician Scale was the only published measure of patient trust. One item, “My doctor is a real expert in taking care of medical problems like mine,” was modified to read “My doctor is well qualified to manage (diagnose and treat or make an appropriate referral) medical problems like mine,” to be appropriate for the primary care setting. On the basis of a pilot study of the scale, response labels were changed from (1=strongly disagree; 2=disagree; 3=uncertain; 4=agree; 5=strongly agree) to (1=totally disagree; 2=disagree; 3=neutral; 4=agree; 5=totally agree).6 The scale was scored by transforming the mean response score (calculated after reverse coding the negative items) to a 0 to 100 scale. Patient trust was assessed again at 1 and 6 months after the enrollment visit by mail survey.
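The scoring step described above can be illustrated with a minimal Python sketch. It assumes a linear rescaling of the 1-to-5 mean onto 0 to 100, which the text does not spell out. The function name `trust_score` and the `negative_items` argument are hypothetical, and which of the 11 items are negatively worded is not stated here, so the caller must supply that.

```python
from statistics import mean


def trust_score(responses, negative_items):
    """Return a 0-100 score from one patient's 1-5 item responses.

    responses      -- list of integer responses, one per scale item
    negative_items -- set of zero-based indexes of negatively worded items
                      (hypothetical; the item wording is not given in the text)
    """
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    # Reverse code negative items so that 5 always means higher trust.
    recoded = [6 - r if i in negative_items else r
               for i, r in enumerate(responses)]
    # Assumed linear transform: a mean of 1 maps to 0, a mean of 5 maps to 100.
    return (mean(recoded) - 1) / 4 * 100
```

For example, a patient who answers "neutral" (3) on every item scores 50 under this assumed transform, whichever items are reverse coded.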
Statistical Analysis
The association between specific physician behaviors and level of trust was assessed using Pearson correlation coefficients. Behaviors were ranked on the basis of the relative strength of the correlations.
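As an illustration of this analysis, the sketch below computes a Pearson correlation coefficient between each behavior's ratings and the trust scores, then orders the behaviors from strongest to weakest association. It is not the study's actual analysis code; the function names and the behavior labels in the usage example are hypothetical, and the numbers are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_behaviors(ratings, trust):
    """Rank behaviors by the strength of their correlation with trust.

    ratings -- dict mapping a behavior label to a list of patient ratings
    trust   -- list of trust scores for the same patients, in the same order
    Returns (label, r) pairs, strongest correlation first.
    """
    correlations = {label: pearson_r(values, trust)
                    for label, values in ratings.items()}
    return sorted(correlations.items(), key=lambda pair: pair[1], reverse=True)


# Made-up ratings for 4 patients, for illustration only.
example = rank_behaviors(
    {"behavior_a": [1, 2, 3, 4], "behavior_b": [1, 3, 2, 4]},
    [10, 20, 30, 40],
)
```

Sorting on the signed coefficient is sufficient here because all reported correlations were positive; with mixed signs, ranking on the absolute value would be the natural alternative.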
Results
Of 803 consecutive patients, 561 were eligible for enrollment. Of those, 414 (74%) enrolled and completed the previsit and postvisit questionnaires at the time of the index visit, 52 (9%) refused, 15 (3%) saw the physician and left before being approached by the research assistant, and 74 (13%) enrolled but failed to complete both questionnaires.
Among the 414 enrolled patients, 334 (81%) completed the 1-month and 343 (83%) completed the 6-month follow-up questionnaires. Patients who did not complete the 1-month and 6-month questionnaires were compared with those who did with respect to age, length of the relationship with the physician, sex, race, education, and self-reported health status. Those who did complete the questionnaires at 1 month were virtually the same as those who did not with respect to these characteristics. Those who did not complete the questionnaires at 6 months were slightly younger (45 vs 48 years), had been seeing their physician for a slightly shorter time at study enrollment (mean = 37 months vs 43 months), were more likely to be men (45% vs 36%), and were more likely to be nonwhite (41% vs 31%) than those who did complete the questionnaire, but they were virtually identical with respect to education and health status. None of these differences reached statistical significance.
The average age of the physicians was 47 years (range=34-73 years), with an average of 16 years in practice (range=8-44 years). Physicians were predominately men (85%) and white (70%), and most were in group practice (70%). Patients also had a mean age of 47 years and were predominately women (62%). Approximately two thirds (68%) of the patients were white, and 81% had graduated from high school. More than half (55%) reported at least one chronic medical condition, and almost half (45%) reported their health as being less than very good.
The correlations between physician behaviors during the visit, as rated by the patient, and trust immediately following the visit ranged from 0.46 to 0.64 (Table 1). The 5 behaviors that were most strongly associated with trust immediately after the visit were: (1) being comforting and caring, (2) demonstrating competency, (3) encouraging and answering questions, (4) explaining what they were doing, and (5) referring to a specialist if needed. Behaviors least important for trust were: (1) gentleness during examination, (2) discussing options/asking opinions, (3) making eye contact, and (4) treating as an equal. Correlations between specific behaviors and trust decreased over time, with a range of 0.38 to 0.58 at 1 month and 0.27 to 0.46 at 6 months after the initial visit. The pattern of the strength of associations between trust and specific behaviors remained essentially stable, with the exceptions of “being available when needed” and “working to adjust treatment.” Being available when needed was one of the behaviors least associated with trust at the index visit (ranked 16th out of 18) but moved up to be ranked 12th at 1 month and 6th at 6 months. Working to adjust treatment was also less important at the initial visit (ranked 14th out of 18), compared with 1 month (ranked 7th) and 6 months (ranked 8th).
The associations between specific behaviors and trust at the time of the enrollment visit were examined for the following subgroups of patients: men versus women, aged younger than 45 years versus 45 years and older, and length of relationship 2 years or less versus more than 2 years (Table W2*). These subgroups were selected a priori for exploration and generation of hypotheses but without any particular hypothesis regarding the pattern of associations between behaviors and trust within each subgroup. As shown in Table 2, being comforting and caring, demonstrating competency in diagnosis and treatment, and expressive communication (encouraging and answering questions, explaining, and checking understanding) were among the behaviors most strongly associated with trust for all groups. Letting the whole story be told or finding out all the reasons for the visits (2 receptive communication behaviors) were strongly associated with trust in most of the groups. Referring to a specialist if needed was one of the behaviors most strongly associated with trust among women, younger patients, and established patients. Respecting feelings and opinions was among the most strongly associated behaviors only for younger patients, and checking progress was among the most strongly associated behaviors only for women.
Identical analyses were performed examining the association between specific physician behaviors and patient satisfaction following the enrollment visit. All physician behaviors were more highly correlated with patient satisfaction than with patient trust, ranging from 0.59 to 0.75 for satisfaction. In general, the pattern of correlation between behaviors and satisfaction was very similar to the pattern for trust. For the total sample, the 4 behaviors most strongly associated with trust were among the 5 behaviors most strongly associated with satisfaction. As with trust, there was relatively little variation in the associations with behaviors by patient subgroups.
Discussion
The strength of the association between key physician behaviors during the office visit and subsequent patient trust in the physician was assessed. The behaviors assessed were previously identified as affecting patient trust in a study using patient focus groups.1 This study found that the behaviors assessed were predictive of patient trust up to 6 months after the initial visit, though the strength of the association decreased over time. There were relatively modest differences in the strength of the associations between behaviors and trust among the patient subgroups examined. Being comforting and caring, demonstrating competency, and explaining and listening were most strongly associated with trust in all, or virtually all, the subgroups. For women, referral to a specialist if needed and checking progress were also strongly associated with trust. The relative importance of referrals among women may reflect a concern for seeing a specialist for reproductive-related conditions. Referral was also more strongly associated with trust among more established patients, perhaps because these patients were more likely to have experienced a need for referral from their current physician at some time. For younger patients, willingness to refer and respect for feelings, opinions, and self-knowledge were among the most important behaviors, possibly reflecting differences in expectations for physician behaviors among younger versus older patients.
Interestingly, treating the patient as being on the same level and asking the patient’s opinion, while significantly associated with trust, were among the physician behaviors least associated with trust. This finding does not mean that equality and partnership are unimportant. The degree to which patients want to be involved in making decisions about their care varies,14 and patients may choose to stay with physicians whose practice style fits their preferences for involvement in their care. Bedside manners, such as gentleness during the examination, greeting warmly, and making eye contact, while significantly associated with trust, were also among the least strongly associated. These behaviors, while desirable, may be less important to establishing trust.
It was also found that to a large extent the same physician behaviors most associated with trust are also most associated with satisfaction, though the associations are stronger with satisfaction. A previous paper has reported data indicating that patient trust is somewhat separate from satisfaction, predicting continuity and self-reported adherence to treatment independently of satisfaction.6 One possible interpretation of this finding is that physician behaviors that lead to satisfaction in a single visit also help build trust, but trust is more dependent on factors in addition to physician behaviors during the office visit. No previous studies could be located that reported on the association of physician behaviors with patient trust. However, studies of physician behaviors and patient satisfaction have found that interpersonal competence (similar to comforting and caring in this study), communication, and technical competency were all significantly associated with satisfaction, a result confirmed in our study.15-17
Limitations
Patients’ ratings of physician behaviors may reflect their overall positive feelings toward the physician. Thus it is not possible to conclude that differences in the specific physician behaviors cause differences in trust. However, identifying the behaviors most strongly associated with trust may help to focus future intervention studies on these behaviors.
Conclusions
The results suggest that caring and comfort are as important as technical competency in predicting patient trust. Also, expressive and receptive communication skills, which have been shown as strongly related to patient satisfaction, are also important predictors of trust. Although the relative importance of a few other behaviors differed between subgroups, these differences were relatively modest, suggesting that the listed behaviors are of general importance to patient trust. Further work is needed to test the hypothesis that changes in identified physician behaviors can modify levels of patient trust.
Acknowledgements
This study was supported in part by grants from the Picker/Commonwealth Fund (#94-130) and the Bayer Institute for Health Care Communication (#94-181). The author thanks Barbara Elspas, MPH, for her fine work as study coordinator. The participating Stanford Trust Study Physicians were: William G. Broad, MD, (Palo Alto, Calif); Lawrence J. Bruguera, MD, (Half Moon Bay, Calif); David R. Ehrenberger, MD, (Mountain View, Calif); Larry A. Freeman, MD, (Palo Alto, Calif); Robert J. Fuss, MD, (Milpitas, Calif); H. Wallace Greig, MD, (San Jose, Calif); Mary P. Hufty, MD, (Palo Alto, Calif); Carlos F. Inocencio, MD, (Los Altos, Calif); Steven R. Lane, MD, MPH, (Palo Alto, Calif); Jas P. Lockhart, MD, (Menlo Park, Calif); Jeffrey S. McClanahan, MD, (Cupertino, Calif); Catherine A. Owen, MD, (Half Moon Bay, Calif); William E. Page, MD, (Palo Alto, Calif); Kuljeet S. Rai, MD, (San Jose, Calif); Daljeet S. Rai, MD, (San Jose, Calif); Paulita R. Ramos, MD, (San Jose, Calif); William E. Straw, MD, (Los Altos, Calif); William S. Warshal, MD, (Campbell, Calif); Roger W. Washington, MD, (Mountain View, Calif); and Andrew W. White, MD, (Menlo Park, Calif).
Related resources
- Picker Institute: non-profit organization that offers products and services aimed at improving health care “through the eyes of the patient.” http://www.picker.org
- American Academy on Physician and Patient: professional society dedicated to research, education, and professional standards in doctor-patient communication. www.physicianpatient.org
- The Program in Communication and Medicine at Northwestern University. www.pcm.northwestern.edu
STUDY DESIGN AND POPULATION: Patients (N=414) enrolled from 20 community-based family practices rated 18 physician behaviors and completed the Trust in Physician Scale immediately after their visits. Trust was also measured at 1 and 6 months after the visit. The association between physician behaviors and trust was examined in regard to patient sex, age, and length of relationship with the physician.
RESULTS: All behaviors were significantly associated with trust (P <.0001), with Pearson correlation coefficients (r) ranging from 0.46 to 0.64. Being comforting and caring, demonstrating competency, encouraging and answering questions, and explaining were associated with trust among all groups. However, referring to a specialist if needed was strongly associated with trust only among women (r=0.61), more established patients (r=0.62), and younger patients (r=0.63). The behaviors least important for trust were gentleness during the examination, discussing options/asking opinions, looking in the eye, and treating as an equal.
CONCLUSIONS: Caring and comfort, technical competency, and communication are the physician behaviors most strongly associated with patient trust. Further research is needed to test the hypothesis that changes in identified physician behaviors can lead to changes in the level of patient trust.
The physician-patient relationship is recognized as having an essential role in the process of medical care, providing the context in which caring and healing can occur.1-3 Patient trust in the physician has been proposed as a key feature of this relationship.1,4-6 There are several potential benefits to patient trust, including increased satisfaction, adherence to treatment, and continuity of care.6,7 Trust may also be associated with lower transaction costs,8 such as those incurred by a need to reassure patients (eg, ordering additional tests and referrals) or by inefficiencies due to incomplete disclosure of information by the patient.
Despite the apparent importance of patient trust, relatively little is known about what physician behaviors are most strongly associated with it. A previous study,1 using patient focus groups, identified 7 categories of physician behaviors that increased patients’ trust: thoroughly evaluating problems, indicating an understanding of the patient’s experience, expressing care for the patient, providing appropriate and effective treatment, communicating clearly and completely, building partnership, and demonstrating honesty and respect. The qualitative nature of the focus group data does not allow for the assessment of the relative importance of specific types of physician behaviors in predicting subsequent patient trust. Ascertaining the association between physician behaviors and patient trust is important both on a theoretical level, for what it may reveal about the nature of patient trust, and on a practical level, for guiding interventions to improve trust through physician education and training.
The goal for our study was to assess the relative importance of physician behaviors on patient trust immediately following the visit, after 1 month, and after 6 months. The behaviors chosen for measurement had been previously identified as promoting trust in patients in focus groups.1 The measurement of trust 3 times made it possible to ascertain if the physician behaviors most associated with trust immediately following a visit are those most associated with future trust. Also, the relative importance of physician behaviors for trust was explored in 3 patient subgroups: men and women patients, younger and older patients, and newer and more established patients.
Methods
Study Design and Subject Recruitment
This was a 6-month prospective study. Consecutive eligible patients were enrolled from the practices of 20 family physicians recruited by mail from a single geographic area based on their interest in practice-based research and physician-patient communication.6,9 The patients were recruited by a research assistant who approached them in the waiting room after they had checked in and before they were brought to an examination room. Patients younger than 18 years, those unable to complete the questionnaire, and those in acute distress were excluded. In addition, patients with no previous visits to the study physician or who did not identify the study physician as their primary care physician were excluded. All patients signed an informed consent form at the time of enrollment.
Measures
Each physician provided demographic and practice characteristic data. Measures obtained from patients in the waiting room or examination room at the time of their enrollment (the previsit questionnaire) included: demographics, length of relationship with physician, number and type of chronic medical conditions, and health status (measured by the Medical Outcomes Study Short Form-36).10 Following the office visit, patients completed a postvisit questionnaire concerning the physician’s interpersonal behavior during the visit, their trust in the physician, and their satisfaction with the visit (measured by a subset of 13 questions from the Consumer Satisfaction Survey).6,11 Approximately 80% of the patients completed this form in the waiting room after the visit; the remaining 20% completed it within 24 hours and returned it by mail. The 18 items pertaining to physician interpersonal behavior Table W1* were chosen to assess the physician behaviors identified in the previous patient focus groups as affecting patient trust.1 Fourteen items were taken from the 23-item version of the Humanistic Behaviors Questionnaire developed by the American Board of Internal Medicine.12 This questionnaire was chosen because it had items pertaining to most of the behaviors identified in the focus groups as affecting patient trust, including receptive and expressive communication (listening and explaining), treating patients with warmth and respect, gentleness, honesty, partnership, and willingness to refer to a specialist. Four items were added to assess additional behaviors identified from the focus groups: finding out all the reasons for the visit, respecting opinions and feelings, caring and concern, and demonstrating competency to diagnose and treat. Patients rated physician performance of each behavior item on a 5-point Likert-type scale, from poor to excellent. The questionnaire was piloted for clarity and acceptability.
Patient trust was measured using a slightly modified version of the 11-item Trust in the Physician Scale developed from Anderson and Dedrick13 as previously described.6 At the time of the study, the Trust in Physician Scale was the only published measure of patient trust. One item, “My doctor is a real expert in taking care of medical problems like mine” was modified to read “My doctor is well qualified to manage (diagnose and treat or make an appropriate referral) medical problems like mine,” to be appropriate for the primary care setting. On the basis of a pilot study of the scale, response labels were changed from (1=strongly disagree; 2=disagree; 3=uncertain; 4=agree; 5=strongly agree) to (1=totally disagree; 2=disagree; 3=neutral; 4=agree; 5=totally agree).6 The scale was scored by transforming the mean response score (calculated after reverse coding the negative items) to a 0 to 100 scale. Patient trust was assessed again at 1 and 6 months after the enrollment visit by mail survey.
Statistical Analysis
The association between specific physician behaviors and level of trust was assessed using Pearson correlation coefficients. Behaviors were ranked on the basis of the relative strength of the correlations.
Results
Of 803 consecutive patients, 561 were eligible for enrollment. Of those, 414 (74%) enrolled and completed the previsit and postvisit questionnaires at the time of the index visit, 52 (9%) refused, 15 (3%) saw the physician and left before being approached by the research assistant, and 74 (13%) enrolled but failed to complete both questionnaires.
Among the 414 enrolled patients, 334 (81%) completed the 1-month and 343 (83%) completed the 6-month follow-up questionnaires. Patients who did not complete the 1-month and 6-month questionnaires were compared with those who did with respect to age, length of the relationship with the physician, sex, race, education, and self-reported health status. Those who did complete the questionnaires at 6 months were virtually the same as those who did not at 1 month with respect to these characteristics. Those who did not complete the questionnaires at 6 months were slightly younger (45 vs 48 years), had been seeing their physician for a slightly shorter time at study enrollment (mean = 37 months vs 43 months), were more likely to be men (45% vs 36%), and were more likely to be nonwhite (41% vs 31%) than those who did complete the questionnaire, but they were virtually identical with respect to education and health status. None of these differences reached statistical significance.
The average age of the physicians was 47 years (range=34-73 years), with an average of 16 years in practice (range=8-44 years). Physicians were predominately men (85%) and white (70%), and most were in group practice (70%). Patients also had a mean age of 47 years and were predominately women (62%). Approximately two thirds (68%) of the patients were white, and 81% had graduated from high school. More than half (55%) reported at least one chronic medical condition, and almost half (45%) reported their health as being less than very good.
The correlation between physician behaviors during the visit, as rated by the patient, and trust immediately following the visit ranged from 0.46 to 0.64 Table 1. The 5 behaviors that were most strongly associated with trust immediately after the visit were: (1) being comforting and caring, (2) demonstrating competency, (3) encouraging and answering questions, (4) explaining what they were doing, and (5) referring to a specialist if needed. Behaviors least important for trust were: (1) gentleness during examination, (2) discussing options/asking opinions, (3) making eye contact, and (4) treating as an equal. Correlations between specific behaviors and trust decreased over time, with a range of 0.38 to 0.58 at 1 month and 0.27 to 0.46 at 6 months after the initial visit. The same pattern of the strength of associations between trust and specific behaviors remain essentially stable, with the exceptions of “being available when needed” and “working to adjust treatment.” Being available when needed was one of the behaviors least associated with trust at the index visit (ranked 16th out of 19) but moved up to be ranked 12th at 1 month and sixth at 6 months. Working to adjust treatment was also less important at the initial visit (ranked 14th out of 18), compared with 1 month (ranked 7th) and 6 months (ranked 8th).
The associations between specific behaviors and trust at the time of the enrollment visit were examined for the following subgroups of patients: men versus women, aged younger than 45 years versus 45 years and older, and length of relationship 2 years or less versus more than 2 years Table W2*. These subgroups were selected a priori for exploration and generation of hypotheses but without any particular hypothesis regarding the pattern of associations between behaviors and trust within each subgroup. As shown in Table 2, being comforting and caring, demonstrating competency in diagnosis and treatment, and expressive communication (encouraging and answering questions, explaining, and checking understanding) were among the behaviors most strongly associated with trust for all groups. Letting the whole story be told or finding out all the reasons for the visits (2 receptive communication behaviors) were strongly associated with trust in most of the groups. Referring to a specialist if needed was one of the behaviors most strongly associated with trust among women, younger patients, and established patients. Respecting feelings and opinions was among the most strongly associated behaviors only for younger patients, and checking progress was among the most strongly associated behaviors only for women.
Identical analyses were performed examining the association between specific physician behaviors and patient satisfaction following the enrollment visit. All physician behaviors were more highly correlated with patient satisfaction than with patient trust, ranging from 0.59 to 0.75 for satisfaction. In general, the pattern of correlation between behaviors and satisfaction was very similar to the pattern for trust. For the total sample, the 4 behaviors most strongly associated with trust were among the 5 behaviors most strongly associated with satisfaction. As with trust, there was relatively little variation in the associations with behaviors by patient subgroups.
Discussion
The strength of the association between key physician behaviors during the office visit on subsequent patient trust in the physician was assessed. The behaviors assessed were previously identified as affecting patient trust in a study using patient focus groups.1 This study found that the behaviors assessed were predictive of patient trust up to 6 months after the initial visit, though the strength of the association decreased over time. There were relatively modest differences in the strength of the associations between behaviors and trust among the patient subgroups examined. Being comforting and caring, demonstrating competency, and explaining and listening were most strongly associated with trust in all, or virtually all, the subgroups. For women, referral to a specialist if needed and checking progress were also strongly associated with trust. The relative importance of referrals among women may reflect a concern for seeing a specialist for reproductive-related conditions. Referral was also more strongly associated with trust among more established patients, perhaps because these patients were more likely to have experienced a need for referral from their current physician at some time. For younger patients, willingness to refer and respect for feelings, opinions, and self-knowledge were among the most important behaviors, possibly reflecting differences in expectations for physician behaviors among younger versus older patients.
Interestingly, treating the patient as being on the same level and asking the patient’s opinion, while significantly associated with trust, were among the physician behaviors least associated with trust. This finding does not mean that equality and partnership are unimportant. The degree to which patients want to be involved in making decisions about their care varies,14 and patients may choose to stay with physicians whose practice style fits their preferences for involvement in their care. Bedside manners, such as gentleness during the examination, greeting warmly, and making eye contact, while significantly associated with trust, were also among the least strongly associated. These behaviors, while desirable, may be less important to establishing trust.
It was also found that to a large extent the same physician behaviors most associated with trust are also most associated with satisfaction, though the associations are stronger with satisfaction. A previous paper has reported data indicating that patient trust is somewhat separate from satisfaction, predicting continuity and self-reported adherence to treatment independently of satisfaction.6 One possible interpretation of this finding is that physician behaviors that lead to satisfaction in a single visit also help build trust, but trust is more dependent on factors in addition to physician behaviors during the office visit. No previous studies could be located that reported on the association of physician behaviors with patient trust. However studies of physician behaviors and patient satisfaction, have found that interpersonal competence (similar to comforting and caring in this study), communication, and technical competency were all significantly associated with satisfaction, a result confirmed in our study.15-17
Limitations
Patients’ ratings of physician behaviors may reflect their overall positive feelings toward the physician. Thus it is not possible to conclude that differences in the specific physician behaviors cause differences in trust. However, identifying the behaviors most strongly associated with trust may help to focus future intervention studies on these behaviors.
Conclusions
The results suggest that caring and comfort are as important as technical competency in predicting patient trust. Also, expressive and receptive communication skills, which have been shown as strongly related to patient satisfaction, are also important predictors of trust. Although the relative importance of a few other behaviors differed between subgroups, these differences were relatively modest, suggesting that the listed behaviors are of general importance to patient trust. Further work is needed to test the hypothesis that changes in identified physician behaviors can modify levels of patient trust.
Acknowledgements
This study was supported in part by grants from the Picker/Commonwealth Fund (#94-130) and the Bayer Institute for Health Care Communication (#94-181). The author thanks Barbara Elspas, MPH, for her fine work as study coordinator. The participating Stanford Trust Study Physicians were: William G. Broad, MD, (Palo Alto, Calif); Lawrence J. Bruguera, MD, (Half Moon Bay, Calif); David R. Ehrenberger, MD, (Mountain View, Calif); Larry A. Freeman, MD, (Palo Alto, Calif); Robert J. Fuss, MD, (Milpitas, Calif); H. Wallace Greig, MD, (San Jose, Calif); Mary P. Hufty, MD, (Palo Alto, Calif); Carlos F. Inocencio, MD, (Los Altos, Calif); Steven R. Lane, MD, MPH, (Palo Alto, Calif); Jas P. Lockhart, MD, (Menlo Park, Calif); Jeffrey S. McClanahan, MD, (Cupertino, Calif); Catherine A. Owen, MD, (Half Moon Bay, Calif); William E. Page, MD, (Palo Alto, Calif); Kuljeet S. Rai, MD, (San Jose, Calif); Daljeet S. Rai, MD, (San Jose, Calif); Paulita R. Ramos, MD, (San Jose, Calif); William E. Straw, MD, (Los Altos, Calif); William S. Warshal, MD, (Campbell, Calif); Roger W. Washington, MD, (Mountain View, Calif); and Andrew W. White, MD, (Menlo Park, Calif).
Related resources
- Picker Institute: non-profit organization that offers products and services aimed at improving health care “through the eyes of the patient.” http://www.picker.org
- American Academy on Physician and Patient: professional society dedicated to research, education, and professional standards in doctor-patient communication. www.physicianpatient.org
- The Program in Communication and Medicine at Northwestern University. www.pcm.northwestern.edu
1. Thom DH, Campbell B. Patient-physician trust: an exploratory study. J Fam Pract 1997;44:169-76.
2. Peabody FW. The care of the patient. J Am Med Assoc 1927;88:877-82.
3. Brody H. Relationship-centered care: beyond finishing school. J Am Board Fam Pract 1995;8:416-18.
4. Leopold N, Cooper M, Clancy C. Sustained partnership in primary care. J Fam Pract 1996;42:129-37.
5. Mechanic D, Schlesinger M. The impact of managed care on patients’ trust in medical care and their physicians. J Am Med Assoc 1996;275:1693-97.
6. Thom DH, Ribisl KM, Stewart AL, Luke DA. Validation of a measure of patients’ trust in their physician: the Trust in Physician Scale. Med Care 1999;37:510-17.
7. Safran DG, Taira DA, Rogers WH, Kosinski M, Ware JE, Tarlov AR. Linking primary care performance to outcomes of care. J Fam Pract 1998;47:213-20.
8. Creed WED, Miles R. Trust in organizations. In: Kramer RM, Tyler TR, eds. Trust in organizations: frontiers of theory and research. Thousand Oaks, Calif: Sage Publications; 1996;26-27.
9. Thom DH. Training physicians to increase patient trust. J Eval Clin Pract 2000;6:249-55.
10. McHorney CA, Ware JE, Lu JFR, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994;32:40-66.
11. Davis AR, Ware JE. GHAA’s consumer satisfaction survey. Washington, DC: Group Health Association of America; 1991.
12. American Board of Internal Medicine. Guide to awareness and evaluation of humanistic qualities in the internist. Philadelphia, Pa: American Board of Internal Medicine; 1992.
13. Anderson LA, Dedrick RF. Development of the trust in physician scale: a measure to assess interpersonal trust in patient-physician relationships. Psych Rep 1990;67:1091-100.
14. Brody DS, Miller SM, Lerman CE, Smith DG, Caputo GC. Patient perception of involvement in medical care: relationship to illness attitudes and outcomes. J Gen Intern Med 1989;506-11.
15. Hall JA, Roter DL, Katz NR. Meta-analysis of correlates of provider behavior in medical encounters. Med Care 1988;26:657-75.
16. Lochman JE. Factors related to patients’ satisfaction with their medical care. J Comm Health 1983;9:91-109.
17. DiMatteo MR, Hays R. The significance of patients’ perceptions of physician conduct. J Comm Health 1980;6:18-34.
Three Questions Can Detect Hazardous Drinkers
STUDY DESIGN: Cross-sectional survey.
POPULATION: Patients waiting for care at 12 primary care sites in western Pennsylvania from October 1995 to December 1997.
OUTCOMES MEASURED: Sensitivity, specificity, likelihood ratios, and predictive values for the AUDIT, AUDIT-C, and AUDIT-3.
RESULTS: A total of 13,438 patients were surveyed. Compared with a quantity-frequency definition of hazardous drinking (≥16 drinks/week for men and ≥12 drinks/week for women), the AUDIT, AUDIT-C, and AUDIT-3 had areas under the receiver-operating characteristic curves (AUROC) of 0.940, 0.949, and 0.871, respectively. The AUROCs of the AUDIT and AUDIT-C were significantly different (P=.004). The AUROCs of the AUDIT-C (P <.001) and AUDIT (P <.001) were significantly larger than the AUDIT-3. When compared with a positive AUDIT score of 8 or higher, the AUDIT-C (score ≥3) and the AUDIT-3 (score ≥1) were 94.9% and 99.6% sensitive and 68.8% and 51.1% specific in detecting individuals as hazardous drinkers.
CONCLUSIONS: In a large primary care sample, a 3-question version of the AUDIT identified hazardous drinkers as well as the full AUDIT when such drinkers were defined by quantity-frequency criterion. This version of the AUDIT may be useful as an initial screen for assessing hazardous drinking behavior.
Hazardous drinkers consume enough alcohol to be at risk for adverse consequences but do not meet criteria for alcohol abuse or dependence. They are, however, at risk for more harmful alcohol abuse.1-5 Such drinking behavior has been defined by quantity and frequency criteria.6 It is estimated that up to 20% of primary care patients are at least hazardous drinkers.7-9 Effective interventions to reduce alcohol consumption exist in primary care settings, so it is important for care providers to reliably and efficiently identify patients who are hazardous drinkers.1,10,11 Traditionally,12-14 care providers are poor at identifying such drinkers, and as many as 72% escape their detection.15-17 This ineffectiveness may be because of a lack of brief and simple questions that aid in patient identification.18-20
Formal screening instruments have been promoted to aid in identification of patients with alcohol problems. The Alcohol Use Disorders Identification Test (AUDIT) was developed by the World Health Organization (WHO) and consists of 2 distinct instruments: a 10-item AUDIT core questionnaire and a clinical screening procedure.1,9,21 The AUDIT core questions can detect hazardous drinkers and have been used alone as a screening instrument.22 The AUDIT questions address intake, dependence, and adverse consequences of drinking,23 emphasize drinking in the past year,5,24 and are indifferent to sex or ethnicity.4,25 It is most useful at detecting drinkers who do not meet criteria for alcohol abuse or dependence.26 Because of its ability to detect less severe alcohol drinkers, the AUDIT seems to have practical value in primary care settings.4,15,21,26
Because of trends toward shorter patient visits, the 10-question AUDIT may be too lengthy to be clinically useful in primary care settings.5,27,28 The shorter CAGE questionnaire, therefore, is often recommended for use in limited time situations.18,20 However, although the CAGE is a valuable tool for identifying alcohol abuse and dependence, it is not as useful for identifying less serious behaviors, such as hazardous drinking.5,28-32
A shorter version of the AUDIT may prove beneficial for use by the busy physician for identifying hazardous drinking behavior. The AUDIT-C (consisting of the first 3 questions of the AUDIT) was shown to be as effective as the full AUDIT in detecting hazardous drinking in a population of veterans.33 Also, the AUDIT-3 (the third question of the AUDIT) may be effective for identifying hazardous drinkers.5,34
We investigated the performance of the AUDIT, AUDIT-C, and AUDIT-3 in detecting such drinkers in a large primary care sample. We also compared the AUDIT-C and the AUDIT-3 to the full AUDIT. We hypothesized that the abbreviated instruments would be comparable with the AUDIT for detecting hazardous drinkers as defined by a quantity-frequency standard.
Methods
Design
We based our study on screening data obtained as part of a large randomized clinical trial of brief interventions for hazardous drinkers (the Early Lifestyle Modification [ELM] Study). Screening forms were administered at 12 primary care sites in the western Pennsylvania area from October 1995 to December 1997. The institutional review board at the University of Pittsburgh and equivalent review groups from each primary care setting approved the ELM study and screening protocol.
Setting
The 12 primary care sites included a Veterans Affairs Medical Center internal medicine clinic, a university-based internal medicine clinic, 2 university-affiliated community care clinics, 3 health maintenance organization clinics, 3 university-affiliated family medicine clinics, and 2 private practice family medicine clinics. All clinics were staffed by physicians. The Veterans Administration and university-based clinics had internal medicine residents participating in patient care. Also, physician assistants and nurse practitioners were involved with primary care at some sites.
Patients
Patients were eligible for the study if they were in the waiting rooms of one of the clinic sites during the screening period, were approached by a research assistant in the waiting room, and agreed to answer the screening questionnaire. Screening of eligible patients occurred from October 1995 to December 1997.
Screening Instruments
In the primary care clinics, patients self-administered an 8-page survey consisting of questions about lifestyle habits. Research assistants approached as many patients in the waiting rooms of the primary care sites as possible. The survey included questions about stress management (8 questions), smoking habits (8), the AUDIT (10), and quantity-frequency questions (3). In initial surveys, just before the alcohol-related questions patients were instructed not to answer alcohol questions if they responded “never” to the following question: “How often do you have a drink containing alcohol (for example: beer, wine, wine coolers, sherry, gin, vodka, or other hard liquor)?” This question was removed in later surveys, because it limited the number of people responding to the alcohol instruments under comparison. We also asked for categorical responses to questions about age, sex, education background, race, marital status, and occupation. We did not compensate patients for completing this questionnaire, and their responses were anonymous.
Alcohol Screening Instruments
The AUDIT consists of 10 questions (Figure 1). The AUDIT-C includes the first 3 questions of the AUDIT: How often do you have a drink containing alcohol? How many drinks containing alcohol do you have on a typical day when you are drinking? How often do you have 6 or more drinks on one occasion? The AUDIT-3 is the third question alone. Each individual question is scored from 0 to 4 points on a Likert scale, with higher numbers indicating more severe drinking behavior. For AUDIT questions with only 3 possible answers (questions 9 and 10), the scores were 0, 2, and 4. Thus, the range of possible scores on the AUDIT is from 0 to 40, the AUDIT-C from 0 to 12, and the AUDIT-3 from 0 to 4.
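The scoring rules above can be sketched in a few lines of Python. This is only an illustration, not the study's own software: the function name `audit_scores` and the input layout (a list of 10 item scores already converted to points) are assumptions made for the example.

```python
from typing import Dict, Sequence


def audit_scores(item_points: Sequence[int]) -> Dict[str, int]:
    """Return total scores for the AUDIT, AUDIT-C, and AUDIT-3.

    item_points holds the 10 item scores in questionnaire order, each
    already converted to points (0-4; items 9 and 10 use 0, 2, or 4).
    """
    if len(item_points) != 10:
        raise ValueError("the full AUDIT has 10 items")
    for number, points in enumerate(item_points, start=1):
        # Items 9 and 10 have only 3 possible answers, scored 0, 2, 4.
        allowed = (0, 2, 4) if number >= 9 else (0, 1, 2, 3, 4)
        if points not in allowed:
            raise ValueError(f"item {number}: invalid score {points}")
    return {
        "AUDIT": sum(item_points),        # all 10 items, range 0-40
        "AUDIT-C": sum(item_points[:3]),  # first 3 items, range 0-12
        "AUDIT-3": item_points[2],        # third item alone, range 0-4
    }


# Hypothetical respondent, for illustration only.
print(audit_scores([2, 1, 1, 0, 0, 0, 0, 0, 2, 0]))
```

For this hypothetical respondent the totals are 6 (AUDIT), 4 (AUDIT-C), and 1 (AUDIT-3).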
Our quantity-frequency questionnaire consisted of the questions: “If you drink, how many days per week do you have a drink?” (answers: 1 through 7); “If you drink less than once a week, how many days per month do you drink?” (answers: 1 or less, 2, 3, or 4 or more); and “How many drinks containing alcohol do you have on a typical day when you are drinking?” (answers: 1 through 10 or more). The number of average drinks per week was computed for analysis, and if the answer was “or more” we used the maximum number indicated in the question (4 or 10).
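A minimal sketch of the drinks-per-week computation follows. The article does not state how monthly frequencies were converted to weekly ones, so the factor of 4 weeks per month is an assumption, as are the function and parameter names. Top-coded answers are passed as their maximum (4 days or 10 drinks), following the rule described above.

```python
from typing import Optional

# Assumption: the article does not report the month-to-week conversion.
WEEKS_PER_MONTH = 4.0


def drinks_per_week(drinks_per_day: int,
                    days_per_week: Optional[int] = None,
                    days_per_month: Optional[int] = None) -> float:
    """Average weekly drinks from the quantity-frequency answers.

    "4 or more" days per month is passed as 4 and "10 or more" drinks
    per day as 10, matching the top-coding rule in the text.
    """
    if days_per_week is not None:
        return float(days_per_week * drinks_per_day)
    if days_per_month is not None:
        return days_per_month / WEEKS_PER_MONTH * drinks_per_day
    raise ValueError("need a weekly or a monthly drinking frequency")


print(drinks_per_week(3, days_per_week=5))    # weekly drinker
print(drinks_per_week(10, days_per_month=2))  # less-than-weekly drinker
```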
Outcome Measures and Analysis
The main outcome measures concerned the accuracy of the full AUDIT, AUDIT-C, and AUDIT-3. For comparative purposes, patients with missing responses were not used in subsequent statistical analyses.
The AUDIT, AUDIT-C, and AUDIT-3 were compared with a hazardous drinking criterion defined by the quantity-frequency questions. This criterion was 16 or more drinks per week for men and 12 or more drinks per week for women.6 Clinically, quantity-frequency assessment is often used to determine hazardous drinking behavior.16 The area under the receiver-operating characteristic curve (AUROC) was used to compare each instrument’s diagnostic ability.35 The AUROC indicates how well a test discriminates between persons with and without the condition of interest. A test with an AUROC approaching 1 is more sensitive and specific over a range of cutoff points than a test with an AUROC of 0.5, which is nondiscriminating. To calculate and compare the AUROCs we used published standards developed by Hanley and McNeil36,37 for curves derived from the same cases. We performed chi-square analysis on comparisons of categorical data.
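As an illustration of the AUROC, the sketch below computes it as the probability that a randomly chosen hazardous drinker scores higher than a randomly chosen non-hazardous drinker (ties count one half), together with the standard error formula from Hanley and McNeil's 1982 paper. The scores are invented and the function names are assumptions. The pairwise loop is chosen for clarity rather than speed.

```python
import math
from typing import Sequence


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative), ties = 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(area: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUROC (Hanley and McNeil, 1982)."""
    q1 = area / (2.0 - area)
    q2 = 2.0 * area * area / (1.0 + area)
    variance = (area * (1.0 - area)
                + (n_pos - 1) * (q1 - area * area)
                + (n_neg - 1) * (q2 - area * area)) / (n_pos * n_neg)
    return math.sqrt(variance)


# Invented AUDIT totals, for illustration only.
hazardous = [9, 12, 7, 15]
non_hazardous = [1, 3, 7, 2, 0, 5]
area = auroc(hazardous, non_hazardous)
print(area, hanley_mcneil_se(area, len(hazardous), len(non_hazardous)))
```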
We then compared the AUDIT-C and AUDIT-3 with a score of the full AUDIT as a criterion for hazardous drinking. An AUDIT result of 8 or higher was accepted as a criterion for hazardous drinking. We measured the sensitivity and specificity of the AUDIT-C, using a cutoff score of 3 or higher and AUDIT-3 cutoff score of 1 or higher compared with the criterion of hazardous drinking defined by the full AUDIT.38
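Sensitivity and specificity at a fixed cutoff can be sketched as follows. The scores and reference labels are invented for illustration, and the function name `sens_spec` is an assumption. A score at or above the cutoff counts as a positive screen, as in the text.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[int],
              reference_positive: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity of `score >= cutoff` against a reference."""
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, reference_positive):
        screen_positive = score >= cutoff
        if is_positive and screen_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif screen_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented AUDIT-C totals and reference labels (for example, AUDIT >= 8).
audit_c = [0, 2, 3, 5, 8, 1, 4, 6]
reference = [False, False, False, True, True, False, False, True]
print(sens_spec(audit_c, reference, cutoff=3))
```

In this toy data a cutoff of 3 catches all 3 reference-positive patients (sensitivity 1.0) but also flags 2 of the 5 reference-negative patients (specificity 0.6), the same trade-off of high sensitivity against lower specificity that the abbreviated instruments show.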
Results
At least 1 question on the screening survey was answered by 13,438 patients. Overall, 13,198 (98%) either answered that they currently drink or never drank alcohol. Not all questions were answered by each individual. Of patients who indicated that they currently drink alcohol, the full AUDIT was completed by 7035 (52%). The AUDIT-C was completed by 7190 (54%) patients, and the AUDIT-3 was answered by 7303 (54%) patients. Both the entire AUDIT and quantity-frequency questions were answered by 6954 (52%) patients. Of the 13,438 patients, 36% indicated no alcohol consumption. The majority of the surveyed sample were men, white, married, employed, high school graduates, and younger than 60 years (Table 1).
AUDITs Compared with Quantity-Frequency Criterion
Table 2 compares the likelihood of hazardous alcohol use, defined by a quantity-frequency criterion, associated with each score of the AUDIT, AUDIT-C, and AUDIT-3. It is important to note that at each score the true positives of screening are only persons identified at that score. For example, at an AUDIT score of 7, 53 of 754 hazardous drinkers were identified with the resulting likelihood ratio (3.0) and predictive value (26.9%).
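The score-specific likelihood ratio and predictive value can be sketched as below. Only the 53 hazardous drinkers at a score of 7 and the 754 hazardous drinkers overall come from the text. The 144 non-hazardous drinkers at that score and the 6200 non-hazardous drinkers overall are not reported here; they are back-calculated assumptions chosen so that the example reproduces the stated likelihood ratio and predictive value.

```python
def stratum_likelihood_ratio(haz_at_score: int, haz_total: int,
                             nonhaz_at_score: int, nonhaz_total: int) -> float:
    """Likelihood ratio for one score: P(score | hazardous) / P(score | not)."""
    return (haz_at_score / haz_total) / (nonhaz_at_score / nonhaz_total)


def stratum_predictive_value(haz_at_score: int, nonhaz_at_score: int) -> float:
    """Share of patients at this score who meet the hazardous criterion."""
    return haz_at_score / (haz_at_score + nonhaz_at_score)


# 53 and 754 are from the text; 144 and 6200 are assumed for illustration.
lr = stratum_likelihood_ratio(53, 754, 144, 6200)
pv = stratum_predictive_value(53, 144)
print(f"LR = {lr:.1f}, predictive value = {pv:.1%}")
```

Because only patients at exactly that score are counted, these values describe a single row of the table rather than everyone at or above a cutoff.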
For comparisons of the AUDIT, AUDIT-C, and AUDIT-3 at identifying hazardous drinkers who scored at or greater than a minimal cutoff, the sensitivity and specificity compared with a quantity-frequency criterion is shown in Table 3. For cut-point values of an AUDIT score of 8 or higher, the sensitivity of the AUDIT was 76%. Similarly, the sensitivities of the AUDIT-C (with score ≥3) and AUDIT-3 (with score ≥1) were 99.6% and 89.1%, as sensitive as the quantity-frequency questions in detecting these patients. Specificity of the AUDIT, AUDIT-C, and AUDIT-3 at these cutoff values was 92%, 48%, and 65%, respectively.
AUROCs were constructed from all cut-point values (Figure 2). Computation of the AUROC indicates the effectiveness of the instrument to discriminate hazardous drinkers over a range of AUDIT scores. The AUROCs for the AUDIT, AUDIT-C, and AUDIT-3 were significantly more discriminating than the line of identity (AUROC=0.5). The AUROC of the AUDIT was significantly different from the AUDIT-C (z=2.69; P=.004). The AUDIT-3 AUROC was significantly different than the AUDIT (z=10.03; P <.001) and AUDIT-C (z=12.69; P <.001).
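For readers who want to see how such z statistics arise, the sketch below uses the Hanley and McNeil (1983) form for two AUROCs derived from the same cases, in which the correlation r between the two areas reduces the variance of their difference. In that method r is read from a published table, so here it is simply an input. The standard errors and r in the example are hypothetical, since the article does not report them. The last line converts the reported z=2.69 to a one-tailed normal probability, which appears consistent with the reported P=.004.

```python
import math


def z_correlated_auc(area1: float, se1: float,
                     area2: float, se2: float, r: float) -> float:
    """z statistic for two AUROCs from the same cases (Hanley and McNeil, 1983)."""
    return (area1 - area2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal at |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))


# AUROCs are the reported values; the SEs and r are hypothetical placeholders.
z = z_correlated_auc(0.949, 0.004, 0.940, 0.005, r=0.8)
print(z, one_tailed_p(z))

# One-tailed probability for the reported z of 2.69.
print(round(one_tailed_p(2.69), 3))
```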
Abbreviated AUDITs Compared with Full AUDIT Criterion
The full AUDIT is often used as a standard to assess hazardous drinking. We compared the abbreviated instruments to the full AUDIT, with a positive score of 8 or higher as a criterion for such drinking. The AUDIT-C (score ≥3) and AUDIT-3 (score ≥1) were 94.9% and 99.9% as sensitive and 68.8% and 51.1% as specific as the full AUDIT in obtaining a positive score. We also determined the performance of the AUDIT-3 when compared with a reference standard of a positive AUDIT-C. The AUDIT-3 (score ≥1) was 69% sensitive and 95% specific as the AUDIT-C (score ≥3) at identifying hazardous drinkers (data not shown).
Discussion
We evaluated the performance of the AUDIT and abbreviated AUDIT instruments to detect hazardous drinking in a large multisite primary care sample. The abbreviated forms of the AUDIT were as effective as the AUDIT at identifying hazardous drinkers. Compared with quantity-frequency questions, the AUDIT and AUDIT-C were superior to the AUDIT-3 at identifying hazardous drinkers. The abbreviated forms of the AUDIT were as sensitive as the full AUDIT at detecting hazardous drinkers when using standard cutoff values for hazardous drinking.
As with the 4-item CAGE questionnaire for alcohol dependence, a 1- or 3-item AUDIT instrument may increase care providers’ recognition of hazardous drinkers. Providers do not routinely ask standard alcohol questions, are particularly poor at identifying hazardous drinkers, and do not enter patients into alcohol treatment.16 Therefore, providing clinicians with a few easily remembered questions to determine hazardous drinking behavior would be beneficial.39 A short questionnaire would be simple to administer and applicable in a wide variety of practice settings. A positive response would increase suspicion regarding hazardous or abusive drinking behavior and prompt additional questions about patients’ alcohol use.18,40 For example, care providers could use the AUDIT-3 or AUDIT-C (to detect at least hazardous drinking), then administer the CAGE questionnaire (to detect abuse and dependence) if the patient’s response was positive.
It is important to realize that the AUDIT and its abbreviated forms are only sensitive to detect hazardous drinking, not to specifically assign patients’ drinking habits as hazardous only. The AUDIT was originally designed to distinguish a person with hazardous drinking from one with nonhazardous drinking.26 As such, this instrument may not be specific enough to distinguish hazardous drinkers from others with severe alcohol behaviors; people who score positive may qualify for alcohol abuse and dependence. For less risky alcohol behaviors, it is more important for a health care provider to identify all hazardous drinkers (true positives), at the risk of falsely identifying a person who may not have this behavior (false positives).18 Therefore, when screening to establish a threshold level of treatment intervention, screening instruments should maximize sensitivity, even at the expense of low specificity.
In our sample, the AUDIT (score ≥8), AUDIT-C (score ≥3), and AUDIT-3 (score ≥1) were as sensitive as quantity-frequency questions in detecting hazardous drinking. This increase in sensitivity of the AUDIT-C and AUDIT-3 is likely related to the consumption focus of these questions. The AUDIT-C consists of quantity and frequency type questions, and the third question is specific for quantity of drinking at one session. However, the performance of AUDIT instruments in our study is comparable and confirms results found in a study by Bush and colleagues,34 even though the studies used different criteria for assessment of abbreviated instruments.
If the AUDIT is used as a standard to detect hazardous drinkers, would the AUDIT-C or AUDIT-3 identify the same patients as the full AUDIT? Using cutoff points for the AUDIT-C of 3 or higher and an AUDIT-3 score of 1 or higher, these instruments were 99.7% and 98.3% as sensitive as the full AUDIT. As expected, specificity is much less for both abbreviated instruments. However, the high sensitivity suggests a clinical utility for these abbreviated instruments. It is unlikely that by asking the 3 AUDIT-C questions or the single AUDIT-3 question, a primary care provider will miss identification of a person who is at least a hazardous drinker.
Limitations
There is no gold standard to identify hazardous drinkers.41 The definition of what level of drinking constitutes that label is controversial, and providers do not routinely ask standard questions about drinking behavior.16 Our criterion, the quantity-frequency questions, may be considered a poor standard to compare survey instruments.7,42 In research, quantity-frequency consumption questions are helpful in specific identification of hazardous drinkers when a cutoff value is defined.5 However, patients may prefer not to answer questions about quantity or frequency of alcohol use and may not respond consistently to heterogeneous provider questions. Therefore, quantity-frequency questions may be useful as a standard to compare similar instruments such as the AUDIT and its abbreviations, although they may not be particularly effective in clinical practice.
Not all surveyed individuals completed the full AUDIT instrument. This was primarily because patients who answered “never” to our initial question regarding any alcohol use did not proceed to the AUDIT. Before completion of the study, we eliminated this question from the survey. However, it is not known whether patients who answered “never” to the initial question, were, in fact, drinkers. Consistency response bias may also have occurred as patients may have wished to answer similar items similarly. In addition, the similarity of the AUDIT-C and AUDIT-3 to quantity-frequency questions suggests that our sensitivity analysis is perhaps only the upper bounds of the briefer instruments.
We derived the abbreviated tests directly from the AUDIT, thus no assessment of the instruments out of the context of the full AUDIT was performed. Independent testing of abbreviated AUDIT instruments is needed. Recruitment was conducted by research assistants who solicited and provided forms to patients in the waiting rooms of primary care clinics. This convenience sample may have led to a selection bias in obtaining survey data. Patient recall bias may also have affected survey answers as patients may have had difficulty answering quantity and frequency questions accurately (an advantage of the AUDIT over quantity-frequency type questions). Also, our study investigates identification of individuals who are at least hazardous drinkers, but may also be abusive or dependent. We did not study the instruments’ ability to distinguish between hazardous drinking and abuse or dependence.
Conclusions
Our results confirm that the AUDIT-C and AUDIT-3 are useful screening tests for hazardous drinking. Because treatment of such drinkers can be effective, identifying people with less severe alcohol problems is crucial and an important public health initiative.21 Abbreviated instruments identify hazardous drinkers quickly, efficiently, and effectively, and may encourage early treatment to prevent the occurrence of alcohol-related consequences, abuse, or dependence. We recommend using the AUDIT-C and CAGE as brief screening instruments for hazardous drinking and alcohol abuse and dependence. This approach warrants further investigation.
Acknowledgments
Our work was supported by a grant to Dr Maisto from the National Institute of Alcohol Abuse and Alcoholism (AA10291). Dr Gordon is supported by a faculty development grant in general internal medicine from the VA Pittsburgh Healthcare System and the VISN 4 Mental Illness Research, Education, and Clinical Center. Dr Kraemer is supported by a Mentored Clinical Scientist Development Award from the National Institute of Alcohol Abuse and Alcoholism (AA00235). Dr J. Conigliaro is supported by a Career Development Award from the HSR & D Service, Department of Veterans Affairs (CD-97324-A) and is a generalist physician faculty scholar of the Robert Wood Johnson Foundation (#031500). We thank the ELM research study staff, Monica O’Connor (the ELM project coordinator), and all the patients who participated in the ELM study.
1. A cross-national trial of brief interventions with heavy drinkers: WHO Brief Intervention Study Group. Am J Public Health 1996;86:948-55.
2. Babor TF, de la Fuente JR, Saunders J, Grant M. The Alcohol Use Disorders Identification Test: guidelines for use in primary health care. Geneva, Switzerland: World Health Organization; 1989.
3. American Psychiatric Association. Diagnostic and statistical manual of mental disorders, 4th ed. Washington, DC: American Psychiatric Association; 1995.
4. Allen JP, Litten RZ, Fertig JB, Babor T. A review of research on the Alcohol Use Disorders Identification Test (AUDIT). Alcoholism 1997;21:613-19.
5. Bohn MJ, Babor TF, Kranzler HR. The Alcohol Use Disorders Identification Test (AUDIT): validation of a screening instrument for use in medical settings. J Studies Alcohol 1995;56:423-32.
6. Sanchez-Craig M, Wilkinson DA, Davila R. Empirically based guidelines for moderate drinking: 1-year results from three studies with problem drinkers. Am J Public Health 1995;85:823-28.
7. Bradley KA. Screening and diagnosis of alcoholism in the primary care setting. West J Med 1992;156:166-71.
8. Schorling JB, Klas PT, Willems JP, Everett AS. Addressing alcohol use among primary care patients: differences between family medicine and internal medicine residents. J Gen Intern Med 1994;9:248-54.
9. Saunders JB, Aasland OG, Amundsen A, Grant M. Alcohol consumption and related problems among primary health care patients: WHO collaborative project on early detection of persons with harmful alcohol consumption—I. Addiction 1993;88:349-62.
10. Wallace P, Cutler S, Haines A. Randomised controlled trial of general practitioner intervention in patients with excessive alcohol consumption. BMJ 1988;297:663-68.
11. Fleming MF, Barry KL, Manwell LB, Johnson K, London R. Brief physician advice for problem alcohol drinkers: a randomized controlled trial in community-based primary care practices. JAMA 1997;277:1039-45.
12. Kristenson H, Ohlin H, Hulten-Nosslin MB, Trell E, Hood B. Identification and intervention of heavy drinking in middle-aged men: results and follow-up of 24-60 months of long-term study with randomized controls. Alcoholism 1983;7:203-09.
13. Bien TH, Miller WR, Tonigan JS. Brief interventions for alcohol problems: a review. Addiction 1993;88:315-35.
14. Wilk AI, Jensen NM, Havighurst TC. Meta-analysis of randomized control trials addressing brief interventions in heavy alcohol drinkers. J Gen Intern Med 1997;12:274-83.
15. Conigrave KM, Saunders JB, Reznik RB. Predictive capacity of the AUDIT questionnaire for alcohol-related harm. Addiction 1995;90:1479-85.
16. Friedmann PD, McCullough D, Chin MH, Saitz R. Screening and intervention for alcohol problems: a national survey of primary care physicians and psychiatrists. J Gen Intern Med 2000;15:84-91.
17. Bowen OR, Sammons JH. The alcohol-abusing patient: a challenge to the profession. JAMA 1988;260:2267-70.
18. Allen JP, Maisto SA, Connors GJ. Self-report screening tests for alcohol problems in primary care. Arch Intern Med 1995;155:1726-30.
19. Saunders JB, Conigrave KM. Early identification of alcohol problems. CMAJ 1990;143:1060-69.
20. Mayfield D, McLeod G, Hall P. The CAGE questionnaire: validation of a new alcoholism screening instrument. Am J Psychiatry 1974;131:1121-23.
21. Barry KL, Fleming MF. The Alcohol Use Disorders Identification Test (AUDIT) and the SMAST-13: predictive validity in a rural primary care sample. Alcohol Alcoholism 1993;28:33-42.
22. Schmidt A, Barry KL, Fleming MF. Detection of problem drinkers: the Alcohol Use Disorders Identification Test (AUDIT). South Med J 1995;88:52-59.
23. Volk RJ, Steinbauer JR, Cantor SB, Holzer CE, III. The Alcohol Use Disorders Identification Test (AUDIT) as a screen for at-risk drinking in primary care patients of different racial/ethnic backgrounds. Addiction 1997;92:197-206.
24. Kettl PA. Detecting problem drinkers in your practice. Patient Care 1997;30:27-41.
25. Steinbauer JR, Cantor SB, Holzer CE, III, Volk RJ. Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med 1998;129:353-62.
26. Saunders JB, Aasland OG, Babor TF, de la Fuente JR, Grant M. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption—II. Addiction 1993;88:791-804.
27. Foster AI, Blondell RD, Looney SW. The practicality of using the SMAST and AUDIT to screen for alcoholism among adolescents in an urban private family practice. J Kentucky Med Assoc 1997;95:105-07.
28. Bradley KA, Bush KR, McDonell MB, Malone T, Fihn SD. Screening for problem drinking: comparison of CAGE and AUDIT: Ambulatory Care Quality Improvement Project (ACQUIP). J Gen Intern Med 1998;13:379-88.
29. Morton JL, Jones TV, Manganaro MA. Performance of alcoholism screening questionnaires in elderly veterans. Am J Med 1996;101:153-59.
30. MacKenzie D, Langa A, Brown TM. Identifying hazardous or harmful alcohol use in medical admissions: a comparison of AUDIT, CAGE and brief MAST. Alcohol Alcoholism 1996;31:591-99.
31. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med 1991;115:774-77.
32. Seppa K, Makela R, Sillanaukee P. Effectiveness of the Alcohol Use Disorders Identification Test in occupational health screenings. Alcoholism 1995;19:999-1003.
33. Piccinelli M, Tessari E, Bortolomasi M, et al. Efficacy of the alcohol use disorders identification test as a screening tool for hazardous alcohol intake and related disorders in primary care: a validity study. BMJ 1997;314:420-24.
34. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT alcohol consumption questions (AUDIT-C): an effective brief screening test for problem drinking. Ambulatory Care Quality Improvement Project (ACQUIP). Arch Intern Med 1998;158:1789-95.
35. van Kammen DP, Kelley ME, Gurklis JA, et al. Behavioral vs biochemical prediction of clinical stability following haloperidol withdrawal in schizophrenia. Arch Gen Psychiatry 1995;52:673-78.
36. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver-operating characteristic (ROC) curve. Radiology 1982;143:29-36.
37. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839-43.
38. Conigrave KM, Hall WD, Saunders JB. The AUDIT questionnaire: choosing a cut-off score. Alcohol Use Disorder Identification Test. Addiction 1995;90:1349-56.
39. Cherpitel CJ, Clark WB. Ethnic differences in performance of screening instruments for identifying harmful drinking and alcohol dependence in the emergency room. Alcoholism 1995;19:628-34.
40. Reid MC, Fiellin DA, O’Connor PG. Hazardous and harmful alcohol consumption in primary care. Arch Intern Med 1999;159:1681-89.
41. Fink A, Hays RD, Moore AA, Beck JC. Alcohol-related problems in older persons. Determinants, consequences, and screening. Arch Intern Med 1996;156:1150-56.
42. Bradley KA, Boyd-Wickizer J, Powell SH, Burman ML. Alcohol screening questionnaires in women: a critical review. JAMA 1998;280:166-71.
STUDY DESIGN: Cross-sectional survey.
POPULATION: Patients waiting for care at 12 primary care sites in western Pennsylvania from October 1995 to December 1997.
OUTCOMES MEASURED: Sensitivity, specificity, likelihood ratios, and predictive values for the AUDIT, AUDIT-C, and AUDIT-3.
RESULTS: A total of 13,438 patients were surveyed. Compared with a quantity-frequency definition of hazardous drinking (≥16 drinks/week for men and ≥12 drinks/week for women), the AUDIT, AUDIT-C, and AUDIT-3 had areas under the receiver-operating characteristic curves (AUROC) of 0.940, 0.949, and 0.871, respectively. The AUROCs of the AUDIT and AUDIT-C were significantly different (P=.004). The AUROCs of the AUDIT-C (P <.001) and AUDIT (P <.001) were significantly larger than that of the AUDIT-3. When compared with a positive AUDIT score of 8 or higher, the AUDIT-C (score ≥3) and the AUDIT-3 (score ≥1) were 94.9% and 99.6% sensitive and 68.8% and 51.1% specific in detecting individuals as hazardous drinkers.
CONCLUSIONS: In a large primary care sample, a 3-question version of the AUDIT identified hazardous drinkers as well as the full AUDIT when such drinkers were defined by quantity-frequency criterion. This version of the AUDIT may be useful as an initial screen for assessing hazardous drinking behavior.
Hazardous drinkers consume enough alcohol to be at risk for adverse consequences but do not meet criteria for alcohol abuse or dependence. They are, however, at risk for more harmful alcohol abuse.1-5 Such drinking behavior has been defined by quantity and frequency criteria.6 It is estimated that up to 20% of primary care patients are at least hazardous drinkers.7-9 Effective interventions to reduce alcohol consumption exist in primary care settings, so it is important for care providers to reliably and efficiently identify patients who are hazardous drinkers.1,10,11 Traditionally,12-14 care providers are poor at identifying such drinkers, and as many as 72% escape their detection.15-17 This ineffectiveness may result from a lack of brief and simple questions that aid in patient identification.18-20
Formal screening instruments have been promoted to aid in identification of patients with alcohol problems. The Alcohol Use Disorders Identification Test (AUDIT) was developed by the World Health Organization (WHO) and consists of 2 distinct instruments: a 10-item AUDIT core questionnaire and a clinical screening procedure.1,9,21 The AUDIT core questions can detect hazardous drinkers and have been used alone as a screening instrument.22 The AUDIT questions address intake, dependence, and adverse consequences of drinking,23 emphasize drinking in the past year,5,24 and are indifferent to sex or ethnicity.4,25 It is most useful at detecting drinkers who do not meet criteria for alcohol abuse or dependence.26 Because of its ability to detect less severe alcohol drinkers, the AUDIT seems to have practical value in primary care settings.4,15,21,26
Because of trends toward shorter patient visits, the 10-question AUDIT may be too lengthy to be clinically useful in primary care settings.5,27,28 The shorter CAGE questionnaire, therefore, is often recommended for use in limited time situations.18,20 However, although the CAGE is a valuable tool for identifying alcohol abuse and dependence, it is not as useful for identifying less serious behaviors, such as hazardous drinking.5,28-32
A shorter version of the AUDIT may prove beneficial for use by the busy physician for identifying hazardous drinking behavior. The AUDIT-C (consisting of the first 3 questions of the AUDIT) was shown to be as effective as the full AUDIT in detecting hazardous drinking in a population of veterans.33 Also, the AUDIT-3 (the third question of the AUDIT) may be effective for identifying hazardous drinkers.5,34
We investigated the performance of the AUDIT, AUDIT-C, and AUDIT-3 in detecting such drinkers in a large primary care sample. We also compared the AUDIT-C and the AUDIT-3 to the full AUDIT. We hypothesized that the abbreviated instruments would be comparable with the AUDIT for detecting hazardous drinkers as defined by a quantity-frequency standard.
Methods
Design
We based our study on screening data obtained as part of a large randomized clinical trial of brief interventions for hazardous drinkers (the Early Lifestyle Modification [ELM] Study). Screening forms were administered at 12 primary care sites in the western Pennsylvania area from October 1995 to December 1997. The institutional review board at the University of Pittsburgh and equivalent review groups from each primary care setting approved the ELM study and screening protocol.
Setting
The 12 primary care sites included a Veterans Affairs Medical Center internal medicine clinic, a university-based internal medicine clinic, 2 university-affiliated community care clinics, 3 health maintenance organization clinics, 3 university-affiliated family medicine clinics, and 2 private practice family medicine clinics. All clinics were staffed by physicians. The Veterans Administration and university-based clinics had internal medicine residents participating in patient care. Also, physician assistants and nurse practitioners were involved with primary care at some sites.
Patients
Patients were eligible for the study if they were in the waiting rooms of one of the clinic sites during the screening period, were approached by a research assistant in the waiting room, and agreed to answer the screening questionnaire. Screening of eligible patients occurred from October 1995 to December 1997.
Screening Instruments
In the primary care clinics, patients self-administered an 8-page survey consisting of questions about lifestyle habits. Research assistants approached as many patients in the waiting rooms of the primary care sites as possible. The survey included questions about stress management (8 questions), smoking habits (8), the AUDIT (10), and quantity-frequency questions (3). In initial surveys, just before the alcohol-related questions patients were instructed not to answer alcohol questions if they responded “never” to the following question: “How often do you have a drink containing alcohol (for example: beer, wine, wine coolers, sherry, gin, vodka, or other hard liquor)?” This question was removed in later surveys, because it limited the number of people responding to the alcohol instruments under comparison. We also asked for categorical responses to questions about age, sex, education background, race, marital status, and occupation. We did not compensate patients for completing this questionnaire, and their responses were anonymous.
Alcohol Screening Instruments
The AUDIT consists of 10 questions (Figure 1). The AUDIT-C includes the following first 3 questions of the AUDIT: How often do you have a drink containing alcohol? How many drinks containing alcohol do you have on a typical day when you are drinking? How often do you have 6 or more drinks on one occasion? The AUDIT-3 is the third question alone. Each individual question is scored from 0 to 4 points on a Likert scale, with higher numbers indicating more severe drinking behavior. For AUDIT questions with only 3 possible answers (questions 9 and 10), the scores were 0, 2, and 4. Thus, the range of possible scores is 0 to 40 on the AUDIT, 0 to 12 on the AUDIT-C, and 0 to 4 on the AUDIT-3.
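The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study; the function name `audit_scores` and the list-based input format are assumptions made for the example.

```python
def audit_scores(item_scores):
    """Return AUDIT, AUDIT-C, and AUDIT-3 scores from the 10 item scores.

    item_scores: list of 10 integers, one per AUDIT question, in order.
    (Hypothetical input format; the paper does not describe its data layout.)
    """
    if len(item_scores) != 10:
        raise ValueError("the AUDIT has exactly 10 questions")
    for number, score in enumerate(item_scores, start=1):
        # Questions 9 and 10 have only 3 possible answers, scored 0, 2, or 4.
        allowed = (0, 2, 4) if number in (9, 10) else (0, 1, 2, 3, 4)
        if score not in allowed:
            raise ValueError(f"question {number}: invalid score {score}")
    return {
        "AUDIT": sum(item_scores),        # all 10 questions, range 0 to 40
        "AUDIT-C": sum(item_scores[:3]),  # first 3 questions, range 0 to 12
        "AUDIT-3": item_scores[2],        # third question alone, range 0 to 4
    }


# Illustrative responses only, not study data.
example = audit_scores([2, 1, 1, 0, 0, 0, 0, 0, 2, 0])
print(example)
```

Because the abbreviated scores are computed from the same responses as the full AUDIT, one administration of the 10 questions yields all three instruments.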
Our quantity-frequency questionnaire consisted of the questions: “If you drink, how many days per week do you have a drink?” (answers: 1 through 7); “If you drink less than once a week, how many days per month do you drink?” (answers: 1 or less, 2, 3, or 4 or more); and “How many drinks containing alcohol do you have on a typical day when you are drinking?” (answers: 1 through 10 or more). The number of average drinks per week was computed for analysis, and if the answer was “or more” we used the maximum number indicated in the question (4 or 10).
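The weekly consumption computation can be illustrated with a short Python sketch. The paper does not state how monthly drinking days were converted to a weekly figure, so the conversion factor below (52/12 weeks per month) is an assumption, as are the function names. The thresholds of 16 drinks per week for men and 12 for women are the study's hazardous drinking criterion.

```python
WEEKS_PER_MONTH = 52 / 12  # assumed conversion; not stated in the paper


def drinks_per_week(drinks_per_day, days_per_week=None, days_per_month=None):
    """Average drinks per week from the quantity-frequency answers.

    "Or more" answers should already be capped by the caller at the maximum
    listed in the question (4 days per month, 10 drinks per day).
    """
    if days_per_week is not None:
        drinking_days = days_per_week
    elif days_per_month is not None:
        drinking_days = days_per_month / WEEKS_PER_MONTH
    else:
        raise ValueError("a frequency answer is required")
    return drinking_days * drinks_per_day


def is_hazardous(weekly_drinks, sex):
    """Apply the study criterion: >=16 drinks/week (men), >=12 (women)."""
    threshold = {"male": 16, "female": 12}[sex]
    return weekly_drinks >= threshold


# Illustrative respondent: 4 drinking days per week, 4 drinks per day.
weekly = drinks_per_week(4, days_per_week=4)
print(weekly, is_hazardous(weekly, "male"))
```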
Outcome Measures and Analysis
The main outcome measures concerned the accuracy of the full AUDIT, AUDIT-C, and AUDIT-3. For comparative purposes, patients with missing responses were not used in subsequent statistical analyses.
The AUDIT, AUDIT-C, and AUDIT-3 were compared with a hazardous drinking criterion defined by the quantity-frequency questions. This criterion was 16 or more drinks per week for men and 12 or more drinks per week for women.6 Clinically, quantity-frequency assessment is often used to determine hazardous drinking behavior.16 The area under the receiver-operating characteristic curve (AUROC) was used to compare each instrument’s diagnostic ability.35 AUROC is an indication of the ability of a test to discriminate between patients with and without the target condition (here, hazardous and nonhazardous drinkers). A test with an AUROC approaching 1 will be more sensitive and specific over a range of cutoff points than a test with an AUROC of 0.5, which is nondiscriminating. To calculate and compare the AUROCs we used published methods developed by Hanley and McNeil36,37 for curves derived from the same cases. We performed chi-square analysis on comparisons of categorical data.
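As an illustration of what the AUROC measures, the sketch below computes it directly from its probabilistic interpretation: the chance that a randomly chosen hazardous drinker has a higher screening score than a randomly chosen nonhazardous drinker, with ties counted as one half. This is not the authors' code; the function name and the toy scores are assumptions for the example.

```python
def auroc(positive_scores, negative_scores):
    """Area under the ROC curve by pairwise comparison of scores.

    positive_scores: screening scores of patients who meet the criterion.
    negative_scores: screening scores of patients who do not.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


# Toy scores, not study data.
print(auroc([3, 4, 2], [0, 1, 2]))
```

The pairwise loop is quadratic in the number of patients; for a sample as large as the one in this study, a rank-based computation would be the more practical choice.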
We then compared the AUDIT-C and AUDIT-3 with a score of the full AUDIT as a criterion for hazardous drinking. An AUDIT result of 8 or higher was accepted as a criterion for hazardous drinking. We measured the sensitivity and specificity of the AUDIT-C, using a cutoff score of 3 or higher and AUDIT-3 cutoff score of 1 or higher compared with the criterion of hazardous drinking defined by the full AUDIT.38
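The comparison described above amounts to a 2×2 classification at a fixed cutoff. The following Python sketch shows the calculation under stated assumptions: the function name and the toy scores are hypothetical, and the reference standard is taken to be a full AUDIT score of 8 or higher, as in the text.

```python
def sensitivity_specificity(scores, reference_positive, cutoff):
    """Sensitivity and specificity of `score >= cutoff` against a reference.

    scores: screening scores (for example, AUDIT-C totals).
    reference_positive: booleans, True if the reference standard is positive.
    """
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, reference_positive):
        screen_positive = score >= cutoff
        if is_positive and screen_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif screen_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, not study data: AUDIT-C and full AUDIT scores for 6 patients.
audit_c = [0, 3, 5, 2, 7, 1]
full_audit = [1, 4, 9, 3, 12, 8]
reference = [score >= 8 for score in full_audit]  # full AUDIT of 8 or higher
sens, spec = sensitivity_specificity(audit_c, reference, cutoff=3)
print(sens, spec)
```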
Results
At least 1 question on the screening survey was answered by 13,438 patients. Overall, 13,198 (98%) either answered that they currently drink or never drank alcohol. Not all questions were answered by each individual. Of patients who indicated that they currently drink alcohol, the full AUDIT was completed by 7035 (52%). The AUDIT-C was completed by 7190 (54%) patients, and the AUDIT-3 was answered by 7303 (54%) patients. Both the entire AUDIT and quantity-frequency questions were answered by 6954 (52%) patients. Of the 13,438 patients, 36% indicated no alcohol consumption. The majority of the surveyed sample were men, white, married, employed, high school graduates, and younger than 60 years (Table 1).
AUDITs Compared with Quantity-Frequency Criterion
Table 2 compares the likelihood of hazardous alcohol use, defined by a quantity-frequency criterion, associated with each score of the AUDIT, AUDIT-C, and AUDIT-3. It is important to note that at each score the true positives of screening are only persons identified at that score. For example, at an AUDIT score of 7, 53 of 754 hazardous drinkers were identified with the resulting likelihood ratio (3.0) and predictive value (26.9%).
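The score-specific likelihood ratio and predictive value in this example can be written out as a small calculation. In the sketch below, the 53 of 754 hazardous drinkers at an AUDIT score of 7 come from the text; the nonhazardous counts (144 of 6200) are illustrative values chosen only to be roughly consistent with the reported figures and are not data from the study. The function name is likewise an assumption.

```python
def score_specific_stats(pos_at_score, total_pos, neg_at_score, total_neg):
    """Likelihood ratio and predictive value for one exact score.

    pos_at_score / total_pos: hazardous drinkers with this score / overall.
    neg_at_score / total_neg: nonhazardous drinkers with this score / overall.
    """
    # Likelihood ratio: how much more often this score occurs among
    # hazardous drinkers than among nonhazardous drinkers.
    likelihood_ratio = (pos_at_score / total_pos) / (neg_at_score / total_neg)
    # Predictive value: share of patients at this score who are hazardous.
    predictive_value = pos_at_score / (pos_at_score + neg_at_score)
    return likelihood_ratio, predictive_value


# 53 and 754 are from the text; 144 and 6200 are illustrative assumptions.
lr, pv = score_specific_stats(53, 754, 144, 6200)
print(round(lr, 1), round(100 * pv, 1))
```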
For comparisons of the AUDIT, AUDIT-C, and AUDIT-3 at identifying hazardous drinkers who scored at or greater than a minimal cutoff, the sensitivity and specificity compared with a quantity-frequency criterion are shown in Table 3. For cut-point values of an AUDIT score of 8 or higher, the sensitivity of the AUDIT was 76%. Similarly, the sensitivities of the AUDIT-C (with score ≥3) and AUDIT-3 (with score ≥1) were 99.6% and 89.1%, as sensitive as the quantity-frequency questions in detecting these patients. Specificity of the AUDIT, AUDIT-C, and AUDIT-3 at these cutoff values was 92%, 48%, and 65%, respectively.
AUROCs were constructed from all cut-point values (Figure 2). Computation of the AUROC indicates how effectively the instrument discriminates hazardous drinkers over a range of AUDIT scores. The AUROCs for the AUDIT, AUDIT-C, and AUDIT-3 were significantly more discriminating than the line of identity (AUROC=0.5). The AUROC of the AUDIT was significantly different from that of the AUDIT-C (z=2.69; P=.004). The AUDIT-3 AUROC was significantly different from those of the AUDIT (z=10.03; P <.001) and AUDIT-C (z=12.69; P <.001).
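The z statistics reported here come from the Hanley and McNeil approach to comparing two AUROCs derived from the same cases. A minimal Python sketch of that calculation follows, assuming the standard error approximation from their 1982 paper and the correlated-areas z statistic from their 1983 paper. The correlation `r` between the two areas is obtained from a table in the 1983 paper and is simply passed in as a parameter here. The sample sizes and the value of `r` below are illustrative assumptions, so the output is not expected to reproduce the reported z values.

```python
import math


def auroc_standard_error(area, n_pos, n_neg):
    """Approximate standard error of an AUROC (Hanley and McNeil, 1982)."""
    q1 = area / (2 - area)
    q2 = 2 * area ** 2 / (1 + area)
    variance = (
        area * (1 - area)
        + (n_pos - 1) * (q1 - area ** 2)
        + (n_neg - 1) * (q2 - area ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_aurocs(area1, area2, se1, se2, r):
    """z statistic for two AUROCs from the same cases (Hanley and McNeil, 1983).

    r: estimated correlation between the two areas (from the published table).
    """
    return (area1 - area2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


# Areas are from the text; sample sizes and r are illustrative assumptions.
se_audit = auroc_standard_error(0.940, n_pos=754, n_neg=6200)
se_audit_3 = auroc_standard_error(0.871, n_pos=754, n_neg=6200)
z = compare_aurocs(0.940, 0.871, se_audit, se_audit_3, r=0.5)
print(se_audit, se_audit_3, z)
```

Accounting for the correlation matters because both curves are built from the same patients; treating the areas as independent (r = 0) would inflate the denominator and understate z.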
Abbreviated AUDITs Compared with Full AUDIT Criterion
The full AUDIT is often used as a standard to assess hazardous drinking. We compared the abbreviated instruments to the full AUDIT, with a positive score of 8 or higher as a criterion for such drinking. The AUDIT-C (score ≥3) and AUDIT-3 (score ≥1) were 94.9% and 99.9% as sensitive and 68.8% and 51.1% as specific as the full AUDIT in obtaining a positive score. We also determined the performance of the AUDIT-3 when compared with a reference standard of a positive AUDIT-C. The AUDIT-3 (score ≥1) was 69% as sensitive and 95% as specific as the AUDIT-C (score ≥3) at identifying hazardous drinkers (data not shown).
Discussion
We evaluated the performance of the AUDIT and abbreviated AUDIT instruments to detect hazardous drinking in a large multisite primary care sample. The abbreviated forms of the AUDIT were as effective as the AUDIT at identifying hazardous drinkers. Compared with quantity-frequency questions, the AUDIT and AUDIT-C were superior to the AUDIT-3 at identifying hazardous drinkers. The abbreviated forms of the AUDIT were as sensitive as the full AUDIT at detecting hazardous drinkers when using standard cutoff values for hazardous drinking.
As with the 4-item CAGE questionnaire for alcohol dependence, a 1- or 3-item AUDIT instrument may increase care providers’ recognition of hazardous drinkers. Providers do not routinely ask standard alcohol questions, are particularly poor at identifying hazardous drinkers, and do not enter patients into alcohol treatment.16 Therefore, providing clinicians with a few easily remembered questions to determine hazardous drinking behavior would be beneficial.39 A short questionnaire would be simple to administer and applicable in a wide variety of practice settings. A positive response would increase suspicion regarding hazardous or abusive drinking behavior and prompt additional questions about patients’ alcohol use.18,40 For example, care providers could use the AUDIT-3 or AUDIT-C (to detect at least hazardous drinking), then administer the questionnaire (to detect abuse and dependence), if the patient’s response was positive.
It is important to realize that the AUDIT and its abbreviated forms are only sensitive to detect hazardous drinking, not to specifically assign patients’ drinking habits as hazardous only. The AUDIT was originally designed to distinguish a person with hazardous drinking from one with nonhazardous drinking.26 As such, this instrument may not be specific enough to distinguish hazardous drinkers from others with severe alcohol behaviors; people who score positive may qualify for alcohol abuse and dependence. For less risky alcohol behaviors, it is more important for a health care provider to identify all hazardous drinkers (true positives), at the risk of falsely identifying a person who may not have this behavior (false positives).18 Therefore, when screening to establish a threshold level of treatment intervention, screening instruments should maximize sensitivity, even at the expense of low specificity.
In our sample, the AUDIT (score ≥8), AUDIT-C (score ≥3), and AUDIT-3 (score ≥1) were as sensitive as quantity-frequency questions in detecting hazardous drinking. This increase in sensitivity of the AUDIT-C and AUDIT-3 is likely related to the consumption focus of these questions. The AUDIT-C consists of quantity and frequency type questions, and the third question is specific for quantity of drinking at one session. However, the performance of AUDIT instruments in our study is comparable and confirms results found in a study by Bush and colleagues,34 even though the studies used different criteria for assessment of abbreviated instruments.
If the AUDIT is used as a standard to detect hazardous drinkers, would the AUDIT-C or AUDIT-3 identify the same patients as the full AUDIT? Using cutoff points for the AUDIT-C of 3 or higher and an AUDIT-3 score of 1 or higher, these instruments were 99.7% and 98.3% as sensitive as the full AUDIT. As expected, specificity is much less for both abbreviated instruments. However, the high sensitivity suggests a clinical utility for these abbreviated instruments. It is unlikely that by asking the 3 AUDIT-C questions or the single AUDIT-3 question, a primary care provider will miss identification of a person who is at least a hazardous drinker.
Limitations
There is no gold standard to identify hazardous drinkers.41 The definition of what level of drinking constitutes that label is controversial, and providers do not routinely ask standard questions about drinking behavior.16 Our criterion, the quantity-frequency questions, may be considered a poor standard to compare survey instruments.7,42 In research, quantity-frequency consumption questions are helpful in specific identification of hazardous drinkers when a cutoff value is defined.5 However, patients may prefer not to answer questions about quantity or frequency of alcohol use and may not respond consistently to heterogeneous provider questions. Therefore, quantity-frequency questions may be useful as a standard to compare similar instruments such as the AUDIT and its abbreviations, although they may not be particularly effective in clinical practice.
Not all surveyed individuals completed the full AUDIT instrument. This was primarily because patients who answered “never” to our initial question regarding any alcohol use did not proceed to the AUDIT. Before completion of the study, we eliminated this question from the survey. However, it is not known whether patients who answered “never” to the initial question were, in fact, drinkers. Consistency response bias may also have occurred, as patients may have wished to answer similar items similarly. In addition, the similarity of the AUDIT-C and AUDIT-3 to quantity-frequency questions suggests that our sensitivity estimates are perhaps only upper bounds for the briefer instruments.
We derived the abbreviated tests directly from the AUDIT, thus no assessment of the instruments out of the context of the full AUDIT was performed. Independent testing of abbreviated AUDIT instruments is needed. Recruitment was conducted by research assistants who solicited and provided forms to patients in the waiting rooms of primary care clinics. This convenience sample may have led to a selection bias in obtaining survey data. Patient recall bias may also have affected survey answers as patients may have had difficulty answering quantity and frequency questions accurately (an advantage of the AUDIT over quantity-frequency type questions). Also, our study investigates identification of individuals who are at least hazardous drinkers, but may also be abusive or dependent. We did not study the instruments’ ability to distinguish between hazardous drinking and abuse or dependence.
Conclusions
Our results confirm that the AUDIT-C and AUDIT-3 are useful screening tests for hazardous drinking. Because treatment of such drinkers can be effective, identifying people with less severe alcohol problems is crucial and an important public health initiative.21 Abbreviated instruments identify hazardous drinkers quickly, efficiently, and effectively, and may encourage early treatment to prevent the occurrence of alcohol-related consequences, abuse, or dependence. We recommend using the AUDIT-C and CAGE as brief screening instruments for hazardous drinking and alcohol abuse and dependence. This approach warrants further investigation.
Acknowledgments
Our work was supported by a grant to Dr Maisto from the National Institute of Alcohol Abuse and Alcoholism (AA10291). Dr Gordon is supported by a faculty development grant in general internal medicine from the VA Pittsburgh Healthcare System and the VISN 4 Mental Illness Research, Education, and Clinical Center. Dr Kraemer is supported by a Mentored Clinical Scientist Development Award from the National Institute of Alcohol Abuse and Alcoholism (AA00235). Dr J. Conigliaro is supported by a Career Development Award from the HSR & D Service, Department of Veterans Affairs (CD-97324-A) and is a generalist physician faculty scholar of the Robert Wood Johnson Foundation (#031500). We thank the ELM research study staff, Monica O’Connor (the ELM project coordinator), and all the patients who participated in the ELM study.
STUDY DESIGN: Cross-sectional survey.
POPULATION: Patients waiting for care at 12 primary care sites in western Pennsylvania from October 1995 to December 1997.
OUTCOMES MEASURED: Sensitivity, specificity, likelihood ratios, and predictive values for the AUDIT, AUDIT-C, and AUDIT-3.
RESULTS: A total of 13,438 patients were surveyed. Compared with a quantity-frequency definition of hazardous drinking (Ž16 drinks/week for men and Ž12 drinks/week for women), the AUDIT, AUDIT-C, and AUDIT-3 had areas under the receiver-operating characteristic curves (AUROC) of 0.940, 0.949, and 0.871, respectively. The AUROCs of the AUDIT and AUDIT-C were significantly different (P=.004). The AUROCs of the AUDIT-C (P <.001) and AUDIT (P <.001) were significantly larger than the AUDIT-3. When compared with a positive AUDIT score of 8 or higher, the AUDIT-C (score Ž3) and the AUDIT-3 (score Ž1) were 94.9% and 99.6% sensitive and 68.8% and 51.1% specific in detecting individuals as hazardous drinkers.
CONCLUSIONS: In a large primary care sample, a 3-question version of the AUDIT identified hazardous drinkers as well as the full AUDIT when such drinkers were defined by quantity-frequency criterion. This version of the AUDIT may be useful as an initial screen for assessing hazardous drinking behavior.
Hazardous drinkers consume enough alcohol to be at risk for adverse consequences but do not meet criteria for alcohol abuse or dependence. They are, however, are at risk for more harmful alcohol abuse.1-5 Such drinking behavior has been defined by quantity and frequency criteria.6 It is estimated that up to 20% of primary care patients are at least hazardous drinkers.7-9 Effective interventions to reduce alcohol consumption exist in primary care settings, so it is important for care providers to reliably and efficiently identify patients who are hazardous drinkers.1,10,11 Traditionally,12-14 care providers are poor at identifying such drinkers, and as many as 72% escape their detection.15-17 This ineffectiveness may be because of a lack of brief and simple questions that aid in patient identification.18-20
Formal screening instruments have been promoted to aid in identification of patients with alcohol problems. The Alcohol Use Disorders Identification Test (AUDIT) was developed by the World Health Organization (WHO) and consists of 2 distinct instruments: a 10-item AUDIT core questionnaire and a clinical screening procedure.1,9,21 The AUDIT core questions can detect hazardous drinkers and have been used alone as a screening instrument.22 The AUDIT questions address intake, dependence, and adverse consequences of drinking,23 emphasize drinking in the past year,5,24 and are indifferent to sex or ethnicity.4,25 It is most useful at detecting drinkers who do not meet criteria for alcohol abuse or dependence.26 Because of its ability to detect less severe alcohol drinkers, the AUDIT seems to have practical value in primary care settings.4,15,21,26
Because of trends toward shorter patient visits, the 10-question AUDIT may be too lengthy to be clinically useful in primary care settings.5,27,28 The shorter CAGE questionnaire, therefore, is often recommended for use in limited time situations.18,20 However, although the CAGE is a valuable tool for identifying alcohol abuse and dependence, it is not as useful for identifying less serious behaviors, such as hazardous drinking.5,28-32
A shorter version of the AUDIT may prove beneficial for use by the busy physician for identifying hazardous drinking behavior. The AUDIT-C (consisting of the first 3 questions of the AUDIT) was shown to be as effective as the full AUDIT in detecting hazardous drinking in a population of veterans.33 Also, the AUDIT-3 (the third question of the AUDIT) may be effective for identifying hazardous drinkers.5,34
We investigated the performance of the AUDIT, AUDIT-C, and AUDIT-3 in detecting such drinkers in a large primary care sample. We also compared the AUDIT-C and the AUDIT-3 to the full AUDIT. We hypothesized that the abbreviated instruments would be comparable with the AUDIT for detecting hazardous drinkers as defined by a quantity-frequency standard.
Methods
Design
We based our study on screening data obtained as part of a large randomized clinical trial of brief interventions for hazardous drinkers (the Early Lifestyle Modification [ELM] Study). Screening forms were administered at 12 primary care sites in the western Pennsylvania area from October 1995 to December 1997. The institutional review board at the University of Pittsburgh and equivalent review groups from each primary care setting approved the ELM study and screening protocol.
Setting
The 12 primary care sites included a Veterans Affairs Medical Center internal medicine clinic, a university-based internal medicine clinic, 2 university-affiliated community care clinics, 3 health maintenance organization clinics, 3 university-affiliated family medicine clinics, and 2 private practice family medicine clinics. All clinics were staffed by physicians. The Veterans Administration and university-based clinics had internal medicine residents participating in patient care. Also, physician assistants and nurse practitioners were involved with primary care at some sites.
Patients
Patients were eligible for the study if they were in the waiting rooms of one of the clinic sites during the screening period, were approached by a research assistant in the waiting room, and agreed to answer the screening questionnaire. Screening of eligible patients occurred from October 1995 to December 1997.
Screening Instruments
In the primary care clinics, patients self-administered an 8-page survey consisting of questions about lifestyle habits. Research assistants approached as many patients in the waiting rooms of the primary care sites as possible. The survey included questions about stress management (8 questions), smoking habits (8), the AUDIT (10), and quantity-frequency questions (3). In initial surveys, just before the alcohol-related questions patients were instructed not to answer alcohol questions if they responded “never” to the following question: “How often do you have a drink containing alcohol (for example: beer, wine, wine coolers, sherry, gin, vodka, or other hard liquor)?” This question was removed in later surveys, because it limited the number of people responding to the alcohol instruments under comparison. We also asked for categorical responses to questions about age, sex, education background, race, marital status, and occupation. We did not compensate patients for completing this questionnaire, and their responses were anonymous.
Alcohol Screening Instruments
The AUDIT consists of 10 questions Figure 1. The AUDIT-C includes the following first 3 questions of the AUDIT: How often do you have a drink containing alcohol? How many drinks containing alcohol do you have on a typical day when you are drinking? How often do you have 6 or more drinks on one occasion? The AUDIT-3 is the third question alone. Each individual question is scored from 0 to 4 points on a Likert scale, with higher numbers indicating more severe drinking behavior. For AUDIT questions with only 3 possible answers (questions 9 and 10), the scores were 0, 2, and 4. Thus, the range of possible scores on the AUDIT are from 0 to 40, the AUDIT-C from 0 to 12, and the AUDIT-3 from 0 to 4.
Our quantity-frequency questionnaire consisted of the questions: “If you drink, how many days per week do you have a drink?” (answers: 1 through 7); “If you drink less than once a week, how many days per month do you drink?” (answers: 1 or less, 2, 3, or 4 or more); and “How many drinks containing alcohol do you have on a typical day when you are drinking?” (answers: 1 through 10 or more). The number of average drinks per week was computed for analysis, and if the answer was “or more” we used the maximum number indicated in the question (4 or 10).
Outcome Measures and Analysis
The main outcome measures concerned the accuracy of the full AUDIT, AUDITC, and AUDIT-3. For comparative purposes, patients with missing responses were not used in subsequent statistical analyses.
The AUDIT, AUDIT-C, and AUDIT-3 were compared with a hazardous drinking criterion defined by the quantity-frequency questions. This criterion was 16 or more drinks per week for men and 12 or more drinks per week for women.6 Clinically, quantity-frequency assessment is often used to determine hazardous drinking behavior.16 The area under the receiver-operating characteristic curve (AUROC) was used to compare each instrument’s diagnostic ability.35 AUROC is an indication of the ability of a test to discriminate between false positives and false negatives. A score approaching 1 will be more sensitive and specific over a range of cutoff points than a score of 0.5, which is a nondiscriminating test. To calculate and compare the AUROCs we used published standards developed by Hanley and McNeil36,37 for curves derived by same cases. We performed chi-square analysis on comparisons of categorical data.
We then compared the AUDIT-C and AUDIT-3 with a score of the full AUDIT as a criterion for hazardous drinking. An AUDIT result of 8 or higher was accepted as a criterion for hazardous drinking. We measured the sensitivity and specificity of the AUDIT-C, using a cutoff score of 3 or higher and AUDIT-3 cutoff score of 1 or higher compared with the criterion of hazardous drinking defined by the full AUDIT.38
Results
At least 1 question on the screening survey was answered by 13,438 patients. Overall, 13,198 (98%) either answered that they currently drink or never drank alcohol. Not all questions were answered by each individual. Of patients who indicated that they currently drink alcohol, the full AUDIT was completed by 7035 (52%). The AUDIT-C was completed by 7190 (54%) patients, and the AUDIT-3 was answered by 7303 (54%) patients. Both the entire AUDIT and quantity-frequency questions were answered by 6954 (52%) patients. Of the 13,438 patients, 36% indicated no alcohol consumption. The majority of the surveyed sample were men, white, married, employed, high school graduates, and younger than 60 years (Table 1).
AUDITs Compared with Quantity-Frequency Criterion
Table 2 compares the likelihood of hazardous alcohol use, defined by a quantity-frequency criterion, associated with each score of the AUDIT, AUDIT-C, and AUDIT-3. It is important to note that at each score the true positives of screening are only persons identified at that score. For example, at an AUDIT score of 7, 53 of 754 hazardous drinkers were identified with the resulting likelihood ratio (3.0) and predictive value (26.9%).
For comparisons of the AUDIT, AUDIT-C, and AUDIT-3 at identifying hazardous drinkers who scored at or above a minimal cutoff, the sensitivity and specificity compared with a quantity-frequency criterion are shown in Table 3. At a cutoff AUDIT score of 8 or higher, the sensitivity of the AUDIT was 76%. Similarly, the sensitivities of the AUDIT-C (with score ≥3) and AUDIT-3 (with score ≥1) were 99.6% and 89.1%, as sensitive as the quantity-frequency questions in detecting these patients. Specificity of the AUDIT, AUDIT-C, and AUDIT-3 at these cutoff values was 92%, 48%, and 65%, respectively.
AUROCs were constructed from all cut-point values (Figure 2). Computation of the AUROC indicates the effectiveness of the instrument in discriminating hazardous drinkers over a range of AUDIT scores. The AUROCs for the AUDIT, AUDIT-C, and AUDIT-3 were significantly more discriminating than the line of identity (AUROC=0.5). The AUROC of the AUDIT was significantly different from that of the AUDIT-C (z=2.69; P=.004). The AUDIT-3 AUROC was significantly different from those of the AUDIT (z=10.03; P <.001) and AUDIT-C (z=12.69; P <.001).
Abbreviated AUDITs Compared with Full AUDIT Criterion
The full AUDIT is often used as a standard to assess hazardous drinking. We compared the abbreviated instruments to the full AUDIT, with a positive score of 8 or higher as a criterion for such drinking. The AUDIT-C (score ≥3) and AUDIT-3 (score ≥1) were 94.9% and 99.9% as sensitive and 68.8% and 51.1% as specific as the full AUDIT in obtaining a positive score. We also determined the performance of the AUDIT-3 when compared with a reference standard of a positive AUDIT-C. The AUDIT-3 (score ≥1) was 69% as sensitive and 95% as specific as the AUDIT-C (score ≥3) at identifying hazardous drinkers (data not shown).
Discussion
We evaluated the performance of the AUDIT and abbreviated AUDIT instruments in detecting hazardous drinking in a large multisite primary care sample. The abbreviated forms of the AUDIT were as effective as the AUDIT at identifying hazardous drinkers. Compared with quantity-frequency questions, the AUDIT and AUDIT-C were superior to the AUDIT-3 at identifying hazardous drinkers. The abbreviated forms of the AUDIT were as sensitive as the full AUDIT at detecting hazardous drinkers when using standard cutoff values for hazardous drinking.
As with the 4-item CAGE questionnaire for alcohol dependence, a 1- or 3-item AUDIT instrument may increase care providers’ recognition of hazardous drinkers. Providers do not routinely ask standard alcohol questions, are particularly poor at identifying hazardous drinkers, and do not enter patients into alcohol treatment.16 Therefore, providing clinicians with a few easily remembered questions to determine hazardous drinking behavior would be beneficial.39 A short questionnaire would be simple to administer and applicable in a wide variety of practice settings. A positive response would increase suspicion regarding hazardous or abusive drinking behavior and prompt additional questions about patients’ alcohol use.18,40 For example, care providers could use the AUDIT-3 or AUDIT-C (to detect at least hazardous drinking) and then, if the patient’s response was positive, administer the CAGE questionnaire (to detect abuse and dependence).
It is important to realize that the AUDIT and its abbreviated forms are only sensitive to detect hazardous drinking, not to specifically assign patients’ drinking habits as hazardous only. The AUDIT was originally designed to distinguish a person with hazardous drinking from one with nonhazardous drinking.26 As such, this instrument may not be specific enough to distinguish hazardous drinkers from others with severe alcohol behaviors; people who score positive may qualify for alcohol abuse and dependence. For less risky alcohol behaviors, it is more important for a health care provider to identify all hazardous drinkers (true positives), at the risk of falsely identifying a person who may not have this behavior (false positives).18 Therefore, when screening to establish a threshold level of treatment intervention, screening instruments should maximize sensitivity, even at the expense of low specificity.
In our sample, the AUDIT (score ≥8), AUDIT-C (score ≥3), and AUDIT-3 (score ≥1) were as sensitive as quantity-frequency questions in detecting hazardous drinking. This increase in sensitivity of the AUDIT-C and AUDIT-3 is likely related to the consumption focus of these questions. The AUDIT-C consists of quantity and frequency type questions, and the third question is specific for quantity of drinking at one session. However, the performance of AUDIT instruments in our study is comparable and confirms results found in a study by Bush and colleagues,34 even though the studies used different criteria for assessment of abbreviated instruments.
If the AUDIT is used as a standard to detect hazardous drinkers, would the AUDIT-C or AUDIT-3 identify the same patients as the full AUDIT? Using cutoff points for the AUDIT-C of 3 or higher and an AUDIT-3 score of 1 or higher, these instruments were 99.7% and 98.3% as sensitive as the full AUDIT. As expected, specificity is much less for both abbreviated instruments. However, the high sensitivity suggests a clinical utility for these abbreviated instruments. It is unlikely that by asking the 3 AUDIT-C questions or the single AUDIT-3 question, a primary care provider will miss identification of a person who is at least a hazardous drinker.
Limitations
There is no gold standard to identify hazardous drinkers.41 The definition of what level of drinking constitutes that label is controversial, and providers do not routinely ask standard questions about drinking behavior.16 Our criterion, the quantity-frequency questions, may be considered a poor standard to compare survey instruments.7,42 In research, quantity-frequency consumption questions are helpful in specific identification of hazardous drinkers when a cutoff value is defined.5 However, patients may prefer not to answer questions about quantity or frequency of alcohol use and may not respond consistently to heterogeneous provider questions. Therefore, quantity-frequency questions may be useful as a standard to compare similar instruments such as the AUDIT and its abbreviations, although they may not be particularly effective in clinical practice.
Not all surveyed individuals completed the full AUDIT instrument. This was primarily because patients who answered “never” to our initial question regarding any alcohol use did not proceed to the AUDIT. Before completion of the study, we eliminated this question from the survey. However, it is not known whether patients who answered “never” to the initial question were, in fact, drinkers. Consistency response bias may also have occurred, as patients may have wished to answer similar items similarly. In addition, the similarity of the AUDIT-C and AUDIT-3 to quantity-frequency questions suggests that our sensitivity analysis perhaps represents only the upper bound of the briefer instruments’ performance.
We derived the abbreviated tests directly from the AUDIT, thus no assessment of the instruments out of the context of the full AUDIT was performed. Independent testing of abbreviated AUDIT instruments is needed. Recruitment was conducted by research assistants who solicited and provided forms to patients in the waiting rooms of primary care clinics. This convenience sample may have led to a selection bias in obtaining survey data. Patient recall bias may also have affected survey answers as patients may have had difficulty answering quantity and frequency questions accurately (an advantage of the AUDIT over quantity-frequency type questions). Also, our study investigates identification of individuals who are at least hazardous drinkers, but may also be abusive or dependent. We did not study the instruments’ ability to distinguish between hazardous drinking and abuse or dependence.
Conclusions
Our results confirm that the AUDIT-C and AUDIT-3 are useful screening tests for hazardous drinking. Because treatment of such drinkers can be effective, identifying people with less severe alcohol problems is crucial and an important public health initiative.21 Abbreviated instruments identify hazardous drinkers quickly, efficiently, and effectively, and may encourage early treatment to prevent the occurrence of alcohol-related consequences, abuse, or dependence. We recommend using the AUDIT-C and CAGE as brief screening instruments for hazardous drinking and alcohol abuse and dependence. This approach warrants further investigation.
Acknowledgments
Our work was supported by a grant to Dr Maisto from the National Institute of Alcohol Abuse and Alcoholism (AA10291). Dr Gordon is supported by a faculty development grant in general internal medicine from the VA Pittsburgh Healthcare System and the VISN 4 Mental Illness Research, Education, and Clinical Center. Dr Kraemer is supported by a Mentored Clinical Scientist Development Award from the National Institute of Alcohol Abuse and Alcoholism (AA00235). Dr J. Conigliaro is supported by a Career Development Award from the HSR & D Service, Department of Veterans Affairs (CD-97324-A) and is a generalist physician faculty scholar of the Robert Wood Johnson Foundation (#031500). We thank the ELM research study staff, Monica O’Connor (the ELM project coordinator), and all the patients who participated in the ELM study.
1. A cross-national trial of brief interventions with heavy drinkers: WHO Brief Intervention Study Group. Am J Public Health 1996;86:948-55.
2. Babor TF, de la Fuente JR, Saunders J, Grant M. The Alcohol Use Disorders Identification Test: guidelines for use in primary health care. Geneva, Switzerland: World Health Organization; 1989.
3. American Psychiatric Association. Diagnostic and statistical manual of mental disorders, 4th ed. Washington, DC: American Psychiatric Association; 1995.
4. Allen JP, Litten RZ, Fertig JB, Babor T. A review of research on the Alcohol Use Disorders Identification Test (AUDIT). Alcoholism 1997;21:613-19.
5. Bohn MJ, Babor TF, Kranzler HR. The Alcohol Use Disorders Identification Test (AUDIT): validation of a screening instrument for use in medical settings. J Studies Alcohol 1995;56:423-32.
6. Sanchez-Craig M, Wilkinson DA, Davila R. Empirically based guidelines for moderate drinking: 1-year results from three studies with problem drinkers. Am J Public Health 1995;85:823-28.
7. Bradley KA. Screening and diagnosis of alcoholism in the primary care setting. West J Med 1992;156:166-71.
8. Schorling JB, Klas PT, Willems JP, Everett AS. Addressing alcohol use among primary care patients: differences between family medicine and internal medicine residents. J Gen Intern Med 1994;9:248-54.
9. Saunders JB, Aasland OG, Amundsen A, Grant M. Alcohol consumption and related problems among primary health care patients: WHO collaborative project on early detection of persons with harmful alcohol consumption—I. Addiction 1993;88:349-62.
10. Wallace P, Cutler S, Haines A. Randomised controlled trial of general practitioner intervention in patients with excessive alcohol consumption. BMJ 1988;297:663-68.
11. Fleming MF, Barry KL, Manwell LB, Johnson K, London R. Brief physician advice for problem alcohol drinkers: a randomized controlled trial in community-based primary care practices. JAMA 1997;277:1039-45.
12. Kristenson H, Ohlin H, Hulten-Nosslin MB, Trell E, Hood B. Identification and intervention of heavy drinking in middle-aged men: results and follow-up of 24-60 months of long-term study with randomized controls. Alcoholism 1983;7:203-09.
13. Bien TH, Miller WR, Tonigan JS. Brief interventions for alcohol problems: a review. Addiction 1993;88:315-35.
14. Wilk AI, Jensen NM, Havighurst TC. Meta-analysis of randomized control trials addressing brief interventions in heavy alcohol drinkers. J Gen Intern Med 1997;12:274-83.
15. Conigrave KM, Saunders JB, Reznik RB. Predictive capacity of the AUDIT questionnaire for alcohol-related harm. Addiction 1995;90:1479-85.
16. Friedmann PD, McCullough D, Chin MH, Saitz R. Screening and intervention for alcohol problems: a national survey of primary care physicians and psychiatrists. J Gen Intern Med 2000;15:84-91.
17. Bowen OR, Sammons JH. The alcohol-abusing patient: a challenge to the profession. JAMA 1988;260:2267-70.
18. Allen JP, Maisto SA, Connors GJ. Self-report screening tests for alcohol problems in primary care. Arch Intern Med 1995;155:1726-30.
19. Saunders JB, Conigrave KM. Early identification of alcohol problems. CMAJ 1990;143:1060-69.
20. Mayfield D, McLeod G, Hall P. The CAGE questionnaire: validation of a new alcoholism screening instrument. Am J Psychiatry 1974;131:1121-23.
21. Barry KL, Fleming MF. The Alcohol Use Disorders Identification Test (AUDIT) and the SMAST-13: predictive validity in a rural primary care sample. Alcohol Alcoholism 1993;28:33-42.
22. Schmidt A, Barry KL, Fleming MF. Detection of problem drinkers: the Alcohol Use Disorders Identification Test (AUDIT). South Med J 1995;88:52-59.
23. Volk RJ, Steinbauer JR, Cantor SB, Holzer CE, III. The Alcohol Use Disorders Identification Test (AUDIT) as a screen for at-risk drinking in primary care patients of different racial/ethnic backgrounds. Addiction 1997;92:197-206.
24. Kettl PA. Detecting problem drinkers in your practice. Patient Care 1997;30:27-41.
25. Steinbauer JR, Cantor SB, Holzer CE, III, Volk RJ. Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med 1998;129:353-62.
26. Saunders JB, Aasland OG, Babor TF, de la Fuente JR, Grant M. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption—II. Addiction 1993;88:791-804.
27. Foster AI, Blondell RD, Looney SW. The practicality of using the SMAST and AUDIT to screen for alcoholism among adolescents in an urban private family practice. J Kentucky Med Assoc 1997;95:105-07.
28. Bradley KA, Bush KR, McDonell MB, Malone T, Fihn SD. Screening for problem drinking: comparison of CAGE and AUDIT: Ambulatory Care Quality Improvement Project (ACQUIP). J Gen Intern Med 1998;13:379-88.
29. Morton JL, Jones TV, Manganaro MA. Performance of alcoholism screening questionnaires in elderly veterans. Am J Med 1996;101:153-59.
30. MacKenzie D, Langa A, Brown TM. Identifying hazardous or harmful alcohol use in medical admissions: a comparison of audit, cage and brief mast. Alcohol Alcoholism 1996;31:591-99.
31. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med 1991;115:774-77.
32. Seppa K, Makela R, Sillanaukee P. Effectiveness of the Alcohol Use Disorders Identification Test in occupational health screenings. Alcoholism 1995;19:999-1003.
33. Piccinelli M, Tessari E, Bortolomasi M, et al. Efficacy of the alcohol use disorders identification test as a screening tool for hazardous alcohol intake and related disorders in primary care: a validity study. BMJ 1997;314:420-24.
34. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT alcohol consumption questions (AUDIT-C): an effective brief screening test for problem drinking. Ambulatory Care Quality Improvement Project (ACQUIP). Arch Intern Med 1998;158:1789-95.
35. van Kammen DP, Kelley ME, Gurklis JA, et al. Behavioral vs biochemical prediction of clinical stability following haloperidol withdrawal in schizophrenia. Arch Gen Psychiatry 1995;52:673-78.
36. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver-operating characteristic (ROC) curve. Radiology 1982;143:29-36.
37. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839-43.
38. Conigrave KM, Hall WD, Saunders JB. The AUDIT questionnaire: choosing a cut-off score. Alcohol Use Disorder Identification Test. Addiction 1995;90:1349-56.
39. Cherpitel CJ, Clark WB. Ethnic differences in performance of screening instruments for identifying harmful drinking and alcohol dependence in the emergency room. Alcoholism 1995;19:628-34.
40. Reid MC, Fiellin DA, O’Connor PG. Hazardous and harmful alcohol consumption in primary care. Arch Intern Med 1999;159:1681-89.
41. Fink A, Hays RD, Moore AA, Beck JC. Alcohol-related problems in older persons. Determinants, consequences, and screening. Arch Intern Med 1996;156:1150-56.
42. Bradley KA, Boyd-Wickizer J, Powell SH, Burman ML. Alcohol screening questionnaires in women: a critical review. JAMA 1998;280:166-71.
Validation of a Single Screening Question for Problem Drinking
STUDY DESIGN: Cross-sectional study.
POPULATION: Adult patients presenting to 3 emergency departments in Boone County, Missouri, for care within 48 hours of an injury.
OUTCOMES MEASURED: The answers to the question were coded as never, more than 12 months ago, 3 to 12 months ago, and within the past 3 months. Problematic drinking was defined as either hazardous drinking (identified by a 29-day retrospective interview) or a past-year alcohol use disorder (defined by questions from the Diagnostic Interview Schedule).
RESULTS: There was a 70% participation rate. Of 2517 interviewed patients: 29% were hazardous drinkers; 20% had a past-year alcohol use disorder; and 35% had either or both. Considering “within the last 3 months” as positive, the sensitivity of the single question was 86%, and the specificity was 86%. In men (n=1432), sensitivity and specificity were 88% and 81%; in women, 83% and 91%. Using the 4 answer options for the question, the area under the receiver-operating characteristic curve was 0.90. Controlling for age, sex, tobacco use, injury severity, and breath alcohol level in logistic regression models changed the findings minimally.
CONCLUSIONS: A single question about the last episode of heavy drinking has clinically useful sensitivity and specificity in detecting hazardous drinking and alcohol use disorders.
Problematic use of alcohol is a major source of morbidity1 and mortality.2 It is also common: 7.4% of adults in the United States meet criteria for a past-year alcohol use disorder,3 and 15.8% have had at least one episode of heavy drinking in the past month.4 In randomized clinical trials, brief interventions in primary care settings have helped 20% of hazardous and harmful drinkers reduce alcohol consumption to safe levels.5,6
In medical practice, however, most individuals who engage in hazardous or harmful drinking go undetected,7 despite the availability of effective screening instruments.8 Major barriers to implementing screening for alcohol problems include a lack of physician familiarity with screening methods7,9 and a lack of time.10 A simple time-efficient instrument could increase the frequency of screening, which could reduce the burden of alcohol-related harm in our society.
Previous research in primary care showed that a single question about the last occasion of heavy drinking had a sensitivity of 62% and a specificity of 93% for detecting patients with either a past-year alcohol use disorder or hazardous drinking in the past month.11 In that study, the question was presented in written form: “On any single occasion during the past 3 months, have you had more than 5 drinks containing alcohol?” In this report, we examine the clinical utility of a revision of that question presented orally with different threshold values for men and women as a screening instrument for problem drinking among injured patients presenting to emergency departments.
Methods
Data for this report were taken from a study of alcohol and injury that was approved by the institutional review board of the University Health Science Center. Interviews were conducted with patients presenting for care to 1 of the 3 hospital emergency departments in Boone County, Missouri, within 48 hours of an acute injury. Patients were eligible for the study if they were aged 18 years or older, able to converse in English, cognitively intact, not in police custody, and if the injury did not occur in a controlled environment (eg, a nursing home or jail where access to alcohol is limited). Research staff trained in the use of the structured interview provided equal coverage of each day of the week and hour of the day. Interviews were conducted from February 1998 through March 2000.
Instruments
After we obtained informed consent, the first question of the structured interview was about tobacco use. The second was, “When was the last time you had more than X drinks in 1 day?” with X=5 for men and X=4 for women. The answers were coded as never, more than 12 months ago, 3 to 12 months ago, and within the last 3 months. The threshold values were based on empirical work12 and guidelines;13 we set them one drink higher than in the guidelines to balance sensitivity and specificity based on our previous work.11
We defined problem drinking as either past-month hazardous drinking or a past-year alcohol use disorder. We defined hazardous drinking as consumption of more than 4 drinks in 1 day or 14 in 1 week for men, more than 3 in 1 day or 7 in 1 week for women. One drink in the United States contains approximately 11.5 g of ethanol, the amount in 12 oz of beer, 5 oz of wine, or 1.5 oz of liquor. Data for this assessment were from a 29-day retrospective calendar-based review of day-by-day consumption.14 We defined alcohol use disorders (ie, alcohol abuse or dependence) according to the criteria in the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV),15 using the alcohol questions in the structured Diagnostic Interview Schedule (DIS).16 Participants were given a breath alcohol test at the end of the interview using the Alco-Sensor IV model breathalyzer (Intoximeters, Inc, St. Louis, Mo).
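The definitions in this paragraph can be sketched as a classification over the day-by-day calendar. The daily and weekly limits are taken from the text; everything else is an assumption made for illustration, in particular the function names, the "male"/"female" coding, and the reading of "1 week" as any 7 consecutive days within the 29-day review (the text does not say how weeks were delimited).

```python
def is_hazardous_drinker(daily_drinks, sex):
    """Past-month hazardous drinking from a day-by-day drink count.

    Limits from the text: more than 4 drinks in 1 day or 14 in 1 week
    for men; more than 3 in 1 day or 7 in 1 week for women.
    """
    day_limit, week_limit = (4, 14) if sex == "male" else (3, 7)
    if any(drinks > day_limit for drinks in daily_drinks):
        return True
    # Assumption: a "week" is any rolling 7-day window in the calendar.
    for start in range(len(daily_drinks) - 6):
        if sum(daily_drinks[start:start + 7]) > week_limit:
            return True
    return False


def is_problem_drinker(hazardous, alcohol_use_disorder):
    """Problem drinking: hazardous drinking or a past-year alcohol use disorder."""
    return hazardous or alcohol_use_disorder
```

Under these rules, a woman who has 2 drinks every day never exceeds the daily limit but exceeds 7 drinks in a week, so she is classified as a hazardous drinker; a man with the same pattern is not.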
Patients
Of 3616 injured persons presenting for care to one of the participating emergency centers during times when research staff were present, 579 were excluded, because the injury occurred more than 48 hours before, because of mental status changes (chronic or acute), or because the injury occurred in a controlled environment. Of 3037 eligible patients on covered shifts, 12.2% declined to participate, and 15.4% were missed, either because their injuries were severe enough to preclude an interview in the emergency department (8.1%) or because research staff were busy with other interviews (7.3%); 2199 persons were interviewed during covered emergency department shifts from February 1998 through January 2000.
Some injured patients were missed because of the severity of their injuries. Therefore, we recruited additional patients who had been admitted to the hospital during times not covered in the emergency department by study staff. These interviews were conducted from June 1999 through March 2000. A total of 618 were identified: 52 refused (8.4%); 69 were too severely injured (11.2%); and 139 were missed (22.5%), leaving 358 who were interviewed.
We combined these 2 groups (from covered and noncovered emergency department shifts), because excluding those from noncovered shifts had minimal effect on the results presented here. Of those 2557 interviews, 40 were with patients who had been injured and interviewed before. We excluded these from analysis, leaving 2517 individual patients. Table 1 shows basic demographic data and prevalence of alcohol problems. Seven patients (0.3% of interviews) did not answer the single problem-drinking screening question, and 13 (0.5%) did not complete the calendar-based review of recent drinking.
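The patient counts reported in this section can be checked with a few lines of arithmetic. All numbers below are taken from the text; the variable names are illustrative only.

```python
# Covered emergency department shifts
presented = 3616
excluded = 579
eligible = presented - excluded              # 3037 eligible patients
interviewed_covered = 2199
participation = interviewed_covered / eligible

# Patients admitted during noncovered shifts
identified = 618
refused, too_severe, missed = 52, 69, 139
interviewed_noncovered = identified - refused - too_severe - missed  # 358

# Combined sample, then removal of repeat interviews
total_interviews = interviewed_covered + interviewed_noncovered      # 2557
repeat_interviews = 40
individual_patients = total_interviews - repeat_interviews           # 2517
```

The participation rate on covered shifts works out to about 72.4%, which matches 100% minus the 12.2% who declined and the 15.4% who were missed.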
Statistical Analysis
We used bivariate analysis to calculate sensitivity and specificity, and confidence interval (CI) analysis17 to determine 95% CIs. We used the c statistic from logistic regression to calculate the area under the receiver-operating characteristic (ROC) curve18 in bivariate and multivariate models. We used the formula in Hanley and McNeil19 to calculate 95% CIs around the area under the ROC curve.
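For readers unfamiliar with the Hanley and McNeil approach, the sketch below implements the standard-error approximation from that 1982 paper and a normal-theory 95% CI. The function names are illustrative, and the group sizes in the example are rough assumptions (about 35% of 2517 patients with problem drinking), not figures reported by the authors, so the resulting interval need not match the published one exactly.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of the area under the ROC curve (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc: float, n_pos: int, n_neg: int, z: float = 1.96):
    """Normal-approximation CI around the AUC, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Illustrative group sizes only (assumed, not taken from the paper's tables).
low, high = auc_confidence_interval(0.90, n_pos=881, n_neg=1636)
print(f"95% CI: {low:.3f} to {high:.3f}")
```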
Results
For detecting problem drinking, the single question (with “within the last 3 months” considered positive) had a sensitivity of 86% and a specificity of 86% (Table 2). Sensitivity was higher in men; specificity was higher in women. The question was better in detecting hazardous drinking than alcohol use disorder and more effective in whites than in African Americans. In this study with a prevalence of problem drinking of 35%, the positive and negative predictive values were 77% and 92%. In a clinical setting with a prevalence of 15% (the national prevalence of past-month heavy drinking4), positive and negative predictive values would be 52% and 97%. Likelihood ratios18 are provided in Table 3.
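The dependence of predictive values on prevalence follows from Bayes' theorem, and the figures above can be reproduced from the reported sensitivity and specificity. This is a minimal sketch with illustrative function names; the likelihood ratios it computes are for the dichotomized question only and may not correspond to the values in Table 3.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a dichotomous test."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


for prevalence in (0.35, 0.15):
    ppv, npv = predictive_values(0.86, 0.86, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
# prevalence 35%: PPV 77%, NPV 92%
# prevalence 15%: PPV 52%, NPV 97%
```

With the same test characteristics, the lower prevalence reduces the positive predictive value and raises the negative predictive value, which is why the question rules out problem drinking more reliably than it rules it in outside the emergency setting.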
Breath or blood alcohol levels were obtained for 2335 patients; 257 had some alcohol detected, and 139 had a level of 21.7 mmol/L (0.1 g/dL) or greater. Of 1493 patients without hazardous drinking or an alcohol use disorder, only 49 had any alcohol detected, a specificity of 96%. However, the sensitivity of alcohol testing was only 24% (198 with a positive breath or blood alcohol level out of 842 with problem drinking). Interestingly, 27 of the patients with an alcohol level higher than 21.7 mmol/L (indicative of problem drinking) were negative by the criterion standards (22 of the 27 screened positive with the single question).
Tobacco use is common among patients with alcohol use disorders20 and was associated with problem drinking in our study. Almost half the patients in this study used tobacco in some form; 1013 were cigarette smokers, and 139 used only other forms of tobacco. Current tobacco use had a sensitivity of 65% and a specificity of 64% in identifying problem drinking.
The single screening question had 4 answer options (never, >12 months ago, 3-12 months ago, and within the last 3 months). Using the 3 cut points defined by those answer options, the area under the ROC curve21 for identifying problem drinking was 0.90 (95% CI, 0.88-0.91). With a past-year alcohol use disorder as the diagnostic criterion, the area under the ROC curve was 0.81 (95% CI, 0.79-0.83). Entering the single question, sex, age (continuous or ordinal), and tobacco use as independent variables, the area under the ROC curve increased minimally to 0.91 (Figure 1). Entering race/ethnicity, injury severity score,22 or alcohol level into the model had essentially no effect. Considering only current drinkers (n=1535), the single question had a sensitivity for detecting problem drinking of 86% and a specificity of 69%, and the area under the ROC curve was 0.79 (95% CI, 0.77-0.82).
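To illustrate how 4 ordered answer options yield 3 cut points and an ROC curve, the sketch below builds the empirical curve and integrates it with the trapezoidal rule. This is a swapped-in technique, not the authors' procedure (they report the c statistic from logistic regression), and the answer counts are hypothetical, chosen only so that the strictest cut point gives 86% sensitivity and specificity.

```python
ANSWERS = ["never", ">12 months ago", "3-12 months ago", "within the last 3 months"]


def roc_points(pos_counts, neg_counts):
    """Empirical ROC points as (false positive rate, true positive rate).

    Counts are ordered as in ANSWERS, from least to most recent heavy drinking.
    """
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    # Start at the strictest cut point and relax it one answer at a time.
    for pos, neg in zip(reversed(pos_counts), reversed(neg_counts)):
        true_pos += pos
        false_pos += neg
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points  # the final point is (1.0, 1.0)


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical counts per answer (NOT the study data).
problem_drinkers = [5, 25, 110, 860]
other_patients = [600, 200, 60, 140]

points = roc_points(problem_drinkers, other_patients)
auc = trapezoid_auc(points)
print(f"AUC = {auc:.2f}")
```

The three interior points of the curve correspond to the 3 cut points; the first of them is the "within the last 3 months" threshold used for the headline sensitivity and specificity.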
Discussion
A single question about recent heavy drinking has clinically useful sensitivity and specificity for detecting recent hazardous drinking and current alcohol use disorders. In response to “When was the last time you had more than X drinks in one day?”, an answer within the past 3 months has a sensitivity and a specificity of 86%. An answer of “never” essentially rules out problem drinking: treating any other answer as positive has a sensitivity of more than 99%. The question has less utility in African American patients but works equally well in women and men.
Previous studies have explored the utility of a single question about the frequency of drinking 5 or more drinks at one time. Drinking 5 or more drinks on one occasion at least once in the past year had a sensitivity of 86% and a specificity of 63% in detecting past-year alcohol dependence.23 In an emergency center–based study with a positive screening result defined as heavy drinking at least monthly, sensitivity was 58% and specificity 85%.24 The authors of these reports concluded that a single question about the frequency of heavy drinking was inadequate in screening for problem drinking. However, the question we used inquired about the last occasion of heavy drinking, not the usual frequency, and used different thresholds for men and women, narrowing the sex differences in sensitivity and specificity found in previous work using a single threshold.11
The single question we used compares favorably with the CAGE25 and the Alcohol Use Disorders Identification Test (AUDIT).26 For the single question, the area under the ROC curve is 0.90 for problem drinking and 0.81 for alcohol use disorders only. With the AUDIT,27 the area under the ROC curve was 0.83 to 0.90 in a variety of settings in detecting alcohol use disorders26,28,29 (hazardous drinkers not included) and 0.88 in detecting problem drinking.29 With the CAGE questions the area under the ROC curve was 0.89 in one study25 and 0.68 to 0.88 in a variety of sex-ethnic subgroups in another26 for detecting alcohol use disorders only.
The criterion standards we used are reliable and valid.14,30 Although they were negative in 27 patients with alcohol levels of 21.7 mmol/L (0.1 g/dL) or greater, in whom intoxication may have limited the validity of self-report, 22 of those 27 patients had a positive screening result with the single question.
Limitations
Several limitations of our study should be noted. The interviewer was aware of the patient’s response on the screening question, and this may have led to ascertainment bias. However, the interview was the same for all participants whether their screen produced positive or negative results, and the DIS is a fully structured interview, minimizing this potential bias.
The study is limited by its nonparticipation rate of 30%. Of eligible injured patients from covered emergency department shifts, 15.4% were missed, either because interviewers were busy with other participants or because the patient had severe injuries that precluded interview. Another 12.2% declined to participate. The utility of the screening question in these patients is unknown.
The single question did not perform as well for African Americans as it did for whites; its sensitivity and specificity, however, are clinically useful in both groups. Consistent with the population of central Missouri, the study included few members of other ethnic groups, and the question’s utility in those groups needs study.
The generalizability of our findings may be limited. The study included only injured patients presenting for care to emergency centers in central Missouri. However, alcohol-related injury is a major source of morbidity and mortality, especially among young adults, and brief interventions in emergency centers are efficacious.31 We have little reason to expect the question used in this study would be less effective in other clinical settings.
We examined only one approach to screening. However, the screening question used in this study was selected in advance and remained unchanged throughout data collection. Some screening instruments32 developed post hoc from a longer list of questions have been validated in separate samples,24 but others33,34 have not.35,36
Given the morbidity associated with hazardous drinking and the efficacy of brief interventions, screening should include hazardous drinking as well as alcohol use disorders, which we did in this study. Studies of other screening instruments have generally tested their utility only in detecting alcohol use disorders, in which the single question was less specific. Although our study did not address the issue, the single question probably would not identify patients in long-term recovery from a past alcohol use disorder, especially those abstinent for more than a year.
Conclusions
Further study is needed to determine whether clinicians will find this single question easier to apply and whether problem drinkers find it more engaging than alternative screening instruments. The goal of screening is to identify problem drinkers and to engage them effectively in the process of change.37 Different screening approaches—and different ways of following up positive screen results—may vary in their ability to help problem drinkers move toward change.
A single question about the last occasion of heavy drinking has clinically useful sensitivity and specificity in detecting hazardous drinkers and current alcohol use disorders. The question is simple enough that it could be used, as is a question about tobacco use,38 as part of the taking of routine vital signs. If the question used in our study were adopted, it could efficiently identify which patients require further discussion about their drinking habits, with positive and negative predictive values of 77% and 92% in this emergency center population and approximately 52% and 97%, respectively, in a typical population-based sample. That in turn could lead to more frequent use of effective brief intervention and referral strategies, thereby potentially decreasing society’s burden of alcohol-related harm.
Acknowledgments
Our study was supported by a grant from the National Institute on Alcohol Abuse and Alcoholism (R01 AA11078). During pilot work, Dr Vinson was supported by a Generalist Physician Faculty Scholars Program grant from the Robert Wood Johnson Foundation. The work has also been supported in part by a research center grant from the American Academy of Family Physicians and by the Opal Lewis Fund for Alcohol Research. Data were collected by Deborah Bailey; Ciprian Crismaru, MD; Amelia Devera-Sales, MD; Kari Gilmore; Indira Gujral; Carol Reidinger; and Carey Smith. Data collection was assisted by the following medical students: Stephen Griffith, Darin Lee, Greg Morlin, Rebecca Shumate, Lindsey Thornton, and Aneesh Tosh. Data management was provided by Darla Horman, MA; Robin Kruse, PhD; and Sandra Taylor. Logistic regression analyses were performed by Jim Hewett, MS; John Hewett, PhD; and Fan Liu. We also thank the personnel of the emergency centers of the University of Missouri, Boone Hospital Center, and Columbia Regional Hospital.
Related resources
FOR PATIENTS:
- “How to Cut Down on Your Drinking,” a brochure for patients from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) http://www.niaaa.nih.gov/publications/handout.htm
- The Center for Substance Abuse Treatment: includes a searchable database of substance abuse treatment programs with maps, phone numbers, types of insurance coverage accepted by each center, and other information. Part of the federal Substance Abuse and Mental Health Services Administration. http://www.samhsa.gov/centers/csat/csat.html
- Alcoholics Anonymous http://www.aa.org
- Moderation Management (seeks to help problem drinkers control their consumption) http://www.moderation.org
- Join Together, an advocacy group addressing alcohol and other drug abuse, violence, etc. Resources for individuals, parents, other groups. http://www.jointogether.org/sa/
FOR FAMILY PHYSICIANS/RESEARCHERS:
- “The Physicians’ Guide to Helping Patients with Alcohol Problems,” also published by NIAAA. http://www.niaaa.nih.gov/publications/physicn.htm
- The Research Society on Alcoholism http://www.rsa.am/
- Association for Medical Education and Research in Substance Abuse http://www.amersa.org/
- American Society of Addiction Medicine http://www.asam.org/
1. Chou SP, Grant BF, Dawson DA. Medical consequences of alcohol consumption—United States, 1992. Alcohol Clin Exp Res 1996;20:1423-29.
2. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270:2207-12.
3. Grant BF, Harford TC, Dawson DA, Chou P, Dufour M, Pickering R. Prevalence of DSM-IV alcohol abuse and dependence: United States, 1992. Alcohol Health Res World 1994;18:243-48.
4. Behavioral Risk Factor Surveillance System Online Prevalence Data. 1999. Available at: www2.cdc.gov/nccdphp/brfss/index.asp. Accessed December 5, 2000.
5. Wallace P, Cutler S, Haines A. Randomised controlled trial of general practitioner intervention in patients with excessive alcohol consumption. BMJ 1988;297:663-68.
6. Fleming MF, Barry KL, Manwell LB, Johnson K, London R. Brief physician advice for problem alcohol drinkers. JAMA 1997;277:1039-44.
7. Bradley KA, Curry SJ, Koepsell TD, Larson EB. Primary and secondary prevention of alcohol problems: US internist attitudes and practices. J Gen Intern Med 1995;10:67-72.
8. Fiellin DA, Reid MC, O’Connor PG. Screening for alcohol problems in primary care. Arch Intern Med 2000;160:1977-89.
9. Wenrich MD, Paauw DS, Carline JD, Curtis JR, Ramsey PG. Do primary care physicians screen patients about alcohol intake using the CAGE questions? J Gen Intern Med 1995;10:631-34.
10. Stange KC, Flocke SA, Goodwin MS. Opportunistic preventive services delivery: are time limitations and patient satisfaction barriers? J Fam Pract 1998;46:419-24.
11. Taj N, Devera-Sales A, Vinson DC. Screening for problem drinking: does a single question work? J Fam Pract 1998;46:328-35.
12. Sanchez-Craig M. Empirically based guidelines for moderate drinking: 1-year results from three studies with problem drinkers. Am J Public Health 1995;85:823-28.
13. National Institute on Alcohol Abuse and Alcoholism. The physician’s guide to helping patients with alcohol problems. Bethesda, Md: National Institutes of Health; 1995. Available at: silk.nih.gov/silk/niaaa1/publication/physicn.htm. Accessed December 5, 2000.
14. Sobell LC, Sobell MB. Timeline follow-back: a technique for assessing self reported alcohol consumption. In: Litten R, Allen J, eds. Measuring alcohol consumption: psychosocial and biochemical methods. Totowa, NJ: Humana Press; 1992;41-72.
15. American Psychiatric Association. Substance-related disorders. In: Diagnostic and Statistical Manual of Mental Disorders. Washington, DC: American Psychiatric Association; 1994;175-204.
16. Robins L, Cottler L, Bucholz K, Compton W. Diagnostic Interview Schedule for DSM-IV. St. Louis, Mo: Washington University School of Medicine, Department of Psychiatry; 1996.
17. Gardner SB, Winter PD, Gardner MJ. Confidence interval analysis. London, England: BMJ; 1989.
18. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature: III. How to use an article about a diagnostic test: B. What are the results and will they help me in caring for my patients? JAMA 1994;271:703-07.
19. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
20. Fleming MF, Manwell LB, Barry KL, Johnson K. At-risk drinking in an HMO primary care sample: prevalence and health policy implications. Am J Public Health 1998;88:90-93.
21. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Boston, Mass: Little, Brown and Company; 1991.
22. Abbreviated injury scale: 1990 revision. Des Plaines, Ill: Association for the Advancement of Automotive Medicine; 1990.
23. Dawson DA. Consumption indicators of alcohol dependence. Addiction 1994;89:345-50.
24. Cherpitel CJ. Differences in performance of screening instruments for problem drinking among blacks, whites and Hispanics in an emergency room population. J Stud Alcohol 1998;59:420-26.
25. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med 1991;115:774-77.
26. Steinbauer JR, Cantor SB, Holzer CE, III, Volk RJ. Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med 1998;129:353-62.
27. Saunders JB, Aasland OG, Babor TF, de la Fuente JR, Grant M. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption—II. Addiction 1993;88:791-804.
28. Barry KL, Fleming MF. The Alcohol Use Disorders Identification Test (AUDIT) and the SMAST-13: predictive validity in a rural primary care sample. Alcohol Alcohol 1993;28:33-42.
29. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT alcohol consumption questions (AUDIT-C): an effective brief screening test for problem drinking. Arch Intern Med 1998;158:1789-95.
30. Hasin D, Paykin A. Alcohol dependence and abuse diagnoses: concurrent validity in a nationally representative sample. Alcohol Clin Exp Res 1999;23:144-50.
31. Gentilello LM, Rivara FP, Donovan DM, et al. Alcohol interventions in a trauma center as a means of reducing the risk of injury recurrence. Ann Surg 1999;230:473-80.
32. Cherpitel CJ. Screening for alcohol problems in the emergency room: a rapid alcohol problems screen. Drug Alcohol Depend 1995;40:133-37.
33. Cyr M, Wartman S. The effectiveness of routine screening questions in the detection of alcoholism. JAMA 1988;259:51-54.
34. Skinner HA, Holt S, Schuller R, Roy J, Israel Y. Identification of alcohol abuse using laboratory tests and a history of trauma. Ann Intern Med 1984;101:847-51.
35. Schorling JB, Willems JP, Klas PT. Identifying problem drinkers: lack of sensitivity of the two-question drinking test. Am J Med 1995;98:232-36.
36. Rumpf HJ, Hapke U, Erfurth A, John U. Screening questionnaires in the detection of hazardous alcohol consumption in the general hospital: direct or disguised assessment? J Stud Alcohol 1998;59:698-703.
37. Rollnick S, Mason P, Butler C. Health behavior change: a guide for practitioners. Edinburgh, Scotland: Churchill Livingstone; 1999.
38. Ahluwalia JS, Gibson CA, Kenney RE, Wallace DD, Resnicow K. Smoking status as a vital sign. J Gen Intern Med 1999;14:402-08.
39. Rehm J, Sempos CT. Alcohol consumption and all-cause mortality. Addiction 1995;90:471-80.
Related resources
FOR PATIENTS:
- “How to Cut Down on Your Drinking,” a brochure for patients from the National Institute on Alcohol Abuse and Alcoholism (NIAAA)http://www.niaaa.nih.gov/publications/handout.htm
- The Center for Substance Abuse Treatment—includes searchable database of substance abuse treatment programs with maps, phone numbers, types of insurance coverage accepted by each center, and other information. Part of the federal Substance Abuse and Mental Health Services Administration.http://www.samhsa.gov/centers/csat/csat.html
- Alcoholics Anonymous http://www.aa.org
- Moderation Management (seeks to help problem drinkers control their consumption)http://www.moderation.org
- Join Together, an advocacy group addressing alcohol and other drug abuse, violence, etc. Resources for individuals, parents, other groups. http://www.jointogether.org/sa/
FOR FAMILY PHYSICIANS/RESEARCHERS:
- “The Physicians’Guide to Helping Patients with Alcohol Problems,” also published by NIAAA.http://www.niaaa.nih.gov/publications/physicn.htm
- The Research Society on Alcoholism http://www.rsa.am/
- Association for Medical Education and Research in Substance Abusehttp://www.amersa.org/
- American Society of Addiction Medicinehttp://www.asam.org/
STUDY DESIGN: Cross-sectional study.
POPULATION: Adult patients presenting to 3 emergency departments in Boone County, Missouri, for care within 48 hours of an injury.
OUTCOMES MEASURED: The answers to the question were coded as never, more than 12 months ago, 3 to 12 months ago, and within the past 3 months. Problematic drinking was defined as either hazardous drinking (identified by a 29-day retrospective interview) or a past-year alcohol use disorder (defined by questions from the Diagnostic Interview Schedule).
RESULTS: There was a 70% participation rate. Of 2517 interviewed patients: 29% were hazardous drinkers; 20% had a past-year alcohol use disorder; and 35% had either or both. Considering “within the last 3 months” as positive, the sensitivity of the single question was 86%, and the specificity was 86%. In men (n=1432), sensitivity and specificity were 88% and 81%; in women, 83% and 91%. Using the 4 answer options for the question, the area under the receiver-operating characteristic curve was 0.90. Controlling for age, sex, tobacco use, injury severity, and breath alcohol level in logistic regression models changed the findings minimally.
CONCLUSIONS: A single question about the last episode of heavy drinking has clinically useful sensitivity and specificity in detecting hazardous drinking and alcohol use disorders.
Problematic use of alcohol is a major source of morbidity1 and mortality.2 It is also common: 7.4% of adults in the United States meet criteria for a past-year alcohol use disorder,3 and 15.8% have had at least one episode of heavy drinking in the past month.4 In randomized clinical trials, brief interventions in primary care settings have helped 20% of hazardous and harmful drinkers reduce alcohol consumption to safe levels.5,6
In medical practice, however, most individuals who engage in hazardous or harmful drinking go undetected,7 despite the availability of effective screening instruments.8 Major barriers to implementing screening for alcohol problems include a lack of physician familiarity with screening methods7,9 and a lack of time.10 A simple time-efficient instrument could increase the frequency of screening, which could reduce the burden of alcohol-related harm in our society.
Previous research in primary care showed that a single question about the last occasion of heavy drinking had a sensitivity of 62% and a specificity of 93% for detecting patients with either a past-year alcohol use disorder or hazardous drinking in the past month.11 In that study, the question was presented in written form: “On any single occasion during the past 3 months, have you had more than 5 drinks containing alcohol?” In this report, we examine the clinical utility of a revision of that question presented orally with different threshold values for men and women as a screening instrument for problem drinking among injured patients presenting to emergency departments.
Methods
Data for this report were taken from a study of alcohol and injury that was approved by the institutional review board of the University Health Science Center. Interviews were conducted with patients presenting for care to 1 of the 3 hospital emergency departments in Boone County, Missouri, within 48 hours of an acute injury. Patients were eligible for the study if they were aged 18 years or older, able to converse in English, cognitively intact, not in police custody, and if the injury did not occur in a controlled environment (eg, a nursing home or jail where access to alcohol is limited). Research staff trained in the use of the structured interview provided equal coverage of each day of the week and hour of the day. Interviews were conducted from February 1998 through March 2000.
Instruments
After informed consent was obtained, the structured interview began with a question about tobacco use. The second question was, “When was the last time you had more than X drinks in 1 day?” with X=5 for men and X=4 for women. The answers were coded as never, more than 12 months ago, 3 to 12 months ago, and within the last 3 months. The threshold values were based on empirical work12 and guidelines;13 we set them one drink higher than in the guidelines to balance sensitivity and specificity based on our previous work.11
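The wording and coding of the screening question can be sketched as follows. This is only an illustration of the rules stated above: the function names, the use of months as the input unit, and the handling of an answer of exactly 3 months are assumptions, and the positive-screen rule is the one used later in the Results ("within the last 3 months").

```python
# Threshold X in the question, by sex, as stated in the text.
THRESHOLD_DRINKS = {"male": 5, "female": 4}

def question_text(sex):
    """Build the oral screening question for the given sex."""
    x = THRESHOLD_DRINKS[sex]
    return f"When was the last time you had more than {x} drinks in 1 day?"

def code_answer(months_since_last):
    """Map an answer to one of the 4 coded categories.

    months_since_last is None for "never". Treating exactly 3 months as
    "3 to 12 months ago" is an assumption; the text does not say.
    """
    if months_since_last is None:
        return "never"
    if months_since_last > 12:
        return "more than 12 months ago"
    if months_since_last >= 3:
        return "3 to 12 months ago"
    return "within the last 3 months"

def screen_positive(coded_answer):
    """Positive screen: last heavy-drinking day within the last 3 months."""
    return coded_answer == "within the last 3 months"

print(question_text("female"))
print(code_answer(1), screen_positive(code_answer(1)))
```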
We defined problem drinking as either past-month hazardous drinking or a past-year alcohol use disorder. We defined hazardous drinking as consumption of more than 4 drinks in 1 day or 14 in 1 week for men, more than 3 in 1 day or 7 in 1 week for women. One drink in the United States contains approximately 11.5 g of ethanol, the amount in 12 oz of beer, 5 oz of wine, or 1.5 oz of liquor. Data for this assessment were from a 29-day retrospective calendar-based review of day-by-day consumption.14 We defined alcohol use disorders (ie, alcohol abuse or dependence) according to the criteria in the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV),15 using the alcohol questions in the structured Diagnostic Interview Schedule (DIS).16 Participants were given a breath alcohol test at the end of the interview using the Alco-Sensor IV model breathalyzer (Intoximeters, Inc, St. Louis, Mo).
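The hazardous-drinking definition above can be expressed as a short check over a day-by-day record such as the 29-day calendar review. This is a minimal sketch, not the study's scoring procedure: the function names are illustrative, and interpreting "in 1 week" as any rolling 7-day window is an assumption, since the text does not say how weeks were delimited.

```python
# Limits from the text: more than 4 in 1 day or 14 in 1 week for men,
# more than 3 in 1 day or 7 in 1 week for women.
DAILY_LIMIT = {"male": 4, "female": 3}
WEEKLY_LIMIT = {"male": 14, "female": 7}

def is_hazardous(daily_drinks, sex):
    """Return True if a sequence of daily drink counts exceeds either limit."""
    if any(day > DAILY_LIMIT[sex] for day in daily_drinks):
        return True
    # Assumption: a "week" is any 7 consecutive days in the record.
    for start in range(len(daily_drinks) - 6):
        if sum(daily_drinks[start:start + 7]) > WEEKLY_LIMIT[sex]:
            return True
    return False

def is_problem_drinker(hazardous, alcohol_use_disorder):
    """Problem drinking: hazardous drinking or a past-year alcohol use disorder."""
    return hazardous or alcohol_use_disorder

calendar = [0] * 28 + [5]  # one 5-drink day in a 29-day record
print(is_hazardous(calendar, "male"))
```

Note that a steady 2 drinks per day totals exactly 14 per week, which does not exceed the limit for men but does exceed the limit of 7 for women.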
Patients
Of 3616 injured persons presenting for care to one of the participating emergency centers during times when research staff were present, 579 were excluded, because the injury occurred more than 48 hours before, because of mental status changes (chronic or acute), or because the injury occurred in a controlled environment. Of 3037 eligible patients on covered shifts, 12.2% declined to participate, and 15.4% were missed, either because their injuries were severe enough to preclude an interview in the emergency department (8.1%) or because research staff were busy with other interviews (7.3%); 2199 persons were interviewed during covered emergency department shifts from February 1998 through January 2000.
Some injured patients were missed because of the severity of their injuries. Therefore, we recruited additional patients who had been admitted to the hospital during times not covered in the emergency department by study staff. These interviews were conducted from June 1999 through March 2000. A total of 618 were identified: 52 refused (8.4%); 69 were too severely injured (11.2%); and 139 were missed (22.5%), leaving 358 who were interviewed.
We combined these 2 groups (from covered and noncovered emergency department shifts); excluding those from noncovered shifts had minimal effect on the results presented here. Of those 2557 interviews, 40 were with patients who had been injured and interviewed before. We excluded these from analysis, leaving 2517 individual patients. Table 1 shows basic demographic data and prevalence of alcohol problems. Seven patients (0.3% of interviews) did not answer the single problem-drinking screening question, and 13 (0.5%) did not complete the calendar-based review of recent drinking.
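The participant counts reported above can be reconciled with simple arithmetic. The sketch below uses only numbers given in the text; the variable names are illustrative, and the way the overall participation rate is computed (interviews divided by all eligible or identified patients across both recruitment groups) is an assumption about how the roughly 70% figure arises.

```python
presented = 3616                 # injured persons on covered shifts
excluded = 579
eligible_covered = presented - excluded

interviewed_covered = 2199
identified_noncovered = 618
refused, too_injured, missed = 52, 69, 139
interviewed_noncovered = identified_noncovered - refused - too_injured - missed

total_interviews = interviewed_covered + interviewed_noncovered
repeat_interviews = 40
unique_patients = total_interviews - repeat_interviews

# Share of eligible covered-shift patients not interviewed
# (12.2% declined plus 15.4% missed in the text).
covered_nonparticipation = 1 - interviewed_covered / eligible_covered

# Assumption: overall participation pools both recruitment groups.
overall_participation = total_interviews / (eligible_covered + identified_noncovered)

print(eligible_covered, interviewed_noncovered, total_interviews, unique_patients)
print(f"{covered_nonparticipation:.1%} {overall_participation:.0%}")
```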
Statistical Analysis
We used bivariate analysis to calculate sensitivity and specificity, and confidence interval (CI) analysis17 to determine 95% CIs. We used the c statistic from logistic regression to calculate the area under the receiver-operating characteristic (ROC) curve18 in bivariate and multivariate models. We used the formula in Hanley and McNeil19 to calculate 95% CIs around the area under the ROC curve.*
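For readers who want to follow the interval arithmetic, the Hanley and McNeil standard error for an area under the ROC curve can be sketched as below. The group sizes used in the example (881 problem drinkers and 1636 others) are an assumption derived from a 35% prevalence among 2517 patients, not counts stated in the text, and the function names are illustrative.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil, 1982).

    n_pos: number with the condition; n_neg: number without it.
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation 95% CI around the AUC."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return auc - z * se, auc + z * se

# Assumed group sizes: about 35% of 2517 patients with problem drinking.
low, high = auc_confidence_interval(0.90, n_pos=881, n_neg=1636)
print(f"{low:.3f} to {high:.3f}")
```

Because the inputs are rounded and the group sizes are assumed, the result should be read as an approximation of a published interval rather than an exact reproduction.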
Results
For detecting problem drinking, the single question (with “within the last 3 months” considered positive) had a sensitivity of 86% and a specificity of 86% (Table 2). Sensitivity was higher in men; specificity was higher in women. The question was better in detecting hazardous drinking than alcohol use disorder and more effective in whites than in African Americans. In this study with a prevalence of problem drinking of 35%, the positive and negative predictive values were 77% and 92%. In a clinical setting with a prevalence of 15% (the national prevalence of past-month heavy drinking4), positive and negative predictive values would be 52% and 97%. Likelihood ratios18 are provided in Table 3.
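The dependence of predictive values on prevalence follows from Bayes' theorem and can be checked directly. The sketch below uses the rounded sensitivity and specificity reported above; the function names are illustrative, and the likelihood ratios it computes are for the single dichotomous cut point only, so they need not match the values in Table 3.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity

for prevalence in (0.35, 0.15):
    ppv, npv = predictive_values(0.86, 0.86, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With the same test characteristics, lowering prevalence from 35% to 15% reduces the positive predictive value and raises the negative predictive value, which is why a positive screen in a lower-prevalence setting calls for follow-up questions rather than a diagnosis.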
Breath or blood alcohol levels were obtained for 2335 patients; 257 had some alcohol detected, and 139 had a level of 22 mmol/L (0.1 g/dL) or greater. Of 1493 patients without hazardous drinking or an alcohol use disorder, only 49 had any alcohol detected, a specificity of 96%. However, the sensitivity of alcohol testing was only 24% (198 with a positive breath or blood alcohol level out of 842 with problem drinking). Interestingly, 27 of the patients with an alcohol level higher than 22 mmol/L (indicative of problem drinking) were negative by the criterion standards (22 of the 27 screened positive with the single question).
Tobacco use is common among patients with alcohol use disorders20 and was associated with problem drinking in our study. Almost half the patients in this study used tobacco in some form; 1013 were cigarette smokers, and 139 used only other forms of tobacco. Current tobacco use had a sensitivity of 65% and a specificity of 64% in identifying problem drinking.
The single screening question had 4 answer options (never, >12 months ago, 3-12 months ago, and within the last 3 months). Using the 3 cut points defined by those answer options, the area under the ROC curve21 for identifying problem drinking was 0.90 (95% CI, 0.88-0.91). With a past-year alcohol use disorder as the diagnostic criterion, the area under the ROC curve was 0.81 (95% CI, 0.79-0.83). Entering the single question, sex, age (continuous or ordinal), and tobacco use as independent variables, the area under the ROC curve increased minimally to 0.91 (Figure 1). Entering race/ethnicity, injury severity score,22 or alcohol level into the model had essentially no effect. Considering only current drinkers (n=1535), the single question had a sensitivity for detecting problem drinking of 86% and a specificity of 69%, and the area under the ROC curve was 0.79 (95% CI, 0.77-0.82).
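How 4 ordered answer options yield 3 cut points and an area under the ROC curve can be illustrated with a small sketch. The counts below are invented purely for illustration and are not the study's data; the function names are also illustrative. The area is computed with the trapezoidal rule, which for an ordinal predictor corresponds to the c statistic mentioned in the Statistical Analysis section.

```python
# Answer options ordered from least to most indicative of problem drinking.
CATEGORIES = ["never", ">12 months ago", "3-12 months ago", "within the last 3 months"]

# HYPOTHETICAL counts per category, for illustration only (not study data).
problem_drinkers = [2, 8, 20, 170]
non_problem_drinkers = [120, 100, 50, 30]

def roc_points(pos_counts, neg_counts):
    """Return (false-positive rate, true-positive rate) points, from (0, 0) to (1, 1)."""
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    points = [(0.0, 0.0)]
    # Move the cut point from the strictest rule (only the last category
    # counts as positive) to the most lenient (every answer is positive).
    for k in range(len(pos_counts) - 1, -1, -1):
        tpr = sum(pos_counts[k:]) / n_pos
        fpr = sum(neg_counts[k:]) / n_neg
        points.append((fpr, tpr))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

points = roc_points(problem_drinkers, non_problem_drinkers)
print(points)
print(round(trapezoid_auc(points), 3))
```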
Discussion
A single question about recent heavy drinking has clinically useful sensitivity and specificity for detecting recent hazardous drinking and current alcohol use disorders. In response to “When was the last time you had more than X drinks in 1 day?”, an answer within the past 3 months has a sensitivity and a specificity of 86%. An answer of “never” essentially rules out problem drinking: treating any other answer as positive gives a sensitivity of more than 99%. The question has less utility in African American patients, but works equally well in women and men.
Previous studies have explored the utility of a single question about the frequency of drinking 5 or more drinks at one time. Drinking 5 or more drinks on one occasion at least once in the past year had a sensitivity of 86% and a specificity of 63% in detecting past-year alcohol dependence.23 In an emergency center–based study with a positive screening result defined as heavy drinking at least monthly, sensitivity was 58% and specificity 85%.24 The authors of these reports concluded that a single question about the frequency of heavy drinking was inadequate in screening for problem drinking. However, the question we used inquired about the last occasion of heavy drinking, not the usual frequency, and used different thresholds for men and women, narrowing the sex differences in sensitivity and specificity found in previous work using a single threshold.11
The single question we used compares favorably with the CAGE25 and the Alcohol Use Disorders Identification Test (AUDIT).26 For the single question, the area under the ROC curve is 0.90 for problem drinking and 0.81 for alcohol use disorders only. With the AUDIT,27 the area under the ROC curve was 0.83 to 0.90 in a variety of settings in detecting alcohol use disorders26,28,29 (hazardous drinkers not included) and 0.88 in detecting problem drinking.29 With the CAGE questions the area under the ROC curve was 0.89 in one study25 and 0.68 to 0.88 in a variety of sex-ethnic subgroups in another26 for detecting alcohol use disorders only.
The criterion standards we used are reliable and valid.14,30 Although they were negative in 27 patients with alcohol levels of 22 mmol/L or greater, in whom intoxication may have limited the validity of self-report, 22 of those 27 patients had a positive screening result with the single question.
Limitations
Several limitations of our study should be noted. The interviewer was aware of the patient’s response on the screening question, and this may have led to ascertainment bias. However, the interview was the same for all participants whether their screen produced positive or negative results, and the DIS is a fully structured interview, minimizing this potential bias.
The study is limited by its nonparticipation rate of 30%. Of eligible injured patients from covered emergency department shifts 15.4% were missed, either because interviewers were busy with other participants or because the patient had severe injuries that precluded interview. Another 12.2% declined to participate. The utility of the screening question in these patients is unknown.
The single question did not perform as well for African Americans as it did for whites; its sensitivity and specificity, however, are clinically useful in both groups. Consistent with the population of central Missouri, the study included few members of other ethnic groups, and the question’s utility in those groups needs study.
The generalizability of our findings may be limited. The study included only injured patients presenting for care to emergency centers in central Missouri. However, alcohol-related injury is a major source of morbidity and mortality especially among young adults, and brief interventions in emergency centers are efficacious.31 We have little reason to expect the question used in this study would be less effective in other clinical settings.
We examined only one approach to screening. However, the screening question used in this study was selected in advance and remained unchanged throughout data collection. Some screening instruments32 developed post hoc from a longer list of questions have been validated in separate samples,24 but others33,34 have not.35,36
Given the morbidity associated with hazardous drinking and the efficacy of brief interventions, screening should include hazardous drinking as well as alcohol use disorders, which we did in this study. Studies of other screening instruments have generally tested their utility only in detecting alcohol use disorders, in which the single question was less specific. Although our study did not address the issue, the single question probably would not identify patients in long-term recovery from a past alcohol use disorder, especially those abstinent for more than a year.
Conclusions
Further study is needed to determine whether clinicians will find this single question easier to apply and whether problem drinkers find it more engaging than alternative screening instruments. The goal of screening is to identify problem drinkers and to engage them effectively in the process of change.37 Different screening approaches—and different ways of following up positive screen results—may vary in their ability to help problem drinkers move toward change.
A single question about the last occasion of heavy drinking has clinically useful sensitivity and specificity in detecting hazardous drinkers and current alcohol use disorders. The question is simple enough that it could be used, as is a question about tobacco use,38 as part of the taking of routine vital signs. If the question used in our study were adopted, it could efficiently identify which patients require further discussion about their drinking habits, with positive and negative predictive values of 77% and 92% in this emergency center population and approximately 52% and 97%, respectively, in a typical population-based sample. That in turn could lead to more frequent use of effective brief intervention and referral strategies, thereby potentially decreasing society’s burden of alcohol-related harm.
Acknowledgments
Our study was supported by a grant from the National Institute on Alcohol Abuse and Alcoholism (R01 AA11078). During pilot work, Dr Vinson was supported by a Generalist Physician Faculty Scholars Program grant from the Robert Wood Johnson Foundation. The work has also been supported in part by a research center grant from the American Academy of Family Physicians and by the Opal Lewis Fund for Alcohol Research. Data were collected by Deborah Bailey; Ciprian Crismaru, MD; Amelia Devera-Sales, MD; Kari Gilmore; Indira Gujral; Carol Reidinger; and Carey Smith. Data collection was assisted by the following medical students: Stephen Griffith, Darin Lee, Greg Morlin, Rebecca Shumate, Lindsey Thornton, and Aneesh Tosh. Data management was provided by Darla Horman, MA; Robin Kruse, PhD; and Sandra Taylor. Logistic regression analyses were performed by Jim Hewett, MS; John Hewett, PhD; and Fan Liu. We also thank the personnel of the emergency centers of the University of Missouri, Boone Hospital Center, and Columbia Regional Hospital.
Related resources
FOR PATIENTS:
- “How to Cut Down on Your Drinking,” a brochure for patients from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) http://www.niaaa.nih.gov/publications/handout.htm
- The Center for Substance Abuse Treatment: includes a searchable database of substance abuse treatment programs with maps, phone numbers, types of insurance coverage accepted by each center, and other information. Part of the federal Substance Abuse and Mental Health Services Administration. http://www.samhsa.gov/centers/csat/csat.html
- Alcoholics Anonymous http://www.aa.org
- Moderation Management (seeks to help problem drinkers control their consumption) http://www.moderation.org
- Join Together, an advocacy group addressing alcohol and other drug abuse, violence, etc. Resources for individuals, parents, other groups. http://www.jointogether.org/sa/
FOR FAMILY PHYSICIANS/RESEARCHERS:
- “The Physicians’ Guide to Helping Patients with Alcohol Problems,” also published by NIAAA. http://www.niaaa.nih.gov/publications/physicn.htm
- The Research Society on Alcoholism http://www.rsa.am/
- Association for Medical Education and Research in Substance Abuse http://www.amersa.org/
- American Society of Addiction Medicine http://www.asam.org/
1. Chou SP, Grant BF, Dawson DA. Medical consequences of alcohol consumption—United States, 1992. Alcohol Clin Exp Res 1996;20:1423-29.
2. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270:2207-12.
3. Grant BF, Harford TC, Dawson DA, Chou P, Dufour M, Pickering R. Prevalence of DSM-IV alcohol abuse and dependence: United States, 1992. Alcohol Health Res World 1994;18:243-48.
4. Behavioral Risk Factor Surveillance System Online Prevalence Data. 1999. Available at: www2.cdc.gov/nccdphp/brfss/index.asp. Accessed December 5, 2000.
5. Wallace P, Cutler S, Haines A. Randomised controlled trial of general practitioner intervention in patients with excessive alcohol consumption. BMJ 1988;297:663-68.
6. Fleming MF, Barry KL, Manwell LB, Johnson K, London R. Brief physician advice for problem alcohol drinkers. JAMA 1997;277:1039-44.
7. Bradley KA, Curry SJ, Koepsell TD, Larson EB. Primary and secondary prevention of alcohol problems: US internist attitudes and practices. J Gen Intern Med 1995;10:67-72.
8. Fiellin DA, Reid MC, O’Connor PG. Screening for alcohol problems in primary care. Arch Intern Med 2000;160:1977-89.
9. Wenrich MD, Paauw DS, Carline JD, Curtis JR, Ramsey PG. Do primary care physicians screen patients about alcohol intake using the CAGE questions? J Gen Intern Med 1995;10:631-34.
10. Stange KC, Flocke SA, Goodwin MS. Opportunistic preventive services delivery: are time limitations and patient satisfaction barriers? J Fam Pract 1998;46:419-24.
11. Taj N, Devera-Sales A, Vinson DC. Screening for problem drinking: does a single question work? J Fam Pract 1998;46:328-35.
12. Sanchez-Craig M. Empirically based guidelines for moderate drinking: 1-year results from three studies with problem drinkers. Am J Public Health 1995;85:823-28.
13. National Institute on Alcohol Abuse and Alcoholism. The physician’s guide to helping patients with alcohol problems. Bethesda, Md: National Institutes of Health; 1995. Available at: silk.nih.gov/silk/niaaa1/publication/physicn.htm. Accessed December 5, 2000.
14. Sobell LC, Sobell MB. Timeline follow-back: a technique for assessing self reported alcohol consumption. In: Litten R, Allen J, eds. Measuring alcohol consumption: psychosocial and biochemical methods. Totowa, NJ: Humana Press; 1992;41-72.
15. American Psychiatric Association. Substance-related disorders. In: Diagnostic and Statistical Manual of Mental Disorders. Washington, DC: American Psychiatric Association; 1994;175-204.
16. Robins L, Cottler L, Bucholz K, Compton W. Diagnostic Interview Schedule for DSM-IV. St. Louis, Mo: Washington University School of Medicine, Department of Psychiatry; 1996.
17. Gardner SB, Winter PD, Gardner MJ. Confidence interval analysis. London, England: BMJ; 1989.
18. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature: III. How to use an article about a diagnostic test: B. What are the results and will they help me in caring for my patients? JAMA 1994;271:703-07.
19. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
20. Fleming MF, Manwell LB, Barry KL, Johnson K. At-risk drinking in an HMO primary care sample: prevalence and health policy implications. Am J Public Health 1998;88:90-93.
21. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Boston, Mass: Little, Brown and Company; 1991.
22. Abbreviated injury scale: 1990 revision. Des Plaines, Ill: Association for the Advancement of Automotive Medicine; 1990.
23. Dawson DA. Consumption indicators of alcohol dependence. Addiction 1994;89:345-50.
24. Cherpitel CJ. Differences in performance of screening instruments for problem drinking among blacks, whites and Hispanics in an emergency room population. J Stud Alcohol 1998;59:420-26.
25. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med 1991;115:774-77.
26. Steinbauer JR, Cantor SB, Holzer CE, III, Volk RJ. Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med 1998;129:353-62.
27. Saunders JB, Aasland OG, Babor TF, de la Fuente JR, Grant M. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption—II. Addiction 1993;88:791-804.
28. Barry KL, Fleming MF. The Alcohol Use Disorders Identification Test (AUDIT) and the SMAST-13: predictive validity in a rural primary care sample. Alcohol Alcohol 1993;28:33-42.
29. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT alcohol consumption questions (AUDIT-C): an effective brief screening test for problem drinking. Arch Intern Med 1998;158:1789-95.
30. Hasin D, Paykin A. Alcohol dependence and abuse diagnoses: concurrent validity in a nationally representative sample. Alcohol Clin Exp Res 1999;23:144-50.
31. Gentilello LM, Rivara FP, Donovan DM, et al. Alcohol interventions in a trauma center as a means of reducing the risk of injury recurrence. Ann Surg 1999;230:473-80.
32. Cherpitel CJ. Screening for alcohol problems in the emergency room: a rapid alcohol problems screen. Drug Alcohol Depend 1995;40:133-37.
33. Cyr M, Wartman S. The effectiveness of routine screening questions in the detection of alcoholism. JAMA 1988;259:51-54.
34. Skinner HA, Holt S, Schuller R, Roy J, Israel Y. Identification of alcohol abuse using laboratory tests and a history of trauma. Ann Intern Med 1984;101:847-51.
35. Schorling JB, Willems JP, Klas PT. Identifying problem drinkers: lack of sensitivity of the two-question drinking test. Am J Med 1995;98:232-36.
36. Rumpf HJ, Hapke U, Erfurth A, John U. Screening questionnaires in the detection of hazardous alcohol consumption in the general hospital: direct or disguised assessment? J Stud Alcohol 1998;59:698-703.
37. Rollnick S, Mason P, Butler C. Health behavior change: a guide for practitioners. Edinburgh, Scotland: Churchill Livingstone; 1999.
38. Ahluwalia JS, Gibson CA, Kenney RE, Wallace DD, Resnicow K. Smoking status as a vital sign. J Gen Intern Med 1999;14:402-08.
39. Rehm J, Sempos CT. Alcohol consumption and all-cause mortality. Addiction 1995;90:471-80.
1. Chou SP, Grant BF, Dawson DA. Medical consequences of alcohol consumption—United States, 1992. Alcohol Clin Exp Res 1996;20:1423-29.
2. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270:2207-12.
3. Grant BF, Harford TC, Dawson DA, Chou P, Dufour M, Pickering R. Prevalence of DSM-IV alcohol abuse and dependence: United States, 1992. Alcohol Health Res World 1994;18:243-48.
4. Behavioral Risk Factor Surveillance System Online Prevalence Data. 1999. Available at: www2.cdc.gov/nccdphp/brfss/index.asp. Accessed December 5, 2000.
5. Wallace P, Cutler S, Haines A. Randomised controlled trial of general practitioner intervention in patients with excessive alcohol consumption. BMJ 1988;297:663-68.
6. Fleming MF, Barry KL, Manwell LB, Johnson K, London R. Brief physician advice for problem alcohol drinkers. JAMA 1997;277:1039-44.
7. Bradley KA, Curry SJ, Koepsell TD, Larson EB. Primary and secondary prevention of alcohol problems: US internist attitudes and practices. J Gen Intern Med 1995;10:67-72.
8. Fiellin DA, Reid MC, O’Connor PG. Screening for alcohol problems in primary care. Arch Intern Med 2000;160:1977-89.
9. Wenrich MD, Paauw DS, Carline JD, Curtis JR, Ramsey PG. Do primary care physicians screen patients about alcohol intake using the CAGE questions? J Gen Intern Med 1995;10:631-34.
10. Stange KC, Flocke SA, Goodwin MS. Opportunistic preventive services delivery: are time limitations and patient satisfaction barriers? J Fam Pract 1998;46:419-24.
11. Taj N, Devera-Sales A, Vinson DC. Screening for problem drinking: does a single question work? J Fam Pract 1998;46:328-35.
12. Sanchez-Craig M. Empirically based guidelines for moderate drinking: 1-year results from three studies with problem drinkers. Am J Public Health 1995;85:823-28.
13. National Institute on Alcohol Abuse and Alcoholism. The physician’s guide to helping patients with alcohol problems. Bethesda, Md: National Institutes of Health; 1995. Available atsilk.nih.gov/silk/niaaa1/publication/physicn.htm. Accessed December 5, 2000.
14. Sobell LC, Sobell MB. Timeline follow-back: a technique for assessing self reported alcohol consumption. In: Litten R, Allen J, eds. Measuring alcohol consumption: psychosocial and biochemical methods. Totowa, NJ: Humana Press; 1992;41-72.
15. American Psychiatric Association. Substance-related disorders. In: Diagnostic and Statistical Manual of Mental Disorders. Washington, DC: American Psychiatric Association; 1994;175-204.
16. Robins L, Cottler L, Bucholz K, Compton W. Diagnostic Interview Schedule for DSM-IV. St. Louis, Mo: Washington University School of Medicine, Department of Psychiatry; 1996.
17. Gardner SB, Winter PD, Gardner MJ. Confidence interval analysis. London, England: BMJ; 1989.
18. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature: III. How to use an article about a diagnostic test: B. What are the results and will they help me in caring for my patients? JAMA 1994;271:703-07.
19. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
20. Fleming MF, Manwell LB, Barry KL, Johnson K. At-risk drinking in an HMO primary care sample: prevalence and health policy implications. Am J Public Health 1998;88:90-93.
21. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Boston, Mass: Little, Brown and Company; 1991.
22. Abbreviated injury scale: 1990 revision. Des Plaines, Ill: Associaton for the Advancement of Automotive Medicine; 1990.
23. Dawson DA. Consumption indicators of alcohol dependence. Addiction 1994;89:345-50.
24. Cherpitel CJ. Differences in performance of screening instruments for problem drinking among blacks, whites and Hispanics in an emergency room population. J Stud Alcohol 1998;59:420-26.
25. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med 1991;115:774-77.
26. Steinbauer JR, Cantor SB, Holzer CE, III, Volk RJ. Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med 1998;129:353-62.
27. Saunders JB, Aasland OG, Babor TF, de la Fuente JR, Grant M. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption—II. Addiction 1993;88:791-804.
28. Barry KL, Fleming MF. The Alcohol Use Disorders Identification Test (AUDIT) and the SMAST-13: predictive validity in a rural primary care sample. Alcohol Alcohol 1993;28:33-42.
29. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT alcohol consumption questions (AUDIT-C): an effective brief screening test for problem drinking. Arch Intern Med 1998;158:1789-95.
30. Hasin D, Paykin A. Alcohol dependence and abuse diagnoses: concurrent validity in a nationally representative sample. Alcohol Clin Exp Res 1999;23:144-50.
31. Gentilello LM, Rivara FP, Donovan DM, et al. Alcohol interventions in a trauma center as a means of reducing the risk of injury recurrence. Ann Surg 1999;230:473-80.
32. Cherpitel CJ. Screening for alcohol problems in the emergency room: a rapid alcohol problems screen. Drug Alcohol Depend 1995;40:133-37.
33. Cyr M, Wartman S. The effectiveness of routine screening questions in the detection of alcoholism. JAMA 1988;259:51-54.
34. Skinner HA, Holt S, Schuller R, Roy J, Israel Y. Identification of alcohol abuse using laboratory tests and a history of trauma. Ann Intern Med 1984;101:847-51.
35. Schorling JB, Willems JP, Klas PT. Identifying problem drinkers: lack of sensitivity of the two-question drinking test. Am J Med 1995;98:232-36.
36. Rumpf HJ, Hapke U, Erfurth A, John U. Screening questionnaires in the detection of hazardous alcohol consumption in the general hospital: direct or disguised assessment? J Stud Alcohol 1998;59:698-703.
37. Rollnick S, Mason P, Butler C. Health behavior change: a guide for practitioners. Edinburgh, Scotland: Churchill Livingstone; 1999.
38. Ahluwalia JS, Gibson CA, Kenney RE, Wallace DD, Resnicow K. Smoking status as a vital sign. J Gen Intern Med 1999;14:402-08.
39. Rehm J, Sempos CT. Alcohol consumption and all-cause mortality. Addiction 1995;90:471-80.
The Effect of Cluster Randomization on Sample Size in Prevention Research
METHODS: We performed a cross-sectional study involving data from 46 participating practices with 106 physicians collected using self-administered questionnaires and a chart audit of 100 randomly selected charts per practice. The population was health service organizations (HSOs) located in Southern Ontario. We analyzed performance data for 13 preventive maneuvers determined by chart review and used analysis of variance to determine the intraclass correlation coefficient. An index of “up-to-datedness” was computed for each physician and practice as the number of recommended preventive measures done divided by the number of eligible patients. An index called “inappropriateness” was computed in the same manner for the not-recommended measures. The intraclass correlation coefficients for the 2 key study outcomes (up-to-datedness and inappropriateness) were also calculated and compared.
RESULTS: The mean up-to-datedness score for the practices was 53.5% (95% confidence interval [CI], 51.0%-56.0%), and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). The intraclass correlation for up-to-datedness was 0.0365 compared with inappropriateness at 0.1790. The intraclass correlation for preventive maneuvers ranged from 0.005 for blood pressure measurement to 0.66 for chest radiographs of smokers; as a consequence, the required sample size ranged from 20 to 42 physicians per group.
CONCLUSIONS: Randomizing by practice clusters and analyzing at the level of the physician has important implications for sample size requirements. Larger intraclass correlations indicate interdependence among the physicians within a cluster; as a consequence, variability within clusters is reduced and the required sample size increased. The key finding that many potential outcome measures perform differently in terms of the intracluster correlation reinforces the need for researchers to carefully consider the selection of outcome measures and adjust sample sizes accordingly when the unit of analysis and randomization are not the same.
In conducting research with community-based primary care practices it is often not feasible to randomize individual physicians to the treatment conditions. This is due to problems of potential contamination between intervention and control subjects in the same practice setting or because the success of the intervention demands that all physicians in the practice setting adhere to a guideline. As a result, the practice itself is randomized to the conditions.
The randomization of physicians in groups, rather than each individual separately, has important consequences for sample size, interpretation, and analysis.1-3 It is argued that groups of physicians are likely to be heterogeneous,4 giving rise to a component of variation that one must take into account in the analysis and that one can control only by studying many groups of physicians rather than many physicians within each group.4
Randomizing physicians by cluster and then analyzing the data by physician or patient can bias the results. It has been noted that many studies randomized groups of health professionals (cluster randomization) but analyzed the results by physician, which may overestimate the significance of the observed effects (unit of analysis error).5 Divine and colleagues6 observed that 38 out of 54 studies of physicians’ patient care practices had not appropriately accounted for the clustered nature of the study data. Similarly, Simpson and coworkers7 found that only 4 out of 21 primary prevention trials included sample size calculations or discussions of power that allowed for clustering, while 12 out of 21 took clustering into account in the statistical analysis. When the effect size of the intervention is small to moderate, analyzing results by individual without adjusting for clustering can lead to false conclusions about the significance of the intervention’s effect. For example, Donner and Klar8 show that for the data of Murray and colleagues9 the P value would be .03 if the effect of clustering were ignored, while it was greater than .1 after adjusting for the effect of clustering.
Using baseline data from a successful randomized controlled trial of primary care practices in Southern Ontario, Canada,10 we illustrate the role of the intracluster correlation coefficient (ICC) in determining the required sample size of physicians. The ICC is a measure of variation within and between clusters of physicians. It is a measure of the clustering effect or the lack of independence among the physicians that make up the cluster. The smaller the ICC, the more likely the physicians in the cluster behave independently, and analysis at the level of the physician can proceed without significant adjustment to sample size. The higher the ICC, the more closely the measure quantifies class or group rather than the individual physician, and the effective sample size decreases toward the number of clusters rather than the number of individuals. Our objective was to provide information on the cluster effect of measuring the performance of various preventive maneuvers between groups of physicians to enable other researchers in the area of primary care prevention to avoid errors.
Methods
As part of a larger clinical trial to improve preventive practice, we conducted a cross-sectional study to provide a point estimate of preventive performance in capitation primary care practices. We chose the preventive maneuvers from the Canadian Task Force on the Periodic Health Examination.11 According to their classification system, there is randomized clinical trial evidence to support “A” level (highly recommended) maneuvers and cohort and case-control studies to support “B” (recommended) maneuvers. The task force also reviewed the quality of evidence for maneuvers that should not be done and identified these as “D” level maneuvers. Eight A and B level recommendations and 5 D level recommendations were identified by a panel of practicing family physicians. Selection criteria included the need to represent a broad spectrum of preventive interventions for both male and female patients of all ages, and the need to address diseases that were clinically important. The 8 recommended and 5 inappropriate maneuvers chosen for our study are listed in Table 1.
This study was conducted in 72 community-based health service organizations (HSOs) in Ontario located at 100 different sites primarily in the Toronto, Hamilton, and London areas in the spring of 1997. The Ottawa Civic Hospital research ethics committee approved our study.
Data Collection
Practice and physician characteristics were collected using a self-administered questionnaire, to which 98% (106 of 108) of participating physicians responded (the questionnaire items are listed in Table 2). Preventive performance at the physician and overall practice level was determined using a chart audit.
Chart Audit. Patient charts were eligible for inclusion in the medical audit if they were for patients who were aged 17 years or older on the date of last visit and had visited the HSO at least once in the 2 years before the audit. The variables collected from the charts included demographic and patient characteristics as well as indicators of performance of preventive maneuvers.
The chart auditors obtained a list of patients within an HSO practice group of physicians and then randomly selected charts using computer-generated random numbers. The patient list was either constructed by the auditors or by using the medical office record computer system. The list included all rostered and nonrostered patients. Unique chart numbers or numeric identifiers were assigned to each patient. The required number of charts was randomly selected from the sampling frame, the chart was pulled, and eligibility for inclusion was determined. The auditors proceeded to find charts at random from the sampling frame until they obtained 100 eligible charts per practice.
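The sampling procedure described above can be sketched in a few lines. This is only an illustration, not the auditors' actual software: the function name `sample_eligible_charts`, the `is_eligible` callback, and the `seed` parameter are hypothetical, and eligibility is assumed to be decidable once a chart is pulled.

```python
import random


def sample_eligible_charts(chart_ids, is_eligible, target=100, seed=None):
    """Draw charts at random, without replacement, until `target` eligible
    charts are found or the sampling frame is exhausted."""
    rng = random.Random(seed)
    order = list(chart_ids)
    rng.shuffle(order)  # random order over the whole sampling frame
    selected = []
    for chart_id in order:
        if is_eligible(chart_id):  # eligibility is checked after the chart is pulled
            selected.append(chart_id)
            if len(selected) == target:
                break
    return selected


# Hypothetical frame of 500 charts in which only even-numbered charts are eligible.
charts = sample_eligible_charts(range(1, 501), lambda c: c % 2 == 0, seed=1)
```

Shuffling the full frame once and walking through it is equivalent to repeatedly drawing a new random chart number and skipping numbers already drawn.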
To verify the quality of the data entered from the 100 randomly selected charts and to measure the inter-rater reliability between auditors, 20% of each HSO’s audited charts were independently verified by another auditor. If coding discrepancies were found in more than 5 of 20 charts, the entire 100 charts were audited and verified again.
Data Analysis. Our analysis with SPSS software version 8.0 (SPSS Inc, Chicago, Ill) focused primarily on calculating the extent to which each preventive maneuver was being performed according to the recommendations of the Canadian Task Force on the Periodic Health Exam. An index of “up-to-datedness” was computed for each physician and practice as the number of A and B preventive measures done divided by the number of eligible A and B measures. In addition, an index called “inappropriateness” was computed in the same manner to represent the D measures.
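As a minimal sketch of how the two indexes could be computed, assume each audit record is a tuple of (evidence level, number of eligible measures, number of measures done). The record layout and the function name `preventive_indexes` are assumptions made for illustration; the study itself used SPSS.

```python
def preventive_indexes(records):
    """Return up-to-datedness (A and B measures) and inappropriateness
    (D measures) as done / eligible for one physician or practice."""
    totals = {"recommended": [0, 0], "inappropriate": [0, 0]}  # [done, eligible]
    for level, eligible, done in records:
        key = "inappropriate" if level == "D" else "recommended"
        totals[key][0] += done
        totals[key][1] += eligible

    def ratio(done, eligible):
        # No eligible measures means the index is undefined.
        return done / eligible if eligible else None

    return {
        "up_to_datedness": ratio(*totals["recommended"]),
        "inappropriateness": ratio(*totals["inappropriate"]),
    }


# Hypothetical audit counts for one physician.
example = preventive_indexes([("A", 10, 6), ("B", 10, 4), ("D", 20, 5)])
```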
Frequencies and descriptive statistics were generated on all variables, and each variable was checked for data entry errors and inappropriate or illogical responses. Means and standard deviations were computed for continuous variables and frequency distributions were computed for categorical variables, such as sex and age group. In addition, chi-square tests were used to compare the background characteristics of participating and nonparticipating HSO physicians. Ninety-five percent confidence intervals were calculated for the mean preventive indexes. Finally, κ was computed as a measure of reliability between the 2 chart auditors.
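The text does not say which form of the kappa statistic was used for auditor reliability. Assuming the common unweighted Cohen's kappa for two raters, a sketch looks like this; the function name `cohens_kappa` and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes (1 = maneuver documented, 0 = not documented) for 4 charts.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```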
The ICC was calculated for sample cluster means12 of up-to-datedness and inappropriateness. The practice characteristic data revealed that the mean cluster size in terms of number of physicians per practice was 2.8 with a variance of 3.6 and a total of 106 physicians across 46 practices. To determine the between-subjects (practices) variance ($s_b^2$) and within-subjects (practices) variance ($s_w^2$) for the ICC calculation, the one-way analysis of variance (ANOVA) procedure was run on both measures (up-to-datedness and inappropriateness) as well as on each of the preventive maneuvers separately.13 The ICC ($\rho$) was computed from the F statistic of the one-way ANOVA and the adjusted cluster size as follows: $\rho = (F - 1)/(F + n_0 - 1)$, where $n_0$ is the mean practice size adjusted for the variance in practice size, $n_0 = 2.8 - 3.6/106$.
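The calculation above is short enough to sketch directly. The function names below are hypothetical, and the mean squares in the usage line are illustrative values rather than study data.

```python
def adjusted_cluster_size(mean_size, size_variance, total_n):
    """Adjusted cluster size: mean size minus size variance over total N."""
    return mean_size - size_variance / total_n


def icc_from_anova(ms_between, ms_within, n0):
    """ICC from one-way ANOVA mean squares: (F - 1) / (F + n0 - 1)."""
    if ms_within <= 0:
        raise ValueError("within-cluster mean square must be positive")
    f_stat = ms_between / ms_within
    return (f_stat - 1) / (f_stat + n0 - 1)


# Cluster-size figures reported in the text: mean 2.8, variance 3.6, 106 physicians.
n0 = adjusted_cluster_size(2.8, 3.6, 106)
# Illustrative mean squares only (F = 2).
icc = icc_from_anova(ms_between=2.0, ms_within=1.0, n0=n0)
```

When F equals 1 the between-practice and within-practice mean squares coincide and the ICC is 0; an F below 1 yields a negative estimate, which some analysts truncate at 0.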
Finally, using the formula by Donner and coworkers14 and the ICC, the sample size for comparing 2 independent groups allowing for clustering for both up-to-datedness and inappropriateness was determined. The formula is: $n = 2(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2 [1 + (\bar{x}_c - 1)\rho]/\delta^2$, where $n$ is the per group sample size; $Z_{\alpha/2} = 1.96$ and $Z_{\beta} = 0.84$ are the standard normal percentiles for the type I and type II error rates at 0.05 and 0.20, respectively; $\sigma$ is the standard deviation in the outcome variable; $\delta$ is the expected difference between the two means; $\bar{x}_c$ is the average cluster size; and $\rho$ is the ICC. The sample size calculations were based on an expected difference of 0.09 between groups (with 80% power and 5% significance) and a standard deviation of 0.10. Table 3 shows the effect on per-group sample size for varying ICC values and average cluster sizes.
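A small function makes the sample size formula easy to explore. This is a sketch under the stated inputs (standard deviation 0.10, expected difference 0.09, percentiles 1.96 and 0.84); the function name and the decision to return an unrounded value are assumptions, since the text does not state how results were rounded or exactly which cluster size entered each calculation.

```python
def per_group_sample_size(sd, delta, icc, mean_cluster_size,
                          z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for comparing 2 means, inflated for clustering."""
    # Inflation factor for clustering: 1 + (average cluster size - 1) * ICC.
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 * design_effect / delta ** 2


# No clustering (ICC = 0): about 19.4 physicians per group before rounding.
n_independent = per_group_sample_size(sd=0.10, delta=0.09, icc=0.0,
                                      mean_cluster_size=2.8)
# ICC reported for up-to-datedness with an average cluster size of 2.8.
n_clustered = per_group_sample_size(sd=0.10, delta=0.09, icc=0.0365,
                                    mean_cluster_size=2.8)
```

Because the inflation factor is linear in the ICC, doubling the ICC doubles the extra physicians needed beyond the unclustered requirement.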
Results
A total of 46 HSOs were recruited out of a possible 100 sites, for a response rate of 46% at baseline. The response rate to the physician questionnaire was 98% (106 of 108). Physicians in practices that agreed to participate differed significantly from those who did not. Participating physicians were younger, having graduated in 1977 on average compared with 1971 (t=4.58 [df=191], P<.001) and were more likely to be women, 30.4% compared with 9.9% for nonparticipating physicians (χ²=11.09 [df=1, N=193], P=.001). Table 2 provides descriptive information on practice and physician characteristics. Five practices of 46 needed to have the entire 100 charts re-audited. Final concordance between the 2 auditors for each practice verification was 85% (κ=.71).
The mean up-to-datedness score for the practices or the mean proportion of A and B maneuvers performed was 53.5% (95% confidence interval [CI], 51.0%-56.0%) and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). In other words, on average, 53.5% of patients eligible for recommended preventive maneuvers received them and 21.5% of eligible patients received inappropriate preventive maneuvers.
Table 1 gives the practice mean square, the error mean square, the ICC, and the required sample size per group for the overall measures of up-to-datedness and inappropriateness as well as for 13 preventive maneuvers individually. For inappropriateness, there was more variability between practices than within practices among physicians, resulting in a larger practice mean square and a significant F statistic (P <.05). For up-to-datedness, the variability within practices among physicians was greater than the variability between practices, although not significantly so. Table 1 shows the intraclass correlation as 0.0365 for up-to-datedness and 0.1790 for inappropriateness. Inappropriateness scores were not normally distributed, and 2 physicians had scores greater than 0.60. However, with these extreme outliers removed, the ICC for inappropriateness remained high at 0.14.
The ICC ranges from 0.005 for blood pressure measurement to 0.66 for chest x-rays of smokers. The variability between and within group clusters is the same for blood pressure measurement. For chest x-rays of smokers the variability between clusters is very significant and within clusters it is small, indicating that some practice clusters perform a larger number of chest x-rays on smokers than other practices. However, the performance of chest x-rays was not normally distributed, with 79% of physicians not performing them and one solo physician with an extreme score of 0.53. With this extreme outlier removed the ICC for chest x-rays was 0.25, with a mean square between practices of 0.0024 and a mean square within practices of 0.0012 (P<.01). Table 1 shows the effect on sample size for analysis at the level of the physician as the ICC varies.
Discussion
Statistical theory points to the consequences of cluster randomization as a reduction in effective sample size. This occurs because the individuals within a cluster cannot be regarded as independent. The precise effect of cluster randomization on sample size requirements depends on both the size of the cluster and the degree of within-cluster dependence as measured by ICC.2 Cluster randomized trials are increasingly being used in health services research particularly for evaluating interventions involving organizational changes when it is not feasible to randomize at the level of the individual. Cluster randomization at the level of the practice minimizes the potential for contamination between treatment and control groups. However, the statistical power of a cluster randomized trial when the unit of randomization is the practice and the unit of analysis is the health professional can be greatly reduced in comparison to an individually randomized trial.15
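The reduction in effective sample size can be made concrete with the standard design-effect relationship, in which the number of individuals is divided by 1 + (average cluster size - 1) × ICC. The sketch below applies it to figures quoted in this article (106 physicians, average cluster size 2.8, ICC 0.1790); the function name is hypothetical, and the calculation is illustrative rather than one reported by the authors.

```python
def effective_sample_size(n_individuals, mean_cluster_size, icc):
    """Number of independent observations that n clustered individuals are worth."""
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return n_individuals / design_effect


# With the ICC reported for inappropriateness, 106 physicians carry roughly
# the information of 80 independent physicians.
n_eff = effective_sample_size(106, 2.8, 0.1790)
```

At an ICC of 0 nothing is lost, and at an ICC of 1 the effective sample size falls to the number of clusters.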
To preserve power the researcher should, whenever possible, ensure that the unit of randomization and the unit of analysis are the same.16 In this manner standard statistical tests can be used. Often this is not possible given secondary research questions that may be targeted to the health professionals within the practice and not the practice as a whole. If data are analyzed at the level of the individual and not at the level of the cluster (in effect ignoring the clustering effect), then there is a strong possibility that P values will be artificially extreme and confidence intervals will be overly narrow, increasing the chances of spuriously significant findings and misleading conclusions.15 When using the individual physician as the unit of analysis, one must take into account the correlation between responses of individuals within the same cluster. For continuous outcome variables that are normally distributed, a mixed-effects analysis of variance (or covariance) is appropriate, with clusters nested within the comparison groups.17 For dichotomous variables, Donner and Klar suggest that an adjusted chi-square test be used.8 Although we focus on the issue of clustering for study designs using random allocation, the issue of clustering is also apparent in cross-sectional and cohort studies, where the practice-level and/or physician-level factors may have an impact on patient-level data. Researchers need to be aware of the possibility of intracluster correlation and the implications for analysis in these studies as well.18
In the example presented, the ICC for the outcome measure “up-to-datedness” was approximately 0.04 in contrast to the ICC for inappropriateness, which was 0.18. The required sample size per group for the outcome measure “up-to-datedness” would be 21 physicians compared with inappropriateness, where the sample size would be 25 per group. In contrast, if the study dealt with improving smoking cessation counseling or reducing chest x-rays in smokers, the sample size would be 27 or 42 physicians per group. Treating the unit of analysis and the unit of randomization the same would require only 19 physicians per group.
Campbell and colleagues19 looked at a number of primary and secondary care study data sets and found that ICCs for measures in primary care were generally between 0.05 and 0.15. In contrast, in this study the ICCs ranged from 0.005 to 0.66, depending on the measure. The difference in ICC between measures and across studies is interesting, and we can only speculate why some measures show more interdependence. It is possible that inappropriateness taps phenomena such as policies at the practice level that physicians cannot easily influence, while up-to-datedness may reflect how physicians, even when working in the same practice setting, behave independently when it comes to delivering recommended preventive care. One should not assume that, because one measure shows independence, all measures under study show the same independence. For example, blood pressure measurement and urine proteinuria screening are different in terms of ICC. Differences between outcome measures should be taken into account when calculating required sample size and in statistical analysis when the unit of randomization and analysis are not the same.
Limitations
There are 2 limitations with this research. First, analysis of respondents and nonrespondents to the recruitment effort showed that the study participants were younger and more likely to be women. This implies that our findings may not be generalizable to the HSO population as a whole. Second, the measures of preventive performance were based on a chart audit and as a consequence are susceptible to the potential problems associated with chart documentation. A low level of preventive performance does not necessarily mean that prevention is not being practiced or that it is being performed inconsistently within a group practice. It may indicate that a less sophisticated documentation process is being used.
Conclusion
Physicians clustered together in the same practice do not necessarily perform the delivery of preventive services equally. As demonstrated by the measure “up-to-datedness,” there is relatively little correlation among physicians working together for performance of many preventive maneuvers. For some maneuvers, most notably those that may be automatically performed as part of practice policy, there is modest correlation among physicians who work together. We hope that these findings assist other researchers in their decision making around the need to adjust sample sizes for the effect of clustering.
1. Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol 1978;108:100-02.
2. Donner A. An empirical study of cluster randomization. Int J Epidemiol 1982;11:283-86.
3. Kerry SM, Bland JM. The intracluster correlation coefficient in cluster randomisation. BMJ 1998;316:1455.
4. Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On design considerations and randomization based inference for community intervention trials. Stat Med 1996;15:1069-92.
5. Bero LA, Grilli R, Grimshaw JM, Harvey E, Oxman AD, Thomson MA. Closing the gap between research and practice: an overview of systematic reviews of interventions to promote the implementation of research findings: the Cochrane Effective Practice and Organization of Care Review Group. BMJ 1998;317:465-86.
6. Divine GW, Brown JT, Frazier LM. The unit of analysis error in studies about physicians’ patient care behavior. J Gen Intern Med 1992;7:623-29.
7. Simpson JM, Klar N, Donner A. Accounting for cluster randomization: a review of primary prevention trials, 1990 through 1993. Am J Public Health 1995;85:1378-83.
8. Donner A, Klar N. Methods for comparing event rates in intervention studies when the unit of allocation is the cluster. Am J Epidemiol 1994;140:279-89.
9. Murray DM, Perry CL, Griffen G, et al. Results from a statewide approach to adolescent tobacco use prevention. Prev Med 1992;21:449-72.
10. Lemelin J, Hogg W, Baskerville B. Evidence to action: a tailored multi-faceted approach to changing family physician practice patterns and improving preventive care. CMAJ. In press.
11. Canadian Task Force on the Periodic Health Examination. The Canadian guide to clinical preventive health care. Ottawa, Canada: Health Canada; 1994.
12. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
13. Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ 1996;313:41-42.
14. Donner A, Birkett N, Buck C. Randomization by cluster: sample size requirements and analysis. Am J Epidemiol 1981;114:906-14.
15. Campbell MK, Grimshaw JM. Cluster randomised trials: time for improvement. BMJ 1998;317:1171-72.
16. Bland JM, Kerry SM. Trials randomised in clusters. BMJ 1997;315:600.
17. Koepsell TD, Martin DC, Diehr PH, et al. Data analysis and sample size issues in evaluations of community based health promotion and disease prevention programs: a mixed-model analysis of variance approach. Am J Public Health 1995;85:1378-83.
18. Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med 1994;13:61-78.
19. Campbell M, Grimshaw J, Steen N. Sample size calculations for cluster randomised trials. J Health Serv Res Policy 2000;5:12-16.
METHODS: We performed a cross-sectional study involving data from 46 participating practices with 106 physicians collected using self-administered questionnaires and a chart audit of 100 randomly selected charts per practice. The population was health service organizations (HSOs) located in Southern Ontario. We analyzed performance data for 13 preventive maneuvers determined by chart review and used analysis of variance to determine the intraclass correlation coefficient. An index of “up-to-datedness” was computed for each physician and practice as the number of a recommended preventive measures done divided by the number of eligible patients. An index called “inappropriatness” was computed in the same manner for the not-recommended measures. The intraclass correlation coefficients for the 2 key study outcomes (up-to-datedness and inappropriateness) were also calculated and compared.
RESULTS: The mean up-to-datedness score for the practices was 53.5% (95% confidence interval [CI], 51.0%-56.0%), and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). The intraclass correlation for up-to-datedness was 0.0365 compared with inappropriateness at 0.1790. The intraclass correlation for preventive maneuvers ranged from 0.005 for blood pressure measurement to 0.66 for chest radiographs of smokers, and as a consequence required that the sample size ranged from 20 to 42 physicians per group.
CONCLUSIONS: Randomizing by practice clusters and analyzing at the level of the physician has important implications for sample size requirements. Larger intraclass correlations indicate interdependence among the physicians within a cluster; as a consequence, variability within clusters is reduced and the required sample size increased. The key finding that many potential outcome measures perform differently in terms of the intracluster correlation reinforces the need for researchers to carefully consider the selection of outcome measures and adjust sample sizes accordingly when the unit of analysis and randomization are not the same.
In conducting research with community-based primary care practices it is often not feasible to randomize individual physicians to the treatment conditions. This is due to problems of potential contamination between intervention and control subjects in the same practice setting or because the success of the intervention demands that all physicians in the practice setting adhere to a guideline. As a result, the practice itself is randomized to the conditions.
The randomization of physicians in groups, rather than each individual separately, has important consequences for sample size, interpretation, and analysis.1-3 It is argued that groups of physicians are likely to be heterogeneous,4 giving rise to a component of variation that one must take into account in the analysis and that one can control only by studying many groups of physicians rather than many physicians within each group.4
Randomizing physicians by cluster and then analyzing the data by physician or patient has the potential to introduce possible bias in the results. It has been noted that many studies randomized groups of health professionals (cluster randomization) but analyzed the results by physician, thus resulting in a possible overestimation of the significance of the observed effects (unit of analysis error).5 Divine and colleagues6 observed that 38 out of 54 studies of physicians’ patient care practices had not appropriately accounted for the clustered nature of the study data. Similarly, Simpson and coworkers7 found that only 4 out of 21 primary prevention trials included sample size calculations or discussions of power that allowed for clustering, while 12 out of 21 took clustering into account in the statistical analysis. When the effect size of the intervention is small to moderate, analyzing results by individual without adjusting for the cluster phenomena can lead to false conclusions about the significance of the effectiveness of the intervention. For example, Donner and Klar8 show that for the data of Murray and colleagues9 the P value would be .03 if the effect of clustering were ignored, while it was greater than .1 after adjusting for the effect of clustering.
Using baseline data from a successful randomized controlled trial of primary care practices in Southern Ontario, Canada,10 we will explain the intracluster correlation coefficient (ICC) in determining the required sample size of physicians. The ICC is a measure of variation within and between clusters of physicians. It is a measure of the clustering effect or the lack of independence among the physicians that make up the cluster. The smaller the ICC, the more likely the physicians in the cluster behave independently, and analysis at the level of the physician can proceed without significant adjustment to sample size. The higher the ICC, the more closely the measure quantifies class or group rather than the individual physician, and the effective sample size is decreased to the number of classes rather than the number of individuals. Our objective was to provide information on the cluster effect of measuring the performance of various preventive maneuvers between groups of physicians to enable other researchers in the area of primary care prevention to avoid errors.
Methods
As part of a larger clinical trial to improve preventive practice, we conducted a cross-sectional study to provide a point estimate of preventive performance in capitation primary care practices. We chose the preventive maneuvers from the Canadian Task Force on the Periodic Health Examination.11 According to their classification system, there is randomized clinical trial evidence to support “A” level (highly recommended) maneuvers and cohort and case controlled studies to support “B” (recommended) maneuvers. The task force also reviewed the quality of evidence for maneuvers that should not be done and identified these as “D” level maneuvers. Eight A and B level recommendations and 5 D level recommendations were identified by a panel of practicing family physicians. Selection criteria included the need to represent a broad spectrum of preventive interventions for both men and women patients of all ages, and the need to address diseases that were clinically important. The 8 recommended and 5 inappropriate maneuvers chosen for our study are listed in Table 1.
This study was conducted in 72 community-based health service organizations (HSOs) in Ontario located at 100 different sites primarily in the Toronto, Hamilton, and London areas in the spring of 1997. The Ottawa Civic Hospital research ethics committee approved our study.
Data Collection
Practice and physician characteristics were collected using a self-administered questionnaire to which 96% of 108 participating physicians responded (Table 2 has the questionnaire items). Preventive performance at the physician and overall practice level was determined using a chart audit.
Chart Audit. Patient charts were eligible for inclusion in the medical audit if they were for patients who were aged 17 years or older on the date of last visit and had visited the HSO at least once in the 2 years before the audit. The variables collected from the charts included demographic and patient characteristics as well as indicators of performance of preventive maneuvers.
The chart auditors obtained a list of patients within an HSO practice group of physicians and then randomly selected charts using computer-generated random numbers. The patient list was either constructed by the auditors or by using the medical office record computer system. The list included all rostered and nonrostered patients. Unique chart numbers or numeric identifiers were assigned to each patient. The required number of charts was randomly selected from the sampling frame, the chart was pulled, and eligibility for inclusion was determined. The auditors proceeded to find charts at random from the sampling frame until they obtained 100 eligible charts per practice.
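The sampling procedure described above can be sketched in a few lines. This is only an illustration, not the auditors' actual software: the function name `sample_eligible_charts`, the `is_eligible` predicate (standing in for the age and recent-visit criteria), and the `seed` parameter are all hypothetical.

```python
import random


def sample_eligible_charts(chart_ids, is_eligible, target=100, seed=None):
    """Draw charts in random order until `target` eligible charts are found.

    chart_ids   -- the sampling frame (unique chart numbers)
    is_eligible -- hypothetical predicate applied once a chart is pulled
    target      -- number of eligible charts wanted per practice
    """
    rng = random.Random(seed)
    order = list(chart_ids)
    rng.shuffle(order)  # random order, sampling without replacement
    selected = []
    for chart_id in order:
        if is_eligible(chart_id):  # eligibility is checked after the chart is pulled
            selected.append(chart_id)
            if len(selected) == target:
                break
    return selected
```

If the frame holds fewer than `target` eligible charts, the function simply returns all of them; how the auditors handled that case is not stated in the text.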
To verify the quality of the data entered from the 100 randomly selected charts and to measure the inter-rater reliability between auditors, 20% of each HSO’s audited charts were independently verified by another auditor. If coding discrepancies were found in more than 5 of 20 charts, the entire 100 charts were audited and verified again.
Data Analysis. Our analysis with SPSS software version 8.0 (SPSS Inc, Chicago, Ill) focused primarily on calculating the extent to which each preventive maneuver was being performed according to the recommendations of the Canadian Task Force on the Periodic Health Exam. An index of “up-to-datedness” was computed for each physician and practice as the number of A and B preventive measures done divided by the number of eligible A and B measures. In addition, an index called “inappropriateness” was computed in the same manner to represent the D measures.
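As a rough illustration of how such an index can be computed, the sketch below pools counts across a physician's audited charts. The record layout `(physician_id, eligible_count, done_count)` and the choice to pool counts before dividing (rather than averaging per-chart ratios) are assumptions; the text does not specify either.

```python
from collections import defaultdict


def index_by_physician(records):
    """Return {physician_id: maneuvers done / maneuvers eligible}.

    records -- iterable of (physician_id, eligible_count, done_count),
               one tuple per audited chart (hypothetical layout).
    The same function serves for up-to-datedness (A and B maneuvers)
    and for inappropriateness (D maneuvers), depending on what is counted.
    """
    totals = defaultdict(lambda: [0, 0])  # physician -> [eligible, done]
    for physician, eligible, done in records:
        totals[physician][0] += eligible
        totals[physician][1] += done
    # The index is undefined (None) when no maneuver was eligible.
    return {p: (d / e if e else None) for p, (e, d) in totals.items()}
```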
Frequencies and descriptive statistics were generated on all variables, and each variable was checked for data entry errors and inappropriate or illogical responses. Means and standard deviations were computed for continuous variables and frequency distributions were computed for categorical variables, such as sex and age group. In addition, chi-square tests were used to compare the background characteristics of participating and nonparticipating HSO physicians. Ninety-five percent confidence intervals were calculated for the mean preventive indexes. Finally, κ was computed as a measure of reliability between the 2 chart auditors.
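For readers unfamiliar with the reliability statistic, here is a minimal sketch of Cohen's kappa for two raters. The text does not say which kappa variant was used, so treating it as the unweighted two-rater Cohen's kappa is an assumption, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the percentage concordance computed on the same data.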
The ICC was calculated for sample cluster means12 of up-to-datedness and inappropriateness. The practice characteristic data revealed that the mean cluster size in terms of number of physicians per practice was 2.8 with a variance of 3.6 and a total of 106 physicians across 46 practices. To determine the between-subjects (practices) variance ($s_b^2$) and within-subjects (practices) variance ($s_w^2$) for the ICC calculation, the one-way analysis of variance (ANOVA) procedure was run on both measures (up-to-datedness and inappropriateness) as well as each of the preventive maneuvers separately.13 The ICC ($\rho$) was computed from the F statistic of the one-way ANOVA and the adjusted cluster size as follows: $\rho = (F - 1)/(F + n_0 - 1)$, where $n_0$ is the adjusted mean practice size (mean practice size [2.8] minus practice size variance [3.6] divided by the total number of physicians [106]).
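The calculation just described can be sketched as follows. The function names are illustrative, and the hand-rolled one-way ANOVA stands in for the SPSS procedure the authors used; it assumes at least two practices and some within-practice variation.

```python
def adjusted_cluster_size(mean_size, size_variance, n_total):
    """Adjusted cluster size: mean size minus (variance of sizes / total N)."""
    return mean_size - size_variance / n_total


def one_way_anova_f(groups):
    """F statistic (between mean square / within mean square).

    groups -- list of lists, one list of physician scores per practice.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


def icc_from_f(f_stat, n0):
    """ICC = (F - 1) / (F + n0 - 1), as stated in the text."""
    return (f_stat - 1.0) / (f_stat + n0 - 1.0)
```

With the figures given in the text, `adjusted_cluster_size(2.8, 3.6, 106)` is about 2.77. An F statistic of exactly 1 (equal between- and within-practice mean squares) yields an ICC of 0.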
Finally, using the formula by Donner and coworkers14 and the ICC, the sample size for comparing 2 independent groups allowing for clustering for both up-to-datedness and inappropriateness was determined. The formula is: $n = 2(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2 [1 + (\bar{x}_c - 1)\rho] / \delta^2$, where $n$ is the per-group sample size; $Z_{\alpha/2} = 1.96$ and $Z_{\beta} = 0.84$ are the standard normal percentiles for the type I and type II error rates at 0.05 and 0.20, respectively; $\sigma$ is the standard deviation in the outcome variable; $\delta$ is the expected difference between the two means; $\bar{x}_c$ is the average cluster size; and $\rho$ is the ICC. The sample size calculations were based on an expected difference of 0.09 between groups (with 80% power and 5% significance) and a standard deviation of 0.10. Table 3 shows the effect on per-group sample size for varying ICC values and average cluster sizes.
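A minimal sketch of this sample size calculation follows. The function name and argument names are illustrative. Two details are inferences rather than statements from the text: that the adjusted cluster size (2.8 minus 3.6/106) was used as the average cluster size, and that results were rounded to the nearest whole physician. Under those assumptions the sketch reproduces the per-group figures quoted in the Discussion.

```python
def per_group_sample_size(sd, delta, icc, cluster_size,
                          z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for comparing 2 means, allowing for clustering.

    sd           -- standard deviation of the outcome
    delta        -- expected difference between the two group means
    icc          -- intracluster correlation coefficient
    cluster_size -- average number of physicians per practice
    """
    unadjusted = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    inflation = 1 + (cluster_size - 1) * icc  # variance inflation from clustering
    return unadjusted * inflation


# Assumed inputs: adjusted cluster size, SD 0.10, expected difference 0.09.
n0 = 2.8 - 3.6 / 106
for icc in (0.0, 0.0365, 0.1790, 0.66):
    n = per_group_sample_size(sd=0.10, delta=0.09, icc=icc, cluster_size=n0)
    print(f"ICC={icc:.4f}  n per group = {n:.2f}")
```

In practice many researchers round sample sizes up rather than to the nearest integer, which would add one physician per group in some of these cases.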
Results
A total of 46 HSOs were recruited out of a possible 100 sites, for a response rate of 46% at baseline. The response rate to the physician questionnaire was 98% (106 of 108). Physicians in practices that agreed to participate differed significantly from those who did not. Participating physicians were younger, having graduated in 1977 on average compared with 1971 (t=4.58 [df=191], P<.001) and were more likely to be women, 30.4% compared with 9.9% for nonparticipating physicians (χ²=11.09 [df=1, N=193], P=.001). Table 2 provides descriptive information on practice and physician characteristics. Five practices of 46 needed to have the entire 100 charts re-audited. Final concordance between the 2 auditors for each practice verification was 85% (κ=.71).
The mean up-to-datedness score for the practices or the mean proportion of A and B maneuvers performed was 53.5% (95% confidence interval [CI], 51.0%-56.0%) and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). In other words, on average, 53.5% of patients eligible for recommended preventive maneuvers received them and 21.5% of eligible patients received inappropriate preventive maneuvers.
Table 1 gives the practice mean square, the error mean square, the ICC, and the required sample size per group for the overall measures of up-to-datedness and inappropriateness as well as for 13 preventive maneuvers individually. For inappropriateness, there was more variability between practices than within practices among physicians, resulting in a larger practice mean square and a significant F statistic (P <.05). For up-to-datedness, the variability within practices among physicians was greater than the variability between practices, although not significantly so. Table 1 shows the intraclass correlation as 0.0365 for up-to-datedness and 0.1790 for inappropriateness. Inappropriateness scores were not normally distributed, and 2 physicians had scores greater than 0.60. However, with these extreme outliers removed, the ICC for inappropriateness remained high at 0.14.
The ICC ranges from 0.005 for blood pressure measurement to 0.66 for chest x-rays of smokers. The variability between and within group clusters is the same for blood pressure measurement. For chest x-rays of smokers the variability between clusters is very significant and within clusters it is small, indicating that some practice clusters perform a larger number of chest x-rays on smokers than other practices. However, the performance of chest x-rays was not normally distributed, with 79% of physicians not performing them and one solo physician with an extreme score of 0.53. With this extreme outlier removed the ICC for chest x-rays was 0.25, with a mean square between practices of 0.0024 and a mean square within practices of 0.0012 (P<.01). Table 1 shows the effect on sample size for analysis at the level of the physician as the ICC varies.
Discussion
Statistical theory points to the consequences of cluster randomization as a reduction in effective sample size. This occurs because the individuals within a cluster cannot be regarded as independent. The precise effect of cluster randomization on sample size requirements depends on both the size of the cluster and the degree of within-cluster dependence as measured by ICC.2 Cluster randomized trials are increasingly being used in health services research particularly for evaluating interventions involving organizational changes when it is not feasible to randomize at the level of the individual. Cluster randomization at the level of the practice minimizes the potential for contamination between treatment and control groups. However, the statistical power of a cluster randomized trial when the unit of randomization is the practice and the unit of analysis is the health professional can be greatly reduced in comparison to an individually randomized trial.15
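The reduction in effective sample size can be made concrete with the variance inflation factor that appears in the sample size formula, commonly called the design effect (a standard name, not one used in this article). The worked figures below assume the 106 participating physicians, the reported mean cluster size of 2.8, and the two overall ICCs reported in the Results; they are an illustration, not a result from the study.

```latex
\[
\mathrm{DE} = 1 + (\bar{x}_c - 1)\,\rho ,
\qquad
N_{\mathrm{eff}} = \frac{N}{\mathrm{DE}}
\]
\[
\rho = 0.1790:\quad
\mathrm{DE} = 1 + (2.8 - 1)(0.1790) \approx 1.322 ,
\quad
N_{\mathrm{eff}} \approx \frac{106}{1.322} \approx 80.2
\]
\[
\rho = 0.0365:\quad
\mathrm{DE} = 1 + (2.8 - 1)(0.0365) \approx 1.066 ,
\quad
N_{\mathrm{eff}} \approx \frac{106}{1.066} \approx 99.5
\]
```

Under these assumptions, the 106 clustered physicians carry roughly the information of 80 independent physicians for inappropriateness, but nearly 100 for up-to-datedness.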
To preserve power the researcher should, whenever possible, ensure that the unit of randomization and the unit of analysis are the same.16 In this manner standard statistical tests can be used. Often this is not possible given secondary research questions that may be targeted to the health professionals within the practice and not the practice as a whole. If data are analyzed at the level of the individual and not at the level of the cluster (in effect ignoring the clustering effect), then there is a strong possibility that P values will be artificially extreme and confidence intervals will be overly narrow, increasing the chances of spuriously significant findings and misleading conclusions.15 When using the individual physician as the unit of analysis, one must take into account the correlation between responses of individuals within the same cluster. For continuous outcome variables that are normally distributed, a mixed-effects analysis of variance (or covariance) is appropriate, with clusters nested within the comparison groups.17 For dichotomous variables, Donner and Klar suggest that an adjusted chi-square test be used.8 Although we focus on the issue of clustering for study designs using random allocation, the issue of clustering is also apparent in cross-sectional and cohort studies, where the practice-level and/or physician-level factors may have an impact on patient-level data. Researchers need to be aware of the possibility of intracluster correlation and the implications for analysis in these studies as well.18
In the example presented, the ICC for the outcome measure “up-to-datedness” was approximately 0.04, in contrast to 0.18 for inappropriateness. The required sample size would be 21 physicians per group for up-to-datedness and 25 per group for inappropriateness. If instead the study dealt with improving smoking cessation counseling or reducing chest x-rays in smokers, the sample size would be 27 or 42 physicians per group, respectively. Treating the unit of analysis and the unit of randomization as the same (that is, with no adjustment for clustering) would require only 19 physicians per group.
Campbell and colleagues19 looked at a number of primary and secondary care study data sets and found that ICCs for measures in primary care were generally between 0.05 and 0.15. In contrast, in this study the ICCs ranged from 0.005 to 0.66, depending on the measure. The difference in ICC between measures and across studies is interesting, and we can only speculate why some measures show more interdependence. It is possible that inappropriateness taps phenomena such as policies at the practice level that physicians cannot easily influence, while up-to-datedness may reflect how physicians, even when working in the same practice setting, behave independently when it comes to delivering recommended preventive care. Researchers should not assume that because one measure shows independence, all measures under study show the same independence. For example, blood pressure measurement and urine proteinuria screening are different in terms of ICC. Differences between outcome measures should be taken into account when calculating required sample size and in statistical analysis when the unit of randomization and analysis are not the same.
Limitations
There are 2 limitations with this research. First, analysis of respondents and nonrespondents to the recruitment effort showed that the study participants were more likely to be younger and women. This would imply that our findings may not be generalizable to the HSO population as a whole. Second, the measures of preventive performance were based on a chart audit and as a consequence are susceptible to the potential problems associated with chart documentation. A low level of preventive performance does not necessarily mean that prevention is not being practiced or that it is being performed inconsistently within a group practice. It may indicate that a less sophisticated documentation process is being used.
Conclusion
Physicians clustered together in the same practice do not necessarily perform the delivery of preventive services equally. As demonstrated by the measure “up-to-datedness,” there is relatively little correlation among physicians working together for performance of many preventive maneuvers. For some maneuvers, most notably those that may be automatically performed as part of practice policy, there is modest correlation among physicians who work together. We hope that these findings assist other researchers in their decision making around the need to adjust sample sizes for the effect of clustering.
METHODS: We performed a cross-sectional study involving data from 46 participating practices with 106 physicians collected using self-administered questionnaires and a chart audit of 100 randomly selected charts per practice. The population was health service organizations (HSOs) located in Southern Ontario. We analyzed performance data for 13 preventive maneuvers determined by chart review and used analysis of variance to determine the intraclass correlation coefficient. An index of “up-to-datedness” was computed for each physician and practice as the number of recommended preventive measures done divided by the number of eligible patients. An index called “inappropriateness” was computed in the same manner for the not-recommended measures. The intraclass correlation coefficients for the 2 key study outcomes (up-to-datedness and inappropriateness) were also calculated and compared.
RESULTS: The mean up-to-datedness score for the practices was 53.5% (95% confidence interval [CI], 51.0%-56.0%), and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). The intraclass correlation for up-to-datedness was 0.0365 compared with 0.1790 for inappropriateness. The intraclass correlation for preventive maneuvers ranged from 0.005 for blood pressure measurement to 0.66 for chest radiographs of smokers; as a consequence, the required sample size ranged from 20 to 42 physicians per group.
CONCLUSIONS: Randomizing by practice clusters and analyzing at the level of the physician has important implications for sample size requirements. Larger intraclass correlations indicate interdependence among the physicians within a cluster; as a consequence, variability within clusters is reduced and the required sample size increased. The key finding that many potential outcome measures perform differently in terms of the intracluster correlation reinforces the need for researchers to carefully consider the selection of outcome measures and adjust sample sizes accordingly when the unit of analysis and randomization are not the same.
In conducting research with community-based primary care practices it is often not feasible to randomize individual physicians to the treatment conditions. This is due to problems of potential contamination between intervention and control subjects in the same practice setting or because the success of the intervention demands that all physicians in the practice setting adhere to a guideline. As a result, the practice itself is randomized to the conditions.
The randomization of physicians in groups, rather than each individual separately, has important consequences for sample size, interpretation, and analysis.1-3 It is argued that groups of physicians are likely to be heterogeneous,4 giving rise to a component of variation that one must take into account in the analysis and that one can control only by studying many groups of physicians rather than many physicians within each group.4
Randomizing physicians by cluster and then analyzing the data by physician or patient can bias the results. It has been noted that many studies randomized groups of health professionals (cluster randomization) but analyzed the results by physician, thus resulting in a possible overestimation of the significance of the observed effects (unit of analysis error).5 Divine and colleagues6 observed that 38 out of 54 studies of physicians’ patient care practices had not appropriately accounted for the clustered nature of the study data. Similarly, Simpson and coworkers7 found that only 4 out of 21 primary prevention trials included sample size calculations or discussions of power that allowed for clustering, while 12 out of 21 took clustering into account in the statistical analysis. When the effect size of the intervention is small to moderate, analyzing results by individual without adjusting for clustering can lead to false conclusions about the significance of the effectiveness of the intervention. For example, Donner and Klar8 show that for the data of Murray and colleagues9 the P value would be .03 if the effect of clustering were ignored, while it was greater than .1 after adjusting for the effect of clustering.
Using baseline data from a successful randomized controlled trial of primary care practices in Southern Ontario, Canada,10 we will explain the intracluster correlation coefficient (ICC) in determining the required sample size of physicians. The ICC is a measure of variation within and between clusters of physicians. It is a measure of the clustering effect or the lack of independence among the physicians that make up the cluster. The smaller the ICC, the more likely the physicians in the cluster behave independently, and analysis at the level of the physician can proceed without significant adjustment to sample size. The higher the ICC, the more closely the measure quantifies class or group rather than the individual physician, and the effective sample size is decreased to the number of classes rather than the number of individuals. Our objective was to provide information on the cluster effect of measuring the performance of various preventive maneuvers between groups of physicians to enable other researchers in the area of primary care prevention to avoid errors.
Methods
As part of a larger clinical trial to improve preventive practice, we conducted a cross-sectional study to provide a point estimate of preventive performance in capitation primary care practices. We chose the preventive maneuvers from the Canadian Task Force on the Periodic Health Examination.11 According to their classification system, there is randomized clinical trial evidence to support “A” level (highly recommended) maneuvers and cohort and case controlled studies to support “B” (recommended) maneuvers. The task force also reviewed the quality of evidence for maneuvers that should not be done and identified these as “D” level maneuvers. Eight A and B level recommendations and 5 D level recommendations were identified by a panel of practicing family physicians. Selection criteria included the need to represent a broad spectrum of preventive interventions for both men and women patients of all ages, and the need to address diseases that were clinically important. The 8 recommended and 5 inappropriate maneuvers chosen for our study are listed in Table 1.
This study was conducted in 72 community-based health service organizations (HSOs) in Ontario located at 100 different sites primarily in the Toronto, Hamilton, and London areas in the spring of 1997. The Ottawa Civic Hospital research ethics committee approved our study.
Data Collection
Practice and physician characteristics were collected using a self-administered questionnaire to which 96% of 108 participating physicians responded (Table 2 has the questionnaire items). Preventive performance at the physician and overall practice level was determined using a chart audit.
Chart Audit. Patient charts were eligible for inclusion in the medical audit if they were for patients who were aged 17 years or older on the date of last visit and had visited the HSO at least once in the 2 years before the audit. The variables collected from the charts included demographic and patient characteristics as well as indicators of performance of preventive maneuvers.
The chart auditors obtained a list of patients within an HSO practice group of physicians and then randomly selected charts using computer-generated random numbers. The patient list was either constructed by the auditors or by using the medical office record computer system. The list included all rostered and nonrostered patients. Unique chart numbers or numeric identifiers were assigned to each patient. The required number of charts was randomly selected from the sampling frame, the chart was pulled, and eligibility for inclusion was determined. The auditors proceeded to find charts at random from the sampling frame until they obtained 100 eligible charts per practice.
To verify the quality of the data entered from the 100 randomly selected charts and to measure the inter-rater reliability between auditors, 20% of each HSO’s audited charts were independently verified by another auditor. If coding discrepancies were found in more than 5 of 20 charts, the entire 100 charts were audited and verified again.
Data Analysis. Our analysis with SPSS software version 8.0 (SPSS Inc, Chicago, Ill) focused primarily on calculating the extent to which each preventive maneuver was being performed according to the recommendations of the Canadian Task Force on the Periodic Health Exam. An index of “up-to-datedness” was computed for each physician and practice as the number of A and B preventive measures done divided by the number of eligible A and B measures. In addition, an index called “inappropriateness” was computed in the same manner to represent the D measures.
Frequencies and descriptive statistics were generated on all variables, and each variable was checked for data entry errors and inappropriate or illogical responses. Means and standard deviations were computed for continuous variables and frequency distributions were computed for categorical variables, such as sex and age group. In addition, chi-square tests were used to compare the background characteristics of participating and nonparticipating HSO physicians. Ninety-five percent confidence intervals were calculated for the mean preventive indexes. Finally, kwas computed as a measure of reliability between the 2 chart auditors.
The ICC was calculated for sample cluster means12 of up-to-datedness and inappropriateness. The practice characteristic data revealed that the mean cluster size in terms of number of physicians per practice was 2.8 with a variance of 3.6 and a total of 106 physicians across 46 practices. To determine the between-subjects (practices) variance (sb2) and within-subjects (practices) variance (sw 2) for the ICC calculation, the one-way analysis of variance (ANOVA) procedure was run on both measures (up-to-datedness and inappropriateness) as well as each of the preventive maneuvers separately.13 ICC (D) was computed from the F statistics of the one-way ANOVA and the adjusted cluster size as follows: D = F-1/F+n0-1 where n0 is the mean practice size ([2.8] - [practice variance (3.6)/106 physicians]).
Finally, using the formula by Donner and coworkers14 and the ICC, the sample size for comparing 2 independent groups allowing for clustering for both up-to-datedness and inappropriateness was determined. The formula is: n = 2(Z−/2+Z02)2F2[1+(xc-1)D]/*2 where n is the per group sample size; Z−/2= 1.96 and Z02= 0.84 are the standard normal percentiles for the type I and type II error rates at 0.05 and 0.20, respectively; Fis the standard deviation in the outcome variable; * is the expected difference between the two means; xcis the average cluster size; and D is the ICC. The sample size calculations were based on an expected difference of 0.09 between groups (with 80% power and 5% significance) and a standard deviation of 0.10. Table 3 shows the effect on per-group sample size for varying ICC values and average cluster sizes.
Results
A total of 46 HSOs were recruited out of a possible 100 sites, for a response rate of 46% at baseline. The response rate to the physician questionnaire was 98% (106 of 108). Physicians in practices that agreed to participate differed significantly from those who did not. Participating physicians were younger, having graduated in 1977 on average compared with 1971 (t=4.58 [df=191], P<.001) and were more likely to be women, 30.4% compared with 9.9% for nonparticipating physicians (c2=11.09 [df=1, N=193], P=.001). Table 2 provides descriptive information on practice and physician characteristics. Five practices of 46 needed to have the entire 100 charts re-audited. Final concordance between the 2 auditors for each practice verification was 85% (k=.71).
The mean up-to-datedness score for the practices or the mean proportion of A and B maneuvers performed was 53.5% (95% confidence interval [CI], 51.0%-56.0%) and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). In other words, on average, 53.5% of patients eligible for recommended preventive maneuvers received them and 21.5% of eligible patients received inappropriate preventive maneuvers.
Table 1 gives the practice mean square, the error mean square, the ICC, and the required sample size per group for the overall measures of up-to-datedness and inappropriateness as well as for 13 preventive maneuvers individually. For inappropriateness, there was more variability between practices than within practices among physicians, resulting in a larger practice mean square and a significant F statistic (P <.05). For up-to-datedness, the variability within practices among physicians was greater than the variability between practices, although not significantly so. Table 1 shows the intraclass correlation as 0.0365 for up-to-datedness and 0.1790 for inappropriateness. Inappropriateness scores were not normally distributed, and 2 physicians had scores greater than 0.60. However, with these extreme outliers removed, the ICC for inappropriateness remained high at 0.14.
The ICC ranges from 0.005 for blood pressure measurement to 0.66 for chest x-rays of smokers. The variability between and within group clusters is the same for blood pressure measurement. For chest x-rays of smokers the variability between clusters is very significant and within clusters it is small, indicating that some practice clusters perform a larger number of chest x-rays on smokers than other practices. However, the performance of chest x-rays was not normally distributed, with 79% of physicians not performing them and one solo physician with an extreme score of 0.53. With this extreme outlier removed the ICC for chest x-rays was 0.25, with a mean square between practices of 0.0024 and a mean square within practices of 0.0012 (P<.01). Table 1 shows the effect on sample size for analysis at the level of the physician as the ICC varies.
Discussion
Statistical theory points to the consequences of cluster randomization as a reduction in effective sample size. This occurs because the individuals within a cluster cannot be regarded as independent. The precise effect of cluster randomization on sample size requirements depends on both the size of the cluster and the degree of within-cluster dependence as measured by ICC.2 Cluster randomized trials are increasingly being used in health services research particularly for evaluating interventions involving organizational changes when it is not feasible to randomize at the level of the individual. Cluster randomization at the level of the practice minimizes the potential for contamination between treatment and control groups. However, the statistical power of a cluster randomized trial when the unit of randomization is the practice and the unit of analysis is the health professional can be greatly reduced in comparison to an individually randomized trial.15
To preserve power the researcher should, whenever possible, ensure that the unit of randomization and the unit of analysis are the same.16 In this manner standard statistical tests can be used. Often this is not possible given secondary research questions that may be targeted to the health professionals within the practice and not the practice as a whole. If data are analyzed at the level of the individual and not at the level of the cluster (in effect ignoring the clustering effect), then there is a strong possibility that P values will be artificially extreme and confidence intervals will be overly narrow, increasing the chances of spuriously significant findings and misleading conclusions.15 When using the individual physician as the unit of analysis, one must take into account the correlation between responses of individuals within the same cluster. For continuous outcome variables that are normally distributed, a mixed-effects analysis of variance (or covariance) is appropriate, with clusters nested within the comparison groups.17 For dichotomous variables, Donner and Klar suggest that an adjusted chi-square test be used.8 Although we focus on the issue of clustering for study designs using random allocation, the issue of clustering is also apparent in cross-sectional and cohort studies, where the practice-level and/or physician-level factors may have an impact on patient-level data. Researchers need to be aware of the possibility of intracluster correlation and the implications for analysis in these studies as well.18
In the example presented, the ICC for the outcome measure “up-to-datedness” was approximately 0.04 in contrast to the ICC for inappropriateness, which was 0.18. The required sample size per group for the outcome measure “up-to-datedness” would be 21 physicians compared with inappropriateness, where the sample size would be 25 per group. In contrast, if the study dealt with improving smoking cessation counseling or reducing chest x-rays in smokers, the sample size would be 27 or 42 physicians per group. Treating the unit of analysis and the unit of randomization the same would require only 19 physicians per group.
Campbell and colleagues19 looked at a number of primary and secondary care study data sets and found that ICCs for measures in primary care were generally between 0.05 and 0.15. In contrast, in this study the ICCs ranged from 0.005 to 0.66, depending on the measure. The difference in ICC between measures and across studies is interesting, and we can only speculate why some measures show more interdependence. It is possible that inappropriateness taps phenomena such as policies at the practice level which physicians can not easily influence, while up-to-datedness may help explain how physicians even when working in the same practice setting behave independently when it comes to delivering recommended preventive care. It is important to be aware and not to assume that because one measure may show independence that all measures under study show the same independence. For example, blood pressure measurement and urine proteinuria screening are different in terms of ICC. Differences between outcome measures should be taken into account when calculating required sample size and in statistical analysis when the unit of randomization and analysis are not the same.
Limitations
There are 2 limitations with this research. First, analysis of respondents and nonrespondents to the recruitment effort showed that the study participants were more likely to be younger and women. This would imply that our findings may not be generalizable to the HSO population as a whole. Second, the measures of preventive performance were based on a chart audit and as a consequence are susceptible to the potential problems associated with chart documentation. A low level of preventive performance does not necessarily mean that prevention is not being practiced or that it is being performed inconsistently within a group practice. It may indicate that a less sophisticated documentation process is being used.
Conclusion
Physicians clustered together in the same practice do not necessarily deliver preventive services equally. As demonstrated by the measure “up-to-datedness,” there is relatively little correlation among physicians working together in the performance of many preventive maneuvers. For some maneuvers, most notably those that may be performed automatically as part of practice policy, there is modest correlation among physicians who work together. We hope that these findings assist other researchers in deciding whether to adjust sample sizes for the effect of clustering.
1. Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol 1978;108:100-02.
2. Donner A. An empirical study of cluster randomization. Int J Epidemiol 1982;11:283-86.
3. Kerry SM, Bland JM. The intracluster correlation coefficient in cluster randomisation. BMJ 1998;316:1455.
4. Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On design considerations and randomization based inference for community intervention trials. Stat Med 1996;15:1069-92.
5. Bero LA, Grilli R, Grimshaw JM, Harvey E, Oxman AD, Thomson MA. Closing the gap between research and practice: an overview of systematic reviews of interventions to promote the implementation of research findings: the Cochrane Effective Practice and Organization of Care Review Group. BMJ 1998;317:465-86.
6. Divine GW, Brown JT, Frazier LM. The unit of analysis error in studies about physicians’ patient care behavior. J Gen Intern Med 1992;7:623-29.
7. Simpson JM, Klar N, Donner A. Accounting for cluster randomization: a review of primary prevention trials, 1990 through 1993. Am J Public Health 1995;85:1378-83.
8. Donner A, Klar N. Methods for comparing event rates in intervention studies when the unit of allocation is the cluster. Am J Epidemiol 1994;140:279-89.
9. Murray DM, Perry CL, Griffin G, et al. Results from a statewide approach to adolescent tobacco use prevention. Prev Med 1992;21:449-72.
10. Lemelin J, Hogg W, Baskerville B. Evidence to action: a tailored multi-faceted approach to changing family physician practice patterns and improving preventive care. CMAJ. In press.
11. Canadian Task Force on the Periodic Health Examination. The Canadian guide to clinical preventive health care. Ottawa, Canada: Health Canada; 1994.
12. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
13. Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ 1996;313:41-42.
14. Donner A, Birkett N, Buck C. Randomization by cluster: sample size requirements and analysis. Am J Epidemiol 1981;114:906-14.
15. Campbell MK, Grimshaw JM. Cluster randomised trials: time for improvement. BMJ 1998;317:1171-72.
16. Bland JM, Kerry SM. Trials randomised in clusters. BMJ 1997;315:600.
17. Koepsell TD, Martin DC, Diehr PH, et al. Data analysis and sample size issues in evaluations of community based health promotion and disease prevention programs: a mixed-model analysis of variance approach. Am J Public Health 1995;85:1378-83.
18. Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med 1994;13:61-78.
19. Campbell M, Grimshaw J, Steen N. Sample size calculations for cluster randomised trials. J Health Serv Res Policy 2000;5:12-16.