Association of higher costs with symptoms and diagnosis of depression


ABSTRACT

OBJECTIVE: We examined the relationships among depressive symptoms, physician diagnosis of depression, and charges for care.

STUDY DESIGN: We used a prospective observational design.

POPULATION: Five hundred eight new adult patients were randomly assigned to senior residents in family practice and internal medicine.

OUTCOMES MEASURED: Self-reports of health status assessment (Medical Outcomes Study Short Form-36) and depressive symptoms (Beck Depression Inventory) were determined at study entry and at 1-year follow-up. Physician diagnosis of depression was determined by chart audit; charges for care were monitored electronically.

RESULTS: Symptoms of depression and the diagnosis of depression were associated with charges for care. Statistical models were developed to identify predictors for the occurrence and magnitude of medical charges. Neither depressive symptoms nor diagnosis of depression significantly predicted the occurrence of charges in the areas studied, but physician diagnosis of depression predicted the magnitude of primary care and total charges.

CONCLUSIONS: A complex relationship exists among depressive symptoms, the diagnosis of depression, and charges for medical care. Understanding these relationships may help primary care physicians diagnose depression and deliver primary care to depressed patients more effectively while managing health care expenditures.

 

KEY POINTS FOR CLINICIANS

 

  • Diagnosis of depression is associated with higher costs.
  • Failure to diagnose depression may raise laboratory costs.
  • Diagnosis of depression with few symptoms deserves study.

As US medical care has evolved, physicians have been expected to recognize and treat mental health problems in primary care,1 “the hidden mental health network.”2,3 Primary care clinicians are expected to observe signs of possible mental health problems, incorporate those observations into differential diagnoses, and decide which problems to treat or monitor and which to send for consultation or referral.4 These decisions can have important financial and health consequences, especially in dealing with depression.

Depression is common in the community5 and among primary care patients, 6% to 9% of whom report symptoms of major depression.6-8 An additional 10% to 15% of primary care patients show signs of less severe but important depressive problems.8,9 “Subclinical depression” is marked by symptoms that might indicate physical disease, signs of depression, or both; recognition may affect costs of care.10-13

Research has begun to define the impact of depression on processes14 and costs of care.15-17 For example, elderly patients reporting symptoms of depression have more laboratory tests performed at higher cost.15 Primary care patients diagnosed with depression had total yearly health care costs almost double those of patients without depression, with increased costs secondary to higher medical utilization and not mental health specialty treatment.16 There is evidence that depressive symptoms and the diagnosis of depression may predict increases in costs of care.17

Costs of care might be influenced by the model used by primary care physicians to identify depression.18 For example, a biomedical model might use more laboratory testing to reach a diagnosis of depression by exclusion, whereas a psychosocial model would use fewer laboratory tests while the physician pursues psychosocial issues. To identify optimal strategies for practice, it is important to determine how symptoms of depression and physician diagnosis of depression might interrelate and affect medical care costs.

We explored the following hypotheses: (1) that there are significant differences in each type of charge determined by the presence or absence of symptoms and diagnosis of depression; (2) that depressive symptoms and physician diagnosis of depression predict the occurrence of charges for specialty care, emergency services, laboratory services, and hospitalization; and (3) that depressive symptoms and physician diagnosis of depression predict the magnitude of medical charges for primary care, specialty care, emergency services, laboratory services, hospitalization, and total charges.

Methods

Study design

Five hundred eight adult nonpregnant new patients were assigned randomly to primary care providers in either family practice or general internal medicine clinics in a teaching hospital. Children younger than 18 years and pregnant women were excluded because they are not followed in general internal medicine. At enrollment and follow-up, self-reported depression was determined with the abbreviated Beck Depression Inventory (BDI)19 and health status was measured with the Medical Outcomes Study Short Form-36 (MOS SF-36).20 To avoid altering clinician practice, physicians were not provided with either score. Physicians included 105 senior residents (second and third year) in family practice and general internal medicine.

Measures

Beck Depression Inventory. The BDI is a reliable and valid instrument used to measure depressive symptoms.19,21 The abbreviated version includes 13 items weighted and summed to produce a total score.19 A score between 9 and 15 indicates moderate depression, and a score of at least 16 indicates severe depression. The BDI is used widely for screening and to assess treatment efficacy.22 In this study, a BDI score between 0 and 8 was considered “low” or normal, and a score of at least 9 was considered “high” or indicative of symptoms of depression.

 

 

At study entrance or exit, 130 patients had significant symptoms of depression (BDI > 8), meeting criteria for moderate or severe depression.19 This cutoff identified roughly the top quartile of BDI scores among participants, a proportion that approximates that of primary care patients estimated to experience significant depression.6,7

Medical Outcomes Study Short Form-36. Health status was measured with the MOS SF-36,20 a 36-item self-report questionnaire. Reliability has been verified for difficult populations.23 Summary measures can describe a physical component score and a mental component score.24,25 The physical component score was used in this study to measure physical health status.

Medical chart review. Two physicians (K.D.B. and J.A.R.) reviewed the charts to identify notations of depression on problem lists and in visit notes to signify physician diagnosis of depression.

Charges. Charges were used as a proxy for costs. Electronic data for all health system charges were monitored from the initial visit through 1 full year of care. Six categories were monitored: primary care, specialty care, laboratory testing, emergency department, hospitalization, and total charges. Pharmacy charges were excluded because some patients purchased prescriptions outside the hospital system.

Statistical procedures

Mean log values for each area of medical charges were determined and contrasted with the Duncan multiple range test26 to explore the first hypothesis that charges are associated with symptoms and diagnoses of depression. Next, a double hurdle model was used to test the hypotheses that depressive symptoms and physician diagnosis of depression predict the occurrence and magnitude of charges for a variety of services.27,28 In a double hurdle model, the first “hurdle,” or step, involves exploring whether there are variables that can significantly predict the occurrence of an event (such as a medical charge). The second step involves exploring whether there are variables that can predict the magnitude of the event (eg, a medical charge).

Log-transformation of charges was performed to eliminate undue influence from outliers. No logistic regression models were developed for the occurrence of primary care charges or total charges (the first hurdle) because all study patients had charges in both categories. Results are presented by hypothesis.
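The two steps of a double hurdle model can be illustrated with a minimal sketch. It assumes a single binary predictor (diagnosis of depression) and uses synthetic records invented for illustration; the function names are hypothetical, and this is not the authors' fitted model, which included several covariates. With one binary predictor, the logistic-regression slope reduces to a log odds ratio and the least-squares slope on log charges reduces to a difference in mean log charge, so no fitting library is needed.

```python
import math

# Hypothetical records: (diagnosed_depressed, charge_in_dollars).
# Synthetic values for illustration only; they are not study data.
records = [
    (1, 0.0), (1, 120.0), (1, 450.0), (1, 900.0),
    (0, 0.0), (0, 0.0), (0, 80.0), (0, 200.0),
]


def occurrence_log_odds_ratio(rows):
    """First hurdle: log odds ratio of incurring any charge.

    With one binary predictor, the logistic-regression slope equals the
    log odds ratio of the 2x2 table. Assumes no cell of the table is empty.
    """
    counts = {(group, hit): 0 for group in (0, 1) for hit in (0, 1)}
    for group, charge in rows:
        counts[(group, int(charge > 0))] += 1
    odds_diagnosed = counts[(1, 1)] / counts[(1, 0)]
    odds_not_diagnosed = counts[(0, 1)] / counts[(0, 0)]
    return math.log(odds_diagnosed / odds_not_diagnosed)


def magnitude_log_difference(rows):
    """Second hurdle: difference in mean log charge among patients with charges.

    With one binary predictor, this equals the least-squares slope of
    log(charge) on the predictor, fitted only where charge > 0.
    """
    logs = {0: [], 1: []}
    for group, charge in rows:
        if charge > 0:
            logs[group].append(math.log(charge))
    return sum(logs[1]) / len(logs[1]) - sum(logs[0]) / len(logs[0])


log_or = occurrence_log_odds_ratio(records)
log_diff = magnitude_log_difference(records)
print(f"occurrence: log odds ratio = {log_or:.3f}")
print(f"magnitude: difference in mean log charge = {log_diff:.3f}")
```

When every patient has a charge, as with primary care and total charges in this study, the first function has nothing to model (one cell of the table is empty), which is why only the magnitude step applies to those categories.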

Results

Seventy-seven of 508 study patients (15.1%) were identified as depressed by their primary care providers in chart notes. BDI scores showed considerable spread (range, 0–31) and were significantly associated with the diagnosis of depression (P < .001). Whereas 130 patients reported BDI scores of at least 9, only 36 of these patients were diagnosed as depressed by their physicians. Conversely, 41 patients were diagnosed as depressed despite reporting low (normal) BDI scores. Patients were assigned to 1 of 4 groups: those diagnosed as depressed and having high (abnormal) BDI scores (n = 36); those diagnosed as depressed despite low BDI scores (n = 41); those not diagnosed as depressed despite high BDI scores (n = 94); and those not diagnosed as depressed and not having high BDI scores (n = 337).
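The 4-group assignment can be written as a short rule. The sketch below is illustrative only: the function name and group labels are hypothetical, and the only inputs taken from the article are the BDI cutoff of 9 and the 4 reported group sizes. It also checks that the group sizes are internally consistent with the sample of 508 and the 77 diagnosed patients.

```python
BDI_CUTOFF = 9  # scores of at least 9 were considered "high" in this study


def depression_group(bdi_score, diagnosed):
    """Assign a patient to 1 of the 4 study groups (labels are hypothetical)."""
    high = bdi_score >= BDI_CUTOFF
    if diagnosed:
        return "diagnosed, high BDI" if high else "diagnosed, low BDI"
    return "not diagnosed, high BDI" if high else "not diagnosed, low BDI"


# Group sizes as reported in the Results.
reported = {
    "diagnosed, high BDI": 36,
    "diagnosed, low BDI": 41,
    "not diagnosed, high BDI": 94,
    "not diagnosed, low BDI": 337,
}

total = sum(reported.values())
diagnosed_total = reported["diagnosed, high BDI"] + reported["diagnosed, low BDI"]
high_bdi_total = reported["diagnosed, high BDI"] + reported["not diagnosed, high BDI"]
undiagnosed_share = reported["not diagnosed, high BDI"] / high_bdi_total

print(total, diagnosed_total, high_bdi_total)
print(f"high-BDI patients without a diagnosis: {undiagnosed_share:.0%}")
```

The totals come to 508 patients, 77 diagnosed, and 130 with high BDI scores, of whom about 72% had no chart diagnosis of depression.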

Hypothesis 1: overall impact of symptoms and diagnosis on charges

Groups diagnosed with depression had significantly higher log primary care charges than did those not diagnosed (Table 1). Both groups diagnosed with depression showed the highest primary care and total medical charges. Patients diagnosed with depression and reporting high BDI scores had higher specialty charges than those not depressed. Highest laboratory costs were found for those diagnosed as depressed despite low BDI scores and those with elevated BDI scores who were not diagnosed as depressed. There were no significant differences among groups for log charges for emergency care and hospital charges.

TABLE 1
Log charges of care by diagnosis and symptoms of depression

                     Diagnosis of depression     No diagnosis of depression
Charges              BDI ≥ 9       BDI < 9       BDI ≥ 9       BDI < 9
Primary care         5.868*        6.054*        5.431         5.347
Specialty care       4.266*        3.742*        3.332         2.927
Emergency care       1.681         2.172         1.604         1.248
Laboratory tests     6.121         6.473†        6.357†        5.401
Hospital charges     2.174         3.742         1.548         1.1893
Total charges        7.704         7.878         7.508         6.979

*Log costs were higher for patients with the diagnosis of depression regardless of BDI score than for those with no diagnosis and a BDI below 9.
†Log costs were higher for patients with the diagnosis of depression and a BDI score below 9 or no diagnosis and a BDI score of at least 9 than for those with no diagnosis and a BDI score below 9.
All charges are logarithmic.
BDI, Beck Depression Inventory.
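Because Table 1 reports means of log charges, a difference between two cells corresponds to a ratio of geometric mean charges rather than a difference in dollars. The sketch below illustrates this for primary care. It assumes natural logarithms, which the article does not state, so the resulting ratio should be read as an illustration of the arithmetic and not as a reported finding; the function name is hypothetical.

```python
import math

# Mean log primary care charges copied from Table 1.
LOG_PRIMARY_CARE = {
    ("diagnosis", "BDI >= 9"): 5.868,
    ("diagnosis", "BDI < 9"): 6.054,
    ("no diagnosis", "BDI >= 9"): 5.431,
    ("no diagnosis", "BDI < 9"): 5.347,
}


def geometric_mean_ratio(log_mean_a, log_mean_b):
    """Ratio of geometric mean charges, ASSUMING the logs are natural logs."""
    return math.exp(log_mean_a - log_mean_b)


ratio = geometric_mean_ratio(
    LOG_PRIMARY_CARE[("diagnosis", "BDI < 9")],
    LOG_PRIMARY_CARE[("no diagnosis", "BDI < 9")],
)
print(f"geometric mean ratio (natural-log assumption): {ratio:.2f}")
```

Under that assumption, patients diagnosed as depressed despite low BDI scores would have roughly twice the geometric mean primary care charge of undiagnosed patients with low BDI scores. With base-10 logs the same difference would imply a much larger ratio, so the base matters for any dollar interpretation.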

Hypotheses 2 and 3: factors predicting occurrence and magnitude of charges

Cost models are presented as regressions in Table 2. The left side of the table presents logistic regressions exploring which variables predict whether or not a patient accrues charges in all areas except primary care and total charges. Because all patients had at least 1 primary care visit charge and, hence, a total charge, it was not possible to develop a model to predict the occurrence of those charges.

 

 

Physical health status (measured by the physical component score of the MOS SF-36) predicted the occurrence of all charges measured with the exception of laboratory tests. Advanced patient age predicted increased likelihood of charges in each area; female sex showed a trend toward predicting occurrence of emergency care charges; and education showed a trend toward predicting occurrence of laboratory charges. BDI scores (measure of symptoms of depression) and physician diagnosis of depression failed to contribute significantly to the prediction of specialty care, emergency care, laboratory testing, or hospital charges. However, there was a trend for depressive symptoms to predict the occurrence of laboratory charges.

The right side of Table 2 presents regression models that predicted the magnitude of the different categories of charges. Physical health status was a significant predictor of the magnitude of all types of charges except emergency care. Patient age contributed to prediction of size of all types of charges except emergency visits and laboratory tests. Female sex was a significant predictor of magnitude of charges in primary care, laboratory tests, and total medical charges. The diagnosis of depression was a significant predictor of magnitude of primary care (P = .0029) and total medical (P = .0158) charges. Neither depressive symptoms nor the diagnosis of depression contributed significantly to the prediction of magnitude of charges for specialty care, emergency care, laboratory testing, or hospital use, although there was a trend for depressive symptoms to predict the magnitude of laboratory costs. Although an interaction term was entered into both kinds of regression equations, there was no evidence of a significant contribution from the interaction of symptoms of depression and diagnosis of depression in any of the predictor models developed.

TABLE 2
Regression analyses predicting charges

                                          Occurrence          Magnitude
Charges           Independent variable*   Beta      P         Beta      P         R²
Primary care      PCS                                         -.0961    .04       10.40%
                  Sex                                         -.1271    .004
                  Age (y)                                     .1891     .0001
                  Diagnosis                                   .2097     .003
Specialty care    PCS                     -.1583    .005      -.1904    .004      2.40%
                  Age (y)                 .2235     .0002     .1261     .07
Emergency care    PCS                     -.2518    .0003                         9.75%
                  Sex                     -.1344    .06
                  Education                                   -.2827    .0068
                  Age (y)                 -.1621    .04
Laboratory tests  PCS                                         -.2689    .0001     19.90%
                  Sex                                         .1459     .0009
                  Education               .0408     .09
                  Age (y)                 -.0411    .0001     .1978     .0001
                  BDI score               .2487     .08       .0945     .08
Hospital care     PCS                     -.2583    .0007     -.2554    .04       9.40%
                  Education                                   -.2632    .02
                  Age (y)                 .0089     .0007
Total charges     PCS                                         -.2547    .0001     17.00%
                  Sex                                         .0846     .05
                  Age (y)                                     .2193     .0001
                  Diagnosis                                   .1631     .02

*Only variables significantly associated with the occurrence or magnitude of charges for each component are shown.
BDI, Beck Depression Inventory; PCS, physical component score.

Discussion

Medical charges were related to symptoms of depression and physician diagnosis of depression in this study. Although the patient sample was small, it was representative of the primary care population in displaying a wide range of depressive symptoms as measured by the BDI.6,7 In this study, physician diagnosis of depression was related to self-reported depression ratings: those diagnosed as depressed had significantly higher BDI scores than did those not diagnosed as depressed. However, the relationship between self-reported symptoms and diagnosis was not perfect: 72% of patients with high BDI scores were not recognized as depressed, as often occurs in primary care.6,7 In fact, more patients diagnosed with depression had low BDI scores (< 9, n = 41) than high BDI scores (> 8, n = 36). Clearly, other factors enter the process by which primary care physicians reach the diagnosis of depression.

Symptoms of depression and the diagnosis of depression probably influence the process of care in different ways. Differences in process of care likely would be reflected in different relationships to medical charges. Physician diagnosis of depression was associated with higher primary care and total costs and contributed to models predicting magnitude of primary care and total charges. However, neither symptoms of depression nor diagnosis of depression predicted which patients were more likely to incur charges for specialty care, emergency care, laboratory tests, or hospitalization. There was a trend only for the symptoms of depression to predict who would incur laboratory charges. These findings suggest that the relationship of depression to primary care and total charges is clear, whereas its relationship to less frequently occurring charges is less apparent.

Other demographic factors showed fairly robust associations with the occurrence of charges. Patient age predicted who would incur charges for specialty care, emergency care, laboratory tests, and hospitalization, and there was a trend for female sex to predict occurrence of emergency department charges. Health status proved to be a significant predictor of the magnitude of all charges except those for emergency care. These powerful influences must be considered to accurately assess the impact of depression on charges.

Age also predicted the total amount of charges for primary and all medical care for the year and showed a trend toward prediction of magnitude of specialty charges. Female sex was a significant predictor of magnitude of primary care charges, laboratory charges, and total charges, and less education was a significant predictor of magnitude of emergency department and hospital charges. Some of these demographic predictors are readily explained. For example, as patients age, the number and costs of medical problems often increase. More education may enhance socioeconomic status and self-care, each of which may buffer against the need for emergency care and hospitalization. The reasons that charges are often higher for women are probably more complex. Higher utilization of primary and specialty care for women was associated with lower self-reported health status, less education, and lower socioeconomic status in our previous study.29

 

 

These results also suggest that physician diagnosis of depression in the absence of elevated BDI scores may flag a different kind of patient presentation. Diagnosis of depression without elevated BDI scores could result from effective treatment controlling the symptoms of previously diagnosed depression, but this does not adequately explain the occurrence. Perhaps other aspects of physician–patient interaction trigger a depression diagnosis without symptoms. Patients diagnosed as depressed despite low BDI scores ranked highest for log-transformed charges in 5 of the 6 areas explored; only for specialty care did those with both high BDI scores and a diagnosis of depression rank higher. This strong association with charges suggests that these patients may represent diagnostic dilemmas, thereby generating more primary care visits and laboratory tests. They may be diagnosed as depressed despite their low BDI scores simply because no organic explanation can be readily identified.

BDI scores showed a trend toward predicting higher laboratory charges in our models. This finding supports the importance of depressive symptoms in influencing the process of primary care, especially laboratory testing.15,30 Perhaps the diagnosis of depression actually slowed the ordering of laboratory tests.18 Because our data did not allow a separation of charges for laboratory tests before and after the diagnosis of depression, we did not test this possibility.

The size of this sample (N = 508) and the length of time patients were followed (1 year) might not have provided adequate power to fully test the contributions of symptoms and diagnosis of depression to the 6 sets of charges. This was likely true for hospitalization charges because hospitalization was an infrequent event in this study. Previous, larger studies found indications of increased hospitalization charges for those diagnosed as depressed17 and those with symptoms of depression.15,30 Alternatively, the recent emphasis on decreasing hospitalizations to reduce medical costs may mean that hospitalization for depressive symptoms rather than for physical illness is less likely to occur.31 In addition, these observations were made by resident physicians and not by community clinicians. It is not clear whether these results would generalize to another setting, although they are consistent with community observations in previous research.

These data do suggest an intriguing interplay between physician diagnosis of depression and the presence of symptoms of depression across a number of indicators of charges and utilization in primary care. Even though each element was associated with increased utilization and charges, their differential impact is unclear. Both may prove important for efforts to enhance recognition of depression; recognition of a mental health problem appeared to shift the process of care in this and previous studies.14,32 To date, there are no data indicating that the diagnosis of depression reduces utilization or costs of primary care delivery. What is known is that physicians working in primary care are more apt to accurately diagnose those with more severe symptoms of depression than those with more transient or less severe symptoms.16,33 Although introducing a screening device such as the BDI or the PRIME-MD9 likely would increase the number of patients diagnosed with depression, it is unclear what impact that would have on the process, costs, and outcomes of care. Simpler interventions, such as training in empathy and other communication skills,34 might provide the primary care physician with all the tools needed for identification of emotional distress and mental health problems14,30 and appropriate treatment or referral.

References

 

1. deGruy F. Mental health care in the primary care setting. In: Donaldson MS, Yordy KD, Lohr KN, Vanselow NA, eds. Primary Care: America’s Health in the New Era. Washington, DC: National Academy Press; 1996:285-311.

2. Regier DA, Goldberg ID, Taube CA. The de facto US mental health services system: a public health perspective. Arch Gen Psychiatry 1978;35:685-93.

3. Schurman RA, Kramer PD, Mitchell JB. The hidden mental health network. Treatment of mental illness by nonpsychiatrist physicians. Arch Gen Psychiatry 1985;42:89-94.

4. Nutting PA, Franks P, Clancy CM. Referral and consultation in primary care: do we understand what we’re doing? [editorial; comment]. J Fam Pract 1992;35:21-3.

5. Lépine JP, Gastpar M, Mendlewicz J, Tylee A. Depression in the community: the first pan-European study DEPRES (Depression Research in European Society). Int Clin Psychopharmacol 1997;12:19-29.

6. Panel DG. Clinical Practice Guidelines. Vol I. Washington, DC: Agency for Health Care Policy and Research; 1993.

7. Panel DG. Clinical Practice Guidelines. Vol II. Washington, DC: Agency for Health Care Policy and Research; 1993.

8. Katon W. The epidemiology of depression in medical care. Int J Psychiatry Med 1987;17:93-112.

9. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study [see comments]. JAMA 1994;272:1749-56.

10. Greenberg PE, Stiglin LE, Finkelstein SN, Berndt ER. The economic burden of depression in 1990 [see comments]. J Clin Psychiatry 1993;54:405-18.

11. Kirmayer LJ, Robbins JM. Three forms of somatization in primary care: prevalence, co-occurrence, and sociodemographic characteristics. J Nerv Ment Dis 1991;179:647-55.

12. Kirmayer LJ, Robbins JM, Dworkind M, Yaffe MJ. Somatization and the recognition of depression and anxiety in primary care. Am J Psychiatry 1993;150:734-41.

13. Kirmayer LJ, Robbins JM. Patients who somatize in primary care: a longitudinal study of cognitive and social characteristics. Psychol Med 1996;26:937-51.

14. Callahan EJ, Bertakis KD, Azari R, et al. The influence of depression on physician-patient interaction in primary care. Fam Med 1996;28:346-51.

15. Callahan CM, Kesterson JG, Tierney WM. Association of symptoms of depression with diagnostic test charges among older adults. Ann Intern Med 1997;126:426-32.

16. Simon GE, VonKorff M, Barlow W. Health care costs of primary care patients with recognized depression. Arch Gen Psychiatry 1995;52:850-6.

17. Simon G, Ormel J, VonKorff M, Barlow W. Health care costs associated with depressive and anxiety disorders in primary care. Am J Psychiatry 1995;152:352-7.

18. Carney PA, Rhodes LA, Eliassen MS, et al. Variations in approaching the diagnosis of depression: a guided focus group study. J Fam Pract 1998;46:73-82.

19. Beck AT, Beck RW. Screening depressed patients in family practice. A rapid technique. Postgrad Med 1972;52:81-5.

20. Ware JE, Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473-83.

21. Beck AT, Ward CH, Mendelson M. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561-71.

22. Beck AT, Steer RA, Garbin MG. Psychometric properties of the Beck Depression Inventory: twenty-five years of evaluation. Clin Psychol Rev 1988;8:77-100.

23. Stewart AL, Hays RD, Ware JE, Jr. The MOS short-form general health survey. Reliability and validity in a patient population. Med Care 1988;26:724-35.

24. McHorney CA, Ware JE, Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247-63.

25. Ware JE, Kosinski M, Keller SD. SF-36 Physical and Mental Health Summary Scales: A User’s Manual. Boston: Nimrod Press; 1994.

26. Harter HL. Critical values for Duncan’s new multiple range test. Biometrics 1960;16:671-85.

27. Duan N. Smearing estimates: a nonparametric retransformation method. J Am Stat Assoc 1983;78:605-10.

28. Duan N, Manning WG, Morris CN, Newhouse JP. A comparison of alternative models for the demand for medical care. J Business Econ Stat 1983;1:115-26.

29. Bertakis KD, Azari R, Helms LJ, Callahan EJ, Robbins JA. Gender differences in the utilization of health care services. J Fam Pract 2000;49:147-52.

30. Unützer J, Patrick DL, Simon G, et al. Depressive symptoms and the cost of health services in HMO patients aged 65 years and older. A 4-year prospective study. JAMA 1997;277:1618-23.

31. Leslie DL, Rosenheck R. Shifting to outpatient care? Mental health care use and cost under private insurance. Am J Psychiatry 1999;156:1250-7.

32. Callahan EJ, Jaén CR, Crabtree BF, et al. The impact of recent emotional distress and diagnosis of depression or anxiety on the physician-patient encounter in family practice [see comments]. J Fam Pract 1998;46:410-8.

33. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care physicians reconsidered [see comments]. Gen Hosp Psychiatry 1995;17:3-12.

34. Suchman AL, Markakis K, Beckman HB, Frankel R. A model of empathic communication in the medical interview [see comments]. JAMA 1997;277:678-82.

Author and Disclosure Information

 

EDWARD J. CALLAHAN, PHD
KLEA D. BERTAKIS, MD, MPH
RAHMAN AZARI, PHD
JOHN A. ROBBINS, MD
L. JAY HELMS, PHD
J. PAUL LEIGH, PHD
Davis, California
From the Center for Health Services Research in Primary Care (E.J.C., K.D.B., R.A., J.A.R., L.J.H., J.P.L.), and the Departments of Family and Community Medicine (E.J.C., K.D.B.), Statistics (R.A.), Internal Medicine (J.A.R.), Economics (L.J.H.), and Epidemiology and Preventive Medicine (J.P.L.), University of California, Davis, CA. This study was supported by grants R03-HS080291 and R18-06167 from the Agency for Health Care Policy and Research (now known as the Agency for Healthcare Research and Quality). The authors report no competing interests. Address reprint requests to: Edward J. Callahan, PhD, Department of Family and Community Medicine, University of California at Davis, 4860 Y Street, Sacramento, CA 95817. E-mail: [email protected].

The Journal of Family Practice 2002;51(6):540-544
Key words: Depression; fees and charges and utilization; primary health care.
Sections
Author and Disclosure Information

 

EDWARD J. CALLAHAN, PHD
KLEA D. BERTAKIS, MD, MPH
RAHMAN AZARI, PHD
JOHN A. ROBBINS, MD
JAY L. HELMS, PHD
PAUL J. LEIGH, PHD
Davis, California
From the Center for Health Services Research in Primary Care (E.J.C., K.D.B., R.A., J.A.R., L.J.H, J.P.L.), and the Departments of Family and Community Medicine (E.J.C., K.D.B.), Statistics (R.A.), Internal Medicine (J.A.R.), Economics (L.J.H.), and Epidemiology and Preventive Medicine (J.P.L.), University of California, Davis, CA. This study was supported by grants R03-HS080291 and R18-06167 from the Agency for Health Care Policy and Research (now known as the Agency for Healthcare Research and Quality). The authors report no competing interests. Address reprint requests to: Edward J. Callahan, PhD, Department of Family and Community Medicine, University of California at Davis, 4860 Y Street, Sacramento, CA 95817. E-mail: [email protected].

Author and Disclosure Information

 

EDWARD J. CALLAHAN, PHD
KLEA D. BERTAKIS, MD, MPH
RAHMAN AZARI, PHD
JOHN A. ROBBINS, MD
JAY L. HELMS, PHD
PAUL J. LEIGH, PHD
Davis, California
From the Center for Health Services Research in Primary Care (E.J.C., K.D.B., R.A., J.A.R., L.J.H, J.P.L.), and the Departments of Family and Community Medicine (E.J.C., K.D.B.), Statistics (R.A.), Internal Medicine (J.A.R.), Economics (L.J.H.), and Epidemiology and Preventive Medicine (J.P.L.), University of California, Davis, CA. This study was supported by grants R03-HS080291 and R18-06167 from the Agency for Health Care Policy and Research (now known as the Agency for Healthcare Research and Quality). The authors report no competing interests. Address reprint requests to: Edward J. Callahan, PhD, Department of Family and Community Medicine, University of California at Davis, 4860 Y Street, Sacramento, CA 95817. E-mail: [email protected].

Article PDF
Article PDF

 

ABSTRACT

OBJECTIVE: We examined the relationships among depressive symptoms, physician diagnosis of depression, and charges for care.

STUDY DESIGN: We used a prospective observational design.

POPULATION: Five hundred eight new adult patients were randomly assigned to senior residents in family practice and internal medicine.

OUTCOMES MEASURED: Self-reports of health status assessment (Medical Outcomes Study Short Form-36) and depressive symptoms (Beck Depression Inventory) were determined at study entry and at 1-year follow-up. Physician diagnosis of depression was determined by chart audit; charges for care were monitored electronically.

RESULTS: Symptoms of depression and the diagnosis of depression were associated with charges for care. Statistical models were developed to identify predictors for the occurrence and magnitude of medical charges. Neither depressive symptoms nor diagnosis of depression significantly predicted the occurrence of charges in the areas studied, but physician diagnosis of depression predicted the magnitude of primary care and total charges.

CONCLUSIONS: A complex relationship exists among depressive symptoms, the diagnosis of depression, and charges for medical care. Understanding these relationships may help primary care physicians diagnose depression and deliver primary care to depressed patients more effectively while managing health care expenditures.

 

KEY POINTS FOR CLINICIANS

 

  • Diagnosis of depression is associated with higher costs.
  • Failure to diagnose depression may raise laboratory costs.
  • Diagnosis of depression with few symptoms deserves study.

As US medical care has evolved, physicians have been expected to recognize and treat mental health problems in primary care,1 “the hidden mental health network.”2,3 Primary care clinicians are expected to observe signs of possible mental health problems, incorporate those observations into differential diagnoses, and decide which problems to treat or monitor and which to send for consultation or referral.4 These decisions can have important financial and health consequences, especially in dealing with depression.

Depression is common in the community5 and among primary care patients, 6% to 9% of whom report symptoms of major depression.6-8 An additional 10% to 15% of primary care patients show signs of less severe but important depressive problems.8,9 “Subclinical depression” is marked by symptoms that might indicate physical disease, signs of depression, or both; recognition may affect costs of care.10-13

Research has begun to define the impact of depression on processes14 and costs of care.15-17 For example, elderly patients reporting symptoms of depression have more laboratory tests performed at higher cost.15 Primary care patients diagnosed with depression had total yearly health care costs almost double those of patients without depression, with increased costs secondary to higher medical utilization and not mental health specialty treatment.16 There is evidence that depressive symptoms and the diagnosis of depression may predict increases in costs of care.17

Costs of care might be influenced by the model used by primary care physicians to identify depression.18 For example, a biomedical model might use more laboratory testing to reach a diagnosis of depression by exclusion, whereas a psychosocial model would use fewer laboratory tests while the physician pursues psychosocial issues. To identify optimal strategies for practice, it is important to determine how symptoms of depression and physician diagnosis of depression might interrelate and affect medical care costs.

We explored the following hypotheses: (1) that there are significant differences in each type of charge determined by the presence or absence of symptoms and diagnosis of depression; (2) that depressive symptoms and physician diagnosis of depression predict the occurrence of charges for specialty care, emergency services, laboratory services, and hospitalization; and (3) that depressive symptoms and physician diagnosis of depression predict the magnitude of medical charges for primary care, specialty care, emergency services, laboratory services, hospitalization, and total charges.

Methods

Study design

Five hundred eight adult nonpregnant new patients were assigned randomly to primary care providers in either family practice or general internal medicine clinics in a teaching hospital. Children younger than 18 years and pregnant women were excluded because they are not followed in general internal medicine. At enrollment and follow-up, self-reported depression was determined with the abbreviated Beck Depression Inventory (BDI)19 and health status was measured with the Medical Outcomes Study Short Form-36 (MOS SF-36).20 To avoid altering clinician practice, physicians were not provided with either score. Physicians included 105 senior residents (second and third year) in family practice and general internal medicine.

Measures

Beck Depression Inventory. The BDI is a reliable and valid instrument used to measure depressive symptoms.19,21 The abbreviated version includes 13 items weighted and summed to produce a total score.19 A score between 9 and 15 indicates moderate depression, and a score of at least 16 indicates severe depression. The BDI is used widely for screening and to assess treatment efficacy.22 In this study, a BDI score between 0 and 8 was considered “low” or normal, and a score of at least 9 was considered “high” or indicative of symptoms of depression.
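The cut points described above can be written out as a short sketch. The function names are illustrative only and are not part of the BDI or of the study's software; the thresholds are those stated in the text for the abbreviated 13-item version.

```python
def bdi_severity(score: int) -> str:
    """Map an abbreviated (13-item) BDI total to the categories in the text."""
    if score < 0:
        raise ValueError("BDI total cannot be negative")
    if score >= 16:
        return "severe"      # a score of at least 16
    if score >= 9:
        return "moderate"    # a score between 9 and 15
    return "low"             # a score between 0 and 8 ("low" or normal)


def has_high_bdi(score: int) -> bool:
    """Study definition of a 'high' score: a BDI total of at least 9."""
    return bdi_severity(score) != "low"


# Illustrative scores only; these are not patient data.
for s in (0, 8, 9, 15, 16, 31):
    print(s, bdi_severity(s), has_high_bdi(s))
```

Under this definition the moderate and severe categories are pooled, so the study's "high" group is simply every patient scoring 9 or more.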

 

 

At study entrance or exit, 130 patients met criteria for moderate or severe depression (BDI > 8),19 a group that corresponds roughly to the top quartile of BDI scores among participants. This proportion approximates that of primary care patients estimated to experience significant depression.6,7

Medical Outcomes Study Short Form-36. Health status was measured with the MOS SF-36,20 a 36-item self-report questionnaire. Reliability has been verified for difficult populations.23 Summary measures can describe a physical component score and a mental component score.24,25 The physical component score was used in this study to measure physical health status.

Medical chart review. Two physicians (K.D.B. and J.A.R.) reviewed the charts to identify notations of depression on problem lists and in visit notes to signify physician diagnosis of depression.

Charges. Charges were used as a proxy for costs. Electronic data for all health system charges were monitored from the initial visit through 1 full year of care. Six categories were monitored: primary care, specialty care, laboratory testing, emergency department, hospitalization, and total charges. Pharmacy charges were excluded because some patients purchased prescriptions outside the hospital system.

Statistical procedures

Mean log values for each area of medical charges were determined and contrasted with the Duncan multiple range test26 to explore the first hypothesis that charges are associated with symptoms and diagnoses of depression. Next, a double hurdle model was used to test the hypotheses that depressive symptoms and physician diagnosis of depression predict the occurrence and magnitude of charges for a variety of services.27,28 In a double hurdle model, the first “hurdle,” or step, involves exploring whether there are variables that can significantly predict the occurrence of an event (such as a medical charge). The second step involves exploring whether there are variables that can predict the magnitude of the event (eg, a medical charge).

Log-transformation of charges was performed to eliminate undue influence from outliers. No logistic regression models were developed for the occurrence of primary care charges or total charges (the first hurdle) because all study patients had charges in both categories. Results are presented by hypothesis.

Results

Seventy-seven of 508 study patients (15.1%) were identified as depressed by their primary care providers in chart notes. BDI scores showed considerable spread (range, 0–31) and were significantly associated with the diagnosis of depression (P < .001). Whereas 130 patients reported BDI scores of at least 9, only 36 of these patients were diagnosed as depressed by their physicians. Similarly, 41 patients were diagnosed as depressed despite reporting low (normal) BDI scores. Patients were assigned to 1 of 4 groups: those diagnosed as depressed and having high (abnormal) BDI scores (n = 36); those diagnosed as depressed despite low BDI scores (n = 41); those not diagnosed as depressed despite high BDI scores (n = 94); and those not diagnosed as depressed and not having high BDI scores (n = 337).
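The marginal totals implied by the 4 group sizes can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative.

```python
# Group sizes reported in the Results (physician diagnosis x BDI category).
groups = {
    ("diagnosed", "high"): 36,
    ("diagnosed", "low"): 41,
    ("not diagnosed", "high"): 94,
    ("not diagnosed", "low"): 337,
}

total = sum(groups.values())
diagnosed = sum(n for (dx, _), n in groups.items() if dx == "diagnosed")
high_bdi = sum(n for (_, bdi), n in groups.items() if bdi == "high")

# Share of patients with high BDI scores who were not diagnosed as depressed.
missed_share = groups[("not diagnosed", "high")] / high_bdi

print(total, diagnosed, high_bdi, round(100 * missed_share, 1))
# prints: 508 77 130 72.3
```

The 2 high-BDI groups sum to 130, the number identified with BDI > 8 under Measures, and 94 of those 130 patients (about 72%) had no chart diagnosis of depression.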

Hypothesis 1: overall impact of symptoms and diagnosis on charges

Groups diagnosed with depression had significantly higher log primary care charges than did those not diagnosed (Table 1). Both groups diagnosed with depression showed the highest primary care and total medical charges. Patients diagnosed with depression and reporting high BDI scores had higher specialty charges than those not depressed. Highest laboratory costs were found for those diagnosed as depressed despite low BDI scores and those with elevated BDI scores who were not diagnosed as depressed. There were no significant differences among groups for log charges for emergency care and hospital charges.

TABLE 1
Log charges of care by diagnosis and symptoms of depression

 

| Charges | Diagnosis of depression, BDI ≥ 9 | Diagnosis of depression, BDI < 9 | No diagnosis of depression, BDI ≥ 9 | No diagnosis of depression, BDI < 9 |
|---|---|---|---|---|
| Primary care | 5.868* | 6.054* | 5.431 | 5.347 |
| Specialty care | 4.266* | 3.742* | 3.332 | 2.927 |
| Emergency care | 1.681 | 2.172 | 1.604 | 1.248 |
| Laboratory tests | 6.121 | 6.473 | 6.357 | 5.401 |
| Hospital charges | 2.174 | 3.742 | 1.548 | 1.1893 |
| Total charges | 7.704 | 7.878 | 7.508 | 6.979 |

*Log costs were higher for patients with the diagnosis of depression regardless of BDI score than for those with no diagnosis and a BDI below 9.
All charges are logarithmic.
Log costs were higher for patients with the diagnosis of depression and a BDI score below 9 or no diagnosis and a BDI score of at least 9 than for those with no diagnosis and a BDI score below 9.
BDI, Beck Depression Inventory.
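Because Table 1 reports mean log charges, a difference between 2 cells corresponds to a ratio of geometric mean charges rather than a difference in dollars. The sketch below back-transforms one such difference. The base of the logarithm is not stated in the text, so the natural logarithm used here is an assumption, and the function name is illustrative.

```python
import math


def charge_ratio(mean_log_a: float, mean_log_b: float, base: float = math.e) -> float:
    """Ratio of geometric mean charges implied by two mean log charges.

    The default base (natural log) is an assumption; pass base=10 if the
    charges were transformed with common logarithms.
    """
    return base ** (mean_log_a - mean_log_b)


# Total charges from Table 1: diagnosed with BDI < 9 versus
# no diagnosis with BDI < 9.
ratio = charge_ratio(7.878, 6.979)
print(round(ratio, 2))
```

Under the natural-log assumption, the gap of 0.899 between these 2 groups implies geometric mean total charges roughly 2.5 times higher in the diagnosed group; with base-10 logarithms the implied ratio would be considerably larger.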

Hypotheses 2 and 3: factors predicting occurrence and magnitude of charges

Cost models are presented as regressions in Table 2. The left side of the table presents logistic regressions exploring which variables predict whether or not a patient accrues charges in all areas except primary care and total charges. Because all patients had at least 1 primary care visit charge and, hence, a total charge, it was not possible to develop a model to predict the occurrence of those charges.

 

 

Physical health status (measured by the physical component score of the MOS SF-36) predicted the occurrence of all charges measured with the exception of laboratory tests. Advanced patient age predicted increased likelihood of charges in each area; female sex showed a trend toward predicting occurrence of emergency care charges; and education showed a trend toward predicting occurrence of laboratory charges. BDI scores (measure of symptoms of depression) and physician diagnosis of depression failed to contribute significantly to the prediction of specialty care, emergency care, laboratory testing, or hospital charges. However, there was a trend for depressive symptoms to predict the occurrence of laboratory charges.

The right side of Table 2 presents regression models that predicted the magnitude of the different categories of charges. Physical health status was a significant predictor of the magnitude of all types of charges except emergency care. Patient age contributed to prediction of size of all types of charges except emergency visits and laboratory tests. Female sex was a significant predictor of magnitude of charges in primary care, laboratory tests, and total medical charges. The diagnosis of depression was a significant predictor of magnitude of primary care (P = .0029) and total medical (P = .0158) charges. Neither depressive symptoms nor the diagnosis of depression contributed significantly to the prediction of magnitude of charges for specialty care, emergency care, laboratory testing, or hospital use, although there was a trend for depressive symptoms to predict the magnitude of laboratory costs. Although an interaction term was entered into both kinds of regression equations, there was no evidence of a significant contribution from the interaction of symptoms of depression and diagnosis of depression in any of the predictor models developed.

TABLE 2
Regression analyses predicting charges

 

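| Charges | Independent variable* | Occurrence Beta | Occurrence P | Magnitude Beta | Magnitude P | R² |
|---|---|---|---|---|---|---|
| Primary care | PCS | | | -.0961 | .04 | 10.40% |
| | Sex | | | -.1271 | .004 | |
| | Age (y) | | | .1891 | .0001 | |
| | Diagnosis | | | .2097 | .003 | |
| Specialty care | PCS | -.1583 | .005 | -.1904 | .004 | 2.40% |
| | Age (y) | .2235 | .0002 | .1261 | .07 | |
| Emergency care | PCS | -.2518 | .0003 | | | 9.75% |
| | Sex | -.1344 | .06 | | | |
| | Education | | | -.2827 | .0068 | |
| | Age (y) | -.1621 | .04 | | | |
| Laboratory tests | PCS | | | -.2689 | .0001 | 19.90% |
| | Sex | | | .1459 | .0009 | |
| | Education | .0408 | .09 | | | |
| | Age (y) | -.0411 | .0001 | .1978 | .0001 | |
| | BDI score | .2487 | .08 | .0945 | .08 | |
| Hospital care | PCS | -.2583 | .0007 | -.2554 | .04 | 9.40% |
| | Education | | | -.2632 | .02 | |
| | Age (y) | .0089 | .0007 | | | |
| Total charges | PCS | | | -.2547 | .0001 | 17.00% |
| | Sex | | | .0846 | .05 | |
| | Age (y) | | | .2193 | .0001 | |
| | Diagnosis | | | .1631 | .02 | |

*Only variables significantly associated with the occurrence or magnitude of charges for each component are shown.
BDI, Beck Depression Inventory; PCS, physical component score.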

Discussion

Medical charges were related to symptoms of depression and physician diagnosis of depression in this study. Although the patient sample was small, it was representative of the primary care population in displaying a wide range of depressive symptoms as measured by the BDI.6,7 In this study, physician diagnosis of depression was related to self-reported depression ratings: those diagnosed as depressed had significantly higher BDI scores than did those not diagnosed as depressed. However, the relationship between self-reported symptoms and diagnosis was not perfect: 72% of patients with high BDI scores were not recognized as depressed, as often occurs in primary care.6,7 In fact, more patients diagnosed with depression had low BDI scores (< 9, n = 41) than high BDI scores (> 8, n = 36). Clearly, other factors enter the process by which primary care physicians reach the diagnosis of depression.

Symptoms of depression and the diagnosis of depression probably influence the process of care in different ways. Differences in process of care likely would be reflected in different relationships to medical charges. Physician diagnosis of depression was associated with higher primary care and total costs and contributed to models predicting magnitude of primary care and total charges. However, neither symptoms of depression nor diagnosis of depression predicted which patients were more likely to incur charges for specialty care, emergency care, laboratory tests, or hospitalization. There was a trend only for the symptoms of depression to predict who would incur laboratory charges. These findings suggest that the relationship of depression to primary care and total charges is clear, whereas its relationship to less frequently occurring charges is less apparent.

Other demographic factors showed fairly robust associations with the occurrence of charges. Patient age predicted who would get specialty care, emergency care, laboratory costs, and hospitalization, and there was a trend for female sex to predict occurrence of emergency department charges. Health status proved to be a significant predictor of the magnitude of all charges except those for emergency care. These powerful influences must be considered to accurately assess the impact of depression on charges.

Age also predicted the total amount of charges for primary and all medical care for the year and showed a trend toward prediction of magnitude of specialty charges. Female sex was a significant predictor of magnitude of primary care charges, laboratory charges, and total charges, and less education was a significant predictor of magnitude of emergency department and hospital charges. Some of these demographic predictors are readily explained. For example, as patients age, the number and costs of medical problems often increase. More education may enhance socioeconomic status and self-care, each of which may buffer against the need for emergency care and hospitalization. The reasons that charges are often higher for women are probably more complex. Higher utilization of primary and specialty care for women was associated with lower self-reported health status, less education, and lower socioeconomic status in our previous study.29

 

 

These results also suggest that physician diagnosis of depression in the absence of elevated BDI scores may flag a different kind of patient presentation. Diagnosis of depression without elevated BDI scores could result from effective treatment controlling the symptoms of previously diagnosed depression, but this does not adequately explain the occurrence. Perhaps other aspects of physician–patient interaction trigger a depression diagnosis without symptoms. Patients diagnosed as depressed despite low BDI scores ranked highest for log-transformed charges in 5 of the 6 areas explored; only for specialty care did those with both high BDI scores and a diagnosis of depression rank higher. This strong association with charges suggests that these patients represent diagnostic dilemmas, thereby generating more primary care visits and laboratory tests. They may be diagnosed as depressed despite their low BDI scores simply because no organic explanation can be readily identified.

BDI scores showed a trend toward predicting higher laboratory charges in our models. This finding supports the importance of depressive symptoms in influencing the process of primary care, especially laboratory testing.15,30 Perhaps the diagnosis of depression actually slowed the ordering of laboratory tests.18 Because our data did not allow a separation of charges for laboratory tests before and after the diagnosis of depression, we did not test this possibility.

The size of this sample (N = 508) and the length of time patients were followed (1 year) might not have provided adequate power to fully test the contributions of symptoms and diagnosis of depression to the 6 sets of charges. This was likely true for hospitalization charges because hospitalization was an infrequent event in this study. Previous, larger studies found indications of increased hospitalization charges for those diagnosed as depressed17 and those with symptoms of depression.15,30 Alternatively, the recent emphasis on decreasing hospitalizations to reduce medical costs may mean that hospitalization for depressive symptoms rather than for physical illness is less likely to occur.31 In addition, these observations were made by resident physicians and not by community clinicians. It is not clear whether these results would generalize to another setting, although they are consistent with community observations in previous research.

These data do suggest an intriguing interplay of the impact of physician diagnosis of depression and presence of symptoms of depression in a number of indicators of charges and utilization in primary care. Even though each element was associated with increased utilization and charges, their differential impact is unclear. Both may prove important for efforts to enhance recognition of depression; recognition of a mental health problem appeared to shift the process of care in this and previous studies.14,32 To date, there are no data indicating that the diagnosis of depression reduces utilization or costs of primary care delivery. What is known is that physicians working in primary care are more apt to accurately diagnose those with more severe symptoms of depression than those with more transient or less severe symptoms.16,33 Although introducing a screening device such as the BDI or the PRIME-MD9 likely would increase the number of patients diagnosed with depression, it is unclear what impact that would have on the process, costs, and outcomes of care. Simpler interventions, such as training in communication skills like empathy,34 might provide the primary care physician with all the tools needed for identification of emotional distress and mental health problems14,30 and appropriate treatment or referral.

 

ABSTRACT

OBJECTIVE: We examined the relationships among depressive symptoms, physician diagnosis of depression, and charges for care.

STUDY DESIGN: We used a prospective observational design.

POPULATION: Five hundred eight new adult patients were randomly assigned to senior residents in family practice and internal medicine.

OUTCOMES MEASURED: Self-reports of health status assessment (Medical Outcomes Study Short Form-36) and depressive symptoms (Beck Depression Inventory) were determined at study entry and at 1-year follow-up. Physician diagnosis of depression was determined by chart audit; charges for care were monitored electronically.

RESULTS: Symptoms of depression and the diagnosis of depression were associated with charges for care. Statistical models were developed to identify predictors for the occurrence and magnitude of medical charges. Neither depressive symptoms nor diagnosis of depression significantly predicted the occurrence of charges in the areas studied, but physician diagnosis of depression predicted the magnitude of primary care and total charges.

CONCLUSIONS: A complex relationship exists among depressive symptoms, the diagnosis of depression, and charges for medical care. Understanding these relationships may help primary care physicians diagnose depression and deliver primary care to depressed patients more effectively while managing health care expenditures.

 

KEY POINTS FOR CLINICIANS

 

  • Diagnosis of depression is associated with higher costs.
  • Failure to diagnose depression may raise laboratory costs.
  • Diagnosis of depression with few symptoms deserves study.

As US medical care has evolved, physicians have been expected to recognize and treat mental health problems in primary care,1 “the hidden mental health network.”2,3 Primary care clinicians are expected to observe signs of possible mental health problems, incorporate those observations into differential diagnoses, and decide which problems to treat or monitor and which to send for consultation or referral.4 These decisions can have important financial and health consequences, especially in dealing with depression.

Depression is common in the community5 and among primary care patients, 6% to 9% of whom report symptoms of major depression.6-8 An additional 10% to 15% of primary care patients show signs of less severe but important depressive problems.8,9 “Subclinical depression” is marked by symptoms that might indicate physical disease, signs of depression, or both; recognition may affect costs of care.10-13

Research has begun to define the impact of depression on processes14 and costs of care.15-17 For example, elderly patients reporting symptoms of depression have more laboratory tests performed at higher cost.15 Primary care patients diagnosed with depression had total yearly health care costs almost double those of patients without depression, with increased costs secondary to higher medical utilization and not mental health specialty treatment.16 There is evidence that depressive symptoms and the diagnosis of depression may predict increases in costs of care.17

Costs of care might be influenced by the model used by primary care physicians to identify depression.18 For example, a biomedical model might use more laboratory testing to reach a diagnosis of depression by exclusion, whereas a psychosocial model would use fewer laboratory tests while the physician pursues psychosocial issues. To identify optimal strategies for practice, it is important to determine how symptoms of depression and physician diagnosis of depression might interrelate and affect medical care costs.

We explored the following hypotheses: (1) that there are significant differences in each type of charge determined by the presence or absence of symptoms and diagnosis of depression; (2) that depressive symptoms and physician diagnosis of depression predict the occurrence of charges for specialty care, emergency services, laboratory services, and hospitalization; and (3) that depressive symptoms and physician diagnosis of depression predict the magnitude of medical charges for primary care, specialty care, emergency services, laboratory services, hospitalization, and total charges.

Methods

Study design

Five hundred eight adult nonpregnant new patients were assigned randomly to primary care providers in either family practice or general internal medicine clinics in a teaching hospital. Children younger than 18 years and pregnant women were excluded because they are not followed in general internal medicine. At enrollment and follow-up, self-reported depression was determined with the abbreviated Beck Depression Inventory (BDI)19 and health status was measured with the Medical Outcomes Studies Short Form-36 (MOS SF-36).20 To avoid altering clinician practice, physicians were not provided with either score. Physicians included 105 senior residents (second and third year) in family practice and general internal medicine.

Measures

Beck Depression Inventory. The BDI is a reliable and valid instrument used to measure depressive symptoms.19,21 The abbreviated version includes 13 items weighted and summed to produce a total score.19 A score between 9 and 15 indicates moderate depression, and a score of at least 16 indicates severe depression. The BDI is used widely for screening and to assess treatment efficacy.22 In this study, a BDI score between 0 and 8 was considered “low” or normal, and a score of at least 9 was considered “high” or indicative of symptoms of depression.

 

 

At study entrance or exit, 130 patients were identified with significant symptoms of depression (BDI > 8) by meeting criteria for moderate or severe depression19 and thus identifying roughly the top quartile of BDI scores among participants. This proportion approximates that of primary care patients estimated to experience significant depression.6,7

Medical Outcomes Studies Short Form-36. Health status was measured with the MOS SF-36,20 a 36-item self-report questionnaire. Reliability has been verified for difficult populations.23 Summary measures can describe a physical component score and a mental component score.24,25 The physical component score was used in this study to measure physical health status.

Medical chart review. Two physicians (K.D.B. and J.A.R.) reviewed the charts to identify notations of depression on problem lists and in visit notes to signify physician diagnosis of depression.

Charges. Charges were used as a proxy for costs. Electronic data for all health system charges were monitored from the initial visit through 1 full year of care. Six categories were monitored: primary care, specialty care, laboratory testing, emergency department, hospitalization, and total charges. Pharmacy charges were excluded because some patients purchased prescriptions outside the hospital system.

Statistical procedures

Mean log values for each area of medical charges were determined and contrasted with the Duncan multiple range test26 to explore the first hypothesis that charges are associated with symptoms and diagnoses of depression. Next, a double hurdle model was used to test the hypotheses that depressive symptoms and physician diagnosis of depression predict the occurrence and magnitude of charges for a variety of services.27,28 In a double hurdle model, the first “hurdle,” or step, involves exploring whether there are variables that can significantly predict the occurrence of an event (such as a medical charge). The second step involves exploring whether there are variables that can predict the magnitude of the event (eg, a medical charge).

Log-transformation of charges was performed to eliminate undue influence from outliers. No logistic regression models were developed for the occurrence of primary care charges or total charges (the first hurdle) because all study patients had charges in both categories. Results are presented by hypothesis.

Results

Seventy-seven of 508 study patients (15.1%) were identified as depressed by their primary care providers in chart notes. BDI scores showed considerable spread (range, 0–31) and were significantly associated with the diagnosis of depression (P < .001). Whereas 140 patients reported BDI scores of at least 9, only 36 of these patients were diagnosed as depressed by their physicians. Similarly, 41 patients were diagnosed as depressed despite reporting low (normal) BDI scores. Patients were assigned to 1 of 4 groups: those diagnosed as depressed and having high (abnormal) BDI scores (n = 36); those diagnosed as depressed despite low BDI scores (n = 41); those not diagnosed as depressed despite high BDI scores (n = 94); and those not diagnosed as depressed and not having high BDI scores (n = 337).

Hypothesis 1: overall impact of symptoms and diagnosis on charges

Groups diagnosed with depression had significantly higher log primary care charges than did those not diagnosed (Table 1). Both groups diagnosed with depression showed the highest primary care and total medical charges. Patients diagnosed with depression and reporting high BDI scores had higher specialty charges than those not depressed. Highest laboratory costs were found for those diagnosed as depressed despite low BDI scores and those with elevated BDI scores who were not diagnosed as depressed. There were no significant differences among groups for log charges for emergency care and hospital charges.

TABLE 1
Log charges of care by diagnosis and symptoms of depression

 

 Diagnosis of depressionNo diagnosis of depression
ChargesBDI ≥ 9BDI < 9BDI ≥ 9BDI < 9
Primary care5.868*6.054*5.4315.347
Specialty care4.266*3.742*3.3322.927
Emergency care1.6812.1721.6041.248
Laboratory tests6.1216.4736.3575.401
Hospital charges2.1743.7421.5481.1893
Total charges7.7047.8787.5086.979
*Log costs were higher for patients with the diagnosis of depression regardless of BDI score than for those with no diagnosis and a BDI below 9.
All charges are logarithmic.
Log costs were higher for patients with the diagnosis of depression and a BDI score below 9 or no diagnosis and a BDI score of at least 9 than for those with no diagnosis and a BDI score below 9.
BDI, Beck Depression Inventory.

Hypotheses 2 and 3: factors predicting occurrence and magnitude of charges

Cost models are presented as regressions in Table 2. The left side of the table presents logistic regressions exploring which variables predict whether or not a patient accrues charges in all areas except primary care and total charges. Because all patients had at least 1 primary care visit charge and, hence, a total charge, it was not possible to develop a model to predict the occurrence of those charges.

 

 

Physical health status (measured by the physical component score of the MOS SF-36) predicted the occurrence of all charges measured with the exception of laboratory tests. Advanced patient age predicted increased likelihood of charges in each area; female sex showed a trend toward predicting occurrence of emergency care charges; and education showed a trend toward predicting occurrence of laboratory charges. BDI scores (measure of symptoms of depression) and physician diagnosis of depression failed to contribute significantly to the prediction of specialty care, emergency care, laboratory testing, or hospital charges. However, there was a trend for depressive symptoms to predict the occurrence of laboratory charges.

The right side of Table 2 presents regression models that predicted the magnitude of the different categories of charges. Physical health status was a significant predictor of the magnitude of all types of charges except emergency care. Patient age contributed to prediction of size of all types of charges except emergency visits and laboratory tests. Female sex was a significant predictor of magnitude of charges in primary care, laboratory tests, and total medical charges. The diagnosis of depression was a significant predictor of magnitude of primary care (P = .0029) and total medical (P = .0158) charges. Neither depressive symptoms nor the diagnosis of depression contributed significantly to the prediction of magnitude of charges for specialty care, emergency care, laboratory testing, or hospital use, although there was a trend for depressive symptoms to predict the magnitude of laboratory costs. Although an interaction term was entered into both kinds of regression equations, there was no evidence of a significant contribution from the interaction of symptoms of depression and diagnosis of depression in any of the predictor models developed.

TABLE 2
Regression analyses predicting charges

 

  OccurrenceMagnitude 
ChargesIndependent variable*BetaPBetaPR 2
Primary carePCS-.0961.0410.40%
Sex-.1271.004
Age (y).1891.0001
Diagnosis.2097.003
Specialty carePCS-.1583.005-.1904.0042.40%
Age (y).2235.0002.1261.07
Emergency carePCS-.2518.00039.75%
Sex-.1344.06
Education-.2827.0068
Age (y)-.1621.04
Laboratory testsPCS-.2689.000119.90%
Sex.1459.0009
Education.0408.09
Age (y)-.0411.0001.1978.0001
BDI score.2487.08.0945.08
Hospital carePCS-.2583.0007-.2554.049.40%
Education-.2632.02
Age (y).0089.0007
Total chargesPCS-.2547.000117.00%
Sex.0846.05
Age (y).2193.0001
Diagnosis.1631.02
*Only variables significantly associated with the occurrence or magnitude of charges for each component are shown.
BDI, Beck Depression Inventory; PCS, physical component score.

Discussion

Medical charges were related to symptoms of depression and physician diagnosis of depression in this study. Although the patient sample was small, it was representative of the primary care population in displaying a wide range of depressive symptoms as measured by the BDI.6,7 In this study, physician diagnosis of depression was related to self-reported depression ratings: those diagnosed as depressed had significantly higher BDI scores than did those not diagnosed as depressed. However, the relationship between self-reported symptoms and diagnosis was not perfect: 72% of patients with high BDI scores were not recognized as depressed, as often occurs in primary care.6,7 In fact, more patients diagnosed with depression had low BDI scores (< 9, n = 41) than high BDI scores (> 8, n = 36). Clearly, other factors enter the process by which primary care physicians reach the diagnosis of depression.

Symptoms of depression and the diagnosis of depression probably influence the process of care in different ways. Differences in process of care likely would be reflected in different relationships to medical charges. Physician diagnosis of depression was associated with higher primary care and total costs and contributed to models predicting magnitude of primary care and total charges. However, neither symptoms of depression nor diagnosis of depression predicted which patients were more likely to incur charges for specialty care, emergency care, laboratory tests, or hospitalization. There was a trend only for the symptoms of depression to predict who would incur laboratory charges. These findings suggest that the relationship between depression and primary care charges and total charges is clear but less apparent when looking at less frequently occurring charges.

Other demographic factors showed fairly robust associations with the occurrence of charges. Patient age predicted who would incur charges for specialty care, emergency care, laboratory tests, and hospitalization, and there was a trend for female sex to predict occurrence of emergency department charges. Health status proved to be a significant predictor of the magnitude of all charges except those for emergency care. These powerful influences must be considered to accurately assess the impact of depression on charges.

Age also predicted the total amount of charges for primary and all medical care for the year and showed a trend toward prediction of magnitude of specialty charges. Female sex was a significant predictor of magnitude of primary care charges, laboratory charges, and total charges, and less education was a significant predictor of magnitude of emergency department and hospital charges. Some of these demographic predictors are readily explained. For example, as patients age, the number and costs of medical problems often increase. More education may enhance socioeconomic status and self-care, each of which may buffer against the need for emergency care and hospitalization. The reasons that charges are often higher for women are probably more complex. Higher utilization of primary and specialty care for women was associated with lower self-reported health status, less education, and lower socioeconomic status in our previous study.29

These results also suggest that physician diagnosis of depression in the absence of elevated BDI scores may flag a different kind of patient presentation. Diagnosis of depression without elevated BDI scores could result from effective treatment controlling the symptoms of previously diagnosed depression, but this does not adequately explain the occurrence. Perhaps other aspects of physician–patient interaction trigger a depression diagnosis without symptoms. This group ranked highest for log-transformed charges for 5 of the 6 areas explored: only for specialty care did those with high BDI scores and diagnosis of depression rank higher in total cost. This strong association with charges implies that these patients represent diagnostic dilemmas, thereby generating more primary care visits and laboratory tests. They may be diagnosed as depressed despite their low BDI scores simply because no organic explanation can be readily identified.

BDI scores showed a trend toward predicting higher laboratory charges in our models. This finding supports the importance of depressive symptoms in influencing the process of primary care, especially laboratory testing.15,30 Perhaps the diagnosis of depression actually slowed the ordering of laboratory tests.18 Because our data did not allow a separation of charges for laboratory tests before and after the diagnosis of depression, we did not test this possibility.

The size of this sample (N = 508) and the length of time patients were followed (1 year) might not have provided adequate power to fully test the contributions of symptoms and diagnosis of depression to the 6 sets of charges. This was likely true for hospitalization charges because hospitalization was an infrequent event in this study. Previous, larger studies found indications of increased hospitalization charges for those diagnosed as depressed17 and those with symptoms of depression.15,30 Alternatively, the recent emphasis on decreasing hospitalizations to reduce medical costs may mean that hospitalization for depressive symptoms rather than for physical illness is less likely to occur.31 In addition, these observations were made by resident physicians and not by community clinicians. It is not clear whether these results would generalize to another setting, although they are consistent with community observations in previous research.

These data suggest an intriguing interplay between physician diagnosis of depression and the presence of depressive symptoms across several indicators of charges and utilization in primary care. Even though each element was associated with increased utilization and charges, their differential impact is unclear. Both may prove important for efforts to enhance recognition of depression; recognition of a mental health problem appeared to shift the process of care in this and previous studies.14,32 To date, there are no data indicating that the diagnosis of depression reduces utilization or costs of primary care delivery. What is known is that physicians working in primary care are more apt to accurately diagnose those with more severe symptoms of depression than those with more transient or less severe symptoms.16,33 Although introducing a screening device such as the BDI or the PRIME-MD9 likely would increase the number of patients diagnosed with depression, it is unclear what impact that would have on the process, costs, and outcomes of care. Simpler interventions, such as training in communication skills like empathy,34 might provide the primary care physician with all the tools needed to identify emotional distress and mental health problems14,30 and to provide appropriate treatment or referral.

References

1. deGruy F. Mental health care in the primary care setting. In: Donaldson MS, Yordy KD, Lohr KN, Vanselow NA, eds. Primary Care: America’s Health in the New Era. Washington, DC: National Academy Press; 1996;285-311.

2. Regier DA, Goldberg ID, Taube CA. The de facto US mental health services system: a public health perspective. Arch Gen Psychiatry 1978;35:685-93.

3. Schurman RA, Kramer PD, Mitchell JB. The hidden mental health network. Treatment of mental illness by nonpsychiatrist physicians. Arch Gen Psychiatry 1985;42:89-94.

4. Nutting PA, Franks P, Clancy CM. Referral and consultation in primary care: do we understand what we’re doing? [editorial; comment]. J Fam Pract 1992;35:21-3.

5. Lépine JP, Gastpar M, Mendlewicz J, Tylee A. Depression in the community: the first pan-European study DEPRES (Depression Research in European Society). Int Clin Psychopharmacol 1997;12:19-29.

6. Panel DG. Clinical Practice Guidelines. Vol I. Washington, DC: Agency for Health Care Policy and Research; 1993.

7. Panel DG. Clinical Practice Guidelines. Vol II. Washington, DC: Agency for Health Care Policy and Research; 1993.

8. Katon W. The epidemiology of depression in medical care. Int J Psychiatry Med 1987;17:93-112.

9. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study [see comments]. JAMA 1994;272:1749-56.

10. Greenberg PE, Stiglin LE, Finkelstein SN, Berndt ER. The economic burden of depression in 1990 [see comments]. J Clin Psychiatry 1993;54:405-18.

11. Kirmayer LJ, Robbins JM. Three forms of somatization in primary care: prevalence, co-occurrence, and sociodemographic characteristics. J Nerv Ment Dis 1991;179:647-55.

12. Kirmayer LJ, Robbins JM, Dworkind M, Yaffe MJ. Somatization and the recognition of depression and anxiety in primary care. Am J Psychiatry 1993;150:734-41.

13. Kirmayer LJ, Robbins JM. Patients who somatize in primary care: a longitudinal study of cognitive and social characteristics. Psychol Med 1996;26:937-51.

14. Callahan EJ, Bertakis KD, Azari R, et al. The influence of depression on physician-patient interaction in primary care. Fam Med 1996;28:346-51.

15. Callahan CM, Kesterson JG, Tierney WM. Association of symptoms of depression with diagnostic test charges among older adults. Ann Intern Med 1997;126:426-32.

16. Simon GE, VonKorff M, Barlow W. Health care costs of primary care patients with recognized depression. Arch Gen Psychiatry 1995;52:850-6.

17. Simon G, Ormel J, VonKorff M, Barlow W. Health care costs associated with depressive and anxiety disorders in primary care. Am J Psychiatry 1995;152:352-7.

18. Carney PA, Rhodes LA, Eliassen MS, et al. Variations in approaching the diagnosis of depression: a guided focus group study. J Fam Pract 1998;46:73-82.

19. Beck AT, Beck RW. Screening depressed patients in family practice. A rapid technique. Postgrad Med 1972;52:81-5.

20. Ware JE, Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473-83.

21. Beck AT, Ward CH, Mendelson M. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561-71.

22. Beck AT, Steer RA, Garbin MG. Psychometric properties of the Beck Depression Inventory: twenty-five years of evaluation. Clin Psychol Rev 1988;8:77-100.

23. Stewart AL, Hays RD, Ware JE, Jr. The MOS short-form general health survey. Reliability and validity in a patient population. Med Care 1988;26:724-35.

24. McHorney CA, Ware JE, Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247-63.

25. Ware JE, Kosinski M, Keller SD. SF-36 Physical and Mental Health Summary Scales: A User’s Manual. Boston: Nimrod Press; 1994.

26. Harter HL. Critical values for Duncan’s new multiple range test. Biometrics 1960;16:671-85.

27. Duan N. Smearing estimates: a nonparametric retransformation method. J Am Stat Assoc 1983;78:605-10.

28. Duan N, Manning WG, Morris CN, Newhouse JP. A comparison of alternative models for the demand for medical care. J Business Econ Stat 1983;1:115-26.

29. Bertakis KD, Azari R, Helms LJ, Callahan EJ, Robbins JA. Gender differences in the utilization of health care services. J Fam Pract 2000;49:147-52.

30. Unützer J, Patrick DL, Simon G, et al. Depressive symptoms and the cost of health services in HMO patients aged 65 years and older. A 4-year prospective study. JAMA 1997;277:1618-23.

31. Leslie DL, Rosenheck R. Shifting to outpatient care? Mental health care use and cost under private insurance. Am J Psychiatry 1999;156:1250-7.

32. Callahan EJ, Jaén CR, Crabtree BF, et al. The impact of recent emotional distress and diagnosis of depression or anxiety on the physician-patient encounter in family practice [see comments]. J Fam Pract 1998;46:410-8.

33. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care physicians reconsidered [see comments]. Gen Hosp Psychiatry 1995;17:3-12.

34. Suchman AL, Markakis K, Beckman HB, Frankel R. A model of empathic communication in the medical interview [see comments]. JAMA 1997;277:678-82.

Issue
The Journal of Family Practice - 51(06)
Page Number
540-544
Legacy Keywords
Depression; fees and charges; utilization; primary health care. (J Fam Pract 2002; 51:540–544)

Randomized placebo-controlled trial comparing efficacy and safety of valdecoxib with naproxen in patients with osteoarthritis

ABSTRACT

OBJECTIVE: We compared the efficacy and upper gastrointestinal safety of the cyclooxygenase-2–specific inhibitor valdecoxib with naproxen and placebo in treating moderate to severe osteoarthritis of the knee.

STUDY DESIGN: This multicenter, randomized, double-blind, placebo-controlled study compared the efficacy and upper gastrointestinal tract safety of valdecoxib at dosages of 5, 10, and 20 mg once daily with placebo and naproxen at the dosage of 500 mg twice daily.

POPULATION: We included patients who had been diagnosed with moderate to severe osteoarthritis of the knee according to the modified criteria of the American College of Rheumatology.

OUTCOMES MEASURED: The Patient’s and Physician’s Global Assessment of Arthritis (PaGAA, PhGAA), Patient’s Assessment of Arthritis Pain–Visual Analog Scale (PAAP-VAS), and Western Ontario and McMaster’s Universities (WOMAC) Osteoarthritis indices were assessed at baseline and at weeks 2, 6, and 12. Upper gastrointestinal ulceration was assessed by pre- and posttreatment endoscopies.

RESULTS: Valdecoxib 10 and 20 mg once daily (but not 5 mg once daily) demonstrated similar efficacy to naproxen at 500 mg twice daily, and all 3 dosages were superior to placebo for the PaGAA, PhGAA, PAAP-VAS, and WOMAC Osteoarthritis indices at most assessments throughout the 12-week study (P < .05). The incidence of endoscopically proven ulcers was significantly higher in the naproxen group than in the 5- and 10-mg valdecoxib groups, but not in the 20-mg valdecoxib group. All 3 valdecoxib doses were comparable to placebo in ulcer incidence.

CONCLUSIONS: Valdecoxib (10 and 20 mg once daily) is significantly superior to placebo and as effective as naproxen (500 mg twice daily) in improving moderate to severe osteoarthritis of the knee. Upper gastrointestinal tract safety of valdecoxib (5 and 10 mg) was comparable to that of placebo and significantly better than that of naproxen.

KEY POINTS FOR CLINICIANS

  • The cyclooxygenase-2–specific inhibitor valdecoxib 10 or 20 mg once daily is as effective as naproxen 500 mg twice daily.
  • Valdecoxib at the recommended dose for treatment of osteoarthritis (10 mg once daily) had better upper gastrointestinal safety than naproxen.

Current medical therapies for osteoarthritis include conventional nonsteroidal anti-inflammatory drugs (NSAIDs), acetaminophen, glucosamine sulfate, and intra-articular injections of corticosteroids and hyaluronic acid. However, long-term use of corticosteroid injections can exacerbate damage to the affected joints.1,2 Conventional NSAIDs are associated with upper gastrointestinal tract ulceration and inhibition of platelet function.3

Cyclooxygenase-2 (COX-2)–specific inhibitors have demonstrated equivalent efficacy to conventional NSAIDs in treating pain and inflammation associated with osteoarthritis and rheumatoid arthritis. Further, COX-2–specific inhibitors significantly reduce the incidence of gastrointestinal ulceration and bleeding side effects caused by conventional NSAIDs.4,5 Valdecoxib (Bextra; Pharmacia Corporation and Pfizer Corporation) is a novel COX-2–specific inhibitor that is approximately 28,000-fold more selective against COX-2 than against COX-1. As a potent COX-2–specific inhibitor, valdecoxib is expected to provide efficacy equivalent to conventional NSAIDs for treatment of arthritis and spare the COX-1–related side effects. This randomized, placebo-controlled, double-blind, 12-week study was designed to test this hypothesis by comparing the efficacy and upper gastrointestinal tract safety of valdecoxib with that of naproxen, a leading conventional NSAID comparator.

Methods

Study population

Ambulatory adults who had been diagnosed with moderate to severe osteoarthritis of the knee according to the modified criteria of the American College of Rheumatology6,7 were eligible to participate in the trial. Patients were recruited from primary care and rheumatology specialty settings. Patients who had baseline scores of at least 40 mm on the Patient’s Assessment of Arthritis Pain–Visual Analog Scale (PAAP-VAS) and baseline categorical scores of poor to very poor on the Patient’s (PaGAA) and Physician’s (PhGAA) Global Assessments of Arthritis were included.8,9 Any patient suffering from inflammatory arthritis, gout, pseudogout, Paget disease, or any chronic pain syndrome that might interfere with assessment of the Index Knee was excluded from the trial. Patients diagnosed with osteoarthritis of the hip ipsilateral to the Index Knee, severe anserine bursitis, acute joint trauma, or complete loss of articular cartilage on the Index Knee also were excluded. Patients were not eligible if they had active gastrointestinal disease, gastrointestinal tract ulceration 30 days before the trial, a significant bleeding disorder, or a history of gastric or duodenal surgery. Patients with an esophageal, gastric, pyloric channel, or duodenal ulcer or a score of at least 10 for esophageal, gastric, or duodenal erosions at the pretreatment endoscopy examination also were excluded.

FIGURE 1
Patient’s global assessment of arthritis

Study design

This multicenter, randomized, double-blind, placebo-controlled study compared the efficacy and upper gastrointestinal tract safety of valdecoxib at dosages of 5, 10, and 20 mg once daily with placebo and naproxen at a dosage of 500 mg twice daily in relieving moderate to severe osteoarthritis of the knee. The trial was conducted in 85 centers in the United States and Canada, in accordance with the principles of good clinical practice and the Declaration of Helsinki. Eligible patients were randomized to treatment groups and self-administered oral study medication. Patients were randomized to study treatment in the order in which they were enrolled into the study by using a treatment sequence that was determined by a Searle-prepared computer-generated randomization schedule. Patients received their allocated study medications in bottles labeled A and B according to the randomization schedule. Personnel at the study centers carried out the assessments and remained blinded throughout the study. Eligible patients were enrolled and discontinued regular pain medication. Patients discontinued their normal medications at the following specified times before the baseline endoscopy: NSAIDs (including full-dose aspirin at a dosage of ≥325 mg/day) at 48 hours, corticosteroid injections at 4 weeks, and intra-articular injections of corticosteroid or hyaluronic acid preparations at 3 and 6 months, respectively. The use of antiulcer drugs, including H2 blockers, proton pump inhibitors, misoprostol, and sucralfate, was discontinued at least 24 hours before the baseline endoscopy.

Efficacy assessments

The following arthritis assessments were made at baseline and at 2, 6, and 12 weeks or at early termination after study drug administration. PaGAA or PhGAA was measured on a 5-point categorical scale, where 1 = very good, 2 = good, 3 = fair, 4 = poor, and 5 = very poor. The PAAP-VAS was measured on a scale of 0 to 100 mm, where 0 = no pain and 100 = most severe pain. The Western Ontario and McMaster’s Universities (WOMAC) Osteoarthritis indices including Pain, Stiffness, Physical Function, and Composite were measured as described previously.10

Upper gastrointestinal assessments

Upper gastrointestinal tract endoscopy was performed within 7 days before the first study dose and at the 12-week assessment or at early termination if the patient withdrew. An endoscopy could be performed at any time if the patient experienced symptoms suggestive of an ulcer. The endoscopists performing baseline and 12-week (early termination) assessments remained blinded throughout the study.

General safety assessments

Clinical laboratory tests were performed at screening, baseline, weeks 2, 6, and 12, or at early termination, and a complete physical examination was performed at screening and final visits. The incidence of adverse events occurring in each treatment arm was monitored throughout the study. Adverse events occurring within 7 days and serious adverse events occurring within 30 days of the last study dosage of medication were included in the safety analyses.

Statistical analyses

A sample size of 200 patients per treatment group was deemed sufficient to detect a difference in ulcer rates of 5% for valdecoxib vs 16% for naproxen, with 80% power and type 1 error at .017 (adjusted for 3 primary comparisons against placebo). Homogeneity of treatment groups at baseline with respect to age, height, weight, duration of osteoarthritis, PAAP-VAS, and WOMAC Osteoarthritis Index scores was assessed with 2-way analysis of variance, with treatment group and center as factors. All other demographics and baseline characteristics were compared with the Cochran-Mantel-Haenszel (CMH) test, stratified by center.
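The sample size reasoning above can be sketched with the standard normal-approximation formula for comparing two proportions. This is only an illustration: the article does not say which formula or software was used, so the two-sided test, the unpooled variance, and the function name `n_per_group` are assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float, power: float) -> int:
    """Approximate patients per group needed to detect p1 vs p2.

    Assumes a two-sided test and the unpooled normal approximation;
    the trial's actual calculation may have differed.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the type 1 error
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Inputs stated in the text: 5% vs 16% ulcer rates, alpha = .017, 80% power.
n = n_per_group(0.05, 0.16, alpha=0.017, power=0.80)
print(n)
```

Under these assumptions the required number per group comes out below 200, which is consistent with the planned group size leaving some margin for withdrawals and missing endoscopies.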

All efficacy assessments were performed on the modified intent-to-treat (ITT) cohort by using the last observation carried forward approach. The ITT cohort comprised all patients who were randomized and had taken at least 1 dose of study medication. Analyses of mean change from baseline for PaGAA, PhGAA, PAAP-VAS, and WOMAC Osteoarthritis indices were performed by using analysis of covariance, with treatment and center as factors and the corresponding baseline score as the covariate. Pairwise comparisons of valdecoxib at dosages of 10 and 20 mg once daily vs placebo were interpreted with the Hochberg procedure.11 Primary pairwise comparisons were amended in the statistical analysis plan before data unblinding to compare placebo with 10 and 20 mg valdecoxib, but not with the 5-mg dose. For all other comparisons, including 5 mg valdecoxib and naproxen vs placebo, differences were considered significant if the pairwise P values were less than .05. The incidence of withdrawal due to treatment failure was analyzed by the Fisher exact test, and the time to withdrawal in each treatment group was analyzed by log-rank test and plotted with the Kaplan-Meier product limit.12,13
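As a minimal sketch of how the Hochberg step-up procedure treats the two primary comparisons (10 and 20 mg valdecoxib vs placebo), the function below orders the P values and works down from the largest. The function name `hochberg` and the P values in the usage lines are hypothetical and are not results from this trial.

```python
def hochberg(p_values, alpha=0.05):
    """Return True/False per hypothesis under Hochberg's step-up procedure.

    Working from the largest P value down, find the first rank k (1-based,
    ascending order) with p_(k) <= alpha / (m - k + 1); reject that
    hypothesis and every hypothesis with a smaller P value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending P
    reject = [False] * m
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        if p_values[idx] <= alpha / (m - rank + 1):
            for j in order[:rank]:
                reject[j] = True
            break
    return reject


# Hypothetical P values for [10 mg vs placebo, 20 mg vs placebo].
both_significant = hochberg([0.040, 0.045])   # largest P <= .05, so both are rejected
one_significant = hochberg([0.020, 0.300])    # only the smaller P clears .05 / 2
none_significant = hochberg([0.030, 0.060])   # .06 > .05 and .03 > .025
```

With two comparisons the rule reduces to this: if the larger P value is at most .05, both comparisons are significant; otherwise the smaller P value must be at most .025.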

Upper gastrointestinal tract endoscopic analyses were performed on the upper gastrointestinal tract ITT population. Randomized patients were included in this cohort if they received at least 1 dose of study medication and had undergone pretreatment and posttreatment endoscopies. Overall and pairwise comparisons of gastroduodenal, gastric, and duodenal ulcers and erosions were assessed with the CMH test stratified by center. The incidence of adverse events was compared between treatment groups with the Fisher exact test. Changes in vital signs were compared between treatment groups with an analysis of covariance using pairwise treatment comparisons, with treatment group as a factor and baseline value as a covariate.

Results

Patient baseline characteristics

Of the 1019 eligible randomized patients, 1 patient randomized to 10 mg/day valdecoxib, 1 to 20 mg/day valdecoxib, and 1 to 500 mg naproxen twice daily did not take the study medication and were excluded from efficacy and safety analyses. The remaining 1016 randomized patients received study medication and were included in the ITT cohort on which analyses of all efficacy end points were based. A total of 269 patients withdrew before the end of the study due to treatment failure, preexisting protocol violations, noncompliance, or adverse signs and symptoms, or were lost to follow-up: 74 patients in the placebo group, 39 in the 5-mg valdecoxib group, 56 in the 10-mg valdecoxib group, 44 in the 20-mg valdecoxib group, and 56 in the naproxen group. The upper gastrointestinal tract ITT cohort comprised 908 patients who were included in the upper gastrointestinal tract safety analyses. More than 90% of patients included in the study evaluated their osteoarthritis as poor to very poor as assessed by baseline PaGAA scores. Treatment groups were homogeneous with respect to demographics, vital signs, medical history, and all baseline arthritis assessments (Table 1).
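The patient counts in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the share of the ITT cohort with paired endoscopies is a derived quantity that the article does not report.

```python
# Counts taken from the paragraph above.
randomized = 1019
never_dosed = 3  # one each in the valdecoxib 10 mg, valdecoxib 20 mg, and naproxen groups
itt = randomized - never_dosed

withdrawals = {
    "placebo": 74,
    "valdecoxib 5 mg": 39,
    "valdecoxib 10 mg": 56,
    "valdecoxib 20 mg": 44,
    "naproxen": 56,
}
total_withdrawn = sum(withdrawals.values())

gi_itt = 908  # patients with pretreatment and posttreatment endoscopies
gi_share = gi_itt / itt  # derived here, not reported in the article

print(itt, total_withdrawn, round(100 * gi_share, 1))
```

The ITT total and the sum of the per-group withdrawals agree with the 1016 and 269 stated in the text.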

TABLE 1
Patient baseline characteristics

| Characteristic | Placebo (n = 205) | Valdecoxib 5 mg qd (n = 201) | Valdecoxib 10 mg qd (n = 206) | Valdecoxib 20 mg qd (n = 202) | Naproxen 500 mg bid (n = 205) |
| --- | --- | --- | --- | --- | --- |
| Mean (SD) age, y | 60.3 (10.5) | 58.7 (11.9) | 59.8 (11.0) | 59.6 (10.4) | 60.4 (10.7) |
| Mean (SD) weight, kg | 87.5 (21.2) | 91.4 (22.6) | 89.3 (21.4) | 92.6 (23.7) | 88.1 (21.7) |
| Race, n (%) | | | | | |
|   White | 162 (79) | 155 (77) | 154 (75) | 160 (79) | 163 (80) |
|   Black | 21 (10) | 26 (13) | 24 (12) | 24 (12) | 23 (11) |
|   Asian | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 2 (1) |
|   Hispanic | 19 (9) | 18 (9) | 25 (12) | 15 (7) | 15 (7) |
| Male sex, n (%) | 73 (36) | 73 (36) | 72 (35) | 66 (33) | 76 (37) |
| Mean (SD) disease duration, y | 8.3 (8.0) | 9.8 (9.5) | 8.7 (8.0) | 9.2 (8.0) | 9.4 (8.7) |
| History of GI bleeding, n (%) | 2 (1) | 0 (0) | 3 (1) | 2 (1) | 3 (1) |
| History of gastroduodenal ulcer, n (%) | 20 (10) | 21 (10) | 24 (12) | 28 (14) | 31 (15) |
| PaGAA, n (%) | | | | | |
|   Poor | 168 (82) | 175 (87) | 168 (82) | 162 (80) | 169 (82) |
|   Very poor | 33 (16) | 23 (11) | 32 (16) | 36 (18) | 31 (15) |
| PhGAA, n (%) | | | | | |
|   Poor | 179 (87) | 181 (90) | 176 (85) | 173 (86) | 175 (85) |
|   Very poor | 24 (12) | 18 (9) | 25 (12) | 24 (12) | 25 (12) |
No significant differences were observed between treatment groups at any baseline characteristic.
bid, twice daily; GI, gastrointestinal; PaGAA, Patient’s Global Assessment of Arthritis; PhGAA, Physician’s Global Assessment of Arthritis; qd, once daily.

Efficacy

The least square mean change in the PaGAA was significantly improved at most assessments in response to valdecoxib (10 and 20 mg/day) and 500 mg naproxen twice daily compared with placebo (Table 2). However, the improvement in response to valdecoxib 5 mg qd did not reach statistical significance (Table 2). Significant improvements in the PhGAA were observed in response to valdecoxib and naproxen at all assessments (Table 2).

The dosages of 20 mg/day valdecoxib and 500 mg naproxen twice daily were associated with a reduction in pain, as assessed by the PAAP-VAS scores. Pain reduction associated with 5 and 10 mg/day valdecoxib was significantly better than that with placebo at all assessments except for week 12 (Table 2).

Valdecoxib and naproxen treatments improved the WOMAC Pain, Stiffness, Physical Function, and Composite indices compared with placebo at 2, 6, and 12 weeks. Valdecoxib 20 mg/day and naproxen 500 mg twice daily produced statistically significant changes in all WOMAC Osteoarthritis scores throughout the 12-week study period compared with placebo (P < .05). WOMAC Pain scores for 10 mg valdecoxib were significantly different from those for placebo at 2 weeks (P < .001) but not at 6 or 12 weeks. No significant differences were noted between any of the valdecoxib treatment doses and naproxen in terms of improvement in WOMAC indices.

The incidences of withdrawal due to treatment failure were 20% (95% confidence interval [CI], 15.3–26.8) in the placebo group; 8% (95% CI, 4.8–12.8), 12% (95% CI, 7.8–17.1), and 10% (95% CI, 6.3–15.2) in the 5-, 10-, and 20-mg/day valdecoxib groups; and 6% (95% CI, 3.6–10.9) in the 500-mg naproxen group (P < .05; Table 3). Patients in the placebo group withdrew at a significantly faster rate than those in the 4 active treatment groups (P < .05), but there were no significant differences in withdrawal rates across the 4 active treatment groups.

TABLE 2
Baseline arthritis assessments and mean changes from baseline scores

Assessment | Placebo (n = 205) | Valdecoxib 5 mg qd (n = 201) | Valdecoxib 10 mg qd (n = 205) | Valdecoxib 20 mg qd (n = 201) | Naproxen 500 mg bid (n = 204)
PhGAA§
  Baseline mean | 4.10 | 4.07 | 4.09 | 4.09 | 4.10
  LSM change
    Week 2 (CI) | -1.04 (-1.16, -0.91) | -1.31 (-1.44, -1.19) | -1.37 (-1.50, -1.25) | -1.42 (-1.54, -1.29) | -1.35 (-1.48, -1.23)
    Week 6 (CI) | -1.22 (-1.35, -1.08) | -1.44* (-1.58, -1.31) | -1.50 (-1.63, -1.36) | -1.41* (-1.55, -1.28) | -1.45* (-1.59, -1.32)
    Week 12 (CI) | -1.22 (-1.36, -1.08) | -1.43* (-1.58, -1.28) | -1.52 (-1.67, -1.38) | -1.45* (-1.60, -1.31) | -1.43* (-1.58, -1.29)
PAAP∥
  Baseline mean | 71.20 | 71.42 | 72.41 | 72.54 | 72.36
  LSM change
    Week 2 (CI) | -21.19 (-24.80, -17.58) | -28.46 (-32.11, -24.82) | -30.21 (-33.83, -26.59) | -32.07 (-35.73, -28.41) | -31.03 (-34.66, -27.40)
    Week 6 (CI) | -23.92 (-27.72, -20.12) | -30.81 (-34.65, -26.97) | -29.85* (-33.67, -26.04) | -32.28 (-36.13, -28.42) | -31.84 (-35.66, -28.02)
    Week 12 (CI) | -25.97 (-30.02, -21.92) | -31.33 (-35.42, -27.24) | -30.41 (-34.47, -30.41) | -32.70* (-36.81, -32.70) | -31.83* (-35.90, -27.76)
WOMAC OA, Stiffness¶
  Baseline mean | 4.84 | 4.87 | 4.91 | 4.73 | 4.94
  LSM change
    Week 2 (CI) | -0.78 (-0.98, -0.57) | -1.03 (-1.24, -0.82) | -1.20 (-1.41, -0.99) | -1.24 (-1.45, -1.03) | -1.28 (-1.49, -1.08)
    Week 6 (CI) | -1.04 (-1.27, -0.82) | -1.25 (-1.48, -1.02) | -1.42* (-1.65, -1.20) | -1.43* (-1.66, -1.20) | -1.40 (-1.62, -1.17)
    Week 12 (CI) | -1.12 (-1.36, -0.89) | -1.33 (-1.57, -1.09) | -1.41 (-1.65, -1.17) | -1.46* (-1.70, -1.22) | -1.54* (-1.78, -1.30)
WOMAC OA, Composite#
  Baseline mean | 53.49 | 53.03 | 54.73 | 53.42 | 53.67
  LSM change
    Week 2 (CI) | -10.13 (-12.28, -7.99) | -13.26* (-15.42, -11.09) | -15.05 (-17.20, -12.90) | -15.44 (-17.63, -13.32) | -15.47 (-17.63, -13.32)
    Week 6 (CI) | -12.98 (-15.45, -10.51) | -15.47 (-17.97, -12.98) | -16.74* (-19.22, -14.26) | -17.33* (-19.48, -14.51) | -16.99* (-19.48, -14.51)
    Week 12 (CI) | -13.48 (-16.07, -10.89) | -16.84 (-19.46, -14.23) | -17.34* (-19.93, -14.74) | -17.22* (-20.64, -15.44) | -18.04* (-20.64, -15.44)
*P < .05 vs placebo, significant.
P < .01 vs placebo, significant.
P < .001 vs placebo, significant.
§ Scale = 1 (very good) to 5 (very poor).
∥ Scale = 0 mm (no pain) to 100 mm (most severe pain).
¶ Scale = 0 (no symptoms) to 8 (worse symptoms).
# Scale = 0 (no symptoms) to 96 (worse symptoms).
bid, twice daily; CI, 95% confidence interval; LSM, least square mean; PAAP, Patient’s Assessment of Arthritis Pain; PhGAA, Physician’s Global Assessment of Arthritis; qd, once daily; WOMAC OA, Western Ontario and McMaster’s Universities Osteoarthritis Index.

TABLE 3
Incidence of gastroduodenal, gastric, and duodenal ulcers (>5 mm) at final endoscopic evaluation

Ulcer type | Placebo (n = 178) | Valdecoxib 5 mg qd (n = 188) | Valdecoxib 10 mg qd (n = 174) | Valdecoxib 20 mg qd (n = 185) | Naproxen 500 mg bid (n = 183)
Gastroduodenal§ | 8 (4) [2.1, 9.0] | 6 (3) [1.3, 7.1] | 5 (3) [1.1, 6.9] | 10 (5) [2.8, 10.0] | 18 (10)* [6.1, 15.3]
  Gastric§ | 8 (4) [2.1, 9.0] | 4 (2) [0.7, 5.7] | 3 (2) [0.4, 5.4] | 9 (5) [2.4, 9.3] | 16 (9) [5.2, 14.1]
  Duodenal§ | 0 (0) [0.05, 2.6] | 2 (1) [0.2, 4.2] | 2 (1) [0.2, 4.5] | 1 (1) [0.0, 3.4] | 2 (1) [0.2, 4.3]
Symptomatic ulcers (n) | 0 | 1 | 2 | 3 | 7
*P < .05 vs placebo.
P < .05 vs naproxen.
P < .01 vs naproxen.
§ Data are presented as n (%) [95% confidence interval].
bid, twice daily; qd, once daily.
 

 

Safety

Valdecoxib and placebo had comparable upper gastrointestinal tract ulceration rates, whereas naproxen produced a significantly higher incidence of upper gastrointestinal tract ulcers than did 5 and 10 mg valdecoxib and placebo (P < .05). There were 14 adjudicated symptomatic ulcers during the study: 1 in the 5-mg valdecoxib group, 2 in the 10-mg valdecoxib group, 3 in the 20-mg valdecoxib group, and 7 in the 500-mg naproxen group.

Adverse events with an incidence of at least 5% in any treatment group and adverse events leading to withdrawal from the study are summarized by body system in Table 4. There were no significant differences in the incidence of adverse events between the valdecoxib and placebo groups. In contrast, 500 mg naproxen twice daily was associated with significantly more adverse events than 5 or 10 mg/day valdecoxib (P < .05). The incidence of adverse events was similar in the 20-mg valdecoxib and naproxen groups. Most adverse events were reported in the gastrointestinal system and consisted of abdominal pain, constipation, diarrhea, dyspepsia, flatulence, and nausea. The incidences of constipation, diarrhea, and flatulence were significantly higher in the naproxen group than in the 5-, 10-, and 20-mg valdecoxib groups, respectively. Other adverse events included accidental injury, headache, myalgia, and upper respiratory tract infections. Valdecoxib at 5 mg/day produced a significantly higher incidence of myalgia than did placebo, and valdecoxib at 20 mg/day produced a significantly lower incidence of upper respiratory tract infections than did placebo. Adverse events causing withdrawal with an incidence of at least 1% were accidental injury, abdominal pain, diarrhea, dyspepsia, nausea, abnormal hepatic function, rash, and blurred vision. The proportion of patients who withdrew from the study because of adverse events was significantly greater in the naproxen group (12.7%) than in the 5- and 20-mg valdecoxib groups (6.0% and 5.5%, respectively; P < .05), although the incidences of withdrawal due to adverse events in the 10-mg valdecoxib and naproxen groups were similar. In addition, gastrointestinal adverse events commonly related to NSAID treatment, such as dyspepsia and constipation, were more frequent in the naproxen group than in the valdecoxib and placebo groups.

TABLE 4
Adverse events

Adverse event | Placebo (n = 205) | Valdecoxib 5 mg qd (n = 201) | Valdecoxib 10 mg qd (n = 205) | Valdecoxib 20 mg qd (n = 201) | Naproxen 500 mg bid (n = 204)
Incidence ≥ 5% in any treatment group
  Total | 109 (53.2) | 112 (55.7) | 113 (55.1) | 121 (60.2) | 139 (68.1)*
  Accidental injury | 11 (5.4) | 3 (1.5) | 10 (4.9) | 12 (6.0) | 9 (4.4)
  Headache | 11 (5.4) | 12 (6.0) | 7 (3.4) | 14 (7.0) | 9 (4.4)
  Abdominal pain | 19 (9.3) | 14 (7.0) | 18 (8.8) | 13 (6.5) | 25 (12.3)
  Constipation | 6 (2.9) | 4 (2.0) | 1 (0.5) | 4 (2.0) | 12 (5.9)
  Diarrhea | 10 (4.9) | 7 (3.5) | 14 (6.8) | 11 (5.5) | 12 (5.9)
  Dyspepsia | 15 (7.3) | 22 (10.9) | 22 (10.7) | 20 (9.9) | 35 (17.2)*
  Flatulence | 12 (5.9) | 7 (3.5) | 5 (2.4) | 9 (4.5) | 14 (6.9)
  Nausea | 10 (4.9) | 18 (9.0) | 17 (8.3) | 9 (4.5) | 10 (4.9)
  Myalgia | 0 (0.0) | 13 (6.5)* | 3 (1.5) | 2 (1.0) | 1 (0.5)
  Upper respiratory tract infections | 18 (8.8) | 9 (4.5) | 10 (4.9) | 7 (3.5)* | 10 (4.9)
Incidence ≥ 1% in any treatment group causing withdrawal
  Total | 17 (8.3) | 12 (6.0) | 18 (8.8) | 11 (5.5) | 26 (12.7)
  Accidental injury | 2 (1.0) | 0 (0.0) | 0 (0.0) | 1 (0.5) | 1 (0.5)
  Abdominal pain | 5 (2.4) | 2 (1.0) | 6 (2.9) | 2 (1.0) | 7 (3.4)
  Diarrhea | 0 (0.0) | 0 (0.0) | 1 (0.5) | 1 (0.5) | 3 (1.5)
  Dyspepsia | 2 (1.0) | 2 (1.0) | 3 (1.5) | 1 (0.5) | 9 (4.4)*
  Nausea | 2 (1.0) | 1 (0.5) | 2 (1.0) | 1 (0.5) | 2 (1.0)
  Abnormal hepatic function | 0 (0.0) | 2 (1.0) | 0 (0.0) | 0 (0.0) | 0 (0.0)
  Rash | 0 (0.0) | 2 (1.0) | 1 (0.5) | 0 (0.0) | 0 (0.0)
  Blurred vision | 2 (1.0) | 0 (0.0) | 1 (0.5) | 0 (0.0) | 0 (0.0)
*P < .05 vs placebo.
P < .05 vs naproxen.
Data are presented as number (%) of patients reporting events.
bid, twice daily; qd, once daily.

FIGURE 2
Western Ontario and McMaster’s Universities Osteoarthritis Pain Index

Discussion

This study confirmed that the novel COX-2–specific inhibitor valdecoxib at a dosage of 10 or 20 mg/day is as effective as naproxen at a dosage of 500 mg twice daily in relieving moderate to severe osteoarthritis of the knee over 12 weeks. In addition, treatment with 10 mg/day valdecoxib orally, the recommended dosage for treatment of osteoarthritis, is associated with a significantly lower gastroduodenal ulceration rate than occurs with the conventional NSAID, naproxen.

Patients receiving 10 and 20 mg/day valdecoxib experienced significant improvements in the signs and symptoms of osteoarthritis, and in all assessments the efficacies of valdecoxib 10 and 20 mg/day were numerically similar to that of naproxen. This finding is consistent with the inhibition of prostaglandin production in inflamed synovial tissue and in the central pain pathway. Increased COX-2 activity in the spinal cord in response to tissue damage and in the synovial membrane of osteoarthritis patients is at least partly responsible for joint inflammation and sensitization to inflammatory pain.14–16 The efficacy of valdecoxib in treating moderate to severe osteoarthritis of the knee was consistent with reports of other COX-2–specific inhibitors that are comparable to conventional NSAIDs in relieving chronic pain and inflammation.17,18 These data confirmed that 10 mg/day valdecoxib is as effective as 500 mg naproxen twice daily in treating the pain and inflammation associated with osteoarthritis. The efficacy of 10 mg/day valdecoxib makes it one of the most potent COX-2–specific inhibitors for treating moderate to severe osteoarthritis.

 

 

Conventional NSAIDs were associated with a significant risk of serious gastrointestinal complications such as ulceration and perforation and low gastrointestinal tolerability.20–22 Naproxen treatment of osteoarthritis and rheumatoid arthritis demonstrated a higher rate of endoscopically proven gastrointestinal ulceration than did COX-2–specific inhibitors,17,23 and that finding was confirmed in this study for 10 mg valdecoxib. Naproxen treatment was associated with significantly more gastroduodenal ulcers than 5 or 10 mg valdecoxib. We found no significant difference between 20 mg valdecoxib and naproxen, which might be explained by a lower incidence of ulcers with naproxen than reported in previous studies.24 In terms of numbers needed to treat, 14 patients would be needed to observe a difference in endoscopic ulcer rates between valdecoxib (5 or 10 mg) and naproxen, compared with 20 patients to observe a difference between 20 mg valdecoxib and naproxen and 16 patients to observe a difference between naproxen and placebo.
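The numbers needed to treat quoted above can be approximated as the reciprocal of the absolute difference in ulcer rates. The sketch below is illustrative only: it assumes the rounded gastroduodenal ulcer percentages from Table 3 (naproxen 10%, valdecoxib 5 and 10 mg 3%, valdecoxib 20 mg 5%, placebo 4%), and the helper name `nnt` is hypothetical. The article does not state how its figures were derived.

```python
def nnt(higher_rate_pct, lower_rate_pct):
    """Number needed to treat = 100 / absolute risk difference (in percentage points)."""
    difference = higher_rate_pct - lower_rate_pct
    if difference <= 0:
        raise ValueError("the first rate must exceed the second")
    return 100 / difference

# Rounded gastroduodenal ulcer incidences (%) taken from Table 3.
NAPROXEN_PCT = 10
comparators = {
    "valdecoxib 5 or 10 mg": 3,
    "valdecoxib 20 mg": 5,
    "placebo": 4,
}

results = {label: nnt(NAPROXEN_PCT, pct) for label, pct in comparators.items()}
for label, value in results.items():
    print(f"naproxen vs {label}: NNT ~ {value:.1f}")
```

Under these assumptions the whole-number parts of the results (14, 20, and 16) agree with the values reported in the text. NNTs computed from the raw counts in Table 3 would differ slightly.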

Valdecoxib at a dosage of 10 mg/day also demonstrated overall improved gastrointestinal tolerability, with significantly fewer adverse events and withdrawals due to adverse events, in particular gastrointestinal-related events such as constipation and dyspepsia, than did naproxen. The improved upper gastrointestinal tract safety of valdecoxib was as expected because the COX-1–sparing nature of this agent allows effective inhibition of COX-2 without inhibiting COX-1 in the gastric mucosa and platelets. An improved gastrointestinal safety profile is an important consideration in the treatment of osteoarthritis because the moderate to severe gastrointestinal complications associated with conventional NSAID therapy frequently lead to poor patient compliance or discontinuation of the medication.25,26

Overall, this study suggests clinical benefits of single daily doses of 10 and 20 mg valdecoxib and improved upper gastrointestinal tract safety for the 10-mg dose, compared with 500 mg naproxen twice daily. No additional efficacy benefit was obtained from a 20-mg dose as opposed to a 10-mg dose. Valdecoxib (10 mg) is a potent and effective once-daily alternative to conventional NSAIDs, with a gastrointestinal safety advantage that will be of value to rheumatologists and primary care physicians alike.

FIGURE 3
Western Ontario and McMaster’s Universities Osteoarthritis Physical Function Index

References

1. Klippel J, et al. Primer on the Rheumatic Diseases. 12th ed. Atlanta, GA: Arthritis Foundation; 2001.

2. Felson DT. Epidemiology of hip and knee osteoarthritis. Epidemiol Rev 1988;10:1-28.

3. Borda IT, Koff R. NSAIDs: A Profile of Adverse Effects. Philadelphia: Hanley and Belfus; 1995.

4. Bensen WG, Fiechtner JJ, McMillen JI, et al. Treatment of osteoarthritis with celecoxib, a cyclooxygenase-2 inhibitor: a randomized controlled trial. Mayo Clin Proc 1999;74:1095-105.

5. Geis GS. Update on clinical developments with celecoxib, a new specific COX-2 inhibitor: what can we expect? J Rheumatol 1999;26(suppl 56):31-6.

6. Altman R, Asch E, Bloch G, et al. The American College of Rheumatology criteria for the classification and reporting of osteoarthritis of the knee. Arthritis Rheum 1986;29:1039-49.

7. Schumacher HR. Primer on the Rheumatic Diseases. Atlanta, GA: Arthritis Foundation; 1986.

8. Cooperating Clinics Committee of American Rheumatism Association. A seven day variability study of 499 patients with peripheral rheumatoid arthritis. Arthritis Rheum 1965;8:302-34.

9. Ward JR, Williams HJ, Boyce E, et al. Comparison of auranofin, gold sodium thiomalate, and placebo in the treatment of rheumatoid arthritis. Subsets of responses. Am J Med 1983;75:133-7.

10. Bellamy N. WOMAC Osteoarthritis Index: A User’s Guide. London, Ontario, Canada: The Western Ontario and McMaster Universities; 1995.

11. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 1988;75:800-2.

12. Miller R. Survival Analyses. New York: John Wiley & Sons; 1998.

13. Simon R, Lee YJ. Nonparametric confidence limits for survival probabilities and median survival time. Cancer Treat Rep 1982;66:37-42.

14. Amin AR, Attur M, Patel RN, et al. Superinduction of cyclooxygenase-2 activity in human osteoarthritis-affected cartilage. Influence of nitric oxide. J Clin Invest 1997;99:1231-7.

15. Hay C, de Belleroche J. Carrageenan-induced hyperalgesia is associated with increased cyclooxygenase-2 expression in spinal cord. Neuroreport 1997;8:1249-51.

16. Kang RY, Freire-Moar J, Sigal E, et al. Expression of cyclooxygenase-2 in human and an animal model of rheumatoid arthritis. Br J Rheumatol 1996;35:711-8.

17. Bensen WG, Zhao SZ, Burke TA, et al. Upper gastrointestinal tolerability of celecoxib, a COX-2 specific inhibitor, compared to naproxen and placebo. J Rheumatol 2000;27:1876-83.

18. Day R, Morrison B, Luza A, et al. A randomized trial of the efficacy and tolerability of the COX-2 inhibitor rofecoxib vs ibuprofen in patients with osteoarthritis. Rofecoxib/Ibuprofen Comparator Study Group. Arch Intern Med 2000;160:1781-7.

19. Fiechtner J, Sikes D, Recker D. A double-blind, placebo-controlled dose ranging study to evaluate the efficacy of valdecoxib, a novel COX-2 specific inhibitor, in treating the signs and symptoms of osteoarthritis of the knee. Paper presented at: European League Against Rheumatism (EULAR); May 13–16, 2001; Prague, Czech Republic.

20. Garcia Rodriguez LA, Jick H. Risk of upper gastrointestinal bleeding and perforation associated with individual nonsteroidal anti-inflammatory drugs. Lancet 1994;343:769-72.

21. Singh G, Ramey DR, Morfeld D, et al. Gastrointestinal tract complications of nonsteroidal anti-inflammatory drug treatment in rheumatoid arthritis. A prospective observational cohort study. Arch Intern Med 1996;156:1530-6.

22. Singh G, Rosen Ramey D. NSAID induced gastrointestinal complications: the ARAMIS perspective-1997. Arthritis, Rheumatism, and Aging Medical Information System. J Rheumatol 1998;51(suppl):8-16.

23. Watson DJ, Harper SE, Zhao PL, et al. Gastrointestinal tolerability of the selective cyclooxygenase-2 (COX-2) inhibitor rofecoxib compared with nonselective COX-1 and COX-2 inhibitors in osteoarthritis. Arch Intern Med 2000;160:2998-3003.

24. Simon LS, Weaver AL, Graham DY, et al. Anti-inflammatory and upper gastrointestinal effects of celecoxib in rheumatoid arthritis: a randomized controlled trial. JAMA 1999;282:1921-8.

25. Langman MJ, Jensen DM, Watson DJ, et al. Adverse upper gastrointestinal effects of rofecoxib compared with NSAIDs. JAMA 1999;282:1929-33.

26. Scholes D, Stergachis A, Penna P, Normand E, Hansten P. Nonsteroidal anti-inflammatory drug discontinuation in patients with osteoarthritis. J Rheumatol 1995;22:708-12.

Article PDF
Author and Disclosure Information

ALAN KIVITZ, MD
GLENN EISEN, MD
WILLIAM W. ZHAO, PHD
TERRY BEVIRT, BS, MT (ASCP)
DAVID P. RECKER, MD
Duncansville, Pennsylvania; Nashville, Tennessee; and Skokie, Illinois
From the Altoona Center for Clinical Research, Duncansville, PA (A.K.); the Department of Medicine, Vanderbilt University Medical Center, Nashville, TN (G.E.); and the Pharmacia Corporation, Skokie, IL (W.W.Z., T.B., D.R.). This work was sponsored by Pharmacia Corporation and Pfizer, Inc. Terry Bevirt, David P. Recker, and Kenneth M. Verburg are employees of Pharmacia Corporation and have stock interest within the company. Alan Kivitz has acted in capacity of consultant for Pharmacia Corporation. Address reprint requests to David P. Recker, MD, Clinical Research and Development, Pharmacia Corporation, 5200 Old Orchard Road, Skokie, IL 60077. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(06)
Publications
Page Number
530-537
Legacy Keywords
Cyclooxygenase-2–specific inhibitors; osteoarthritis; nonsteroidal anti-inflammatory drug; prostaglandin-endoperoxide synthase. (J Fam Pract 2002; 51:530–537)

ABSTRACT

OBJECTIVE: We compared the efficacy and upper gastrointestinal safety of the cyclooxygenase-2–specific inhibitor valdecoxib with naproxen and placebo in treating moderate to severe osteoarthritis of the knee.

STUDY DESIGN: This multicenter, randomized, double-blind, placebo-controlled study compared the efficacy and upper gastrointestinal tract safety of valdecoxib at dosages of 5, 10, and 20 mg once daily with placebo and naproxen at the dosage of 500 mg twice daily.

POPULATION: We included patients who had been diagnosed with moderate to severe osteoarthritis of the knee according to the modified criteria of the American College of Rheumatology.

OUTCOMES MEASURED: The Patient’s and Physician’s Global Assessment of Arthritis (PaGAA, PhGAA), Patient’s Assessment of Arthritis Pain–Visual Analog Scale (PAAP-VAS), and Western Ontario and McMaster’s Universities (WOMAC) Osteoarthritis indices were assessed at baseline and at weeks 2, 6, and 12. Upper gastrointestinal ulceration was assessed by pre- and posttreatment endoscopies.

RESULTS: Valdecoxib 10 and 20 mg once daily (but not 5 mg once daily) demonstrated similar efficacy to naproxen at 500 mg twice daily, and all 3 dosages were superior to placebo for the PaGAA, PhGAA, PAAP-VAS, and WOMAC Osteoarthritis indices at most assessments throughout the 12-week study (P < .05). The incidence of endoscopically proven ulcers was significantly higher in the naproxen group than in the 5- and 10-mg valdecoxib groups, but not in the 20-mg valdecoxib group. All 3 valdecoxib doses were comparable to placebo in ulcer incidence.

CONCLUSIONS: Valdecoxib (10 and 20 mg once daily) is significantly superior to placebo and as effective as naproxen (500 mg twice daily) in improving moderate to severe osteoarthritis of the knee. Upper gastrointestinal tract safety of valdecoxib (5 and 10 mg) was comparable to that of placebo and significantly better than that of naproxen.

KEY POINTS FOR CLINICIANS

  • The cyclooxygenase-2–specific inhibitor valdecoxib 10 or 20 mg once daily is as effective as naproxen 500 mg twice daily.
  • Valdecoxib at the recommended dose for treatment of osteoarthritis (10 mg once daily) had better upper gastrointestinal safety than naproxen.

Current medical therapies for osteoarthritis include conventional nonsteroidal anti-inflammatory drugs (NSAIDs), acetaminophen, glucosamine sulfate, and intra-articular injections of corticosteroids and hyaluronic acid. However, long-term use of corticosteroid injections can exacerbate damage to the affected joints.1,2 Conventional NSAIDs are associated with upper gastrointestinal tract ulceration and inhibition of platelet function.3

Cyclooxygenase-2 (COX-2)–specific inhibitors have demonstrated equivalent efficacy to conventional NSAIDs in treating pain and inflammation associated with osteoarthritis and rheumatoid arthritis. Further, COX-2–specific inhibitors significantly reduce the incidence of gastrointestinal ulceration and bleeding side effects caused by conventional NSAIDs.4,5 Valdecoxib (Bextra; Pharmacia Corporation and Pfizer, Inc.) is a novel COX-2–specific inhibitor that is approximately 28,000-fold more selective for COX-2 than for COX-1. As a potent COX-2–specific inhibitor, valdecoxib is expected to provide efficacy equivalent to that of conventional NSAIDs for treatment of arthritis while sparing patients the COX-1–related side effects. This randomized, placebo-controlled, double-blind, 12-week study was designed to test this hypothesis by comparing the efficacy and upper gastrointestinal tract safety of valdecoxib with that of naproxen, a leading conventional NSAID comparator.

Methods

Study population

Ambulatory adults who had been diagnosed with moderate to severe osteoarthritis of the knee according to the modified criteria of the American College of Rheumatology6,7 were eligible to participate in the trial. Patients were recruited from primary care and rheumatology specialty settings. Patients who had baseline scores of at least 40 mm on the Patient’s Assessment of Arthritis Pain–Visual Analog Scale (PAAP-VAS) and baseline categorical scores of poor to very poor on the Patient’s (PaGAA) and Physician’s (PhGAA) Global Assessments of Arthritis were included.8,9 Any patient suffering from inflammatory arthritis, gout, pseudogout, Paget disease, or any chronic pain syndrome that might interfere with assessment of the Index Knee was excluded from the trial. Patients diagnosed with osteoarthritis of the hip ipsilateral to the Index Knee, severe anserine bursitis, acute joint trauma, or complete loss of articular cartilage on the Index Knee also were excluded. Patients were not eligible if they had active gastrointestinal disease, gastrointestinal tract ulceration 30 days before the trial, a significant bleeding disorder, or a history of gastric or duodenal surgery. Patients with an esophageal, gastric, pyloric channel, or duodenal ulcer or a score of at least 10 for esophageal, gastric, or duodenal erosions at the pretreatment endoscopy examination also were excluded.

FIGURE 1
Patient’s global assessment of arthritis

Study design

This multicenter, randomized, double-blind, placebo-controlled study compared the efficacy and upper gastrointestinal tract safety of valdecoxib at dosages of 5, 10, and 20 mg once daily with placebo and naproxen at a dosage of 500 mg twice daily in relieving moderate to severe osteoarthritis of the knee. The trial was conducted in 85 centers in the United States and Canada, in accordance with the principles of good clinical practice and the Declaration of Helsinki. Eligible patients were randomized to treatment groups and self-administered oral study medication. Patients were randomized to study treatment in the order in which they were enrolled into the study by using a treatment sequence that was determined by a Searle-prepared computer-generated randomization schedule. Patients received their allocated study medications in bottles labeled A and B according to the randomization schedule. Personnel at the study centers carried out the assessments and remained blinded throughout the study. Eligible patients were enrolled and discontinued regular pain medication. Patients discontinued their normal medications at the following specified times before the baseline endoscopy: NSAIDs (including full-dose aspirin at a dosage of ≥325 mg/day) at 48 hours, corticosteroid injections at 4 weeks, and intra-articular injections of corticosteroid or hyaluronic acid preparations at 3 and 6 months, respectively. The use of antiulcer drugs, including H2 blockers, proton pump inhibitors, misoprostol, and sucralfate, was discontinued at least 24 hours before the baseline endoscopy.

 

 

Efficacy assessments

The following arthritis assessments were made at baseline and at 2, 6, and 12 weeks after study drug administration, or at early termination. The PaGAA and PhGAA were each measured on a 5-point categorical scale, where 1 = very good, 2 = good, 3 = fair, 4 = poor, and 5 = very poor. The PAAP-VAS was measured on a scale of 0 to 100 mm, where 0 = no pain and 100 = most severe pain. The Western Ontario and McMaster’s Universities (WOMAC) Osteoarthritis indices, including Pain, Stiffness, Physical Function, and Composite, were measured as described previously.10

Upper gastrointestinal assessments

Upper gastrointestinal tract endoscopy was performed within 7 days before the first study dose and at the 12-week assessment or at early termination if the patient withdrew. An endoscopy could be performed at any time if the patient experienced symptoms suggestive of an ulcer. The endoscopists performing baseline and 12-week (early termination) assessments remained blinded throughout the study.

General safety assessments

Clinical laboratory tests were performed at screening, baseline, weeks 2, 6, and 12, or at early termination, and a complete physical examination was performed at screening and final visits. The incidence of adverse events occurring in each treatment arm was monitored throughout the study. Adverse events occurring within 7 days and serious adverse events occurring within 30 days of the last study dosage of medication were included in the safety analyses.

Statistical analyses

A sample size of 200 patients per treatment group was deemed sufficient to detect a difference in ulcer rates of 5% for valdecoxib vs 16% for naproxen, with 80% power and type 1 error at .017 (adjusted for 3 primary comparisons against placebo). Homogeneity of treatment groups at baseline with respect to age, height, weight, duration of osteoarthritis, PAAP-VAS, and WOMAC Osteoarthritis Index scores was assessed with 2-way analysis of variance, with treatment group and center as factors. All other demographics and baseline characteristics were compared with the Cochran-Mantel-Haenszel (CMH) test, stratified by center.
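As a rough cross-check of the sample size statement, the textbook normal-approximation formula for comparing two independent proportions can be sketched as follows. This is an illustrative calculation only. The article does not say which formula or software the investigators used, and the function name `n_per_group` and the choice of a two-sided test are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha, power):
    """Approximate patients per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type 1 error
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Ulcer rates of 5% (valdecoxib) vs 16% (naproxen), 80% power, type 1 error .017.
n = n_per_group(0.05, 0.16, alpha=0.017, power=0.80)
print(n)
```

Under these assumptions the formula yields a per-group requirement below the 200 patients planned, which leaves some margin for patients without a posttreatment endoscopy.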

All efficacy assessments were performed on the modified intent-to-treat (ITT) cohort by using the last observation carried forward approach. The ITT cohort comprised all patients who were randomized and had taken at least 1 dose of study medication. Analyses of mean change from baseline for PaGAA, PhGAA, PAAP-VAS, and WOMAC Osteoarthritis indices were performed by using analysis of covariance, with treatment and center as factors and the corresponding baseline score as the covariate. Pairwise comparisons of valdecoxib at dosages of 10 and 20 mg once daily vs placebo were interpreted with the Hochberg procedure.11 Primary pairwise comparisons were amended in the statistical analysis plan before data unblinding to compare placebo with 10 and 20 mg valdecoxib, but not with the 5-mg dose. For all other comparisons, including 5 mg valdecoxib and naproxen vs placebo, differences were considered significant if the pairwise P values were less than .05. The incidence of withdrawal due to treatment failure was analyzed by the Fisher exact test, and the time to withdrawal in each treatment group was analyzed by log-rank test and plotted with the Kaplan-Meier product limit.12,13
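For readers unfamiliar with the Hochberg procedure, a minimal sketch of the step-up rule is given below. It assumes a familywise type 1 error of .05 and uses hypothetical P values; the function name `hochberg` is illustrative and the code is not the study's analysis program.

```python
def hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    ascending = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Step up from the largest P value: the k-th largest is compared with alpha / k.
    for rank in range(m, 0, -1):
        index = ascending[rank - 1]
        if p_values[index] <= alpha / (m - rank + 1):
            # This hypothesis and all those with smaller P values are rejected.
            for j in ascending[:rank]:
                reject[j] = True
            break
    return reject

# Hypothetical P values for 10 mg and 20 mg valdecoxib vs placebo.
print(hochberg([0.04, 0.03]))  # both <= .05, so both comparisons are significant
print(hochberg([0.06, 0.02]))  # only the smaller P value passes (.02 <= .025)
```

With two primary comparisons, both are declared significant when the larger P value is at most .05; otherwise the smaller P value must be at most .025.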

Upper gastrointestinal tract endoscopic analyses were performed on the upper gastrointestinal tract ITT population. Randomized patients were included in this cohort if they received at least 1 dose of study medication and had undergone pretreatment and posttreatment endoscopies. Overall and pairwise comparisons of gastroduodenal, gastric, and duodenal ulcers and erosions were assessed with the CMH test stratified by center. The incidence of adverse events was compared between treatment groups with the Fisher exact test. Changes in vital signs were compared between treatment groups with an analysis of covariance using pairwise treatment comparisons, with treatment group as a factor and baseline value as a covariate.

Results

Patient baseline characteristics

Of the 1019 eligible randomized patients, 3 (1 randomized to 10 mg/day valdecoxib, 1 to 20 mg/day valdecoxib, and 1 to 500 mg naproxen twice daily) did not take the study medication and were excluded from efficacy and safety analyses. The remaining 1016 randomized patients received study medication and were included in the ITT cohort on which analyses of all efficacy end points were based. A total of 269 patients withdrew before the end of the study due to treatment failure, preexisting protocol violations, noncompliance, or adverse signs and symptoms, or were lost to follow-up: 74 patients in the placebo group, 39 in the 5-mg valdecoxib group, 56 in the 10-mg valdecoxib group, 44 in the 20-mg valdecoxib group, and 56 in the naproxen group. The upper gastrointestinal tract ITT cohort comprised 908 patients who were included in the upper gastrointestinal tract safety analyses. More than 90% of patients included in the study evaluated their osteoarthritis as poor to very poor as assessed by baseline PaGAA scores. Treatment groups were homogeneous with respect to demographics, vital signs, medical history, and all baseline arthritis assessments (Table 1).

 

 

TABLE 1
Patient baseline characteristics

Characteristic | Placebo (n = 205) | Valdecoxib 5 mg qd (n = 201) | Valdecoxib 10 mg qd (n = 206) | Valdecoxib 20 mg qd (n = 202) | Naproxen 500 mg bid (n = 205)
Mean (SD) age, y | 60.3 (10.5) | 58.7 (11.9) | 59.8 (11.0) | 59.6 (10.4) | 60.4 (10.7)
Mean (SD) weight, kg | 87.5 (21.2) | 91.4 (22.6) | 89.3 (21.4) | 92.6 (23.7) | 88.1 (21.7)
Race, n (%)
  White | 162 (79) | 155 (77) | 154 (75) | 160 (79) | 163 (80)
  Black | 21 (10) | 26 (13) | 24 (12) | 24 (12) | 23 (11)
  Asian | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 2 (1)
  Hispanic | 19 (9) | 18 (9) | 25 (12) | 15 (7) | 15 (7)
Male sex, n (%) | 73 (36) | 73 (36) | 72 (35) | 66 (33) | 76 (37)
Mean (SD) disease duration, y | 8.3 (8.0) | 9.8 (9.5) | 8.7 (8.0) | 9.2 (8.0) | 9.4 (8.7)
History of GI bleeding, n (%) | 2 (1) | 0 (0) | 3 (1) | 2 (1) | 3 (1)
History of gastroduodenal ulcer, n (%) | 20 (10) | 21 (10) | 24 (12) | 28 (14) | 31 (15)
PaGAA, n (%)
  Poor | 168 (82) | 175 (87) | 168 (82) | 162 (80) | 169 (82)
  Very poor | 33 (16) | 23 (11) | 32 (16) | 36 (18) | 31 (15)
PhGAA, n (%)
  Poor | 179 (87) | 181 (90) | 176 (85) | 173 (86) | 175 (85)
  Very poor | 24 (12) | 18 (9) | 25 (12) | 24 (12) | 25 (12)
No significant differences were observed between treatment groups at any baseline characteristic.
bid, twice daily; GI, gastrointestinal; PaGAA, Patient’s Global Assessment of Arthritis; PhGAA, Physician’s Global Assessment of Arthritis; qd, once daily.

Efficacy

The least square mean change in the PaGAA was significantly improved at most assessments in response to valdecoxib (10 and 20 mg/day) and 500 mg naproxen twice daily compared with placebo (Table 2). However, the improvement in response to valdecoxib 5 mg qd did not reach statistical significance (Table 2). Significant improvements in the PhGAA were observed in response to valdecoxib and naproxen at all assessments (Table 2).

The dosages of 20 mg/day valdecoxib and 500 mg naproxen twice daily were associated with a reduction in pain, as assessed by the PAAP-VAS scores. Pain reduction associated with 5 and 10 mg/day valdecoxib was significantly better than that with placebo at all assessments except for week 12 (Table 2).

Valdecoxib and naproxen treatments improved the WOMAC Pain, Stiffness, Physical Function, and Composite indices compared with placebo at 2, 6, and 12 weeks. Valdecoxib 20 mg/day and naproxen 500 mg twice daily produced statistically significant changes in all WOMAC Osteoarthritis scores throughout the 12-week study period compared with placebo (P < .05). WOMAC Pain scores for 10 mg valdecoxib were significantly different from those for placebo at 2 weeks (P < .001) but not at 6 or 12 weeks. No significant differences were noted between any of the valdecoxib treatment doses and naproxen in terms of improvement in WOMAC indices.

The incidences of withdrawal due to treatment failure were 20% (95% confidence interval [CI], 15.3–26.8) in the placebo group; 8% (95% CI, 4.8–12.8), 12% (95% CI, 7.8–17.1), and 10% (95% CI, 6.3–15.2) in the 5-, 10-, and 20-mg/day valdecoxib groups; and 6% (95% CI, 3.6–10.9) in the 500-mg naproxen group (P < .05; Table 3). Patients in the placebo group withdrew at a significantly faster rate than those in the 4 active treatment groups (P < .05), but there were no significant differences in withdrawal rates across the 4 active treatment groups.

TABLE 2
Baseline arthritis assessments and mean changes from baseline scores

Assessment | Placebo (n = 205) | Valdecoxib 5 mg qd (n = 201) | Valdecoxib 10 mg qd (n = 205) | Valdecoxib 20 mg qd (n = 201) | Naproxen 500 mg bid (n = 204)
PhGAA§
  Baseline mean | 4.10 | 4.07 | 4.09 | 4.09 | 4.10
  LSM change, week 2 (CI) | -1.04 (-1.16, -0.91) | -1.31(-1.44, -1.19) | -1.37(-1.50, -1.25) | -1.42(-1.54, -1.29) | -1.35(-1.48, -1.23)
  LSM change, week 6 (CI) | -1.22 (-1.35, -1.08) | -1.44*(-1.58, -1.31) | -1.50(-1.63, -1.36) | -1.41* (-1.55, -1.28) | -1.45* (-1.59, -1.32)
  LSM change, week 12 (CI) | -1.22 (-1.36, -1.08) | -1.43* (-1.58, -1.28) | -1.52(-1.67, -1.38) | -1.45* (-1.60, -1.31) | -1.43* (-1.58, -1.29)
PAAP
  Baseline mean | 71.20 | 71.42 | 72.41 | 72.54 | 72.36
  LSM change, week 2 (CI) | -21.19 (-24.80, -17.58) | -28.46(-32.11, -24.82) | -30.21(-33.83, -26.59) | -32.07(-35.73, -28.41) | -31.03(-34.66, -27.40)
  LSM change, week 6 (CI) | -23.92 (-27.72, -20.12) | -30.81(-34.65, -26.97) | -29.85* (-33.67, -26.04) | -32.28(-36.13, -28.42) | -31.84(-35.66, -28.02)
  LSM change, week 12 (CI) | -25.97 (-30.02, -21.92) | -31.33 (-35.42, -27.24) | -30.41 (-34.47, -30.41) | -32.70* (-36.81, -32.70) | -31.83* (-35.90, -27.76)
WOMAC OA, Stiffness
  Baseline mean | 4.84 | 4.87 | 4.91 | 4.73 | 4.94
  LSM change, week 2 (CI) | -0.78 (-0.98, -0.57) | -1.03 (-1.24, -0.82) | -1.20(-1.41, -0.99) | -1.24(-1.45, -1.03) | -1.28(-1.49, -1.08)
  LSM change, week 6 (CI) | -1.04 (-1.27, -0.82) | -1.25 (-1.48, -1.02) | -1.42* (-1.65, -1.20) | -1.43* (-1.66, -1.20) | -1.40(-1.62, -1.17)
  LSM change, week 12 (CI) | -1.12 (-1.36, -0.89) | -1.33 (-1.57, -1.09) | -1.41 (-1.65, -1.17) | -1.46* (-1.70, -1.22) | -1.54* (-1.78, -1.30)
WOMAC OA, Composite #
  Baseline mean | 53.49 | 53.03 | 54.73 | 53.42 | 53.67
  LSM change, week 2 (CI) | -10.13 (-12.28, -7.99) | -13.26* (-15.42, -11.09) | -15.05(-17.20, -12.90) | -15.44(-17.63, -13.32) | -15.47(-17.63, -13.32)
  LSM change, week 6 (CI) | -12.98 (-15.45, -10.51) | -15.47 (-17.97, -12.98) | -16.74* (-19.22, -14.26) | -17.33* (-19.48, -14.51) | -16.99* (-19.48, -14.51)
  LSM change, week 12 (CI) | -13.48 (-16.07, -10.89) | -16.84 (-19.46, -14.23) | -17.34* (-19.93, -14.74) | -17.22* (-20.64, -15.44) | -18.04* (-20.64, -15.44)

*P < .05 vs placebo, significant.
P < .01 vs placebo, significant.
P < .001 vs placebo, significant.
§ Scale = 1 (very good) to 5 (very poor).
Scale = 0 mm (no pain) to 100 mm (most severe pain).
Scale = 0 (no symptoms) to 8 (worse symptoms).
# Scale = 0 (no symptoms) to 96 (worse symptoms).
bid, twice daily; CI, 95% confidence interval; LSM, least square mean; PAAP, Patient’s Assessment of Arthritis Pain; PhGAA, Physician’s Global Assessment of Arthritis; qd, once daily; WOMAC OA, Western Ontario and McMaster’s Universities Osteoarthritis Index.

TABLE 3
Incidence of gastroduodenal, gastric, and duodenal ulcers (>5 mm) at final endoscopic evaluation

Ulcer type | Placebo (n = 178) | Valdecoxib 5 mg qd (n = 188) | Valdecoxib 10 mg qd (n = 174) | Valdecoxib 20 mg qd (n = 185) | Naproxen 500 mg bid (n = 183)
Gastroduodenal§ | 8 (4) [2.1, 9.0] | 6 (3) [1.3, 7.1] | 5 (3) [1.1, 6.9] | 10 (5) [2.8, 10.0] | 18 (10)* [6.1, 15.3]
  Gastric§ | 8 (4) [2.1, 9.0] | 4 (2) [0.7, 5.7] | 3 (2) [0.4, 5.4] | 9 (5) [2.4, 9.3] | 16 (9) [5.2, 14.1]
  Duodenal§ | 0 (0) [0.05, 2.6] | 2 (1) [0.2, 4.2] | 2 (1) [0.2, 4.5] | 1 (1) [0.0, 3.4] | 2 (1) [0.2, 4.3]
Symptomatic ulcers (n) | 0 | 1 | 2 | 3 | 7

*P < .05 vs placebo.
P < .05 vs naproxen.
P < .01 vs naproxen.
§ Data are presented as n (%) [95% confidence interval].
bid, twice daily; qd, once daily.
 

 

Safety

Valdecoxib and placebo had comparable upper gastrointestinal tract ulceration rates, whereas naproxen produced a significantly higher incidence of upper gastrointestinal tract ulcers than did 5 and 10 mg valdecoxib and placebo (P < .05). There were 14 adjudicated symptomatic ulcers during the study: 1 in the 5-mg valdecoxib group, 2 in the 10-mg valdecoxib group, 3 in the 20-mg valdecoxib group, and 7 in the 500-mg naproxen group.

Adverse events with an incidence of at least 5% in any treatment group and adverse events leading to withdrawal from the study are summarized by body system in Table 4. There were no significant differences in the incidence of adverse events between the valdecoxib and placebo groups. In contrast, 500 mg naproxen twice daily was associated with significantly more adverse events than 5 or 10 mg/day valdecoxib (P < .05). The incidence of adverse events was similar in the 20-mg valdecoxib and naproxen groups. Most adverse events were reported in the gastrointestinal system and consisted of abdominal pain, constipation, diarrhea, dyspepsia, flatulence, and nausea. The incidences of constipation, diarrhea, and flatulence were significantly higher in the naproxen group than in the 5-, 10-, and 20-mg valdecoxib groups, respectively. Other adverse events included accidental injury, headache, myalgia, and upper respiratory tract infections. Valdecoxib at 5 mg/day produced a significantly higher incidence of myalgia than did placebo, and valdecoxib at 20 mg/day produced a significantly lower incidence of upper respiratory tract infections than did placebo. Adverse events causing withdrawal with an incidence of at least 1% were accidental injury, abdominal pain, diarrhea, dyspepsia, nausea, abnormal hepatic function, rash, and blurred vision. The proportion of patients who withdrew from the study because of adverse events was significantly greater in the naproxen group (12.7%) than in the 5- and 20-mg valdecoxib groups (6.0% and 5.5%; P < .05), although the incidences of withdrawal due to adverse events in the 10-mg valdecoxib and naproxen groups were similar. In addition, gastrointestinal adverse events commonly related to NSAID treatment, such as dyspepsia and constipation, were more frequent in the naproxen group than in the valdecoxib and placebo groups.

TABLE 4
Adverse events

Adverse event | Placebo (n = 205) | Valdecoxib 5 mg qd (n = 201) | Valdecoxib 10 mg qd (n = 205) | Valdecoxib 20 mg qd (n = 201) | Naproxen 500 mg bid (n = 204)
Incidence ≥ 5% in any treatment group
  Total | 109 (53.2) | 112 (55.7) | 113 (55.1) | 121 (60.2) | 139 (68.1)*
  Accidental injury | 11 (5.4) | 3 (1.5) | 10 (4.9) | 12 (6.0) | 9 (4.4)
  Headache | 11 (5.4) | 12 (6.0) | 7 (3.4) | 14 (7.0) | 9 (4.4)
  Abdominal pain | 19 (9.3) | 14 (7.0) | 18 (8.8) | 13 (6.5) | 25 (12.3)
  Constipation | 6 (2.9) | 4 (2.0) | 1 (0.5) | 4 (2.0) | 12 (5.9)
  Diarrhea | 10 (4.9) | 7 (3.5) | 14 (6.8) | 11 (5.5) | 12 (5.9)
  Dyspepsia | 15 (7.3) | 22 (10.9) | 22 (10.7) | 20 (9.9) | 35 (17.2)*
  Flatulence | 12 (5.9) | 7 (3.5) | 5 (2.4) | 9 (4.5) | 14 (6.9)
  Nausea | 10 (4.9) | 18 (9.0) | 17 (8.3) | 9 (4.5) | 10 (4.9)
  Myalgia | 0 (0.0) | 13 (6.5)* | 3 (1.5) | 2 (1.0) | 1 (0.5)
  Upper respiratory tract infections | 18 (8.8) | 9 (4.5) | 10 (4.9) | 7 (3.5)* | 10 (4.9)
Incidence ≥ 1% in any treatment group causing withdrawal
  Total | 17 (8.3) | 12 (6.0) | 18 (8.8) | 11 (5.5) | 26 (12.7)
  Accidental injury | 2 (1.0) | 0 (0.0) | 0 (0.0) | 1 (0.5) | 1 (0.5)
  Abdominal pain | 5 (2.4) | 2 (1.0) | 6 (2.9) | 2 (1.0) | 7 (3.4)
  Diarrhea | 0 (0.0) | 0 (0.0) | 1 (0.5) | 1 (0.5) | 3 (1.5)
  Dyspepsia | 2 (1.0) | 2 (1.0) | 3 (1.5) | 1 (0.5) | 9 (4.4)*
  Nausea | 2 (1.0) | 1 (0.5) | 2 (1.0) | 1 (0.5) | 2 (1.0)
  Abnormal hepatic function | 0 (0.0) | 2 (1.0) | 0 (0.0) | 0 (0.0) | 0 (0.0)
  Rash | 0 (0.0) | 2 (1.0) | 1 (0.5) | 0 (0.0) | 0 (0.0)
  Blurred vision | 2 (1.0) | 0 (0.0) | 1 (0.5) | 0 (0.0) | 0 (0.0)

*P < .05 vs placebo.
P < .05 vs naproxen.
Data are presented as number (%) of patients reporting events.
bid, twice daily; qd, once daily.

FIGURE 2
Western Ontario and McMaster’s Universities Osteoarthritis Pain Index

Discussion

This study confirmed that the novel COX-2–specific inhibitor valdecoxib at a dosage of 10 or 20 mg/day is as effective as naproxen at a dosage of 500 mg twice daily in relieving moderate to severe osteoarthritis of the knee over 12 weeks. In addition, treatment with 10 mg/day valdecoxib orally, the recommended dosage for treatment of osteoarthritis, is associated with a significantly lower gastroduodenal ulceration rate than occurs with the conventional NSAID, naproxen.

Patients receiving 10 and 20 mg/day valdecoxib experienced significant improvements in the signs and symptoms of osteoarthritis, and in all assessments the efficacies of valdecoxib 10 and 20 mg/day were numerically similar to that of naproxen. This finding is consistent with the inhibition of prostaglandin production in inflamed synovial tissue and in the central pain pathway. Increased COX-2 activity in the spinal cord in response to tissue damage and in the synovial membrane of osteoarthritis patients is at least partly responsible for joint inflammation and sensitization to inflammatory pain.14–16 The efficacy of valdecoxib in treating moderate to severe osteoarthritis of the knee was consistent with reports of other COX-2–specific inhibitors that are comparable to conventional NSAIDs in relieving chronic pain and inflammation.17,18 These data confirmed that 10 mg/day valdecoxib is as effective as 500 mg naproxen twice daily in treating the pain and inflammation associated with osteoarthritis. The efficacy of 10 mg/day valdecoxib makes it one of the most potent COX-2–specific inhibitors for treating moderate to severe osteoarthritis.

 

 

Conventional NSAIDs were associated with a significant risk of serious gastrointestinal complications such as ulceration and perforation and with low gastrointestinal tolerability.20–22 Naproxen treatment of osteoarthritis and rheumatoid arthritis demonstrated a higher rate of endoscopically proven gastrointestinal ulceration than did COX-2–specific inhibitors,17,23 and that finding was confirmed in this study for 10 mg valdecoxib. Naproxen treatment was associated with significantly more gastroduodenal ulcers than 5 or 10 mg valdecoxib. We found no significant difference between 20 mg valdecoxib and naproxen, which might be explained by a lower incidence of ulcers with naproxen than reported in previous studies.24 In terms of numbers needed to treat, 14 patients would need to receive valdecoxib (5 or 10 mg) instead of naproxen to avoid 1 endoscopic ulcer, compared with 20 patients for 20 mg valdecoxib vs naproxen and 16 for placebo vs naproxen.
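The numbers needed to treat quoted above can be approximated from Table 3. The sketch below assumes the conventional definition (the reciprocal of the absolute risk reduction) and uses the rounded gastroduodenal ulcer percentages from the table; the article does not state how its figures were derived, so the function name and the choice of rounded inputs are illustrative assumptions.

```python
def number_needed_to_treat(comparator_rate: float, treatment_rate: float) -> float:
    """Return 1 / absolute risk reduction for two event proportions."""
    arr = comparator_rate - treatment_rate
    if arr <= 0:
        raise ValueError("treatment rate must be lower than comparator rate")
    return 1.0 / arr


# Rounded gastroduodenal ulcer incidences from Table 3, as proportions.
NAPROXEN_RATE = 0.10
other_groups = {
    "valdecoxib 5 or 10 mg": 0.03,
    "valdecoxib 20 mg": 0.05,
    "placebo": 0.04,
}

# NNT of each group relative to naproxen (assumed comparator).
nnt_by_group = {
    name: number_needed_to_treat(NAPROXEN_RATE, rate)
    for name, rate in other_groups.items()
}

for name, nnt in nnt_by_group.items():
    print(f"{name} vs naproxen: NNT = {nnt:.1f}")
```

Under these assumptions the results are about 14.3, 20.0, and 16.7, which agree with the reported 14, 20, and 16 if the values are truncated to whole patients. Using the unrounded counts from Table 3 would give slightly different values.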

Valdecoxib at a dosage of 10 mg/day also demonstrated overall improved gastrointestinal tolerability, with significantly fewer adverse events and withdrawals due to adverse events, in particular gastrointestinal-related events such as constipation and dyspepsia, than did naproxen. The improved upper gastrointestinal tract safety of valdecoxib was as expected because the COX-1–sparing nature of this agent allows effective inhibition of COX-2 without inhibiting COX-1 in the gastric mucosa and platelets. An improved gastrointestinal safety profile is an important consideration in the treatment of osteoarthritis because the moderate to severe gastrointestinal complications associated with conventional NSAID therapy frequently lead to poor patient compliance or discontinuation of the medication.25,26

Overall, this study suggests clinical benefits of single daily doses of 10 and 20 mg valdecoxib and improved upper gastrointestinal tract safety for the 10-mg dose, compared with 500 mg naproxen twice daily. No additional efficacy benefit was obtained from a 20-mg dose as opposed to a 10-mg dose. Valdecoxib (10 mg) is a potent and effective once-daily alternative to conventional NSAIDs, with a gastrointestinal safety advantage that will be of value to rheumatologists and primary care physicians alike.

FIGURE 3
Western Ontario and McMaster’s Universities Osteoarthritis Physical Function Index

ABSTRACT

OBJECTIVE: We compared the efficacy and upper gastrointestinal safety of the cyclooxygenase-2–specific inhibitor valdecoxib with naproxen and placebo in treating moderate to severe osteoarthritis of the knee.

STUDY DESIGN: This multicenter, randomized, double-blind, placebo-controlled study compared the efficacy and upper gastrointestinal tract safety of valdecoxib at dosages of 5, 10, and 20 mg once daily with placebo and naproxen at the dosage of 500 mg twice daily.

POPULATION: We included patients who had been diagnosed with moderate to severe osteoarthritis of the knee according to the modified criteria of the American College of Rheumatology.

OUTCOMES MEASURED: The Patient’s and Physician’s Global Assessment of Arthritis (PaGAA, PhGAA), Patient’s Assessment of Arthritis Pain–Visual Analog Scale (PAAP-VAS), and Western Ontario and McMaster’s Universities (WOMAC) Osteoarthritis indices were assessed at baseline and at weeks 2, 6, and 12. Upper gastrointestinal ulceration was assessed by pre- and posttreatment endoscopies.

RESULTS: Valdecoxib 10 and 20 mg once daily (but not 5 mg once daily) demonstrated similar efficacy to naproxen at 500 mg twice daily, and all 3 dosages were superior to placebo for the PaGAA, PhGAA, PAAP-VAS, and WOMAC Osteoarthritis indices at most assessments throughout the 12-week study (P < .05). The incidence of endoscopically proven ulcers was significantly higher in the naproxen group than in the 5- and 10-mg valdecoxib groups, but not in the 20-mg valdecoxib group. All 3 valdecoxib doses were comparable to placebo in ulcer incidence.

CONCLUSIONS: Valdecoxib (10 and 20 mg once daily) is significantly superior to placebo and as effective as naproxen (500 mg twice daily) in improving moderate to severe osteoarthritis of the knee. Upper gastrointestinal tract safety of valdecoxib (5 and 10 mg) was comparable to that of placebo and significantly better than that of naproxen.

KEY POINTS FOR CLINICIANS

  • The cyclooxygenase-2–specific inhibitor valdecoxib 10 or 20 mg once daily is as effective as naproxen 500 mg twice daily.
  • Valdecoxib at the recommended dose for treatment of osteoarthritis (10 mg once daily) had better upper gastrointestinal safety than naproxen.

Current medical therapies for osteoarthritis include conventional nonsteroidal anti-inflammatory drugs (NSAIDs), acetaminophen, glucosamine sulfate, and intra-articular injections of corticosteroids and hyaluronic acid. However, long-term use of corticosteroid injections can exacerbate damage to the affected joints.1,2 Conventional NSAIDs are associated with upper gastrointestinal tract ulceration and inhibition of platelet function.3

Cyclooxygenase-2 (COX-2)–specific inhibitors have demonstrated equivalent efficacy to conventional NSAIDs in treating pain and inflammation associated with osteoarthritis and rheumatoid arthritis. Further, COX-2–specific inhibitors significantly reduce the incidence of gastrointestinal ulceration and bleeding side effects caused by conventional NSAIDs.4,5 Valdecoxib (Bextra; Pharmacia Corporation and Pfizer Corporation) is a novel COX-2–specific inhibitor that is approximately 28,000-fold more selective against COX-2 than against COX-1. As a potent COX-2–specific inhibitor, valdecoxib is expected to provide efficacy equivalent to conventional NSAIDs for treatment of arthritis and spare the COX-1–related side effects. This randomized, placebo-controlled, double-blind, 12-week study was designed to test this hypothesis by comparing the efficacy and upper gastrointestinal tract safety of valdecoxib with that of naproxen, a leading conventional NSAID comparator.

Methods

Study population

Ambulatory adults who had been diagnosed with moderate to severe osteoarthritis of the knee according to the modified criteria of the American College of Rheumatology6,7 were eligible to participate in the trial. Patients were recruited from primary care and rheumatology specialty settings. Patients who had baseline scores of at least 40 mm on the Patient’s Assessment of Arthritis Pain–Visual Analog Scale (PAAP-VAS) and baseline categorical scores of poor to very poor on the Patient’s (PaGAA) and Physician’s (PhGAA) Global Assessments of Arthritis were included.8,9 Any patient suffering from inflammatory arthritis, gout, pseudogout, Paget disease, or any chronic pain syndrome that might interfere with assessment of the Index Knee was excluded from the trial. Patients diagnosed with osteoarthritis of the hip ipsilateral to the Index Knee, severe anserine bursitis, acute joint trauma, or complete loss of articular cartilage on the Index Knee also were excluded. Patients were not eligible if they had active gastrointestinal disease, gastrointestinal tract ulceration 30 days before the trial, a significant bleeding disorder, or a history of gastric or duodenal surgery. Patients with an esophageal, gastric, pyloric channel, or duodenal ulcer or a score of at least 10 for esophageal, gastric, or duodenal erosions at the pretreatment endoscopy examination also were excluded.

FIGURE 1
Patient’s global assessment of arthritis

Study design

This multicenter, randomized, double-blind, placebo-controlled study compared the efficacy and upper gastrointestinal tract safety of valdecoxib at dosages of 5, 10, and 20 mg once daily with placebo and naproxen at a dosage of 500 mg twice daily in relieving moderate to severe osteoarthritis of the knee. The trial was conducted in 85 centers in the United States and Canada, in accordance with the principles of good clinical practice and the Declaration of Helsinki. Eligible patients were randomized to treatment groups and self-administered oral study medication. Patients were randomized to study treatment in the order in which they were enrolled into the study by using a treatment sequence that was determined by a Searle-prepared computer-generated randomization schedule. Patients received their allocated study medications in bottles labeled A and B according to the randomization schedule. Personnel at the study centers carried out the assessments and remained blinded throughout the study. Eligible patients were enrolled and discontinued regular pain medication. Patients discontinued their normal medications at the following specified times before the baseline endoscopy: NSAIDs (including full-dose aspirin at a dosage of ≥325 mg/day) at 48 hours, corticosteroid injections at 4 weeks, and intra-articular injections of corticosteroid or hyaluronic acid preparations at 3 and 6 months, respectively. The use of antiulcer drugs, including H2 blockers, proton pump inhibitors, misoprostol, and sucralfate, was discontinued at least 24 hours before the baseline endoscopy.

 

 

Efficacy assessments

The following arthritis assessments were made at baseline and at 2, 6, and 12 weeks after the start of study drug administration, or at early termination. The PaGAA and PhGAA were each measured on a 5-point categorical scale, where 1 = very good, 2 = good, 3 = fair, 4 = poor, and 5 = very poor. The PAAP-VAS was measured on a scale of 0 to 100 mm, where 0 = no pain and 100 = most severe pain. The Western Ontario and McMaster’s Universities (WOMAC) Osteoarthritis indices, including Pain, Stiffness, Physical Function, and Composite, were measured as described previously.10

Upper gastrointestinal assessments

Upper gastrointestinal tract endoscopy was performed within 7 days before the first study dose and at the 12-week assessment or at early termination if the patient withdrew. An endoscopy could be performed at any time if the patient experienced symptoms suggestive of an ulcer. The endoscopists performing baseline and 12-week (early termination) assessments remained blinded throughout the study.

General safety assessments

Clinical laboratory tests were performed at screening, at baseline, at weeks 2, 6, and 12, or at early termination, and a complete physical examination was performed at the screening and final visits. The incidence of adverse events occurring in each treatment arm was monitored throughout the study. Adverse events occurring within 7 days and serious adverse events occurring within 30 days of the last dose of study medication were included in the safety analyses.

Statistical analyses

A sample size of 200 patients per treatment group was deemed sufficient to detect a difference in ulcer rates of 5% for valdecoxib vs 16% for naproxen, with 80% power and type 1 error at .017 (adjusted for 3 primary comparisons against placebo). Homogeneity of treatment groups at baseline with respect to age, height, weight, duration of osteoarthritis, PAAP-VAS, and WOMAC Osteoarthritis Index scores was assessed with 2-way analysis of variance, with treatment group and center as factors. All other demographics and baseline characteristics were compared with the Cochran-Mantel-Haenszel (CMH) test, stratified by center.
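The sample size statement can be checked with a standard two-proportion calculation. The sketch below assumes a two-sided test and the usual normal-approximation formula with pooled variance under the null hypothesis; the article does not say which formula or software was used, so the function name and these choices are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_group(p1: float, p2: float, alpha: float, power: float) -> int:
    """Normal-approximation sample size for comparing two proportions (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)


# Ulcer rates of 5% (valdecoxib) vs 16% (naproxen), 80% power,
# type 1 error of .017 (approximately .05 / 3 comparisons).
required = patients_per_group(0.05, 0.16, alpha=0.017, power=0.80)
print(f"Required patients per group: {required}")
```

Under these assumptions the requirement comes out at roughly 160 patients per group, so enrolling 200 per group leaves a margin for withdrawals and for patients without a posttreatment endoscopy.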

All efficacy assessments were performed on the modified intent-to-treat (ITT) cohort by using the last observation carried forward approach. The ITT cohort comprised all patients who were randomized and had taken at least 1 dose of study medication. Analyses of mean change from baseline for PaGAA, PhGAA, PAAP-VAS, and WOMAC Osteoarthritis indices were performed by using analysis of covariance, with treatment and center as factors and the corresponding baseline score as the covariate. Pairwise comparisons of valdecoxib at dosages of 10 and 20 mg once daily vs placebo were interpreted with the Hochberg procedure.11 Primary pairwise comparisons were amended in the statistical analysis plan before data unblinding to compare placebo with 10 and 20 mg valdecoxib, but not with the 5-mg dose. For all other comparisons, including 5 mg valdecoxib and naproxen vs placebo, differences were considered significant if the pairwise P values were less than .05. The incidence of withdrawal due to treatment failure was analyzed by the Fisher exact test, and the time to withdrawal in each treatment group was analyzed by log-rank test and plotted with the Kaplan-Meier product limit.12,13
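The Hochberg procedure mentioned above is a step-up method for multiple comparisons. The sketch below shows the general technique for any set of P values; the function name and the example P values are hypothetical and are not taken from the study.

```python
def hochberg_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each hypothesis, whether Hochberg's step-up procedure rejects it."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Step up from the largest P value: the k-th smallest is tested at alpha / (m - k + 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        if p_values[idx] <= alpha / (m - rank + 1):
            # This hypothesis and all with smaller P values are rejected.
            for j in order[:rank]:
                reject[j] = True
            break
    return reject


# Hypothetical P values for two primary comparisons against placebo.
both_significant = hochberg_reject([0.03, 0.04])   # larger P <= .05, so both are rejected
only_first = hochberg_reject([0.01, 0.20])         # .20 > .05, but .01 <= .025
neither = hochberg_reject([0.03, 0.20])            # .20 > .05 and .03 > .025
```

With two primary comparisons, both are declared significant if the larger P value is at most .05; otherwise the smaller one must be at most .025.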

Upper gastrointestinal tract endoscopic analyses were performed on the upper gastrointestinal tract ITT population. Randomized patients were included in this cohort if they received at least 1 dose of study medication and had undergone pretreatment and posttreatment endoscopies. Overall and pairwise comparisons of gastroduodenal, gastric, and duodenal ulcers and erosions were assessed with the CMH test stratified by center. The incidence of adverse events was compared between treatment groups with the Fisher exact test. Changes in vital signs were compared between treatment groups with an analysis of covariance using pairwise treatment comparisons, with treatment group as a factor and baseline value as a covariate.

Results

Patient baseline characteristics

Of the 1019 eligible randomized patients, 1 patient randomized to 10 mg/day valdecoxib, 1 to 20 mg/day valdecoxib, and 1 to 500 mg naproxen twice daily did not take the study medication and were excluded from efficacy and safety analyses. The remaining 1016 randomized patients received study medication and were included in the ITT cohort on which analyses of all efficacy end points were based. A total of 269 patients withdrew before the end of the study due to treatment failure, preexisting protocol violations, noncompliance, or adverse signs and symptoms, or were lost to follow-up: 74 patients in the placebo group, 39 in the 5-mg valdecoxib group, 56 in the 10-mg valdecoxib group, 44 in the 20-mg valdecoxib group, and 56 in the naproxen group. The upper gastrointestinal tract ITT cohort comprised 908 patients who were included in the upper gastrointestinal tract safety analyses. More than 90% of patients included in the study evaluated their osteoarthritis as poor to very poor as assessed by baseline PaGAA scores. Treatment groups were homogeneous with respect to demographics, vital signs, medical history, and all baseline arthritis assessments (Table 1).

 

 

TABLE 1
Patient baseline characteristics

  ValdecoxibNaproxen
 Placebo (n = 205)5 mg qd (n = 201)10 mg qd (n = 206)20 mg qd (n = 202)500 mg bid (n = 205)
Mean (SD) age, y60.3 (10.5)58.7 (11.9)59.8 (11.0)59.6 (10.4)60.4 (10.7)
Mean (SD) weight, kg87.5 (21.2)91.4 (22.6)89.3 (21.4)92.6 (23.7)88.1 (21.7)
Race, n (%)
  White162 (79)155 (77)154 (75)160 (79)163 (80)
  Black21 (10)26 (13)24 (12)24 (12)23 (11)
  Asian1 (0)1 (0)1 (0)1 (0)2 (1)
  Hispanic19 (9)18 (9)25 (12)15 (7)15 (7)
Male sex, n (%)73 (36)73 (36)72 (35)66 (33)76 (37)
Mean (SD) disease duration, y8.3 (8.0)9.8 (9.5)8.7 (8.0)9.2 (8.0)9.4 (8.7)
History of GI bleeding, n (%)2 (1)0 (0)3 (1)2 (1)3 (1)
History of gastroduodenal ulcer, n (%)20 (10)21 (10)24 (12)28 (14)31 (15)
PaGAA, n (%)
  Poor168 (82)175 (87)168 (82)162 (80)169 (82)
  Very poor33 (16)23 (11)32 (16)36 (18)31 (15)
PhGAA, n (%)
  Poor179 (87)181 (90)176 (85)173 (86)175 (85)
  Very poor24 (12)18 (9)25 (12)24 (12)25 (12)
No significant differences were observed between treatment groups at any baseline characteristic.
bid, twice daily; GI, gastrointestinal; PaGAA, Patient’s Global Assessment of Arthritis; PhGAA, Physician’s Global Assessment of Arthritis; qd, once daily.

Efficacy

The least square mean change in the PaGAA was significantly improved at most assessments in response to valdecoxib (10 and 20 mg/day) and 500 mg naproxen twice daily compared with placebo (Table 2). However, the improvement in response to valdecoxib 5 mg qd did not reach statistical significance (Table 2). Significant improvements in the PhGAA were observed in response to valdecoxib and naproxen at all assessments (Table 2).

The dosages of 20 mg/day valdecoxib and 500 mg naproxen twice daily were associated with a reduction in pain, as assessed by the PAAP-VAS scores. Pain reduction associated with 5 and 10 mg/day valdecoxib was significantly better than that with placebo at all assessments except for week 12 (Table 2).

Valdecoxib and naproxen treatments improved the WOMAC Pain, Stiffness, Physical Function, and Composite indices compared with placebo at 2, 6, and 12 weeks. Valdecoxib 20 mg/day and naproxen 500 mg twice daily produced statistically significant changes in all WOMAC Osteoarthritis scores throughout the 12-week study period compared with placebo (P < .05). WOMAC Pain scores for 10 mg valdecoxib were significantly different from those for placebo at 2 weeks (P < .001) but not at 6 or 12 weeks. No significant differences were noted between any of the valdecoxib treatment doses and naproxen in terms of improvement in WOMAC indices.

The incidences of withdrawal due to treatment failure were 20% (95% confidence interval [CI], 15.3–26.8) in the placebo group; 8% (95% CI, 4.8–12.8), 12% (95% CI, 7.8–17.1), and 10% (95% CI, 6.3–15.2) in the 5-, 10-, and 20-mg/day valdecoxib groups; and 6% (95% CI, 3.6–10.9) in the 500-mg naproxen group (P < .05; Table 3). Patients in the placebo group withdrew at a significantly faster rate than those in the 4 active treatment groups (P < .05), but there were no significant differences in withdrawal rates across the 4 active treatment groups.

TABLE 2
Baseline arthritis assessments and mean changes from baseline scores

 ValdecoxibNaproxen
 Placebo (n = 205)5 mg qd (n = 201)10 mg qd (n = 205)20 mg qd (n = 201)500 mg bid (n = 204)
PhGAA§
Baseline mean4.104.074.094.094.10
LSM change
  Week 2 (CI)-1.04 (-1.16, -0.91)-1.31(-1.44, -1.19)-1.37(-1.50, -1.25)-1.42(-1.54, -1.29)-1.35(-1.48, -1.23)
  Week 6 (CI)-1.22 (-1.35, -1.08)-1.44*(-1.58, -1.31)-1.50(-1.63, -1.36)-1.41* (-1.55, -1.28)-1.45* (-1.59, -1.32)
  Week 12 (CI)-1.22 (-1.36, -1.08)-1.43* (-1.58, -1.28)-1.52(-1.67, -1.38)-1.45* (-1.60, -1.31)-1.43* (-1.58, -1.29)
PAAP
Baseline mean71.2071.4272.4172.5472.36
LSM change
  Week 2 (CI)-21.19 (-24.80, -17.58)-28.46(-32.11, -24.82)-30.21(-33.83, -26.59)-32.07(-35.73, -28.41)-31.03(-34.66, -27.40)
  Week 6 (CI)-23.92 (-27.72, -20.12)-30.81(-34.65, -26.97)-29.85* (-33.67, -26.04)-32.28(-36.13, -28.42)-31.84(-35.66, -28.02)
  Week 12 (CI)-25.97 (-30.02, -21.92)-31.33 (-35.42, -27.24)-30.41 (-34.47, -30.41)-32.70* (-36.81, -32.70)-31.83* (-35.90, -27.76)
WOMAC OA, Stiffness
Baseline mean4.844.874.914.734.94
LSM change
  Week 2 (CI)-0.78 (-0.98, -0.57)-1.03 (-1.24, -0.82)-1.20(-1.41, -0.99)-1.24(-1.45, -1.03)-1.28(-1.49, -1.08)
  Week 6 (CI)-1.04 (-1.27, -0.82)-1.25 (-1.48, -1.02)-1.42* (-1.65, -1.20)-1.43* (-1.66, -1.20)-1.40(-1.62, -1.17)
  Week 12 (CI)-1.12 (-1.36, -0.89)-1.33 (-1.57, -1.09)-1.41 (-1.65, -1.17)-1.46* (-1.70, -1.22)-1.54* (-1.78, -1.30)
WOMAC OA, Composite #
Baseline mean53.4953.0354.7353.4253.67
LSM change
  Week 2 (CI)-10.13 (-12.28, -7.99)-13.26* (-15.42, -11.09)-15.05(-17.20, -12.90)-15.44(-17.63, -13.32)-15.47(-17.63, -13.32)
  Week 6 (CI)-12.98 (-15.45, -10.51)-15.47 (-17.97, -12.98)-16.74* (-19.22, -14.26)-17.33* (-19.48, -14.51)-16.99* (-19.48, -14.51)
  Week 12 (CI)-13.48 (-16.07, -10.89)-16.84 (-19.46, -14.23)-17.34* (-19.93, -14.74)-17.22* (-20.64, -15.44)-18.04* (-20.64, -15.44)
*P < .05 vs placebo, significant.
P < .01 vs placebo, significant.
P < .001 vs placebo, significant.
§ Scale = 1 (very good) to 5 (very poor).
Scale = 0 mm (no pain) to 100 mm (most severe pain).
Scale = 0 (no symptoms) to 8 (worse symptoms).
# Scale = 0 (no symptoms) to 96 (worse symptoms).
bid, twice daily; CI, 95% confidence interval; LSM, least square mean; PAAP, Patient’s Assessment of Arthritis Pain; PhGAA, Physician’s Global Assessment of Arthritis; qd, once daily; WOMAC OA, Western Ontario and McMaster’s Universities Osteoarthritis Index.

TABLE 3
Incidence of gastroduodenal, gastric, and duodenal ulcers (>5 mm) at final endoscopic evaluation

 ValdecoxibNaproxen
 Placebo (n = 178)5 mg qd (n = 188)10 mg qd (n = 174)20 mg qd (n = 185)500 mg bid (n = 183)
Gastroduodenal§8 (4) [2.1, 9.0]6 (3) [1.3, 7.1]5 (3) [1.1, 6.9]10 (5) [2.8, 10.0]18 (10)* [6.1, 15.3]
  Gastric§8 (4) [2.1, 9.0]4 (2) [0.7, 5.7]3 (2) [0.4, 5.4]9 (5) [2.4, 9.3]16 (9) [5.2, 14.1]
  Duodenal§0 (0) [0.05, 2.6]2 (1) [0.2, 4.2]2 (1) [0.2, 4.5]1 (1) [0.0, 3.4]2 (1) [0.2, 4.3]
Symptomatic ulcers (n)01237
*P < .05 vs placebo.
P < .05 vs naproxen.
P < .01 vs naproxen.
§ Data are presented as n (%) [95% confidence interval].
bid, twice daily; qd, once daily.
 

 

Safety

Valdecoxib and placebo had comparable upper gastrointestinal tract ulceration rates, whereas naproxen produced a significantly higher incidence of upper gastrointestinal tract ulcers than did 5 and 10 mg valdecoxib and placebo (P < .05). There were 14 adjudicated symptomatic ulcers during the study: 1 in the 5-mg valdecoxib group, 2 in the 10-mg valdecoxib group, 3 in the 20-mg valdecoxib group, and 7 in the 500-mg naproxen group.

Adverse events with an incidence of at least 5% in any treatment group and adverse events leading to withdrawal from the study are summarized by body system in Table 4. There were no significant differences in the incidence of adverse events between the valdecoxib and placebo groups. In contrast, 500 mg naproxen twice daily was associated with significantly more adverse events than 5 or 10 mg/day valdecoxib (P < .05). The incidence of adverse events was similar in the 20-mg valdecoxib and naproxen groups. Most adverse events were reported in the gastrointestinal system and consisted of abdominal pain, constipation, diarrhea, dyspepsia, flatulence, and nausea. The incidences of constipation, diarrhea, and flatulence were significantly higher in the naproxen group than in the 5-, 10-, and 20-mg valdecoxib groups, respectively. Other adverse events included accidental injury, headache, myalgia, and upper respiratory tract infections. Valdecoxib at 5 mg/day produced a significantly higher incidence of myalgia than did placebo, and valdecoxib at 20 mg/day produced a significantly lower incidence of upper respiratory tract infections than did placebo. Adverse events causing withdrawal with an incidence of at least 1% were accidental injury, abdominal pain, diarrhea, dyspepsia, nausea, abnormal hepatic function, rash, and blurred vision. The proportion of patients in the naproxen group (12.7%) who withdrew from the study was significantly greater than those for the 5-and 20-mg valdecoxib (6.0% and 5.5%) groups (P < .05), although the incidence of withdrawal due to adverse events in the 10-mg valdecoxib and naproxen groups were similar. In addition, gastrointestinal adverse events commonly related to NSAID treatment, such as dyspepsia and constipation, were more frequent in the naproxen group than in the valdecoxib and placebo groups.

TABLE 4
Adverse events

| Adverse event | Placebo (n = 178) | Valdecoxib 5 mg qd (n = 188) | Valdecoxib 10 mg qd (n = 174) | Valdecoxib 20 mg qd (n = 185) | Naproxen 500 mg bid (n = 183) |
|---|---|---|---|---|---|
| Incidence ≥ 5% in any treatment group | | | | | |
| Total | 109 (53.2) | 112 (55.7) | 113 (55.1) | 121 (60.2) | 139 (68.1)* |
| Accidental injury | 11 (5.4) | 3 (1.5) | 10 (4.9) | 12 (6.0) | 9 (4.4) |
| Headache | 11 (5.4) | 12 (6.0) | 7 (3.4) | 14 (7.0) | 9 (4.4) |
| Abdominal pain | 19 (9.3) | 14 (7.0) | 18 (8.8) | 13 (6.5) | 25 (12.3) |
| Constipation | 6 (2.9) | 4 (2.0) | 1 (0.5) | 4 (2.0) | 12 (5.9) |
| Diarrhea | 10 (4.9) | 7 (3.5) | 14 (6.8) | 11 (5.5) | 12 (5.9) |
| Dyspepsia | 15 (7.3) | 22 (10.9) | 22 (10.7) | 20 (9.9) | 35 (17.2)* |
| Flatulence | 12 (5.9) | 7 (3.5) | 5 (2.4) | 9 (4.5) | 14 (6.9) |
| Nausea | 10 (4.9) | 18 (9.0) | 17 (8.3) | 9 (4.5) | 10 (4.9) |
| Myalgia | 0 (0.0) | 13 (6.5)* | 3 (1.5) | 2 (1.0) | 1 (0.5) |
| Upper respiratory tract infections | 18 (8.8) | 9 (4.5) | 10 (4.9) | 7 (3.5)* | 10 (4.9) |
| Incidence ≥ 1% in any treatment group causing withdrawal | | | | | |
| Total | 17 (8.3) | 12 (6.0) | 18 (8.8) | 11 (5.5) | 26 (12.7) |
| Accidental injury | 2 (1.0) | 0 (0.0) | 0 (0.0) | 1 (0.5) | 1 (0.5) |
| Abdominal pain | 5 (2.4) | 2 (1.0) | 6 (2.9) | 2 (1.0) | 7 (3.4) |
| Diarrhea | 0 (0.0) | 0 (0.0) | 1 (0.5) | 1 (0.5) | 3 (1.5) |
| Dyspepsia | 2 (1.0) | 2 (1.0) | 3 (1.5) | 1 (0.5) | 9 (4.4)* |
| Nausea | 2 (1.0) | 1 (0.5) | 2 (1.0) | 1 (0.5) | 2 (1.0) |
| Abnormal hepatic function | 0 (0.0) | 2 (1.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Rash | 0 (0.0) | 2 (1.0) | 1 (0.5) | 0 (0.0) | 0 (0.0) |
| Blurred vision | 2 (1.0) | 0 (0.0) | 1 (0.5) | 0 (0.0) | 0 (0.0) |

*P < .05 vs placebo.
P < .05 vs naproxen.
Data are presented as number (%) of patients reporting events.
bid, twice daily; qd, once daily.

FIGURE 2
Western Ontario and McMaster Universities Osteoarthritis Pain Index

Discussion

This study confirmed that the novel COX-2–specific inhibitor valdecoxib at a dosage of 10 or 20 mg/day is as effective as naproxen at a dosage of 500 mg twice daily in relieving moderate to severe osteoarthritis of the knee over 12 weeks. In addition, treatment with 10 mg/day valdecoxib orally, the recommended dosage for treatment of osteoarthritis, is associated with a significantly lower gastroduodenal ulceration rate than occurs with the conventional NSAID, naproxen.

Patients receiving 10 and 20 mg/day valdecoxib experienced significant improvements in the signs and symptoms of osteoarthritis, and in all assessments the efficacies of valdecoxib 10 and 20 mg/day were numerically similar to that of naproxen. This finding is consistent with the inhibition of prostaglandin production in inflamed synovial tissue and in the central pain pathway. Increased COX-2 activity in the spinal cord in response to tissue damage and in the synovial membrane of osteoarthritis patients is at least partly responsible for joint inflammation and sensitization to inflammatory pain.14–16 The efficacy of valdecoxib in treating moderate to severe osteoarthritis of the knee was consistent with reports of other COX-2–specific inhibitors that are comparable to conventional NSAIDs in relieving chronic pain and inflammation.17,18 These data confirmed that 10 mg/day valdecoxib is as effective as 500 mg naproxen twice daily in treating the pain and inflammation associated with osteoarthritis. The efficacy of 10 mg/day valdecoxib makes it one of the most potent COX-2–specific inhibitors for treating moderate to severe osteoarthritis.

Conventional NSAIDs were associated with a significant risk of serious gastrointestinal complications such as ulceration and perforation and low gastrointestinal tolerability.20–22 Naproxen treatment of osteoarthritis and rheumatoid arthritis demonstrated a higher rate of endoscopically proven gastrointestinal ulceration than did COX-2–specific inhibitors,17,23 and that finding was confirmed in this study for 10 mg valdecoxib. Naproxen treatment was associated with significantly more gastroduodenal ulcers than 5 or 10 mg valdecoxib. We found no significant difference between 20 mg valdecoxib and naproxen, which might be explained by a lower incidence of ulcers with naproxen than reported in previous studies.24 In terms of numbers needed to treat, 14 patients would be needed to observe a difference in endoscopic ulcer rates between valdecoxib (5 or 10 mg) and naproxen compared with 20 patients to observe a difference between 20 mg valdecoxib and naproxen and 16 to observe a difference in ulcer rates between naproxen and placebo.
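The numbers needed to treat quoted above can be read back as approximate differences in ulcer rates, because a number needed to treat is the reciprocal of an absolute risk difference. The sketch below is only an illustration of that arithmetic: the function name and the dictionary labels are assumptions introduced here, and because a number needed to treat is conventionally rounded to a whole patient, the implied differences are approximate rather than the rates measured in the trial.

```python
def implied_risk_difference(nnt: int) -> float:
    """Absolute risk difference implied by a number needed to treat (NNT = 1 / difference)."""
    if nnt <= 0:
        raise ValueError("NNT must be a positive number of patients")
    return 1.0 / nnt


# NNT values as quoted in the Discussion; the labels are ours, for illustration only.
reported_nnt = {
    "valdecoxib 5 or 10 mg vs naproxen": 14,
    "valdecoxib 20 mg vs naproxen": 20,
    "naproxen vs placebo": 16,
}

implied = {name: implied_risk_difference(nnt) for name, nnt in reported_nnt.items()}

for name, difference in implied.items():
    # Express the implied difference in percentage points of endoscopic ulcer rate.
    print(f"{name}: NNT {reported_nnt[name]} implies about {100 * difference:.1f} percentage points")
```

Read this way, a smaller number needed to treat corresponds to a larger gap in ulcer rates between the two arms being compared.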

Valdecoxib at a dosage of 10 mg/day also demonstrated overall improved gastrointestinal tolerability, with significantly fewer adverse events and withdrawals due to adverse events, in particular gastrointestinal-related events such as constipation and dyspepsia, than did naproxen. The improved upper gastrointestinal tract safety of valdecoxib was as expected because the COX-1–sparing nature of this agent allows effective inhibition of COX-2 without inhibiting COX-1 in the gastric mucosa and platelets. An improved gastrointestinal safety profile is an important consideration in the treatment of osteoarthritis because the moderate to severe gastrointestinal complications associated with conventional NSAID therapy frequently lead to poor patient compliance or discontinuation of the medication.25,26

Overall, this study suggests clinical benefits of single daily doses of 10 and 20 mg valdecoxib and improved upper gastrointestinal tract safety for the 10-mg dose, compared with 500 mg naproxen twice daily. No additional efficacy benefit was obtained from a 20-mg dose as opposed to a 10-mg dose. Valdecoxib (10 mg) is a potent and effective once-daily alternative to conventional NSAIDs, with a gastrointestinal safety advantage that will be of value to rheumatologists and primary care physicians alike.

FIGURE 3
Western Ontario and McMaster Universities Osteoarthritis Physical Function Index

References

1. Klippel J, et al. Primer on the Rheumatic Diseases. 12th ed. Atlanta, GA: Arthritis Foundation; 2001.

2. Felson DT. Epidemiology of hip and knee osteoarthritis. Epidemiol Rev 1988;10:1-28.

3. Borda IT, Koff R. NSAIDs: A Profile of Adverse Effects. Philadelphia: Hanley and Belfus; 1995.

4. Bensen WG, Fiechtner JJ, McMillen JI, et al. Treatment of osteoarthritis with celecoxib, a cyclooxygenase-2 inhibitor: a randomized controlled trial. Mayo Clin Proc 1999;74:1095-105.

5. Geis GS. Update on clinical developments with celecoxib, a new specific COX-2 inhibitor: what can we expect? J Rheumatol 1999;26(suppl 56):31-6.

6. Altman R, Asch E, Bloch G, et al. The American College of Rheumatology criteria for the classification and reporting of osteoarthritis of the knee. Arthritis Rheum 1986;29:1039-49.

7. Schumacher HR. Primer on the Rheumatic Diseases. Atlanta, GA: Arthritis Foundation; 1986.

8. Cooperating Clinics Committee of American Rheumatism Association. A seven day variability study of 499 patients with peripheral rheumatoid arthritis. Arthritis Rheum 1965;8:302-34.

9. Ward JR, Williams HJ, Boyce E, et al. Comparison of auranofin, gold sodium thiomalate, and placebo in the treatment of rheumatoid arthritis. Subsets of responses. Am J Med 1983;75:133-7.

10. Bellamy N. WOMAC Osteoarthritis Index: A User’s Guide. London, Ontario, Canada: The Western Ontario and McMaster Universities; 1995.

11. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 1988;75:800-2.

12. Miller R. Survival Analyses. New York: John Wiley & Sons; 1998.

13. Simon R, Lee YJ. Nonparametric confidence limits for survival probabilities and median survival time. Cancer Treat Rep 1982;66:37-42.

14. Amin AR, Attur M, Patel RN, et al. Superinduction of cyclooxygenase-2 activity in human osteoarthritis-affected cartilage. Influence of nitric oxide. J Clin Invest 1997;99:1231-7.

15. Hay C, de Belleroche J. Carrageenan-induced hyperalgesia is associated with increased cyclooxygenase-2 expression in spinal cord. Neuroreport 1997;8:1249-51.

16. Kang RY, Freire Moar, Sigal E, et al. Expression of cyclooxygenase-2 in human and an animal model of rheumatoid arthritis. Br J Rheumatol 1996;35:711-8.

17. Bensen WG, Zhao SZ, Burke TA, et al. Upper gastrointestinal tolerability of celecoxib, a COX-2 specific inhibitor, compared to naproxen and placebo. J Rheumatol 2000;27:1876-83.

18. Day R, Morrison B, Luza A, et al. A randomized trial of the efficacy and tolerability of the COX-2 inhibitor rofecoxib vs ibuprofen in patients with osteoarthritis. Rofecoxib/Ibuprofen Comparator Study Group. Arch Intern Med 2000;160:1781-7.

19. Fiechtner J, Sikes D, Recker D. A double-blind, placebo-controlled dose ranging study to evaluate the efficacy of valdecoxib, a novel COX-2 specific inhibitor, in treating the signs and symptoms of osteoarthritis of the knee. Paper presented at: European League Against Rheumatism (EULAR); May 13–16, 2001; Prague, Czech Republic.

20. Garcia Rodriguez LA, Jick H. Risk of upper gastrointestinal bleeding and perforation associated with individual nonsteroidal anti-inflammatory drugs. Lancet 1994;343:769-72.

21. Singh G, Ramey DR, Morfeld D, et al. Gastrointestinal tract complications of nonsteroidal anti-inflammatory drug treatment in rheumatoid arthritis. A prospective observational cohort study. Arch Intern Med 1996;156:1530-6.

22. Singh G, Rosen Ramey D. NSAID induced gastrointestinal complications: the ARAMIS perspective-1997. Arthritis, Rheumatism, and Aging Medical Information System. J Rheumatol 1998;51(suppl):8-16.

23. Watson DJ, Harper SE, Zhao PL, et al. Gastrointestinal tolerability of the selective cyclooxygenase-2 (COX-2) inhibitor rofecoxib compared with nonselective COX-1 and COX-2 inhibitors in osteoarthritis. Arch Intern Med 2000;160:2998-3003.

24. Simon LS, Weaver AL, Graham DY, et al. Anti-inflammatory and upper gastrointestinal effects of celecoxib in rheumatoid arthritis: a randomized controlled trial. JAMA 1999;282:1921-8.

25. Langman MJ, Jensen DM, Watson DJ, et al. Adverse upper gastrointestinal effects of rofecoxib compared with NSAIDs. JAMA 1999;282:1929-33.

26. Scholes D, Stergachis A, Penna P, Normand E, Hansten P. Nonsteroidal anti-inflammatory drug discontinuation in patients with osteoarthritis. J Rheumatol 1995;22:708-12.


Issue
The Journal of Family Practice - 51(06)
Page Number
530-537
Display Headline
Randomized placebo-controlled trial comparing efficacy and safety of valdecoxib with naproxen in patients with osteoarthritis
Legacy Keywords
Cyclooxygenase-2–specific inhibitors; osteoarthritis; nonsteroidal anti-inflammatory drug; prostaglandin-endoperoxide synthase. (J Fam Pract 2002; 51:530–537)
Association of cervical cryotherapy with inadequate follow-up colposcopy

Article Type
Changed
Display Headline
Association of cervical cryotherapy with inadequate follow-up colposcopy

 

ABSTRACT

OBJECTIVE: We studied the anatomic changes that occur in the ectocervix after cryotherapy and the role these changes play in the adequacy of follow-up colposcopic examination.

STUDY DESIGN: We retrospectively reviewed patients’ charts.

POPULATION: Between January 1, 1991, and December 1, 1995, 268 women underwent 2 colposcopic examinations in 7 state-run public health clinics.

OUTCOMES MEASURED: The likelihood that a follow-up colposcopic examination would be inadequate.

RESULTS: Of the 268 women who underwent 2 colposcopic examinations during the study period, 83 had cryotherapy, 24 had loop excision of the ectocervical portion or cervical conization, and 96 had no procedure. Sixty-five were excluded because of missing data. Subjects were similar with respect to age, whether endocervical curettage was performed, presence of cervical dysplasia or human papilloma virus, and whether glandular involvement was noted. Patients who had cryotherapy had an increased likelihood of inadequate follow-up colposcopic examination compared with women who had no procedure (adjusted odds ratio = 18.7, 95% confidence interval = 7.0–49.8).

CONCLUSIONS: Undergoing cryotherapy of the uterine cervix increases the risk that a follow-up colposcopic examination will be inadequate. Given the reported high rates of regression of mild and moderate cervical dysplasia and the risks posed by possibly unnecessary procedures performed after inadequate colposcopic examination, a trend toward less aggressive therapy and watchful waiting may be appropriate but should be investigated in a controlled clinical trial.

 

KEY POINTS FOR CLINICIANS

 

  • Based on this study, cervical cryotherapy increases the risk that a follow-up colposcopic examination will be inadequate.
  • Further studies are needed to determine the most effective treatment for mild cervical dysplasia and possible local effects of cryotherapy.

Cryotherapy is an accepted procedure for treating low-grade cervical dysplasia.1,2 Only minor modifications of the precise technique of cryotherapy application have occurred since its inception. Currently the double-freeze technique of cryotherapy is an accepted treatment for mild and focal moderate dysplasia of the uterine cervix.3 Cervical cryotherapy is used widely not only because of its proven efficacy but also because of its ease of use in the outpatient setting and lack of known significant side effects. The procedure can be performed in the office setting without the use of local or general anesthesia, making it superior to the more invasive procedures performed before the availability of cryotherapy (eg, cervical conization and hysterectomy).

There has been limited investigation of the effects of cryotherapy on the anatomy of the uterine cervix. Whereas one study showed that cryotherapy has no effect on subsequent fertility or pregnancy outcome,4 another in adolescents reported cervical stenosis and pelvic inflammatory disease as possible treatment side effects.5 In addition, a study published in 1984 by Jobson and Homesley reported higher rates of retraction of the proximal squamocolumnar junction into the endocervical canal in patients undergoing cryotherapy compared with patients undergoing carbon dioxide laser ablation6; 47% of the follow-up colposcopic examinations were inadequate in that study population. Adequacy of colposcopic examination is defined as complete visualization of the transformation zone, visualization of the entire lesion, if present, and correlation between cytologic and histologic findings and the colposcopist’s impression.7 Failure to meet any one of these criteria leads to an inadequate colposcopic examination requiring further, more invasive evaluation. This study compared the rate of adequate and inadequate colposcopic examinations in women with and without a history of cryotherapy. Other factors found to influence the adequacy of follow-up colposcopy also are described.

Methods

We performed a retrospective cohort study using data collected from 7 of 14 state-run public health clinics. These 7 sites included rural and urban clinics. All women undergoing at least 2 colposcopic examinations in these clinics between January 1, 1991, and December 1, 1995, were included. Women underwent initial colposcopic examination after an abnormality was noted on a screening Pap test. Only women who had both colposcopic examinations in the same clinic were included. Care provided in these clinics included Pap test screening, colposcopic examinations, and treatment of identified cervical dysplasia with cervical cryotherapy, conization, and loop excision of the ectocervical portion (LEEP). State-contracted physicians trained in obstetrics and gynecology followed women who attended these clinics.

Chart review was used to determine the adequacy of the initial examination, whether an intervening procedure was done, and the adequacy of follow-up colposcopic examination. Adequacy was documented by the physician performing the colposcopic examination with the use of a standard form consistent among clinics. The accepted criteria for adequacy were used, and each colposcopic examination was documented as adequate or inadequate based on the colposcopist’s findings. Charts were reviewed and data were abstracted by 3 reviewers. Cervical biopsy results, presence of human papilloma virus (HPV) noted on routine cytology, endocervical curettage (ECC) results, and routine demographic data also were recorded. The management and therapeutic protocols were consistent across the 7 clinics.

 

 

Women were excluded from the analysis if (1) they had cryotherapy performed before their initial colposcopic examination (n = 1), (2) the date of the initial colposcopic examination was not available (n = 36), (3) information confirming the type of treatment used between colposcopic examinations was unknown (n = 32), (4) initial colposcopic examination was inadequate (n = 16), or (5) the adequacy of the follow-up colposcopic examination was not documented (n = 6). The total number of women excluded was 65 because some women met multiple exclusion criteria.

Women were grouped according to whether they had cryotherapy, cone, LEEP, or no procedure between the initial and follow-up colposcopic examinations. Univariate analysis was performed to assess the association of management group with clinic of treatment, performance of ECC, biopsy results, presence of HPV, and cytologic presence of glandular atypia. Mean age and duration (interval between initial and follow-up colposcopic examinations or between the procedure and follow-up examination) were calculated for all groups.

The odds ratio of an inadequate follow-up colposcopic examination was estimated for type of treatment (cryotherapy, cone/LEEP) compared with no treatment, age, clinic where treatment was provided, performance of ECC, biopsy results, presence of HPV, and presence of glandular atypia. The 95% confidence intervals about the relative odds estimates were calculated. Mean age and duration between initial colposcopy and follow-up colposcopy were calculated for the groups with adequate and inadequate follow-up colposcopic examinations.

Multivariable logistic regression analysis was used to evaluate the association of adequacy of follow-up colposcopic examination with age (years), clinic where colposcopic examination was performed, duration (days), whether or not ECC was performed, biopsy results from the initial colposcopic examination, presence of HPV, and presence of glandular atypia noted on initial colposcopy. Biopsy results were categorized as normal or abnormal in the model that is reported. The stepwise backward elimination technique was used to evaluate the best model. The 95% confidence intervals about the adjusted odds ratio were calculated.

The Pearson chi-square test was used to test the significance of the association between binary variables. The significance of the difference between means was tested with the one-way analysis of variance. Data were analyzed with the personal computer version of the Statistical Package for the Social Sciences (SPSS/PC+ version 8.0).

Results

Between January 1, 1991, and December 31, 1995, 3225 women underwent colposcopic evaluation or treatment at 7 county colposcopy clinics in Oklahoma. Two hundred sixty-eight of these women underwent 2 examinations during the study period. There were 203 of 268 subjects available for analysis after exclusions for missing data. Eighty-three patients (41.1%) had cryotherapy, 24 (11.9%) underwent a cone biopsy or a LEEP procedure, and 96 (47.5%) underwent no procedure between initial and follow-up colposcopic examinations.

Table 1 shows characteristics of women who had cryotherapy, cone/LEEP, and no procedure. The groups were similar with respect to age, whether ECC was performed, presence of HPV, and whether glandular involvement was noted. There was an association between the degree of cervical dysplasia and the three treatment groups, which was expected because degree of dysplasia determines treatment modality. Women who had cryotherapy had follow-up colposcopy (mean = 565 days) later than women who had cone or LEEP (mean = 319 days) or no procedure (mean = 339 days; P < .0001).

Thirty-three percent of women (n = 67) had inadequate follow-up colposcopic examinations: 61.4% (51/83) of women who had cryotherapy, compared with 20.8% (5/24) of women who had cone or LEEP and 11.5% (11/96) of women who had no procedure.
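The Pearson chi-square test named in the Methods can be illustrated with the counts in the preceding paragraph. The sketch below is not a reproduction of the authors' SPSS analysis, and the article does not report an overall test for this particular table; the function and variable names are assumptions made for illustration. The closed-form P value used here holds only for 2 degrees of freedom, which is the case for a 3 × 2 table.

```python
import math

# (inadequate, adequate) follow-up colposcopic examinations by management group,
# taken from the counts given in the Results text.
counts = {
    "cryotherapy": (51, 83 - 51),
    "cone or LEEP": (5, 24 - 5),
    "no procedure": (11, 96 - 11),
}


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom for an r x c table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


chi2, dof = pearson_chi_square(counts)

# Proportion of inadequate examinations in each group.
proportions = {group: bad / (bad + ok) for group, (bad, ok) in counts.items()}

# For exactly 2 degrees of freedom the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2) if dof == 2 else None

for group, proportion in proportions.items():
    print(f"{group}: {100 * proportion:.1f}% inadequate")
print(f"chi-square = {chi2:.2f} on {dof} df")
```

With these counts the statistic is far above 5.99, the .05 critical value for 2 degrees of freedom, which agrees with the large between-group differences described in the text.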

Table 2 shows the relationship between inadequate second colposcopy and previous cryotherapy, cone/LEEP, abnormal cervical biopsy, ECC, presence of HPV, and presence of glandular atypia. Patients who had cryotherapy had an increased likelihood of inadequate follow-up compared with patients who had no procedure (adjusted odds ratio = 18.67, 95% confidence interval = 6.99–49.81). Cone/LEEP increased the likelihood of inadequate follow-up, but the increase was not statistically significant. Age, duration, ECC, presence of HPV, and presence of glandular atypia did not increase the likelihood of subsequent inadequate colposcopic examination. Odds ratio estimates for individual clinics are not reported; they were imprecise due to small numbers.
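For orientation, an unadjusted (crude) odds ratio can be computed directly from the published counts for cryotherapy versus no procedure. This is a minimal sketch under stated assumptions: it uses the standard log odds ratio (Woolf) confidence interval, the function name is introduced here for illustration, and the result is not the adjusted estimate in Table 2, which comes from the authors' multivariable logistic regression and cannot be rebuilt from the counts alone.

```python
import math


def crude_odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total, z=1.96):
    """Crude odds ratio with a Woolf (log-based) 95% confidence interval from a 2 x 2 table."""
    a = exposed_events                      # inadequate follow-up, cryotherapy
    b = exposed_total - exposed_events      # adequate follow-up, cryotherapy
    c = unexposed_events                    # inadequate follow-up, no procedure
    d = unexposed_total - unexposed_events  # adequate follow-up, no procedure
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the Results: 51 of 83 inadequate after cryotherapy, 11 of 96 after no procedure.
crude_or, ci_low, ci_high = crude_odds_ratio(51, 83, 11, 96)
print(f"crude OR = {crude_or:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

The crude estimate is about 12.3, smaller than the adjusted value reported above; a gap of this kind is expected when the model also accounts for clinic and other covariates.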

TABLE 1
Characteristics of patients with and without cryotherapy between initial and follow-up colposcopy

 

| Characteristic | Cryotherapy (n = 82) | Cone or LEEP (n = 24) | No procedure (n = 96) | P* |
|---|---|---|---|---|
| Mean age (y) | 24.6 | 26.5 | 23.8 | .229 |
| Mean duration (d) | 565 | 319 | 339 | .004 |
| ECC (%) | 80.2 | 75.0 | 7.1 | .135 |
| Cervical dysplasia (%) | | | | < .001 |
| Normal | 21.7 | 19.0 | 39.5 | |
| Mild dysplasia | 58.0 | 23.8 | 45.3 | |
| >Mild dysplasia | 20.3 | 57.1 | 15.1 | |
| HPV (%) | 72.6 | 52.4 | 60.4 | .130 |
| Glandular atypia (%) | 23.9 | 38.1 | 17.8 | .125 |

*Pearson chi-square for proportions and analysis of variance for means.
Duration from treatment (cryotherapy) or examination to follow-up colposcopic examination.
ECC, endocervical curettage; HPV, human papilloma virus; LEEP, loop excision of the ectocervical portion.
 

 

TABLE 2
Likelihood of inadequate follow-up colposcopic examination*

 

| Characteristics | Adjusted OR | 95% CI |
|---|---|---|
| Cryotherapy | 18.66 | 6.99–49.81 |
| Cone or LEEP | 3.01 | 0.78–11.58 |
| Cervical dysplasia | | |
| Mild | NA | |
| >Mild | NA | |
| ECC | NA | |
| HPV | NA | |
| Glandular atypia | NA | |

*Logistic regression model included the clinic of colposcopy (not shown). Age (years) and duration (days; from treatment or first colposcopy to second colposcopy) were removed from the model by backward elimination.
CI, confidence interval; ECC, endocervical curettage; HPV, human papilloma virus; LEEP, loop excision of the ectocervical portion; NA, not applicable; OR, odds ratio.

Discussion

Undergoing cryotherapy of the uterine cervix increases the risk that a follow-up colposcopic examination will be inadequate. This agrees with the findings of Jobson and Homesley’s 1984 study,6 which looked at the efficacy of cryotherapy vs carbon dioxide laser ablation in the treatment of cervical dysplasia. Although it was not the focus of their study, a high rate of inadequacy was noted on follow-up colposcopic examinations after cryotherapy.

Because of the retrospective design of this study, we could not randomly assign women to a treatment group. However, the study groups were similar with respect to other variables potentially associated with the outcome measure. In addition, we attempted to control confounding variables by using multivariable analysis. By including the clinic where the examination was performed, we attempted to limit the effect of the physician’s subjective assignment of adequacy; this subjectivity remains a limitation of the study.

We found an association between the clinic where the follow-up colposcopic examination was performed and whether that examination was adequate or inadequate. The determination of adequacy depends on the physician’s observations during the colposcopic examination. We were unable to measure intra- or interobserver variation between examinations. However, we attempted to control for this effect by including the clinic site in the multivariable analysis.

The current standard of care for inadequate colposcopic examination recommends more invasive evaluation with a procedure such as cervical conization or LEEP. This allows clarification of discordance between cytology, histology, and the colposcopist’s impression; sampling of any lesion that may extend past the view of standard colposcopy; and histologic evaluation of the entire transformation zone. Given the reported high rates of spontaneous regression of mild and moderate cervical dysplasias with a watchful waiting approach,8–12 we wonder whether we are performing unnecessary procedures (LEEP and conization) after cryotherapy as a result of inadequate follow-up colposcopic examinations. A study evaluating the pathologic findings of cone or LEEP specimens from inadequate colposcopic examinations after cryotherapy would help answer these questions. If there is no persistence or progression of dysplasia, then this would support the hypothesis that cryotherapy leads to unnecessary, invasive procedures. Further controlled trials are required to answer these questions.

ACKNOWLEDGMENTS

The authors acknowledge the assistance of Adeline Yerkes, of the Chronic Disease Division, Oklahoma State Department of Health, in facilitating access to the county clinic records.

References

 

1. Ferris DG. Office procedures: colposcopy. Prim Care 1997;24:241-67.

2. Crisp WE, Asadourian L, Romberger W. Application of cryosurgery to gynecologic malignancy. Obstet Gynecol 1967;30:668-73.

3. Mayeaux EJ, Jr, Spigener SD, German JA. Cryotherapy of the uterine cervix. J Fam Pract 1998;47:99-102.

4. Benrubi GI, Young M, Nuss RC. Intrapartum outcome of term pregnancy after cervical cryotherapy. J Reprod Med 1984;29:251-4.

5. Hillard PA, Biro FM, Wildey L. Complications of cervical cryotherapy in adolescents. J Reprod Med 1991;36:711-5.

6. Jobson VW, Homesley HD. Comparison of cryosurgery and carbon dioxide laser ablation for treatment of cervical intraepithelial neoplasia. Colposc Gynecol Laser Surg 1984;1:173-80.

7. Ryan KJ. Kistner’s Gynecology and Women’s Health. 7th ed. St Louis, MO: Mosby; 1999.

8. Ostergard DR. Cryosurgical treatment of cervical intraepithelial neoplasia. Obstet Gynecol 1980;56:231-3.

9. Walton LA, Edelman DA, Fowler WC, Jr, Photopulos GJ. Cryosurgery for the treatment of cervical intraepithelial neoplasia during the reproductive years. Obstet Gynecol 1980;55:353-7.

10. Hemmingsson E, Stendahl U, Stenson S. Cryosurgical treatment of cervical intraepithelial neoplasia with follow-up of five to eight years. Am J Obstet Gynecol 1981;139:144-7.

11. Andersen ES, Husth M. Cryosurgery for cervical intraepithelial neoplasia: 10-year follow-up. Gynecol Oncol 1992;45:240-2.

12. Benedet JL, Miller DM, Nickerson KG, Anderson GH. The results of cryosurgical treatment of cervical intraepithelial neoplasia at one, five, and ten years. Am J Obstet Gynecol 1987;157:268-73.

Author and Disclosure Information

 

RHONDA A. SPARKS, MD
DEWEY SCHEID, MD
VICKI LOEMKER, MD
ERIC STADER, MD
KATHY REILLY, MD, MPH
ROB HAMM, PHD
LAINE MCCARTHY, MLIS
Oklahoma City, Oklahoma
From the Department of Family and Preventive Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, OK. The authors report no competing interests. Address reprint requests to Rhonda Sparks, MD, Assistant Professor, Department of Family and Preventive Medicine, University of Oklahoma Health Sciences Center, 900 NE 10th Street, Oklahoma City, OK 73104.

Issue
The Journal of Family Practice - 51(06)
Page Number
526-529
Legacy Keywords
Colposcopy; cervical dysplasia; cervical cryotherapy. (J Fam Pract 2002; 51:526–529)

ABSTRACT

OBJECTIVE: We studied the anatomic changes that occur in the ectocervix after cryotherapy and the role these changes play in the adequacy of follow-up colposcopic examination.

STUDY DESIGN: We retrospectively reviewed patients’ charts.

POPULATION: Between January 1, 1991, and December 1, 1995, 268 women underwent 2 colposcopic examinations in 7 state-run public health clinics.

OUTCOMES MEASURED: The likelihood that a follow-up colposcopic examination would be inadequate.

RESULTS: Of the 268 women who underwent 2 colposcopic examinations during the study period, 83 had cryotherapy, 24 had loop excision of the ectocervical portion or cervical conization, and 96 had no procedure. Sixty-five were excluded because of missing data. Subjects were similar with respect to age, whether endocervical curettage was performed, presence of cervical dysplasia or human papilloma virus, and whether glandular involvement was noted. Patients who had cryotherapy had an increased likelihood of inadequate follow-up colposcopic examination compared with women who had no procedure (adjusted odds ratio = 18.7, 95% confidence interval = 7.0–49.8).

CONCLUSIONS: Undergoing cryotherapy of the uterine cervix increases the risk that a follow-up colposcopic examination will be inadequate. Given the reported high rates of regression of mild and moderate cervical dysplasia and the risks posed by possibly unnecessary procedures performed after inadequate colposcopic examination, a trend toward less aggressive therapy and watchful waiting may be appropriate but should be investigated in a controlled clinical trial.

 

KEY POINTS FOR CLINICIANS

 

  • Based on this study, cervical cryotherapy increases the risk that a follow-up colposcopic examination will be inadequate.
  • Further studies are needed to determine the most effective treatment for mild cervical dysplasia and possible local effects of cryotherapy.

Cryotherapy is an accepted procedure for treating low-grade cervical dysplasia.1,2 Only minor modifications of the precise technique of cryotherapy application have occurred since its inception. Currently the double-freeze technique of cryotherapy is an accepted treatment for mild and focal moderate dysplasia of the uterine cervix.3 Cervical cryotherapy is used widely not only because of its proven efficacy but also because of its ease of use in the outpatient setting and lack of known significant side effects. The procedure can be performed in the office setting without the use of local or general anesthesia, making it superior to the more invasive procedures performed before the availability of cryotherapy (eg, cervical conization and hysterectomy).

There has been limited investigation of the effects of cryotherapy on the anatomy of the uterine cervix. Whereas one study showed that cryotherapy has no effect on subsequent fertility or pregnancy outcome,4 another in adolescents reported cervical stenosis and pelvic inflammatory disease as possible treatment side effects.5 In addition, a study published in 1984 by Jobson and Homesley reported higher rates of retraction of the proximal squamocolumnar junction into the endocervical canal in patients undergoing cryotherapy compared with patients undergoing carbon dioxide laser ablation6; 47% of the follow-up colposcopic examinations were inadequate in that study population. Adequacy of colposcopic examination is defined as complete visualization of the transformation zone, visualization of the entire lesion, if present, and correlation between cytologic and histologic findings and the colposcopist’s impression.7 Failure to meet any one of these criteria leads to an inadequate colposcopic examination requiring further, more invasive evaluation. This study compared the rate of adequate and inadequate colposcopic examinations in women with and without a history of cryotherapy. Other factors found to influence the adequacy of follow-up colposcopy also are described.

Methods

We performed a retrospective cohort study using data collected from 7 of 14 state-run public health clinics. These 7 sites included rural and urban clinics. All women undergoing at least 2 colposcopic examinations in these clinics between January 1, 1991, and December 1, 1995, were included. Women underwent initial colposcopic examination after an abnormality was noted on a screening Pap test. Only women who had both colposcopic examinations in the same clinic were included. Care provided in these clinics included Pap test screening, colposcopic examinations, and treatment of identified cervical dysplasia with cervical cryotherapy, conization, and loop excision of the ectocervical portion (LEEP). State-contracted physicians trained in obstetrics and gynecology followed women who attended these clinics.

Chart review was used to determine the adequacy of the initial examination, whether an intervening procedure was done, and the adequacy of follow-up colposcopic examination. Adequacy was documented by the physician performing the colposcopic examination with the use of a standard form consistent among clinics. The accepted criteria for adequacy were used, and each colposcopic examination was documented as adequate or inadequate based on the colposcopist’s findings. Charts were reviewed and data were abstracted by 3 reviewers. Cervical biopsy results, presence of human papilloma virus (HPV) noted on routine cytology, endocervical curettage (ECC) results, and routine demographic data also were recorded. The management and therapeutic protocols were consistent across the 7 clinics.

 

 

Women were excluded from the analysis if (1) they had cryotherapy performed before their initial colposcopic examination (n = 1), (2) the date of the initial colposcopic examination was not available (n = 36), (3) information confirming the type of treatment used between colposcopic examinations was unknown (n = 32), (4) initial colposcopic examination was inadequate (n = 16), or (5) the adequacy of the follow-up colposcopic examination was not documented (n = 6). The total number of women excluded was 65 because some women met multiple exclusion criteria.

Women were grouped according to whether they had cryotherapy, cone, LEEP, or no procedure between the initial and follow-up colposcopic examinations. Univariate analysis was performed to assess the association of management group with clinic of treatment, performance of ECC, biopsy results, presence of HPV, and cytologic presence of glandular atypia. Mean age and duration (the interval between initial and follow-up colposcopic examinations or between the procedure and follow-up examination) were calculated for all groups.

The odds ratio of an inadequate follow-up colposcopic examination was estimated for type of treatment (cryotherapy, cone/LEEP) compared with no treatment, age, clinic where treatment was provided, performance of ECC, biopsy results, presence of HPV, and presence of glandular atypia. The 95% confidence intervals about the relative odds estimates were calculated. Mean age and duration between initial colposcopy and follow-up colposcopy were calculated for the groups with adequate and inadequate follow-up colposcopic examinations.

Multivariable logistic regression analysis was used to evaluate the association of adequacy of follow-up colposcopic examination with age (years), clinic where colposcopic examination was performed, duration (days), whether or not ECC was performed, biopsy results from the initial colposcopic examination, presence of HPV, and presence of glandular atypia noted on initial colposcopy. Biopsy results were categorized as normal or abnormal in the model that is reported. The stepwise backward elimination technique was used to evaluate the best model. The 95% confidence intervals about the adjusted odds ratio were calculated.

The Pearson chi-square test was used to test the significance of the association between binary variables. The significance of the difference between means was tested with the one-way analysis of variance. Data were analyzed with the personal computer version of the Statistical Package for the Social Sciences (SPSS/PC+ version 8.0).
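As an illustration of the Pearson chi-square test named above, the following minimal sketch applies it to the counts of inadequate and adequate follow-up examinations reported in the Results (51/83, 5/24, and 11/96). It is not the authors' SPSS analysis; the function name `pearson_chi_square` and the table layout are assumptions made for this example, and the closed-form P value holds only for 2 degrees of freedom.

```python
import math

# (inadequate, adequate) follow-up examinations per group, from the Results:
# 51/83 cryotherapy, 5/24 cone or LEEP, 11/96 no procedure.
observed = {
    "cryotherapy": (51, 83 - 51),
    "cone_or_leep": (5, 24 - 5),
    "no_procedure": (11, 96 - 11),
}


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


chi2, df = pearson_chi_square(observed)
# With exactly 2 degrees of freedom, the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-chi2 / 2) if df == 2 else None
print(f"chi-square = {chi2:.2f}, df = {df}, P = {p_value:.2e}")
```

The sketch tests only whether adequacy differs across the three groups; it does not adjust for clinic or any other covariate.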

Results

Between January 1, 1991, and December 31, 1995, 3225 women underwent colposcopic evaluation or treatment at 7 county colposcopy clinics in Oklahoma. Two hundred sixty-eight of these women underwent 2 examinations during the study period. There were 203 of 268 subjects available for analysis after exclusions for missing data. Eighty-three patients (41.1%) had cryotherapy, 24 (11.9%) underwent a cone biopsy or a LEEP procedure, and 96 (47.5%) underwent no procedure between initial and follow-up colposcopic examinations.

Table 1 shows characteristics of women who had cryotherapy, cone/LEEP, and no procedure. The groups were similar with respect to age, whether ECC was performed, presence of HPV, and whether glandular involvement was noted. There was an association between the degree of cervical dysplasia and the three treatment groups, which was expected because degree of dysplasia determines treatment modality. Women who had cryotherapy had follow-up colposcopy (mean = 565 days) later than women who had cone or LEEP (mean = 319 days) or no procedure (mean = 339 days; P < .0001).

Thirty-three percent of the women (n = 67) had inadequate follow-up colposcopic examinations. The follow-up examination was inadequate in 61.4% (51/83) of women who had cryotherapy, compared with 20.8% (5/24) of women who had cone or LEEP and 11.5% (11/96) of women who had no procedure.

Table 2 shows the relationship between inadequate second colposcopy and previous cryotherapy, cone/LEEP, abnormal cervical biopsy, ECC, presence of HPV, and presence of glandular atypia. Patients who had cryotherapy had an increased likelihood of inadequate follow-up compared with patients who had no procedure (adjusted odds ratio = 18.67, 95% confidence interval = 6.99–49.81). Cone/LEEP increased the likelihood of inadequate follow-up, but the increase was not statistically significant. Age, duration, ECC, presence of HPV, and presence of glandular atypia did not increase the likelihood of subsequent inadequate colposcopic examination. Odds ratio estimates for different clinics are not reported because they were imprecise due to small numbers.
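For comparison with the adjusted estimates, the crude (unadjusted) odds ratios can be worked out directly from the counts reported above (51/83, 5/24, and 11/96 inadequate examinations). The sketch below uses the standard log-based (Woolf) confidence interval; the function name `odds_ratio_with_ci` is an assumption for this example. Because these figures are unadjusted, they are not expected to match the logistic regression results, which also account for clinic and other covariates.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf (log-based) confidence interval.

    a, b: exposed group with / without the outcome
    c, d: reference group with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Outcome = inadequate follow-up colposcopy; reference = no procedure (11 of 96).
cryo = odds_ratio_with_ci(51, 83 - 51, 11, 96 - 11)
cone = odds_ratio_with_ci(5, 24 - 5, 11, 96 - 11)

for label, (or_, lo, hi) in (("Cryotherapy", cryo), ("Cone or LEEP", cone)):
    print(f"{label}: crude OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

In this crude calculation the cryotherapy interval lies entirely above 1 and the cone/LEEP interval spans 1, which is the same pattern seen in the adjusted estimates.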

TABLE 1
Characteristics of patients with and without cryotherapy between initial and follow-up colposcopy

 

| Characteristic | Cryotherapy (n = 82) | Cone or LEEP (n = 24) | No procedure (n = 96) | P* |
|---|---|---|---|---|
| Mean age (y) | 24.6 | 26.5 | 23.8 | .229 |
| Mean duration (d)† | 565 | 319 | 339 | .004 |
| ECC (%) | 80.2 | 75.0 | 7.1 | .135 |
| Cervical dysplasia (%) | | | | < .001 |
|   Normal | 21.7 | 19.0 | 39.5 | |
|   Mild dysplasia | 58.0 | 23.8 | 45.3 | |
|   >Mild dysplasia | 20.3 | 57.1 | 15.1 | |
| HPV (%) | 72.6 | 52.4 | 60.4 | .130 |
| Glandular atypia (%) | 23.9 | 38.1 | 17.8 | .125 |

*Pearson χ² for proportions and analysis of variance for means.
†Duration from treatment (cryotherapy) or examination to follow-up colposcopic examination.
ECC, endocervical curettage; HPV, human papilloma virus; LEEP, loop excision of the ectocervical portion.
 

 

TABLE 2
Likelihood of inadequate follow-up colposcopic examination*

 

| Characteristics | Adjusted OR | 95% CI |
|---|---|---|
| Cryotherapy | 18.66 | 6.99–49.81 |
| Cone or LEEP | 3.01 | 0.78–11.58 |
| Cervical dysplasia | | |
|   Mild | NA | |
|   >Mild | NA | |
| ECC | NA | |
| HPV | NA | |
| Glandular atypia | NA | |

*Logistic regression model included the clinic of colposcopy (not shown). Age (years) and duration (days; from treatment or first colposcopy to second colposcopy) were removed from the model by backward elimination.
CI, confidence interval; ECC, endocervical curettage; HPV, human papilloma virus; LEEP, loop excision of the ectocervical portion; NA, not applicable; OR, odds ratio.

Discussion

Undergoing cryotherapy of the uterine cervix increases the risk that a follow-up colposcopic examination will be inadequate. This agrees with the findings of Jobson and Homesley’s 1984 study,6 which looked at the efficacy of cryotherapy vs carbon dioxide laser ablation in the treatment of cervical dysplasia. Although it was not the focus of their study, a high rate of inadequacy was noted on follow-up colposcopic examinations after cryotherapy.

Because of the retrospective design of this study, we could not randomly assign women to a treatment group. However, the study groups were similar with respect to other variables potentially associated with the outcome measure. In addition, we attempted to control confounding variables by using multivariable analysis. By including the clinic where the examination was performed, we attempted to limit the effect of the subjective assignment of adequacy by the physician; this subjectivity remains a limitation of this study.

We found an association between the clinic where the follow-up colposcopic examination was performed and whether that examination was adequate or inadequate. The determination of adequacy depends on the physician’s observations during the colposcopic examination. We were unable to measure the intra- or interobserver variation between the examinations. However, we attempted to control for this effect by including the clinic site in the multivariable analysis.

The current standard of care for inadequate colposcopic examination recommends more invasive evaluation with a procedure such as cervical conization or LEEP. This allows clarification of discordance between cytology, histology, and the colposcopist’s impression; sampling of any lesion that may extend past the view of standard colposcopy; and histologic evaluation of the entire transformation zone. Given the reported high rates of spontaneous regression of mild and moderate cervical dysplasias with a watchful waiting approach,8–12 we wonder whether we are performing unnecessary procedures (LEEP and conization) after cryotherapy as a result of inadequate follow-up colposcopic examinations. A study evaluating the pathologic findings of cone or LEEP specimens from inadequate colposcopic examinations after cryotherapy would help answer these questions. If there is no persistence or progression of dysplasia, then this would support the hypothesis that cryotherapy leads to unnecessary, invasive procedures. Further controlled trials are required to answer these questions.

ACKNOWLEDGMENTS

The authors acknowledge the assistance of Adeline Yerkes, of the Chronic Disease Division, Oklahoma State Department of Health, in facilitating access to the county clinic records.

 


References

 

1. Ferris DG. Office procedures: colposcopy. Prim Care 1997;24:241-67.

2. Crisp WE, Asadourian L, Romberger W. Application of cryosurgery to gynecologic malignancy. Obstet Gynecol 1967;30:668-73.

3. Mayeaux EJ, Jr, Spigener SD, German JA. Cryotherapy of the uterine cervix. J Fam Pract 1998;47:99-102.

4. Benrubi GI, Young M, Nuss RC. Intrapartum outcome of term pregnancy after cervical cryotherapy. J Reprod Med 1984;29:251-4.

5. Hillard PA, Biro FM, Wildey L. Complications of cervical cryotherapy in adolescents. J Reprod Med 1991;36:711-5.

6. Jobson VW, Homesley HD. Comparison of cryosurgery and carbon dioxide laser ablation for treatment of cervical intraepithelial neoplasia. Colposc Gynecol Laser Surg 1984;1:173-80.

7. Ryan KJ. Kistner’s Gynecology and Women’s Health. 7th ed. St Louis, MO: Mosby; 1999.

8. Ostergard DR. Cryosurgical treatment of cervical intraepithelial neoplasia. Obstet Gynecol 1980;56:231-3.

9. Walton LA, Edelman DA, Fowler WC, Jr, Photopulos GJ. Cryosurgery for the treatment of cervical intraepithelial neoplasia during the reproductive years. Obstet Gynecol 1980;55:353-7.

10. Hemmingsson E, Stendahl U, Stenson S. Cryosurgical treatment of cervical intraepithelial neoplasia with follow-up of five to eight years. Am J Obstet Gynecol 1981;139:144-7.

11. Andersen ES, Husth M. Cryosurgery for cervical intraepithelial neoplasia: 10-year follow-up. Gynecol Oncol 1992;45:240-2.

12. Benedet JL, Miller DM, Nickerson KG, Anderson GH. The results of cryosurgical treatment of cervical intraepithelial neoplasia at one, five, and ten years. Am J Obstet Gynecol 1987;157:268-73.


Issue
The Journal of Family Practice - 51(06)
Page Number
526-529
Display Headline
Association of cervical cryotherapy with inadequate follow-up colposcopy
Legacy Keywords
Colposcopy; cervical dysplasia; cervical cryotherapy. (J Fam Pract 2002; 51:526–529)

Computer-using patients want Internet services from family physicians


KEY POINTS FOR CLINICIANS

  • Computer-using patients desire Web-based services to augment their care.
  • Practice Web sites should be designed to go beyond information alone and incorporate services such as online appointments.
  • Physicians should consider providing “virtual visits” to assist with disease management.

Patients are increasingly using the Internet to obtain medical information. Few practice Web sites provide services beyond information about the clinic and common medical diseases. We surveyed computer-using patients at 4 family medicine clinics in Denver, Colorado, by assessing their desire for Internet services from their providers. Patients were especially interested in getting e-mail reminders about appointments, online booking of appointments in real time, and receiving updates about new advances in treatment. Patients were also interested in virtual visits for simple and chronic medical problems and for following chronic conditions through virtual means. We concluded that computer-using patients desire Internet services to augment their medical care. As growth and communication via the Internet continue, primary care physicians should move more aggressively toward adding services to their practices’ Internet Web sites beyond the simple provision of information.

Patients are increasingly using the Internet to obtain medical information. A recent Harris poll estimated that 98 million Americans have retrieved health-related information online, an increase of 44 million since 1998.1 Previous studies examined patients’ subjective ratings2 of medical information sites and assessed the quality of medical information available through the World Wide Web.3 However, very little research has been published regarding patients’ interest in “e-health” services.4,5 The health care industry lags far behind other industries in terms of providing useful Internet services for the consumer.

We hypothesized that computer-using patients were interested in using the current and potential future services of Web-based technology to augment their care through clinic-based Web sites. The purpose of this study was to specifically determine the interests and needs of computer-using patients in clinic Web services beyond informational services alone.

Methods

An anonymous survey was given to a convenience sample of patients from 4 Denver Family Medicine clinics, each of which surveyed 40 to 110 patients. The clinical sites used in this survey were socioeconomically diverse and included 1 community-based residency clinic, 1 university-based residency clinic, and 2 health maintenance organization clinics. A total of 600 surveys were distributed. Patient surveys were placed at the front desk, where the personnel were requested to ask patients to complete this voluntary survey. Both computer users and noncomputer users were asked to take the survey, and their computer-using status was noted on the survey. Surveys were completed during the visit and returned to the front desk for collection. The surveys represented visits in these clinics from July 2000 to November 2000. The survey assessed patient demographics, Internet use, and patients’ interest in Internet services. Preferences for 22 Internet services were assessed on a Likert scale of 1 (definitely would not use) to 10 (definitely would use).

Data were analyzed using SPSS version 10 for Windows (SPSS Inc., Chicago, IL). Only computer users were included in the final calculations because of the very small percentage of noncomputer users (7.4%) who volunteered to take the survey. Frequencies were used to describe the computer-using survey respondents, their use of computers, and their preferences for Web-based services. Tests were used to evaluate significant variations among the survey respondents.

Results

Of 600 surveys, 227 were returned (37.8%). Most respondents were female (66.3%) with a mean age of 44.7 years. The vast majority of those who responded to this survey owned computers at home (90.0%) and/or had them at work (83.7%); 44.5% were college graduates and 52.1% had chronic medical conditions. Data on patients’ current use of the Internet are shown in Table 1.
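The response rate reported above follows directly from the two counts given in the text. A minimal sketch of that arithmetic is shown here; the function name `response_rate` is an illustrative choice and does not come from the study.

```python
def response_rate(returned: int, distributed: int) -> float:
    """Return the survey response rate as a percentage, rounded to one decimal."""
    if distributed <= 0:
        raise ValueError("distributed must be positive")
    return round(100 * returned / distributed, 1)


# Counts taken from the Results text: 227 of 600 surveys were returned.
print(response_rate(227, 600))  # 37.8
```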

Patients’ desires for Web-based services are summarized in Table 2. Patients displayed a strong interest in front desk services such as booking appointments in real time over the Internet (mean Likert score, 8.50) and getting e-mail reminders about appointments (mean Likert score, 8.61). Back office services that ranked high included requesting medication refills online (mean Likert score, 8.47) and requesting a referral (mean Likert score, 8.26). The ability to send a message to “your doctor” also ranked high (mean Likert score, 8.40). There was relatively little interest in taking a virtual tour of the clinic (mean Likert score, 6.26) or having a page of links to health insurance company Web sites (mean Likert score, 6.73).

Patients displayed moderate interest in virtual visits (a patient-to-physician encounter conducted using the Internet alone), with 66.0% showing interest in a virtual visit for a simple medical problem. A slightly lower percentage (57.7%) was interested in a virtual visit for a chronic medical problem. Approximately a third of patients (32.6%) were more interested in a real-time virtual visit that used a personal computer (PC) videoconference than in a real-time e-mail conversation (ie, “chat room” or one-on-one “chat”). Not surprisingly, more patients were willing to make a virtual visit if it offered a lower co-payment (62%); only 46.7% of patients indicated they would be interested in a virtual visit if it required the usual co-payment.

Interest in virtual visits for simple medical problems was higher among patients who had previously used the Internet to order products online (74.6% vs 45.0%, P < .001). Patients with chronic diseases were more likely to be interested in virtual visits for simple medical problems (70.8% vs 62.2%, P = .213), although this association was not statistically significant. A higher education level was associated with obtaining medical information over the Internet. College graduates were more likely than nongraduates to have used the Internet to obtain medical information (50% vs 33.6%, P < .05).
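The article does not name the statistical test behind these P values, and it does not report the underlying cell counts. As a hedged illustration only, the sketch below shows how a comparison of two proportions such as 74.6% vs 45.0% could be checked with a Pearson chi-square test on a 2 × 2 table. The choice of test, the function name `chi_square_2x2`, and all four counts are assumptions: the counts are hypothetical values picked only because they reproduce the reported percentages, and they are not the study's data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the test statistic and its P value for 1 degree of freedom.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts, not from the article:
# 100 of 134 online shoppers interested (74.6%), 27 of 60 others interested (45.0%).
interested_shoppers, uninterested_shoppers = 100, 34
interested_others, uninterested_others = 27, 33

statistic, p_value = chi_square_2x2(
    interested_shoppers, uninterested_shoppers,
    interested_others, uninterested_others,
)
print(f"chi-square = {statistic:.2f}, P = {p_value:.5f}")
```

With these illustrative counts the P value falls below .001, which is in line with the level reported in the text; different cell counts that match the same percentages would give a different statistic.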

TABLE 1
Internet use among computer-using patients

Type of use                                             %
Internet used at least once                             93.8
E-mail used as a means of communication                 90.0
Hours of Internet use each week
  0–4                                                   38.4
  5–8                                                   25.8
  9–12                                                  18.2
  13–16                                                 3.0
  >16                                                   14.6
Have used the Internet to order online                  69.2
Have used the Internet to pay bills online              19.1
Have used the Internet to obtain medical information    58.4

TABLE 2
Internet services desired by computer-using patients

Service                                                             Mean Likert score*
Receive e-mail reminders about appointments                         8.61
Receive updates about advances in treatment                         8.56
Make an appointment online with immediate confirmation              8.50
Obtain prescription refills                                         8.47
Send a message to your doctor                                       8.40
Look at your medical records through a secure site                  8.32
Obtain a referral                                                   8.26
Receive e-mail reminders about upcoming health services             8.22
Receive e-mail reminders about upcoming clinic services             8.14
View immunization records                                           8.04
Complete registration/reason for visit online                       8.00
Send updates on health/condition to your doctor                     7.97
Communicate with provider regularly about chronic disease           7.90
Send requests for medical record release                            7.88
Send feedback/suggestions to clinic                                 7.83
Obtain recommendations on good patient education sites              7.48
Request an appointment by e-mail, receive response within 24 h      7.46
Send a message to billing                                           7.45
Obtain specific directions and map to clinic                        6.75
Use a computer in the clinic waiting room for medical information   6.74
Obtain links to health insurance company Web sites                  6.73
Take a virtual tour of the clinic or hospital                       6.26
*Likert scale: from 1 (least important) to 10 (most important).

Discussion

Patients who used computers and the Internet showed significant interest in using Web-based services from their family physicians. These patients were especially interested in using the Internet for front desk services and common tasks, which are frequently provided over a busy telephone line. Services related to providing information were of less interest, and patients displayed only moderate interest in virtual visits. Using PC videoconferencing instead of e-mail communication would increase patients’ interest in a virtual visit. Poor videoconferencing capability over PCs, lack of access, or perhaps a fear of insufficient security over Web-based communications might limit interest.6-8

The survey had several limitations. As noted, only 7.4% of noncomputer users took the survey when requested by front desk staff. Therefore, we limited our analysis to computer-using patients. However, given the current statistics of Internet use and growth in access to all sectors of our population, it is likely that most practices will find a sufficient percentage of “connected” patients to apply the study’s findings. Assessment of online use at a specific clinic site will be useful in prioritizing the need and application of Internet services. The low response rate of our survey is likely due to the voluntary nature of the survey and the challenge of the front desk staff in finding time to encourage patients to take the survey. The practices that participated were busy ones that must move patients in a timely fashion from the front desk area to examination rooms.

Businesses with many employees who use e-commerce and banking services may especially benefit from signing up with a practice that offers online services. Patients with chronic diseases usually require more frequent visits with their physicians. We hope that patients with chronic disease will take advantage of “virtual visits” as they become available, thereby sparing them transportation costs, lost time, and lost productivity.

Other desired services such as online appointment scheduling, medication refills, and referral requests might improve the efficiency of front and back office functions by reducing the number of lengthy telephone calls. We hope to perform future studies that evaluate the impact of Internet services on efficiency and patient/provider satisfaction.

Physicians should place a high priority on building service components into their practice Web sites. Interfacing these Web-based services with electronic medical records is another important task that needs further programmer development and attention by physicians. We hope that continued research in e-health care will further catalyze technologic developments that improve disease management, increase practice efficiency and patient satisfaction, and reduce medical errors.

Acknowledgments

The authors thank Lu Sandoval and Coline Bublitz for their help in preparing the data. They also thank Richard Drexilius, MD, at the Swedish Family Medicine Center; Manoj Pawar, MD, at the Exempla Family Medicine Center; and Carl Severin, MD, at the Kaiser Centerpointe Clinic for allowing the authors to perform the survey at their facilities. Special thanks to Perry Dickinson, MD, for his editorial assistance.

References

1. Taylor H. Explosive growth of “cyberchondriacs” continues. New York: Harris Interactive; August 11, 2000. Available at: http://www.harrisinteractive.com/harris_poll/index.asp?PID=104. Accessed April 7, 2002.

2. Helwig AL, Lovelle A, Guse CE, Gottlieb MS. An office based Internet patient education system: a pilot study. J Fam Pract 1999;48:123-7.

3. Sandvik H. Health information and interaction on the Internet: a survey of female urinary incontinence. BMJ [serial online] 1999;319(7201):29-32. Available at: http://www.bmj.com. Accessed January 12, 2002.

4. Coiera E. Information epidemics, economics, and immunity on the Internet: we still know so little about the effect of information on public health. BMJ [serial online] 1998;317(7171):1469-70. Available at: http://www.bmj.com/. Accessed January 12, 2002.

5. McGinnis J. The ehealth landscape: a terrain map of emerging information and communication technologies in health and health care [Acrobat document]. Princeton, NJ: The Robert Wood Johnson Foundation; 2001:14. Available at: http://www.rwjf.org/app/rw_publications_and_links/publicationsPdfs/eHealth.pdf. Accessed April 7, 2002.

6. California HealthCare Foundation and the Internet Healthcare Coalition. Ethics survey of consumer attitudes about health Websites. Oakland, CA: California HealthCare Foundation; January, 2000. Available at: http://ehealth.chcf.org/view.cfm?section=Consumer&itemID=1740. Accessed January 12, 2002.

7. Patrick JR. Gallup survey finds most Americans shun using Internet for personal health information. Turlock, CA: MedicAlert Foundation; November 13, 2000. Available at: http://www.medicalert.org/blue/pressreleases/galluprelease.asp. Accessed April 7, 2002.

8. Sanborn G. Online healthcare consumers focused on privacy. New York: Cyber Dialogue; July 12, 2000. Available online from fulcrum analytics at: http://www.cyberdialogue.com/news/releases/2000/07-12-cch-privacy.html. Accessed April 7, 2002.

Author and Disclosure Information

FRED GROVER, JR, MD
DAVID H. WU, MD, PHD
CHRISTAL BLANFORD, MD
SHERRY HOLCOMB
DIANA TIDLER, DO
Denver, Colorado
From the Department of Family Medicine, University of Colorado, Denver, CO. The authors report no competing interests. Reprint request should be addressed to Fred Grover Jr, MD, A.F. Williams Family Medicine Center, 5250 Leetsdale Drive, Suite 302, Denver, CO 80246-1452. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(06)
Page Number
570-572
Legacy Keywords
Internet; patient care; communication; computer; technology. (J Fam Pract 2002; 51:570–572)


KEY POINTS FOR CLINICIANS

  • Computer-using patients desire Web-based services to augment their care.
  • Practice Web sites should be designed to go beyond information alone and incorporate services such as online appointments.
  • Physicians should consider providing “virtual visits” to assist with disease management.

Patients are increasingly using the Internet to obtain medical information. Few practice Web sites provide services beyond information about the clinic and common medical diseases. We surveyed computer-using patients at 4 family medicine clinics in Denver, Colorado, by assessing their desire for Internet services from their providers. Patients were especially interested in getting e-mail reminders about appointments, online booking of appointments in real time, and receiving updates about new advances in treatment. Patients were also interested in virtual visits for simple and chronic medical problems and for following chronic conditions through virtual means. We concluded that computer-using patients desire Internet services to augment their medical care. As growth and communication via the Internet continue, primary care physicians should move more aggressively toward adding services to their practices’ Internet Web sites beyond the simple provision of information.

Patients are increasingly using the Internet to obtain medical information. A recent Harris poll estimated that 98 million Americans have retrieved health-related information online, an increase of 44 million since 1998.1 Previous studies examined patients’ subjective ratings2 of medical information sites and assessed the quality of medical information available through the World Wide Web.3 However, very little research has been published regarding patients’ interest in “e-health” services.4,5 The health care industry lags far behind other industries in terms of providing useful Internet services for the consumer.

We hypothesized that computer-using patients were interested in using the current and potential future services of Web-based technology to augment their care through clinic-based Web sites. The purpose of this study was to specifically determine the interests and needs of computer-using patients in clinic Web services beyond informational services alone.

Methods

An anonymous survey was given to a convenience sample of patients from 4 Denver Family Medicine clinics, with each surveying anywhere from 40 to 110 patients. The clinical sites used in this survey were socioeconomically diverse and included 1 community-based residency clinic, 1 university-based residency clinic, and 2 health maintenance organization clinics. A total of 600 surveys were distributed. Patient surveys were placed at the front desk, where the personnel were requested to ask patients to complete this volunteer survey. Computer and noncomputer users were asked to take the survey and their computer-using status was noted on the survey. Surveys were completed during the visit and returned to the front desk for collection. The surveys represented visits in these clinics from July 2000 to November 2000. This anonymous survey assessed patient demographics, Internet use, and patients’ interest in Internet services. Preferences for 22 Internet services were assessed on a Likert scale of 1 (definitely would not use) to 10 (definitely would use).

Data were analyzed using SPSS version 10 for Windows (SPSS Inc., Chicago, IL). Only computer users were included in the final calculations because of the very small percentage of noncomputer users (7.4%) who volunteered to take the survey. Frequencies were used to describe the computer-using survey respondents, their use of computers, and their preferences for Web-based services. Tests were used to evaluate significant variations among the survey respondents.

Results

Of 600 surveys, 227 were returned (37.8%). Most respondents were female (66.3%) with a mean age of 44.7 years. The vast majority of those who responded to this survey owned computers at home (90.0%) and/or had them at work (83.7%); 44.5% were college graduates and 52.1% had chronic medical conditions. Data on patients’ current use of the Internet are shown in Table 1.

Patients’ desires for Web-based services are summarized in Table 2. Patients displayed a strong interest in front desk services such as booking appointments in real time over the Internet (mean Likert score, 8.50) and getting e-mail reminders about appointments (mean Likert score, 8.61). Highly ranked back office services included requesting medication refills online (mean Likert score, 8.47) and requesting a referral (mean Likert score, 8.26). The ability to send a message to “your doctor” also ranked high (mean Likert score, 8.40). There was relatively little interest in taking a virtual tour of the clinic (mean Likert score, 6.26) or having a page of links to health insurance company Web sites (mean Likert score, 6.73).

Patients displayed moderate interest in virtual visits (a patient-to-physician encounter conducted using the Internet alone), with 66.0% showing interest in a virtual visit for a simple medical problem. A slightly lower percentage (57.7%) was interested in a virtual visit for a chronic medical problem. Approximately a third of patients (32.6%) were more interested in a real-time virtual visit that used a personal computer (PC) videoconference than in a real-time e-mail conversation (ie, “chat room” or one-on-one “chat”). Not surprisingly, a larger percentage of patients was willing to make a virtual visit if it offered a lower co-payment (62%). Only 46.7% of patients indicated they would be interested in a virtual visit if it required the usual co-payment.

Interest in virtual visits for simple medical problems was higher among patients who had previously used the Internet to order products online (74.6% vs 45.0%, P < .001). Patients with chronic diseases were more likely to be interested in virtual visits for simple medical problems (70.8% vs 62.2%, P = .213), although this association was not statistically significant. A higher education level was associated with obtaining medical information over the Internet. College graduates were more likely than nongraduates to have used the Internet to obtain medical information (50% vs 33.6%, P < .05).

TABLE 1
Internet use among computer-using patients

Type of use | %
Internet used at least once | 93.8
E-mail used as a means of communication | 90.0
Hours of Internet use each week |
  0–4 | 38.4
  5–8 | 25.8
  9–12 | 18.2
  13–16 | 3.0
  >16 | 14.6
Have used the Internet to order online | 69.2
Have used the Internet to pay bills online | 19.1
Have used the Internet to obtain medical information | 58.4

TABLE 2
Internet services desired by computer-using patients

Service | Mean Likert score*
Receive e-mail reminders about appointments | 8.61
Receive updates about advances in treatment | 8.56
Make an appointment online with immediate confirmation | 8.50
Obtain prescription refills | 8.47
Send a message to your doctor | 8.40
Look at your medical records through a secure site | 8.32
Obtain a referral | 8.26
Receive e-mail reminders about upcoming health services | 8.22
Receive e-mail reminders about upcoming clinic services | 8.14
View immunization records | 8.04
Complete registration/reason for visit online | 8.00
Send updates on health/condition to your doctor | 7.97
Communicate with provider regularly about chronic disease | 7.90
Send requests for medical record release | 7.88
Send feedback/suggestions to clinic | 7.83
Obtain recommendations on good patient education sites | 7.48
Request an appointment by e-mail, receive response within 24 h | 7.46
Send a message to billing | 7.45
Obtain specific directions and map to clinic | 6.75
Use a computer in the clinic waiting room for medical information | 6.74
Obtain links to health insurance company Web sites | 6.73
Take a virtual tour of the clinic or hospital | 6.26
*Likert scale: from 1 (least important) to 10 (most important).

Discussion

Patients who used computers and the Internet showed significant interest in using Web-based services from their family physicians. These patients were especially interested in using the Internet for front desk services and common tasks, which are frequently provided over a busy telephone line. Services related to providing information were of less interest, and patients displayed only moderate interest in virtual visits. Using PC videoconferencing instead of e-mail communication would increase patients’ interest in a virtual visit. Poor videoconferencing capability over PCs, lack of access, or perhaps a fear of insufficient security over Web-based communications might limit interest.6-8

The survey had several limitations. As noted, only 7.4% of noncomputer users took the survey when requested by front desk staff. Therefore, we limited our analysis to computer-using patients. However, given the current statistics of Internet use and growth in access to all sectors of our population, it is likely that most practices will find a sufficient percentage of “connected” patients to apply the study’s findings. Assessment of online use at a specific clinic site will be useful in prioritizing the need and application of Internet services. The low response rate of our survey is likely due to the voluntary nature of the survey and the challenge of the front desk staff in finding time to encourage patients to take the survey. The practices that participated were busy ones that must move patients in a timely fashion from the front desk area to examination rooms.

Businesses with many employees who use e-commerce and banking services may especially benefit from signing up with a practice that offers online services. Patients with chronic diseases usually require more frequent visits with their physicians. We hope that patients with chronic disease will take advantage of “virtual visits” as they become available, thereby sparing them transportation costs and lost time and productivity.

Other desired services such as online appointment scheduling, medication refills, and referral requests might improve the efficiency of front and back office functions by reducing the number of lengthy telephone calls. We hope to perform future studies that evaluate the impact of Internet services on efficiency and patient/provider satisfaction.

Physicians should place a high priority on building service components into their practice Web sites. Interfacing these Web-based services with electronic medical records is another important task that needs further programmer development and attention by physicians. We hope that continued research in e-health care will further catalyze technologic developments that improve disease management, increase practice efficiency and patient satisfaction, and reduce medical errors.

Acknowledgments

The authors thank Lu Sandoval and Coline Bublitz for their help in preparing the data. They also thank Richard Drexilius, MD, at the Swedish Family Medicine Center; Manoj Pawar, MD, at the Exempla Family Medicine Center; and Carl Severin, MD, at the Kaiser Centerpointe Clinic for allowing the authors to perform the survey at their facilities. Special thanks to Perry Dickinson, MD, for his editorial assistance.

References

1. Taylor H. Explosive growth of “cyberchondriacs” continues. New York: Harris Interactive; August 11, 2000. Available at: http://www.harrisinteractive.com/harris_poll/index.asp?PID=104. Accessed April 7, 2002.

2. Helwig AL, Lovelle A, Guse CE, Gottlieb MS. An office based Internet patient education system: a pilot study. J Fam Pract 1999;48:123-7.

3. Sandvik H. Health information and interaction on the Internet: a survey of female urinary incontinence. BMJ [serial online] 1999;319(7201):29-32. Available at: http://www.bmj.com. Accessed January 12, 2002.

4. Coiera E. Information epidemics, economics, and immunity on the Internet: we still know so little about the effect of information on public health. BMJ [serial online] 1998;317(7171):1469-70. Available at: http://www.bmj.com/. Accessed January 12, 2002.

5. McGinnis J. The ehealth landscape: a terrain map of emerging information and communication technologies in health and health care [Acrobat document]. Princeton, NJ: The Robert Wood Johnson Foundation; 2001:14. Available at: http://www.rwjf.org/app/rw_publications_and_links/publicationsPdfs/eHealth.pdf. Accessed April 7, 2002.

6. California HealthCare Foundation and the Internet Healthcare Coalition. Ethics survey of consumer attitudes about health Websites. Oakland, CA: California HealthCare Foundation; January, 2000. Available at: http://ehealth.chcf.org/view.cfm?section=Consumer&itemID=1740. Accessed January 12, 2002.

7. Patrick JR. Gallup survey finds most Americans shun using Internet for personal health information. Turlock, CA: MedicAlert Foundation; November 13, 2000. Available at: http://www.medicalert.org/blue/pressreleases/galluprelease.asp. Accessed April 7, 2002.

8. Sanborn G. Online healthcare consumers focused on privacy. New York: Cyber Dialogue; July 12, 2000. Available online from fulcrum analytics at: http://www.cyberdialogue.com/news/releases/2000/07-12-cch-privacy.html. Accessed April 7, 2002.

Issue
The Journal of Family Practice - 51(06)
Page Number
570-572
Display Headline
Computer-using patients want Internet services from family physicians
Legacy Keywords
Internet; patient care; communication; computer; technology. (J Fam Pract 2002; 51:570–572)

Reasons for after-hours calls


KEY POINTS FOR CLINICIANS

  • High utilizers (6 or more calls per year) represented 0.6% of active patients but accounted for 23% of calls.
  • The most common reasons for after-hours calls were medication refills and concerns, pain, issues of pregnant patients, and fever.
  • The number of after-hours calls peaked in the spring and summer, and doubled on Saturdays.

Previous studies of after-hours calls to family physicians focused on caller demographics, medical triage skills, and patient satisfaction, and were usually conducted for a limited time. We examined the frequency and nature of calls to a family practice residency over 1 year. Caller and patient information, date, time, and chief complaint were obtained from answering service logs. The 5 most frequent chief complaints related to medications, pain, obstetric issues, fever, and nausea. Interestingly, 56 “high utilizers” (0.6% of all patients) accounted for 23% of the calls.

Although telephone calls may account for 10% to 25% of all patient contacts,1,2 few studies have examined the frequency and nature of these calls over an extended time. A month-long study3 found that patients who telephoned after hours were 3 times more likely to rate their problem in the highest severity category compared with the physician’s rating of the problem. This study, done in July, may not reflect the diversity of patient problems, because of seasonal variations; also, it did not appear to include obstetric problems, which are a prominent reason for calls to family practice physicians.4,5 Many physician groups use answering services to screen calls as a method for decreasing the number of calls. The purpose of this study was to document the frequency and nature of after-hours calls to a family practice office over 1 year.

METHODS

All after-hours telephone calls (5 PM to 8 AM, weekends and holidays) made to a freestanding community-based family practice training program were collected for the 12-month period between April 2000 and March 2001. A recorded message directed the caller to call 911 for a life-threatening emergency or stay on the line for operator assistance. Emergency calls were forwarded to the resident physician on call. Sixteen family medicine residents supported by 8 faculty physicians took primary calls on a rotating basis. The practice had approximately 9000 active patients (at least 1 visit in the last 3 years), and about 1350 patient visits per month. Approximately 30% were covered by Medicaid, 10% by Medicare, 35% by managed care, and 12% by indemnity insurance; 13% were uninsured.

The operator recorded date and time, caller’s and patient’s first and last names, primary care physician, patient’s pregnancy status, date of last office visit, chief complaint(s), and whether the caller felt the situation was an emergency.

Previous studies variously classified patient calls based on diagnostic group, chief complaint, symptom, treatment and medication, injury, and organ system affected.1,3,6-10 We followed the lead of Benjamin8 and Perkins and colleagues,1 who used the patient’s chief complaint to categorize calls. We classified the patient’s chief complaint by searching for key words such as “heart” (eg, “fast heartbeat,” “pains near heart,” or “isn’t feeling well, heart failure a couple of years ago”). This allowed for the broad inclusion of chief complaints while avoiding the risk of premature diagnosis.
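The key-word approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study does not publish its key-word dictionary, so the headings and word lists in `KEYWORDS`, the function name `classify`, and the sample call texts are all hypothetical. The sketch also shows why a call with several symptoms is counted under more than one heading.

```python
from collections import Counter

# Hypothetical key-word lists; the study's actual dictionary is not published.
KEYWORDS = {
    "Heart": ["heart"],
    "Fever": ["fever", "temperature"],
    "Headache/migraine": ["headache", "migraine"],
    "Medication": ["medication", "refill", "prescription"],
}

def classify(complaint):
    """Return every heading whose key words appear in the complaint text."""
    text = complaint.lower()
    return [heading for heading, words in KEYWORDS.items()
            if any(word in text for word in words)]

# Illustrative call log entries (not study data).
calls = ["fast heartbeat", "fever and headache", "needs a refill"]

# A multiple-symptom call contributes one count to each matching heading,
# so the number of complaints can exceed the number of calls.
counts = Counter(heading for call in calls for heading in classify(call))
total_complaints = sum(counts.values())
```

Matching on substrings keeps the classification tied to the caller's own words rather than to a diagnosis, which is the property the authors emphasize; a real dictionary would need review for false matches.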

A research assistant entered information from the operator’s records into a Microsoft Access database. Patients who called more than 6 times after hours during the year were arbitrarily defined as “high utilizers.” We also gathered data on these callers’ hospital emergency room visits and admissions to affiliated hospitals. The HealthOne Institutional Review Board approved the study.

RESULTS

A total of 3538 calls were made by 1564 patients; 2465 were clinical calls, and key words or phrases were used to classify them under chief complaint headings. If a caller had a multiple-symptom complaint (eg, fever and headache), it was classified under all appropriate headings and counted once under each. The total number of complaints is therefore higher than the total number of calls. Table 1 presents the frequency and percentage of after-hours clinical calls for all subjects, and separately for high utilizers. Table 2 presents the average number of clinical calls organized by season and day of the week. Thirty-three percent of all calls were made by the patient, 31% by a proxy (spouse, parent, friend), and 36% by other parties (nurse, pharmacy, unidentified party).

Although the rankings of calls for all patients and high utilizers in Table 1 were similar, several differences stand out. High utilizers account for only 0.6% of patients, but 23% of all calls. High utilizers called substantially more for complaints relating to medication, pain, asthma/breathing and chest problems; 39% of their calls were for medication or pain concerns. Of the high utilizers, 39% (22/56) made 46 emergency room visits, but only 7% (4/56) were hospitalized during the year.
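The headline proportions for high utilizers can be reproduced from figures given in the text and in the “Total calls” row of Table 1. The short check below is a sketch, not the authors' analysis; it assumes that the 23% figure is computed on clinical calls (information-only calls excluded) and that the denominator for the 0.6% figure is the approximate count of 9000 active patients. The variable names are illustrative.

```python
# Figures taken from the text and from the totals rows of Table 1.
active_patients = 9000   # approximate, per Methods
high_utilizers = 56
calls_high = 559         # clinical calls from high utilizers
calls_other = 1906       # clinical calls from all other callers

share_of_patients = high_utilizers / active_patients
share_of_calls = calls_high / (calls_high + calls_other)
calls_per_high_utilizer = calls_high / high_utilizers

print(f"{share_of_patients:.1%} of active patients")          # 0.6%
print(f"{share_of_calls:.1%} of clinical calls")              # 22.7%
print(f"{calls_per_high_utilizer:.1f} calls per high utilizer")  # 10.0
```

Under these assumptions the two call totals sum to the 2465 clinical calls reported in the Results, and the share of calls rounds to the 23% quoted.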

TABLE 1
Percentage of after-hours calls, by chief complaint

Number of complaints (%)*
Chief complaint | All subjects except utilizers (n = 1564) | High utilizers (n = 56)
Medication | 288 (15.1) | 110 (19.7)
Pain | 197 (10.3) | 107 (19.1)
Obstetric | 195 (10.2) | 32 (5.7)
Fever | 191 (10.0) | 28 (5.0)
Nausea/vomiting | 108 (5.7) | 31 (5.5)
Blood/bleeding | 84 (4.4) | 32 (5.7)
Infection | 72 (3.8) | 24 (4.3)
Stomach | 70 (3.7) | 16 (2.9)
Headache/migraine | 67 (3.2) | 19 (3.4)
Asthma/breathing | 58 (3.0) | 32 (5.7)
Back | 55 (2.9) | 16 (2.9)
Laboratory results | 54 (2.8) | 8 (1.4)
Cough | 46 (2.4) | 6 (1.1)
Eye | 42 (2.2) | 8 (1.4)
Diarrhea | 41 (2.2) | 7 (1.2)
Throat | 38 (2.0) | 6 (1.1)
Fall | 36 (1.9) | 10 (1.8)
Rash | 34 (1.8) | 3 (0.5)
Ear | 33 (1.7) | 7 (1.2)
Chest | 30 (1.6) | 19 (3.4)
Total of top 20 complaints | 1739 | 521
All other complaints | 625 (32.7) | 184 (32.9)
Total complaints | 2364 | 705
Total calls | 1906 | 559
Multiple complaint calls | 458 (24.0) | 146 (26.1)
Average calls per subject | 1.3 | 10.0
*Information-only calls (n = 1073) not included.
Includes nonobstetric problems in pregnant patients.

TABLE 2
Average number of clinical calls by season and day of week

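Season | Mon | Tue | Wed | Thur | Fri | Sat | Sun | Seasonal average
Winter (Dec–Feb) | 8.9 | 8.7 | 6.1 | 8.1 | 9.1 | 16.6 | 11.5 | 9.9
Spring (March–May) | 10.0 | 8.5 | 8.2 | 8.5 | 8.2 | 16.2 | 13.6 | 10.4
Summer (Jun–Aug) | 12.5 | 8.8 | 8.8 | 8.8 | 8.2 | 15.5 | 12.0 | 10.6
Fall (Sep–Nov) | 9.1 | 6.5 | 8.4 | 6.7 | 8.4 | 12.3 | 9.0 | 8.6
Daily average | 10.1 | 8.1 | 7.8 | 8.0 | 8.5 | 15.6 | 11.5 |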

DISCUSSION

This study expands on previous work by describing the total variety of after-hours phone calls to a family practice office over an entire year. Our findings on reasons for call, time of call, and demographics are similar to those of previous work.3,10 However, our study is one of the first to describe the subset of high utilizers. Introducing a patient health handbook, practice Web site, pharmacy help line, or other practice management tools might reduce the number of “information only” calls. Contrary to our expectation, the highest numbers of average daily calls were in the spring and summer and not in the winter. Saturdays and Sundays were the busiest days of the week for such calls.
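The weekend effect can be quantified from the “Daily average” row of Table 2. The sketch below compares Saturday and Sunday with the mean of the five weekdays; treating the unweighted weekday mean as the baseline is our assumption, and the variable names are illustrative.

```python
# Daily averages of clinical calls, copied from the last row of Table 2.
daily_average = {"Mon": 10.1, "Tue": 8.1, "Wed": 7.8, "Thur": 8.0,
                 "Fri": 8.5, "Sat": 15.6, "Sun": 11.5}

weekdays = ["Mon", "Tue", "Wed", "Thur", "Fri"]
weekday_mean = sum(daily_average[day] for day in weekdays) / len(weekdays)

# Ratio of each weekend day to the weekday baseline.
saturday_ratio = daily_average["Sat"] / weekday_mean
sunday_ratio = daily_average["Sun"] / weekday_mean

print(f"Weekday mean: {weekday_mean:.1f} calls")
print(f"Saturday is {saturday_ratio:.2f} times the weekday mean")
print(f"Sunday is {sunday_ratio:.2f} times the weekday mean")
```

On this baseline, Saturday volume comes out at a little under twice the weekday mean and Sunday at roughly one-third higher, which is consistent with the statement that weekends were the busiest days.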

Patients called for diverse clinical reasons (Table 1) and therefore physicians might focus their attention on the most frequent reasons for calls, in order to improve the effectiveness of their educational efforts. For example, physicians might discuss the patient’s medication concerns, give specific recommendations to talk to the pharmacist, and possibly offer an automated medication “tracking system” to alert patients during the week when their medications were running out, as a way of reducing the number of calls and allaying patient concerns.

Pain symptoms clearly account for a substantial number of calls. Although some of these calls might be serious emergencies (chest pain) and require immediate action, other calls, such as for migraine headaches, may point to a need to educate and set limits with patients during their regular appointments. For example, patients could be told that migraine headaches are not a “life-threatening” emergency and be urged to use self-management strategies until the next day.

Discussing fever management with new parents at well-child visits might decrease future calls. There is some research to suggest that providing new parents with specific guidelines about when to call if their child has a fever can dramatically reduce after-hours visits to the emergency room.11 Obstetric calls represent an important group requiring immediate callback with very specific questions (eg, fetal movement, bleeding), and might be a target area for physician education.

Out of approximately 9000 patients in the practice and 1564 patients who called the practice during the year, we identified 56 high utilizers (0.6% of all patients). They averaged nearly 10 calls per year in contrast to 1.3 calls for all other callers. Future research might be directed at trying to determine why these patients feel a need to call at nearly 10 times the rate of other patients.

These findings should be interpreted in light of several limitations. Because our findings are based on a family practice residency, the patient population may be different from the typical private family practice office and have less continuity. However, the wide range of calls is likely to be typical of the diverse problems managed by family physicians. This study did not collect information on the management and disposition of these after-hours calls. Certainly, understanding the entire episode of after-hours contact (reason for call, management, outcome, satisfaction) is important, and is the next step in our research.

The diversity and seriousness of medical problems addressed by the after-hours physician highlight the need to provide specific training to physicians for dealing with patient calls and educating patients on the many issues leading to after-hours calls.

ACKNOWLEDGMENTS

The authors thank Ellie Jensen for help with data collection, entry, and analysis.

References

1. Perkins A, Gagnon R, deGruy F. A comparison of after-hours telephone calls concerning ambulatory and nursing home patients. J Fam Pract 1993;37:247-50.

2. Hannis MD, Hazard RL, Rothschild M, Elnicki DM, Keyserling TC, DeVellis RF. Physician attitudes regarding telephone medicine. J Gen Intern Med 1996;11:678-83.

3. Greenhouse D, Probst J. After hours telephone calls in a family practice residency: volume, seriousness and patient satisfaction. Fam Med 1995;27:525-30.

4. Spencer DC, Daugird AJ. The nature and content of physician telephone calls in private practice. J Fam Pract 1988;27:201-5.

5. Bergman JJ, Rosenblatt RA. After hours calls: a 5-year longitudinal study in a family practice group. J Fam Pract 1982;15:101-6.

6. Poole SR, Schmitt BD, Carruth T, Peterson-Smith A, Slusarski M. After-hours telephone coverage: the application of an area-wide telephone triage and advice system for pediatric practices. Pediatrics 1993;92:670-9.

7. Hildebrandt D, Nicholas D, Westfall J. The development of continuity of care and patient satisfaction in a family medicine residency: a 3 year longitudinal study. In preparation, 2001.

8. Benjamin JT. Pediatric residents’ telephone triage experience: relevant to general pediatric practice? Arch Pediatr Adolesc Med 1997;151:1254-7.

9. Crane JD, Benjamin JT. Pediatric residents’ telephone triage experience. Arch Pediatr Adolesc Med 2000;154:71-4.

10. Peters RM. After hours telephone calls to general and subspecialty internists: an observational study. J Gen Intern Med 1994;9:554-7.

11. O’Neill-Murphy K, Liebman M, Barnsteiner JH. Fever education: does it reduce parent fever anxiety? Pediatr Emerg Care 2001;17:47-51.

Author and Disclosure Information

DAVID E. HILDEBRANDT, PHD
JOHN M. WESTFALL, MD, MPH
Denver, Colorado
From the Rose Family Medicine Residency, Denver, CO (D.E.H.) and the Department of Family Medicine, UCHSC at Fitzsimons, Aurora, CO (J.M.W.). The authors report no competing interests. Address reprint requests to David E. Hildebrandt, PhD, Rose Family Medicine Residency, 2149 S. Holly, Denver, CO 80222. Email: [email protected].

Issue
The Journal of Family Practice - 51(06)
Page Number
567-569
Legacy Keywords
Family practice; triage; emergency service. (J Fam Pract 2002; 51:567–569)

KEY POINTS FOR CLINICIANS

  • High utilizers (6 or more calls per year) represented 0.6% of active patients but accounted for 23% of calls.
  • The most common reasons for after-hours calls were medication refills and concerns, pain, issues of pregnant patients, and fever.
  • The number of after-hours calls peaked in the spring and summer, and doubled on Saturdays.

Previous studies of after-hours calls to family physicians focused on caller demographics, medical triage skills, and patient satisfaction, and were usually conducted for a limited time. We examined the frequency and nature of calls to a family practice residency over 1 year. Caller and patient information, date, time, and chief complaint were obtained from answering service logs. The 5 most frequent chief complaints related to medications, pain, obstetric issues, fever, and nausea. Interestingly, 56 “high utilizers” (0.6% of all patients) accounted for 23% of the calls.

Although telephone calls may account for 10% to 25% of all patient contacts,1,2 few studies have examined the frequency and nature of these calls over an extended time. A month-long study3 found that patients who telephoned after hours were 3 times more likely to rate their problem in the highest severity category compared with the physician’s rating of the problem. This study, done in July, may not reflect the diversity of patient problems, because of seasonal variations; also, it did not appear to include obstetric problems, which are a prominent reason for calls to family practice physicians.4,5 Many physician groups use answering services to screen calls as a method for decreasing the number of calls. The purpose of this study was to document the frequency and nature of after-hours calls to a family practice office over 1 year.

METHODS

All after-hours telephone calls (5 PM to 8 AM, weekends and holidays) made to a freestanding community-based family practice training program were collected for the 12-month period between April 2000 and March 2001. A recorded message directed the caller to call 911 for a life-threatening emergency or stay on the line for operator assistance. Emergency calls were forwarded to the resident physician on call. Sixteen family medicine residents supported by 8 faculty physicians took primary calls on a rotating basis. The practice had approximately 9000 active patients (at least 1 visit in the last 3 years), and about 1350 patient visits per month. Approximately 30% were covered by Medicaid, 10% by Medicare, 35% by managed care, and 12% by indemnity insurance; 13% were uninsured.

The operator recorded date and time, caller’s and patient’s first and last names, primary care physician, patient’s pregnancy status, date of last office visit, chief complaint(s), and whether the caller felt the situation was an emergency.

Previous studies variously classified patient calls based on diagnostic group, chief complaint, symptom, treatment and medication, injury, and organ system affected.1,3,6-10 We followed the lead of Benjamin8and Perkins and colleagues,1 who used the patient’s chief complaint to categorize calls. We classified the patient’s chief complaint by searching for key words such as “heart” (eg, “fast heartbeat,” “pains near heart,” or “isn’t feeling well, heart failure a couple of years ago”). This allowed for the broad inclusion of chief complaints while avoiding the risk of premature diagnosis.

A research assistant entered information from the operator’s records into a Microsoft Access Database. Patients who called more than 6 times after hours during the year were arbitrarily defined as “high utilizers.” We also gathered data on these callers’ hospital emergency room visits and admissions to affiliated hospitals. The HealthOne Institutional Review Board approved the study.

RESULTS

A total of 3538 calls were made by 1564 patients; 2465 were clinical calls, and key words or phrases were used to classify them under chief complaint headings. If a caller had a multiple-symptom complaint (ie, fever and headache), it was classified under all appropriate headings and counted twice. The total number of complaints is therefore higher than the total number of calls. Table 1 presents the frequency and percentage of after-hours clinical calls for all subjects, and separately for high utilizers. Table 2 presents the average number of clinical calls organized by season and day of the week. Thirty-three percent of all calls were made by the patient, 31% by a proxy (spouse, parent, friend), and 36% by other parties (nurse, pharmacy, unidentified party).

Although the rankings of calls for all patients and high utilizers in Table 1 were similar, several differences stand out. High utilizers account for only 0.6% of patients, but 23% of all calls. High utilizers called substantially more for complaints relating to medication, pain, asthma/breathing and chest problems; 39% of their calls were for medication or pain concerns. Of the high utilizers, 39% (22/56) made 46 emergency room visits, but only 7% (4/56) were hospitalized during the year.

 

 

TABLE 1
Percentage of after-hours calls, by chief complaint

 Number of complaints (%)*
Chief complaintAll subjects except utilizers (n = 1564)High utilizers (n = 56)
Medication288 (15.1)110 (19.7)
Pain197 (10.3)107 (19.1)
Obstetric195 (10.2)32 (5.7)
Fever191 (10.0)28 (5.0)
Nausea/vomiting108 (5.7)31 (5.5)
Blood/bleeding84 (4.4)32 (5.7)
Infection72 (3.8)24 (4.3)
Stomach70 (3.7)16 (2.9)
Headache/migraine67 (3.2)19 (3.4)
Asthma/breathing58 (3.0)32 (5.7)
Back55 (2.9)16 (2.9)
Laboratory results54 (2.8)8 (1.4)
Cough46 (2.4)6 (1.1)
Eye42 (2.2)8 (1.4)
Diarrhea41 (2.2)7 (1.2)
Throat38 (2.0)6 (1.1)
Fall36 (1.9)10 (1.8)
Rash34 (1.8)3 (0.5)
Ear33 (1.7)7 (1.2)
Chest30 (1.6)19 (3.4)
Total of top 20 complaints1739521
All other complaints625 (32.7)184 (32.9)
Total complaints2364705
Total calls1906559
Multiple complaint calls458 (24.0)146 (26.1)
Average calls per subject1.310.0
*Information-only calls (n = 1073) not included.
Includes nonobstetric problems in pregnant patients.

TABLE 2
Average number of clinical calls by season and day of week

SeasonMonTueWedThurFriSatSunSeasonal average
Winter (Dec–Feb)8.98.76.18.19.116.611.59.9
Spring (March–May)10.08.58.28.58.216.213.610.4
Summer (Jun–Aug)12.58.88.88.88.215.512.010.6
Fall (Sep–Nov)9.16.58.46.78.412.39.08.6
Daily average10.18.17.88.08.515.611.5 

DISCUSSION

This study expands on previous work by describing the total variety of after-hours phone calls to a family practice office over an entire year. Our findings on reasons for call, time of call, and demographics are similar to those of previous work.3,10 However, our study is one of the first to describe the subset of high utilizers. Introducing a patient health handbook, practice Web site, pharmacy help line, or other practice management tools might reduce the number of “information only” calls. Contrary to our expectation, the highest numbers of average daily calls were in the spring and summer and not in the winter. Saturdays and Sundays were the busiest days of the week for such calls.

Patients called for diverse clinical reasons (Table 1). To make their educational efforts more effective, physicians might focus on the most frequent reasons for calls. For example, physicians might discuss the patient’s medication concerns, give specific recommendations to talk to the pharmacist, and possibly offer an automated medication “tracking system” that alerts patients during the week when their medications are running out, as a way of reducing the number of calls and allaying patient concerns.

Pain symptoms clearly account for a substantial number of calls. Although some of these calls might be serious emergencies (eg, chest pain) and require immediate action, other calls, such as those for migraine headaches, may point to a need to educate and set limits with patients during their regular appointments. For example, patients could be told that migraine headaches are not a “life-threatening” emergency and be urged to use self-management strategies until the next day.

Discussing fever management with new parents at well-child visits might decrease future calls. There is some research to suggest that providing new parents with specific guidelines about when to call if their child has a fever can dramatically reduce after-hours visits to the emergency room.11 Obstetric calls represent an important group requiring immediate callback with very specific questions (eg, fetal movement, bleeding), and might be a target area for physician education.

Out of approximately 9000 patients in the practice and 1564 patients who called the practice during the year, we identified 56 high utilizers (0.6% of all patients). They averaged nearly 10 calls per year in contrast to 1.3 calls for all other callers. Future research might be directed at trying to determine why these patients feel a need to call at nearly 10 times the rate of other patients.

These findings should be interpreted in light of several limitations. Because our findings are based on a family practice residency, the patient population may be different from the typical private family practice office and have less continuity. However, the wide range of calls is likely to be typical of the diverse problems managed by family physicians. This study did not collect information on the management and disposition of these after-hours calls. Certainly, understanding the entire episode of after-hours contact (reason for call, management, outcome, satisfaction) is important, and is the next step in our research.

The diversity and seriousness of medical problems addressed by the after-hours physician highlight the need to provide specific training to physicians for dealing with patient calls and educating patients on the many issues leading to after-hours calls.

ACKNOWLEDGMENTS

The authors thank Ellie Jensen for help with data collection, entry, and analysis.

KEY POINTS FOR CLINICIANS

  • High utilizers (6 or more calls per year) represented 0.6% of active patients but accounted for 23% of calls.
  • The most common reasons for after-hours calls were medication refills and concerns, pain, issues of pregnant patients, and fever.
  • The number of after-hours calls peaked in the spring and summer, and doubled on Saturdays.

Previous studies of after-hours calls to family physicians focused on caller demographics, medical triage skills, and patient satisfaction, and were usually conducted for a limited time. We examined the frequency and nature of calls to a family practice residency over 1 year. Caller and patient information, date, time, and chief complaint were obtained from answering service logs. The 5 most frequent chief complaints related to medications, pain, obstetric issues, fever, and nausea. Interestingly, 56 “high utilizers” (0.6% of all patients) accounted for 23% of the calls.

Although telephone calls may account for 10% to 25% of all patient contacts,1,2 few studies have examined the frequency and nature of these calls over an extended time. A month-long study3 found that patients who telephoned after hours were 3 times more likely to rate their problem in the highest severity category compared with the physician’s rating of the problem. This study, done in July, may not reflect the diversity of patient problems, because of seasonal variations; also, it did not appear to include obstetric problems, which are a prominent reason for calls to family practice physicians.4,5 Many physician groups use answering services to screen calls as a method for decreasing the number of calls. The purpose of this study was to document the frequency and nature of after-hours calls to a family practice office over 1 year.

METHODS

All after-hours telephone calls (5 PM to 8 AM, weekends and holidays) made to a freestanding community-based family practice training program were collected for the 12-month period between April 2000 and March 2001. A recorded message directed the caller to call 911 for a life-threatening emergency or stay on the line for operator assistance. Emergency calls were forwarded to the resident physician on call. Sixteen family medicine residents supported by 8 faculty physicians took primary calls on a rotating basis. The practice had approximately 9000 active patients (at least 1 visit in the last 3 years), and about 1350 patient visits per month. Approximately 30% were covered by Medicaid, 10% by Medicare, 35% by managed care, and 12% by indemnity insurance; 13% were uninsured.

The operator recorded date and time, caller’s and patient’s first and last names, primary care physician, patient’s pregnancy status, date of last office visit, chief complaint(s), and whether the caller felt the situation was an emergency.

Previous studies variously classified patient calls based on diagnostic group, chief complaint, symptom, treatment and medication, injury, and organ system affected.1,3,6-10 We followed the lead of Benjamin8 and Perkins and colleagues,1 who used the patient’s chief complaint to categorize calls. We classified the patient’s chief complaint by searching for key words such as “heart” (eg, “fast heartbeat,” “pains near heart,” or “isn’t feeling well, heart failure a couple of years ago”). This allowed for the broad inclusion of chief complaints while avoiding the risk of premature diagnosis.
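The key-word approach described above can be sketched in a few lines of Python. This is only an illustration: the category names and key-word lists in `KEYWORDS` are hypothetical (the study's actual lists are not given here), and plain substring matching is an assumption about how the search was done.

```python
# Hypothetical key-word lists; the study's real lists are not reported here.
KEYWORDS = {
    "Heart": ["heart"],
    "Fever": ["fever", "temperature"],
    "Headache/migraine": ["headache", "migraine"],
    "Medication": ["medication", "refill", "prescription"],
}


def classify_complaint(text):
    """Return every category whose key words appear in the operator's note.

    A note can match several categories, mirroring the way multiple-symptom
    complaints were counted under all appropriate headings.
    """
    lowered = text.lower()
    matches = [
        category
        for category, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    ]
    # Notes that match no key word fall into a catch-all group.
    return matches or ["Other"]


examples = [
    "fast heartbeat",
    "pains near heart",
    "Fever and headache since noon",
]
for note in examples:
    print(note, "->", classify_complaint(note))
```

Substring matching is deliberately broad: it would also catch "heartburn" under the hypothetical "Heart" heading, so a real key-word list would need manual review of the matches.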

A research assistant entered information from the operator’s records into a Microsoft Access database. Patients who called more than 6 times after hours during the year were arbitrarily defined as “high utilizers.” We also gathered data on these callers’ hospital emergency room visits and admissions to affiliated hospitals. The HealthOne Institutional Review Board approved the study.

RESULTS

A total of 3538 calls were made by 1564 patients; 2465 were clinical calls, and key words or phrases were used to classify them under chief complaint headings. If a caller had a multiple-symptom complaint (eg, fever and headache), it was classified under all appropriate headings and counted once under each. The total number of complaints is therefore higher than the total number of calls. Table 1 presents the frequency and percentage of after-hours clinical calls for all subjects, and separately for high utilizers. Table 2 presents the average number of clinical calls organized by season and day of the week. Thirty-three percent of all calls were made by the patient, 31% by a proxy (spouse, parent, friend), and 36% by other parties (nurse, pharmacy, unidentified party).

Although the rankings of calls for all patients and high utilizers in Table 1 were similar, several differences stand out. High utilizers account for only 0.6% of patients, but 23% of all calls. High utilizers called substantially more for complaints relating to medication, pain, asthma/breathing and chest problems; 39% of their calls were for medication or pain concerns. Of the high utilizers, 39% (22/56) made 46 emergency room visits, but only 7% (4/56) were hospitalized during the year.
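The headline percentages in this paragraph can be reproduced from the counts in Table 1 and the practice size given in the Methods. The short check below is a sketch; the variable names are illustrative, and it assumes the 23% figure refers to clinical calls (the 2465 calls tabulated in Table 1).

```python
# Counts copied from Table 1 and the Methods section.
ACTIVE_PATIENTS = 9000          # approximate number of active patients
HIGH_UTILIZERS = 56
CALLS_OTHERS = 1906             # clinical calls from all other callers
CALLS_HIGH = 559                # clinical calls from high utilizers
MED_HIGH, PAIN_HIGH = 110, 107  # high-utilizer medication and pain complaints


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


share_of_patients = pct(HIGH_UTILIZERS, ACTIVE_PATIENTS)
share_of_calls = pct(CALLS_HIGH, CALLS_OTHERS + CALLS_HIGH)
med_or_pain = pct(MED_HIGH + PAIN_HIGH, CALLS_HIGH)

print(share_of_patients)  # 0.6
print(share_of_calls)     # 22.7
print(med_or_pain)        # 38.8
```

These round to the 0.6%, 23%, and 39% reported in the text.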

 

 

References

1. Perkins A, Gagnon R, deGruy F. A comparison of after-hours telephone calls concerning ambulatory and nursing home patients. J Fam Pract 1993;37:247-50.

2. Hannis MD, Hazard RL, Rothschild M, Elnicki DM, Keyserling TC, DeVellis RF. Physician attitudes regarding telephone medicine. J Gen Intern Med 1996;11:678-83.

3. Greenhouse D, Probst J. After hours telephone calls in a family practice residency: volume, seriousness and patient satisfaction. Fam Med 1995;27:525-30.

4. Spencer DC, Daugird AJ. The nature and content of physician telephone calls in private practice. J Fam Pract 1988;27:201-5.

5. Bergman JJ, Rosenblatt RA. After hours calls: a 5-year longitudinal study in a family practice group. J Fam Pract 1982;15:101-6.

6. Poole SR, Schmitt BD, Carruth T, Peterson-Smith A, Slusarski M. After-hours telephone coverage: the application of an area-wide telephone triage and advice system for pediatric practices. Pediatrics 1993;92:670-9.

7. Hildebrandt D, Nicholas D, Westfall J. The development of continuity of care and patient satisfaction in a family medicine residency: a 3 year longitudinal study. In preparation, 2001.

8. Benjamin JT. Pediatric residents’ telephone triage experience: relevant to general pediatric practice? Arch Pediatr Adolesc Med 1997;151:1254-7.

9. Crane JD, Benjamin JT. Pediatric residents’ telephone triage experience. Arch Pediatr Adolesc Med 2000;154:71-4.

10. Peters RM. After hours telephone calls to general and subspecialty internists: an observational study. J Gen Intern Med 1994;9:554-7.

11. O’Neill-Murphy K, Liebman M, Barnsteiner JH. Fever education: does it reduce parent fever anxiety? Pediatr Emerg Care 2001;17:47-51.

Issue
The Journal of Family Practice - 51(06)
Page Number
567-569
Display Headline
Reasons for after-hours calls
Legacy Keywords
Family practice; triage; emergency service. (J Fam Pract 2002; 51:567–569)

Prevalence of night sweats in primary care patients

ABSTRACT

OBJECTIVE: To estimate the prevalence and factors associated with night sweats among adult primary care patients.

STUDY DESIGN: This was a cross-sectional study.

POPULATION: Adult patients in 2 primary care practice-based research networks (PBRNs) during 1 week in the summer and 1 week in the winter in the years 2000 and 2001.

OUTCOMES MEASURED: We measured the prevalence of pure night sweats and night and day sweats in all patients and subgroups defined by age and sex, clinical variables associated with night sweats, and the frequency, severity, and rate of reporting.

RESULTS: Of the 2267 patients who participated, 41% reported experiencing night sweats within the last month, including 23% with pure night sweats and an additional 18% with day and night sweats. The prevalence of night sweats in both men and women was highest in the group aged 41 years to 55 years. In multivariate analyses, factors associated with pure night sweats in women were hot flashes and panic attacks; in men, sleep problems. Variables associated with night and day sweats in women were increased weight, hot flashes, sleep disturbances, and use of antihistamines, selective serotonin reuptake inhibitors (SSRIs), and other (non-SSRI, non-tricyclic) antidepressants; in men, increased weight, hot flashes, and greater alcohol use. A majority of patients had not reported their night sweats to their physicians, even when frequent and severe.

CONCLUSIONS: Night sweats are common and under-reported. Pure night sweats and night and day sweats may have different causes. With regard to the etiologies of pure night sweats, panic attacks and sleep disorders need further investigation.

KEY POINTS FOR CLINICIANS

  • Night sweats are a common experience for primary care patients, but they are frequently not reported to their physicians.
  • There appear to be 2 somewhat distinct patterns of night sweats: pure night sweats and night and day sweats.
  • A history of night sweats should prompt questions about menopause, panic attacks, sleep problems, and certain medications.

Night sweats have been attributed to tuberculosis, other acute and chronic febrile illnesses, menopause, pregnancy, hyperthyroidism, nocturnal hypoglycemia, other endocrine problems, neurologic diseases, sleep disorders (eg, sleep apnea and nightmares), malignancies, autoimmune diseases, coronary artery spasm, congestive heart failure, gastroesophageal reflux disease, psychiatric disorders, and certain medications. In 36 medical and surgical textbooks, night sweats were always discussed within sections covering specific diseases and never as a separate topic. References to the primary literature were never provided. We also searched Micromedex, a comprehensive source of information on medications, using “sweating” and “diaphoresis” as search terms.1 Table W1 contains a comprehensive list of proposed causes of night sweats identified in our searches and accompanying references.

Only 2 epidemiologic studies of night sweats were found in the English language literature. Lea and Aber2 interviewed 174 patients randomly selected from the inpatient units of a university hospital and found that 33% of nonobstetric patients and 60% of obstetric patients reported having had night sweats during the previous 3 months. Twenty-six percent of those with night sweats reported that their nighttime sweating was severe enough to require bathing and changing of bed linens. Reynolds,3 a gastroenterologist, queried 200 consecutive patients seen in his outpatient practice and found that 40% remembered experiencing night sweats at least once during the previous year. A total of 12% reported at least weekly night sweats. A review of the records of 750 patients at the Geriatric Continuity Clinic at the University of Oklahoma Family Medicine Center revealed that 10% reported having experienced night sweats during the previous month, when the question was asked as part of a standard review of systems questionnaire (J.W.M., unpublished data, 1999).

Our study was conducted in an effort to estimate the prevalence of night sweats in adult patients seen in primary care office settings, and to explore the associations of this symptom with demographic factors, physical characteristics, medical problems, and medications. We also sought to determine how distressing this symptom is to those who have it and to their sleep partners, whether patients are likely to report the symptom to their physicians, and what patients and their physicians think causes night sweats in individual cases.

Methods

Physician members of the Oklahoma Physicians Resource/Research Network (OKPRN) and the Texas Academy of Family Physicians Research Network (TAFP-Net) enrolled consecutive patients 18 years and older seen in their clinics during a 1-week period in the summer and a second 1-week period in the winter in the years 2000 and 2001. Patients who agreed to participate signed a consent form and then helped the nurse and physician complete a brief questionnaire on a preaddressed, stamped data collection card. For those who declined to participate, a card was generated containing the physician’s code number and the patient’s age and sex. Questions elicited demographic information; information about a selected set of medical conditions; medications, vitamins, herbs, and alcohol used regularly; and information about recent experiences with night sweats. Participating physicians were asked to check the questionnaires for accuracy and to record their opinions regarding the cause of the patients’ night sweats when they reported having had them. A laminated card with definitions of terms was provided to each physician.

 

 

“Night sweats” was defined as “sweating at night even when it isn’t excessively hot in your bedroom.” “Day sweats” was defined as “excessive sweating during the daytime.” “Pure night sweats” was defined as night sweats, but not day sweats, and “night and day sweats” as the combination of the 2. The time interval was specified as “during the last month.”

Completed questionnaires were mailed to the Oklahoma Center for Family Medicine Research for data entry and analysis. The data collection cards used by the Texas network included questions about race/ethnicity and panic attacks that were not included on the Oklahoma cards. Inadvertently, some of the Texas cards did not include the question about daytime sweating.

Statistix 7 (Analytical Software, Tallahassee, Fla) was used for all statistical analyses. Medications were assigned to 1 of 47 categories according to their primary pharmacologic effects. Summary statistics were calculated for all participants and for the following subgroups: season (summer and winter), pattern of night sweats (pure night sweats or night and day sweats), and age group. We anticipated that the majority of women with menopausal symptoms would be in the 41- to 55-year age group.

The two patterns of night sweats, “pure night sweats” and “night and day sweats,” were analyzed separately, and by sex and age. Logistic regression was used to identify the most significant predictors of night sweats while controlling for other variables. Variables were entered into the logistic models if they had a univariate association with the dependent variable at a P value of less than .05. They were then removed one at a time, in the order of largest to smallest P value, if they had a P value of greater than .01 after controlling for other variables. Conservative P values were chosen because of the large numbers of variables considered, in order to reduce the probability of type 1 errors. When appropriate, 95% confidence intervals were calculated.
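The variable-selection rule described above (enter at P < .05 on univariate testing, then drop the weakest variable while its adjusted P exceeds .01) can be sketched as follows. This is an illustrative outline, not the Statistix procedure the authors ran: `fit` is a hypothetical callable standing in for the logistic regression step, and refitting the model after every removal is an assumption.

```python
def backward_eliminate(univariate_p, fit, enter_alpha=0.05, stay_alpha=0.01):
    """Select variables by entry screening followed by backward elimination.

    univariate_p: dict mapping variable name -> univariate P value.
    fit: hypothetical callable taking a list of variable names and returning
         a dict of adjusted P values for those variables (for example, from
         a logistic regression fitted with exactly those predictors).
    """
    # Step 1: enter only variables with a univariate P value below .05.
    model = [name for name, p in univariate_p.items() if p < enter_alpha]

    # Step 2: repeatedly drop the variable with the largest adjusted P value
    # until every remaining variable has P <= .01.
    while model:
        adjusted = fit(model)
        worst = max(model, key=lambda name: adjusted[name])
        if adjusted[worst] <= stay_alpha:
            break
        model.remove(worst)
    return model


# Toy stand-in for the regression step, with made-up P values.
def toy_fit(names):
    adjusted = {
        "hot_flashes": 0.0005,
        "weight": 0.30,
        # Pretend sleep problems look stronger once weight is removed.
        "sleep_problems": 0.02 if "weight" in names else 0.008,
    }
    return {name: adjusted[name] for name in names}


toy_univariate = {
    "hot_flashes": 0.001,
    "sleep_problems": 0.03,
    "weight": 0.04,
    "coffee": 0.20,
}
selected = backward_eliminate(toy_univariate, toy_fit)
print(selected)  # ['hot_flashes', 'sleep_problems']
```

In practice `fit` would wrap a real logistic regression routine; the toy function here only shows how the thresholds interact.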

Results

Study population

A total of 2267 patients of 31 different physicians participated in this study, including 1888 patients of 24 Oklahoma physicians and 379 patients of 7 Texas physicians. Their mean (standard deviation) age was 50.7 (18.8) years, with a range of 18 to 97 years. Sixty-nine percent were women. A total of 99% of Oklahoma patients and 93% of Texas patients seen during the study weeks agreed to participate in the study. Among Texas participants, 53% were Hispanic whites, 33% were non-Hispanic whites, 13% were African Americans, and 1% were categorized as other. On the basis of prior OKPRN studies, we suspect that approximately 90% of Oklahoma patients were non-Hispanic whites, but exact proportions were not determined for this study.

Prevalence of night sweats

The prevalences of pure night sweats, night and day sweats, and any night sweats are shown in Table 1. While the prevalence of night and day sweats was lower for older patients, severity tended to be greater. Severity and frequency were positively correlated for all categories of night sweats and for all subgroups of patients (overall Spearman coefficient = 0.33; P < .001). Overall, the frequencies of night sweats among those who reported the condition were: almost never, 18%; 1 to 3 nights per month, 38%; 1 to 3 nights per week, 27%; and 4 to 7 nights per week, 16%. Ten percent of both women and men with night sweats said that their night sweats were bothersome to others.

TABLE 1
Percentage of patients with pure night sweats and night and day sweats

| Patient group, by sex and age, in years | Pure night sweats % (95% CI) | Night and day sweats % (95% CI) | Any night sweats % (95% CI) |
| --- | --- | --- | --- |
| All patients | 23 (21-24) | 18 (16-20) | 41 (39-43) |
| Men | 22 (19-26) | 12 (9-14) | 34 (30-38) |
| Men, 18-40 | 20 (14-26) | 14 (9-19) | 35 (28-42) |
| Men, 41-55 | 25 (18-32) | 14 (9-19) | 40 (33-47) |
| Men, 56-69 | 24 (16-32) | 12 (6-18) | 38 (30-46) |
| Men, 70+ | 20 (13-27) | 6 (2-10) | 26 (19-33) |
| Women | 23 (21-25) | 21 (19-24) | 44 (42-47) |
| Women, 18-40 | 22 (18-26) | 19 (15-23) | 42 (38-46) |
| Women, 41-55 | 29 (24-34) | 32 (28-37) | 61 (56-66) |
| Women, 56-69 | 22 (18-27) | 23 (18-28) | 43 (37-49) |
| Women, 70+ | 19 (14-24) | 9 (5-13) | 29 (24-34) |
CI denotes confidence interval.
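As a rough check on how such intervals arise, the sketch below computes a normal-approximation (Wald) 95% confidence interval for the overall prevalence of any night sweats (41% of 2267 participants). The article does not say which interval method was used, so the Wald formula is an assumption; the function name is illustrative.

```python
import math


def wald_ci(percent, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, returned in percent."""
    p = percent / 100
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * (p - half_width), 100 * (p + half_width)


low, high = wald_ci(41, 2267)
print(round(low), round(high))  # 39 43
```

This matches the 39-43 interval reported for all patients. Subgroup rows use smaller denominators that are not listed in the table, so they cannot be reproduced the same way.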

Frequency of reporting of night sweats

A minority of patients with night sweats (12%) had reported the symptom to their physicians. This was true even for those with severe night sweats (46%). Women younger than 70 years were more likely than men of the same age to have reported their night sweats to their physicians (15% vs. 6%; P < .001). The reverse was true for those 70 years and older (7% vs 13%; P =.08). Older patients with pure night sweats were more likely than younger patients to have reported them. After controlling for other variables, patients who were older (odds ratio [OR] = 1.03 per year of age; P < .001), those with night and day sweats (OR = 1.74; P =.0015), and those who reported that their night sweats bothered others (OR = 2.89; P =.001) were more likely to have reported the symptom to their physicians. Those who had reported their night sweats were also more likely to have hot flashes (OR = 2.98; P < .001) and to take estrogen (OR = 1.72; P =.003).
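An odds ratio expressed "per year of age" compounds multiplicatively, which makes the 1.03 above larger than it first looks. The snippet below is a minimal illustration, assuming age enters the logistic model as a single linear term, as the per-year odds ratio implies; the function name is hypothetical.

```python
OR_PER_YEAR = 1.03  # reported odds ratio per additional year of age


def odds_ratio_over(years, or_per_year=OR_PER_YEAR):
    """Odds ratio implied for an age difference of `years`."""
    return or_per_year ** years


ten_year = round(odds_ratio_over(10), 2)
print(ten_year)  # 1.34
```

Under that assumption, a patient 10 years older would have about 1.34 times the odds of having reported night sweats, other variables being equal.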

 

 

Factors associated with night sweats

The only variable associated with pure night sweats after controlling for all other variables was panic attacks. Variables associated with night and day sweats were younger age, greater body mass index, hot flashes, chronic infection, sleep disturbances, selective serotonin reuptake inhibitors (SSRIs), tricyclic antidepressants, “other” (non–SSRI, non-tricyclic) antidepressants, and xanthines.

For women, the only variable clearly associated with pure night sweats in the multivariate model was hot flashes. Panic attacks nearly reached significance (P =.026) and improved the regression model substantially (deviance reduced from 1446 to 87). Variables associated with night and day sweats were weight, sleep problems, hot flashes, antihistamines, SSRIs, and other (non–SSRI, nontricyclic) antidepressants.

For men, the only variable associated with pure night sweats after controlling for other variables was sleep problems. After exclusion of sleep problems and sedatives from the model on the assumption that they might be the result rather than the cause of night sweats, significant predictors were hot flashes (OR = 2.70; 95% confidence interval [CI], 1.35-5.40; P =.005) and regular use of multivitamins (OR = 1.87; 95% CI, 1.17-2.99; P =.009). Variables associated with night and day sweats included greater weight, hot flashes, and greater alcohol use. The ORs and CIs are shown in Table 2.

Interestingly, 32 men (5%) reported hot flashes, and those who did were more likely to report night sweats of both types. Men with hot flashes were evenly distributed across age categories. Their night sweats were more frequent, but not more severe, and they were more likely to bother others than those without hot flashes. Men with hot flashes were more likely to have told their physicians about their night sweats. After controlling for other variables, men with hot flashes were much more likely to have panic attacks (OR = 28.28; P < .001).

Patients 70 years and older made up 19.5% of our sample (N=429). The only factor associated with pure night sweats in the multivariate model was sleep disturbances (OR = 2.04; 95% CI, 1.21-3.42; P =.007). Exclusion of sleep disturbances left no associated variables. Variables associated with night and day sweats were hot flashes (OR = 15.14; 95% CI, 6.43-35.68; P < .001) and corticosteroids (OR = 5.45; 95% CI, 1.58-18.86; P =.007).

Suspected causes

In cases where patients reported night sweats, only 19% of the patients and 18% of their physicians recorded opinions regarding causation. The suspected causes listed by patients and physicians were similar. Both groups listed menopause most frequently (48% and 44%, respectively). Other etiologies proposed were stress (12% and 8%) and medications (9% and 10%). Physicians listed diabetes as a possible cause in 11% of cases while only 4% of patients listed it. Other suspected causes included obesity, pregnancy, gastroesophageal reflux disease, sleep discomforts, and ambient temperature.

TABLE 2
Associations between independent variables and night sweats in men and women after using logistic regression modeling to control for all other variables

| Patient group | Pure night sweats: variable | OR (95% CI) | Night and day sweats: variable | OR (95% CI) |
| --- | --- | --- | --- | --- |
| All | Panic attacks | 4.80 (1.69-13.63) | Age* | 0.99 per yr (0.98-0.99) |
| | | | BMI | 1.03 per unit (1.02-1.05) |
| | | | Hot flashes | 7.23 (5.45-9.58) |
| | | | Chronic infections | 2.05 (1.22-3.42) |
| | | | Sleep problems | 1.54 (1.16-2.04) |
| | | | SSRIs | 1.82 (1.22-2.70) |
| | | | TCAs | 2.43 (1.25-4.74) |
| | | | Other antidepressants | 2.85 (1.66-4.89) |
| | | | Xanthines | 5.48 (1.60-18.81) |
| Men | Sleep problems | 2.54 (1.7-3.8) | Weight | per lb (1.00-1.02) |
| | | | Hot flashes | 9.41 (4.50-19.8) |
| | | | Alcohol | 3.87 (1.60-9.20) |
| Women | Hot flashes | 3.35 (1.13-9.95) | Weight | 1.01 per lb (1.00-1.01) |
| | Panic attacks | 4.47 (1.20-16.69) | Sleep problems | 1.74 (1.30-2.40) |
| | | | Hot flashes | 6.75 (5.00-9.20) |
| | | | SSRIs | 2.01 (1.30-3.10) |
| | | | Other antidepressants | 2.85 (1.70-5.90) |
| | | | Antihistamines | 1.88 (1.20-2.90) |
*Younger age was associated with a greater likelihood of night and day sweats. Otherwise, presence of or increasing amount of each variable was associated with a greater likelihood of night sweats.
OR denotes odds ratio; CI, confidence interval; BMI, body mass index; SSRIs, selective serotonin reuptake inhibitors; TCAs, tricyclic antidepressants.

Discussion

As far as we know, this is the first systematic study of night sweats in a primary care population. It is exploratory in nature, and, because of its cross-sectional design, no firm conclusions can be drawn about causation.

Both pure night sweats and night and day sweats are extremely common, with a peak prevalence in men and women 41 to 55 years of age. In contrast to pure night sweats, night and day sweats are experienced infrequently by patients 70 years and older. The factors associated with pure night sweats are somewhat different from those associated with night and day sweats, suggesting different, though probably overlapping, sets of causes. The different associations seen for men and women, and for older and younger patients, are also noteworthy. Patients often fail to report night sweats to their primary care physician, even when the sweats are frequent and severe, associated with sleep disturbances, or bothersome to others.

Because of the sampling method (ie, consecutive patients rather than a random sample of active patients), the prevalence estimates reflect the frequency at which physicians can expect to encounter patients with this symptom, rather than the prevalence of night sweats among active patients. Since patients with more symptoms probably see physicians more often, we assume we have overestimated the true prevalence of night sweats in the larger population. Participating physicians were also not selected randomly. It is impossible to know how this may have affected our results.

 

 

We were surprised that so few of our independent variables were associated with pure night sweats: only panic attacks (all patients), sleep disorders (men and older patients), and hot flashes (women). Factors not associated with pure night sweats included obesity; diabetes, insulin, or oral hypoglycemic agents; acute or chronic infections; gastroesophageal reflux disease; and thyroid medications. Pure night sweats were also not specifically associated with estrogen and progesterone, although they were associated with hot flashes. There was also no association between pure night sweats and alcohol consumption.

The fact that physicians and their patients could only speculate on a cause for night sweats in 1 out of 5 cases suggests a lack of familiarity with the multitude of suspected causes, a failure to detect certain common causes (eg, sleep disorders and panic attacks), or, most likely, that many common causes of night sweats have yet to be elucidated. If the last is correct, it may be an example of the bias in the primary and secondary clinical literature that occurs when clinical research is carried out primarily in the subspecialty clinics of academic medical centers.4-7 Our findings speak to the need for greater support for primary care practice-based research.8,9

In retrospect, the omission of the variable “panic attacks” from the Oklahoma cards was a mistake, since this variable was correlated with pure night sweats in women. It may have been more strongly associated with pure night sweats in men as well, if the number of respondents to this question had been larger. Also, some men complained of hot flashes, and when they did, they were more likely to have night sweats and panic attacks, suggesting that both hot flashes and night sweats in men should prompt physicians to ask additional questions about panic disorder. Although race was also omitted from the Oklahoma cards, this variable did not seem to be associated with differences in night sweats prevalence or association among those for whom this information was available.

The definition and description of night sweats used in this study were arbitrary and may have influenced the prevalence rates obtained. We attempted to exclude environmental temperature as a cause. Although the definitions provided clearly stated “within the last month,” the data collection cards did not specify a time interval. This may have resulted in some variation in interpretation.

The decisions that were made regarding logistic modeling strategies were conservative and may have excluded some important variables. However, with so many variables and no basis on which to judge a priori, we felt that a conservative approach was best. The decision to include in the models variables (eg, sleep problems and sedatives) that might be considered consequences rather than causes of night sweats was also arbitrary and may have affected the results. An alternative explanation of the associations found between night sweats and sleep problems is that those who are unable to sleep for other reasons are more likely to notice excessive sweating than those who are asleep.

Future studies should more carefully examine factors found in this study to be associated with night sweats, such as panic attacks and sleep disorders, and other potential etiologic factors not considered, such as tobacco abuse, allergic diseases, migraines, congestive heart failure, and chronic lung disease. Given the high prevalence, future studies examining etiology should include appropriate control groups. Case-control and prospective studies should evaluate the natural history of both night sweats patterns and their association with quality and length of life. The potential value of night sweats as a clue to the early diagnosis of important under-recognized pathologies, such as sleep disorders and panic attacks, should be investigated. Finally, randomized trials of treatments to reduce the frequency, severity, and impact of night sweats should be undertaken once the potential causes have been better elucidated.

Acknowledgments

This research was made possible by a grant from the American Academy of Family Physicians Foundation. We would like to acknowledge the assistance of Lavonne Glover in preparing the manuscript and to thank the following practicing family physicians and their staff, who made time in their busy schedules to collect the data: Nathan Boren, Jo Ann Carpenter, Stephen Cobb, Ed Farrow, Cary Fisher, Helen Franklin, Kurt Frantz, David Hadley, Terrill Hulson, Joe Jamison, Dee Legako, Migy Mathew, Tomas Owens, John Pittman, Mike Pontious, Paul Preslar, R. Scott Stewart, David Strickland, Clinton Strong, Terry Truong, Keith Underhill, Kyle Waugh, Dan Woiwode, Mike Woods, Rick Edwards, Bob C. Jones, Leah R. Mabry, Tom Mueller, Mike Ragsdale, Hugh Wilson, Frank D. Wright, and Samuel T. Coleridge.

References

1. “MICROMEDEX” Healthcare Series. Englewood, Colorado. Available online at http://www.micromedex.com/. Accessed in June 2001.

2. Lea MJ, Aber RC. Descriptive epidemiology of night sweats upon admission to a university hospital. South Med J 1985;78:1065-7.

3. Reynolds WA. Are night sweats a sign of esophageal reflux? [Letter] J Clin Gastroenterol 1989;11:590-1.

4. White KL, Williams TF, Greenberg BG. The ecology of medical care. N Engl J Med 1961;265:885-92.

5. Rosser WW, Green L. Update from the ambulatory sentinel practice network of North America. Can Fam Phys 1989;35:843-6.

6. Smith FO. Practice-based research: opportunities for the clinician. South Med J 1991;84:479-82.

7. Green LA, Hames CG, Jr, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994;38:400-6.

8. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-8.

9. Green LA, Dovey SM. Practice based primary care research networks. BMJ 2001;322:567-8.

Author and Disclosure Information

JAMES W. MOLD, MD, MPH
MIGI K. MATHEW, MD
SHUAIB BELGORE, MD, MPH
MARK DEHAVEN, PHD
Oklahoma City, Oklahoma, and Dallas, Texas
From the University of Oklahoma Health Sciences Center, Oklahoma Center for Family Medicine Research, Department of Family and Preventive Medicine, Oklahoma City (J.W.M., M.K.M., S.B.) and the University of Texas Southwestern Medical Center, Department of Family Medicine, Dallas (M.D.). The authors report no competing interests. All requests for reprints should be addressed to James W. Mold, MD, MPH, University of Oklahoma Health Sciences Center, Oklahoma Center for Family Medicine Research, Department of Family and Preventive Medicine, 900 N.E. 10th Street, Oklahoma City, OK 73104.

Issue
The Journal of Family Practice - 51(05)
Page Number
452-456
Legacy Keywords
Primary care; primary-based research network; diaphoresis; epidemiology. (J Fam Pract 2002; 51:452–456)

ABSTRACT

OBJECTIVE: To estimate the prevalence and factors associated with night sweats among adult primary care patients.

STUDY DESIGN: This was a cross-sectional study.

POPULATION: Adult patients in 2 primary care practice-based research networks (PBRNs) during 1 week in the summer and 1 week in the winter in the years 2000 and 2001.

OUTCOMES MEASURED: We measured the prevalence of pure night sweats and night and day sweats in all patients and subgroups defined by age and sex, clinical variables associated with night sweats, and the frequency, severity, and rate of reporting.

RESULTS: Of the 2267 patients who participated, 41% reported experiencing night sweats within the last month, including 23% with pure night sweats and an additional 18% with day and night sweats. The prevalence of night sweats in both men and women was highest in the group aged 41 years to 55 years. In multivariate analyses, factors associated with pure night sweats in women were hot flashes and panic attacks; in men, sleep problems. Variables associated with night and day sweats in women were increased weight, hot flashes, sleep disturbances, and use of antihistamines, selective serotonin reuptake inhibitors (SSRIs), and other (non-SSRI, non-tricyclic) antidepressants; in men, increased weight, hot flashes, and greater alcohol use. A majority of patients had not reported their night sweats to their physicians, even when frequent and severe.

CONCLUSIONS: Night sweats are common and under-reported. Pure night sweats and night and day sweats may have different causes. With regard to the etiologies of pure night sweats, panic attacks and sleep disorders need further investigation.

KEY POINTS FOR CLINICIANS

  • Night sweats are a common experience for primary care patients, but they are frequently not reported to their physicians.
  • There appear to be 2 somewhat distinct patterns of night sweats: pure night sweats and night and day sweats.
  • A history of night sweats should prompt questions about menopause, panic attacks, sleep problems, and certain medications.

Night sweats have been attributed to tuberculosis, other acute and chronic febrile illnesses, menopause, pregnancy, hyperthyroidism, nocturnal hypoglycemia, other endocrine problems, neurologic diseases, sleep disorders (eg, sleep apnea and nightmares), malignancies, autoimmune diseases, coronary artery spasm, congestive heart failure, gastroesophageal reflux disease, psychiatric disorders, and certain medications. In 36 medical and surgical textbooks, night sweats were always discussed within sections covering specific diseases and never as a separate topic. References to the primary literature were never provided. We also searched Micromedex, a comprehensive source of information on medications, using “sweating” and “diaphoresis” as search terms.1 Table W1 contains a comprehensive list of proposed causes of night sweats identified in our searches and accompanying references.

Only 2 epidemiologic studies of night sweats were found in the English language literature. Lea and Aber2 interviewed 174 patients randomly selected from the inpatient units of a university hospital and found that 33% of nonobstetric patients and 60% of obstetric patients reported having had night sweats during the previous 3 months. Twenty-six percent of those with night sweats reported that their nighttime sweating was severe enough to require bathing and changing of bed linens. Reynolds,3 a gastroenterologist, queried 200 consecutive patients seen in his outpatient practice and found that 40% remembered experiencing night sweats at least once during the previous year. A total of 12% reported at least weekly night sweats. A review of the records of 750 patients at the Geriatric Continuity Clinic at the University of Oklahoma Family Medicine Center revealed that 10% reported having experienced night sweats during the previous month, when the question was asked as part of a standard review of systems questionnaire (J.W.M., unpublished data, 1999).

Our study was conducted in an effort to estimate the prevalence of night sweats in adult patients seen in primary care office settings, and to explore the associations of this symptom with demographic factors, physical characteristics, medical problems, and medications. We also sought to determine how distressing this symptom is to those who have it and to their sleep partners, whether patients are likely to report the symptom to their physicians, and what patients and their physicians think causes night sweats in individual cases.

Methods

Physician members of the Oklahoma Physicians Resource/Research Network (OKPRN) and the Texas Academy of Family Physicians Research Network (TAFP-Net) enrolled consecutive patients 18 years and older seen in their clinics during a 1-week period in the summer and a second 1-week period in the winter in the years 2000 and 2001. Patients who agreed to participate signed a consent form and then helped the nurse and physician complete a brief questionnaire on a preaddressed, stamped data collection card. For those who declined to participate, a card was generated containing the physician’s code number and the patient’s age and sex. Questions elicited demographic information; information about a selected set of medical conditions; medications, vitamins, herbs, and alcohol used regularly; and information about recent experiences with night sweats. Participating physicians were asked to check the questionnaires for accuracy and to record their opinions regarding the cause of the patients’ night sweats when they reported having had them. A laminated card with definitions of terms was provided to each physician.

 

 

“Night sweats” was defined as “sweating at night even when it isn’t excessively hot in your bedroom.” “Day sweats” was defined as “excessive sweating during the daytime.” “Pure night sweats” was defined as night sweats, but not day sweats, and “night and day sweats” as the combination of the 2. The time interval was specified as “during the last month.”

Completed questionnaires were mailed to the Oklahoma Center for Family Medicine Research for data entry and analysis. The data collection cards used by the Texas network included questions about race/ethnicity and panic attacks that were not included on the Oklahoma cards. Inadvertently, some of the Texas cards did not include the question about daytime sweating.

Statistix7 (Analytical Software, Tallahassee, Fla) was used for all statistical analyses. Medications were assigned to 1 of 47 categories according to their primary pharmacologic effects. Summary statistics were calculated for all participants and for the following subgroups: season (summer and winter), pattern of night sweats (excessive nighttime sweating only or night and day sweats), and age group. We anticipated that the majority of women with menopausal symptoms would be in the 41- to 55-year age group.

The two patterns of night sweats, “pure night sweats” and “night and day sweats,” were analyzed separately, and by sex and age. Logistic regression was used to identify the most significant predictors of night sweats while controlling for other variables. Variables were entered into the logistic models if they had a univariate association with the dependent variable at a P value of less than .05. They were then removed one at a time, in the order of largest to smallest P value, if they had a P value of greater than .01 after controlling for other variables. Conservative P values were chosen because of the large numbers of variables considered, in order to reduce the probability of type 1 errors. When appropriate, 95% confidence intervals were calculated.
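
The selection procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code (the study used Statistix 7). It assumes the model is refit after each removal, which the text does not state explicitly. The function `select_variables`, the callback `fit_p_values`, and the variable names in the usage example are all hypothetical, and the P values in the example are made up purely to exercise the loop.

```python
from typing import Callable, Dict, List

def select_variables(
    univariate_p: Dict[str, float],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    p_enter: float = 0.05,
    p_stay: float = 0.01,
) -> List[str]:
    """Enter variables on univariate P < p_enter, then drop them one at a
    time (largest adjusted P first) while any adjusted P exceeds p_stay."""
    # Step 1: candidates are variables with a univariate association.
    selected = [name for name, p in univariate_p.items() if p < p_enter]

    # Step 2: refit, find the weakest variable, and remove it if P > p_stay.
    while selected:
        adjusted = fit_p_values(selected)  # P values controlling for the others
        worst = max(selected, key=lambda name: adjusted[name])
        if adjusted[worst] <= p_stay:
            break  # every remaining variable meets the retention threshold
        selected.remove(worst)
    return selected

# Usage with a stand-in for a real logistic regression fit (made-up values).
def fake_fit(names: List[str]) -> Dict[str, float]:
    table = {"hot_flashes": 0.001, "panic": 0.004, "alcohol": 0.2, "weight": 0.03}
    return {name: table[name] for name in names}

univariate = {"hot_flashes": 0.001, "panic": 0.01, "alcohol": 0.04,
              "weight": 0.02, "age": 0.3}
kept = select_variables(univariate, fake_fit)
print(kept)
```

In practice `fit_p_values` would wrap a logistic regression routine from a statistics package. Keeping it as a callback leaves the elimination logic independent of any particular library.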

Results

Study population

A total of 2267 patients of 31 different physicians participated in this study, including 1888 patients of 24 Oklahoma physicians and 379 patients of 7 Texas physicians. Their mean (standard deviation) age was 50.7 (18.8) years, with a range of 18 to 97 years. Sixty-nine percent were women. A total of 99% of Oklahoma patients and 93% of Texas patients seen during the study weeks agreed to participate in the study. Among Texas participants, 53% were Hispanic whites, 33% were non-Hispanic whites, 13% were African Americans, and 1% were categorized as other. On the basis of prior OKPRN studies, we suspect that approximately 90% of Oklahoma patients were non-Hispanic whites, but exact proportions were not determined for this study.

Prevalence of night sweats

The prevalences of pure night sweats, night and day sweats, and any night sweats are shown in Table 1. While the prevalence of night and day sweats was lower for older patients, severity tended to be greater. Severity and frequency were positively correlated for all categories of night sweats and for all subgroups of patients (overall Spearman coefficient = 0.33; P < .001). Overall, the frequencies of night sweats among those who reported the condition were: almost never, 18%; 1 to 3 nights per month, 38%; 1 to 3 nights per week, 27%; and 4 to 7 nights per week, 16%. Ten percent of both women and men with night sweats said that their night sweats were bothersome to others.

TABLE 1
Percentage of patients with pure night sweats and night and day sweats

| Patient group, by sex and age (years) | Pure night sweats, % (95% CI) | Night and day sweats, % (95% CI) | Any night sweats, % (95% CI) |
|---|---|---|---|
| All patients | 23 (21-24) | 18 (16-20) | 41 (39-43) |
| Men | 22 (19-26) | 12 (9-14) | 34 (30-38) |
|   18-40 | 20 (14-26) | 14 (9-19) | 35 (28-42) |
|   41-55 | 25 (18-32) | 14 (9-19) | 40 (33-47) |
|   56-69 | 24 (16-32) | 12 (6-18) | 38 (30-46) |
|   70+ | 20 (13-27) | 6 (2-10) | 26 (19-33) |
| Women | 23 (21-25) | 21 (19-24) | 44 (42-47) |
|   18-40 | 22 (18-26) | 19 (15-23) | 42 (38-46) |
|   41-55 | 29 (24-34) | 32 (28-37) | 61 (56-66) |
|   56-69 | 22 (18-27) | 23 (18-28) | 43 (37-49) |
|   70+ | 19 (14-24) | 9 (5-13) | 29 (24-34) |

CI denotes confidence interval.
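
As an illustration of how an interval like those in Table 1 can arise, the Python sketch below computes a normal-approximation (Wald) 95% confidence interval for a proportion. The article does not say which interval method was used, so the Wald formula is an assumption, and the function name `wald_ci` is hypothetical. The calculation also ignores any clustering of patients within physicians.

```python
import math

def wald_ci(proportion: float, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * standard_error, proportion + z * standard_error

# Any night sweats: 41% of 2267 participants.
low, high = wald_ci(0.41, 2267)
print(f"95% CI: {100 * low:.0f}% to {100 * high:.0f}%")
```

For the overall figure this gives an interval of about 39% to 43%. Because the published percentages are rounded and subgroup sizes are not all given, other rows may not be reproducible to the digit.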

Frequency of reporting of night sweats

A minority of patients with night sweats (12%) had reported the symptom to their physicians. This was true even for those with severe night sweats (46%). Women younger than 70 years were more likely than men of the same age to have reported their night sweats to their physicians (15% vs. 6%; P < .001). The reverse was true for those 70 years and older (7% vs 13%; P =.08). Older patients with pure night sweats were more likely than younger patients to have reported them. After controlling for other variables, patients who were older (odds ratio [OR] = 1.03 per year of age; P < .001), those with night and day sweats (OR = 1.74; P =.0015), and those who reported that their night sweats bothered others (OR = 2.89; P =.001) were more likely to have reported the symptom to their physicians. Those who had reported their night sweats were also more likely to have hot flashes (OR = 2.98; P < .001) and to take estrogen (OR = 1.72; P =.003).

 

 

Factors associated with night sweats

The only variable associated with pure night sweats after controlling for all other variables was panic attacks. Variables associated with night and day sweats were younger age, greater body mass index, hot flashes, chronic infection, sleep disturbances, selective serotonin reuptake inhibitors (SSRIs), tricyclic antidepressants, “other” (non–SSRI, non-tricyclic) antidepressants, and xanthines.

For women, the only variable clearly associated with pure night sweats in the multivariate model was hot flashes. Panic attacks nearly reached significance (P =.026) and improved the regression model substantially (deviance reduced from 1446 to 87). Variables associated with night and day sweats were weight, sleep problems, hot flashes, antihistamines, SSRIs, and other (non–SSRI, nontricyclic) antidepressants.

For men, the only variable associated with pure night sweats after controlling for other variables was sleep problems. After exclusion of sleep problems and sedatives from the model on the assumption that they might be the result rather than the cause of night sweats, significant predictors were hot flashes (OR = 2.70; 95% confidence interval [CI], 1.35-5.40; P = .005) and regular use of multivitamins (OR = 1.87; 95% CI, 1.17-2.99; P = .009). Variables associated with night and day sweats included greater weight, hot flashes, and greater alcohol use. The ORs and CIs are shown in Table 2.

Interestingly, 32 men (5%) reported hot flashes, and those who did were more likely to report night sweats of both types. Men with hot flashes were evenly distributed across age categories. Their night sweats were more frequent, but not more severe, and they were more likely to bother others than those without hot flashes. Men with hot flashes were more likely to have told their physicians about their night sweats. After controlling for other variables, men with hot flashes were much more likely to have panic attacks (OR = 28.28; P < .001).


ABSTRACT

OBJECTIVE: To estimate the prevalence and factors associated with night sweats among adult primary care patients.

STUDY DESIGN: This was a cross-sectional study.

POPULATION: Adult patients in 2 primary care practice-based research networks (PBRNs) during 1 week in the summer and 1 week in the winter in the years 2000 and 2001.

OUTCOMES MEASURES: We measured the prevalence of pure night sweats and night and day sweats in all patients and subgroups defined by age and sex, clinical variables associated with night sweats, and the frequency, severity, and rate of reporting.

RESULTS: Of the 2267 patients who participated, 41% reported experiencing night sweats within the last month, including 23% with pure night sweats and an additional 18% with day and night sweats. The prevalence of night sweats in both men and women was highest in the group aged 41 years to 55 years. In multivariate analyses, factors associated with pure night sweats in women were hot flashes and panic attacks; in men, sleep problems. Variables associated with night and day sweats in women were increased weight, hot flashes, sleep disturbances, and use of antihistamines, selective serotonin reuptake inhibitors (SSRIs), and other (non-SSRI, non-tricyclic) antidepressants; in men, increased weight, hot flashes, and greater alcohol use. A majority of patients had not reported their night sweats to their physicians, even when frequent and severe.

CONCLUSIONS: Night sweats are common and under-reported. Pure night sweats and night and day sweats may have different causes. With regard to the etiologies of pure night sweats, panic attacks and sleep disorders need further investigation.

KEY POINTS FOR CLINICIANS

  • Night sweats are a common experience for primary care patients, but they are frequently not reported to their physicians.
  • There appear to be 2 somewhat distinct patterns of night sweats: pure night sweats and night and day sweats.
  • A history of night sweats should prompt questions about menopause, panic attacks, sleep problems, and certain medications.

Night sweats have been attributed to tuberculosis, other acute and chronic febrile illnesses, menopause, pregnancy, hyperthyroidism, nocturnal hypoglycemia, other endocrine problems, neurologic diseases, sleep disorders (eg, sleep apnea and nightmares), malignancies, autoimmune diseases, coronary artery spasm, congestive heart failure, gastroesophageal reflux disease, psychiatric disorders, and certain medications. In 36 medical and surgical textbooks, night sweats were always discussed within sections covering specific diseases and never as a separate topic. References to the primary literature were never provided. We also searched MICROMEDEX, a comprehensive source of information on medications, using “sweating” and “diaphoresis” as search terms.1 Table W1 contains a comprehensive list of proposed causes of night sweats identified in our searches and accompanying references.

Only 2 epidemiologic studies of night sweats were found in the English language literature. Lea and Aber2 interviewed 174 patients randomly selected from the inpatient units of a university hospital and found that 33% of nonobstetric patients and 60% of obstetric patients reported having had night sweats during the previous 3 months. Twenty-six percent of those with night sweats reported that their nighttime sweating was severe enough to require bathing and changing of bed linens. Reynolds,3 a gastroenterologist, queried 200 consecutive patients seen in his outpatient practice and found that 40% remembered experiencing night sweats at least once during the previous year. A total of 12% reported at least weekly night sweats. A review of the records of 750 patients at the Geriatric Continuity Clinic at the University of Oklahoma Family Medicine Center revealed that 10% reported having experienced night sweats during the previous month, when the question was asked as part of a standard review of systems questionnaire (J.W.M., unpublished data, 1999).

Our study was conducted in an effort to estimate the prevalence of night sweats in adult patients seen in primary care office settings, and to explore the associations of this symptom with demographic factors, physical characteristics, medical problems, and medications. We also sought to determine how distressing this symptom is to those who have it and to their sleep partners, whether patients are likely to report the symptom to their physicians, and what patients and their physicians think causes night sweats in individual cases.

Methods

Physician members of the Oklahoma Physicians Resource/Research Network (OKPRN) and the Texas Academy of Family Physicians Research Network (TAFP-Net) enrolled consecutive patients 18 years and older seen in their clinics during a 1-week period in the summer and a second 1-week period in the winter in the years 2000 and 2001. Patients who agreed to participate signed a consent form and then helped the nurse and physician complete a brief questionnaire on a preaddressed, stamped data collection card. For those who declined to participate, a card was generated containing the physician’s code number and the patient’s age and sex. Questions elicited demographic information; information about a selected set of medical conditions; medications, vitamins, herbs, and alcohol used regularly; and information about recent experiences with night sweats. Participating physicians were asked to check the questionnaires for accuracy and to record their opinions regarding the cause of the patients’ night sweats when they reported having had them. A laminated card with definitions of terms was provided to each physician.

“Night sweats” was defined as “sweating at night even when it isn’t excessively hot in your bedroom.” “Day sweats” was defined as “excessive sweating during the daytime.” “Pure night sweats” was defined as night sweats, but not day sweats, and “night and day sweats” as the combination of the 2. The time interval was specified as “during the last month.”

Completed questionnaires were mailed to the Oklahoma Center for Family Medicine Research for data entry and analysis. The data collection cards used by the Texas network included questions about race/ethnicity and panic attacks that were not included on the Oklahoma cards. Inadvertently, some of the Texas cards did not include the question about daytime sweating.

Statistix7 (Analytical Software, Tallahassee, Fla) was used for all statistical analyses. Medications were assigned to 1 of 47 categories according to their primary pharmacologic effects. Summary statistics were calculated for all participants and for the following subgroups: season (summer and winter), pattern of night sweats (excessive nighttime sweating only or night and day sweats), and age group. We anticipated that the majority of women with menopausal symptoms would be in the 41- to 55-year age group.

The two patterns of night sweats, “pure night sweats” and “night and day sweats,” were analyzed separately, and by sex and age. Logistic regression was used to identify the most significant predictors of night sweats while controlling for other variables. Variables were entered into the logistic models if they had a univariate association with the dependent variable at a P value of less than .05. They were then removed one at a time, in the order of largest to smallest P value, if they had a P value of greater than .01 after controlling for other variables. Conservative P values were chosen because of the large numbers of variables considered, in order to reduce the probability of type 1 errors. When appropriate, 95% confidence intervals were calculated.

Results

Study population

A total of 2267 patients of 31 different physicians participated in this study, including 1888 patients of 24 Oklahoma physicians and 379 patients of 7 Texas physicians. Their mean (standard deviation) age was 50.7 (18.8) years, with a range of 18 to 97 years. Sixty-nine percent were women. A total of 99% of Oklahoma patients and 93% of Texas patients seen during the study weeks agreed to participate in the study. Among Texas participants, 53% were Hispanic whites, 33% were non-Hispanic whites, 13% were African Americans, and 1% were categorized as other. On the basis of prior OKPRN studies, we suspect that approximately 90% of Oklahoma patients were non-Hispanic whites, but exact proportions were not determined for this study.

Prevalence of night sweats

The prevalences of pure night sweats, night and day sweats, and any night sweats are shown in Table 1. While the prevalence of night and day sweats was lower for older patients, severity tended to be greater. Severity and frequency were positively correlated for all categories of night sweats and for all subgroups of patients (overall Spearman coefficient = 0.33; P < .001). Overall, the frequencies of night sweats among those who reported the condition were: almost never, 18%; 1 to 3 nights per month, 38%; 1 to 3 nights per week, 27%; and 4 to 7 nights per week, 16%. Ten percent of both women and men with night sweats said that their night sweats were bothersome to others.

TABLE 1
Percentage of patients with pure night sweats and night and day sweats

| Patient group, by sex and age, in years | Pure night sweats % (95% CI) | Night and day sweats % (95% CI) | Any night sweats % (95% CI) |
|---|---|---|---|
| All patients | 23 (21-24) | 18 (16-20) | 41 (39-43) |
| Men | 22 (19-26) | 12 (9-14) | 34 (30-38) |
|   18-40 | 20 (14-26) | 14 (9-19) | 35 (28-42) |
|   41-55 | 25 (18-32) | 14 (9-19) | 40 (33-47) |
|   56-69 | 24 (16-32) | 12 (6-18) | 38 (30-46) |
|   70+ | 20 (13-27) | 6 (2-10) | 26 (19-33) |
| Women | 23 (21-25) | 21 (19-24) | 44 (42-47) |
|   18-40 | 22 (18-26) | 19 (15-23) | 42 (38-46) |
|   41-55 | 29 (24-34) | 32 (28-37) | 61 (56-66) |
|   56-69 | 22 (18-27) | 23 (18-28) | 43 (37-49) |
|   70+ | 19 (14-24) | 9 (5-13) | 29 (24-34) |

CI denotes confidence interval.
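As a rough check on how intervals like those in Table 1 can arise, the sketch below computes a simple normal-approximation (Wald) 95% confidence interval for a proportion. The article does not state which interval method was used, so the Wald formula and the function name `wald_ci` are assumptions made for illustration only; the inputs (41% of 2267 participants) come from the text.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the article does not say how its intervals were computed;
    this is one common textbook method, not necessarily the authors'.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

# Any night sweats: 41% of the 2267 participants (figures from the text).
low, high = wald_ci(0.41, 2267)
print(f"{100 * low:.0f}% to {100 * high:.0f}%")  # prints "39% to 43%"
```

Under this assumption the result agrees with the 41 (39-43) entry for all patients. The calculation ignores any clustering of patients within physicians.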

Frequency of reporting of night sweats

A minority of patients with night sweats (12%) had reported the symptom to their physicians. This was true even for those with severe night sweats (46%). Women younger than 70 years were more likely than men of the same age to have reported their night sweats to their physicians (15% vs 6%; P < .001). The reverse was true for those 70 years and older (7% vs 13%; P = .08). Older patients with pure night sweats were more likely than younger patients to have reported them. After controlling for other variables, patients who were older (odds ratio [OR] = 1.03 per year of age; P < .001), those with night and day sweats (OR = 1.74; P = .0015), and those who reported that their night sweats bothered others (OR = 2.89; P = .001) were more likely to have reported the symptom to their physicians. Those who had reported their night sweats were also more likely to have hot flashes (OR = 2.98; P < .001) and to take estrogen (OR = 1.72; P = .003).

Factors associated with night sweats

The only variable associated with pure night sweats after controlling for all other variables was panic attacks. Variables associated with night and day sweats were younger age, greater body mass index, hot flashes, chronic infection, sleep disturbances, selective serotonin reuptake inhibitors (SSRIs), tricyclic antidepressants, “other” (non–SSRI, non-tricyclic) antidepressants, and xanthines.

For women, the only variable clearly associated with pure night sweats in the multivariate model was hot flashes. Panic attacks nearly reached significance (P = .026) and improved the regression model substantially (deviance reduced from 1446 to 87). Variables associated with night and day sweats were weight, sleep problems, hot flashes, antihistamines, SSRIs, and other (non-SSRI, non-tricyclic) antidepressants.

For men, the only variable associated with pure night sweats after controlling for other variables was sleep problems. After exclusion of sleep problems and sedatives from the model on the assumption that they might be the result rather than the cause of night sweats, significant predictors were hot flashes (OR = 2.70; 95% confidence interval [CI], 1.35-5.40; P = .005) and regular use of multivitamins (OR = 1.87; 95% CI, 1.17-2.99; P = .009). Variables associated with night and day sweats included greater weight, hot flashes, and greater alcohol use. The ORs and CIs are shown in Table 2.
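Odds ratios from logistic regression are usually estimated on the log scale, which is why their confidence intervals are symmetric around the log odds ratio rather than around the odds ratio itself. The sketch below works backward from a reported interval to the implied point estimate, standard error, and two-sided P value. It assumes a Wald-type interval with z = 1.96, which the article does not state explicitly, and the function name `summarize_or_interval` is illustrative.

```python
import math

def summarize_or_interval(lower, upper, z=1.96):
    """Recover the log-scale summary implied by a reported 95% CI for an OR.

    Assumption: the interval is a Wald interval, exp(log(OR) +/- z * SE).
    """
    log_or = (math.log(lower) + math.log(upper)) / 2   # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * z)  # implied standard error
    wald_z = log_or / se
    p_value = math.erfc(abs(wald_z) / math.sqrt(2))     # two-sided normal P value
    return math.exp(log_or), se, p_value

# Hot flashes in men, pure night sweats: 95% CI, 1.35-5.40 (from the text).
or_hat, se, p_value = summarize_or_interval(1.35, 5.40)
print(f"OR = {or_hat:.2f}, SE(log OR) = {se:.3f}, P = {p_value:.3f}")
```

Under these assumptions the recovered odds ratio is 2.70 and the P value is close to .005, in line with the values reported for hot flashes in men.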

Interestingly, 32 men (5%) reported hot flashes, and those who did were more likely to report night sweats of both types. Men with hot flashes were evenly distributed across age categories. Their night sweats were more frequent, but not more severe, and they were more likely to bother others than those without hot flashes. Men with hot flashes were more likely to have told their physicians about their night sweats. After controlling for other variables, men with hot flashes were much more likely to have panic attacks (OR = 28.28; P < .001).

Patients 70 years and older made up 19.5% of our sample (n = 429). The only factor associated with pure night sweats in the multivariate model was sleep disturbances (OR = 2.04; 95% CI, 1.21-3.42; P = .007). Exclusion of sleep disturbances left no associated variables. Variables associated with night and day sweats were hot flashes (OR = 15.14; 95% CI, 6.43-35.68; P < .001) and corticosteroids (OR = 5.45; 95% CI, 1.58-18.86; P = .007).

Suspected causes

In cases where patients reported night sweats, only 19% of the patients and 18% of their physicians recorded opinions regarding causation. The suspected causes listed by patients and physicians were similar. Both groups listed menopause most frequently (48% and 44%, respectively). Other etiologies proposed were stress (12% and 8%) and medications (9% and 10%). Physicians listed diabetes as a possible cause in 11% of cases while only 4% of patients listed it. Other suspected causes included obesity, pregnancy, gastroesophageal reflux disease, sleep discomforts, and ambient temperature.

TABLE 2
Associations between independent variables and night sweats in men and women after using logistic regression modeling to control for all other variables

| Patient group | Pure night sweats: variable | OR (95% CI) | Night and day sweats: variable | OR (95% CI) |
|---|---|---|---|---|
| All | Panic attacks | 4.80 (1.69-13.63) | Age* | 0.99 per yr (0.98-0.99) |
| | | | BMI | 1.03 per unit (1.02-1.05) |
| | | | Hot flashes | 7.23 (5.45-9.58) |
| | | | Chronic infections | 2.05 (1.22-3.42) |
| | | | Sleep problems | 1.54 (1.16-2.04) |
| | | | SSRIs | 1.82 (1.22-2.70) |
| | | | TCAs | 2.43 (1.25-4.74) |
| | | | Other antidepressants | 2.85 (1.66-4.89) |
| | | | Xanthines | 5.48 (1.60-18.81) |
| Men | Sleep problems | 2.54 (1.7-3.8) | Weight | per lb (1.00-1.02) |
| | | | Hot flashes | 9.41 (4.50-19.8) |
| | | | Alcohol | 3.87 (1.60-9.20) |
| Women | Hot flashes | 3.35 (1.13-9.95) | Weight | 1.01 per lb (1.00-1.01) |
| | Panic attacks | 4.47 (1.20-16.69) | Sleep problems | 1.74 (1.30-2.40) |
| | | | Hot flashes | 6.75 (5.00-9.20) |
| | | | SSRIs | 2.01 (1.30-3.10) |
| | | | Other antidepressants | 2.85 (1.70-5.90) |
| | | | Antihistamines | 1.88 (1.20-2.90) |

*Younger age was associated with a greater likelihood of night and day sweats. Otherwise, presence of or increasing amount of each variable was associated with a greater likelihood of night sweats.
OR denotes odds ratio; CI, confidence interval; BMI, body mass index; SSRIs, selective serotonin reuptake inhibitors; TCAs, tricyclic antidepressants.

Discussion

As far as we know, this is the first systematic study of night sweats in a primary care population. It is exploratory in nature, and, because of its cross-sectional design, no firm conclusions can be drawn about causation.

Both pure night sweats and night and day sweats are extremely common, with a peak prevalence in men and women 41 to 55 years of age. In contrast to pure night sweats, night and day sweats are experienced infrequently by patients 70 years and older. The factors associated with pure night sweats are somewhat different from those associated with night and day sweats, suggesting different, though probably overlapping, sets of causes. The different associations seen for men and women, and for older and younger patients, are also noteworthy. Patients often fail to report night sweats to their primary care physician, even when frequent and severe, associated with sleep disturbances, or bothersome to others.

Because of the sampling method (ie, consecutive patients rather than a random sample of active patients), the prevalence estimates reflect the frequency at which physicians can expect to encounter patients with this symptom, rather than the prevalence of night sweats among active patients. Since patients with more symptoms probably see physicians more often, we assume we have overestimated the true prevalence of night sweats in the larger population. Participating physicians were also not selected randomly. It is impossible to know how this may have affected our results.

We were surprised that so few of our independent variables were associated with pure night sweats: only panic attacks (all patients), sleep disorders (men and older patients), and hot flashes (women). Factors not associated with pure night sweats included obesity; diabetes, insulin, or oral hypoglycemic agents; acute or chronic infections; gastroesophageal reflux disease; and thyroid medications. Pure night sweats were also not specifically associated with estrogen and progesterone, although they were associated with hot flashes. There was also no association between pure night sweats and alcohol consumption.

The fact that physicians and their patients could only speculate on a cause for night sweats in 1 out of 5 cases suggests a lack of familiarity with the multitude of suspected causes, a failure to detect certain common causes (eg, sleep disorders and panic attacks), or, most likely, that many common causes of night sweats have yet to be elucidated. If the last is correct, it may be an example of the bias in the primary and secondary clinical literature that occurs when clinical research is carried out primarily in the subspecialty clinics of academic medical centers.4-7 Our findings speak to the need for greater support for primary care practice-based research.8,9

In retrospect, the omission of the variable “panic attacks” from the Oklahoma cards was a mistake, since this variable was correlated with pure night sweats in women. It may have been more strongly associated with pure night sweats in men as well, if the number of respondents to this question had been larger. Also, some men complained of hot flashes, and when they did, they were more likely to have night sweats and panic attacks, suggesting that both hot flashes and night sweats in men should prompt physicians to ask additional questions about panic disorder. Although race was also omitted from the Oklahoma cards, this variable did not seem to be associated with differences in night sweats prevalence or association among those for whom this information was available.

The definition and description of night sweats used in this study were arbitrary and may have influenced the prevalence rates obtained. We attempted to exclude environmental temperature as a cause. Although the definitions provided clearly stated “within the last month,” the data collection cards did not specify a time interval. This may have resulted in some variation in interpretation.

The decisions that were made regarding logistic modeling strategies were conservative and may have excluded some important variables. However, with so many variables and no basis on which to judge a priori, we felt that a conservative approach was best. The decision to include in the models variables (eg, sleep problems and sedatives) that might be considered consequences rather than causes of night sweats was also arbitrary and may have affected the results. An alternative explanation of the associations found between night sweats and sleep problems is that those who are unable to sleep for other reasons are more likely to notice excessive sweating than those who are asleep.

Future studies should more carefully examine factors found in this study to be associated with night sweats, such as panic attacks and sleep disorders, and other potential etiologic factors not considered, such as tobacco abuse, allergic diseases, migraines, congestive heart failure, and chronic lung disease. Given the high prevalence, future studies examining etiology should include appropriate control groups. Case-control and prospective studies should evaluate the natural history of both night sweats patterns and their association with quality and length of life. The potential value of night sweats as a clue to the early diagnosis of important under-recognized pathologies, such as sleep disorders and panic attacks, should be investigated. Finally, randomized trials of treatments to reduce the frequency, severity, and impact of night sweats should be undertaken once the potential causes have been better elucidated.

Acknowledgments

This research was made possible by a grant from the American Academy of Family Physicians Foundation. We would like to acknowledge the assistance of Lavonne Glover in preparing the manuscript and to thank the following practicing family physicians and their staff who made time in their busy schedules to collect the data: Nathan Boren, Jo Ann Carpenter, Stephen Cobb, Ed Farrow, Cary Fisher, Helen Franklin, Kurt Frantz, David Hadley, Terrill Hulson, Joe Jamison, Dee Legako, Migy Mathew, Tomas Owens, John Pittman, Mike Pontious, Paul Preslar, R. Scott Stewart, David Strickland, Clinton Strong, Terry Truong, Keith Underhill, Kyle Waugh, Dan Woiwode, Mike Woods, Rick Edwards, Bob C. Jones, Leah R. Mabry, Tom Mueller, Mike Ragsdale, Hugh Wilson, Frank D. Wright, and Samuel T. Coleridge.

References

1. “MICROMEDEX” Healthcare Series. Englewood, Colorado. Available online at http://www.micromedex.com/. Accessed in June 2001.

2. Lea MJ, Aber RC. Descriptive epidemiology of night sweats upon admission to a university hospital. South Med J 1985;78:1065-7.

3. Reynolds WA. Are night sweats a sign of esophageal reflux? [Letter] J Clin Gastroenterol 1989;11:590-1.

4. White KL, Williams TF, Greenberg BG. The ecology of medical care. N Engl J Med 1961;265:885-92.

5. Rosser WW, Green L. Update from the ambulatory sentinel practice network of North America. Can Fam Phys 1989;35:843-6.

6. Smith FO. Practice-based research: opportunities for the clinician. So Med J 1991;84:479-82.

7. Green LA, Hames CG, Jr, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994;38:400-6.

8. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA 1999;281:686-8.

9. Green LA, Dovey SM. Practice based primary care research networks. BMJ 2001;322:567-8.

Issue
The Journal of Family Practice - 51(05)
Page Number
452-456
Display Headline
Prevalence of night sweats in primary care patients
Legacy Keywords
Primary care, primary-based research network, diaphoresis, epidemiology. (J Fam Pract 2002; 51:452–456)

Maternal assessment of neonatal jaundice after hospital discharge

ABSTRACT

OBJECTIVE: To determine whether mothers can accurately assess the presence and severity of jaundice in their newborns, both visually and with an icterometer, after hospital discharge.

STUDY DESIGN: Mothers were taught how to examine their infants for jaundice by determining the extent of caudal progression of jaundice and by using an Ingram icterometer. The mothers documented the examinations for 7 days after discharge. Home health nurses examined the babies for jaundice after discharge and obtained serum bilirubin levels.

POPULATION: Mothers of infants cared for in the normal newborn nursery of a 340-bed community hospital.

OUTCOME MEASURED: Maternal assessment of the presence of jaundice and its caudal progression.

RESULTS: Jaundice extending below the nipple line had a positive predictive value of 55% and a negative predictive value of 86% for identifying infants with bilirubin levels of ≥12 mg/dL. Icterometer readings of ≥2.5 had a positive predictive value of 44% and a negative predictive value of 87% for identifying infants with bilirubin levels of ≥12 mg/dL. The 3 infants with bilirubin levels of ≥17 mg/dL were recognized by their mothers as having jaundice below the nipple line and had icterometer readings of ≥2.5.

CONCLUSIONS: Further study is needed to determine the optimum method of parental education about newborn jaundice. However, maternal use of the Ingram icterometer and determination of jaundice in relation to the infant’s nipple line are both potentially useful methods of assessing jaundice after hospital discharge.

KEY POINTS FOR CLINICIANS

  • Although kernicterus, or bilirubin encephalopathy, is preventable, it is still occurring.
  • Parents should be provided with educational materials about newborn infants that include information about jaundice.
  • It may be useful for parents to be instructed how to assess the level of jaundice in their infant or to be given an Ingram icterometer to monitor their infants for jaundice after discharge.

From 1% to 4% of full-term infants are readmitted to the hospital for jaundice in the first week of life, representing as many as 109,000 admissions.1 Delayed diagnosis of jaundice puts babies at risk for kernicterus, which had virtually disappeared in the United States but is now on the rise. There are anecdotal reports of 22 full-term infants born in the early 1990s who developed kernicterus after discharge from the hospital within 48 hours of birth.1 The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) recently issued a Sentinel Event Alert recommending that organizations take steps to raise awareness among neonatal caregivers of the potential for kernicterus and its risk factors by reviewing their current patient care processes with regard to the identification and management of hyperbilirubinemia in newborns and by identifying risk reduction strategies that could enhance the effectiveness of these processes.2

The JCAHO alert cites the American Academy of Pediatrics (AAP) Practice Parameter for Management of Hyperbilirubinemia in the Healthy Term Newborn, which is based on available data and expert consensus, as an example of a guideline for identifying at-risk newborns and their diagnosis and treatment. The AAP guideline suggests checking for jaundice by blanching the skin with digital pressure to reveal its underlying color. The guideline states that clinical assessment must be done in a well-lighted room and suggests that as the bilirubin level rises, the extent of caudal progression may be helpful in quantifying the degree of jaundice.3

The AAP jaundice guideline suggests that the use of an icterometer (transcutaneous jaundice meter) may be helpful in the clinical assessment of jaundice.3 A variety of instruments have been tested in different patient populations.4-8 A potential role for such devices is their use by parents. The Ingram icterometer (Cascade Health Care Products, Salem, Ore.) is particularly promising because of its low cost ($17) and simplicity.5 It is a simple handheld device, made of clear plastic, on which are painted 5 transverse stripes of precise and graded hue. The stripes and spaces between them are 3/16 inch wide and are numbered from 1 (lightest in color) to 5 (darkest). When the icterometer is used, the painted side is pressed against the tip of the infant’s nose until the skin becomes blanched. The yellow color of the blanched skin can then be matched with the yellow stripes on the instrument, and a jaundice score assigned.

The purpose of this study was to determine whether mothers can accurately assess the presence and severity of jaundice in their newborns, both visually and with an icterometer, after hospital discharge. Maternal assessments were compared with bilirubin levels and home health nurse assessments to determine their accuracy. Serum bilirubin levels were used as the reference standard. Maternal comfort with the examination techniques was also assessed.

Methods

This study was approved by the Ramsey (now Regions) Hospital institutional review board. Mothers who gave birth at Regions Hospital in St. Paul, Minn., participated in the study. Mothers on the postpartum ward were invited to participate, but were excluded if they were not proficient in reading English, did not have a telephone, or lived more than 10 miles from the hospital. Infants were excluded if they were in the intensive care nursery, were not discharged on the same day as the mother, or if they received phototherapy. Mothers were advised to follow their health care providers’ instructions about timing for the first follow-up visit, and any provider instructions regarding jaundice.

After obtaining consent, the author or a study assistant showed the mothers how to examine their infants for jaundice by 2 methods. Each mother was instructed to examine her baby in a well-lighted room. First, the mother was shown how to look for jaundice by digitally blanching the skin on the cheek. The mother then documented whether she saw any underlying yellow color on her baby. Next, the mother was shown how to determine the caudal progression of the jaundice and to draw a horizontal line on an illustration of a baby corresponding to where the jaundice ended. The distance from the top of the infant’s head to the line drawn by the mother was used to determine the caudal progression. The mother was then shown how to use the Ingram icterometer and obtain a reading from the baby’s nose. Each mother was given an icterometer and a study booklet to document her examination for a total of 7 days, beginning the day after discharge from the hospital. The study booklet also contained some demographic questions, and questions about the mother’s comfort level with both methods of jaundice assessment. The mother was instructed to return the booklet and icterometer by mail when completed. The mother was sent a $25 gift certificate when the study materials were returned.

Within 7 days of discharge, a home health nurse visited each mother and infant in the home. The nurses were trained in the same methods of clinically assessing jaundice, and they assessed each infant by visually determining the caudal progression and by use of the icterometer. The nurse did not share the results of her examination with the mother. The nurse obtained bilirubin levels from all infants and notified the infants’ health care providers of any bilirubin levels higher than 14 mg/dL.

Standard descriptive statistics were calculated for all variables. Categorical relationships were assessed using kappa and chi-square statistics, as appropriate. All analyses were performed using Statistical Package for Social Sciences for Windows, version 10.0.5.

Results

A total of 113 of 177 mothers returned their study packets. Home health nurses visited 96 of the 113 mothers; the other 17 mothers were not visited because they declined the visits or could not be located. Although all babies were to have serum bilirubin levels determined whether or not they appeared jaundiced, only 90 of the 96 infants had the blood test. For the other 6 infants, either insufficient blood was drawn or the mother refused the test. On the day of the nurse’s visits, mothers documented in their study booklets the caudal progression of jaundice (for 56 infants) and icterometer readings (for 55 infants).

The educational levels of the mothers were as follows: 15% completed grade school or less; 40% completed high school; and 45% completed college. The mothers reported being from the following racial and ethnic groups: white, 59%; Hispanic, 16%; black, 14%; Asian, 8%; and other, 3%. A total of 53% of the women were primiparous, 84% completed examination forms for their babies for all 7 days, and 53% assessed their infants as being jaundiced during the study.

On the day of the nurse’s visit, there was moderate agreement between the nurses and the mothers about the presence of jaundice in the infants (κ = 0.50; P < .001). For those infants with jaundice, there was little agreement on the extent of caudal progression between the nurses and the mothers (correlation = 0.36; P > .1), but there was moderate agreement between their icterometer readings (correlation = 0.58; P < .05).
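As an illustration of the agreement statistic named in the statistical methods (kappa), the following is a minimal sketch of Cohen's kappa for two raters making a yes/no jaundice call. The article does not report the underlying agreement table, so the counts in the usage lines are round, made-up numbers chosen only to show the arithmetic; they are not study data, and the function name is hypothetical.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary rating.

    both_yes / both_no: infants on which the raters agree.
    a_only / b_only:    infants called jaundiced by only one rater.
    """
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("at least one rated infant is required")

    # Observed proportion of agreement.
    p_observed = (both_yes + both_no) / n

    # Agreement expected by chance from each rater's marginal "yes" rate.
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    p_expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)

    if p_expected == 1:
        # Both raters gave the same single answer for every infant;
        # kappa is undefined, so report perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    example = cohen_kappa(both_yes=40, a_only=10, b_only=10, both_no=40)
    print(f"kappa = {example:.2f}")
```

Kappa discounts the agreement that two raters would reach by chance alone, which is why it is lower than the raw percentage of infants on which they agree.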

The total serum bilirubin results ranged from 0.8 mg/dL to 18.8 mg/dL, with a mean of 7.4 mg/dL. The mean bilirubin level of infants thought to be jaundiced by their mothers was 11.3 mg/dL, while the mean bilirubin of infants not thought to be jaundiced was 4.8 mg/dL (P < .001).

The mothers’ icterometer readings and determinations of jaundice to the nipple line or below it are compared with bilirubin levels in Table 1. Table 2 summarizes the diagnostic accuracy of jaundice extending to the nipple line or below it, and of icterometer readings ≥ 2.5, in identifying bilirubin levels ≥ 12 mg/dL and ≥ 17 mg/dL. A bilirubin level of 12 mg/dL is the level at which the AAP guideline suggests considering phototherapy for infants aged 24 to 47 hours, and 17 mg/dL is the level at which phototherapy should be considered for infants older than 72 hours.3

 

 

The mothers of the 3 infants with bilirubin levels ≥ 17 mg/dL recognized that their infants were jaundiced and determined that the jaundice extended below the nipple line. The icterometer readings obtained by the mothers were 2.5, 3, and 3.5. The corresponding icterometer readings by the nurses were 4.5, 3.5, and 3.

The study booklet contained 6 questions about the mothers’ reactions to the study. Almost all of the mothers (98%) responded that the method for checking for caudal progression of jaundice was explained clearly, and even more (99%) felt the use of the icterometer was explained clearly. A total of 69% of the mothers felt it was “very easy” or “easy” to check for caudal progression, and 80% felt it was “very easy” or “easy” to use the icterometer. Forty-six percent of the mothers reported that checking their babies for jaundice made them “very worried” or “somewhat worried” about their babies’ health. Mothers with less education were significantly more likely to report being worried than mothers with higher education levels (P < .05). However, 93% of the mothers reported that checking their babies for jaundice made them “very reassured” or “somewhat reassured” about their babies’ health.

TABLE 1
Maternal assessment of jaundice, by caudal progression and icterometer readings, compared with serum bilirubin levels

 

Maternal test result | Serum bilirubin ≥ 12 mg/dL | < 12 mg/dL | ≥ 17 mg/dL | < 17 mg/dL
Icterometer ≥ 2.5 | 11 | 14 | 3 | 22
Icterometer < 2.5 | 4 | 26 | 0 | 30
Caudal progression at or below nipple line | 11 | 9 | 3 | 17
Caudal progression above nipple line | 5 | 31 | 0 | 36

TABLE 2
Diagnostic accuracy of maternal visual assessment of jaundice and of the Ingram icterometer

 

Test | Cut-off (serum bilirubin level, mg/dL) | SN | SP | PV+ | PV- | LR+ | LR-
Maternal visual assessment below the nipple line | ≥ 12.0 | 69 | 77 | 55 (CI, 52-58) | 86 (CI, 84-88) | 3.1 | 0.4
Ingram icterometer reading ≥ 2.5 | ≥ 12.0 | 73 | 65 | 44 (CI, 41-47) | 87 (CI, 85-89) | 2.1 | 0.4
Maternal visual assessment below the nipple line | ≥ 17.0 | 100 | 68 | 15 (CI, 13-17) | 100 (CI, 67-100) | 3.12 | 0
Ingram icterometer reading ≥ 2.5 | ≥ 17.0 | 100 | 58 | 12 (CI, 10-14) | 100 (CI, 67-100) | 2.4 | 0
SN = sensitivity; SP = specificity; PV+ = positive predictive value; PV- = negative predictive value; LR+ = positive likelihood ratio; LR- = negative likelihood ratio; CI = 95% confidence interval.
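The point estimates in Table 2 can be reproduced from the counts in Table 1 using the standard definitions of sensitivity, specificity, predictive values, and likelihood ratios. The sketch below is an illustration rather than the author's analysis (the article reports using SPSS). It assumes that the icterometer ≥ 2.5 row and the first caudal-progression row of Table 1 are the test-positive rows; the class and function names are hypothetical, and the confidence intervals are not recomputed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one maternal test at one bilirubin cut-off."""
    tp: int  # test positive, bilirubin at or above the cut-off
    fp: int  # test positive, bilirubin below the cut-off
    fn: int  # test negative, bilirubin at or above the cut-off
    tn: int  # test negative, bilirubin below the cut-off


def diagnostic_accuracy(t: TwoByTwo) -> dict:
    """Return the Table 2 measures as proportions (assumes non-empty margins)."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "SN": sensitivity,
        "SP": specificity,
        "PV+": t.tp / (t.tp + t.fp),
        "PV-": t.tn / (t.tn + t.fn),
        # Likelihood ratios have a zero denominator at the extremes;
        # report infinity there instead of dividing by zero.
        "LR+": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "LR-": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }


# Counts read from Table 1, keyed by (test, bilirubin cut-off in mg/dL).
TABLE_1 = {
    ("visual", 12): TwoByTwo(tp=11, fp=9, fn=5, tn=31),
    ("icterometer", 12): TwoByTwo(tp=11, fp=14, fn=4, tn=26),
    ("visual", 17): TwoByTwo(tp=3, fp=17, fn=0, tn=36),
    ("icterometer", 17): TwoByTwo(tp=3, fp=22, fn=0, tn=30),
}

if __name__ == "__main__":
    for (test, cutoff), counts in TABLE_1.items():
        measures = diagnostic_accuracy(counts)
        summary = ", ".join(f"{name} {value:.2f}" for name, value in measures.items())
        print(f"{test}, bilirubin >= {cutoff} mg/dL: {summary}")
```

Because only 3 infants had bilirubin levels of 17 mg/dL or higher, the 100% sensitivity at that cut-off rests on very few cases, which is consistent with the caution about small numbers expressed in the Discussion.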

Discussion

The ability of mothers to detect and respond to jaundice in their newborns after discharge from the hospital has not been previously studied. Opinions about the value of parental education regarding jaundice vary markedly. The AAP recommends that all mothers be able to recognize signs of jaundice before discharge.9 Others are skeptical that such education will be helpful: “Experience suggests that asking mothers to observe infants for the development of jaundice is not satisfactory. Despite such instructions, it is difficult for many parents to recognize significant jaundice.”10

Several studies have documented that jaundice is first seen in the face and progresses caudally to the trunk and extremities.11-13 These studies also found good correlation between serum bilirubin levels and the advancement of dermal icterus. In a previous study, parents were able to accurately assess the caudal progression of jaundice while their babies were in the hospital.14 However, the bilirubin levels in that study were relatively low, reflecting the brief hospital stay of most of the infants. In contrast, a recent study concluded that the clinical examination for jaundice by nurses and physicians had poor reliability and only moderate correlation with bilirubin levels.15 The authors did conclude, however, that finding no jaundice below the nipple line reliably predicted that an infant would have a bilirubin concentration of less than 12.0 mg/dL. In this study, finding no jaundice below the nipple line reliably predicted that an infant would have a bilirubin concentration of less than 17.0 mg/dL.

Because of the relatively small number of infants having bilirubin levels high enough to require potential intervention, the measures of diagnostic accuracy in the tables should be interpreted with caution. However, the results of my study confirm several prior reports that restricting bilirubin testing to infants with icterometer readings ≥ 2.5 would have safely eliminated many unnecessary tests.6,14,16 Although most of the infants in my study were white, the efficacy of the icterometer has also been documented in Asian and black newborns.17

Previous studies have shown that neonatal jaundice and its treatments are associated with an increased risk of maternal behaviors consistent with the vulnerable child syndrome.18,19 This syndrome was originally described in 1964 in children whose parents believed that their child had suffered a “close call,” and thereafter perceived the child as vulnerable to serious injury or accident.18 Frequent blood tests to monitor bilirubin levels, supplementation or replacement of breast milk with formula, the physical separation of the mother and infant because of phototherapy, and prolonged hospitalization may create the impression that the infant is seriously ill, despite reassurances from medical personnel. Therefore, the mothers were asked whether the study itself served as a source of anxiety. Almost half of the mothers in this study reported that checking their babies for jaundice made them very or somewhat worried about their babies’ health. Some of the women must have felt ambivalent, however, because almost all of them (93%) also reported that checking their babies for jaundice made them very or somewhat reassured about their babies’ health. Most of the 48 comments written by the mothers in the study booklets were very positive.

 

 

Conclusions

One of the strategies recommended by the JCAHO to reduce the risk of kernicterus is to provide parents with adequate educational materials about newborn infants that include information about jaundice.2 The message given to parents should be consistent, and should reassure mothers that most jaundiced infants are basically healthy. My study results suggest that it may also be useful for parents to be shown how to visually assess jaundice or to be given an Ingram icterometer to monitor their infants for jaundice after hospital discharge. Further study is needed to determine the optimal method of parental education about newborn jaundice.

Acknowledgments

This study was funded by a grant from the Ramsey Foundation. The author thanks Laura Lantz, Pamela Ristau, Kim Stone, Annette Swain, Mary Jo Feely, and the nurses at Integrated Home Care for their assistance with this project.

References

 

1. Catz C, Hanson J, Simpson L, Yaffe S. Summary of workshop: early discharge and neonatal hyperbilirubinemia. Pediatrics 1995;96:743-5.

2. Joint Commission on Accreditation of Healthcare Organizations. Sentinel event alert issue 18: kernicterus threatens healthy newborns; April 2001.

3. Provisional Committee for Quality Improvement and Subcommittee on Hyperbilirubinemia. Practice parameter: management of hyperbilirubinemia in the healthy term newborn. Pediatrics 1994;94:558-65.

4. Smith D, Martin D, Inguillo D, Vreman H, Cohen R, Stevenson D. Use of noninvasive tests to predict significant jaundice in full-term infants: preliminary studies. Pediatrics 1985;75:278-80.

5. Schumacher R. Noninvasive measurements of bilirubin in the newborn. Clin Perinatol 1990;17:417-35.

6. Narayanan I, Banwalikar J, Mehta R, et al. A simple method of evaluation of jaundice in the newborn. Ann Trop Paediatr 1990;10:31-4.

7. Yamanouchi I, Yamauchi Y, Igarashi I. Transcutaneous bilirubinometry: preliminary studies of noninvasive transcutaneous bilirubin meter in the Okayama National Hospital. Pediatrics 1980;65:195-202.

8. Knudsen A. Measurement of the yellow colour of the skin as a test of hyperbilirubinemia in mature newborns. Acta Paediatr Scand 1990;79:1175-81.

9. Committee on Fetus and Newborn. Hospital stay for healthy term newborns. Pediatrics 1995;96:788-90.

10. Maisels M, Newman T. Kernicterus in otherwise healthy, breast-fed term newborns. Pediatrics 1995;96:730-3.

11. Ebbesen F. The relationship between the cephalo-pedal progress of clinical icterus and the serum bilirubin concentration in newborn infants without blood type sensitization. Acta Obstet Gynecol Scand 1975;54:329-32.

12. Kramer LI. Advancement of dermal icterus in the jaundiced newborn. Am J Dis Child 1969;118:454-8.

13. Thong YH, Rahman AA, Choo M, Tor ST, Robinson MJ. Dermal icteric zones and serum bilirubin levels in neonatal jaundice. Singapore Med J 1976;17:184-5.

14. Madlon-Kay D. Recognition of the presence and severity of newborn jaundice by parents, nurses, physicians, and icterometer. Pediatrics 1997;100:e3.

15. Moyer V, Ahn C, Sneed S. Accuracy of clinical judgment in neonatal jaundice. Arch Pediatr Adolesc Med 2000;154:391-4.

16. Gosset I. A perspex icterometer for neonates. Lancet 1960;1:87-90.

17. Schumacher R, Thornbery J, Gutcher G. Transcutaneous bilirubinometry: a comparison of old and new methods. Pediatrics 1985;76:10-4.

18. Kemper K, Forsyth B, McCarthy P. Jaundice, terminating breast-feeding, and the vulnerable child. Pediatrics 1989;84:773-8.

19. Kemper K, Forsyth B, McCarthy P. Persistent perceptions of vulnerability following neonatal jaundice. Am J Dis Child 1990;144:238-41.

Author and Disclosure Information

 

DIANE J. MADLON-KAY, MD
St. Paul, Minnesota
From the Ramsey Family and Community Medicine Residency Program, St. Paul, Minnesota. The author reports no conflicts of interest. All requests for reprints should be addressed to Diane J. Madlon-Kay, MD, 860 Arcade St., St. Paul, MN 55106. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(05)
Page Number
445-448
Legacy Keywords
Jaundice, neonatal; hyperbilirubinemia; perinatal. (J Fam Pract 2002; 51:445–448)

ABSTRACT

OBJECTIVE: To determine whether mothers can accurately assess the presence and severity of jaundice in their newborns, both visually and with an icterometer, after hospital discharge.

STUDY DESIGN: Mothers were taught how to examine their infants for jaundice by determining the extent of caudal progression of jaundice and by using an Ingram icterometer. The mothers documented the examinations for 7 days after discharge. Home health nurses examined the babies for jaundice after discharge and obtained serum bilirubin levels.

POPULATION: Mothers of infants cared for in the normal newborn nursery of a 340-bed community hospital.

OUTCOME MEASURED: Maternal assessment of the presence of jaundice and its caudal progression.

RESULTS: Jaundice extending below the nipple line had a positive predictive value of 55% and a negative predictive value of 86% for identifying infants with bilirubin levels ≥ 12 mg/dL. Icterometer readings ≥ 2.5 had a positive predictive value of 44% and a negative predictive value of 87% for identifying infants with bilirubin levels ≥ 12 mg/dL. The 3 infants with bilirubin levels ≥ 17 mg/dL were recognized by their mothers as having jaundice below the nipple line and had icterometer readings ≥ 2.5.

CONCLUSIONS: Further study is needed to determine the optimum method of parental education about newborn jaundice. However, maternal use of the Ingram icterometer and determination of jaundice in relation to the infant’s nipple line are both potentially useful methods of assessing jaundice after hospital discharge.

 

KEY POINTS FOR CLINICIANS

 

  • Although kernicterus, or bilirubin encephalopathy, is preventable, it is still occurring.
  • Parents should be provided with educational materials about newborn infants that include information about jaundice.
  • It may be useful for parents to be instructed how to assess the level of jaundice in their infant or to be given an Ingram icterometer to monitor their infants for jaundice after discharge.

From 1% to 4% of full-term infants are readmitted to the hospital for jaundice in the first week of life, representing as many as 109,000 admissions.1 Delayed diagnosis of jaundice puts babies at risk for kernicterus, which had virtually disappeared in the United States but is now on the rise. There are anecdotal reports of 22 full-term infants born in the early 1990s who developed kernicterus after discharge from the hospital within 48 hours of birth.1 The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) recently issued a Sentinel Event Alert recommending that organizations raise awareness among neonatal caregivers of the potential for kernicterus and its risk factors. The alert asks organizations to review their current patient care processes for identifying and managing hyperbilirubinemia in newborns and to identify risk reduction strategies that could make these processes more effective.2

The JCAHO alert cites the American Academy of Pediatrics (AAP) Practice Parameter for Management of Hyperbilirubinemia in the Healthy Term Newborn, which is based on available data and expert consensus, as an example of a guideline for identifying at-risk newborns and their diagnosis and treatment. The AAP guideline suggests checking for jaundice by blanching the skin with digital pressure to reveal its underlying color. The guideline states that clinical assessment must be done in a well-lighted room and suggests that as the bilirubin level rises, the extent of caudal progression may be helpful in quantifying the degree of jaundice.3

The AAP jaundice guideline suggests that the use of an icterometer (transcutaneous jaundice meter) may be helpful in the clinical assessment of jaundice.3 A variety of instruments have been tested in different patient populations.4-8 A potential role for such devices is their use by parents. The Ingram icterometer (Cascade Health Care Products, Salem, Ore.) is particularly promising because of its low cost ($17) and simplicity.5 It is a simple handheld device, made of clear plastic, on which are painted 5 transverse stripes of precise and graded hue. The stripes and spaces between them are 3/16 inch wide and are numbered from 1 (lightest in color) to 5 (darkest). When the icterometer is used, the painted side is pressed against the tip of the infant’s nose until the skin becomes blanched. The yellow color of the blanched skin can then be matched with the yellow stripes on the instrument, and a jaundice score assigned.

The purpose of my study was to determine whether mothers can accurately assess the presence and severity of jaundice in their newborns, both visually and with an icterometer, after hospital discharge. Maternal assessments were compared with bilirubin levels and home health nurse assessments to determine their accuracy. Serum bilirubin levels were used as the reference standard. Maternal comfort with the examination techniques was also assessed.

 

 

Methods

This study was approved by the Ramsey (now Regions) Hospital institutional review board. Mothers who gave birth at Regions Hospital in St. Paul, Minn., participated in the study. Mothers on the postpartum ward were invited to participate, but were excluded if they were not proficient in reading English, did not have a telephone, or lived more than 10 miles from the hospital. Infants were excluded if they were in the intensive care nursery, were not discharged on the same day as the mother, or if they received phototherapy. Mothers were advised to follow their health care providers’ instructions about timing for the first follow-up visit, and any provider instructions regarding jaundice.


 


 

 

Methods

This study was approved by the Ramsey (now Regions) Hospital institutional review board. Mothers who gave birth at Regions Hospital in St. Paul, Minn., participated in the study. Mothers on the postpartum ward were invited to participate, but were excluded if they were not proficient in reading English, did not have a telephone, or lived more than 10 miles from the hospital. Infants were excluded if they were in the intensive care nursery, were not discharged on the same day as the mother, or if they received phototherapy. Mothers were advised to follow their health care providers’ instructions about timing for the first follow-up visit, and any provider instructions regarding jaundice.

After obtaining consent, the author or a study assistant showed the mothers how to examine their infants for jaundice by 2 methods. Each mother was instructed to examine her baby in a well-lighted room. First, the mother was shown how to look for jaundice by digitally blanching the skin on the cheek. The mother then documented whether she saw any underlying yellow color on her baby. Next, the mother was shown how to determine the caudal progression of the jaundice and to draw a horizontal line on an illustration of a baby corresponding to where the jaundice ended. The distance from the top of the infant’s head to the line drawn by the mother was used to determine the caudal progression. The mother was then shown how to use the Ingram icterometer and obtain a reading from the baby’s nose. Each mother was given an icterometer and a study booklet to document her examination for a total of 7 days, beginning the day after discharge from the hospital. The study booklet also contained some demographic questions, and questions about the mother’s comfort level with both methods of jaundice assessment. The mother was instructed to return the booklet and icterometer by mail when completed. The mother was sent a $25 gift certificate when the study materials were returned.

Within 7 days of discharge, a home health nurse visited each mother and infant in the home. The nurses were trained in the same methods of clinically assessing jaundice, and they assessed each infant by visually determining the caudal progression and by use of the icterometer. The nurse did not share the results of her examination with the mother. The nurse obtained bilirubin levels from all infants and notified the infants’ health care providers of any bilirubin levels higher than 14 mg/dL.

Standard descriptive statistics were calculated for all variables. Categorical relationships were assessed using kappa and chi-square statistics, as appropriate. All analyses were performed using Statistical Package for Social Sciences for Windows, version 10.0.5.

Results

A total of 113 of 177 mothers returned their study packets. Home health nurses visited 96 of the 113 mothers; the other 17 mothers were not visited because they declined the visits or could not be located. Although all babies were to have serum bilirubin levels determined whether or not they appeared jaundiced, only 90 of the 96 infants had the blood test. For the other 6 infants, either insufficient blood was drawn or the mother refused the test. On the day of the nurse’s visits, mothers documented in their study booklets the caudal progression of jaundice (for 56 infants) and icterometer readings (for 55 infants).

The educational levels of the mothers were as follows: 15% completed grade school or less; 40% completed high school; and 45% completed college. The mothers reported being from the following racial and ethnic groups: white, 59%; Hispanic, 16%; black, 14%; Asian, 8%; and other, 3%. A total of 53% of the women were primiparous, 84% completed examination forms for their babies for all 7 days, and 53% assessed their infants as being jaundiced during the study.

On the day of the nurse’s visit, there was moderate agreement between the nurses and the mothers about the presence of jaundice in the infants (κ = 0.50; P < .001). For those infants with jaundice, there was little agreement on the extent of caudal progression between the nurses and the mothers (correlation = 0.36; P > .1), but there was moderate agreement between their icterometer readings (correlation = 0.58; P < .05).
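
The agreement statistic for the presence of jaundice can be illustrated with a minimal sketch of Cohen's kappa for two raters making a yes/no judgment. The article does not report the nurse-by-mother cross-tabulation, so the counts below are hypothetical and chosen only for illustration; the function name `cohens_kappa` is likewise an assumption of this sketch, not part of the study's analysis.

```python
def cohens_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two raters and a binary rating.

    Arguments are the four cells of the 2 x 2 agreement table.
    """
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    # Proportion of infants on whom the two raters agree.
    observed = (both_yes + both_no) / n
    # Marginal proportion of "yes" ratings for each rater.
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    # Agreement expected by chance alone, given the marginals.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL counts (not study data): nurse and mother both say "jaundiced",
# nurse only, mother only, and both say "not jaundiced".
kappa = cohens_kappa(both_yes=30, a_yes_b_no=10, a_no_b_yes=10, both_no=30)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why 75% raw agreement in this hypothetical table corresponds to only moderate agreement.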

The total serum bilirubin results ranged from 0.8 mg/dL to 18.8 mg/dL, with a mean of 7.4 mg/dL. The mean bilirubin level of infants thought to be jaundiced by their mothers was 11.3 mg/dL, while the mean bilirubin of infants not thought to be jaundiced was 4.8 mg/dL (P < .001).

The mothers’ icterometer readings and determinations of jaundice to the nipple line or below it are compared with bilirubin levels in Table 1. Table 2 summarizes the diagnostic accuracy of jaundice extending to the nipple line or below it, and of icterometer readings of ≥ 2.5, in identifying bilirubin levels of ≥ 12 mg/dL and ≥ 17 mg/dL. A bilirubin level of 12 mg/dL is the level at which the AAP guideline suggests considering phototherapy for infants aged 24 to 47 hours, and 17 mg/dL is the level at which phototherapy should be considered for infants older than 72 hours.3

 

 

The mothers of the 3 infants with bilirubin levels ≥ 17 mg/dL recognized that their infants were jaundiced and determined that the jaundice extended below the nipple line. The icterometer readings obtained by the mothers were 2.5, 3, and 3.5. The corresponding icterometer readings by the nurses were 4.5, 3.5, and 3.

The study booklet contained 6 questions about the mothers’ reactions to the study. Almost all of the mothers (98%) responded that the method for checking for caudal progression of jaundice was explained clearly, and even more (99%) felt the use of the icterometer was explained clearly. A total of 69% of the mothers felt it was “very easy” or “easy” to check for caudal progression, and 80% felt it was “very easy” or “easy” to use the icterometer. Forty-six percent of the mothers reported that checking their babies for jaundice made them “very worried” or “somewhat worried” about their babies’ health. Mothers with less education were significantly more likely to report being worried than mothers with higher education levels (P < .05). However, 93% of the mothers reported that checking their babies for jaundice made them “very reassured” or “somewhat reassured” about their babies’ health.

TABLE 1
Maternal assessment of jaundice, by caudal progression and icterometer readings, compared with serum bilirubin levels

 

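| Maternal test result | Serum bilirubin ≥ 12 mg/dL | Serum bilirubin < 12 mg/dL | Serum bilirubin ≥ 17 mg/dL | Serum bilirubin < 17 mg/dL |
|---|---|---|---|---|
| Icterometer ≥ 2.5 | 11 | 14 | 3 | 22 |
| Icterometer < 2.5 | 4 | 26 | 0 | 30 |
| Caudal progression at or above nipple line | 11 | 9 | 3 | 17 |
| Caudal progression below nipple line | 5 | 31 | 0 | 36 |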

TABLE 2
Diagnostic accuracy of maternal visual assessment of jaundice and of the Ingram icterometer

 

| Test | Cut-off (serum bilirubin level, mg/dL) | SN | SP | PV+ | PV- | LR+ | LR- |
|---|---|---|---|---|---|---|---|
| Maternal visual assessment below the nipple line | ≥ 12.0 | 69 | 77 | 55 (CI, 52-58) | 86 (CI, 84-88) | 3.1 | 0.4 |
| Ingram icterometer reading ≥ 2.5 | ≥ 12.0 | 73 | 65 | 44 (CI, 41-47) | 87 (CI, 85-89) | 2.1 | 0.4 |
| Maternal visual assessment below the nipple line | ≥ 17.0 | 100 | 68 | 15 (CI, 13-17) | 100 (CI, 67-100) | 3.12 | 0 |
| Ingram icterometer reading ≥ 2.5 | ≥ 17.0 | 100 | 58 | 12 (CI, 10-14) | 100 (CI, 67-100) | 2.4 | 0 |
SN denotes sensitivity; SP = specificity; PV+ = positive predictive value; PV- = negative predictive value; LR+ = positive likelihood ratio; LR- = negative likelihood ratio; CI = 95% confidence interval.
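
The measures in Table 2 follow from the 2 × 2 counts in Table 1 by the standard definitions, and a short sketch can make the arithmetic explicit. The function name `diagnostic_accuracy` is an assumption of this example. Which Table 1 row counts as the "positive" visual result is also an assumption: it is inferred here from the row whose counts reproduce the values reported in Table 2.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2 x 2 table.

    tp/fn: infants at or above the bilirubin cut-off with a positive/negative test.
    fp/tn: infants below the cut-off with a positive/negative test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Counts taken from Table 1 for the cut-off of >= 12 mg/dL.
icterometer_12 = diagnostic_accuracy(tp=11, fp=14, fn=4, tn=26)
# Assumption: the row with counts 11 and 9 is the positive visual result.
visual_12 = diagnostic_accuracy(tp=11, fp=9, fn=5, tn=31)

for name, result in [("icterometer", icterometer_12), ("visual", visual_12)]:
    summary = ", ".join(f"{key}={value:.3f}" for key, value in result.items())
    print(f"{name}: {summary}")
```

With only 15 or 16 infants at or above 12 mg/dL, each of these proportions shifts noticeably if a single infant is reclassified, so the estimates are best read as approximate.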

Discussion

The ability of mothers to detect and respond to jaundice in their newborns after discharge from the hospital has not been previously studied. Opinions about the value of parental education regarding jaundice vary markedly. The AAP recommends that all mothers be able to recognize signs of jaundice before discharge.9 Others are skeptical that such education will be helpful: “Experience suggests that asking mothers to observe infants for the development of jaundice is not satisfactory. Despite such instructions, it is difficult for many parents to recognize significant jaundice.”10

Several studies have documented that jaundice is first seen in the face and progresses caudally to the trunk and extremities.11-13 These studies also found good correlation between serum bilirubin levels and the advancement of dermal icterus. In a previous study, parents were able to accurately assess the caudal progression of jaundice while their babies were in the hospital.14 However, the bilirubin levels in that study were relatively low, reflecting the brief hospital stay of most of the infants. In contrast, a recent study concluded that the clinical examination for jaundice by nurses and physicians had poor reliability and only moderate correlation with bilirubin levels.15 The authors did conclude, however, that finding no jaundice below the nipple line reliably predicted that an infant would have a bilirubin concentration of less than 12.0 mg/dL. In this study, finding no jaundice below the nipple line reliably predicted that an infant would have a bilirubin concentration of less than 17.0 mg/dL.

Because of the relatively small number of infants having bilirubin levels high enough to require potential intervention, the measures of diagnostic accuracy in the tables should be interpreted with caution. However, the results of my study confirm several prior reports that restricting bilirubin testing to infants with icterometer readings ≥ 2.5 would have safely eliminated many unnecessary tests.6,14,16 Although most of the infants in my study were white, the efficacy of the icterometer has also been documented in Asian and black newborns.17

Previous studies have shown that neonatal jaundice and its treatments are associated with an increased risk of maternal behaviors consistent with the vulnerable child syndrome.18,19 This syndrome was originally described in 1964 in children whose parents believed that their child had suffered a “close call,” and thereafter perceived the child as vulnerable to serious injury or accident.18 Frequent blood tests to monitor bilirubin levels, supplementation or replacement of breast milk with formula, the physical separation of the mother and infant because of phototherapy, and prolonged hospitalization may create the impression that the infant is seriously ill, despite reassurances from medical personnel. Therefore, the mothers were asked whether the study itself served as a source of anxiety. Almost half of the mothers in this study reported that checking their babies for jaundice made them very or somewhat worried about their babies’ health. Some of the women must have felt ambivalent, however, because almost all of them (93%) also reported that checking their babies for jaundice made them very or somewhat reassured about their babies’ health. Most of the 48 comments written by the mothers in the study booklets were very positive.

 

 

Conclusions

One of the strategies recommended by the JCAHO to reduce the risk of kernicterus is to provide parents with adequate educational materials about newborn infants that include information about jaundice.2 The message given to parents should be consistent, and should reassure mothers that most jaundiced infants are basically healthy. My study results suggest that it may also be useful for parents to be shown how to visually assess jaundice or to be given an Ingram icterometer to monitor their infants for jaundice after hospital discharge. Further study is needed to determine the optimal method of parental education about newborn jaundice.

Acknowledgments

This study was funded by a grant from the Ramsey Foundation. The author thanks Laura Lantz, Pamela Ristau, Kim Stone, Annette Swain, Mary Jo Feely, and the nurses at Integrated Home Care for their assistance with this project.

References

 

1. Catz C, Hanson J, Simpson L, Yaffe S. Summary of workshop: early discharge and neonatal hyperbilirubinemia. Pediatrics 1995;96:743-5.

2. Joint Commission on Accreditation of Healthcare Organizations. Sentinel event alert issue 18: kernicterus threatens healthy newborns; April 2001.

3. Provisional Committee for Quality Improvement and Subcommittee on Hyperbilirubinemia. Practice parameter: management of hyperbilirubinemia in the healthy term newborn. Pediatrics 1994;94:558-65.

4. Smith D, Martin D, Inguillo D, Vreman H, Cohen R, Stevenson D. Use of noninvasive tests to predict significant jaundice in full-term infants: preliminary studies. Pediatrics 1985;75:278-80.

5. Schumacher R. Noninvasive measurements of bilirubin in the newborn. Clin Perinatol 1990;17:417-35.

6. Narayanan I, Banwalikar J, Mehta R, et al. A simple method of evaluation of jaundice in the newborn. Ann Trop Paediatr 1990;10:31-4.

7. Yamanouchi I, Yamauchi Y, Igarashi I. Transcutaneous bilirubinometry: preliminary studies of noninvasive transcutaneous bilirubin meter in the Okayama National Hospital. Pediatrics 1980;65:195-202.

8. Knudsen A. Measurement of the yellow colour of the skin as a test of hyperbilirubinemia in mature newborns. Acta Paediatr Scand 1990;79:1175-81.

9. Committee on Fetus and Newborn. Hospital stay for healthy term newborns. Pediatrics 1995;96:788-90.

10. Maisels M, Newman T. Kernicterus in otherwise healthy, breast-fed term newborns. Pediatrics 1995;96:730-3.

11. Ebbesen F. The relationship between the cephalo-pedal progress of clinical icterus and the serum bilirubin concentration in newborn infants without blood type sensitization. Acta Obstet Gynecol Scand 1975;54:329-32.

12. Kramer LI. Advancement of dermal icterus in the jaundiced newborn. Am J Dis Child 1969;118:454-8.

13. Thong YH, Rahman AA, Choo M, Tor ST, Robinson MJ. Dermal icteric zones and serum bilirubin levels in neonatal jaundice. Singapore Med J 1976;17:184-5.

14. Madlon-Kay D. Recognition of the presence and severity of newborn jaundice by parents, nurses, physicians, and icterometer. Pediatrics 1997;100:e3.

15. Moyer V, Ahn C, Sneed S. Accuracy of clinical judgment in neonatal jaundice. Arch Pediatr Adolesc Med 2000;154:391-4.

16. Gosset I. A perspex icterometer for neonates. Lancet 1960;1:87-90.

17. Schumacher R, Thornbery J, Gutcher G. Transcutaneous bilirubinometry: a comparison of old and new methods. Pediatrics 1985;76:10-4.

18. Kemper K, Forsyth B, McCarthy P. Jaundice, terminating breast-feeding, and the vulnerable child. Pediatrics 1989;84:773-8.

19. Kemper K, Forsyth B, McCarthy P. Persistent perceptions of vulnerability following neonatal jaundice. Am J Dis Child 1990;144:238-41.


Issue
The Journal of Family Practice - 51(05)
Page Number
445-448
Display Headline
Maternal assessment of neonatal jaundice after hospital discharge
Legacy Keywords
Jaundice, neonatal; hyperbilirubinemia; perinatal. (J Fam Pract 2002; 51:445–448)

Factors associated with weaning in the first 3 months postpartum


 

ABSTRACT

OBJECTIVE: To determine the demographic, behavioral, and clinical factors associated with breastfeeding termination in the first 12 weeks postpartum.

STUDY DESIGN: This was a prospective cohort study.

POPULATION: Breastfeeding women in Michigan and Nebraska were interviewed by telephone at 3, 6, 9, and 12 weeks postpartum or until breastfeeding termination.

OUTCOMES MEASURED: We measured associations of demographic, clinical, and breastfeeding variables with weaning during the first 12 weeks postpartum.

RESULTS: A total of 946 women participated; 75% breastfed until 12 weeks. Women older than 30 years and women with at least a bachelor’s degree were more likely to continue breastfeeding in any given week. Mastitis, breast or nipple pain, bottle use, and milk expression in the first 3 weeks were all associated with termination. Beyond 3 weeks, women who expressed breast milk were 75% less likely to discontinue breastfeeding than women who did not. Women who used a bottle for some feedings during weeks 4 to 12 were 98% less likely to discontinue breastfeeding than women who did not use a bottle. "Not enough milk" was the most common reason given for termination in weeks 1 through 3 (37%) and weeks 4 through 6 (35%); “return to work” was the most common reason given in weeks 7 through 9 (53%) and weeks 10 through 12 (58%).

CONCLUSIONS: Younger women and less educated women need additional support in their breastfeeding efforts. Counseling and assistance should be provided to women with pain and mastitis. Exclusive breastfeeding for the first 3 weeks should be recommended. After the first 3 weeks, bottles and manual expression are not associated with weaning and may improve the likelihood of continuing breastfeeding, at least until 12 weeks.

 

KEY POINTS FOR CLINICIANS

 

  • Younger and less educated women may need extra support for long-term breastfeeding success.
  • Exclusive breastfeeding for the first 3 weeks decreases the risk of early weaning. At least 7 daily feedings of 10 or more minutes per feeding are recommended.
  • The use of bottles and manual expression of milk after 3 weeks does not increase the risk of early weaning.

Family physicians are strongly encouraged to support and promote breastfeeding, the optimal form of infant nutrition.1 Despite its known benefits (fewer infant infections2-6 and decreased maternal risks of premenopausal breast cancer7 and post-menopausal hip fractures8), only 64% of mothers initiated breastfeeding in 19989 and only 29% of mothers fed their 6-month-old infant by breast, well below the Healthy People 2010 goal of 50% breastfeeding at 6 months.10 Clearly, determining the factors that influence breastfeeding beyond the early postpartum period would be beneficial.

Returning to work is a consistent risk factor for weaning.11-14 The impact of early bottle-feeding on the duration of breastfeeding has been studied with less consistent results.15,20 Insufficient milk supply is a common subjective reason given for termination.15,19,21,22 Older women and those with a higher level of education are at less risk of early breastfeeding termination.9,11,15,16,21,23,24

Few investigators have described how breastfeeding patterns may affect breastfeeding duration. Little is known about the effects of timing, frequency, and duration of individual breastfeedings, or the roles of breast pain and infection, sleep, and manual expression on early weaning. We studied women who indicated their intent to breastfeed prenatally to identify demographic factors and breastfeeding patterns associated with weaning in the first 12 weeks postpartum.

Methods

Population

We interviewed breastfeeding women by telephone at 3, 6, 9, and 12 weeks postpartum to investigate lactation mastitis risk factors and predictors of weaning. Pregnant women intending to breastfeed were recruited from 2 geographic sites between June 1994 and January 1998. In suburban Detroit, Michigan, women attending orientation at a freestanding birthing center were asked to participate. In Omaha, Nebraska, women at a single large company were recruited when applying for maternity leave.

Data collection

During the computer-assisted interview, subjects were asked to recall each of the previous 3 weeks. The initial interview, which collected demographic information, typically lasted 15 to 20 minutes; subsequent interviews were shorter. The survey addressed breastfeeding practices and recent health events. Exclusive breastfeeders were women who fed their infants only by breast. We did not collect information on pacifiers; therefore, exclusively breastfed infants may have also received pacifiers. Women who manually expressed or used a device to assist in expression were classified as “pumping” their breasts. Respondents were asked if they had bottle-fed the infant; they were not asked about bottle contents or volume.

Subjects were queried on potential difficulties including breast or nipple pain while nursing, nipple cracks, and mastitis (diagnosed by a health care provider), as well as other health problems and behaviors. Subjects who had stopped breastfeeding in the previous 3 weeks were asked when and why, given a list of possible explanations and an open-ended opportunity. Respondents could provide multiple reasons for termination.

 

 

Data analysis

Kaplan-Meier estimates describe the distribution of weaning times for the 2 sites. A log-rank test was used to assess group differences. Relationships between demographic factors and time of weaning were assessed by Cox regression analysis. Discrete survival analysis was used to determine whether variables measured on a weekly basis were related to breastfeeding cessation. Hazard ratios describe the association of the exposures between women who stopped breastfeeding at a given time and those who continued. Because breastfeeding cessation was a rare event in later weeks of the study, as were certain clinical or behavioral breastfeeding factors, weeks 4-12 were collapsed into a single interval. Two variables, number of daily feedings and duration of each feeding, were examined only in the first 3 weeks because the information was often missing beyond 3 weeks. All analyses were performed using the Statistical Package for the Social Sciences.25
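
As an illustration of the first step, a Kaplan-Meier estimate of the probability of still breastfeeding can be sketched in a few lines. This is a minimal sketch with hypothetical data, not the authors' SPSS analysis; the function name `kaplan_meier` and the example weeks are assumptions. Women who were still breastfeeding at their last interview are treated as censored.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  week of weaning, or week of last contact if still breastfeeding.
    events: True if the woman weaned at that time, False if censored.
    Returns a list of (week, probability of still breastfeeding) pairs,
    one for each week in which at least one woman weaned.
    """
    weaned = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # weaned or censored at each week
    at_risk = len(times)
    survival = 1.0
    curve = []
    for week in sorted(leaving):
        d = weaned.get(week, 0)
        if d:
            # Multiply by the conditional probability of not weaning this week.
            survival *= 1 - d / at_risk
            curve.append((week, survival))
        # Everyone who weaned or was censored leaves the risk set.
        at_risk -= leaving[week]
    return curve


# HYPOTHETICAL data for 10 women (not study data).
weeks = [2, 3, 3, 5, 8, 12, 12, 12, 12, 12]
weaned_flags = [True, True, False, True, True, False, False, False, False, False]
km_curve = kaplan_meier(weeks, weaned_flags)
for week, prob in km_curve:
    print(f"week {week}: P(still breastfeeding) = {prob:.3f}")
```

Censoring matters here because women lost to follow-up still contribute information for the weeks in which they were known to be breastfeeding.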

Results

Description of subjects

A total of 1057 women agreed to be contacted. Of those, 946 (89.5%) participated in at least 1 interview. Of the 111 women who did not participate, 11 refused and 100 could not be located. Six hundred fifty-eight (69.6%) women completed all 4 interviews. The 56 women who entered the study at week 6 because they could not be reached for the first interview were similar in all factors to women who entered earlier. Of the 946, 711 (75.2%) were from Michigan and 235 (24.8%) were from Nebraska.

Subjects from Michigan were significantly more likely than those from Nebraska to be older than 30 years (52.0% vs 38.3%), have at least a bachelor’s degree (62.9% vs 48.5%), have 3 or more children (38.5% vs 19.6%), and have had a vaginal delivery (99.6% vs 77.0%) (Table W1).* The groups were similar in race, household income, and marital status.

Demographic factors

A total of 673 women (71.1%) continued breastfeeding until 12 weeks; 28% were exclusive breastfeeders. Michigan women were more likely to breastfeed at weeks 2 through 12 than their Nebraskan counterparts (P < .0001, Figure). A college degree was associated with 40% less weaning (Table 1). Age and annual household income were directly related to continued breastfeeding at both sites. Number of children in the household was not associated with termination. Previous breastfeeding experience showed a nonsignificant but consistent trend toward lower weaning risk.

TABLE 1
Relationships of demographics and other characteristics with time to weaning, by site

 

| Characteristic | Michigan women HR* (95% CI) | Nebraska women HR* (95% CI) |
|---|---|---|
| Older than 30 years | 0.5 (0.3, 0.8) | 0.7 (0.5, 1.1) |
| BA/BS or higher | 0.6 (0.4, 0.9) | 0.6 (0.4, 0.8) |
| Number of children in household | | |
|   1 | 1.0 | 1.0 |
|   2 | 1.0 (0.6, 1.6) | 0.7 (0.5, 1.2) |
|   3 or more | 0.6 (0.4, 1.0) | 0.9 (0.6, 1.5) |
| Household income ≥ $50,000 | 0.8 (0.5, 1.3) | 0.7 (0.5, 1.0) |
| Breastfed previously | 0.7 (0.5, 1.1) | 0.7 (0.5, 1.1) |
| Nonvaginal birth | † | 0.9 (0.6, 1.4) |
NOTE: Bold numbers are significant at P < .05.
HR denotes hazard ratio; CI, confidence interval; BA, bachelor of arts degree; BS, bachelor of science degree.
*A hazard ratio of <1 indicates that subjects with this characteristic were less likely to wean during the 12 weeks. Unless otherwise noted, the referent group is the converse (eg, age < 30 years is the referent group for those older than 30 years).
†Too few observations to provide meaningful results.
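
The hazard ratios in Table 1 can be read as percentage changes in the weekly risk of weaning, and a 95% confidence interval that excludes 1.0 corresponds to significance at P < .05. The following is a small sketch of that reading; the helper names are assumptions of this example, and the values are taken from the table.

```python
def percent_change_in_hazard(hr):
    """Percentage change in the hazard of weaning relative to the referent group."""
    return (hr - 1.0) * 100.0


def ci_excludes_one(lower, upper):
    """True when the confidence interval lies entirely on one side of 1.0."""
    return lower > 1.0 or upper < 1.0


# BA/BS or higher, Michigan women: HR 0.6 (0.4, 0.9).
degree_change = percent_change_in_hazard(0.6)
degree_significant = ci_excludes_one(0.4, 0.9)

# Older than 30 years, Nebraska women: HR 0.7 (0.5, 1.1).
age_change = percent_change_in_hazard(0.7)
age_significant = ci_excludes_one(0.5, 1.1)

print(f"Degree: {degree_change:.0f}% change in hazard, CI excludes 1: {degree_significant}")
print(f"Age > 30 (Nebraska): {age_change:.0f}% change in hazard, CI excludes 1: {age_significant}")
```

A hazard ratio of 0.6 therefore corresponds to the 40% lower weaning risk described in the text for women with a college degree.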

 

FIGURE
Probability of breastfeeding, by site, by postpartum week

Clinical and behavioral factors

Because time to weaning differed significantly by site, the survival analyses of clinical and behavioral factors were performed separately for Michigan and Nebraska and controlled for education, age, and previous breastfeeding experience.

During the first 3 weeks, Michigan women with mastitis were nearly 6 times more likely than Michigan women without mastitis to stop breastfeeding in the week of diagnosis (Table 2). Women from Nebraska showed nonsignificant results in the same direction in weeks 4 to 12. (No women from Nebraska with mastitis terminated during weeks 1 through 3.) Although nipple sores and cracks were not associated with weaning, breast pain was associated with weaning. For each day of pain in the first 3 weeks, there was a 10% increase in risk of cessation among Michigan women and a 26% increase among Nebraska women. The association between pain and weaning in weeks 4 through 12 is less clear. In these later weeks, women who reported pain were unexpectedly 75% to 80% more likely to continue breastfeeding than women who did not report pain, yet for Nebraska women the number of days with pain remained significantly associated with breastfeeding cessation.

Subjective depression and breastfeeding cessation were not related. The association between daily sleep and weaning varied by site. During weeks 4 through 12, Michigan women with more daily sleep were less likely to terminate. An opposite but marginally significant trend was observed for Nebraska women. Weaning was not associated with outside household help. Nonvaginal birth was not associated with weaning for Nebraska women. (There were only 2 cesarean sections in the Michigan group.)

 

 

Michigan women who expressed breast milk during the first 3 weeks were twice as likely to stop breastfeeding as those who did not pump. During the same period, Michigan women who used a bottle for some feedings were 9 times more likely to wean than nonbottle users. Respondents in Nebraska showed similar nonsignificant trends in the first 3 weeks. By contrast, during weeks 4 through 12, both Nebraska and Michigan women who pumped were about 75% less likely to wean, while women who used a bottle for some feedings were 98% less likely to stop breastfeeding.

Breast milk expression increased gradually over time, from 30% of women pumping an average of 3 times per day in the first 3 weeks to 45% of women pumping 5 times per day in the last 3 weeks. To determine if pumping and bottle-feeding had an effect independent of pain or mastitis on weaning in the first 3 weeks, we performed additional analyses controlling for pain, cracks and sores, and mastitis in the same week. The results were similar to those presented in Table 2. Michigan women who pumped were 3 times more likely to wean than those who did not pump (hazard ratio [HR] = 3.0, 95% confidence interval [CI], 1.3 - 6.7), while for Nebraska women there was no association between pumping and weaning (HR = 0.6, 95% CI, 0.3 - 1.5). Bottle-feeding was again significantly associated with weaning in weeks 1 through 3 for Michigan women (HR = 10.9, 95% CI, 4.5 - 26.7) and not associated in Nebraskans (HR = 0.8, 95% CI, 0.4 - 2.0).

Duration and frequency of feedings were investigated as weaning risk factors. There appeared to be a threshold for both variables during the first 3 weeks in Michigan women. Michigan women who breastfed less than 10 minutes per feeding were nearly 5 times more likely to stop breastfeeding than women who breastfed longer. Michigan women who breastfed 6 or fewer times per day were 8 times more likely to stop than those who breastfed more often. Results for Nebraska women fell in the same direction but were not statistically significant.

TABLE 2
Relationships of clinical and behavioral factors to breastfeeding cessation in the same week, adjusted for mother’s age, education, and previous breastfeeding experience

| Variable | Week | Michigan women HR (95% CI) | Nebraska women HR (95% CI) |
|---|---|---|---|
| Mastitis | 1 - 3 | 5.7 (1.3 - 25.9) | ‡ |
| | 4 - 12 | ‡ | 2.1 (0.3 - 17.4) |
| Engorgement | 1 - 3 | 0.6 (0.2 - 1.5) | 0.8 (0.3 - 2.1) |
| | 4 - 12 | 3.2 (0.6 - 15.8) (one site only; column unclear in source) | |
| Nipple sores/cracks | 1 - 3 | 1.1 (0.4 - 2.6) | 0.9 (0.4 - 2.3) |
| | 4 - 12 | 2.6 (0.8 - 8.5) | 2.9 (0.8 - 10.7) |
| Any pain † | 1 - 3 | 14.7 (6.8 - 32.0) § | 9.1 (3.9 - 21.2) |
| | 4 - 12 | 0.3 (0.1 - 0.7) | 0.2 (0.1 - 0.5) |
| Days with pain * | 1 - 3 | 1.1 (1.0 - 1.2) | 1.3 (1.0 - 1.5) |
| | 4 - 12 | 1.1 (1.0 - 1.2) | 1.1 (1.0 - 1.2) |
| Returned to work | 1 - 3 | 0.4 (0.1 - 3.0) (one site only; column unclear in source) | |
| | 4 - 12 | 2.1 (1.1 - 4.0) | 0.8 (0.4 - 1.7) |
| Depressed | 1 - 3 | 0.9 (0.3 - 3.0) | 1.0 (0.4 - 2.6) |
| | 4 - 12 | 0.9 (0.4 - 2.2) | 1.3 (0.6 - 2.7) |
| Daily sleep hours | 1 - 3 | 0.9 (0.7 - 1.1) | 0.9 (0.8 - 1.2) |
| | 4 - 12 | 0.7 (0.5 - 0.9) | 1.2 (1.0 - 1.5) |
| Outside household help | 1 - 3 | 2.0 (0.8 - 4.8) | 0.9 (0.4 - 2.1) |
| | 4 - 12 | 0.7 (0.3 - 2.6) | 0.7 (0.2 - 2.1) |
| Pumping | 1 - 3 | 2.2 (1.1 - 4.6) | 1.3 (0.6 - 2.5) |
| | 4 - 12 | 0.2 (0.1 - 0.5) § | 0.3 (0.1 - 0.5) § |
| Bottle feeding | 1 - 3 | 9.5 (4.3 - 21.0) § | 1.8 (0.9 - 3.5) |
| | 4 - 12 | 0.03 (0.003 - 0.2) § | 0.02 (0.004 - 0.1) § |
| Minutes per feeding | 1 - 3 | 1.0 (0.9 - 1.0) | 1.1 (1.0 - 1.1) |
| Less than 10 minutes per feeding | 1 - 3 | 4.8 (1.7 - 13.4) | 2.2 (0.6 - 8.1) |
| Feedings per day | 1 - 3 | 0.7 (0.6 - 0.8) § | 0.9 (0.8 - 1.1) |
| Less than 7 feedings/day | 1 - 3 | 8.1 (3.4 - 19.2) § | 1.8 (0.7 - 4.6) |

NOTE: Bold numbers are significant at P = .05 or less; those marked with § are significant at P = .001 or less.
HR denotes hazard ratio; CI, confidence interval.
* Subjects answered affirmatively to any of the following types of pain: pain when latching on, pain while nursing, pain when not nursing.
† Measured in 3-week periods.
‡ Indicates there were too few observations to provide meaningful results; for example, there were no Nebraska women who had mastitis and stopped breastfeeding in the same week during weeks 1-3.

Subjective factors

At each interview, women who had stopped breastfeeding in the previous 3 weeks were asked why they had made that decision. Most women (75%) provided only one reason. At the first interview, insufficient milk supply (37.3%) and breast pain or mastitis (32.9%) were the most common reasons for termination (Table 3). Insufficient milk supply was the reason most often given (35.0%) during weeks 4 through 6. At both weeks 9 and 12, return to work was the reason given most often (53.1% and 58.3%, respectively).

 

 

TABLE 3
Percentage of women citing given reason for termination of breastfeeding

| Reason | Week 3 (n = 67) | Week 6 (n = 60) | Week 9 (n = 32) | Week 12 (n = 36) |
|---|---|---|---|---|
| Insufficient milk supply | 37.3 | 35.0 | 25.0 | 13.9 |
| Inconvenient | 17.9 | 25.0 | 21.9 | 33.3 |
| Returned to work | 4.5 | 31.7 | 53.1 | 58.3 |
| Breast pain or infection | 32.9 | 23.3 | 0 | 5.6 |
| Baby stopped nursing | 7.5 | 5.0 | 3.1 | 11.1 |
| Other | 22.4 | 18.3 | 3.1 | 5.6 |

NOTE: Percentages total more than 100% because respondents could cite multiple reasons.

Discussion

Mastitis, pain, and days with pain in the first 3 weeks were important clinical factors associated with breastfeeding cessation in this cohort of women who prenatally self-identified as intending to breastfeed. Women who intend to breastfeed should be counseled regarding these possible complications, their temporary nature, prevention, and treatment. Mastitis is not an indication for breastfeeding termination; in fact, increased feedings and milk expression are considered treatment.26,27 Women who reported pain in the first 3 weeks were more likely to stop breastfeeding than women who reported pain after the first 3 weeks. It is difficult to explain this finding; perhaps there are women who have pain during their entire breastfeeding career and yet continue to breastfeed because they are more pain-tolerant, have less severe or frequent pain than those who wean, or are more committed to breastfeeding.

Other clinical factors investigated were depression and daily sleep hours. Weaning was not associated with subjective depression. However, subjects did not undergo formal psychological testing as in the study that reported an association.24 The relationship between daily sleep hours and termination was not consistent, and likely not clinically significant.

The demographic risk factors related to breastfeeding termination in our study are similar to those previously reported,14,15,20,21,23,24 namely, younger maternal age and lower educational level. Investigations of parity have been inconsistent.16,28 We found no association of weaning with parity. Prior breastfeeding experience has been reported as improving breastfeeding rates15,28; our results are consistent with those findings, but not significantly so. All subjects had access to prenatal breastfeeding education and postnatal breastfeeding support, which may have diminished the differences between women with breastfeeding experience and those without experience.20

Michigan and Nebraska women who pumped or bottle-fed during weeks 4 through 12 were significantly less likely to terminate breastfeeding. In contrast, Michigan women who pumped or bottle-fed during the first 3 weeks postpartum were more likely to terminate even after controlling for pain and mastitis. A commitment to exclusive breastfeeding may be necessary in the early postpartum period for long-term success.15,19 To our knowledge, the seemingly protective effect associated with pumping and bottle-feeding after the first 3 weeks has not been previously reported.

Breastfeeding 6 or fewer times per day and feedings of 10 minutes or less were associated with termination during the first 3 weeks. Other studies also indicate that the ratio of breast to bottle feedings is important for long-term success. Feinstein and colleagues15 found that more than one daily bottle of formula supplementation was associated with shorter breastfeeding duration, which was minimized if there were 7 or more breastfeedings per day. Another study found no weaning difference between women who offered their infant only one bottle daily during weeks 2 through 6 and a bottle-avoiding group.17

The most frequent reasons given for termination were similar to those reported by others, namely, insufficient milk supply and return to work.11-15,21,22 Insufficient milk supply was a more common reason in the first few weeks after birth; return to work became an increasingly common reason after week 6.

We were unable to examine the role of pacifiers or smoking in breastfeeding termination because pacifier information was not collected and there were too few smokers for meaningful analysis. Smoking has been consistently reported as associated with early cessation.15,20,29,30 Although pacifier use does not appear to be directly related,31,32 it has been proposed as a marker for breastfeeding problems. The homogeneity of the sample limits our ability to make generalizations regarding other populations, such as women of color. However, the large sample size and the similarity of termination risk factors between 2 different populations of women lend confidence to our conclusions. As we did not assess mothers’ intentions, some of the variables found associated with termination might be intentional activities of weaning rather than risk factors for termination. The significant difference in termination risk between the sites also may be related to mothers’ intentions or level of commitment. The Michigan women may have intended to breastfeed longer from the outset. The Michigan recruitment site was an alternative birthing center. Women being delivered there may be more persistent in their breastfeeding efforts. Both sites provided access to breastfeeding support personnel, but the Michigan women, as a group, may have been more motivated to continue.

Our results provide clinically useful information. Additional support may be needed for younger and less educated women. Special efforts should be made for early diagnosis and treatment of mastitis and breast pain, particularly during the first 3 weeks. Exclusive breastfeeding without bottle supplementation should be recommended for the first 3 weeks, with at least 7 feedings per day. Each feeding should preferably last more than 10 minutes.

 

 

These results should also reassure breastfeeding women and their providers regarding the use of bottles. Bottle-feeding after 3 weeks does not appear to jeopardize breastfeeding success up to 12 weeks and may even improve it.

* Table W1 appears on the JFP Web site at www.jfponline.com.

Acknowledgments

This study was supported by National Institutes of Health grant #30866.

References

 

1. American Academy of Family Physicians. Policies on Health Issues: Infant Health. URL: http://aafp.org/policy/issues/i3.html

2. Beaudry M, Dufour R, Marcoux S. Relation between infant feeding and infections during the first six months of life. J Pediatr 1995;126:696-702.

3. Dewey K, Heinig M, Nommsen-Rivers LA. Differences in morbidity between breast-fed and formula-fed infants. J Pediatr 1995;126:191-7.

4. Duncan B, Ey J, Holberg CJ, Wright AL, Martinez FD, Taussig LM. Exclusive breast-feeding for at least 4 months protects against otitis media. Pediatrics 1993;91:867-72.

5. Raisler J, Alexander C, O’Campo P. Breast-feeding and infant illness: a dose-response relationship? Am J Public Health 2000;90:1478-9.

6. Hanson LA. Breastfeeding provides passive and likely long-lasting active immunity. Ann Allergy Asthma Immunol 1998;81:523-33.

7. Newcomb P, Storer B, Longnecker M, et al. Lactation and a reduced risk of premenopausal breast cancer. N Engl J Med 1994;330:81-7.

8. Cumming RG, Klineberg RJ. Breastfeeding and other reproductive factors and the risk of hip fractures in elderly women. Int J Epidemiol 1993;22:884-91.

9. Mother’s Survey, Ross Products Division, Abbott Laboratories, Inc. Columbus, OH, 1998.

10. U.S. Department of Health and Human Services. Healthy People 2010. (Conference edition in 2 volumes.) Washington, DC: January 2000.

11. Gielen AC, Faden RR, O’Campo P, Brown CH, Paige DM. Maternal employment during the early postpartum period: effects on initiation and continuation of breastfeeding. Pediatrics 1991;87:298-305.

12. Fein SB, Roe B. The effect of work status on initiation and duration of breast-feeding. Am J Public Health 1998;88:1042-6.

13. Kurinij N, Shiono PH, Ezrine SF, Rhoads GG. Does maternal employment affect breast-feeding? Am J Public Health 1989;79:1247-50.

14. Kearney MH, Cronenwett L. Breastfeeding and employment. J Obstet Gynecol Neonatal Nurs 1991;20:471-80.

15. Feinstein JM, Berkelhamer JE, Gruszka ME, Wong CA, Carey AE. Factors related to early termination of breast-feeding in an urban population. Pediatrics 1986;78:210-5.

16. Ryan AS, Wysong JL, Martinez GA, Simon SD. Duration of breast-feeding patterns established in the hospital. Clin Pediatr 1990;29:99-107.

17. Cronenwett L, Strukel T, Kearney M, et al. Single daily bottle use in the early weeks postpartum and breast-feeding outcomes. Pediatrics 1992;90:760-6.

18. Gray-Donald K, Kramer MS, Munday S, Leduc DG. Effect of formula supplementation in the hospital on the duration of breast-feeding; a controlled clinical trial. Pediatrics 1985;75:514-8.

19. Hill PD, Humenick SS, Brennan ML, Woolley D. Does early supplementation affect long-term breastfeeding? Clin Pediatr 1997;June:345-350.

20. Wright HJ, Walker PC. Prediction of duration of breast feeding in primiparas. J Epidemiol Comm Health 1983;37:89-94.

21. Hawkins LM, Nichols FH, Tanner JL. Predictors of the duration of breastfeeding in low-income women. Birth 1987;14:204-9.

22. Hill PD, Aldag JC. Insufficient milk supply among black and white breast-feeding mothers. Res Nurs Health 1993;16:203-11.

23. Kurinij N, Shiono PH, Rhoads GG. Breast-feeding incidence and duration in black and white women. Pediatrics 1988;81:365-71.

24. Cooper PJ, Murray L, Stein A. Psychosocial factors associated with the early termination of breast-feeding. J Psychosom Res 1993;37:171-6.

25. Statistical Package for the Social Sciences. Chicago, IL: SPSS Inc; 1998.

26. Marshall B, Hepper J, Zirbel C. Sporadic mastitis: an infection that need not interrupt lactation. JAMA 1975;233:1377-9.

27. Lawrence R. Mastitis. In: Breastfeeding: a guide for the medical profession. 4th ed. St. Louis: Mosby; 1994.

28. Hill PD, Humenick SS, Argubright T, Aldag JC. Effects of parity and weaning practices on breastfeeding duration. Public Health Nurs 1997;14:227-34.

29. Hill PD, Aldag JC. Smoking and breastfeeding status. Res Nurs Health 1996;19:125-32.

30. Woodward A, Hand K. Smoking and reduced duration of breast-feeding. Med J Australia 1988;148:477-8.

31. Victora CG, Behague DP, Barros FC, Olinto MT, Weiderpass E. Pacifier use and short breastfeeding duration: cause, consequence, or coincidence? Pediatrics 1997;99:445-53.

32. Howard CR, Howard FM, Lanphear B, deBlieck EA, Eberly S, Lawrence RA. The effects of early pacifier use on breastfeeding duration. Pediatrics 1999;103:E33.

Author and Disclosure Information

 

KENDRA SCHWARTZ, MD, MSPH
HANNAH J. S. D’ARCY, MS
BRENDA GILLESPIE, PHD
JANET BOBO, PHD
MARYLOU LONGEWAY, MSN
BETSY FOXMAN, PHD
Detroit, Ann Arbor, and Southfield, Michigan; and Omaha, Nebraska
From the Department of Family Medicine, Wayne State University, Detroit (K.S.); the Center for Statistical Consultation and Research, University of Michigan, Ann Arbor (H.J.S.D., B.G.); the Department of Preventive and Societal Medicine, University of Nebraska, Omaha (J.B.); the Family Birthing Center, Providence Hospital, Southfield, Michigan (M.L.); and the School of Public Health, University of Michigan, Ann Arbor (B.F.). The authors report no competing interests. All requests for reprints should be addressed to Kendra Schwartz, MD, MSPH, Department of Family Medicine, Wayne State University, 101 E. Alexandrine, Detroit, MI 48201. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(05)
Page Number
439-444
Legacy Keywords
Breastfeeding, termination, weaning, cohort study. (J Fam Pract 2002; 51:439–444)

 

ABSTRACT

OBJECTIVE: To determine the demographic, behavioral, and clinical factors associated with breastfeeding termination in the first 12 weeks postpartum.

STUDY DESIGN: This was a prospective cohort study.

POPULATION: Breastfeeding women in Michigan and Nebraska were interviewed by telephone at 3, 6, 9, and 12 weeks postpartum or until breastfeeding termination.

OUTCOMES MEASURED: We measured associations of demographic, clinical, and breastfeeding variables with weaning during the first 12 weeks postpartum.

RESULTS: A total of 946 women participated; 75% breastfed until 12 weeks. Women older than 30 years and women with at least a bachelor’s degree were more likely to continue breastfeeding in any given week. Mastitis, breast or nipple pain, bottle use, and milk expression in the first 3 weeks were all associated with termination. Beyond 3 weeks, women who expressed breast milk were 75% less likely to discontinue breastfeeding than women who did not. Women who used a bottle for some feedings during weeks 4 to 12 were 98% less likely to discontinue breastfeeding than women who did not use a bottle. “Not enough milk” was the most common reason given for termination in weeks 1 through 3 (37%) and weeks 4 through 6 (35%); “return to work” was the most common reason given in weeks 7 through 9 (53%) and weeks 10 through 12 (58%).

CONCLUSIONS: Younger women and less educated women need additional support in their breastfeeding efforts. Counseling and assistance should be provided to women with pain and mastitis. Exclusive breastfeeding for the first 3 weeks should be recommended. After the first 3 weeks, bottles and manual expression are not associated with weaning and may improve the likelihood of continuing breastfeeding, at least until 12 weeks.

 

KEY POINTS FOR CLINICIANS

 

  • Younger and less educated women may need extra support for long-term breastfeeding success.
  • Exclusive breastfeeding for the first 3 weeks decreases the risk of early weaning. At least 7 daily feedings of 10 or more minutes per feeding are recommended.
  • The use of bottles and manual expression of milk after 3 weeks does not increase the risk of early weaning.

Family physicians are strongly encouraged to support and promote breastfeeding, the optimal form of infant nutrition.1 Despite its known benefits (fewer infant infections2-6 and decreased maternal risks of premenopausal breast cancer7 and post-menopausal hip fractures8), only 64% of mothers initiated breastfeeding in 19989 and only 29% of mothers fed their 6-month-old infant by breast, well below the Healthy People 2010 goal of 50% breastfeeding at 6 months.10 Clearly, determining the factors that influence breastfeeding beyond the early postpartum period would be beneficial.

Returning to work is a consistent risk factor for weaning.11-14 The impact of early bottle-feeding on the duration of breastfeeding has been studied with less consistent results.15,20 Insufficient milk supply is a common subjective reason given for termination.15,19,21,22 Older women and those with a higher level of education are at less risk of early breastfeeding termination.9,11,15,16,21,23,24

Few investigators have described how breastfeeding patterns may affect breastfeeding duration. Little is known about the effects of timing, frequency, and duration of individual breastfeedings, or the roles of breast pain and infection, sleep, and manual expression on early weaning. We studied women who indicated their intent to breastfeed prenatally to identify demographic factors and breastfeeding patterns associated with weaning in the first 12 weeks postpartum.

Methods

Population

We interviewed breastfeeding women by telephone at 3, 6, 9, and 12 weeks postpartum to investigate lactation mastitis risk factors and predictors of weaning. Pregnant women intending to breastfeed were recruited from 2 geographic sites between June 1994 and January 1998. In suburban Detroit, Michigan, women attending orientation at a freestanding birthing center were asked to participate. In Omaha, Nebraska, women at a single large company were recruited when applying for maternity leave.

Data collection

During the computer-assisted interview, subjects were asked to recall each of the previous 3 weeks. The initial interview, which collected demographic information, typically lasted 15 to 20 minutes; subsequent interviews were shorter. The survey addressed breastfeeding practices and recent health events. Exclusive breastfeeders were women who fed their infants only by breast. We did not collect information on pacifiers; therefore, exclusively breastfed infants may have also received pacifiers. Women who manually expressed or used a device to assist in expression were classified as “pumping” their breasts. Respondents were asked if they had bottle-fed the infant; they were not asked about bottle contents or volume.

Subjects were queried on potential difficulties including breast or nipple pain while nursing, nipple cracks, and mastitis (diagnosed by a health care provider), as well as other health problems and behaviors. Subjects who had stopped breastfeeding in the previous 3 weeks were asked when and why, given a list of possible explanations and an open-ended opportunity. Respondents could provide multiple reasons for termination.

 

 

Data analysis

Kaplan-Meier estimates describe the distribution of weaning times for the 2 sites. A log-rank test was used to assess group differences. Relationships between demographic factors and time of weaning were assessed by Cox regression analysis. Discrete survival analysis was used to determine whether variables measured on a weekly basis were related to breastfeeding cessation. Hazard ratios describe the association of the exposures between women who stopped breastfeeding at a given time and those who continued. Because breastfeeding cessation was a rare event in later weeks of the study, as were certain clinical or behavioral breastfeeding factors, weeks 4-12 were collapsed into a single interval. Two variables, number of daily feedings and duration of each feeding, were examined only in the first 3 weeks because the information was often missing beyond 3 weeks. All analyses were performed using the Statistical Package for the Social Sciences.25
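For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal sketch of the product-limit calculation. It is not the authors' analysis, which was run in SPSS; the function name and the ten follow-up records are hypothetical and chosen only to show how censored women (those lost to contact or still breastfeeding at the last interview) leave the risk set without counting as weaning events.

```python
from collections import Counter

def kaplan_meier(durations, weaned):
    """Return [(week, estimated probability of still breastfeeding)].

    durations: week of weaning, or of last contact, for each woman
    weaned:    True if she weaned in that week, False if censored
    """
    events = Counter(t for t, e in zip(durations, weaned) if e)
    exits = Counter(durations)  # everyone leaving the risk set, by week
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for week in sorted(exits):
        d = events[week]
        if d:
            # Multiply by the share of at-risk women who did not wean this week
            survival *= 1 - d / at_risk
            curve.append((week, survival))
        at_risk -= exits[week]
    return curve

# Hypothetical follow-up for 10 women (illustrative only, not study data)
weeks = [2, 3, 3, 5, 6, 8, 12, 12, 12, 12]
weaned = [True, True, False, True, False, True, False, False, False, False]

for week, s in kaplan_meier(weeks, weaned):
    print(f"week {week}: {s:.3f}")
```

Plotting such a curve separately for each site gives the kind of comparison that a log-rank test then assesses.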

Results

Description of subjects

A total of 1057 women agreed to be contacted. Of those, 946 (89.5%) participated in at least 1 interview. Of the 111 women who did not participate, 11 refused and 100 could not be located. Six hundred fifty-eight (69.6%) women completed all 4 interviews. The 56 women who entered the study at week 6 because they could not be reached for the first interview were similar in all factors to women who entered earlier. Of the 946, 711 (75.2%) were from Michigan and 235 (24.8%) were from Nebraska.

Subjects from Michigan were significantly more likely than those from Nebraska to be older than 30 years (52.0% vs 38.3%), have at least a bachelor’s degree (62.9% vs 48.5%), have 3 or more children (38.5% vs 19.6%), and have had a vaginal delivery (99.6% vs 77.0%) (Table W1).* The groups were similar in race, household income, and marital status.

Demographic factors

A total of 673 women (71.1%) continued breastfeeding until 12 weeks; 28% were exclusive breastfeeders. Michigan women were more likely to breastfeed at weeks 2 through 12 than their Nebraskan counterparts (P < .0001, Figure). A college degree was associated with 40% less weaning (Table 1). Age and annual household income were directly related to continued breastfeeding at both sites. Number of children in the household was not associated with termination. Previous breastfeeding experience showed a nonsignificant but consistent trend toward lower weaning risk.

TABLE 1
Relationships of demographics and other characteristics with time to weaning, by site

| Characteristic | Michigan women HR* (95% CI) | Nebraska women HR* (95% CI) |
|---|---|---|
| Older than 30 years | 0.5 (0.3, 0.8) | 0.7 (0.5, 1.1) |
| BA/BS or higher | 0.6 (0.4, 0.9) | 0.6 (0.4, 0.8) |
| Number of children in household | | |
|   1 | 1.0 | 1.0 |
|   2 | 1.0 (0.6, 1.6) | 0.7 (0.5, 1.2) |
|   3 or more | 0.6 (0.4, 1.0) | 0.9 (0.6, 1.5) |
| Household income ≥ $50,000 | 0.8 (0.5, 1.3) | 0.7 (0.5, 1.0) |
| Breastfed previously | 0.7 (0.5, 1.1) | 0.7 (0.5, 1.1) |
| Nonvaginal birth | † | 0.9 (0.6, 1.4) |

NOTE: Bold numbers are significant at P < .05.
HR denotes hazard ratio; CI, confidence interval; BA, bachelor of arts degree; BS, bachelor of science degree.
* A hazard ratio of <1 indicates that subjects with this characteristic were less likely to wean during the 12 weeks. Unless otherwise noted, the referent group is the converse (eg, age < 30 years is the referent group for those older than 30 years).
† Too few observations to provide meaningful results.

 

FIGURE
Probability of breastfeeding, by site, by postpartum week

Clinical and behavioral factors

Because time to weaning differed significantly by site, the survival analyses of clinical and behavioral factors were performed separately for Michigan and Nebraska and controlled for education, age, and previous breastfeeding experience.
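The article does not describe how the weekly data were laid out for these analyses. One common arrangement for discrete-time survival analysis, shown here purely as an assumption, is a person-week table: each woman contributes one record for every week she was still breastfeeding, carrying that week's exposure and a flag for whether she weaned in that week. The function and field names below are hypothetical.

```python
def person_weeks(woman_id, last_week, weaned, weekly_exposure):
    """Expand one woman's follow-up into one record per week at risk.

    last_week:       week she weaned, or her last observed week if still breastfeeding
    weaned:          True if she stopped breastfeeding in last_week
    weekly_exposure: dict mapping week -> exposure flag (e.g. pumped that week)
    """
    rows = []
    for week in range(1, last_week + 1):
        rows.append({
            "id": woman_id,
            "week": week,
            # Weeks 4-12 are pooled into one interval, as in the article
            "period": "1-3" if week <= 3 else "4-12",
            "exposed": weekly_exposure.get(week, False),
            "weaned_this_week": weaned and week == last_week,
        })
    return rows

# Hypothetical woman who pumped in weeks 2 and 3 and weaned in week 5
rows = person_weeks("A", 5, True, {2: True, 3: True})
for row in rows:
    print(row)
```

With records in this shape, the exposure in a given week can be compared between women who weaned that week and women who continued, separately for each site and period.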

During the first 3 weeks, Michigan women with mastitis were nearly 6 times more likely than Michigan women without mastitis to stop breastfeeding in the week of diagnosis (Table 2). Women from Nebraska showed nonsignificant results in the same direction in weeks 4 to 12. (No women from Nebraska with mastitis terminated during weeks 1 through 3.) Although nipple sores and cracks were not associated with weaning, breast pain was associated with weaning. For each day of pain in the first 3 weeks, there was a 10% increase in risk of cessation among Michigan women and a 26% increase among Nebraska women. The association between pain and weaning in weeks 4 through 12 is less clear. In these later weeks, women who reported pain were unexpectedly 75% to 80% less likely to wean than women who did not report pain, yet for Nebraska women the number of days with pain remained significantly associated with breastfeeding cessation.


The most frequent reasons given for termination were similar to those reported by others, namely, insufficient milk supply and return to work.11-15,21,22 Insufficient milk supply was a more common reason in the first few weeks after birth; return to work became an increasingly common reason after week 6.

We were unable to examine the role of pacifiers or smoking in breastfeedng termination because pacifier information was not collected and there were too few smokers for meaningful analysis. Smoking has been consistently reported as associated with early cessation.15,20,29,30 Although pacifier use does not appear to be directly related,31,32 it has been proposed as a marker for breastfeeding problems. The homogeneity of the sample limits our ability to make generalizations regarding other populations, such as women of color. However, the large sample size and the similarity of termination risk factors between 2 different populations of women lend confidence to our conclusions. As we did not assess mothers’ intentions, some of the variables found associated with termination might be intentional activities of weaning rather than risk factors for termination. The significant difference in termination risk between the sites also may be related to mothers’ intentions or level of commitment. The Michigan women may have intended to breastfeed longer from the outset. The Michigan recruitment site was an alternative birthing center. Women being delivered there may be more persistent in their breast-feeding efforts. Both sites provided access to breast-feeding support personnel, but the Michigan women, as a group, may have been more motivated to continue.

Our results provide clinically useful information. Additional support may be needed for younger and less educated women. Special efforts should be made for early diagnosis and treatment of mastitis and breast pain, particularly during the first 3 weeks. Exclusive breastfeeding without bottle supplementation should be recommended for the first 3 weeks, with at least 7 feedings per day. Each feeding should preferably last more than 10 minutes.

 

 

These results should also reassure breastfeeding women and their providers regarding the use of bottles. Bottle-feeding after 3 weeks does not appear to jeopardize breastfeeding success up to 12 weeks and may even improve it.

* Table W1 appears on the JFP Web site at www.jfponline.com.

Acknowledgments

This study was supported by National Institutes of Health grant #30866.

 

ABSTRACT

OBJECTIVE: To determine the demographic, behavioral, and clinical factors associated with breastfeeding termination in the first 12 weeks postpartum.

STUDY DESIGN: This was a prospective cohort study.

POPULATION: Breastfeeding women in Michigan and Nebraska were interviewed by telephone at 3, 6, 9, and 12 weeks postpartum or until breastfeeding termination.

OUTCOMES MEASURED: We measured associations of demographic, clinical, and breastfeeding variables with weaning during the first 12 weeks postpartum.

RESULTS: A total of 946 women participated; 75% breastfed until 12 weeks. Women older than 30 years and women with at least a bachelor’s degree were more likely to continue breastfeeding in any given week. Mastitis, breast or nipple pain, bottle use, and milk expression in the first 3 weeks were all associated with termination. Beyond 3 weeks, women who expressed breast milk were 75% less likely to discontinue breastfeeding than women who did not. Women who used a bottle for some feedings during weeks 4 to 12 were 98% less likely to discontinue breastfeeding than women who did not use a bottle. “Not enough milk” was the most common reason given for termination in weeks 1 through 3 (37%) and weeks 4 through 6 (35%); “return to work” was the most common reason given in weeks 7 through 9 (53%) and weeks 10 through 12 (58%).

CONCLUSIONS: Younger women and less educated women need additional support in their breastfeeding efforts. Counseling and assistance should be provided to women with pain and mastitis. Exclusive breastfeeding for the first 3 weeks should be recommended. After the first 3 weeks, bottles and manual expression are not associated with weaning and may improve the likelihood of continuing breastfeeding, at least until 12 weeks.

 

KEY POINTS FOR CLINICIANS

 

  • Younger and less educated women may need extra support for long-term breastfeeding success.
  • Exclusive breastfeeding for the first 3 weeks decreases the risk of early weaning. At least 7 daily feedings of 10 or more minutes per feeding are recommended.
  • The use of bottles and manual expression of milk after 3 weeks does not increase the risk of early weaning.

Family physicians are strongly encouraged to support and promote breastfeeding, the optimal form of infant nutrition.1 Despite its known benefits (fewer infant infections2-6 and decreased maternal risks of premenopausal breast cancer7 and postmenopausal hip fractures8), only 64% of mothers initiated breastfeeding in 1998,9 and only 29% of mothers fed their 6-month-old infant by breast, well below the Healthy People 2010 goal of 50% breastfeeding at 6 months.10 Clearly, determining the factors that influence breastfeeding beyond the early postpartum period would be beneficial.

Returning to work is a consistent risk factor for weaning.11-14 The impact of early bottle-feeding on the duration of breastfeeding has been studied with less consistent results.15,20 Insufficient milk supply is a common subjective reason given for termination.15,19,21,22 Older women and those with a higher level of education are at less risk of early breastfeeding termination.9,11,15,16,21,23,24

Few investigators have described how breastfeeding patterns may affect breastfeeding duration. Little is known about the effects of timing, frequency, and duration of individual breastfeedings, or the roles of breast pain and infection, sleep, and manual expression on early weaning. We studied women who indicated their intent to breastfeed prenatally to identify demographic factors and breastfeeding patterns associated with weaning in the first 12 weeks postpartum.

Methods

Population

We interviewed breastfeeding women by telephone at 3, 6, 9, and 12 weeks postpartum to investigate lactation mastitis risk factors and predictors of weaning. Pregnant women intending to breastfeed were recruited from 2 geographic sites between June 1994 and January 1998. In suburban Detroit, Michigan, women attending orientation at a freestanding birthing center were asked to participate. In Omaha, Nebraska, women at a single large company were recruited when applying for maternity leave.

Data collection

During the computer-assisted interview, subjects were asked to recall each of the previous 3 weeks. The initial interview, which collected demographic information, typically lasted 15 to 20 minutes; subsequent interviews were shorter. The survey addressed breastfeeding practices and recent health events. Exclusive breastfeeders were women who fed their infants only by breast. We did not collect information on pacifiers; therefore, exclusively breastfed infants may have also received pacifiers. Women who manually expressed or used a device to assist in expression were classified as “pumping” their breasts. Respondents were asked if they had bottle-fed the infant; they were not asked about bottle contents or volume.

Subjects were queried on potential difficulties including breast or nipple pain while nursing, nipple cracks, and mastitis (diagnosed by a health care provider), as well as other health problems and behaviors. Subjects who had stopped breastfeeding in the previous 3 weeks were asked when and why they had stopped; they were given a list of possible explanations and an open-ended opportunity to respond. Respondents could provide multiple reasons for termination.

 

 

Data analysis

Kaplan-Meier estimates describe the distribution of weaning times for the 2 sites. A log-rank test was used to assess group differences. Relationships between demographic factors and time of weaning were assessed by Cox regression analysis. Discrete survival analysis was used to determine whether variables measured on a weekly basis were related to breastfeeding cessation. Hazard ratios describe the association of the exposures between women who stopped breastfeeding at a given time and those who continued. Because breastfeeding cessation was a rare event in later weeks of the study, as were certain clinical or behavioral breastfeeding factors, weeks 4-12 were collapsed into a single interval. Two variables, number of daily feedings and duration of each feeding, were examined only in the first 3 weeks because the information was often missing beyond 3 weeks. All analyses were performed using the Statistical Package for the Social Sciences.25
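As a rough illustration of the Kaplan-Meier estimate mentioned above, the following Python sketch computes the probability of still breastfeeding after each week in which at least one woman weaned. The function name `kaplan_meier` and the six observations are hypothetical and are not taken from the study, whose analyses were run in SPSS.

```python
from collections import Counter


def kaplan_meier(observations):
    """Return [(week, probability of still breastfeeding)] for weeks with a weaning event.

    observations: list of (week, weaned) pairs. weaned=False marks a woman who
    was censored at that week (lost to follow-up or still breastfeeding).
    """
    events = Counter(week for week, weaned in observations if weaned)
    exits = Counter(week for week, _ in observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for week in sorted(exits):
        weaned_this_week = events.get(week, 0)
        if weaned_this_week:
            # Multiply by the fraction of at-risk women who did not wean this week.
            survival *= 1 - weaned_this_week / at_risk
            curve.append((week, survival))
        # Everyone who weaned or was censored this week leaves the risk set.
        at_risk -= exits[week]
    return curve


# Hypothetical data for illustration only: (week, weaned?)
sample = [(2, True), (3, True), (3, False), (6, True), (12, False), (12, False)]
curve = kaplan_meier(sample)
for week, probability in curve:
    print(f"week {week}: probability of still breastfeeding = {probability:.3f}")
```

Censored women contribute to the risk set up to the week they leave the study, which is why the estimate differs from a simple proportion of women still breastfeeding.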

Results

Description of subjects

A total of 1057 women agreed to be contacted. Of those, 946 (89.5%) participated in at least 1 interview. Of the 111 women who did not participate, 11 refused and 100 could not be located. Six hundred fifty-eight (69.6%) women completed all 4 interviews. The 56 women who entered the study at week 6 because they could not be reached for the first interview were similar in all factors to women who entered earlier. Of the 946, 711 (75.2%) were from Michigan and 235 (24.8%) were from Nebraska.
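The participation figures above can be checked with a few lines of Python. The counts come from the text; the variable names and the helper `pct` are introduced here only for illustration.

```python
agreed = 1057          # women who agreed to be contacted
participated = 946     # completed at least 1 interview
refused, not_located = 11, 100
completed_all = 658    # completed all 4 interviews
michigan, nebraska = 711, 235


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


rates = {
    "participated": pct(participated, agreed),
    "completed all 4 interviews": pct(completed_all, participated),
    "Michigan": pct(michigan, participated),
    "Nebraska": pct(nebraska, participated),
}
nonparticipants = agreed - participated

for label, value in rates.items():
    print(f"{label}: {value}%")
print(f"nonparticipants: {nonparticipants}")
```

The two site counts sum to the number of participants, and the refusals plus the women who could not be located account for all nonparticipants.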

Subjects from Michigan were significantly more likely than those from Nebraska to be older than 30 years (52.0% vs 38.3%), have at least a bachelor’s degree (62.9% vs 48.5%), have 3 or more children (38.5% vs 19.6%), and have had a vaginal delivery (99.6% vs 77.0%) (Table W1).* The groups were similar in race, household income, and marital status.

Demographic factors

A total of 673 women (71.1%) continued breastfeeding until 12 weeks; 28% were exclusive breastfeeders. Michigan women were more likely to breastfeed at weeks 2 through 12 than their Nebraskan counterparts (P < .0001, Figure). A college degree was associated with 40% less weaning (Table 1). Age and annual household income were directly related to continued breastfeeding at both sites. Number of children in the household was not associated with termination. Previous breastfeeding experience showed a nonsignificant but consistent trend toward lower weaning risk.

TABLE 1
Relationships of demographics and other characteristics with time to weaning, by site

| Characteristic | Michigan women HR* (95% CI) | Nebraska women HR* (95% CI) |
|---|---|---|
| Older than 30 years | 0.5 (0.3, 0.8) | 0.7 (0.5, 1.1) |
| BA/BS or higher | 0.6 (0.4, 0.9) | 0.6 (0.4, 0.8) |
| Number of children in household | | |
|   1 | 1.0 | 1.0 |
|   2 | 1.0 (0.6, 1.6) | 0.7 (0.5, 1.2) |
|   3 or more | 0.6 (0.4, 1.0) | 0.9 (0.6, 1.5) |
| Household income ≥ $50,000 | 0.8 (0.5, 1.3) | 0.7 (0.5, 1.0) |
| Breastfed previously | 0.7 (0.5, 1.1) | 0.7 (0.5, 1.1) |
| Nonvaginal birth | † | 0.9 (0.6, 1.4) |

NOTE: Bold numbers are significant at P < .05.
HR denotes hazard ratio; CI, confidence interval; BA, bachelor of arts degree; BS, bachelor of science degree.
*A hazard ratio of <1 indicates that subjects with this characteristic were less likely to wean during the 12 weeks. Unless otherwise noted, the referent group is the converse (eg, age < 30 years is the referent group for those older than 30 years).
†Too few observations to provide meaningful results.

 

FIGURE
Probability of breastfeeding, by site, by postpartum week

Clinical and behavioral factors

Because time to weaning differed significantly by site, the survival analyses of clinical and behavioral factors were performed separately for Michigan and Nebraska and controlled for education, age, and previous breastfeeding experience.

During the first 3 weeks, Michigan women with mastitis were nearly 6 times more likely than Michigan women without mastitis to stop breastfeeding in the week of diagnosis (Table 2). Women from Nebraska showed nonsignificant results in the same direction in weeks 4 to 12. (No women from Nebraska with mastitis terminated during weeks 1 through 3.) Although nipple sores and cracks were not associated with weaning, breast pain was associated with weaning. For each day of pain in the first 3 weeks, there was a 10% increase in risk of cessation among Michigan women and a 26% increase among Nebraska women. The association between pain and weaning in weeks 4 through 12 is less clear. In these later weeks, women who reported pain were unexpectedly 75% to 80% more likely to continue breastfeeding than women who did not report pain, yet for Nebraska women the number of days with pain remained significantly associated with breastfeeding cessation.

Subjective depression and breastfeeding cessation were not related. The association between daily sleep and weaning varied by site. During weeks 4 through 12, Michigan women with more daily sleep were less likely to terminate. An opposite but marginally significant trend was observed for Nebraska women. Weaning was not associated with outside household help. Nonvaginal birth was not associated with weaning for Nebraska women. (There were only 2 cesarean sections in the Michigan group.)

 

 

Michigan women who expressed breast milk during the first 3 weeks were twice as likely to stop breastfeeding as those who did not pump. During the same period, Michigan women who used a bottle for some feedings were 9 times more likely to wean than nonbottle users. Respondents in Nebraska showed similar nonsignificant trends in the first 3 weeks. By contrast, during weeks 4 through 12, both Nebraska and Michigan women who pumped were about 75% less likely to wean, while women who used a bottle for some feedings were 98% less likely to stop breastfeeding.
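The "percent less likely" statements in this paragraph follow from the hazard ratios in Table 2. The sketch below shows the conversion, assuming the usual reading of a hazard ratio as a multiplicative change in the weekly hazard of weaning. The function name `percent_change_in_hazard` is hypothetical; the ratios are the rounded values published for weeks 4 through 12.

```python
def percent_change_in_hazard(hazard_ratio):
    """Percent change in the hazard of weaning relative to the referent group.

    Negative values mean a lower hazard (less likely to wean).
    """
    return round((hazard_ratio - 1.0) * 100.0, 1)


# Rounded hazard ratios for weeks 4 - 12 as published in Table 2.
weeks_4_to_12 = {
    "Pumping": {"Michigan": 0.2, "Nebraska": 0.3},
    "Bottle feeding": {"Michigan": 0.03, "Nebraska": 0.02},
}

changes = {
    factor: {site: percent_change_in_hazard(hr) for site, hr in sites.items()}
    for factor, sites in weeks_4_to_12.items()
}

for factor, sites in changes.items():
    for site, change in sites.items():
        print(f"{factor}, {site}: {change:+.1f}% hazard of weaning")
```

Because the published ratios are rounded to one significant digit, the pumping values of 0.2 and 0.3 bracket the "about 75%" quoted in the text rather than reproducing it exactly.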

Breast milk expression increased gradually over time, from 30% of women pumping an average of 3 times per day in the first 3 weeks to 45% of women pumping 5 times per day in the last 3 weeks. To determine if pumping and bottle-feeding had an effect independent of pain or mastitis on weaning in the first 3 weeks, we performed additional analyses controlling for pain, cracks and sores, and mastitis in the same week. The results were similar to those presented in Table 2. Michigan women who pumped were 3 times more likely to wean than those who did not pump (hazard ratio [HR] = 3.0, 95% confidence interval [CI], 1.3 - 6.7), while for Nebraska women there was no association between pumping and weaning (HR = 0.6, 95% CI, 0.3 - 1.5). Bottle-feeding was again significantly associated with weaning in weeks 1 through 3 for Michigan women (HR = 10.9, 95% CI, 4.5 - 26.7) and not associated in Nebraskans (HR = 0.8, 95% CI, 0.4 - 2.0).

Duration and frequency of feedings were investigated as weaning risk factors. There appeared to be a threshold for both variables during the first 3 weeks in Michigan women. Michigan women who breastfed less than 10 minutes per feeding were nearly 5 times more likely to stop breastfeeding than women who breastfed longer. Michigan women who breastfed 6 or fewer times per day were 8 times more likely to stop than those who breastfed more often. Results for Nebraska women fell in the same direction but were not statistically significant.

TABLE 2
Relationships of clinical and behavioral factors to breastfeeding cessation in the same week, adjusted for mother’s age, education, and previous breastfeeding experience

| Variable | Week | Michigan women HR (95% CI) | Nebraska women HR (95% CI) |
|---|---|---|---|
| Mastitis | 1 - 3 | 5.7 (1.3 - 25.9) | ‡ |
| | 4 - 12 | ‡ | 2.1 (0.3 - 17.4) |
| Engorgement | 1 - 3 | 0.6 (0.2 - 1.5) | 0.8 (0.3 - 2.1) |
| | 4 - 12 | 3.2 (0.6 - 15.8) [column uncertain] | [column uncertain] |
| Nipple sores/cracks | 1 - 3 | 1.1 (0.4 - 2.6) | 0.9 (0.4 - 2.3) |
| | 4 - 12 | 2.6 (0.8 - 8.5) | 2.9 (0.8 - 10.7) |
| Any pain † | 1 - 3 | 14.7 (6.8 - 32.0) § | 9.1 (3.9 - 21.2) |
| | 4 - 12 | 0.3 (0.1 - 0.7) | 0.2 (0.1 - 0.5) |
| Days with pain* | 1 - 3 | 1.1 (1.0 - 1.2) | 1.3 (1.0 - 1.5) |
| | 4 - 12 | 1.1 (1.0 - 1.2) | 1.1 (1.0 - 1.2) |
| Returned to work | 1 - 3 | 0.4 (0.1 - 3.0) [column uncertain] | [column uncertain] |
| | 4 - 12 | 2.1 (1.1 - 4.0) | 0.8 (0.4 - 1.7) |
| Depressed | 1 - 3 | 0.9 (0.3 - 3.0) | 1.0 (0.4 - 2.6) |
| | 4 - 12 | 0.9 (0.4 - 2.2) | 1.3 (0.6 - 2.7) |
| Daily sleep hours | 1 - 3 | 0.9 (0.7 - 1.1) | 0.9 (0.8 - 1.2) |
| | 4 - 12 | 0.7 (0.5 - 0.9) | 1.2 (1.0 - 1.5) |
| Outside household help | 1 - 3 | 2.0 (0.8 - 4.8) | 0.9 (0.4 - 2.1) |
| | 4 - 12 | 0.7 (0.3 - 2.6) | 0.7 (0.2 - 2.1) |
| Pumping | 1 - 3 | 2.2 (1.1 - 4.6) | 1.3 (0.6 - 2.5) |
| | 4 - 12 | 0.2 (0.1 - 0.5) § | 0.3 (0.1 - 0.5) § |
| Bottle feeding | 1 - 3 | 9.5 (4.3 - 21.0) § | 1.8 (0.9 - 3.5) |
| | 4 - 12 | 0.03 (0.003 - 0.2) § | 0.02 (0.004 - 0.1) § |
| Minutes per feeding | 1 - 3 | 1.0 (0.9, 1.0) | 1.1 (1.0, 1.1) |
| Less than 10 minutes per feeding | 1 - 3 | 4.8 (1.7, 13.4) | 2.2 (0.6, 8.1) |
| Feedings per day | 1 - 3 | 0.7 (0.6, 0.8) § | 0.9 (0.8, 1.1) |
| Less than 7 feedings/day | 1 - 3 | 8.1 (3.4, 19.2) § | 1.8 (0.7, 4.6) |

NOTE: Bold numbers significant at P = .05 or less; those marked with § are significant at P = .001 or less.
HR denotes hazard ratio; CI, confidence interval.
*Subjects answered affirmatively to any of the following types of pain: pain when latching on, pain while nursing, pain when not nursing.
† Measured in 3-week periods.
‡ Indicates there were too few observations to provide meaningful results; for example, there were no Nebraska women who had mastitis and stopped breastfeeding in the same week during weeks 1-3.

Subjective factors

At each interview, women who had stopped breastfeeding in the previous 3 weeks were asked why they had made that decision. Most women (75%) provided only one reason. At the first interview, insufficient milk supply (37.3%) and breast pain or mastitis (32.9%) were the most common reasons for termination (Table 3). Insufficient milk supply was the reason most often given (35.0%) during weeks 4 through 6. At both weeks 9 and 12, return to work was the reason given most often (53.1% and 58.3%, respectively).

 

 

TABLE 3
Percentage of women citing given reason for termination of breastfeeding

| Reason | Week 3 (n = 67) | Week 6 (n = 60) | Week 9 (n = 32) | Week 12 (n = 36) |
|---|---|---|---|---|
| Insufficient milk supply | 37.3 | 35.0 | 25.0 | 13.9 |
| Inconvenient | 17.9 | 25.0 | 21.9 | 33.3 |
| Returned to work | 4.5 | 31.7 | 53.1 | 58.3 |
| Breast pain or infection | 32.9 | 23.3 | 0 | 5.6 |
| Baby stopped nursing | 7.5 | 5.0 | 3.1 | 11.1 |
| Other | 22.4 | 18.3 | 3.1 | 5.6 |

NOTE: Percentages total more than 100% because respondents could cite multiple reasons.
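The note on Table 3 can be checked by summing each column. The short Python sketch below uses the published percentages; the dictionary layout and variable names are only illustrative.

```python
weeks = ("Week 3", "Week 6", "Week 9", "Week 12")

# Percentage of women citing each reason, in the column order given by `weeks`.
table3 = {
    "Insufficient milk supply": (37.3, 35.0, 25.0, 13.9),
    "Inconvenient": (17.9, 25.0, 21.9, 33.3),
    "Returned to work": (4.5, 31.7, 53.1, 58.3),
    "Breast pain or infection": (32.9, 23.3, 0.0, 5.6),
    "Baby stopped nursing": (7.5, 5.0, 3.1, 11.1),
    "Other": (22.4, 18.3, 3.1, 5.6),
}

column_totals = {
    week: round(sum(row[i] for row in table3.values()), 1)
    for i, week in enumerate(weeks)
}

# The most frequently cited reason at each interview.
top_reason = {
    week: max(table3, key=lambda reason: table3[reason][i])
    for i, week in enumerate(weeks)
}

for week in weeks:
    print(f"{week}: total {column_totals[week]}%, most cited: {top_reason[week]}")
```

Every column exceeds 100%, as expected when respondents may cite more than one reason, and the most cited reason shifts from insufficient milk supply to return to work at weeks 9 and 12.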

Discussion

Mastitis, pain, and days with pain in the first 3 weeks were important clinical factors associated with breastfeeding cessation in this cohort of women who prenatally self-identified as intending to breastfeed. Women who intend to breastfeed should be counseled regarding these possible complications, their temporary nature, prevention, and treatment. Mastitis is not an indication for breastfeeding termination; in fact, increased feedings and milk expression are considered treatment.26,27 Women who reported pain in the first 3 weeks were more likely to stop breastfeeding than women who reported pain after the first 3 weeks. It is difficult to explain this finding; perhaps there are women who have pain during their entire breastfeeding career and yet continue to breastfeed because they are more pain-tolerant, have less severe or frequent pain than those who wean, or are more committed to breastfeeding.

Other clinical factors investigated were depression and daily sleep hours. Weaning was not associated with subjective depression. However, subjects did not undergo formal psychological testing as in the study that reported an association.24 The relationship between daily sleep hours and termination was not consistent, and likely not clinically significant.

The demographic risk factors related to breastfeeding termination in our study are similar to those previously reported,14,15,20,21,23,24 namely, younger maternal age and lower educational level. Investigations of parity have been inconsistent.16,28 We found no association of weaning with parity. Prior breastfeeding experience has been reported as improving breastfeeding rates15,28; our results are consistent with those findings, but not significantly so. All subjects had access to prenatal breastfeeding education and postnatal breastfeeding support, which may have diminished the differences between women with breastfeeding experience and those without experience.20

Michigan and Nebraska women who pumped or bottle-fed during weeks 4 through 12 were significantly less likely to terminate breastfeeding. In contrast, Michigan women who pumped or bottle-fed during the first 3 weeks postpartum were more likely to terminate even after controlling for pain and mastitis. A commitment to exclusive breastfeeding may be necessary in the early postpartum period for long-term success.15,19 To our knowledge, the seemingly protective effect associated with pumping and bottle-feeding after the first 3 weeks has not been previously reported.

Breastfeeding 6 or fewer times per day and feedings of less than 10 minutes were associated with termination during the first 3 weeks. Other studies also indicate that the ratio of breast to bottle feedings is important for long-term success. Feinstein and colleagues15 found that more than one daily bottle of formula supplementation was associated with shorter breastfeeding duration, which was minimized if there were 7 or more breastfeedings per day. Another study found no weaning difference between women who offered their infant only one bottle daily during weeks 2 through 6 and a bottle-avoiding group.17

The most frequent reasons given for termination were similar to those reported by others, namely, insufficient milk supply and return to work.11-15,21,22 Insufficient milk supply was a more common reason in the first few weeks after birth; return to work became an increasingly common reason after week 6.

We were unable to examine the role of pacifiers or smoking in breastfeeding termination because pacifier information was not collected and there were too few smokers for meaningful analysis. Smoking has been consistently reported as associated with early cessation.15,20,29,30 Although pacifier use does not appear to be directly related,31,32 it has been proposed as a marker for breastfeeding problems. The homogeneity of the sample limits our ability to make generalizations regarding other populations, such as women of color. However, the large sample size and the similarity of termination risk factors between 2 different populations of women lend confidence to our conclusions. As we did not assess mothers’ intentions, some of the variables found associated with termination might be intentional activities of weaning rather than risk factors for termination. The significant difference in termination risk between the sites also may be related to mothers’ intentions or level of commitment. The Michigan women may have intended to breastfeed longer from the outset. The Michigan recruitment site was an alternative birthing center. Women delivering there may be more persistent in their breastfeeding efforts. Both sites provided access to breastfeeding support personnel, but the Michigan women, as a group, may have been more motivated to continue.

Our results provide clinically useful information. Additional support may be needed for younger and less educated women. Special efforts should be made for early diagnosis and treatment of mastitis and breast pain, particularly during the first 3 weeks. Exclusive breastfeeding without bottle supplementation should be recommended for the first 3 weeks, with at least 7 feedings per day. Each feeding should preferably last at least 10 minutes.

 

 

These results should also reassure breastfeeding women and their providers regarding the use of bottles. Bottle-feeding after 3 weeks does not appear to jeopardize breastfeeding success up to 12 weeks and may even improve it.

* Table W1 appears on the JFP Web site at www.jfponline.com.

Acknowledgments

This study was supported by National Institutes of Health grant #30866.

References

 

1. American Academy of Family Physicians. Policies on Health Issues: Infant Health. URL: http://aafp.org/policy/issues/i3.html

2. Beaudry M, Dufour R, Marcoux S. Relation between infant feeding and infections during the first six months of life. J Pediatr 1995;126:696-702.

3. Dewey K, Heinig M, Nommsen-Rivers LA. Differences in morbidity between breast-fed and formula-fed infants. J Pediatr 1995;126:191-7.

4. Duncan B, Ey J, Holberg CJ, Wright AL, Martinez FD, Taussig LM. Exclusive breast-feeding for at least 4 months protects against otitis media. Pediatrics 1993;91:867-72.

5. Raisler J, Alexander C, O’Campo P. Breast-feeding and infant illness: a dose-response relationship? Am J Public Health 2000;90:1478-9.

6. Hanson LA. Breastfeeding provides passive and likely long-lasting active immunity. Ann Allergy Asthma Immunol 1998;81:523-33.

7. Newcomb P, Storer B, Longnecker M, et al. Lactation and a reduced risk of premenopausal breast cancer. N Engl J Med 1994;330:81-7.

8. Cumming RG, Klineberg RJ. Breastfeeding and other reproductive factors and the risk of hip fractures in elderly women. Int J Epidemiol 1993;22:884-91.

9. Mother’s Survey, Ross Products Division, Abbott Laboratories, Inc. Columbus OH, 1998.

10. U.S. Department of Health and Human Services. Healthy People 2010. (Conference edition in 2 volumes.) Washington, DC: January 2000.

11. Gielen AC, Faden RR, O’Campo P, Brown CH, Paige DM. Maternal employment during the early postpartum period: effects on initiation and continuation of breastfeeding. Pediatrics 1991;87:298-305.

12. Fein SB, Roe B. The effect of work status on initiation and duration of breast-feeding. Am J Public Health 1998;88:1042-6.

13. Kurinij N, Shiono PH, Ezrine SF, Rhoads GG. Does maternal employment affect breast-feeding? Am J Public Health 1989;79:1247-50.

14. Kearney MH, Cronenwett L. Breastfeeding and employment. J Obstet Gynecol Neonatal Nurs 1991;20:471-80.

15. Feinstein JM, Berkelhamer JE, Gruszka ME, Wong CA, Carey AE. Factors related to early termination of breast-feeding in an urban population. Pediatrics 1986;78:210-5.

16. Ryan AS, Wysong JL, Martinez GA, Simon SD. Duration of breast-feeding patterns established in the hospital. Clin Pediatr 1990;29:99-107.

17. Cronenwett L, Stukel T, Kearney M, et al. Single daily bottle use in the early weeks postpartum and breast-feeding outcomes. Pediatrics 1992;90:760-6.

18. Gray-Donald K, Kramer MS, Munday S, Leduc DG. Effect of formula supplementation in the hospital on the duration of breast-feeding; a controlled clinical trial. Pediatrics 1985;75:514-8.

19. Hill PD, Humenick SS, Brennan ML, Woolley D. Does early supplementation affect long-term breastfeeding? Clin Pediatr 1997;June:345-350.

20. Wright HJ, Walker PC. Prediction of duration of breast feeding in primiparas. J Epidemiol Comm Health 1983;37:89-94.

21. Hawkins LM, Nichols FH, Tanner JL. Predictors of the duration of breastfeeding in low-income women. Birth 1987;14:204-9.

22. Hill PD, Aldag JC. Insufficient milk supply among black and white breast-feeding mothers. Res Nurs Health 1993;16:203-11.

23. Kurinij N, Shiono PH, Rhoads GG. Breast-feeding incidence and duration in black and white women. Pediatrics 1988;81:365-71.

24. Cooper PJ, Murray L, Stein A. Psychosocial factors associated with the early termination of breast-feeding. J Psychosom Res 1993;37:171-6.

25. Statistical Package for the Social Sciences. Chicago, IL: SPSS Inc; 1998.

26. Marshall B, Hepper J, Zirbel C. Sporadic mastitis: an infection that need not interrupt lactation. JAMA 1975;233:1377-9.

27. Lawrence R. Mastitis. In: Breastfeeding: a guide for the medical profession. 4th ed. St. Louis: Mosby; 1994.

28. Hill PD, Humenick SS, Argubright T, Aldag JC. Effects of parity and weaning practices on breastfeeding duration. Public Health Nurs 1997;14:227-34.

29. Hill PD, Aldag JC. Smoking and breastfeeding status. Res Nurs Health 1996;19:125-32.

30. Woodward A, Hand K. Smoking and reduced duration of breast-feeding. Med J Australia 1988;148:477-8.

31. Victora CG, Behague DP, Barros FC, Olinto MT, Weiderpass E. Pacifier use and short breastfeeding duration: cause, consequence, or coincidence? Pediatrics 1997;99:445-53.

32. Howard CR, Howard FM, Lanphear B, deBlieck EA, Eberly S, Lawrence RA. The effects of early pacifier use on breastfeeding duration. Pediatrics 1999;103:E33.

Issue
The Journal of Family Practice - 51(05)
Page Number
439-444
Display Headline
Factors associated with weaning in the first 3 months postpartum
Legacy Keywords
Breastfeeding; termination; weaning; cohort study. (J Fam Pract 2002; 51:439–444)

Safety and efficacy of S-adenosylmethionine (SAMe) for osteoarthritis


ABSTRACT

OBJECTIVE: We assessed the efficacy of S-adenosylmethionine (SAMe), a dietary supplement now available in the United States, compared with that of placebo or nonsteroidal anti-inflammatory drugs (NSAIDs) in the treatment of osteoarthritis (OA).

STUDY DESIGN: This was a meta-analysis of randomized controlled trials.

DATA SOURCES: We identified randomized controlled trials of SAMe versus placebo or NSAIDs for the treatment of OA through computerized database searches and reference lists.

OUTCOMES MEASURED: The outcomes considered were pain, functional limitation, and adverse effects.

RESULTS: Eleven studies that met the inclusion criteria were weighted on the basis of precision and were combined for each outcome variable. When compared with placebo, SAMe is more effective in reducing functional limitation in patients with OA (effect size [ES] = .31; 95% confidence interval [CI], .098 to .519), but not in reducing pain (ES = .22; 95% CI, -.247 to .693). This result, however, is based on only 2 studies. SAMe seems to be comparable with NSAIDs (pain: ES = .12; 95% CI, -.029 to .273; functional limitation: ES = .025; 95% CI, -.127 to .176). However, those treated with SAMe were less likely to report adverse effects than those receiving NSAIDs.

CONCLUSIONS: SAMe appears to be as effective as NSAIDs in reducing pain and improving functional limitation in patients with OA without the adverse effects often associated with NSAID therapies.

KEY POINTS FOR CLINICIANS

  • S-adenosylmethionine (SAMe) is as effective as NSAIDs in offering pain relief and improving functional limitation with less risk of side effects.
  • When compared with placebo, SAMe improved functional limitations of osteoarthritis, but there was no improvement in pain.
  • The tolerability of SAMe was similar to that of placebo and greater than that of NSAIDs.

One alternative therapy for osteoarthritis (OA) is S-adenosylmethionine (SAMe), a naturally occurring sulphur-containing physiologic compound synthesized from the amino acid L-methionine and adenosine triphosphate (ATP).1,2 Although scientists are not certain how it works to control pain, SAMe plays a key role in 3 major pathways: transmethylation, transsulfuration, and aminopropylation.2 SAMe was introduced in the United States in 1999 as a dietary supplement to promote joint health, mobility, and joint comfort. On the basis of a 1987 review of 12 clinical studies involving more than 20,000 patients, SAMe has been touted as “the prototype of a new class of safe drugs for the treatment of osteoarthritis.”3 However, the majority of the patients in those studies (97%) were enrolled in a single open field trial.

Although systematic reviews have demonstrated the benefit of other alternative strategies for OA, such as glucosamine and chondroitin,4,5 there has been no systematic review of SAMe for OA. Because individual studies of SAMe vary in their sample sizes and report conflicting results, we conducted a meta-analysis to assess the efficacy of SAMe for OA as compared with that of placebo or NSAIDs. We also examined whether study quality, drug dosage, or length of treatment is associated with the effect, and we identified needs for future research.

Methods

Literature search and data sources

We conducted computerized searches using the term “arthritis” and all synonyms for SAMe: “S-Adenosylmethionine,” “Ademetionine,” “S-adenosyl-L-methionine,” “Adenosyl-l-methionine,” “Samyr,” “Gumbaral,” “Sammy,” and “SAM-e.” Results were then combined into the optimally sensitive search strategy for retrieving all clinical trials.6,7 All languages were included. Our database search included MEDLINE (1966-September 2000), EMBASE (1987-2000), CAMPAIN (Complementary and Alternative Medicine and Pain), Science Citation Index, International Pharmaceutical Abstracts, The Cochrane Complementary Medicine Field Registry, National Institutes of Health Office of Dietary Supplements Database, and Micromedex. We also hand searched the 3 journals with the highest impact factors for rheumatology (Arthritis and Rheumatism, British Journal of Rheumatology, and Journal of Rheumatology, 1985-1999),8 English-language journals from which we had already retrieved articles, and complementary medicine journals (inception to 1999). In addition, we examined bibliographies from retrieved articles, books, and Web sites related to SAMe and contacted manufacturers of SAMe for previously unidentified research studies.

Inclusion criteria

Criteria for inclusion were established a priori. Studies had to include a sample of patients with a diagnosis of OA; be a randomized controlled trial; compare SAMe with placebo or NSAID; and report data for at least 1 of the outcome variables: pain, functional limitation, and adverse effects. Two raters independently screened studies to determine whether they met the inclusion criteria and agreed in their assessments.

Quality assessment and data extraction

Two raters independently rated study quality of the English studies using the 5-point Jadad scale9 that assesses random allocation, double-blinding, and the reporting of withdrawals and dropouts. An additional rating item concerned concealed allocation. Only 1 of the 2 raters assessed the quality of the 4 non-English articles. Two reviewers also independently extracted descriptive information and outcomes that reflected pain, functional impairment, and adverse effects. Any differences in ratings and data extraction were discussed and a consensus was reached.

 

 

For pain and functional impairment we computed the difference in the average response between treatment groups and control groups, standardized to account for differences in the measurement scale across studies. The result is a difference effect size (ES) with a positive ES favoring SAMe. We also applied a correction factor10 that adjusts for the positive bias in the ES estimate for small samples. For the binary outcome of adverse effects, we computed the odds ratio (OR) for the individual trials.11 An OR of less than 1 indicated that adverse effects were less likely with SAMe than with the control.
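The two effect measures described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the variance approximations, and the closed-form correction factor 1 - 3/(4df - 1) are assumptions commonly used for a bias-corrected standardized mean difference, and the article does not state which exact formulas were applied.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Bias-corrected standardized mean difference and its variance.

    Pass the means so that a positive difference favors the treatment
    (for example, use improvement scores rather than raw pain scores).
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the two arms
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Approximate small-sample correction factor (assumed form)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Common large-sample approximation to the variance of g
    var_g = (n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c))
    return g, var_g

def odds_ratio(events_t, n_t, events_c, n_c):
    """Odds ratio and variance of log(OR) for a 2x2 table."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    or_value = (a * d) / (b * c)
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    return or_value, var_log_or

# Hypothetical numbers, for illustration only (not taken from any trial)
g, var_g = hedges_g(2.0, 1.5, 1.0, 1.0, 20, 20)
or_value, var_log_or = odds_ratio(5, 50, 10, 50)
print(g, var_g, or_value, var_log_or)
```

The variance returned by each function is what a pooling step would invert to obtain study weights.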

Heterogeneity in the strategy to measure pain was expected. Individual studies either pooled several pain items (eg, day pain and resting pain) that were rated using a 4- or 5-point rating scale or Visual Analog Scale (VAS), or used a single-item VAS. Functional limitation reflects stiffness, swelling, and joint mobility as rated by the physician according to the degree of joint movement (eg, flexion, extension, abduction, adduction, and rotation). In some studies, this score also included a pain item. Adverse effects refer to patient reports of nonspecific gastrointestinal complaints, mucocutaneous symptoms, and central nervous system disturbances. Finally, a pooled dropout rate because of side effects was computed across studies as a measure of the tolerability of SAMe.

Statistical analysis

Outcomes for each subject measured at multiple time points tend to be correlated, which introduces dependency between corresponding ESs. To avoid this dependency, we computed the ES for the end-of-treatment only, rather than for all time points. Although dependency is also a concern when results are reported for more than one outcome within a study,12-14 we did not control for this. Following the test for homogeneity or consistency within the set of ESs using the Q statistic with α = .10,11 we computed the weighted mean ES with 95% confidence intervals (CI) across studies for each outcome, weighting each ES by the inverse of its variance (which reflects sample size). The choice of a fixed-effects model was dependent on the finding of homogeneity of results.
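As a sketch of the pooling step just described, the following computes an inverse-variance weighted mean ES, its 95% CI, and the Q statistic. The function name and the input values are hypothetical, and the normal-approximation interval (1.96 standard errors) is an assumption about how the CI was formed.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted mean ES, 95% CI, Q statistic, and Q's df."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effects)) / total_w
    # Standard error of the weighted mean under a fixed-effects model
    se = math.sqrt(1 / total_w)
    ci = (mean_es - 1.96 * se, mean_es + 1.96 * se)
    # Q is referred to a chi-square distribution with k - 1 degrees of freedom
    q = sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effects))
    return mean_es, ci, q, len(effects) - 1

# Hypothetical effect sizes and variances, for illustration only
mean_es, ci, q, df = fixed_effect_pool([0.1, 0.3], [0.04, 0.04])
print(mean_es, ci, q, df)
```

A Q value that exceeds the chi-square critical value at α = .10 for the given degrees of freedom would indicate heterogeneity and argue against the fixed-effects model.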

To assess sensitivity of the results, we examined the relationship of the ES to the dosage of SAMe, length of treatment, and study quality rating. Subgroup analyses examined differences related to the location of the OA to estimate the robustness of results. Finally, we assessed potential publication bias informally by using the funnel plot of ES by precision, and statistically through the rank correlation between the standardized ES and standardized study variance.15
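The rank correlation check for publication bias can be illustrated as follows. This is a sketch in the style of a Begg-type test, assuming Kendall's tau is the rank correlation and that each ES is standardized by its deviation from the fixed-effects pooled mean; the article does not give these details, and the function name and data are hypothetical.

```python
import math

def begg_rank_correlation(effects, variances):
    """Kendall's tau between standardized effect sizes and their variances."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * es for w, es in zip(weights, effects)) / total_w
    # Variance of (effect - pooled) is v_i - 1/sum(w); standardize by its root
    std_effects = [
        (es - pooled) / math.sqrt(v - 1 / total_w)
        for es, v in zip(effects, variances)
    ]
    concordant = discordant = 0
    k = len(effects)
    for i in range(k):
        for j in range(i + 1, k):
            s = (std_effects[i] - std_effects[j]) * (variances[i] - variances[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    # Tau-a: net concordance over the number of study pairs
    return (concordant - discordant) / (k * (k - 1) / 2)

# Hypothetical data in which less precise studies report larger effects
tau = begg_rank_correlation([0.1, 0.2, 0.4], [0.01, 0.02, 0.05])
print(tau)
```

A tau near zero is consistent with a symmetric funnel plot, whereas a strong positive or negative tau suggests that effect size depends on study precision.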

Results

Description of studies

Twenty studies were identified through our search and 11 of them16-26 met the inclusion criteria (Table). We excluded one duplicate study27 and one study whose sample included persons with rheumatoid arthritis.28 Other excluded studies compared the routes of administration of SAMe,29 compared SAMe plus ketoprofen with ketoprofen alone,30 or were not randomized controlled trials.31-34 Four of the included studies18,20,21,25 were published in Italian; the others were published in English. The majority of studies (7 of 11) were conducted in Italy.

Quality assessment

Percent agreement between raters for the items on the Jadad scale averaged 87.5%. Following discussion, the raters reached consensus for all items. Using Jadad’s criteria, all studies were rated of high quality (score of 3 or higher), although only 2 studies16,23 included a description of the method of randomization. None of the studies addressed allocation concealment.

Study characteristics

Ten of the 11 studies used a parallel groups design including one with 3 arms19; the 11th one25 used a crossover design (Table W1).* The SAMe dosage in 6 studies was 1200 mg per day orally18,19,22-24,26; 3 studies used 600 mg per day orally17,21,25; and one used 400 mg per day intravenously.20 In one study16 the dosage varied. Duration of treatment ranged from 10 days to 84 days; a duration of 28 or 30 days was used in 8 of the studies. A variety of NSAIDs served as active comparators and 2 studies16,19 used placebo. The studies involved 1442 subjects with a mean age of 60.3 years, of whom 70.1% were women. Mean duration of OA was 5.7 years, ranging from 2.6 years to 9.1 years. In 5 studies, the majority of subjects had OA of the knee; across all studies 54.2% of the subjects had OA of the knee.

TABLE
Characteristics of studies included in meta-analysis

| Study, by first author | Sample size: treatment/control | Jadad score* | SAMe intervention† | Control group |
| --- | --- | --- | --- | --- |
| Bradley16 | 24/24 (site A) | 5 (2+2+1) | (A) 400 mg/day IV for 5 days; | Placebo |
| | 17/17 (site B) | | (B) 600 mg/day for 23 days | |
| Capretto17 | 53/58 | 4 (1+2+1) | 600 mg/day for 30 days | Ibuprofen 1200 mg/day |
| Caroli18 | 30/30 | 4 (1+2+1) | 1200 mg/day for 42 days | Aspirin 3000 mg/day |
| Caruso19 | (1) 248/241 | 4 (1+2+1) | 1200 mg/day for 30 days | (1) Placebo |
| | (2) 248/245 | | | (2) Naproxen 750 mg/day |
| Ceccato20 | 48/47 | 4 (1+2+1) | 400 mg/day IV for 30 days | Ibuprofen 1200 mg/day |
| Cucinotta21 | 20/20 | 4 (1+2+1) | 600 mg/day for 30 days | Ibuprofen 1200 mg/day |
| Maccagno22 | 24/24 | 4 (1+2+1) | 1200 mg/day for 84 days | Piroxicam 20 mg/day |
| Marcolongo23 | 75/75 | 5 (2+2+1) | 1200 mg/day for 30 days | Ibuprofen 1200 mg/day |
| Müller-Fassbender24 | 18/18 | 3 (1+1+1) | 1200 mg/day for 28 days | Ibuprofen 1200 mg/day |
| Pellegrini25 | 50/50 | 3 (1+2+0) | 600 mg/day for 10 days; 5-day washout | Sulindac 200 mg/day |
| Vetter26 | 18/18 | 3 (1+1+1) | 1200 mg/day for 28 days | Indomethacin 150 mg/day |

IV denotes intravenously.
*Numbers in parentheses are randomization + blinding + dropouts.
†Interventions are oral, unless otherwise noted.
 

 

Analysis of outcomes

Pain. Twelve ESs from 7 studies16,18-20,22,23,25 were computed for pain, ranging from -.501 to +.794. Because of borderline heterogeneity of the results for SAMe versus placebo (Q[2] = 5.41; P = .067), a more conservative random effects model was used to compute the mean ES of .223 (P = .352; 95% CI, -.247 to .693). Homogeneity was present for SAMe versus NSAIDs (Q[8] = 9.31; P = .317), and on the basis of a fixed effects model, the weighted mean ES was .122 (P = .057; 95% CI, -.029 to .273). Among the studies of SAMe versus NSAIDs, effect size was not related to study quality (P = .32), length of intervention (P = .31), or dosage of SAMe (P = .97). Finally, there was no evidence of publication bias according to the funnel plot (Figure W1)* or the rank order correlation (P = .297) for studies of SAMe versus NSAIDs.
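To show how a random effects model widens the interval when results are heterogeneous, here is a minimal sketch using the DerSimonian-Laird estimator. That estimator is an assumption; the article says only that a random effects model was used for the SAMe versus placebo pain comparison. The function name and input values are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects mean ES, 95% CI, and tau-squared."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * es for wi, es in zip(w, effects)) / sum_w
    q = sum(wi * (es - fixed_mean) ** 2 for wi, es in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    # Between-study variance, truncated at zero when Q is below its df
    tau2 = max(0.0, (q - df) / c)
    # Each study's weight now includes the between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    mean_es = sum(wi * es for wi, es in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean_es, (mean_es - 1.96 * se, mean_es + 1.96 * se), tau2

# Hypothetical, deliberately heterogeneous inputs
mean_es, ci, tau2 = random_effects_pool([0.0, 1.0], [0.01, 0.01])
print(mean_es, ci, tau2)
```

When tau-squared is zero the result coincides with the fixed-effects estimate; otherwise the standard error grows, which is why the approach is described as more conservative.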

Functional limitation. Six studies17-20,24,26 contributed 10 effect sizes for functional limitation. The length of the intervention phase was 28 days to 42 days for all 6 studies. Only one study19 compared SAMe with placebo (ES = .309; P = .002; 95% CI, .098 to .519). Among the studies comparing SAMe with NSAIDs, there was homogeneity of results (Q[8] = 2.53; P = .96) with a weighted mean ES of .025 (95% CI, -.127 to .176), indicating no difference between SAMe and NSAIDs with respect to functional limitation. There was no relationship of ES to study quality (P = .30), length of treatment (P = .71), or dosage of SAMe (P = .48). Both the funnel plot (Figure W2)** and the rank correlation of standardized ES and variance (P = .097) suggested no evidence of publication bias with respect to the functional limitation outcome for SAMe versus NSAIDs.

Adverse effects. Two studies16,19 reported adverse effects when comparing SAMe with placebo. Results were homogeneous (Q[2] = 2.035; P = .362), with a pooled OR of 1.37 (95% CI, .81 to 2.32). Among the studies comparing SAMe with NSAIDs, results also were homogeneous (Q[6] = 4.41; P = .622), with a pooled OR of .424 (95% CI, .294 to .611). That is, those treated with SAMe were 58% less likely to experience side effects than those treated with NSAIDs. Again, the effect size was not related to quality of study (P = .409), length of treatment (P = .367), or dosage of SAMe (P = .341).
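The 58% figure follows from the pooled OR for SAMe versus NSAIDs. As a short worked example (the function name is hypothetical, and the percentage is strictly a reduction in odds rather than in risk):

```python
def percent_lower_odds(odds_ratio):
    """Percentage reduction in odds implied by an odds ratio below 1."""
    return (1 - odds_ratio) * 100

# Pooled OR and 95% CI limits reported for SAMe versus NSAIDs
point = percent_lower_odds(0.424)
least = percent_lower_odds(0.611)  # upper CI limit gives the smallest reduction
most = percent_lower_odds(0.294)   # lower CI limit gives the largest reduction
print(round(point), round(least), round(most))  # 58 39 71
```

Applying the same conversion to the CI limits gives a range of roughly 39% to 71% lower odds of adverse effects.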

As an additional indication of tolerability we compared the overall dropout rates due to side effects. The dropout rate was highest (6.9%) among those treated with NSAIDs, followed by those receiving placebo (5.0%). The dropout rate for SAMe users was lowest at 2.6%. The only significant difference was between those treated with SAMe and with NSAIDs (P= .001).

Discussion

Results of this meta-analysis indicate that SAMe has a comparable effect to that of NSAIDs in reducing pain and functional limitation. In addition, there was significantly less likelihood of patients reporting adverse effects with the use of SAMe. When SAMe is compared with placebo, however, there is no differential effect on pain according to 2 studies, although there is a small improvement in functional limitation according to one study. This improvement corresponds to a 15% decrease in functional limitation in the SAMe group as compared with placebo. The likelihood of adverse effects was similar in the 2 groups. Given the combined sample sizes in this meta-analysis, there was more than 90% power to detect a moderate difference between groups at a .05 level of significance.

Several reporting issues were noted during the extraction of study data. Some researchers did not adequately describe study dropouts and how they were handled. Sample characteristics may have been reported for the initial sample, but there was no mention of the characteristics of the final sample, so that bias in subject loss could not be assessed in any studies that did not use intention-to-treat analysis. Some authors reported intervention results on the basis of the location of the OA, but only reported characteristics (age, sex, duration of disease) for the full sample. This precluded examining the relationship of intervention effect size to demographic characteristics. Finally, because not all authors provided complete descriptive statistics, we based the computation of the ES for one study on post-test scores only, rather than on the change from baseline, a strategy that could underestimate the ES. This potential underestimation occurred in a study with one of the larger sample sizes that, in turn, would carry more weight in the analysis.

 

 

Limitations

Potential limitations must also be noted in our analysis. First, in 6 of the studies, the SAMe dosage of 1200 mg per day exceeded the dosing recommendations for SAMe. These recommendations include 800 mg per day for 2 weeks followed by 400 mg per day as a maintenance dose, or an increase from 200 mg per day to 1200 mg per day over a 19-day period followed by 400 mg per day thereafter.35 Dosage was not related to the ES, however, in studies comparing SAMe with NSAIDs. Second, most studies used a short intervention (28 to 30 days). It may be that NSAIDs are more effective in the long run, that a longer treatment period is needed for patients to realize the effect of SAMe, or that there are more adverse side effects with SAMe over time. It is not yet clear how effective SAMe is over time. Those studies that did have an intervention longer than 30 days18,22 did not compare SAMe with ibuprofen. In general, concomitant medications for treatment of OA were not permitted, but 3 studies24-26 failed to provide this information. Finally, most of the studies looked at OA of the knee and/or hip, so generalizability of the results to other locations of OA is limited. Although we included subgroup analyses by location of OA, statistical power for subgroup analysis was low because of the smaller number of subjects for whom data were available.

Conclusions

Although SAMe appears to offer pain relief and improve functional limitations associated with OA without the side effects of NSAIDs, it must be remembered that SAMe is not considered a drug in the United States and is therefore not subject to federal regulations. (In contrast, Samyr is a prescription drug in Italy and is available in 200 mg and 400 mg doses.) Recent testing by ConsumerLab.com of over-the-counter brands of SAMe in the United States found, on average, that for 6 of the 13 brands tested, less than half the amount of SAMe stated on the label was actually present.36 Patients who use SAMe in the United States may fail to experience relief because of this dose inconsistency.

We offer several suggestions for further research. First, the long-term effectiveness of SAMe for the treatment of OA has not been investigated in a randomized controlled trial. Since OA is the most prevalent form of arthritis, the long-term effectiveness of SAMe should be assessed in this manner. Second, given that SAMe has been shown to decrease depression,1 it seems prudent to use multivariate techniques to examine both depression and OA outcomes (pain and functional limitation) to determine whether the effect of SAMe is directly on the joint or indirectly mediated through depression. Perhaps in the short term SAMe does decrease pain through decreasing depressive symptoms, but in the long term the effectiveness related to pain may diminish. Third, whether SAMe treats the symptoms of the disease or alters the course of the disease by increasing the production of new cartilage, as suggested by animal models, has not been investigated. Finally, can use of SAMe enhance the effectiveness of other nonpharmacologic modalities? These questions should all be investigated before we can make a determination about the efficacy and safety of SAMe for the treatment of OA.

Acknowledgments

This research was supported by grant #5-P50-AT00084-02 from the National Center for Complementary and Alternative Medicine, National Institutes of Health.

References

1. Gaster B. S-adenosylmethionine (SAMe) for treatment of depression. Altern Med Alert 1999;2:133-5.

2. Stramentinoli G. Pharmacologic aspects of S-adenosylmethionine. Am J Med 1987;83(suppl 5A):35-42.

3. DiPadova C. S-adenosylmethionine in the treatment of osteoarthritis: review of the clinical studies. Am J Med 1987;83(suppl 5A):60-5.

4. Leeb BF, Schweitzer H, Montag K, Smolen JS. A meta-analysis of chondroitin sulfate in the treatment of osteoarthritis. J Rheumatol 2000;27:205-11.

5. McAlindon TE, LaValley MP, Gulin JP, Felson DT. Glucosamine and chondroitin for treatment of osteoarthritis: a systematic quality assessment and meta-analysis. JAMA 2000;283:1469-75.

6. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ 1994;309:1286-91.

7. Jadad AR, Carrol D, Moore A, McQuay H. Developing a database of published reports of randomized controlled trials in pain research. Pain 1996;66:239-46.

8. Journal Citation Report Science Edition, Institute for Scientific Information, 1998.

9. Jadad AR, Carrol D, Moore A, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1-12.

10. Hedges LV. Estimation of effect size from a series of independent experiments. Psychol Bull 1982;92:490-9.

11. Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song J. Methods for meta-analysis in medical research. New York: John Wiley & Sons, Ltd, 2000.

12. Hedges LV, Olkin I. Statistical methods for meta-analysis. New York: Academic Press, 1985.

13. Rosenthal R. Meta-analytic procedures for social research (rev ed). Newbury Park, Calif: Sage Publications, 1991.

14. Glesser LJ, Olkin I. Stochastically dependent effect sizes. In: Cooper H, Hedges LV, eds. The handbook of research synthesis. New York: Russell Sage Foundation, 1994;339-56.

15. Begg CB. Publication bias. In: Cooper H, Hedges LV, eds. The handbook of research synthesis. New York: Russell Sage Foundation, 1994;399-409.

16. Bradley JD, Flusser D, Katz BP, et al. A randomized, double blind, placebo controlled trial of intravenous loading with S-adenosyl-methionine (SAM) followed by oral SAM therapy in patients with knee osteoarthritis. J Rheumatol 1994;21:905-11.

17. Capretto C, Cremona C, Canaparo L. A double-blind controlled study of S-adenosylmethionine (SAMe) v. ibuprofen in gonarthrosis, coxarthrosis and spondylarthrosis. Clin Trials J 1985;22:15-2-43.

18. Caroli A. Studio in doppio cieco SAMe (capsule) - Aspirina nell’osteoartrosi. G Clin Med 1980;61:844-57.

19. Caruso I, Pietrogrande V. Italian double-blind multicenter study comparing S-adenosylmethionine, naproxen, and placebo in the treatment of degenerative joint disease. Am J Med 1987;83(suppl 5A):66-71.

20. Ceccato S, Cucinotta D, Carapezzi C, Ferretti G, Passeri M. Studio clinico in doppio cieco sull’effetto terapeutico della SAMe e dell’ibuprofen nella patologia degenerativa articolare. G Clin Med 1980;61:148-62.

21. Cucinotta D, Mancini M, Ceccato S, Castino E. Studio clinico controllato sull’attivita della SAMe somministrata per via orale nella patologia degenerative osteo-articolare. G Clin Med 1980;61:553-65.

22. Maccagno A, DiGiorgio EE, Caston OL, Sagasta CL. Double-blind controlled clinical trial of oral S-adenosylmethionine versus piroxicam in knee osteoarthritis. Am J Med 1987;83(suppl 5A):72-7.

23. Marcolongo R, Giordano N, Colombo B, et al. Double-blind multicentre study of the activity of S-adenosyl-methionine in hip and knee osteoarthritis. Curr Ther Res 1985;37:82-94.

24. Müller-Fassbender H. Double-blind clinical trial of S-adenosylmethionine versus ibuprofen in the treatment of osteoarthritis. Am J Med 1987;83(suppl 5A):81-3.

25. Pellegrini P. La S-adenosil-metionina (SAMe) nell’osteoartrosi studio in doppio cieco crossover per via orale. G Clin Med 1980;61:616-27.

26. Vetter G. Double-blind comparative clinical trial with S-adenosyl-methionine and indomethacin in the treatment of osteoarthritis. Am J Med 1987;83(suppl 5A):78-80.

27. Glorioso S, Todesco S, Mazzi A, et al. Double-blind multicentre study of the activity of S-adenosylmethionine in hip and knee osteoarthritis. Int J Clin Pharm Res 1985;1:39-49.

28. Polli E, Cortellaro M, Parrini L, Tessari L, Ligniere GC. Aspetti farmacologici e clinici della solfo-adenosil-metionina (SAMe) nella artropatia degenerativa primaria (osteoartrosi). Min Med 1975;66:4443-59.

29. Bach GL, Gmeiner G. Wochen-doppelblindstudie mit ademetionin (Gumbaral®) bei gonarthrose zur ermittlung der äquivalenz intravenöser und oraler dosen. In: Bach GL, Muller-Fassbender H, editors. Arthrose-workshop uber Gumbaral® (Ademetionin). Frankfurt am Main: Verlag GmbH. 1986;23-30.

30. Ceccato S, Cucinotta D, Carapezzi C, Passeri M. Indagine clinica aperta e comparativa sull’impiego della SAMe e del ketoprofen nell’osteoartrosi. Progr Med 1979;35:177-91.

31. Berger R, Nowak H. A new medication approach to the treatment of osteoarthritis: report of an open phase IV study with ademethionine (Gumbaral®). Am J Med 1987;83(suppl 5A):84-8.

32. Konig B. A long-term (two years) clinical trial with S-adenosylmethionine for the treatment of osteoarthritis. Am J Med 1987;83(suppl 5A):89-94.

33. Domljan Z, Vrhovac B, Dürrigl T, Pučar I. A double-blind trial of ademetionine vs naproxen in activated gonarthritis. Int J Clin Pharmacol Ther Toxicol 1989;27:329-33.

34. Montrone F, Fumagalli M, Sarzi Puttini P, et al. Double-blind study of S-adenosyl-methionine versus placebo in hip and knee arthrosis [letter]. Clin Rheumatol 1985;4:484-5.

35. Mitchell D. The SAMe solution. New York: Warner Books, Inc., 1999.

36. ConsumerLab.com. Product review: SAMe. [http://www.consumerlab.com]. Accessed March 11, 2002.

Author and Disclosure Information

KAREN L. SOEKEN, PHD
WEN-LIN LEE, RN, PHD
BARKER R. BAUSELL, PHD
MARIA AGELLI, MD, MS
BRIAN M. BERMAN, MD
Baltimore, Maryland
From the University of Maryland School of Nursing (K.L.S.); the Complementary Medicine Program, University of Maryland, School of Medicine (K.L.S., W.L.L., R.B.B., B.M.B.); and the Department of Epidemiology and Preventive Medicine, University of Maryland, School of Medicine (M.A.), Baltimore. The authors report no conflicts of interest. All requests for reprints should be addressed to Karen L. Soeken, PhD, Complementary Medicine Program, University of Maryland, School of Medicine, Kernan Hospital Mansion, 2200 Kernan Drive, Baltimore, MD 21207-6697. E-mail: [email protected].

Issue
The Journal of Family Practice - 51(05)
Page Number
425-430
Legacy Keywords
S-adenosylmethionine; osteoarthritis; meta-analysis; systematic review [non-MeSH]; complementary therapy [non-MeSH]. (J Fam Pract 2002; 51:425–430)

ABSTRACT

OBJECTIVE: We assessed the efficacy of S-adenosylmethionine (SAMe), a dietary supplement now available in the United States, compared with that of placebo or nonsteroidal anti-inflammatory drugs (NSAIDs) in the treatment of osteoarthritis (OA).

STUDY DESIGN: This was a meta-analysis of randomized controlled trials.

DATA SOURCES: We identified randomized controlled trials of SAMe versus placebo or NSAIDS for the treatment of OA through computerized database searches and reference lists.

OUTCOMES MEASURED: The outcomes considered were pain, functional limitation, and adverse effects.

RESULTS: Eleven studies that met the inclusion criteria were weighted on the basis of precision and were combined for each outcome variable. When compared with placebo, SAMe is more effective in reducing functional limitation in patients with OA (effect size [ES] = .31; 95% confidence interval [CI], .098 - .519), but not in reducing pain (ES = .22; 95% CI, -.247 to .693). This result, however, is based on only 2 studies. SAMe seems to be comparable with NSAIDs (pain: ES = .12; 95% CI, -.029 to .273; functional limitation: ES = .025; 95% CI, -.127 to .176). However, those treated with SAMe were less likely to report adverse effects than those receiving NSAIDs.

CONCLUSIONS: SAMe appears to be as effective as NSAIDs in reducing pain and improving functional limitation in patients with OA without the adverse effects often associated with NSAID therapies.

KEY POINTS FOR CLINICIANS

  • S-adenosylmethionine (SAMe) is as effective as NSAIDs in offering pain relief and improving functional limitation with less risk of side effects.
  • When compared with placebo, SAMe improved functional limitations of osteoarthritis, but there was no improvement in pain.
  • The tolerability of SAMe was similar to that of placebo and greater than that of NSAIDs.

One alternative therapy for osteoarthritis (OA) is S-adenosylmethionine (SAMe), a naturally occurring sulphur-containing physiologic compound synthesized from the amino acid L-methionine and adenosine triphosphate (ATP).1,2 Although scientists are not certain how it works to control pain, SAMe plays a key role in 3 major pathways: transmethylation, transsulfuration, and aminopropylation.2 SAMe was introduced in the United States in 1999 as a dietary supplement to promote joint health, mobility, and joint comfort. On the basis of a 1987 review of 12 clinical studies involving more than 20,000 patients, SAMe has been touted as “the prototype of a new class of safe drugs for the treatment of osteoarthritis.”3 However, the majority of the patients in those studies (97%) were enrolled in a single open field trial.

Although systematic reviews have demonstrated the benefit of other alternative strategies for OA, such as glucosamine and chondroitin,4,5 there has been no systematic review of SAMe for OA. Because individual studies of SAMe vary in their sample sizes and report conflicting results, we conducted a meta-analysis to assess the efficacy of SAMe for OA as compared with that of placebo or NSAIDs. We also examined whether study quality, drug dosage, or length of treatment is associated with the effect, and we identified needs for future research.

Methods

Literature search and data sources

We conducted computerized searches using the term “arthritis” and all synonyms for SAMe: “S-Adenosylmethionine,” “Ademetionine,” “S-adenosyl-L-methionine,” “Adenosyl-l-methionine,” “Samyr,” “Gumbaral,” “Sammy,” and “SAM-e.” Results were then combined with the optimally sensitive search strategy for retrieving all clinical trials.6,7 All languages were included. Our database search included MEDLINE (1966 to September 2000), EMBASE (1987-2000), CAMPAIN (Complementary and Alternative Medicine and Pain), Science Citation Index, International Pharmaceutical Abstracts, The Cochrane Complementary Medicine Field Registry, National Institutes of Health Office of Dietary Supplements Database, and Micromedex. We also hand searched the 3 journals with the highest impact factors for rheumatology (Arthritis and Rheumatism, British Journal of Rheumatology, and Journal of Rheumatology, 1985-1999),8 English-language journals from which we had already retrieved articles, and complementary medicine journals (inception to 1999). In addition, we examined bibliographies from retrieved articles, books, and Web sites related to SAMe and contacted manufacturers of SAMe for previously unidentified research studies.

Inclusion criteria

Criteria for inclusion were established a priori. Studies had to include a sample of patients with a diagnosis of OA; be a randomized controlled trial; compare SAMe with placebo or NSAID; and report data for at least 1 of the outcome variables: pain, functional limitation, and adverse effects. Two raters independently screened studies to determine whether they met the inclusion criteria and agreed in their assessments.

Quality assessment and data extraction

Two raters independently rated study quality of the English studies using the 5-point Jadad scale9 that assesses random allocation, double-blinding, and the reporting of withdrawals and dropouts. An additional rating item concerned concealed allocation. Only 1 of the 2 raters assessed the quality of the 4 non-English articles. Two reviewers also independently extracted descriptive information and outcomes that reflected pain, functional impairment, and adverse effects. Any differences in ratings and data extraction were discussed and a consensus was reached.

 

 

For pain and functional impairment we computed the difference in the average response between treatment groups and control groups, standardized to account for differences in the measurement scale across studies. The result is a difference effect size (ES) with a positive ES favoring SAMe. We also applied a correction factor10 that adjusts for the positive bias in the ES estimate for small samples. For the binary outcome of adverse effects, we computed the odds ratio (OR) for the individual trials.11 An OR of less than 1 indicated that adverse effects were less likely with SAMe than with the control.
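The two effect measures described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the small-sample correction is the common factor J = 1 - 3/(4df - 1), it assumes lower scores mean less pain (so control minus treatment gives a positive ES favoring SAMe), and all numbers are hypothetical.

```python
import math

def corrected_effect_size(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction.

    Assumes lower scores are better (e.g. pain), so a positive value
    favors the treatment group, matching the sign convention in the text.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_c - mean_t) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # assumed form of the correction factor
    g = j * d
    # Common large-sample approximation to the variance of the corrected ES
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g

def odds_ratio(events_t, n_t, events_c, n_c):
    """Odds ratio and standard error of its log from a 2x2 table."""
    a, b = events_t, n_t - events_t  # treatment: events, non-events
    c, d = events_c, n_c - events_c  # control: events, non-events
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_value, se_log_or

# Hypothetical trial: pain score 40 (SD 20) on treatment vs 45 (SD 20) on control
g, var_g = corrected_effect_size(40, 20, 30, 45, 20, 30)
# Hypothetical trial: 10 of 50 treated vs 20 of 50 controls report an adverse effect
or_value, se_log_or = odds_ratio(10, 50, 20, 50)
print(f"ES = {g:.3f}, variance = {var_g:.4f}")
print(f"OR = {or_value:.3f}, SE(log OR) = {se_log_or:.3f}")
```

The odds ratio function breaks down when any cell is zero; a continuity correction would be needed in that case and is left out of this sketch.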

Heterogeneity in the strategy to measure pain was expected. Either individual studies pooled several pain items (eg, day pain and resting pain) that were rated using a 4- or 5-point rating scale or Visual Analog Scale (VAS), or studies used a single-item VAS. Functional limitation reflects stiffness, swelling, and joint mobility as rated by the physician according to the degree of joint movement (eg, flexion, extension, abduction, adduction, and rotation). In some studies, this score also included a pain item. Adverse effects refer to patient reports of nonspecific gastrointestinal complaints, mucocutaneous symptoms, and central nervous systems disturbances. Finally, a pooled dropout rate because of side effects was computed across studies as a measure of the tolerability of SAMe.

Statistical analysis

Outcomes for each subject measured at multiple time points tend to be correlated, which introduces dependency between corresponding ESs. To avoid this dependency, we computed the ES for the end-of-treatment only, rather than for all time points. Although dependency is also a concern when results are reported for more than one outcome within a study,12-14 we did not control for this. Following the test for homogeneity or consistency within the set of ESs using the Q statistic with α = .10,11 we computed the weighted mean ES with 95% confidence intervals (CI) across studies for each outcome, weighting by precision (the inverse of the variance). The choice of a fixed-effects model was dependent on the finding of homogeneity of results.
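The pooling step can be illustrated with a short sketch of inverse-variance weighting under a fixed-effects model, together with the Q statistic for homogeneity. The function name and the three effect sizes are hypothetical and serve only to show the arithmetic.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted mean ES, 95% CI, and Q statistic."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    ci = (mean_es - 1.96 * se, mean_es + 1.96 * se)
    # Q sums the weighted squared deviations from the pooled mean;
    # it is referred to a chi-square distribution with k - 1 df.
    q = sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effects))
    df = len(effects) - 1
    return mean_es, ci, q, df

# Hypothetical effect sizes with equal variances
mean_es, ci, q, df = fixed_effect_pool([0.10, 0.30, 0.20], [0.04, 0.04, 0.04])
print(f"mean ES = {mean_es:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), Q[{df}] = {q:.2f}")
```

A Q value whose P value falls below the chosen α (.10 here) would argue against the fixed-effects model.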

To assess sensitivity of the results, we examined the relationship of the ES to the dosage of SAMe, length of treatment, and study quality rating. Subgroup analyses examined differences related to the location of the OA to estimate the robustness of results. Finally, we assessed potential publication bias informally by using the funnel plot of ES by precision, and statistically through the rank correlation between the standardized ES and standardized study variance.15
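A rank correlation test of this kind can be sketched as below. The sketch assumes the form usually attributed to Begg and Mazumdar, in which each ES is standardized against the pooled mean and then rank-correlated with its variance using Kendall's tau; whether reference 15 uses exactly this form is an assumption, and the data are hypothetical.

```python
import math

def rank_correlation_test(effects, variances):
    """Kendall's tau between standardized ESs and their variances."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * es for w, es in zip(weights, effects)) / total_w
    # Variance of (ES_i - pooled mean) under a fixed-effects model
    adj_var = [v - 1.0 / total_w for v in variances]
    std_es = [(es - pooled) / math.sqrt(av) for es, av in zip(effects, adj_var)]

    n = len(effects)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (std_es[i] - std_es[j]) * (variances[i] - variances[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    # Normal approximation without a tie correction; rough for small n
    z = (concordant - discordant) / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return tau, p_two_sided

# Hypothetical pattern in which less precise studies report larger effects
tau, p_value = rank_correlation_test(
    [0.05, 0.10, 0.20, 0.35, 0.60],
    [0.01, 0.02, 0.04, 0.08, 0.16],
)
print(f"tau = {tau:.2f}, P = {p_value:.3f}")
```

A strong positive correlation, as in this constructed example, is the pattern that would raise concern about publication bias; it mirrors an asymmetric funnel plot.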

Results

Description of studies

Twenty studies were identified through our search and 11 of them16-26 met the inclusion criteria (Table). We excluded one duplicate study27 and one study whose sample included persons with rheumatoid arthritis.28 Other excluded studies compared the routes of administration of SAMe,29 compared SAMe plus ketoprofen with ketoprofen alone,30 or were not randomized controlled trials.31-34 Four of the included studies18,20,21,25 were published in Italian; the others were published in English. The majority of studies (7 of 11) were conducted in Italy.

Quality assessment

Percent agreement between raters for the items on the Jadad scale averaged 87.5%. Following discussion, the raters reached consensus for all items. Using Jadad’s criteria, all studies were rated of high quality (score ≥ 3), although only 2 studies16,23 included a description of the method of randomization. None of the studies addressed allocation concealment.

Study characteristics

Ten of the 11 studies used a parallel groups design including one with 3 arms19; the 11th one25 used a crossover design (Table W1).* The SAMe dosage in 6 studies was 1200 mg per day orally18,19,22-24,26; 3 studies used 600 mg per day orally17,21,25; and one used 400 mg per day intravenously.20 In one study16 the dosage varied. Duration of treatment ranged from 10 days to 84 days; a duration of 28 or 30 days was used in 8 of the studies. A variety of NSAIDs served as active comparators and 2 studies16,19 used placebo. The studies involved 1442 subjects with a mean age of 60.3 years, of whom 70.1% were women. Mean duration of OA was 5.7 years, ranging from 2.6 years to 9.1 years. In 5 studies, the majority of subjects had OA of the knee; across all studies 54.2% of the subjects had OA of the knee.

TABLE
Characteristics of studies included in meta-analysis

| Study, by first author | Sample size: treatment/control | Jadad score* | SAMe intervention† | Control group |
| --- | --- | --- | --- | --- |
| Bradley16 | 24/24 (site A); 17/17 (site B) | 5 (2+2+1) | (A) 400 mg/day IV for 5 days; (B) 600 mg/day for 23 days | Placebo |
| Capretto17 | 53/58 | 4 (1+2+1) | 600 mg/day for 30 days | Ibuprofen 1200 mg/day |
| Caroli18 | 30/30 | 4 (1+2+1) | 1200 mg/day for 42 days | Aspirin 3000 mg/day |
| Caruso19 | (1) 248/241; (2) 248/245 | 4 (1+2+1) | 1200 mg/day for 30 days | (1) Placebo; (2) Naproxen 750 mg/day |
| Ceccato20 | 48/47 | 4 (1+2+1) | 400 mg/day IV for 30 days | Ibuprofen 1200 mg/day |
| Cucinotta21 | 20/20 | 4 (1+2+1) | 600 mg/day for 30 days | Ibuprofen 1200 mg/day |
| Maccagno22 | 24/24 | 4 (1+2+1) | 1200 mg/day for 84 days | Piroxicam 20 mg/day |
| Marcolongo23 | 75/75 | 5 (2+2+1) | 1200 mg/day for 30 days | Ibuprofen 1200 mg/day |
| Müller-Fassbender24 | 18/18 | 3 (1+1+1) | 1200 mg/day for 28 days | Ibuprofen 1200 mg/day |
| Pelligrini25 | 50/50 | 3 (1+2+0) | 600 mg/day for 10 days; 5-day washout | Sulindac 200 mg/day |
| Vetter26 | 18/18 | 3 (1+1+1) | 1200 mg/day for 28 days | Indomethacin 150 mg/day |

IV denotes intravenously.
*Numbers in parentheses are randomization + blinding + dropouts.
†Interventions are oral, unless otherwise noted.
 

 

Analysis of outcomes

Pain. Twelve ESs from 7 studies16,18-20,22,23,25 were computed for pain, ranging from -.501 to +.794. Because of borderline heterogeneity of the results for SAMe versus placebo (Q[2] = 5.41; P = .067), a more conservative random effects model was used to compute the mean ES of .223 (P = .352; 95% CI, -.247 to .693). Homogeneity was present for SAMe versus NSAIDs (Q[8] = 9.31; P = .317), and on the basis of a fixed effects model, the weighted mean ES was .122 (P = .057; 95% CI, -.029 to .273). Among the studies of SAMe versus NSAIDs, effect size was not related to study quality (P = .32), length of intervention (P = .31), or dosage of SAMe (P = .97). Finally, there was no evidence of publication bias according to the funnel plot (Figure W1)* or the rank order correlation (P = .297) for studies of SAMe versus NSAIDs.
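To show why a random effects model is the more conservative choice when results are heterogeneous, the sketch below uses the DerSimonian-Laird estimator of between-study variance. The text does not say which random effects estimator was used, so this choice is an assumption, and the effect sizes are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects mean ES and its standard error."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * es for wi, es in zip(w, effects)) / sum_w
    q = sum(wi * (es - fixed_mean) ** 2 for wi, es in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of between-study variance, floored at zero
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Adding tau2 to every study variance flattens the weights
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    mean_re = sum(wi * es for wi, es in zip(w_re, effects)) / sum_w_re
    se_re = math.sqrt(1.0 / sum_w_re)
    se_fixed = math.sqrt(1.0 / sum_w)
    return mean_re, se_re, se_fixed, tau2

# Hypothetical, visibly heterogeneous effect sizes with equal variances
mean_re, se_re, se_fixed, tau2 = random_effects_pool([-0.2, 0.2, 0.6], [0.04, 0.04, 0.04])
print(f"random-effects mean = {mean_re:.3f}, SE = {se_re:.3f}")
print(f"fixed-effects SE = {se_fixed:.3f}, tau^2 = {tau2:.3f}")
```

Because the between-study variance is added to each study's own variance, the random-effects standard error is never smaller than the fixed-effects one, which widens the confidence interval.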

Functional limitation. Six studies17-20,24,26 contributed 10 effect sizes for functional limitation. The length of the intervention phase was 28 days to 42 days for all 6 studies. Only one study19 compared SAMe with placebo (ES = .309; P = .002; 95% CI, .098 to .519). Among the studies comparing SAMe with NSAIDs, there was homogeneity of results (Q[8] = 2.53; P = .96) with a weighted mean ES of .025 (95% CI, -.127 to .176), indicating no difference between SAMe and NSAIDs with respect to functional limitation. There was no relationship of ES to study quality (P = .30), length of treatment (P = .71), or dosage of SAMe (P = .48). Both the funnel plot (Figure W2)** and the rank correlation of standardized ES and variance (P = .097) suggested no evidence of publication bias with respect to the functional limitation outcome for SAMe versus NSAIDs.
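The P values reported alongside the Q statistics for pain and functional limitation can be checked by hand, because the chi-square upper tail has a closed form when the degrees of freedom are even. The helper below is a hypothetical illustration, not part of the original analysis; it simply recomputes P from the Q and df values given in the text.

```python
import math

def chi2_upper_tail_even_df(q, df):
    """Upper-tail chi-square probability for positive even df.

    Uses P(X > q) = exp(-q/2) * sum_{k=0}^{df/2 - 1} (q/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = q / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# (label, Q, df, P as reported in the text)
reported = [
    ("pain, SAMe vs placebo", 5.41, 2, 0.067),
    ("pain, SAMe vs NSAIDs", 9.31, 8, 0.317),
    ("functional limitation, SAMe vs NSAIDs", 2.53, 8, 0.96),
]
for label, q, df, p_reported in reported:
    p = chi2_upper_tail_even_df(q, df)
    print(f"{label}: Q[{df}] = {q}, computed P = {p:.3f}, reported P = {p_reported}")
```

For odd degrees of freedom the closed form does not apply, and a statistics library would be the practical choice.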

Adverse effects. Two studies16,19 reported adverse effects when comparing SAMe with placebo. Results were homogeneous (Q[2] = 2.035; P = .362), with a pooled OR of 1.37 (95% CI, .81 to 2.32). Among the studies comparing SAMe with NSAIDs, results also were homogeneous (Q[6] = 4.41; P = .622), with a pooled OR of .424 (95% CI, .294 to .611). That is, those treated with SAMe had 58% lower odds of experiencing side effects than those treated with NSAIDs. This effect was not related to quality of study (P = .409), length of treatment (P = .367), or dosage of SAMe (P = .341).
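The 58% figure follows directly from the pooled OR for SAMe versus NSAIDs. The short sketch below shows the arithmetic and applies the same conversion to the confidence limits; the function name is illustrative only.

```python
def percent_lower_odds(odds_ratio):
    """Convert an odds ratio below 1 into a percentage reduction in odds."""
    return (1.0 - odds_ratio) * 100.0

pooled_or, ci_low, ci_high = 0.424, 0.294, 0.611  # values reported in the text

point = percent_lower_odds(pooled_or)
# The upper OR limit gives the smallest reduction, the lower limit the largest
smallest = percent_lower_odds(ci_high)
largest = percent_lower_odds(ci_low)
print(f"odds lower by {point:.1f}% (95% CI, {smallest:.1f}% to {largest:.1f}%)")
```

The reduction applies to the odds of an adverse effect; it approximates a reduction in risk only when adverse effects are uncommon.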

As an additional indication of tolerability we compared the overall dropout rates due to side effects. The dropout rate was highest (6.9%) among those treated with NSAIDs, followed by those receiving placebo (5.0%). The dropout rate for SAMe users was lowest at 2.6%. The only significant difference was between those treated with SAMe and with NSAIDs (P = .001).

Discussion

Results of this meta-analysis indicate that SAMe has an effect comparable to that of NSAIDs in reducing pain and functional limitation. In addition, patients were significantly less likely to report adverse effects with SAMe than with NSAIDs. When SAMe is compared with placebo, however, there is no differential effect on pain according to 2 studies, although there is minimal improvement in functional limitation according to one study. This improvement corresponds to a 15% decrease in functional limitation in the SAMe group as compared with placebo. The likelihood of adverse effects was similar in the 2 groups. Given the combined sample sizes in this meta-analysis, there was more than 90% power to detect a moderate difference between groups at a .05 level of significance.

Several reporting issues were noted during the extraction of study data. Some researchers did not adequately describe study dropouts and how they were handled. Sample characteristics may have been reported for the initial sample, but there was no mention of the characteristics of the final sample, so that bias in subject loss could not be assessed in any studies that did not use intention-to-treat analysis. Some authors reported intervention results on the basis of the location of the OA, but only reported characteristics (age, sex, duration of disease) for the full sample. This precluded examining the relationship of intervention effect size to demographic characteristics. Finally, because not all authors provided complete descriptive statistics, we based the computation of the ES for one study on post-test scores only, rather than on the change from baseline, a strategy that could underestimate the ES. This potential underestimation occurred in a study with one of the larger sample sizes that, in turn, would carry more weight in the analysis.

 

 

Limitations

Potential limitations must also be noted in our analysis. First, in 6 of the studies, the SAMe dosage of 1200 mg per day exceeded the dosing recommendations for SAMe. These recommendations include 800 mg per day for 2 weeks followed by 400 mg per day as a maintenance dose, or an increase from 200 mg per day to 1200 mg per day over a 19-day period followed by 400 mg per day thereafter.35 Dosage was not related to the ES, however, in studies comparing SAMe with NSAIDs. Second, most studies used a short intervention (28 to 30 days). It may be that NSAIDs are more effective in the long run, that a longer treatment period is needed for patients to realize the effect of SAMe, or that there are more adverse side effects with SAMe over time. It is not yet clear how effective SAMe is over time. Those studies that did have an intervention longer than 30 days18,22 did not compare SAMe with ibuprofen. In general, concomitant medications for treatment of OA were not permitted, but 3 studies24-26 failed to provide this information. Finally, most of the studies looked at OA of the knee and/or hip, so generalizability of the results to other locations of OA is limited. Although we included subgroup analyses by location of OA, statistical power for subgroup analysis was low because of the smaller number of subjects for whom data were available.

Conclusions

Although SAMe appears to offer pain relief and improve functional limitations associated with OA without the side effects of NSAIDs, it must be remembered that SAMe is not considered a drug in the United States and is therefore not subject to federal regulations. (In contrast, Samyr is a prescription drug in Italy and is available in 200 mg and 400 mg doses.) Recent testing by ConsumerLab.com of over-the-counter brands of SAMe in the United States found, on average, that for 6 of the 13 brands tested, less than half the amount of SAMe stated on the label was actually present.36 Patients who use SAMe in the United States may fail to experience relief because of this dose inconsistency.

We offer several suggestions for further research. First, the long-term effectiveness of SAMe for the treatment of OA has not been investigated in a randomized controlled trial. Since OA is the most prevalent form of arthritis, the long-term effectiveness of SAMe should be assessed in this manner. Second, given that SAMe has been shown to decrease depression,1 it seems prudent to use multivariate techniques to examine both depression and OA outcomes (pain and functional limitation) to determine whether the effect of SAMe is directly on the joint or indirectly mediated through depression. Perhaps in the short term SAMe does decrease pain through decreasing depressive symptoms, but in the long term the effectiveness related to pain may diminish. Third, whether SAMe treats the symptoms of the disease or alters the course of the disease by increasing the production of new cartilage, as suggested by animal models, has not been investigated. Finally, can use of SAMe enhance the effectiveness of other nonpharmacologic modalities? These questions should all be investigated before we can make a determination about the efficacy and safety of SAMe for the treatment of OA.

Acknowledgments

This research was supported by grant #5-P50-AT00084-02 from the National Center for Complementary and Alternative Medicine, National Institutes of Health.

ABSTRACT

OBJECTIVE: We assessed the efficacy of S-adenosylmethionine (SAMe), a dietary supplement now available in the United States, compared with that of placebo or nonsteroidal anti-inflammatory drugs (NSAIDs) in the treatment of osteoarthritis (OA).

STUDY DESIGN: This was a meta-analysis of randomized controlled trials.

DATA SOURCES: We identified randomized controlled trials of SAMe versus placebo or NSAIDS for the treatment of OA through computerized database searches and reference lists.

OUTCOMES MEASURED: The outcomes considered were pain, functional limitation, and adverse effects.

RESULTS: Eleven studies that met the inclusion criteria were weighted on the basis of precision and were combined for each outcome variable. When compared with placebo, SAMe is more effective in reducing functional limitation in patients with OA (effect size [ES] = .31; 95% confidence interval [CI], .098 - .519), but not in reducing pain (ES = .22; 95% CI, -.247 to .693). This result, however, is based on only 2 studies. SAMe seems to be comparable with NSAIDs (pain: ES = .12; 95% CI, -.029 to .273; functional limitation: ES = .025; 95% CI, -.127 to .176). However, those treated with SAMe were less likely to report adverse effects than those receiving NSAIDs.

CONCLUSIONS: SAMe appears to be as effective as NSAIDs in reducing pain and improving functional limitation in patients with OA without the adverse effects often associated with NSAID therapies.

KEY POINTS FOR CLINICIANS

  • S-adenosylmethionine (SAMe) is as effective as NSAIDs in offering pain relief and improving functional limitation with less risk of side effects.
  • When compared with placebo, SAMe improved functional limitations of osteoarthritis, but there was no improvement in pain.
  • The tolerability of SAMe was similar to that of placebo and greater than that of NSAIDs.

One alternative therapy for osteoarthritis (OA) is Sadenosylmethionine (SAMe), a naturally occurring sulphur-containing physiologic compound synthesized from amino acid L-methionine and adenosine triphosphate (ATP).1,2 Although scientists are not certain how it works to control pain, SAMe plays a key role in 3 major pathways: transmethylation, transsulfuration, and aminopropylation.2 SAMe was introduced in the United States in 1999 as a dietary supplement to promote joint health, mobility, and joint comfort. On the basis of a 1987 review of 12 clinical studies involving more than 20,000 patients, SAMe has been touted as “the prototype of a new class of safe drugs for the treatment of osteoarthritis.”3 However, the majority of the patients in those studies (97%) were enrolled in a single open field trial.

Although systematic reviews have demonstrated the benefit of other alternative strategies for OA, such as glucosamine and chondroitin,4,5 there has been no systematic review of SAMe for OA. Because individual studies of SAMe vary in their sample sizes and report conflicting results, we conducted a meta-analysis to assess the efficacy of SAMe for OA as compared with that of placebo or NSAIDs. We also examined whether study quality, drug dosage, or length of treatment is associated with the effect, and we identified needs for future research.

Methods

Literature search and data sources

We conducted computerized searches using the term “arthritis” and all synonyms for SAMe: “S-Adenosylmethionine,” “Ademetionine,” “S-adenosyl-L-methionine,” “Adenosyl-l-methionine,” “Samyr,” “Gumbaral,” “Sammy,” and “SAM-e.” Results were then combined into the optimally sensitive search strategy for retrieving all clinical trials.6,7 All languages were included. Our database search included MEDLINE (1966- September 2000), EMBASE (1987-2000), CAMPAIN (Complementary and Alternative Medicine and Pain), Science Citation Index, International Pharmaceutical Abstracts, The Cochrane Complementary Medicine Field Registry, National Institutes of Health Office of Dietary Supplements Database, and Micromedix. We also hand searched the 3 journals with the highest impact factors for rheumatology (Arthritis and Rheumatism, British Journal of Rheumatology, and Journal of Rheumatology, 1985-1999),8 English-language journals from which we had already retrieved articles, and complementary medicine journals (inception to 1999). In addition, we examined bibliographies from retrieved articles, books, and Web sites related to SAMe and contacted manufacturers of SAMe for previously unidentified research studies.

Inclusion criteria

Criteria for inclusion were established a priori. Studies had to include a sample of patients with a diagnosis of OA; be a randomized controlled trial; compare SAMe with placebo or NSAID; and report data for at least 1 of the outcome variables: pain, functional limitation, and adverse effects. Two raters independently screened studies to determine whether they met the inclusion criteria and agreed in their assessments.

Quality assessment and data extraction

Two raters independently rated study quality of the English studies using the 5-point Jadad scale9 that assesses random allocation, double-blinding, and the reporting of withdrawals and dropouts. An additional rating item concerned concealed allocation. Only 1 of the 2 raters assessed the quality of the 4 non-English articles. Two reviewers also independently extracted descriptive information and outcomes that reflected pain, functional impairment, and adverse effects. Any differences in ratings and data extraction were discussed and a consensus was reached.

 

 

For pain and functional impairment we computed the difference in the average response between treatment groups and control groups, standardized to account for differences in the measurement scale across studies. The result is a difference effect size (ES) with a positive ES favoring SAMe. We also applied a correction factor10 that adjusts for the positive bias in the ES estimate for small samples. For the binary outcome of adverse effects, we computed the odds ratio (OR) for the individual trials.11 An OR of less than 1 indicated that treatment with SAMe was more effective than the control.

Heterogeneity in the strategy to measure pain was expected. Either individual studies pooled several pain items (eg, day pain and resting pain) that were rated using a 4- or 5-point rating scale or Visual Analog Scale (VAS), or studies used a single-item VAS. Functional limitation reflects stiffness, swelling, and joint mobility as rated by the physician according to the degree of joint movement (eg, flexion, extension, abduction, adduction, and rotation). In some studies, this score also included a pain item. Adverse effects refer to patient reports of nonspecific gastrointestinal complaints, mucocutaneous symptoms, and central nervous systems disturbances. Finally, a pooled dropout rate because of side effects was computed across studies as a measure of the tolerability of SAMe.

Statistical analysis

Outcomes for each subject measured at multiple time points tend to be correlated, which introduces dependency between corresponding ESs. To avoid this dependency, we computed the ES for the end-of-treatment only, rather than for all time points. Although dependency is also a concern when results are reported for more than one outcome within a study,12-14 we did not control for this. Following the test for homogeneity or consistency within the set of ESs using the Q statistic with α = .10,11we computed the weighted mean ES with 95% confidence intervals (CI) across studies for each outcome, weighting for sample size (the inverse of the variance). The choice of a fixed-effects model was dependent on the finding of homogeneity of results.

To assess sensitivity of the results, we examined the relationship of the ES to the dosage of SAMe, length of treatment, and study quality rating. Subgroup analyses examined differences related to the location of the OA to estimate the robustness of results. Finally, we assessed potential publication bias informally by using the funnel plot of ES by precision, and statistically through the rank correlation between the standardized ES and standardized study variance.15

Results

Description of studies

Twenty studies were identified through our search and 11 of them16-26 met the inclusion criteria (Table). We excluded one duplicate study27and one study whose sample included persons with rheumatoid arthritis.28 Other excluded studies compared the routes of administration of SAMe,29 compared SAMe plus ketoprofen with ketoprofen alone,30 or were not randomized controlled trials.31-34 Four of the included studies18,20,21,25 were published in Italian; the others were published in English. The majority of studies (7 of 11) were conducted in Italy.

Quality assessment

Percent agreement between raters for the items on the Jadad scale averaged 87.5%. Following discussion, the raters reached consensus for all items. Using Jadad’s criteria, all studies were rated of high quality (score 3), although only 2 studies16,23 included a description of the method of randomization. None of the studies addressed allocation concealment.

Study characteristics

Ten of the 11 studies used a parallel groups design including one with 3 arms19; the 11th one25 used a crossover design (Table W1).* The SAMe dosage in 6 studies was 1200 mg per day orally18,19,22-24,26; 3 studies used 600 mg per day orally17,21,25; and one used 400 mg per day intravenously.20 In one study16 the dosage varied. Duration of treatment ranged from 10 days to 84 days; a duration of 28 or 30 days was used in 8 of the studies. A variety of NSAIDs served as active comparators and 2 studies16,19 used placebo. The studies involved 1442 subjects with a mean age of 60.3 years, of whom 70.1% were women. Mean duration of OA was 5.7 years, ranging from 2.6 years to 9.1 years. In 5 studies, the majority of subjects had OA of the knee; across all studies 54.2% of the subjects had OA of the knee.

TABLE
Characteristics of studies included in meta-analysis

Study, by first authorSample size: treatment/controlJadad score*SAMe intervention†Control group
Bradley1624/24 (site A)5 (2+2+1)(A) 400 mg/day IV for 5 days;Placebo
17/17 (site B)(B) 600 mg/day for 23 days
Capretto1753/584 (1+2+1)600 mg/day for 30 daysIbuprofen 1200 mg/day
Caroli1830/304 (1+2+1)1200 mg/day for 42 daysAspirin 3000 mg/day
Caruso19(1) 248/2414 (1+2+1)1200 mg/day for 30 days(1) Placebo
(2) 248/245(2) Naproxen 750 mg/day
Ceccato2048/474 (1+2+1)400 mg/day IV for 30 daysIbuprofen 1200 mg/day
Cucinotta2120/204 (1+2+1)600 mg/day for 30 daysIbuprofen 1200 mg/day
Maccagno2224/244 (1+2+1)1200 mg/day for 84 daysPiroxicam 20 mg/day
Marcolongo2375/755 (2+2+1)1200 mg/day for 30 daysIbuprofen 1200 mg/day
Müller-Fassbender2418/183 (1+1+1)1200 mg/day for 28 daysIbuprofen 1200 mg/day
Pelligrini2550/503 (1+2+0)600 mg/day for10 days; 5-day washoutSulindac 200 mg/day
Vetter2618/183 (1+1+1)1200 mg/day for 28 daysIndomethacin 150 mg/day
IV denotes intravenously.
*Numbers in parentheses are randomization + blinding + dropouts.
†Interventions are oral, unless otherwise noted.
 

 

Analysis of outcomes

Pain. Twelve ESs from 7 studies16,18-20,22,23,25 were computed for pain, ranging from -.501 to +.794. Because of borderline heterogeneity of the results for SAMe versus placebo (Q[2] = 5.41; P= .067), a more conservative random effects model was used to compute the mean ES of .223 (P= .352; 95% CI, -.247 to .693). Homogeneity was present for SAMe versus NSAIDs (Q[8] = 9.31, P= .317) and on the basis of a fixed effects model, the weighted mean ES was .122 (P= .057; 95% CI, -.029 to .273). Among the studies of SAMe versus NSAIDs, effect size was not related to study quality (P= .32), length of intervention (P= .31), or dosage of SAMe (P= .97). Finally, there was no evidence of publication bias according to the funnel P lot (Figure W1)* or the rank order correlation (P= .297) for studies of SAMe versus NSAIDs.

Functional limitation. Six studies17-20,24,26 contributed 10 effect sizes for functional limitation. The length of the intervention phase was 28 days to 42 days for all 6 studies. Only one study19 compared SAMe with placebo (ES = .309; P= .002; 95% CI, .098 - .519). Among the studies comparing SAMe with NSAIDs, there was homogeneity of results (Q[8] = 2.53; P= .96) with a weighted mean ES of .025 (95% CI, -.127 to .176), indicating no difference between SAMe and NSAIDs with respect to functional limitation. There was no relationship of ES to study quality (P = .30), length of treatment (P= .71), or dosage of SAMe (P= .48). Both the funnel plot (Figure W2)** and the rank correlation of standardized ES and variance (P= .097) suggested no evidence of publication bias with respect to the functional limitation outcome for SAMe versus NSAIDs.

Adverse effects. Two studies16,19 reported adverse effects when comparing SAMe with placebo. Results were homogenous (Q[2] = 2.035; P= .362), with a pooled OR of 1.37 (95% CI, .81 - 2.32). Among the studies comparing SAMe with NSAIDs results also were homogeneous (Q[6] = 4.41; P =.622), with a pooled OR of .424 (95% CI, .294 - .611). Again, the effect size was not related to quality of study (P= .409), length of treatment (P= .367), or dosage of SAMe (P= .341). That is, those treated with SAMe were 58% less likely to experience side effects than those treated with NSAIDs. Further, this was independent of study quality, dosage of SAMe, or the length of the intervention.

As an additional indication of tolerability we compared the overall dropout rates due to side effects. The dropout rate was highest (6.9%) among those treated with NSAIDs, followed by those receiving placebo (5.0%). The dropout rate for SAMe users was lowest at 2.6%. The only significant difference was between those treated with SAMe and with NSAIDs (P= .001).

Discussion

Results of this meta-analysis indicate that SAMe has a comparable effect to that of NSAIDs in reducing pain and functional limitation. In addition, there was significantly less likelihood of patients reporting adverse effects with the use of SAMe. When SAMe is compared with placebo, however, there is no differential effect on pain according to 2 studies, although there is minimal improved functional limitation according to one study. This improvement corresponds to a 15% decrease in functional limitation in the SAMe group as compared with placebo. The likelihood of adverse effects was similar in the 2 groups. Given the combined sample sizes in this meta-analysis, there was a more than 90% power to detect a moderate difference between groups at a .05 level of significance.

Several reporting issues were noted during the extraction of study data. Some researchers did not adequately describe study dropouts and how they were handled. Sample characteristics may have been reported for the initial sample, but there was no mention of the characteristics of the final sample, so that bias in subject loss could not be assessed in any studies that did not use intention-to-treat analysis. Some authors reported intervention results on the basis of the location of the OA, but only reported characteristics (age, sex, duration of disease) for the full sample. This precluded examining the relationship of intervention effect size to demographic characteristics. Finally, because not all authors provided complete descriptive statistics, we based the computation of the ES for one study on post-test scores only, rather than on the change from baseline, a strategy that could underestimate the ES. This potential underestimation occurred in a study with one of the larger sample sizes that, in turn, would carry more weight in the analysis.

 

 

Limitations

Potential limitations must also be noted in our analysis. First, in 6 of the studies, the SAMe dosage of 1200 mg per day exceeded the dosing recommendations for SAMe. These recommendations include 800 mg per day for 2 weeks followed by 400 mg per day as a maintenance dose, or an increase from 200 mg per day to 1200 mg per day over a 19-day period followed by 400 mg per day thereafter.35 Dosage was not related to the ES, however, in studies comparing SAMe with NSAIDs. Second, most studies used a short intervention (28 to 30 days). It may be that NSAIDs are more effective in the long run, that a longer treatment period is needed for patients to realize the effect of SAMe, or that there are more adverse side effects with SAMe over time. It is not yet clear how effective SAMe is over time. Those studies that did have an intervention longer than 30 days18,22 did not compare SAMe with ibuprofen. Third, concomitant medications for treatment of OA were generally not permitted, but 3 studies24-26 failed to provide this information. Finally, most of the studies looked at OA of the knee and/or hip, so generalizability of the results to other locations of OA is limited. Although we included subgroup analyses by location of OA, statistical power for subgroup analysis was low because of the smaller number of subjects for whom data were available.

Conclusions

Although SAMe appears to offer pain relief and improve functional limitations associated with OA without the side effects of NSAIDs, it must be remembered that SAMe is not considered a drug in the United States and is therefore not subject to federal regulations. (In contrast, Samyr is a prescription drug in Italy and is available in 200 mg and 400 mg doses.) Recent testing by ConsumerLab.com of over-the-counter brands of SAMe in the United States found, on average, that for 6 of the 13 brands tested, less than half the amount of SAMe stated on the label was actually present.36 Patients who use SAMe in the United States may fail to experience relief because of this dose inconsistency.

We offer several suggestions for further research. First, the long-term effectiveness of SAMe for the treatment of OA has not been investigated in a randomized controlled trial. Since OA is the most prevalent form of arthritis, the long-term effectiveness of SAMe should be assessed in this manner. Second, given that SAMe has been shown to decrease depression,1 it seems prudent to use multivariate techniques to examine both depression and OA outcomes (pain and functional limitation) to determine whether the effect of SAMe is directly on the joint or indirectly mediated through depression. Perhaps in the short term SAMe does decrease pain through decreasing depressive symptoms, but in the long term the effectiveness related to pain may diminish. Third, whether SAMe treats the symptoms of the disease or alters the course of the disease by increasing the production of new cartilage, as suggested by animal models, has not been investigated. Finally, can use of SAMe enhance the effectiveness of other nonpharmacologic modalities? These questions should all be investigated before we can make a determination about the efficacy and safety of SAMe for the treatment of OA.

Acknowledgments

This research was supported by grant #5-P50-AT00084-02 from the National Center for Complementary and Alternative Medicine, National Institutes of Health.

References

1. Gaster B. S-adenosylmethionine (SAMe) for treatment of depression. Altern Med Alert 1999;2:133-5.

2. Stramentinoli G. Pharmacologic aspects of S-adenosylmethionine. Am J Med 1987;83(suppl 5A):35-42.

3. DiPadova C. S-adenosylmethionine in the treatment of osteoarthritis: review of the clinical studies. Am J Med 1987;83(suppl 5A):60-5.

4. Leeb BF, Schweitzer H, Montag K, Smolen JS. A meta-analysis of chondroitin sulfate in the treatment of osteoarthritis. J Rheumatol 2000;27:205-11.

5. McAlindon TE, LaValley MP, Gulin JP, Felson DT. Glucosamine and chondroitin for treatment of osteoarthritis: a systematic quality assessment and meta-analysis. JAMA 2000;283:1469-75.

6. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ 1994;309:1286-91.

7. Jadad AR, Carrol D, Moore A, McQuay H. Developing a database of published reports of randomized controlled trials in pain research. Pain 1996;66:239-46.

8. Journal Citation Report Science Edition, Institute for Scientific Information, 1998.

9. Jadad AR, Carrol D, Moore A, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1-12.

10. Hedges LV. Estimation of effect size from a series of independent experiments. Psychol Bull 1982;92:490-9.

11. Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song J. Methods for meta-analysis in medical research. New York: John Wiley & Sons, Ltd, 2000.

12. Hedges LV, Olkin I. Statistical methods for meta-analysis. New York: Academic Press, 1985.

13. Rosenthal R. Meta-analytic procedures for social research (rev ed). Newbury Park, Calif: Sage Publications, 1991.

14. Glesser LJ, Olkin I. Stochastically dependent effect sizes. In: Cooper H, Hedges LV, eds. The handbook of research synthesis. New York: Russell Sage Foundation, 1994;339-56.

15. Begg CB. Publication bias. In: Cooper H, Hedges LV, eds. The handbook of research synthesis. New York: Russell Sage Foundation, 1994;399-409.

16. Bradley JD, Flusser D, Katz BP, et al. A randomized, double blind, placebo controlled trial of intravenous loading with S-adenosyl-methionine (SAM) followed by oral SAM therapy in patients with knee osteoarthritis. J Rheumatol 1994;21:905-11.

17. Capretto C, Cremona C, Canaparo L. A double-blind controlled study of S-adenosylmethionine (SAMe) v. ibuprofen in gonarthrosis, coxarthrosis and spondylarthrosis. Clin Trials J 1985;22:15-2-43.

18. Caroli A. Studio in doppio cieco SAMe (capsule) - Aspirina nell’osteoartrosi. G Clin Med 1980;61:844-57.

19. Caruso I, Pietrogrande V. Italian double-blind multicenter study comparing S-adenosylmethionine, naproxen, and placebo in the treatment of degenerative joint disease. Am J Med 1987;83(suppl 5A):66-71.

20. Ceccato S, Cucinotta D, Carapezzi C, Ferretti G, Passeri M. Studio clinico in doppio cieco sull’effetto terapeutico della SAMe e dell’ibuprofen nella patologia degenerativa articolare. G Clin Med 1980;61:148-62.

21. Cucinotta D, Mancini M, Ceccato S, Castino E. Studio clinico controllato sull’attivita della SAMe somministrata per via orale nella patologia degenerative osteo-articolare. G Clin Med 1980;61:553-65.

22. Maccagno A, DiGiorgio EE, Caston OL, Sagasta CL. Double-blind controlled clinical trial of oral S-adenosylmethionine versus piroxicam in knee osteoarthritis. Am J Med 1987;83 (suppl 5A):72-7.

23. Marcolongo R, Giordano N, Colombo B, et al. Double-blind multicentre study of the activity of S-adenosyl-methionine in hip and knee osteoarthritis. Curr Ther Res 1985;37:82-94.

24. Müller-Fassbender H. Double-blind clinical trial of S-adenosylmethionine versus ibuprofen in the treatment of osteoarthritis. Am J Med 1987;83(suppl 5A):81-3.

25. Pellegrini P. La S-adenosil-metionina (SAMe) nell’osteoartrosi studio in doppio cieco crossover per via orale. G Clin Med 1980;61:616-27.

26. Vetter G. Double-blind comparative clinical trial with S-adenosyl-methionine and indomethacin in the treatment of osteoarthritis. Am J Med 1987;83 (suppl 5A):78-80.

27. Glorioso S, Todesco S, Mazzi A, et al. Double-blind multicentre study of the activity of S-adenosylmethionine in hip and knee osteoarthritis. Int J Clin Pharm Res 1985;1:39-49.

28. Polli E, Cortellaro M, Parrini L, Tessari L, Ligniere GC. Aspetti farmacologici e clinici della solfo-adenosil-metionina (SAMe) nella artropatia degenerativa primaria (osteoartrosi). Min Med 1975;66:4443-59.

29. Bach GL, Gmeiner G. Wochen-doppelblindstudie mit ademetionin (Gumbaral®) bei gonarthrose zur ermittlung der äquivalenz intravenöser und oraler dosen. In: Bach GL, Muller-Fassbender H, editors. Arthrose-workshop uber Gumbaral® (Ademetionin). Frankfurt am Main: Verlag GmbH, 1986;23-30.

30. Ceccato S, Cucinotta D, Carapezzi C, Passeri M. Indagine clinica aperta e comparativa sull’impiego della SAMe e del ketoprofen nell’osteoartrosi. Progr Med 1979;35:177-91.

31. Berger R, Nowak H. A new medication approach to the treatment of osteoarthritis: report of an open phase IV study with ademethionine (Gumbaral®). Am J Med 1987;83(suppl 5A):84-8.

32. Konig B. A long-term (two years) clinical trial with S-adenosylmethionine for the treatment of osteoarthritis. Am J Med 1987;83(suppl 5A):89-94.

33. Domljan Z, Vrhovac B, Dürrigl T, Pu_ar I. A double-blind trial of ademetionine vs naproxen in activated gonarthritis. Int J Clin Pharmacol Ther Toxicol 1989;27:329-33.

34. Montrone F, Fumagalli M, Sarzi Puttini P, et al. Double-blind study of S-adenosyl-methionine versus placebo in hip and knee arthrosis [letter]. Clin Rheumatol 1985;4:484-5.

35. Mitchell D. The SAMe solution. New York: Warner Books, Inc., 1999.

36. ConsumerLab.com. Product review: SAMe. [http://www.consumerlab.com]. Accessed March 11, 2002.


Issue
The Journal of Family Practice - 51(05)
Page Number
425-430
Display Headline
Safety and efficacy of S-adenosylmethionine (SAMe) for osteoarthritis
Legacy Keywords
S-adenosylmethionine; osteoarthritis; meta-analysis; systematic review [non-MeSH]; complementary therapy [non-MeSH]. (J Fam Pract 2002; 51:425–430)

Associations of pacifier use, digit sucking, and child care attendance with cessation of breastfeeding


 

ABSTRACT

OBJECTIVE: Breast milk is the recommended method of nutrition for newborns and infants. Several studies have investigated factors associated with the cessation of breastfeeding. This study assessed the associations between pacifier use, digit sucking, child care attendance, and breastfeeding cessation among 1387 infants in the Iowa Fluoride Study.

STUDY DESIGN: This was a longitudinal questionnaire survey. Mothers completed mailed questionnaires sent at age 6 weeks, 3 months, and 6 months.

POPULATION: Parents were recruited postpartum at 8 Iowa hospitals.

OUTCOMES MEASURED: Survival analysis (using Cox proportional hazards model) assessed the time covariate effects of pacifier use, digit sucking, and child care attendance on cessation of breastfeeding, while adjusting for other possible confounding variables (not planning to breastfeed, maternal smoking, infants’ sex and antibiotic use, maternal and paternal age and education, and income group).

RESULTS: Percentages of women who did any breastfeeding were 46%, 36%, and 27%, at 6 weeks, 3 months, and 6 months, respectively. Percentages using pacifiers were 81%, 71%, and 59%. Combinations of pacifier use and digit sucking for various levels of child care had statistically significant associations with cessation of breastfeeding, with the effect being strongest for pacifier users and digit suckers with no child care days (hazard ratio = 1.88; 95% CI, 1.36-2.62).

CONCLUSIONS: Pacifier use and digit sucking were associated with cessation of breastfeeding, with results dependent on the level of child care attendance. The strongest associations were found for those not attending child care and for combined use of pacifier with digit sucking.

Breastfeeding is associated with lower rates of infant mortality and morbidity,1-6 a reduced rate of sudden infant death syndrome (SIDS),7,8 delayed resumption of fertility,9 and reduced health care cost.10,11 The American Academy of Family Physicians has issued a policy statement supporting breastfeeding as the optimal form of nutrition for infants12 and the American Academy of Pediatrics recommends that infants should be breastfed for at least 12 months.13 Therefore, it is important to understand the factors associated with reduced breastfeeding. In previous studies, the factors associated with reduced breastfeeding included maternal employment,14 child care attendance,15 maternal smoking,14,16,17 and demographic factors.16,18,19

Several recent studies have also identified an association between non-nutritive sucking (eg, pacifiers) and reduced breastfeeding20-35 that is consistent with the World Health Organization (and UNICEF) recommendation that pacifiers not be used by breastfeeding infants.36 Cross-sectional investigations in Sweden,20-22 Brazil,23 New Zealand,24 England,25 Greece,26 and Sweden and Norway27 found strong associations between pacifier use and reduced breastfeeding (either less exclusive breastfeeding, shorter duration of breastfeeding, or breastfeeding problems), with only one26 not reporting statistically significant findings.

Of particular interest were several longitudinal studies in Brazil (2 studies), Sweden, Italy, and the United States. In Brazil, one found that pacifier users had an adjusted relative risk of 2.87 for weaning,28 and the other an adjusted odds ratio of 2.5 for the cessation of breastfeeding associated with pacifier use.29

Hörnell and colleagues30 and Aarts and colleagues31 reported longitudinal data from 506 mothers’ daily infant feeding practices in Uppsala, Sweden. All mothers had at least one previous child breastfed at least 4 months and were planning to breastfeed the study child for at least 6 months. Thumb sucking was not associated with the breastfeeding pattern, but infants using a pacifier frequently had approximately 1 less breastfeeding session and 15 minutes to 30 minutes less total breastfeeding time per day than those not using a pacifier at 2, 4, 8, and 12-week follow-up points. Cross-sectional and survival analyses of breastfeeding at 4 months compared with non-nutritive sucking at 1 month showed no significant relationship with thumb sucking, but a significant relationship with pacifier use, with increasing frequency of pacifier use related to a decline in breastfeeding duration. Riva and coworkers32 studied 1601 women in Italy and showed that pacifier use was associated with an elevated hazard ratio of 1.18 (95% confidence interval [CI], 1.04-1.34) for breastfeeding cessation in adjusted analyses.

In the only published US study, Howard and colleagues33 reported on the effects of early pacifier use on breastfeeding duration among 265 infants in the Rochester, New York area on the basis of maternal telephone interviews at 2, 6, 12, and 24 weeks and every 90 days thereafter until the breastfeeding ended. Results were adjusted for factors such as maternal age, breastfeeding goals, and plans to work. Pacifier introduction by 6 weeks was significantly associated with shortened duration of some breastfeeding (hazard ratio [HR] = 1.61; 95% CI, 1.19-2.19; P = .002), as was a plan to return to work (HR = 1.42). Digit sucking was not examined and interactions were not assessed.

 

 

We found only one prospective study31 that considered the effects of both pacifier and digit sucking, and one study that considered the effects of pacifier and plans to return to work33 on breastfeeding duration. However, no studies simultaneously looked at the effects of maternal employment or child care, pacifier use, digit sucking, and any potential interactions, although they have been shown to be individually associated with cessation of breastfeeding. Thus, the purpose of this study was to assess the associations of non-nutritive sucking (pacifiers and fingers) with cessation of breastfeeding, while considering child care attendance, from birth to age 6 months, using a longitudinal study design in a sample of children in the United States.

Methods

The data were collected as part of a larger, prospective study of a birth cohort assessing fluoride exposures longitudinally and relationships with dental caries and dental fluorosis.16,37-43 Mothers were recruited at the time of their infants’ births at 8 hospitals in eastern Iowa from March 1992 to February 1995, using appropriate informed consent procedures approved by the Institutional Review Board. The recruitment questionnaire assessed household smoking patterns during pregnancy, whether women planned to breastfeed, and other demographic factors.

Information regarding infants’ weight, feeding practices (breastfeeding vs bottle-feeding), non-nutritive sucking (pacifier use and sucking thumb or fingers), child care attendance (number of full or half days), maternal smoking, otitis media experience, and antibiotic use was collected by mailed questionnaire sent at 6 weeks, 3 months, and 6 months of age. Each questionnaire concerned the preceding time period. Nonrespondents received follow-up mailings after 3 weeks and telephone follow-up after 6 weeks. Direct validation of responses was not conducted, but subjects were contacted by mail or telephone, when necessary, to clarify or correct responses. Data were double-entered and verified.

Breastfeeding and bottle-feeding practices for each period were summarized in 3 ways: (1) exclusive breastfeeding, (2) any breastfeeding, and (3) mostly bottle-feeding (defined as at least 75% of estimated total calories based on body weight from formula, milk, or juice). These definitions generally correspond to those proposed by Labbok and Krasovec44 of full, almost exclusive breastfeeding, and low, partial breastfeeding, respectively.

Time until cessation of all breastfeeding was modeled using the Cox proportional hazard regression model45 against 3 main factors of interest: pacifier use (yes/no), digit sucking (yes/no), and child care attendance (total number of child care days). Since no information was collected regarding maternal employment, we considered child care attendance as a proxy. Pacifier use and digit sucking were coded “yes” if the child started using the pacifier or sucking on the digit, respectively, any time during the first 6 weeks of life. Main effects, 2-way interactions among these variables, and nonlinear effects of child care days were tested while adjusting for maternal and paternal age and education, family income, breastfeeding plans, maternal smoking, infant’s sex, and infant antibiotic use. We used the likelihood ratio test to assess significance at an alpha level of 0.05, and the statistical analyses were conducted using PROC PHREG in SAS software.46
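To make the role of the 2-way interactions concrete, the sketch below shows how a hazard ratio relative to the reference child (no pacifier, no digit sucking, no child care) would be computed from the log-hazard coefficients of a Cox model of this form. It is not the authors' SAS code. The coefficient values are hypothetical, child care days enter linearly only (the nonlinear terms the authors tested are omitted), and the adjustment covariates are left out because they cancel when all other characteristics are held equal.

```python
import math

# Hypothetical log-hazard coefficients, chosen only for illustration.
# They are NOT estimates from the study.
COEF = {
    "pacifier": 0.50,
    "digit": 0.10,
    "care_days": 0.005,
    "pacifier_x_digit": 0.05,
    "pacifier_x_care": -0.02,
    "digit_x_care": 0.002,
}


def hazard_ratio(pacifier, digit, care_days, coef=COEF):
    """Hazard ratio versus a child with no sucking habit and no child care.

    pacifier and digit are 0/1 indicators; care_days is a count of days.
    """
    log_hr = (
        coef["pacifier"] * pacifier
        + coef["digit"] * digit
        + coef["care_days"] * care_days
        + coef["pacifier_x_digit"] * pacifier * digit
        + coef["pacifier_x_care"] * pacifier * care_days
        + coef["digit_x_care"] * digit * care_days
    )
    return math.exp(log_hr)


for days in (0, 15, 30, 60):
    print(days, round(hazard_ratio(1, 0, days), 2))
```

With a negative pacifier-by-child-care coefficient, as assumed here, the hazard ratio for pacifier use shrinks as child care days increase and can fall below 1.0. This is why a single hazard ratio for pacifier use cannot be quoted once an interaction is in the model; it must be reported at specified levels of child care.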

Results

The number of mothers who were successfully recruited and who provided at least one subsequent completed questionnaire was 1387. There were 1236 (89%) respondents at 6 weeks, 1196 (86%) at 3 months, and 1048 (76%) at 6 months.

Table 1 summarizes the study sample at baseline recruitment. Approximately two thirds of mothers and fathers had at least some college education; 76% had family income of at least $20,000; 95% were white; 43% of the infants were the mother’s first-born child; and 65% of the mothers said they were planning to breastfeed their infants.

Table 2 summarizes the breastfeeding practices of the cohort by presenting the percentages of infants at each time point with different feeding practices. Approximately 46% reported some breastfeeding on the 6-week questionnaire, declining to 36% at 3 months and 27% at 6 months. Only 16% of the infants were exclusively breastfed at 6 weeks, dropping to 1% by 6 months. A high percentage of infants were mostly bottle fed at each of the 3 corresponding time periods.

Table 2 also summarizes the patterns of non-nutritive sucking across the infant ages. A high percentage of the infants practiced some form of non-nutritive sucking during each period (86.3%, 92.0%, and 86.3% at 6 weeks, 3 months, and 6 months, respectively). From the 6-week to 6-month responses, pacifier use declined from 81% to 59%, while digit sucking increased from 50% to 83% and then declined to 76%. Table 3 summarizes child care attendance during the 6 months, with half days and full days of child care added together. Thirty-four percent of the infants attended some child care, with approximately 12% receiving more than 25 full days of child care by the age of 6 months or the time of censor/failure, where censor in this case is loss to follow-up prior to reaching 6 months of age.

 

 

We next analyzed the data using Cox regression, an analysis method designed for longitudinal data on event times, such as time until death. The outcome variable was time until cessation of all breastfeeding. The median failure time (cessation) was 72 days (95% CI, 68-78) with interquartile range from 53 to 192 days. Seventy-four percent had ceased breastfeeding by 6 months and 26% were censored because of continued breastfeeding at 6 months when analysis ended or earlier loss to follow-up.
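The text does not say how the median time to cessation was estimated. One common approach with censored data is the Kaplan-Meier estimator, sketched below under that assumption. The follow-up times and event indicators are hypothetical and serve only to show how censored infants (still breastfeeding or lost to follow-up) leave the risk set without counting as cessations.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each event time.

    events[i] is True if breastfeeding ceased at times[i] and False if
    the observation was censored at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ceased = 0
        leaving = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                ceased += 1
            leaving += 1
            i += 1
        if ceased:
            survival *= 1.0 - ceased / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_time(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical days to cessation (True) or censoring (False).
example_times = [30, 45, 60, 72, 90, 120, 183, 183]
example_events = [True, True, False, True, True, True, False, False]

curve = kaplan_meier(example_times, example_events)
print(curve)
print("median:", median_time(curve))
```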

Table 4 reports the hazard ratios and 95% confidence intervals at various levels of child care, pacifier use, and digit sucking, while adjusting for the other potential confounders considered (see Methods section). The baseline category (or reference cell) is a child with no child care and no non-nutritive sucking. We see from Table 4 that the estimated hazard of breastfeeding cessation is the highest, with a hazard ratio of 1.88 (95% CI, 1.36-2.62), for a child who sucks on both pacifier and digit and has no child care days. This hazard ratio drops to 1.52 (95% CI, 1.03-2.25) at 15 child care days and then becomes nonsignificant at 30 and 60 child care days.

Our results in Table 4 also show that pacifier use at zero child care days has a significant effect in that a child who sucks only on a pacifier has a 67% increase in the hazard of cessation of breastfeeding, compared with a child with no non-nutritive sucking. At higher levels of child care days, this effect changes and becomes a protective effect, although this effect was not significant at 15 child care days, was significant at 30 child care days, and was borderline significant at 60 child care days. Finally, the effect of digit sucking and child care by themselves tended not to be significant at the 0.05 level, with the one exception at 15 child care days where there is a significant effect of 1.41 (95% CI, 1.02-1.96).

Discussion

Our findings concerning pacifiers are generally consistent with several recent studies that have demonstrated associations between pacifier use and reduced breastfeeding, including the few reported longitudinal studies. However, these other studies did not control for child care attendance. We found that the effect of pacifier use changed with increasing number of child care days. For example, in the absence of child care, children who sucked on a pacifier were about 1.7 times as likely to cease breastfeeding as children who did not use a pacifier. For 15 to 60 days of child care, the hazard ratios were less than 1.0, with results statistically significant at only 30 days.

Furthermore, our analyses showed the joint effect of pacifier use and digit sucking at various child care days. We found a significant reduction in breastfeeding for children who used both a pacifier and a digit by the age of 6 weeks. However, this joint effect of non-nutritive sucking became nonsignificant with 30 or more child care days. Although we found that digit sucking and child care days by themselves had little effect on cessation of breastfeeding, it was important to consider them because these variables significantly interacted with pacifier use.

Our study found that for infants who did not attend child care, pacifier use significantly increased the hazard of breastfeeding cessation, as did the combination of pacifier use and digit sucking. However, digit sucking with no pacifier use in the absence of child care did not increase the hazard of breastfeeding cessation. In contrast, for infants who attended child care for 30 days in the first 6 months of life, pacifier use alone appeared to be somewhat protective in maintaining breastfeeding, while digit sucking, either alone or in combination with pacifier, increased the hazard of breastfeeding cessation, with significance at 15 days. It is possible that pacifiers were used sparingly in child care, whereas digits were available and more widely used, so that non-nutritive sucking interference with breastfeeding was more strongly influenced by digit sucking. Alternatively, it is possible that mothers who placed their infants in child care early in life used pacifiers differently than mothers who did not. That is, for non-child care infants, pacifiers may have been part of a planned strategy to wean from breastfeeding, whereas for children in child care, pacifiers may have been part of a planned strategy to encourage sucking behavior and comfort children until the mother was available for breastfeeding. In such a scenario, digit sucking was less under parental control, particularly at child care, so that it may have interfered with breastfeeding despite parental planning or desires.

 

 

Limitations

Several limitations should be considered when interpreting our study’s findings. The study group was not a probability sample fully representative of a defined population. It was of generally high socioeconomic status and, as is representative of Iowa, had little minority inclusion. Respondents were more educated than nonrespondents.39 Although response rates were generally favorable, approximately 100 to 300 did not respond at a given time point, contributing to the censoring of 26% of the cases. Data on breastfeeding, sucking, child care, and so forth were collected at 3 discrete time points rather than on a more frequent (daily, weekly, or monthly) basis. Although recall bias was limited by the short-term nature of recall with 6-week and 3-month intervals, it could have affected the results. Because so few infants were exclusively breastfed, any breastfeeding was the only suitable dependent variable. No maternal employment data were collected, and pacifier use was not quantified.

Only our study and that of Howard and colleagues33 reported results from the United States. The statistical analyses by Howard and colleagues concerning pacifier use adjusted for a number of factors, including plans to return to work, family and paternal preferences for breastfeeding, and breastfeeding goal. Our study adjusted for plans to breastfeed and demographic factors while assessing the effects of pacifier use, digit sucking, and number of child care days. However, neither study specifically assessed reasons for use of the pacifier, in particular in relation to work and child care requirements. Thus, pacifiers could have been used to facilitate weaning, which would produce the association with reduced breastfeeding. Also, there may be other confounding differences between those who used pacifiers and those who did not.

Although mothers’ decisions to have their infants attend child care, whether to return to work or for other reasons, were not generally associated with reductions in breastfeeding, our results suggest that child care has an important influence on the relationships between non-nutritive sucking behaviors and cessation of breastfeeding. It has been suggested that infants’ abilities to easily and successfully breastfeed are adversely affected by non-nutritive sucking, resulting in reductions in the frequency and consistency of the breastfeeding sessions. Our data support this concept. However, it is important to acknowledge that decisions to stop breastfeeding (often prior to return to work) may have preceded and led to the increase in non-nutritive sucking, rather than sucking leading to cessation of breastfeeding. That is, after the decision has been made to stop breastfeeding, a pacifier may be introduced to ease the transition to bottle feeding.

Additional studies involving in-depth interviews concerning initial and subsequent breastfeeding, employment, and child care plans would be warranted to address this question further. In addition, more controlled studies to determine whether there is any biological relationship between non-nutritive sucking and breastfeeding difficulties are warranted. Clearly, the social, biological, and economic factors involved in decisions to initiate and cease breastfeeding are complex and will require more study, both in the United States and throughout the world.

Acknowledgments

Our study was supported in part by National Institutes of Health grants #RO1-DE09551 and #P30-DE10126 and the University of Iowa’s Obermann Center for Advanced Study. We thank the staff of the Iowa Fluoride Study for their assistance in implementing the study, and Tina Craig for manuscript preparation.

References

1. Molbak K, Gottschau A, Aaby P, Hojlyng N, Ingholt L, daSilva AP. Prolonged breastfeeding, diarrheal disease, and survival of children in Guinea-Bissau. BMJ 1994;308:1403-6.

2. Victora CG, Smith PG, Vaughan JP, et al. Evidence for protection by breast-feeding against infant deaths from infectious diseases in Brazil. Lancet 1987;2:319-22.

3. Cesar JA, Victora CG, Barros FC, Santos S, Flores JA. Impact of breastfeeding on admissions for pneumonia during post neonatal period in Brazil: nested case-control study. BMJ 1999;318:1316-20.

4. Cushing AH, Samet JM, Lambert WE, et al. Breastfeeding reduces risk of respiratory illness in infants. Am J Epidemiol 1998;147:863-70.

5. Scariati PD, Grummer-Strawn LM, Fein SB. A longitudinal analysis of infant morbidity and the extent of breastfeeding in the United States. Pediatrics 1997;99:E5.

6. Duffy LC, Faden H, Wasielewski R, Wolf J, Krystofik D. Exclusive breastfeeding protects against bacterial colonization and day care exposure to otitis media. Pediatrics 1997;100:E7.

7. Gilbert RE, Wigfield RE, Fleming PJ, Berry PJ, Rudd PT. Bottle feeding and the sudden infant death syndrome. BMJ 1995;310:88-90.

8. L’Hoir MP, Engelberts AC, van Well GT, et al. Dummy use, thumb sucking, mouth breathing and cot death. Eur J Pediatr 1999;158:896-901.

9. The World Health Organization multinational study of breastfeeding and lactational amenorrhea. IV. Postpartum bleeding and lochia in breastfeeding women. World Health Organization Task Force on Methods for the Natural Regulation of Fertility. Fertil Steril 1999;72:441-7.

10. Ball TM, Wright AL. Health care costs of formula-feeding in the first year of life. Pediatrics 1999;103(4 Pt. 2):870-6.

11. Simopoulos AP, Grave GD. Factors associated with the choice and duration of infant feeding practice. Pediatrics 1984;74:603-14.

12. American Academy of Family Physicians. Breastfeeding and Infant Nutrition. Available at: www.aafp.org/policy/issues/i3.htmal. Accessed July 16, 2001.

13. American Academy of Pediatrics. Work Group on Breastfeeding. Breastfeeding and the use of human milk. Pediatrics 1997;100:1035-9.

14. Piper S, Parks PL. Predicting the duration of lactation: evidence from a national survey. Birth 1996;23:7-12.

15. Weile B, Rubin DH, Krasilnikoff PA, Kuo HS, Jekel JF. Infant feeding patterns during the first year of life in Denmark: factors associated with the discontinuation of breastfeeding. J Clin Epidemiol 1990;43:1305-11.

16. Levy BT, Bergus GR, Levy SM, Kiritsy MC, Slager SL. Longitudinal feeding patterns of Iowa infants. Ambulatory Child Health 1996;2:25-34.

17. Rutishauser IH, Carlin JB. Body mass index and duration of breastfeeding: a survival analysis during the first six months of life. J Epidemiol Community Health 1992;46:559-65.

18. Simopoulos AP, Grave GD. Factors associated with the choice and duration of infant-feeding practice. Pediatrics 1984;74:603-14.

19. Kurinij N, Shiono PH, Rhoads GG. Breast-feeding incidence and duration in black and white women. Pediatrics 1988;81:365-71.

20. Righard L, Alade MO. Sucking technique and its effect on success of breastfeeding. Birth 1992;19:185-9.

21. Righard L, Alade MO. Breastfeeding and the use of pacifiers. Birth 1997;24:116-20.

22. Righard L. Are breastfeeding problems related to incorrect breastfeeding technique and the use of pacifiers and bottles? Birth 1998;25:40-4.

23. Victora CG, Tomasi E, Olinto MT, Barros FC. Use of pacifiers and breastfeeding duration. Lancet 1993;341:404-6.

24. Ford RP, Mitchell EA, Scragg R, Stewart AW, Taylor BJ, Allen EM. Factors adversely associated with breastfeeding in New Zealand. J Paediatr Child Health 1994;30:483-9.

25. Clements MS, Mitchell EA, Wright SP, Esmail A, Jones DR, Ford RP. Influences on breastfeeding in southeast England. Acta Paediatrica 1997;86:51-6.

26. Vadiakas G, Oulis C, Berdouses E. Profile of non-nutritive sucking habits in relation to nursing behavior in pre-school children. J Clin Pediatr Dent 1998;22:133-6.

27. Larsson E. Orthodontic aspects on feeding of young children: a comparison between Swedish and Norwegian-Sami children. Swed Dent J 1998;22:117-21.

28. Barros FC, Victora CG, Semer TC, Tonioli Filho S, Tomasi E, Weiderpass E. Use of pacifiers is associated with decreased breast-feeding duration. Pediatrics 1995;95:497-9.

29. Victora CG, Behague DP, Barros FC, Olinto MT, Weiderpass E. Pacifier use and short breastfeeding duration: cause, consequence, or coincidence? Pediatrics 1997;99:445-53.

30. Hörnell A, Aarts C, Kylberg E, Hofvander Y, Gebre-Medhin M. Breastfeeding patterns in exclusively breastfed infants: a longitudinal prospective study in Uppsala, Sweden. Acta Paediatr 1999;88:203-11.

31. Aarts C, Hörnell A, Kylberg E, Hofvander Y, Gebre-Medhin M. Breastfeeding patterns in relation to thumb sucking and pacifier use. Pediatrics 1999;104:e50.

32. Riva E, Banderali G, Agostoni C, Silano M, Radaelli G, Giovannini M. Factors associated with initiation and duration of breastfeeding in Italy. Acta Paediatr 1999;88:411-5.

33. Howard CR, Howard FM, Lanphear B, de Blieck EA, Eberly S, Lawrence RA. The effects of early pacifier use on breastfeeding duration. Pediatrics 1999;103:E33.

34. Simopoulos AP, Grave GD. Factors associated with the choice and duration of infant-feeding practice. Pediatrics 1984;74(4 Part 2):603-14.

35. Palmer B. The influence of breastfeeding on the development of the oral cavity: a commentary. J Hum Lact 1998;14:93-8.

36. Protecting, promoting, and supporting breast-feeding: the special role of maternity services. A joint WHO/UNICEF statement. Geneva: World Health Organization, 1989.

37. Bergus GR, Levy BT, Levy SM, Slager SL, Kiritsy MC. A longitudinal study of the exposure of infants to antibiotics during the first 200 days of life. Arch Fam Med 1996;5:523-6.

38. Bergus GR, Levy SM, Kirchner L, Warren JJ, Levy BT. A prospective study of infection and associated antibiotic use in young children. Pediatr Perinatal Epidemiol 2001;15:61-7.

39. Levy SM, Kiritsy MC, Slager SL, Warren JJ, Kohout FJ. Patterns of fluoride dentifrice use among infants. Pediatr Dent 1997;19:50-5.

40. Heilman JR, Kiritsy MC, Levy SM, Wefel JS. Fluoride content of infant foods and cereals. J Am Dent Assoc 1997;128:857-63.

41. Levy SM, Kiritsy MC, Slager SL, Warren JJ. Patterns of fluoride supplement use during infancy. J Public Health Dent 1998;58:228-33.

42. Heilman JR, Kiritsy MC, Levy SM, Wefel JS. Fluoride levels of carbonated soft drinks. J Am Dent Assoc 1999;130:1593-9.

43. Levy SM, Warren JJ, Davis CS, Kirchner HL, Kanellis MJ, Wefel JS. Patterns of fluoride intake from birth to 36 months. J Public Health Dent 2001;61:70-7.

44. Labbok M, Krasovec K. Toward consistency in breast-feeding definitions. Stud Fam Planning 1990;21:226-30.

45. Cox DR. Regression models and life-tables (with discussion). J Royal Stat Soc 1972;B34:187-220.

46. SAS Institute, Inc. SAS technical report P-229, SAS/STAT software: changes and enhancements. Release 6.07. Cary, NC: SAS Institute, 1992.

All correspondence should be addressed to Dr. Steven M. Levy, University of Iowa College of Dentistry, Department of Preventive & Community Dentistry, N329 Dental Science Building, Iowa City, IA 52242. E-mail: [email protected]

Author and Disclosure Information

Steven M. Levy, DDS, MPH
Susan L. Slager, PhD
John J. Warren, DDS, MS
Barcey T. Levy, PhD, MD
Arthur J. Nowak, DMD, MA
Iowa City, Iowa
From The University of Iowa Colleges of Dentistry (S.M.L., S.L.S., J.J.W., A.J.N.), Public Health (S.M.L.), and Medicine (B.T.L.), Iowa City. Earlier versions of this paper were presented at the 1997 annual meetings of the International Association for Dental Research and the American Association of Public Health Dentistry. The authors report no competing interests.

Issue
The Journal of Family Practice - 51(05)
Page Number
1-1
Legacy Keywords
Non-nutritive sucking, breastfeeding, childcare, pacifier use, digit sucking. (J Fam Pract 2002; 51:465)

ABSTRACT

OBJECTIVE: Breast milk is the recommended method of nutrition for newborns and infants. Several studies have investigated factors associated with the cessation of breastfeeding. This study assessed the associations between pacifier use, digit sucking, child care attendance, and breastfeeding cessation among 1387 infants in the Iowa Fluoride Study.

STUDY DESIGN: This was a longitudinal questionnaire survey. Mothers completed mailed questionnaires sent at age 6 weeks, 3 months, and 6 months.

POPULATION: Parents were recruited postpartum at 8 Iowa hospitals.

OUTCOMES MEASURED: Survival analysis (using Cox proportional hazards model) assessed the time covariate effects of pacifier use, digit sucking, and child care attendance on cessation of breastfeeding, while adjusting for other possible confounding variables (not planning to breastfeed, maternal smoking, infants’ sex and antibiotic use, maternal and paternal age and education, and income group).

RESULTS: Percentages of women who did any breastfeeding were 46%, 36%, and 27%, at 6 weeks, 3 months, and 6 months, respectively. Percentages using pacifiers were 81%, 71%, and 59%. Combinations of pacifier use and digit sucking for various levels of child care had statistically significant associations with cessation of breastfeeding, with the effect being strongest for pacifier users and digit suckers with no child care days (hazard ratio = 1.88; 95% CI, 1.36-2.62).

CONCLUSIONS: Pacifier use and digit sucking were associated with cessation of breastfeeding, with results dependent on the level of child care attendance. The strongest associations were found for those not attending child care and for combined use of pacifier with digit sucking.

Breastfeeding is associated with lower rates of infant mortality and morbidity,1-6 a reduced rate of sudden infant death syndrome (SIDS),7,8 delayed resumption of fertility,9 and reduced health care cost.10,11 The American Academy of Family Physicians has issued a policy statement supporting breastfeeding as the optimal form of nutrition for infants12 and the American Academy of Pediatrics recommends that infants should be breastfed for at least 12 months.13 Therefore, it is important to understand the factors associated with reduced breastfeeding. In previous studies, the factors associated with reduced breastfeeding included maternal employment,14 child care attendance,15 maternal smoking,14,16,17 and demographic factors.16,18,19

Several recent studies have also identified an association between non-nutritive sucking (eg, pacifiers) and reduced breastfeeding20-35 that is consistent with the World Health Organization (and UNICEF) recommendation that pacifiers not be used by breastfeeding infants.36 Cross-sectional investigations in Sweden,20-22 Brazil,23 New Zealand,24 England,25 Greece,26 and Sweden and Norway27 found strong associations between pacifier use and reduced breastfeeding (either less exclusive breastfeeding, shorter duration of breastfeeding, or breastfeeding problems), with only one26 not reporting statistically significant findings.

Of particular interest were several longitudinal studies in Brazil (2 studies), Sweden, Italy, and the United States. In Brazil, one study found that pacifier users had an adjusted relative risk of 2.87 for weaning,28 and the other found an adjusted odds ratio of 2.5 for the cessation of breastfeeding associated with pacifier use.29

Hörnell and colleagues30 and Aarts and colleagues31 reported longitudinal data from 506 mothers’ daily infant feeding practices in Uppsala, Sweden. All mothers had at least one previous child breastfed at least 4 months and were planning to breastfeed the study child for at least 6 months. Thumb sucking was not associated with the breastfeeding pattern, but infants using a pacifier frequently had approximately 1 less breastfeeding session and 15 minutes to 30 minutes less total breastfeeding time per day than those not using a pacifier at 2, 4, 8, and 12-week follow-up points. Cross-sectional and survival analyses of breastfeeding at 4 months compared with non-nutritive sucking at 1 month showed no significant relationship with thumb sucking, but a significant relationship with pacifier use, with increasing frequency of pacifier use related to a decline in breastfeeding duration. Riva and coworkers32 studied 1601 women in Italy and showed that pacifier use was associated with an elevated hazard ratio of 1.18 (95% confidence interval [CI], 1.04-1.34) for breastfeeding cessation in adjusted analyses.

In the only published US study, Howard and colleagues33 reported on the effects of early pacifier use on breastfeeding duration among 265 infants in the Rochester, New York, area on the basis of maternal telephone interviews at 2, 6, 12, and 24 weeks and every 90 days thereafter until breastfeeding ended. Results were adjusted for factors such as maternal age, breastfeeding goals, and plans to work. Pacifier introduction by 6 weeks was significantly associated with shortened duration of some breastfeeding (hazard ratio [HR] = 1.61; 95% CI, 1.19-2.19; P = .002), as was a plan to return to work (HR = 1.42). Digit sucking was not examined, and interactions were not assessed.

We found only one prospective study31 that considered the effects of both pacifier and digit sucking, and one study that considered the effects of pacifier and plans to return to work33 on breastfeeding duration. However, no studies simultaneously looked at the effects of maternal employment or child care, pacifier use, digit sucking, and any potential interactions, although they have been shown to be individually associated with cessation of breastfeeding. Thus, the purpose of this study was to assess the associations of non-nutritive sucking (pacifiers and fingers) with cessation of breastfeeding, while considering child care attendance, from birth to age 6 months, using a longitudinal study design in a sample of children in the United States.

Methods

The data were collected as part of a larger, prospective study of a birth cohort assessing fluoride exposures longitudinally and relationships with dental caries and dental fluorosis.16,37-43 Mothers were recruited at the time of their infants’ births at 8 hospitals in eastern Iowa from March 1992 to February 1995, using appropriate informed consent procedures approved by the Institutional Review Board. The recruitment questionnaire assessed household smoking patterns during pregnancy, whether women planned to breastfeed, and other demographic factors.

Information regarding infants’ weight, feeding practices (breastfeeding vs bottle-feeding), non-nutritive sucking (pacifier use and sucking thumb or fingers), child care attendance (number of full or half days), maternal smoking, otitis media experience, and antibiotic use was collected by mailed questionnaire sent at 6 weeks, 3 months, and 6 months of age. Each questionnaire concerned the preceding time period. Nonrespondents received follow-up mailings after 3 weeks and telephone follow-up after 6 weeks. Direct validation of responses was not conducted, but subjects were contacted by mail or telephone, when necessary, to clarify or correct responses. Data were double-entered and verified.

Breastfeeding and bottle-feeding practices for each period were summarized in 3 ways: (1) exclusive breastfeeding, (2) any breastfeeding, and (3) mostly bottle-feeding (defined as at least 75% of estimated total calories based on body weight from formula, milk, or juice). These definitions generally correspond to those proposed by Labbok and Krasovec44 of full, almost exclusive breastfeeding, and low, partial breastfeeding, respectively.

Time until cessation of all breastfeeding was modeled using the Cox proportional hazards regression model45 against 3 main factors of interest: pacifier use (yes/no), digit sucking (yes/no), and child care attendance (total number of child care days). Because no information was collected regarding maternal employment, we used child care attendance as a proxy. Pacifier use and digit sucking were coded “yes” if the child started using the pacifier or sucking on the digit, respectively, at any time during the first 6 weeks of life. Main effects, 2-way interactions among these variables, and nonlinear effects of child care days were tested while adjusting for maternal and paternal age and education, family income, breastfeeding plans, maternal smoking, infant’s sex, and infant antibiotic use. We used the likelihood ratio test to assess significance at an alpha level of 0.05, and the statistical analyses were conducted using PROC PHREG in SAS software.46
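As a sketch only, a model of this kind can be written in the usual proportional hazards form. The symbols are notation introduced here for illustration and do not come from the study: P and D are the pacifier and digit-sucking indicators, C is the number of child care days, z is the vector of adjustment covariates, and the betas and gamma are regression coefficients. The nonlinear child care terms that were tested are not specified in the text, so they are omitted.

```latex
h(t \mid P, D, C, \mathbf{z})
  = h_0(t)\,\exp\bigl(\beta_P P + \beta_D D + \beta_C C
      + \beta_{PD}\,PD + \beta_{PC}\,PC + \beta_{DC}\,DC
      + \boldsymbol{\gamma}^{\top}\mathbf{z}\bigr)
```

Under this form, the hazard ratio for a given combination of P, D, and C relative to a reference child with P = D = C = 0 and the same adjustment covariates is the exponential of the first six terms, which is why the estimated effect of pacifier use can differ across levels of child care.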

Results

The number of mothers who were successfully recruited and who provided at least one subsequent completed questionnaire was 1387. There were 1236 (89%) respondents at 6 weeks, 1196 (86%) at 3 months, and 1048 (76%) at 6 months.
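The response rates above follow directly from the counts given. A minimal sketch of the arithmetic, using only numbers from the text (the function and variable names are illustrative):

```python
# Response rates among the 1387 enrolled mothers (counts taken from the text).
enrolled = 1387
respondents = {"6 weeks": 1236, "3 months": 1196, "6 months": 1048}


def response_rate(n_responded: int, n_enrolled: int) -> float:
    """Return the response rate as a percentage of enrolled mothers."""
    return 100.0 * n_responded / n_enrolled


# Percentage responding and number not responding at each questionnaire wave.
rates = {wave: response_rate(n, enrolled) for wave, n in respondents.items()}
nonrespondents = {wave: enrolled - n for wave, n in respondents.items()}

for wave in respondents:
    print(f"{wave}: {rates[wave]:.0f}% responded, {nonrespondents[wave]} did not")
```

The same counts give the number of nonrespondents at each wave, which increases over the 6 months of follow-up.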

Table 1 summarizes the study sample at baseline recruitment. Approximately two thirds of mothers and fathers had at least some college education; 76% had family income of at least $20,000; 95% were white; 43% of the infants were the mother’s first-born child; and 65% of the mothers said they were planning to breastfeed their infants.

Table 2 summarizes the breastfeeding practices of the cohort by presenting the percentages of infants at each time point with different feeding practices. Approximately 46% reported some breastfeeding on the 6-week questionnaire, declining to 36% at 3 months and 27% at 6 months. Only 16% of the infants were exclusively breastfed at 6 weeks, dropping to 1% by 6 months. A high percentage of infants were mostly bottle fed at each of the 3 corresponding time periods.

Table 2 also summarizes the patterns of non-nutritive sucking across the infant ages. A high percentage of the infants practiced some form of non-nutritive sucking during each period (86.3%, 92.0%, and 86.3% at 6 weeks, 3 months, and 6 months, respectively). From the 6-week to 6-month responses, pacifier use declined from 81% to 59%, while digit sucking increased from 50% to 83% and then declined to 76%. Table 3 summarizes child care attendance during the 6 months, with half days and full days of child care added together. Thirty-four percent of the infants attended some child care, with approximately 12% receiving more than 25 full days of child care by the age of 6 months or the time of censoring or failure, where censoring in this case means loss to follow-up before reaching 6 months of age.

We next analyzed the data using Cox regression, an analysis method designed for longitudinal data on event times, such as time until death. The outcome variable was time until cessation of all breastfeeding. The median failure time (cessation) was 72 days (95% CI, 68-78) with interquartile range from 53 to 192 days. Seventy-four percent had ceased breastfeeding by 6 months and 26% were censored because of continued breastfeeding at 6 months when analysis ended or earlier loss to follow-up.

Table 4 reports the hazard ratios and 95% confidence intervals at various levels of child care, pacifier use, and digit sucking, while adjusting for the other potential confounders considered (see Methods section). The baseline category (or reference cell) is a child with no child care and no non-nutritive sucking. We see from Table 4 that the estimated hazard of breastfeeding cessation is highest, with a hazard ratio of 1.88 (95% CI, 1.36-2.62), for a child who sucks on both a pacifier and a digit and has no child care days. This hazard ratio drops to 1.52 (95% CI, 1.03-2.25) at 15 child care days and then becomes nonsignificant at 30 and 60 child care days.

Our results in Table 4 also show that pacifier use at zero child care days has a significant effect in that a child who sucks only on a pacifier has a 67% increase in the hazard of cessation of breastfeeding, compared with a child with no non-nutritive sucking. At higher levels of child care days, this effect changes and becomes a protective effect, although this effect was not significant at 15 child care days, was significant at 30 child care days, and was borderline significant at 60 child care days. Finally, the effect of digit sucking and child care by themselves tended not to be significant at the 0.05 level, with the one exception at 15 child care days where there is a significant effect of 1.41 (95% CI, 1.02-1.96).
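To help read the hazard ratios quoted above, the following sketch recovers an approximate standard error and test statistic from a reported hazard ratio and its 95% confidence interval. It assumes the interval is a Wald interval symmetric on the log scale, which is an assumption made here. The study itself assessed significance with likelihood ratio tests, so these z values are only a rough guide, and all function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_se(lower: float, upper: float) -> float:
    """Approximate SE of log(HR), assuming a Wald interval symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def wald_z(hr: float, lower: float, upper: float) -> float:
    """Approximate Wald z statistic for the null hypothesis HR = 1."""
    return math.log(hr) / log_scale_se(lower, upper)


def percent_change_in_hazard(hr: float) -> float:
    """Express a hazard ratio as a percentage change relative to the reference cell."""
    return (hr - 1.0) * 100.0


# (hazard ratio, lower 95% limit, upper 95% limit) as quoted in the text.
reported = {
    "pacifier and digit, 0 child care days": (1.88, 1.36, 2.62),
    "pacifier and digit, 15 child care days": (1.52, 1.03, 2.25),
}

for label, (hr, lo, hi) in reported.items():
    z = wald_z(hr, lo, hi)
    print(f"{label}: SE(log HR) ~ {log_scale_se(lo, hi):.3f}, z ~ {z:.2f}, "
          f"significant at 0.05: {abs(z) > Z_95}")

# A hazard ratio of 1.67 corresponds to the 67% increase quoted for pacifier-only use.
print(f"{percent_change_in_hazard(1.67):.0f}% increase in hazard")
```

An interval whose lower limit lies above 1.0 corresponds to an approximate z above 1.96, which matches the description of both estimates as statistically significant.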

Discussion

Our findings concerning pacifiers are generally consistent with several recent studies that have demonstrated associations between pacifier use and reduced breastfeeding, including the few reported longitudinal studies. However, these other studies did not control for child care attendance. We found that the effect of pacifier use changed with increasing number of child care days. For example, in the absence of child care, children who sucked on a pacifier were about 1.7 times as likely to cease breastfeeding as children who did not use a pacifier. For 15 days to 60 days of child care, the hazard ratios were less than 1.0, with results statistically significant at only 30 days.



Several recent studies have also identified an association between non-nutritive sucking (eg, pacifiers) and reduced breastfeeding20-35 that is consistent with the World Health Organization (and UNICEF) recommendation that pacifiers not be used by breastfeeding infants.36 Cross-sectional investigations in Sweden,20-22 Brazil,23 New Zealand,24 England,25 Greece,26 and Sweden and Norway27 found strong associations between pacifier use and reduced breastfeeding (either less exclusive breastfeeding, shorter duration of breastfeeding, or breastfeeding problems), with only one26 not reporting statistically significant findings.

Of particular interest were several longitudinal studies in Brazil (2 studies), Sweden, Italy, and the United States. In Brazil, one found that pacifier users had an adjusted relative risk of 2.87 for weaning,28 and the other an adjusted odds ratio of 2.5 for the cessation of breastfeeding associated with pacifier use.29

Hörnell and colleagues30 and Aarts and colleagues31 reported longitudinal data on the daily infant feeding practices of 506 mothers in Uppsala, Sweden. All mothers had at least one previous child breastfed for at least 4 months and were planning to breastfeed the study child for at least 6 months. Thumb sucking was not associated with the breastfeeding pattern, but infants using a pacifier frequently had approximately 1 fewer breastfeeding session and 15 to 30 minutes less total breastfeeding time per day than those not using a pacifier at the 2-, 4-, 8-, and 12-week follow-up points. Cross-sectional and survival analyses of breastfeeding at 4 months compared with non-nutritive sucking at 1 month showed no significant relationship with thumb sucking, but a significant relationship with pacifier use, with increasing frequency of pacifier use related to a decline in breastfeeding duration. Riva and coworkers32 studied 1601 women in Italy and showed that pacifier use was associated with an elevated hazard ratio of 1.18 (95% confidence interval [CI], 1.04-1.34) for breastfeeding cessation in adjusted analyses.

In the only published US study, Howard and colleagues33 reported on the effects of early pacifier use on breastfeeding duration among 265 infants in the Rochester, New York area on the basis of maternal telephone interviews at 2, 6, 12, and 24 weeks and every 90 days thereafter until the breastfeeding ended. Results were adjusted for factors such as maternal age, breastfeeding goals, and plans to work. Pacifier introduction by 6 weeks was significantly associated with shortened duration of some breastfeeding (hazard ratio [HR] = 1.61; 95% CI, 1.19-2.19; P = .002), as was a plan to return to work (HR = 1.42). Digit sucking was not examined and interactions were not assessed.

We found only one prospective study31 that considered the effects of both pacifier use and digit sucking, and one study that considered the effects of pacifier use and plans to return to work33 on breastfeeding duration. However, no studies simultaneously examined the effects of maternal employment or child care, pacifier use, digit sucking, and any potential interactions, although each of these factors has been shown individually to be associated with cessation of breastfeeding. Thus, the purpose of this study was to assess the associations of non-nutritive sucking (pacifiers and fingers) with cessation of breastfeeding, while considering child care attendance, from birth to age 6 months, using a longitudinal study design in a sample of children in the United States.

Methods

The data were collected as part of a larger, prospective study of a birth cohort assessing fluoride exposures longitudinally and relationships with dental caries and dental fluorosis.16,37-43 Mothers were recruited at the time of their infants’ births at 8 hospitals in eastern Iowa from March 1992 to February 1995, using appropriate informed consent procedures approved by the Institutional Review Board. The recruitment questionnaire assessed household smoking patterns during pregnancy, whether women planned to breastfeed, and other demographic factors.

Information regarding infants’ weight, feeding practices (breastfeeding vs bottle-feeding), non-nutritive sucking (pacifier use and sucking thumb or fingers), child care attendance (number of full or half days), maternal smoking, otitis media experience, and antibiotic use was collected by mailed questionnaire sent at 6 weeks, 3 months, and 6 months of age. Each questionnaire concerned the preceding time period. Nonrespondents received follow-up mailings after 3 weeks and telephone follow-up after 6 weeks. Direct validation of responses was not conducted, but subjects were contacted by mail or telephone, when necessary, to clarify or correct responses. Data were double-entered and verified.

Breastfeeding and bottle-feeding practices for each period were summarized in 3 ways: (1) exclusive breastfeeding, (2) any breastfeeding, and (3) mostly bottle-feeding (defined as at least 75% of estimated total calories based on body weight from formula, milk, or juice). These definitions generally correspond to those proposed by Labbok and Krasovec44 of full, almost exclusive breastfeeding, and low, partial breastfeeding, respectively.

Time until cessation of all breastfeeding was modeled using the Cox proportional hazards regression model45 against 3 main factors of interest: pacifier use (yes/no), digit sucking (yes/no), and child care attendance (total number of child care days). Since no information was collected regarding maternal employment, we considered child care attendance as a proxy. Pacifier use and digit sucking were coded “yes” if the child started using the pacifier or sucking on the digit, respectively, at any time during the first 6 weeks of life. Main effects, 2-way interactions among these variables, and nonlinear effects of child care days were tested while adjusting for maternal and paternal age and education, family income, breastfeeding plans, maternal smoking, infant’s sex, and infant antibiotic use. We used the likelihood ratio test to assess significance at an alpha level of 0.05, and the statistical analyses were conducted using PROC PHREG in SAS software.46
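
In symbols, a model of the kind described above might be written as follows. This is an illustrative sketch rather than the fitted specification: the symbols P, D, C, and z and the coefficient names are assumed notation, the set of interaction terms actually retained is not reported here, and the nonlinear child care terms that were tested are omitted.

```latex
% P = pacifier use (0/1), D = digit sucking (0/1), C = child care days,
% z = adjustment covariates. All symbols are assumed notation.
h(t \mid P, D, C, \mathbf{z}) = h_0(t)\,
  \exp\!\bigl(\beta_P P + \beta_D D + \beta_C C
  + \beta_{PD} P D + \beta_{PC} P C + \beta_{DC} D C
  + \boldsymbol{\gamma}^{\top}\mathbf{z}\bigr)

% Hazard ratio versus a reference infant (P = D = 0, C = 0) with the same z:
\mathrm{HR}(P, D, C) = \exp\!\bigl(\beta_P P + \beta_D D + \beta_C C
  + \beta_{PD} P D + \beta_{PC} P C + \beta_{DC} D C\bigr)
```

Under this form, the pacifier-only hazard ratio at C child care days is exp(β_P + β_C C + β_PC C), so a single model with interaction terms can yield a ratio above 1 at zero child care days and a ratio below 1 at higher levels of child care.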

Results

The number of mothers who were successfully recruited and who provided at least one subsequent completed questionnaire was 1387. There were 1236 (89%) respondents at 6 weeks, 1196 (86%) at 3 months, and 1048 (76%) at 6 months.
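
The response percentages above follow directly from the reported counts. The short sketch below recomputes them; the variable and function names are illustrative only, and the counts are the ones given in the text.

```python
# Counts reported in the Results; the cohort of 1387 is the denominator.
COHORT_SIZE = 1387
RESPONDENTS = {"6 weeks": 1236, "3 months": 1196, "6 months": 1048}


def response_rate(responded: int, total: int = COHORT_SIZE) -> float:
    """Return the response rate as a percentage of the cohort."""
    return 100.0 * responded / total


# Percentage responding and number not responding at each questionnaire wave.
rates = {wave: response_rate(n) for wave, n in RESPONDENTS.items()}
nonrespondents = {wave: COHORT_SIZE - n for wave, n in RESPONDENTS.items()}

for wave in RESPONDENTS:
    print(f"{wave}: {rates[wave]:.0f}% responded, {nonrespondents[wave]} did not")
```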

Table 1 summarizes the study sample at baseline recruitment. Approximately two thirds of mothers and fathers had at least some college education; 76% had family income of at least $20,000; 95% were white; 43% of the infants were the mother’s first-born child; and 65% of the mothers said they were planning to breastfeed their infants.

Table 2 summarizes the breastfeeding practices of the cohort by presenting the percentages of infants at each time point with different feeding practices. Approximately 46% reported some breastfeeding on the 6-week questionnaire, declining to 36% at 3 months and 27% at 6 months. Only 16% of the infants were exclusively breastfed at 6 weeks, dropping to 1% by 6 months. A high percentage of infants were mostly bottle fed at each of the 3 corresponding time periods.

Table 2 also summarizes the patterns of non-nutritive sucking across the infant ages. A high percentage of the infants practiced some form of non-nutritive sucking during each period (86.3%, 92.0%, and 86.3% at 6 weeks, 3 months, and 6 months, respectively). From the 6-week to 6-month responses, pacifier use declined from 81% to 59%, while digit sucking increased from 50% to 83% and then declined to 76%. Table 3 summarizes child care attendance during the 6 months, with half days and full days of child care added together. Thirty-four percent of the infants attended some child care, with approximately 12% receiving more than 25 full days of child care by the age of 6 months or the time of censor/failure, where censor in this case is loss to follow-up prior to reaching 6 months of age.

We next analyzed the data using Cox regression, an analysis method designed for longitudinal data on event times, such as time until death. The outcome variable was time until cessation of all breastfeeding. The median failure time (cessation) was 72 days (95% CI, 68-78) with interquartile range from 53 to 192 days. Seventy-four percent had ceased breastfeeding by 6 months and 26% were censored because of continued breastfeeding at 6 months when analysis ended or earlier loss to follow-up.
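
To illustrate how censored observations enter a median failure time, the sketch below implements a product-limit (Kaplan-Meier) estimate in plain Python. It rests on two assumptions: the text does not state which estimator produced the reported median, and the follow-up times in the example are hypothetical values invented for illustration, not study data.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if breastfeeding ceased at times[i] and False if the
    infant was censored (still breastfeeding or lost to follow-up).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ceased = 0
        leaving = 0
        # Group all infants sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            ceased += int(data[i][1])
            leaving += 1
            i += 1
        if ceased:
            survival *= 1.0 - ceased / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored infants both leave the risk set
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival is 0.5 or lower, if reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times in days (NOT study data); False = censored.
days = [20, 35, 50, 60, 72, 90, 120, 183, 183, 183]
ceased = [True, True, True, False, True, True, True, False, False, False]
curve = kaplan_meier(days, ceased)
print(median_time(curve))
```

Censored infants contribute to the risk set until they leave follow-up, which is why they affect the estimate without being counted as cessations.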

Table 4 reports the hazard ratios and 95% confidence intervals at various levels of child care, pacifier use, and digit sucking, adjusted for the other potential confounders considered (see Methods section). The baseline category (or reference cell) is a child with no child care and no non-nutritive sucking. Table 4 shows that the estimated hazard ratio for breastfeeding cessation is highest, at 1.88 (95% CI, 1.36-2.62), for a child who sucks on both a pacifier and a digit and has no child care days. This hazard ratio drops to 1.52 (95% CI, 1.03-2.25) at 15 child care days and then becomes nonsignificant at 30 and 60 child care days.

Our results in Table 4 also show that pacifier use at zero child care days has a significant effect in that a child who sucks only on a pacifier has a 67% increase in the hazard of cessation of breastfeeding, compared with a child with no non-nutritive sucking. At higher levels of child care days, this effect changes and becomes a protective effect, although this effect was not significant at 15 child care days, was significant at 30 child care days, and was borderline significant at 60 child care days. Finally, the effect of digit sucking and child care by themselves tended not to be significant at the 0.05 level, with the one exception at 15 child care days where there is a significant effect of 1.41 (95% CI, 1.02-1.96).
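
The reported hazard ratios and intervals can be read on the log scale, where Cox model estimates are usually summarized. The sketch below assumes a Wald-type 95% interval that is symmetric around the log hazard ratio; the text does not state how the intervals were constructed, so this is an assumption, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_standard_error(lower: float, upper: float) -> float:
    """Standard error of log(HR) implied by a symmetric log-scale 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def percent_change_in_hazard(hr: float) -> float:
    """Express a hazard ratio as a percentage increase (or decrease) in hazard."""
    return (hr - 1.0) * 100.0


# Pacifier plus digit sucking with no child care days, as reported in the text.
hr, lower, upper = 1.88, 1.36, 2.62
se = log_hr_standard_error(lower, upper)
# If the interval is symmetric on the log scale, its geometric midpoint
# should sit close to the point estimate.
midpoint = math.sqrt(lower * upper)

print(f"implied SE of log(HR): {se:.3f}")
print(f"geometric midpoint of CI: {midpoint:.2f} (reported HR {hr})")
print(f"HR 1.67 corresponds to a {percent_change_in_hazard(1.67):.0f}% higher hazard")
```

The geometric midpoint of the reported interval lies close to the reported estimate of 1.88, which is what a log-symmetric interval would produce.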

Discussion

Our findings concerning pacifiers are generally consistent with several recent studies that have demonstrated associations between pacifier use and reduced breastfeeding, including the few reported longitudinal studies. However, these other studies did not control for child care attendance. We found that the effect of pacifier use changed with increasing number of child care days. For example, in the absence of child care, children who sucked on a pacifier were about 1.7 times as likely to cease breastfeeding as children who did not use a pacifier. For 15 to 60 days of child care, the hazard ratios were less than 1.0, with results statistically significant only at 30 days.

Furthermore, our analyses showed the joint effect of pacifier use and digit sucking at various numbers of child care days. We found a significant reduction in breastfeeding for children who used both a pacifier and a digit by the age of 6 weeks, but this joint effect became nonsignificant at 30 or more child care days. Although we found that digit sucking and child care days by themselves had little effect on cessation of breastfeeding, it was important to consider them because these variables interacted significantly with pacifier use.

Our study found that for infants who did not attend child care, pacifier use significantly increased the hazard of breastfeeding cessation, as did the combination of pacifier use and digit sucking. However, digit sucking with no pacifier use in the absence of child care did not increase the hazard of breastfeeding cessation. In contrast, for infants who attended child care for 30 days in the first 6 months of life, pacifier use alone appeared to be somewhat protective in maintaining breastfeeding, while digit sucking, either alone or in combination with a pacifier, increased the hazard of breastfeeding cessation, with significance at 15 days. It is possible that pacifiers were used sparingly in child care, whereas digits were available and more widely used, so that non-nutritive sucking interference with breastfeeding was more strongly influenced by digit sucking. Alternatively, it is possible that mothers who placed their infants in child care early in life used pacifiers differently from mothers who did not. That is, for non-child care infants, pacifiers may have been part of a planned strategy to wean from breastfeeding, whereas for children in child care, pacifiers may have been part of a planned strategy to encourage sucking behavior and comfort children until the mother was available for breastfeeding. In such a scenario, digit sucking was less under parental control, particularly at child care, so that it may have interfered with breastfeeding despite parental planning or desires.

Limitations

Our study’s findings have several limitations. The study group was not a probability sample fully representative of a defined population. It was of generally high socioeconomic status and, representative of Iowa, had little minority inclusion. Respondents were more educated than nonrespondents.39 Although response rates were generally favorable, approximately 100 to 300 did not respond at a given time point, contributing to the censoring of 26% of the cases. Data on breastfeeding, sucking, child care, and so forth were collected at 3 discrete time points and not on a more frequent daily, weekly, or monthly basis. Although recall bias was limited by the short-term nature of recall with 6-week and 3-month intervals, it could have affected the results. Since so few infants were exclusively breastfed, any breastfeeding was the only suitable dependent variable. No maternal employment data were collected, and pacifier use was not quantified.

Only our study and that of Howard and colleagues33 reported results from the United States. The statistical analyses by Howard and colleagues concerning pacifier use adjusted for a number of factors, including plans to return to work, family and paternal preferences for breastfeeding, and breastfeeding goal. Our study adjusted for plans to breastfeed and demographic factors while assessing the effects of pacifier use, digit sucking, and number of child care days. However, neither study specifically assessed reasons for use of the pacifier, in particular, in relation to work and child care requirements. So pacifiers could have been used to facilitate weaning, thus resulting in the association with reduced breastfeeding. Also, there may be other confounding differences between those using pacifiers and those who did not.

Although mothers’ decisions to have their infants attend child care, whether in order to return to work or for other reasons, were not generally associated with reductions in breastfeeding, our results suggest that child care has an important impact on the relationships between non-nutritive sucking behaviors and cessation of breastfeeding. It has been suggested that infants’ abilities to easily and successfully breastfeed are adversely affected by non-nutritive sucking, resulting in reductions in the frequency and consistency of the breastfeeding sessions. Our data support this concept. However, it is important to acknowledge that decisions to stop breastfeeding (often prior to return to work) may have preceded and led to the increase in non-nutritive sucking, rather than sucking leading to cessation of breastfeeding. That is, after the decision has been made to stop breastfeeding, a pacifier may be introduced to ease the transition to bottle feeding.

Additional studies involving in-depth interviews concerning initial and subsequent breastfeeding, employment, and child care plans would be warranted to address this question further. In addition, more controlled studies to determine whether there is any biological relationship between non-nutritive sucking and breastfeeding difficulties are warranted. Clearly, the social, biological, and economic factors involved in decisions to initiate and cease breastfeeding are complex and will require more study, both in the United States and throughout the world.

Acknowledgments

Our study was supported in part by National Institutes of Health grants #RO1-DE09551 and #P30-DE10126 and the University of Iowa’s Obermann Center for Advanced Study. We thank the staff of the Iowa Fluoride Study for their assistance in implementing the study, and Tina Craig for manuscript preparation.

References

1. Molbak K, Gottschau A, Aaby P, Hojlyng N, Ingholt L, daSilva AP. Prolonged breastfeeding, diarrheal disease, and survival of children in Guinea-Bissau. BMJ 1994;308:1403-6.

2. Victora CG, Smith PG, Vaughan JP, et al. Evidence for protection by breast-feeding against infant deaths from infectious diseases in Brazil. Lancet 1987;2:319-22.

3. Cesar JA, Victora CG, Barros FC, Santos S, Flores JA. Impact of breastfeeding on admissions for pneumonia during post neonatal period in Brazil: nested case-control study. BMJ 1999;318:1316-20.

4. Cushing AH, Samet JM, Lambert WE, et al. Breastfeeding reduces risk of respiratory illness in infants. Am J Epidemiol 1998;147:863-70.

5. Scariati PD, Grummer-Strawn LM, Fein SB. A longitudinal analysis of infant morbidity and the extent of breastfeeding in the United States. Pediatrics 1997;99:E5.

6. Duffy LC, Faden H, Wasielewski R, Wolf J, Krystofik D. Exclusive breastfeeding protects against bacterial colonization and day care exposure to otitis media. Pediatrics 1997;100:E7.

7. Gilbert RE, Wigfield RE, Fleming PJ, Berry PJ, Rudd PT. Bottle feeding and the sudden infant death syndrome. BMJ 1995;310:88-90.

8. L’Hoir MP, Engelberts AC, van Well GT, et al. Dummy use, thumb sucking, mouth breathing and cot death. Eur J Pediatr 1999;158:896-901.

9. The World Health Organization multinational study of breastfeeding and lactational amenorrhea. IV. Postpartum bleeding and lochia in breastfeeding women. World Health Organization Task Force on Methods for the Natural Regulation of Fertility. Fertil Steril 1999;72:441-7.

10. Ball TM, Wright AL. Health care costs of formula-feeding in the first year of life. Pediatrics 1999;103(4 Pt. 2):870-6.

11. Simopoulos AP, Grave GD. Factors associated with the choice and duration of infant feeding practice. Pediatrics 1984;74:603-14.

12. American Academy of Family Physicians. Breastfeeding and Infant Nutrition. Available at: www.aafp.org/policy/issues/i3.htmal. Accessed July 16, 2001.

13. American Academy of Pediatrics. Work Group on Breastfeeding. Breastfeeding and the use of human milk. Pediatrics 1997;100:1035-9.

14. Piper S, Parks PL. Predicting the duration of lactation: evidence from a national survey. Birth 1996;23:7-12.

15. Weile B, Rubin DH, Krasilnikoff PA, Kuo HS, Jekel JF. Infant feeding patterns during the first year of life in Denmark: factors associated with the discontinuation of breastfeeding. J Clin Epidemiol 1990;43:1305-11.

16. Levy BT, Bergus GR, Levy SM, Kiritsy MC, Slager SL. Longitudinal feeding patterns of Iowa infants. Ambulatory Child Health 1996;2:25-34.

17. Rutishauser IH, Carlin JB. Body mass index and duration of breastfeeding: a survival analysis during the first six months of life. J Epidemiol Community Health 1992;46:559-65.

18. Simopoulos AP, Grave GD. Factors associated with the choice and duration of infant-feeding practice. Pediatrics 1984;74:603-14.

19. Kurinij N, Shiono PH, Rhoads GG. Breast-feeding incidence and duration in black and white women. Pediatrics 1988;81:365-71.

20. Righard L, Alade MO. Sucking technique and its effect on success of breastfeeding. Birth 1992;19:185-9.

21. Righard L, Alade MO. Breastfeeding and the use of pacifiers. Birth 1997;24:116-20.

22. Righard L. Are breastfeeding problems related to incorrect breastfeeding technique and the use of pacifiers and bottles? Birth 1998;25:40-4.

23. Victora CG, Tomasi E, Olinto MT, Barros FC. Use of pacifiers and breastfeeding duration. Lancet 1993;341:404-6.

24. Ford RP, Mitchell EA, Scragg R, Stewart AW, Taylor BJ, Allen EM. Factors adversely associated with breastfeeding in New Zealand. J Pediatrics Child Health 1994;30:483-9.

25. Clements MS, Mitchell EA, Wright SP, Esmail A, Jones DR, Ford RP. Influences on breastfeeding in southeast England. Acta Paediatrica 1997;86:51-6.

26. Vadiakas G, Oulis C, Berdouses E. Profile of non-nutritive sucking habits in relation to nursing behavior in pre-school children. J Clin Pediatr Dent 1998;22:133-6.

27. Larsson E. Orthodontic aspects on feeding of young children: a comparison between Swedish and Norwegian-Sami children. Swed Dent J 1998;22:117-21.

28. Barros FC, Victora CG, Semer TC, Tonioli Filho S, Tomasi E, Weiderpass E. Use of pacifiers is associated with decreased breast-feeding duration. Pediatrics 1995;95:497-9.

29. Victora CG, Behague DP, Barros FC, Olinto MT, Weiderpass E. Pacifier use and short breastfeeding duration: cause, consequence, or coincidence? Pediatrics 1997;99:445-53.

30. Hörnell A, Aarts C, Kylberg E, Hofvander Y, Gebre-Medhin M. Breastfeeding patterns in exclusively breastfed infants: a longitudinal prospective study in Uppsala, Sweden. Acta Paediatr 1999;88:203-11.

31. Aarts C, Hörnell A, Kylberg E, Hofvander Y, Gebre-Medhin M. Breastfeeding patterns in relation to thumb sucking and pacifier use. Pediatrics 1999;104:e50.

32. Riva E, Banderali G, Agostoni C, Silano M, Radaelli G, Giovannini M. Factors associated with initiation and duration of breastfeeding in Italy. Acta Paediatr 1999;88:411-5.

33. Howard CR, Howard FM, Lanphear B, de Blieck EA, Eberly S, Lawrence RA. The effects of early pacifier use on breastfeeding duration. Pediatrics 1999;103:E33.

34. Simopoulos AP, Grave GD. Factors associated with the choice and duration of infant-feeding practice. Pediatrics 1984;74(4 Part 2):603-14.

35. Palmer B. The influence of breastfeeding on the development of the oral cavity: a commentary. J Hum Lact 1998;14:93-8.

36. Protecting, promoting, and supporting breast-feeding: the special role of maternity services. A joint WHO/UNICEF statement. Geneva: World Health Organization, 1989.

37. Bergus GR, Levy BT, Levy SM, Slager SL, Kiritsy MC. A longitudinal study of the exposure of infants to antibiotics during the first 200 days of life. Arch Fam Med 1996;5:523-6.

38. Bergus GR, Levy SM, Kirchner L, Warren JJ, Levy BT. A prospective study of infection and associated antibiotic use in young children. Pediatr Perinatal Epidemiol 2001;15:61-7.

39. Levy SM, Kiritsy MC, Slager SL, Warren JJ, Kohout FJ. Patterns of fluoride dentifrice use among infants. Pediatr Dent 1997;19:50-5.

40. Heilman JR, Kiritsy MC, Levy SM, Wefel JS. Fluoride content of infant foods and cereals. J Am Dent Assoc 1997;128:857-63.

41. Levy SM, Kiritsy MC, Slager SL, Warren JJ. Patterns of fluoride supplement use during infancy. J Public Health Dent 1998;58:228-33.

42. Heilman JR, Kiritsy MC, Levy SM, Wefel JS. Fluoride levels of carbonated soft drinks. J Am Dent Assoc 1999;130:1593-9.

43. Levy SM, Warren JJ, Davis CS, Kirchner HL, Kanellis MJ, Wefel JS. Patterns of fluoride intake from birth to 36 months. J Public Health Dent 2001;61:70-7.

44. Labbok M, Krasovec K. Toward consistency in breast-feeding definitions. Stud Fam Planning 1990;21:226-30.

45. Cox DR. Regression models and life-tables (with discussion). J Royal Stat Soc 1972;B34:187-220.

46. SAS Institute, Inc. SAS technical report P-229, SAS/STAT software: changes and enhancements. Release 6.07. Cary, NC: SAS Institute, 1992.

All correspondence should be addressed to Dr. Steven M. Levy, University of Iowa College of Dentistry, Department of Preventive & Community Dentistry, N329 Dental Science Building, Iowa City, IA 52242. E-mail: [email protected]

Issue
The Journal of Family Practice - 51(05)
Page Number
1-1
Display Headline
Associations of pacifier use, digit sucking, and child care attendance with cessation of breastfeeding
Legacy Keywords
Non-nutritive sucking, breastfeeding, childcare, pacifier use, digit sucking. (J Fam Pract 2002; 51:465)