Shoulder Joint Capsule Distension (Hydroplasty)
“Frozen shoulder,” most often caused by adhesive capsulitis, is frequently treated with intraarticular steroid injections, physical therapy, and surgical manipulation under anesthesia. These therapies provide limited benefit. Hydraulic distension of the shoulder joint capsule (hydroplasty) has the potential to provide rapid pain relief and immediate improvement in shoulder function for patients with adhesive capsulitis. We performed 21 hydroplasty procedures on 16 patients over a 4-year period. Ninety-four percent (17/18) of the procedures improved patients’ measured mobility immediately after the procedure. Fifty-three percent (10/19) of the procedures produced immediate, short-term, and sustained improvement in comfort and function. No significant complications of the procedure were detected. Our series suggests that the hydroplasty procedure warrants further evaluation.
“Frozen shoulder” is a clinical diagnosis frequently made for patients with shoulder pain and limited motion. Adhesive capsulitis is the most likely cause of the frozen shoulder syndrome in middle-aged adults.1 This pathophysiologic process involves joint capsular contraction from intraarticular adhesion of synovial folds. The medical literature frequently regards frozen shoulder and adhesive capsulitis as synonyms.
Although many treatment options have been proposed for the frozen shoulder syndrome, each has limitations. Home exercises may not improve the rate of natural recovery.2,3 Benefits from intensive physical therapy accrue slowly.4 Manipulation under anesthesia can be effective, but significant complications have been documented and publications report protracted recovery.5 Intraarticular steroid injection may benefit some patients, but this hypothesis rests on few high-quality studies.4,6 Arthroscopic release under general anesthesia is invasive, and outcomes have been reported for few patients.7,8
An infrequently cited option is hydraulic joint capsule distension under local anesthesia (hydroplasty). This office technique, performed without arthrography, was initially reported by Fareed and Gallivan9 in a case series of 20 patients. The patients in that report noted immediate pain resolution, return to normal sleep, and return of normal function, with benefits persisting for up to 10 years. Variations of this intervention have been described in the orthopedic literature, with favorable results.10,11 We found no publications addressing the use of hydroplasty in a primary care office. In this study, we performed the procedure on a series of patients in a family medicine residency clinic.
Methods
Enrollment and data collection
We offered hydroplasty to a group of patients suffering from stiff and painful shoulders with limited range of motion (ROM) in a capsular pattern (reduced external rotation, abduction, and internal rotation) and pain in the C5 dermatome that had persisted for at least 1 month.12 Informed consent was obtained from patients who underwent the procedure.
Demographic and medical information was collected for all participants. One of the authors (RM) or a trained associate systematically measured ROM before and after 18 of the 21 procedures; because of scheduling difficulties, ROM was not measured immediately before and after the remaining 3 procedures. Hydroplasty procedures were performed or supervised by the other author (LH). Follow-up information was collected during consultations after the procedure. Before this report was prepared, current shoulder status was assessed by telephone.
Hydroplasty technique
The hydroplasty procedure we used was adapted from that of Fareed and Gallivan.9 With the patient supine, the anterior shoulder is prepped. The affected humerus is externally rotated as tolerated. The glenohumeral crease is palpated to identify a subcoracoid window for entering the joint space. The skin is anesthetized with 1% lidocaine. The joint space is entered with an 18-gauge, 1.5-inch needle angled slightly medially and superiorly, toward the presumed center of the glenoid fossa. Once the joint space is entered, approximately 5 ml of 1% lidocaine is injected. Minimal plunger resistance during this injection helps confirm entry into the joint space, although more resistance may be encountered when the joint capsule is severely contracted. One ml of triamcinolone (40 mg) is then injected, followed by up to 40 ml of sterile, chilled saline injected forcibly into the joint space from successive 10-ml syringes. Clear fluid efflux from the needle is usually seen when syringes are changed. A sensation of reduced resistance during the saline injection suggests capsular distension or rupture.
Results
The hydroplasty procedure was performed on 21 shoulders in 16 patients over 4 years. Patients ranged in age from 37 to 76 years. Eleven of the treated shoulders belonged to women and 10 to men. Two patients had both shoulders treated, and 3 patients had the same shoulder treated on 2 separate occasions. One or both of the authors reevaluated 15 of the 16 patients approximately 1 week (range, 1 to 6 weeks) after the procedure.
ROM increased immediately after the procedure in 17 of the 18 procedures for which measurements were recorded. The Table reports the sum of the changes in external and internal rotation. One patient had decreased ROM after a painful injection, but pain, motion, and function returned to baseline within 24 hours.
Functional improvement was defined as the ability to accomplish a specific task that had been impossible prior to the procedure. Example functions included combing hair, putting an arm around a spouse, freestyle swimming, and reaching into a back pocket.
Pain relief was immediate after 11 of 21 procedures. Some procedures caused temporary injection pain, which resolved spontaneously. Significant pain relief was reported approximately 1 week after the procedure in 15 of 20 treatments.
Sustained benefit was assessed by a telephone survey of the 14 patients we were able to contact. Ten of 19 procedures (53%) produced enduring improvement in comfort, motion, and function lasting up to 55 months. One patient was lost to follow-up, and one patient died before the telephone survey. The deceased patient had gallbladder cancer and died in Mexico after a cancer-related operation 7 months after the hydroplasty procedure. Results are summarized in the Table.
TABLE 1
ROTATION CHANGES FOLLOWING HYDROPLASTY PROCEDURE
Procedure Number | Patient (Shoulder Treated) | Duration of Symptoms, in Months | Change in ROM | Immediate Function Benefit | Immediate Effect on Pain* | Pain at 1 to 6 Weeks* | Prolonged Benefit † (Months) |
---|---|---|---|---|---|---|---|
1 | A (L) | 4 | NM | Y | ↓ | ↓ | Y (55) |
2 | B (L) | 3 | +50 | Y | ↓ | ↓ | Y (41) |
3 | B (R) | 3 | +35 | Y | → | ↓ | Y (40) |
4 | C (R) | 8 | +30 | Y | ↓ | ↓ | Y (36) |
5 | D (R) | 60 | +25 | Y | ↓ | ↓ | N |
6 | E (L) | 6 | NM | Y | ↓ | Lost | Lost |
7 | F (R) | 12 | NM | Y | ↓ | ↓ | Y (30) |
8 | G (L) | 8 | +30 | Y | ↓ | ↓ | Y (4) |
9 | H (R) | 19 | +20 | Y | → | ↓ | N |
10 | I (L) | 84 | -35 | N | ↑ | ↑ | N |
11 | G (L) | 8 | +50 | Y | ↓ | ↓ | Y (25) |
12 | J (R) | 7 | +25 | Y | ↓ | ↓ | Y (1) |
13 | J (R) | 8 | +5 | N | ↑ | ↑ | D |
14 | A (R) | 3 | +45 | Y | ↓ | ↓ | Y (16) |
15 | K (L) | 3 | +25 | N | ↑ | ↑ | N |
16 | L (R) | 4 | +30 | Y | → | ↓ | N |
17 | M (L) | 4 | +20 | Y | → | → | N |
18 | L (R) | 7 | +30 | N | ↑ | ↑ | N |
19 | N (R) | 1 | +20 | N | → | → | N |
20 | O (L) | 4 | +20 | Y | → | ↓ | Y (7) |
21 | P (L) | 6 | +10 | Y | ↓ | ↓ | N |
Summary Results | 16 patients; 21 treatments | Average = 12.5 months | 17/18 (94%) increased ROM | 16/21 (76%) improved function | 11/21 (52%) immediate relief | 15/20 (75%) relief at 1-6 weeks | 10/19 (53%) prolonged benefit |
NM denotes not measured; Lost, lost to follow-up.
*Pain abbreviations: ↓ = pain decreased; → = pain unchanged; ↑ = pain increased.
† Y denotes yes; N, no; D, deceased.
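The Summary Results row can be checked against the individual rows. The Python sketch below is a minimal arithmetic check, not part of the authors' analysis. It transcribes the Table into a hypothetical `Procedure` record (the field names are our own) and re-tallies the patient counts, mean symptom duration, and the ROM, function, immediate-pain, and prolonged-benefit proportions. It uses the Table's denominators: unmeasured (NM) procedures are excluded from the ROM count, and lost and deceased entries are excluded from the prolonged-benefit count.

```python
"""Re-tally selected Summary Results figures from the Table's per-procedure rows.

A minimal sketch: the Procedure record and its field names are a hypothetical
encoding of the Table, not a format used by the authors.
"""
from collections import Counter
from typing import NamedTuple, Optional


class Procedure(NamedTuple):
    number: int
    patient: str                # patient letter (A-P)
    side: str                   # shoulder treated: "L" or "R"
    months: int                 # duration of symptoms, in months
    rom_change: Optional[int]   # change in ROM; None where not measured (NM)
    function: str               # immediate function benefit: "Y" or "N"
    pain_now: str               # immediate effect on pain: "down", "same", "up"
    prolonged: str              # "Y", "N", "Lost", or "D" (deceased)


ROWS = [Procedure(*row) for row in [
    (1, "A", "L", 4, None, "Y", "down", "Y"),
    (2, "B", "L", 3, 50, "Y", "down", "Y"),
    (3, "B", "R", 3, 35, "Y", "same", "Y"),
    (4, "C", "R", 8, 30, "Y", "down", "Y"),
    (5, "D", "R", 60, 25, "Y", "down", "N"),
    (6, "E", "L", 6, None, "Y", "down", "Lost"),
    (7, "F", "R", 12, None, "Y", "down", "Y"),
    (8, "G", "L", 8, 30, "Y", "down", "Y"),
    (9, "H", "R", 19, 20, "Y", "same", "N"),
    (10, "I", "L", 84, -35, "N", "up", "N"),
    (11, "G", "L", 8, 50, "Y", "down", "Y"),
    (12, "J", "R", 7, 25, "Y", "down", "Y"),
    (13, "J", "R", 8, 5, "N", "up", "D"),
    (14, "A", "R", 3, 45, "Y", "down", "Y"),
    (15, "K", "L", 3, 25, "N", "up", "N"),
    (16, "L", "R", 4, 30, "Y", "same", "N"),
    (17, "M", "L", 4, 20, "Y", "same", "N"),
    (18, "L", "R", 7, 30, "N", "up", "N"),
    (19, "N", "R", 1, 20, "N", "same", "N"),
    (20, "O", "L", 4, 20, "Y", "same", "Y"),
    (21, "P", "L", 6, 10, "Y", "down", "N"),
]]


def pct(k: int, n: int) -> str:
    """Format k/n with a whole-number percentage, as the Table does."""
    return f"{k}/{n} ({round(100 * k / n)}%)"


def summarize(rows: list) -> dict:
    measured = [r for r in rows if r.rom_change is not None]   # excludes NM
    followed = [r for r in rows if r.prolonged in ("Y", "N")]  # excludes Lost, D
    sides = {}  # patient letter -> set of shoulders treated
    for r in rows:
        sides.setdefault(r.patient, set()).add(r.side)
    per_shoulder = Counter((r.patient, r.side) for r in rows)
    unreachable = {r.patient for r in rows if r.prolonged in ("Lost", "D")}
    return {
        "patients": len(sides),
        "treatments": len(rows),
        "bilateral": sum(len(s) == 2 for s in sides.values()),
        "repeated": sum(n >= 2 for n in per_shoulder.values()),
        "contacted": len(sides) - len(unreachable),
        "mean_months": round(sum(r.months for r in rows) / len(rows), 1),
        "rom": pct(sum(r.rom_change > 0 for r in measured), len(measured)),
        "function": pct(sum(r.function == "Y" for r in rows), len(rows)),
        "immediate_relief": pct(sum(r.pain_now == "down" for r in rows), len(rows)),
        "prolonged": pct(sum(r.prolonged == "Y" for r in followed), len(followed)),
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(f"{key}: {value}")
```

The sketch reproduces, for example, `10/19 (53%)` for prolonged benefit. The denominator of 19 is the 21 procedures less procedure 6 (lost to follow-up) and procedure 13 (patient deceased). It also reproduces the 2 bilateral patients, the 3 repeated shoulders, and the 14 patients reachable by telephone.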
Discussion
In our case series of hydroplasty for an unselected population of primary care patients with a capsular pattern of shoulder restriction, 52% of procedures produced immediate pain relief and functional improvement. Benefits were sustained after 53% of procedures, for up to 55 months. Patients who improved considered the benefits dramatic.
Study limitations include the small number of patients, failure to record patients who declined the procedure, potential selection bias, and pathophysiologic diagnostic uncertainty. By the authors’ recollection, a few patients declined the procedure, but these were not tallied. Patients came to us either by presenting to one of the authors or through word-of-mouth publicity, and patients who were pleased with their results referred others. This referral pattern may not be typical of a primary care practice.
Because this was not a randomized controlled trial, we cannot be certain whether the benefit resulted from the injected medications or from saline distension. We attempted to exclude the anesthetic effect by reassessing pain and function approximately 1 week after the procedure. Corticosteroid injection was unlikely to explain the immediate benefits observed.
The question of diagnostic uncertainty is important. Adhesive capsulitis could logically respond to capsular distension. A clinical examination may be insufficient to differentiate this process from other inflammatory processes that cause pain and tethering loss of motion. Hydroplasty would likely fail if a capsular contraction process were not in progress.
Some other published reports suggest results superior to ours.9,10,11 There are several possible explanations. Visualization during arthrography might improve diagnostic certainty and consequently patient selection. More restrictive clinical selection criteria might increase the likelihood of treating patients who actually have adhesive capsulitis. Success might also depend on technical details, such as the volume and pressure applied during the distension injections. The randomized controlled trials comparing this treatment with other treatments were methodologically flawed.13,14 A systematic review concluded that there is little evidence to support or refute the efficacy of common interventions.6
Conclusions
Shoulder hydroplasty is an office procedure that may provide immediate and dramatic benefit to patients suffering from adhesive capsulitis. There is a need for a comprehensive study of this syndrome and its treatment by primary care clinicians. Explicit definitions and prospective evaluation of treatments might clarify options for the patient and the front-line clinician. Use of expanded symptom scoring systems such as the Simple Shoulder Test and the Medical Outcomes Study Short-Form Health Survey could provide valid, reliable outcome measures.2 While hydroplasty is an option for treatment of stiff and painful shoulders, it should ideally be compared with other treatment modalities in a randomized controlled trial.
Acknowledgments
The authors are grateful to Martee Robinson, Sheri Price, Vickie Greenwood, and the Cox Family Practice Residency writing group for immeasurable support and assistance.
1. Siegel LB, Cohen NJ, Gall EP. Adhesive capsulitis: a sticky issue. Am Fam Physician 1999;59:1843-50.
2. O’Kane JW, Jackson S, Sidles JA, Smith KL, Matsen FA. Simple home program for frozen shoulder to improve patient’s assessment of shoulder function and health status. J Am Board Fam Pract 1999;12:270-77.
3. Reeves B. The natural history of the frozen shoulder syndrome. Scand J Rheumatol 1975;4:193-96.
4. van der Windt DAWM, Koes BW, Deville W, Boeke AJP, de Jong BA, Bouter LM. Effectiveness of corticosteroid injections versus physiotherapy for treatment of painful stiff shoulder in primary care: randomized trial. BMJ 1998;317:1292-96.
5. Dodenhoff RM, Levy O, Wilson A, Copeland SA. Manipulation under anesthesia for primary frozen shoulder: effect on early recovery and return to activity. J Shoulder Elbow Surg 2000;9:23-26.
6. Green S, Buchbinder R, Glazier R, Forbes A. Systematic review of randomized controlled trials of interventions for painful shoulder: selection criteria, outcome assessment, and efficacy. BMJ 1998;315:354-60.
7. Warner JJ, Allen A, Marks PH, Wong P. Arthroscopic release for chronic, refractory adhesive capsulitis of the shoulder. J Bone Joint Surg Am 1996;78:1808-16.
8. Harryman DT, Matsen FA, Sidles JA. Arthroscopic management of refractory shoulder stiffness. Arthroscopy 1997;13:133-47.
9. Fareed DO, Gallivan WR. Office management of frozen shoulder syndrome: treatment with hydraulic distension under local anesthesia. Clin Orthop Relat Res 1989;242:177-83.
10. Andren L, Lundberg BJ. Treatment of rigid shoulders by joint distension during arthrography. Acta Orthop Scand 1965;36:45-53.
11. Van Royen BJ, Pavlov PW. Treatment of frozen shoulder by distension and manipulation under local anaesthesia. Int Orthop 1996;20:207-10.
12. van der Windt DAWM, Koes BW, de Jong BA, Bouter LM. Shoulder disorders in general practice: incidence, patient characteristics and management. Ann Rheum Dis 1995;54:959-64.
13. Corbeil V, Dussault RG, Leduc BE, Fleury J. [Adhesive capsulitis of the shoulder: comparative study of arthrography with intra-articular corticotherapy with or without capsular distension.] J Can Assoc Radiol 1992;43:127-30. (In French.)
14. Jacobs LGH, Barton MAJ, Wallace WA, Ferrousis J, Dunn NA, Bossingham DH. Intra-articular distension and steroids in the management of capsulitis of the shoulder. BMJ 1991;302:1498-501.
“Frozen shoulder,” most often caused by adhesive capsulitis, is frequently treated with intraarticular steroid injections, physical therapy, and surgical manipulation under anesthesia. These therapies provide limited benefits. Hydraulic distension of the shoulder joint capsule (hydroplasty) has potential to provide rapid relief of pain and immediate improvement of shoulder function for patients with adhesive capsulitis. We performed 21 hydroplasty procedures on 16 patients over a 4-year period. Ninety-four percent (17/18) of the procedures improved patients’ measured mobility immediately after the procedure. Fifty-three percent (10/19) of the procedures produced immediate, short term, and sustained improvement in comfort and function. No significant complications of the procedure were detected. Our series suggests that the hydroplasty procedure should be further evaluated.
“Frozen shoulder” is a clinical diagnosis frequently made for patients with shoulder pain and limited motion. Adhesive capsulitis is the most likely cause of the frozen shoulder syndrome in middle-aged adults.1 This pathophysiologic process involves joint capsular contraction from intraarticular adhesion of synovial folds. The medical literature frequently regards frozen shoulder and adhesive capsulitis as synonyms.
Although many treatment options have been proposed for the frozen shoulder syndrome, each has limitations. Home exercises may not improve the rate of natural recovery.2,3 Benefits from intensive physical therapy are slow.4 Manipulation while anesthetized can be effective, but significant complications have been documented and publications report protracted recovery.5 Injection of intraarticular steroids may benefit some patients, but this hypothesis is based on few quality studies.4,6 Arthroscopic release done under general anesthesia is invasive and few patients’ outcomes are reported.7,8
An infrequently cited option is hydraulic joint capsule distension under local anesthesia (hydroplasty). This is an office technique without arthrography, and was initially reported by Fareed and Gallivan9 in a case series of 20 patients. The patients in this report noted immediate pain resolution, return to normal sleep, and return of normal function. Benefits persisted for up to 10 years. Variations of this intervention are described in the orthopedic literature and results are favorable.10,11 We found no publications addressing the use of hydroplasty in a primary care office. In this study, we performed this procedure on a series of patients in a family medicine residency clinic.
Methods
Enrollment and data collection
We offered hydroplasty to a group of patients suffering from stiff and painful shoulders with limited range of motion (ROM) in a capsular pattern (reduced external rotation, abduction, and internal rotation) and pain in the C5 dermatome that had persisted for at least 1 month.12 Informed consent was obtained from patients who underwent the procedure.
Demographic and medical information was collected for all participants. One of the authors (RM) or a trained associate systematically measured pre-and post-procedure ROM on 18 of 21 procedures. Because of scheduling difficulties, 3 patients were not measured immediately before and after the procedure. Hydroplasty procedures were performed or supervised by the other author (LH). Subsequent information was collected during consultations after the procedure. Prior to this report, current shoulder status was assessed by telephone.
Hydroplasty technique
The hydroplasty procedure we used was adapted from Fareed.9 The anterior shoulder is prepped with the patient in a supine position. The affected humerus is externally rotated as tolerated. The glenohumeral crease is palpated to identify a subcoracoid window to enter the joint space. The skin is anesthetized using 1% lidocaine. The joint space is entered with an 18-gauge 1.5-inch needle angling slightly medially and superiorly, pointing toward the presumptive center of the glenoid fossa. Once the joint space is entered, approximately 5 ml of 1% lidocaine is injected. Minimal plunger resistance during this injection helps ensure joint space entry. With a severely contracted joint capsule, more plunger resistance may be encountered. One ml of triamcinolone (40 mg) is injected. Then up to 40 ml of sterile, chilled saline are forcibly injected into the joint space using 10-ml increment syringes. Clear fluid efflux from the needle is usually seen when syringes are changed. A sensation of reduced resistance to injection during saline injection suggests capsular distension or rupture.
Results
The hydroplasty procedure was offered and performed on 21 shoulders of 16 patients over 4 years. Subjects ranged in age from 37 to 76 years. Eleven female and 10 male shoulders were treated. Two patients had both shoulders treated, and 3 patients had the same shoulder treated on 2 separate occasions. One or both of the authors reevaluated 15 of 16 patients approximately 1 week (range 1 to 6 weeks) subsequent to the procedure.
ROM increased immediately post-procedure in 17 of 18 procedures in which measurements were recorded. The sum of changes in external rotation and internal rotation is reported in the Table. One patient experienced decreased ROM following a painful injection, but return to baseline of pain, motion, and function occurred within 24 hours.
Functional improvement was defined as the ability to accomplish a specific task that had been impossible prior to the procedure. Example functions included combing hair, putting an arm around a spouse, freestyle swimming, and reaching into a back pocket.
Pain relief was immediate in 11 of 21 shoulders. Temporary injection pain occurred in some procedures but injection pain resolved spontaneously. Significant pain relief was reported approximately 1 week following the procedure in 15 of 21 treatments.
Sustained benefits were confirmed by a telephone survey for the 14 patients whom we were able to contact. Ten of nineteen procedures (53%) produced enduring benefit of comfort, motion, and function for up to 55 months. One patient was lost to follow-up and one patient died prior to the telephone survey. The deceased patient suffered from gallbladder cancer and died in Mexico after a cancer-related operation 7 months after the hydroplasty procedure. Results are summarized in the Table.
TABLE 1
ROTATION CHANGES FOLLOWING HYDROPLASTY PROCEDURE
Procedure Number | Patient (Shoulder Treated) | Duration of Symptoms, in Months | Change in ROM | Immediate Function Benefit | Immediate Effect on Pain* | Pain at 1 to 6 Weeks* | Prolonged Benefit † (Months) |
---|---|---|---|---|---|---|---|
1 | A (L) | 4 | NM | Y | ↓ | ↓ | Y (55) |
2 | B (L) | 3 | +50 | Y | ↓ | ↓ | Y (41) |
3 | B (R) | 3 | +35 | Y | → | ↓ | Y (40) |
4 | C (R) | 8 | +30 | Y | ↓ | ↓ | Y (36) |
5 | D (R) | 60 | +25 | Y | ↓ | ↓ | N |
6 | E (L) | 6 | NM | Y | ↓ | Lost | Lost |
7 | F (R) | 12 | NM | Y | ↓ | ↓ | Y (30) |
8 | G (L) | 8 | +30 | Y | ↓ | ↓ | Y (4) |
9 | H (R) | 19 | +20 | Y | → | ↓ | N |
10 | I (L) | 84 | -35 | N | ↑ | ↑ | N |
11 | G (L) | 8 | +50 | Y | ↓ | ↓ | Y (25) |
12 | J (R) | 7 | +25 | Y | ↓ | ↓ | Y (1) |
13 | J (R) | 8 | +5 | N | ↑ | ↑ | D |
14 | A (R) | 3 | +45 | Y | ↓ | ↓ | Y (16) |
15 | K (L) | 3 | +25 | N | ↑ | ↑ | N |
16 | L (R) | 4 | +30 | Y | → | ↓ | N |
17 | M (L) | 4 | +20 | Y | → | → | N |
18 | L (R) | 7 | +30 | N | ↑ | ↑ | N |
19 | N (R) | 1 | +20 | N | → | → | N |
20 | O (L) | 4 | +20 | Y | → | ↓ | Y (7) |
21 | P (L) | 6 | +10 | Y | ↓ | ↓ | N |
Summary Results | 16 patients; 21 treatments | Average = 12.5 months | 17/18 (94%) increased ROM | 16/21 (76%) improved function | 11/21 (52%) immediate relief | 15/20 (75%) relief at 1-6 weeks | 10/19 (53%) prolonged benefit |
NM denotes not measured; Lost, lost to follow-up. | |||||||
*Pain abbreviations: ↓=Pain decreased; →=Pain was unchanged; ↑Pain increased. | |||||||
† Y denotes yes; N, no; D, deceased. |
Discussion
In our case series of hydroplasty for an unrestricted population of patients with capsular syndrome in the primary care office, 52% percent of patients experienced immediate pain relief and functional improvement. Benefits were sustained in 53% of patients for up to 55 months. Individuals who experienced improvement considered the benefits dramatic.
Study limitations include few patients, failure to record patients who refused the procedure, potential selection bias, and pathophysiologic diagnostic uncertainty. Although a few patients declined the procedure by authors’ recollection, these were not tallied. Patients were encountered by presenting to an author or by word-of-mouth publicity. Patients who were pleased by the results of their procedure referred other patients. This may not be typical of a primary care practice.
Because this was not a randomized controlled trial, we cannot be certain that the benefit was a result of injected medications or saline distension. We attempted to exclude the anesthetic effect by reassessing pain and function approximately 1 week after the procedure. Corticosteroid injection was unlikely to explain the immediate benefits observed.
The question of diagnostic uncertainty is important. Adhesive capsulitis could logically respond to capsular distension. A clinical examination may be insufficient to differentiate this process from other inflammatory processes that cause pain and tethering loss of motion. Hydroplasty would likely fail if a capsular contraction process were not in progress.
Reports of some other published trials suggest results superior to our series.9,10,11 There are several possible explanations. Visualization during arthrography might improve diagnostic certainty and consequently improve patient selection. More restrictive clinical patient selection parameters might improve the likelihood of treating patients who actually have adhesive capsulitis. Success might also depend on technical details, such as the volume and pressure applied during the distention injections. Randomized controlled trials comparing this treatment to other treatments were methodologically flawed.13,14 A systematic review concludes there is little evidence to support or refute efficacy of common interventions.6
Conclusions
Shoulder hydroplasty is an office procedure that may provide immediate and dramatic benefit to patients suffering from adhesive capsulitis. There is a need for a comprehensive study of this syndrome and its treatment by primary care clinicians. Explicit definitions and prospective evaluation of treatments might clarify options for the patient and the front-line clinician. Use of expanded symptom scoring systems such as the Simple Shoulder Test and the Medical Outcomes Study Short-Form Health Survey could provide valid, reliable outcome measures.2 While hydroplasty is an option for treatment of stiff and painful shoulders, it should ideally be compared with other treatment modalities in a randomized controlled trial.
Acknowledgments
The authors are grateful to Martee Robinson, Sheri Price, Vickie Greenwood, and the Cox Family Practice Residency writing group for immeasurable support and assistance.
“Frozen shoulder,” most often caused by adhesive capsulitis, is frequently treated with intraarticular steroid injections, physical therapy, and surgical manipulation under anesthesia. These therapies provide limited benefits. Hydraulic distension of the shoulder joint capsule (hydroplasty) has potential to provide rapid relief of pain and immediate improvement of shoulder function for patients with adhesive capsulitis. We performed 21 hydroplasty procedures on 16 patients over a 4-year period. Ninety-four percent (17/18) of the procedures improved patients’ measured mobility immediately after the procedure. Fifty-three percent (10/19) of the procedures produced immediate, short term, and sustained improvement in comfort and function. No significant complications of the procedure were detected. Our series suggests that the hydroplasty procedure should be further evaluated.
“Frozen shoulder” is a clinical diagnosis frequently made for patients with shoulder pain and limited motion. Adhesive capsulitis is the most likely cause of the frozen shoulder syndrome in middle-aged adults.1 This pathophysiologic process involves joint capsular contraction from intraarticular adhesion of synovial folds. The medical literature frequently regards frozen shoulder and adhesive capsulitis as synonyms.
Although many treatment options have been proposed for the frozen shoulder syndrome, each has limitations. Home exercises may not improve the rate of natural recovery.2,3 Benefits from intensive physical therapy are slow.4 Manipulation while anesthetized can be effective, but significant complications have been documented and publications report protracted recovery.5 Injection of intraarticular steroids may benefit some patients, but this hypothesis is based on few quality studies.4,6 Arthroscopic release done under general anesthesia is invasive and few patients’ outcomes are reported.7,8
An infrequently cited option is hydraulic joint capsule distension under local anesthesia (hydroplasty). This is an office technique without arthrography, and was initially reported by Fareed and Gallivan9 in a case series of 20 patients. The patients in this report noted immediate pain resolution, return to normal sleep, and return of normal function. Benefits persisted for up to 10 years. Variations of this intervention are described in the orthopedic literature and results are favorable.10,11 We found no publications addressing the use of hydroplasty in a primary care office. In this study, we performed this procedure on a series of patients in a family medicine residency clinic.
Methods
Enrollment and data collection
We offered hydroplasty to a group of patients suffering from stiff and painful shoulders with limited range of motion (ROM) in a capsular pattern (reduced external rotation, abduction, and internal rotation) and pain in the C5 dermatome that had persisted for at least 1 month.12 Informed consent was obtained from patients who underwent the procedure.
Demographic and medical information was collected for all participants. One of the authors (RM) or a trained associate systematically measured pre-and post-procedure ROM on 18 of 21 procedures. Because of scheduling difficulties, 3 patients were not measured immediately before and after the procedure. Hydroplasty procedures were performed or supervised by the other author (LH). Subsequent information was collected during consultations after the procedure. Prior to this report, current shoulder status was assessed by telephone.
Hydroplasty technique
The hydroplasty procedure we used was adapted from Fareed.9 The anterior shoulder is prepped with the patient in a supine position. The affected humerus is externally rotated as tolerated. The glenohumeral crease is palpated to identify a subcoracoid window to enter the joint space. The skin is anesthetized using 1% lidocaine. The joint space is entered with an 18-gauge 1.5-inch needle angling slightly medially and superiorly, pointing toward the presumptive center of the glenoid fossa. Once the joint space is entered, approximately 5 ml of 1% lidocaine is injected. Minimal plunger resistance during this injection helps ensure joint space entry. With a severely contracted joint capsule, more plunger resistance may be encountered. One ml of triamcinolone (40 mg) is injected. Then up to 40 ml of sterile, chilled saline are forcibly injected into the joint space using 10-ml increment syringes. Clear fluid efflux from the needle is usually seen when syringes are changed. A sensation of reduced resistance to injection during saline injection suggests capsular distension or rupture.
Results
The hydroplasty procedure was offered and performed on 21 shoulders of 16 patients over 4 years. Subjects ranged in age from 37 to 76 years. Eleven female and 10 male shoulders were treated. Two patients had both shoulders treated, and 3 patients had the same shoulder treated on 2 separate occasions. One or both of the authors reevaluated 15 of 16 patients approximately 1 week (range 1 to 6 weeks) subsequent to the procedure.
ROM increased immediately post-procedure in 17 of 18 procedures in which measurements were recorded. The sum of changes in external rotation and internal rotation is reported in the Table. One patient experienced decreased ROM following a painful injection, but return to baseline of pain, motion, and function occurred within 24 hours.
Functional improvement was defined as the ability to accomplish a specific task that had been impossible prior to the procedure. Example functions included combing hair, putting an arm around a spouse, freestyle swimming, and reaching into a back pocket.
Pain relief was immediate in 11 of 21 shoulders. Temporary injection pain occurred in some procedures but injection pain resolved spontaneously. Significant pain relief was reported approximately 1 week following the procedure in 15 of 21 treatments.
Sustained benefits were confirmed by a telephone survey for the 14 patients whom we were able to contact. Ten of nineteen procedures (53%) produced enduring benefit of comfort, motion, and function for up to 55 months. One patient was lost to follow-up and one patient died prior to the telephone survey. The deceased patient suffered from gallbladder cancer and died in Mexico after a cancer-related operation 7 months after the hydroplasty procedure. Results are summarized in the Table.
TABLE 1
ROTATION CHANGES FOLLOWING HYDROPLASTY PROCEDURE
Procedure Number | Patient (Shoulder Treated) | Duration of Symptoms, in Months | Change in ROM | Immediate Function Benefit | Immediate Effect on Pain* | Pain at 1 to 6 Weeks* | Prolonged Benefit † (Months) |
---|---|---|---|---|---|---|---|
1 | A (L) | 4 | NM | Y | ↓ | ↓ | Y (55) |
2 | B (L) | 3 | +50 | Y | ↓ | ↓ | Y (41) |
3 | B (R) | 3 | +35 | Y | → | ↓ | Y (40) |
4 | C (R) | 8 | +30 | Y | ↓ | ↓ | Y (36) |
5 | D (R) | 60 | +25 | Y | ↓ | ↓ | N |
6 | E (L) | 6 | NM | Y | ↓ | Lost | Lost |
7 | F (R) | 12 | NM | Y | ↓ | ↓ | Y (30) |
8 | G (L) | 8 | +30 | Y | ↓ | ↓ | Y (4) |
9 | H (R) | 19 | +20 | Y | → | ↓ | N |
10 | I (L) | 84 | -35 | N | ↑ | ↑ | N |
11 | G (L) | 8 | +50 | Y | ↓ | ↓ | Y (25) |
12 | J (R) | 7 | +25 | Y | ↓ | ↓ | Y (1) |
13 | J (R) | 8 | +5 | N | ↑ | ↑ | D |
14 | A (R) | 3 | +45 | Y | ↓ | ↓ | Y (16) |
15 | K (L) | 3 | +25 | N | ↑ | ↑ | N |
16 | L (R) | 4 | +30 | Y | → | ↓ | N |
17 | M (L) | 4 | +20 | Y | → | → | N |
18 | L (R) | 7 | +30 | N | ↑ | ↑ | N |
19 | N (R) | 1 | +20 | N | → | → | N |
20 | O (L) | 4 | +20 | Y | → | ↓ | Y (7) |
21 | P (L) | 6 | +10 | Y | ↓ | ↓ | N |
Summary Results | 16 patients; 21 treatments | Average = 12.5 months | 17/18 (94%) increased ROM | 16/21 (76%) improved function | 11/21 (52%) immediate relief | 15/20 (75%) relief at 1-6 weeks | 10/19 (53%) prolonged benefit |

NM denotes not measured; Lost, lost to follow-up.
*Pain abbreviations: ↓ = pain decreased; → = pain unchanged; ↑ = pain increased.
† Y denotes yes; N, no; D, deceased.
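The summary row of the Table can be re-derived from the per-procedure rows. The Python sketch below is a hypothetical helper, not part of the original study: it encodes the duration, ROM, function, immediate-pain, and prolonged-benefit columns as transcribed above, treats NM, Lost, and D entries as missing when choosing denominators, and recomputes those proportions along with the mean symptom duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Procedure:
    number: int
    duration_months: int
    rom_change: Optional[int]  # None where ROM was not measured ("NM")
    function_benefit: bool     # immediate functional benefit (Y/N)
    immediate_pain: str        # "dec", "same", or "inc"
    prolonged: str             # "Y", "N", "Lost", or "D" (deceased)


# Rows transcribed from the Table: number, duration, ROM change, function,
# immediate pain, prolonged benefit.
PROCEDURES = [Procedure(*row) for row in [
    (1, 4, None, True, "dec", "Y"),
    (2, 3, 50, True, "dec", "Y"),
    (3, 3, 35, True, "same", "Y"),
    (4, 8, 30, True, "dec", "Y"),
    (5, 60, 25, True, "dec", "N"),
    (6, 6, None, True, "dec", "Lost"),
    (7, 12, None, True, "dec", "Y"),
    (8, 8, 30, True, "dec", "Y"),
    (9, 19, 20, True, "same", "N"),
    (10, 84, -35, False, "inc", "N"),
    (11, 8, 50, True, "dec", "Y"),
    (12, 7, 25, True, "dec", "Y"),
    (13, 8, 5, False, "inc", "D"),
    (14, 3, 45, True, "dec", "Y"),
    (15, 3, 25, False, "inc", "N"),
    (16, 4, 30, True, "same", "N"),
    (17, 4, 20, True, "same", "N"),
    (18, 7, 30, False, "inc", "N"),
    (19, 1, 20, False, "same", "N"),
    (20, 4, 20, True, "same", "Y"),
    (21, 6, 10, True, "dec", "N"),
]]


def summarize(procs):
    """Return (count, denominator) pairs plus mean symptom duration."""
    measured = [p for p in procs if p.rom_change is not None]   # excludes NM
    followed = [p for p in procs if p.prolonged in ("Y", "N")]  # excludes Lost, D
    return {
        "rom_improved": (sum(p.rom_change > 0 for p in measured), len(measured)),
        "function": (sum(p.function_benefit for p in procs), len(procs)),
        "immediate_relief": (sum(p.immediate_pain == "dec" for p in procs), len(procs)),
        "prolonged": (sum(p.prolonged == "Y" for p in followed), len(followed)),
        "mean_duration": round(sum(p.duration_months for p in procs) / len(procs), 1),
    }


for key, value in summarize(PROCEDURES).items():
    print(key, value)
```

With the rows above, `summarize` returns 17 of 18 ROM improvements, 16 of 21 functional benefits, 11 of 21 immediate relief, 10 of 19 prolonged benefits, and a mean duration of 12.5 months, matching the corresponding entries in the summary row.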
Discussion
In our case series of hydroplasty for an unrestricted population of patients with capsular syndrome in a primary care office, 52% of procedures produced immediate pain relief and functional improvement. Benefits were sustained after 53% of procedures for up to 55 months. Patients who improved considered the benefits dramatic.
Study limitations include the small number of patients, failure to record patients who refused the procedure, potential selection bias, and uncertainty about the underlying pathophysiologic diagnosis. Although the authors recall that a few patients declined the procedure, these refusals were not tallied. Patients came to us either by presenting to one of the authors or through word-of-mouth publicity, and patients who were pleased with the results of their procedure referred others. This referral pattern may not be typical of a primary care practice.
Because this was not a randomized controlled trial, we cannot be certain that the benefit was a result of injected medications or saline distension. We attempted to exclude the anesthetic effect by reassessing pain and function approximately 1 week after the procedure. Corticosteroid injection was unlikely to explain the immediate benefits observed.
The question of diagnostic uncertainty is important. Adhesive capsulitis could logically respond to capsular distension. A clinical examination may be insufficient to differentiate this process from other inflammatory processes that cause pain and tethering loss of motion. Hydroplasty would likely fail if a capsular contraction process were not in progress.
Some other published reports suggest results superior to ours.9-11 There are several possible explanations. Visualization during arthrography might improve diagnostic certainty and consequently improve patient selection. More restrictive clinical selection criteria might improve the likelihood of treating patients who actually have adhesive capsulitis. Success might also depend on technical details, such as the volume and pressure applied during the distension injections. The randomized controlled trials that have compared this treatment with other treatments were methodologically flawed.13,14 A systematic review concluded that there is little evidence to support or refute the efficacy of common interventions.6
Conclusions
Shoulder hydroplasty is an office procedure that may provide immediate and dramatic benefit to patients suffering from adhesive capsulitis. There is a need for a comprehensive study of this syndrome and its treatment by primary care clinicians. Explicit definitions and prospective evaluation of treatments might clarify options for the patient and the front-line clinician. Use of expanded symptom scoring systems such as the Simple Shoulder Test and the Medical Outcomes Study Short-Form Health Survey could provide valid, reliable outcome measures.2 While hydroplasty is an option for treatment of stiff and painful shoulders, it should ideally be compared with other treatment modalities in a randomized controlled trial.
Acknowledgments
The authors are grateful to Martee Robinson, Sheri Price, Vickie Greenwood, and the Cox Family Practice Residency writing group for immeasurable support and assistance.
1. Siegel LB, Cohen NJ, Gall EP. Adhesive capsulitis: a sticky issue. Am Fam Physician 1999;59:1843-50.
2. O’Kane JW, Jackson S, Sidles JA, Smith KL, Matsen FA. Simple home program for frozen shoulder to improve patient’s assessment of shoulder function and health status. J Am Board Fam Pract 1999;12:270-77.
3. Reeves B. The natural history of the frozen shoulder syndrome. Scand J Rheumatol 1975;4:193-96.
4. van der Windt DAWM, Koes BW, Deville W, Boeke AJP, de Jong BA, Bouter LM. Effectiveness of corticosteroid injections versus physiotherapy for treatment of painful stiff shoulder in primary care: randomized trial. BMJ 1998;317:1292-96.
5. Dodenhoff RM, Levy O, Wilson A, Copeland SA. Manipulation under anesthesia for primary frozen shoulder: effect on early recovery and return to activity. J Shoulder Elbow Surg 2000;9:23-26.
6. Green S, Buchbinder R, Glazier R, Forbes A. Systematic review of randomized controlled trials of interventions for painful shoulder: selection criteria, outcome assessment, and efficacy. BMJ 1998;315:354-60.
7. Warner JJ, Allen A, Marks PH, Wong P. Arthroscopic release for chronic, refractory adhesive capsulitis of the shoulder. J Bone Joint Surg [Am] 1996;78:1808-16.
8. Harryman DT, Matsen FA, Sidles JA. Arthroscopic management of refractory shoulder stiffness. Arthroscopy 1997;13:133-47.
9. Fareed DO, Gallivan WR. Office management of frozen shoulder syndrome: treatment with hydraulic distension under local anesthesia. Clin Orthop Relat Res 1989;242:177-83.
10. Andren L, Lundberg BJ. Treatment of rigid shoulders by joint distension during arthrography. Acta Orthop Scand 1965;36:45-53.
11. Van Royen BJ, Pavlov PW. Treatment of frozen shoulder by distension and manipulation under local anaesthesia. Int Orthop 1996;20:207-10.
12. van der Windt DAWM, Koes BW, de Jong BA, Bouter LM. Shoulder disorders in general practice: incidence, patient characteristics and management. Ann Rheum Dis 1995;54:959-64.
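13. Corbeil V, Dussault RG, Leduc BE, Fleury J. Capsulite rétractile de l’épaule: étude comparative de l’arthrographie avec corticothérapie intra-articulaire avec ou sans distension capsulaire [Adhesive capsulitis of the shoulder: a comparative study of arthrography with intra-articular corticotherapy with or without capsular distension]. J Can Assoc Radiol 1992;43:127-30.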
14. Jacobs LGH, Barton MAJ, Wallace WA, Ferrousis J, Dunn NA, Bossingham DH. Intra-articular distension and steroids in the management of capsulitis of the shoulder. BMJ 1991;302:1498-501.
When Physicians and Patients Think Alike: Patient-Centered Beliefs and Their Impact on Satisfaction and Trust
Study Design: Physicians provided demographic information and completed a scale assessing their beliefs about sharing information and power with their patients. A sample of their patients filled out the same scale and made evaluations of their physicians before and after a target visit.
Population: Physicians and patients in a large multispecialty group practice and a model health maintenance organization were included. Forty-five physicians in internal medicine, family practice, and cardiology participated, as well as 909 of their patients who had a significant concern.
Outcomes Measured: We measured trust in the physician pre-visit, and visit satisfaction and physician endorsement immediately post-visit.
Results: Among patients, patient-centered beliefs (a preference for information and control) were associated with being female, white, younger, more educated, and having a higher income; among physicians these beliefs were unrelated to sex, ethnicity, or experience. The patients of patient-centered physicians were no more trusting or endorsing of their physicians, and they were not more satisfied with the target visit. However, patients whose beliefs were congruent with their physicians’ beliefs were more likely to trust and endorse them, even though they were not more satisfied with the target visit.
Conclusions: The extent of congruence between physicians’ and patients’ beliefs plays an important role in determining how patients evaluate their physicians, although satisfaction with a specific visit and overall trust may be determined differently.
A patient-centered approach to care has been widely advocated.1-5 Although there are several dimensions to patient-centeredness, one key element involves patient participation and the sharing of power and information between patient and physician. Physicians who take a patient-centered approach are more likely to treat their patients as partners and to help them make informed choices among several options. This approach has been associated with a range of positive outcomes, such as heightened patient satisfaction, better adherence, and improved health outcomes.6-10
Yet in spite of the general effectiveness of patient-centeredness, it is reasonable to ask whether a one-size-fits-all approach to patient care is the best one. Some patients, such as the elderly or patients of certain ethnic backgrounds, may desire a physician whose style is more structured and who provides more guidance.11-14 Patients who are sick or have serious health concerns may also want their physicians to provide more direction.15,16 Therefore, while accepting the overall value of patient-centeredness, some physicians and researchers have proposed that the degree of “fit” between patients and physicians (the extent to which the physician holds attitudes and beliefs that are congruent with those of the patient17-21) has an independent effect on patients’ reactions to their health care providers.
Our study involves the measurement of patient-centeredness among both physicians and patients, in particular beliefs about the sharing of power and information. We asked what personal characteristics were associated with patient-centered beliefs among physicians and patients, and investigated the extent to which patients felt positively about clinicians who hold matching opinions about power and information sharing.
Methods
The data we report come from the Physician Patient Communication Project, a large observational study conducted in the Sacramento, California, metropolitan area. Patients in the study were surveyed before and immediately after a target outpatient visit. The physicians provided data before and immediately after the same visits.
Physician Sampling and Data Collection
All physicians and patients in the study were affiliated with one of the 2 major health care systems in the region, the University of California, Davis, Medical Group (UCDMG) or Kaiser Permanente (KP). All physicians were involved in direct patient care at least 20 hours per week in family medicine, internal medicine, or cardiology. Forty-five physicians took part in the study (22 from UCDMG, 23 from KP). Eighteen practiced general internal medicine; 16 were family physicians; and 11 were cardiologists. The UCDMG and KP physicians did not differ significantly with regard to age or sex.
All participating physicians filled out the Clinician Background Questionnaire, which contained basic demographic questions, a 15-item work satisfaction scale,22 and the 9-item Sharing subscale of the Patient-Practitioner Orientation Scale (PPOS).23-25 The PPOS, which has been shown to have good reliability and validity, measures the beliefs of patients and physicians along a dimension that ranges from patient-centered to physician-centered. The Sharing subscale of the PPOS assesses beliefs about sharing information (eg, “It is often best for patients if they do not have a full explanation of their medical condition.”) and power and control (eg, “The doctor is the one who should decide what gets talked about during a visit.”) on a 6-point Likert scale from strongly agree to strongly disagree. A higher score indicates a more patient-centered orientation (ie, greater approval of sharing power and information).
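As a sketch of how a Sharing subscale score on the 1-to-6 metric reported in the Results might be computed: the code below assumes each of the 9 items is coded 1 (strongly agree) to 6 (strongly disagree), so that disagreeing with a physician-centered statement such as the two examples above scores higher, and that the subscale score is the item mean. The coding direction, the optional `reverse_items` set, and the function name are assumptions for illustration, not the published PPOS scoring rules.

```python
N_ITEMS = 9                   # items in the Sharing subscale
SCALE_MIN, SCALE_MAX = 1, 6   # assumed coding: 1 = strongly agree, 6 = strongly disagree


def sharing_score(responses, reverse_items=frozenset()):
    """Mean item score; higher = more patient-centered under the assumed coding.

    reverse_items: indices of any (hypothetical) items worded in the
    patient-centered direction; these are flipped so agreement scores high.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    scored = []
    for i, r in enumerate(responses):
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"response {r} outside {SCALE_MIN}-{SCALE_MAX}")
        scored.append(SCALE_MIN + SCALE_MAX - r if i in reverse_items else r)
    return sum(scored) / len(scored)


# Hypothetical respondent who mostly disagrees with physician-centered statements.
print(sharing_score([5, 4, 6, 5, 4, 5, 3, 4, 5]))
```

Averaging rather than summing keeps patient and physician scores on the same 1-to-6 scale, which is what makes the direct comparison and the patient-minus-physician difference scores meaningful.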
Patient Sampling and Data Collection
We identified the English-speaking adult patients of the participating physicians who could complete the questionnaires with minimal assistance. Because of the larger study’s interest in patient expectations and requests, selected patients had all indicated that they had a new or worsening problem, or that they were at least “somewhat concerned” about a serious undiagnosed condition. Randomly selected patients were contacted from the physicians’ appointment lists 1 to 2 days in advance of the visit. During the 11-month patient enrollment period, 2606 telephone contacts were made; 677 patients declined to participate, and another 737 were deemed ineligible (69% of these because they had no significant health concern). Of the 1332 eligible consenting patients, 1071 completed screening forms, and 909 completed questionnaires at the scheduled visit.
Eligible patients filled out the Sharing subscale of the PPOS at the end of the screening interview. The instructions for the patients were the same as those for the physicians, and the items and response scale were identical to those filled out by the physicians. Immediately before their office visits, patients filled out the Trust in Physician Scale,26 a 9-item instrument asking patients how much confidence they have in their physicians about specific issues (eg, to always tell the truth, to put your medical needs above all other considerations, including cost). This scale could only be completed by patients who had seen the physician at least once before (n=714).
At the end of the visit, patients provided basic demographic and personal information and evaluated the physician and their visit. Visit satisfaction was measured using the sum of 5 items assessing satisfaction with care received (ie, amount of time the doctor spent with you today, explanation of what was done for you, personal manner of the doctor, technical skill of the doctor, and overall satisfaction; Cronbach α=.88). Using a 5-point Likert scale (from strongly agree to strongly disagree), patients additionally evaluated their physician on 3 items (ie, I would make a special effort to see this doctor in the future; I intend to follow the advice of this doctor; and I would highly recommend this doctor to a friend). These were summed to form an Endorsement of Physician Scale (Cronbach α=.90). All of the evaluative instruments were scored so that a higher score indicated a more positive evaluation.
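The internal-consistency values reported for these summed scales are, on the usual reading, Cronbach’s alpha: α = k/(k − 1) × (1 − Σ item variances / variance of the summed score), where k is the number of items. A minimal Python sketch follows; the function name and the toy responses are hypothetical, and sample variances are used for both the items and the summed scale (the ratio is unaffected as long as the same choice is applied to both).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k item columns, each holding one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items must have the same number of respondents")
    totals = [sum(col[j] for col in items) for j in range(n)]  # summed scale score
    item_var = sum(variance(col) for col in items)             # sample variances
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical responses: 3 endorsement items (rows) x 5 patients (columns).
endorse = [[5, 4, 5, 2, 3], [5, 4, 4, 2, 3], [4, 4, 5, 1, 3]]
print(round(cronbach_alpha(endorse), 2))
```

Values near 1 mean the items move together across respondents, which is what justifies summing them into a single satisfaction or endorsement score.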
Statistical Analysis
We first analyzed the data separately for patients and physicians, using 1-way analysis of variance to determine the relationship between patient-centered beliefs and personal characteristics. We then analyzed the relationship of (1) the patient’s beliefs, (2) the physician’s beliefs, and (3) the difference between the patient’s and physician’s beliefs (“belief congruence”) to patients’ evaluations. Because patients were clustered within physicians, these analyses used multivariate generalized estimating equations (GEE), fitted with the xtgee procedure of Stata 6.0 software (Stata Corporation; College Station, Tex). This procedure accounts for within-physician correlation, so that standard errors are not underestimated, as they tend to be when clustered patients are treated as independent observations. Where the GEE analyses indicated significant relationships, analysis of covariance was conducted to determine whether the results remained significant after controlling for potential confounding variables.
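Why clustering matters can be illustrated with Kish’s design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC the intraclass correlation. This is not a GEE fit (the authors used Stata’s xtgee); it is a swapped-in back-of-the-envelope sketch, assuming roughly equal cluster sizes and a hypothetical ICC of 0.05, of how far a naive standard error can understate the uncertainty when about 20 patients share each physician.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Kish design effect for clusters of roughly equal size."""
    return 1 + (mean_cluster_size - 1) * icc


def clustered_se(naive_se, mean_cluster_size, icc):
    """Inflate an SE computed as if all observations were independent."""
    return naive_se * math.sqrt(design_effect(mean_cluster_size, icc))


m = 909 / 45   # average number of patients per physician in this study
icc = 0.05     # hypothetical intraclass correlation, for illustration only
print(f"average cluster size {m:.1f}, design effect {design_effect(m, icc):.2f}")
print(f"a naive SE of 0.10 becomes {clustered_se(0.10, m, icc):.3f}")
```

Under these assumed values the design effect is about 1.96, so the cluster-aware standard error is roughly 1.4 times the naive one; even a modest intraclass correlation matters when clusters are this large.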
Results
Because physicians’ and patients’ attitudes toward sharing power and information were both measured using the Sharing subscale of the PPOS, we were able to compare the scores of the two groups. Patients’ scores covered the full possible range of the scale, while the range for the 45 physicians was somewhat narrower (2.7 to 5.7). Physicians’ mean scores were significantly higher than those of the patients (4.5 vs 4.2, P <.04), indicating a stronger belief in sharing power and information. The correlation between a patient’s score and his or her physician’s score was extremely small (r = 0.03), and the observed difference scores for each pair (patient score minus physician score) ranged from -3.78 to 3.33.
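The pairing behind these figures can be sketched in a few lines: each patient is matched to his or her own physician, the signed difference (patient minus physician) serves as the congruence measure, and a Pearson correlation summarizes how closely the two scores track. The scores below are hypothetical, and treating the signed difference as the predictor mirrors the description in the text rather than any published code.

```python
import math


def difference_scores(patient_scores, physician_scores):
    """Signed congruence measure: patient score minus own physician's score."""
    if len(patient_scores) != len(physician_scores):
        raise ValueError("each patient needs a matched physician score")
    return [p - d for p, d in zip(patient_scores, physician_scores)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pairs: each physician score repeats for all of that physician's patients.
patients = [4.2, 5.1, 3.0, 4.8, 2.9, 4.4]
physicians = [4.5, 4.5, 4.5, 3.9, 3.9, 3.9]
print(difference_scores(patients, physicians))
print(round(pearson_r(patients, physicians), 3))
```

On a 1-to-6 scale the difference can range from -5 to 5, so the observed span of -3.78 to 3.33 covers much of the possible range.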
Patient Characteristics and Beliefs in Sharing
The patient sample contained somewhat more women than men (56% women) and had a mean age of 57 years. More than three fourths (77%) of the sample had completed at least some college, and 30.2% had at least a bachelor’s degree; median income was in the $40,000 to $60,000 range. The vast majority of the patients were white (81.4%), with a small representation of Latinos (6.6%), African Americans (5.4%), Asian/Pacific Islanders (3.1%), and Native Americans/Alaskans (1.9%). Of the patients, 40% were being seen in internal medicine, 37% by family physicians, and 23% in cardiology.
Women were significantly more patient-centered in their beliefs, as were patients who were younger, more educated, and had a higher income (Table 1). The scores of patients aged 18 to 39 years, 40 to 49 years, and 50 to 59 years were homogeneous, and as a whole they were significantly more patient-centered than those of patients aged 60 to 69 years and those 70 years and older (post hoc tests, Student-Newman-Keuls statistic). Similarly, those who had completed high school or less were less patient-centered than those with some college, who in turn differed from those with at least a bachelor’s degree. Income differences were found between patients reporting $20,000 or less, those in the $40,000 to $80,000 range, and those reporting $80,000 or more. Overall, white patients were more patient-centered than nonwhite patients. Although the numbers of Latinos, African Americans, Asians, and Native Americans were too small for meaningful statistical comparisons, the scores of African Americans were almost identical to those of the white patients, while the scores of Latinos and Asian/Pacific Islanders were somewhat lower and close to one another. The cardiology patients were less patient-centered than those seen in internal medicine or family practice; however, this difference may be explained by the fact that the cardiology patients were significantly older than the other 2 patient groups (mean age = 64.2 years vs 56.2 years for internal medicine and 53.4 years for family practice; F=35.80, df=2, P <.001).
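The F statistic reported for the age comparison comes from a one-way analysis of variance of the kind described under Statistical Analysis. A minimal pure-Python sketch of that computation follows; the function name and the toy ages are hypothetical and are not study data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages for three small clinic groups (not study data).
ages = {
    "cardiology": [66, 61, 70, 59],
    "internal medicine": [55, 58, 52, 60],
    "family practice": [50, 56, 49, 58],
}
f_stat, df_b, df_w = one_way_anova(list(ages.values()))
print(f"F={f_stat:.2f}, df=({df_b}, {df_w})")
```

With 3 groups the between-group degrees of freedom are 2, as in the reported result; a P value would then come from the upper tail of the F distribution, for example with `scipy.stats.f.sf(F, df_between, df_within)`.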
Physician Characteristics and Beliefs in Sharing
Seventy percent of the study physicians were men, and 71% were white (the largest group of nonwhites were Asian/Pacific Islanders [n=6]). Their mean age was 43.9 years, with a median of 13 years since graduation from medical school. They had been affiliated with their current system for a mean of 8.3 years. Ninety-six percent of the physicians were board certified in their primary specialty, and they spent a mean of 39.1 hours in patient care weekly.
Although the physician sample was limited in size, we also explored the relationship of physicians’ PPOS scores to their demographic and personal characteristics. In contrast to the patients, male and female physicians had very similar scores (4.52 and 4.45, respectively), and no significant differences were found according to ethnicity, specialty, time spent in patient care, or workplace satisfaction. Beliefs about power and information sharing also did not differ with experience, whether physicians were split at the median by age or by time since graduation from medical school.
Attitudes Toward Sharing and Patient Evaluations
Patients’ evaluations of their physicians and their visits were measured in 3 different ways: trust (pre-visit), and visit satisfaction and endorsement of physician (both post-visit). Although these measures were themselves highly intercorrelated (r between 0.45 and 0.48), separate GEE analyses were performed for each. For each measure, the analysis was run 3 times, each with a different single predictor: first the patients’ PPOS scores, then the physicians’ PPOS scores, and then the difference between the patients’ and physicians’ scores (patient minus physician). As indicated in Table 2, visit satisfaction was not significantly related to any of the predictors. However, patient-centered patients and patients whose attitudes were discrepant from their physicians’ were both significantly less trusting and less likely to endorse their physicians. Physicians who were patient-centered were marginally more likely to be trusted (P=.09).
Because patients’ PPOS scores were related to several other variables, the GEE analyses in which patient beliefs or belief congruence significantly predicted (P <.05) a patient evaluation were rerun, this time controlling for sex, age, education, income, and ethnicity. Adjustment did not weaken any of these relationships. Patient PPOS score and degree of congruence were each stronger independent predictors of trust and endorsement than any of the potentially confounding variables.
Discussion
The results of our study provide us with information about where patient-centered beliefs reside. Among patients, a belief that power and information should be shared appears to be a cultural phenomenon; younger age, female sex, white ethnicity, higher income, and more education were all closely associated with a desire for sharing. Yet we found a somewhat unexpected pattern for age: There was little difference among the 3 youngest categories (18-39, 40-49, and 50-59 years), and then relatively sharp drops among those in their 60s and older. These findings suggest that if there is a generation gap in patients’ beliefs about empowerment, it exists not so much between younger and middle-aged patients as it does between those older than 60 years and those younger than 60 years.
Although the comparable physician data have to be interpreted with extreme caution because of the small sample size, we found that physicians are apparently less affected by those same societal factors that shape patients’ attitudes about sharing of power and information. Consistent with previous administrations of the scale to other physician samples,24 male and female physicians did not differ in their patient-centered beliefs, nor did we find significant relationships between patient-centeredness and physician experience. Contrary to the stereotype that older physicians take a more authoritarian orientation toward patient relationships, the data suggest that patients seeking a physician who values information and power sharing are likely to be disappointed if they merely use physician age as a proxy for patient-centeredness.
Perhaps the most significant finding of this study was that the degree to which patients and physicians held similar orientations was a strong predictor of 2 of the 3 patient evaluation measures. Patients whose beliefs were congruent with their physicians’ beliefs trusted them more, as measured before the target visit. After the visit, they were also more likely to recommend their physicians to others, follow their advice, and make a special effort to see them again (the 3 components of the endorsement index).
Limitations
Generalizations from our data are limited not only by the small sample of physicians but also by the fact that the patients and physicians all came from managed care systems in one region of the country. Another limiting factor may be that the visits studied represented a targeted subsample of patients who had an ongoing or worsening problem that concerned them. The most surprising finding was that physicians who held patient-centered beliefs about power and information sharing were rated no more positively on measures of satisfaction, trust, and endorsement. One possible explanation may lie in the study sample, all of whom had a significant problem or concern. Previous research15,16 has indicated that physicians act differently toward patients who are more ill or more emotionally distressed, showing greater signs of conflict or tension. It is therefore possible that the patient-centered physicians did not translate their power-sharing beliefs as directly into action when treating these patients. A second possible explanation is that patients with strong health concerns may actually want their physicians to adopt a more authoritarian style and take greater control during the visit.
Conclusions
In the light of the demonstrated relationship of congruence to trust and endorsement, it is striking that visit satisfaction did not reflect the same strength of relationship with congruence, even though the outcome measures were themselves highly correlated. We suggest that this pattern reflects the manner in which belief congruence operates within the physician-patient relationship. That is, even when patient and physician have a shared sense of how much control makes them both feel comfortable, this may not be reflected in the success of any single encounter. Attempts to meet a patient’s expectation do not always result in visit satisfaction.27 Yet when physicians and patients begin with similar world views about medical practice or when they negotiate a meeting of the minds in the course of their relationship, it is likely that this is reflected in patients’ global positive sentiments, the kind that are indicated by endorsement and trust.
Acknowledgments
This research was funded by a grant from the Robert Wood Johnson Foundation (#034384). The authors gratefully acknowledge the assistance of the 45 participating physicians and their patients. Thanks also go to Sara Lu Vorhes, Steven Kelly-Reif, and David Omerod for assistance with physician recruitment and data collection; to Christine Harlan for budgetary management; and to the staff of the Patient-Provider Relationship Initiative (Bernard Lo, Director) for technical assistance. No conflict of interest.
1. Epstein RM. The science of patient-centered care. J Fam Pract 2000;49:805-07.
2. Laine C, Davidoff F. Patient-centered medicine: a professional evolution. JAMA 1996;275:152-56.
3. Putnam SM, Lipkin M Jr. The patient-centered interview: research support. In: Lipkin M Jr, Putnam SM, eds. The medical interview. New York, NY: Springer; 1995:530-37.
4. Stewart M, Brown JB, Weston WW, McWhinney IR, McWilliam CL, Freeman TR. Patient-centered medicine: transforming the clinical method. Thousand Oaks, Calif: Sage; 1995.
5. Byrne PS, Long BEL. Doctors talking to patients. London, England: Her Majesty’s Stationery Office; 1976.
6. Stewart M, Brown JB, Donner A, et al. The impact of patient-centered care on patient outcomes. J Fam Pract 2000;49:796-804.
7. Rao JK, Weinberger M, Kroenke K. Visit-specific expectations and patient-centered outcomes: a literature review. Arch Fam Med 2000;9:1148-55.
8. Levinson W, Roter DL, Mullooly JP, Dull VT, Frankel RM. Physician-patient communication: the relationship with malpractice claims among primary care physicians. JAMA 1997;277:553-59.
9. Maguire P, Fairbairn S, Fletcher C. Consultation skills of young doctors: 1. Benefits of feedback training in interviewing as students persist. BMJ 1986;292:1573-76.
10. Garrity TF. Medical compliance and the clinician-patient relationship. Soc Sci Med 1981;15:215-22.
11. Carrese JA, Rhodes LA. Bridging cultural differences in medical practice: the case of discussing negative information with Navajo patients. J Gen Intern Med 2000;15:92-96.
12. Doescher MP, Saver BG, Franks P, Fiscella K. Racial and ethnic disparities in perceptions of physician style and trust. Arch Fam Med 2000;9:1156-63.
13. Johnson TM, Hardt EJ, Kleinman A. Cultural factors in the medical interview. In: Lipkin M Jr, Putnam SM, Lazare A, eds. The medical interview. New York, NY: Springer-Verlag; 1995.
14. Adelman RD, Greene MG, Charon R. Issues in physician-elderly patient interaction. Ageing Soc 1991;11:127-48.
15. Hall JA, Roter DL, Milburn MA, Daltroy LH. Patients’ health as a predictor of physician and patient behavior in medical visits: a synthesis of four studies. Med Care 1996;34:1205-18.
16. Bertakis KD, Callahan EJ, Helms LJ, Azari R, Robbins JA. The effect of patient health status on physician practice style. Fam Med 1993;25:530-35.
17. Jamison JR. Patient-practitioner perceptions: can chiropractors assume congruence? J Manipulative Physiol Ther 2000;23:409-13.
18. Temple W, Toews J, Fidler H, Lockyer JM, Taenzer P, Parboosingh J. Concordance in communication between surgeon and patient. Can J Surg 1998;41:439-45.
19. Laine C, Davidoff F, Lewis CE, Nelson EC, Nelson E, Kessler RC, Delbanco TL, et al. Important elements of outpatient care: a comparison of patients’ and physicians’ opinions. Ann Intern Med 1996;125:640-45.
20. Goldberg R, Guadagnoli E, Silliman RA, Glickmans A. Cancer patients’ concerns: congruence between patients and primary care physicians. J Cancer Educ 1990;5:193-99.
21. Starfield B, Wray C, Hess K, Gross R, Birk PS, D’Lugoff BC. The influence of patient-practitioner agreement on outcome of care. Am J Public Health 1981;71:127-31.
22. Kravitz RL, Linn LS, Shapiro MF. Physician satisfaction under the Ontario Health Insurance Plan. Med Care 1990;28:502-12.
23. Krupat E, Yeager CM, Putnam S. Patient role orientations, doctor-patient fit and visit satisfaction. Psychol Health 2000;15:707-19.
24. Krupat E, Rosenkranz SL, Yeager CM, Barnard K, Putnam SM, Inui TS. The practice orientations of physicians and patients: the effects of doctor-patient congruence on satisfaction. Patient Educ Couns 2000;39:49-59.
25. Krupat E, Hiam CM, Fleming MZ, Freeman P. Patient-centeredness and its correlates among first year medical students. Int J Psychiatry Med 1999;29:347-56.
26. Thom DH, Campbell B. Patient-physician trust: an exploratory study. J Fam Pract 1997;44:169-76.
27. Gotler RS, Flocke SA, Goodwin MA, Zyzanski SJ, Murray TH, Stange KC. Facilitating participatory decision-making: what happens in real-world community practice? Med Care 2000;38:1200-09.
Study Design: Physicians provided demographic information and completed a scale assessing their beliefs about sharing information and power with their patients. A sample of their patients filled out the same scale and made evaluations of their physicians before and after a target visit.
Population: Physicians and patients in a large multispecialty group practice and a model health maintenance organization were included. Forty-five physicians in internal medicine, family practice, and cardiology participated, as well as 909 of their patients who had a significant concern.
Outcomes Measured: We measured trust in the physician pre-visit, and visit satisfaction and physician endorsement immediately post-visit.
Results: Among patients, patient-centered beliefs (a preference for information and control) were associated with being female, white, younger, more educated, and having a higher income; among physicians these beliefs were unrelated to sex, ethnicity, or experience. The patients of patient-centered physicians were no more trusting or endorsing of their physicians, and they were not more satisfied with the target visit. However, patients whose beliefs were congruent with their physicians’ beliefs were more likely to trust and endorse them, even though they were not more satisfied with the target visit.
Conclusions: The extent of congruence between physicians’ and patients’ beliefs plays an important role in determining how patients evaluate their physicians, although satisfaction with a specific visit and overall trust may be determined differently.
A patient-centered approach to care has been widely advocated.1-5 Although there are several dimensions to patient-centeredness, one key element involves patient participation and the sharing of power and information between the patient and physician. Physicians who take a patient-centered approach are more likely to treat patients as partners and to help them make informed choices among several options. This approach has been associated with a range of positive outcomes, such as heightened patient satisfaction, better adherence, and improved health outcomes.6-10
Yet in spite of the general effectiveness of patient-centeredness, it is reasonable to ask whether a one-size-fits-all approach to patient care is the best one. Some patients, such as the elderly or those of certain ethnic backgrounds, may desire a physician whose style is more structured and who provides more guidance.11-14 Patients who are sick or have serious health concerns may also want their physicians to provide more direction.15,16 Therefore, while accepting the overall value of patient-centeredness, some physicians and researchers have proposed that the degree of “fit” between patients and physicians (the extent to which the physician holds attitudes and beliefs that are congruent with those of the patient17-21) has an independent effect on patients’ reactions to their health care providers.
Our study involves the measurement of patient-centeredness among both physicians and patients, in particular beliefs about the sharing of power and information. We asked what personal characteristics were associated with patient-centered beliefs among physicians and patients, and investigated the extent to which patients felt positively about clinicians who hold matching opinions about power and information sharing.
Methods
The data we report come from the Physician Patient Communication Project, a large observational study conducted in the Sacramento, California, metropolitan area. Patients in the study were surveyed before and immediately after a target outpatient visit. The physicians provided data before and immediately after the same visits.
Physician Sampling and Data Collection
All physicians and patients in the study were affiliated with one of the 2 major health care systems in the region, the University of California, Davis, Medical Group (UCDMG) or Kaiser Permanente (KP). All physicians were involved in direct patient care at least 20 hours per week in family medicine, internal medicine, or cardiology. Forty-five physicians took part in the study (22 from UCDMG, 23 from KP). Eighteen practiced general internal medicine; 16 were family physicians; and 11 were cardiologists. The UCDMG and KP physicians did not differ significantly with regard to age or sex.
All participating physicians completed the Clinician Background Questionnaire, which contained basic demographic questions, a 15-item work satisfaction scale,22 and the 9-item Sharing subscale of the Patient-Practitioner Orientation Scale (PPOS).23-25 The PPOS, which has been shown to have good reliability and validity, measures the beliefs of patients and physicians along a dimension that ranges from patient-centered to physician-centered. The Sharing subscale assesses beliefs about sharing information (eg, “It is often best for patients if they do not have a full explanation of their medical condition.”) and about power and control (eg, “The doctor is the one who should decide what gets talked about during a visit.”) on a 6-point Likert scale from strongly agree to strongly disagree. A higher score indicates a more patient-centered orientation (ie, greater approval of sharing power and information).
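To make the direction of scoring concrete, the Python below is a minimal sketch of how a Sharing subscale score might be computed for one respondent. It assumes, without confirmation from the published scoring key, that each of the 9 items is coded 1 (strongly agree) through 6 (strongly disagree) and that the subscale score is the mean of those codes; because the example items are worded in a physician-centered direction, disagreement then yields a higher, more patient-centered score. The reported physician range of 2.7 to 5.7 is consistent with a mean on a 1 to 6 scale, but any items keyed in the opposite direction would need reverse-coding. The name `sharing_score` is hypothetical.

```python
# Hypothetical scoring sketch for the 9-item PPOS Sharing subscale.
# ASSUMPTION: items are coded 1 = strongly agree ... 6 = strongly disagree,
# so disagreeing with physician-centered statements raises the score,
# and the subscale score is the mean of the 9 codes (not the published key).
N_ITEMS = 9
MIN_CODE, MAX_CODE = 1, 6


def sharing_score(responses: list[int]) -> float:
    """Mean item code for one respondent; higher = more patient-centered."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for code in responses:
        if not MIN_CODE <= code <= MAX_CODE:
            raise ValueError(f"code {code} outside {MIN_CODE}-{MAX_CODE}")
    return sum(responses) / N_ITEMS


# A respondent who mostly disagrees with physician-centered statements
print(round(sharing_score([5, 6, 4, 5, 3, 6, 4, 5, 5]), 2))  # → 4.78
```

Using the same function for patients and physicians mirrors the study design, in which both groups answered identical items on an identical response scale, so their scores can be compared directly.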
Patient Sampling and Data Collection
We identified English-speaking adult patients of the participating physicians who could complete the questionnaires with minimal assistance. Because the larger study focused on patient expectations and requests, all selected patients had indicated that they had a new or worsening problem, or that they were at least “somewhat concerned” about a serious undiagnosed condition. Randomly selected patients were contacted from the physicians’ appointment lists 1 to 2 days before the visit. During the 11-month patient enrollment period, 2606 telephone contacts were made; 677 patients declined to participate, and another 737 were deemed ineligible (69% of these because they had no significant health concern). Of the 1332 eligible consenting patients, 1071 completed screening forms, and 909 completed questionnaires at the scheduled visit.
Eligible patients completed the Sharing subscale of the PPOS at the end of the screening interview. The instructions, items, and response scale were identical to those given to the physicians. Immediately before their office visits, patients completed the Trust in Physician Scale,26 a 9-item instrument asking patients how much confidence they have in their physicians regarding specific issues (eg, to always tell the truth, to put your medical needs above all other considerations, including cost). Only patients who had seen the physician at least once before could complete this scale (n=714).
At the end of the visit, patients provided basic demographic and personal information and evaluated the physician and the visit. Visit satisfaction was measured as the sum of 5 items assessing satisfaction with care received (ie, amount of time the doctor spent with you today, explanation of what was done for you, personal manner of the doctor, technical skill of the doctor, and overall satisfaction; α = .88). Using a 5-point Likert scale (from strongly agree to strongly disagree), patients also evaluated their physician on 3 items (ie, I would make a special effort to see this doctor in the future; I intend to follow the advice of this doctor; and I would highly recommend this doctor to a friend). These were summed to form an Endorsement of Physician Scale (α = .90). All of the evaluative instruments were scored so that a higher score indicated a more positive evaluation.
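The α values reported for the summed satisfaction (5 items) and endorsement (3 items) scales are presumably Cronbach's coefficient alpha, the usual internal-consistency statistic for summed Likert scales: α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The Python below is a minimal sketch of that textbook formula using sample variances; the function name and the toy responses are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample (n - 1) variances throughout.
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a rectangular matrix with at least 2 items")
    items = list(zip(*rows))                       # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of summed scale
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 3-item, endorsement-style responses on 5-point codes
toy = [
    [5, 5, 4],
    [4, 4, 4],
    [2, 3, 2],
    [5, 4, 5],
    [3, 3, 3],
]
print(round(cronbach_alpha(toy), 2))  # → 0.93
```

Alpha rises when items move together across respondents, which is why it is used to justify adding items into a single score such as the Endorsement of Physician Scale.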
Statistical Analysis
We first analyzed the data separately for patients and physicians, using 1-way analysis of variance to determine the relationship between patient-centered beliefs and personal characteristics. We then examined the relationship to patients’ evaluations of (1) the patient’s beliefs, (2) the physician’s beliefs, and (3) the difference between the patient’s and physician’s beliefs (“belief congruence”). Because patients were clustered within physicians, these analyses used multivariate generalized estimating equations (GEE), fitted with the xtgee procedure in Stata 6.0 (Stata Corporation, College Station, Tex). This procedure accounts for within-physician correlation, thereby ensuring that standard errors are not underestimated. Where the GEE analyses indicated significant relationships, analysis of covariance was conducted to determine whether the results remained significant after controlling for potential confounding variables.
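To illustrate what accounting for patients clustered within physicians involves, the Python below is a minimal sketch of a single-predictor linear model with a cluster-robust ("sandwich") standard error. It is not the authors' Stata xtgee model, whose working correlation structure is not reported here; it assumes an independence working correlation, under which a Gaussian, identity-link GEE gives ordinary least-squares point estimates and its robust variance equals this cluster-robust (CR0) variance. The function name, variable names, and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ols_cluster_robust(x: list[float], y: list[float], groups: list[str]):
    """Fit y = b0 + b1*x; return (b0, b1, cluster-robust SE of b1).

    With an independence working correlation, a linear (Gaussian, identity
    link) GEE gives these same point estimates, and its robust "sandwich"
    variance is this cluster-robust variance (CR0, no small-sample factor).
    """
    n = len(x)
    if not (n == len(y) == len(groups)):
        raise ValueError("x, y and groups must have equal length")
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x must vary")
    # "Bread": inverse of X'X for the design columns [1, x]
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = bread[0][0] * sy + bread[0][1] * sxy
    b1 = bread[1][0] * sy + bread[1][1] * sxy
    # Score X_g' e_g summed within each cluster (here: each physician)
    score = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, groups):
        resid = yi - b0 - b1 * xi
        score[g][0] += resid
        score[g][1] += xi * resid
    # "Meat": sum over clusters of the outer product of cluster scores
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for u in score.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += u[i] * u[j]
    # Sandwich V = bread @ meat @ bread; keep only the slope element V[1][1]
    left = [[sum(bread[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    var_b1 = sum(left[1][k] * bread[k][1] for k in range(2))
    return b0, b1, sqrt(max(var_b1, 0.0))  # guard tiny negative rounding


# Hypothetical toy data: 3 physicians (A, B, C) with 3 patients each;
# x = patient minus physician Sharing score, y = trust score.
diff = [-1.0, 0.0, 0.5, 1.5, 2.0, 1.0, -0.5, 0.0, -2.0]
trust = [40, 42, 41, 35, 33, 37, 44, 43, 39]
doc = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
b0, b1, se = ols_cluster_robust(diff, trust, doc)
print(f"slope = {b1:.2f}, cluster-robust SE = {se:.2f}")
```

The residuals are pooled within each physician before being squared, so patients who share a physician are not treated as independent pieces of evidence. An exchangeable working correlation, another common xtgee choice, would also reweight the point estimates and is not shown.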
Results
Because physicians’ and patients’ attitudes toward sharing power and information were both measured using the Sharing subscale of the PPOS, we were able to compare the scores of the 2 groups. Patients’ scores covered the full possible range of the scale, while the range for the 45 physicians was somewhat narrower (2.7 to 5.7). Physicians’ mean score was significantly higher than that of the patients (4.5 vs 4.2, P < .04), indicating a stronger belief in sharing power and information. The correlation between each patient’s score and his or her physician’s score was extremely small (r = 0.03), and the difference scores for each pair (patient score minus physician score) ranged from -3.78 to 3.33.
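The congruence measure and the pairwise correlation described above can be sketched in a few lines. The Python below is a hedged illustration, not the authors' code: it assumes one score per patient and, for each patient, the score of that patient's own physician (so a physician's score repeats across his or her patients). It then computes the signed difference (patient minus physician), the range of those differences, and a Pearson correlation. All names and numbers are hypothetical.

```python
from math import sqrt


def pearson_r(a: list[float], b: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def congruence_summary(patient: list[float], own_physician: list[float]):
    """Signed difference scores (patient minus physician), their range, and r.

    own_physician[i] is the score of the physician seen by patient i, so a
    physician's score is repeated once for each of his or her patients.
    """
    if len(patient) != len(own_physician):
        raise ValueError("need one physician score per patient")
    diffs = [p - d for p, d in zip(patient, own_physician)]
    return diffs, (min(diffs), max(diffs)), pearson_r(patient, own_physician)


# Hypothetical pairs: 5 patients seen by 3 physicians (scores 4.5, 5.0, 3.9)
patients = [4.2, 5.1, 3.0, 4.8, 2.4]
physicians = [4.5, 4.5, 5.0, 3.9, 5.0]
diffs, (low, high), r = congruence_summary(patients, physicians)
print(f"difference range {low:.2f} to {high:.2f}, r = {r:.2f}")
```

Keeping the difference signed, rather than taking its absolute value, follows the "patient minus physician" convention stated in the text: a negative value means the patient is less patient-centered than his or her physician.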
Patient Characteristics and Beliefs in Sharing
The patient sample contained somewhat more women than men (56% women) and had a mean age of 57 years. More than three fourths (77%) of the sample had completed at least some college, and 30.2% had at least a bachelor’s degree; median income was in the $40,000 to $60,000 range. The vast majority of the patients were white (81.4%), with a small representation of Latinos (6.6%), African Americans (5.4%), Asian/Pacific Islanders (3.1%), and Native Americans/Alaskans (1.9%). Of the patients, 40% were being seen in internal medicine, 37% by family physicians, and 23% in cardiology.
Women were significantly more patient-centered in their beliefs, as were patients who were younger, more educated, and had a higher income (Table 1). The scores of patients aged 18 to 39 years, 40 to 49 years, and 50 to 59 years were homogeneous, and as a whole they were significantly more patient-centered than those of patients aged 60 to 69 years and those 70 years and older (Student-Newman-Keuls post hoc tests). Similarly, those who had completed high school or less were less patient-centered than those with some college, who in turn differed from those with at least a bachelor’s degree. Income differences were noted among those who reported $20,000 or less, those in the $40,000 to $80,000 range, and those reporting $80,000 or more. Overall, white patients were more patient-centered than nonwhites; although the numbers of Latinos, African Americans, Asians, and Native Americans were too small for meaningful statistical comparisons, the scores of African Americans were almost identical to those of the white patients, while the scores of Latinos and Asian/Pacific Islanders were somewhat lower and closer to one another. The cardiology patients were less patient-centered than those seen in internal medicine or family practice; however, these differences may be explained by the fact that the cardiology patients were significantly older than the other 2 patient groups (mean age = 64.2 years vs 56.2 years for internal medicine and 53.4 years for family practice; F = 35.80, df = 2, P < .001).
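The F statistic reported for the age comparison across the 3 specialties comes from a 1-way analysis of variance. As a minimal sketch of the textbook computation (not the authors' software), the Python below returns F together with its between-group and within-group degrees of freedom; the reported "df = 2" corresponds to the between-group value for 3 groups. The P value (which would come from the F distribution) and the Student-Newman-Keuls post hoc step are not shown, and the ages are hypothetical, not study data.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need >= 2 non-empty groups and more cases than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of cases around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages for three small specialty groups (not study data)
im = [52, 58, 61, 49, 60]
fp = [45, 55, 50, 57, 48]
cardio = [66, 70, 59, 64, 68]
f_stat, df_b, df_w = one_way_anova([im, fp, cardio])
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w})")
```

A large F means the group means differ by more than the variation within groups would lead one to expect, which is the sense in which the cardiology patients were "significantly older."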
Physician Characteristics and Beliefs in Sharing
Seventy percent of the study physicians were men, and 71% were white (the largest nonwhite group was Asian/Pacific Islanders [n=6]). Their mean age was 43.9 years, with a median of 13 years since graduation from medical school. They had been affiliated with their current system for a mean of 8.3 years. Ninety-six percent of the physicians were board certified in their primary specialty, and they spent a mean of 39.1 hours in patient care weekly.
Although the physician sample was limited in size, we also explored the relationship of physicians’ PPOS scores to their demographic and personal characteristics. In contrast to the patients, male and female physicians had very similar scores (4.52 and 4.45, respectively), and no significant differences were found according to ethnicity, specialty, time spent in patient care, or work satisfaction. Nor did beliefs about power and information sharing differ according to experience, whether physicians were split at the median by age or by time since graduation from medical school.
Attitudes Toward Sharing and Patient Evaluations
Patients’ evaluations of their physicians and their visits were measured in 3 different ways: trust (pre-visit), and visit satisfaction and endorsement of physician (both post-visit). Although these measures were themselves highly intercorrelated (r = 0.45 to 0.48), separate GEE analyses were performed for each. For each measure, the analysis was run 3 times, each with a different single predictor: first the patients’ PPOS scores, then the physicians’ PPOS scores, and then the difference between the patients’ and physicians’ scores (patient minus physician). As indicated in Table 2, visit satisfaction was not significantly related to any of the predictors. However, patient-centered patients and those whose attitudes were discrepant from their physicians’ were both significantly less trusting of their physicians and less likely to endorse them. Patient-centered physicians were marginally more likely to be trusted (P = .09).
Because patients’ PPOS scores were related to several other variables, the GEE analyses that showed significant (P < .05) associations of beliefs or belief congruence with patient evaluations were rerun, this time controlling for sex, age, education, income, and ethnicity. These adjustments did not weaken any of the relationships. Patient PPOS scores and degree of congruence were each stronger independent predictors of trust and endorsement than any of the potentially confounding variables.
Discussion
The results of our study provide us with information about where patient-centered beliefs reside. Among patients, a belief that power and information should be shared appears to be a cultural phenomenon; younger age, female sex, white ethnicity, higher income, and more education were all closely associated with a desire for sharing. Yet we found a somewhat unexpected pattern for age: There was little difference among the 3 youngest categories (18-39, 40-49, and 50-59 years), and then relatively sharp drops among those in their 60s and older. These findings suggest that if there is a generation gap in patients’ beliefs about empowerment, it exists not so much between younger and middle-aged patients as it does between those older than 60 years and those younger than 60 years.
Although the comparable physician data have to be interpreted with extreme caution because of the small sample size, we found that physicians are apparently less affected by those same societal factors that shape patients’ attitudes about sharing of power and information. Consistent with previous administrations of the scale to other physician samples,24 male and female physicians did not differ in their patient-centered beliefs, nor did we find significant relationships between patient-centeredness and physician experience. Contrary to the stereotype that older physicians take a more authoritarian orientation toward patient relationships, the data suggest that patients seeking a physician who values information and power sharing are likely to be disappointed if they merely use physician age as a proxy for patient-centeredness.
Perhaps the most significant finding of this study was that the degree to which patients and physicians held similar orientations was a strong predictor of 2 of the 3 patient evaluation measures. Patients whose beliefs were congruent with their physicians’ beliefs trusted them more, as measured before the target visit. After the visit, they were also more likely to recommend their physicians to others, follow their advice, and make a special effort to see them (the 3 components of the endorsement index).
Limitations
Generalizations from our data are limited not only by the small sample of physicians but also by the fact that the patients and physicians all came from managed care systems in one region of the country. Another limiting factor may be that the visits studied represented a targeted subsample of patients who had an ongoing or worsening problem that concerned them. With these limitations in mind, the most surprising finding was that physicians who held patient-centered beliefs about power and information sharing were rated no more positively on measures of satisfaction, trust, and endorsement. One possible explanation may lie in the study sample, all of whom had a significant problem or concern. Previous research15,16 has indicated that physicians act differently toward patients who are more ill or more emotionally distressed, showing greater signs of conflict or tension. It is therefore possible that the power-sharing beliefs of the patient-centered physicians were not translated as directly into action in the course of treating these patients. A second possible explanation is that patients with strong health concerns may actually want their physicians to adopt a more authoritarian style and to take greater control during the visit.
Conclusions
In light of the demonstrated relationship of congruence to trust and endorsement, it is striking that visit satisfaction was not similarly related to congruence, even though the outcome measures were themselves highly correlated. We suggest that this pattern reflects the manner in which belief congruence operates within the physician-patient relationship. That is, even when patient and physician share a sense of how much control makes them both comfortable, this may not be reflected in the success of any single encounter. Attempts to meet a patient’s expectations do not always result in visit satisfaction.27 Yet when physicians and patients begin with similar world views about medical practice, or when they negotiate a meeting of the minds in the course of their relationship, this is likely to be reflected in patients’ global positive sentiments, the kind indicated by endorsement and trust.
Acknowledgments
This research was funded by a grant from the Robert Wood Johnson Foundation (#034384). The authors gratefully acknowledge the assistance of the 45 participating physicians and their patients. Thanks also go to Sara Lu Vorhes, Steven Kelly-Reif, and David Omerod for assistance with physician recruitment and data collection; to Christine Harlan for budgetary management; and to the staff of the Patient-Provider Relationship Initiative (Bernard Lo, Director) for technical assistance. No conflict of interest.
Study Design: Physicians provided demographic information and completed a scale assessing their beliefs about sharing information and power with their patients. A sample of their patients filled out the same scale and made evaluations of their physicians before and after a target visit.
Population: Physicians and patients in a large multispecialty group practice and a model health maintenance organization were included. Forty-five physicians in internal medicine, family practice, and cardiology participated, as well as 909 of their patients who had a significant concern.
Outcomes Measured: We measured trust in the physician pre-visit, and visit satisfaction and physician endorsement immediately post-visit.
Results: Among patients, patient-centered beliefs (a preference for information and control) were associated with being female, white, younger, more educated, and having a higher income; among physicians these beliefs were unrelated to sex, ethnicity, or experience. The patients of patient-centered physicians were no more trusting or endorsing of their physicians, and they were not more satisfied with the target visit. However, patients whose beliefs were congruent with their physicians’ beliefs were more likely to trust and endorse them, even though they were not more satisfied with the target visit.
Conclusions: The extent of congruence between physicians’ and patients’ beliefs plays an important role in determining how patients evaluate their physicians, although satisfaction with a specific visit and overall trust may be determined differently.
A patient-centered approach to care has been widely advocated1-5 Although there are several dimensions to patient-centeredness, one key element involves patient participation and the sharing of power and information between the patient and physician. Physicians who take a patient-centered orientation approach are more likely to treat them as partners, and assist them in making informed choices among several options. This approach has been associated with a range of positive outcomes, such as heightened patient satisfaction, better adherence, and improved health outcomes.6-10
Yet in spite of the general effectiveness of patient-centeredness, it is reasonable to ask whether a one-size-fits-all approach to patient care is the best one. Some patients—such as the elderly, or patients of certain ethnic backgrounds, for example—may desire a physician whose style is more structured and who provides more guidance.11-14 Patients who are sick or have serious health concerns may also want their physicians to provide more direction.15,16 Therefore, while accepting the overall value of patient-centeredness, some physicians and researchers have advocated that the degree of “fit” between patients and physicians, (the extent to which the physician holds attitudes and beliefs that are congruent with those of the patient17,21 ) should have an independent effect upon patients’ reactions to their health care providers.
Our study involves the measurement of patient-centeredness among both physicians and patients, in particular beliefs about the sharing of power and information. We asked what personal characteristics were associated with patient-centered beliefs among physicians and patients, and investigated the extent to which patients felt positively about clinicians who hold matching opinions about power and information sharing.
Methods
The data we report come from the Physician Patient Communication Project, a large observational study conducted in the Sacramento, California, metropolitan area. Patients in the study were surveyed before and immediately after a target outpatient visit. The physicians provided data before and immediately after the same visits.
Physician Sampling and Data Collection
All physicians and patients in the study were affiliated with one of the 2 major health care systems in the region, the University of California, Davis, Medical Group (UCDMG) or Kaiser Permanente (KP). All physicians were involved in direct patient care at least 20 hours per week in family medicine, internal medicine, or cardiology. Forty-five physicians took part in the study (22 from UCDMG, 23 from KP). Eighteen practiced general internal medicine; 16 were family physicians; and 11 were cardiologists. The UCDMG and KP physicians did not differ significantly with regard to age or sex.
All participating physicians filled out the Clinician Background Questionnaire, which contained basic demographic questions, a 15-item work satisfaction scale,22 and the 9-item Sharing subscale of the Patient-Practitioner Orientation Scale (PPOS).23-25 The PPOS, which has been shown to have good reliability and validity, measures the beliefs of patients and physicians along a dimension that ranges from patient-centered to physician-centered. The sharing subscale of the PPOS assesses beliefs toward sharing information (eg, “It is often best for patients if they do not have a full explanation of their medical condition.”) and power and control (eg, “The doctor is the one who should decide what gets talked about during a visit.”) on a 6-point Likert scale from strongly agree to strongly disagree. A higher score indicates an orientation that is more patient-centered (ie, more approving of sharing power and information sharing).
Patient Sampling and Data Collection
We identified the English-speaking adult patients of the participating physicians who could complete the questionnaires with minimal assistance. Because of the larger study’s interest in patient expectations and requests, selected patients had all indicated that they had a new or worsening problem, or that they were at least “somewhat concerned” about a serious undiagnosed condition. Contact with randomly selected patients was made through the physicians’ appointment lists 1 to -2 days in advance of the visit. During the 11- month patient enrollment period, 2606 telephone contacts were made; 677 patients declined to participate, and another 737 were deemed ineligible (69% of these because they had no significant health concern). Of the 1332 eligible consenting patients, 1071 completed screening forms, and 909 completed questionnaires at the scheduled visit.
Eligible patients filled out the Sharing subscale of the PPOS at the end of the screening interview. The instructions for the patients were the same as those for the physicians, and the items and response scale were identical to those filled out by the physicians. Immediately before their office visits, patients filled out the Trust in Physician Scale26 a 9-item instrument asking patients how much confidence they have in their physicians about specific issues (eg, to always tell the truth, to put your medical needs above all other considerations, including cost). This scale could only be completed by patients who had seen the physician at least once before (n=714).
At the end of the visit, patients provided basic demographic and personal information and evaluated the physician and their visit. Visit evaluation was measured using the sum of 5 items assessing satisfaction with care received (ie, amount of time the doctor spent with you today, explanation of what was done for you, personal manner of the doctor, technical skill of the doctor, and overall satisfaction; a=.88). Using a 5-point Likert scale (from strongly agree to strongly disagree), patients additionally evaluated their physician on 3 items (ie, I would make a special effort to see this doctor in the future; I intend to follow the advice of this doctor; and I would highly recommend this doctor to a friend). These were summed to form an Endorsement of Physician Scale a=0.90). All of the evaluative instruments were scored so that a higher score indicated a more positive evaluation.
Statistical Analysis
We first analyzed the data separately for patients and physicians using 1-way analysis of variance to determine the relationship between patient-centered beliefs and personal characteristics. Then analyses were conducted to determine the relationship of (1) the patient’s beliefs, (2) the physician’s beliefs, and (3) the difference between patient’s and physician’s beliefs (“belief congruence”) to patients’ evaluations. Because patients were clustered within-physicians, these analyses were conducted using multivariate generalized estimating equations (GEE) analysis using the Stata 6.0 software (Stata Corporation; College Station, Tex) xtgee procedure. This procedure accounts for within-physician correlation, thereby ensuring that standard errors are not overestimated. In those cases where the GEE analyses indicated significant relationships, analysis of covariance was conducted to determine whether the results remained significant after controlling for potential confounding variables.
Results
Since physicians’ and patients’ attitudes toward sharing power and information were both measured using the Sharing subscale of the PPOS, we were able to compare the scores of each group. Patients’ scores covered the full possible range of the scale, while the range for the 45 physicians was somewhat more constricted (from 2.7 to 5.7). Physicians’ mean scores were significantly higher than those of the patients (4.5 vs 4.2, P <.04), indicating a stronger belief in sharing power and information. The correlation between the score of a given patient with his or her physician scores was extremely small (r = 0.03), and the observed difference scores for each pair (patient score minus physician score) ranged from 3.33 to -3.78.
Patient Characteristics and Beliefs in Sharing
The patient sample contained somewhat more women than men (56% women) and had a mean age of 57 years. More than three fourths (77%) of the sample had completed at least some college, and 30.2% had at least a bachelor’s degree; median income was in the $40,000 to $60,000 range. The vast majority of the patients were white (81.4%), with a small representation of Latinos (6.6%), African Americans (5.4%), Asian/Pacific Islanders (3.1%), and Native Americans/Alaskans (1.9%). Of the patients, 40% were being seen in internal medicine, 37% by family physicians, and 23% in cardiology.
Women were significantly more patient-centered in their beliefs, as were patients who were younger, more educated, and had a higher income Table 1. The scores of patients aged 18 to 39 years, 40 to 49 years, and 50 to 59 years were homogeneous, and as a whole they were significantly more patient-centered than those of patients between ages 60 and 69, and those 70 years and older (using post hoc-tests, the Student-Newman-Keuls statistic). Similarly, those who had completed high school or less were less patient centered than those with some college, who differed from those with at least a bachelor’s degree. Income differences were noted between those who reported $20,000 or less versus compared with those in the $40,000 to $80,000 range compared with those of $80,000 or more. Overall, white patients were more patient-centered than nonwhites; and although the numbers of Latinos, African Americans, Asians, and Native Americans were too small for meaningful statistical comparisons, the scores of African Americans were almost identical to those of the white patients, while Latinos’ and Asian/Pacific Islanders’ scores were somewhat lower and closer to one another. The cardiology patients were less patient centered than those who were being seen in internal medicine or family practice; however, these differences may be explained by the fact that the cardiology patients were significantly older than the other 2 patient groups (mean age = 64.2 years vs 56.2 years for internal medicine and 53.4 years for family practice; F=35.80, df=2, P <.001).
Physician Characteristics and Beliefs in Sharing
Seventy percent of the study physicians were men, and 71% were white (the largest group of nonwhites were Asian/Pacific Islanders [n=6]). Their mean age was 43.9 years, with a median of 13 years since graduation from medical school. They had been affiliated with their current system for a mean of 8.3 years. Ninety-six percent of the physicians were board certified in their primary specialty, and they spent a mean of 39.1 hours in patient care weekly.
Although the physician sample was limited in size, we also explored the relationship of physicians' PPOS scores to their demographic and personal characteristics. In contrast to the patients, the scores of men and women were very similar (4.52 and 4.45, respectively), and no significant differences were found according to ethnicity, specialty, time spent in patient care, or workplace satisfaction. Also, beliefs in power and information sharing did not differ by experience, whether physicians were split at the median by age or by time since graduation from medical school.
Attitudes Toward Sharing and Patient Evaluations
Patients' evaluations of their physicians and their visits were measured in 3 ways: trust (previsit), and visit satisfaction and endorsement of the physician (both postvisit). Although these measures were highly intercorrelated (correlations between 0.45 and 0.48), separate GEE analyses were performed for each. For each measure, the analysis was run 3 times, each time with a different predictor: first the patients' PPOS scores alone, then the physicians' PPOS scores, and then the difference between the patients' and physicians' scores (patient minus physician). As indicated in Table 2, visit satisfaction was not significantly related to any of the predictors. However, patient-centered patients and patients whose attitudes were discrepant from their physicians' were both significantly less trusting and less likely to endorse their physicians. Physicians who were patient-centered were marginally more likely to be trusted (P=.09).
Because patients' PPOS scores were related to several other variables, the GEE analyses that showed significant (P <.05) associations between the predictors (beliefs and belief congruence) and the patient evaluation outcomes were run a second time, controlling for sex, age, education, income, and ethnicity. Controlling for these variables did not weaken any of the relationships. Patient PPOS scores and degree of congruence were each stronger independent predictors of trust and endorsement than any of the potentially confounding variables.
Discussion
The results of our study provide us with information about where patient-centered beliefs reside. Among patients, a belief that power and information should be shared appears to be a cultural phenomenon; younger age, female sex, white ethnicity, higher income, and more education were all closely associated with a desire for sharing. Yet we found a somewhat unexpected pattern for age: There was little difference among the 3 youngest categories (18-39, 40-49, and 50-59 years), and then relatively sharp drops among those in their 60s and older. These findings suggest that if there is a generation gap in patients’ beliefs about empowerment, it exists not so much between younger and middle-aged patients as it does between those older than 60 years and those younger than 60 years.
Although the comparable physician data have to be interpreted with extreme caution because of the small sample size, we found that physicians are apparently less affected by those same societal factors that shape patients’ attitudes about sharing of power and information. Consistent with previous administrations of the scale to other physician samples,24 male and female physicians did not differ in their patient-centered beliefs, nor did we find significant relationships between patient-centeredness and physician experience. Contrary to the stereotype that older physicians take a more authoritarian orientation toward patient relationships, the data suggest that patients seeking a physician who values information and power sharing are likely to be disappointed if they merely use physician age as a proxy for patient-centeredness.
Perhaps the most significant finding of this study was that the degree to which patients and physicians held similar orientations was a strong predictor of 2 of the 3 patient evaluation measures. Patients whose beliefs were congruent with their physicians' beliefs trusted them more, as measured before the target visit. After the visit, they were also more likely to recommend their physicians to others, follow their advice, and make a special effort to see them (the 3 components of the endorsement index).
Limitations
Generalizations from our data are limited not only by the small sample size of physicians but also by the fact that the patients and physicians all came from managed care systems in one region of the country. Another limiting factor may be that the visits studied represented a targeted subsample of patients who had an ongoing or worsening problem that concerned them. Nonetheless, the most surprising finding was that physicians who held patient-centered beliefs about power and information sharing were rated no more positively on measures of satisfaction, trust, and endorsement. One possible explanation may have to do with the study sample of patients, all of whom had a significant problem or concern. Previous research15,16 has indicated that physicians act differently toward patients who are more ill or more emotionally distressed, showing greater signs of conflict or tension. It is therefore possible that the power-sharing beliefs of the patient-centered physicians were not translated as directly into action in the course of treating these patients. A second possible explanation is that patients who have strong health concerns may actually want their physicians to revert to a more authoritarian style and to take greater control during the visit.
Conclusions
In light of the demonstrated relationship of congruence to trust and endorsement, it is striking that visit satisfaction was not similarly related to congruence, even though the outcome measures were themselves highly correlated. We suggest that this pattern reflects the way belief congruence operates within the physician-patient relationship. That is, even when patient and physician share a sense of how much control makes them both comfortable, this may not be reflected in the success of any single encounter. Attempts to meet a patient's expectations do not always result in visit satisfaction.27 Yet when physicians and patients begin with similar world views about medical practice, or when they negotiate a meeting of the minds over the course of their relationship, this is likely to be reflected in patients' global positive sentiments, the kind indicated by endorsement and trust.
Acknowledgments
This research was funded by a grant from the Robert Wood Johnson Foundation (#034384). The authors gratefully acknowledge the assistance of the 45 participating physicians and their patients. Thanks also go to Sara Lu Vorhes, Steven Kelly-Reif, and David Omerod for assistance with physician recruitment and data collection; to Christine Harlan for budgetary management; and to the staff of the Patient-Provider Relationship Initiative (Bernard Lo, Director) for technical assistance. No conflict of interest.
1. Epstein RM. The science of patient-centered care. J Fam Pract 2000;49:805-07.
2. Laine C, Davidoff F. Patient-centered medicine. A professional evolution. JAMA 1996;275:152-56.
3. Putnam SM, Lipkin M Jr. The patient-centered interview: research support. In: Lipkin M Jr, Putnam SM, eds. The medical interview. New York, NY: Springer; 1995:530-37.
4. Stewart M, Brown JB, Weston WW, McWhinney IR, McWilliam CL, Freeman TR. Patient-centered medicine: transforming the clinical method. Thousand Oaks, Calif: Sage; 1995.
5. Byrne PS, Long BEL. Doctors talking to patients. London, England: Her Majesty's Stationery Office; 1976.
6. Stewart M, Brown JB, Donner A, et al. The impact of patient-centered care on patient outcomes. J Fam Pract 2000;49:796-804.
7. Rao JK, Weinberger M, Kroenke K. Visit-specific expectations and patient-centered outcomes: a literature review. Arch Fam Med 2000;9:1148-55.
8. Levinson W, Roter DL, Mullooly JP, Dull VT, Frankel RM. Physician-patient communication: the relationship with malpractice claims among primary care physicians. JAMA 1997;277:553-59.
9. Maguire P, Fairbairn S, Fletcher C. Consultation skills of young doctors: 1. Benefits of feedback training in interviewing as students persist. BMJ 1986;292:1573-76.
10. Garrity TF. Medical compliance and the clinician-patient relationship. Soc Sci Med 1981;15:215-22.
11. Carrese JA, Roberts LA. Bridging cultural differences in medical practice. The case of discussing negative information with Navajo patients. J Gen Intern Med 2000;15:92-96.
12. Doescher MP, Saver BG, Franks P, Fiscella K. Racial and ethnic disparities in perceptions of physician style and trust. Arch Fam Med 2000;9:1156-63.
13. Johnson TM, Hardt EJ, Kleinman A. Cultural factors in the medical interview. In: Lipkin M Jr, Putnam SM, Lazare A, eds. The medical interview. New York, NY: Springer-Verlag; 1995.
14. Adelman RD, Greene MG, Charon R. Issues in physician-elderly patient interaction. Aging and Society 1991;11:127-48.
15. Hall JA, Roter DL, Milburn MA, Daltroy LH. Patients' health as a predictor of physician and patient behavior in medical visits. A synthesis of four studies. Med Care 1996;34:1205-18.
16. Bertakis KD, Callahan EJ, Helms LJ, Azari R, Robbins JA. The effect of patient health status on physician practice style. Fam Med 1993;25:530-35.
17. Jamison JR. Patient-practitioner perceptions: can chiropractors assume congruence? J Manip Physio Therapeutics 2000;23:409-13.
18. Temple W, Toews J, Fidler H, Lockyer JM, Taenzer P, Parboosingh J. Concordance in communication between surgeon and patient. Canadian J Surg 1998;41:439-45.
19. Laine C, Davidoff F, Lewis CE, Nelson EC, Nelson E, Kessler RC, Delbanco TL, et al. Important elements of outpatient care: a comparison of patients’ and physicians’ opinions. Ann Intern Med 1996;125:640-45.
20. Goldberg R, Guadagnoli E, Silliman RA, Glickmans A. Cancer patients’ concerns: congruence between patients and primary care physicians. J Cancer Educ 1990;5:193-99.
21. Starfield B, Wray C, Hess K, Gross R, Birk PS, D’Lugoff BC. The influence of patient-practitioner agreement on outcome of care. Am J Pub Health 1981;71:127-31.
22. Kravitz RL, Linn LS, Shapiro MF. Physician satisfaction under the Ontario Health Insurance Plan. Med Care 1990;28:502-12.
23. Krupat E, Yeager CM, Putnam S. Patient role orientations, doctor-patient fit and visit satisfaction. Psych and Health 2000;15:707-19.
24. Krupat E, Rosenkranz SL, Yeager CM, Barnard K, Putnam SM, Inuii TM. The practice orientations of physicians and patients: the effects of doctor-patient congruence on satisfaction. Patient Ed Counseling 2000;39:49-59.
25. Krupat E, Hiam CM, Fleming MZ, Freeman P. Patient-centeredness and its correlates among first year medical students. Int J Psychiatry in Med 1999;29:347-56.
26. Thom DH, Campbell B. Patient-physician trust: an exploratory study. J Fam Pract 1997;44:169-76.
27. Gotler RS, Flocke SA, Goodwin MA, Zyzanski SJ, Murray TH, Stange KC. Facilitating participatory decision-making: what happens in real-world community practice? Med Care 2000;38:1200-09.
Diagnosing Influenza: The Value of Clinical Clues and Laboratory Tests
STUDY DESIGN: Data were collected during 3 influenza outbreaks over 2 consecutive influenza seasons. The information collected included date of onset, symptoms, vaccine status, WBC and differential counts, ZstatFlu test (ZymeTx, Oklahoma City, Okla), and influenza culture. Using culture positivity as the criterion for influenza diagnosis, we compared cases with noncases on each variable independently and by logistic regression.
POPULATION: We included consecutive patients presenting to a family practice office with fever, cough, sore throat, myalgia, and/or headache during flu season.
OUTCOMES MEASURED: The outcomes were sensitivity, specificity, and other measures of test accuracy.
RESULTS: Culture-positive cases could not be reliably distinguished from those that were culture negative using symptoms or vaccination status. Both WBC count and ZstatFlu results discriminated fairly well, and their combination did somewhat better. Differential counts were not helpful. WBC counts above 8000 were associated with a low probability of influenza. The sensitivity and specificity of the ZstatFlu were 65% and 83%, respectively.
CONCLUSIONS: Our data suggest that symptoms and vaccine status do not reliably identify patients with influenza. Use of WBC counts and the ZstatFlu test can be helpful. The sequence, combination, and criteria for use of these tests depend on tradeoffs between undertreatment of influenza cases and the overtreatment of noninfluenza cases, and the cost and benefit projections for individual patients.
Influenza affects between 20% and 30% of the United States population annually. Three fourths of infected individuals develop an acute respiratory illness, and one third of these seek medical attention. In a typical year, between 20,000 and 40,000 Americans die, and more than 100,000 are hospitalized because of complications related to influenza.1 Immunizations usually provide between 70% and 90% protection, yet many people at high risk do not receive the vaccine. An effective method for quickly differentiating patients with influenza from those with other respiratory illnesses might be helpful, since 2 new medications (the zanamivir inhaler and oseltamivir tablets) that are active against both influenza A and B are now available. Two other medications, amantadine and rimantadine, are active against influenza A only.2,3 In the past, physicians have relied on symptoms alone or in conjunction with a manual or automated white blood cell (WBC) count with a differential count to assist them in making the diagnosis of influenza. However, several researchers have found that symptoms and signs have relatively poor predictive value during influenza outbreaks.4-6 A recent pooled analysis of 3744 subjects involved in clinical trials of the antiviral agent zanamivir found that patients with influenza were somewhat more likely to have fever (68% vs 40%), cough (93% vs 80%), and nasal congestion (91% vs 81%) than patients with other infections.7
ZstatFlu is an office test for the diagnosis of both influenza A and B. It detects neuraminidase, an enzyme found on the influenza virus. In the presence of the virus, a chromogen is cleaved off a synthetic neuraminidase substrate, and a positive test produces a blue color change. The ZstatFlu test is 1 of 4 approved tests on the market for rapid diagnosis of influenza; 3 of the 4 detect both influenza A and B.8 To determine the most effective and efficient approach to the office diagnosis of influenza, we collected and analyzed clinical and laboratory data from patients seen in a family practice office setting during 2 consecutive influenza seasons.
Methods
Patients
We included consecutive patients with fever, cough, sore throat, myalgia, and/or headache who presented to a suburban private family practice clinic in Oklahoma, staffed by 4 physicians and 1 physician assistant, between January and March 1999 and between November 1999 and January 2000. No patients were systematically excluded. Consenting patients received a WBC and differential count, a rapid flu test (ZstatFlu), and an influenza culture by oropharyngeal swab. Some patients consented to some but not all of these tests. Cultures were unavailable during a portion of each study period. Viral serologies were not done.
Procedure
Patients were triaged by the office nurse, who asked when they had become ill and whether they had experienced any of the following signs or symptoms: fever higher than 101°F (38.5 °C), cough, sore throat, headache, or myalgia. They were also asked if they had received that year’s flu vaccine. A WBC and differential count was performed by a laboratory technician in the clinic using a Cell-Dyn 1700 machine (Abbott Laboratories; Abbott Park, Ill). The ZstatFlu test was performed by the same laboratory technician using the method described by the manufacturer. An oropharyngeal swab for influenza culture was also obtained. The swabs were placed in viral culture medium and refrigerated. They were picked up once daily and transported to one of 2 laboratories (at ZymeTx or the Oklahoma State Department of Health) and plated that day. All participating patients gave informed consent. The protocol was approved by the Research Consultants Review Committee, Austin, Texas, and by the Institutional Review Committee at the University of Oklahoma Health Sciences Center.
Data Analysis
Only patients who had an influenza culture could be included in the analysis. Three separate influenza epidemics occurred during the 2 years of data collection. These outbreaks were first analyzed separately to evaluate consistency of results across epidemics and then as a combined data set for determination of overall test characteristics.
The following variables were considered: clinician, patient age, sex, duration of symptoms, delay in presentation, vaccine status, cough, fever, myalgias, sore throat, headache, WBC count, differential WBC count, ZstatFlu result, and culture result. An additional variable, "flu symptoms," was defined as the combination of fever, cough, and myalgia. Delay in presentation was further categorized as 2 days or less or more than 2 days, since treatment is most effective when begun within 2 days of the onset of illness. A left-shifted WBC count was defined arbitrarily as a polymorphonuclear leukocyte proportion greater than 60%, and a right-shifted WBC count was defined arbitrarily as a lymphocyte proportion greater than 40%.
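The derived variables above are simple rules applied to each visit. A minimal Python sketch follows; the `Visit` record and its field names are hypothetical, and only the thresholds (2 days, 60% polymorphonuclear leukocytes, 40% lymphocytes) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical record layout; field names are illustrative only.
    fever: bool
    cough: bool
    myalgia: bool
    days_onset_to_visit: float  # delay in presentation, in days
    pmn_pct: float              # polymorphonuclear leukocytes, % of WBC
    lymph_pct: float            # lymphocytes, % of WBC


def derived_variables(v: Visit) -> dict:
    """Apply the study's stated definitions to one visit."""
    return {
        # "Flu symptoms": fever, cough, AND myalgia all present.
        "flu_symptoms": v.fever and v.cough and v.myalgia,
        # Delay in presentation dichotomized at 2 days (treatment window).
        "early_presentation": v.days_onset_to_visit <= 2,
        # Arbitrary cut-offs: strictly greater than 60% / 40%.
        "left_shift": v.pmn_pct > 60,
        "right_shift": v.lymph_pct > 40,
    }


print(derived_variables(Visit(True, True, False, 1.5, 72.0, 20.0)))
```

Keeping the definitions in one function makes it easy to check that a boundary value (exactly 60% neutrophils, for example) is classified the way the protocol intends.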
Within each epidemic group, patients with positive cultures were compared with those who had negative cultures. Since the 2 influenza epidemics (A and B) during the first year occurred simultaneously, patients with negative cultures during that time were used for comparisons in both groups for these initial analyses. Comparisons were made for age and duration of symptoms using the Student’s t test for independent samples. All other comparisons were made using the chi-square statistic.
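For the categorical comparisons, each variable reduces to a 2 x 2 table of feature (present or absent) versus culture result. The sketch below computes the Pearson chi-square statistic and its 1-degree-of-freedom P value with the standard library only. It assumes no continuity correction, since the text does not say whether one was applied, and the counts in the usage line are made up.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table

                  culture+  culture-
        feature+     a         b
        feature-     c         d

    Returns (statistic, P value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df the statistic is Z**2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts, not study data: fever vs culture result.
print(chi_square_2x2(10, 20, 30, 40))
```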
We combined all data. For these combined analyses, the influenza-negative patients from year 1 were counted only once. Receiver operating characteristic (ROC) curves were constructed for the ZstatFlu test, WBC count, and WBC count combined with the ZstatFlu test. Rockit 0.9B software (University of Chicago; Chicago, Ill) was used to determine the area under the curve (AUC) and confidence intervals for the WBC count and the WBC count combined with the ZstatFlu test by maximum likelihood estimation of the ROC parameters.9 Individual cut-points for WBC counts were compared as binary tests by calculating the AUC for each.10 To determine the AUC for the ZstatFlu test we used the nonparametric Wilcoxon statistic.11 The logistic regression modeling function of Statistix7 software was used to analyze the individual and combined predictive properties of WBC count and ZstatFlu. Positive and negative likelihood ratios were calculated using standard formulas12; they correspond to the degree to which a positive test result rules in disease and a negative test result rules out disease, respectively. These were used to estimate the rates of over- and undertreatment of influenza cases under 2 different baseline assumptions (pretest probabilities of influenza of 25% and 50%). Confidence intervals for sensitivity and specificity were calculated using the normal approximation to the binomial distribution.13
Results
We enrolled 382 patients during the first year (268 had influenza cultures performed) and 225 patients during the second year (90 were cultured). The total analyzable sample of cultured patients was 358 patients. In most cases, those who did not have cultures performed were seen on days when culture medium or laboratory pick-up was not available. Patients who had a culture performed were more likely to have a cough (P=.01) but otherwise did not differ from those who did not have a culture. In year 1, the influenza strains were A/Sydney (H3N2) and B/Beijing. In year 2, the strain was again A/Sydney (H3N2). The youngest patient with a positive flu culture was aged 10 months and the oldest was 73 years of age. The breakdown by age, sex, duration of symptoms, vaccine status, symptoms, WBC/differential, and ZstatFlu results by epidemic is shown in Table 1.
The presentation of influenza during the 3 epidemics differed. For example, the Beijing-like flu B in year 1 was more likely to infect younger people (mean age = 22.2 years) and was unlikely to cause a left WBC shift (25%), while the influenza A strain seen in the second year was more likely to infect older people (mean age = 28.3 years) and to be associated with a left WBC shift (72%). Culture-positive patients were somewhat more likely to report fever during 2 of the 3 outbreaks, but neither any single symptom nor the symptom complex (fever, cough, and myalgias) reliably distinguished flu cases from nonflu cases across all epidemics.
Fifteen percent, 7%, and 17% of patients with positive influenza cultures in the 3 epidemics, respectively, had received the vaccine. Both influenza strains were included in the vaccines given during those 2 years. However, immunization status was not consistently helpful for distinguishing influenza cases from those with other flu-like illnesses. Duration of symptoms was associated with culture result only in the year 2 influenza A epidemic, in which influenza patients presented, on average, a half day earlier.
The WBC count was strongly associated with culture result in all 3 epidemics. As the WBC count increased, the likelihood of a positive culture decreased. A right or left shift in the differential count was not consistently related to the probability of a positive culture. WBC count was positively correlated with duration of symptoms in children (Pearson correlation coefficient = 0.20; P=.04) and negatively associated with symptom duration in adults (Pearson correlation coefficient = -0.15; P=.66). There was also a negative association between left shift and duration of symptoms (P=.001) and a positive association between right shift and duration of symptoms (P=.01) for all patients, suggesting that influenza patients develop a left shift at onset of infection and later convert to a right shift.
ROC curves were constructed using various levels of WBC counts with and without the ZstatFlu test (Figure 1). For WBC count alone, the AUC was 0.67 (95% confidence interval [CI], 0.61-0.74). By comparison, the AUC for the ZstatFlu test was 0.74 (95% CI, 0.68-0.80). The ROC curve describing the use of a combination of ZstatFlu test and the WBC count had an AUC of 0.82 (95% CI, 0.76-0.87); this was better than WBC alone but not significantly different from ZstatFlu alone.
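The ROC construction can be illustrated in two steps: sweep WBC cut-points, calling a count at or below the cut-point "positive" because a low count points toward influenza, and then estimate the area nonparametrically with the Mann-Whitney (Wilcoxon) statistic. This is a sketch with hypothetical function names and invented WBC values. The study itself used Rockit's binormal maximum-likelihood fit for the WBC curves, which this nonparametric estimate does not reproduce.

```python
def roc_points(pos_wbc, neg_wbc, cutoffs):
    """(cutoff, sensitivity, 1 - specificity) when 'WBC <= cutoff' counts as positive."""
    points = []
    for c in sorted(cutoffs):
        sens = sum(w <= c for w in pos_wbc) / len(pos_wbc)
        fpr = sum(w <= c for w in neg_wbc) / len(neg_wbc)
        points.append((c, sens, fpr))
    return points


def auc_mann_whitney(pos_scores, neg_scores):
    """Nonparametric AUC: probability that a random culture-positive patient scores
    higher than a random culture-negative one, with ties counted as 1/2.
    Equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical WBC counts (not study data).
flu_wbc = [3000, 5000, 6000]
no_flu_wbc = [7000, 9000, 5000]
print(roc_points(flu_wbc, no_flu_wbc, [5000, 8000]))
# Negate the WBC counts so that a higher score means "more likely influenza".
print(auc_mann_whitney([-w for w in flu_wbc], [-w for w in no_flu_wbc]))

# For a binary test coded 1/0, the AUC reduces to (sensitivity + specificity) / 2.
zstat_pos = [1] * 65 + [0] * 35   # 65% sensitivity
zstat_neg = [1] * 17 + [0] * 83   # 83% specificity
print(auc_mann_whitney(zstat_pos, zstat_neg))  # 0.74
```

The binary-test identity also shows what "comparing cut-points as binary tests by AUC" means. With the sensitivity (65%) and specificity (83%) given in the abstract, (0.65 + 0.83) / 2 = 0.74, consistent with the reported ZstatFlu AUC.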
WBC counts greater than 7000 (negative likelihood ratio = 0.41) were superior to a negative ZstatFlu test result at confirming the absence of the flu. WBC counts less than 3200 (positive likelihood ratio = 7.21) were superior to a positive ZstatFlu test result at confirming the presence of the flu. A WBC count greater than 6300 had greater sensitivity (67%) than the ZstatFlu test; however, for WBC counts between 6300 and 7000, the gain in sensitivity did not offset the loss in specificity. A WBC count less than 4600 had a greater specificity (84%) than the ZstatFlu test, but for WBC counts between 3200 and 4600 the gain in specificity did not offset the loss in sensitivity.
Table 2 shows the characteristics of WBC counts at several cut-points, of the ZstatFlu test, and of their combinations. A single-test strategy of treating those with a WBC count of 8000 or less would ensure treatment of almost all influenza cases (92%). Using the ZstatFlu test as a single-test strategy would ensure that most of the patients treated have the flu but would miss 44% of patients with the flu. Adding a WBC count if the ZstatFlu test result is negative improves sensitivity but reduces specificity. The positive and negative predictive values in Table 2 are based on a pretest probability of 50% (peak of flu season). At the beginning or end of an epidemic, when the pretest probability is lower, the positive predictive value would be lower and the negative predictive value higher.
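The trade-off described above can be sketched numerically: adding a second test only for ZstatFlu-negative patients, and treating if either test is positive, raises sensitivity at the cost of specificity. The combination formulas below assume the two tests are conditionally independent given true influenza status. The study did not establish this, and WBC count and ZstatFlu may well be correlated, so treat the output as a rough approximation. The function names and the second test's 50%/80% values are hypothetical. The predictive-value function shows how predictive values shift with the assumed pretest probability.

```python
def either_positive(sens1, spec1, sens2, spec2):
    """Serial 'believe the positive' rule: run test 2 only when test 1 is negative,
    and call the patient positive if either test is positive.
    ASSUMES conditional independence of the tests given true status."""
    sens = 1 - (1 - sens1) * (1 - sens2)  # missed only if both tests miss
    spec = spec1 * spec2                  # cleared only if both tests clear
    return sens, spec


def predictive_values(sens, spec, pretest):
    """Positive and negative predictive values at a given pretest probability."""
    ppv = sens * pretest / (sens * pretest + (1 - spec) * (1 - pretest))
    npv = spec * (1 - pretest) / (spec * (1 - pretest) + (1 - sens) * pretest)
    return ppv, npv


# ZstatFlu alone (65% / 83%, from the abstract), then with a hypothetical second test.
print(either_positive(0.65, 0.83, 0.50, 0.80))
for pretest in (0.50, 0.25):
    print(pretest, predictive_values(0.65, 0.83, pretest))
```

At the lower pretest probability the positive predictive value falls and the negative predictive value rises, which is why predictive values computed at the 50% peak-season assumption do not carry over to the edges of an epidemic.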
Discussion
Unfortunately, signs, symptoms, and vaccine status may be of little consistent value in distinguishing patients with influenza from those with other respiratory illnesses during influenza season. During some epidemics, fever and cough may be of some help, but this will depend heavily on what other illnesses are prevalent at the same time. Others have observed that symptoms have low predictive value and that physicians have difficulty identifying flu cases during epidemics.5,6 Monto and colleagues7 reported that fever and cough occurred more frequently among influenza patients involved in clinical trials of an antiviral agent, but these results may not apply directly to primary care settings and represent pooled findings across several epidemics.
Vaccine status was also not helpful in this study for distinguishing influenza culture-positive patients. Influenza vaccination is effective in only 70% to 90% of patients. Therefore, there will always be vaccine-positive patients who develop the flu. Our data do not provide quantifiable information about overall vaccine efficacy, but the number of vaccine-positive patients was small, suggesting that the vaccine may have been effective in the community at large, though not in the culture-positive patients included in this study.
Both the WBC count and the ZstatFlu test can be helpful for identifying influenza cases. The testing strategy of choice depends on a number of factors, including cost, duration and severity of symptoms, comorbidities, and potential adverse effects of treatment. The ZstatFlu test costs approximately $20. The cost of a WBC count is approximately $30, but it may have additional diagnostic value. Treating the patient with either zanamivir or oseltamivir costs $50 to $60, rimantadine $30, and amantadine $6.
The monetary value of an earlier return to work, reduced caregiver burden, or reduced transmission of infection will vary greatly. If the goal is to treat nearly every influenza case, treating those with a WBC count of 8000 or less appears to be the best strategy. If the goal is to be sure that only patients with the flu are treated, treatment should be reserved for those who are ZstatFlu positive. More than half the patients with positive influenza cultures were seen within 2 days of the onset of symptoms; these patients are the ones most likely to benefit from the newer antiviral agents. Each patient and each physician would be expected to have different treatment thresholds that would affect the testing strategy. For example, if the treatment threshold for a particular patient was 50%, no testing would have been necessary in any of the epidemics studied, since the pretest probabilities were all greater than 50%. An analysis that includes patient preferences would be helpful to determine the most cost-effective strategy.
The specificity of the ZstatFlu test is reported to be between 95% and 100%.4 However, when performed in this community family practice office by a laboratory technician trained by the test’s manufacturer, the specificity was only 85%. Many of the false-positive test results were coded as “weakly positive,” suggesting that the end point for positivity was somewhat unclear or that the laboratory technician was influenced by the patient’s symptoms. The specificity improved in the second year, suggesting an improvement in technique. We submit that this is an example of the discrepancy between test characteristics determined under “ideal” circumstances and test characteristics in actual practice settings. Another explanation is that patients with weakly positive ZstatFlu test results actually had influenza that would have been documented had serology been used as the gold standard instead of culture.
Limitations
A weakness of our study is the proportion of patients with flu-like symptoms for whom culture results were not available. The 1998-1999 flu season actually began before January, and the 1999-2000 season extended beyond January, but cultures were not available during part of these periods. Although patients with cough were more likely to be cultured, this potential bias should not have affected our conclusions, since cough was not associated with culture result.
Two other diagnostic concerns are the lack of serologic testing and the known tendency of cultures to be more reliable early in the illness. Serology would probably have identified some additional influenza cases, which would have raised the pretest probabilities of influenza; how it would have affected the other analyses is unclear. Since duration of symptoms was not associated with culture result in 2 of the 3 epidemics or in the combined analysis, we do not believe that waning sensitivity of flu cultures was a significant factor in this population. In the third epidemic, the more likely explanation is that influenza patients felt worse (eg, had more myalgias) and therefore came in earlier, not that cultures became negative in those who delayed seeing a physician. Additionally, the study had insufficient power to detect a statistically significant difference between the diagnostic value of the ZstatFlu test alone and that of the WBC count combined with the ZstatFlu test.
It should be noted that patients were enrolled during influenza outbreaks. In fact, because of its participation in this study, the practice was one of the first in the state to recognize the onset of the epidemic. Our conclusions about diagnostic strategies can be generalized only to similar epidemic situations.
Conclusions
Influenza is associated with considerable morbidity and mortality, especially in high-risk populations, and the newer agents must be started within a brief window (less than 48 hours). Early and accurate diagnosis may therefore be important, at least in some cases. Screening WBC counts or rapid antigen tests could improve patient care during influenza epidemics. A cost-effectiveness analysis is needed to clarify this issue further.
Acknowledgments
We would like to acknowledge the financial support of ZymeTx Corporation for supplying the ZstatFlu reagents, training, and influenza cultures at no cost. The support received from ZymeTx Corporation was unrestricted, and the company had no influence on our decision to analyze the results in the manner that we did or on the contents of this manuscript. We also want to thank Lavonne Glover for her expert assistance and patience in the preparation of the manuscript.
1. Prevention and control of influenza: recommendations of the Advisory Committee on Immunization Practices (ACIP). MMWR 1999;48:1-28.
2. Neuraminidase inhibitors for treatment of influenza A and B infections. MMWR 1999;48:1-9.
3. Jefferson TO, Demicheli V, Deeks JJ, Rivetti D. Amantadine and rimantadine for preventing and treating influenza A in adults. Cochrane Database Syst Rev 2000;12:CD001169.
4. Govaert ME, Dinant GJ, Aretz K, Knottnerus JA. The predictive value of influenza symptomatology in elderly people. J Fam Pract 1998;15:16-27.
5. Carrat F, Tachet A, Housset B, Valleron AJ, Rouzioux C. Influenza and influenza-like illness in general practice: drawing lessons from surveillance from a pilot study in Paris, France. Br J Gen Pract 1997;47:217-20.
6. Long CE, Hall CB, Cunningham CK, et al. Influenza surveillance in community-dwelling elderly compared with children. Arch Fam Med 1997;6:459-65.
7. Monto AS, Gravenstein S, Elliott M, Colopy M, Schweinle J. Clinical signs and symptoms predicting influenza infection. Arch Intern Med 2000;160:3243-47.
8. Rapid diagnostic tests for influenza. Med Lett Drugs Ther 1999;41:121-22.
9. Metz CE, Herman BA, Roe CA. Statistical comparison of two ROC curve estimates obtained from partially paired datasets. Med Decis Making 1998;18:110-21.
10. Cantor SB, Kattan MW. Determining the area under the ROC curve for a binary diagnostic test. Med Decis Making 2000;20:468-70.
11. Hanley JA. Alternative approaches to receiver operating characteristic analyses. Radiology 1988;168:568-70.
12. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. London, England: Churchill Livingstone; 1997.
13. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
STUDY DESIGN: Data were collected during 3 consecutive influenza outbreaks over a 2-year period. The information collected included date of onset, symptoms, vaccine status, WBC and differential counts, ZstatFlu test (ZymeTx, Oklahoma City, Okla), and influenza culture. Using culture positivity as the criterion for influenza diagnosis, we compared cases with noncases on each variable independently and by logistic regression.
POPULATION: We included consecutive patients presenting to a family practice office with fever, cough, sore throat, myalgia, and/or headache during flu season.
OUTCOMES MEASURED: The outcomes were sensitivity, specificity, and other measures of test accuracy.
RESULTS: Culture-positive cases could not be reliably distinguished from those that were culture negative using symptoms or vaccination status. Both WBC count and ZstatFlu results discriminated fairly well, and their combination did somewhat better. Differential counts were not helpful. WBC counts above 8000 were associated with a low probability of influenza. The sensitivity and specificity of the ZstatFlu were 65% and 83%, respectively.
CONCLUSIONS: Our data suggest that symptoms and vaccine status do not reliably identify patients with influenza. Use of WBC counts and the ZstatFlu test can be helpful. The sequence, combination, and criteria for use of these tests depend on tradeoffs between undertreatment of influenza cases and the overtreatment of noninfluenza cases, and the cost and benefit projections for individual patients.
Influenza affects between 20% and 30% of the United States population annually. Three fourths of infected individuals develop an acute respiratory illness, and one third of these seek medical attention. In a typical year, 20,000 to 40,000 Americans die and more than 100,000 are hospitalized because of complications related to influenza.1 Immunization usually provides 70% to 90% protection, yet many people at high risk do not receive the vaccine. Two new medications active against both influenza A and B (the zanamivir inhaler and oseltamivir tablets) are now available, so an effective method for quickly distinguishing patients with influenza from those with other respiratory illnesses could be helpful. Two other medications, amantadine and rimantadine, are active against influenza A only.2,3 In the past, physicians have relied on symptoms alone, or on symptoms combined with a manual or automated white blood cell (WBC) count with differential, to diagnose influenza. However, several researchers have found that symptoms and signs have relatively poor predictive value during influenza outbreaks.4-6 A recent pooled analysis of 3744 subjects in clinical trials of the antiviral agent zanamivir found that patients with influenza were somewhat more likely than patients with other infections to have fever (68% vs 40%), cough (93% vs 80%), and nasal congestion (91% vs 81%).7
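Expressed as likelihood ratios, these differences are small. The sketch below treats each pair of percentages as the probability of the symptom given influenza and given another infection. A ratio near 1 barely changes the odds of influenza. With these figures, fever gives a positive likelihood ratio of about 1.7. Cough and nasal congestion give only about 1.16 and 1.12.

```python
def likelihood_ratios(p_given_flu, p_given_other):
    # LR+: factor by which a present symptom multiplies the odds of influenza
    # LR-: factor by which an absent symptom multiplies them
    lr_pos = p_given_flu / p_given_other
    lr_neg = (1 - p_given_flu) / (1 - p_given_other)
    return lr_pos, lr_neg


# Symptom frequencies (influenza vs other infection) from the pooled analysis cited above
symptoms = {"fever": (0.68, 0.40), "cough": (0.93, 0.80),
            "nasal congestion": (0.91, 0.81)}
for name, (flu, other) in symptoms.items():
    lr_pos, lr_neg = likelihood_ratios(flu, other)
    print(f"{name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```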
ZstatFlu is an office test for the diagnosis of both influenza A and B. It detects neuraminidase, an enzyme found on the influenza virus. In the presence of the virus, a chromogen is cleaved from a synthetic neuraminidase substrate, and a positive test produces a blue color change. The ZstatFlu test is 1 of 4 approved tests on the market for rapid diagnosis of influenza; 3 of the 4 detect both influenza A and B.8 To determine the most effective and efficient approach to the office diagnosis of influenza, we collected and analyzed clinical and laboratory data from patients seen in a family practice office during 2 consecutive influenza seasons.
Methods
Patients
We included consecutive patients with fever, cough, sore throat, myalgia, and/or headache who presented to a suburban private family practice clinic in Oklahoma, staffed by 4 physicians and 1 physician assistant, between January and March 1999 and between November 1999 and January 2000. No patients were systematically excluded. Consenting patients received a WBC and differential count, a rapid flu test (ZstatFlu), and an influenza culture by oropharyngeal swab; some patients consented to only some of these tests. Cultures were unavailable during a portion of each study period. Viral serologies were not done.
Procedure
Patients were triaged by the office nurse, who asked when they had become ill and whether they had experienced any of the following signs or symptoms: fever higher than 101°F (38.3°C), cough, sore throat, headache, or myalgia. They were also asked whether they had received that year’s flu vaccine. WBC and differential counts were performed by a laboratory technician in the clinic using a Cell-Dyn 1700 analyzer (Abbott Laboratories; Abbott Park, Ill). The same technician performed the ZstatFlu test according to the manufacturer’s instructions. An oropharyngeal swab for influenza culture was also obtained. The swabs were placed in viral culture medium and refrigerated. They were picked up once daily and transported to one of 2 laboratories (ZymeTx or the Oklahoma State Department of Health), where they were plated the same day. All participating patients gave informed consent. The protocol was approved by the Research Consultants Review Committee, Austin, Texas, and by the Institutional Review Committee at the University of Oklahoma Health Sciences Center.
Data Analysis
Only patients who had an influenza culture could be included in the analysis. Three separate influenza epidemics occurred during the 2 years of data collection. These outbreaks were first analyzed separately to evaluate consistency of results across epidemics and then as a combined data set for determination of overall test characteristics.
The following variables were considered: clinician, patient age, sex, duration of symptoms, delay in presentation, vaccine status, cough, fever, myalgias, sore throat, headache, WBC count, differential WBC count, ZstatFlu result, and culture result. An additional variable, “flu symptoms,” was defined as the combination of fever, cough, and myalgia. Delay in presentation was also dichotomized as 2 days or less vs more than 2 days, since treatment is most effective when begun within 2 days of the onset of illness. A left-shifted WBC count was defined arbitrarily as a polymorphonuclear leukocyte proportion greater than 60%, and a right-shifted WBC count as a lymphocyte proportion greater than 40%.
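These derived variables amount to simple threshold rules. A minimal sketch follows. The record layout and field names are hypothetical. Only the cut-offs (60%, 40%, 2 days) and the fever-cough-myalgia definition come from the text.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # HYPOTHETICAL record layout; field names are illustrative only
    neutrophil_pct: float     # polymorphonuclear leukocytes, % of WBC
    lymphocyte_pct: float     # lymphocytes, % of WBC
    days_since_onset: float
    fever: bool
    cough: bool
    myalgia: bool


def derived_variables(v: Visit) -> dict:
    return {
        "left_shift": v.neutrophil_pct > 60,                 # PMN proportion > 60%
        "right_shift": v.lymphocyte_pct > 40,                # lymphocytes > 40%
        "presented_within_2_days": v.days_since_onset <= 2,  # delay dichotomized
        "flu_symptoms": v.fever and v.cough and v.myalgia,   # all three present
    }


print(derived_variables(Visit(72, 18, 1.5, True, True, False)))
```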
Within each epidemic group, patients with positive cultures were compared with those who had negative cultures. Since the 2 influenza epidemics (A and B) during the first year occurred simultaneously, patients with negative cultures during that time were used for comparisons in both groups for these initial analyses. Comparisons were made for age and duration of symptoms using the Student’s t test for independent samples. All other comparisons were made using the chi-square statistic.
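For the categorical comparisons, a 2 × 2 Pearson chi-square test can be sketched as follows. The counts in the example are hypothetical. The text does not say whether a continuity correction was applied, and this version omits it. For 1 degree of freedom, the P value can be obtained from the complementary error function, so no statistics library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table

                     culture +   culture -
        feature yes      a           b
        feature no       c           d

    Returns (statistic, P value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# HYPOTHETICAL counts: fever in 60 of 100 culture-positive
# and 40 of 100 culture-negative patients
print(chi_square_2x2(60, 40, 40, 60))
```

With these made-up counts, the statistic is 8.0 and P is about .005.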
We then combined the data from all 3 epidemics; in these combined analyses, influenza-negative patients from year 1 were counted only once. Receiver operating characteristic (ROC) curves were constructed for the ZstatFlu test, the WBC count, and the WBC count combined with the ZstatFlu test. Rockit 0.9B software (University of Chicago; Chicago, Ill) was used to estimate the area under the curve (AUC) and its confidence interval for the WBC count and for the WBC count combined with the ZstatFlu test, by maximum likelihood estimation of the ROC parameters.9 Individual WBC cut-points were compared as binary tests by calculating the AUC for each.10 The AUC for the ZstatFlu test was determined with the nonparametric Wilcoxon statistic.11 The logistic regression function of Statistix 7 software was used to analyze the individual and combined predictive properties of the WBC count and the ZstatFlu test. Positive and negative likelihood ratios were calculated using standard formulas12; they express how strongly a positive result rules in disease and a negative result rules out disease, respectively. These ratios were used to estimate the rates of over- and undertreatment of influenza cases under 2 baseline assumptions (pretest probabilities of influenza of 25% and 50%). Confidence intervals for sensitivity and specificity were calculated using the normal approximation to the binomial distribution.13
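The standard formulas referred to above are short enough to write out. This minimal sketch uses illustrative function names. It computes likelihood ratios from sensitivity and specificity, a Wald interval as the normal approximation to the binomial, and the area under the two-segment ROC curve of a single binary cut-point. With the sensitivity and specificity reported in the abstract (65% and 83%), the ZstatFlu test has a positive likelihood ratio of about 3.8 and a negative likelihood ratio of about 0.42. Its binary AUC works out to 0.74, the value reported for the ZstatFlu test in the Results.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    lr_positive = sensitivity / (1 - specificity)   # how strongly a positive rules in
    lr_negative = (1 - sensitivity) / specificity   # how strongly a negative rules out
    return lr_positive, lr_negative


def wald_interval(successes, total, z=1.96):
    """Normal approximation to the binomial: p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def binary_test_auc(sensitivity, specificity):
    # One cut-point gives a two-segment ROC "curve"; this is its trapezoidal area
    return (sensitivity + specificity) / 2


print([round(x, 2) for x in likelihood_ratios(0.65, 0.83)])  # [3.82, 0.42]
print(round(binary_test_auc(0.65, 0.83), 2))                 # 0.74
print(wald_interval(65, 100))  # HYPOTHETICAL counts: 65 positives among 100 cases
```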
Results
We enrolled 382 patients during the first year (268 had influenza cultures performed) and 225 patients during the second year (90 were cultured), for a total analyzable sample of 358 cultured patients. In most cases, patients who were not cultured were seen on days when culture medium or laboratory pick-up was not available. Patients who had a culture performed were more likely to have a cough (P=.01) but otherwise did not differ from those who did not. In year 1, the influenza strains were A/Sydney (H3N2) and B/Beijing; in year 2, the strain was again A/Sydney (H3N2). The youngest patient with a positive flu culture was aged 10 months and the oldest was 73 years. The breakdown of age, sex, duration of symptoms, vaccine status, symptoms, WBC/differential counts, and ZstatFlu results by epidemic is shown in Table 1.
The presentation of influenza during the 3 epidemics differed. For example, the Beijing-like flu B in year 1 was more likely to infect younger people (mean age = 22.2 years) and was unlikely to cause a left WBC shift (25%), while the influenza A strain seen in the second year was more likely to infect older people (mean age = 28.3 years) and to be associated with a left WBC shift (72%). Culture-positive patients were somewhat more likely to report fever during 2 of the 3 outbreaks, but no single symptom or the symptom complex—fever, cough, and myalgias—reliably distinguished flu cases from nonflu cases across all epidemics.
Fifteen percent, 7%, and 17% of patients with positive influenza cultures in the 3 epidemics, respectively, had received the vaccine, even though both influenza strains were included in the vaccines given during those 2 years. Immunization status was not consistently helpful for distinguishing influenza cases from other flu-like illnesses. Duration of symptoms was associated with culture result only in the year 2 influenza A epidemic, in which influenza patients presented, on average, a half day earlier.
The WBC count was strongly associated with culture result in all 3 epidemics: as the WBC count increased, the likelihood of a positive culture decreased. A right or left shift in the differential count was not consistently related to the probability of a positive culture. WBC count was positively correlated with duration of symptoms in children (Pearson correlation coefficient = 0.20; P=.04) and negatively, but not significantly, correlated with symptom duration in adults (Pearson correlation coefficient = -0.15; P=.66). For all patients, there was also a negative association between left shift and duration of symptoms (P=.001) and a positive association between right shift and duration of symptoms (P=.01). These associations suggest that influenza patients develop a left shift at the onset of infection that later converts to a right shift.
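Pearson's r and the usual t test of r = 0 can be computed as shown below. This is a minimal sketch with made-up data, not the authors' software. A correlation of 0.20 corresponds to r² = 0.04. Symptom duration therefore accounts for only about 4% of the variation in WBC count among children.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic_for_r(r, n):
    # Test of H0: rho = 0; compare with a t distribution on n - 2 degrees of freedom
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up data: symptom duration (days) vs WBC count (thousands)
r = pearson_r([1, 2, 3, 4], [2, 1, 4, 3])
print(round(r, 2))  # 0.6
print(t_statistic_for_r(r, 4))
```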
ROC curves were constructed using various levels of WBC counts with and without the ZstatFlu test (Figure 1). For WBC count alone, the AUC was 0.67 (95% confidence interval [CI], 0.61-0.74). By comparison, the AUC for the ZstatFlu test was 0.74 (95% CI, 0.68-0.80). The ROC curve describing the use of a combination of ZstatFlu test and the WBC count had an AUC of 0.82 (95% CI, 0.76-0.87); this was better than WBC alone but not significantly different from ZstatFlu alone.
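The Methods use the nonparametric Wilcoxon (Mann-Whitney) statistic as the AUC for the ZstatFlu test. This statistic is the probability that a randomly chosen culture-positive patient looks more abnormal on the test than a randomly chosen culture-negative patient, with ties counted as one half. A minimal sketch with made-up values follows. It is not the binormal maximum likelihood fit that Rockit performed for the WBC curves. For a 0/1 test result, the statistic reduces to (sensitivity + specificity) / 2.

```python
def mann_whitney_auc(case_scores, control_scores, higher_is_case=True):
    """Nonparametric AUC: share of case-control pairs ordered correctly, ties = 1/2."""
    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case == control:
                correct += 0.5
            elif (case > control) == higher_is_case:
                correct += 1.0
    return correct / (len(case_scores) * len(control_scores))


# Made-up binary test coded 1 = positive: sensitivity 2/4, specificity 3/4
print(mann_whitney_auc([1, 1, 0, 0], [0, 0, 0, 1]))  # 0.625 = (0.5 + 0.75) / 2

# Made-up WBC counts (thousands); LOWER counts point toward influenza
print(mann_whitney_auc([3.0, 4.5, 6.0], [5.0, 7.5, 9.0], higher_is_case=False))
```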
WBC counts greater than 7000 (negative likelihood ratio = 0.41) were superior to a negative ZstatFlu test result for ruling out influenza. WBC counts less than 3200 (positive likelihood ratio = 7.21) were superior to a positive ZstatFlu test result for ruling it in. A WBC cut-point of 6300 gave greater sensitivity (67%) than the ZstatFlu test; however, for cut-points between 6300 and 7000, the gain in sensitivity did not offset the loss in specificity. A cut-point of 4600 gave greater specificity (84%) than the ZstatFlu test, but for cut-points between 3200 and 4600, the gain in specificity did not offset the loss in sensitivity.
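Likelihood ratios translate into post-test probabilities through odds: post-test odds = pretest odds × likelihood ratio. The sketch below applies the two ratios above at the 25% and 50% pretest probabilities used in the Methods. At 50%, a WBC count above 7000 lowers the probability of influenza to about 29%, and a count below 3200 raises it to about 88%.

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


for pretest in (0.25, 0.50):  # the 2 baseline assumptions in the Methods
    after_high_wbc = post_test_probability(pretest, 0.41)  # WBC > 7000
    after_low_wbc = post_test_probability(pretest, 7.21)   # WBC < 3200
    print(f"pretest {pretest:.0%}: WBC > 7000 -> {after_high_wbc:.0%}, "
          f"WBC < 3200 -> {after_low_wbc:.0%}")
```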
Table 2 shows the characteristics of WBC counts at several cut-points, of the ZstatFlu test, and of their combinations. A single-test strategy of treating everyone with a WBC count of 8000 or less would ensure treatment of almost all influenza cases (92%). A single-test strategy using the ZstatFlu test would ensure that most treated patients have the flu but would miss 44% of patients with the flu. Adding a WBC count when the ZstatFlu result is negative improves sensitivity but reduces specificity. The positive and negative predictive values in Table 2 are based on a pretest probability of 50% (the peak of flu season). Earlier or later in an epidemic, when the pretest probability is lower, positive predictive values would be lower and negative predictive values higher.
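The direction of this tradeoff follows from how a serial “test again if negative” rule combines results. The sketch below is a minimal illustration. It assumes, only for illustration and probably not exactly true, that the two tests are conditionally independent given influenza status. The combined figures in Table 2 were measured directly and need not match this approximation. The WBC values in the example are hypothetical.

```python
def believe_the_positive(sens_first, spec_first, sens_second, spec_second):
    """Run the first test; run the second only if the first is negative.
    Treat if either result is positive.
    ASSUMES the tests are conditionally independent given disease status."""
    # A case is caught by the first test, or missed by it and caught by the second
    sensitivity = sens_first + (1 - sens_first) * sens_second
    # A noncase stays untreated only if both tests are negative
    specificity = spec_first * spec_second
    return sensitivity, specificity


# ZstatFlu (65%, 83%) followed by a HYPOTHETICAL WBC cut-point (70%, 60%)
print(believe_the_positive(0.65, 0.83, 0.70, 0.60))
```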
Discussion
Unfortunately, signs, symptoms, and vaccine status may be of little consistent value in distinguishing patients with influenza from those with other respiratory illnesses during influenza season. During some epidemics, fever and cough may be of some help, but this will depend heavily on what other illnesses are prevalent at the same time. Others have observed that symptoms have low predictive value and that physicians have difficulty identifying flu cases during epidemics.5,6 Monto and colleagues7 reported that fever and cough occurred more frequently among influenza patients involved in clinical trials of an antiviral agent, but these results may not apply directly to primary care settings and represent pooled findings across several epidemics.
Vaccine status was also not helpful in this study for distinguishing influenza culture-positive patients. Influenza vaccination is effective in only 70% to 90% of patients, so some vaccinated patients will always develop the flu. Our data do not provide a quantitative estimate of overall vaccine efficacy. However, the number of vaccinated patients among those with influenza was small, suggesting that the vaccine may have been effective in the community at large even though it did not protect the vaccinated culture-positive patients in this study.
Both the WBC count and the ZstatFlu test can be helpful for identifying influenza cases. The testing strategy of choice depends to some degree on a number of factors including cost, duration and severity of symptoms, comorbidities, and potential adverse effects of treatment. The ZstatFlu test costs approximately $20. The cost of a WBC count is approximately $30, but it may have additional diagnostic value. Treating the patient with either zanamivir or oselfamivir costs $50 to $60, rimantidine $30, and amantadine $6.
The monetary value of an earlier return to work, reduced caregiver burden, or reduced transmission of infection will vary greatly. If the goal is to treat nearly every influenza case, a strategy of treating those with a WBC of 8000 or less appears to be the best strategy. If the goal is to be sure that only patients with the flu are treated, then treatment should be reserved for those who are ZstatFlu positive. Each patient and each physician would be expected to have different treatment thresholds that would affect the testing strategy. More than half the patients with positive influenza cultures were seen within 2 days of the onset of symptoms. These patients are the ones who would be most likely to benefit from the newer antiviral agents. For example, if the treatment threshold for a particular patient was 50%, no testing would have been necessary in any of the epidemics studied, since the pretest probabilities were all greater than 50%. An analysis that includes patient preferences would be helpful to determine the most cost-effective strategy.
The specificity of the ZstatFlu test is reported to be between 95% and 100%.4 However, when performed in this community family practice office by a laboratory technician trained by the test’s manufacturer, the specificity was only 85%. Many of the false-positive test results were coded as “weakly positive,” suggesting that the end point for positivity was somewhat unclear or that the laboratory technician was influenced by the patient’s symptoms. The specificity improved in the second year, suggesting an improvement in technique. We submit that this is an example of the discrepancy between test characteristics determined under “ideal” circumstances and test characteristics in actual practice settings. Another explanation is that patients with weakly positive ZstatFlu test results actually had influenza that would have been documented had serology been used as the gold standard instead of culture.
Limitations
A weakness of our study is the proportion of patients with flu-like symptoms from whom culture results were not available. Flu season actually began earlier than January during the 1998-1999 season and extended beyond January with the 1999-2000 season, but cultures were not available during a portion of these time periods. Although patients with cough were more likely to be cultured, this potential bias should not have affected our conclusions, since cough was not associated with culture result.
Two other diagnostic concerns are the lack of serologic tests and the known tendency of cultures to be more reliable early in the illness. Serology would probably have identified some additional influenza cases. This would have resulted in higher pretest probabilities of influenza. It is unclear how it would have affected the other analyses. Since there was no association between duration of symptoms and culture result in 2 of the 3 epidemics and in the combined analysis, we do not believe that waning sensitivity of flu cultures was a significant factor in this population. In the third epidemic it seems more likely that flu patients felt worse (eg, had more myalgias) and therefore came in earlier than that cultures became negative in those who delayed seeing the physician. Additionally, the study had insufficient power to detect a statistically significant difference between the diagnostic value of the ZstatFlu alone, and the combination of the WBC count and the ZstatFlu test.
It should be noted that patients were enrolled during an outbreak of influenza. In fact, the practice involved was one of the first in the state to recognize the onset of the epidemic because they were involved in this study. The conclusions reached about diagnostic strategies can only be generalized to similar epidemic situations.
Conclusions
Since influenza is associated with considerable morbidity and mortality, especially in high-risk populations, and given the brief window of opportunity (less than 48 hours) to treat patients with the flu with the newer agents, early and accurate diagnosis may be important in at least some cases. The use of screening WBC counts or rapid antigen tests could improve patient care during influenza epidemics. A cost-effectiveness analysis is needed to more fully elucidate this issue.
Acknowledgments
We would like to acknowledge the financial support of ZymeTx Corporation for supplying the ZstatFlu reagents, training, and influenza cultures at no cost. The support received from ZymeTx Corporation was unrestricted, and the company had no influence on our decision to analyze the results in the manner that we did or on the contents of this manuscript. We also want to thank Lavonne Glover for her expert assistance and patience in the preparation of the manuscript.
STUDY DESIGN: Data were collected during 3 consecutive influenza outbreaks over a 2-year period. The information collected included date of onset, symptoms, vaccine status, WBC and differential counts, ZstatFlu test (ZymeTx, Oklahoma City, Ok), and influenza culture. Using culture positivity as the criterion for influenza diagnosis, we compared cases with noncases on each variable independently and by logistic regression.
POPULATION: We included consecutive patients presenting to a family practice office with fever, cough, sore throat, myalgia, and/or headache during flu season.
OUTCOMES MEASURED: The outcomes were sensitivity, specificity, and other measures of test accuracy.
RESULTS: Culture-positive cases could not be reliably distinguished from those that were culture negative using symptoms or vaccination status. Both WBC count and ZstatFlu results discriminated fairly well, and their combination did somewhat better. Differential counts were not helpful. WBC counts above 8000 were associated with a low probability of influenza. The sensitivity and specificity of the ZstatFlu were 65% and 83%, respectively.
CONCLUSIONS: Our data suggest that symptoms and vaccine status do not reliably identify patients with influenza. Use of WBC counts and the ZstatFlu test can be helpful. The sequence, combination, and criteria for use of these tests depend on tradeoffs between undertreatment of influenza cases and the overtreatment of noninfluenza cases, and the cost and benefit projections for individual patients.
Influenza affects between 20% and 30% of the United States population annually. Three fourths of the infected individuals develop an acute respiratory illness, and one third of these seek medical attention. In a typical year, more than 20,000 to 40,000 Americans die, and more than 100,000 are hospitalized because of complications related to influenza.1 Immunizations usually provide between 70% and 90% protection, yet many people at high risk do not receive the vaccine. An effective method for quickly differentiating patients with influenza from those with other respiratory illnesses might be helpful, since 2 new medications (the zanamivir inhaler and oseltamivir tablets) that are active against both influenza A and B are now available. Two other medications, amantadine and rimantidine, are active against influenza A only.2,3 In the past, physicians have relied on symptoms alone or in conjunction with a manual or automated white blood cell (WBC) count with a differential count to assist them in making the diagnosis of influenza. However, several researchers have found that symptoms and signs have relatively poor predictive value during influenza outbreaks.4-6 A recent pooled analysis of 3744 subjects involved in clinical trials of the antiviral agent zanamivir found that patients with influenza were somewhat more likely to have fever (68% vs 40%), cough (93% vs 80%), and nasal congestion (91% vs 81%) than patients with other infections.7
ZstatFlu is an office test for the diagnosis of both influenza A and B. It detects neuraminidase, an enzyme found on the influenza virus. In the presence of the virus, chromagen is cleaved off a synthetic neuraminidase substrate. A positive test results in a blue color change. The ZstatFlu test is 1 of 4 approved tests on the market for rapid diagnosis of influenza. Three of the 4 detect both influenza A and B8 To determine the most effective and efficient approach to the office diagnosis of influenza, we collected and analyzed clinical and laboratory data from patients seen in a family practice office setting during 2 consecutive influenza seasons.
Methods
Patients
We included consecutive patients presenting to a private family practice clinic that had 4 physicians and 1 physician assistant in a suburban setting in Oklahoma between January and March 1999 and November 1999 and January 2000 with fever, cough, sore throat, myalgia, and/or headache. No patients were systematically excluded. Consenting patients received a WBC and differential count, a rapid flu test (ZstatFlu), and an influenza culture by oropharynageal swab. Some patients consented to some but not all of these tests. Cultures were unavailable during a portion of each study period. Viral serologies were not done.
Procedure
Patients were triaged by the office nurse, who asked when they had become ill and whether they had experienced any of the following signs or symptoms: fever higher than 101°F (38.5 °C), cough, sore throat, headache, or myalgia. They were also asked if they had received that year’s flu vaccine. A WBC and differential count was performed by a laboratory technician in the clinic using a Cell-Dyn 1700 machine (Abbott Laboratories; Abbott Park, Ill). The ZstatFlu test was performed by the same laboratory technician using the method described by the manufacturer. An oropharyngeal swab for influenza culture was also obtained. The swabs were placed in viral culture medium and refrigerated. They were picked up once daily and transported to one of 2 laboratories (at ZymeTx or the Oklahoma State Department of Health) and plated that day. All participating patients gave informed consent. The protocol was approved by the Research Consultants Review Committee, Austin, Texas, and by the Institutional Review Committee at the University of Oklahoma Health Sciences Center.
Data Analysis
Only patients who had an influenza culture could be included in the analysis. Three separate influenza epidemics occurred during the 2 years of data collection. These outbreaks were first analyzed separately to evaluate consistency of results across epidemics and then as a combined data set for determination of overall test characteristics.
The following variables were considered: clinician, patient age, sex, duration of symptoms, delay in presentation, vaccine, cough, fever, myalgias, sore throat, headache, WBC count, differential WBC count, ZstatFlu result, and culture result. An additional variable, “flu symptoms,” was defined as the combination of fever, cough, and myalgia. Delay in presentation was further categorized as 2 days or less or more than 2 days, since treatment is most effective when begun within 2 days of the onset of illness. A left-shifted WBC count was defined arbitrarily as a polymorphonuclear leukocyte proportion greater than 60%, and a right-shifted WBC count was defined arbitrarily as a lymphocyte proportion greater than 40%.
Within each epidemic group, patients with positive cultures were compared with those who had negative cultures. Since the 2 influenza epidemics (A and B) during the first year occurred simultaneously, patients with negative cultures during that time were used for comparisons in both groups for these initial analyses. Comparisons were made for age and duration of symptoms using the Student’s t test for independent samples. All other comparisons were made using the chi-square statistic.
We combined all data. For these combined analyses, the influenza-negative patients from year 1 were counted only once. Receiver operating characteristic (ROC) curves were constructed for the ZstatFlu test, WBC count, and WBC count combined with the ZstatFlu test. Rockit 0.9B software (University of Chicago; Chicago, Ill) was used to determine the area under the curve (AUC) and confidence intervals for the WBC count and the WBC count combined with the ZstatFlu test by maximum likelihood estimation of the ROC parameters.9 Individual cut-points for WBC counts were compared as binary tests by calculating the AUC for each.10 To determine the AUC for the ZstatFlu test we used the nonparametric Wilcoxin statistic.11 The logistic regression modeling function of Statistix7 software was used to analyze the individual and combined predictive properties of WBC count and ZstatFlu. Positive and negative likelihood ratios were calculated using standard formulas12; they correspond to the degree that a positive test result rules in disease and a negative test result rules out disease, respectively. These were used to estimate the rates of over- and undertreatment of influenza cases under 2 different baseline assumptions (pretest probabilities of influenza of 25% and 50%). Confidence intervals for sensitivity and specificity were calculated using the normal approximation to the binomial method.13
Results
We enrolled 382 patients during the first year (268 had influenza cultures performed) and 225 patients during the second year (90 were cultured). The total analyzable sample of cultured patients was 358. In most cases, patients who did not have cultures performed were seen on days when culture medium or laboratory pick-up was not available. Patients who had a culture performed were more likely to have a cough (P=.01) but otherwise did not differ from those who did not have a culture. In year 1, the influenza strains were A/Sydney (H3N2) and B/Beijing. In year 2, the strain was again A/Sydney (H3N2). The youngest patient with a positive flu culture was aged 10 months, and the oldest was 73 years of age. The breakdown by age, sex, duration of symptoms, vaccine status, symptoms, WBC/differential, and ZstatFlu results by epidemic is shown in Table 1.
The presentation of influenza during the 3 epidemics differed. For example, the Beijing-like flu B in year 1 was more likely to infect younger people (mean age = 22.2 years) and was unlikely to cause a left WBC shift (25%), while the influenza A strain seen in the second year was more likely to infect older people (mean age = 28.3 years) and to be associated with a left WBC shift (72%). Culture-positive patients were somewhat more likely to report fever during 2 of the 3 outbreaks, but no single symptom or the symptom complex—fever, cough, and myalgias—reliably distinguished flu cases from nonflu cases across all epidemics.
Fifteen percent, 7%, and 17% of patients with positive influenza cultures in the 3 epidemics had received the vaccine. Both influenza strains were included in the vaccines given during those 2 years. However, immunization status was not consistently helpful for distinguishing influenza cases from those with other flu-like illnesses. Duration of symptoms was only associated with culture result in the year 2 flu A epidemic, in which influenza patients, on average, presented a half day earlier.
The WBC count was strongly associated with culture result in all 3 epidemics: as the WBC count increased, the likelihood of a positive culture decreased. A right or left shift in the differential count was not consistently related to the probability of a positive culture. WBC count was positively correlated with duration of symptoms in children (Pearson correlation coefficient = 0.20; P=.04) but showed a weak, nonsignificant negative correlation with symptom duration in adults (Pearson correlation coefficient = -0.15; P=.66). For all patients, there was also a negative association between left shift and duration of symptoms (P=.001) and a positive association between right shift and duration of symptoms (P=.01), suggesting that influenza patients develop a left shift at onset of infection and later convert to a right shift.
ROC curves were constructed using various levels of WBC counts with and without the ZstatFlu test (Figure 1). For WBC count alone, the AUC was 0.67 (95% confidence interval [CI], 0.61-0.74). By comparison, the AUC for the ZstatFlu test was 0.74 (95% CI, 0.68-0.80). The ROC curve describing the use of a combination of the ZstatFlu test and the WBC count had an AUC of 0.82 (95% CI, 0.76-0.87); this was better than WBC count alone but not significantly different from ZstatFlu alone.
WBC counts greater than 7000 (negative likelihood ratio = 0.41) were superior to a negative ZstatFlu test result at confirming the absence of the flu. WBC counts less than 3200 (positive likelihood ratio = 7.21) were superior to a positive ZstatFlu test result at confirming the presence of the flu. A WBC count greater than 6300 had greater sensitivity (67%) than the ZstatFlu test; however, for WBC counts between 6300 and 7000, the gain in sensitivity did not offset the loss in specificity. A WBC count less than 4600 had greater specificity (84%) than the ZstatFlu test, but for WBC counts between 3200 and 4600, the gain in specificity did not offset the loss in sensitivity.
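Likelihood ratios convert a pretest probability into a post-test probability through odds: the post-test odds equal the pretest odds multiplied by the likelihood ratio. The sketch below applies the LR values reported above at the 2 baseline pretest probabilities (25% and 50%) named in the Methods. The function names are illustrative.

```python
from __future__ import annotations


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pretest: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


if __name__ == "__main__":
    for pretest in (0.25, 0.50):  # the 2 baseline assumptions in the Methods
        pos = post_test_probability(pretest, 7.21)  # WBC count < 3200
        neg = post_test_probability(pretest, 0.41)  # WBC count > 7000
        print(f"pretest {pretest:.0%}: WBC<3200 -> {pos:.0%}, WBC>7000 -> {neg:.0%}")
```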
Table 2 shows the characteristics of WBC counts at several cut-points, of the ZstatFlu test, and of their combinations. A single-test strategy of treating patients with a WBC count of 8000 or less would ensure treatment of almost all influenza cases (92%). Using the ZstatFlu test alone would ensure that most of the patients treated have the flu but would miss 44% of patients with the flu. Adding a WBC count when the ZstatFlu test result is negative improves sensitivity but reduces specificity. The positive and negative predictive values in Table 2 are based on a pretest probability of 50% (the peak of flu season). At the beginning or end of an epidemic, when the pretest probability is lower, positive predictive values would be lower and negative predictive values higher.
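The effect of pretest probability on predictive values can be checked directly. The sketch below assumes a sensitivity of about 56% (the 44% of cases missed by ZstatFlu) and the 85% specificity observed in this office. These inputs are illustrative and are not the exact Table 2 values. `predictive_values` is a hypothetical helper.

```python
from __future__ import annotations


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given pretest probability."""
    tp = sens * prevalence               # true positives per patient tested
    fp = (1 - spec) * (1 - prevalence)   # false positives
    fn = (1 - sens) * prevalence         # false negatives
    tn = spec * (1 - prevalence)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Illustrative ZstatFlu-like inputs: sensitivity ~56%, specificity 85%.
    for prev in (0.50, 0.25):
        ppv, npv = predictive_values(0.56, 0.85, prev)
        print(f"pretest {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With these inputs, lowering the pretest probability from 50% to 25% lowers the positive predictive value and raises the negative predictive value.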
Discussion
Unfortunately, signs, symptoms, and vaccine status may be of little consistent value in distinguishing patients with influenza from those with other respiratory illnesses during influenza season. During some epidemics, fever and cough may be of some help, but this will depend heavily on what other illnesses are prevalent at the same time. Others have observed that symptoms have low predictive value and that physicians have difficulty identifying flu cases during epidemics.5,6 Monto and colleagues7 reported that fever and cough occurred more frequently among influenza patients involved in clinical trials of an antiviral agent, but these results may not apply directly to primary care settings and represent pooled findings across several epidemics.
Vaccine status was also not helpful in this study for distinguishing influenza culture-positive patients. Because influenza vaccination is effective in only 70% to 90% of recipients, some vaccinated patients will always develop the flu. Our data do not allow us to quantify overall vaccine efficacy, but the number of vaccinated patients was small. This suggests that the vaccine may have been effective in the community at large, though not in the culture-positive patients included in this study.
Both the WBC count and the ZstatFlu test can be helpful for identifying influenza cases. The testing strategy of choice depends on a number of factors, including cost, duration and severity of symptoms, comorbidities, and potential adverse effects of treatment. The ZstatFlu test costs approximately $20. A WBC count costs approximately $30 but may have additional diagnostic value. Treating the patient with either zanamivir or oseltamivir costs $50 to $60, with rimantadine $30, and with amantadine $6.
The monetary value of an earlier return to work, reduced caregiver burden, or reduced transmission of infection will vary greatly. If the goal is to treat nearly every influenza case, treating patients with a WBC count of 8000 or less appears to be the best strategy. If the goal is to ensure that only patients with the flu are treated, treatment should be reserved for those with a positive ZstatFlu test result. Each patient and each physician would be expected to have a different treatment threshold, which would affect the choice of testing strategy. For example, if the treatment threshold for a particular patient was 50%, no testing would have been necessary in any of the epidemics studied, since the pretest probabilities were all greater than 50%. More than half the patients with positive influenza cultures were seen within 2 days of the onset of symptoms; these are the patients most likely to benefit from the newer antiviral agents. An analysis that includes patient preferences would be helpful to determine the most cost-effective strategy.
The specificity of the ZstatFlu test is reported to be between 95% and 100%.4 However, when performed in this community family practice office by a laboratory technician trained by the test’s manufacturer, the specificity was only 85%. Many of the false-positive test results were coded as “weakly positive,” suggesting that the end point for positivity was somewhat unclear or that the laboratory technician was influenced by the patient’s symptoms. The specificity improved in the second year, suggesting an improvement in technique. We submit that this is an example of the discrepancy between test characteristics determined under “ideal” circumstances and test characteristics in actual practice settings. Another explanation is that patients with weakly positive ZstatFlu test results actually had influenza that would have been documented had serology been used as the gold standard instead of culture.
Limitations
A weakness of our study is the proportion of patients with flu-like symptoms from whom culture results were not available. Flu season actually began earlier than January during the 1998-1999 season and extended beyond January with the 1999-2000 season, but cultures were not available during a portion of these time periods. Although patients with cough were more likely to be cultured, this potential bias should not have affected our conclusions, since cough was not associated with culture result.
Two other diagnostic concerns are the lack of serologic tests and the known tendency of cultures to be more reliable early in the illness. Serology would probably have identified some additional influenza cases, which would have resulted in higher pretest probabilities of influenza; it is unclear how this would have affected the other analyses. Since there was no association between duration of symptoms and culture result in 2 of the 3 epidemics or in the combined analysis, we do not believe that waning sensitivity of flu cultures was a significant factor in this population. In the third epidemic, the more likely explanation is that flu patients felt worse (eg, had more myalgias) and therefore came in earlier, not that cultures became negative in those who delayed seeing the physician. Additionally, the study had insufficient power to detect a statistically significant difference between the diagnostic value of the ZstatFlu test alone and that of the combination of the WBC count and the ZstatFlu test.
It should be noted that patients were enrolled during an influenza outbreak. In fact, because of its involvement in this study, the practice was one of the first in the state to recognize the onset of the epidemic. The conclusions reached about diagnostic strategies can therefore be generalized only to similar epidemic situations.
Conclusions
Influenza is associated with considerable morbidity and mortality, especially in high-risk populations, and the window of opportunity for treating it with the newer agents is brief (less than 48 hours). Early and accurate diagnosis may therefore be important in at least some cases. The use of screening WBC counts or rapid antigen tests could improve patient care during influenza epidemics. A cost-effectiveness analysis is needed to more fully elucidate this issue.
Acknowledgments
We would like to acknowledge the financial support of ZymeTx Corporation for supplying the ZstatFlu reagents, training, and influenza cultures at no cost. The support received from ZymeTx Corporation was unrestricted, and the company had no influence on our decision to analyze the results in the manner that we did or on the contents of this manuscript. We also want to thank Lavonne Glover for her expert assistance and patience in the preparation of the manuscript.
1. Prevention and control of influenza: recommendations of the Advisory Committee on Immunization Practices (ACIP). MMWR 1999;48:1-28.
2. Neuraminidase inhibitors for treatment of influenza A and B infections. MMWR 1999;48:1-9.
3. Jefferson TO, Demicheli V, Deeks JJ, Rivetti D. Amantadine and rimantadine for preventing and treating influenza A in adults. Cochrane Database Syst Rev 2000;12:CD001169.
4. Govaert ME, Dinant GJ, Aretz K, Knottnerus JA. The predictive value of influenza symptomatology in elderly people. J Fam Pract 1998;15:16-27.
5. Carrat F, Tachet A, Housset B, Valleron AJ, Rouzioux C. Influenza and influenza-like illness in general practice: drawing lessons from surveillance from a pilot study in Paris, France. Br J Gen Pract 1997;47:217-20.
6. Long CE, Hall CB, Cunningham CK, et al. Influenza surveillance in community-dwelling elderly compared with children. Arch Fam Med 1997;6:459-65.
7. Monto AS, Gravenstein S, Elliott M, Colopy M, Schweinle J. Clinical signs and symptoms predicting influenza infection. Arch Intern Med 2000;160:3243-47.
8. Rapid diagnostic tests for influenza. Med Lett 1999;41:121-22.
9. Metz CE, Herman BA, Roe CA. Statistical comparison of two ROC curve estimates obtained from partially paired datasets. Med Decis Making 1998;18:110-21.
10. Cantor SB, Kattan MW. Determining the area under the ROC curve for a binary diagnostic test. Med Decis Making 2000;20:468-70.
11. Hanley JA. Alternative approaches to receiver operating characteristic analyses. Radiology 1988;168:568-70.
12. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. London, England: Churchill Livingstone; 1997.
13. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
Are Fluid-Based Cytologies Superior to the Conventional Papanicolaou Test? A Systematic Review
STUDY DESIGN: This was a systematic review of original research reports evaluating both conventional Pap and FBC with respect to specimen adequacy, comparison with a reference standard, or both. Two reviewers independently reviewed the articles to determine inclusion status, with differences resolved by consensus with a third author. Risk differences (RD) between occurrence rates for FBC and Pap were used for the specimen adequacy data.
DATA SOURCES: Studies published between 1985 and November 1999 were identified from MEDLINE, Best Evidence, EMBASE, Biological Abstracts/RRM, and The Cochrane Library.
OUTCOMES MEASURED: Sensitivity, specificity, area under the receiver operating characteristic curve (AuROC), and the proportion of satisfactory, unsatisfactory, and “satisfactory but limited by” test results were measured.
RESULTS: There was no significant difference in AuROC (P=.37). FBC specimens were more likely to be satisfactory (RD=0.06; 95% confidence interval [CI], 0.03-0.09) or to lack endocervical cells (RD=0.06; 95% CI, 0.02-0.10) but had 10% fewer “satisfactory but limited by other” reports (RD = -0.10; 95% CI, -0.14 to -0.06). There was no difference in the proportion of unsatisfactory results.
CONCLUSIONS: For most women there is no reason to replace Pap with FBC. For women at high risk of cervical cancer or who are screened infrequently, the possible increase in FBC sensitivity may outweigh the potential harms from additional false positives.
Despite mass Papanicolaou (Pap) test screening, approximately 12,800 women are given the diagnosis of cervical cancer in the United States each year, and approximately 4600 die of the disease.1 Estimates of Pap test accuracy vary widely. Fahey and colleagues2 found an average sensitivity of 58% (range = 11%-99%). Nanda and coworkers3 reported a sensitivity of 30% to 87%, with a specificity of 86% to 100%. Follen Mitchell and colleagues4 reported a sensitivity of 67% and a specificity of 77%. Multiple factors, including sampling technique, patient preparation, test fixation and staining, and interpretation accuracy,5 can increase the false-negative rate of conventional Pap tests.
Fluid-based cytology (FBC) procedures use a fluid medium to capture and preserve the collected cells from the cervical-sampling device. The collected sample is homogenized using an automated device, and a subsample of cellular material is placed on a glass slide in a circular area. Because the technique provides a uniform thin layer and excludes obscuring debris, it eliminates problems often encountered with Pap, including poor fixation, uneven thickness of the cellular spread, air-drying artifact, and obscuring of cells by blood or inflammatory exudates.6
We performed a systematic review to evaluate the accuracy of FBC (by comparing its sensitivity and specificity with Pap) and the specimen adequacy of this new method (by comparing the proportion of FBC and Pap slides reported as unsatisfactory or “satisfactory but limited” by either absence of endocervical cells or other factors).
Methods
Search Strategy
Our literature search was designed to find studies comparing FBC and Pap. The search was assisted by a medical librarian and used medical subject headings (MeSH) and text words. The search terms included monolayer technology, ThinPrep, CytoRich, Cytoprep, Autoprep, AutoCyte, Papanicolau/pap smear, liquid-based cytology, fluid-based cytology, cervical cancer screening, and vaginal smears. MEDLINE, Best Evidence, EMBASE, Biological Abstracts/RRM, and The Cochrane Library were searched to retrieve all potentially relevant English-language articles published between 1985 (when the first literature on fluid-based cytology appeared) and November 1999. We attempted to contact both FBC manufacturers to find any other available articles and abstracts in press. The authors of articles with incomplete data were contacted in an attempt to obtain missing information. The reference lists of retrieved articles were manually searched for additional citations.
Study Inclusion
Articles were reviewed for inclusion if they contained reports of original research evaluating both conventional and fluid-based cytology samples (either ThinPrep or AutoCyte). Studies in which both tests were simultaneously applied to the same group of women (split-sample studies) and those in which one group of women who received Pap was compared with a group that received FBC (cohort studies) were included. Two reviewers independently reviewed the titles. When either reviewer felt the study might merit inclusion, the full article was retrieved. The same reviewers independently reviewed each article to determine whether it met the inclusion criteria. Differences were resolved by consensus with a third author.
We used the entire set of studies to address the question of specimen adequacy. To address FBC accuracy, we identified the subset of articles that compared the 2 tests with an external reference standard. We excluded articles from this analysis unless they reported colposcopy and biopsy results for at least 50% of the women with a finding of high-grade squamous intraepithelial lesions or higher on either Pap or FBC. Because few studies subjected all women with normal Pap tests to colposcopy, we also included studies that provided colposcopy to a random sample of women with normal Pap or FBC tests and those that subjected normal tests to an independent consensus review by a panel of experienced cytology professionals.7
Study Quality Assessment
For the studies addressing accuracy, we developed a form for extracting quality criteria.8 Points were assigned for each criterion and summed to create an overall study quality score (maximum score = 13). Two reviewers independently assessed the quality of each article, with differences resolved by consensus (Table 1).
Data Extraction
Three reviewers independently extracted data using a structured form, with differences resolved by consensus. Specimen adequacy was classified as satisfactory, satisfactory but limited by absence of endocervical cells (SBLB-absence), satisfactory but limited by other (SBLB-o), and unsatisfactory. SBLB-o included obscuring inflammatory exudate, blood, thick specimens, scant cellularity, and air-drying artifact.
Data Synthesis and Analysis
Summary estimates of sensitivity and specificity were made from studies that used an appropriate reference standard, using a DerSimonian and Laird random effects model. Sensitivity and specificity were pooled independently and weighted by the inverse of the variance using MetaTest software (version 0.6, Joseph Lau, MD, with permission). The MetaTest program was also used to calculate the area under the receiver operating characteristic (AuROC) curve, and the difference between AuROCs was calculated using ROCKIT 0.9B software for ROC analysis (Charles E. Metz, Department of Radiology, University of Chicago, March 1998). The AuROC is a measure of overall diagnostic accuracy, where 1.0 is a perfect test and 0.5 is a test that is no better than chance at distinguishing normal from abnormal specimens. For analysis of specimen adequacy, we used RevMan 4.1 software (Cochrane Collaboration, Update Software, Oxford, England) to calculate risk differences. Study homogeneity analyses were performed, and our analysis plan called for the use of a random effects model if significant heterogeneity (P<.05) was found.
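The sketch below shows the standard DerSimonian and Laird estimator. It computes fixed-effect inverse-variance weights and Cochran's Q, derives a moment estimate of the between-study variance (tau²), and then re-weights each study by 1/(within-study variance + tau²). It is not MetaTest. It assumes that sensitivities are pooled on the logit scale with the usual delta-method variance, a choice the text does not state. The study counts are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> dict:
    """Random-effects pooling (DerSimonian and Laird) of per-study estimates,
    e.g. logit-transformed sensitivities, given their within-study variances."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q}


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical (true positives, diseased women) for three studies.
    studies = [(45, 50), (88, 100), (27, 30)]
    effects = [logit(tp / n) for tp, n in studies]
    # Delta-method variance of a logit proportion: 1/x + 1/(n - x).
    variances = [1 / tp + 1 / (n - tp) for tp, n in studies]
    res = dersimonian_laird(effects, variances)
    lo = inv_logit(res["pooled"] - 1.96 * res["se"])
    hi = inv_logit(res["pooled"] + 1.96 * res["se"])
    print(f"pooled sensitivity {inv_logit(res['pooled']):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

When tau² is estimated as zero, the random-effects weights reduce to the fixed-effect inverse-variance weights. This is why the random effects model changes the pooled estimate only when heterogeneity is present.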
Results
Search Strategy and Study Inclusion
We identified 62 articles for critical appraisal. Because some authors published more than one article from a single study9-23 and one author combined 2 studies into one article,24 the 62 articles represented 47 actual studies. Fifty-two articles met the initial inclusion criteria.6,9-59 Ten articles were excluded, because they did not contain a reference standard or specimen adequacy data, or they restricted their reports to only a subset of Pap results, such as atypical glandular cells of uncertain significance (AGUS) or atypical squamous cells of uncertain significance (ASCUS).24,39,61-68
Study Characteristics and Qualitative Synthesis
Most articles provided no systematic comparison with any reference standard and could therefore only be used to evaluate specimen adequacy. In some cases, histologic results were reported for some patients and compared with Pap and FBC reports; however, these appeared to be haphazard samples of patients with a positive result on one or both tests. Most articles compared the results obtained from FBC and Pap with the assumption that the better test was the one with the higher proportion of positives, ignoring the possibility of false-positive tests.
Five studies included a comparison with a reference standard.18,27,35,40,50 In all 5, both tests were performed at the same time in all patients. After the cervix was scraped in the usual fashion, the sampling device was wiped across a slide for the conventional Pap and then rinsed in a vial containing the appropriate solution for the FBC method.
Three studies systematically compared FBC and Pap results with colposcopy and biopsy. One12,35 involved women referred to a colposcopy clinic because of a previous abnormal test result; Pap and FBC results were obtained, and colposcopic examinations were done on all women referred. A second16,23,55 studied 782 patients referred for colposcopy after an abnormal Pap test result; colposcopy was performed, and biopsies were taken from 445 of these patients. In the third study,40 a total of 8636 randomly selected Costa Rican women were each screened with Pap, FBC, and cervicography. All women with a physical examination suspicious for cancer or with any abnormality on any of the 3 tests were referred for colposcopy, along with a random sample of 150 women with no abnormalities.
The remaining 2 studies27,50 used consensus between independent reviewers of the Pap and FBC test results as the reference standard, with biopsy for at least 50% of the women with significant abnormalities on either or both tests. In the first study,50 a total of 2778 split samples were obtained and evaluated in both Germany and the United States. In Germany, masked slides were reviewed by cytotechnologists, and pathologists reviewed all abnormal and discrepant slides. Masked review was repeated in the United States, and senior cytotechnologists and cytopathologists rescreened abnormal slides. The cases were then unmasked, and discrepant cases were reviewed. A subset of histologic data (1235 samples) was analyzed, and a final reference diagnosis was made. In the second study,27 a total of 2009 sample pairs from a multicenter trial were evaluated blindly by 2 cytotechnologists, with all abnormal and 10% of normal slides reviewed by 1 of 6 pathologists. All sample pairs containing an abnormal result were then sent for a second masked opinion by cytotechnologists and pathologists. Consensus data were summarized and reported.
Study Quality Assessment
The total quality assessment scores of the 5 studies ranged from 7 to 10 out of a possible maximum score of 13. In general, the studies offered little or no description of the clinical characteristics of the women screened. Most did not even mention the women’s ages. All of the studies included at least some women who were at high risk for an abnormal Pap test result. Two of the studies included only women referred because of a previously abnormal result. Three studies included colposcopy clinics among their recruiting sites. One study was done in Costa Rica because of the known high prevalence of abnormal Pap test results in that country. One study specified that a consecutive series of patients was used, and one used a population-based random sample. The other 3 studies gave no details of how patients were selected for inclusion in the trial. All studies used prospectively collected data, and the methods used to actually perform the Pap and FBC tests were universally well described. Two of the studies used methods to minimize the effects of interobserver variability in the evaluation of the test results, but none addressed this issue as it related to the assessment of the reference standard.
Sensitivity and Specificity
The ROC curves for FBC and Pap using all 5 of the above studies are displayed in Figure 1. The AuROCs were similar (Pap = 0.93; FBC = 0.91). This difference was not significant (P=.37), and the confidence interval (CI) was wide (95% CI, -0.33 to 0.80). FBC demonstrated higher sensitivity, 90% (95% CI, 77%-96%), versus 79% (95% CI, 59%-91%) for Pap. FBC had lower specificity, 85% (95% CI, 74%-92%), versus 89% (95% CI, 75%-96%) for Pap.
Specimen Adequacy
FBC specimens were more likely to be reported as satisfactory (RD=0.06; 95% CI, 0.03-0.09; Figure 2). There was no significant difference in the number of unsatisfactory test results. There was a 6% higher rate of absence of endocervical cells (RD=0.06; 95% CI, 0.02-0.10) but a 10% decrease in reports of SBLB-o (RD = -0.10; 95% CI, -0.13 to -0.06) with FBC. The increase in absence of endocervical cells for FBC specimens was seen in all split-sample studies (RD = 0.08; 95% CI, 0.06-0.11) but not in the cohort studies (RD = -0.01; 95% CI, -0.07 to 0.05).
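The sketch below computes a single-study risk difference and its normal-approximation confidence interval. The counts are hypothetical and were chosen only to give an RD of 0.06. RevMan also pools the study-level differences, which this sketch does not do. Split-sample studies are paired, so a variance that ignores pairing, as this one does, is only an approximation for them.

```python
from __future__ import annotations

import math


def risk_difference(x1: int, n1: int, x2: int, n2: int,
                    z: float = 1.96) -> tuple[float, float, float]:
    """Risk difference p1 - p2 with a Wald (normal-approximation) 95% CI,
    treating the two groups as independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - z * se, rd + z * se


if __name__ == "__main__":
    # Hypothetical: 940/1000 FBC vs 880/1000 Pap specimens reported satisfactory.
    rd, lo, hi = risk_difference(940, 1000, 880, 1000)
    print(f"RD = {rd:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```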
Discussion
In our analysis, FBC appeared to increase both true-positive and false-positive results compared with Pap. These conclusions must be considered tentative because of the lack of statistical significance and the significant methodologic problems found in the studies used. Two recent meta-analyses have addressed the accuracy of FBC tests.3,60 Neither addressed the issue of specimen adequacy. Although neither included all 5 studies that we identified for our analysis of test accuracy, our findings of a possible increase in sensitivity with a decrease in specificity are consistent with the findings of the other 2 meta-analyses.
The possibility that FBC may increase sensitivity but decrease specificity should lead to caution in the adoption of this technology, especially for women at low risk for cervical cancer. For women with no history of abnormal findings on previous Pap tests (estimated prevalence of cervical cancer = 0.5%), more than 1800 FBC tests would be needed to detect one additional true positive. More than 50 additional false-positive test results (“cancer scares”) would accompany each additional true positive. The costs of follow-up investigations for these additional false-positive tests must be added to the additional cost of the test itself in assessing the potential impact of a widespread switch to this new technology. Many women are at risk because they fail to obtain regular Pap tests, often because of lack of insurance and cost barriers. Unfortunately, the higher cost of FBC may make these women even less likely to be screened. Thus, widespread use of FBC could, paradoxically, lead to an increase rather than a decrease in cervical cancer deaths by decreasing the use of this important test by lower-income women.
The trade-off between sensitivity and specificity is more favorable for women with higher disease rates. In a population with a 3% prevalence of cervical disease, only 300 FBC Pap tests would be required to detect one additional abnormality, and only 16 additional false-positive tests would result. Thus, for women who have had prior abnormal Pap test results or are known to be infrequent attenders, there may be a role for FBC.
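The arithmetic behind the two preceding paragraphs can be written out. The sketch assumes that, per test performed, the additional true positives equal the sensitivity difference times the prevalence, and the additional false positives equal the specificity difference times (1 − prevalence). It uses the rounded pooled point estimates from the Results and ignores their wide confidence intervals, so its outputs are approximate. `switch_tradeoff` is a hypothetical name.

```python
from __future__ import annotations


def switch_tradeoff(prevalence: float,
                    sens_fbc: float = 0.90, sens_pap: float = 0.79,
                    spec_fbc: float = 0.85, spec_pap: float = 0.89) -> tuple[float, float]:
    """Return (FBC tests needed to find one additional true positive,
    additional false positives per additional true positive) when replacing
    Pap with FBC, using rounded pooled point estimates as defaults."""
    extra_tp_per_test = (sens_fbc - sens_pap) * prevalence
    extra_fp_per_test = (spec_pap - spec_fbc) * (1 - prevalence)
    if extra_tp_per_test <= 0:
        raise ValueError("FBC must be more sensitive than Pap for this calculation")
    return 1 / extra_tp_per_test, extra_fp_per_test / extra_tp_per_test


if __name__ == "__main__":
    for prev in (0.005, 0.03):
        n_needed, fp_per_tp = switch_tradeoff(prev)
        print(f"prevalence {prev:.1%}: ~{n_needed:.0f} tests per extra true positive, "
              f"~{fp_per_tp:.0f} extra false positives per extra true positive")
```

Because both ratios scale roughly with 1/prevalence, the same sensitivity gain is much more expensive, in tests and in false positives, in a low-risk population than in a high-risk one.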
Pap test results reported as less than satisfactory can present a significant problem, with increased office return visits, increased psychologic trauma to the patient, and increased costs of repeated tests. Thus the decrease in “SBLB-other” reports is a benefit of FBC, although it is partially offset by an increase in the absence of endocervical cells. Some have suggested that the absence of endocervical cells is an artifact of the study methods for split-sample studies (where the collection instrument is first wiped across a slide for the Pap and then inserted into the FBC vial). We do not understand why this process would preferentially extract endocervical cells. We did note that the cohort studies showed no increase in SBLB-absence for FBC. However, in these studies, the Pap and FBC specimens were collected from different women and often involved different time periods and different physicians using different collection instruments. Both the degree of heterogeneity and the reported differences on all comparisons between Pap and FBC were consistently larger in the cohort studies than in the split-sample studies.
FBC offers the advantage that human papillomavirus (HPV) testing can be performed on the same specimen when triaging a patient with a diagnosis of ASCUS. As shown in the ASCUS/LSIL Triage Study,61 patients with ASCUS Pap test results and negative HPV tests can be followed with annual Pap tests without colposcopy or more frequent screening.
Limitations
The greatest limitation of the studies was the lack of a comparison of Pap and FBC results with colposcopy and biopsy of any suspicious areas in all women. We therefore also included studies that used an alternative reference standard for women with negative Pap results. However, the use of different reference standards for patients with positive and negative test results has been shown to result in overestimates of test sensitivity.8 The fact that none of the studies addressed blinding in the interpretation of the reference standard is also of concern. Perhaps the most important limitation is that the patients included in these studies came from high-risk populations. It is not clear how well these results will generalize to the many women at low risk who receive Pap tests in the offices of American primary care physicians. An additional limitation of our study is the potential for publication bias, since we included only articles written in English.
Conclusions
Before widespread adoption of this new FBC technology, it would be advisable to have additional studies with acceptable reference standards, designed to avoid verification bias and to ensure that equivalent specimen collection methods are used. Once such investigations provide reliable estimates of the relative sensitivity and specificity of the FBC Pap test, a decision can be made about whether the benefits of widespread adoption would outweigh any disadvantages or additional costs.
1. Greenlee R, Taylor M, Bolden S, Wingo P. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7-33.
2. Fahey M, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. Am J Epidemiol 1995;141:680-89.
3. Nanda K, McCrory DC, Myers ER, et al. Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann Intern Med 2000;132:810-19.
4. Follen Mitchell M, Cantor SB, Brookner C, Utzinger U, Schottenfeld D, Richards-Kortum R. Screening for squamous intraepithelial lesions with fluorescence spectroscopy. Obstet Gynecol 1999;94:889-96.
5. Gay JD, Donaldson LD, Goellner JR. False negative results in cervical cytologic studies. Acta Cytologica 1985;29:1043-46.
6. Dupree WB, Suprun HZ, Beckwith DG, Shane JJ, Lucente V. The promise and risk of a new technology: the Lehigh Valley Hospital’s experience with liquid-based cervical cytology. Cancer (Cancer Cytopathology) 1998;84:202-07.
7. Proposed guidelines for primary screening instruments for gynecologic cytology: Intersociety Working Group for Cytology Technologies. Am J Clin Pathol 1997;109:10-15.
8. Lijmer J, Mol B, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-66.
9. Ashfaq R, Birdsong G, Corkill M, Inhorn S. Improved specimen adequacy with the ThinPrep 2000 System: reductions in satisfactory but limited by…interpretations (Abstract presentation at the 44th scientific meeting). Acta Cytologica 1996;40:1046-47.
10. Bishop JW, Cheuvront DA, Elston RJ. Utility of residual AutoCyte cervical cytology samples of image analysis. Acta Cytologica 1999;43:39-46.
11. Corkill M, Knapp D, Martin J, Hutchinson M. Specimen adequacy of ThinPrep sample preparations in a direct-to-vial study. Acta Cytologica 1997;41:39-44.
12. Ferenczy A, Robitaille J, Franco E, et al. Conventional cervical cytologic smears vs. ThinPrep smears: a paired comparison study on cervical cytology. Acta Cytologica 1996;40:1136-42.
13. Howell P, Belk T, Agdigos R, Davis R, Lowe J. AutoCyte interactive screening system: experience at a university hospital cytology laboratory. Acta Cytologica 1999;43:58-64.
14. Inhorn SL, Wilbur D, Zahniser D, Linder J. Validation of the ThinPrep Papanicolaou test for cervical cancer diagnosis. J Lower Genital Tract Dis 1998;2:208-12.
15. Inhorn SL, Sherman M. Independent pathologist review of ThinPrep and conventional Pap smears from multisite clinical trials. Acta Cytologica (Abstract presentation at 44th annual scientific meeting) 1996;40:1044.
16. Lee KL, Madge R, Sheets EE. Colposcopically directed biopsy as a basis for comparing the diagnostic accuracy of the ThinPrep and Papanicolaou smear methods. Acta Cytologica (Abstract presentation 44th annual scientific meeting) 1996;40:1047.
17. Linder J. Recent advances in thin-layer cytology. Diagn Cytopathol 1998;18:24-32.
18. Sheets EE, Constantine NM, Dinisco S, Dean B, Cibas ES. Colposcopically directed biopsies provide a basis for comparing the accuracy of ThinPrep and Papanicolaou smears. J Gynecologic Techniques 1995;1:27-34.
19. Sherman ME, Schiffman MH, Lorincz AT, et al. Cervical specimens collected in liquid buffer are suitable for both cytologic screening and ancillary human papillomavirus testing. Cancer 1997;81:89-97.
20. Sherman ME, Mendoza M, Lee KR, et al. Performance of liquid-based, thin-layer cervical cytology: correlation with reference diagnoses and human papillomavirus testing. Mod Pathol 1998;11:837-43.
21. Sherman ME, Schiffman M, Herrero R, et al. Evaluation of conventional and novel cervical cancer screening methods in a population-based study of 10,000 Costa Rican women. Acta Cytologica (Abstract presentation 43rd annual scientific meeting) 1995;39:983.
22. Vassilakos P, Griffin S, Megevand E, Campana A. CytoRich liquid-based cervical cytologic test: screening results in a routine cytopathology service. Acta Cytologica 1998;42:198-202.
23. Zahniser DJ, Sullivan PJ. CYTYC corporation. Acta Cytologica 1996;40:37-44.
24. Vassilakos P, Saurel J, Rondez R. Direct-to-vial use of the AutoCyte PREP liquid-based preparation for cervical-vaginal specimens in three European laboratories. Acta Cytologica 1999;43:65-68.
25. Aponte-Cipriani SL, Teplitz C, Rorat E, Scaino A, Jacobs AJ. Cervical smears prepared by an automated device versus the conventional method: a comparative analysis. Acta Cytologica 1995;39:623-30.
26. Awen C, Hathway S, Eddy W, Voskuil R, Janes C. Efficacy of ThinPrep preparation of cervical smears: a 1,000-case, investigator-sponsored study. Diagn Cytopathol 1993;11:33-36.
27. Bishop JW. Comparison of the CytoRich system with conventional cervical cytology: preliminary data on 2,032 cases from a clinical trial site. Acta Cytologica 1997;41:15-23.
28. Bishop JW, Bigner SH, Colgan TJ, et al. Multicenter masked evaluation of AutoCyte PREP thin layers with matched conventional smears: including initial biopsy results. Acta Cytologica 1998;42:189-97.
29. Bolick DR, Hellman DJ. Laboratory implementation and efficacy assessment of the Thin Prep cervical cancer screening system. Acta Cytologica 1998;42:209-13.
30. Bur M, Knowles K, Pekow P, Corral O, Donovan J. Comparison of ThinPrep preparations with conventional cervicovaginal smears. Acta Cytologica 1995;39:631-42.
31. Candel A, Davis B, Baklios R, Selvaggi S. The ThinPrep Pap test: a cost savings perspective. Lab Invest 1998;78:36A.
32. Carpenter AB, Davey DD. Thin Prep Pap test: performance and biopsy follow-up in a university hospital. Cancer 1999;87:105-12.
33. Diaz-Rosario LA, Kabawa SE. Performance of a fluid-based, Thin-Layer Papanicolaou smear method in the clinical setting of an independent laboratory and an outpatient screening population in New England. Arch Pathol Lab Med 1999;123:817-21.
34. Emery J, Banks H, Holz J, DePriest P, Davey DD. The ThinPrep method for cervical-vaginal specimens in a high risk population. Acta Cytologica (Abstract presentation 45th annual scientific meeting) 1997;41:1579.
35. Ferenczy A, Franco E, Arseneau J, Wright TC, Richart RM. Diagnostic performance of hybrid capture human papillomavirus deoxyribonucleic acid assay combined with liquid based cytologic study. Am J Obstet Gynecol 1996;175:651-56.
36. Geyer JW, Hancock F, Carrico C, Kirkpatrick M. Preliminary evaluation of Cyto-Rich: an improved automated cytology preparation. Diagn Cytopathol 1993;9:417-22.
37. Guidos BJ, Selvaggi SM. Use of the ThinPrep Pap test in clinical practice. Diagn Cytopathol 1999;20:70-73.
38. Howell LP, Davis RL, Belk TI, Agdigos R, Lowe J. The AutoCyte preparation system for gynecologic cytology. Acta Cytologica 1998;42:171-77.
39. Hutchinson ML, Agarwal P, Denault T, Berger B, Cibas ES. A new look at cervical cytology: ThinPrep multicenter trial results. Acta Cytologica 1992;36:499-504.
40. Hutchinson ML, Zahniser DJ, Sherman ME, et al. Utility of liquid-based cytology for cervical carcinoma screening. Cancer Cytopathol 1999;87:48-55.
41. Johnson JE, Jones HW, Conrad KA, Huff BC. Increased rate of SIL detection with excellent biopsy correlation after implementation of direct-to-vial ThinPrep liquid-based preparation of cervicovaginal specimens at a university medical center. Acta Cytologica (Abstract presentation 46th scientific meeting) 1998;42:1242-43.
42. Laverty CRA, Farnsworth A, Thurloe JK, Grieves A, Bowditch R. Evaluation of the CytoRich slide preparation process. Analyt Quant Cytol Histol 1997;19:239-45.
43. Laverty CRA, Thurloe JK, Redman NL, Farnsworth A. An Australian trial of ThinPrep: a new cytopreparatory technique. Cytopathology 1995;6:140-48.
44. Lee KR, Ashfaq R, Birdsong GG, Corkill ME, McIntosh KM, Inhorn SL. Comparison of conventional Papanicolaou smears and a fluid-based, thin-layer system for cervical cancer screening. Obstet Gynecol 1997;90:278-84.
45. McGoogan E, Reith A. Would monolayers provide more representative samples and improved preparations for cervical screening? Overview and evaluation of systems available. Acta Cytologica 1996;40:107-19.
46. Papillo JL, Zarka MA, St. John TL. Evaluation of the ThinPrep Pap test in clinical practice: a seven-month, 16,314-case experience in Northern Vermont. Acta Cytologica 1998;42:203-08.
47. Quddus MR, Xu B, Sung CJ, Boardman L, Lauchlan SC. Cytohisto correlations support the observation of increased detection of squamous intraepithelial lesions by the ThinPrep process. Acta Cytologica (Abstract presentation 46th annual scientific meeting) 1998;42:1243.
48. Radio SJ, Burns KR, Munch TM, Quasi VM, Bohl KD, Severson MA. Paired comparison of conventional and ThinPrep cervical cytology in a high risk population. Lab Invest 1998;78:42A.
49. Shield PW, Nolan GR, Phillips GE, Cummings MC. Improving cervical cytology screening in a remote, high risk population. Med J Aust 1999;170:255-58.
50. Sprenger E, Schwarzmann P, Kirkpatrick M, et al. The false negative rate in cervical cytology: comparison of monolayers to conventional smears. Acta Cytologica 1996;40:81-89.
51. Stevens MW, Nespolon WW, Milne AJ, Rowland R. Evaluation of the CytoRich technique for cervical smears. Diagn Cytopathol 1998;18:236-42.
52. Vassilakos P, Cossali D, Albe X, Alonso L, Hohener R, Puget E. Efficacy of Monolayer preparations for cervical cytology: emphasis on suboptimal specimens. Acta Cytologica 1996;40:496-500.
53. Wang T-Y, Chen H-S, Yang Y-C, Tsou M-C. Comparison of fluid-based, Thin-Layer processing and conventional Papanicolaou methods for uterine cervical cytology. J Formos Med Assoc 1999;98:500-05.
54. Weintraub J. The coming evolution in cervical cytology: a pathologist’s guide for the clinician. En Gynecologie Obstetrique 1997;5:169-75.
55. Wilbur DC, Cibas ES, Merritt S, James LP, Berger BM, Bonfiglio TA. ThinPrep processor: clinical trials demonstrate an increased detection rate of abnormal cervical cytologic specimens. Am J Clin Pathol 1994;101:209-14.
56. Wilbur DC, Facik MC, Rutkowski MA, Mulford DK, Atkison KM. Clinical trials of the CytoRich specimen-preparation device for cervical cytology. Acta Cytologica 1997;41:24-29.
57. Wilbur DC, Dubesher B, Angel C, Atkison KM. Use of Thin-Layer preparations for gynecologic smears with emphasis on the cytomorphology of high-grade intraepithelial lesions and carcinomas. Diagn Cytopathol 1995;14:201-11.
58. Yang M, Zachariah S. Comparison of specimen adequacy between matched ThinPrep preparations and conventional cervicovaginal smears. Acta Cytologica (Abstract presentation 45th scientific meeting) 1997;41:1579.
59. Hutchinson ML, Cassin CM, Ball HG. The efficacy of an automated preparation device for cervical cytology. Am J Clin Pathol 1991;96:300-05.
60. Roberts J, Gurley AM, Thurloe JK, Bowditch R, Laverty CA. Evaluation of the ThinPrep test as an adjunct to the conventional Pap smear. Med J Aust 1997;167:466-69.
61. McCrory D, Bastian D, et al. Evaluation of cervical cytology: evidence report/technology assessment no. 5 (Prepared by Duke University under contract no. 290-97-0014). Rockville, Md: Agency for Health Care Policy and Research; 1999.
62. Solomon D, Schiffman M, Tarone R. Comparison of three management strategies for patients with atypical squamous cells of undetermined significance: baseline results from a randomized trial. J Natl Cancer Inst 2001;93:293-99.
STUDY DESIGN: This was a systematic review of original research reports evaluating both conventional Pap and FBC with respect to specimen adequacy, comparison with a reference standard, or both. Two reviewers independently reviewed the articles to determine inclusion status, with differences resolved by consensus with a third author. Risk differences (RD) between occurrence rates for FBC and Pap were used for the specimen adequacy data.
DATA SOURCES: Studies published between 1985 and November 1999 were identified from MEDLINE, Best Evidence, EMBASE, Biological Abstracts/RRM, and The Cochrane Library.
OUTCOMES MEASURED: Sensitivity, specificity, area under the receiver operating characteristic curve (AuROC), and the proportion of satisfactory, unsatisfactory, and “satisfactory but limited by” test results were measured.
RESULTS: There was no significant difference in AuROC (P=.37). FBC specimens were more likely to be satisfactory (RD=0.06; 95% confidence interval [CI], 0.03-0.09) or to have absent endocervical cells (RD=0.06; 95% CI, 0.02-0.10) but had 10% fewer “satisfactory but limited by other” reports (RD = -0.10; 95% CI, -0.14 to -0.06). There was no significant difference in the proportion of unsatisfactory results.
CONCLUSIONS: For most women there is no reason to replace Pap with FBC. For women at high risk of cervical cancer or who are screened infrequently, the possible increase in FBC sensitivity may outweigh the potential harms from additional false positives.
Despite mass Papanicolaou (Pap) test screening, approximately 12,800 women in the United States receive a diagnosis of cervical cancer each year, and approximately 4600 die of the disease.1 Reported estimates of the accuracy of conventional Pap vary widely. Fahey and colleagues2 found an average sensitivity of 58% (range, 11%-99%). Nanda and coworkers3 reported a sensitivity of 30% to 87%, with a specificity of 86% to 100%. Follen Mitchell and colleagues4 reported a sensitivity of 67% and a specificity of 77%. Multiple factors, including sampling technique, patient preparation, specimen fixation and staining, and interpretation accuracy,5 can increase the false-negative rate of conventional Pap.
Fluid-based cytology (FBC) procedures use a fluid medium to capture and preserve the collected cells from the cervical-sampling device. The collected sample is homogenized using an automated device, and a subsample of cellular material is placed on a glass slide in a circumferential area. Because the technique provides a uniform thin layer and excludes obscuring debris, it eliminates problems often encountered with Pap including poor fixation, uneven thickness of the cellular spread, air-drying artifact, and obscuring of cells by blood or inflammatory exudates.6
We performed a systematic review to evaluate the accuracy of FBC (by comparing its sensitivity and specificity with Pap) and the specimen adequacy of this new method (by comparing the proportion of FBC and Pap slides reported as unsatisfactory or “satisfactory but limited” by either absence of endocervical cells or other factors).
Methods
Search Strategy
Our literature search was designed to find studies comparing FBC and Pap. The search was assisted by a medical librarian and used medical subject headings (MeSH) and text words. The search terms included monolayer technology, ThinPrep, CytoRich, Cytoprep, Autoprep, AutoCyte, Papanicolau/pap smear, liquid-based cytology, fluid-based cytology, cervical cancer screening, and vaginal smears. MEDLINE, Best Evidence, EMBASE, Biological Abstracts/RRM, and The Cochrane Library were searched to retrieve all potentially relevant English-language articles published between 1985 (the year of the first publications on fluid-based cytology) and November 1999. We attempted to contact both FBC manufacturers to identify any other available articles and abstracts in press. The authors of articles with incomplete data were contacted in an attempt to obtain missing information. The reference lists of retrieved articles were searched manually for additional citations.
Study Inclusion
Articles were reviewed for inclusion if they contained reports of original research evaluating both conventional and fluid-based cytology samples (either ThinPrep or AutoCyte). We included both studies in which the 2 tests were applied simultaneously to the same group of women (split-sample studies) and studies in which a group of women who received Pap was compared with a group that received FBC (cohort studies). Two reviewers independently reviewed the titles. When either reviewer felt the study might merit inclusion, the full article was retrieved. The same 2 reviewers then independently reviewed each article to determine whether it met the inclusion criteria. Differences were resolved by consensus with a third author.
We used the entire set of studies to address the question of specimen adequacy. To address FBC accuracy, we identified the subset of articles that compared the 2 tests with an external reference standard. We excluded articles from this analysis unless they reported colposcopy and biopsy results for at least 50% of the women with a finding of high-grade squamous intraepithelial lesions or higher on either Pap or FBC. Because few studies subjected all women with normal Pap tests to colposcopy, we also included studies that provided colposcopy to a random sample of women with normal Pap or FBC tests and those that subjected normal tests to an independent consensus review by a panel of experienced cytology professionals.7
Study Quality Assessment
For the studies addressing accuracy, we developed a form for extracting information using the design criteria described by Lijmer and colleagues.8 Points were assigned for each criterion and summed to create an overall study quality score (maximum score = 13). Two reviewers independently assessed the quality of each article, with differences resolved by consensus (Table 1).
Data Extraction
Three reviewers independently extracted data using a structured form. Differences were resolved by consensus. Specimen adequacy was classified as satisfactory, satisfactory but limited by absence of endocervical cells (SBLB-absence), satisfactory but limited by other factors (SBLB-o), and unsatisfactory. SBLB-o included obscuring inflammatory exudate, obscuring blood, thick specimens, scant cellularity, and air-drying artifact.
Data Synthesis and Analysis
Summary estimates of sensitivity and specificity were calculated with a DerSimonian and Laird random effects model, using the studies that applied an appropriate reference standard. Sensitivity and specificity were pooled independently and weighted by the inverse of the variance using MetaTest software (version 0.6, Joseph Lau, MD, with permission). The MetaTest program was also used to calculate the area under the receiver operating characteristic curve (AuROC), and the difference between AuROCs was calculated using ROCKET 0.9B software for ROC analysis (Charles E. Metz, Department of Radiology, University of Chicago, March 1998). The AuROC is a measure of overall diagnostic accuracy, where 1.0 indicates a perfect test and 0.5 indicates a test that is no better than chance at distinguishing normal from abnormal specimens. For analysis of specimen adequacy, we used RevMan 4.1 software (Cochrane Collaboration, Update Software, Oxford, England) to calculate risk differences. Study homogeneity analyses were performed, and our analysis plan called for the use of a random effects model if significant heterogeneity (P < .05) was found.
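The random effects pooling described above can be illustrated with a short calculation. The Python sketch below is a minimal, generic DerSimonian and Laird implementation applied to logit-transformed proportions. It is not MetaTest's code. The logit transformation, the 0.5 continuity correction, and the study counts are all assumptions chosen for illustration; the counts are hypothetical and are not data from this review. Specificity would be pooled in the same way, using (true negatives, women without disease) pairs.

```python
import math


def logit_with_variance(events, total):
    """Return the logit of a proportion and its approximate variance.

    Adds 0.5 to each cell (an assumed continuity correction) so that
    studies reporting 0% or 100% still give a finite estimate.
    """
    a = events + 0.5
    b = (total - events) + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def dersimonian_laird(estimates, variances):
    """Pool estimates with the DerSimonian and Laird random effects model.

    Returns (pooled estimate, standard error, tau^2).
    """
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate of between-study variance, truncated at 0
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooled_proportion(counts):
    """counts: one (events, total) pair per study, e.g. (true positives, diseased)."""
    logits = [logit_with_variance(e, n) for e, n in counts]
    pooled, se, tau2 = dersimonian_laird([y for y, _ in logits], [v for _, v in logits])
    # Back-transform the estimate and its 95% CI to the proportion scale
    return inv_logit(pooled), inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se), tau2


# Hypothetical (true positives, women with disease) counts; not data from this review.
example = [(45, 50), (80, 95), (30, 40), (120, 130), (60, 70)]
est, lo, hi, tau2 = pooled_proportion(example)
print(f"pooled sensitivity {est:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

When the studies agree closely, the between-study variance tau^2 falls to zero and the random effects weights reduce to ordinary inverse-variance weights.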
Results
Search Strategy and Study Inclusion
We identified 62 articles for critical appraisal. Because some authors published more than one article from a single study9-23 and one author combined 2 studies into one article,24 the 62 articles represented 47 actual studies. Fifty-two articles met the initial inclusion criteria.6,9-59 Ten articles were excluded because they did not contain a reference standard or specimen adequacy data, or because they restricted their reports to a subset of Pap results, such as atypical glandular cells of undetermined significance (AGUS) or atypical squamous cells of undetermined significance (ASCUS).24,39,61-68
Study Characteristics and Qualitative Synthesis
Most articles provided no systematic comparison with any reference standard and could therefore only be used to evaluate specimen adequacy. In some cases, histologic results were reported for some patients and compared with Pap and FBC reports; however, these appeared to be haphazard samples of patients with a positive result on one or both tests. Most articles compared the results obtained from FBC and Pap with the assumption that the better test was the one with the higher proportion of positives, ignoring the possibility of false-positive tests.
Five studies included a comparison with a reference standard.18,27,35,40,50 In all 5, both tests were performed at the same time in all patients. After the cervix was scraped in the usual fashion, the sampling device was wiped across a slide for the conventional Pap and then rinsed in a vial containing the appropriate solution for the FBC method.
Three studies systematically compared FBC and Pap results with colposcopy and biopsy. One12,35 involved women referred to a colposcopy clinic because of a previous abnormal test result; Pap and FBC results were obtained, and colposcopic examinations were performed on all referred women. A second16,23,55 studied 782 patients referred for colposcopy after an abnormal Pap test result; colposcopy was performed, and biopsies were taken from 445 of these patients. In the third study,40 a total of 8636 randomly selected Costa Rican women were each screened with Pap, FBC, and cervicography. All women with a physical examination suspicious for cancer or with any abnormality on any of the 3 tests were referred for colposcopy, along with a random sample of 150 women with no abnormalities.
The remaining 2 studies27,50 used consensus between independent reviewers of the Pap and FBC test results as the reference standard, with biopsy for at least 50% of the women with significant abnormalities on either or both tests. In the first study,50 a total of 2778 split samples were obtained and evaluated in both Germany and the United States. In Germany, masked slides were reviewed by cytotechnologists, and pathologists reviewed all abnormal and discrepant slides. Masked review was repeated in the United States, and senior cytotechnologists and cytopathologists rescreened abnormal slides. The cases were then unmasked, and discrepant cases were reviewed. Histologic data from a subset of 1235 samples were analyzed, and a final reference diagnosis was made. In the second study,27 2009 sample pairs from a multicenter trial were evaluated blindly by 2 cytotechnologists, with all abnormal and 10% of normal slides reviewed by 1 of 6 pathologists. All sample pairs containing an abnormal result were then sent for a second masked opinion by cytotechnologists and pathologists. Consensus data were summarized and reported.
Study Quality Assessment
The total quality assessment scores of the 5 studies ranged from 7 to 10 out of a possible 13. In general, the studies offered little or no description of the clinical characteristics of the women screened; most did not even report the women’s ages. All of the studies included at least some women who were at high risk for an abnormal Pap test result. Two of the studies included only women referred because of a previous abnormal result. Three studies included colposcopy clinics among their recruiting sites. One study was done in Costa Rica because of the known high prevalence of abnormal Pap test results in that country. One study specified that a consecutive series of patients was used, and one used a population-based random sample. The other 3 studies gave no details of how patients were selected for inclusion. All studies used prospectively collected data, and the methods used to perform the Pap and FBC tests were uniformly well described. Two of the studies used methods to minimize the effects of interobserver variability in the evaluation of the test results, but none addressed this issue as it related to the assessment of the reference standard.
Sensitivity and Specificity
The ROC curves for FBC and Pap using all 5 of the above studies are displayed in Figure 1. The AuROCs were similar (Pap = 0.93; FBC = 0.91). This difference was not significant (P=.37), and the confidence interval (CI) was wide (95% CI, -0.33 to 0.80). FBC demonstrated higher sensitivity than Pap, 90% (95% CI, 77%-96%) versus 79% (95% CI, 59%-91%), but lower specificity, 85% (95% CI, 74%-92%) versus 89% (95% CI, 75%-96%).
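The AuROC values above summarize how well each test separates abnormal from normal specimens. The sketch below illustrates what the statistic measures; it does not show how MetaTest derives a summary ROC curve from pooled studies. It computes the empirical AuROC for a single set of ordinal cytology grades. This is the probability that a randomly chosen abnormal specimen is graded higher than a randomly chosen normal one. The grading scale and the ratings are hypothetical.

```python
def empirical_auc(abnormal_scores, normal_scores):
    """Empirical area under the ROC curve (Mann-Whitney form).

    Counts, over every abnormal/normal pair, how often the abnormal specimen
    receives the higher grade; ties count as one half.
    """
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


# Hypothetical ordinal grades (0 = negative ... 4 = high-grade lesion or worse),
# for specimens whose true status is known from a reference standard.
abnormal = [4, 3, 4, 2, 3, 1]
normal = [0, 0, 1, 2, 0, 1, 0, 0]
print(round(empirical_auc(abnormal, normal), 3))
```

A value of 1.0 would mean every abnormal specimen outranked every normal one. Grading at random would produce about 0.5 on average.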
Specimen Adequacy
FBC specimens were more likely to be reported as satisfactory (RD=0.06; 95% CI, 0.03-0.09; Figure 2). There was no significant difference in the proportion of unsatisfactory test results. With FBC, there was a 6% higher rate of absence of endocervical cells (RD=0.06; 95% CI, 0.02-0.10) but a 10% decrease in reports of SBLB-o (RD = -0.10; 95% CI, -0.13 to -0.06). The increase in absence of endocervical cells with FBC was seen in all split-sample studies (RD = 0.08; 95% CI, 0.06-0.11) but not in the cohort studies (RD = -0.01; 95% CI, -0.07 to 0.05).
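The risk differences above are differences in the proportion of specimens in a given adequacy category (FBC minus Pap). A minimal sketch of that calculation follows. It assumes a Wald 95% CI and inverse-variance fixed-effect pooling. The review does not state which weighting scheme RevMan was set to use, and treating split samples as independent groups is a simplification. The counts are hypothetical.

```python
import math


def risk_difference(events_fbc, n_fbc, events_pap, n_pap):
    """Risk difference (FBC minus Pap), its variance, and a Wald 95% CI.

    Treats the two tests as independent samples; in split-sample studies
    both specimens come from the same women, so this is a simplification.
    """
    p1 = events_fbc / n_fbc
    p2 = events_pap / n_pap
    rd = p1 - p2
    var = p1 * (1 - p1) / n_fbc + p2 * (1 - p2) / n_pap
    half_width = 1.96 * math.sqrt(var)
    return rd, var, (rd - half_width, rd + half_width)


def pool_fixed(rds_and_vars):
    """Inverse-variance fixed-effect pooling of (risk difference, variance) pairs.

    Every variance must be positive: a proportion of exactly 0 or 1 in both
    groups gives zero variance and cannot be weighted this way.
    """
    weights = [1.0 / var for _, var in rds_and_vars]
    pooled = sum(w * rd for w, (rd, _) in zip(weights, rds_and_vars)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)


# Hypothetical counts of satisfactory specimens: (FBC events, FBC n, Pap events, Pap n).
studies = [(470, 500, 440, 500), (900, 1000, 860, 1000)]
per_study = [risk_difference(*s)[:2] for s in studies]
pooled, ci = pool_fixed(per_study)
print(f"pooled RD {pooled:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

A positive RD means the category was reported more often with FBC. A CI that excludes 0 corresponds to a statistically significant difference at the 5% level.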
Discussion
The use of FBC increases both true-positive and false-positive results compared with Pap. These conclusions must be considered tentative because the differences were not statistically significant and because of the serious methodologic problems found in the included studies. Two recent meta-analyses have addressed the accuracy of FBC tests.3,60 Neither addressed the issue of specimen adequacy. Although neither included all 5 studies that we identified for our analysis of test accuracy, our findings of a possible increase in sensitivity with a decrease in specificity are consistent with theirs.
The possibility that FBC may increase sensitivity but decrease specificity should lead to caution in the adoption of this technology, especially for women at low risk for cervical cancer. For women with no history of abnormal findings on previous Pap tests (estimated prevalence of cervical cancer = 0.05%), more than 1800 FBC tests would be needed to detect one additional true positive, and more than 50 additional false-positive results (“cancer scares”) would accompany each additional true positive. In assessing the potential impact of a widespread switch to this new technology, the costs of follow-up investigations for these additional false-positive tests must be added to the additional cost of the test itself. Many women are at risk because they fail to obtain regular Pap tests, often because of lack of insurance and cost barriers. Unfortunately, the higher cost of FBC may make these women even less likely to be screened. Thus, widespread use of FBC could, paradoxically, lead to an increase rather than a decrease in cervical cancer deaths by decreasing the use of this important test by lower-income women.
The trade-off between sensitivity and specificity is more favorable for women with higher disease rates. In a population with a 3% prevalence of cervical disease, only 300 FBC Pap tests would be required to detect one additional abnormality, and only 16 additional false-positive tests would result. Thus, for women who have had prior abnormal Pap test results or are known to be infrequent attenders, there may be a role for FBC.
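The trade-off in the two preceding paragraphs rests on two rates. Additional true positives per test occur at prevalence × (sensitivity difference), and additional false positives per test occur at (1 - prevalence) × (specificity difference). The sketch below applies those two expressions to the rounded pooled point estimates from the Results. With these inputs it gives roughly 300 tests per additional true positive at 3% prevalence. The authors do not report the exact inputs they used, so the sketch's other outputs may not reproduce the quoted figures exactly. The function name is hypothetical.

```python
def screening_tradeoff(prevalence, sens_new, sens_old, spec_new, spec_old):
    """Return (tests per additional true positive,
    additional false positives per additional true positive)
    when replacing the old test with the new one.

    Assumes sens_new > sens_old, so that the new test finds extra true positives.
    """
    extra_tp_per_test = prevalence * (sens_new - sens_old)
    extra_fp_per_test = (1 - prevalence) * (spec_old - spec_new)
    return 1.0 / extra_tp_per_test, extra_fp_per_test / extra_tp_per_test


# Rounded pooled point estimates reported in the Results (FBC vs Pap).
SENS_FBC, SENS_PAP = 0.90, 0.79
SPEC_FBC, SPEC_PAP = 0.85, 0.89

for prevalence in (0.0005, 0.03):
    tests, fps = screening_tradeoff(prevalence, SENS_FBC, SENS_PAP, SPEC_FBC, SPEC_PAP)
    print(f"prevalence {prevalence:.2%}: {tests:,.0f} tests per additional true positive, "
          f"{fps:.0f} additional false positives per additional true positive")
```

Both quantities scale inversely with prevalence. This is why the balance shifts in favor of FBC as the population's risk rises.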
STUDY DESIGN: This was a systematic review of original research reports evaluating both conventional Pap and FBC with respect to specimen adequacy, comparison with a reference standard, or both. Two reviewers independently reviewed the articles to determine inclusion status, with differences resolved by consensus with a third author. Risk differences (RD) between occurrence rates for FBC and Pap were used for the specimen adequacy data.
DATA SOURCES: Studies published between 1985 and November 1999 were identified from MEDLINE, Best Evidence, EMBASE, Biological Abstracts/RRM, and The Cochrane Library.
OUTCOMES MEASURED: Sensitivity, specificity, area under the receiver operating characteristic curve (AuROC), and the proportion of satisfactory, unsatisfactory, and “satisfactory but limited by” test results were measured.
RESULTS: There was no significant difference in AuROC (p=.37). FBC specimens were more likely to be satisfactory (RD=0.06; 95% confidence interval [CI], 0.03-0.09) or to have absent endocervical cells (RD=0.06; 95% CI, 0.02-0.10) but had 10% fewer “satisfactory but limited by—other” reports (RD = -0.10; 95% CI, -0.14 to -0.06). There was no difference in unsatisfactory Pap test results.
CONCLUSIONS: For most women there is no reason to replace Pap with FBC. For women at high risk of cervical cancer or who are screened infrequently, the possible increase in FBC sensitivity may outweigh the potential harms from additional false positives.
Despite mass Papanicolaou (Pap) test screening, approximately 12,800 women are given the diagnosis of cervical cancer in the United States each year, and approximately 4600 die of the disease.1 Fahey and colleagues2 found an average sensitivity of 58% (range = 11%-99%). Nanda and coworkers3 reported a sensitivity of 30% to 87%, with a specificity of 86% to 100%. Follen Mitchell and colleagues4 reported sensitivities of 67% and specificities of 77%. Multiple factors including sampling technique, patient preparation, test fixation and staining, and interpretation accuracy5 can increase the false-negative rates of conventional Pap.
Fluid-based cytology (FBC) procedures use a fluid medium to capture and preserve the collected cells from the cervical-sampling device. The collected sample is homogenized using an automated device, and a subsample of cellular material is placed on a glass slide in a circumferential area. Because the technique provides a uniform thin layer and excludes obscuring debris, it eliminates problems often encountered with Pap including poor fixation, uneven thickness of the cellular spread, air-drying artifact, and obscuring of cells by blood or inflammatory exudates.6
We performed a systematic review to evaluate the accuracy of FBC (by comparing its sensitivity and specificity with Pap) and the specimen adequacy of this new method (by comparing the proportion of FBC and Pap slides reported as unsatisfactory or “satisfactory but limited” by either absence of endocervical cells or other factors).
Methods
Search Strategy
Our literature search was designed to find studies comparing FBC and Pap. The search was assisted by a medical librarian and used medical subject headings (MeSH) and text words. The search terms included monolayer technology, ThinPrep, CytoRich, Cytoprep, Autoprep, AutoCyte, Papanicolaou/Pap smear, liquid-based cytology, fluid-based cytology, cervical cancer screening, and vaginal smears. MEDLINE, Best Evidence, EMBASE, Biological Abstracts/RRM, and The Cochrane Library were searched to retrieve all potentially relevant English-language articles published between 1985 (first literature published on fluid-based cytology) and November 1999. An attempt was made to contact both FBC manufacturers to find any other available articles and abstracts in publication. The authors of articles with incomplete data were contacted in an attempt to obtain missing information. The reference lists of retrieved articles were manually searched for additional citations.
Study Inclusion
Articles were reviewed for inclusion if they contained reports of original research evaluating both conventional and fluid-based cytology samples (either ThinPrep or AutoCyte). Studies in which both tests were simultaneously applied to the same group of women (split-sample studies) and those in which one group of women who received Pap was compared with a group that received FBC (cohort studies) were included. Two reviewers independently reviewed the titles. When either reviewer felt the study might merit inclusion, the full article was retrieved. The same authors independently reviewed each article to determine whether it met the inclusion criteria. Differences were resolved by consensus with a third author.
We used the entire set of studies to address the question of specimen adequacy. To address FBC accuracy, we identified the subset of articles that compared the 2 tests with an external reference standard. We excluded articles from this analysis unless they reported colposcopy and biopsy results for at least 50% of the women with a finding of high-grade squamous intraepithelial lesions or higher on either Pap or FBC. Because few studies subjected all women with normal Pap tests to colposcopy, we also included studies that provided colposcopy to a random sample of women with normal Pap or FBC tests and those that subjected normal tests to an independent consensus review by a panel of experienced cytology professionals.7
Study Quality Assessment
For the studies addressing accuracy, we developed a form for extracting quality criteria, using criteria drawn from empirical work on design-related bias in diagnostic studies.8 Points were assigned for each criterion and summed to create an overall study quality score (maximum score = 13). Two reviewers independently assessed the quality of each article, with differences resolved by consensus (Table 1).
Data Extraction
Three reviewers independently extracted data using a structured form. Differences were resolved by consensus. Specimen adequacy was classified as satisfactory, satisfactory but limited by absence of endocervical cells (SBLB-absence), satisfactory but limited by other (SBLB-o), and unsatisfactory. SBLB-o included obscuring inflammatory exudate, blood, thick tests, scant cellularity, and air-drying artifact.
Data Synthesis and Analysis
Summary estimates of sensitivity and specificity were made from studies that used an appropriate reference standard, using a DerSimonian and Laird random effects model. Sensitivity and specificity were pooled independently and weighted by the inverse of the variance using MetaTest software (version 0.6, Joseph Lau, MD, with permission). The MetaTest program was also used to calculate the area under the receiver operating characteristic curve (AuROC), and the difference between AuROCs was calculated using ROCKET 0.9B software for ROC analysis (Charles E. Metz, Department of Radiology, University of Chicago, March 1998). The AuROC is a measure of overall diagnostic accuracy, where 1.0 is a perfect test and 0.5 is a test that is no better than chance at distinguishing normal from abnormal specimens. For analysis of specimen adequacy, we used RevMan 4.1 software (Cochrane Collaboration, Update Software, Oxford, England) to calculate risk differences. Study homogeneity analyses were performed, and our analysis plan called for the use of a random effects model if significant heterogeneity (P <.05) was found.
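The internals of MetaTest are not described here, so the following is only a generic sketch of the DerSimonian and Laird random effects method as it might be applied to sensitivity (and, separately, to specificity). Pooling on the logit scale and the 0.5 continuity correction are our assumptions for illustration, not necessarily what MetaTest does, and the per-study counts in the usage line are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool per-study estimates with the DerSimonian-Laird random-effects model.

    Returns (pooled estimate, standard error of the pooled estimate, tau^2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted scatter of studies around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions(events, totals):
    """Pool proportions (e.g. true positives / women with disease) on the logit scale.

    Assumption: 0.5 is added to events and non-events so 0% or 100% stays finite.
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        x_adj, n_adj = x + 0.5, n + 1.0
        ys.append(logit(x_adj / n_adj))
        vs.append(1.0 / x_adj + 1.0 / (n_adj - x_adj))  # variance of the logit
    pooled, se, _ = dersimonian_laird(ys, vs)
    return inv_logit(pooled), inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)


# Hypothetical counts: true positives and women with disease in 3 studies
sens, lo, hi = pool_proportions([45, 30, 60], [50, 40, 65])
print(f"pooled sensitivity {sens:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because tau^2 is added to every study's variance, heterogeneous studies receive more equal weights than under a fixed effect model, which widens the pooled confidence interval.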
Results
Search Strategy and Study Inclusion
We identified 62 articles for critical appraisal. Because some authors published more than one article from a single study9-23 and one author combined 2 studies into one article,24 the 62 articles represented 47 actual studies. Fifty-two articles met the initial inclusion criteria.6,9-59 Ten articles were excluded, because they did not contain a reference standard or specimen adequacy data, or they restricted their reports to only a subset of Pap results, such as atypical glandular cells of uncertain significance (AGUS) or atypical squamous cells of uncertain significance (ASCUS).24,39,61-68
Study Characteristics and Qualitative Synthesis
Most articles provided no systematic comparison with any reference standard and could therefore only be used to evaluate specimen adequacy. In some cases, histologic results were reported for some patients and compared with Pap and FBC reports; however, these appeared to be haphazard samples of patients with a positive result on one or both tests. Most articles compared the results obtained from FBC and Pap with the assumption that the better test was the one with the higher proportion of positives, ignoring the possibility of false-positive tests.
Five studies included a comparison with a reference standard.18,27,35,40,50 In all 5, both tests were performed at the same time in all patients. After the cervix was scraped in the usual fashion, the sampling device was wiped across a slide for the conventional Pap and then rinsed in a vial containing the appropriate solution for the FBC method.
Three studies systematically compared FBC and Pap results with colposcopy and biopsy. One12,35 involved women referred to a colposcopy clinic because of a previous abnormal test result; Pap and FBC results were obtained, and colposcopic examinations were done on all women referred. A second16,23,55 studied 782 patients referred for colposcopy after an abnormal Pap test result; colposcopy was performed, and biopsies were taken from 445 of these patients. In the third study,40 a total of 8636 randomly selected Costa Rican women were each screened with Pap, FBC, and cervicography. All women with a physical examination suspicious for cancer or with any abnormality on any of the 3 tests were referred for colposcopy, along with a random sample of 150 women with no abnormalities.
The remaining 2 studies27,50 used consensus between independent reviewers of the Pap and FBC test results as the reference standard, with biopsy for at least 50% of the women with significant abnormalities on either or both tests. In the first study,50 a total of 2778 split samples were obtained and evaluated in both Germany and the United States. In Germany, masked slides were reviewed by cytotechnologists, and pathologists reviewed all abnormal and discrepant slides. Masked review was repeated in the United States, and senior cytotechnologists and cytopathologists rescreened abnormal slides. The cases were then unmasked, and discrepant cases were reviewed. A subset of histologic data (1235 samples) was analyzed, and a final reference diagnosis was made. In the second study,27 2009 sample pairs from a multicenter trial were evaluated blindly by 2 cytotechnologists, with all abnormal and 10% of normal slides reviewed by 1 of 6 pathologists. All sample pairs containing an abnormal result were then sent for a second masked opinion by cytotechnologists and pathologists. Consensus data were summarized and reported.
Study Quality Assessment
The total quality assessment scores of the 5 studies ranged from 7 to 10 out of a possible maximum score of 13. In general, the studies offered little or no description of the clinical characteristics of the women screened. Most did not even mention the women's ages. All of the studies included at least some women who were at high risk for an abnormal Pap test result. Two of the studies included only women referred because of a previously abnormal result. Three studies included colposcopy clinics among their recruiting sites. One study was done in Costa Rica because of the known high prevalence of abnormal Pap test results in that country. One study specified that a consecutive series of patients was used, and one used a population-based random sample. The other 3 studies gave no details of how patients were selected for inclusion in the trial. All studies used prospectively collected data, and the methods used to perform the Pap and FBC tests were universally well described. Two of the studies used methods to minimize the effects of interobserver variability in the evaluation of the test results, but none addressed this issue as it related to the assessment of the reference standard.
Sensitivity and Specificity
The ROC curves for FBC and Pap using all 5 of the above studies are displayed in Figure 1. The AuROCs were similar (Pap = 0.93; FBC = 0.91). This difference was not significant (P=.37), and the confidence interval (CI) was wide (95% CI, -0.33 to 0.80). FBC demonstrated higher sensitivity, 90% (95% CI, 77%-96%), versus 79% (95% CI, 59%-91%) for Pap. FBC had lower specificity, 85% (95% CI, 74%-92%), versus 89% (95% CI, 75%-96%) for Pap.
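The AuROC interpretation given in the methods (1.0 for a perfect test, 0.5 for chance) can be illustrated with the empirical, Mann-Whitney form of the statistic for a single study. This is a sketch only: it does not reproduce the pooled AuROC values above, which came from MetaTest, and the ordinal coding of cytology categories used here is a hypothetical assumption.

```python
def empirical_auc(diseased_scores, healthy_scores):
    """Empirical area under the ROC curve (Mann-Whitney form).

    Scores are ordinal test results, e.g. a hypothetical coding of cytology
    categories from 0 = normal up to 4 = high-grade lesion or worse.
    The AUC is the probability that a randomly chosen woman with disease
    scores higher than a randomly chosen woman without; ties count one half.
    """
    if not diseased_scores or not healthy_scores:
        raise ValueError("need at least one diseased and one healthy score")
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))


print(empirical_auc([4, 3, 2, 2], [0, 1, 2, 0]))  # hypothetical scores
```

A test whose scores separate the two groups completely yields 1.0, and one whose scores are identically distributed in both groups yields about 0.5.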
Specimen Adequacy
FBC specimens were more likely to be reported as satisfactory (RD=0.06; 95% CI, 0.03-0.09; Figure 2). There was no significant difference in the number of unsatisfactory test results. With FBC there was a 6% higher rate of absence of endocervical cells (RD=0.06; 95% CI, 0.02-0.10) but a 10% decrease in reports of SBLB-o (RD = -0.10; 95% CI, -0.13 to -0.06). The increase in absence of endocervical cells for FBC specimens was seen in all split-sample studies (RD=0.08; 95% CI, 0.06-0.11) but not in the cohort studies (RD = -0.01; 95% CI, -0.07 to 0.05).
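For readers unfamiliar with the risk difference, the following sketch shows how a single study's RD and a Wald 95% CI could be computed from counts. The counts are hypothetical, and RevMan's exact procedure is not reproduced. Note also that an unpaired Wald interval ignores the pairing in split-sample studies, so it is at best an approximation for those designs.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Risk difference (group A minus group B) with a Wald confidence interval.

    Assumes two independent groups; split-sample (paired) data would need
    a paired analysis instead.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    rd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rd, rd - z * se, rd + z * se


# Hypothetical study: 450/500 FBC vs 410/500 Pap slides reported satisfactory
rd, lo, hi = risk_difference(450, 500, 410, 500)
print(f"RD={rd:.2f}; 95% CI, {lo:.2f} to {hi:.2f}")
```

Each study's RD, with variance equal to the squared standard error, could then be combined across studies using inverse-variance weights.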
Discussion
The use of FBC increases both true-positive and false-positive results when compared with Pap. These conclusions must be considered tentative because of the lack of statistical significance and the significant methodologic problems found in the studies used. Two recent meta-analyses have addressed the accuracy of FBC tests.3,60 Neither addressed the issue of specimen adequacy. Although neither included all 5 studies that we identified for our analysis of test accuracy, our findings of a possible increase in sensitivity with a decrease in specificity are consistent with the findings of the other 2 meta-analyses.
The possibility that FBC may increase sensitivity but decrease specificity should lead to caution in the adoption of this technology, especially for women at low risk for cervical cancer. For women with no history of abnormal findings on previous Pap tests (estimated prevalence of cervical cancer = 0.05%) more than 1800 FBC tests would be needed to detect one additional true positive. More than 50 additional false-positive test results ("cancer scares") would accompany each additional true positive. The costs of follow-up investigations for these additional false-positive tests must be added to the additional cost of the test itself in assessing the potential impact of a widespread switch to this new technology. Many women are at risk because they fail to obtain regular Pap tests, often because of lack of insurance and cost barriers. Unfortunately, the higher cost of FBC may make these women even less likely to have screening performed. Thus widespread use of FBC could, paradoxically, lead to an increase rather than a decrease in cervical cancer deaths by decreasing the use of this important test by lower income women.
The trade-off between sensitivity and specificity is more favorable for women with higher disease rates. In a population with a 3% prevalence of cervical disease, only 300 FBC Pap tests would be required to detect one additional abnormality, and only 16 additional false-positive tests would result. Thus, for women who have had prior abnormal Pap test results or are known to be infrequent attenders, there may be a role for FBC.
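The trade-off arithmetic in the two preceding paragraphs can be made explicit. In this sketch, the extra true positives per woman screened equal prevalence times the gain in sensitivity, and the extra false positives equal (1 - prevalence) times the loss in specificity. The inputs are the rounded pooled point estimates reported above; because the calculation is sensitive to rounding of these inputs, the outputs need not match the figures in the text exactly.

```python
def screening_tradeoff(prevalence, sens_new, sens_old, spec_new, spec_old):
    """Trade-off from switching to a more sensitive but less specific test.

    Returns (tests needed per additional true positive,
             additional false positives per additional true positive).
    """
    extra_tp = prevalence * (sens_new - sens_old)        # per woman screened
    extra_fp = (1 - prevalence) * (spec_old - spec_new)  # per woman screened
    if extra_tp <= 0:
        raise ValueError("the new test is not more sensitive")
    return 1 / extra_tp, extra_fp / extra_tp


# Rounded pooled estimates: FBC sens 0.90, spec 0.85; Pap sens 0.79, spec 0.89
tests_per_tp, fp_per_tp = screening_tradeoff(0.03, 0.90, 0.79, 0.85, 0.89)
print(round(tests_per_tp), round(fp_per_tp, 1))
```

Both ratios scale roughly with 1/prevalence, which is why the balance is so much less favorable in low-risk women.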
Pap test results reported as less than satisfactory can present a significant problem with increased office return visits, increased psychologic trauma to the patient, and increased costs of repeated tests. Thus the decrease in “SBLB other” reports is a benefit of FBC, although it is partially offset by an increased absence of endocervical cells. Some have suggested that the absence of endocervical cells is an artifact of the study methods for split-sample studies (where the collection instrument is first wiped across a slide for the Pap and then inserted into the FBC vial). We do not understand why this process would preferentially extract endocervical cells. We did note that the cohort studies showed no increase in SBLB absence for FBC. However, in these studies, the Pap and FBC specimens were collected from different women and often involved different time periods and different physicians using different collection instruments. Both the degree of heterogeneity and the reported differences on all comparisons between Pap and FBC Pap tests were consistently larger in the cohort than in the split-sample studies.
FBC offers the advantage of allowing HPV testing on the same specimen when triaging patients with ASCUS. As shown in the ASCUS/LSIL Triage Study,61 patients with ASCUS Pap test results and negative human papillomavirus tests can be followed with annual Pap tests without colposcopy or more frequent screening.
Limitations
The greatest limitation of the studies was the lack of comparison of Pap test results with colposcopy and biopsy of any suspicious areas. We therefore also included studies that used an alternative reference standard for women with negative Pap results. However, the use of different reference standards for patients with positive and negative test results has been shown to result in overestimates of test sensitivity.8 The fact that none of the studies addressed the issue of blinding in interpretation of the reference standard is also of concern. Perhaps the most important limitation is that the patients included in these studies came from high-risk populations. It is not clear how well these results will generalize to the many women at low risk who receive Pap tests in the offices of American primary care physicians. An additional limitation of our study is the potential for publication bias, since we included only articles written in English.
Conclusions
Before widespread adoption of this new FBC technology, it would be advisable to conduct additional studies with acceptable reference standards, designed to avoid verification bias and to ensure that equivalent specimen collection methods are used. Once reliable estimates of the relative sensitivity and specificity of FBC are available from such investigations, a decision can be made about whether the benefits of widespread adoption would outweigh any disadvantages or additional costs.
1. Greenlee R, Taylor M, Bolden S, Wingo P. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7-33.
2. Fahey M, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. Am J Epidemiol 1995;141:680-89.
3. Nanda K, McCrory DC, Myers ER, et al. Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann Intern Med 2000;132:810-19.
4. Follen Mitchell M, Cantor SB, Brookner C, Utzinger U, Schottenfeld D, Richards-Kortum R. Screening for squamous intraepithelial lesions with fluorescence spectroscopy. Obstet Gynecol 1999;94:889-96.
5. Gay JD, Donaldson LD, Goellner JR. False negative results in cervical cytologic studies. Acta Cytologica 1985;29:1043-46.
6. Dupree WB, Suprun HZ, Beckwith DG, Shane JJ, Lucente V. The promise and risk of a new technology: the Lehigh Valley Hospital’s experience with liquid-based cervical cytology. Cancer (Cancer Cytopathology) 1998;84:202-07.
7. Proposed guidelines for primary screening instruments for gynecologic cytology: Intersociety Working Group for Cytology Technologies. Am J Clin Pathol 1997;109:10-15.
8. Lijmer J, Mol B, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-66.
9. Ashfaq R, Birdsong G, Corkill M, Inhorn S. Improved specimen adequacy with the ThinPrep 2000 System: reductions in satisfactory but limited by…interpretations (Abstract presentation at the 44th scientific meeting). Acta Cytologica 1996;40:1046-47.
10. Bishop JW, Cheuvront DA, Elston RJ. Utility of residual AutoCyte cervical cytology samples of image analysis. Acta Cytologica 1999;43:39-46.
11. Corkill M, Knapp D, Martin J, Hutchinson M. Specimen adequacy of ThinPrep sample preparations in a direct-to-vial study. Acta Cytologica 1997;41:39-44.
12. Ferenczy A, Robitaille J, Franco E, et al. Conventional cervical cytologic smears vs. ThinPrep smears: a paired comparison study on cervical cytology. Acta Cytologica 1996;40:1136-42.
13. Howell P, Belk T, Agdigos R, Davis R, Lowe J. AutoCyte interactive screening system: experience at a university hospital cytology laboratory. Acta Cytologica 1999;43:58-64.
14. Inhorn SL, Wilbur D, Zahniser D, Linder J. Validation of the ThinPrep Papanicolaou test for cervical cancer diagnosis. J Lower Genital Tract Dis 1998;2:208-12.
15. Inhorn SL, Sherman M. Independent pathologist review of ThinPrep and conventional Pap smears from multisite clinical trials. Acta Cytologica (Abstract presentation at 44th annual scientific meeting) 1996;40:1044.
16. Lee KL, Madge R, Sheets EE. Colposcopically directed biopsy as a basis for comparing the diagnostic accuracy of the ThinPrep and Papanicolaou smear methods. Acta Cytologica (Abstract presentation 44th annual scientific meeting) 1996;40:1047.
17. Linder J. Recent advances in thin-layer cytology. Diagn Cytopathol 1998;18:24-32.
18. Sheets EE, Constantine NM, Dinisco S, Dean B, Cibas ES. Colposcopically directed biopsies provide a basis for comparing the accuracy of ThinPrep and Papanicolaou smears. J Gynecologic Techniques 1995;1:27-34.
19. Sherman ME, Schiffman MH, Lorincz AT, et al. Cervical specimens collected in liquid buffer are suitable for both cytologic screening and ancillary human papillomavirus testing. Cancer 1997;81:89-97.
20. Sherman ME, Mendoza M, Lee KR, et al. Performance of liquid-based, thin-layer cervical cytology: correlation with reference diagnoses and human papillomavirus testing. Mod Pathol 1998;11:837-43.
21. Sherman ME, Schiffman M, Herrero R, et al. Evaluation of conventional and novel cervical cancer screening methods in a population-based study of 10,000 Costa Rican women. Acta Cytologica (Abstract presentation 43rd annual scientific meeting) 1995;39:983.
22. Vassilakos P, Griffin S, Megevand E, Campana A. CytoRich liquid-based cervical cytologic test: screening results in a routine cytopathology service. Acta Cytologica 1998;42:198-202.
23. Zahniser DJ, Sullivan PJ. CYTYC corporation. Acta Cytologica 1996;40:37-44.
24. Vassilakos P, Saurel J, Rondez R. Direct-to-vial use of the AutoCyte PREP liquid-based preparation for cervical-vaginal specimens in three European laboratories. Acta Cytologica 1999;43:65-68.
25. Aponte-Cipriani SL, Teplitz C, Rorat E, Scaino A, Jacobs AJ. Cervical smears prepared by an automated device versus the conventional method: a comparative analysis. Acta Cytologica 1995;39:623-30.
26. Awen C, Hathway S, Eddy W, Voskuil R, Janes C. Efficacy of ThinPrep preparation of cervical smears: a 1,000-case, investigator-sponsored study. Diagn Cytopathol 1993;11:33-36.
27. Bishop JW. Comparison of the CytoRich system with conventional cervical cytology: preliminary data on 2,032 cases from a clinical trial site. Acta Cytologica 1997;41:15-23.
28. Bishop JW, Bigner SH, Colgan TJ, et al. Multicenter masked evaluation of AutoCyte PREP thin layers with matched conventional smears: including initial biopsy results. Acta Cytologica 1998;42:189-97.
29. Bolick DR, Hellman DJ. Laboratory implementation and efficacy assessment of the ThinPrep cervical cancer screening system. Acta Cytologica 1998;42:209-13.
30. Bur M, Knowles K, Pekow P, Corral O, Donovan J. Comparison of ThinPrep preparations with conventional cervicovaginal smears. Acta Cytologica 1995;39:631-42.
31. Candel A, Davis B, Baklios R, Selvaggi S. The ThinPrep Pap test: a cost savings perspective. Lab Invest 1998;78:36A.
32. Carpenter AB, Davey DD. Thin Prep Pap test: performance and biopsy follow-up in a university hospital. Cancer 1999;87:105-12.
33. Diaz-Rosario LA, Kabawa SE. Performance of a fluid-based, Thin-Layer Papanicolaou smear method in the clinical setting of an independent laboratory and an outpatient screening population in New England. Arch Pathol Lab Med 1999;123:817-21.
34. Emery J, Banks H, Holz J, DePriest P, Davey DD. The ThinPrep method for cervical-vaginal specimens in a high risk population. Acta Cytologica (Abstract presentation 45th annual scientific meeting) 1997;41:1579.
35. Ferenczy A, Franco E, Arseneau J, Wright TC, Richart RM. Diagnostic performance of hybrid capture human papillomavirus deoxyribonucleic acid assay combined with liquid based cytologic study. Am J Obstet Gynecol 1996;175:651-56.
36. Geyer JW, Hancock F, Carrico C, Kirkpatrick M. Preliminary evaluation of Cyto-Rich: an improved automated cytology preparation. Diagn Cytopathol 1993;9:417-22.
37. Guidos BJ, Selvaggi SM. Use of the ThinPrep Pap test in clinical practice. Diagn Cytopathol 1999;20:70-73.
38. Howell LP, Davis RL, Belk TI, Agdigos R, Lowe J. The AutoCyte preparation system for gynecologic cytology. Acta Cytologica 1998;42:171-77.
39. Hutchinson ML, Agarwal P, Denault T, Berger B, Cibas ES. A new look at cervical cytology: ThinPrep multicenter trial results. Acta Cytologica 1992;36:499-504.
40. Hutchinson ML, Zahniser DJ, Sherman ME, et al. Utility of liquid-based cytology for cervical carcinoma screening. Cancer Cytopathol 1999;87:48-55.
41. Johnson JE, Jones HW, Conrad KA, Huff BC. Increased rate of SIL detection with excellent biopsy correlation after implementation of direct-to-vial ThinPrep liquid-based preparation of cervicovaginal specimens at a university medical center. Acta Cytologica (Abstract presentation 46th scientific meeting) 1998;42:1242-43.
42. Laverty CRA, Farnsworth A, Thurloe JK, Grieves A, Bowditch R. Evaluation of the CytoRich slide preparation process. Analyt Quant Cytol Histol 1997;19:239-45.
43. Laverty CRA, Thurloe JK, Redman NL, Farnsworth A. An Australian trial of ThinPrep: a new cytopreparatory technique. Cytopathology 1995;6:140-48.
44. Lee KR, Ashfaq R, Birdsong GG, Corkill ME, McIntosh KM, Inhorn SL. Comparison of conventional Papanicolaou smears and a fluid-based, thin-layer system for cervical cancer screening. Obstet Gynecol 1997;90:278-84.
45. McGoogan E, Reith A. Would monolayers provide more representative samples and improved preparations for cervical screening? Overview and evaluation of systems available. Acta Cytologica 1996;40:107-19.
46. Papillo JL, Zarka MA, St. John TL. Evaluation of the ThinPrep Pap test in clinical practice: a seven-month, 16,314-case experience in Northern Vermont. Acta Cytologica 1998;42:203-08.
47. Quddus MR, Xu B, Sung CJ, Boardman L, Lauchlan SC. Cytohisto correlations support the observation of increased detection of squamous intraepithelial lesions by the ThinPrep process. Acta Cytologica (Abstract presentation 46th annual scientific meeting) 1998;42:1243.
48. Radio SJ, Burns KR, Munch TM, Quasi VM, Bohl KD, Severson MA. Paired comparison of conventional and ThinPrep cervical cytology in a high risk population. Lab Invest 1998;78:42A.
49. Shield PW, Nolan GR, Phillips GE, Cummings MC. Improving cervical cytology screening in a remote, high risk population. Med J Aust 1999;170:255-58.
50. Sprenger E, Schwarzmann P, Kirkpatrick M, et al. The false negative rate in cervical cytology: comparison of monolayers to conventional smears. Acta Cytologica 1996;40:81-89.
51. Stevens MW, Nespolon WW, Milne AJ, Rowland R. Evaluation of the CytoRich technique for cervical smears. Diagn Cytopathol 1998;18:236-42.
52. Vassilakos P, Cossali D, Albe X, Alonso L, Hohener R, Puget E. Efficacy of Monolayer preparations for cervical cytology: emphasis on suboptimal specimens. Acta Cytologica 1996;40:496-500.
53. Wang T-Y, Chen H-S, Yang Y-C, Tsou M-C. Comparison of fluid-based, Thin-Layer processing and conventional Papanicolaou methods for uterine cervical cytology. J Formos Med Assoc 1999;98:500-05.
54. Weintraub J. The coming evolution in cervical cytology: a pathologist’s guide for the clinician. En Gynecologie Obstetrique 1997;5:169-75.
55. Wilbur DC, Cibas ES, Merritt S, James LP, Berger BM, Bonfiglio TA. ThinPrep processor: clinical trials demonstrate an increased detection rate of abnormal cervical cytologic specimens. Am J Clin Pathol 1994;101:209-14.
56. Wilbur DC, Facik MC, Rutkowski MA, Mulford DK, Atkison KM. Clinical trials of the CytoRich specimen-preparation device for cervical cytology. Acta Cytologica 1997;41:24-29.
57. Wilbur DC, Dubesher B, Angel C, Atkison KM. Use of Thin-Layer preparations for gynecologic smears with emphasis on the cytomorphology of high-grade intraepithelial lesions and carcinomas. Diagn Cytopathol 1995;14:201-11.
58. Yang M, Zachariah S. Comparison of specimen adequacy between matched ThinPrep preparations and conventional cervicovaginal smears. Acta Cytologica (Abstract presentation 45th scientific meeting) 1997;41:1579.
59. Hutchinson ML, Cassin CM, Ball HG. The efficacy of an automated preparation device for cervical cytology. Am J Clin Pathol 1991;96:300-05.
60. Roberts J, Gurley AM, Thurloe JK, Bowditch R, Laverty CA. Evaluation of the ThinPrep test as an adjunct to the conventional Pap smear. Med J Aust 1997;167:466-69.
61. McCrory D, Bastian D, et al. Evaluation of cervical cytology: evidence report/technology assessment no. 5 (Prepared by Duke University under contract no. 290-97-0014). Rockville, Md: Agency for Health Care Policy and Research; 1999.
62. Solomon D, Schiffman M, Tarone R. Comparison of three management strategies for patients with atypical squamous cells of undetermined significance: baseline results from a randomized trial. J Natl Cancer Inst 2001;93:293-99.
1. Greenlee R, Taylor M, Bolden S, Wingo P. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7-33.
2. Fahey M, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. Am J Epidemiol 1995;141:680-89.
3. Nanda K, McCrory DC, Myers ER, et al. Accuracy of the Papanicolau test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann Intern Med 2000;132:810-19.
4. Follen Mitchell M, Cantor SB, Brookner C, Utzinger U, Schottenfeld D, Richards-Kortum R. Screening for squamous intraepithelial lesions with fluorescence spectroscopy. OB GYN 1999;94:889-96.
5. Gay JD, Donaldson LD, Goellner JR. False negative results in cervical cytologic studies. Acta Cytologica 1985;29:1043-46.
6. Dupree WB, Suprun HZ, Beckwith DG, Shane JJ, Lucente V. The promise and risk of a new technology: the Lehigh Valley Hospital’s experience with liquid-based cervical cytology. Cancer (Cancer Cytopathology) 1998;84:202-07.
7. Proposed guidelines for primary screening instruments for gynecologic cytology: Intersociety Working Group for Cytology Technologies Am J Clin Path 1997;109:10-15.
8. Lijmer J, Mol B, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-66.
9. Ashfaq R, Birdsong G, Corkill M, Inhorn S. Improved specimen adequacy with the ThinPrep 2000 System: reductions in satisfactory but limited by…interpretations (Abstract presentation at the 44th scientific meeting). Acta Cytologica 1996;40:1046-47.
10. Bishop JW, Cheuvront DA, Elston RJ. Utility of residual AutoCyte cervical cytology samples of image analysis. Acta Cytologica 1999;43:39-46.
11. Corkill M, Knapp D, Martin J, Hutchinson M. Speciman adequacy of ThinPrep sample preparations in a direct-to-vial study. Acta Cytologica 1997;41:39-44.
12. Ferenczy A, Robitaille J, Franco E, et al. Conventional cervical cytologic smears vs. ThinPrep smears: a paired comparison study on cervical cytology. Acta Cytologica 1996;40:1136-42.
13. Howell P, Belk T, Agdigos R, Davis R, Lowe J. AutoCyte interactive screening system: experience at a university hospital cytology laboratory. Acta Cytologica 1999;43:58-64.
14. Inhorn SL, Wilbur D, Zahniser D, Linder J. Validation of the ThinPrep Papanicolaou test for cervical cancer diagnosis. J Lower Genital Tract Dis 1998;2:208-12.
15. Inhorn SL, Sherman M. Independent Pathologist review of ThinPrep and conventional Pap smears from multisite clinical trials. Acta Cytologica (Abstract presentation at 44th annual scientific meeting) 1996;40:1044.-
16. Lee KL, Madge R, Sheets EE. Colposcopically directed biopsy as a basis for comparing the diagnostic accuracy of the ThinPrep and Papanicolaou smear methods. Acta Cytologica (Abstract presentation 44th annual scientific meeting) 1996;40:1047.-
17. Linder J. Recent advances in thin-layer cytology. Diagnostic Cytopathol 1998;18:24-32.
18. Sheets EE, Constantine NM, Dinisco S, Dean B, Cibas ES. Colposcopically directed biopsies provide a basis for comparing the accuracy of ThinPrep and Papanicolaou smears. J Gynecologic Techniques 1995;1:27-34.
19. Sherman ME, Schiffman MH, Lorincz AT, et al. Cervical specimens collected in liquid buffer are suitable for both cytologic screening and ancillary human papillomavirus testing. Cancer 1997;81:89-97.
20. Sherman ME, Mendoza M, Lee KR, et al. Performance of liquid-based, thin-layer cervical cytology: correlation with reference diagnoses and human papillomavirus testing. Mod Pathol 1998;11:837-43.
21. Sherman ME, Schiffman M, Herrero R, et al. Evaluation of conventional and novel cervical cancer screening methods in a population-based study of 10,000 Costa Rican women. ACTA Cytological Abstract Presentation 43rd Annual Scientific Meeting 1995;39:983.-
22. Vassilakos P, Griffin S, Megevand E, Campana A. CytoRich liquid-based cervical cytologic test: screening results in a routine cytopathology service. Acta Cytologica 1998;42:198-202.
23. Zahniser DJ, Sullivan PJ. CYTYC corporation. Acta Cytologica 1996;40:37-44.
24. Vassilakos P, Saurel J, Rondez R. Direct-to-vial use of the AutoCyte PREP liquid-based preparation for cervical-vaginal specimens in three European laboratories. Acta Cytologica 1999;43:65-68.
25. Aponte-Cipriani SL, Teplitz C, Rorat E, Scaino A, Jacobs AJ. Cervical smears prepared by an automated device versus the conventional method: a comparative analysis. Acta Cytologica 1995;39:623-30.
26. Awen C, Hathway S, Eddy W, Voskuil R, Janes C. Efficacy of ThinPrep preparation of cervical smears: a 1,000-case, investigator-sponsored study. Diagn Cytopathol 1993;11:33-36.
27. Bishop JW. Comparison of the CytoRich system with conventional cervical cytology: preliminary data on 2,032 cases from a clinical trial site. Acta Cytologica 1997;41:15-23.
28. Bishop JW, Bigner SH, Colgan TJ, et al. Multicenter masked evaluation of AutoCyte PREP thin layers with matched conventional smears: including initial biopsy results. Acta Cytologica 1998;42:189-97.
29. Bolick DR, Hellman DJ. Laboratory implementation and efficacy assessment of the Thin Prep cervical cancer screening system. Acta Cytologica 1998;42:209-13.
30. Bur M, Knowles K, Pekow P, Corral O, Donovan J. Comparison of ThinPrep preparations with conventional cervicovaginal smears. Acta Cytologica 1995;39:631-42.
31. Candel A, Davis B, Baklios R, Selvaggi S. The ThinPrep Pap test: a cost savings perspective. Lab Invest 1998;78:36A.
32. Carpenter AB, Davey DD. Thin Prep Pap test: performance and biopsy follow-up in a university hospital. Cancer 1999;87:105-12.
33. Diaz-Rosario LA, Kabawa SE. Performance of a fluid-based, Thin-Layer Papanicolaou smear method in the clinical setting of an independent laboratory and an outpatient screening population in New England. Arch Pathol Lab Med 1999;123:817-21.
34. Emery J, Banks H, Holz J, DePriest P, Davey DD. The ThinPrep method for cervical-vaginal specimens in a high risk population. Acta Cytologica (Abstract presentation 45th annual scientific meeting) 1997;41:1579.
35. Ferenczy A, Franco E, Arseneau J, Wright TC, Richart RM. Diagnostic performance of hybrid capture human papillomavirus deoxyribonucleic acid assay combined with liquid based cytologic study. Am J Obstet Gynecol 1996;175:651-56.
36. Geyer JW, Hancock F, Carrico C, Kirkpatrick M. Preliminary evaluation of Cyto-Rich: an improved automated cytology preparation. Diagn Cytopathol 1993;9:417-22.
37. Guidos BJ, Selvaggi SM. Use of the ThinPrep Pap test in clinical practice. Diagn Cytopathol 1999;20:70-73.
38. Howell LP, Davis RL, Belk TI, Agdigos R, Lowe J. The AutoCyte preparation system for gynecologic cytology. Acta Cytologica 1998;42:171-77.
39. Hutchinson ML, Agarwal P, Denault T, Berger B, Cibas ES. A new look at cervical cytology: ThinPrep multicenter trial results. Acta Cytologica 1992;36:499-504.
40. Hutchinson ML, Zahniser DJ, Sherman ME, et al. Utility of liquid-based cytology for cervical carcinoma screening. Cancer Cytopathol 1999;87:48-55.
41. Johnson JE, Jones HW, Conrad KA, Huff BC. Increased rate of SIL detection with excellent biopsy correlation after implementation of direct-to-vial ThinPrep liquid-based preparation of cervicovaginal specimens at a university medical center. Acta Cytologica (Abstract presentation 46th scientific meeting) 1998;42:1242-43.
42. Laverty CRA, Farnsworth A, Thurloe JK, Grieves A, Bowditch R. Evaluation of the CytoRich slide preparation process. Analyt Quant Cytol Histol 1997;19:239-45.
43. Laverty CRA, Thurloe JK, Redman NL, Farnsworth A. An Australian trial of ThinPrep: a new cytopreparatory technique. Cytopathology 1995;6:140-48.
44. Lee KR, Ashfaqu R, Birdsong GG, Korkill ME, McIntosh KM, Inhorn SL. Comparison of conventional Papanicolaou smears and a fluid-based, thin-layer system for cervical cancer screening. Obstet Gynecol 1997;90:278-84.
45. McGoogan E, Reith A. Would monolayers provide more representative samples and improved preparations for cervical screening? Overview and evaluation of systems available. Acta Cytologica 1996;40:107-19.
46. Papillo JL, Zarka MA, St. John TL. Evaluation of the ThinPrep Pap test in clinical practice: a seven-month, 16,314-case experience in Northern Vermont. Acta Cytologica 1998;42:203-08.
47. Quddus MR, Xu B, Sung CJ, Boardman L, Lauchlan SC. Cytohisto correlations support the observation of increased detection of squamous intraepithelial lesions by the ThinPrep process. Acta Cytologica (Abstract presentation 46th annual scientific meeting) 1998;42:1243.
48. Radio SJ, Burns KR, Munch TM, Quasi VM, Bohl KD, Severson MA. Paired comparison of conventional and ThinPrep cervical cytology in a high risk population. Lab Invest 1998;78:42A.
49. Shield PW, Nolan GR, Phillips GE, Cummings MC. Improving cervical cytology screening in a remote, high risk population. MJA 1999;170:255-58.
50. Sprenger E, Schwarzmann P, Kirkpatrick M, et al. The false negative rate in cervical cytology: comparison of monolayers to conventional smears. Acta Cytologica 1996;40:81-89.
51. Stevens MW, Nespolon WW, Milne AJ, Rowland R. Evaluation of the CytoRich technique for cervical smears. Diagn Cytopathol 1998;18:236-42.
52. Vassilakos P, Cossali D, Albe X, Alonso L, Hohener R, Puget E. Efficacy of Monolayer preparations for cervical cytology: emphasis on suboptimal specimens. Acta Cytologica 1996;40:496-500.
53. Wang T-Y, Chen H-S, Yang Y-C, Tsou M-C. Comparison of fluid-based, Thin-Layer processing and conventional Papanicolaou methods for uterine cervical cytology. J Formos Med Assoc 1999;98:500-05.
54. Weintraub J. The coming evolution in cervical cytology: a pathologist’s guide for the clinician. En Gynecologie Obstetrique 1997;5:169-75.
55. Wilbur DC, Cibas ES, Merritt S, James LP, Berger BM, Bonfiglio TA. ThinPrep processor: clinical trials demonstrate an increased detection rate of abnormal cervical cytologic specimens. Am J Clin Pathol 1994;101:209-14.
56. Wilbur DC, Facik MC, Rutkowski MA, Mulford DK, Atkison KM. Clinical trials of the CytoRich specimen-preparation device for cervical cytology. Acta Cytologica 1997;41:24-29.
57. Wilbur DC, Dubesher B, Angel C, Atkison KM. Use of Thin-Layer preparations for gynecologic smears with emphasis on the cytomorphology of high-grade intraepithelial lesions and carcinomas. Diagn Cytopathol 1995;14:201-11.
58. Yang M, Zachariah S. Comparison of specimen adequacy between matched ThinPrep preparations and conventional cervicovaginal smears. Acta Cytologica (Abstract presentation 45th scientific meeting) 1997;41:1579.
59. Hutchinson ML, Cassin CM, Ball HG. The efficacy of an automated preparation device for cervical cytology. Am J Clin Pathol 1991;96:300-05.
60. Roberts J, Gurley AM, Thurloe JK, Bowditch R, Laverty CA. Evaluation of the ThinPrep test as an adjunct to the conventional Pap smear. MJA 1997;167:466-69.
61. McCrory D, Bastian D, et al. Evaluation of cervical cytology: evidence report/technology assessment no. 5 (Prepared by Duke University under contract no. 290-97-0014). Rockville, Md: Agency for Health Care Policy and Research; 1999.
62. Solomon D, Schiffman M, Tarone R. Comparison of three management strategies for patients with atypical squamous cells of undetermined significance: baseline results from a randomized trial. J Natl Cancer Inst 2001;93:293-99.
Unlocking Specialists’ Attitudes Toward Primary Care Gatekeepers
STUDY DESIGN: We performed a cross-sectional survey using a mailed questionnaire. The predictors of specialist attitudes toward gatekeepers were assessed using chi-square, the t test, and regression analyses.
POPULATION: A probability sample of 1492 eligible physicians practicing in urban counties in California in the specialties of cardiology, endocrinology, gastroenterology, general surgery, neurology, ophthalmology, and orthopedics was used.
OUTCOMES: We assessed specialists’ attitudes toward primary care physicians in the gatekeeper role. A summary score of attitudes was developed.
RESULTS: A total of 979 physicians completed the survey (66% response rate). Attitudes toward primary care physicians were mixed. Relative to nonsalaried physicians, those who were salaried had a somewhat more favorable attitude toward gatekeepers, although this difference was not statistically significant (P=.13); physicians with a greater percentage of income derived from capitation also had more favorable attitudes (P=.002).
CONCLUSIONS: Specialists’ attitudes toward the coordinating role of primary care physicians are influenced by the setting in which the specialists work and by financial interests that may be threatened by referral restrictions. Policies that promote alternatives to fee-for-service payment may generate a common sense of purpose among primary care physicians and specialists.
A well-functioning health system requires effective cooperation between primary care and specialist physicians. Tensions between these types of physicians seem to be increasing because of managed care plans, many of which rely on primary care physician gatekeepers to authorize visits to specialists, interrupting the direct access to specialists that many insured Americans expect. Researchers have investigated how gatekeeper policies are affecting primary care physicians and patients.1-11 However, little research has explored the attitudes of specialist physicians toward the changing role of their primary care counterparts.12,13 Some specialists appear to be troubled by gatekeeper policies, viewing primary care physicians as their competitors rather than their colleagues.14-17
Professional organizations representing specialists have advocated for “direct access” legislation that would require health plans to permit patients to visit a specialist without first contacting a primary care physician. These groups have promoted such legislation as important for ensuring quality of care. However, more than the patient’s welfare may be at stake in this policy debate. Gatekeeper policies that reduce use of specialist services may also reduce specialist income, particularly when those physicians are paid on a fee-for-service basis.
We surveyed specialist physicians in California to investigate their attitudes toward primary care physicians acting in a gatekeeper role. We explored whether specialist attitudes differed depending on the setting in which the physician practiced and how the physician was paid. We hypothesized that those specialists compensated mainly on a fee-for-service basis would be more financially threatened by gatekeeper policies and would therefore have less favorable attitudes toward primary care physicians in this role. We also hypothesized that specialists working in larger group practice settings would have more collegial relationships with primary care associates that would promote more favorable attitudes.
Methods
In 1998 we mailed self-administered questionnaires to specialist physicians practicing in the 13 largest urban counties in California (Alameda, Contra Costa, Fresno, Los Angeles, Orange, Riverside, San Bernardino, San Diego, Sacramento, San Francisco, San Mateo, Santa Clara, and Solano). The study counties contained 79% of California’s practicing specialist physicians and 79% of the state’s population. The physicians were identified from the American Medical Association (AMA) physician masterfile. The masterfile contains continuously updated information on all US allopathic physicians and many osteopathic physicians, including those who are not AMA members. To be eligible for the survey, physicians had to be listed as providing direct patient care, not in training, and not employed by the federal government.
Specialists were sampled who listed their primary specialty as cardiology, endocrinology, gastroenterology, general surgery, neurology, ophthalmology, or orthopedics. These specialties were chosen to provide a broad spectrum of both surgical and medical office-based subspecialties, and to represent most of the largest non–primary care office-based specialties in California. Physicians were selected using a probability sample stratified by specialty (250 physicians in each specialty) and physician race/ethnicity (nonwhite physicians were oversampled). To develop a valid set of questions, we first pilot-tested our questionnaire on a group of 10 specialty physicians. The questionnaire included items on physician demographics, practice setting, number of physicians in the practice, and modes of payment. For analyzing payment modes, physicians were first categorized as salaried or nonsalaried; those who were nonsalaried were asked to indicate the percentage of their practice income derived from fee-for-service and capitated payment.
The questionnaire included a series of items about specialists’ attitudes toward primary care physicians in the gatekeeper role. The specialists were asked to respond to each of the following statements with “strongly agree,” “agree,” “disagree,” or “strongly disagree”: “The involvement of a primary care gatekeeper in the care of the patients I see: (1) undermines my relationships with patients; (2) makes it more difficult to order expensive tests or procedures; (3) decreases freedom to make clinical decisions; (4) increases the likelihood that patients will receive preventive care; and (5) improves the coordination of patient care.” For simple descriptive analysis of the individual gatekeeper items, responses were collapsed into dichotomous categories of “agree” or “disagree.”
In addition to analyzing individual attitude items, we created a summary Attitude Toward Gatekeepers scale. To create this scale, individual attitude items worded in a negative direction (eg, the gatekeeper undermines my relationship with patients) were scored so that a score of 4 indicated maximal disagreement. Items worded in a positive direction (eg, the gatekeeper increases the likelihood that patients will receive preventive care) were scored with 4 representing maximal agreement. The summary Attitude Toward Gatekeepers score was then computed by calculating the mean of the 5 separate gatekeeper items for each physician. This summary scale had a range of 1 to 4, with 2.5 indicating a neutral summary attitude. The Cronbach α for the summary scale was 0.75, indicating acceptable scale properties.
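The scoring rule can be sketched in Python. This is a minimal illustration, not the authors' code: the numeric coding of the four response labels (1 = strongly disagree to 4 = strongly agree) and the function names are assumptions. Negatively worded items 1 to 3 are reverse-scored so that 4 is always the response most favorable toward gatekeepers, the summary score is the mean of the 5 items, and Cronbach α uses the standard formula α = k/(k - 1) × (1 - sum of item variances / variance of the total score).

```python
from statistics import mean, variance

# Assumed numeric coding of the four response labels (not stated in the article).
AGREEMENT = {"strongly disagree": 1, "disagree": 2, "agree": 3, "strongly agree": 4}

# Items 1-3 (indices 0-2) are worded negatively; items 4-5 positively.
NEGATIVE_ITEMS = {0, 1, 2}


def item_scores(responses):
    """Score 5 responses so that 4 is always the most favorable toward gatekeepers."""
    scores = []
    for i, label in enumerate(responses):
        agree = AGREEMENT[label]
        # Reverse negative items: maximal disagreement (1) becomes 4.
        scores.append(5 - agree if i in NEGATIVE_ITEMS else agree)
    return scores


def summary_score(responses):
    """Mean of the 5 scored items: range 1 to 4, 2.5 is neutral."""
    return mean(item_scores(responses))


def cronbach_alpha(score_matrix):
    """score_matrix: one list of k scored items per respondent."""
    k = len(score_matrix[0])
    item_vars = [variance(column) for column in zip(*score_matrix)]
    total_var = variance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For example, a respondent who answers “agree” to all 5 statements scores 2 on each negative item and 3 on each positive item, for a summary score of 2.4.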
Analysis
Mean scores for the summary Attitude Toward Gatekeeper scale were compared according to physician demographics and practice characteristics using t tests and analysis of variance. For these unadjusted analyses, the physician payment variable was classified into 3 mutually exclusive categories: salaried, capitated for 40% or more of practice income, or fee-for-service payment accounting for 61% or more of practice income. We based the 40% threshold for capitated income on the assumption that this degree of capitation would be sufficient to change the underlying financial incentive experienced by physicians in regard to referral visit volume.
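The three-category payment classification used for the unadjusted analyses can be expressed as a small function. This sketch assumes, as the text implies, that a nonsalaried physician's practice income is split between fee-for-service and capitated payment, so that capitation below 40% leaves fee-for-service at more than 60% (61% or more when percentages are reported as whole numbers). The function and argument names are illustrative.

```python
def payment_category(salaried: bool, capitation_pct: int) -> str:
    """Assign one of the 3 mutually exclusive payment groups.

    Assumes capitation_pct is a whole-number percentage of practice income and
    that the rest of a nonsalaried physician's income is fee-for-service.
    """
    if salaried:
        return "salaried"
    if capitation_pct >= 40:
        return "capitated"
    # Capitation of 39% or less implies fee-for-service of 61% or more.
    return "fee-for-service"


def fee_for_service_pct(capitation_pct: int) -> int:
    """Hypothetical helper under the two-source income assumption above."""
    return 100 - capitation_pct
```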
We also performed least squares regression analysis to investigate the independent association of physician and practice variables with the summary Attitude Toward Gatekeeper scale. All physician demographic and practice variables were entered into the regression equation, regardless of their significance on unadjusted analysis. For the regression model, payment mode was categorized in a manner different from that used for the unadjusted analyses. First, a dummy variable was created indicating whether the physician was salaried or nonsalaried. A second variable was included in the model indicating the percentage of practice income attributable to capitated payment. For salaried physicians, the value of this continuous capitation income variable was set at 0. This approach results in an interpretation of the coefficient for the salaried variable indicating the change in gatekeeper attitude scale score for salaried physicians relative to nonsalaried physicians with only fee-for-service payment.
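The payment coding in the regression model can be illustrated with a minimal ordinary least squares fit. This sketch is not the authors' model: it omits the demographic and practice covariates, uses hypothetical noise-free scores generated from made-up coefficients, and solves the normal equations directly with the Python standard library. Because the capitation share is set to 0 for salaried physicians, the intercept is the predicted score for a nonsalaried physician paid entirely by fee-for-service, the salaried coefficient is the shift relative to that physician, and the capitation coefficient is the change per percentage point of capitated income.

```python
def design_row(salaried, capitation_pct):
    """[intercept, salaried dummy, capitation %]; capitation forced to 0 if salaried."""
    return [1.0, 1.0 if salaried else 0.0, 0.0 if salaried else float(capitation_pct)]


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    a = [r + [v] for r, v in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical coefficients, used only to generate illustrative scores.
TRUE_B = [2.2, 0.3, 0.01]
physicians = [(False, 0), (False, 20), (False, 50), (False, 80), (True, 0), (True, 0)]
X = [design_row(s, c) for s, c in physicians]
y = [sum(b * x for b, x in zip(TRUE_B, row)) for row in X]
b0, b_salaried, b_capitation = ols(X, y)
```

With noise-free data the fit recovers the generating coefficients, which makes the interpretation of each term easy to check.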
Data were also analyzed after being weighted to account for the oversampling of nonwhite physicians and for the differences in sampling proportions among the different specialties relative to the overall population of physicians in each specialty in the study counties. The results were almost identical with weighted and unweighted data; we therefore present results only from the simpler unweighted analyses.
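The weighting step can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: each stratum (for example, a specialty and race/ethnicity combination) is assumed to receive a design weight equal to its population count divided by its sample count, and the weighted mean divides the weighted sum of scores by the sum of weights. The stratum labels and counts are hypothetical.

```python
from statistics import mean


def design_weights(population_counts, sample_counts):
    """Inverse sampling fraction per stratum: N_h / n_h."""
    return {h: population_counts[h] / sample_counts[h] for h in sample_counts}


def weighted_mean(records, weights):
    """records: list of (stratum, score) pairs."""
    total_w = sum(weights[h] for h, _ in records)
    return sum(weights[h] * score for h, score in records) / total_w


# Hypothetical strata: stratum "B" is oversampled relative to its population share.
population = {"A": 300, "B": 100}
sample = {"A": 3, "B": 2}
records = [("A", 2.0), ("A", 2.0), ("A", 2.0), ("B", 3.0), ("B", 3.0)]

weights = design_weights(population, sample)
unweighted = mean(score for _, score in records)
weighted = weighted_mean(records, weights)
```

In this toy example oversampling stratum B pulls the unweighted mean up to 2.4, while the weighted mean is 2.25. In the study itself, the weighted and unweighted results were almost identical.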
The University of California, San Francisco, Committee on Human Research reviewed and approved the study protocol.
Results
Of the initial sample of 1750 physicians, 258 were subsequently determined to be ineligible, primarily due to death, retirement, or moving out of the study counties. Completed questionnaires were obtained from 979 of the 1492 eligible specialist physicians (66%). Sixteen of those responding worked in public clinics or other practice settings, such as schools or jails. Given the uniqueness of their practice settings and the small number of physicians there, we excluded these 16 and analyzed the responses of the remaining 963.
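The sample flow can be checked with simple arithmetic. The 250 physicians in each of the 7 sampled specialties come from the Methods; the function name is illustrative.

```python
def sample_flow(specialties=7, per_specialty=250, ineligible=258,
                completed=979, excluded_settings=16):
    initial = specialties * per_specialty     # stratified sample drawn
    eligible = initial - ineligible           # ineligible mainly from death, retirement, or relocation
    response_rate = completed / eligible      # share of eligible physicians who responded
    analyzed = completed - excluded_settings  # public clinics, schools, jails excluded
    return initial, eligible, response_rate, analyzed


initial, eligible, response_rate, analyzed = sample_flow()
print(initial, eligible, f"{response_rate:.1%}", analyzed)  # 1750 1492 65.6% 963
```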
The characteristics of the physician respondents are shown in Table 1. Most (73%) were in solo or small office-based group practices of 2 to 10 physicians. Most had fee-for-service payment as their dominant payment method. One fourth were paid on a salaried basis, and 16% had at least 40% of their practice income paid by capitation.
Attitudes toward primary care physicians in the gatekeeper role were mixed (Table 2). Almost half (44%) agreed that the gatekeeper undermines a specialist’s relationship with patients. Fifty-six percent agreed that the gatekeeper makes it more difficult to order expensive tests or procedures, and two thirds agreed that the gatekeeper decreases the freedom of the specialist to make clinical decisions. In response to the attitude items positing beneficial effects of a primary care gatekeeper arrangement, 40% agreed that the primary care gatekeeper improves coordination of care, and half agreed that the gatekeeper increases the likelihood that the patient will receive preventive care. When all 5 questions were combined into a single summary scale, the general attitude of the specialist physicians toward primary care gatekeepers was essentially neutral, with a mean among all specialists of 2.4 (standard deviation=0.69) on a scale of 1 to 4.
On unadjusted analyses, practice setting and payment method were the strongest predictors of the summary Attitude Toward Gatekeeper score (Table 3). Specialists in solo practice exhibited the most negative attitudes. The attitudes of specialists in small (2-10 physicians) and medium-sized (11-50 physicians) group practice settings were only slightly more favorable. Attitudes were much more positive among specialists working in large practice settings (>50 physicians) and especially among physicians working in group-model health maintenance organizations (P<.001 for overall difference across practice settings). Method of payment was also significantly associated with specialist attitudes (P<.001 for differences across payment categories). Salaried physicians demonstrated the most favorable attitudes toward gatekeepers and fee-for-service specialists the least favorable attitudes. Those specialists classified as capitated were on average neutral in their views of gatekeepers.
Mean values for the summary gatekeeper attitude score also differed among specialty groups. Mean scores ranged from a high of 2.58 among gastroenterologists to a low of 2.15 among ophthalmologists and 2.23 among orthopedists (P<.001). Female specialists and those who were younger also had significantly more favorable mean gatekeeper attitude scores.
In the multivariate regression analysis, practice setting remained strongly predictive of attitudes (Table 4). Relative to specialists in solo practice, those in groups of more than 50 physicians had a gatekeeper attitude score nearly half a point more favorable, and specialists in group-model health maintenance organizations (HMOs) had attitude scores nearly a full point more favorable. Relative to nonsalaried physicians, those who were salaried had a somewhat more favorable attitude toward gatekeepers, although the salaried payment variable did not achieve statistical significance (P=.13) in the adjusted analysis. However, the percentage of practice income derived from capitation remained significantly associated with attitudes (P=.002) in the regression analysis. The larger the proportion of income a specialist received from capitation, the more positive the attitudes. Stated inversely, the more a specialist was paid on a fee-for-service basis, the more negative his or her attitudes were toward primary care gatekeepers.
Few other variables included in the regression analysis were statistically significant independent predictors of the gatekeeper score. In the regression model, ophthalmologists remained significantly more negative in their attitudes toward gatekeepers than the other specialists (P=.01), and male physicians remained more negative in their attitudes than women (P=.04; data for these variables not shown). Interestingly, specialist during the previous year was not a significant predictor of attitude. The regression model explained 26% of the variation in the summary gatekeeper score.
Discussion
The role of primary care physicians in the US health care system continues to evolve. Although there is widespread support for many of the core values of primary care, there is also apprehension about policies that insist that primary care physicians authorize access to specialists—particularly when primary care physicians or commercial health plans may financially profit by economizing on specialty services.
Research on patient attitudes toward the gatekeeping role of primary care physicians has shown that while they value the comprehensive and coordinating role of primary care physicians, perceptions of referral barriers are one of the strongest predictors of patients giving their primary care physician low trust, confidence, and satisfaction ratings.9 Similarly, studies have indicated that primary care physicians often have ambivalent attitudes about performing gatekeeping functions such as mandatory authorization of all specialist referrals.3,6,11
Our study extends this previous research and demonstrates that specialist physicians also tend to have ambivalent attitudes about the gatekeeping role of primary care physicians. Many specialists in our survey agreed that primary care gatekeepers infringe on specialists’ clinical autonomy and their relationships with patients. However, half also acknowledged that primary care physician gatekeepers increase delivery of preventive services, and 40% agreed that coordination of care is enhanced by the involvement of a primary care gatekeeper.
Overall, specialist attitudes toward primary care physicians acting as gatekeepers were not uniformly negative. Many specialists appear to appreciate the advantages of having a primary care physician to help integrate services.
Our study indicates that specialists’ attitudes toward primary care gatekeepers differ significantly according to how the specialists are paid and the setting in which they practice. Payment methods such as salary and capitation that eliminate or markedly reduce the direct link between volume of referral visits and specialist income appear to promote a more favorable attitude among specialists toward primary care gatekeepers. This finding suggests that the objection of some specialists to a gatekeeping role for primary care physicians may at least in part be due to concerns about possible loss of income under fee-for-service arrangements. It is possible that specialists paid by salary or capitation perceive that a more prominent coordinating role for primary care physicians may be to their professional benefit by reducing inappropriate referrals that bring no additional income to the practice.
Specialists working in larger and more organized practice settings also have more favorable views of primary care gatekeepers. This association between practice setting and attitudes may be partly explained by the fact that physicians in larger groups and group-model HMOs are more likely to be paid on a salaried basis. However, even after adjusting for payment method in regression analysis, practice setting remained predictive of attitudes toward gatekeepers. It is likely that physicians in larger office-based groups and group-model HMOs work in a multispecialty context that promotes a more collaborative and interdependent approach to practice across specialties. This organizational culture may attenuate conflicts between specialty groups about scope of practice, patient allegiances, and the appropriate role of each specialty within the overall system of care.
Our study has several policy implications. Our results indicate that specialists vary in their attitudes toward the gatekeeping role of primary care physicians and that negative attitudes are not necessarily an immutable characteristic of being a specialist. Attitudes appear to be shaped at least in part by the specialists’ financial interests that may be threatened by restrictions on referrals and by the system in which they practice. Policies that promote alternatives to fee-for-service payment and shift specialists away from solo practice toward larger, organized group practice settings may also encourage them to adopt more positive attitudes about the role of primary care physicians as coordinators of care. Integrated work environments may generate a common sense of purpose, stemming in part from physical proximity that facilitates communication and cooperation.
Limitations
Several limitations of our study are worth noting. Our study was limited to physicians in California. Although California has one of the most competitive managed care markets in the United States and may exemplify trends occurring in other states with active managed care markets, results may not necessarily be generalizable to physicians working in other states. Our main study variable—attitudes toward primary care gatekeepers—is a subjective measure. The wording of our main study question specifically highlighted primary care physicians in a gatekeeper role. The response to this question is therefore not necessarily indicative of the attitudes of specialists toward primary care physicians in general. Interpretation of the word “gatekeeper” was left up to the respondent. Finally, as in all observational studies, causal inferences must be made with caution. We detected strong associations between payment method and practice setting and specialists’ attitudes toward gatekeepers. Although it is plausible that payment incentives and practice environment influence specialist attitudes, it is also possible that specialists who have different underlying values are attracted to different types of practice settings and payment arrangements. For example, salaried group-model HMOs may attract specialists who already have relatively favorable attitudes toward primary care gatekeeping, rather than (or in addition to) that culture promoting a more favorable attitude. Solo practice, in contrast, may attract physicians who are more independent and predisposed to perceive the gatekeeper role as adversarial.
Conclusions
In the US health care system gatekeeping remains controversial. Specialist ambivalence toward gatekeeper models may undermine the legitimacy of a more primary care–focused system. Health systems with strong foundations in primary care appear to produce better patient outcomes than systems that do not promote such primary care elements as continuity and coordination of care.18 Models of care that promote integration and coordination by primary care physicians without emphasizing a restricting role may decrease tensions among physicians. Organizational structures and payment methods that minimize conflict between primary care physicians and specialists will be essential to the further development of an integrated health care system.19 Future health policies will need to consider how to encourage cooperation between primary care physicians and specialists to best meet the needs of the patient.
Acknowledgments
This work was supported by the Bureau of Health Professions, HRSA (Grant 5 U76 MB 10001). The authors thank Dennis Keane, MPH, and Deborah Jaffe for their assistance with survey administration; Art Munger for assistance with manuscript preparation; Norman Hearst, MD, MPH, for his comments on early drafts; and the physicians who participated in the study.
1. Feldman SR, Fleischer AB Jr, Chen JG. The gatekeeper model is inefficient for the delivery of dermatologic services. J Am Acad Dermatol 1999;40:426-32.
2. Donelan K, Blendon RJ, Lundberg GD, et al. The new medical marketplace: physicians’ views. Health Aff 1997;16:139-48.
3. Ellsbury KE, Montano DE, Manders D. Primary care physician attitudes about gatekeeping. J Fam Pract 1987;25:616-19.
4. Kulu-Glasgow I, Delnoij D, de Bakker D. Self-referral in a gatekeeping system: patients’ reasons for skipping the general-practitioner. Health Policy 1998;45:221-38.
5. St. Peter RF. Access to specialists: perspectives of patients and primary care physicians. Data Bulletin Fall 1997;2:1-2.
6. Taylor TR. Pity the poor gatekeeper: a transatlantic perspective on cost containment in clinical practice. BMJ 1989;299:1323-25.
7. Schultz R, Girard C, Scheckler WE. Physician satisfaction in a managed care environment. J Fam Pract 1992;34:298-304.
8. Grumbach K, Osmond D, Vranzan K, Jaffe D, Bindman AB. Primary care physicians’ experience of financial incentives in managed-care systems. N Engl J Med 1998;339:1516-21.
9. Grumbach K, Selby JV, Damberg C, et al. Resolving the gatekeeper conundrum: what patients value in primary care and referrals to specialists. JAMA 1999;282:261-66.
10. Kerr EA, Hays RD, Lee ML, Siu AL. Does dissatisfaction with access to specialists affect the desire to leave a managed care plan? Med Care Res Rev 1998;55:59-77.
11. Halm EA, Causino N, Blumenthal D. Is gatekeeping better than traditional care? A survey of physicians’ attitudes. JAMA 1997;278:1677-81.
12. Feldman DS, Novack DH, Gracely E. Effects of managed care on physician-patient relationships, quality of care, and the ethical practice of medicine. Arch Intern Med 1998;158:1626-32.
13. Marshall MN. How well do GPs and hospital consultants work together? A survey of the professional relationship. Fam Pract 1999;16:33-8.
14. Bodenheimer T, Lo B, Casalino L. Primary care physicians should be coordinators, not gatekeepers. JAMA 1999;281:2045-49.
15. De Guzman MM. Are specialists staging a comeback? Health Syst Lead 1997;4:4-13.
16. Beard PL. Specialty empowerment: a new trend in managed care. Healthc Financ Manage 1998;52:62-4.
17. Bodenheimer T. The American health care system: physicians and the changing medical marketplace. N Engl J Med 1999;340:584-88.
18. Starfield B. Is primary care essential? Lancet 1994;344:1129-33.
19. Herd B, Herd A, Mathers N. The wizard and the gatekeeper: of castles and contracts. BMJ 1995;310:1042-44.
STUDY DESIGN: We performed a cross-sectional survey using a mailed questionnaire. The predictors of specialist attitudes toward gatekeepers were measured using chi-square, the t test, and regression analyses.
POPULATION: A probability sample of 1492 physicians in urban counties in California in the specialties of cardiology, endocrinology, gastroenterology, general surgery, neurology, ophthalmology, and orthopedics was used.
OUTCOMES: We assessed specialists’ attitudes toward primary care physicians in the gatekeeper role. A summary score of attitudes was developed.
RESULTS: A total of 979 physicians completed the survey (66%). Attitudes toward primary care physicians were mixed. Relative to nonsalaried physicians, those who were salaried had a somewhat more favorable attitude toward gatekeepers (P=.13), as did physicians with a greater percentage of income derived from capitation (P=.002).
CONCLUSIONS: Specialists’ attitudes toward the coordinating role of primary care physicians are influenced by the setting in which the specialists work and by financial interests that may be threatened by referral restrictions. Policies that promote alternatives to fee for service may generate a common sense of purpose among primary care physicians and specialists.
A well-functioning health system requires effective cooperation between primary care and specialist physicians. Tensions between these types of physicians seem to be increasing because of managed care plans, many of which rely on primary care physician gatekeepers to authorize visits to specialists, interrupting the direct access to specialists that many insured Americans expect. Researchers have investigated how gatekeeper policies are affecting primary care physicians and patients.1-11 However, little research has explored the attitudes of specialist physicians toward the changing role of their primary care counterparts.12,13 Some specialists appear to be troubled by gatekeeper policies, viewing primary care physicians as their competitors rather than their colleagues.14-17
Professional organizations representing specialists have advocated for “direct access” legislation that would require health plans to permit patients to visit a specialist without first contacting a primary care physician. These groups have promoted such legislation as important for ensuring quality of care. However, more than the patient’s welfare may be at stake in this policy debate. Gatekeeper policies that potentially reduce use of specialist services may also reduce specialist income, particularly when specialists are paid on a fee-for-service basis.
We surveyed specialist physicians in California to investigate their attitudes toward primary care physicians acting in a gatekeeper role. We explored whether specialist attitudes differed depending on the setting in which the physician practiced and how the physician was paid. We hypothesized that those specialists compensated mainly on a fee-for-service basis would be more financially threatened by gatekeeper policies and would therefore have less favorable attitudes toward primary care physicians in this role. We also hypothesized that specialists working in larger group practice settings would have more collegial relationships with primary care associates that would promote more favorable attitudes.
Methods
In 1998 we mailed self-administered questionnaires to specialist physicians practicing in the 13 largest urban counties in California (Alameda, Contra Costa, Fresno, Los Angeles, Orange, Riverside, San Bernardino, San Diego, Sacramento, San Francisco, San Mateo, Santa Clara, and Solano). The study counties contained 79% of California’s practicing specialist physicians and 79% of the state’s population. The physicians were identified from the American Medical Association (AMA) physician masterfile. The masterfile contains continuously updated information on all US allopathic physicians and many osteopathic physicians, including those who are not AMA members. To be eligible for the survey, physicians had to be listed as providing direct patient care, not in training, and not employed by the federal government.
We sampled specialists whose listed primary specialty was cardiology, endocrinology, gastroenterology, general surgery, neurology, ophthalmology, or orthopedics. These specialties were chosen to span both surgical and medical office-based subspecialties and to represent most of the largest non–primary care office-based specialties in California. Physicians were selected using a probability sample stratified by specialty (250 physicians in each specialty) and by physician race/ethnicity (nonwhite physicians were oversampled). To develop a valid set of questions, we first pilot-tested the questionnaire with 10 specialty physicians. The questionnaire included items on physician demographics, practice setting, number of physicians in the practice, and modes of payment. For analyzing payment modes, physicians were first categorized as salaried or nonsalaried; nonsalaried physicians were asked to indicate the percentage of their practice income derived from fee-for-service and capitated payment.
The questionnaire included a series of items about specialists’ attitudes toward primary care physicians in the gatekeeper role. The specialists were asked to respond to each of the following statements with “strongly agree,” “agree,” “disagree,” or “strongly disagree”: “The involvement of a primary care gatekeeper in the care of the patients I see: (1) undermines my relationships with patients; (2) makes it more difficult to order expensive tests or procedures; (3) decreases freedom to make clinical decisions; (4) increases the likelihood that patients will receive preventive care; and (5) improves the coordination of patient care.” For simple descriptive analysis of the individual gatekeeper items, responses were collapsed into dichotomous categories of “agree” or “disagree.”
In addition to analyzing individual attitude items, we created a summary Attitude Toward Gatekeepers scale. To create this scale, items worded in a negative direction (eg, the gatekeeper undermines my relationship with patients) were scored so that 4 indicated maximal disagreement, and items worded in a positive direction (eg, the gatekeeper increases the likelihood that patients will receive preventive care) were scored so that 4 indicated maximal agreement. On every item, therefore, a higher score reflects a more favorable view of gatekeepers. The summary Attitude Toward Gatekeepers score was computed as the mean of the 5 item scores for each physician. The summary scale ranged from 1 to 4, with 2.5 indicating a neutral attitude. The Cronbach α for the summary scale was 0.75, indicating acceptable internal consistency.
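The scoring rule described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes raw responses were coded 1 = strongly disagree through 4 = strongly agree (the text does not state the raw coding) and that items (1) to (3) are the negatively worded ones. The function names are hypothetical. The sketch also applies the standard Cronbach α formula, α = k/(k−1) × (1 − sum of item variances / variance of the total score), to the oriented items.

```python
from statistics import mean, pvariance

# Assumed raw coding (not stated in the study):
# 1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree.
# Items (1)-(3) are negatively worded; items (4)-(5) are positively worded.
NEGATIVE_ITEMS = {0, 1, 2}  # zero-based positions of items (1)-(3)


def orient_items(raw: list[int]) -> list[int]:
    """Rescore the 5 items so that 4 is always the most favorable toward gatekeepers."""
    if len(raw) != 5 or any(r not in (1, 2, 3, 4) for r in raw):
        raise ValueError("expected 5 responses coded 1-4")
    # Reverse-score negative items: 1<->4, 2<->3.
    return [5 - r if i in NEGATIVE_ITEMS else r for i, r in enumerate(raw)]


def summary_score(raw: list[int]) -> float:
    """Mean of the 5 oriented items: range 1-4, 2.5 = neutral."""
    return float(mean(orient_items(raw)))


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach alpha of the oriented items across respondents."""
    items = [orient_items(r) for r in responses]
    k = len(items[0])
    # Population and sample variances give the same alpha,
    # because the n/(n-1) factor cancels in the ratio.
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    print(summary_score([4, 4, 4, 1, 1]))  # 1.0: least favorable possible
    print(summary_score([1, 1, 1, 4, 4]))  # 4.0: most favorable possible
```

A respondent who strongly agrees with all three negative statements and strongly disagrees with both positive ones receives the minimum summary score of 1, which is why 2.5, the midpoint of the 1 to 4 range, marks a neutral attitude.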
Analysis
Mean scores for the summary Attitude Toward Gatekeeper scale were compared according to physician demographics and practice characteristics using t tests and analysis of variance. For these unadjusted analyses, the physician payment variable was classified into 3 mutually exclusive categories: salaried, capitated for 40% or more of practice income, or fee-for-service payment accounting for 61% or more of practice income. We based the 40% threshold for capitated income on the assumption that this degree of capitation would be sufficient to change the underlying financial incentive experienced by physicians in regard to referral visit volume.
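A minimal sketch of the three-category payment classification used in the unadjusted analyses, assuming nonsalaried respondents report whole-number percentages for fee-for-service and capitated income. The function name and the `None` fallback for respondents who meet neither threshold are hypothetical; the text does not say how such cases were handled.

```python
from typing import Optional


def payment_category(salaried: bool, pct_capitation: float, pct_ffs: float) -> Optional[str]:
    """Assign the 3 mutually exclusive payment categories of the unadjusted analyses."""
    if salaried:
        return "salaried"
    if pct_capitation >= 40:
        # 40% threshold: assumed sufficient to change referral-volume incentives.
        return "capitated"
    if pct_ffs >= 61:
        return "fee-for-service"
    # Hypothetical fallback: shares that fit neither threshold (eg, other
    # payment sources) are left unclassified in this sketch.
    return None


if __name__ == "__main__":
    print(payment_category(False, 39, 61))  # fee-for-service
```

If a nonsalaried physician's fee-for-service and capitation shares sum to 100%, any capitation share below 40% implies a fee-for-service share of at least 61%, so the three categories are then exhaustive as well as mutually exclusive.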
We also performed least squares regression analysis to investigate the independent association of physician and practice variables with the summary Attitude Toward Gatekeeper scale. All physician demographic and practice variables were entered into the regression equation, regardless of their significance on unadjusted analysis. For the regression model, payment mode was coded differently than in the unadjusted analyses. First, a dummy variable indicated whether the physician was salaried or nonsalaried. Second, a continuous variable indicated the percentage of practice income attributable to capitated payment; for salaried physicians, this variable was set at 0. With this coding, the coefficient for the salaried variable represents the difference in gatekeeper attitude score between salaried physicians and nonsalaried physicians with only fee-for-service payment.
Data were also analyzed after weighting to account for the oversampling of nonwhite physicians and for differences in sampling proportions among specialties relative to the overall population of physicians in each specialty in the study counties. Because the results were almost identical with weighted and unweighted data, we present only the simpler unweighted analyses.
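The payment coding used in the regression model can be illustrated with a short sketch. The names are hypothetical, the coefficients in the demonstration are made-up placeholders rather than estimates from this study, and all other covariates are assumed held fixed and folded into the intercept. Because the capitation variable is 0 for salaried physicians, the predicted difference between a salaried physician and a nonsalaried physician with no capitated income equals the salaried coefficient alone.

```python
from typing import NamedTuple


class PaymentCoding(NamedTuple):
    salaried: int          # dummy variable: 1 = salaried, 0 = nonsalaried
    pct_capitation: float  # percentage of practice income from capitation (0-100)


def code_payment(is_salaried: bool, pct_capitation: float = 0.0) -> PaymentCoding:
    """Code payment mode as two regression variables."""
    if is_salaried:
        # Salaried physicians have the capitation variable set to 0.
        return PaymentCoding(salaried=1, pct_capitation=0.0)
    return PaymentCoding(salaried=0, pct_capitation=pct_capitation)


def predicted_score(intercept: float, b_salaried: float, b_capitation: float,
                    coding: PaymentCoding) -> float:
    """Linear predictor for the payment terms, other covariates held fixed."""
    return (intercept
            + b_salaried * coding.salaried
            + b_capitation * coding.pct_capitation)


if __name__ == "__main__":
    # Placeholder coefficients for illustration only (not study estimates).
    b0, b_sal, b_cap = 2.0, 0.2, 0.005
    salaried = predicted_score(b0, b_sal, b_cap, code_payment(True))
    all_ffs = predicted_score(b0, b_sal, b_cap, code_payment(False, 0.0))
    print(round(salaried - all_ffs, 10))  # equals b_sal
```

Under this coding, the capitation coefficient is the change in predicted score per percentage point of capitated income among nonsalaried physicians.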
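The text does not describe how the weights were constructed. One common approach, shown here only as an assumption, weights each respondent by the inverse of the sampling fraction in his or her stratum (stratum population count divided by stratum sample count), so that oversampled strata count less. The stratum labels, counts, and function names are hypothetical.

```python
def design_weights(pop_counts: dict[str, int], sample_counts: dict[str, int]) -> dict[str, float]:
    """Inverse-sampling-fraction weight per stratum: N_h / n_h."""
    return {h: pop_counts[h] / sample_counts[h] for h in sample_counts}


def weighted_mean(values: list[float], strata: list[str], weights: dict[str, float]) -> float:
    """Mean score with each respondent weighted by his or her stratum weight."""
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    total_weight = sum(weights[h] for h in strata)
    return sum(weights[h] * v for v, h in zip(values, strata)) / total_weight


if __name__ == "__main__":
    # Hypothetical strata: A sampled at 1 in 3, B sampled in full,
    # so B is oversampled relative to A.
    w = design_weights({"A": 30, "B": 10}, {"A": 10, "B": 10})
    print(w)                                         # {'A': 3.0, 'B': 1.0}
    print(weighted_mean([4.0, 2.0], ["A", "B"], w))  # 3.5
```

When weighted and unweighted means are nearly equal, as the authors report, the oversampled strata did not differ much in their attitude scores from the rest of the sample.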
The University of California, San Francisco, Committee on Human Research reviewed and approved the study protocol.
Results
Of the initial sample of 1750 physicians, 258 were subsequently determined to be ineligible, primarily because of death, retirement, or moving out of the study counties. Completed questionnaires were obtained from 979 of the 1492 eligible specialist physicians (66%). Sixteen respondents worked in public clinics or other practice settings, such as schools or jails. Because these settings were atypical and the number of physicians in them was small, we excluded these 16 and analyzed the responses of the remaining 963.
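The sample flow reported above can be checked arithmetically; the initial 1750 corresponds to 7 specialties × 250 physicians each, as described in Methods.

```latex
\begin{aligned}
N_{\text{sampled}} &= 7 \times 250 = 1750\\
N_{\text{eligible}} &= 1750 - 258 = 1492\\
\text{Response rate} &= \tfrac{979}{1492} \approx 0.656 \approx 66\%\\
N_{\text{analyzed}} &= 979 - 16 = 963
\end{aligned}
```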
The characteristics of the physician respondents are shown in Table 1. Most (73%) were in solo or small office-based group practices of 2 to 10 physicians. Most had fee-for-service payment as their dominant payment method. One fourth were paid on a salaried basis, and 16% had at least 40% of their practice income paid by capitation.
Attitudes toward primary care physicians in the gatekeeper role were mixed (Table 2). Almost half (44%) agreed that the gatekeeper undermines a specialist’s relationship with patients. Fifty-six percent agreed that the gatekeeper makes it more difficult to order expensive tests or procedures, and two thirds agreed that the gatekeeper decreases the freedom of the specialist to make clinical decisions. In response to the attitude items positing beneficial effects of a primary care gatekeeper arrangement, 40% agreed that the primary care gatekeeper improves coordination of care, and half agreed that the gatekeeper increases the likelihood that the patient will receive preventive care. When all 5 questions were combined into a single summary scale, the general attitude of the specialist physicians toward primary care gatekeepers was essentially neutral, with a mean among all specialists of 2.4 (standard deviation=0.69) on a scale of 1 to 4.
On unadjusted analyses, practice setting and payment method were the strongest predictors of the summary Attitude Toward Gatekeeper score (Table 3). Specialists in solo practice had the most negative attitudes. The attitudes of specialists in small (2-10 physicians) and medium-sized (11-50 physicians) group practice settings were only slightly more favorable. Attitudes were much more positive among specialists working in large practice settings (>50 physicians) and especially among physicians working in group-model health maintenance organizations (P<.001 for overall difference across practice settings). Method of payment was also significantly associated with specialist attitudes (P<.001 for differences across payment categories). Salaried physicians had the most favorable attitudes toward gatekeepers and fee-for-service specialists the least favorable. Specialists classified as capitated were, on average, neutral in their views of gatekeepers.
Mean summary gatekeeper attitude scores also differed among specialty groups, ranging from a high of 2.58 among gastroenterologists to lows of 2.15 among ophthalmologists and 2.23 among orthopedists (P<.001). Female specialists and younger specialists also had significantly more favorable mean gatekeeper attitude scores.
In the multivariate regression analysis, practice setting remained strongly predictive of attitudes (Table 4). Relative to specialists in solo practice, those in groups of more than 50 physicians had a gatekeeper attitude score nearly half a point more favorable, and specialists in group-model health maintenance organizations (HMOs) had attitude scores nearly a full point more favorable. Relative to nonsalaried physicians, those who were salaried had a somewhat more favorable attitude toward gatekeepers, although the salaried payment variable did not achieve statistical significance (P=.13) in the adjusted analysis. However, the percentage of practice income derived from capitation remained significantly associated with attitudes (P=.002) in the regression analysis. The larger the proportion of income a specialist received from capitation, the more positive the attitudes. Stated inversely, the more a specialist was paid on a fee-for-service basis, the more negative his or her attitudes were toward primary care gatekeepers.
Few other variables included in the regression analysis were statistically significant independent predictors of the gatekeeper score. In the regression model, ophthalmologists remained significantly more negative in their attitudes toward gatekeepers than the other specialists (P=.01), and male physicians remained more negative in their attitudes than women (P=.04) (data for these variables not shown). Interestingly, specialist during the previous year was not a significant predictor of attitude. The regression model explained 26% of the variation in the summary gatekeeper score.
Discussion
The role of primary care physicians in the US health care system continues to evolve. Although there is widespread support for many of the core values of primary care, there is also apprehension about policies requiring primary care physicians to authorize access to specialists, particularly when primary care physicians or commercial health plans may profit financially by economizing on specialty services.
Research on patient attitudes toward the gatekeeping role of primary care physicians has shown that although patients value the comprehensive and coordinating role of primary care physicians, perceived referral barriers are among the strongest predictors of patients giving their primary care physician low ratings of trust, confidence, and satisfaction.9 Similarly, studies have indicated that primary care physicians often have ambivalent attitudes about performing gatekeeping functions such as mandatory authorization of all specialist referrals.3,6,11
Our study extends this previous research and demonstrates that specialist physicians also tend to have ambivalent attitudes about the gatekeeping role of primary care physicians. Many specialists in our survey agreed that primary care gatekeepers infringe on specialists’ clinical autonomy and their relationships with patients. However, half also acknowledged that primary care physician gatekeepers increase delivery of preventive services, and 40% agreed that coordination of care is enhanced by the involvement of a primary care gatekeeper.
Overall, specialist attitudes toward primary care physicians acting as gatekeepers were not uniformly negative. Many specialists appear to appreciate the advantages of having a primary care physician to help integrate services.
Our study indicates that specialists’ attitudes toward primary care gatekeepers differ significantly according to how the specialists are paid and the setting in which they practice. Payment methods such as salary and capitation that eliminate or markedly reduce the direct link between volume of referral visits and specialist income appear to promote a more favorable attitude among specialists toward primary care gatekeepers. This finding suggests that the objection of some specialists to a gatekeeping role for primary care physicians may at least in part be due to concerns about possible loss of income under fee-for-service arrangements. It is possible that specialists paid by salary or capitation perceive that a more prominent coordinating role for primary care physicians may be to their professional benefit by reducing inappropriate referrals that bring no additional income to the practice.
Specialists working in larger and more organized practice settings also have more favorable views of primary care gatekeepers. This association between practice setting and attitudes may be partly explained by the fact that physicians in larger groups and group-model HMOs are more likely to be paid on a salaried basis. However, even after adjusting for payment method in regression analysis, practice setting remained predictive of attitudes toward gatekeepers. It is likely that physicians in larger office-based groups and group-model HMOs work in a multispecialty context that promotes a more collaborative and interdependent approach to practice across specialties. This organizational culture may attenuate conflicts between specialty groups about scope of practice, patient allegiances, and the appropriate role of each specialty within the overall system of care.
Our study has several policy implications. Our results indicate that specialists vary in their attitudes toward the gatekeeping role of primary care physicians and that negative attitudes are not necessarily an immutable characteristic of being a specialist. Attitudes appear to be shaped at least in part by the specialists’ financial interests that may be threatened by restrictions on referrals and by the system in which they practice. Policies that promote alternatives to fee-for-service payment and shift specialists away from solo practice toward larger, organized group practice settings may also encourage them to adopt more positive attitudes about the role of primary care physicians as coordinators of care. Integrated work environments may generate a common sense of purpose, stemming in part from physical proximity that facilitates communication and cooperation.
Limitations
Several limitations of our study are worth noting. Our study was limited to physicians in California. Although California has one of the most competitive managed care markets in the United States and may exemplify trends occurring in other states with active managed care markets, results may not necessarily be generalizable to physicians working in other states. Our main study variable—attitudes toward primary care gatekeepers—is a subjective measure. The wording of our main study question specifically highlighted primary care physicians in a gatekeeper role. The response to this question is therefore not necessarily indicative of the attitudes of specialists toward primary care physicians in general. Interpretation of the word “gatekeeper” was left up to the respondent. Finally, as in all observational studies, causal inferences must be made with caution. We detected strong associations between payment method and practice setting and specialists’ attitudes toward gatekeepers. Although it is plausible that payment incentives and practice environment influence specialist attitudes, it is also possible that specialists who have different underlying values are attracted to different types of practice settings and payment arrangements. For example, salaried group-model HMOs may attract specialists who already have relatively favorable attitudes toward primary care gatekeeping, rather than (or in addition to) that culture promoting a more favorable attitude. Solo practice, in contrast, may attract physicians who are more independent and predisposed to perceive the gatekeeper role as adversarial.
Conclusions
In the US health care system, gatekeeping remains controversial. Specialist ambivalence toward gatekeeper models may undermine the legitimacy of a more primary care–focused system. Health systems with strong foundations in primary care appear to produce better patient outcomes than systems that do not promote such primary care elements as continuity and coordination of care.18 Models of care that promote integration and coordination by primary care physicians without emphasizing a restricting role may decrease tensions among physicians. Organizational structures and payment methods that minimize conflict between primary care physicians and specialists will be essential to the further development of an integrated health care system.19 Future health policies will need to consider how to encourage cooperation between primary care physicians and specialists to best meet the needs of the patient.
Acknowledgments
This work was supported by the Bureau of Health Professions, HRSA (Grant 5 U76 MB 10001). The authors thank Dennis Keane, MPH, and Deborah Jaffe for their assistance with survey administration; Art Munger for assistance with manuscript preparation; Norman Hearst, MD, MPH, for his comments on early drafts; and the physicians who participated in the study.
STUDY DESIGN: We performed a cross-sectional survey using a mailed questionnaire. The predictors of specialist attitudes toward gatekeepers were measured using chi-square, the t test, and regression analyses.
POPULATION: A probability sample of 1492 physicians in urban counties in California in the specialties of cardiology, endocrinology, gastroenterology, general surgery, neurology, ophthalmology, and orthopedics was used.
OUTCOMES: We assessed specialists’ attitudes toward primary care physicians in the gatekeeper role. A summary score of attitudes was developed.
RESULTS: A total of 979 physicians completed the survey (66%). Attitudes toward primary care physicians were mixed. Relative to nonsalaried physicians, those who were salaried had a somewhat more favorable attitude toward gatekeepers (P=.13), as did physicians with a greater percentage of income derived from capitation (P=.002).
CONCLUSIONS: Specialists’ attitudes toward the coordinating role of primary care physicians are influenced by the setting in which the specialists work and by financial interests that may be threatened by referral restrictions. Policies that promote alternatives to fee for service may generate a common sense of purpose among primary care physicians and specialists.
A well-functioning health system requires effective cooperation between primary care and specialist physicians. Tensions between these types of physicians seem to be increasing because of managed care plans, many of which rely on primary care physician gatekeepers to authorize visits to specialists, interrupting the direct access to specialists that many insured Americans expect. Researchers have investigated how gatekeeper policies are affecting primary care physicians and patients.1-11 However, little research has explored the attitudes of specialist physicians toward the changing role of their primary care counterparts.12,13 Some specialists appear to be troubled by gatekeeper policies, viewing primary care physicians as their competitors rather than their colleagues.14-17
Professional organizations representing specialists have advocated for “direct access” legislation that would require health plans to permit patients to visit a specialist without first contacting a primary care physician. These groups have promoted this type of legislation as something that is important for ensuring quality of care. However, more than the patient’s welfare may be at stake in this policy debate. Gatekeeper policies that potentially reduce use of specialist services may be reducing specialist income, particularly when those physicians are paid on a fee-for-service basis.
We surveyed specialist physicians in California to investigate their attitudes toward primary care physicians acting in a gatekeeper role. We explored whether specialist attitudes differed depending on the setting in which the physician practiced and how the physician was paid. We hypothesized that those specialists compensated mainly on a fee-for-service basis would be more financially threatened by gatekeeper policies and would therefore have less favorable attitudes toward primary care physicians in this role. We also hypothesized that specialists working in larger group practice settings would have more collegial relationships with primary care associates that would promote more favorable attitudes.
Methods
In 1998 we mailed self-administered questionnaires to specialist physicians practicing in the 13 largest urban counties in California (Alameda, Contra Costa, Fresno, Los Angeles, Orange, Riverside, San Bernardino, San Diego, Sacramento, San Francisco, San Mateo, Santa Clara, and Solano). The study counties contained 79% of California’s practicing specialist physicians and 79% of the state’s population. The physicians were identified from the American Medical Association (AMA) physician masterfile. The masterfile contains continuously updated information on all US allopathic physicians and many osteopathic physicians, including those who are not AMA members. To be eligible for the survey, physicians had to be listed as providing direct patient care, not in training, and not employed by the federal government.
Specialists were sampled who listed their primary specialty as cardiology, endocrinology, gastroenterology, general surgery, neurology, ophthalmology, or orthopedics. These specialties were chosen to provide a broad spectrum of both surgical and medical office-based subspecialties, and to represent most of the largest non–primary care office-based specialties in California. Physicians were selected using a probability sample stratified by specialty (250 physicians in each specialty) and physician race/ethnicity (nonwhite physicians were oversampled). To develop a valid set of questions, we first pilot-tested our questionnaire on a group of 10 specialty physicians. The questionnaire included items on physician demographics, practice setting, number of physicians in the practice, and modes of payment. For analyzing payment modes, physicians were first categorized as salaried or nonsalaried; those that were nonsalaried were asked to indicate the percentage of their practice income derived from fee-for-service and capitated payment.
The questionnaire included a series of items about specialists’ attitudes toward primary care physicians in the gatekeeper role. The specialists were asked to respond to each of the following statements with “strongly agree,” “agree,” disagree,” or “strongly disagree”: “The involvement of a primary care gatekeeper in the care of the patients I see: (1) undermines my relationships with patients; (2) makes it more difficult to order expensive tests or procedures; (3) decreases freedom to make clinical decisions; (4) increases the likelihood that patients will receive preventive care; and (5) improves the coordination of patient care.” For simple descriptive analysis of the individual gatekeeper items, responses were collapsed into dichotomous categories of “agree” or “disagree.”
In addition to analyzing individual attitude items, we created a summary Attitude Toward Gatekeepers scale. To create this scale, individual attitude items worded in a negative direction (eg, the gatekeeper undermines my relationship with patients) were scored so that a score of 4 indicated maximal disagreement. Items worded in a positive direction (eg, the gatekeeper increases the likelihood that patients will receive preventive care) were scored with 4 representing maximal agreement. The summary attitudes toward gatekeeper score was then computed by calculating the mean of the 5 separate gatekeeper items for each physician. This summary scale had a range of 1 to 4, with 2.5 indicating a neutral summary attitude. The Cronbach a for the summary scale was 0.75, indicating acceptable scale properties.
Analysis
Mean scores for the summary Attitude Toward Gatekeeper scale were compared according to physician demographics and practice characteristics using t tests and analysis of variance. For these unadjusted analyses, the physician payment variable was classified into 3 mutually exclusive categories: salaried, capitated for 40% or more of practice income, or fee-for-service payment accounting for 61% or more of practice income. We based the 40% threshold for capitated income on the assumption that this degree of capitation would be sufficient to change the underlying financial incentive experienced by physicians in regard to referral visit volume.
We also performed least squares regression analysis to investigate the independent association of physician and practice variables with the summary Attitude Toward Gatekeeper scale. All physician demographic and practice variables were entered into the regression equation, regardless of their significance on unadjusted analysis. For the regression model, payment mode was categorized in a manner different from that used for the unadjusted analyses. First, a dummy variable was created indicating whether the physician was salaried or nonsalaried. A second variable was included in the model indicating the percentage of practice income attributable to capitated payment. For salaried physicians, the value of this continuous capitation income variable was set at 0. This approach results in an interpretation of the coefficient for the salaried variable indicating the change in gatekeeper attitude scale score for salaried physicians relative to nonsalaried physicians with only fee-for-service payment.
Data were also analyzed after being weighted to account for the oversampling of nonwhite physicians and for the differences in sampling proportions among the different specialties relative to the overall population of physicians in each specialty in the study counties. The results were almost identical when we used the weighted and unweighted data; we therefore present results only from the more simple unweighted analyses.
The University of California, San Francisco, Committee on Human Research reviewed and approved the study protocol.
Results
Of the initial sample of 1750 physicians, 258 were subsequently determined to be ineligible, primarily due to death, retirement, or moving out of the study counties. Completed questionnaires were obtained from 979 of the 1492 eligible specialist physicians (66%). Sixteen of those responding worked in public clinics or other practice settings, such as schools or jails. Given the uniqueness of their practice settings and the small number of physicians there, we excluded these 16 and analyzed the responses of the remaining 963.
The characteristics of the physician respondents are shown in Table 1. Most (73%) were in solo or small office-based group practices of 2 to 10 physicians. Most had fee-for-service payment as their dominant payment method. One fourth were paid on a salaried basis, and 16% had at least 40% of their practice income paid by capitation.
Attitudes toward primary care physicians in the gatekeeper role were mixed Table 2. Almost half (44%) agreed that the gatekeeper undermines a specialist’s relationship with patients. Fifty-six percent agreed that the gatekeeper makes it more difficult to order expensive tests or procedures, and two thirds agreed that the gatekeeper decreases the freedom of the specialist to make clinical decisions. In response to the attitude items positing beneficial effects of a primary care gatekeeper arrangement, 40% agreed that the primary gatekeeper improves coordination of care, and half agreed that the gatekeeper increases the likelihood that the patient will receive preventive care. When all 5 questions were combined into a single summary scale, the general attitude of the specialist physicians toward primary care gatekeepers was essentially neutral, with a mean among all specialists of 2.4 (standard deviation=0.69) on a scale of 1 to 4.
On unadjusted analyses, practice setting and payment method were the strongest predictors of the summary Attitude Toward Gatekeeper score Table 3. Specialists in solo practice exhibited the most negative attitudes. The attitudes of specialists in small (2-10 physicians) and medium-sized (11-50) group practice settings were only slightly more favorable. Attitudes were much more positive among specialists working in large practice settings (>50 physicians) and especially among physicians working in group-model health maintenance organizations (P <.001) for overall difference across practice settings. Method of payment was also significantly associated with specialist attitudes (P <.001) for differences across payment categories. Salaried physicians demonstrated the most favorable attitudes toward gatekeepers and fee-for-service specialists the least favorable attitudes. Those specialists classified as capitated were on average neutral in their views of gatekeepers.
Mean values for the summary gatekeeper attitude score also differed among the different specialty groups. Mean scores ranged from a high of 2.58 among gastroenterologists to a low of 2.15 among ophthalmologists and 2.23 among orthopedists (P <.001). Female specialists and those who were younger also had significantly more favorable mean gatekeeper attitude scores.
In the multivariate regression analysis, practice setting remained strongly predictive of attitudes Table 4. Relative to specialists in solo practice, those in groups of more than 50 physicians had a gatekeeper attitude score nearly half a point more favorable, and specialists in group-model health maintenance organizations (HMOs) had attitude scores nearly a full point more favorable. Relative to nonsalaried physicians, those who were salaried had a somewhat more favorable attitude toward gatekeepers, although the salaried payment variable did not achieve statistical significance (P=.13) in the adjusted analysis. However, the percentage of practice income derived from capitation remained significantly associated with attitudes (P=.002) in the regression analysis. The larger the proportion of income a specialist received from capitation, the more positive the attitudes. Stated inversely, the more a specialist was paid on a fee-for-service basis, the more negative his or her attitudes were toward primary care gatekeepers.
Few other variables included in the regression analysis were statistically significant independent predictors of the gatekeeper score. In the regression model, ophthalmologists remained significantly more negative in their attitudes toward gatekeepers than other specialists (P = .01), and male physicians remained more negative in their attitudes than female physicians (P = .04; data for these variables not shown). Interestingly, specialist during the previous year was not a significant predictor of attitude. The regression model explained 26% of the variation in the summary gatekeeper score.
Discussion
The role of primary care physicians in the US health care system continues to evolve. Although there is widespread support for many of the core values of primary care, there is also apprehension about policies that insist that primary care physicians authorize access to specialists—particularly when primary care physicians or commercial health plans may financially profit by economizing on specialty services.
Research on patient attitudes toward the gatekeeping role of primary care physicians has shown that, although patients value the comprehensive and coordinating role of primary care physicians, perceived referral barriers are among the strongest predictors of patients giving their primary care physician low ratings for trust, confidence, and satisfaction.9 Similarly, studies have indicated that primary care physicians often have ambivalent attitudes about performing gatekeeping functions such as mandatory authorization of all specialist referrals.3,6,11
Our study extends this previous research and demonstrates that specialist physicians also tend to have ambivalent attitudes about the gatekeeping role of primary care physicians. Many specialists in our survey agreed that primary care gatekeepers infringe on specialists’ clinical autonomy and their relationships with patients. However, half also acknowledged that primary care physician gatekeepers increase delivery of preventive services, and 40% agreed that coordination of care is enhanced by the involvement of a primary care gatekeeper.
Overall, specialist attitudes toward primary care physicians acting as gatekeepers were not uniformly negative. Many specialists appear to appreciate the advantages of having a primary care physician to help integrate services.
Our study indicates that specialists’ attitudes toward primary care gatekeepers differ significantly according to how the specialists are paid and the setting in which they practice. Payment methods such as salary and capitation that eliminate or markedly reduce the direct link between volume of referral visits and specialist income appear to promote a more favorable attitude among specialists toward primary care gatekeepers. This finding suggests that the objection of some specialists to a gatekeeping role for primary care physicians may at least in part be due to concerns about possible loss of income under fee-for-service arrangements. It is possible that specialists paid by salary or capitation perceive that a more prominent coordinating role for primary care physicians may be to their professional benefit by reducing inappropriate referrals that bring no additional income to the practice.
Specialists working in larger and more organized practice settings also have more favorable views of primary care gatekeepers. This association between practice setting and attitudes may be partly explained by the fact that physicians in larger groups and group-model HMOs are more likely to be paid on a salaried basis. However, even after adjusting for payment method in regression analysis, practice setting remained predictive of attitudes toward gatekeepers. It is likely that physicians in larger office-based groups and group-model HMOs work in a multispecialty context that promotes a more collaborative and interdependent approach to practice across specialties. This organizational culture may attenuate conflicts between specialty groups about scope of practice, patient allegiances, and the appropriate role of each specialty within the overall system of care.
Our study has several policy implications. Our results indicate that specialists vary in their attitudes toward the gatekeeping role of primary care physicians and that negative attitudes are not necessarily an immutable characteristic of being a specialist. Attitudes appear to be shaped at least in part by the specialists’ financial interest that may be threatened by restrictions on referrals and by the system in which they practice. Policies that promote alternatives to fee-for-service payment and shift specialists away from solo practice toward larger, organized group practice settings may also encourage them to adopt more positive attitudes about the role of primary care physicians as coordinators of care. Integrated work environments may generate a common sense of purpose, stemming in part from physical proximity that facilitates communication and cooperation.
Limitations
Several limitations of our study are worth noting. Our study was limited to physicians in California. Although California has one of the most competitive managed care markets in the United States and may exemplify trends occurring in other states with active managed care markets, results may not necessarily be generalizable to physicians working in other states. Our main study variable—attitudes toward primary care gatekeepers—is a subjective measure. The wording of our main study question specifically highlighted primary care physicians in a gatekeeper role. The response to this question is therefore not necessarily indicative of the attitudes of specialists toward primary care physicians in general. Interpretation of the word “gatekeeper” was left up to the respondent. Finally, as in all observational studies, causal inferences must be made with caution. We detected strong associations between payment method and practice setting and specialists’ attitudes toward gatekeepers. Although it is plausible that payment incentives and practice environment influence specialist attitudes, it is also possible that specialists who have different underlying values are attracted to different types of practice settings and payment arrangements. For example, salaried group-model HMOs may attract specialists who already have relatively favorable attitudes toward primary care gatekeeping, rather than (or in addition to) that culture promoting a more favorable attitude. Solo practice, in contrast, may attract physicians who are more independent and predisposed to perceive the gatekeeper role as adversarial.
Conclusions
In the US health care system, gatekeeping remains controversial. Specialist ambivalence toward gatekeeper models may undermine the legitimacy of a more primary care–focused system. Health systems with strong foundations in primary care appear to produce better patient outcomes than systems that do not promote such primary care elements as continuity and coordination of care.18 Models of care that promote integration and coordination by primary care physicians without emphasizing a restricting role may decrease tensions among physicians. Organizational structures and payment methods that minimize conflict between primary care physicians and specialists will be essential to the further development of an integrated health care system.19 Future health policies will need to consider how to encourage cooperation between primary care physicians and specialists to best meet the needs of the patient.
Acknowledgments
This work was supported by the Bureau of Health Professions, HRSA (Grant 5 U76 MB 10001). The authors thank Dennis Keane, MPH, and Deborah Jaffe for their assistance with survey administration; Art Munger for assistance with manuscript preparation; Norman Hearst, MD, MPH, for his comments on early drafts; and the physicians who participated in the study.
1. Feldman SR, Fleischer AB Jr, Chen JG. The gatekeeper model is inefficient for the delivery of dermatologic services. J Am Acad Dermatol 1999;40:426-32.
2. Donelan K, Blendon RJ, Lundberg GD, et al. The new medical marketplace: physicians’ views. Health Aff 1997;16:139-48.
3. Ellsbury KE, Montano DE, Manders D. Primary care physician attitudes about gatekeeping. J Fam Pract 1987;25:616-19.
4. Kulu-Glasgow I, Delnoij D, de Bakker D. Self-referral in a gatekeeping system: patients’ reasons for skipping the general-practitioner. Health Policy 1998;45:221-38.
5. St. Peter RF. Access to specialists: perspectives of patients and primary care physicians. Data Bulletin Fall 1997;2:1-2.
6. Taylor TR. Pity the poor gatekeeper: a transatlantic perspective on cost containment in clinical practice. BMJ 1989;299:1323-25.
7. Schultz R, Girard C, Scheckler WE. Physician satisfaction in a managed care environment. J Fam Pract 1992;34:298-304.
8. Grumbach K, Osmond D, Vranizan K, Jaffe D, Bindman AB. Primary care physicians’ experience of financial incentives in managed-care systems. N Engl J Med 1998;339:1516-21.
9. Grumbach K, Selby JV, Damberg C, et al. Resolving the gatekeeper conundrum: what patients value in primary care and referrals to specialists. JAMA 1999;282:261-66.
10. Kerr EA, Hays RD, Lee ML, Siu AL. Does dissatisfaction with access to specialists affect the desire to leave a managed care plan? Med Care Res Rev 1998;55:59-77.
11. Halm EA, Causino N, Blumenthal D. Is gatekeeping better than traditional care? A survey of physicians’ attitudes. JAMA 1997;278:1677-81.
12. Feldman DS, Novack DH, Gracely E. Effects of managed care on physician-patient relationships, quality of care, and the ethical practice of medicine. Arch Intern Med 1998;158:1626-32.
13. Marshall MN. How well do GPs and hospital consultants work together? A survey of the professional relationship. Fam Pract 1999;16:33-8.
14. Bodenheimer T, Lo B, Casalino L. Primary care physicians should be coordinators, not gatekeepers. JAMA 1999;281:2045-49.
15. De Guzman MM. Are specialists staging a comeback? Health Syst Lead 1997;4:4-13.
16. Beard PL. Specialty empowerment: a new trend in managed care. Healthc Financ Manage 1998;52:62-4.
17. Bodenheimer T. The American health care system: physicians and the changing medical marketplace. N Engl J Med 1999;340:584-88.
18. Starfield B. Is primary care essential? Lancet 1994;344:1129-33.
19. Herd B, Herd A, Mathers N. The wizard and the gatekeeper: of castles and contracts. BMJ 1995;310:1042-44.
Primary Care Physician Supply and Colorectal Cancer
STUDY DESIGN: We performed an ecologic study of Florida’s 67 counties, using data from the state tumor registry and the American Medical Association physician masterfile.
POPULATION: Florida residents were included.
OUTCOMES MEASURED: We measured age-adjusted colorectal cancer incidence and mortality rates for Florida’s 67 counties during the period 1993 to 1995.
RESULTS: Increasing primary care physician supply was negatively correlated with both colorectal cancer incidence (correlation coefficient [r] = -0.46; P < .001) and mortality rates (r = -0.29; P = .02). In linear regression that controlled for other county characteristics, each 1% increase in the proportion of county physicians who were in primary care specialties was associated with a reduction in colorectal cancer incidence of 0.25 cases per 100,000 (P < .001) and a reduction in colorectal cancer mortality of 0.08 deaths per 100,000 (P = .008).
CONCLUSIONS: Colorectal cancer incidence and mortality were lower in Florida counties that had a greater supply of primary care physicians. This suggests that a balanced physician work force may achieve better health outcomes.
It was predicted that more than 130,000 Americans would develop colorectal cancer in the year 2000. Colorectal cancer is the second leading cause of cancer mortality in the United States, with an estimated 56,300 deaths predicted for 2000.1 In that year, Florida ranked third among states in the estimated number of colorectal cancer cases (9100) and colorectal cancer deaths (3900).
Earlier diagnosis of colorectal cancer, with subsequently reduced mortality, can be achieved by eliciting and promptly evaluating signs and symptoms of colorectal cancer and by providing recommended screening tests, such as fecal occult blood testing and flexible sigmoidoscopy.2 Screening may also reduce colorectal cancer incidence by detecting and removing precancerous polyps. Annual fecal occult blood testing, for example, has been shown to reduce colorectal cancer incidence by 20%.3 Polyps found by screening sigmoidoscopy also generally lead to surveillance colonoscopy, a procedure that may reduce colorectal cancer incidence by as much as 90%.4
Studies have consistently reported that access to health care and a physician’s recommendation for screening are important predictors of cancer screening.5-10 One would expect, therefore, that the provision of colorectal cancer screening tests would be dependent to some extent on the availability of physician services. Physician specialties may differ, however, in their provision of preventive health services. Stange and colleagues,11 for example, found that family physicians addressed at least one US Preventive Services Task Force recommendation for preventive care in 39% of visits for chronic illness. In contrast, evidence suggests that most specialists are not likely to address health care needs outside their specialty.12
Compared with other cancer screening tests, colorectal cancer screening is less frequently recommended by physicians and is less frequently completed by patients. It is possible, therefore, that the availability of primary care providers has relatively limited impact on colorectal cancer outcomes.13-15 We have previously shown that increasing supplies of primary care physicians were associated with earlier detection of colorectal cancer, while increasing supplies of non–primary care physicians were associated with later-stage diagnosis.16 We hypothesized, therefore, that increasing primary care physician supply would also be associated with lower incidence and mortality rates for colorectal cancer.
Methods
We performed an ecologic study comparing primary care physician supply with colorectal cancer incidence and mortality rates. Colorectal cancer incidence and mortality rates for Florida’s 67 counties were identified using the Florida Cancer Data System (FCDS), a population-based statewide cancer registry. The FCDS is a member of the North American Association of Central Cancer Registries (NAACCR). NAACCR audits have estimated that case ascertainment for the period 1990 to 1994 was 99.7% complete. The FCDS provides age-adjusted incidence and mortality rates standardized to the 1970 US standard population. To account for year-to-year fluctuations, rates were averaged over the 3-year period 1993 to 1995.
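The FCDS supplied these rates already age-adjusted. For readers unfamiliar with direct standardization, a minimal Python sketch follows: each age-specific rate is weighted by that age group's share of a standard population, and the adjusted annual rates are then averaged over several years. The two age groups and all counts in the usage line are hypothetical, and the actual 1970 US standard population weights are not reproduced here.

```python
from __future__ import annotations


def age_adjusted_rate(cases: list[int], population: list[int],
                      standard_pop: list[int]) -> float:
    """Directly age-standardized rate per 100,000.

    cases[i] and population[i] are a county's counts in age group i;
    standard_pop[i] is the standard population in the same age group
    (for this study, the 1970 US standard population, not reproduced here).
    """
    if not (len(cases) == len(population) == len(standard_pop)):
        raise ValueError("age groups must line up across all three inputs")
    total_standard = sum(standard_pop)
    adjusted = 0.0
    for c, p, s in zip(cases, population, standard_pop):
        age_specific = c / p * 100_000                  # rate per 100,000 in this age group
        adjusted += age_specific * s / total_standard   # weight by standard age share
    return adjusted


def multi_year_average(yearly_rates: list[float]) -> float:
    """Average annual rates (e.g., 1993, 1994, 1995) to damp random fluctuation."""
    return sum(yearly_rates) / len(yearly_rates)


# Hypothetical county with two age groups (not study data).
rate = age_adjusted_rate(cases=[10, 40], population=[100_000, 50_000],
                         standard_pop=[300, 100])
print(round(rate, 2))  # 27.5
```

Because every county is weighted to the same age structure, differences in the adjusted rates are not driven by differences in how old each county's population is.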
Because distal cancers may be more easily detected with screening tests such as sigmoidoscopy, we also examined incidence rates stratified by proximal versus distal origin of the cancer. We defined proximal cancers as those arising from the cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure. Distal cancers were defined as those arising from the descending colon, sigmoid colon, rectosigmoid junction, and rectum. Tumors of the anal canal were excluded because of differing pathology and treatment implications.17
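The proximal/distal definitions above amount to a simple lookup. A minimal Python sketch, assuming tumor subsites are available as plain-English names; registry records are usually coded (for example, with ICD-O topography codes), and the step that maps codes to these names is an assumption not shown here. The function name `classify_subsite` is hypothetical.

```python
from __future__ import annotations

# Subsite groups as defined in the text (plain-English names, not registry codes).
PROXIMAL = {"cecum", "ascending colon", "hepatic flexure",
            "transverse colon", "splenic flexure"}
DISTAL = {"descending colon", "sigmoid colon", "rectosigmoid junction", "rectum"}
EXCLUDED = {"anal canal"}  # differing pathology and treatment implications


def classify_subsite(subsite: str) -> str | None:
    """Return 'proximal' or 'distal', or None for tumors excluded from analysis."""
    site = subsite.strip().lower()
    if site in PROXIMAL:
        return "proximal"
    if site in DISTAL:
        return "distal"
    if site in EXCLUDED:
        return None
    # Sites the text does not assign (e.g., overlapping lesions) are flagged
    raise ValueError(f"subsite not covered by the study definitions: {subsite!r}")


for site in ["Cecum", "Sigmoid colon", "Anal canal"]:
    print(site, "->", classify_subsite(site))
```

Raising an error for unlisted subsites, rather than silently assigning them, makes any gaps in the definitions visible.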
We used the 1990 US census to ascertain other characteristics of Florida counties that might have an impact on colorectal cancer incidence and mortality. In addition to age, colorectal cancer incidence and mortality rates vary by race, socioeconomic status, and marital status. Variables obtained for each county included median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, and percentage who were married.
Data on physician supply were obtained from the 1994 American Medical Association (AMA) physician masterfile, which includes allopathic and osteopathic physicians regardless of AMA membership. County-level population estimates were obtained from the 1990 US Census. We created physician supply variables for total, primary care, and non–primary care physician supply. Physicians were classified as primary care if their self-designated specialty was family practice, general practice, obstetrics/gynecology, or general internal medicine.
Physicians who indicated they were engaged in full-time direct patient care were counted as one full-time equivalent (FTE); those who indicated in the masterfile that they were either “semi-retired,” in residency training, or engaged in teaching or research were counted as 0.5 FTE. Physicians who indicated they were no longer involved in direct patient care were excluded. On the basis of this information, we calculated for each county the proportion of all physicians engaged in primary care and used this as our measure of primary care supply.
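The weighting rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming each physician record has a self-designated specialty and an activity status; the status labels, the `Physician` class, and the example county are hypothetical and do not reproduce actual AMA masterfile field values. Physicians no longer in direct patient care are given a weight of zero, which is equivalent to excluding them.

```python
from __future__ import annotations
from dataclasses import dataclass

# Self-designated specialties counted as primary care in the text.
PRIMARY_CARE = {"family practice", "general practice",
                "obstetrics/gynecology", "general internal medicine"}

# FTE weights by activity status; labels are illustrative, not masterfile values.
FTE_WEIGHT = {
    "full-time patient care": 1.0,
    "semi-retired": 0.5,
    "residency training": 0.5,
    "teaching or research": 0.5,
    "no patient care": 0.0,  # excluded: contributes nothing to either count
}


@dataclass
class Physician:
    specialty: str
    activity: str


def primary_care_proportion(physicians: list[Physician]) -> float:
    """Share of a county's physician FTEs in primary care specialties."""
    total = sum(FTE_WEIGHT[p.activity] for p in physicians)
    if total == 0:
        raise ValueError("county has no physicians in patient care")
    primary = sum(FTE_WEIGHT[p.activity] for p in physicians
                  if p.specialty in PRIMARY_CARE)
    return primary / total


county = [
    Physician("family practice", "full-time patient care"),   # 1.0 FTE
    Physician("cardiology", "full-time patient care"),        # 1.0 FTE
    Physician("general internal medicine", "semi-retired"),   # 0.5 FTE
    Physician("general surgery", "no patient care"),          # excluded
]
print(primary_care_proportion(county))  # 0.6
```

Here 1.5 of the county's 2.5 FTEs are in primary care, giving a proportion of 0.6.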
Counties were the unit of analysis for our study. We explored associations between primary care physician supply and colorectal cancer incidence and mortality rates in 2 ways. First, we constructed scatterplots to examine whether relationships were linear and to rule out nonlinear associations, and we calculated Pearson correlation coefficients. Second, we used multiple linear regression to examine the relationship between primary care physician supply and outcomes while controlling for other county-level characteristics.
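For reference, the Pearson correlation coefficient compares how paired county values vary together relative to how much each varies on its own. A minimal Python sketch follows; the five counties and their values in the usage lines are invented for illustration and are not study data.

```python
from __future__ import annotations
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired county-level values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Cross-products and squared deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counties: primary care share vs. age-adjusted incidence per 100,000.
share = [0.20, 0.30, 0.45, 0.60, 0.80]
incidence = [52.0, 49.5, 47.0, 41.0, 38.5]
print(round(pearson_r(share, incidence), 2))  # -0.99
```

Because r summarizes only linear association, it is paired with scatterplots, which can reveal curved relationships that r would understate. With 67 counties, the significance of r is conventionally tested with t = r√(n − 2)/√(1 − r²) on 65 degrees of freedom.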
Parameter estimates were determined using ordinary least squares. Potential confounding variables included in each initial model were median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, percentage who were married, and total physician supply. Final regression models were derived using a backward variable selection algorithm and included all variables that remained statistically significant (P < .05). We also used graphical methods to confirm that all outcomes were normally distributed.
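The text names ordinary least squares with backward selection at P < .05 but not the software used. A minimal sketch with NumPy and SciPy follows, assuming the county covariates are already assembled into a county-by-variable matrix. The function names, the optional `keep` argument for forcing a variable (such as the exposure) to stay in the model, and the tie-breaking are our own illustrative choices, not the authors' procedure.

```python
from __future__ import annotations
import numpy as np
from scipy import stats


def ols_fit(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """OLS with an intercept; returns coefficients and two-sided P values."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])         # intercept column first
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = n - design.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / df                       # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # covariance of the estimates
    t_stat = beta / np.sqrt(np.diag(cov))
    p = 2 * stats.t.sf(np.abs(t_stat), df)
    return beta, p


def backward_select(X: np.ndarray, y: np.ndarray, names: list[str],
                    alpha: float = 0.05, keep: tuple[str, ...] = ()) -> list[str]:
    """Repeatedly drop the least significant predictor until all have P < alpha.

    Variables named in `keep` are never dropped (an optional assumption).
    """
    cols = list(range(X.shape[1]))
    while cols:
        _, p = ols_fit(X[:, cols], y)
        droppable = [(pv, c) for pv, c in zip(p[1:], cols)  # p[0] is the intercept
                     if names[c] not in keep]
        if not droppable:
            break
        worst_p, worst_col = max(droppable)
        if worst_p < alpha:
            break
        cols.remove(worst_col)                        # refit without the weakest term
    return [names[c] for c in cols]
```

For example, with a 67-row matrix whose columns are the county covariates listed above, `backward_select(X, y, names)` returns the names of the retained predictors; refitting `ols_fit` on those columns gives the final coefficients and P values.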
Results
The average physician supply for Florida’s 67 counties (physicians per 100,000 population) was 134.9, with primary care supply at 49.7 and specialist supply at 85.2. The average supply of primary care specialties was as follows: family physicians, 17.5; general practitioners, 10.7; general internists, 13.9; and obstetrician-gynecologists, 7.2. There was substantial variation in physician supply, with some counties having as few as 15 physicians per 100,000 population and other counties having more than 500 physicians per 100,000 population. The average proportion of physicians who were in a primary care specialty was 0.36 across Florida’s 67 counties (standard deviation = 0.19; range = 0.17-1.00).
There was also substantial variation in both incidence and mortality rates across Florida’s 67 counties. Incidence rates ranged from as low as 9.6 to as high as 72 cases per 100,000, and mortality rates ranged from 3.8 to 26.4 deaths per 100,000. Incidence and mortality rates were both higher in men than in women.
Associations between primary care physician supply and colorectal cancer incidence and mortality rates were assessed both graphically and using the Pearson correlation coefficient (Table 1; Figures 1, 2, and 3). Primary care physician supply was negatively correlated with colorectal cancer incidence and mortality rates in the 67 counties studied. For colorectal cancer incidence rates, negative correlations were observed for both proximal and distal cancers, and among both men and women. For mortality rates, correlations were stronger for men and did not reach statistical significance among women. Scatter diagrams did not suggest the presence of nonlinear relationships.
Table 2 presents the results of linear regression analyses. Primary care physician supply was a statistically significant predictor of all outcomes examined. Each 1% increase in primary care physician supply was associated with a reduction in overall colorectal cancer incidence of 0.25 cases per 100,000 and a reduction in overall colorectal cancer mortality of 0.08 deaths per 100,000. In stratified analyses, primary care physician supply had similar effects for proximal and distal cancers, with slightly greater effects among men than among women. Overall physician supply was not a significant predictor of any of the outcomes examined.
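To make the regression coefficients concrete, the worked example below reads "each 1% increase" as a 1-percentage-point increase in the proportion of county physicians in primary care (for example, from 0.36 to 0.37); that reading is our assumption, since the text does not spell it out. Under the linear model, a hypothetical county whose primary care share rose by 10 percentage points, from the average county proportion of 0.36 to 0.46, with other covariates unchanged, would be predicted to have:

```latex
\Delta\,\text{incidence} = 10 \times (-0.25) = -2.5 \text{ cases per } 100{,}000,
\qquad
\Delta\,\text{mortality} = 10 \times (-0.08) = -0.8 \text{ deaths per } 100{,}000
```

These are predicted differences between county-level rates from an ecologic model, not effects on individual patients, and they should not be extrapolated beyond the observed range of primary care proportions (0.17 to 1.00).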
Discussion
We found that an increasing supply of primary care physicians was associated with lower incidence and lower mortality rates of colorectal cancer in Florida counties. Each 1% increase in primary care physician supply was associated with a reduction in colorectal cancer incidence of 0.25 cases per 100,000 and a reduction in mortality of 0.08 deaths per 100,000. In contrast, overall physician supply was unrelated to any of the outcomes examined. Findings were similar in men and women and for proximal and distal cancers.
Although there is continued interest in the composition of the United States physician work force,18-25 there have been surprisingly few studies demonstrating the effects of physician supply on health-related outcomes. Some studies have suggested that an oversupply of specialists may contribute to higher health care costs.22,26-28 Primary care physician supply has been correlated with reduced hospitalization rates for ambulatory care–sensitive conditions29,30 and with improved access and overall use of ambulatory health services.31-34
We have previously shown associations between primary care physician supply and earlier detection of breast cancer, colorectal cancer, and malignant melanoma.16,35,36 These findings are consistent with studies showing that patients who have a family physician are more likely to receive a diagnosis of early-stage cancer.37 Our study suggests that increasing supplies of primary care physicians might also be associated with reduced incidence and mortality for some cancers. In contrast, increased overall supplies of physicians have not been associated with improved cancer outcomes, suggesting that a balanced physician work force may be necessary to achieve optimal health outcomes.
Physician specialty choice and practice location are driven by many factors, including the location of training programs at medical schools and residencies, role models in medical school, education debt, lifestyle, and other issues. These factors influence the types of physicians that practice in various locations, and as a result may influence the health care of the population in that area. As the physician work force is studied and policy decisions are made, it will be important to consider measurable health care outcomes in addition to projected demands based on economic forces.38
Limitations
This study has a number of important limitations that should be considered. First, ecologic studies are subject to the ecologic fallacy, in which associations observed at the population level do not accurately reflect associations at the individual level. For example, we did not have information on individual patients’ actual use of physician services, so patients’ actual access to primary care may have differed from that implied by county-level measures. Ecologic studies have very limited ability to establish causation, and follow-up studies conducted at the individual patient level (such as case-control or cohort studies) will be necessary to confirm these findings. The exploratory nature of selecting variables for ecologic studies may also increase the risk of type I errors, that is, falsely concluding that associations exist when the observed associations have actually arisen by chance.
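The ecologic fallacy can be made concrete with a small, invented example. The Python sketch below builds three hypothetical counties of 20 residents each (not study data), in which an individual-level exposure, standing in for something like having a primary care physician, has no association with the outcome within any county. Pooling the same residents gives only a weak individual-level correlation, while the county-level correlation is strong.

```python
from __future__ import annotations
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counties of 20 residents each:
# (exposed, exposed_with_outcome, unexposed, unexposed_with_outcome).
# Within every county the outcome rate is the same for exposed and unexposed.
COUNTIES = {
    "A": (4, 2, 16, 8),    # 20% exposed; outcome rate 0.50 in both groups
    "B": (10, 3, 10, 3),   # 50% exposed; outcome rate 0.30 in both groups
    "C": (16, 4, 4, 1),    # 80% exposed; outcome rate 0.25 in both groups
}

county_x, county_y = [], []   # county-level (ecologic) data
person_x, person_y = [], []   # individual-level data, pooled across counties
for exp_n, exp_out, unexp_n, unexp_out in COUNTIES.values():
    n = exp_n + unexp_n
    county_x.append(exp_n / n)
    county_y.append((exp_out + unexp_out) / n)
    person_x += [1] * exp_n + [0] * unexp_n
    person_y += ([1] * exp_out + [0] * (exp_n - exp_out)
                 + [1] * unexp_out + [0] * (unexp_n - unexp_out))

county_r = pearson_r(county_x, county_y)
individual_r = pearson_r(person_x, person_y)
print(f"county-level r = {county_r:.2f}, individual-level r = {individual_r:.2f}")
# county-level r = -0.94, individual-level r = -0.10
```

The strong county-level correlation here reflects differences between counties, not any effect of exposure on individuals, which is why county-level findings need confirmation in individual-level studies.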
We did not have information on other colorectal cancer risk factors, such as dietary patterns, prevalence of a family history of colorectal cancer, or rates of ulcerative colitis. We also lacked information on rates of detection of precancerous polyps and on the age and sex distribution of physicians, which would have strengthened our study. Because incidence and mortality rates were established according to the patient’s county of residence rather than the location of diagnosis or treatment, we do not believe the observed associations resulted from referral patterns (eg, patients with suspected late-stage disease being referred to areas with a higher supply of specialist physicians). However, physician supply might be correlated with other unmeasured characteristics of the health care delivery system, which could account for the observed associations. Finally, our study was restricted to colorectal cancer in Florida, which may not be representative of other diseases or other parts of the country.
Conclusions
Both the incidence and mortality of colorectal cancer were lower in Florida counties that had a greater supply of primary care physicians. Overall physician supply, however, was unrelated to colorectal cancer incidence or mortality. These associations will need to be confirmed with studies conducted at the individual level.
1. Greenlee RT, Murray T, Bolden S, Wingo PA. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7-33.
2. US Preventive Services Task Force. Guide to clinical preventive services. 2nd ed. Washington, DC: US Department of Health and Human Services; 1996.
3. Mandel JS, Church TR, Bond JH, et al. The effect of fecal occult-blood screening on the incidence of colorectal cancer. N Engl J Med 2000;343:1603-07.
4. Winawer SJ, Zauber AG, Ho MN, et al. Prevention of colorectal cancer by colonoscopic polypectomy: The National Polyp Study Workgroup. N Engl J Med 1993;329:1977-81.
5. Fox SA, Murata PJ, Stein JA. The impact of physician compliance on screening mammography for older women. Arch Intern Med 1991;151:50-56.
6. Fox SA, Siu AL, Stein JA. The importance of physician communication on breast cancer screening of older women. Arch Intern Med 1994;154:2058-68.
7. Breen N, Kessler L. Changes in the use of screening mammography: evidence from the 1987 and 1990 National Health Interview Surveys. Am J Public Health 1994;84:62-67.
8. National Cancer Institute Breast Cancer Screening Consortium. Screening mammography: a missed clinical opportunity? JAMA 1990;264:54-58.
9. Lewis SF, Jensen NM. Screening sigmoidoscopy: factors associated with utilization. J Gen Intern Med 1996;11:542-44.
10. Vernon S. Participation in colorectal cancer screening: a review. J Natl Cancer Inst 1997;89:1406-22.
11. Stange K, Flocke S, Goodwin M. Opportunistic preventive services delivery. J Fam Pract 1998;46:419-24.
12. Rosenblatt RA, Hart LG, Baldwin LM, Chan L, Schneeweiss R. The generalist role of specialty physicians: is there a hidden system of primary care? JAMA 1998;279:1364-70.
13. Brownson RC, Davis JR, Simms SG, Kern TG, Harmon RG. Cancer control knowledge and priorities among primary care physicians. J Cancer Educ 1993;8:35-41.
14. Weisman CS, Celentano DD, Teitelbaum MA, Klassen AC. Cancer screening services for the elderly. Public Health Rep 1989;104:209-14.
15. American Cancer Society. Survey of physicians’ attitudes and practices in early cancer detection. CA Cancer J Clin 1990;40:77-101.
16. Roetzheim RG, Pal N, Gonzalez EC, et al. The effects of physician supply on the early detection of colorectal cancer. J Fam Pract 1999;48:850-58.
17. Laish-Vaturi A, Gutman H. Cancer of the anus. Oncol Rep 1998;5:1525-29.
18. Kindig DA, Cultice JM, Mullan F. The elusive generalist physician: can we reach a 50% goal? JAMA 1993;270:1069-73.
19. Rivo ML, Satcher D. Improving access to health care through physician workforce reform: directions for the 21st century. JAMA 1993;270:1074-78.
20. Rivo ML, Mays HL, Katzoff J, Kindig DA. Managed health care: implications for the physician workforce and medical education. Council on Graduate Medical Education. JAMA 1995;274:712-15.
21. Rosenblatt RA. Specialists or generalists: on whom should we base the American health care system? JAMA 1992;267:1665-66.
22. Schroeder SA, Sandy LG. Specialty distribution of U.S. physicians—the invisible driver of health care costs. N Engl J Med 1993;328:961-63.
23. Weiner JP. Forecasting the effects of health reform on US physician workforce requirement: evidence from HMO staffing patterns. JAMA 1994;272:222-30.
24. Barnett PG, Midtling JE. Public policy and the supply of primary care physicians. JAMA 1989;262:2864-68.
25. Barondess JA. Specialization and the physician workforce: drivers and determinants. JAMA 2000;284:1299-301.
26. Kane R, Friedman B. State variations in Medicare expenditures. Am J Public Health 1997;87:1611-20.
27. Mark DH, Gottlieb MS, Zellner BB, Chetty VK, Midtling JE. Medicare costs in urban areas and the supply of primary care physicians. J Fam Pract 1996;43:33-39.
28. Welch WP, Miller ME, Welch HG, Fisher ES, Wennberg JE. Geographic variation in expenditures for physicians’ services in the United States. N Engl J Med 1993;328:621-27.
29. Parchman ML, Culler S. Primary care physicians and avoidable hospitalizations. J Fam Pract 1994;39:123-28.
30. Krakauer H, Jacoby I, Millman M, Lukomnik JE. Physician impact on hospital admission and on mortality rates in the Medicare population. Health Serv Res 1996;31:191-211.
31. Krishan I, Drummond DC, Naessens JM, Nobrega FT, Smoldt RK. Impact of increased physician supply on use of health services: a longitudinal analysis in rural Minnesota. Public Health Rep 1985;100:379-86.
32. Briggs LW, Rohrer JE, Ludke RL, Hilsenrath PE, Phillips KT. Geographic variation in primary care visits in Iowa. Health Serv Res 1995;30:657-71.
33. Williams AP, Schwartz WB, Newhouse JP, Bennett BW. How many miles to the doctor? N Engl J Med 1983;309:958-63.
34. Allen DI, Kamradt JM. Relationship of infant mortality to the availability of obstetrical care in Indiana. J Fam Pract 1991;33:609-13.
35. Roetzheim RG, Pal N, Van Durme DJ, et al. Increasing supplies of dermatologists and family physicians are associated with earlier stage of melanoma detection. J Am Acad Dermatol 2000;43:211-18.
36. Ferrante JM, Gonzalez EC, Pal N, Roetzheim RG. The effects of physician supply on the early detection of breast cancer. J Am Board Fam Pract 2000;13:408-14.
37. Samet JM, Hunt WC, Goodwin JS. Determinants of cancer stage: a population-based study of elderly New Mexicans. Cancer 1990;66:1302-07.
38. Greene J. Emerging specialist shortage triggers workforce review. Am Med News 2001;13-14.
STUDY DESIGN: We performed an ecologic study of Florida’s 67 counties, using data from the state tumor registry and the American Medical Association physician masterfile.
POPULATION: Florida residents were included.
OUTCOMES MEASURED: We measured age-adjusted colorectal cancer incidence and mortality rates for Florida’s 67 counties during the period 1993 to 1995.
RESULTS: Increasing primary care physician supply was negatively correlated with both colorectal cancer incidence (correlation coefficient [CC] = -0.46; P < .001) and mortality rates (CC = -0.29; P = .02). In linear regression that controlled for other county characteristics, each 1% increase in the proportion of county physicians who were in primary care specialties was associated with a reduction in colorectal cancer incidence of 0.25 cases per 100,000 (P < .001) and a reduction in colorectal cancer mortality of 0.08 deaths per 100,000 (P = .008).
CONCLUSIONS: Incidence and mortality of colorectal cancer were lower in Florida counties that had a greater supply of primary care physicians. This suggests that a balanced work force may help achieve better health outcomes.
More than 130,000 Americans were predicted to develop colorectal cancer in the year 2000. Colorectal cancer is the second leading cause of cancer mortality in the United States, with an estimated 56,300 deaths predicted for 2000.1 In that year, Florida ranked third among states in the estimated number of colorectal cancer cases (9100) and colorectal cancer deaths (3900).
Earlier diagnosis of colorectal cancer, and thus reduced mortality, can be achieved in two ways. Clinicians can elicit and promptly evaluate its signs and symptoms, and they can provide recommended screening tests, such as fecal occult blood testing and flexible sigmoidoscopy.2 Screening may also reduce colorectal cancer incidence by detecting and removing precancerous polyps. Annual fecal occult blood testing, for example, has been shown to reduce colorectal cancer incidence by 20%.3 Polyps found by screening sigmoidoscopy generally lead to colonoscopy, and colonoscopic polypectomy may reduce colorectal cancer incidence by as much as 90%.4
Studies have consistently reported that access to health care and a physician’s recommendation for screening are important predictors of cancer screening.5-10 One would expect, therefore, that the provision of colorectal cancer screening tests would be dependent to some extent on the availability of physician services. Physician specialties may differ, however, in their provision of preventive health services. Stange and colleagues,11 for example, found that family physicians addressed at least one US Preventive Services Task Force recommendation for preventive care in 39% of visits for chronic illness. In contrast, evidence suggests that most specialists are not likely to address health care needs outside their specialty.12
Compared with other cancer screening tests, colorectal cancer screening is less frequently recommended by physicians and less frequently completed by patients.13-15 It is possible, therefore, that the availability of primary care providers has relatively limited impact on colorectal cancer outcomes. We have previously shown that increasing supplies of primary care physicians were associated with earlier detection of colorectal cancer, while increasing supplies of non–primary care physicians were associated with later-stage diagnosis.16 We hypothesized, therefore, that increasing primary care physician supply would also be associated with lower incidence and mortality rates for colorectal cancer.
Methods
We performed an ecologic study comparing primary care physician supply with colorectal cancer incidence and mortality rates. Colorectal cancer incidence and mortality rates for Florida’s 67 counties were obtained from the Florida Cancer Data System (FCDS), a population-based statewide cancer registry. The FCDS is a member of the North American Association of Central Cancer Registries (NAACCR). NAACCR audits have estimated that case ascertainment for the period 1990 to 1994 was 99.7% complete. The FCDS provides incidence and mortality rates age-adjusted to the 1970 US standard population. To account for year-to-year fluctuations, rates were averaged over the 3-year period 1993 to 1995.
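Direct age standardization, as used for the FCDS rates above, weights each age group’s crude rate by that group’s share of a standard population. This keeps differences in county age structure from driving the comparisons. The minimal Python sketch below rests on two assumptions. First, the age bands, counts, and weights are hypothetical; the actual 1970 US standard population weights and FCDS age bands are not given here. Second, it averages the annual rates with equal weight per year, which is an assumption about how the 1993 to 1995 averaging was done.

```python
from typing import Sequence


def age_adjusted_rate(cases: Sequence[int], population: Sequence[int],
                      std_weights: Sequence[float],
                      per: float = 100_000.0) -> float:
    """Directly age-adjusted rate per `per` persons.

    cases[i] and population[i] are a county's counts in age band i;
    std_weights[i] is that band's share of the standard population
    (hypothetical values here, not the actual 1970 US weights).
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("need one entry per age band in every input")
    total_weight = sum(std_weights)
    # Weighted sum of age-specific rates; dividing by total_weight
    # normalizes the weights so they sum to 1.
    weighted = sum(w * c / p for c, p, w in zip(cases, population, std_weights))
    return per * weighted / total_weight


def multi_year_average(annual_rates: Sequence[float]) -> float:
    """Equal-weight mean of annual adjusted rates (an assumed averaging rule)."""
    return sum(annual_rates) / len(annual_rates)
```

If the age-specific rates are equal across bands, the adjusted rate equals that common rate whatever the weights. The standard population only matters when age-specific rates differ.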
Because distal cancers may be more easily detected with screening tests such as sigmoidoscopy, we also examined incidence rates stratified by proximal versus distal origin of the cancer. We defined proximal cancers as those arising from the cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure. Distal cancers were defined as those arising from the descending colon, sigmoid colon, rectosigmoid junction, and rectum. Tumors of the anal canal were excluded because of their differing pathology and treatment implications.17
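The subsite grouping above amounts to a simple lookup. This sketch makes two assumptions beyond the text:

- The subsite names are plain-text labels taken from the paragraph. Registry data would normally carry ICD-O topography codes, which are not reproduced here.
- Any other site returns "unclassified". The text does not say how unspecified colon sites were handled.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

PROXIMAL = frozenset({"cecum", "ascending colon", "hepatic flexure",
                      "transverse colon", "splenic flexure"})
DISTAL = frozenset({"descending colon", "sigmoid colon",
                    "rectosigmoid junction", "rectum"})
# Excluded because of differing pathology and treatment implications.
EXCLUDED = frozenset({"anal canal"})


def classify_subsite(subsite: str) -> Optional[str]:
    """Return "proximal", "distal", "unclassified", or None if excluded."""
    site = subsite.strip().lower()
    if site in EXCLUDED:
        return None
    if site in PROXIMAL:
        return "proximal"
    if site in DISTAL:
        return "distal"
    # Assumption: the text does not describe how other sites were handled.
    return "unclassified"


def stratified_counts(subsites: Iterable[str]) -> Dict[str, int]:
    """Count cases by group, dropping excluded anal canal tumors."""
    groups = (classify_subsite(s) for s in subsites)
    return dict(Counter(g for g in groups if g is not None))
```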
We used the 1990 US census to ascertain other characteristics of Florida counties that might have an impact on colorectal cancer incidence and mortality. In addition to age, colorectal cancer incidence and mortality rates vary by race, socioeconomic status, and marital status. Variables obtained for each county included median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, and percentage who were married.
Data on physician supply were obtained from the 1994 American Medical Association (AMA) physician masterfile, which includes allopathic and osteopathic physicians regardless of AMA membership. County-level population estimates were obtained from the 1990 United States Census. We created supply variables for total physicians, primary care physicians, and non–primary care physicians. Physicians were classified as primary care if their self-designated specialty was family practice, general practice, obstetrics/gynecology, or general internal medicine.
Physicians who indicated they were engaged in full-time direct patient care were counted as one full-time equivalent (FTE); those who indicated in the masterfile that they were either “semi-retired,” in residency training, or engaged in teaching or research were counted as 0.5 FTE. Physicians who indicated they were no longer involved in direct patient care were excluded. On the basis of this information, we calculated for each county the proportion of all physicians engaged in primary care and used this as our measure of primary care supply.
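The FTE weighting and the primary care proportion can be sketched as follows. The activity labels are hypothetical stand-ins for AMA masterfile categories. The sketch also applies the FTE weights to the per-100,000 supply figures, not just to the proportion. That is an assumption: the text states only that the proportion was computed from this information.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Self-designated specialties counted as primary care in the text.
PRIMARY_CARE = frozenset({"family practice", "general practice",
                          "obstetrics/gynecology", "general internal medicine"})

# Hypothetical activity labels; the real masterfile uses its own codes.
FTE_WEIGHT = {
    "full-time patient care": 1.0,
    "semi-retired": 0.5,
    "resident": 0.5,
    "teaching": 0.5,
    "research": 0.5,
    "not in patient care": 0.0,  # excluded, so contributes nothing
}


@dataclass(frozen=True)
class Physician:
    specialty: str
    activity: str  # must be a key of FTE_WEIGHT; unknown labels raise KeyError


def county_supply(physicians: Iterable[Physician],
                  population: int) -> Dict[str, float]:
    """FTE-weighted supply per 100,000 and the primary care proportion."""
    total = primary = 0.0
    for doc in physicians:
        weight = FTE_WEIGHT[doc.activity]
        total += weight
        if doc.specialty in PRIMARY_CARE:
            primary += weight
    return {
        "total_per_100k": 100_000 * total / population,
        "primary_per_100k": 100_000 * primary / population,
        # Share of all FTE-weighted physicians who are in primary care.
        "primary_proportion": primary / total if total else float("nan"),
    }
```

Raising `KeyError` on an unrecognized activity label, rather than silently weighting it as zero, makes coding gaps in the source file visible.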
Counties were the unit of analysis for our study. We explored associations between primary care physician supply and colorectal cancer incidence and mortality rates in 2 ways. First, we constructed scatterplots to assess whether the relationships were linear and to check for nonlinear associations, and we calculated Pearson correlation coefficients. Second, we used multiple linear regression to examine the relationship between primary care physician supply and each outcome while controlling for other county-level characteristics.
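For reference, the Pearson coefficient is the covariance of two county-level variables divided by the product of their standard deviations. Its significance is usually tested with t = r√((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The stdlib-only Python sketch below computes both statistics. It omits the p-value itself, which needs a t-distribution CDF (for example, from SciPy’s `scipy.stats.t`).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples of at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Here is the sketch applied to the coefficients reported in the abstract. With 67 counties, r = -0.46 gives t ≈ -4.18 and r = -0.29 gives t ≈ -2.44, each on 65 degrees of freedom. These values are in line with the reported P < .001 and P = .02.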
Parameter estimates were obtained by ordinary least squares. Potential confounding variables included in each initial model were median household income, percentage of county residents with less than a high school education, percentage residing in urban census areas, percentage who were white, percentage who were married, and total physician supply. Using backward variable selection, we retained in the final regression models all variables that remained statistically significant (P < .05). We also used graphical methods to confirm that all outcomes were normally distributed.
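Backward selection as described above repeatedly fits an OLS model and drops the least significant candidate until every remaining variable has P < .05. The stdlib-only Python sketch below is one way to do that, not the authors’ software. Two assumptions apply:

- P-values use a normal approximation to the t distribution, which is only a rough guide at 67 observations.
- The `forced` parameter is a hypothetical option that keeps named variables, such as the exposure, in every model. The text does not say whether primary care supply was forced in.

```python
import math
from statistics import NormalDist
from typing import Dict, List, Mapping, Sequence, Tuple

Fit = Dict[str, Tuple[float, float]]  # name -> (coefficient, approx. p-value)


def _inverse(m: List[List[float]]) -> List[List[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y: Sequence[float], columns: Mapping[str, Sequence[float]]) -> Fit:
    """Ordinary least squares with an intercept, via the normal equations."""
    names = ["intercept"] + list(columns)
    n, k = len(y), len(names)
    if n <= k:
        raise ValueError("need more observations than parameters")
    X = [[1.0] + [float(columns[c][i]) for c in columns] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = _inverse(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    if sigma2 == 0:
        raise ValueError("perfect fit; p-values are undefined")
    fit: Fit = {}
    for a, name in enumerate(names):
        z = beta[a] / math.sqrt(sigma2 * inv[a][a])
        # Normal approximation; exact inference would use t with n - k df.
        fit[name] = (beta[a], 2.0 * (1.0 - NormalDist().cdf(abs(z))))
    return fit


def backward_eliminate(y: Sequence[float],
                       candidates: Mapping[str, Sequence[float]],
                       alpha: float = 0.05,
                       forced: Sequence[str] = ()) -> Fit:
    """Drop the least significant non-forced covariate until all have p < alpha."""
    current = dict(candidates)
    while True:
        fit = ols(y, current)
        pvals = {name: fit[name][1] for name in current if name not in forced}
        if not pvals:
            return fit
        worst = max(pvals, key=pvals.get)
        if pvals[worst] < alpha:
            return fit
        del current[worst]
```

In practice, a statistics package such as statsmodels would supply exact t-based p-values and regression diagnostics.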
Results
The average physician supply for Florida’s 67 counties (physicians per 100,000 population) was 134.9, with primary care supply at 49.7 and specialist supply at 85.2. The average supply of primary care specialties was as follows: family physicians, 17.5; general practitioners, 10.7; general internists, 13.9; and obstetrician-gynecologists, 7.2. There was substantial variation in physician supply, with some counties having as few as 15 physicians per 100,000 population and other counties having more than 500 physicians per 100,000 population. The average proportion of physicians who were in a primary care specialty was 0.36 across Florida’s 67 counties (standard deviation = 0.19; range = 0.17-1.00).
There was also substantial variation in both incidence and mortality rates across Florida’s 67 counties. Some counties had incidence rates as low as 9.6 cases per 100,000 and others as high as 72 cases per 100,000. Mortality rates varied from a low of 3.8 deaths per 100,000 to a high of 26.4 deaths per 100,000. Incidence and mortality rates were both higher in men than in women.
Associations between primary care physician supply and colorectal cancer incidence and mortality rates were assessed both graphically (Figures 1-3) and using Pearson correlation coefficients (Table 1). Primary care physician supply was negatively correlated with colorectal cancer incidence and mortality rates in the 67 counties studied. For colorectal cancer incidence rates, negative correlations were observed for both proximal and distal cancers, and among both men and women. For mortality rates, correlations were stronger for men and did not reach statistical significance among women. Scatter diagrams did not suggest the presence of nonlinear relationships.
Table 2 presents the results of the linear regression analyses. Primary care physician supply was a statistically significant predictor of all outcomes examined. Each 1% increase in primary care physician supply was associated with a reduction in overall colorectal cancer incidence of 0.25 cases per 100,000 and a reduction in overall colorectal cancer mortality of 0.08 deaths per 100,000. In stratified analyses, primary care physician supply had similar effects on proximal and distal cancers, with slightly greater effects among men than among women. Overall physician supply was not a significant predictor of any of the outcomes examined.
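The worked example below shows how the reported slopes translate into predicted differences between counties. It assumes that "1%" means one percentage point of the primary care proportion and that the linear model holds across the whole comparison. The 10-point difference (Δx = 10) is hypothetical, chosen because county proportions had a standard deviation of 0.19. These are ecologic, county-level associations, not estimates of what adding physicians to a given county would achieve.

```latex
\begin{aligned}
\Delta\widehat{\text{incidence}} &= \hat{\beta}_{\text{inc}}\,\Delta x = (-0.25)(10) = -2.5 \ \text{cases per } 100{,}000\\
\Delta\widehat{\text{mortality}} &= \hat{\beta}_{\text{mort}}\,\Delta x = (-0.08)(10) = -0.8 \ \text{deaths per } 100{,}000
\end{aligned}
```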
Discussion
We found that a greater supply of primary care physicians was associated with lower incidence and mortality rates of colorectal cancer in Florida counties. Each 1% increase in primary care physician supply was associated with a reduction in colorectal cancer incidence of 0.25 cases per 100,000 and a reduction in mortality of 0.08 deaths per 100,000. In contrast, overall physician supply was unrelated to any of the outcomes examined. Findings were similar in men and women and for proximal and distal cancers.
Although there is continued interest in the composition of the United States physician work force,18-25 there have been surprisingly few studies demonstrating the effects of physician supply on health-related outcomes. Some studies have suggested that an oversupply of specialists may contribute to higher health care costs.22,26-28 Primary care physician supply has been correlated with reduced hospitalization rates for ambulatory care–sensitive conditions29,30 and with improved access and overall use of ambulatory health services.31-34
We have previously shown associations between primary care physician supply and earlier detection of breast cancer, colorectal cancer, and malignant melanoma.16,35,36 These findings are consistent with studies showing that patients who have a family physician are more likely to receive a diagnosis of early-stage cancer.37 Our study suggests that increasing supplies of primary care physicians might also be associated with reduced incidence and mortality for some cancers. In contrast, increased overall supplies of physicians have not been associated with improved cancer outcomes, suggesting that a balanced physician work force may be necessary to achieve optimal health outcomes.
Physician specialty choice and practice location are driven by many factors, including the location of training programs at medical schools and residencies, role models in medical school, education debt, lifestyle, and other issues. These factors influence the types of physicians that practice in various locations, and as a result may influence the health care of the population in that area. As the physician work force is studied and policy decisions are made, it will be important to consider measurable health care outcomes in addition to projected demands based on economic forces.38
Limitations
This study has several important limitations. First, ecologic studies are subject to the ecologic fallacy: associations observed at the population level may not reflect associations at the individual level. For example, we did not have information on individual patients’ actual use of physician services, so patients’ actual access to primary care may have differed from that predicted by county-level measures. Ecologic studies have very limited ability to establish causation, and follow-up studies conducted at the individual patient level (such as case-control or cohort studies) will be necessary to confirm these findings. The exploratory selection of variables in ecologic studies may also increase type I error, that is, the risk of concluding that an association exists when the observed association actually arose by chance.
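The following formula makes the type I error concern concrete. Suppose k independent tests are each run at α = .05 and there are no true associations. The probability that at least one test appears significant then grows quickly with k. The value k = 10 is hypothetical. The tests in this study (correlated outcomes, backward selection) are not independent, so this is only a rough illustration.

```latex
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{k},
\qquad 1 - (1 - 0.05)^{10} \approx 0.40
```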
We did not have information on other colorectal cancer risk factors, such as dietary patterns, family history, or the prevalence of ulcerative colitis. Information on rates of detection of precancerous polyps and on the age and sex distribution of physicians would also have strengthened our study. Incidence and mortality rates were assigned according to the patient’s county of residence rather than the location of diagnosis or treatment. We therefore do not believe the observed associations resulted from referral patterns (eg, patients with suspected late-stage disease being referred to areas with a greater supply of specialist physicians). However, physician supply might be correlated with other unmeasured characteristics of the health care delivery system, which could account for the observed associations. Finally, our study was restricted to colorectal cancer in Florida, so the findings may not apply to other diseases or other parts of the country.
Conclusions
Both the incidence and mortality of colorectal cancer were lower in Florida counties that had a greater supply of primary care physicians. Overall physician supply, however, was unrelated to colorectal cancer mortality or incidence. These associations will need to be confirmed with studies conducted at the individual level.
25. Barondess JA. Specialization and the physician workforce: drivers and determinants. JAMA 2000;284:1299-301.
26. Kane R, Friedman B. State variations in Medicare expenditures. Am J Public Health 1997;87:1611-20.
27. Mark DH, Gottlieb MS, Zellner BB, Chetty VK, Midtling JE. Medicare costs in urban areas and the supply of primary care physicians. J Fam Pract 1996;43:33-39.
28. Welch WP, Miller ME, Welch HG, Fisher ES, Wennberg JE. Geographic variation in expenditures for physicians’ services in the United States. N Engl J Med 1993;328:621-27.
29. Parchman ML, Culler S. Primary care physicians and avoidable hospitalizations. J Fam Pract 1994;39:123-28.
30. Krakauer H, Jacoby I, Millman M, Lukomnik JE. Physician impact on hospital admission and on mortality rates in the Medicare population. Health Serv Res 1996;31:191-211.
31. Krishan I, Drummond DC, Naessens JM, Nobrega FT, Smoldt RK. Impact of increased physician supply on use of health services: a longitudinal analysis in rural Minnesota. Public Health Rep 1985;100:379-86.
32. Briggs LW, Rohrer JE, Ludke RL, Hilsenrath PE, Phillips KT. Geographic variation in primary care visits in Iowa. Health Serv Res 1995;30:657-71.
33. Williams AP, Schwartz WB, Newhouse JP, Bennett BW. How many miles to the doctor? N Engl J Med 1983;309:958-63.
34. Allen DI, Kamradt JM. Relationship of infant mortality to the availability of obstetrical care in Indiana. J Fam Pract 1991;33:609-13.
35. Roetzheim RG, Pal N, Van Durme DJ, et al. Increasing supplies of dermatologists and family physicians are associated with earlier stage of melanoma detection. J Am Acad Derm 2000;43:211-18.
36. Ferrante JM, Gonzalez EC, Pal N, Roetzheim RG. The effects of physician supply on the early detection of breast cancer. J Am Board Fam Pract 2000;13:408-14.
37. Samet JM, Hunt WC, Goodwin JS. Determinants of cancer stage: a population-based study of elderly New Mexicans. Cancer 1990;66:1302-07.
38. Greene J. Emerging specialist shortage triggers workforce review. Am Med News 2001;13-14.
The Effect of Patient and Visit Characteristics on Diagnosis of Depression in Primary Care
STUDY DESIGN: We used a cross-sectional design.
POPULATION: Data from the 1997 and 1998 National Ambulatory Medical Care Surveys were examined.
OUTCOMES MEASURED: We assessed the association of factors such as age, sex, race, physician specialty, type of insurance, and visit duration with a recorded depression diagnosis during office visits to primary care physicians.
RESULTS: After controlling for symptom presentation, primary care physicians were 56% less likely to record a diagnosis of depression during visits made by elderly patients, 37% less likely to do so during visits by African Americans, and 35% less likely to do so during visits by Medicaid patients. Visits with a depression diagnosis were, on average, 2.9 minutes longer than visits without a depression diagnosis (19.3 vs 16.4 minutes). Family practice and general practice physicians were 65% more likely to record a diagnosis of depression than internists.
CONCLUSIONS: Many factors were associated with making and recording a depression diagnosis beyond the patient’s reported symptoms. If rates of diagnosis are to improve, interventions that go beyond getting physicians to recognize the symptoms of depression are needed.
- Receipt of a recorded depression diagnosis during office visits to primary care physicians is dependent on patient age, race, and type of insurance.
- Family practice and general practice physicians are more likely than internists to record a depression diagnosis during office visits.
- Many factors beyond the patient’s reported symptoms are associated with making and recording a depression diagnosis.
Characteristics and Depression Diagnosis
Depression is a common disorder that significantly affects quality of life, functioning, and even mortality.1-4 However, as indicated in the Surgeon General’s Report on Mental Health, depression remains under-recognized and underdiagnosed.5 Most studies examining recognition of depression have focused on the role of symptom presentation, the use of screening tools, and physician educational interventions designed to improve symptom recognition.6 However, factors other than clinical presentation may be associated with the likelihood that depression is recognized during a physician visit.7,8 For example, patient age and race, type of insurance, and duration of the visit may increase or decrease the rate at which a depression diagnosis is recorded. Also, diagnostic rates may differ between family or general practice physicians and internists. If differences in diagnostic rates indeed occur because of extraclinical factors and current interventions continue to focus primarily on recognition of patients’ symptoms, certain patient groups will continue to be underdiagnosed and undertreated.
Given this concern about the range of factors possibly associated with receiving a depression diagnosis, we examined data from a nationally representative sample of office visits to physicians, the National Ambulatory Medical Care Survey. Specifically, we examined the independent role of age, sex, race, type of insurance, and visit duration in the probability that depression would be diagnosed during a patient’s visit to a primary care physician. Although the prevalence of depression is greater in women, there should not be a large difference by sex in the likelihood that a depression diagnosis is recorded during an office visit after controlling for the patient’s reason for the encounter. Similarly, if primary care physicians record diagnoses of depression based solely on the patient’s reasons for the encounter, the likelihood that a depression diagnosis is recorded should be similar across age groups, even though the reported prevalence of major depression is lower in elderly persons (minor depression is believed to occur more frequently in the elderly).9 Admittedly, some of the somatic symptoms associated with depression (eg, fatigue) are more likely to be due to a physical illness than to depression in elderly patients, so rates of diagnosis could be slightly lower among elderly persons. However, because of primary care providers’ lack of confidence in assessing and diagnosing adults with depression1,10 and the tendency of older persons to present depressive symptoms as somatic complaints,11,12 we expected depression diagnoses to be recorded much less frequently during visits by elderly persons, even after controlling for the patient’s reasons for the visit. Also, although African American patients have a lower reported prevalence and incidence of depression,13,14 depression diagnoses should be recorded at rates similar to those for other races after controlling for the presenting symptoms. Nevertheless, cultural stereotypes among providers may lead to depression diagnoses being recorded less frequently during these visits.15,16
With regard to practice factors affecting accurate diagnosis, because primary care physicians tend to schedule short visits and have many conditions to address during them, we expected the probability that a depression diagnosis is recorded to increase with the duration of the visit. Given competing demands on the physician’s attention, depression often receives less attention during visits in which the patient has one or more recent medical problems.17 Finally, we expected family and general practice physicians to diagnose depression more often than internists, because family physicians express more responsibility for treating depression, tend to have more complete knowledge of available treatments, and are more confident in managing mood disorders.10
Methods
Data
The study used data from the 1997 and 1998 National Ambulatory Medical Care Surveys (NAMCS). The NAMCS, which have been conducted every year since 1989 by the National Center for Health Statistics (NCHS), sample a nationally representative group of visits to physicians in office-based practices. The NCHS included weights in the NAMCS to enable the sample to represent all office visits in the United States. A detailed description of the NAMCS sample and sampling procedure, as well as a description of the survey instrument and survey administration procedures, is provided elsewhere.18,19
There were 24,715 visits sampled in 1997 and 23,339 visits sampled in 1998. For each office visit, the survey provided information on physician specialty, up to 3 diagnoses, and up to 3 patient reasons for the visit. Because there were fewer than 200 visits with a diagnosis of depression sampled in each year, we combined the data from 1997 and 1998 to increase the power of the analysis. We limited our analysis to the 17,058 visits made during this interval by adults 18 years and older to primary care physicians. Primary care physicians included physicians with specialties of family practice, general practice, or internal medicine. Item nonresponse rates in the NAMCS data are low (<5%), and the NCHS provides imputed values for any missing information on demographic variables and duration of the visit in the NAMCS data.19
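The sample-selection step described above can be sketched as a simple filter. This is a minimal illustration, not the authors’ code: the `Visit` record, its field names, and the plain-text specialty labels are hypothetical stand-ins for the numerically coded fields in the NAMCS public-use files.

```python
from dataclasses import dataclass

# Hypothetical specialty labels; the NAMCS public-use files encode
# physician specialty numerically, so a real analysis would map codes first.
PRIMARY_CARE = {"family practice", "general practice", "internal medicine"}


@dataclass
class Visit:
    """Hypothetical, simplified record for one sampled office visit."""
    year: int
    age: int
    specialty: str


def select_study_visits(visits):
    """Pool the 1997 and 1998 surveys and keep visits by adults
    (18 years and older) to family, general, or internal medicine physicians."""
    return [
        v for v in visits
        if v.year in (1997, 1998)
        and v.age >= 18
        and v.specialty in PRIMARY_CARE
    ]


if __name__ == "__main__":
    sample = [
        Visit(1997, 45, "family practice"),
        Visit(1998, 17, "internal medicine"),  # excluded: under 18
        Visit(1998, 70, "cardiology"),         # excluded: not primary care
        Visit(1996, 50, "general practice"),   # excluded: outside 1997-1998
    ]
    print(len(select_study_visits(sample)))  # prints 1
```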
Diagnostic Groups
Patients were categorized on the basis of diagnoses assigned by providers during the index visit, using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We classified depression visits as those with ICD-9 codes of 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent), 300.4 (neurotic depression), 311 (depressive disorder, not elsewhere classified), and 298.0 (depressive type psychosis).
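Classifying a visit as a depression visit amounts to checking each of its up to 3 recorded diagnoses against the listed codes. A minimal sketch, assuming the codes are stored as dotted strings such as "296.22" (the NAMCS files use their own undotted format, so a conversion step would be needed first). In ICD-9-CM, 296.2 and 296.3 take a fifth digit, so they are matched as prefixes; 300.4, 311, and 298.0 are matched exactly.

```python
# Codes listed in the text. 296.2x and 296.3x carry a fifth digit in
# ICD-9-CM (e.g. 296.22), so they are matched as prefixes.
DEPRESSION_PREFIXES = ("296.2", "296.3")
DEPRESSION_EXACT = {"300.4", "311", "298.0"}


def is_depression_code(code) -> bool:
    """True if a single dotted ICD-9-CM code falls in the depression group."""
    if not code:  # empty diagnosis slot
        return False
    code = code.strip()
    return code in DEPRESSION_EXACT or code.startswith(DEPRESSION_PREFIXES)


def is_depression_visit(diagnoses) -> bool:
    """True if any of the (up to 3) recorded diagnoses is a depression code."""
    return any(is_depression_code(c) for c in diagnoses)
```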
Patient and Visit Characteristics
Information on patient age, race, and ethnicity was recorded in the NAMCS survey, as was information on whether the visit was prepaid or fee-for-service and the type of insurance coverage (eg, private, Medicaid, Medicare). The duration of the visit was also recorded. The survey reported physician specialty; we classified primary care physicians into 2 groups: family practice/general practice and internal medicine. The survey also indicated whether the physician had seen the patient previously. Up to 3 reasons for the visit, as stated by the patient, were collected at the time of the visit. Self-reported depressive symptoms were divided into 3 categories: (1) depressed mood; (2) physical symptoms of depression (eg, tiredness, general weakness or ill feeling, weight loss, restlessness, disturbance of sleep, abnormal appetite); and (3) other psychiatric symptoms associated with depression (eg, nervousness, fears and phobias, problems with self-esteem and identity, disturbance of memory, social adjustment problems, intentional self-mutilation, and suicidal ideation). The number of medications prescribed during the visit was also recorded and, along with visit duration, was used in the analysis.
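The three symptom covariates could be derived from the recorded reasons for the visit with a lookup like the one below. The label sets are hypothetical plain-text stand-ins for the NAMCS reason-for-visit classification codes; the sketch flags a category if any of the first 3 recorded reasons falls in it.

```python
# Hypothetical plain-text labels standing in for NAMCS reason-for-visit codes.
DEPRESSED_MOOD = {"depression"}
PHYSICAL = {"tiredness", "general weakness", "weight loss", "restlessness",
            "sleep disturbance", "abnormal appetite"}
OTHER_PSYCH = {"nervousness", "fears and phobias", "self-esteem problems",
               "memory disturbance", "social adjustment problems",
               "self-mutilation", "suicidal ideation"}


def symptom_indicators(reasons):
    """Return three 0/1 covariates from up to 3 patient-reported reasons."""
    recorded = {r.strip().lower() for r in list(reasons)[:3] if r}
    return {
        "depressed_mood": int(bool(recorded & DEPRESSED_MOOD)),
        "physical_symptoms": int(bool(recorded & PHYSICAL)),
        "other_psych_symptoms": int(bool(recorded & OTHER_PSYCH)),
    }
```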
Analysis
We sought to examine the role of patient and visit characteristics in the probability that a depression diagnosis was recorded during an office visit to a primary care physician. Specifically, we investigated the independent effect of age, race, sex, type of insurance, and visit duration on the probability of receiving a depression diagnosis, after controlling for patient-reported symptoms of depression, physician specialty, and other patient characteristics. Factors associated with having a depression diagnosis recorded were identified using weighted logistic regression models, and adjusted odds ratios with 95% confidence intervals were calculated. To identify statistically significant differences in recognition rates, the sample weights were reduced by the proportion needed to downweight the sample to the size of a simple random sample with the same variance.20 Although this method did not address problems caused by clustering within strata, it tends to overcompensate rather than undercompensate for artifacts produced by stratification.21 Significant differences were identified by testing the coefficients with a χ2 test.
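The weighting approach can be illustrated with a small, dependency-free sketch: weights are rescaled so they sum to the size of an equivalent simple random sample (n divided by a design effect, which is an assumed input here, not a value from the study), a weighted logistic regression is fitted by Newton-Raphson, and the coefficient is reported as an odds ratio with a 95% Wald confidence interval and a 1-df Wald χ2 test. This is one plausible reading of the method, not the authors’ code, and it handles only an intercept plus one covariate; a real analysis would use a statistical package with the full set of covariates.

```python
import math


def rescale_weights(weights, design_effect):
    """Scale weights so they sum to n / design_effect, the size of a simple
    random sample with the same variance (design_effect is an assumed input)."""
    n_eff = len(weights) / design_effect
    total = sum(weights)
    return [w * n_eff / total for w in weights]


def weighted_logit_1x(x, y, w, iters=50, tol=1e-10):
    """Weighted logistic regression of y on an intercept and one covariate x,
    fitted by Newton-Raphson. Returns (intercept, slope, slope_se)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g for the 2x2 information matrix H.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se1 = math.sqrt(h00 / det)  # (H^-1)[1][1]
    return b0, b1, se1


def odds_ratio_summary(b1, se1):
    """Odds ratio, 95% Wald CI, Wald chi-square (1 df), and its P value."""
    odds_ratio = math.exp(b1)
    ci = (math.exp(b1 - 1.96 * se1), math.exp(b1 + 1.96 * se1))
    chi2 = (b1 / se1) ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return odds_ratio, ci, chi2, p_value
```

Rescaling the weights leaves the odds ratio unchanged but enlarges its standard error, which is the purpose of the adjustment: tests are based on the effective rather than the nominal sample size.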
A sensitivity analysis was performed. We were concerned that patients with multiple medical conditions may be less likely to have a depression diagnosis recorded in the NAMCS because the survey only allows for 3 recorded diagnoses, and because these patients may not be randomly distributed by age, sex, race, type of physician, and so forth. A weighted logistic regression analysis was conducted on the subset of visits that recorded only 1 or 2 diagnoses (N=14,135). This should eliminate visits in which depression was recognized but a diagnosis was not recorded because 3 other conditions were perceived to be more important by the physician. The results of this analysis were then compared with results based on the full sample.
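The sensitivity-analysis subset is a filter on the number of recorded diagnoses. A minimal sketch, assuming each visit is a dict whose hypothetical "diagnoses" field holds the 3 diagnosis slots, with empty slots stored as "" or None:

```python
def recorded_diagnosis_count(visit):
    """Count non-empty diagnosis slots (the NAMCS records at most 3)."""
    return sum(1 for code in visit["diagnoses"] if code)


def sensitivity_subset(visits):
    """Keep only visits with 1 or 2 recorded diagnoses, so that at least one
    diagnosis slot was left unused during each retained visit."""
    return [v for v in visits if 1 <= recorded_diagnosis_count(v) <= 2]
```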
Results
Of the 17,058 visits made by adults to primary care physicians in the 1997-1998 NAMCS samples, 358 included a diagnosis of depression (Table 1). Using the weights provided by the NCHS, we estimated that there were 20.2 million office visits to primary care physicians with a recorded diagnosis of depression in 1997 and 1998, representing 2.4% of all visits to primary care physicians. According to the multivariate analysis, however, the rate at which depression was diagnosed varied significantly by several patient and visit characteristics.
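The national figures are weighted estimates: each sampled visit stands for the number of visits given by its NCHS weight. A minimal sketch of both quantities with hypothetical inputs. For comparison, the unweighted share in this sample is 358/17,058, about 2.1%, which differs from the weighted 2.4% because sampled visits carry unequal weights.

```python
def weighted_total_and_share(is_depression, weights):
    """Return (estimated number of visits with the diagnosis,
    weighted proportion of all visits with the diagnosis)."""
    if len(is_depression) != len(weights):
        raise ValueError("one weight per visit is required")
    flagged = sum(w for flag, w in zip(is_depression, weights) if flag)
    return flagged, flagged / sum(weights)


# Unweighted share of sampled visits reported in the text: 358 of 17,058.
unweighted_share = 358 / 17058
```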
As we postulated, the data in Table 2 indicate that the probability that a diagnosis of depression is recorded during an office visit is significantly related to the patient’s reason for the visit: depression was diagnosed over 40 times more often during visits in which the patient reported depression as a reason for the visit. A depression diagnosis was also 3.4 times more likely to be recorded if the patient reported physical symptoms of depression and 4.9 times more likely if the patient reported other psychiatric symptoms associated with depression as a reason for the visit. However, even after controlling for the reasons for the visit, significant differences in the rate of depression diagnoses were observed by age, sex, and duration of the visit. Primary care physicians were 56% less likely to diagnose depression during visits made by elderly patients. Depression diagnoses were recorded more frequently during visits made by women, even after controlling for the patient’s reasons for the visit. Although the results are not reported in Table 2, we also tested for interactions of age with sex, race, or ethnicity. We found a significant interaction of age and sex: elderly women were less likely to be considered depressed than elderly men (P=.01). Duration of the visit was also significantly associated with the rate at which depression diagnoses were recorded; such diagnoses were recorded 1% more often for each additional minute of the visit. Visits during which a diagnosis of depression was recorded averaged 19.3 minutes, compared with 16.4 minutes for visits in which it was not.
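Because logistic regression is multiplicative on the odds scale, a per-minute effect compounds over longer visits. Assuming visit duration entered the model as a linear term with an odds ratio of about 1.01 per minute (the reported 1% is rounded, so these figures are approximate), and presumably reading “56% less likely” as an odds ratio of about 0.44 (with only 2.4% of visits carrying the diagnosis, odds ratios approximate relative risks):

```latex
\mathrm{OR}_{\Delta t} = \left(\mathrm{OR}_{1\,\mathrm{min}}\right)^{\Delta t} \approx 1.01^{\Delta t},
\qquad 1.01^{2.9} \approx 1.03,
\qquad 1.01^{10} \approx 1.10

\text{56\% less likely} \;\Longleftrightarrow\; \mathrm{OR} \approx 1 - 0.56 = 0.44
```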
Differences in the rate at which depression diagnoses were recorded were also observed by race and type of insurance coverage, although these differences did not reach statistical significance at the P<.05 level. A diagnosis of depression was recorded 37% less often during visits by African Americans (P=.055) and 35% less often during visits by Medicaid patients (P=.08). After controlling for age, a diagnosis of depression was recorded 35% more often during visits by Medicare patients than during visits by patients with private insurance (P=.07). Large differences in the rate at which a depression diagnosis was recorded were also observed by physician specialty: family practice and general practice physicians were 65% more likely than internists to record a diagnosis of depression (P<.001). Similar results were observed in the sensitivity analysis restricted to visits with 1 or 2 recorded diagnoses.
Discussion
Given that the prevalence of depression in epidemiologic studies is reported to approximate 12% to 18% in primary care practice,22,23 one would expect to see a depression diagnosis recorded more frequently than in 2.4% of office visits. Admittedly, depressed patients are likely to see their physicians for reasons other than their depression and may therefore not receive a depression diagnosis during each visit. Although reporting of depressive symptoms as the reason for the visit was an important determinant of whether or not a diagnosis of depression was recorded by the physician, there were several other nonclinical factors that predicted a depression diagnosis during visits to primary care physicians.
These findings show that the rate at which diagnoses of depression are recorded during office visits is influenced by factors other than symptom presentation. Sex and age were significantly associated with a depression diagnosis. Although the prevalence of depression is higher among women,14 the likelihood that a depression diagnosis was recorded should not have varied greatly by sex after controlling for the patient’s reason for the visit. Yet it did: when a man and a woman presented to a primary care physician with the same symptoms, a diagnosis of depression was more likely to be recorded during the woman’s visit. Similarly, a diagnosis of depression was less likely to be recorded during visits made by older patients. During office visits by older persons, primary care physicians may attribute depressive symptoms to physical ailments or to normal aging. However, it is also possible that older patients are more likely than younger patients to report depressive symptoms that are actually due to other ailments.
African Americans were less likely than non-African Americans to have a depression diagnosis recorded during visits to primary care physicians, even after controlling for symptoms related to mood disorders, although this difference was of borderline statistical significance (P=.055). Primary care physicians may perceive a depression diagnosis as more stigmatizing for African American patients than for other patients and therefore choose not to assign it. It is also conceivable that primary care physicians do not interpret physical and mood symptoms in African American patients as indicative of depression because of preconceptions about these patients and their morbidities. The causes of racial differences in diagnosis rates cannot be determined from the NAMCS data set and warrant further study with different research strategies.
The duration of the visit had a significant effect on the probability that a depression diagnosis was recorded. Given that primary care physicians typically treat or monitor several conditions during a relatively short visit, it is not surprising that depression is recognized and diagnosed more often during longer visits. However, the direction of this relationship is unclear: depression may be recognized because the visit is longer, or visits by depressed patients may simply take longer. These data cannot establish which is the case, and further studies of how physicians arrive at diagnoses are needed.
Finally, a depression diagnosis was much more likely to be recorded during visits to family practice or general practice physicians than during visits to internists. One may speculate that this occurs because the training of family and general practice physicians focuses more extensively on the identification and treatment of psychosocial problems than does the training of internists. Only a third of training directors for internal medicine residencies were satisfied with the training their residents received in depression.24 Additionally, internists are much less likely than family physicians to consider themselves responsible for treating depression.10 Although the prevalence of depression may be greater among patients treated by family and general practice physicians than among those treated by internists, differences in the true prevalence of depression among physician practices could not be ascertained from these data. However, controlling for patient symptoms should have accounted for much of any such difference.
Limitations
The study’s findings should be interpreted cautiously because of several limitations of the data set. The analysis relied on diagnoses of depression recorded during a nationally representative sample of physician office visits. A diagnosis coded by a primary care physician sets a threshold that is not equivalent to recognition as it might be assessed by directly asking physicians. Also, because the NAMCS allows only 3 diagnoses to be recorded, a physician may have recognized depression but not recorded it because 3 other diagnoses were given higher priority. This may occur especially during visits by elderly patients, who frequently have multiple conditions. However, over 80% of all visits had only 1 or 2 diagnoses recorded, suggesting that in most cases a depression diagnosis was not “crowded out.” Additionally, a sensitivity analysis restricted to visits with 2 or fewer recorded diagnoses found the same factors associated with a recorded depression diagnosis. The NAMCS also allows only 3 patient reasons for the visit to be recorded. If a patient had more than 3 reasons, only the top 3, as identified by the physician, were recorded, so important patient symptoms could have been excluded. Thus, the analysis could not perfectly control for all of the patients’ reasons for the visit, and this limitation should be kept in mind when interpreting these findings. Finally, the NAMCS does not record the patient’s history of depression, which might be an important clue for primary care physicians.
Conclusions
There are many factors associated with physicians’ recording of a depression diagnosis beyond the patient’s reported symptoms. Therefore, if rates of diagnosis of depression in office-based practice are to more closely approximate the true prevalence of the disorder, interventions are needed that go beyond simply helping physicians to better recognize the symptoms of depression. A recent review found that approximately one fourth of interventions designed to increase recognition and management of depression had no effect on diagnosis and treatment rates.6 Their effectiveness might be improved by designing more focused interventions that target African American and elderly patients, who currently receive depression diagnoses at low rates in primary care. This is a particularly high priority, since both African American and elderly patients are more likely to seek treatment in the primary care sector than in the mental health specialty sector. Solberg and colleagues25 found that primary care physicians viewed systematic screening unfavorably but supported alternative approaches, such as external feedback about the care they provide. Thus, feedback about age- and race-specific differences in diagnosis rates could provide the impetus needed for primary care physicians to alter their assessment procedures and clinical formulations for these under-recognized groups of patients. Finally, intervention efforts may want to focus on the way internists formulate psychiatric diagnoses, since recognition rates for depression are unduly low in this specialty group.
Acknowledgments
This research was supported in part by National Institute of Mental Health grants P30 MH3095, P30 MH52247, R25 MH60473, K01 MH01613, and R01 MH59318.
1. Unutzer J, Katon W, Sullivan M, Miranda J. Treating depressed older adults in primary care: narrowing the gap between efficacy and effectiveness. Milbank Q 1999;77:225-56.
2. Penninx BW, Guralnik JM, Ferrucci L, et al. Depressive symptoms and physical decline in community-dwelling older persons. JAMA 1998;279:1720-26.
3. Penninx B, Geerlings S, Deeg D, van Eijk J, van Tilburg W, Beekman A. Minor and major depression and the risk of death in older persons. Arch Gen Psychiatry 1999;56:889-95.
4. Rovner B, German P, Brant L, Clark R, Burton L, Folstein M. Depression and mortality in nursing homes. JAMA 1991;265:993-96.
5. US Department of Health and Human Services. Mental health: a report of the Surgeon General. Rockville, Md: US Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Center for Mental Health Services, National Institutes of Health, National Institute of Mental Health; 1999.
6. Kroenke K, Taylor-Vaisey A, Dietrich AJ, Oxman TE. Interventions to improve provider diagnosis and treatment of mental disorders in primary care: a critical review of the literature. Psychosomatics 2000;41:39-52.
7. Klinkman M, Coyne J, Gallo S, Schwenk T. False positives, false negatives, and the validity of the diagnosis of major depression in primary care. Arch Fam Med 1998;7:451-61.
8. Rost K, et al. The deliberate misdiagnosis of major depression in primary care. Arch Fam Med 1994;3:333-37.
9. Eaton W, Anthony J, Gallo J, et al. Natural history of Diagnostic Interview Schedule/DSM-IV major depression: the Baltimore Epidemiologic Catchment Area follow-up. Arch Gen Psychiatry 1997;54:993-99.
10. Williams JW, Rost K, Dietrich AJ, Ciotti MC, Zyzanski SJ, Cornell J. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med 1999;8:58-67.
11. Caine E, Lyness J, King D, Connors L. Clinical and etiological heterogeneity of mood disorders in elderly patients. In: Schneider L, Reynolds C, Lebowitz B, Friedhoff A, eds. Diagnosis and treatment of depression in late life: results of the NIH Consensus Development Conference. Washington, DC: American Psychiatric Association; 1994:21-54.
12. Gallo J, Rabins P, Anthony J. Sadness in older persons: 13-year follow-up of a community sample in Baltimore, Maryland. Psychol Med 1999;29:341-50.
13. Gallo J, Royall D, Anthony J. Risk factors for the onset of major depression in middle age and late life. Soc Psychiatry Psychiatr Epidemiol 1993;28:101-08.
14. Kessler R, McGonagle K, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry 1994;51:8-19.
15. Gallo J, Cooper-Patrick L, Lesikar S. Depressive symptoms of whites and African Americans aged 60 years and older. J Gerontol: Psychol Sci 1998;53B:277-86.
16. Cooper-Patrick L, Gallo J, Gonzalez J, et al. Race, gender, and partnership in the patient-physician relationship. JAMA 1999;37:1034-45.
17. Rost K, Nutting P, Smith J, Coyne JC, Cooper-Patrick L, Rubenstein L. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000;9:150-54.
18. Bryant E, Shimizu I. Sampling design, sampling variance, and estimation procedures for the National Ambulatory Medical Care Survey. Vital Health Stat 2 1988;108:1-39.
19. Woodwell DA. National Ambulatory Medical Care Survey: 1998 summary. Advance data from vital and health statistics. Hyattsville, Md: National Center for Health Statistics; 2000.
20. Potthoff R, Woodbury M, Manton K. ‘Equivalent sample size’ and ‘equivalent degrees of freedom’ refinements for inference using survey weights under superpopulation models. J Am Stat Assoc 1992;87:383-96.
21. Leaf P, Myers J, McEvoy L. Procedures used in the Epidemiologic Catchment Area study. In: Robins L, Regier D, eds. Psychiatric Disorders in America: The Epidemiologic Catchment Area Study. New York, NY: The Free Press; 1991.
22. Brown C, Schulberg HC. Diagnosis and treatment of depression in primary medical care practice: the application of research findings to clinical practice. J Clin Psychol 1998;54:303-14.
23. Olfson M, Shea S, Feder A, et al. Prevalence of anxiety, depression, and substance use disorders in an urban general medicine practice. Arch Fam Med 2000;9:876-83.
24. Sullivan M, Cole S, Gordon G, Hahn S, Kathol R. Psychiatric training in medicine residencies: current needs, practices and satisfaction. Gen Hosp Psychiatry 1996;18:95-101.
25. Solberg L, Korsen N, Oxman T, Fischer L, Bartels S. The need for a system in the care of depression. J Fam Pract 1999;48:973-79.
STUDY DESIGN: We used a cross-sectional design.
POPULATION: Data from the 1997 and 1998 National Ambulatory Medical Care Surveys were examined.
OUTCOMES MEASURED: We assessed the association of factors such as age, sex, race, physician specialty, type of insurance, and visit duration with a recorded depression diagnosis during office visits to primary care physicians.
RESULTS: After controlling for symptom presentation, primary care physicians were 56% less likely to record a diagnosis of depression during visits made by elderly patients, 37% less likely to do so during visits by African Americans (P=.055), and 35% less likely to do so during visits by Medicaid patients (P=.08). Visits with a depression diagnosis were, on average, 2.9 minutes longer than visits without one (19.3 vs 16.4 minutes). Family practice and general practice physicians were 65% more likely than internists to record a diagnosis of depression.
CONCLUSIONS: Many factors were associated with making and recording a depression diagnosis beyond the patient’s reported symptoms. If rates of diagnosis are to improve, interventions that go beyond getting physicians to recognize the symptoms of depression are needed.
- Whether a depression diagnosis is recorded during office visits to primary care physicians is associated with patient age, race, and type of insurance.
- Family practice and general practice physicians are more likely than internists to record a depression diagnosis during office visits.
- Many factors beyond the patient’s reported symptoms are associated with making and recording a depression diagnosis.
Characteristics and Depression Diagnosis
Depression is a common disorder that significantly affects quality of life, functioning, and even mortality.1-4 However, as indicated in the Surgeon General’s Report on Mental Health, depression remains under-recognized and underdiagnosed.5 Most studies examining recognition of depression have focused on the role of symptom presentation, the use of screening tools, and physician educational interventions designed to improve symptom recognition.6 However, factors other than clinical presentation may be associated with the likelihood that depression is recognized during a physician visit.7,8 For example, patient age and race, type of insurance, and duration of the visit may increase or decrease the rate at which a depression diagnosis is recorded. Also, diagnostic rates may differ between family or general practice physicians and internists. If differences in diagnostic rates indeed occur because of extraclinical factors and current interventions continue to focus primarily on recognition of patients’ symptoms, certain patient groups will continue to be underdiagnosed and undertreated.
Given this concern about the range of factors possibly associated with receiving a depression diagnosis, we examined data from a nationally representative sample of office visits to physicians, the National Ambulatory Medical Care Survey. More specifically, we examined the independent role of factors such as age, sex, race, type of insurance, and duration of the visit in the probability that depression would be diagnosed during a patient’s visit to a primary care physician. Although the prevalence of depression is greater in women, there should not be a large sex difference in the likelihood that a depression diagnosis is recorded during an office visit after controlling for the patient’s reason for the encounter. Similarly, if primary care physicians record diagnoses of depression based solely on the patient’s reasons for the encounter, the likelihood that a depression diagnosis is recorded should be similar across age groups, even though major depression is reported to be less prevalent in elderly persons (minor depression is believed to occur more frequently in the elderly).9 Admittedly, some of the somatic symptoms associated with depression (eg, fatigue) are more likely in elderly patients to be due to physical illness than to depression, so diagnosis rates could reasonably be slightly lower among elderly persons. However, because of primary care providers’ lack of confidence in assessing and diagnosing adults with depression1,10 and the tendency of older persons to express depressive symptoms as somatic complaints,11,12 we expected depression diagnoses to be recorded much less frequently during visits by elderly persons, even after controlling for the patient’s reasons for the visit. Also, although African American patients have a lower reported prevalence and incidence of depression,13,14 one would expect depression diagnoses to be recorded at rates similar to those for other races after controlling for the symptoms the patient presents. Nevertheless, cultural stereotypes among providers may lead to depression diagnoses being recorded less frequently during these visits.15,16
With regard to practice factors affecting accurate diagnosis, because primary care physicians tend to schedule short visits and have many conditions to address during them, we expected the probability that a depression diagnosis would be recorded to increase with the duration of the visit. Given competing demands on the physician’s time, depression often receives less attention during visits in which the patient has a recent medical problem, or several of them.17 Finally, we expected family and general practice physicians to diagnose depression more often than internists. Family physicians report a greater sense of responsibility for treating depression, tend to have more complete knowledge of available treatments, and are more confident in managing mood disorders.10
Methods
Data
The study used data from the 1997 and 1998 National Ambulatory Medical Care Surveys (NAMCS). The NAMCS, which have been conducted every year since 1989 by the National Center for Health Statistics (NCHS), sample a nationally representative group of visits to physicians in office-based practices. The NCHS included weights in the NAMCS to enable the sample to represent all office visits in the United States. A detailed description of the NAMCS sample and sampling procedure, as well as a description of the survey instrument and survey administration procedures, is provided elsewhere.18,19
There were 24,715 visits sampled in 1997 and 23,339 visits sampled in 1998. For each office visit, the survey provided information on physician specialty, up to 3 diagnoses, and up to 3 patient reasons for the visit. Because there were fewer than 200 visits with a diagnosis of depression sampled in each year, we combined the data from 1997 and 1998 to increase the power of the analysis. We limited our analysis to the 17,058 visits made during this interval by adults 18 years and older to primary care physicians. Primary care physicians included physicians with specialties of family practice, general practice, or internal medicine. Item nonresponse rates in the NAMCS data are low (<5%), and the NCHS provides imputed values for any missing information on demographic variables and duration of the visit in the NAMCS data.19
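The sample construction described above (pooling two survey years, then keeping visits by adults to the three primary care specialties) can be sketched as follows. This is a minimal illustration only: the `Visit` record and its field names are hypothetical stand-ins, not NAMCS variable names, and real public-use files would first need to be read and recoded. When annual weights from two years are pooled this way, their sum estimates the two-year total of visits.

```python
from dataclasses import dataclass


# Hypothetical record layout; real NAMCS public-use files use their own
# variable names and coded values, which are not reproduced here.
@dataclass
class Visit:
    year: int
    age: int
    specialty: str
    weight: float  # NCHS patient-visit weight for that survey year


PRIMARY_CARE = {"family practice", "general practice", "internal medicine"}


def analytic_sample(*yearly_visits):
    """Pool visits from several survey years, keeping adult primary care visits."""
    pooled = [visit for year_visits in yearly_visits for visit in year_visits]
    return [v for v in pooled if v.age >= 18 and v.specialty in PRIMARY_CARE]


visits_1997 = [Visit(1997, 45, "family practice", 1200.0),
               Visit(1997, 12, "family practice", 900.0)]
visits_1998 = [Visit(1998, 70, "internal medicine", 1500.0),
               Visit(1998, 30, "cardiology", 800.0)]
sample = analytic_sample(visits_1997, visits_1998)
print(len(sample))                    # 2
print(sum(v.weight for v in sample))  # 2700.0
```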
Diagnostic Groups
Patients were categorized on the basis of diagnoses assigned by providers during the index visit, using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We classified depression visits as those with ICD-9 codes of 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent), 300.4 (neurotic depression), 311 (depressive disorder, not elsewhere classified), and 298.0 (depressive type psychosis).
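The case definition above amounts to checking each visit's recorded diagnoses against five ICD-9-CM codes. The sketch below assumes codes are stored as dotted strings (eg, "296.32"); treating 296.2 and 296.3 as prefixes, so that their fifth-digit subcodes also match, is our assumption about how the categories were applied rather than something the text specifies. The function names are hypothetical.

```python
# ICD-9-CM codes listed in the text as defining a depression visit.
DEPRESSION_CODES = ("296.2", "296.3", "300.4", "311", "298.0")


def _matches(code, target):
    """True if `code` equals `target` or is one of its subcodes."""
    if "." in target:
        return code.startswith(target)  # eg, "296.2" matches "296.20"-"296.26"
    return code == target or code.startswith(target + ".")


def is_depression_visit(diagnoses):
    """Check up to 3 recorded diagnoses for any depression code."""
    return any(
        _matches(code.strip(), target)
        for code in diagnoses[:3]
        for target in DEPRESSION_CODES
    )


print(is_depression_visit(["401.9", "296.32"]))  # True
print(is_depression_visit(["250.00", "296.9"]))  # False
```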
Patient and Visit Characteristics
Information on patient age, race, and ethnicity was recorded in the NAMCS, as was information on whether the visit was prepaid or fee-for-service and on the type of insurance coverage (eg, private, Medicaid, Medicare). The duration of the visit was also recorded. The survey reported physician specialty; we classified primary care physicians into 2 groups: family practice/general practice and internal medicine. The survey also indicated whether the physician had seen the patient previously. Up to 3 reasons for the visit, as stated by the patient, were collected at the time of the visit. Self-reported depressive symptoms were divided into 3 categories: (1) depressed mood, (2) physical symptoms of depression (eg, tiredness, general weakness or ill feeling, weight loss, restlessness, disturbance of sleep, abnormal appetite), and (3) other psychiatric symptoms associated with depression (eg, nervousness, fears and phobias, problems with self-esteem and identity, disturbance of memory, social adjustment problems, intentional self-mutilation, and suicidal ideation). The number of medications prescribed during the visit and the visit’s duration were also included in the analysis.
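Coding each visit's reasons into the three symptom categories can be sketched as a lookup from reason to category. The reason labels below are hypothetical plain-text stand-ins paraphrased from the examples in the text; the NAMCS actually records reasons with its own coded classification, which is not reproduced here, and the category lists are illustrative rather than exhaustive.

```python
# Hypothetical plain-text reason labels grouped into the 3 categories in the text.
SYMPTOM_CATEGORIES = {
    "depressed_mood": {"depression"},
    "physical": {
        "tiredness", "general weakness or ill feeling", "weight loss",
        "restlessness", "disturbance of sleep", "abnormal appetite",
    },
    "other_psychiatric": {
        "nervousness", "fears and phobias", "problems with self-esteem and identity",
        "disturbance of memory", "social adjustment problems",
        "intentional self-mutilation", "suicidal ideation",
    },
}


def symptom_flags(reasons):
    """Return a 0/1 indicator per category from a visit's (up to 3) reasons."""
    recorded = {reason.strip().lower() for reason in reasons[:3]}
    return {
        category: int(bool(recorded & labels))
        for category, labels in SYMPTOM_CATEGORIES.items()
    }


print(symptom_flags(["Tiredness", "Cough"]))
# {'depressed_mood': 0, 'physical': 1, 'other_psychiatric': 0}
```

Each flag then enters the regression as a separate binary covariate, which is how the adjusted odds ratios for each symptom category can be estimated.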
Analysis
We sought to examine the effect of patient and visit characteristics on the probability that a depression diagnosis was recorded during an office visit to a primary care physician. Specifically, we investigated the independent effect of factors such as age, race, sex, type of insurance, and duration of the visit on the probability of receiving a depression diagnosis, after controlling for patient-reported symptoms of depression, physician specialty, and other patient characteristics. Factors associated with having a depression diagnosis recorded were identified using weighted logistic regression models, and adjusted odds ratios and their 95% confidence intervals were calculated. To identify statistically significant differences in recognition rates, the sample weights were scaled down so that the weighted sample size equaled that of a simple random sample with the same variance.20 Although this method does not address clustering within strata, it tends to overcompensate rather than undercompensate for artifacts of the stratified design.21 Significance was assessed by testing the regression coefficients with a χ2 test.
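The weighting and testing steps described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: it uses Kish's approximate effective sample size, (Σw)²/Σw², as the "simple random sample with the same variance," which may differ in detail from the refinement in reference 20; the function names and toy data are hypothetical; and the fitting routine is a plain Newton-Raphson weighted logistic regression rather than any particular survey package.

```python
import math

import numpy as np


def rescale_weights(w):
    """Scale weights to sum to Kish's effective sample size (sum w)^2 / sum w^2.

    This reflects only variance inflation from unequal weighting; like the
    method described in the text, it ignores clustering.
    """
    w = np.asarray(w, dtype=float)
    n_eff = w.sum() ** 2 / np.sum(w ** 2)
    return w * n_eff / w.sum()


def weighted_logit(X, y, w, max_iter=50, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson; returns (beta, se)."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information
        step = np.linalg.solve(info, X.T @ (w * (y - p)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (w * p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def summarize(beta, se, z=1.96):
    """Odds ratios, 95% CIs, and 1-df Wald chi-square P values per coefficient."""
    chi2 = (beta / se) ** 2
    p_values = [math.erfc(math.sqrt(c / 2.0)) for c in chi2]
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se), p_values


# Toy data: one binary covariate; the weighted odds ratio in this table is 2.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 1, 0, 1, 0, 1, 1, 0])
w = np.array([1000, 2000, 3000, 1000, 2000, 4000, 2000, 2000], dtype=float)
beta, se = weighted_logit(x, y, rescale_weights(w))
odds_ratios, ci_low, ci_high, p_values = summarize(beta, se)
print(round(float(odds_ratios[1]), 3))  # 2.0
```

Because rescaling multiplies every weight by the same constant, it leaves the coefficients (and hence the odds ratios) unchanged and only widens the standard errors, confidence intervals, and χ2 tests.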
A sensitivity analysis was performed. We were concerned that patients with multiple medical conditions may be less likely to have a depression diagnosis recorded in the NAMCS because the survey only allows for 3 recorded diagnoses, and because these patients may not be randomly distributed by age, sex, race, type of physician, and so forth. A weighted logistic regression analysis was conducted on the subset of visits that recorded only 1 or 2 diagnoses (N=14,135). This should eliminate visits in which depression was recognized but a diagnosis was not recorded because 3 other conditions were perceived to be more important by the physician. The results of this analysis were then compared with results based on the full sample.
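The restriction used in the sensitivity analysis is a simple filter on the number of recorded diagnoses. In the sketch below, visits are hypothetical dictionaries with a `diagnoses` list; the final line computes the unweighted share of study visits retained, which is consistent with the statement under Limitations that over 80% of visits had 1 or 2 recorded diagnoses (the text does not say whether that figure is weighted).

```python
def sensitivity_subset(visits):
    """Keep visits with exactly 1 or 2 recorded diagnoses (NAMCS records up to 3)."""
    return [v for v in visits if 1 <= len(v["diagnoses"]) <= 2]


example = [
    {"id": 1, "diagnoses": ["311"]},
    {"id": 2, "diagnoses": ["401.9", "250.00", "296.32"]},
    {"id": 3, "diagnoses": ["401.9", "272.4"]},
]
print([v["id"] for v in sensitivity_subset(example)])  # [1, 3]

# Unweighted share of the 17,058 study visits kept in the subset (N=14,135).
print(f"{14135 / 17058:.1%}")  # 82.9%
```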
Results
Of the 17,058 visits made by adults to primary care physicians in the 1997-1998 NAMCS samples, 358 included a diagnosis of depression (Table 1). Using the weights provided by the NCHS, we estimated that there were 20.2 million office visits to primary care physicians with a recorded diagnosis of depression in 1997 and 1998, representing 2.4% of all visits to primary care physicians. According to the multivariate analysis, however, the rate at which depression was diagnosed varied significantly by several patient and visit characteristics.
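The gap between the unweighted count (358 of 17,058 visits) and the weighted estimate (2.4% of visits) reflects the NCHS visit weights. A minimal sketch of the two calculations follows, using hypothetical toy weights rather than NAMCS data; the function name is hypothetical, and the last line is only back-of-the-envelope arithmetic from the rounded figures in the text.

```python
def weighted_estimates(weights, has_depression_dx):
    """Return (estimated number of visits with the diagnosis, weighted share)."""
    total = sum(weights)
    with_dx = sum(w for w, dx in zip(weights, has_depression_dx) if dx)
    return with_dx, with_dx / total


# Toy example: 4 sampled visits, each standing for a different number of visits.
count, share = weighted_estimates([1000.0, 3000.0, 2000.0, 4000.0],
                                  [True, False, False, True])
print(count, share)  # 5000.0 0.5

# Unweighted share in the study sample, for comparison with the weighted 2.4%.
print(f"{358 / 17058:.1%}")  # 2.1%

# Rough implied total of primary care visits over both years (rounded inputs).
print(round(20.2e6 / 0.024 / 1e6))  # 842 (million)
```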
As we postulated, the data in Table 2 indicate that the probability that a diagnosis of depression was recorded during an office visit is significantly related to the patient’s reason for the visit: depression was diagnosed over 40 times more often during visits at which the patient reported depression as a reason for the visit. A depression diagnosis was also 3.4 times more likely to be recorded if the patient reported physical symptoms of depression as a reason for the visit, and 4.9 times more likely if the patient reported other psychiatric symptoms associated with depression. However, even after controlling for the reasons for the visit, significant differences in the rate of depression diagnoses were observed by age, sex, and duration of the visit. Primary care physicians were 56% less likely to diagnose depression during visits made by elderly patients. Depression diagnoses were recorded more frequently during visits made by women, even after controlling for the patient’s reasons for the visit. Although the results are not reported in Table 2, we also tested for interactions of age with sex, race, and ethnicity. We found a significant interaction of age and sex: elderly women were less likely to be considered depressed than elderly men (P=.01). Duration of the visit was also significantly associated with the rate at which depression diagnoses were recorded, with such diagnoses recorded 1% more often for each additional minute of the visit. Visits during which a diagnosis of depression was recorded averaged 19.3 minutes, compared with 16.4 minutes for visits in which this diagnosis was not recorded.
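Because the models are logistic regressions, the percentages in this paragraph are most naturally read as changes in odds. The sketch below converts an odds ratio to a percent change, compounds the per-minute effect over the 2.9-minute difference in visit length, and, for comparison, applies the Zhang-Yu approximation for converting an odds ratio to a risk ratio. The odds ratio of 0.44 is back-calculated from "56% less likely," and the 2.4% baseline risk is an illustrative assumption (the overall rate), not the reference group's actual rate.

```python
def pct_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def or_to_rr(odds_ratio, baseline_risk):
    """Zhang-Yu approximation of a risk ratio from an odds ratio."""
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


# "56% less likely" corresponds to an odds ratio of about 0.44.
print(round(pct_change_in_odds(0.44)))  # -56

# About 1% higher odds per minute, compounded over a 2.9-minute difference.
print(round(1.01 ** 2.9, 3))  # 1.029

# With an outcome this rare, the odds ratio and risk ratio are close.
print(round(or_to_rr(0.44, 0.024), 2))  # 0.45
```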
Differences in the rate at which depression diagnoses were recorded were also observed by race and type of insurance coverage, although these differences did not reach statistical significance at the P<.05 level. A diagnosis of depression was recorded 37% less often during visits by African Americans (P=.055) and 35% less often during visits by Medicaid patients (P=.08). After controlling for age, a diagnosis of depression was recorded 35% more often during visits by Medicare patients than during visits by patients with private insurance (P=.07). Large differences in the rate at which a depression diagnosis was recorded were also observed by physician specialty: family practice and general practice physicians were 65% more likely than internists to record a diagnosis of depression (P<.001). Similar results were observed in the sensitivity analysis restricted to visits with 1 or 2 recorded diagnoses.
Discussion
Because epidemiologic studies report a prevalence of depression of approximately 12% to 18% in primary care practice,22,23 one would expect a depression diagnosis to be recorded in more than 2.4% of office visits. Admittedly, depressed patients often see their physicians for reasons other than depression and therefore may not receive a depression diagnosis at every visit. Although reporting depressive symptoms as a reason for the visit was an important determinant of whether the physician recorded a diagnosis of depression, several nonclinical factors also predicted a depression diagnosis during visits to primary care physicians.
These findings show that the rate at which diagnoses of depression are recorded during office visits is influenced by factors other than symptom presentation. Sex and age were significantly associated with a depression diagnosis. Although the prevalence of depression is higher among women,14 the likelihood that a depression diagnosis was recorded should not have varied greatly by sex after controlling for the patient’s reason for the visit. Yet it did: when a man and a woman presented to a primary care physician with the same symptoms, a diagnosis of depression was more likely to be recorded during the visit made by the woman. Similarly, a diagnosis of depression appears less likely to be recorded during visits made by older patients. During office visits by older persons, primary care physicians may simply attribute depressive symptoms to physical ailments or to normal aging. However, it is also possible that older patients are more likely than younger patients to report depressive symptoms that are actually due to other ailments.
African Americans were less likely than non-African Americans to have a depression diagnosis recorded during visits to primary care physicians, even after controlling for symptoms related to mood disorders, although this difference fell just short of statistical significance (P=.055). Primary care physicians may perceive a depression diagnosis as more stigmatizing for African American patients than for non-African American patients and thus choose not to assign it. It is also conceivable that primary care physicians do not interpret physical and mood symptoms in African American patients as indicative of depression because of preconceptions about these patients and their morbidities. The causes of racial differences in diagnosis rates cannot be determined from the NAMCS data set and warrant further study using other research strategies.
The duration of the visit was significantly associated with the probability that a depression diagnosis was recorded. Given that primary care physicians typically treat or monitor several conditions during a relatively short visit, it is not surprising that depression is recognized and diagnosed more often during longer visits. However, the direction of the relationship is unclear: depression may be recognized because the visit is longer, or visits by depressed patients may simply take longer. These cross-sectional data cannot establish which is the case. Again, further studies of how physicians arrive at diagnoses are needed.
Finally, a depression diagnosis was much more likely to be recorded during visits to family practice or general practice physicians than during visits to internists. One may speculate that this occurs because the training of family/general practice physicians focuses more extensively on the identification and treatment of psychosocial problems than does the training of internists. Only a third of internal medicine residency training directors were satisfied with the training their residents received in depression.24 Additionally, internists are much less likely than family physicians to consider themselves responsible for treating depression.10 Although the prevalence of depression may be greater among patients treated by family/general practice physicians than among those treated by internists, differences in the true prevalence of depression among physician practices could not be ascertained using these data. However, controlling for patient symptoms should have accounted for much of any such difference.
Limitations
The study’s findings should be interpreted cautiously because of several limitations of the data set. The analysis relied on diagnoses recorded during a nationally representative sample of physician office visits. Using diagnoses coded by primary care physicians sets a threshold that is not equivalent to recognition as it might be assessed by directly questioning the physicians. Also, because the NAMCS allows only 3 diagnoses to be recorded, a physician could have recognized depression but not recorded it because 3 other diagnoses were assigned higher priority. This may be especially likely during visits by elderly patients, who frequently have multiple conditions. However, over 80% of all visits had only 1 or 2 diagnoses recorded, suggesting that in most cases a depression diagnosis was not “crowded out.” Additionally, a sensitivity analysis restricted to visits with 1 or 2 recorded diagnoses found the same factors associated with a recorded depression diagnosis. The NAMCS also allows only 3 patient reasons for the visit to be recorded. If a patient had more than 3 reasons, only the top 3, as identified by the physician, were recorded, so important patient symptoms could have been excluded. Thus, the analysis could not perfectly control for all of the patients’ reasons for the visit, and this limitation should be kept in mind when interpreting these findings. Finally, the NAMCS does not record a patient’s history of depression, which might be an important clue for primary care physicians.
STUDY DESIGN: We used a cross-sectional design.
POPULATION: Data from the 1997 and 1998 National Ambulatory Medical Care Surveys were examined.
OUTCOMES MEASURED: We assessed the association of factors such as age, sex, race, physician specialty, type of insurance, and visit duration with a recorded depression diagnosis during office visits to primary care physicians.
RESULTS: After controlling for symptom presentation, primary care physicians were 56% less likely to record a diagnosis of depression during visits made by elderly patients, 37% less likely to do so during visits by African Americans, and 35% less likely to do so during visits by Medicaid patients. Visits with a depression diagnosis were, on average, 2.9 minutes longer in duration (16.4 vs 19.3) than visits without a depression diagnosis. Family practice and general practice physicians were 65% more likely to record a diagnosis of depression than internists.
CONCLUSIONS: Many factors were associated with making and recording a depression diagnosis beyond the patient’s reported symptoms. If rates of diagnosis are to improve, interventions that go beyond getting physicians to recognize the symptoms of depression are needed.
- Receipt of a recorded depression diagnosis during office visits to primary care physicians is dependent on patient age, race, and type of insurance.
- Family practice and general practice physicians are more likely than internists to record a depression diagnosis during office visits.
- Many factors beyond the patient’s reported symptoms are associated with making and recording a depression diagnosis.
Characteristics and Depression Diagnosis
Depression is a common disorder that significantly affects quality of life, functioning, and even mortality.1-4 However, as indicated in the Surgeon General’s Report on Mental Health, depression remains under-recognized and underdiagnosed.5 Most studies examining recognition of depression have focused on the role of symptom presentation, the use of screening tools, and physician educational interventions designed to improve symptom recognition.6 However, factors other than clinical presentation may be associated with the likelihood that depression is recognized during a physician visit.7,8 For example, patient age and race, type of insurance, and duration of the visit may increase or decrease the rate at which a depression diagnosis is recorded. Also, diagnostic rates may differ between family or general practice physicians and internists. If differences in diagnostic rates indeed occur because of extraclinical factors and current interventions continue to focus primarily on recognition of patients’ symptoms, certain patient groups will continue to be underdiagnosed and undertreated.
Given this concern about the range of factors possibly associated with receiving a depression diagnosis, we examined data from a nationally representative sample of office visits to physicians, the National Ambulatory Medical Care Survey. More specifically, we examined the independent role of factors such as age, sex, race, type of insurance, and duration of the visit on the probability that depression would be diagnosed during a patient’s visit to a primary care physician. Although the prevalence of depression is greater in women, there should not be a large difference in the likelihood that a depression diagnosis is recorded during an office visit after controlling for the patient’s reason for encounter. Similarly, if primary care physicians are recording diagnoses of depression based solely on the patient’s reasons for encounter, the likelihood that a depression diagnosis is recorded should be similar by age, even though there is a reported lower prevalence of major depression in elderly persons (minor depression is believed to occur more frequently in the elderly).9 Admittedly, however, some of the somatic symptoms associated with depression (eg, fatigue) are more likely to be due to a physical illness rather than depression in elderly patients. Thus, rates of diagnoses can should be slightly lower among elderly persons. However, because of primary care providers’ lack of confidence in assessing and diagnosing adults with depression1,10 and the tendency for older persons to present depressive symptoms in terms of somatic complaints,11,12 depression diagnoses are expected to be recorded much less frequently during visits by elderly persons, even after controlling for the patient’s reasons for the visit. Also, although African American patients have a lower reported prevalence and incidence rate of depression,13,14 one would expect depression diagnoses to be recorded at rates similar to those for other races after controlling for patient presentation of symptoms. 
Nevertheless, cultural stereotypes among providers may lead to depression diagnoses being recorded less frequently during these visits.15,16
With regard to practice factors affecting accurate diagnosis, since primary care physicians tend to schedule short patient visits and have many conditions to treat during those visits, we expected that the probability of a depression diagnosis being recorded would increase as the duration of the visit increased. Given competing demands for the physician’s awareness, depression often gets less attention during visits where the patient has a recent medical problem or even several of them.17 Finally, we expected family and general practice physicians to diagnose depression more often than internists. Family practice physicians express more responsibility for treating depression, tend to have more complete knowledge of available treatments, and are more confident in managing a mood disorder.10
Methods
Data
The study used data from the 1997 and 1998 National Ambulatory Medical Care Surveys (NAMCS). The NAMCS, which have been conducted every year since 1989 by the National Center for Health Statistics (NCHS), sample a nationally representative group of visits to physicians in office-based practices. The NCHS included weights in the NAMCS to enable the sample to represent all office visits in the United States. A detailed description of the NAMCS sample and sampling procedure, as well as a description of the survey instrument and survey administration procedures, is provided elsewhere.18,19
There were 24,715 visits sampled in 1997 and 23,339 visits sampled in 1998. For each office visit, the survey provided information on physician specialty, up to 3 diagnoses, and up to 3 patient reasons for the visit. Because there were fewer than 200 visits with a diagnosis of depression sampled in each year, we combined the data from 1997 and 1998 to increase the power of the analysis. We limited our analysis to the 17,058 visits made during this interval by adults 18 years and older to primary care physicians. Primary care physicians included physicians with specialties of family practice, general practice, or internal medicine. Item nonresponse rates in the NAMCS data are low (<5%), and the NCHS provides imputed values for any missing information on demographic variables and duration of the visit in the NAMCS data.19
Diagnostic Groups
Patients were categorized on the basis of diagnoses assigned by providers during the index visit, using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We classified depression visits as those with ICD-9 codes of 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent), 300.4 (neurotic depression), 311 (depressive disorder, not elsewhere classified), and 298.0 (depressive type psychosis).
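The case definition above amounts to a simple code-matching rule. The following is a minimal Python sketch that assumes diagnoses arrive as dotted ICD-9-CM strings (eg, "296.22"). The function and constant names are hypothetical. Matching 296.2 and 296.3 by prefix, so that fifth-digit subcodes are included, is our assumption and not a documented NAMCS rule.

```python
from typing import Sequence

# ICD-9-CM categories used to define a depression visit (taken from the text).
# Prefix matching is an ASSUMPTION so that fifth-digit subcodes such as
# "296.22" fall under 296.2; the text does not say how subcodes were handled.
DEPRESSION_PREFIXES = (
    "296.2",  # major depressive disorder, single episode
    "296.3",  # major depressive disorder, recurrent
    "300.4",  # neurotic depression
    "298.0",  # depressive type psychosis
)
DEPRESSION_3_DIGIT = "311"  # depressive disorder, not elsewhere classified


def is_depression_code(code: str) -> bool:
    """Return True if a dotted ICD-9-CM code string falls in one of the categories."""
    code = code.strip()
    if code == DEPRESSION_3_DIGIT or code.startswith(DEPRESSION_3_DIGIT + "."):
        return True
    # str.startswith accepts a tuple and matches if any prefix matches.
    return code.startswith(DEPRESSION_PREFIXES)


def is_depression_visit(diagnoses: Sequence[str]) -> bool:
    """A visit counts if any of its recorded diagnoses (NAMCS keeps at most 3) qualifies."""
    return any(is_depression_code(c) for c in diagnoses[:3])


print(is_depression_visit(["401.9", "296.22"]))  # True
print(is_depression_visit(["296.4", "300.02"]))  # False
```

Only the first 3 diagnoses are inspected, which mirrors the survey's 3-diagnosis limit discussed in the sensitivity analysis.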
Patient and Visit Characteristics
Information on patient age, sex, race, and ethnicity was recorded in the NAMCS survey, as was information on whether the visit was prepaid or fee-for-service and on type of insurance coverage (eg, private, Medicaid, Medicare). The survey reported physician specialty; we classified primary care physicians into 2 groups: family practice/general practice and internal medicine. The survey also indicated whether the physician had seen the patient previously. Up to 3 reasons for the visit, as stated by the patient, were collected at the time of the visit. Self-reported depressive symptoms were divided into 3 categories: (1) depressed mood, (2) physical symptoms of depression (eg, tiredness, general weakness or ill feeling, weight loss, restlessness, disturbance of sleep, abnormal appetite), and (3) other psychiatric symptoms associated with depression (eg, nervousness, fears and phobias, problems with self-esteem and identity, disturbance of memory, social adjustment problems, intentional self-mutilation, and suicidal ideation). The number of medications prescribed during the visit and the duration of the visit were also recorded in the survey and used in the analysis.
Analysis
We sought to examine the role of patient and visit characteristics on the probability that a depression diagnosis was recorded during an office visit to a primary care physician. Specifically, we investigated the independent effect of factors such as age, race, sex, type of insurance, and duration of the visit on the probability of receiving a depression diagnosis, after controlling for patient-reported symptoms of depression, physician specialty, and other patient characteristics. Factors associated with having a depression diagnosis recorded were identified using weighted logistic regression models, and adjusted odds ratios and their 95% confidence intervals were calculated. To identify statistically significant differences in recognition rates, we reduced the sample weights by the proportion needed to downweight the sample to the size of a simple random sample with the same variance.20 Although this method does not address problems caused by clustering within strata, it tends to overcompensate rather than undercompensate for artifacts produced by stratification.21 Significance was assessed by testing the coefficients with a χ² test.
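The weight adjustment described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and is not the authors' code. The design effect `deff` is treated as a known input, because the text does not say how the variance of the equivalent simple random sample was obtained. The helper names are hypothetical, and the model is fitted by Newton-Raphson, with standard errors taken from the inverse weighted information matrix. Rescaling all weights by a constant leaves the coefficients, and therefore the odds ratios, unchanged. It only inflates their standard errors by about sqrt(deff), which is what makes the χ² tests more conservative.

```python
import math

import numpy as np


def rescale_weights(weights, deff):
    """Scale survey weights so they sum to the effective sample size n / deff.

    ASSUMPTION: deff (the design effect) is supplied by the analyst.
    """
    w = np.asarray(weights, dtype=float)
    n_eff = w.size / deff
    return w * (n_eff / w.sum())


def weighted_logit(x, y, w, max_iter=100, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson.

    Returns (coefficients, standard errors), intercept first. Standard errors
    come from the inverse of the weighted information matrix, so they grow
    when the total weight shrinks.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Evaluate the information matrix at the final estimates.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (w * p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def wald_chi2(coef, se):
    """1-df Wald chi-square statistic for H0: coef == 0, with its p-value."""
    chi2 = (coef / se) ** 2
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Toy data (not NAMCS data): one binary predictor x and a binary outcome y.
x = np.array([1] * 100 + [0] * 100)
y = np.array([1] * 30 + [0] * 70 + [1] * 10 + [0] * 90)
raw = np.ones(200)

beta1, se1 = weighted_logit(x, y, rescale_weights(raw, deff=1.0))
beta2, se2 = weighted_logit(x, y, rescale_weights(raw, deff=2.0))
odds_ratio = math.exp(beta1[1])  # equals (30/70) / (10/90) for this table
ci = (math.exp(beta2[1] - 1.96 * se2[1]), math.exp(beta2[1] + 1.96 * se2[1]))
chi2, p_value = wald_chi2(beta2[1], se2[1])
```

With deff=2 the odds ratio is the same as with deff=1, but its confidence interval is wider and its χ² statistic is halved.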
A sensitivity analysis was performed. We were concerned that patients with multiple medical conditions may be less likely to have a depression diagnosis recorded in the NAMCS because the survey only allows for 3 recorded diagnoses, and because these patients may not be randomly distributed by age, sex, race, type of physician, and so forth. A weighted logistic regression analysis was conducted on the subset of visits that recorded only 1 or 2 diagnoses (N=14,135). This should eliminate visits in which depression was recognized but a diagnosis was not recorded because 3 other conditions were perceived to be more important by the physician. The results of this analysis were then compared with results based on the full sample.
Results
Of the 17,058 visits made by adults to primary care physicians included in the 1997-1998 NAMCS samples, 358 included a diagnosis of depression (Table 1). Using the weights provided by the NCHS, we therefore estimated that there were 20.2 million office visits to primary care physicians with a recorded diagnosis of depression in 1997 and 1998, representing 2.4% of all visits to primary care physicians. According to the multivariate analysis, however, the rate at which depression was diagnosed varied significantly by several patient and visit characteristics.
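The national figures follow from the NCHS visit weights. In our own notation, let w_i be the weight of sampled visit i, and let d_i = 1 if a depression diagnosis was recorded at that visit and 0 otherwise. The usual weighted estimates of the number and proportion of depression visits would then be as follows. This is an assumption about the estimator; the text does not give the formula.

```latex
\hat{N}_{\mathrm{dep}} = \sum_{i=1}^{n} w_i d_i,
\qquad
\hat{p}_{\mathrm{dep}} = \frac{\sum_{i=1}^{n} w_i d_i}{\sum_{i=1}^{n} w_i},
\qquad n = 17{,}058
```

Here 358 of the sampled visits have d_i = 1. The reported estimates are about 20.2 million visits and 2.4% of all primary care visits.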
As we postulated, the data in Table 2 indicate that the probability that a depression diagnosis was recorded during an office visit is significantly related to the patient’s reason for the visit: depression was diagnosed over 40 times more often during visits in which the patient reported depression as a reason for the visit. A depression diagnosis was also 3.4 times more likely to be recorded if the patient reported physical symptoms of depression as a reason for the visit and 4.9 times more likely if the patient reported other psychiatric symptoms associated with depression. However, even after controlling for the reasons for the visit, significant differences in the rate of depression diagnoses were observed by age, sex, and duration of the visit. Primary care physicians were 56% less likely to diagnose depression during visits made by elderly patients, and depression diagnoses were recorded more frequently during visits made by women. Although the results are not reported in Table 2, we also tested for significant interactions of age with sex, race, or ethnicity. We found a significant interaction of age and sex: elderly women were less likely to be considered depressed than elderly men (P=.01). Duration of the visit was also significantly associated with the rate at which depression diagnoses were recorded, with such diagnoses being recorded 1% more often for each additional minute of the visit. Visits during which a diagnosis of depression was recorded averaged 19.3 minutes, compared with 16.4 minutes for visits in which this diagnosis was not recorded.
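The percentages in this paragraph are presumably derived from the odds ratios of the logistic models. The following is a minimal sketch of the conversions. It rests on an assumption that the text does not state: that "X% less/more likely" means a percentage change in the odds of (OR − 1) × 100. Under that reading, "56% less likely" corresponds to OR ≈ 0.44, and "1% more often per minute" corresponds to a per-minute OR of about 1.01. The helper names are hypothetical.

```python
def pct_change_from_or(odds_ratio):
    """Percentage change in the odds implied by an odds ratio: (OR - 1) x 100."""
    return (odds_ratio - 1.0) * 100.0


def or_from_pct_change(pct):
    """Inverse of pct_change_from_or, eg -56 (% change) -> OR 0.44."""
    return 1.0 + pct / 100.0


def or_for_k_units(or_per_unit, k):
    """OR for a k-unit increase in a continuous predictor of a logit model.

    The log-odds are linear in the predictor, so the log-OR scales with k:
    OR_k = exp(k * beta) = OR_1 ** k (multiplicative, not additive).
    """
    return or_per_unit ** k


elderly_or = or_from_pct_change(-56)            # "56% less likely" -> about 0.44
minute_or = or_from_pct_change(1)               # "1% more often" per minute -> 1.01
ten_minute_or = or_for_k_units(minute_or, 10)   # compounds to about 1.105, not 1.10
```

A depression diagnosis was recorded in only 2.4% of visits, so for most comparisons these odds ratios should approximate relative risks. The approximation is poorer in any subgroup where a depression diagnosis is common, which may include visits in which patients reported depression as a reason for the visit.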
Differences in the rate at which depressive diagnoses were recorded were also observed by race and type of insurance coverage, although these differences did not achieve statistical significance at the P<.05 level. A diagnosis of depression was recorded 37% (P=.055) less often during visits by African Americans and 35% (P=.08) less often during visits by Medicaid patients. After controlling for age, a diagnosis of depression was recorded 35% (P=.07) more often during visits by Medicare patients than during visits by patients with private insurance. Large differences in the rate at which a depression diagnosis was recorded were also observed by physician specialty: family practice and general practice physicians were 65% (P<.001) more likely to record a diagnosis of depression than internists. Similar results were observed in the sensitivity analysis performed only on visits with 1 or 2 recorded diagnoses.
Discussion
Given that the prevalence of depression in epidemiologic studies is reported to approximate 12% to 18% in primary care practice,22,23 one would expect to see a depression diagnosis recorded more frequently than in 2.4% of office visits. Admittedly, depressed patients are likely to see their physicians for reasons other than their depression and may therefore not receive a depression diagnosis during each visit. Although reporting of depressive symptoms as the reason for the visit was an important determinant of whether or not a diagnosis of depression was recorded by the physician, there were several other nonclinical factors that predicted a depression diagnosis during visits to primary care physicians.
These findings show that the rate at which diagnoses of depression are recorded during office visits is influenced by factors other than symptom presentation. Sex and age were significantly associated with a depression diagnosis. Although the prevalence of depression is higher among women,14 the likelihood that a depression diagnosis was recorded should not have varied greatly by sex after controlling for the patient’s reason for the visit. Yet it did: when a man and a woman presented to a primary care physician with the same symptoms, a diagnosis of depression was more likely to be recorded during the visit made by the woman. Similarly, a diagnosis of depression appeared less likely to be recorded during visits made by older patients. During office visits by older persons, primary care physicians may simply attribute depressive symptoms to physical ailments or the normal aging process. However, it is also possible that older patients are more likely than younger patients to report depressive symptoms that are actually due to other ailments.
African Americans were less likely than non-African Americans to have a depression diagnosis recorded during visits to primary care physicians, even after controlling for symptoms related to mood disorders. Primary care physicians may perceive a depression diagnosis as more stigmatizing for African American patients than for other patients and thus choose not to assign it. It is also conceivable that primary care physicians do not interpret physical and mood symptoms in African American patients as indicative of depression because of preconceptions about these patients and their morbidities. The causes of racial differences in diagnosis rates cannot be determined from the NAMCS data set and warrant further study with different research strategies.
The duration of the visit was significantly associated with the probability that a depression diagnosis was recorded. Given that primary care physicians typically treat or monitor several conditions during a relatively short visit, it is not surprising that depression is recognized and diagnosed more often during longer visits. However, depression was not necessarily recognized because the visit was longer; visits by depressed patients may simply take longer. The causal direction cannot be determined from these data, and further studies of the physician’s diagnostic process are needed.
Finally, a depression diagnosis was much more likely to be recorded during visits to family practice or general practice physicians than during visits to internists. This may be because the training of family and general practice physicians focuses more extensively on the identification and treatment of psychosocial problems than does the training of internists. Only a third of training directors for internal medicine residencies were satisfied with the training their residents received with regard to depression.24 Additionally, internists are much less likely than family physicians to consider themselves responsible for treating depression.10 The prevalence of depression may be greater among patients treated by family and general practice physicians than among those treated by internists. However, differences in the true prevalence of depression among physician practices could not be ascertained from these data, and controlling for patient symptoms should have accounted for much of any difference in prevalence.
Limitations
The study’s findings should be interpreted cautiously because of various limitations of the data set. This analysis relied on diagnoses of depression recorded during a nationally representative sample of physician office visits. A coded diagnosis sets a threshold that is not equivalent to recognition as it might be assessed by directly asking physicians. Also, because the NAMCS allows only 3 diagnoses to be recorded, a physician conceivably recognized depression but did not record it because a higher priority was assigned to 3 other diagnoses. This may well occur during visits by elderly patients, who frequently have multiple conditions. However, over 80% of all visits had only 1 or 2 diagnoses recorded, suggesting that in most cases a depression diagnosis was not “crowded out.” Additionally, a sensitivity analysis conducted only on visits with 2 or fewer recorded diagnoses found the same factors associated with a recorded depression diagnosis. The NAMCS also records only 3 patient reasons for the visit. If a patient had more than 3 reasons, only the top 3, as identified by the physician, were recorded, so important patient symptoms could have been excluded. Thus, the analysis could not perfectly control for all of the patients’ reasons for the visit, and this limitation should be kept in mind when interpreting these findings. Another limitation is that the NAMCS does not record the patient’s history of depression, which might be an important clue for primary care physicians.
Conclusions
There are many factors associated with physician recording of a depression diagnosis beyond the patient’s reported symptoms. Therefore, if rates of diagnosis of depression in office-based practice are to more closely approximate the true prevalence of the disorder, interventions are needed that go beyond simply helping physicians to better recognize the symptoms of depression. A recent review found that approximately one fourth of interventions designed to increase recognition and management of depression had no effect on diagnosis and treatment rates.6 Their effectiveness might be improved by designing more focused interventions that target African American and elderly patients, who are currently assigned low rates of depressive diagnoses in primary care. This is a particularly high priority, since both African American and elderly patients are more likely to seek treatment in the primary care sector than in the mental health specialty sector. Solberg and colleagues25 found that primary care physicians viewed systematic screening unfavorably but were supportive of alternative approaches, such as external feedback about the care that they provide. Thus, feedback about differences in age- and race-specific rates could provide the impetus needed for primary care physicians to alter their assessment procedures and clinical formulations for these underrecognized groups of patients. Finally, intervention efforts might focus on the unique manner in which internists formulate psychiatric diagnoses, since recognition rates for depression are unduly low in this specialty group.
Acknowledgments
This research was supported in part by National Institute of Mental Health grants P30 MH3095, P30 MH52247, R25 MH60473, K01 MH01613, and R01 MH59318.
1. Unutzer J, Katon W, Sullivan M, Miranda J. Treating depressed older adults in primary care: narrowing the gap between efficacy and effectiveness. Milbank Q 1999;77:225-56.
2. Penninx BW, Guralnik J, et al. Depressive symptoms and physical decline in community dwelling older persons. JAMA 1998;279:1720-26.
3. Penninx B, Geerlings S, Deeg D, van Eijk J, van Tilburg W, Beekman A. Minor and major depression and the risk of death in older persons. Arch Gen Psychiatry 1999;56:889-95.
4. Rovner B, German P, Brant L, Clark R, Burton L, Folstein M. Depression and mortality in nursing homes. JAMA 1991;265:993-96.
5. US Department of Health and Human Services. Mental health: a report of the Surgeon General. Rockville, Md: US Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Center for Mental Health Services, National Institutes of Health, National Institute of Mental Health; 1999.
6. Kroenke K, Taylor-Vaisey A, Dietrich AJ, Oxman TE. Interventions to improve provider diagnosis and treatment of mental disorders in primary care: a critical review of the literature. Psychosomatics 2000;41:39-52.
7. Klinkman M, Coyne J, Gallo S, Schwenk T. False positives, false negatives, and the validity of the diagnosis of major depression in primary care. Arch Fam Med 1998;7:451-61.
8. Rost K, et al. The deliberate misdiagnosis of major depression in primary care. Arch Fam Med 1994;3:333-37.
9. Eaton W, Anthony J, Gallo J, et al. Natural history of Diagnostic Interview Schedule/DSM-IV major depression: the Baltimore Epidemiologic Catchment Area Follow-up. Arch Gen Psychiatry 1997;54:993-99.
10. Williams JW, Rost K, Dietrich AJ, Ciotti MC, Zyzanski SJ, Cornell J. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med 1999;8:58-67.
11. Caine E, Lyness J, King D, Connors L. Clinical and etiological heterogeneity of mood disorders in elderly patients. In: Schneider L, Reynolds C, Lebowitz B, Friedhoff A, eds. Diagnosis and treatment of depression in late life: results of the NIH Consensus Development Conference. Washington, DC: American Psychiatric Association; 1994:21-54.
12. Gallo J, Rabins P, Anthony J. Sadness in older persons: 13-year follow-up of a community sample in Baltimore, Maryland. Psychol Med 1999;29:341-50.
13. Gallo J, Royall D, Anthony J. Risk factors for the onset of major depression in middle age and late life. Soc Psychiatry Psychiatr Epidemiol 1993;28:101-08.
14. Kessler R, McGonagle K, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry 1994;51:8-19.
15. Gallo J, Cooper-Patrick L, Lesikar S. Depressive symptoms of whites and African Americans aged 60 years and older. J Gerontol: Psychol Sci 1998;53B:277-86.
16. Cooper-Patrick L, Gallo J, Gonzalez J, et al. Race, gender, and partnership in the patient-physician relationship. JAMA 1999;37:1034-45.
17. Rost K, Nutting P, Smith J, Coyne JC, Cooper-Patrick L, Rubenstein L. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000;9:150-54.
18. Bryant E, Shimizu I. Sampling design, sampling variance, and estimation procedures for the National Ambulatory Medical Care Survey. Vital Health Stat 2 1988;108:1-39.
19. Woodwell DA. National Ambulatory Medical Care Survey: 1998 summary. Advance data from vital and health statistics. Hyattsville, Md: National Center for Health Statistics; 2000.
20. Potthoff R, Woodbury M, Manton K. ‘Equivalent sample size’ and ‘equivalent degrees of freedom’ refinements for inference using survey weights under superpopulation models. J Am Stat Assoc 1992;87:383-96.
21. Leaf P, Myers J, McEvoy L. Procedures used in the epidemiologic catchment area study. In: Robins L, Regier D, eds. Psychiatric Disorders of America: The Epidemiologic Catchment Area Study. New York, NY: The Free Press; 1991.
22. Brown C, Schulberg HC. Diagnosis and treatment of depression in primary medical care practice: the application of research findings to clinical practice. J Clin Psychol 1998;54:303-14.
23. Olfson M, Shea S, Feder A, et al. Prevalence of anxiety, depression, and substance use disorders in an urban general medicine practice. Arch Fam Med 2000;9:876-83.
24. Sullivan M, Cole S, Gordon G, Hahn S, Kathol R. Psychiatric training in medicine residencies: current needs, practices and satisfaction. Gen Hosp Psychiatry 1996;18:95-101.
25. Solberg L, Korsen N, Oxman T, Fischer L, Bartels S. The need for a system in the care of depression. J Fam Pract 1999;48:973-79.
Why Some Cancer Patients Choose Complementary and Alternative Medicine Instead of Conventional Treatment
STUDY DESIGN: This was a qualitative interview study.
POPULATION: Fourteen cancer survivors who reported having declined all or part of the recommended conventional treatment (surgery, chemotherapy, or radiation) were included. The participants were a subset of a multiethnic (Asian, Native Hawaiian, and white) group of 143 adults diagnosed with cancer in 1995 or 1996 who were recruited through a population-based tumor registry and interviewed about complementary and alternative medicine (CAM).
OUTCOMES MEASURED: We performed semistructured interviews regarding experience with conventional cancer treatment and providers, use of CAM, and beliefs about disease.
RESULTS: All participants used 3 or more types of CAM, most commonly herbal or nutritional supplements. Every participant stated that they declined conventional treatment to avoid damage or harm to the body. The majority also felt that conventional treatment would not make a difference in disease outcome, and some, but not all, perceived an unsatisfactory or alienating relationship with health care providers. Some participants reported that their discovery of CAM contributed to their decision to decline conventional treatment, and participants generally perceived CAM as an effective and less harmful alternative to conventional treatment.
CONCLUSIONS: Cancer patients may benefit from interventions (eg, patient education, improvements in physician-patient communication, and psychological therapy) to facilitate treatment decision making through increased understanding of conventional and CAM treatments and to identify barriers to treatment for individual patients.
- Factors that participants described as influencing the decision to decline conventional cancer treatment included beliefs about harm, possible death, and side effects, and the belief in or discovery of CAM as an effective alternative.
- Participants found CAM to be more effective and less harmful than conventional treatment.
- Participants cited personal, medical, and anecdotal evidence, as well as belief, for the effectiveness of CAM.
- Participants reported positive or neutral interactions with health care providers regarding their use of CAM.
- Participants reported negative interactions or possible missing communication with health care providers as being factors in their decision to decline conventional treatment.
Although noncompliance with or refusal of cancer treatment is a serious concern and has been shown to reduce the effectiveness of treatment and shorten survival after diagnosis,1-4 the phenomenon itself has received little study. Existing studies report rates of less than 1% for patients refusing all treatment,4 12.5% for patients refusing chemotherapy,5 and 20% for patients refusing treatment for hematologic malignancy.6 Proposed reasons for noncompliance include patients’ fear of the adverse side effects of cancer treatment, uncertainty, hopelessness, loss of control, denial of illness, psychiatric disorders, patient-physician relationship and communication issues, and medical system dysfunctions.4,5,7-10
It has been hypothesized that individuals who choose complementary and alternative medicine (CAM) are more likely to forgo medical treatment than other patients.11 However, studies among noncancer populations have found that only a small percentage (3% to 4%) rely primarily on CAM.12-14 The few studies reporting rates of treatment refusal among cancer populations have found higher percentages (8% to 20%) of patients using CAM exclusively or ceasing conventional treatment in favor of CAM,15,16 but the reasons for these decisions are unclear. In one study, primary reliance on CAM for a variety of noncancer disorders was associated with distrust of or dissatisfaction with conventional medicine and physicians, as well as the need to seek control over health.12 Some speculate that because of the extreme nature of most standard cancer treatment, patients may decline medical care in favor of CAM therapies that have few or no side effects.15,17,18
In a recent qualitative study of 8 Canadian cancer patients who abandoned biomedical treatment in favor of CAM, Montbriand19 found themes of anger and fear, need for control, belief in CAM as a cure, social support for CAM, cost considerations, and mystical insights into health care. That study provided an initial understanding of the concerns of cancer patients who refuse conventional treatment and choose CAM, but it was limited by its small, homogeneous sample. More diverse samples are needed to cross-validate Montbriand’s findings and to uncover additional reasons. In this study, we describe themes that emerged from interviews with a multiethnic group of 14 participants who discussed their reasons for declining conventional cancer treatment and choosing CAM.
Methods
Recruitment
The participants in this analysis were identified through a population-based tumor registry and initially surveyed by mail as part of a larger study investigating ethnic differences in alternative medicine use among patients diagnosed with cancer in Hawaii in 1995 or 1996.20 Among those who returned the survey (n=1168), 439 (32%) volunteered to be interviewed. Because we were primarily interested in the diversity of experiences of CAM users, a heterogeneous group of 143 interview subjects was selected on the basis of CAM use, geographic area, ethnicity, and cancer site. For this analysis, we included only those interview participants (n=14) who reported declining all or part of conventional treatment for cancer while using CAM.
The mean age of participants was 52.5 years (standard deviation = 14.1; range = 43-92); 9 were women, and 6 were married. Nine participants were white and 5 were Asian or Pacific Islander (Chinese, Filipino, Japanese, or Native Hawaiian). Participants were well educated, and the majority had past or present professional, managerial, or technical occupations. Five were retired at the time of the interview. Eight of the participants had breast cancer, and the rest had gastrointestinal cancer (3), prostate cancer (2), or skin cancer (1). Most of the participants had localized disease. The stage of disease was unknown for 4 participants because they had declined procedures (eg, lymph node excision, exploratory surgery) to determine stage. Six participants reported that they had refused all conventional treatment (3 with localized disease and 3 unstaged). Five participants reported undergoing surgery for the cancer but rejecting all further treatment. Three participants had surgery and chemotherapy or radiation but reported refusing further treatment (eg, a second surgery) that their physician considered necessary.
Procedure
Three human subjects research committees approved the research protocol. One- to 2-hour tape-recorded interviews were conducted in person at the participant’s home or another location in late 1998 or early 1999. All participants were compensated with a $20 gift certificate, and all gave signed informed consent.
Outcome measures
The semistructured interviews covered (a) demographics, (b) satisfaction with health care providers, (c) conventional treatments received for cancer and satisfaction, (d) types of CAM used for cancer and satisfaction, and (e) perceptions about cancer and cancer treatments.
After reading all the interview transcripts, the research team engaged in an iterative process in which we coded the text according to the nature of the information, developed hypotheses, and then translated the codes into categories.21 Responses were coded using NUD*IST 4,22 a software package for qualitative analysis. We assigned codes for (a) reasons for rejecting conventional treatments, (b) types of CAM used, (c) reasons for choosing CAM, (d) beliefs about CAM’s effectiveness, and (e) communication with physicians. As a triangulation technique21 and to aid in describing the sample, we also included quantitative data (ie, demographics, disease characteristics, and types of CAM used) from the survey and from the tumor registry.
Results
All 14 participants used 3 or more types of CAM (median, 8; maximum, 14; Table 1). All took some herbal or botanical supplement, 11 reported diet changes, and 7 used meditation or relaxation. Two participants attended CAM cancer clinics for intravenous therapy. One participant worked with a Native Hawaiian healer, with whom she learned to gather and prepare traditional herbal remedies.
Three broad categories of themes emerged in the analysis: (1) beliefs about conventional treatment, (2) interactions with treatment providers, and (3) beliefs about CAM as an alternative to conventional treatment. Supporting quotes from participants are shown in Tables 2 and 2a.
Beliefs About Conventional Treatment
Conventional Treatment Is Harmful. When asked to describe their reasons for declining conventional cancer treatment, participants described many ways in which chemotherapy and radiation were harmful, including damaging cells, weakening the immune system, and inhibiting recovery. At the extreme, some participants believed that conventional treatment would be fatal for them. Those who declined either a first (n=6) or a second (n=2) surgery commonly expressed concerns about mutilation (being “cut”) and the debilitating effects of surgery. A number of participants were concerned that conventional treatment would increase their risk of future cancer. Participants also mentioned being deterred from conventional treatment by possible side effects, previous negative experiences with a treatment, or knowing someone who had died from the treatment.
Conventional Treatment Will Not Improve Outcome. Several participants felt that conventional treatment was unlikely to make a difference in disease outcome, either because of limitations inherent in conventional treatment or because of the particular characteristics of their disease. Participants often cited their belief that conventional treatment offered no guarantee of a cure. Although none of the participants disputed the validity of their cancer diagnosis, a few believed that further treatment was unnecessary because the cancer had been eliminated by their initial treatment. One participant proposed that fate, not treatment, would decide her disease outcome.
Interactions with Treatment Providers
Nearly all participants (12 of 14) stated that they had informed at least one of their physicians about their CAM use, and 9 reported that their physicians were either supportive or neutral about it. In the context of decisions about conventional treatment, however, participants expressed feelings that physicians could not be trusted, that physicians did not listen to their needs, and that medical professionals were hostile or threatening about participants’ treatment choices. Participants’ responses also suggested missed opportunities for patient-physician communication about both conventional treatment and CAM. A minority of participants described feeling alienated from the medical community.
Beliefs About CAM as Alternative to Conventional Treatment
CAM Contributed to Decision to Decline. The perception that CAM offered a feasible alternative to conventional treatment appeared to assist participants in making the decision to go against their physicians’ recommendations. In 6 cases, the actual decision to refuse conventional treatment appeared to be facilitated by the discovery or knowledge of CAM.
CAM Is Better than Conventional Treatment. In many cases, participants perceived the CAM option as considerably less aversive than conventional treatment or as making more “intuitive” sense. A common viewpoint was that conventional treatment and CAM have different methods and purposes. Participants described CAM as working naturally with the body’s own resources to promote healing, and conventional treatment as short-sighted, merely attacking symptoms without addressing underlying imbalances.
CAM Is Effective. In choosing CAM as an alternative to conventional treatment, participants stated that they were satisfied with its effectiveness and described several sources of evidence: personal evidence (most frequently cited), medical evidence, anecdotal evidence, and belief. For participants, the personal experience of continuing to be alive, feeling well, or having subjective improvement in symptoms was proof that a particular CAM treatment worked. Participants also used medical evidence (eg, PSA tests or mammography) to demonstrate that their condition had improved and attributed this improvement to CAM. Anecdotal evidence based on others’ reported benefits from CAM was sufficient for at least one participant to state that she felt CAM was effective. A number of participants stated that they had no demonstrable evidence of CAM’s effectiveness, such as improved symptoms or medical evidence, but nonetheless continued to believe that CAM was working for them. Their reasoning included statements that the particular CAM made logical sense to them and therefore “must work,” or that they had a long history of belief in the benefits of CAM. Only one participant admitted that she was not sure whether CAM had helped her.
Discussion
A predominant theme in our analysis was that participants perceived CAM as a harmless, natural, and effective alternative to the damaging effects of conventional cancer treatment. In participants’ views, conventional treatment offered no guarantee of a cure while almost certainly causing harm and, for some, possibly death. Participants felt that CAM had a positive effect on their overall health, and with a few exceptions they were confident in CAM’s ability to cure their cancer or prevent recurrence. The quality of physician-patient communication was also a factor in participants’ decisions to decline conventional treatment. Although participants reported both positive and negative experiences with medical staff, the more negative perceptions, including distrust, lack of response, and perceived hostility from health care providers, may have further alienated participants from the medical community.
A study by Astin reported similar predictors for primary reliance on CAM in the general population (lack of trust and dissatisfaction with conventional treatment and providers, and belief in the efficacy of CAM).12 Astin also observed that CAM was perceived as promoting health, while conventional treatment focused on the illness, a belief expressed by several of our participants. While the desire for control over health was a predictor in Astin’s study, this did not emerge as a theme in our analysis.
With an ethnically diverse sample, our analysis provides cross-validation for several themes observed by Montbriand19: difficulty communicating with health care providers, previous negative experiences with medical care, belief in a cure from CAM, and lack of hope for a cure from biomedical therapies. Montbriand’s themes of expressed stress, patients’ need to take control of treatment, and mystical insights into health care also appear to have some similarities to our results, whereas the influence of social support and cost considerations on CAM use was less evident in our analysis. Also, unlike the Montbriand study, our analysis found that participants reported supportive as well as negative health care interactions regarding CAM use, described sources of evidence for CAM’s effectiveness (personal, medical, anecdotal, and belief), and expressed the belief that CAM offered an opportunity to avoid the harmful effects of conventional treatment.
This analysis is qualitative and based on the self-reports of a small sample of 14 participants, so the generalizability of the findings is limited. The results are further limited because participants were primarily cancer survivors in relatively good health. However, the qualitative method allowed us to investigate a relatively rare and seldom-studied population: cancer treatment decliners.
Future research should include participants with more advanced cancers, because compliance with treatment may depend on patients’ expectations of how their disease is likely to progress.23
Our findings have a number of clinical implications. Given some of the participants’ accounts of interactions with medical professionals, it is possible that they did not fully understand their treatment options, including their chances of experiencing serious or debilitating consequences of conventional treatment, and may have overestimated those consequences. A better understanding of individual patients’ concerns about conventional treatment can guide health care professionals in framing recommendations when talking to patients. Although patients should be made as aware as possible of the pros and cons of all options for cancer treatment, including conventional methods, CAM, or no treatment, patient education alone is not sufficient. Our findings, as well as those of Montbriand,19 indicate that fear and anxiety may be issues for patients who decline conventional treatment in favor of CAM. Some patients may require psychological and health behavior interventions aimed at improving adjustment and coping with cancer and at addressing motivational and emotional barriers to compliance. Finally, treatment decision making is an ongoing process: treatment decliners may choose conventional cancer treatment at a later date if given adequate support, information, and time to make the decision.23 Even if patients have declined oncologic care, they may continue to see their primary care and family physicians. Patients need to feel that they have not been permanently excluded from the health care system, even if they make choices that are contrary to the recommendations of their medical team.
Acknowledgments
We thank all participants for taking the time and effort to respond to our questionnaire and to participate in the interviews. The help of Marc Goodman, PhD, and the staff of the Hawaii Tumor Registry is greatly appreciated. We also thank our research team, including Professor Thomas Maretzki, Yvonne Tatsumura, Katsuya Tasaki, Tammy Brown, Carole Prism, and David Henderson, for their help with transcription and analysis. This research was supported by a special study grant from the National Cancer Institute Surveillance, Epidemiology, and End Results program under contract number N01-PC67001.
1. Hoagland AC, Morrow GR, Bennett JM, Carnrike CL Jr. Oncologists’ views of cancer patient noncompliance. Am J Clin Oncol 1983;6:239-44.
2. Li BD, Brown WA, Ampil FL, Burton GV, Yu H, McDonald JC. Patient compliance is critical for equivalent clinical outcomes for breast cancer treated by breast-conservation therapy. Ann Surg 2000;231:883-89.
3. Bonadonna G, Valagussa P. Dose-response effect of adjuvant chemotherapy in breast cancer. N Engl J Med 1981;304:10-15.
4. Huchcroft SA, Snodgrass T. Cancer patients who refuse treatment. Cancer Causes Control 1993;4:179-85.
5. Levin M, Mermelstein H, Rigberg C. Factors associated with acceptance or rejection of recommendation for chemotherapy in a community cancer center. Cancer Nurs 1999;22:246-50.
6. Evans SH, Clarke P. When cancer patients fail to get well: flaws in health communication. Beverly Hills, Calif: Sage Publications; 1983:225-48.
7. Richardson JL, Sanchez K. Compliance with cancer treatment. In: Holland JC, ed. Psycho-oncology. New York, NY: Oxford University Press; 1998:67-77.
8. Kunkel EJ, Woods CM, Rodgers C, Myers RE. Consultations for ‘maladaptive denial of illness’ in patients with cancer: psychiatric disorders that result in noncompliance. Psychooncology 1997;6:139-49.
9. Goldberg RJ. Systematic understanding of cancer patients who refuse treatment. Psychother Psychosom 1983;39:180-89.
10. Appelbaum PS, Roth LH. Patients who refuse treatment in medical hospitals. JAMA 1983;250:1296-301.
11. Lowenthal RM. Alternative cancer treatments. Med J Aust 1996;165:536-37.
12. Astin JA. Why patients use alternative medicine: results of a national study. JAMA 1998;279:1548-53.
13. Eisenberg DM, Kessler RC, Foster C, Norlock FE, Calkins DR, Delbanco TL. Unconventional medicine in the United States: prevalence, costs, and patterns of use. N Engl J Med 1993;328:246-52.
14. Eisenberg DM, Davis RB, Ettner SL, et al. Trends in alternative medicine use in the United States, 1990-1997: results of a follow-up national survey. JAMA 1998;280:1569-75.
15. Cassileth BR, Lusk EJ, Strouse TB, Bodenheimer BJ. Contemporary unorthodox treatments in cancer medicine: a study of patients, treatments, and practitioners. Ann Intern Med 1984;101:105-12.
16. Lerner IJ, Kennedy BJ. The prevalence of questionable methods of cancer treatment in the United States. CA Cancer J Clin 1992;42:181-91.
17. Jenkins CA, Scarfe A, Bruera E. Integration of palliative care with alternative medicine in patients who have refused curative cancer therapy: a report of two cases. J Palliat Care 1998;14:55-59.
18. Downer SM, Cody MM, McCluskey P, et al. Pursuit and practice of complementary therapies by cancer patients receiving conventional treatment. BMJ 1994;309:86-89.
19. Montbriand MJ. Abandoning biomedicine for alternate therapies: oncology patients’ stories. Cancer Nurs 1998;21:36-45.
20. Maskarinec G, Shumay DM, Kakai H, Gotay CC. Ethnic differences in complementary and alternative medicine use among cancer patients. J Altern Complement Med 2000;6:531-38.
21. Bogdan R, Biklen S. Qualitative research in education. Boston, Mass: Allyn and Bacon; 1998.
22. Qualitative Solutions and Research Pty Ltd. QSR NUD*IST 4 user guide. Australia: Sage Publications; 1997.
23. Gotay CC, Bultz BD. Patient decision making inside and outside the cancer care system. J Psychosoc Oncol 1986;4:105-14.
24. Cassileth BR. The alternative medicine handbook: the complete reference guide to alternative and complementary therapies. New York, NY: W.W. Norton & Company Inc; 1998.
STUDY DESIGN: This was a qualitative interview study.
POPULATION: Fourteen cancer survivors who reported having declined all or part of the recommended conventional treatment (surgery, chemotherapy, or radiation) were included. The participants were a subset of a multiethnic (Asian, Native Hawaiian, and white) group of 143 adults who had cancer in 1995 or 1996, were recruited through a population-based tumor registry, and were interviewed about CAM.
OUTCOMES MEASURED: We performed semistructured interviews regarding experience with conventional cancer treatment and providers, use of CAM, and beliefs about disease.
RESULTS: All participants used 3 or more types of CAM, most commonly herbal or nutritional supplements. Participants uniformly stated that they declined conventional treatment to avoid damage or harm to the body. Most participants also felt that conventional treatment would not make a difference in disease outcome, and some, but not all, perceived their relationship with health care providers as unsatisfactory or alienating. Some participants reported that their discovery of CAM contributed to their decision to decline conventional treatment, and participants generally perceived CAM as an effective and less harmful alternative to conventional treatment.
CONCLUSIONS: Cancer patients may benefit from interventions (eg, patient education, improvements in physician-patient communication, and psychological therapy) to facilitate treatment decision making through increased understanding of conventional and CAM treatments and to identify barriers to treatment for individual patients.
- Factors that participants described as influencing their decision to decline conventional cancer treatment included beliefs about harm, possible death, and side effects, as well as belief in or discovery of CAM as an effective alternative.
- Participants perceived CAM as more effective and less harmful than conventional treatment.
- Participants cited four sources of evidence for the effectiveness of CAM: personal, medical, anecdotal, and belief.
- Most participants reported positive or neutral interactions with health care providers regarding their use of CAM.
- Participants reported negative interactions or missed communication with health care providers as factors in their decision to decline conventional treatment.
Although noncompliance with or refusal of cancer treatment is a serious concern and has been shown to reduce the effectiveness of treatment and shorten survival after diagnosis,1-4 the phenomenon itself has been little studied. Existing studies report rates of less than 1% for patients refusing all treatment,4 12.5% for patients refusing chemotherapy,5 and 20% for patients refusing treatment for hematologic malignancy.6 Proposed reasons for noncompliance include fear of the adverse side effects of cancer treatment, uncertainty, hopelessness, loss of control, denial of illness, psychiatric disorders, problems in the patient-physician relationship and communication, and dysfunction of the medical system.4,5,7-10
It has been hypothesized that individuals who choose complementary and alternative medicine (CAM) are more likely than other patients to forgo medical treatment.11 However, studies among noncancer populations have found that only a small percentage (3% to 4%) rely primarily on CAM.12-14 The few studies reporting rates of treatment refusal among cancer populations have found higher percentages (8% to 20%) of patients using CAM exclusively or ceasing conventional treatment in favor of CAM,15,16 but the reasons for these decisions are unclear. In one study, primary reliance on CAM for a variety of noncancer disorders was associated with distrust of or dissatisfaction with conventional medicine and physicians, as well as the need to seek control over health.12 Some authors speculate that, because of the extreme nature of most standard cancer treatment, patients may decline medical care in favor of CAM therapies that have few or no side effects.15,17,18
In a recent qualitative study of 8 Canadian cancer patients who abandoned biomedical treatment in favor of CAM, Montbriand19 found themes of anger and fear, need for control, belief in CAM as a cure, social support for CAM, cost considerations, and mystical insights into health care. This study provided an initial understanding of the concerns of cancer patients who refuse conventional treatment and choose CAM, but it is limited by its small, homogeneous sample. More diverse samples are needed to cross-validate Montbriand’s findings and to uncover additional reasons. In this study, we describe themes that emerged from interviews with a multiethnic group of 14 participants about their reasons for declining conventional cancer treatment and choosing CAM.
STUDY DESIGN: This was a qualitative interview study.
POPULATION: Fourteen cancer survivors who reported having declined all or part of the recommended conventional treatment (surgery, chemotherapy, or radiation) were included. The participants were a subset from a multi-ethnic (Asian, Native Hawaiian, and white) group of 143 adults with cancer in 1995 or 1996 who were recruited through a population-based tumor registry and interviewed about CAM.
OUTCOMES MEASURED: We performed semistructured interviews regarding experience with conventional cancer treatment and providers, use of CAM, and beliefs about disease.
RESULTS: All participants used 3 or more types of CAM, most commonly herbal or nutritional supplements. Across the board, participants stated that their reason for declining conventional treatment was to avoid damage or harm to the body. The majority of participants also felt that conventional treatment would not make a difference in disease outcome, and some but not all participants perceived an unsatisfactory or alienating relationship with health care providers. Some participants reported that their discovery of CAM contributed to their decision to decline conventional treatment, and participants generally perceived CAM as an effective and less harmful alternative to conventional treatment.
CONCLUSIONS: Cancer patients may benefit from interventions (eg, patient education, improvements in physician-patient communication, and psychological therapy) to facilitate treatment decision making through increased understanding of conventional and CAM treatments and to identify barriers to treatment for individual patients.
- Factors expressed by participants as influencing the decision to decline conventional cancer treatment included: beliefs about harm, possible death and side effects, and the belief in or discovery of CAM as an effective alternative.
- Participants found CAM to be more effective and less harmful than conventional treatment.
- Participants gave sources of evidence for effectiveness of CAM: personal, medical, anecdotal, and belief.
- Participants reported positive or neutral interactions with health care providers regarding their use of CAM.
- Participants reported negative interactions or possible missing communication with health care providers as being factors in their decision to decline conventional treatment.
Although noncompliance with or refusal of cancer treatment is a serious concern and has been shown to reduce the effectiveness of treatment and decrease the length of survival after diagnosis,1-4 the phenomenon itself has rarely been studied. Existing studies report rates of less than 1% for patients refusing all treatment,4 12.5% for patients refusing chemotherapy,5 and 20% for patients refusing treatment for hematologic malignancy.6 Possible reasons for noncompliance have been proposed, including patients’ fear of the adverse side effects of cancer treatment, uncertainty, hopelessness, loss of control, denial of illness, psychiatric disorders, patient-physician relationship and communication issues, and medical system dysfunction.4,5,7-10
It has been hypothesized that individuals who choose complementary and alternative medicine (CAM) are more likely to forgo medical treatment than other patients.11 However, studies among noncancer populations have found that only a small percentage (between 3% and 4%) rely primarily on CAM.12-14 The few studies reporting rates of treatment refusal among cancer populations have found higher percentages (between 8% and 20%) of patients using CAM exclusively or ceasing conventional treatment in favor of CAM,15,16 but reasons for these decisions are unclear. Primary reliance on CAM for a variety of noncancer disorders was found in one study to be associated with distrust or dissatisfaction with conventional medicine and physicians, as well as the need to seek control over health.12 Some speculate that because of the extreme nature of most standard cancer treatment, patients may decline medical care in favor of CAM therapies that have few or no side effects.15,17,18
In a recent qualitative study of 8 Canadian cancer patients who abandoned biomedical treatment in favor of CAM, Montbriand19 found themes of anger and fear, need for control, belief in CAM as a cure, social support for CAM, cost considerations, and mystical insights into health care. This study provided an initial understanding of the concerns of cancer patients who refuse conventional treatment and choose CAM, but is limited by its small, homogeneous sample. More diverse samples are needed to cross-validate Montbriand’s findings and to uncover additional reasons. In the following study we describe themes that emerged from interviews with a multiethnic group of 14 participants as they discuss their reasons for declining conventional cancer treatment and choosing CAM.
Methods
Recruitment
The participants in this analysis were initially surveyed by mail as part of a larger study investigating ethnic differences in alternative medicine use among cancer patients in 1995 or 1996 in Hawaii and identified through a population-based tumor registry.20 Among those who returned the survey (n=1168), 439 (32%) volunteered to be interviewed. Because we were primarily interested in the diversity of experiences of CAM users, a heterogeneous group of 143 interview subjects was selected on the basis of CAM use, geographic areas, ethnicity, and cancer site. For this analysis, we included only those interview participants (n=14) who reported declining all or part of conventional treatment for cancer while simultaneously using CAM.
The mean age of participants was 52.5 years (standard deviation = 14.1; range = 43-92), 9 were women, and 6 were married. The participants were white (9) or Asian or Pacific Islander (5; Chinese, Filipino, Japanese, or Native Hawaiian). Participants were well educated, with the majority having past or present professional, managerial, or technical occupations. Five were retired at the time of the interview. Eight of the participants had breast cancer, and the rest had gastrointestinal cancer (3), prostate cancer (2), or skin cancer (1). Most of the participants had localized disease. The stage of disease was unknown for 4 participants, because they had declined procedures (eg, lymph node excision; exploratory surgery) to determine stage. Six participants reported that they had refused all conventional treatment (3 localized disease and 3 unstaged). Five participants reported undergoing surgery for the cancer but rejected all further treatment. Three participants had surgery and chemotherapy or radiation but reported refusing further treatment (eg, second surgery) that their physician considered necessary.
Procedure
Three human subjects research committees approved the research protocol. One- to 2-hour tape-recorded interviews were conducted in person at the participant’s home or another location in late 1998 or early 1999. All participants were compensated with a $20 gift certificate, and all gave signed informed consent.
Outcome measures
The semistructured interviews covered (a) demographics, (b) satisfaction with health care providers, (c) conventional treatments received for cancer and satisfaction, (d) types of CAM used for cancer and satisfaction, and (e) perceptions about cancer and cancer treatments.
After reading all the interview transcripts, the research team engaged in an iterative process in which we coded the text according to the nature of the information, developed hypotheses, and then translated the coding into categories.21 Responses were coded using NUD*IST 4,22 a software package for qualitative analysis. We assigned codes for (a) reasons for rejecting conventional treatments, (b) types of CAM used, (c) reasons for choosing CAM, (d) beliefs about CAM’s effectiveness, and (e) communication with physicians. We included quantitative data (ie, demographics, disease characteristics, and types of CAM used) from the survey and from the tumor registry as a triangulation technique21 and to aid in describing the sample.
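Statements in the results such as "12 out of 14" come from tallying coded segments by participant. A minimal Python sketch of that tally might look like this. It assumes coded text has been exported as (participant, category) pairs. The participant IDs, that export format, and the function name `participants_per_category` are hypothetical and are not features of NUD*IST 4.

```python
from collections import defaultdict

# The five coding categories named in the text; the data below are invented.
CATEGORIES = (
    "reasons for rejecting conventional treatments",
    "types of CAM used",
    "reasons for choosing CAM",
    "beliefs about CAM's effectiveness",
    "communication with physicians",
)


def participants_per_category(segments):
    """Count distinct participants coded at least once under each category.

    `segments` is an iterable of (participant_id, category) pairs, one per
    coded text segment. A participant coded several times under the same
    category is counted only once.
    """
    seen = defaultdict(set)
    for participant_id, category in segments:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        seen[category].add(participant_id)
    return {category: len(seen[category]) for category in CATEGORIES}


# Hypothetical coded segments for three participants.
example = [
    ("P01", "types of CAM used"),
    ("P01", "types of CAM used"),
    ("P02", "types of CAM used"),
    ("P02", "communication with physicians"),
    ("P03", "reasons for choosing CAM"),
]
counts = participants_per_category(example)
for category, n in counts.items():
    print(f"{category}: {n} of 3 participants")
```

The sketch counts distinct participants rather than coded segments. This keeps a theme from being inflated when one participant returns to it repeatedly in an interview.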
Results
All 14 participants used 3 or more types of CAM (max=14; median=8; Table 1), and all took some herbal or botanical supplement; 11 reported diet changes, and 7 used meditation or relaxation. Two participants attended CAM cancer clinics for intravenous therapy. One participant worked with a native Hawaiian healer, with whom she learned to gather and prepare traditional herbal remedies.
Three broad categories of themes emerged in the analysis: (1) beliefs about conventional treatment, (2) interactions with treatment providers, and (3) beliefs about CAM as an alternative to conventional treatment. Participants’ supporting quotes are shown in Tables 2 and 2a.
Beliefs About Conventional Treatment
Conventional Treatment Is Harmful. When asked to describe their reasons for declining conventional cancer treatment, participants described many ways that chemotherapy and radiation were harmful, including damaging cells, weakening the immune system, or inhibiting recovery. In the extreme, participants believed that conventional treatment would be fatal for them. Those who declined either a first (n=6) or a second (n=2) surgery commonly expressed concerns about mutilation (being “cut”) and the debilitating effects of surgery. A number of participants mentioned concerns that conventional treatment would increase their risk of future cancer. Participants also mentioned being deterred from conventional treatment by possible side effects, previous negative experience with a treatment, or knowing someone who died from the treatment.
Conventional Treatment Will Not Improve Outcome. Several participants expressed the view that conventional treatment was not likely to make a difference in disease outcome, either because of limitations inherent in conventional treatment or because of the particular characteristics of their disease. Often, participants cited their belief that conventional treatment offered no guarantee of a cure. Although none of the participants disputed the validity of their cancer diagnosis, a few participants believed that cancer treatment was unnecessary because the cancer had been eliminated by initial treatment. One participant proposed that fate, not treatment, would decide her disease outcome.
Interactions with Treatment Providers
Nearly all participants (12 of 14) stated that they had informed at least one of their physicians about CAM; 2 had not. Nine respondents reported that their physicians were either supportive or neutral about their use of CAM. In the context of their decision making about conventional treatment, however, some participants expressed feelings that physicians could not be trusted, that physicians did not listen to their needs, and that medical professionals were hostile or threatening about participants’ treatment choices. Participants’ responses also indicated possible missed chances for communication between patient and physician about both conventional treatment and CAM. A minority of participants described feeling alienated from the medical community.
Beliefs About CAM as Alternative to Conventional Treatment
CAM Contributed to Decision to Decline. The perception that CAM offered a feasible alternative to conventional treatment appeared to assist participants in making the decision to go against their physicians’ recommendations. In 6 cases, the actual decision to refuse conventional treatment appeared to be facilitated by the discovery or knowledge of CAM.
CAM Is Better than Conventional Treatment. In many cases, the CAM choice was perceived to be considerably less aversive than the conventional treatment option or was perceived to make more “intuitive” sense. A common viewpoint expressed by participants was that conventional treatment and CAM have different methods and purposes. Participants pointed out that CAM works with the body’s own resources in a natural way to promote healing, while conventional treatment is short-sighted and merely attacks the symptom without addressing underlying imbalances.
CAM Is Effective. In choosing CAM as an alternative to conventional treatment, the participants stated that they were satisfied with CAM’s effectiveness and described sources of evidence for this, including personal evidence (most frequently cited), medical and anecdotal evidence, and belief. Participants’ personal experience of continuing to be alive, feeling well, or having subjective improvement in symptoms was proof for them that a particular CAM treatment worked. Participants also used medical evidence (eg, PSA tests or mammography) to demonstrate that their condition had improved, and they attributed this improvement to CAM. Anecdotal evidence based on others’ reported benefits from CAM was sufficient for at least one participant to state that she felt CAM was effective. A number of participants stated that they did not have any demonstrable evidence of the effectiveness of CAM, such as improved symptoms or medical evidence, but that they nonetheless continued to believe that CAM was working for them. Participants’ reasoning included statements that the particular CAM made logical sense to them and therefore “must work,” or that they had a long history of belief in the benefits of CAM. Only one participant admitted that she was not sure if CAM had helped her.
Discussion
A predominant theme in our analysis was the finding that participants perceived CAM to be a harmless, natural, and effective alternative to the damaging effects of conventional cancer treatment. In the participants’ views, conventional treatment offered no guarantee of a cure, while guaranteeing almost certain harm and for some, possible death. Participants felt that CAM had a positive effect on their overall health and, with a few exceptions, participants were confident in CAM’s ability to cure their cancer or prevent recurrence. The quality of physician/patient communication was also a factor in the decision of participants to decline conventional treatment. While participants reported both positive and negative experiences with medical staff, the more negative perceptions, including distrust, lack of response, and perceived hostility from health care providers, possibly caused further alienation between participants and the medical community.
A study by Astin reported similar predictors for primary reliance on CAM in the general population (lack of trust and dissatisfaction with conventional treatment and providers, and belief in the efficacy of CAM).12 Astin also observed that CAM was perceived as promoting health, while conventional treatment focused on the illness, a belief expressed by several of our participants. While the desire for control over health was a predictor in Astin’s study, this did not emerge as a theme in our analysis.
Our analysis provides cross-validity evidence with an ethnically diverse sample for several themes observed by Montbriand19 (difficulty in communication with health care providers, previous negative experiences with medical care, belief in a cure from CAM, and lack of hope for a cure offered by biomedical therapies). Montbriand’s themes of expressed stress, the need of patients to take control of treatment, and mystical insights into health care also appear to have some similarities to our results, while the influence of social support and cost considerations on CAM use were not as evident in our analysis. Also, unlike the Montbriand study, our participants reported supportive as well as negative health care interactions regarding CAM use, sources of evidence for CAM’s effectiveness (personal, medical, anecdotal, and belief), and the belief that CAM offered an opportunity to avoid the harmful effects of conventional treatment.
The preceding analysis is qualitative and based on the self-report of a small sample of 14 participants. Generalizability of the findings is therefore limited. However, the use of a qualitative method allowed investigation of a relatively rare population (cancer treatment decliners) that is seldom studied. The results are also limited by the fact that participants were primarily cancer survivors in relatively good health.
Future research should include participants with more advanced cancers, as compliance with treatment may be dependent on the patients’ expectation of the likely progression of their disease.23
Our findings have a number of clinical implications. Given some of the examples of interactions with medical professionals, it is possible that the participants did not fully understand their treatment options, including their chances of experiencing serious or debilitating consequences of conventional treatment, and may have overestimated such consequences. A better understanding of individual patients’ concerns about conventional treatment can guide health care professionals in framing recommendations when talking to patients. While patients should be made as aware as possible of the pros and cons of all options for cancer treatment, including conventional methods, CAM, or no treatment, patient education efforts alone are not sufficient. Our findings, as well as those of Montbriand,19 indicate that fear and anxiety may be issues for patients who decline conventional treatment in favor of CAM. Some patients may require psychological and health behavior interventions aimed at improving adjustment and coping with cancer and at addressing the motivational and emotional barriers to compliance. Finally, treatment decision making is an ongoing process; treatment decliners may choose conventional cancer treatment at a later date if given the support, information, and time they need to make the decision.23 Even if patients have declined oncologic care, they may continue to see their primary care and family physicians. Patients need to feel that they have not been permanently excluded from the health care system even if they make choices that are contrary to the recommendations of their medical team.
Acknowledgments
We want to thank all participants for taking the time and effort to respond to our questionnaire and to participate in the interviews. The help of Marc Goodman, PhD, and the staff of the Hawaii Tumor registry is greatly appreciated. We would also like to thank our research team, including Professor Thomas Maretzki, Yvonne Tatsumura, Katsuya Tasaki, Tammy Brown, Carole Prism, and David Henderson for their help with transcription and analysis. This research was supported by a special study grant from the National Cancer Institute, Surveillance, Epidemiology, and End Results program under contract number N01-PC67001.
1. Hoagland AC, Morrow GR, Bennett JM, Carnrike CL, Jr. Oncologists’ views of cancer patient noncompliance. Am J Clin Oncol 1983;6:239-44.
2. Li BD, Brown WA, Ampil FL, Burton GV, Yu H, McDonald JC. Patient compliance is critical for equivalent clinical outcomes for breast cancer treated by breast-conservation therapy. Ann Surg 2000;231:883-89.
3. Bonadonna G, Valagussa P. Dose-response effect of adjuvant chemotherapy in breast cancer. N Engl J Med 1981;304:10-15.
4. Huchcroft SA, Snodgrass T. Cancer patients who refuse treatment. Cancer Causes Control 1993;4:179-85.
5. Levin M, Mermelstein H, Rigberg C. Factors associated with acceptance or rejection of recommendation for chemotherapy in a community cancer center. Cancer Nurs 1999;22:246-50.
6. Evans SH, Clarke P. When cancer patients fail to get well: flaws in health communication. Beverly Hills, Calif: Sage Publications; 1983:225-48.
7. Richardson JL, Sanchez K. Compliance with cancer treatment. In: Holland JC, ed. Psycho-oncology. New York, NY: Oxford University Press; 1998:67-77.
8. Kunkel EJ, Woods CM, Rodgers C, Myers RE. Consultations for ‘maladaptive denial of illness’ in patients with cancer: psychiatric disorders that result in noncompliance. Psychooncology 1997;6:139-49.
9. Goldberg RJ. Systematic understanding of cancer patients who refuse treatment. Psychother Psychosom 1983;39:180-89.
10. Appelbaum PS, Roth LH. Patients who refuse treatment in medical hospitals. JAMA 1983;250:1296-301.
11. Lowenthal RM. Alternative cancer treatments. Med J Aust 1996;165:536-37.
12. Astin JA. Why patients use alternative medicine: results of a national study. JAMA 1998;279:1548-53.
13. Eisenberg DM, Kessler RC, Foster C, Norlock FE, Calkins DR, Delbanco TL. Unconventional medicine in the United States: prevalence, costs, and patterns of use. N Engl J Med 1993;328:246-52.
14. Eisenberg DM, Davis RB, Ettner SL, et al. Trends in alternative medicine use in the United States, 1990-1997: results of a follow-up national survey. JAMA 1998;280:1569-75.
15. Cassileth BR, Lusk EJ, Strouse TB, Bodenheimer BJ. Contemporary unorthodox treatments in cancer medicine: a study of patients, treatments, and practitioners. Ann Intern Med 1984;101:105-12.
16. Lerner IJ, Kennedy BJ. The prevalence of questionable methods of cancer treatment in the United States. CA Cancer J Clin 1992;42:181-91.
17. Jenkins CA, Scarfe A, Bruera E. Integration of palliative care with alternative medicine in patients who have refused curative cancer therapy: a report of two cases. J Palliat Care 1998;14:55-59.
18. Downer SM, Cody MM, McCluskey P, et al. Pursuit and practice of complementary therapies by cancer patients receiving conventional treatment. BMJ 1994;309:86-89.
19. Montbriand MJ. Abandoning biomedicine for alternate therapies: oncology patients’ stories. Cancer Nurs 1998;21:36-45.
20. Maskarinec G, Shumay DM, Kakai H, Gotay CC. Ethnic differences in complementary and alternative medicine use among cancer patients. J Altern Complement Med 2000;6:531-38.
21. Bogdan R, Biklen S. Qualitative research in education. Boston, Mass: Allyn and Bacon; 1998.
22. Qualitative Solutions and Research Pty Ltd. QSR NUD*IST 4 user guide. Australia: Sage Publications; 1997.
23. Gotay CC, Bultz BD. Patient decision making inside and outside the cancer care system. J Psychosoc Oncol 1986;4:105-14.
24. Cassileth BR. The alternative medicine handbook: The complete reference guide to alternative and complementary therapies. New York, NY: W.W. Norton & Company Inc; 1998.
Assessing Guidelines for Use in Family Practice
With more than 1000 new guidelines produced annually over the past decade, it is impossible for practicing family physicians to determine which ones they should adopt in their clinical practice. The Ontario Ministry of Health and Long-Term Care and the Ontario Medical Association formed the Guideline Advisory Committee (GAC) in 1997 to assess and disseminate guidelines that would improve the quality and utilization of health care services in the province. Over the past 3 years the GAC has developed a strategy to identify important topics, to rank guidelines published on these topics based on the quality of their development, and to reformat guidelines as necessary to make them user-friendly for implementation in clinical practice. The GAC is currently assessing a number of strategies to enhance the dissemination of selected guidelines to improve the quality of care delivered in the province.
Key points for clinicians
A method of selecting, reviewing, and endorsing clinical practice guidelines has been established in the province of Ontario, Canada. Recommended guideline summaries are posted on a Web site with links to full text for easy access by practicing physicians (www.gacguidelines.ca).
Strategies for the successful implementation and impact evaluation of recommended guidelines are currently in development.
Clinical practice guidelines are systematically developed statements to assist physician and patient decisions about appropriate health care for specific clinical circumstances.1 Published guidelines have become widely available through Internet technology; it has been estimated that more than 2500 exist. Most are produced by specific interest groups (eg, national societies and pharmaceutical companies), are disseminated by publication in a medical journal or by traditional mail, and seldom demonstrate any effect on clinical practice.2 Such a large volume of guidelines creates confusion for clinicians, who often do not follow any of them because of the time required to assess their quality.3
With this dilemma in mind, the GAC was formed with members representing the Ontario Medical Association (OMA), the Ministry of Health and Long-Term Care (MOHLTC) in the province of Ontario, and one ex-officio member of the Institute for Clinical Evaluative Sciences (ICES). The GAC determined its first priority was to identify the best-quality guidelines available for clinicians on selected topics and to then promote their dissemination across the province. The purpose of our paper is to describe the methods that have been developed over the last 3 years to identify high-quality guidelines and some of the strategies being proposed for their dissemination, implementation, and evaluation. We also identify the best-quality guidelines for 10 common conditions.
Methods to assess the development of clinical practice guidelines
Topic Selection
Using a number of parameters, the GAC initially produced a grid as an assessment tool to identify priority areas for guideline review. Table 1 shows the basic grid incorporating provincial utilization and cost data, outcomes research, feedback from clinicians or health care organizations, and a previously published list of common and important problems in family practice.4 Feedback from the OMA sections indicated considerable confusion resulting from conflicting advice in specific areas as to appropriate practice (eg, screening for osteoporosis and diabetes). Utilization data from the MOHLTC demonstrated that the use of numerous procedures had rapidly increased over previous years; for example, diagnostic ultrasound utilization increased 65% in 1998. Practicing physicians also identified areas where there was a need for guidelines to be developed because of a lack of evidence or unknown best practice. The committee took all these factors into account when producing a list of priority topics for guideline assessment (Table 2).
Guideline Assessment and Recommendation
Once a topic was chosen for assessment, a literature search was conducted by University of Toronto librarians to find all guidelines published in English over the past 10 years on that specific topic. The search strategy included databases such as MEDLINE and HealthStar, and guideline Web sites such as the National Guideline Clearinghouse and the Canadian Medical Association’s Clinical Practice Guideline Infobase. Copies of all guidelines identified in the search were then obtained. A survey of associations and interest groups in Ontario was also conducted to determine whether there were any unpublished guidelines that the search had not identified.
Initially, members of the committee carried out a literature search to determine whether there were any publications about scoring the quality of the process used to produce guidelines. The search identified some scoring methods, but none that directly suited our needs. As a result, the GAC embarked on the development of its own guideline-scoring instrument. After a year of work we realized that it would likely take 2 to 3 more years to adequately validate the instrument, so a decision was made to adopt the Appraisal Instrument for Clinical Guidelines5 (available at: www.sghms.ac.uk/phs/hceu/form.htm), supplemented by the tool developed by the committee, to help identify high-quality guidelines in each clinical area. The Appraisal Instrument consists of 37 items addressing 3 dimensions (Table 3). The classification system the committee is using to choose top-scoring guidelines after appraisal is as follows. An excellent guideline is one in which the majority of the dimensions (rigor of development, context and content, application) are well addressed by the guideline producers with minimal omission. The evidence is linked to the major recommendations, and the development process is robust. These types of guidelines are highly recommended.
A very good guideline is one in which many of the dimensions are addressed, and some of the recommendations are linked to evidence levels. Objectives and rationale for development are often clearly defined but may be lacking in other areas, such as application (eg, outcome measures, targets, risks, and benefits). These are generally well produced and useful for practicing clinicians and are recommended.
In a fair guideline, some of the dimensions are addressed, but there are some major omissions, often in terms of levels of evidence, literature search strategy, clarity, risks, and benefits. Often these documents are local adaptations of other guidelines. Information can sometimes be used as a general reference if user-friendly materials are incorporated but are generally not very useful as guidelines. These guidelines are recommended under special circumstances.
A poor guideline is one which most of the dimensions are not well addressed, if at all. Often, it is unclear who produced these documents, and there is no description of the individuals involved. Levels of evidence and literature search strategy are rarely included, and there is no description of the methods used to formulate the recommendations. These guidelines are of little use to practicing clinicians and are not recommended.
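The four categories are defined qualitatively, so any numeric version is an interpretation. The sketch below is one hypothetical way to encode them, assuming each dimension has already been reduced to a standardized score between 0 and 1 and that the reviewer records how far the recommendations are linked to evidence. The cut-offs `WELL_ADDRESSED` and `ADDRESSED`, the `Linkage` levels, and the rule details are assumptions for illustration, not GAC thresholds.

```python
from enum import Enum


class Linkage(Enum):
    """How far recommendations are tied to evidence (wording adapted from the text)."""
    MAJOR = "evidence linked to the major recommendations"
    SOME = "some recommendations linked to evidence levels"
    NONE = "recommendations not linked to evidence"


# Hypothetical cut-offs on a 0-1 standardized dimension score; the GAC
# classification is qualitative and does not publish numeric thresholds.
WELL_ADDRESSED = 0.70
ADDRESSED = 0.40


def classify(dimension_scores: dict, linkage: Linkage) -> str:
    """Map appraisal results to the four categories (illustrative rules only)."""
    scores = list(dimension_scores.values())
    majority = len(scores) // 2 + 1  # e.g. 2 of the 3 dimensions
    n_well = sum(score >= WELL_ADDRESSED for score in scores)
    n_addressed = sum(score >= ADDRESSED for score in scores)
    if n_well >= majority and linkage is Linkage.MAJOR:
        return "excellent"  # highly recommended
    if n_addressed >= majority and linkage in (Linkage.MAJOR, Linkage.SOME):
        return "very good"  # recommended
    if n_addressed >= 1:
        return "fair"  # recommended under special circumstances
    return "poor"  # not recommended


if __name__ == "__main__":
    # Made-up scores for the three dimensions named in the text.
    example = {"rigor_of_development": 0.82, "context_and_content": 0.75, "application": 0.30}
    print(classify(example, Linkage.MAJOR))
    print(classify(example, Linkage.SOME))
```

With these made-up scores, the same guideline is classed as excellent when the evidence supports its major recommendations and as very good when only some recommendations are linked to evidence, which mirrors the distinction the text draws between the two categories.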
Recognizing that recommending guidelines on the basis of the quality of their development process and of the evidence used would be controversial, we felt it was extremely important to develop a rigorous and objective scoring methodology. Fellows from the Department of Family and Community Medicine at the University of Toronto and community-based family physician volunteers from the OMA were brought together in 5 workshops. Each workshop included approximately 20 participants and consisted of a half-day session covering the objectives of the GAC, a detailed review of the Appraisal Instrument, and a hands-on exercise in which all participants evaluated the same guideline. Scores were then openly declared, and discrepancies among the assessments were discussed in an attempt to standardize the process. At the end of the session, interested participants were given an additional 5 guidelines to assess over the following 2 weeks. The resulting appraisals were evaluated for consistency and inter-rater reliability; the results indicate that using the instrument as an initial filter to identify the best-quality guidelines in each clinical area is a valid approach. To date, 45 assessors have been trained and are reviewing guidelines on an ongoing basis. Each guideline is evaluated by 3 independent assessors. Guidelines selected for recommendation in a particular clinical area are then reviewed for clinical relevance and applicability to the Ontario context. More than 250 published guidelines have been identified and distributed to physician assessors in the clinical areas shown in Table 2.
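The text does not describe how the 3 independent appraisals of a guideline are combined. One plausible approach, shown here only as a sketch, pools all assessors' item ratings for a dimension into a standardized score, (obtained − minimum possible) / (maximum possible − minimum possible), in the style of the scaled domain scores used by the later AGREE instrument. The 1-to-4 rating scale, the `tolerance` used to flag disagreement for discussion (echoing the workshop practice of discussing discrepancies), and the function names are all assumptions.

```python
from typing import Sequence


def standardized_score(ratings: Sequence[Sequence[int]],
                       scale_min: int = 1, scale_max: int = 4) -> float:
    """Pool one dimension's item ratings from several assessors into a 0-1 score.

    ratings[a][i] is assessor a's rating of item i. The 1-4 scale is an assumption.
    """
    if not ratings:
        raise ValueError("need at least one assessor")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("every assessor must rate the same, non-empty set of items")
    n_ratings = n_items * len(ratings)
    obtained = sum(sum(r) for r in ratings)
    lowest = scale_min * n_ratings   # every item rated at the scale minimum
    highest = scale_max * n_ratings  # every item rated at the scale maximum
    return (obtained - lowest) / (highest - lowest)


def needs_discussion(ratings, tolerance=0.25, scale_min=1, scale_max=4) -> bool:
    """Flag a dimension when assessors' individual scores differ by more than a
    hypothetical tolerance, so the discrepancy can be discussed."""
    per_assessor = [standardized_score([r], scale_min, scale_max) for r in ratings]
    return max(per_assessor) - min(per_assessor) > tolerance


if __name__ == "__main__":
    # Made-up ratings from 3 assessors on a 5-item dimension.
    three_assessors = [
        [4, 3, 2, 1, 4],
        [3, 3, 3, 3, 3],
        [2, 2, 2, 2, 2],
    ]
    print(round(standardized_score(three_assessors), 3))
    print(needs_discussion(three_assessors))
```

Here the pooled score is (39 − 15) / (60 − 15) ≈ 0.533, while the individual assessors score 0.60, 0.67, and 0.33, so the dimension would be flagged for discussion under the assumed tolerance.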
Reformatting
The GAC is in the process of determining the user-friendliness of recommended guidelines. Not infrequently, the guidelines found to be the most evidence-based and objective are hundreds of pages long, which makes them extremely burdensome for the average family physician to use. We anticipate that guidelines found to be of excellent quality but not convenient for use in clinical practice will need to be reformatted into user-friendly summaries. Volunteer physicians from the community will be asked to evaluate such summaries and provide feedback for improvement.
Dissemination
Once the best-quality guideline(s) on a topic have been identified and reformatted as necessary, we intend to post them on the GAC Web site (www.gacguidelines.ca) for use by the profession and the general public. Table 4 shows the results of the guideline selection process for the first 10 clinical areas. The process for choosing guidelines is transparent, so that practicing physicians can determine for themselves the usefulness and applicability of the recommendations. Only the most rigorously developed guidelines will be posted on the Web site in the form of structured summaries, although interested clinicians can obtain the outcome of nonrecommended guideline appraisals on request.
Continuing medical education literature on dissemination strategies indicates that a single method, such as posting information on a Web site or mailing guidelines to clinicians, has a minimal effect on changing medical practice.6 The GAC is currently considering a number of options to enhance the dissemination of the best available guidelines. Because Ontario health data on diagnostic testing, hospitalizations, and office visits are collected provincially, it may be possible to measure clinical outcomes following the dissemination of evidence-based guidelines. We are currently working with provincial groups to disseminate guidelines through medical school continuing medical education (CME) division programs, peer presenter programs, small group CME programs, outreach facilitation programs, and a peer assessment program run by the provincial licensing body.
Conclusions
Over the past 3 years the GAC has developed a method to identify relevant guideline topics and assess the quality of the process by which the guidelines were developed. Clinically excellent guidelines may require some reformatting to make them user-friendly for implementation in clinical practice. The initial product of this process has been posted on the GAC Web site for access by the profession. The GAC is currently assessing and developing a number of strategies to more effectively disseminate guideline information and measure the impact of these interventions on the quality of medical care delivered to the people of Ontario. The GAC will report on the impact of these interventions to facilitate the exchange of successful implementation strategies across jurisdictions.
Acknowledgments
We thank the Physician Services Committee and the members of the Ontario Medical Association and the Ministry of Health and Long-Term Care for their support of this initiative. Conflict of Interest Statement: Dr Rosser and Dr Davis receive stipends for participation on the Guideline Advisory Committee. Ms Gilbart is employed full-time by the Committee through a grant from the Ministry of Health and Long-Term Care. Dr Rosser was a member of the CANMAT Depression Working Group, which developed the top-scoring guideline in depression as chosen through the GAC assessment process.
1. Committee to Advise the Public Health Service on Clinical Practice Guidelines, Institute of Medicine. Field MJ, Lohr KN, eds. Clinical practice guidelines: directions for a new program. Washington, DC: National Academy Press; 1990.
2. Worrall G, Chaulk P, Freake D. The effects of clinical practice guidelines on patient outcomes in primary care: a systematic review. CMAJ 1997;156:1705-12.
3. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
4. Rosser WW, Beaulieu M. Institutional objectives for medical education that relates to the community. CMAJ 1984;130:683-9.
5. Cluzeau F, Littlejohns P, Grimshaw J, Feder G, Moran S. Development and application of a generic methodology to assess the quality of clinical guidelines. Int J Qual Health Care 1999;11:21-8.
6. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
With more than 1000 new guidelines produced annually over the past decade, it is impossible for the practicing family physician to determine which ones should be adopted into clinical practice. The Ontario Ministry of Health and Long-Term Care and the Ontario Medical Association formed the Guideline Advisory Committee (GAC) in 1997 to assess and disseminate guidelines that would improve the quality and utilization of health care services in the province. Over the past 3 years the GAC has developed a strategy to identify important topics, to rank guidelines published on these topics based on the quality of their development, and to reformat guidelines as necessary to make them user-friendly for implementation in clinical practice. The GAC is currently assessing a number of strategies to enhance the dissemination of selected guidelines to improve the quality of care delivered in the province.
Key points for clinicians
A method of selecting, reviewing, and endorsing clinical practice guidelines has been established in the province of Ontario, Canada. Recommended guideline summaries are posted on a Web site with links to full text for easy access by practicing physicians (www.gacguidelines.ca).
Strategies for the successful implementation and impact evaluation of recommended guidelines are currently in development.
Clinical practice guidelines are statements that are systematically developed to assist physician and patient decisions about appropriate health care for specific clinical circumstances.1 Published guidelines have become widely available through Internet technology; it has been estimated that more than 2500 exist. Most are produced by specific interest groups (eg, national societies and pharmaceutical companies), disseminated by publication in a medical journal or by traditional mail, and seldom demonstrate any effect on clinical practice.2 Such a large volume of guidelines creates confusion for clinicians, who often do not follow any of them because of the time required to assess their quality.3
With this dilemma in mind, the GAC was formed with members representing the Ontario Medical Association (OMA), the Ministry of Health and Long-Term Care (MOHLTC) in the province of Ontario, and one ex-officio member from the Institute for Clinical Evaluative Sciences (ICES). The GAC determined that its first priority was to identify the best-quality guidelines available for clinicians on selected topics and then to promote their dissemination across the province. The purpose of our paper is to describe the methods that have been developed over the last 3 years to identify high-quality guidelines and some of the strategies being proposed for their dissemination, implementation, and evaluation. We also identify the best-quality guidelines for 10 common conditions.
With more than 1000 new guidelines produced annually over the past decade, it is impossible for the practicing family physician to determine which ones should be adapted into their clinical practice. The Ontario Ministry of Health and Long-Term Care and the Ontario Medical Association formed the Guideline Advisory Committee (GAC) in 1997 to assess and disseminate guidelines that would improve the quality and utilization of health care services in the province. Over the past 3 years the GAC has developed a strategy to identify important topics, to rank guidelines published on these topics based on the quality of their development, and to reformat guidelines as necessary to make them user-friendly for implementation in clinical practice. The GAC is currently assessing a number of strategies to enhance the dissemination of selected guidelines to improve the quality of care delivered in the province.
Key points for clinicians
A method of selecting, reviewing, and endorsing clinical practice guidelines has been established in the province of Ontario, Canada. Recommended guideline summaries are posted on a Web site with links to full text for easy access by practicing physicians (www.gacguidelines.ca).
Strategies for the successful implementation and impact evaluation of recommended guidelines are currently in development.
Clinical practice guidelines are statements that are systematically developed to assist physisican and patient decisions about appropriate health care for specific clinical circumstances.1 Published guidelines have become widely available through Internet technology; it has been estimated that more than 2500 exist. Most are produced by specific interest groups (eg, national societies and pharmaceutical companies), disseminated by publication in a medical journal or traditional mail, and seldom demonstrate any effect on clinical practice.2 Such a large volume of guidelines creates confusion for clinicians who often do not follow any of them because of the time required to assess their quality.3
With this dilemma in mind, the GAC was formed with members representing the Ontario Medical Association (OMA), the Ministry of Health and Long-Term Care (MOHLTC) in the province of Ontario, and one ex-officio member of the Institute for Clinical Evaluative Sciences (ICES). The GAC determined its first priority was to identify the best-quality guidelines available for clinicians on selected topics and to then promote their dissemination across the province. The purpose of our paper is to describe the methods that have been developed over the last 3 years to identify high-quality guidelines and some of the strategies being proposed for their dissemination, implementation, and evaluation. We also identify the best-quality guidelines for 10 common conditions.
Methods to assess the development of clinical practice guidelines
Topic Selection
Using a number of parameters, the GAC initially produced a grid as an assessment tool to identify priority areas for guideline review. Table 1 shows the basic grid incorporating provincial utilization and cost data, outcomes research, feedback from clinicians or health care organizations, and a previously published list of common and important problems in family practice.4 Feedback from the OMA sections indicated considerable confusion resulting from conflicting advice in specific areas as to appropriate practice (eg, screening for osteoporosis and diabetes). Utilization data from the MOHLTC demonstrated that the use of numerous procedures had rapidly increased over previous years; for example, diagnostic ultrasound utilization increased 65% in 1998. Practicing physicians also identified areas where there was a need for guidelines to be developed because of a lack of evidence or unknown best practice. The committee took all these factors into account when producing a list of priority topics for guideline assessment Table 2.
Guideline Assessment and Recommendation
Once a topic was chosen for assessment, a literature search was conducted by University of Toronto librarians to find all guidelines published in English over the past 10 years on that specific topic. The search strategy included databases such as MEDLINE and HealthStar, and guideline Web sites such as the National Guideline Clearinghouse and the Canadian Medical Association’s Clinical Practice Guideline Infobase. Copies of all guidelines identified in the search were then obtained. A survey of associations and interest groups in Ontario was also made to determine whether there were any unpublished guidelines that we had not identified in this process.
Initially, members of the committee carried out a literature search to determine if there were any publications about scoring the quality of the process used to produce the guidelines. Our search found some processes, but none that directly suited our needs. As a result, the GAC embarked on the development of a guideline-scoring instrument. After a year of work we realized that it would likely take 2 to 3 more years to adequately validate the instrument, and thus a decision was made to adopt the Appraisal Instrument for Clinical Guidelines5 (available at: www.sghms.ac.uk/phs/hceu/form.htm) to help determine quality guidelines in each clinical area, supplemented by the tool developed by the committee. The Appraisal Instrument consists of 37 items addressing 3 dimensions Table 3. The classification system the committee is using to choose top-scoring guidelines after appraisal is as follows. An excellent guideline is one in which the majority of the dimensions (rigor of development, context and content, application) are well addressed by the guideline producers with minimal omission. The evidence is linked to the major recommendations, and the development process is robust. These types of guidelines are highly recommended.
A very good guideline is one in which many of the dimensions are addressed, and some of the recommendations are linked to evidence levels. Objectives and rationale for development are often clearly defined but may be lacking in other areas, such as application (eg, outcome measures, targets, risks, and benefits). These are generally well produced and useful for practicing clinicians and are recommended.
In a fair guideline, some of the dimensions are addressed, but there are some major omissions, often in terms of levels of evidence, literature search strategy, clarity, risks, and benefits. Often these documents are local adaptations of other guidelines. Information can sometimes be used as a general reference if user-friendly materials are incorporated but are generally not very useful as guidelines. These guidelines are recommended under special circumstances.
A poor guideline is one which most of the dimensions are not well addressed, if at all. Often, it is unclear who produced these documents, and there is no description of the individuals involved. Levels of evidence and literature search strategy are rarely included, and there is no description of the methods used to formulate the recommendations. These guidelines are of little use to practicing clinicians and are not recommended.
Recognizing that recommending guidelines based on the quality of the process by which they were produced and the evidence used in their development would be controversial, we felt it was extremely important to develop a rigorous and objective scoring methodology. Fellows from the Department of Family and Community Medicine at the University of Toronto and community-based family physician volunteers from the OMA were brought together in 5 workshops. Each workshop included approximately 20 participants and consisted of a half-day session on the objectives of the GAC, a detailed review of the Appraisal Instrument, and a hands-on session where all participants evaluated the same guideline. Scores were then openly declared, and a discussion held on discrepancies identified in the assessments in an attempt to standardize the process. At the end of the session, interested participants were provided with an additional 5 guidelines to assess in the subsequent 2 weeks. The resulting appraisals were evaluated for consistency and inter-rater reliability (results indicate that using the instrument as an initial filter to determine the best-quality guidelines in each clinical area is a valid approach). To date, 45 assessors have been trained and are reviewing guidelines on an ongoing basis. Each guideline is evaluated a total of 3 times by independent assessors. Those guidelines that have been selected for recommendation in a particular clinical area are then reviewed for clinical relevance and applicability to the Ontario context. More than 250 published guidelines have been identified and distributed to physician assessors in the clinical areas shown in Table 2.
Reformatting
The GAC is in the process of determining the user-friendliness of recommended guidelines. Not infrequently, guidelines that are found to be the most evidence-based and objective are hundreds of pages in length and would be extremely burdensome for the average family physician to use. We anticipate that guidelines found to be of excellent quality but not convenient for use in clinical practice will need to be reformatted into user-friendly summaries. Volunteer physicians from the community will be asked to evaluate such summaries and provide feedback for improvement.
Dissemination
Once the best-quality guideline(s) on a topic are identified and reformatted as necessary; we intend to mount them on the GAC Web site (www.gacguidelines.ca) for use by the profession and the general public. Table 4 shows the results of the guideline selection process for the first 10 clinical areas. The process for choosing guidelines is transparent so that practicing physicians can determine for themselves the usefulness and applicability of the recommendations. Only the most rigorously developed guidelines will be posted on the Web site in the form of structured summaries, although interested clinicians can obtain the outcome of nonrecommended guideline appraisals on request.
Continuing medical education literature on dissemination strategies indicates that a single method, such as posting information on a Web site or mailing guidelines to clinicians has a minimal effect on changing medical practice.6 The GAC is currently considering a number of options to enhance the dissemination of the best available guidelines. Since Ontario health data on diagnostic testing, hospitalization records, and office visits are collected provincially, it could be possible to measure clinical outcomes following the dissemination of evidence-based guidelines. We are currently working with provincial groups to disseminate guidelines through medical school continuing medical education (CME) division programs, peer presenter programs, small group CME programs, outreach facilitation programs, and a peer assessment program run by the provincial licensing body.
Conclusions
Over the past 3 years the GAC has developed a method to identify relevant guideline topics and assess the quality of the process by which the guidelines were developed. Clinically excellent guidelines may require some reformatting to make them user-friendly for implementation in clinical practice. The initial product of this process has been posted on the GAC Web site for access by the profession. The GAC is currently assessing and developing a number of strategies to more effectively disseminate guideline information and measure the impact of these interventions on the quality of medical care delivered to the people of Ontario. The GAC will report on the impact of these interventions to facilitate the exchange of successful implementation strategies across jurisdictions.
Acknowledgments
We thank the Physician Services Committee and the members of the Ontario Medical Association and the Ministry of Health and Long-Term Care for their support of this initiative. Conflict of Interest Statement: Dr Rosser and Dr Davis receive stipends for participation on the Guideline Advisory Committee. Ms Gilbart is employed full-time by the Committee through a grant from the Ministry of Health and Long-Term Care. Dr Rosser was a member of the CANMAT Depression Working Group which developed the top-scoring guideline in depression as chosen through the GAC assessment process.
1. Committee to Advise the Public Health Service on Clinical Practice Guidelines, Institute of Medicine. Field MJ, Lohr KN, eds. Clinical practice guidelines: directions for a new program. Washington, DC: National Academy Press; 1990.
2. Worrall G, Chaulk P, Freake D. The effects of clinical practice guidelines on patient outcomes in primary care: a systematic review. CMAJ 1997;156:1705-12.
3. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.
4. Rosser WW, Beaulieu M. Institutional objectives for medical education that relates to the community. CMAJ 1984;130:683-89.
5. Cluzeau F, Littlejohns P, Grimshaw J, Feder G, Moran S. Development and application of a generic methodology to assess the quality of clinical guidelines. Int J Qual Health Care 1999;11:21-28.
6. Davis DA, Taylor-Vaisey AL. Translating guidelines into practice: a systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408-16.