TOPLINE: 

A study comparing the clinical reasoning of an artificial intelligence (AI) model with that of physicians found the AI outperformed residents and attending physicians in simulated cases. The AI had more instances of incorrect reasoning than the doctors did but scored better overall.

METHODOLOGY:

  • The study involved 39 physicians from two academic medical centers in Boston and the generative AI model GPT-4.
  • Participants were presented with 20 simulated clinical cases involving common problems such as pharyngitis, headache, abdominal pain, cough, and chest pain. Each case included sections describing the triage presentation, review of systems, physical examination, and diagnostic testing.
  • The primary outcome was the Revised-IDEA (R-IDEA) score, a 10-point scale evaluating clinical reasoning documentation across four domains: interpretive summary, differential diagnosis, explanation of the lead diagnosis, and alternative diagnoses.

TAKEAWAY: 

  • AI achieved a median R-IDEA score of 10, higher than attending physicians (median score, 9) and residents (8).
  • The chatbot had a significantly higher estimated probability of achieving a high R-IDEA score of 8-10 (0.99) compared with attendings (0.76) and residents (0.56).
  • The AI provided more responses containing instances of incorrect clinical reasoning (13.8%) than residents (2.8%) and attending physicians (12.5%). It performed similarly to physicians in diagnostic accuracy and inclusion of cannot-miss diagnoses.

IN PRACTICE:

“Future research should assess clinical reasoning of the LLM-physician interaction, as LLMs will more likely augment, not replace, the human reasoning process,” the authors of the study wrote. 

SOURCE:

Adam Rodman, MD, MPH, with Beth Israel Deaconess Medical Center, Boston, was the corresponding author on the paper. The research was published online in JAMA Internal Medicine.

LIMITATIONS: 

Simulated clinical cases may not replicate performance in real-world scenarios. Further training could enhance the performance of the AI, so the study may underestimate its capabilities, the researchers noted. 

DISCLOSURES:

The study was supported by the Harvard Clinical and Translational Science Center and Harvard University. Authors disclosed financial ties to publishing companies and Solera Health. Dr. Rodman received funding from the Gordon and Betty Moore Foundation.

This article was created using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication. A version of this article appeared on Medscape.com.
