X-ray vision: Using AI to maximize the value of radiographic images

Changed

Tue, 02/16/2021 - 15:18

Artificial intelligence (AI) is expected to one day affect the entire continuum of cancer care – from screening and risk prediction to diagnosis, risk stratification, treatment selection, and follow-up, according to an expert in the field.

Hugo J.W.L. Aerts, PhD, director of the AI in Medicine Program at Brigham and Women’s Hospital in Boston, described studies using AI for some of these purposes during a presentation at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract IA-06).

In one study, Dr. Aerts and colleagues set out to determine whether a convolutional neural network (CNN) could extract prognostic information from chest radiographs. The researchers tested this theory using patients from two trials – the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and the National Lung Screening Trial (NLST).

The team developed a CNN, called CXR-risk, and tested whether it could predict the longevity and prognosis of patients in the PLCO (n = 52,320) and NLST (n = 5,493) trials over a 12-year time period, based only on chest radiographs. No clinical information, demographics, radiographic interpretations, duration of follow-up, or censoring were provided to the deep-learning system.

CXR-risk output was stratified into five categories of radiographic risk scores for probability of death, from 0 (very low likelihood of mortality) to 1 (very high likelihood of mortality).

The investigators found a graded association between radiographic risk score and mortality. The very-high-risk group had mortality rates of 53.0% (PLCO) and 33.9% (NLST). In both trials, this was significantly higher than for the very-low-risk group. The unadjusted hazard ratio was 18.3 in the PLCO data set and 15.2 in the NLST data set (P < .001 for both).

This association was maintained after adjustment for radiologists’ findings (e.g., a lung nodule) and risk factors such as age, gender, and comorbid illnesses like diabetes. The adjusted HR (aHR) was 4.8 in the PLCO data set and 7.0 in the NLST data set (P < .001 for both).

In both data sets, individuals in the very-high-risk group were significantly more likely to die of lung cancer. The aHR was 11.1 in the PLCO data set and 8.4 in the NLST data set (P < .001 for both).

This might be expected for people who were interested in being screened for lung cancer. However, patients in the very-high-risk group were also more likely to die of cardiovascular illness (aHR, 3.6 for PLCO and 47.8 for NLST; P < .001 for both) and respiratory illness (aHR, 27.5 for PLCO and 31.9 for NLST; P ≤ .001 for both).

With this information, a clinician could initiate additional testing and/or utilize more aggressive surveillance measures. If an oncologist considered therapy for a patient with newly diagnosed cancer, treatment choices and stratification for adverse events would be more intelligently planned.

Using AI to predict the risk of lung cancer

In another study, Dr. Aerts and colleagues developed and validated a CNN called CXR-LC, which was based on CXR-risk. The goal of this study was to see if CXR-LC could predict long-term incident lung cancer using data available in the EHR, including chest radiographs, age, sex, and smoking status.

The CXR-LC model was developed using data from the PLCO trial (n = 41,856) and was validated in smokers from the PLCO trial (n = 5,615; 12-year follow-up) as well as heavy smokers from the NLST trial (n = 5,493; 6-year follow-up).

Results showed that CXR-LC was able to predict which patients were at highest risk for developing lung cancer.

CXR-LC had better discrimination for incident lung cancer than did Medicare eligibility in the PLCO data set (area under the curve, 0.755 vs. 0.634; P < .001). And the performance of CXR-LC was similar to that of the PLCO_M2012 risk score in both the PLCO data set (AUC, 0.755 vs. 0.751) and the NLST data set (AUC, 0.659 vs. 0.650).

When they were compared in screening populations of equal size, CXR-LC was more sensitive than Medicare eligibility criteria in the PLCO data set (74.9% vs. 63.8%; P = .012) and missed 30.7% fewer incident lung cancer diagnoses.
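The "30.7% fewer" figure appears to follow directly from the two sensitivities reported above. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes the figure is the relative reduction in the proportion of missed cases, which the presentation did not spell out.

```python
def relative_reduction_in_misses(sens_new: float, sens_old: float) -> float:
    """Relative drop in missed cases when sensitivity rises from sens_old to sens_new."""
    missed_new = 1.0 - sens_new   # share of incident cancers the new criterion misses
    missed_old = 1.0 - sens_old   # share of incident cancers the old criterion misses
    return (missed_old - missed_new) / missed_old


# Sensitivities reported for the PLCO data set: CXR-LC 74.9%, Medicare criteria 63.8%.
reduction = relative_reduction_in_misses(0.749, 0.638)
print(f"Relative reduction in missed diagnoses: {reduction:.1%}")
```

Under that assumption, the miss rate falls from 36.2% to 25.1%, a relative reduction that rounds to the reported 30.7%.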

AI as a substitute for specialized testing and consultation

In a third study, Dr. Aerts and colleagues used a CNN to predict cardiovascular risk by assessing coronary artery calcium (CAC) from clinically obtained, readily available CT scans.

Ordinarily, identifying CAC – an accurate predictor of cardiovascular events – requires specialized expertise (manual measurement and cardiologist interpretation), time (estimated at 20 minutes/scan), and equipment (ECG-gated cardiac CT scan and special software).

In this study, the researchers used a fully end-to-end automated system with analytic time measured in less than 2 seconds.

The team trained and tuned their CNN using the Framingham Heart Study Offspring and Third Generation cohorts (n = 1,636), which included asymptomatic patients with high-quality, cardiac-gated CT scans for CAC quantification.

The researchers then tested the CNN on two asymptomatic and two symptomatic cohorts:

Asymptomatic Framingham Heart Study participants (n = 663) in whom the outcome measures were cardiovascular disease and death.
Asymptomatic NLST participants (n = 14,959) in whom the outcome measure was atherosclerotic cardiovascular death.
Symptomatic PROMISE study participants with stable chest pain (n = 4,021) in whom the outcome measures were all-cause mortality, MI, and hospitalization for unstable angina.
Symptomatic ROMICAT-II study patients with acute chest pain (n = 441) in whom the outcome measure was acute coronary syndrome at 28 days.

Among 5,521 subjects across all testing cohorts with cardiac-gated and nongated chest CT scans, the CNN and expert reader interpretations agreed on the CAC risk scores with a high level of concordance (kappa, 0.71; concordance rate, 0.79).
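For readers less familiar with the two agreement measures quoted here, the following Python sketch shows how a concordance rate and a kappa statistic can be computed from paired risk-group assignments. The group labels and the ten paired readings are invented for illustration, and the sketch uses unweighted Cohen's kappa; the presentation did not specify whether the study used a weighted variant.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (concordance rate, unweighted Cohen's kappa) for two paired label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same group.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # both raters used a single identical label throughout
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical CAC risk groups (0 = lowest, 3 = highest) for ten scans.
cnn_groups = [0, 0, 1, 1, 2, 2, 3, 3, 1, 2]
reader_groups = [0, 0, 1, 2, 2, 2, 3, 3, 1, 1]

concordance, kappa = agreement_stats(cnn_groups, reader_groups)
print(f"concordance = {concordance:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the raw concordance rate because it discounts the agreement that would be expected by chance alone.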

There was a very high Spearman’s correlation of 0.92 (P < .0001) and substantial agreement between automatically and manually calculated CAC risk groups, substantiating robust risk prediction for cardiovascular disease across multiple clinical scenarios.

Dr. Aerts commented that, among the NLST participants who had the highest risk of developing lung cancer, the risk of cardiovascular death was as high as the risk of death from lung cancer.

Using AI to assess patient outcomes

In an unpublished study, Dr. Aerts and colleagues used AI in an attempt to determine whether changes in measurements of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and skeletal muscle mass would provide clues about treatment outcomes in lung cancer patients.

The researchers developed a deep learning model using data from 1,129 patients at Massachusetts General and Brigham and Women’s Hospitals, measuring SAT, VAT, and muscle mass. The team applied the measurement system to a population of 12,128 outpatients and calculated z scores for SAT, VAT, and muscle mass to determine “normal” values.
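As a rough illustration of the z score step, the Python sketch below standardizes one patient's measurement against a reference group. The numbers, units, and variable names are invented, and the sketch assumes a plain mean and standard deviation; the study was unpublished at the time, so any adjustment the investigators made (for example, for age or sex) is not reflected here.

```python
from statistics import mean, stdev


def z_score(value: float, reference: list[float]) -> float:
    """Standardize a measurement against a reference sample (sample standard deviation)."""
    sigma = stdev(reference)
    if sigma == 0:
        raise ValueError("reference values must vary to define a z score")
    return (value - mean(reference)) / sigma


# Hypothetical subcutaneous adipose tissue areas (cm^2) from a reference population.
reference_sat = [150.0, 180.0, 210.0, 240.0, 270.0]
patient_sat = 150.0  # hypothetical patient measurement

z = z_score(patient_sat, reference_sat)
print(f"SAT z score: {z:.2f}")
```

A negative z score indicates a value below the reference mean, so a patient whose score falls over time is losing tissue relative to the "normal" range.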

When they applied the norms to surgical lung cancer data sets from the Boston Lung Cancer Study (n = 437) and TRACERx study (n = 394), the researchers found that smokers had lower adiposity and lower muscle mass than never-smokers.

More importantly, over time, among lung cancer patients who lost more than 5% of VAT, SAT, and muscle mass, those patients with the greatest SAT loss (P < .0001) or VAT loss (P = .0015) had the lowest lung cancer–specific survival in the TRACERx study. There was no significant impairment of lung cancer–specific survival for patients who experienced skeletal muscle loss (P = .23).

The same observation was made for overall survival among patients enrolled in the Boston Lung Cancer Study, using the 5% threshold. Overall survival was significantly worse with increasing VAT loss (P = .0023) and SAT loss (P = .0082) but not with increasing skeletal muscle loss (P = .3).

The investigators speculated about whether the correlation between body composition and clinical outcome could yield clues about tumor biology. To test this, the researchers used the RNA sequencing–based ORACLE risk score in lung cancer patients from TRACERx. There was a high correlation between higher ORACLE risk scores and lower VAT and SAT, suggesting that measures of adiposity on CT were reflected in tumor biology patterns on an RNA level in lung cancer patients. There was no such correlation between ORACLE risk scores and skeletal muscle mass.

Wonderment ... tempered by concern and challenges

AI has awe-inspiring potential to yield actionable and prognostically important information from data mining the EHR and extracting the vast quantities of information from images. In some cases (like CAC), it is information that is “hiding in plain sight.” However, Dr. Aerts expressed several cautions, some of which have already plagued AI.

He referenced the Gartner Hype Cycle, which provides a graphic representation of five phases in the life cycle of emerging technologies. The “innovation trigger” is followed by a “peak of inflated expectations,” a “trough of disillusionment,” a “slope of enlightenment,” and a “plateau of productivity.”

Dr. Aerts noted that, in recent years, AI has seemed to fall into the trough of disillusionment, but it may be entering the slope of enlightenment on the way to the plateau of productivity.

His research highlighted several examples of productivity in radiomics in cancer patients and those who are at high risk of developing cancer.

In Dr. Aerts’s opinion, a second concern is replication of AI research results. He noted that, among 400 published studies, only 6% of authors shared the code that would enable their findings to be corroborated. About 30% shared test data, and 54% shared “pseudocode,” but transparency and reproducibility remain problems for the acceptance and broad implementation of AI.

Dr. Aerts endorsed the Modelhub initiative (www.modelhub.ai), a multi-institutional initiative to advance reproducibility in the AI field and advance its full potential.

However, there are additional concerns about the implementation of radiomics and, more generally, data mining from clinicians’ EHRs to personalize care.

Firstly, it may be laborious and difficult to explain complex, computer-based risk stratification models to patients. Hereditary cancer testing is an example of a risk assessment test that requires complicated explanations that many clinicians relegate to genetic counselors – when patients elect to see them. When a model is not explainable, it undermines the confidence of patients and their care providers, according to an editorial related to the CXR-LC study.

Another issue is that lung cancer screening, in practice, has been underutilized by individuals who meet current, relatively straightforward Medicare criteria. Despite the apparently better accuracy of the CXR-LC deep-learning model, its complexity and limited access could constitute an additional barrier for the at-risk individuals who should avail themselves of screening.

Furthermore, although age and gender are accurate in most circumstances, there is legitimate concern about the accuracy of, for example, smoking history data and comorbid conditions in current EHRs. Who performs the laborious curation of the input in an AI model to assure its accuracy for individual patients?

Finally, it is unclear how scalable and applicable AI will be to medically underserved populations (e.g., those served by smaller, community-based, free-standing, socioeconomically disadvantaged, or rural health care institutions). There are substantial initial and maintenance costs that may restrict AI’s availability to some academic institutions and large health maintenance organizations.

As the concerns and challenges are addressed, it will be interesting to see where and when the plateau of productivity for AI in cancer care occurs. When it does, many cancer patients will benefit from enhanced care along the continuum of the complex disease they and their caregivers seek to master.

Dr. Aerts disclosed relationships with Onc.AI outside the presented work.

Dr. Lyss was a community-based medical oncologist and clinical researcher for more than 35 years before his recent retirement. His clinical and research interests were focused on breast and lung cancers, as well as expanding clinical trial access to medically underserved populations. He is based in St. Louis. He has no conflicts of interest.

Meeting/Event

AACR: AI, Diagnosis, and Imaging 2021

Publications

MDedge Hematology and Oncology


