Using Machine Learning Models to Predict Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer

Impact factor: 3.3 | JCR quartile: Q2 | Category: Oncology
JCO Clinical Cancer Informatics, volume 8, e2400071 | Pub Date: 2024-11-01 | Epub Date: 2024-11-22 | DOI: 10.1200/CCI.24.00071
Rayhan Erlangga Rahadian, Hong Qi Tan, Bryan Shihan Ho, Arjunan Kumaran, Andre Villanueva, Joy Sng, Ryan Shea Ying Cong Tan, Tira Jing Ying Tan, Veronique Kiak Mien Tan, Benita Kiat Tee Tan, Geok Hoon Lim, Yiyu Cai, Wen Long Nei, Fuh Yong Wong
Citations: 0

Abstract

Using Machine Learning Models to Predict Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer.

Purpose: Neoadjuvant chemotherapy (NAC) is increasingly used in breast cancer. Predictive modeling is useful for anticipating pathologic complete response (pCR) to NAC. We test machine learning (ML) models to predict pCR in breast cancer and explore methods of handling missing data.

Methods: Four hundred and ninety-nine patients with breast cancer treated with NAC at two centers in Singapore (National Cancer Centre Singapore [NCCS] and KK Women's and Children's Hospital) between January 2014 and December 2017 were included. Eleven clinical features were used to train five different ML models. Listwise deletion and imputation were evaluated as methods of handling missing data. Model performance was evaluated by AUC (discrimination) and Brier score (calibration). Feature importance for the best-performing model in the external testing data set was calculated using Shapley additive explanations.
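The two missing-data strategies and the two evaluation metrics named in the Methods can be sketched in a few lines of plain Python. This is only an illustrative sketch: the abstract does not say which imputation method was used, so column-mean imputation is an assumption here, and the toy records, column meanings, and function names (`listwise_delete`, `mean_impute`, `brier_score`, `auc`) are hypothetical and not taken from the study.

```python
from statistics import mean

# Hypothetical toy records: (er_intensity, age, pcr_label); None marks a missing field.
# These values are invented for illustration and are NOT study data.
records = [
    (3.0, 45.0, 0),
    (None, 52.0, 1),
    (1.0, None, 1),
    (2.0, 60.0, 0),
    (0.0, 38.0, 1),
]

def listwise_delete(rows):
    """Drop every row that has at least one missing feature."""
    return [r for r in rows if None not in r[:-1]]

def mean_impute(rows):
    """Fill each missing feature with the mean of that column's observed values.

    Mean imputation is an assumed stand-in; the study's method is not stated.
    """
    n_features = len(rows[0]) - 1
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_features)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_features))
        + (r[-1],)
        for r in rows
    ]

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))

def auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

complete_cases = listwise_delete(records)  # rows with any gap are discarded
imputed = mean_impute(records)             # all rows kept, gaps filled
```

Listwise deletion shrinks the training set to complete cases only, whereas imputation keeps every patient at the cost of introducing estimated values; the same `brier_score` and `auc` functions can then be applied to predictions from models trained on either version.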

Results: Seventy-two (24.6%), 18 (24.7%), and 31 (24.8%) patients attained pCR in the NCCS training, NCCS testing, and KK Women's and Children's Hospital (KKH) testing data sets, respectively. The random forest (RF) base and imputed models had the highest AUCs in the KKH cohort, 0.794 (95% CI, 0.709 to 0.873) and 0.795 (95% CI, 0.706 to 0.871), respectively, and were the best calibrated, with the lowest Brier scores. No statistically significant difference was noted between the AUCs of the base and imputed models in any data set. The imputed model had a larger positive predictive value (PPV; 98.2% v 95.1%) and negative predictive value (NPV; 96.7% v 90.0%) than the base model in the KKH data set. Estrogen receptor intensity, human epidermal growth factor receptor 2 (HER2) intensity, and age at diagnosis were the three most important predictors.
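For readers unfamiliar with the reported quantities, the sketch below shows how PPV, NPV, and a confidence interval for the AUC might be computed from predicted probabilities. It rests on assumptions not confirmed by the abstract: the 0.5 decision threshold and the percentile bootstrap are illustrative choices, the function names (`ppv_npv`, `auc`, `bootstrap_auc_ci`) are hypothetical, and the toy labels and probabilities are invented.

```python
import random

def ppv_npv(y_true, y_prob, threshold=0.5):
    """Return (PPV, NPV) when pCR is predicted for probabilities >= threshold.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 0:
                tn += 1
            else:
                fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv

def auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Invented toy labels and predicted probabilities, NOT study data.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]

ppv, npv = ppv_npv(y_true, y_prob)
ci_low, ci_high = bootstrap_auc_ci(y_true, y_prob)
```

PPV and NPV depend on both the chosen threshold and the pCR prevalence in the cohort, so values from one data set do not transfer automatically to another.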

Conclusion: ML, particularly RF, demonstrates reasonable accuracy in pCR prediction after NAC. Imputing missing fields in the data can improve the PPV and NPV of the pCR prediction model.

Source journal metrics: CiteScore 6.20 | Self-citation rate 4.80% | Articles published 190