An AI-driven Predictive Model for Pancreatic Cancer Patients Using Extreme Gradient Boosting

Impact factor: 1.0 · JCR quartile: Q3 · Category: Mathematics
Aditya Chakraborty, Chris P. Tsokos
Journal of Statistical Theory and Applications · Published 2023-09-11 · DOI: 10.1007/s44199-023-00063-7 (https://doi.org/10.1007/s44199-023-00063-7) · Citations: 0

Abstract

Pancreatic cancer is one of the deadliest cancers worldwide. Most patients are diagnosed at Stage III or Stage IV, and the chances of survival are very low once the disease is detected at these late stages. This study builds a data-driven predictive model of the survival times of patients diagnosed with pancreatic cancer from the associated risk factors, and identifies the factors that contribute most to survival time, using the XGBoost (eXtreme Gradient Boosting) algorithm. A grid search was used to find the optimum values of the model's hyperparameters by minimizing the root mean square error (RMSE); the final hyperparameters were selected by comparison with 243 competing models. To check the validity of the model, we compared its performance with ten deep neural network models, grown sequentially with different activation functions and optimizers. We also constructed an ensemble model using a Gradient Boosting Machine (GBM). The proposed XGBoost model outperformed all competing models we considered with respect to RMSE. After developing the model, we ranked the individual risk factors according to their contribution to the response predictions, which can help pancreatic cancer research organizations direct their resources toward the risk factors that most influence this type of cancer. The three most influential risk factors affecting the survival of pancreatic cancer patients were found to be the age of the patient, current BMI, and cigarette smoking years, with contributions of 35.5%, 24.3%, and 14.93%, respectively. The predictive model is approximately 96.42% accurate in predicting the survival times of patients diagnosed with pancreatic cancer and performs well on test data. The analytical methodology used to develop the model can also be applied to other prediction tasks: given a set of numeric and non-numeric features, it can be used to predict the time to death for a specific type of cancer.
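The grid-search step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not list the hyperparameters or their candidate values, so the grid below (five parameters with three values each, which happens to give 3^5 = 243 candidates) is an assumption, and `placeholder_score` is a hypothetical stand-in for training a model and measuring its validation RMSE.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]

# Hypothetical grid (an assumption, not taken from the paper): five
# hyperparameters with three candidate values each, i.e. 3**5 = 243 candidates.
GRID: Dict[str, List[float]] = {
    "max_depth": [3, 5, 7],
    "eta": [0.01, 0.1, 0.3],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "n_rounds": [100, 300, 500],
}


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def grid_search(
    grid: Dict[str, List[float]],
    score: Callable[[Params], float],
) -> Tuple[Params, float, int]:
    """Evaluate every combination in the grid and keep the lowest score."""
    names = list(grid)
    best_params: Params = {}
    best_score = math.inf
    n_tried = 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        n_tried += 1
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score, n_tried


def placeholder_score(params: Params) -> float:
    """Hypothetical stand-in for 'train a model, return validation RMSE'.

    It only measures the distance to an arbitrary reference setting so that
    the sketch runs without data or a model library.
    """
    reference = {
        "max_depth": 5,
        "eta": 0.1,
        "subsample": 0.8,
        "colsample_bytree": 0.8,
        "n_rounds": 300,
    }
    names = list(reference)
    return rmse([reference[n] for n in names], [params[n] for n in names])


best_params, best_score, n_tried = grid_search(GRID, placeholder_score)
print(n_tried, best_params, best_score)
```

In a real run, `placeholder_score` would be replaced by a function that fits the model with the given hyperparameters and returns the RMSE on held-out survival times; the search loop itself stays unchanged.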
Source journal metrics: CiteScore 2.30 · Self-citation rate 0.00% · Articles published: 13 · Review time: 13 weeks