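Using Machine Learning Algorithms to Develop a Model for Predicting the Survival of Lung Cancer Patients in the Republic of Kazakhstan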

V. Makarov, D. Kaidarova, S. Yessentayeva, J. Kalmatayeva, M. Mansurova, N. Kadyrbek, R. Kadyrbayeva, S. Olzhayev, I. Novikov
Journal: Oncologia i radiologia Kazakhstana. Published: 2022-09-30. Publication type: Journal Article. DOI: 10.52532/2663-4864-2022-3-65-4-11 (https://doi.org/10.52532/2663-4864-2022-3-65-4-11)

Abstract

Relevance: The 5-year overall survival rate in pathological stage IA non-small cell lung cancer (NSCLC) is 73%, and the recurrence rate in radically treated patients is almost 10%. The study aimed to evaluate the prognostic significance of several clinical and morphological factors and to apply machine learning algorithms to predict overall survival in patients with lung cancer.

Methods: Forms 030-6/y for C34 (lung cancer; n=19,379) from the EROB database for 2014-2018 were analyzed, and the impact of risk factors on overall survival was assessed using the Kaplan-Meier method. Accordingly, the training data set for the prediction models included 19,379 observations and 15 factors. Five machine learning algorithms (Random Forest Classifier, Gradient Boosting Classifier, Logistic Regression, Decision Tree Classifier, and K Nearest Neighbors (KNN) Classifier) were implemented in Python. The results were evaluated by constructing a confusion matrix and calculating classification metrics: the proportion of correctly classified objects (accuracy) during training and validation, precision, recall, and Cohen's kappa.

Results: In our study, 19,379 patients were analyzed, including 15,494 men (79.95%) and 3,885 women (20.04%). At the time of the study, 6,171 men (39.8%) and 1,962 women (49.5%) were alive. Median survival was 8.3 months (SE 0.154 months; 95% CI 7.96-8.56) in men and 15.43 months (SE 1.0 months; 95% CI 13.497-17.363) in women. At diagnosis, 1,037 patients (5.35%) had stage I disease and 4,145 (21.38%) had stage II. Most patients (61.4%) had advanced-stage NSCLC: 9,189 (47.4%) were diagnosed with stage III and 4,655 (24%) with stage IV. The difference in median survival between stages was statistically significant (χ2=3991.6, p=0.00), indicating the prognostic significance of tumor stage and its influence on patient survival. Median survival also differed significantly between morphological forms of lung cancer (χ2=623.4, p=0.000), suggesting that tumor morphology is prognostically significant as well.

Conclusion: Machine learning models can predict the risk of fatal outcomes for patients after surgical treatment and registration in the EROB database. Patient-oriented systems that support medical decision-making make it possible to choose optimal strategies for adjuvant therapy, dispensary (follow-up) observation, and the frequency of diagnostic studies.
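The Kaplan-Meier method named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `kaplan_meier` and `median_survival` are hypothetical, and the follow-up data are invented toy values, not EROB records. It only shows how a survival curve and a median survival time are obtained from follow-up durations and censoring flags.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time per patient (for example, months)
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # deaths and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Toy data, invented for illustration only.
months = [2, 3, 3, 5, 8, 8, 12, 15, 20, 24]
died = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(months, died)
median = median_survival(curve)
```

Comparing such curves between groups (for example by sex, stage, or morphology) is usually done with a log-rank test, which is one way to obtain χ2 statistics of the kind reported in the Results; the abstract does not name the test that was used.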
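The evaluation metrics listed in the Methods (accuracy, precision, recall, and Cohen's kappa) can all be derived from a binary confusion matrix. The sketch below is a plain-Python illustration under assumptions: the function names, the label coding (1 = death, 0 = alive), and the toy labels are hypothetical and do not come from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and Cohen's kappa from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Expected agreement by chance, from the marginal label frequencies.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
```

Cohen's kappa is a useful complement to accuracy when classes are imbalanced, because it discounts the agreement a classifier would reach by chance alone.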