机器学习技术在预测癌症患者生活质量特征中的应用

IF 1.2 4区 计算机科学 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS
Milos Savic, V. Kurbalija, Mihailo Ilic, M. Ivanović, D. Jakovetić, A. Valachis, Serge Autexier, Johannes Rust, T. Kosmidis
{"title":"机器学习技术在预测癌症患者生活质量特征中的应用","authors":"Milos Savic, V. Kurbalija, Mihailo Ilic, M. Ivanović, D. Jakovetić, A. Valachis, Serge Autexier, Johannes Rust, T. Kosmidis","doi":"10.2298/csis220227061s","DOIUrl":null,"url":null,"abstract":"Quality of life (QoL) is one of the major issues for cancer patients. With the advent of medical databases containing large amounts of relevant QoL information it becomes possible to train predictive QoL models by machine learning (ML) techniques. However, the training of predictive QoL models poses several challenges mostly due to data privacy concerns and missing values in patient data. In this paper, we analyze several classification and regression ML models predicting QoL indicators for breast and prostate cancer patients. Three different approaches are employed for imputing missing values, and several settings for data privacy preserving are tested. The examined ML models are trained on datasets formed from two databases containing a large number of anonymized medical records of cancer patients from Sweden. Two learning scenarios are considered: centralized and federated learning. In the centralized learning scenario all patient data coming from different data sources is collected at a central location prior to model training. On the other hand, federated learning enables collective training of machine learning models without data sharing. The results of our experimental evaluation show that the predictive power of federated models is comparable to that of centrally trained models for short-term QoL predictions, whereas for long-term periods centralized models provide more accurate QoL predictions. Furthermore, we provide insights into the quality of data preprocessing tasks (missing value imputation and differential privacy).","PeriodicalId":50636,"journal":{"name":"Computer Science and Information Systems","volume":"67 1","pages":"381-404"},"PeriodicalIF":1.2000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"The application of machine learning techniques in prediction of quality of life features for cancer patients\",\"authors\":\"Milos Savic, V. Kurbalija, Mihailo Ilic, M. Ivanović, D. Jakovetić, A. Valachis, Serge Autexier, Johannes Rust, T. Kosmidis\",\"doi\":\"10.2298/csis220227061s\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Quality of life (QoL) is one of the major issues for cancer patients. With the advent of medical databases containing large amounts of relevant QoL information it becomes possible to train predictive QoL models by machine learning (ML) techniques. However, the training of predictive QoL models poses several challenges mostly due to data privacy concerns and missing values in patient data. In this paper, we analyze several classification and regression ML models predicting QoL indicators for breast and prostate cancer patients. Three different approaches are employed for imputing missing values, and several settings for data privacy preserving are tested. The examined ML models are trained on datasets formed from two databases containing a large number of anonymized medical records of cancer patients from Sweden. Two learning scenarios are considered: centralized and federated learning. In the centralized learning scenario all patient data coming from different data sources is collected at a central location prior to model training. On the other hand, federated learning enables collective training of machine learning models without data sharing. The results of our experimental evaluation show that the predictive power of federated models is comparable to that of centrally trained models for short-term QoL predictions, whereas for long-term periods centralized models provide more accurate QoL predictions. Furthermore, we provide insights into the quality of data preprocessing tasks (missing value imputation and differential privacy).\",\"PeriodicalId\":50636,\"journal\":{\"name\":\"Computer Science and Information Systems\",\"volume\":\"67 1\",\"pages\":\"381-404\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Science and Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.2298/csis220227061s\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Science and Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2298/csis220227061s","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 2

摘要

生活质量(QoL)是癌症患者的主要问题之一。随着包含大量相关生活质量信息的医学数据库的出现,通过机器学习(ML)技术训练预测生活质量模型成为可能。然而,预测生活质量模型的训练面临着一些挑战,主要是由于数据隐私问题和患者数据中的缺失值。本文分析了几种预测乳腺癌和前列腺癌患者生活质量指标的分类和回归ML模型。采用了三种不同的方法来估算缺失值,并对几种数据隐私保护设置进行了测试。所检查的机器学习模型在两个数据库中形成的数据集上进行训练,这些数据库包含大量瑞典癌症患者的匿名医疗记录。这里考虑了两种学习场景:集中式学习和联邦式学习。在集中式学习场景中,所有来自不同数据源的患者数据在模型训练之前收集在一个中心位置。另一方面,联邦学习可以在没有数据共享的情况下对机器学习模型进行集体训练。我们的实验评估结果表明,对于短期生活质量预测,联邦模型的预测能力与集中训练的模型相当,而对于长期生活质量预测,集中模型提供更准确的生活质量预测。此外,我们还提供了对数据预处理任务质量的见解(缺失值估算和差异隐私)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
The application of machine learning techniques in prediction of quality of life features for cancer patients
Quality of life (QoL) is one of the major issues for cancer patients. With the advent of medical databases containing large amounts of relevant QoL information it becomes possible to train predictive QoL models by machine learning (ML) techniques. However, the training of predictive QoL models poses several challenges mostly due to data privacy concerns and missing values in patient data. In this paper, we analyze several classification and regression ML models predicting QoL indicators for breast and prostate cancer patients. Three different approaches are employed for imputing missing values, and several settings for data privacy preserving are tested. The examined ML models are trained on datasets formed from two databases containing a large number of anonymized medical records of cancer patients from Sweden. Two learning scenarios are considered: centralized and federated learning. In the centralized learning scenario all patient data coming from different data sources is collected at a central location prior to model training. On the other hand, federated learning enables collective training of machine learning models without data sharing. The results of our experimental evaluation show that the predictive power of federated models is comparable to that of centrally trained models for short-term QoL predictions, whereas for long-term periods centralized models provide more accurate QoL predictions. Furthermore, we provide insights into the quality of data preprocessing tasks (missing value imputation and differential privacy).
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Computer Science and Information Systems
Computer Science and Information Systems COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore
2.30
自引率
21.40%
发文量
76
审稿时长
7.5 months
期刊介绍: About the journal Home page Contact information Aims and scope Indexing information Editorial policies ComSIS consortium Journal boards Managing board For authors Information for contributors Paper submission Article submission through OJS Copyright transfer form Download section For readers Forthcoming articles Current issue Archive Subscription For reviewers View and review submissions News Journal''s Facebook page Call for special issue New issue notification Aims and scope Computer Science and Information Systems (ComSIS) is an international refereed journal, published in Serbia. The objective of ComSIS is to communicate important research and development results in the areas of computer science, software engineering, and information systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信