The application of machine learning techniques in prediction of quality of life features for cancer patients

IF 1.2 · Region 4 (Computer Science) · Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science and Information Systems Pub Date : 2023-01-01 DOI:10.2298/csis220227061s

Milos Savic, V. Kurbalija, Mihailo Ilic, M. Ivanović, D. Jakovetić, A. Valachis, Serge Autexier, Johannes Rust, T. Kosmidis

Journal Article · Computer Science and Information Systems, pp. 381-404 · https://doi.org/10.2298/csis220227061s

Citations: 2

Abstract


The application of machine learning techniques in prediction of quality of life features for cancer patients

Quality of life (QoL) is one of the major issues for cancer patients. With the advent of medical databases containing large amounts of relevant QoL information it becomes possible to train predictive QoL models by machine learning (ML) techniques. However, the training of predictive QoL models poses several challenges mostly due to data privacy concerns and missing values in patient data. In this paper, we analyze several classification and regression ML models predicting QoL indicators for breast and prostate cancer patients. Three different approaches are employed for imputing missing values, and several settings for data privacy preserving are tested. The examined ML models are trained on datasets formed from two databases containing a large number of anonymized medical records of cancer patients from Sweden. Two learning scenarios are considered: centralized and federated learning. In the centralized learning scenario all patient data coming from different data sources is collected at a central location prior to model training. On the other hand, federated learning enables collective training of machine learning models without data sharing. The results of our experimental evaluation show that the predictive power of federated models is comparable to that of centrally trained models for short-term QoL predictions, whereas for long-term periods centralized models provide more accurate QoL predictions. Furthermore, we provide insights into the quality of data preprocessing tasks (missing value imputation and differential privacy).
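To make the contrast between the two learning scenarios concrete, here is a minimal sketch rather than the authors' method. The abstract does not name the aggregation algorithm, the model family, or the imputation methods, so everything below is an assumption chosen for illustration: federated averaging (FedAvg) as the aggregation rule, a one-feature linear regression as the model, mean imputation for missing values, and synthetic data standing in for the patient records. All function names are hypothetical.

```python
import random

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def local_update(w, b, xs, ys, lr=0.3, epochs=5):
    """Run a few epochs of full-batch gradient descent on one dataset."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = 2 * sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = 2 * sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(sites, rounds=200):
    """Assumed FedAvg: each site trains locally, then the server averages
    the parameters weighted by site size. Raw records never leave a site."""
    w, b = 0.0, 0.0
    total = sum(len(xs) for xs, _ in sites)
    for _ in range(rounds):
        updates = [(local_update(w, b, xs, ys), len(xs)) for xs, ys in sites]
        w = sum(wk * n for (wk, _), n in updates) / total
        b = sum(bk * n for (_, bk), n in updates) / total
    return w, b

def centralized(sites, rounds=200):
    """Baseline: pool all records at one location, then train."""
    xs = [x for site_xs, _ in sites for x in site_xs]
    ys = [y for _, site_ys in sites for y in site_ys]
    w, b = 0.0, 0.0
    for _ in range(rounds):
        w, b = local_update(w, b, xs, ys)
    return w, b

def make_site(rng, n):
    """Synthetic site: y = 2x + 1 plus noise, with about 10% of x missing."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 0.05) for x in xs]
    xs = [None if rng.random() < 0.1 else x for x in xs]
    return impute_mean(xs), ys  # imputation happens locally, per site

rng = random.Random(0)
sites = [make_site(rng, 40), make_site(rng, 60)]
fed_w, fed_b = federated_average(sites)
cen_w, cen_b = centralized(sites)
print(f"federated:   w={fed_w:.3f}, b={fed_b:.3f}")
print(f"centralized: w={cen_w:.3f}, b={cen_b:.3f}")
```

On this toy problem the two sites draw from the same distribution, so both procedures should land close to the generating parameters. The gap the abstract reports for long-term predictions cannot be reproduced by a sketch this small; it only shows where the two training loops differ, namely whether records are pooled before training or only model parameters are exchanged.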


Source journal

Computer Science and Information Systems COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

2.30

Self-citation rate

21.40%

Articles published

Review time

7.5 months

About the journal: Computer Science and Information Systems (ComSIS) is an international refereed journal, published in Serbia. The objective of ComSIS is to communicate important research and development results in the areas of computer science, software engineering, and information systems.