Efficient healthcare service based on Stacking Ensemble

Yunsang Joo, Seungwon Lee, Hyoungju Kim, Pankoo Kim, Seong Oun Hwang, Chang Choi
Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging Applications
Published: 2020-12-12
DOI: 10.1145/3440943.3444727 (https://doi.org/10.1145/3440943.3444727)
Citations: 1

Abstract

Research that uses medical big data to identify patients at high risk of disease has recently attracted considerable attention. Advances in artificial intelligence make it possible to predict diseases from numerical data alone and to prevent them before they occur, so continued research in this area is essential, and machine learning and deep learning studies on disease prediction from medical big data are progressing actively. Because disease cases are rare in medical data, such studies tend to rely on oversampling or undersampling, which can distort the information in the data. In addition, most machine learning research relies on particular predictive models, so the predictions risk reflecting the biases of those models. Better predictions can therefore be obtained by generalizing the data the model is trained on or by adjusting the model's bias. In this paper, we pass diabetes, heart disease, and breast cancer data through several individual classifiers and use their predicted values as training data for a single meta-model that produces the final predictions. That is, we construct a stacking ensemble model to predict the presence or absence of a disease and analyse its performance through experiments. Because this model trains multiple classifiers on the same data, it may overfit. We therefore compare models whose individual classifiers are trained with and without cross-validation. In the experiments, the model trained with cross-validation performed on average 1.4% better than the individual single models, whereas the meta-model without cross-validation performed worse than the individual single models. In other words, a stacking ensemble model achieves high performance only when the individual classifiers are cross-validated. Performing one final prediction on the predicted values of high-performing individual models yields more stable and reliable predictions. The cross-validation-based stacking ensemble model proposed in this paper predicts the presence or absence of a disease and can be used for medical service development and disease prevention.
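
The stacking procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the `Stump` base classifier, the `TableMetaModel` meta-model, and all function names are hypothetical stand-ins chosen so that the example runs with the Python standard library alone. The point it shows is the cross-validated step: each training row's meta-features come from base classifiers that were fitted without that row.

```python
import random
from collections import Counter, defaultdict


def make_data(n, seed):
    """Synthetic binary data (an assumption, not the paper's datasets)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y


class Stump:
    """Toy base classifier: thresholds one feature at the midpoint of the class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [x[self.feature] for x, t in zip(X, y) if t == 1]
        neg = [x[self.feature] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.positive_above = mean_pos >= mean_neg
        return self

    def predict(self, X):
        return [int((x[self.feature] > self.threshold) == self.positive_above) for x in X]


class TableMetaModel:
    """Toy meta-model: majority label for each pattern of base predictions."""

    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, t in zip(Z, y):
            votes[tuple(z)][t] += 1
        self.table = {pattern: c.most_common(1)[0][0] for pattern, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]  # fallback for unseen patterns
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]


def out_of_fold_predictions(factories, X, y, k=5):
    """Meta-features for each row, predicted by base models that never saw that row."""
    Z = [[None] * len(factories) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        for j, make in enumerate(factories):
            preds = make().fit(X_train, y_train).predict([X[i] for i in held_out])
            for i, p in zip(held_out, preds):
                Z[i][j] = p
    return Z


def stack_predict(factories, X_train, y_train, X_test, k=5):
    """Train the meta-model on out-of-fold predictions, then score new rows."""
    Z_train = out_of_fold_predictions(factories, X_train, y_train, k)
    meta = TableMetaModel().fit(Z_train, y_train)
    # Refit the base models on all training rows before predicting on new data.
    base = [make().fit(X_train, y_train) for make in factories]
    Z_test = [list(row) for row in zip(*(m.predict(X_test) for m in base))]
    return meta.predict(Z_test)


factories = [lambda: Stump(0), lambda: Stump(1)]
X_train, y_train = make_data(300, seed=0)
X_test, y_test = make_data(100, seed=1)
y_pred = stack_predict(factories, X_train, y_train, X_test)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"stacked accuracy on synthetic test data: {accuracy:.2f}")
```

The variant without cross-validation would instead fit each base classifier on all training rows and feed its predictions for those same rows to the meta-model. This toy example is not meant to reproduce the performance gap reported in the abstract; it only shows where the two training schemes differ.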