Efficient healthcare service based on Stacking Ensemble
Yunsang Joo, Seungwon Lee, Hyoungju Kim, Pankoo Kim, Seong Oun Hwang, Chang Choi
Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging Applications, published 2020-12-12
DOI: 10.1145/3440943.3444727 (https://doi.org/10.1145/3440943.3444727)
Citation count: 1
Abstract
Research that uses medical big data to identify patients with a high probability of disease has recently received considerable attention. With advances in artificial intelligence, diseases can be predicted from numerical data alone and prevented before they occur, which makes continued research essential. Machine learning and deep learning studies that use medical big data for disease prediction are therefore progressing actively. Because disease cases are rare in medical data, such studies tend to rely on oversampling or undersampling, which can distort the information in the data. In addition, because most machine-learning-based research relies on a particular predictive model, the predictions themselves risk reflecting that model's biases. Better predictions can therefore be obtained by generalizing the data the model is trained on or by adjusting the model's bias. In this paper, we pass diabetes, heart disease, and breast cancer data through several individual classifiers to obtain predicted values, and use those values as training data for a single meta-model that produces the final predictions. That is, we construct a stacking ensemble model to predict the presence or absence of a disease and analyse its performance through experiments. Because this model trains multiple classifiers on the same data, it may overfit. We therefore compare models whose classifiers are trained with and without cross-validation. In the experiments, the model trained with cross-validation performed on average 1.4% better than the individual single models, whereas the meta-model without cross-validation performed worse than the individual single models. In other words, a stacking ensemble model achieved high performance only when the individual classifiers were cross-validated. Performing one final prediction on the predicted values of high-performance individual models yields more stable and reliable predictions. The cross-validation-based stacking ensemble model proposed in this paper predicts the presence or absence of a disease and can be used for medical service development and disease prevention.
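The cross-validated stacking procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the base classifiers, the meta-model, or the number of folds, so everything below is an assumption made for illustration: the synthetic data from `make_data`, the toy `CentroidClassifier` base learners, the `LogisticMeta` meta-model, and the choice of five folds. The point it shows is that each training sample's meta-feature is produced by base models that never saw that sample.

```python
import math
import random


def make_data(n, seed):
    """Synthetic stand-in for a binary disease dataset (not the paper's data)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(2.0 * label, 1.0) for _ in range(3)] for label in y]
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class CentroidClassifier:
    """Hypothetical toy base learner that looks at one feature column."""

    def __init__(self, col):
        self.col = col

    def fit(self, X, y):
        self.mean = {}
        for label in (0, 1):
            vals = [x[self.col] for x, t in zip(X, y) if t == label]
            self.mean[label] = sum(vals) / len(vals)
        return self

    def score(self, x):
        # Positive when x is closer to the class-1 mean than to the class-0 mean.
        v = x[self.col]
        return (v - self.mean[0]) ** 2 - (v - self.mean[1]) ** 2


class LogisticMeta:
    """Hypothetical toy meta-model: logistic regression fitted by SGD."""

    def fit(self, Z, y, lr=0.01, epochs=50):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            for z, t in zip(Z, y):
                g = self.prob(z) - t  # gradient of the log loss w.r.t. the logit
                self.w = [w - lr * g * zi for w, zi in zip(self.w, z)]
                self.b -= lr * g
        return self

    def prob(self, z):
        s = self.b + sum(w * zi for w, zi in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))


def make_models():
    """One fresh base learner per feature column."""
    return [CentroidClassifier(c) for c in range(3)]


def out_of_fold_features(make_models, X, y, k=5):
    """Meta-features for each sample, from base models fitted without that sample."""
    n = len(X)
    Z = [None] * n
    for fold in k_fold_indices(n, k):
        held = set(fold)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        models = [m.fit(X_tr, y_tr) for m in make_models()]
        for i in fold:
            Z[i] = [m.score(X[i]) for m in models]
    return Z


def fit_stack(make_models, X, y, k=5):
    """Fit the meta-model on out-of-fold scores, then refit base models on all data."""
    meta = LogisticMeta().fit(out_of_fold_features(make_models, X, y, k), y)
    base = [m.fit(X, y) for m in make_models()]
    return base, meta


def predict_stack(base, meta, x):
    return int(meta.prob([m.score(x) for m in base]) >= 0.5)


X_train, y_train = make_data(200, seed=1)
X_test, y_test = make_data(200, seed=2)
base, meta = fit_stack(make_models, X_train, y_train)
preds = [predict_stack(base, meta, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"stacked accuracy on synthetic test data: {accuracy:.3f}")
```

The variant without cross-validation that the abstract compares against would correspond to building the meta-features from base models fitted on the full training set, so that each score comes from a model that has already seen the sample it is scoring. The toy learners here are too simple to reproduce the reported performance gap, so the sketch makes no claim about it.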