{"title":"HIOC:一种预测医疗数据集缺失值的混合插补方法","authors":"P. Rani, Rajneesh Kumar, Anurag Jain","doi":"10.1108/ijicc-03-2021-0042","DOIUrl":null,"url":null,"abstract":"PurposeDecision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.Design/methodology/approachThe proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. Performance of HIOC has been compared to MICE, KNN, and mean and mode methods. Four classifiers support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT) have been used to evaluate the performance of imputation methods.FindingsThe results show that HIOC performed efficiently even with a high rate of missing values. It had reduced root mean square error (RMSE) up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset.Originality/valueThe proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.","PeriodicalId":352072,"journal":{"name":"Int. J. Intell. Comput. Cybern.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"HIOC: a hybrid imputation method to predict missing values in medical datasets\",\"authors\":\"P. Rani, Rajneesh Kumar, Anurag Jain\",\"doi\":\"10.1108/ijicc-03-2021-0042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PurposeDecision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.Design/methodology/approachThe proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. Performance of HIOC has been compared to MICE, KNN, and mean and mode methods. Four classifiers support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT) have been used to evaluate the performance of imputation methods.FindingsThe results show that HIOC performed efficiently even with a high rate of missing values. It had reduced root mean square error (RMSE) up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset.Originality/valueThe proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.\",\"PeriodicalId\":352072,\"journal\":{\"name\":\"Int. J. Intell. Comput. Cybern.\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Intell. Comput. Cybern.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1108/ijicc-03-2021-0042\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Intell. Comput. Cybern.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/ijicc-03-2021-0042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
摘要
目的利用机器学习分类器开发的决策支持系统已成为预测各种疾病的重要工具。然而,这些系统的性能受到医疗数据集中缺失值的不利影响。使用插值方法来预测这些缺失值。本文提出了一种新的基于分类器优化的混合输入方法(HIOC)来有效地预测缺失值。本文提出的HIOC是通过一个分类器将链式方程(MICE)、K近邻(KNN)、均值和模态方法的多元插值方法以最优方式结合在一起而开发的。将HIOC的性能与MICE、KNN和mean and mode方法进行了比较。采用支持向量机(SVM)、朴素贝叶斯(NB)、随机森林(RF)和决策树(DT)四种分类器对插值方法的性能进行了评价。结果表明,即使缺失值率很高,HIOC也能有效地执行。它将心脏病数据集中的均方根误差(RMSE)降低到17.32%,在乳腺癌数据集中降低到34.73%。对缺失值的正确预测提高了分类器预测疾病的准确性。它在心脏病数据集中提高了18.61%的分类准确率,在乳腺癌数据集中提高了6.20%。本文提出的HIOC是一种新的混合插值方法,可以有效地预测任何医学数据集中的缺失值。
HIOC: a hybrid imputation method to predict missing values in medical datasets
PurposeDecision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.Design/methodology/approachThe proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. Performance of HIOC has been compared to MICE, KNN, and mean and mode methods. Four classifiers support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT) have been used to evaluate the performance of imputation methods.FindingsThe results show that HIOC performed efficiently even with a high rate of missing values. It had reduced root mean square error (RMSE) up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset.Originality/valueThe proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.