G. Madhu, B Lalith Bharadwaj, G. Nagachandrika, K. Vardhan
A Novel Algorithm for Missing Data Imputation on Machine Learning
2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), published 2019-11-01
DOI: 10.1109/ICSSIT46314.2019.8987895 (https://doi.org/10.1109/ICSSIT46314.2019.8987895)
Citations: 10
Abstract
Missing values are a significant problem in medical research: their presence degrades machine learning and AI models and can lead to incorrect insights for decision-making. Over the past few decades, researchers have developed various imputation approaches and applied them to real-world applications. Imputation methods also help build effective models that discover hidden patterns in medical applications and provide insightful outcomes for better decision-making. In this paper, we propose a new approach that imputes missing values of continuous attributes in medical datasets using XGBoost (eXtreme Gradient Boosting), an ensemble learning method. The proposed method performs continuous-type attribute imputation for continuous and discrete data attributes. Each missing attribute value is imputed by predicting it from the non-missing attributes. Experiments are conducted on benchmark medical datasets with missing-value rates ranging from 1.98% to 50.65%, and the results are compared with iterative imputation, KNN imputation, and missForest imputation. We observe that the proposed method, missXGBoost, successfully handles missing values in continuous attributes and outperforms the other imputation methods.
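The core idea stated in the abstract, predicting each missing attribute value from the non-missing attributes, can be sketched as follows. This is a minimal illustration and not the authors' missXGBoost implementation, which the abstract does not describe in detail. To stay self-contained it swaps XGBoost for a tiny gradient-boosted regressor built from decision stumps. The function names (`fit_stump`, `fit_boosted_stumps`, `impute_missing`), the use of `None` as the missing marker, the mean pre-fill of predictor columns, and the hyperparameter defaults are all assumptions made for this example.

```python
def fit_stump(X, residuals):
    """Return the one-feature threshold split with the lowest squared error,
    as (feature, threshold, left_mean, right_mean), or None if no split exists."""
    best, best_sse = None, None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best_sse is None or sse < best_sse:
                best, best_sse = (j, t, lm, rm), sse
    return best


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals.
    A stand-in for XGBoost in this sketch, not a reimplementation of it."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]

    def predict(row):
        return base + sum(learning_rate * (lm if row[j] <= t else rm)
                          for j, t, lm, rm in stumps)

    return predict


def impute_missing(rows, n_rounds=50, learning_rate=0.3):
    """Impute None entries column by column from the other columns.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    # Predictor columns are pre-filled with column means (an assumption of
    # this sketch) so that every training row is complete.
    filled = [[means[j] if v is None else v for j, v in enumerate(r)]
              for r in rows]
    result = [list(r) for r in rows]
    for j in range(n_cols):
        missing = [i for i, r in enumerate(rows) if r[j] is None]
        if not missing:
            continue
        observed = [i for i, r in enumerate(rows) if r[j] is not None]
        others = [k for k in range(n_cols) if k != j]
        X_train = [[filled[i][k] for k in others] for i in observed]
        y_train = [rows[i][j] for i in observed]
        predict = fit_boosted_stumps(X_train, y_train, n_rounds, learning_rate)
        for i in missing:
            result[i][j] = predict([filled[i][k] for k in others])
    return result
```

Calling `impute_missing(rows)` on a list of numeric rows returns a copy in which every `None` has been replaced by a prediction and observed values are left unchanged. In practice the stump-based regressor would be replaced by an actual XGBoost model, and the mean pre-fill could be refined, but both choices are outside what the abstract specifies.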