Pascal Ndayishimiyepas, Cheruiyot Wilson, Micheal W. Kimwele
A Hybrid Model for Predicting Missing Records in Data Using XGBoost
2022 IEEE International Symposium on Product Compliance Engineering - Asia (ISPCE-ASIA), published 2022-11-04
DOI: 10.1109/ISPCE-ASIA57917.2022.9971092
Citation count: 0
Abstract
Many real-world datasets are incomplete. Historical data is usually large in volume, and many of its features have missing values. This paper therefore implements an enhanced model for predicting missing records in data using supervised machine learning, specifically XGBoost regression. It surveys the approaches that have previously been used to predict missing records and then implements an enhanced approach. XGBoost stands for eXtreme Gradient Boosting; it was developed mainly to improve model performance and computational speed. It is an implementation of the gradient boosting machine that makes more efficient use of computing resources for boosted-tree algorithms. The accuracy, precision, and recall scores obtained indicate that the implemented XGBoost model is capable of predicting missing records in a dataset.
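The general idea of treating a column with gaps as a regression target can be illustrated with a minimal sketch. This is not the authors' hybrid model and it is not XGBoost itself: it is plain gradient boosting with depth-1 regression stumps and squared-error loss, written with the standard library only, and it leaves out the regularization and second-order terms that XGBoost adds. The function names (`fit_stump`, `fit_boosted`, `impute_column`), the hyperparameters, and the toy data are all hypothetical, and the sketch assumes that only the target column contains missing values.

```python
from typing import List, Optional, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)

def stump_predict(stump: Stump, x: List[float]) -> float:
    j, thr, left_val, right_val = stump
    return left_val if x[j] <= thr else right_val

def fit_stump(xs: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the single split that minimizes squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    # Fallback: a constant stump (threshold +inf sends every row to the left leaf).
    best: Stump = (0, float("inf"), mean, mean)
    best_err = sum((r - mean) ** 2 for r in residuals)
    for j in range(len(xs[0])):
        values = sorted(set(x[j] for x in xs))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= thr]
            right = [r for x, r in zip(xs, residuals) if x[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, thr, lmean, rmean), err
    return best

def fit_boosted(xs, ys, n_rounds=300, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict(model, x: List[float]) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def impute_column(rows: List[List[Optional[float]]], target: int) -> List[List[float]]:
    """Train on rows where `target` is present, then fill the rows where it is None."""
    features = [j for j in range(len(rows[0])) if j != target]
    complete = [r for r in rows if r[target] is not None]
    model = fit_boosted([[r[j] for j in features] for r in complete],
                        [r[target] for r in complete])
    filled = []
    for r in rows:
        r = list(r)
        if r[target] is None:
            r[target] = predict(model, [r[j] for j in features])
        filled.append(r)
    return filled

# Invented toy data: third column equals 2*a + b, with one value missing.
rows = [[float(a), float(b), 2.0 * a + b] for a in range(1, 6) for b in (0, 1)
        if (a, b) != (3, 1)]
rows.append([3.0, 1.0, None])
filled = impute_column(rows, target=2)
print(filled[-1])  # the last row now carries a predicted value in column 2
```

In practice the hand-written booster would be replaced by a library regressor, and the imputed values would be evaluated against held-out known values, as the abstract's mention of evaluation scores suggests.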