Missing Values Imputation Framework for Mixed Datasets
Kritanat Chungnoy, Pokpong Songmuamg
2023 IEEE International Conference on Cybernetics and Innovations (ICCI), published 2023-03-30
DOI: 10.1109/ICCI57424.2023.10111846 (https://doi.org/10.1109/ICCI57424.2023.10111846)
Citations: 0
Abstract
Missing values are a critical problem in data mining, because the quality of the dataset directly affects the mining process. The problem is commonly addressed by imputation. In 2019, Chungnoy et al. proposed a bees-based imputation method that uses nearest neighbors as its heuristic function [7]. That method outperformed other methods on the imputation task, but it cannot impute numerical data, which leaves room for improvement. In this work, we propose a hybrid bees-based imputation method for mixed data types. The method uses mean estimation for numerical data and mode estimation for categorical data. In our evaluation, the hybrid bee imputation successfully imputed missing values in mixed data types at every missing percentage tested. Compared with other approaches, the average accuracy improved by 8.57%, the largest improvement, which occurred at a missing percentage of 15%. The overall average accuracy of the predictive models built on data imputed by the proposed method is 81.10%.
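As a rough illustration of the mean and mode estimates the abstract refers to, the sketch below fills numerical columns with the column mean and categorical columns with the column mode. It is only a plain baseline under stated assumptions, not the hybrid bees-based method, whose search procedure and nearest-neighbor heuristic are not described here. The function name `impute_mixed`, the use of `None` as the missing marker, and the toy data are all hypothetical.

```python
from statistics import mean, mode


def impute_mixed(rows, missing=None):
    """Fill missing cells per column: mean if numeric, mode otherwise.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        # Collect the observed (non-missing) values of column j.
        observed = [r[j] for r in rows if r[j] is not missing]
        if not observed:
            # Nothing to estimate from; leave the column's gaps as they are.
            fills.append(missing)
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool)
                 for v in observed):
            # Numerical column: use the mean of the observed values.
            fills.append(mean(observed))
        else:
            # Categorical column: use the most frequent observed value.
            fills.append(mode(observed))
    # Return a new table with each missing cell replaced by its column fill.
    return [[fills[j] if r[j] is missing else r[j] for j in range(n_cols)]
            for r in rows]


# Toy mixed dataset: one numerical and one categorical column.
rows = [
    [5.0, "red"],
    [None, "blue"],
    [7.0, None],
    [9.0, "red"],
]
imputed = impute_mixed(rows)
```

Here the missing numerical cell is filled with the mean of 5.0, 7.0, and 9.0, and the missing categorical cell with the most frequent label. A column-wise estimate like this ignores relationships between rows, which is the kind of information a neighbor-based heuristic would try to use.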