{"title":"Iterative missing value imputation based on feature importance","authors":"Cong Guo, Wei Yang, Chun Liu, Zheng Li","doi":"10.1007/s10115-024-02159-7","DOIUrl":null,"url":null,"abstract":"<p>Many datasets suffer from missing values due to various reasons, which not only increases the processing difficulty of related tasks but also reduces the classification accuracy. To address this problem, the mainstream approach is to use missing value imputation to complete the dataset. Existing imputation methods treat all features as equally important during data completion, while in fact different features have different importance. Therefore, we have designed an imputation method that considers feature importance. This algorithm iteratively performs matrix completion and feature importance learning. In particular, matrix completion is performed based on a completion loss function that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets originally containing missing values. 
The results on these datasets consistently show that the proposed method outperforms the existing five imputation algorithms.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"5 1","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge and Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10115-024-02159-7","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Many datasets contain missing values for a variety of reasons, which not only makes related tasks harder to process but also reduces classification accuracy. The mainstream remedy is missing value imputation, which completes the dataset. Existing imputation methods treat all features as equally important during data completion, although in practice features differ in importance. We therefore design an imputation method that takes feature importance into account. The algorithm iteratively alternates between matrix completion and feature importance learning. In particular, matrix completion is performed based on a completion loss function that incorporates feature importance. Our experimental analysis covers three types of datasets: synthetic datasets with varying noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets that originally contain missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms.
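The abstract does not state the completion loss or how feature importance is learned, so the following is only an illustrative sketch of the alternating idea, not the authors' algorithm. It assumes a low-rank factorization fitted by weighted alternating least squares, where the squared error on each observed entry of feature j is scaled by a weight `w[j]`, and it uses the absolute Pearson correlation with the class label as a hypothetical stand-in for the importance-learning step. All function names, the rank, the regularization strength `lam`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def weighted_als(X, observed, w, rank=2, lam=0.1, n_sweeps=20, seed=0):
    """Fit U @ V.T by minimizing, over observed (i, j),
    w[j] * (X[i, j] - U[i] @ V[j])**2 + lam * (||U||^2 + ||V||^2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(d, rank))
    reg = lam * np.eye(rank)
    Xz = np.where(observed, X, 0.0)  # replace NaN so arithmetic stays finite
    for _ in range(n_sweeps):
        for i in range(n):  # row factors: weighted ridge regression on V
            o = observed[i]
            Vw = V[o] * w[o, None]
            U[i] = np.linalg.solve(V[o].T @ Vw + reg, Vw.T @ Xz[i, o])
        for j in range(d):  # column factors: w[j] scales the data term
            o = observed[:, j]
            Uo = U[o]
            V[j] = np.linalg.solve(w[j] * Uo.T @ Uo + reg, w[j] * Uo.T @ Xz[o, j])
    return U @ V.T

def feature_importance(X_filled, y):
    """Hypothetical importance: |Pearson correlation| with y, rescaled to mean 1."""
    Xc = X_filled - X_filled.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    score = np.abs(Xc.T @ yc) / denom + 1e-6  # small floor keeps weights positive
    return score * len(score) / score.sum()

def iterative_impute(X, y, rank=2, n_rounds=5):
    """Alternate weighted completion and importance estimation (n_rounds >= 1)."""
    observed = ~np.isnan(X)
    w = np.ones(X.shape[1])  # start with all features equally important
    for _ in range(n_rounds):
        low_rank = weighted_als(X, observed, w, rank=rank)
        X_filled = np.where(observed, X, low_rank)  # keep observed entries as-is
        w = feature_importance(X_filled, y)
    return X_filled, w

# Synthetic demo: rank-2 data with roughly 20% of entries removed at random.
rng = np.random.default_rng(0)
n, d = 60, 5
Z = rng.normal(size=(n, 2))
X_true = Z @ rng.normal(size=(2, d))
y = (Z[:, 0] > 0).astype(float)
X_miss = X_true.copy()
X_miss[rng.random((n, d)) < 0.2] = np.nan
X_hat, w = iterative_impute(X_miss, y)
```

In this sketch a feature with a larger weight pulls the low-rank fit more strongly toward its observed values, which is one plausible way a completion loss could "incorporate feature importance"; the paper's actual loss and importance estimator may differ.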
About the journal:
Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.