Categorical missing data imputation approach via sparse representation
Authors: Xiaochen Shao, Sen Wu, Xiaodong Feng, Rui Song
Journal: Int. J. Serv. Technol. Manag.
Published: 2016-08-23
DOI: 10.1504/IJSTM.2016.078542 (https://doi.org/10.1504/IJSTM.2016.078542)
K-nearest neighbour (KNN) is an important method for imputing missing categorical data. Its effectiveness, however, is highly sensitive to local parameters such as the choice of similarity function and the number of neighbours. To address these two issues, a categorical missing data imputation algorithm (CSR) is proposed. It first applies a matrix transformation that makes categorical data amenable to numerical computation. It then introduces the idea of a locality constraint into sparse representation theory by using KNN to construct the dictionary. Next, for each instance with missing values, the method obtains a weight vector that reflects smoothness and local structure. Finally, using the sparse reconstruction coefficient vector, the algorithm fills each missing attribute with the category that has the largest reconstruction value. Empirical tests show that CSR outperforms KNNimpute (including its two derivative methods, IKNNimpute and SKNNimpute) and LLSimpute in terms of efficiency and stability.
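The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact transformation, similarity function, objective or solver, so everything below is an assumption made for illustration: one-hot encoding stands in for the matrix transformation, a count of matching observed attributes stands in for the KNN similarity, and a plain L1-regularised least-squares problem solved with ISTA stands in for the locality-constrained sparse coding. The names `one_hot`, `ista`, `impute`, `k` and `lam`, and the use of `-1` as the missing-value marker, are hypothetical.

```python
import numpy as np

def one_hot(X, n_cats):
    """Step 1 (assumed form): turn integer-coded categories into 0/1 blocks.
    A missing entry (-1) becomes an all-zero row in its block."""
    blocks = []
    for j, c in enumerate(n_cats):
        B = np.zeros((X.shape[0], c))
        obs = X[:, j] >= 0
        B[obs, X[obs, j]] = 1.0
        blocks.append(B)
    return blocks

def ista(D, x, lam=0.05, n_iter=200):
    """Solve min_w 0.5*||x - D w||^2 + lam*||w||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = w - D.T @ (D @ w - x) / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return w

def impute(X, n_cats, k=3, lam=0.05):
    """Fill -1 entries of X. Assumes every incomplete row has at least one
    observed attribute and that some complete rows exist."""
    X = np.asarray(X)
    blocks = one_hot(X, n_cats)
    complete = np.where((X >= 0).all(axis=1))[0]
    out = X.copy()
    for i in np.where((X < 0).any(axis=1))[0]:
        obs = np.where(X[i] >= 0)[0]
        mis = np.where(X[i] < 0)[0]
        # Step 2: dictionary = the k complete rows most similar on observed attributes.
        matches = (X[complete][:, obs] == X[i, obs]).sum(axis=1)
        nn = complete[np.argsort(-matches, kind="stable")[:k]]
        D = np.hstack([blocks[j][nn] for j in obs]).T  # one column per neighbour
        x = np.hstack([blocks[j][i] for j in obs])
        # Step 3: sparse weight vector over the neighbours.
        w = ista(D, x, lam)
        # Step 4: reconstruct each missing block and keep the top-scoring category.
        for j in mis:
            scores = blocks[j][nn].T @ w
            out[i, j] = int(np.argmax(scores))
    return out
```

For example, `impute([[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, -1], [1, -1, 1]], [2, 2, 2], k=2)` fills the two missing cells from the neighbours that agree on the observed attributes. Restricting the dictionary to nearest neighbours is one simple way to impose locality; the paper's actual formulation may differ.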