Categorical missing data imputation approach via sparse representation

Int. J. Serv. Technol. Manag. Pub Date: 2016-08-23. DOI: 10.1504/IJSTM.2016.078542

Xiaochen Shao, Sen Wu, Xiaodong Feng, Rui Song

Citations: 1

Abstract

K-nearest neighbour (KNN) is an important method for imputing missing categorical data. Its effectiveness is highly sensitive to local parameters such as the choice of similarity function and the number of neighbours. To address these two issues, a categorical missing data imputation algorithm (CSR) is proposed. It first applies a matrix transformation to make categorical data more amenable to computation. It then introduces a locality constraint into sparse representation theory by using KNN to construct the dictionary. Next, it computes a weight vector for each instance with missing values, one that reflects smoothness and local structure. Finally, using the sparse reconstruction coefficient vector, the algorithm fills in each missing attribute with the value that has the largest corresponding reconstruction value. Empirical tests show that CSR outperforms KNNimpute (including its two derivative methods, IKNNimpute and SKNNimpute) and LLSimpute in terms of efficiency and stability.
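The abstract lists the steps of CSR but gives neither its exact objective nor its solver, so the Python sketch below only illustrates the pipeline under stated assumptions. One-hot encoding stands in for the matrix transform, attribute overlap for the similarity function, and a non-negative L1-penalised least-squares fit (solved by projected ISTA) for the sparse representation step. The names `one_hot`, `impute_row`, `k`, `lam`, and `iters`, and the toy data, are hypothetical and do not come from the paper.

```python
def one_hot(row, domains):
    """Assumed matrix transform: one 0/1 indicator per (attribute, category)."""
    return [1.0 if value == cat else 0.0
            for value, domain in zip(row, domains) for cat in domain]


def impute_row(target, complete_rows, domains, k=3, lam=0.01, iters=500):
    """Fill the None entries of `target` from a KNN dictionary with sparse weights."""
    observed = [j for j, v in enumerate(target) if v is not None]

    # Dictionary construction: keep the k complete rows that agree with the
    # target on the most observed attributes (overlap similarity, an assumption).
    def overlap(row):
        return sum(row[j] == target[j] for j in observed)

    atoms = sorted(complete_rows, key=overlap, reverse=True)[:k]

    # Encode only the observed attributes for the fitting step.
    obs_domains = [domains[j] for j in observed]
    x = one_hot([target[j] for j in observed], obs_domains)
    D = [one_hot([row[j] for j in observed], obs_domains) for row in atoms]

    # Assumed objective (non-negative lasso), solved by projected ISTA:
    #   minimise 0.5 * ||x - sum_i w_i * D_i||^2 + lam * sum_i w_i,  w_i >= 0
    # The step is 1 / ||D||_F^2, a safe bound on the inverse Lipschitz constant.
    step = 1.0 / max(sum(d * d for atom in D for d in atom), 1e-12)
    w = [0.0] * len(D)
    for _ in range(iters):
        residual = [sum(w[i] * D[i][c] for i in range(len(D))) - x[c]
                    for c in range(len(x))]
        for i in range(len(D)):
            grad = sum(residual[c] * D[i][c] for c in range(len(x)))
            w[i] = max(0.0, w[i] - step * (grad + lam))

    # Reconstruction: for each missing attribute, the score of a category is the
    # total weight of the atoms carrying it; keep the category with the largest score.
    filled = list(target)
    for j, value in enumerate(target):
        if value is None:
            scores = {cat: sum(w[i] for i, row in enumerate(atoms) if row[j] == cat)
                      for cat in domains[j]}
            filled[j] = max(scores, key=scores.get)
    return filled, w


domains = [["red", "blue"], ["S", "L"], ["yes", "no"]]
complete = [
    ["red", "S", "yes"],
    ["red", "S", "yes"],
    ["blue", "L", "no"],
    ["blue", "S", "no"],
]
filled, weights = impute_row(["red", "S", None], complete, domains, k=3)
print(filled)  # ['red', 'S', 'yes']
```

In this toy run the two rows that match the target on both observed attributes receive almost all of the weight, so `yes` has the largest reconstruction value. Because the weights are fitted rather than fixed, a neighbour that fits poorly can end up with zero weight, which is one plausible reading of how a sparse approach reduces sensitivity to the number of neighbours.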

