A novel interpolation based missing value estimation method to predict missing values in microarray gene expression data
S. Bose, C. Das, S. Dutta, S. Chattopadhyay
2012 International Conference on Communications, Devices and Intelligent Systems (CODIS), 2012-12-01
DOI: 10.1109/CODIS.2012.6422202 (https://doi.org/10.1109/CODIS.2012.6422202)
Citations: 10
Abstract
Microarray experiments can generate data sets with multiple missing expression values, normally due to various experimental problems. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. Therefore, effective missing value estimation methods are essential to minimize the effect of incomplete data sets on analysis, and to increase the range of data sets to which these algorithms can be applied. In this regard, a new interpolation-based imputation method is proposed to predict missing values in microarray gene expression data. For each missing position, the proposed method selects a subset of similar genes and a subset of similar samples, and then applies interpolation in a novel manner to predict the missing value. The performance of the proposed method is compared, in terms of normalized root mean square error (NRMSE), with existing estimation techniques including K-nearest neighbor (KNN), Sequential K-nearest neighbor (SKNN) and Iterative K-nearest neighbor (IKNN). The effectiveness of the proposed method, along with a comparison with existing methods, is demonstrated on different microarray data sets.
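To make the evaluation setup concrete, the following is a minimal sketch of a KNN-style baseline and an NRMSE score. It is not the interpolation method proposed in the paper, whose details the abstract does not give. The function names `knn_impute` and `nrmse`, the inverse-distance weighting, and the choice of normalizing the RMSE by the standard deviation of the true values are all assumptions made for illustration; the paper may define these differently.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaNs in a genes-by-samples matrix from the k most similar genes.

    Illustrative baseline only (assumed weighting scheme), not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            candidates = []
            for g in range(X.shape[0]):
                # A neighbor gene must be observed at the target sample j.
                if g == i or missing[g, j]:
                    continue
                shared = ~missing[i] & ~missing[g]
                if not shared.any():
                    continue
                # Root mean squared difference over samples both genes have.
                dist = np.sqrt(np.mean((X[i, shared] - X[g, shared]) ** 2))
                candidates.append((dist, X[g, j]))
            if not candidates:
                continue  # no usable neighbor: leave the entry as NaN
            candidates.sort(key=lambda c: c[0])
            dists = np.array([c[0] for c in candidates[:k]])
            vals = np.array([c[1] for c in candidates[:k]])
            weights = 1.0 / (dists + 1e-12)  # closer genes count more
            filled[i, j] = np.sum(weights * vals) / np.sum(weights)
    return filled


def nrmse(estimated, true):
    """RMSE divided by the standard deviation of the true values (assumed form)."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.sqrt(np.mean((estimated - true) ** 2)) / np.std(true)


# Usage: hide about 10% of a synthetic complete matrix, impute, then score
# only the hidden positions against their known values.
rng = np.random.default_rng(0)
complete = rng.normal(size=(20, 6))
mask = rng.random(complete.shape) < 0.1
observed = complete.copy()
observed[mask] = np.nan
filled = knn_impute(observed, k=5)
print("NRMSE on hidden entries:", nrmse(filled[mask], complete[mask]))
```

Scoring only the artificially hidden entries of an otherwise complete matrix is the usual way to compare imputation methods, since the true values are then known. A lower NRMSE indicates a closer estimate.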