Comparison of estimation methods for missing value imputation of gene expression data

A. Sarıkaş, N. Odabasioglu, G. Altay
2016 Medical Technologies National Congress (TIPTEKNO), published 2016-10-01
DOI: 10.1109/TIPTEKNO.2016.7863090 (https://doi.org/10.1109/TIPTEKNO.2016.7863090)
Citations: 1

Abstract

Imputation of missing values (MVs), that is, detecting and correcting them, is the first stage of preprocessing microarray datasets. This paper compares the most reliable and up-to-date estimation methods for imputing MVs. MV imputation has very high priority because it affects the subsequent pre-processing and post-processing stages of microarray data analysis, namely quality control, normalization, differential gene expression, classification, clustering, and pathway analysis. The normalized root mean square error (NRMSE) is used to evaluate the performance of the five most popular methods: k-nearest neighbors, Bayesian principal component analysis (BPCA), local least squares (LLS), mean, and median. When the NRMSE values of the methods were compared, LLS and BPCA were observed to outperform all other methods at every percentage of MVs tested (1%, 5%, 10%, and 20%). BPCA gave the best results at all percentages of MVs over the number of probes or genes, whereas LLS gave the best results at all percentages of MVs over the number of samples. The advantage of these two methods over the others is that they are the least affected by the complexity of the data set.
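The evaluation scheme described in the abstract (hide a known fraction of entries, impute them, and score the result with NRMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not give the exact NRMSE formula, so the sketch uses one common definition (RMSE over the hidden entries divided by the standard deviation of their true values); the data are synthetic Gaussian values rather than a microarray dataset; and only the two simplest baselines (row-wise mean and median) are shown. The helper names `nrmse`, `random_mask`, and `impute_rowwise` are hypothetical.

```python
import numpy as np


def nrmse(true_vals, imputed_vals):
    """RMSE over the hidden entries, normalized by the std of the true values.

    Assumption: this is one common NRMSE definition in the imputation
    literature; the abstract does not state which variant was used.
    """
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    rmse = np.sqrt(np.mean((imputed_vals - true_vals) ** 2))
    return rmse / np.std(true_vals)


def random_mask(shape, fraction, rng):
    """Boolean mask marking a given fraction of entries as missing, chosen at random."""
    n_total = shape[0] * shape[1]
    n_missing = int(round(fraction * n_total))
    idx = rng.choice(n_total, size=n_missing, replace=False)
    mask = np.zeros(n_total, dtype=bool)
    mask[idx] = True
    return mask.reshape(shape)


def impute_rowwise(data, mask, stat="mean"):
    """Fill masked entries with the per-row (per-gene) mean or median of observed values."""
    observed = np.where(mask, np.nan, data)
    reducer = np.nanmean if stat == "mean" else np.nanmedian
    row_stat = reducer(observed, axis=1, keepdims=True)
    # If a row has no observed values, fall back to the overall observed mean.
    row_stat = np.where(np.isnan(row_stat), np.nanmean(observed), row_stat)
    return np.where(mask, row_stat, data)


# Synthetic stand-in for an expression matrix: 200 genes (rows) x 12 samples (columns).
rng = np.random.default_rng(0)
data = rng.normal(loc=8.0, scale=2.0, size=(200, 12))

# Same missing-value percentages as in the abstract.
for fraction in (0.01, 0.05, 0.10, 0.20):
    mask = random_mask(data.shape, fraction, rng)
    for stat in ("mean", "median"):
        imputed = impute_rowwise(data, mask, stat)
        score = nrmse(data[mask], imputed[mask])
        print(f"{fraction:.0%} missing, {stat}: NRMSE = {score:.3f}")
```

Lower NRMSE means the imputed values are closer to the hidden true values. The other three methods named in the abstract would plug into the same loop by replacing `impute_rowwise` with the corresponding estimator.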