Missing value imputation methods for gene-sample-time microarray data analysis

Yifeng Li, A. Ngom, L. Rueda
Published in: 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2010-05-02
DOI: 10.1109/CIBCB.2010.5510349 (https://doi.org/10.1109/CIBCB.2010.5510349)
Citations: 13

Abstract

With recent advances in microarray technology, the expression levels of genes across samples can be monitored synchronously over a series of time points. Such three-dimensional microarray data, termed gene-sample-time microarray data (GST data for short), may contain missing values. Current microarray analysis methods require complete data sets, so either every row, column, or tube containing missing values must be removed from the original GST data, or the missing values must be estimated before analysis. Imputing missing values is generally recommended over removing data, in order to increase the effectiveness of analysis algorithms. In this paper, we extend automated imputation methods devised for two-dimensional microarray data to GST data. We implemented imputation methods for GST data based on Singular Value Decomposition (3SVDimpute), K-Nearest Neighbor (3KNNimpute), and gene and sample averages (3Aimpute), and show that the KNN-based methods yield the best results, achieving the lowest normalized root mean squared error.
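To make the idea of nearest-neighbor imputation on a three-dimensional array concrete, here is a minimal Python sketch. It is not the authors' 3KNNimpute: the function names `knn_impute_gst` and `nrmse`, the (genes, samples, time) array layout, the inverse-distance weighting, and the choice to normalize the RMSE by the standard deviation of the true values are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def knn_impute_gst(data, k=5):
    """Fill NaNs in a (genes, samples, time) array from the k most similar genes.

    Illustrative sketch only; not the paper's 3KNNimpute.
    """
    n_genes = data.shape[0]
    flat = data.reshape(n_genes, -1)          # unfold samples x time per gene
    filled = flat.copy()
    missing = np.isnan(flat)

    for g in np.where(missing.any(axis=1))[0]:
        # Root-mean-square distance to each other gene over entries both observed.
        dists = np.full(n_genes, np.inf)
        for h in range(n_genes):
            if h == g:
                continue
            shared = ~missing[g] & ~missing[h]
            if shared.any():
                diff = flat[g, shared] - flat[h, shared]
                dists[h] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)

        # Fallback value: the gene's own mean, or the global mean if it has none.
        if (~missing[g]).any():
            fallback = np.nanmean(flat[g])
        else:
            fallback = np.nanmean(flat)

        for j in np.where(missing[g])[0]:
            # Nearest genes that actually observed entry j.
            donors = [h for h in order
                      if np.isfinite(dists[h]) and not missing[h, j]][:k]
            if donors:
                w = 1.0 / (dists[donors] + 1e-12)   # inverse-distance weights
                filled[g, j] = np.sum(w * flat[donors, j]) / np.sum(w)
            else:
                filled[g, j] = fallback
    return filled.reshape(data.shape)

def nrmse(true, imputed, mask):
    """RMSE over the masked entries, divided by the std of the true values there."""
    err = imputed[mask] - true[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(true[mask])
```

A typical evaluation under these assumptions would take a complete array, hide a random subset of entries by setting them to NaN, call `knn_impute_gst` on the result, and pass the original array, the imputed array, and the boolean mask of hidden entries to `nrmse`; lower values indicate better recovery.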