An Improved Probabilistic Model for Finding Differential Gene Expression

Li Zhang, Xuejun Liu
Published in: 2009 2nd International Conference on Biomedical Engineering and Informatics, pp. 1-6. Publication date: 2009-10-30. DOI: 10.1109/BMEI.2009.5302665 (https://doi.org/10.1109/BMEI.2009.5302665)
Citations: 4

Abstract

Finding differentially expressed genes is a fundamental objective of a microarray experiment. A recently proposed method, PPLR, considers the probe-level measurement error and improves accuracy in finding differential gene expression. However, PPLR uses an importance sampling procedure in the E-step of the variational EM algorithm, which reduces computational efficiency. We modified the original PPLR to obtain an improved model for finding differential gene expression. The new model, IPPLR, adds hidden variables to represent the true gene expression and eliminates the importance sampling of the original PPLR. We apply IPPLR to a spike-in data set and a mouse embryo data set. Results show that IPPLR improves accuracy and computational efficiency in finding differential gene expression.

I. INTRODUCTION

Microarrays (1) (2) are currently widely used to obtain large-scale measurements of gene expression. Finding differentially expressed (DE) genes is the most basic objective of a microarray experiment. Because microarray data are notoriously noisy, replicates are usually used in experiments to deal with data variability. Moreover, some microarrays (such as Affymetrix GeneChips) contain multiple probes to interrogate gene expression profiles. This provides rich information for estimating the technical measurement error associated with each gene expression measurement. This error information is especially significant for weakly expressed genes, as these genes are often associated with high variability.

Probabilistic methods provide a principled way to handle noisy data. Most probabilistic methods, such as the widely used Cyber-T (3) and SAM (4), are based on single point estimates of gene expression values and ignore the associated probe-level measurement error. This wastes rich information in the data. Measurement error of data points has received increasing attention in noisy data analysis (5) (6) (7) (8) in recent years.

PPLR (5) considers the probe-level measurement error in finding differential gene expression. This method has been shown to be more accurate than other alternatives (5) (9). However, PPLR uses an importance sampling procedure in the E-step of the variational EM algorithm. This reduces both accuracy and computational efficiency. In particular, when the experiment involves a large number of chips, PPLR is extremely time-consuming, which makes it difficult to apply in practice. In this contribution, we improve PPLR by adding hidden variables to represent the true gene expression. This eliminates the inefficient importance sampling of the original PPLR. Results on a spike-in data set and a mouse embryo data set show that the improved PPLR, IPPLR, improves accuracy and computational efficiency in finding DE genes.
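The idea of adding a hidden variable for the true expression can be illustrated with a toy example. The sketch below is not the authors' IPPLR and not variational EM: it is plain EM on a simplified, assumed model for one gene in one condition, where each replicate has an observed value y_i with a known measurement standard deviation s_i, and a hidden true expression x_i drawn from N(mu, sigma2). The function name em_latent_expression and all parameter names are hypothetical. The point it shows is that, once x_i is made explicit, the E-step posterior is Gaussian and available in closed form, so no sampling is needed.

```python
def em_latent_expression(y, s, n_iter=200, tol=1e-10):
    """Plain EM for an assumed toy model (not the paper's IPPLR):

        x_i ~ N(mu, sigma2)        hidden true expression of replicate i
        y_i | x_i ~ N(x_i, s_i^2)  observed value, known measurement error s_i

    Returns (mu, sigma2, posterior_means, posterior_variances).
    """
    n = len(y)
    # Initialise from the raw observations.
    mu = sum(y) / n
    sigma2 = max(sum((yi - mu) ** 2 for yi in y) / n, 1e-6)
    m, v = list(y), [0.0] * n

    for _ in range(n_iter):
        # E-step: Gaussian posterior of each hidden x_i in closed form.
        v = [1.0 / (1.0 / sigma2 + 1.0 / si ** 2) for si in s]
        m = [vi * (mu / sigma2 + yi / si ** 2) for vi, yi, si in zip(v, y, s)]

        # M-step: update mu and sigma2 from the posterior moments.
        new_mu = sum(m) / n
        new_sigma2 = sum((mi - new_mu) ** 2 + vi for mi, vi in zip(m, v)) / n
        new_sigma2 = max(new_sigma2, 1e-12)  # keep the variance strictly positive

        converged = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break

    return mu, sigma2, m, v


if __name__ == "__main__":
    # Hypothetical log-expression values; the last replicate is very noisy.
    y = [7.9, 8.1, 8.0, 10.5]
    s = [0.2, 0.2, 0.2, 2.0]
    mu, sigma2, m, v = em_latent_expression(y, s)
    print("mu =", mu, "sigma2 =", sigma2)
    print("posterior means =", m)
```

In this toy setting, a replicate with a large s_i contributes less to the estimate of mu and its posterior mean is pulled toward the other replicates, which is the kind of behaviour the text attributes to using probe-level measurement error rather than single point estimates.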