Nonlinear dependence in the discovery of differentially expressed genes.

ISRN Bioinformatics. Pub Date: 2012-04-12. eCollection Date: 2012-01-01. DOI: 10.5402/2012/564715
J R Deller, Hayder Radha, J Justin McCormick, Huiyan Wang
Volume 2012, Article 564715. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393074/pdf/
Citations: 2

Abstract

Microarray data are used to determine which genes are active in response to a changing cell environment. Genes are "discovered" when they are significantly differentially expressed in the microarray data collected under the differing conditions. In one prevalent approach, all genes are assumed to satisfy a null hypothesis, ℍ₀, of no difference in expression. A false discovery (type I error) occurs when ℍ₀ is incorrectly rejected. The quality of a detection algorithm is assessed by estimating its number of false discoveries, 𝔉. Work involving the second-moment modeling of the z-value histogram (representing gene expression differentials) has shown significantly deleterious effects of intergene expression correlation on the estimate of 𝔉. This paper suggests that nonlinear dependencies could likewise be important. With an applied emphasis, this paper extends the "moment framework" by including third-moment skewness corrections in an estimator of 𝔉. This estimator combines observed correlation (corrected for sampling fluctuations) with the information from easily identifiable null cases. Nonlinear-dependence modeling reduces the estimation error relative to that of linear estimation. Third-moment calculations involve empirical densities of 3 × 3 covariance matrices estimated using very few samples. The principle of entropy maximization is employed to connect estimated moments to 𝔉 inference. Model results are tested with BRCA and HIV data sets and with carefully constructed simulations.
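To make the quantities in the abstract concrete, the following is a minimal Python sketch of the naive baseline, not the paper's estimator. It assumes the z-values are already computed, are mutually independent, and follow the theoretical null N(0, 1) with a two-sided rejection region. These are exactly the assumptions that the abstract says intergene correlation undermines. The function names (`expected_null_exceedances`, `count_discoveries`, `sample_skewness`), the threshold, and the simulated data are all hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean, pstdev


def expected_null_exceedances(n_genes, threshold):
    """Expected number of |z| > threshold if every z-value is an
    independent draw from the theoretical null N(0, 1)."""
    upper_tail = 1.0 - NormalDist().cdf(threshold)
    return n_genes * 2.0 * upper_tail  # two-sided rejection region


def count_discoveries(z_values, threshold):
    """Number of genes whose z-value falls in the rejection region."""
    return sum(1 for z in z_values if abs(z) > threshold)


def sample_skewness(values):
    """Third standardized moment of the z-value histogram."""
    centre = fmean(values)
    spread = pstdev(values, mu=centre)
    return fmean([((v - centre) / spread) ** 3 for v in values])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Simulated data (not from the paper): 4950 null genes, 50 shifted genes.
    z = [rng.gauss(0.0, 1.0) for _ in range(4950)]
    z += [rng.gauss(3.0, 1.0) for _ in range(50)]

    c = 3.0  # illustrative rejection threshold
    print("discoveries:", count_discoveries(z, c))
    print("expected null exceedances:",
          round(expected_null_exceedances(len(z), c), 2))
    print("skewness of z-values:", round(sample_skewness(z), 3))
```

The sketch only measures the skewness of the z-value histogram; it does not apply any correction. A moment-based estimator of the kind the abstract describes would use second- and third-moment information to adjust the independence-based count, which is beyond what this example attempts.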

