Nonlinear dependence in the discovery of differentially expressed genes.

ISRN bioinformatics. Published 2012-04-12; eCollection date 2012-01-01. DOI: 10.5402/2012/564715

J R Deller, Hayder Radha, J Justin McCormick, Huiyan Wang


Citations: 2

Abstract



Microarray data are used to determine which genes are active in response to a changing cell environment. Genes are "discovered" when they are significantly differentially expressed in the microarray data collected under the differing conditions. In one prevalent approach, all genes are assumed to satisfy a null hypothesis, ℍ₀, of no difference in expression. A false discovery (type I error) occurs when ℍ₀ is incorrectly rejected. The quality of a detection algorithm is assessed by estimating its number of false discoveries, 𝔉. Work involving second-moment modeling of the z-value histogram (representing gene expression differentials) has shown that intergene expression correlation has significantly deleterious effects on the estimate of 𝔉. This paper suggests that nonlinear dependencies could likewise be important. With an applied emphasis, it extends the "moment framework" by including third-moment skewness corrections in an estimator of 𝔉. This estimator combines observed correlation (corrected for sampling fluctuations) with information from easily identifiable null cases. Nonlinear-dependence modeling reduces the estimation error relative to linear estimation. Third-moment calculations involve empirical densities of 3 × 3 covariance matrices estimated from very few samples. The principle of entropy maximization is used to connect the estimated moments to inference about 𝔉. Model results are tested on BRCA and HIV data sets and on carefully constructed simulations.
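To make the quantities in the abstract concrete, the following is a minimal Python sketch of the simplest estimate of 𝔉, the one that ignores intergene dependence: if each null z-value is assumed to be N(0, 1) and independent, then at a two-sided threshold t the expected number of false discoveries among N genes is roughly 𝔉 ≈ π₀ · N · P(|Z| > t). This is not the authors' estimator, which adds correlation and third-moment (skewness) corrections and uses entropy maximization; those steps are not reproduced here. The function names, the `pi0` parameter, and the simulated data are hypothetical illustrations. The `sample_skewness` helper only computes the kind of third-moment statistic the abstract refers to.

```python
import math
import random


def two_sided_null_tail(t: float) -> float:
    """P(|Z| > t) for Z ~ N(0, 1), the theoretical null assumed for each z-value."""
    return math.erfc(t / math.sqrt(2.0))


def expected_false_discoveries(n_genes: int, t: float, pi0: float = 1.0) -> float:
    """Naive estimate of the number of false discoveries at threshold t.

    Hypothetical baseline: assumes a fraction pi0 of genes are null and that
    null z-values are independent N(0, 1). It ignores the intergene
    correlation and nonlinear dependence discussed in the abstract.
    """
    return pi0 * n_genes * two_sided_null_tail(t)


def sample_skewness(z: list[float]) -> float:
    """Standardized third central moment of the z-values (population form)."""
    n = len(z)
    mean = sum(z) / n
    m2 = sum((x - mean) ** 2 for x in z) / n
    m3 = sum((x - mean) ** 3 for x in z) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated (hypothetical) data: 1000 null genes plus 50 shifted genes.
    z = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    z += [rng.gauss(3.5, 1.0) for _ in range(50)]

    t = 3.0
    discoveries = sum(1 for x in z if abs(x) > t)
    f_hat = expected_false_discoveries(len(z), t)
    print(f"discoveries={discoveries}, F_hat={f_hat:.2f}")
    print(f"estimated FDR={f_hat / max(discoveries, 1):.3f}")
    print(f"skewness of z-values={sample_skewness(z):.3f}")
```

With t = 3, P(|Z| > 3) ≈ 0.0027, so about 2.8 false discoveries would be expected among 1,050 genes if the null z-values really were independent N(0, 1). The abstract's point is that correlation and nonlinear dependence among genes can make such a figure misleading.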

