Estimators of the local false discovery rate designed for small numbers of tests.

Pub Date: 2012-10-12 DOI: 10.1515/1544-6115.1807
Marta Padilla, David R Bickel
Citations: 5

Abstract

Estimators of the local false discovery rate designed for small numbers of tests.

Histogram-based empirical Bayes methods developed for analyzing data for large numbers of genes, SNPs, or other biological features tend to have large biases when applied to data with a smaller number of features such as genes with expression measured conventionally, proteins, and metabolites. To analyze such small-scale and medium-scale data in an empirical Bayes framework, we introduce corrections of maximum likelihood estimators (MLEs) of the local false discovery rate (LFDR). In this context, the MLE estimates the LFDR, which is a posterior probability of null hypothesis truth, by estimating the prior distribution. The corrections lie in excluding each feature when estimating one or more parameters on which the prior depends. In addition, we propose the expected LFDR (ELFDR) in order to propagate the uncertainty involved in estimating the prior. We also introduce an optimally weighted combination of the best of the corrected MLEs with a previous estimator that, being based on a binomial distribution, does not require a parametric model of the data distribution across features. An application of the new estimators and previous estimators to protein abundance data illustrates the extent to which different estimators lead to different conclusions about which proteins are affected by cancer. A simulation study was conducted to approximate the bias of the new estimators relative to previous LFDR estimators. Data were simulated for two different numbers of features (N), two different noncentrality parameter values or detectability levels (dalt), and several proportions of unaffected features (p0). One of these previous estimators is a histogram-based estimator (HBE) designed for a large number of features. The simulations show that some of the corrected MLEs and the ELFDR that corrects the HBE reduce the negative bias relative to the MLE and the HBE, respectively. 
For every method, we defined the worst-case performance as the maximum of the absolute value of the bias over the two different dalt and over various p0. The best worst-case methods represent the safest methods to be used under given conditions. This analysis indicates that the binomial-based method has the lowest worst-case absolute bias for high p0 and for N = 3, 12. However, the corrected MLE that is based on the minimum description length (MDL) principle is the best worst-case method when the value of p0 is more uncertain since it has one of the lowest worst-case biases over all possible values of p0 and for N = 3, 12. Therefore, the safest estimator considered is the binomial-based method when a high proportion of unaffected features can be assumed and the MDL-based method otherwise. A second simulation study was conducted with additional values of N. We found that HBE requires N to be at least 6-12 features to perform as well as the estimators proposed here, with the precise minimum N depending on p0 and dalt.
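The abstract describes the LFDR as a posterior probability that the null hypothesis is true, estimated by maximum likelihood, with a correction that excludes each feature when estimating the prior. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a two-group model with a standard normal null and a unit-variance normal alternative centered at a known `d_alt`, and it estimates only `p0` by a simple grid search. The model, the grid search, and all function names are illustrative assumptions that do not come from the paper.

```python
import math


def normal_pdf(x, mean=0.0):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def lfdr(x, p0, d_alt):
    """Posterior probability of the null given statistic x (two-group model)."""
    null_part = p0 * normal_pdf(x, 0.0)
    alt_part = (1.0 - p0) * normal_pdf(x, d_alt)
    return null_part / (null_part + alt_part)


def mle_p0(xs, d_alt, grid_size=1001):
    """Grid-search MLE of p0 on [0, 1]; d_alt is assumed known here."""
    best_p0, best_loglik = 0.0, -math.inf
    for k in range(grid_size):
        p0 = k / (grid_size - 1)
        loglik = sum(
            math.log(p0 * normal_pdf(x, 0.0) + (1.0 - p0) * normal_pdf(x, d_alt))
            for x in xs
        )
        if loglik > best_loglik:
            best_p0, best_loglik = p0, loglik
    return best_p0


def plain_mle_lfdr(xs, d_alt):
    """Uncorrected plug-in estimate: one p0 estimated from all features."""
    p0_hat = mle_p0(xs, d_alt)
    return [lfdr(x, p0_hat, d_alt) for x in xs]


def leave_one_out_lfdr(xs, d_alt):
    """Corrected estimate: p0 for feature i is estimated without feature i."""
    estimates = []
    for i, x in enumerate(xs):
        others = xs[:i] + xs[i + 1:]
        p0_hat = mle_p0(others, d_alt)
        estimates.append(lfdr(x, p0_hat, d_alt))
    return estimates


# Hypothetical test statistics for N = 4 features (made-up values).
statistics = [0.1, -0.4, 2.9, 0.3]
print(plain_mle_lfdr(statistics, d_alt=3.0))
print(leave_one_out_lfdr(statistics, d_alt=3.0))
```

With so few features, a single extreme statistic can dominate the estimate of `p0`, so leaving that feature out when estimating the prior keeps it from influencing its own LFDR estimate. The sketch only illustrates that mechanism and makes no claim about the size or sign of the resulting bias.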
