Robust Behrens–Fisher Statistic Based on Trimmed Means and Its Usefulness in Analyzing High-Throughput Data
G. Kang, Sedigheh Mirzaei, Hui Zhang, Liang Zhu, S. Rai, D. Srivastava
Frontiers in Systems Biology, published 2022-06-02
DOI: 10.3389/fsysb.2022.877601 (https://doi.org/10.3389/fsysb.2022.877601)
Abstract
In the context of high-throughput data, differences in continuous markers between two groups are usually assessed by ordering the p-values obtained from the two-sample pooled t-test or the Wilcoxon–Mann–Whitney test and choosing a stringent cutoff, such as 10⁻⁸, to control the family-wise error rate (FWER) or the false discovery rate (FDR). All markers with p-values below the cutoff are declared to be significantly associated with the phenotype. This inherently assumes that the test procedure provides valid type I error estimates in the extreme tails of the null distribution. The aforementioned tests assume homoscedasticity in the two groups, and the t-test further assumes that the underlying distributions are normal. Cao et al. (Biometrika, 2013, 100, 495–502) have shown that, in the context of multiple hypothesis testing, the approach based on FDR may not be valid under non-normality and/or heteroscedasticity. A test statistic that is robust to these violations is therefore needed. In this study, we propose a robust analog of the Behrens–Fisher statistic based on trimmed means, conduct an extensive simulation study to compare its performance with that of competing approaches, and demonstrate its usefulness by applying it to the DNA methylation data used by Teschendorff et al. (Genome Res., 2010, 20, 440–446). An R program to implement the proposed method is provided in the Supplementary Material.
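The abstract does not state the form of the proposed statistic. As a point of reference only, the classical trimmed-mean analog of the Welch (Behrens–Fisher type) statistic, usually attributed to Yuen, can be sketched as below. The function names and the 20% trimming default are illustrative assumptions, and this sketch is not the authors' R program and may differ from their method.

```python
import math
from statistics import mean, variance


def trimmed_summary(values, trim):
    """Return (trimmed mean, squared standard-error term d, retained size h).

    Hypothetical helper for illustration; `trim` is the proportion
    removed from EACH tail of the sorted sample.
    """
    x = sorted(values)
    n = len(x)
    g = math.floor(trim * n)   # observations dropped from each tail
    h = n - 2 * g              # observations retained after trimming
    if h < 2:
        raise ValueError("too few observations left after trimming")
    t_mean = mean(x[g:n - g])
    # Winsorize: replace each trimmed value by the nearest retained value.
    winsorized = [x[g]] * g + x[g:n - g] + [x[n - g - 1]] * g
    s2_w = variance(winsorized)  # sample variance, n - 1 denominator
    d = (n - 1) * s2_w / (h * (h - 1))
    return t_mean, d, h


def yuen_statistic(a, b, trim=0.2):
    """Yuen-type two-sample statistic and its approximate degrees of freedom.

    No equal-variance assumption is made. With trim=0 this reduces to
    Welch's t statistic with Satterthwaite degrees of freedom.
    """
    m1, d1, h1 = trimmed_summary(a, trim)
    m2, d2, h2 = trimmed_summary(b, trim)
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df
```

A two-sided p-value would then be taken from a Student t reference distribution with `df` degrees of freedom. Whether that approximation stays accurate at cutoffs as extreme as 10⁻⁸ is the kind of question the simulation study described above addresses; this sketch makes no claim about it.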