{"title":"Sample-to-sample p-value variability and its implications for multivariate analysis","authors":"Wei Wang, W. Goh","doi":"10.1504/IJBRA.2018.10009566","DOIUrl":null,"url":null,"abstract":"Statistical feature selection is used for identification of relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample p-value variability accompanied by an exaggeration of effect size in the univariate scenario. To deepen understanding, we further examined p-value and effect size variability issues across a variety of alternative scenarios. We find that with increased sampling sizes, there is convergence towards true effect size. Moreover, with greater power (stronger effect size or sampling size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient, and surprisingly effective even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test is expected to better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features based on p-values is a terrible idea, and we demonstrate that restriction of the top 500 features (ranked based on p-values) in real protein expression data comprising 12 normal and 12 renal cancer patients worsens instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.","PeriodicalId":434900,"journal":{"name":"Int. J. Bioinform. Res. Appl.","volume":"11 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Bioinform. Res. Appl.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/IJBRA.2018.10009566","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Statistical feature selection is used to identify relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that, in the univariate scenario, the t-test p-value exhibits high sample-to-sample variability accompanied by an exaggeration of effect size. To deepen understanding, we further examined p-value and effect-size variability across a variety of alternative scenarios. We find that with increased sample sizes, there is convergence towards the true effect size. However, even with greater power (a stronger effect size or a larger sample size), p-value variability does not fully converge, suggesting that p-values are a poor indicator of the estimated effect size. The t-test is resilient, and surprisingly effective even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are highly variable and poorly predict effect size, ranking individual gene or protein features by p-value is unreliable; we demonstrate that restricting analysis to the top 500 features (ranked by p-value) in real protein expression data comprising 12 normal and 12 renal cancer patients worsens feature-selection instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.
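The central claim, that p-values vary widely from sample to sample while conditioning on significance inflates apparent effect sizes, can be illustrated with a minimal simulation sketch. This is not the authors' code; the group size (12 vs 12, mirroring the renal cancer dataset), the assumed true effect size, and the number of repetitions are illustrative assumptions.

```python
# Minimal sketch: sample-to-sample p-value variability and effect-size
# exaggeration under repeated sampling (assumed parameters, not the paper's code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, true_d, n_reps = 12, 0.8, 2000   # assumed design and true effect

p_values, observed_d = [], []
for _ in range(n_reps):
    a = rng.normal(0.0, 1.0, n_per_group)      # "normal" group
    b = rng.normal(true_d, 1.0, n_per_group)   # "case" group, shifted by true_d
    _, p = stats.ttest_ind(a, b)               # two-sample t-test
    p_values.append(p)
    # Cohen's d with a pooled standard deviation
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    observed_d.append(abs(b.mean() - a.mean()) / pooled_sd)

p_values, observed_d = np.array(p_values), np.array(observed_d)
print("p-value spread (2.5th-97.5th percentile):",
      np.percentile(p_values, [2.5, 97.5]))
print("mean |d| over all repetitions:   ", observed_d.mean())
print("mean |d| when p < 0.05 (inflated):", observed_d[p_values < 0.05].mean())
```

Under these assumptions, the p-values span several orders of magnitude even though the true effect never changes, and the average effect size among "significant" repetitions exceeds the true value, which is the exaggeration the abstract describes.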