On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials.

IF 2.6 · Zone 3, multidisciplinary journals · Q1 Multidisciplinary Sciences

PLoS ONE 20(6): e0325920 · Pub Date: 2025-06-13 · eCollection Date: 2025-01-01 · DOI: 10.1371/journal.pone.0325920

Farrokh Habibzadeh



Abstract

Background: The reproducibility crisis is among the major concerns of scientists worldwide. Some researchers attribute the crisis mostly to the conventional p value significance threshold, arbitrarily set at 0.05, and propose lowering the cut-off to 0.005. Reducing the cut-off decreases the false-positive rate but increases the false-negative rate. Recently, a flexible p value significance threshold that minimizes the weighted sum of errors in statistical hypothesis testing was proposed.

Methods: This in silico study compared the error rates under three choices of the p value significance threshold: 0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and the false-negative rate (when the alternative hypothesis was true) were calculated for a hypothetical randomized clinical trial.
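A minimal sketch of this kind of Monte Carlo comparison, limited to the two fixed thresholds, is given below. It is not the author's code. The two-arm design, the normally distributed outcome with a known standard deviation of 1, the two-sided z-test, and the names `simulate_error_rates`, `n_per_arm`, and `effect_size` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

import numpy as np


def simulate_error_rates(n_per_arm, effect_size, thresholds=(0.05, 0.005),
                         n_trials=5000, seed=0):
    """Estimate (false-positive rate, false-negative rate) per threshold.

    Assumed setup (not from the paper): two arms of equal size, outcomes
    drawn from a normal distribution with known SD 1, two-sided z-test.
    """
    rng = np.random.default_rng(seed)
    se = math.sqrt(2.0 / n_per_arm)  # standard error of the difference in means

    def abs_z(true_effect):
        # Simulate n_trials independent trials and return |z| for each one.
        control = rng.normal(0.0, 1.0, size=(n_trials, n_per_arm))
        treated = rng.normal(true_effect, 1.0, size=(n_trials, n_per_arm))
        return np.abs(treated.mean(axis=1) - control.mean(axis=1)) / se

    z_null = abs_z(0.0)          # trials where the null hypothesis is true
    z_alt = abs_z(effect_size)   # trials where the alternative is true

    rates = {}
    for alpha in thresholds:
        z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        false_positive = float(np.mean(z_null > z_crit))   # rejected a true null
        false_negative = float(np.mean(z_alt <= z_crit))   # missed a true effect
        rates[alpha] = (false_positive, false_negative)
    return rates


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, simulate_error_rates(n, effect_size=0.5))
```

Running it for several values of `n_per_arm` lets one check how each error rate responds to sample size under a fixed threshold; the standardized effect size of 0.5 is an arbitrary illustrative choice.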

Results: Increasing the study sample size was associated with a reduction in the false-negative rate. With fixed significance thresholds, the false-positive rate stayed at a fixed value regardless of the sample size; when the flexible threshold was employed, however, the false-positive rate decreased. While employing the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework: the flexible p value significance threshold can only be calculated a posteriori, after the results are obtained. The threshold would thus differ even between replicas, which contradicts common sense.
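One way to see why a threshold that minimizes a weighted sum of errors moves with the sample size is the sketch below. It is an illustration under strong assumptions and not the procedure used in the paper: it treats the standardized effect size as known in advance, whereas the abstract states that the flexible threshold can only be computed after the results are obtained. The two-sided z-test, the cost `weight * alpha + beta`, the grid search, and the function names are likewise hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def type2_error(alpha, n_per_arm, effect_size):
    """False-negative rate of a two-sided, two-arm z-test (SD assumed to be 1)."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * math.sqrt(n_per_arm / 2.0)  # mean of z under the alternative
    return _STD_NORMAL.cdf(z_crit - shift) - _STD_NORMAL.cdf(-z_crit - shift)


def flexible_threshold(n_per_arm, effect_size, weight=1.0, grid_size=2000):
    """Grid-search the alpha that minimizes weight * alpha + beta(alpha).

    Assumption: the effect size is supplied by the caller; the paper's
    threshold is instead derived from the observed results.
    """
    lo, hi = -6.0, math.log10(0.5)  # search alpha on a log scale in [1e-6, 0.5]
    best_alpha, best_cost = None, math.inf
    for i in range(grid_size):
        alpha = 10.0 ** (lo + i * (hi - lo) / (grid_size - 1))
        cost = weight * alpha + type2_error(alpha, n_per_arm, effect_size)
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, flexible_threshold(n, effect_size=0.5))
```

Under these assumptions the minimizing threshold shrinks as `n_per_arm` grows, so the false-positive rate it implies is no longer pinned to one value. Because the threshold depends on the effect size, replacing the assumed value with an estimate from each trial's own data would give every replica a different threshold, which is the conflict the abstract points to.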

Conclusions: Relying on frequentist statistical inference and the p value no longer seems to be a viable approach. Emphasis should shift toward alternative approaches to data analysis, such as Bayesian statistical methods.

