On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials
Farrokh Habibzadeh
PLoS ONE 20(6): e0325920. Published 2025-06-13. Journal Article.
DOI: 10.1371/journal.pone.0325920 (https://doi.org/10.1371/journal.pone.0325920)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165351/pdf/
Citations: 0
Abstract
Background: The reproducibility crisis is a major concern for many scientists worldwide. Some researchers attribute the crisis mostly to the conventional p value significance threshold, arbitrarily set at 0.05, and propose lowering the cut-off to 0.005. Lowering the cut-off decreases the false-positive rate but increases the false-negative rate. Recently, a flexible p value significance threshold that minimizes the weighted sum of errors in statistical hypothesis tests was proposed.
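As a rough illustration of what "minimizing the weighted sum of errors" can mean, one possible formalization is sketched below. The weight w and the notation β(α) are assumptions introduced here for illustration; the abstract does not state the exact weighting used in the cited proposal.

```latex
% Requires amsmath for \operatorname*.
% alpha      : false-positive rate implied by the significance threshold
% beta(alpha): false-negative rate of the test at that threshold
% w          : assumed relative weight of a false positive (0 < w < 1)
\alpha^{*} \;=\; \operatorname*{arg\,min}_{0 < \alpha < 1}
  \Bigl[\, w\,\alpha \;+\; (1 - w)\,\beta(\alpha) \,\Bigr]
```

With w = 1/2 this reduces to minimizing the plain sum α + β(α). Because β(α) depends on the sample size and the effect size, the minimizing threshold is not a universal constant.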
Methods: This in silico study compared the error rates under three choices of the p value significance threshold: 0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and the false-negative rate (when the alternative hypothesis was true) were calculated for a hypothetical randomized clinical trial.
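A minimal sketch of this kind of simulation is given below; it is not the author's code. It assumes a two-arm trial with a normally distributed outcome, a known standard deviation of 1, and a two-sided z-test. The function names, the effect size of 0.3, and the trial counts are hypothetical choices made only for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_p_values(n_per_arm, true_diff, n_trials, rng):
    """Two-sided z-test p values for simulated two-arm trials (known SD = 1)."""
    std_normal = NormalDist()
    se = math.sqrt(2.0 / n_per_arm)  # standard error of the difference in means
    p_values = []
    for _ in range(n_trials):
        control = fmean(rng.gauss(0.0, 1.0) for _ in range(n_per_arm))
        treated = fmean(rng.gauss(true_diff, 1.0) for _ in range(n_per_arm))
        z = (treated - control) / se
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


def error_rates(n_per_arm, effect, threshold, n_trials=1000, seed=1):
    """Estimate (false-positive rate, false-negative rate) at a fixed threshold."""
    rng = random.Random(seed)
    # Null hypothesis true: no difference between arms.
    p_null = simulate_p_values(n_per_arm, 0.0, n_trials, rng)
    # Alternative hypothesis true: arms differ by the assumed effect.
    p_alt = simulate_p_values(n_per_arm, effect, n_trials, rng)
    false_positive = sum(p < threshold for p in p_null) / n_trials
    false_negative = sum(p >= threshold for p in p_alt) / n_trials
    return false_positive, false_negative


if __name__ == "__main__":
    for n in (25, 100, 200):
        for threshold in (0.05, 0.005):
            fp, fn = error_rates(n, effect=0.3, threshold=threshold)
            print(f"n={n:4d} threshold={threshold:<6} FP={fp:.3f} FN={fn:.3f}")
```

Under these assumptions, the estimated false-positive rate should hover near the chosen threshold at every sample size, while the false-negative rate falls as the number of patients per arm grows.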
Results: Increasing the study sample size was associated with a reduction in the false-negative rate. With fixed significance thresholds, the false-positive rate stayed at a fixed value regardless of the sample size; when the flexible threshold was employed, however, the rate decreased. Although employing the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework: the flexible p value significance threshold can only be calculated a posteriori, after the results are obtained. The threshold would thus differ even between replicate studies, which contradicts common sense.
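To make the dependence of an error-minimizing threshold on the study itself more concrete, here is a hedged sketch rather than the paper's procedure. It assumes a two-sided z-test with known standard deviation 1 and, purely for illustration, an effect size that is known in advance; in practice the effect would have to be estimated from the observed results, which is the a posteriori problem noted above. The function names, the weight, and the grid search are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def false_negative_rate(alpha, n_per_arm, effect):
    """Type II error of a two-sided z-test (known SD = 1) at threshold alpha."""
    se = math.sqrt(2.0 / n_per_arm)  # standard error of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / se  # standardized effect under the alternative
    power = (1.0 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
    return 1.0 - power


def flexible_threshold(n_per_arm, effect, weight=0.5, grid_size=2000):
    """Grid-search the alpha minimizing weight*alpha + (1 - weight)*beta(alpha)."""
    best_alpha, best_cost = None, float("inf")
    for i in range(grid_size + 1):
        # Log-spaced candidates from 0.5 down to 5e-7.
        alpha = 0.5 * 10.0 ** (-6.0 * i / grid_size)
        beta = false_negative_rate(alpha, n_per_arm, effect)
        cost = weight * alpha + (1.0 - weight) * beta
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        alpha = flexible_threshold(n, effect=0.3)
        beta = false_negative_rate(alpha, n, 0.3)
        print(f"n={n:4d} threshold={alpha:.4f} false-negative rate={beta:.4f}")
```

In this simplified setting, the minimizing threshold changes whenever the sample size or the assumed effect changes, so two studies whose effect estimates differ would arrive at different thresholds.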
Conclusions: Relying on frequentist statistical inference and the p value no longer seems to be a viable approach. Emphasis should shift toward alternative approaches to data analysis, such as Bayesian statistical methods.
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage