{"title":"灵活调整p值显著性阈值对随机临床试验可重复性的影响。","authors":"Farrokh Habibzadeh","doi":"10.1371/journal.pone.0325920","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Reproducibility crisis is among major concerns of many scientists worldwide. Some researchers believe that the crisis is mostly attributed to the conventional p significance threshold value arbitrarily chosen to be 0.05 and propose to lower the cut-off to 0.005. Reducing the cut-off, although decreases the false-positive rate, is associated with an increase in false-negative rate. Recently, a flexible p significance threshold that minimizes the weighted sum of errors in statistical inference tests of hypothesis was proposed.</p><p><strong>Methods: </strong>The current in silico study was conducted to compare the error rates under different conditions assumed for the p significance threshold-0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and false-negative rate (when the alternative hypothesis was true) were calculated in a hypothetical randomized clinical trial.</p><p><strong>Results: </strong>Increasing the study sample size was associated with a reduction in the false-negative rate, however, the false-positive rate occurred at a fixed value regardless of the sample size when fixed significance thresholds were used; the rate decreased, however, when the flexible threshold was employed. While employing the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework. Calculation of the flexible p significance threshold is only possible a posteriori, after the results are obtained. The threshold would thus be different even for replicas, which is in contradiction to the common sense.</p><p><strong>Conclusions: </strong>It seems that relying on frequentist statistical inference and the p value is no longer a viable approach. Emphasis should be shifted toward alternative approaches for data analysis, Bayesian statistical methods, for example.</p>","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 6","pages":"e0325920"},"PeriodicalIF":2.6000,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165351/pdf/","citationCount":"0","resultStr":"{\"title\":\"On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials.\",\"authors\":\"Farrokh Habibzadeh\",\"doi\":\"10.1371/journal.pone.0325920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Reproducibility crisis is among major concerns of many scientists worldwide. Some researchers believe that the crisis is mostly attributed to the conventional p significance threshold value arbitrarily chosen to be 0.05 and propose to lower the cut-off to 0.005. Reducing the cut-off, although decreases the false-positive rate, is associated with an increase in false-negative rate. Recently, a flexible p significance threshold that minimizes the weighted sum of errors in statistical inference tests of hypothesis was proposed.</p><p><strong>Methods: </strong>The current in silico study was conducted to compare the error rates under different conditions assumed for the p significance threshold-0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and false-negative rate (when the alternative hypothesis was true) were calculated in a hypothetical randomized clinical trial.</p><p><strong>Results: </strong>Increasing the study sample size was associated with a reduction in the false-negative rate, however, the false-positive rate occurred at a fixed value regardless of the sample size when fixed significance thresholds were used; the rate decreased, however, when the flexible threshold was employed. While employing the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework. Calculation of the flexible p significance threshold is only possible a posteriori, after the results are obtained. The threshold would thus be different even for replicas, which is in contradiction to the common sense.</p><p><strong>Conclusions: </strong>It seems that relying on frequentist statistical inference and the p value is no longer a viable approach. Emphasis should be shifted toward alternative approaches for data analysis, Bayesian statistical methods, for example.</p>\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 6\",\"pages\":\"e0325920\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165351/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0325920\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0325920","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials.
Background: Reproducibility crisis is among major concerns of many scientists worldwide. Some researchers believe that the crisis is mostly attributed to the conventional p significance threshold value arbitrarily chosen to be 0.05 and propose to lower the cut-off to 0.005. Reducing the cut-off, although decreases the false-positive rate, is associated with an increase in false-negative rate. Recently, a flexible p significance threshold that minimizes the weighted sum of errors in statistical inference tests of hypothesis was proposed.
Methods: The current in silico study was conducted to compare the error rates under different conditions assumed for the p significance threshold-0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and false-negative rate (when the alternative hypothesis was true) were calculated in a hypothetical randomized clinical trial.
Results: Increasing the study sample size was associated with a reduction in the false-negative rate, however, the false-positive rate occurred at a fixed value regardless of the sample size when fixed significance thresholds were used; the rate decreased, however, when the flexible threshold was employed. While employing the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework. Calculation of the flexible p significance threshold is only possible a posteriori, after the results are obtained. The threshold would thus be different even for replicas, which is in contradiction to the common sense.
Conclusions: It seems that relying on frequentist statistical inference and the p value is no longer a viable approach. Emphasis should be shifted toward alternative approaches for data analysis, Bayesian statistical methods, for example.
期刊介绍:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage