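On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials.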

IF 2.6, Zone 3 (multidisciplinary journals), JCR Q1 (MULTIDISCIPLINARY SCIENCES)
PLoS ONE. Pub Date: 2025-06-13. eCollection Date: 2025-01-01. DOI: 10.1371/journal.pone.0325920
Farrokh Habibzadeh
PLoS ONE, volume 20, issue 6, e0325920. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165351/pdf/
Citations: 0

Abstract


Background: The reproducibility crisis is a major concern for many scientists worldwide. Some researchers attribute the crisis mostly to the conventional p value significance threshold, arbitrarily set at 0.05, and propose lowering the cut-off to 0.005. Lowering the cut-off decreases the false-positive rate but increases the false-negative rate. Recently, a flexible p value significance threshold was proposed that minimizes the weighted sum of errors in statistical hypothesis testing.

Methods: This in silico study compared the error rates under three choices of the p value significance threshold: 0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and the false-negative rate (when the alternative hypothesis was true) were calculated for a hypothetical randomized clinical trial.
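The Monte Carlo procedure described in the Methods can be illustrated with a minimal sketch. It is not the author's simulation: the two-arm design with normally distributed outcomes, the unit standard deviation, the two-sided z-test, the effect size of 0.5, and the function name `simulate_error_rates` are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def simulate_error_rates(n_per_arm, effect, alpha, n_trials=2000, seed=1):
    """Estimate false-positive and false-negative rates of a two-sided z-test.

    Assumed setup (not from the paper): two arms, normal outcomes, SD = 1.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when each arm has SD = 1.
    se = (2.0 / n_per_arm) ** 0.5

    def p_value(true_diff):
        # Simulate one trial and return its two-sided p value.
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_diff, 1.0) for _ in range(n_per_arm)]
        z = (fmean(treated) - fmean(control)) / se
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))

    # Null hypothesis true: any "significant" result is a false positive.
    false_pos = sum(p_value(0.0) < alpha for _ in range(n_trials)) / n_trials
    # Alternative true: any "non-significant" result is a false negative.
    false_neg = sum(p_value(effect) >= alpha for _ in range(n_trials)) / n_trials
    return false_pos, false_neg


if __name__ == "__main__":
    for n in (20, 80):
        for alpha in (0.05, 0.005):
            fp, fn = simulate_error_rates(n, effect=0.5, alpha=alpha)
            print(f"n={n:3d} alpha={alpha:<5} FPR={fp:.3f} FNR={fn:.3f}")
```

Under these assumptions, the estimated false-positive rate should stay close to the chosen fixed threshold at every sample size, while the false-negative rate should fall as the sample size grows.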

Results: Increasing the study sample size was associated with a reduction in the false-negative rate. With fixed significance thresholds, however, the false-positive rate remained at a fixed value regardless of the sample size; the rate decreased when the flexible threshold was employed. Although the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework: the flexible p value significance threshold can only be calculated a posteriori, after the results are obtained. The threshold would thus differ even between replicas, which contradicts common sense.
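The idea of a threshold that minimizes a weighted sum of the two error rates can be sketched as follows. This is a generic illustration, not the procedure used in the paper: it assumes a one-sided z-test with unit standard deviation per arm, a plug-in effect size, equal weights by default, and a simple grid search; the function names `false_negative_rate` and `flexible_threshold` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def false_negative_rate(alpha, effect, n_per_arm):
    """Type II error of a one-sided z-test (SD = 1 per arm) at level alpha."""
    # Expected z statistic under the alternative hypothesis.
    noncentrality = effect * (n_per_arm / 2.0) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(z_crit - noncentrality)


def flexible_threshold(effect, n_per_arm, weight=0.5, grid_size=10000):
    """Grid-search the alpha minimizing weight*alpha + (1 - weight)*beta(alpha)."""
    candidates = (i / grid_size for i in range(1, grid_size))
    return min(
        candidates,
        key=lambda a: weight * a
        + (1.0 - weight) * false_negative_rate(a, effect, n_per_arm),
    )


if __name__ == "__main__":
    # The minimizing threshold shrinks as the sample size grows.
    for n in (20, 80, 320):
        print(n, flexible_threshold(effect=0.5, n_per_arm=n))
    # The threshold depends on the effect size plugged in.
    for d in (0.4, 0.6):
        print(d, flexible_threshold(effect=d, n_per_arm=80))
```

In this sketch the minimizing threshold depends on the effect size that is plugged in. If that value is estimated from the observed data, two replicas of the same trial with different observed effects would arrive at different thresholds, which is one way to picture the a posteriori dependence noted in the Results.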

Conclusions: Relying on frequentist statistical inference and the p value no longer seems to be a viable approach. Emphasis should shift toward alternative approaches to data analysis, such as Bayesian statistical methods.

Source journal
PLoS ONE (Biology)
CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months
About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage