The Bayes factor, HDI-ROPE, and frequentist equivalence tests can all be reverse engineered, almost exactly, from one another: Reply to Linde et al. (2021)
Authors: Harlan Campbell, Paul Gustafson
Journal: Psychological Methods, pp. 613-623 (JCR Q1, Psychology, Multidisciplinary; impact factor 7.6)
Published: 2024-06-01 (Epub 2024-03-21)
Publication type: Journal Article
DOI: 10.1037/met0000507 (https://doi.org/10.1037/met0000507)
Citations: 0
Abstract
Following an extensive simulation study comparing the operating characteristics of three different procedures used for establishing equivalence (the frequentist "TOST," the Bayesian "HDI-ROPE," and the Bayes factor interval null procedure), Linde et al. (2021) conclude with the recommendation that "researchers rely more on the Bayes factor interval null approach for quantifying evidence for equivalence" (p. 1). We redo the simulation study of Linde et al. (2021) in its entirety but with the different procedures calibrated to have the same predetermined maximum Type I error rate. Our results suggest that, when calibrated in this way, the Bayes factor, HDI-ROPE, and frequentist equivalence tests all have similar (almost exactly the same) Type II error rates. In general, any advocacy for frequentist testing as better or worse than Bayesian testing in terms of empirical findings seems dubious at best. If one decides on which underlying principle to subscribe to in tackling a given problem, then the method follows naturally. Bearing in mind that each procedure can be reverse-engineered from the others (at least approximately), trying to use empirical performance to argue for one approach over another seems like tilting at windmills. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
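The calibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes a toy one-sample normal model with known standard deviation, and it uses the posterior probability of the equivalence region under a conjugate normal prior as a simple stand-in for a Bayesian decision rule. All names and default values (`compare`, `margin`, `prior_sd`, and so on) are hypothetical choices made for illustration. Both rules have their cutoffs tuned by simulation so that they declare equivalence in a fraction `alpha` of samples drawn at the boundary of the non-equivalence region, and their power is then estimated at a true effect of zero.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tost_p_value(xbar, se, margin):
    """TOST p-value: the larger of the two one-sided p-values."""
    p_lower = 1.0 - STD_NORMAL.cdf((xbar + margin) / se)  # H0: mu <= -margin
    p_upper = STD_NORMAL.cdf((xbar - margin) / se)        # H0: mu >= +margin
    return max(p_lower, p_upper)


def rope_posterior_prob(xbar, se, margin, prior_sd):
    """Posterior probability that mu lies in [-margin, margin].

    Assumes a N(0, prior_sd^2) prior on mu (an illustrative choice).
    """
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * xbar / se**2
    posterior = NormalDist(post_mean, post_var**0.5)
    return posterior.cdf(margin) - posterior.cdf(-margin)


def calibrated_cutoff(null_stats, alpha, larger_favors_equivalence):
    """Cutoff such that about alpha of the boundary-null statistics
    are strictly more favorable to equivalence than the cutoff."""
    ordered = sorted(null_stats, reverse=larger_favors_equivalence)
    return ordered[int(alpha * len(ordered))]


def compare(n=50, sigma=1.0, margin=0.3, prior_sd=1.0,
            alpha=0.05, reps=20000, seed=1):
    rng = random.Random(seed)
    se = sigma / n**0.5
    # Sample means at the boundary (mu = margin), where the Type I error
    # rate of both rules is largest in this toy model.
    boundary = [rng.gauss(margin, se) for _ in range(reps)]
    # Sample means under exact equivalence (mu = 0), used for power.
    centered = [rng.gauss(0.0, se) for _ in range(reps)]

    p_cut = calibrated_cutoff(
        [tost_p_value(x, se, margin) for x in boundary], alpha, False)
    prob_cut = calibrated_cutoff(
        [rope_posterior_prob(x, se, margin, prior_sd) for x in boundary],
        alpha, True)

    power_tost = sum(
        tost_p_value(x, se, margin) < p_cut for x in centered) / reps
    power_rope = sum(
        rope_posterior_prob(x, se, margin, prior_sd) > prob_cut
        for x in centered) / reps
    return {"p_cutoff": p_cut, "prob_cutoff": prob_cut,
            "power_tost": power_tost, "power_rope": power_rope}


if __name__ == "__main__":
    print(compare())
```

In this toy model both statistics are monotone functions of the absolute sample mean, so once the cutoffs are matched on Type I error the two rules should make essentially the same decisions and the two power estimates should agree. The procedures compared in the paper are more involved, so this only illustrates the logic of calibration and does not reproduce the reported results.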
About the journal:
Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.