Statistical Significance and the Replication Crisis in the Social Sciences

Oxford Research Encyclopedia of Economics and Finance. Pub Date: 2019-07-29. DOI: 10.1093/ACREFORE/9780190625979.013.461

Anna Dreber, M. Johannesson


Citations: 8

Abstract


The recent “replication crisis” in the social sciences has led to increased attention on what statistically significant results entail. There are many reasons why false positive results may be published in the scientific literature, such as low statistical power and “researcher degrees of freedom” in the analysis (where researchers, when testing a hypothesis, more or less actively seek to get results with p < .05). The results from three large replication projects in psychology, experimental economics, and the social sciences are discussed, with most of the focus on the last project, where the statistical power in the replications was substantially higher than in the other projects. The results suggest that a substantial share of published results in top journals does not replicate. While several replication indicators have been proposed, the main indicator for whether a result replicates is whether the replication study, using the same statistical test, finds a statistically significant effect (p < .05 in a two-sided test). For the project with very high statistical power, the various replication indicators agree to a larger extent than for the other replication projects, most likely because of the higher statistical power. While the replications discussed are mainly experiments, there is no reason to believe that replicability would be higher in other parts of economics and finance; if anything, it may be lower because of more researcher degrees of freedom. There is also a discussion of solutions to the often-observed low replicability, including lowering the p value threshold for statistical significance to .005 and increasing the use of preanalysis plans and registered reports for new studies as well as replications, followed by a discussion of measures of peer beliefs. Recent attempts to understand to what extent the academic community is aware of the limited reproducibility and can predict replication outcomes, using prediction markets and surveys, suggest that peer beliefs may be viewed as an additional reproducibility indicator.
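
The link the abstract draws between low statistical power, the significance threshold, and false positives can be illustrated with a standard back-of-the-envelope calculation. The sketch below is not taken from the article: the function name `false_positive_share` and the prior of 0.25 (the assumed fraction of tested hypotheses that are true) are hypothetical choices made only for illustration. It computes the share of statistically significant findings that would be false positives under those assumptions.

```python
def false_positive_share(power: float, alpha: float, prior: float) -> float:
    """Share of statistically significant findings that are false positives.

    power: probability of a significant result when the effect is real
    alpha: significance threshold (probability of a significant result
           when there is no effect)
    prior: ASSUMED fraction of tested hypotheses that are true
           (hypothetical input, not a value from the article)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    assumed_prior = 0.25  # hypothetical value chosen for illustration
    for power in (0.2, 0.5, 0.9):
        for alpha in (0.05, 0.005):
            share = false_positive_share(power, alpha, assumed_prior)
            print(f"power={power:.1f} alpha={alpha:.3f} "
                  f"false-positive share={share:.3f}")
```

Under any fixed prior, the share falls as power rises and as the threshold is tightened from .05 to .005, which is the direction of the argument summarized above. The absolute numbers depend entirely on the assumed prior.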
