Beyond the Significance Test Ritual: What Is There?

Impact Factor: 2.0 · CAS Region 4, Psychology · JCR Q2, PSYCHOLOGY, MULTIDISCIPLINARY
P. Sedlmeier
DOI: 10.1027/0044-3409.217.1.1
Journal: Zeitschrift Fur Psychologie-Journal of Psychology, 2009, pp. 1-5
Publication type: Journal Article
Citations: 10

Abstract

The mindless use of null-hypothesis significance testing – the significance test ritual (e.g., Salsburg, 1985) – has long been criticized. The main component of the ritual can be characterized as follows: Once you have collected your data, try to refute your null hypothesis (e.g., no mean difference, zero correlation, etc.) in an automatized manner. Often the ritual is complemented by the “star procedure”: If p < .05, assign one star to your results (*), if p < .01 give two stars (**), and if p < .001 you have earned yourself three stars (***). If you have obtained at least one star, the ritual has been successfully performed; if not, your results are not worth much. The stars, or the corresponding numerical values, have been door-openers to prestigious psychology journals and, therefore, the ritual has received strong reinforcement. The ritual does not have a firm theoretical grounding; it seems to have arisen as a badly understood hybrid mixture of the approaches of Ronald A. Fisher, Jerzy Neyman, Egon S. Pearson, and (at least in some variations of the ritual) Thomas Bayes (see Acree, 1979; Gigerenzer & Murray, 1987; Spielman, 1974). For quite some time, there has been controversy over its usefulness. The debates arising from this controversy, however, have not been limited to discussions about the mindless procedure as sketched above, but have expanded to include the issues of experimental design and sampling procedures, assumptions about the size of population effects (leading to the specification of an alternative hypothesis), deliberations about statistical power before the data are collected, and decisions about Type I and Type II errors. There have been several such debates and the controversy is ongoing (for a summary see Balluerka, Gómez, & Hidalgo, 2005; Nickerson, 2000; Sedlmeier, 1999, Appendix C). 
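The "star procedure" the abstract describes is purely mechanical, which is exactly the point of the critique. A minimal sketch makes this explicit (the function name is hypothetical, used only for illustration):

```python
def star_rating(p: float) -> str:
    """Map a p-value to the conventional 'stars' of the significance
    test ritual: p < .001 -> ***, p < .01 -> **, p < .05 -> *.
    Anything at or above .05 earns no star."""
    if p < 0.001:
        return "***"
    elif p < 0.01:
        return "**"
    elif p < 0.05:
        return "*"
    return ""
```

Note that the mapping uses nothing but the p-value: sample size, effect size, and design never enter, which is why the ritual can be performed "in an automatized manner."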
Although there have been voices that argue for a ban on significance testing (e.g., Hunter, 1997), authors usually conclude that significance tests, if conducted properly, probably have some value (or at least do no harm) but should be complemented (or replaced) by other more informative ways of analyzing data (e.g., Abelson, 1995; Cohen, 1994; Howard, Maxwell, & Fleming, 2000; Loftus, 1993; Nickerson, 2000; Sedlmeier, 1996; Wilkinson & Task Force on Statistical Inference, 1999). Alternative data-analysis techniques have been well-known among methodologists for decades but this knowledge, mainly collected in methods journals, seems to have had little impact on the practice of researchers to date. I see two main reasons for this unsatisfactory state of affairs. First, it appears that there is still a fair amount of misunderstanding about what the results of significance tests really mean (e.g., Gordon, 2001; Haller & Krauss, 2002; Mittag & Thompson, 2000; Monterde-i-Bort, Pascual Llobell, & Frias-Navarro, 2008). Second, although alternatives have been briefly mentioned in widely received summary articles (such as Wilkinson & Task Force on Statistical Inference, 1999), they have rarely been presented in a nontechnical and detailed manner to a nonspecialized audience. Thus, researchers might, in principle, be willing to change how they analyze data but the effort needed to learn about alternative methods might just be regarded as too great. The main aim of this special issue is to introduce a collection of these alternative data-analysis methods in a nontechnical way, described by experts in the field. Before introducing the contents of the special issue, I will briefly outline the ideal state of affairs in inference statistics and discuss the difference between mindless and mindful significance testing.
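One of the "more informative ways of analyzing data" commonly proposed in this literature is reporting a standardized effect size rather than a bare p-value. As an illustrative sketch (not taken from the article itself), Cohen's d for two independent samples, using the pooled standard deviation, can be computed as:

```python
import math

def cohens_d(sample1, sample2):
    """Cohen's d for two independent samples: the mean difference
    divided by the pooled standard deviation. Unlike a p-value,
    d conveys the magnitude of the effect."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    # Unbiased sample variances (denominator n - 1)
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd
```

Two samples can yield the same p-value at very different effect sizes depending on n, which is one reason methodologists recommend reporting d (with a confidence interval) alongside, or instead of, the stars.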
Source journal
Zeitschrift Fur Psychologie-Journal of Psychology (PSYCHOLOGY, MULTIDISCIPLINARY)
CiteScore: 4.10 · Self-citation rate: 5.60% · Articles published: 37