Comparison of Power for Multiple Comparison Procedures

R. S. Rodger, Mark D. Roberts
Journal of Methods and Measurement in the Social Sciences, vol. 4, no. 1, pp. 20-47. Published 2013-11-01. DOI: 10.2458/V4I1.17775 (https://doi.org/10.2458/V4I1.17775)
Citations: 21

Abstract

The number of methods for evaluating, and possibly making statistical decisions about, null contrasts - or their small sub-set, multiple comparisons - has grown extensively since the early 1950s. That demonstrates how important the subject is, but most of the growth consists of modest variations of the early methods. This paper examines nine fairly basic procedures, six of which are methods designed to evaluate contrasts chosen post hoc, i.e., after an examination of the test data. Three of these use experimentwise or familywise type 1 error rates (Scheffé 1953, Tukey 1953, Newman-Keuls 1939 and 1952), two use decision-based type 1 error rates (Duncan 1951 and Rodger 1975a) and one (Fisher's LSD 1935) uses a mixture of the two type 1 error rate definitions. The other three methods examined are for evaluating, and possibly deciding about, a limited number of null contrasts that have been chosen independently of the sample data - preferably before the data are collected. One of these (planned t-tests) uses decision-based type 1 error rates and the other two (one based on Bonferroni's Inequality 1936, and the other Dunnett's 1964 Many-One procedure) use a familywise type 1 error rate. The use of these different type 1 error rate definitions creates quite large discrepancies in the capacities of the methods to detect true non-zero effects in the contrasts being evaluated. This article describes those discrepancies in power and, especially, how they are exacerbated by increases in the size of an investigation (i.e., an increase in J, the number of samples being examined). It is also true that the capacity of a multiple contrast procedure to 'unpick' 'true' differences from the sample data is influenced by the type of contrast the procedure permits. For example, multiple range procedures (such as that of Newman-Keuls and that of Duncan) permit only comparisons (i.e., two-group differences) and that greatly limits their discriminating capacity (which is not, technically speaking, their power). Many methods (those of Scheffé, Tukey's HSD, Newman-Keuls, Fisher's LSD, Bonferroni and Dunnett) place their emphasis on one particular question, "Are there any differences at all among the groups?" Some other procedures (those of Duncan, Rodger and Planned Contrasts) concentrate on individual contrasts, and so are more concerned with how many false null contrasts the method can detect. This results in two basically different definitions of detection capacity. Finally, there is a categorical difference between what post hoc methods can find and what methods evaluating pre-planned contrasts can find. The success of the latter depends on how wisely (or honestly well informed) the user has been in planning the limited number of statistically revealing contrasts to test. That can greatly affect the method's discriminating success, but it is often not included in power evaluations. These matters are elaborated upon as they arise in the exposition below. DOI: 10.2458/azu_jmmss_v4i1_rodger
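The abstract's point that familywise and decision-based error rates diverge as J grows can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes, purely for illustration, that the family consists of all J(J-1)/2 two-group comparisons, that the nominal level is 0.05, and that the tests are independent (which pairwise comparisons sharing groups are not). The function names are hypothetical.

```python
from math import comb


def pairwise_count(j: int) -> int:
    """Number of two-group comparisons among j groups (illustrative family size)."""
    return comb(j, 2)


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-contrast level when a familywise level alpha is split over m contrasts."""
    return alpha / m


def familywise_rate_if_independent(alpha: float, m: int) -> float:
    """Chance of at least one type 1 error when m true nulls are each tested
    at level alpha. Independence is an assumption made for illustration only."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    alpha = 0.05  # assumed nominal level, not a value from the paper
    for j in (3, 5, 10):
        m = pairwise_count(j)
        print(
            f"J={j:2d}  comparisons={m:2d}  "
            f"Bonferroni per-contrast alpha={bonferroni_alpha(alpha, m):.4f}  "
            f"familywise rate at per-contrast 0.05={familywise_rate_if_independent(alpha, m):.3f}"
        )
```

Under these assumptions, a familywise method must test each contrast at an increasingly strict level as J rises, which is one way to see why its capacity to detect a given non-zero contrast falls relative to a decision-based method that keeps the per-contrast level fixed.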