{"title":"Digging into p Values","authors":"R. W. Emerson","doi":"10.1177/0145482x221144443","DOIUrl":null,"url":null,"abstract":"Back in the January to February 2016 issue of this journal, I discussed p value and the increased need to report effect sizes along with p value. Since some time has passed and p value remains an important aspect of statistical reporting, I thought it wise to revisit the topic. To illustrate some points, we will refer to the article from this issue entitled “COVID-19: Social Distancing and Physical Activity in United Kingdom Residents with Visual Impairment,” by Strongman, Swain, Chung, Merzbach, and Gordon. The authors of this article made a number of t test comparisons where they are comparing the mean of one group to the mean of another group. If you cast your mind back, you will remember that, in the social sciences, we generally have a cutoff for “statistical significance” of .05 for such comparisons. This measure of significance means that, if the p value or significance level, is < .05, the difference in means between the two groups is deemed “statistically significant.” Statistical significance means that there is less than a 5% chance that the observed difference is due to chance. It is the accepted level of chance that experimenters are willing to accept in the social sciences, where data tend to be a little more noisy or hard to measure accurately than in something like physics. Let us unpack this matter a little more. A number of the comparisons in the article I am using as an example today has p values close to .05, either slightly more or less. How meaningful is it to claim that a comparison with a p value of .051 is not statistically meaningful while one with a p value of .049 is? This question is the reason why I made the case in 2016 that we should also include a measure of effect size when reporting the results of statistical tests so that the magnitude of the difference can also be known. 
In 1994, Jacob Cohen, a big name in statistics circles, wrote a piece entitled, “The Earth is Round (p < .05),” in which he summarized a long history of people noting that null hypothesis significance testing (which is what you are doing when you rely on the p level) is a dangerous game. Let us take this suggestion step by step. In null hypothesis significance testing (or NHST, for short), we start with the null hypothesis that the groups we are comparing are not different, or are drawn from the same larger population. If the p value from our statistical comparison is less than our cutoff (which is often .05), we “fail to accept the null hypothesis,” which leads one to want to say that the two groups are different. As Jacob Cohen notes,","PeriodicalId":47438,"journal":{"name":"Journal of Visual Impairment & Blindness","volume":"116 1","pages":"857 - 858"},"PeriodicalIF":1.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Impairment & Blindness","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/0145482x221144443","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"REHABILITATION","Score":null,"Total":0}
Citations: 0
Abstract
Back in the January to February 2016 issue of this journal, I discussed p values and the growing need to report effect sizes alongside them. Since some time has passed and the p value remains an important aspect of statistical reporting, I thought it wise to revisit the topic. To illustrate some points, we will refer to the article from this issue entitled “COVID-19: Social Distancing and Physical Activity in United Kingdom Residents with Visual Impairment,” by Strongman, Swain, Chung, Merzbach, and Gordon. The authors of that article made a number of t test comparisons, in which they compared the mean of one group to the mean of another group. If you cast your mind back, you will remember that, in the social sciences, we generally use a cutoff for “statistical significance” of .05 for such comparisons. This cutoff means that, if the p value, or significance level, is less than .05, the difference in means between the two groups is deemed “statistically significant.” Statistical significance at this level means that there is less than a 5% chance of observing a difference this large if the two groups did not truly differ. It is the level of risk of a false positive that experimenters are willing to accept in the social sciences, where data tend to be noisier and harder to measure accurately than in a field such as physics. Let us unpack this matter a little more. A number of the comparisons in the article I am using as an example today have p values close to .05, some slightly above it and some slightly below. How meaningful is it to claim that a comparison with a p value of .051 is not statistically significant while one with a p value of .049 is? This question is why I made the case in 2016 that we should also include a measure of effect size when reporting the results of statistical tests, so that the magnitude of the difference can be known as well.
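The point about reporting effect sizes alongside p values can be made concrete with a short sketch. The data below are invented for illustration, not taken from the Strongman et al. article; the sketch computes an equal-variance independent-samples t statistic by hand, along with Cohen's d, the effect-size measure most often paired with t tests:

```python
# A minimal sketch (invented data, not from the article): an
# independent-samples t statistic plus Cohen's d, showing how
# effect size reports the *magnitude* of a difference while the
# t statistic (and its p value) reports only its detectability.
import statistics

def t_and_cohens_d(group1, group2):
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    n1, n2 = len(group1), len(group2)
    # Pooled standard deviation (the equal-variance t test's yardstick)
    sp = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    t = (m1 - m2) / (sp * (1 / n1 + 1 / n2) ** 0.5)
    d = (m1 - m2) / sp  # Cohen's d: mean difference in pooled-SD units
    return t, d

t, d = t_and_cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
print(round(t, 3), round(d, 3))  # 2.0 1.265
```

The p value for the t statistic would then come from the t distribution with n1 + n2 − 2 degrees of freedom, but Cohen's d can be reported regardless of which side of .05 that p value lands on.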
In 1994, Jacob Cohen, a big name in statistics circles, wrote a piece entitled “The Earth Is Round (p < .05),” in which he summarized a long history of people noting that null hypothesis significance testing (which is what you are doing when you rely on the p value) is a dangerous game. Let us take this argument step by step. In null hypothesis significance testing (or NHST, for short), we start with the null hypothesis that the groups we are comparing are not different, that is, that they are drawn from the same larger population. If the p value from our statistical comparison is less than our cutoff (which is often .05), we “reject the null hypothesis,” which leads one to want to say that the two groups are different. As Jacob Cohen notes,
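The NHST decision rule described above can be sketched in a few lines, which also makes the column's earlier worry visible: p values of .049 and .051 represent nearly identical evidence, yet the rule hands them opposite verdicts. (The function name and values here are illustrative, not from the article.)

```python
# A minimal sketch of the NHST decision rule. Note the asymmetry:
# we either reject the null hypothesis or fail to reject it; a p
# value above the cutoff never proves the groups are the same.
def nhst_decision(p_value, alpha=0.05):
    return "reject null" if p_value < alpha else "fail to reject null"

# Two comparisons straddling the .05 cutoff get opposite verdicts.
print(nhst_decision(0.049))  # reject null
print(nhst_decision(0.051))  # fail to reject null
```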
About the Journal
The Journal of Visual Impairment & Blindness is the essential professional resource for information about visual impairment (that is, blindness or low vision). The international peer-reviewed journal of record in the field, it delivers current research and best practice information, commentary from authoritative experts on critical topics, News From the Field, and a calendar of important events. Practitioners and researchers, policymakers and administrators, counselors and advocates rely on JVIB for its delivery of cutting-edge research and the most up-to-date practices in the field of visual impairment and blindness. Available in print and online 24/7, JVIB offers immediate access to information from the leading researchers, teachers of students with visual impairments (often referred to as TVIs), orientation and mobility (O&M) practitioners, vision rehabilitation therapists (often referred to as VRTs), early interventionists, and low vision therapists (often referred to as LVTs) in the field.