Letter regarding “Prospective randomized trial comparing relapse rates in dogs with steroid-responsive meningitis-arteritis treated with a 6-week or 6-month prednisolone protocol”
Andrew Woodward. Journal of Veterinary Internal Medicine, volume 38, issue 5, pages 2412-2413. Published 2024-09-12. DOI: 10.1111/jvim.17188. https://onlinelibrary.wiley.com/doi/10.1111/jvim.17188
Abstract
I read with interest the article “Prospective randomized trial comparing relapse rates in dogs with steroid-responsive meningitis-arteritis treated with a 6-week or 6-month prednisolone protocol.” [1] I am concerned that the article contains substantial misinterpretations of statistical evidence, which undermine the reliability of the authors' conclusions.
Unfortunately, the phrase “no significant difference” can be deeply misleading unless it is interpreted correctly under the unintuitive logic of frequentist hypothesis tests, and it has led the authors to a mistake: it may be true that the interventions are practically exchangeable, but that conclusion cannot be reached from “not significant.” The P-value resulting from a frequentist hypothesis test approximates the probability that data at least as extreme as the data at hand would be observed if the test hypothesis were true, where “probability” represents the frequency in a hypothetical series of identical repeated trials, and the test hypothesis is some statistical model including its parameters. [2]
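The repeated-trials reading of the P-value can be made concrete with a small simulation. The sketch below is illustrative only: the trial size and relapse counts are hypothetical (they are not the article's data), and fixing the shared relapse risk at the pooled proportion is a simplifying assumption about the test hypothesis.

```python
import random

# Hypothetical trial (NOT the article's data): relapses out of 20 dogs per arm.
N_PER_ARM = 20
relapses_a, relapses_b = 6, 8
observed_gap = abs(relapses_a - relapses_b)

# Test hypothesis (assumed): both arms share one relapse risk,
# fixed here at the pooled proportion.
shared_risk = (relapses_a + relapses_b) / (2 * N_PER_ARM)


def simulate_arm(rng: random.Random, n: int, risk: float) -> int:
    """Number of relapses among n dogs, each relapsing with probability `risk`."""
    return sum(rng.random() < risk for _ in range(n))


def simulated_p_value(n_trials: int = 20_000, seed: int = 1) -> float:
    """Share of repeated trials, generated under the test hypothesis, whose
    between-arm gap in relapse counts is at least as large as the observed gap."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        gap = abs(
            simulate_arm(rng, N_PER_ARM, shared_risk)
            - simulate_arm(rng, N_PER_ARM, shared_risk)
        )
        hits += gap >= observed_gap
    return hits / n_trials


p_sim = simulated_p_value()
print(f"simulated P-value: {p_sim:.2f}")
```

Every simulated trial is generated with the test hypothesis switched on, so the resulting frequency describes how the data behave if that hypothesis holds. It is not a probability that the hypothesis itself is true.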
The P-value is calculated under the assumption that the test hypothesis (whatever it is) is true, so it can never indicate support for the test hypothesis; to read it that way would involve circular reasoning. Though the “evidential” meaning of P-values is contested and generally dubious [3], in simple terms the P-value can be considered a summary of the evidence provided by the data to refute the test hypothesis [4], or equivalently, an expression of how surprising it would be to observe data at least as extreme as these if the test hypothesis were in fact true. [2] A small P-value may suggest (charitably) that some aspect of the test hypothesis is untrue, which is the usage advocated by Fisher. A large P-value, however, does not support that the test hypothesis is true, because it says nothing about other test hypotheses (in this case, a difference between interventions) with which the data may be compatible. It is, therefore, incorrect to conclude anything substantive from a large P-value. Unfortunately, the incorrect interpretation that a large P-value indicates that an effect or association is absent appears common in clinical trials reporting. [5]
An emphasis on confidence intervals may mitigate some of the limitations of reasoning based on hypothesis tests, even if their exact meaning is unintuitive. This is a popular view [6], and is generally encouraged by relevant reporting guidelines. Consider the authors' estimate of the relative incidence risk (which they express as an odds ratio) of at least one relapse, which they state as 1.40 (95% CI: 0.40, 4.96; P = 0.60). The confidence interval represents, in simple terms, the set of values of the parameter of interest (test hypotheses) with which the data are reasonably consistent; in some sense, those values of the parameter that the data at hand cannot rule out. Although I profess no expertise in the assessment of intervention effect sizes in neurology, at face value the interval is fairly wide: the data are consistent with a benefit of the 6-week intervention of about a 2.5-fold reduction (1/0.40) in the odds of relapse, or a detriment of about a 5-fold increase in the odds of relapse. The extent of evidence this study provides in support of the 6-week intervention depends strongly on a contextual interpretation of all the effects within the nominated uncertainty interval, which was not provided by the authors.
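The arithmetic behind this reading can be sketched in a few lines of Python. The sketch assumes, without confirmation from the article, that the reported interval is a Wald-type interval that is symmetric on the log-odds scale; under that assumption the standard error can be back-calculated and a P-value computed for any hypothesised odds ratio, not only for 1. The function and variable names are illustrative.

```python
import math

# Values reported in the article under discussion (odds ratio scale).
or_hat, ci_low, ci_high = 1.40, 0.40, 4.96

# Assumption: Wald-type 95% interval, symmetric on the log-odds scale.
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)


def p_value(hypothesised_or: float) -> float:
    """Two-sided P-value for a hypothesised odds ratio (normal approximation)."""
    z = (math.log(or_hat) - math.log(hypothesised_or)) / se_log_or
    return math.erfc(abs(z) / math.sqrt(2))


# The same data tested against several different test hypotheses.
for h in (0.5, 1.0, 2.0, 3.0):
    print(f"hypothesised OR = {h}: P = {p_value(h):.2f}")

# Interval bounds expressed as fold changes in the odds of relapse.
fold_reduction = 1 / ci_low  # lower bound read as a reduction
fold_increase = ci_high      # upper bound read as an increase
print(f"about {fold_reduction:.1f}-fold reduction to about {fold_increase:.1f}-fold increase")
```

Under these assumptions the P-value for an odds ratio of 1 comes out close to the reported 0.60, while hypothesised odds ratios of 0.5 or 3 also return P-values above 0.05. The data therefore do not single out "no difference" from among these alternatives.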
The conflation of “not statistically significant” with “not different” is an apparently common, but fundamental, error. The apparent effect of this and related misinterpretations of frequentist statistics has led to recent calls for major reform. [8] In most situations, I contend that veterinary clinical researchers would be wise to set aside “significance,” P-value thresholds, and related concepts altogether, and focus on estimates and their uncertainty.
About the journal:
The mission of the Journal of Veterinary Internal Medicine is to advance veterinary medical knowledge and improve the lives of animals by publication of authoritative scientific articles of animal diseases.