Further Questioning of the Significance of the Gepants: A Response
E. Loder, P. Tfelt-Hansen
Headache: The Journal of Head and Face Pain, published 2019-11-01. DOI: 10.1111/head.13683 (https://doi.org/10.1111/head.13683)
Citations: 0
Abstract
We thank Drs. Nguyen and Hu for their comments on our paper, and for suggesting the fragility index as a method to assess claims about statistical significance. Its major virtue is to draw attention to how few events would have to change in the control group to shift the P value above .05. In a large series of randomized, controlled trials (RCTs), the median fragility index was 8, and such low numbers may help to identify less robust results. The calculation of fragility indices can probably be useful in the methodological evaluation of RCTs, but it is most likely not suitable for evaluating the clinical relevance of an RCT's results. For that purpose, the calculation of therapeutic gain with a 95% confidence interval (CI) conveys more clinically relevant information. To us and to over 800 signatories of a recent editorial in the journal Nature, however, the larger problem seems to be the “dichotomania” that prevails in interpreting P values. Any P value threshold is artificial. It is naive and simplistic to use P values to claim that effects are present or absent. Instead, P values should be interpreted on a continuous scale, and study findings should be framed in terms of clinical benefit. Researchers and readers should be encouraged to consider whether, across all values within the 95% CI, there is evidence of meaningful medical effects. It can be difficult to decide whether trial findings are clinically important, and such determinations are often context-specific. No metric solves all problems of interpretation or can substitute for common sense and clinical judgment. Everyone should beware of claims that study findings are “highly statistically significant.” The next time someone invites you to admire a very tiny P value, consider that they may be hoping you will “pay no attention to that [equally tiny effect size] behind the curtain.”
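The two metrics contrasted above can be made concrete in code. As an illustration only (the trial numbers below are hypothetical, not from any gepant study), the following Python sketch computes a fragility index in the usual way — flipping control-arm outcomes one at a time until the two-sided Fisher exact P value rises above .05 — and a therapeutic gain (risk difference) with a simple Wald 95% CI:

```python
import math

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact P for the 2x2 table [[a, b], [c, d]]
    (events / non-events per arm), via the hypergeometric distribution."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def hyper(x):  # P(first row has x events | fixed margins)
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = hyper(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(hyper(x) for x in range(lo, hi + 1) if hyper(x) <= p_obs * (1 + 1e-9))

def fragility_index(events_t, n_t, events_c, n_c, alpha=0.05):
    """Count how many control-arm outcomes must flip (non-event -> event)
    before the two-sided Fisher P crosses alpha.  Returns None when the
    original comparison is not significant to begin with."""
    p = fisher_exact_p(events_t, n_t - events_t, events_c, n_c - events_c)
    if p >= alpha:
        return None
    flips = 0
    while p < alpha and events_c < n_c:
        events_c += 1
        flips += 1
        p = fisher_exact_p(events_t, n_t - events_t, events_c, n_c - events_c)
    return flips

def therapeutic_gain(events_t, n_t, events_c, n_c):
    """Risk difference (active minus control) with a Wald 95% CI."""
    pt, pc = events_t / n_t, events_c / n_c
    diff = pt - pc
    se = math.sqrt(pt * (1 - pt) / n_t + pc * (1 - pc) / n_c)
    return diff, diff - 1.96 * se, diff + 1.96 * se

# Hypothetical trial: 60/200 responders on active drug vs 40/200 on placebo.
fi = fragility_index(60, 200, 40, 200)
gain, lo, hi = therapeutic_gain(60, 200, 40, 200)
```

With these illustrative numbers the index comes out small — a handful of changed control outcomes would erase the "significant" result — while the therapeutic gain of 10 percentage points with its CI speaks directly to clinical relevance. The Wald interval is a large-sample approximation; for small arms an exact or score interval would be preferable.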