Cats and external cervical resorption: Statistical considerations


International Endodontic Journal. Publication date: 2025-05-19. DOI: 10.1111/iej.14258

Nasir Z. Bashir, Eduardo Bernabé

International Endodontic Journal, volume 58, issue 8, pages 1277-1279. Full text: https://onlinelibrary.wiley.com/doi/10.1111/iej.14258


Abstract

We have read with some intrigue the recent article by Patel et al., 2025, which investigates putative risk factors for external cervical resorption (ECR). In a subsequent letter, Tay & Cooray, 2025, provide a fantastic exposition on why basic consideration of epidemiological methods means that drawing reliable inferences from this article is incredibly difficult. In addition to these concerns, we would like to highlight issues pertaining to the statistical methods used:

The way in which the authors present their power calculations is an example of ‘cut-and-paste’ statistics (White et al., 2022). They claim ‘For a sample size of 180 patients, a power of 95% would be achieved to detect differences between two independent proportions, assuming a level of confidence of 95%’.

It is evident that power is dependent on what the investigator specifies p₁ and p₂ to be. The authors have not described what they assumed these to be, nor is there any mention of what the desired minimum detectable effect is, that is, p₂–p₁. Hence, it is completely unclear what effect the authors are claiming they have 95% power to detect.

For readers who wish to run the power calculation analogous to that which the authors present, we provide the following R code:

The output indicates that, with 95% power, the authors could detect a difference of 0.26 between proportions in independent groups, assuming p₁ = 0.5. The differences in proportions observed between the groups in the study sample are generally far smaller than 0.26.
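As an illustration of the calculation described above, the following is a minimal Python sketch, not the R listing from the letter. It assumes the 180 patients are split into two equal groups of 90, a two-sided test at the 5% level, and the usual normal approximation for comparing two independent proportions; the function names are hypothetical. Under these assumptions the minimum detectable difference rounds to 0.26.

```python
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of two independent proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Standard deviation terms under the null (pooled) and the alternative.
    sd_null = ((p1 + p2) * (1 - (p1 + p2) / 2)) ** 0.5
    sd_alt = (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5
    return z.cdf((n_per_group ** 0.5 * abs(p2 - p1) - z_alpha * sd_null) / sd_alt)


def minimum_detectable_difference(p1, n_per_group, power=0.95, alpha=0.05):
    """Bisect on p2 > p1 until the target power is reached; return p2 - p1."""
    lo, hi = p1 + 1e-9, 1 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        if power_two_proportions(p1, mid, n_per_group, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi - p1


# Assumption: 180 patients split equally, i.e. 90 per group.
mdd = minimum_detectable_difference(p1=0.5, n_per_group=90)
print(f"Minimum detectable difference: {mdd:.2f}")
```

Changing `p1` or `n_per_group` shows how strongly the detectable effect depends on inputs that the original power statement leaves unspecified.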

The study sample was stratified into individuals with no identifiable factor, a single factor, or multiple factors. Whilst stratification in this manner is not inherently an issue, the statistical tests carried out should be interpreted appropriately. In figure 4, there are 15 proposed risk factors. This means there are 2¹⁵ (= 32 768) possible combinations of risk factors which an individual may have. Taken to the logical extreme, should the authors not run all 32 768 tests and report all significant findings? Effectively, figures 6 through 8 suggest a less extreme version of this was carried out, with 15 tests per figure, resulting in (at a minimum) a total of 45 significance tests. Assuming a type I error rate of 0.05, this means that 2 to 3 of the reported significant findings are likely false positives. Tay & Cooray (2025) suggest the use of correction for multiple testing to control the false discovery rate. The need to adjust for multiple testing has been debated back and forth for several decades, and we are not necessarily arguing that it should or should not have been done in this paper (Rothman, 1990). What we are stating, however, is that the issues around multiple testing should have been raised and transparently discussed.
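The arithmetic in this paragraph can be checked directly; note that 2¹⁵ evaluates to 32 768. The short Python sketch below is illustrative only. The expected number of false positives assumes every null hypothesis is true, and the family-wise error rate additionally assumes the 45 tests are independent, which is unlikely to hold exactly for overlapping risk factors.

```python
n_factors = 15   # proposed risk factors in figure 4
n_tests = 45     # 15 tests in each of figures 6 through 8
alpha = 0.05     # type I error rate per test

# Each factor is either present or absent, giving 2^15 combinations.
combinations = 2 ** n_factors

# Expected false positives if every null hypothesis were true.
expected_false_positives = n_tests * alpha

# Probability of at least one false positive, assuming independent tests.
familywise_error_rate = 1 - (1 - alpha) ** n_tests

print(f"Possible combinations: {combinations}")
print(f"Expected false positives: {expected_false_positives:.2f}")
print(f"Family-wise error rate: {familywise_error_rate:.2f}")
```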

The authors claim that ‘the aim of this study was to investigate potential predisposing factors associated with ECR’. All of the individuals in this study have ECR, so it is unclear exactly how this aim will be addressed, without comparison to individuals who do not have ECR.

This inappropriate study sample means that the null hypothesis in the significance tests which the authors carry out is not the intuitive one that readers typically assume, namely that the predisposing factors are associated with no increase in risk of ECR. Given that the study sample consists of patients with ECR, the findings of the statistical tests are to be interpreted as conditional on the presence of disease. For example, consider the reported result that ‘There was a significant association between cat ownership and ECR in the mandible (23.6%, 25/106 teeth, p = .002)’. This statement is not disingenuous, but it is still prone to misinterpretation by the reader. The question being tested here is: amongst individuals with ECR, is cat ownership associated with jaw location? In other words, we are not being told whether cat ownership is associated with an increased risk of ECR. We are being told whether, amongst those who have ECR, cat ownership is associated with the ECR occurring in the upper or the lower jaw. This hypothesis test seems effectively irrelevant to clinical practice.

More generally, the null hypotheses tested across the statistical analyses all addressed the question, ‘amongst individuals with ECR, is there an association between factor X and jaw location?’, where factor X may be age, sex, cat ownership, number of putative risk factors, etc. Again, it is completely unclear why such a test would be informative for clinical practice.

Finally, and most obviously, the authors give only cursory consideration to how cat ownership in their study sample relates to the population from which they are sampling. Of course, one should not forget that, as Tay & Cooray (2025) mention, naïve comparisons are prone to confounding and do not account for any causal structure. However, all the tests carried out in the article are done in this manner, so we feel equally justified in doing so.

The authors collected their data during the period from 2017 to 2022, and national statistics from Cats Protection indicate that the proportion of cat-owning households in Greater London in 2021 was 26% (Cats Protection, 2021). Assuming the authors have taken a representative sample of patients with ECR, this means the prevalence of cat ownership amongst patients with ECR is over 10 percentage points lower than that of the population from which they were sampled. Of course, this is a liberal estimate, as one may need to correct for repeated measures. There were 194 patients and 215 teeth, so the upper bound on the proportion of cat-owning ECR patients is 34/194 = 17.5% (95% CI: 12.2% to 22.9%); still much lower than the London population prevalence of cat-owning households.
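The interval quoted above can be reproduced with a simple Wald (normal approximation) confidence interval. The letter does not state which method was used, so the choice of a Wald interval in this Python sketch is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Point estimate and Wald confidence interval for a single proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * (p * (1 - p) / n) ** 0.5
    return p, p - half_width, p + half_width


# 34 cat-owning patients out of 194 patients with ECR (figures from the text).
p_hat, lower, upper = wald_interval(34, 194)
london_prevalence = 0.26  # Cats Protection figure for Greater London, 2021

print(f"Proportion: {p_hat:.1%} (95% CI: {lower:.1%} to {upper:.1%})")
print(f"Upper bound below London prevalence: {upper < london_prevalence}")
```

With so few events, a Wilson or exact interval would differ slightly from the Wald bounds, but not enough to bring the upper limit close to 26%.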

Based on the above data, should we then ask: does cat-owning prevent ECR?

Nasir Z. Bashir: Conceptualization, design, data acquisition and interpretation, drafting and critically revising the manuscript. Eduardo Bernabé: Drafting and critically revising the manuscript.

NZB is supported by the Wellcome Trust (322 777/Z/24/Z).

The authors declare no conflicts of interest.

No ethical approval was required for this work.


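Cats and external cervical resorption: Statistical considerations. Letter to the editor on ‘Potential predisposing features of external cervical resorption: An observational study’ (Patel et al., 2025).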



About the journal: The International Endodontic Journal is published monthly and strives to publish original articles of the highest quality to disseminate scientific and clinical knowledge; all manuscripts are subjected to peer review. Original scientific articles are published in the areas of biomedical science, applied materials science, bioengineering, epidemiology and social science relevant to endodontic disease and its management, and to the restoration of root-treated teeth. In addition, review articles, reports of clinical cases, book reviews, summaries and abstracts of scientific meetings, and news items are accepted. The International Endodontic Journal is essential reading for general dental practitioners, specialist endodontists, research scientists and dental teachers.