Behavior of test specificity under an imperfect gold standard: findings from a simulation study and analysis of real-world oncology data
Authors: Mark S Walker, Lukas Slipski, Yanina Natanzon
Journal: BMC Medical Research Methodology 25(1):151, published 2025-05-30
DOI: https://doi.org/10.1186/s12874-025-02603-4
Abstract
Background: Gold standards used in validation of new tests may be imperfect, with sensitivity or specificity less than 100%. The impact of imperfection in a gold standard on measured test attributes has been demonstrated formally, but its relevance in real-world oncology research may not be well understood.
Methods: This simulation study examined the impact of imperfect gold standard sensitivity on measured test specificity at different levels of condition prevalence for a hypothetical real-world measure of death. The study also evaluated real-world oncology datasets with a linked National Death Index (NDI) dataset, to examine the measured specificity of a death indicator at levels of death prevalence that matched the simulation. The simulation and real-world data analysis both examined measured specificity of the death indicator at death prevalence ranging from 50 to 98%. To isolate the effects of death prevalence and imperfect gold standard sensitivity, the simulation assumed a test with perfect sensitivity and specificity, and with perfect gold standard specificity. However, gold standard sensitivity was modeled at values from 90 to 99%.
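The simulation setup described in the Methods can be sketched as a small Monte Carlo experiment. This is a minimal illustration under the stated assumptions (perfect test, perfect gold standard specificity, imperfect gold standard sensitivity), not the authors' code. The function name, sample size, seed, and the particular prevalence and sensitivity grid points are hypothetical choices made for the example.

```python
import random


def simulate_measured_specificity(prevalence, gs_sensitivity, n=100_000, seed=0):
    """Estimate the specificity of a perfect test as measured against a
    gold standard that misses some true deaths (hypothetical helper)."""
    rng = random.Random(seed)
    true_neg = 0   # test negative, gold standard negative
    false_pos = 0  # test positive, gold standard negative
    for _ in range(n):
        dead = rng.random() < prevalence
        test_positive = dead  # assumption: test has perfect sensitivity and specificity
        # Assumption: gold standard never flags a living patient (perfect specificity)
        # but detects a true death only with probability gs_sensitivity.
        gs_positive = dead and (rng.random() < gs_sensitivity)
        if not gs_positive:
            if test_positive:
                false_pos += 1
            else:
                true_neg += 1
    gold_negatives = true_neg + false_pos
    if gold_negatives == 0:
        raise ValueError("no gold-standard negatives; increase n or lower prevalence")
    return true_neg / gold_negatives


if __name__ == "__main__":
    # Illustrative grid points inside the ranges named in the abstract.
    for prevalence in (0.50, 0.75, 0.90, 0.98):
        for gs_sensitivity in (0.90, 0.95, 0.99):
            spec = simulate_measured_specificity(prevalence, gs_sensitivity)
            print(f"prevalence={prevalence:.2f} gs_sens={gs_sensitivity:.2f} "
                  f"measured_spec={spec:.3f}")
```

Every false positive in this sketch is a true death that the gold standard failed to record, so measured specificity falls as those missed deaths become numerous relative to the true survivors.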
Results: The simulation showed that decreasing gold standard sensitivity was associated with increasing underestimation of test specificity, and that the extent of underestimation increased with higher death prevalence. Analysis of the real-world data yielded findings that closely matched the simulation pattern. At 98% death prevalence, near-perfect gold standard sensitivity (99%) still suppressed specificity from the true value of 100% to a measured value of < 67%.
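The size of this effect can be checked with a short worked calculation. The expression below is an expected-value identity derived here from the stated simulation assumptions (perfect test, perfect gold standard specificity); it is not quoted from the paper. The symbols $p$ (death prevalence) and $\mathrm{Se}_{G}$ (gold standard sensitivity) are notation introduced for this sketch. Gold-standard negatives consist of true survivors, $1-p$, plus true deaths the gold standard missed, $p\,(1-\mathrm{Se}_{G})$, and a perfect test is negative only for the survivors.

```latex
% Requires amsmath for \text.
\[
\mathrm{Sp}_{\text{measured}}
  = \frac{1-p}{(1-p) + p\,(1-\mathrm{Se}_{G})}
\]
\[
p = 0.98,\ \mathrm{Se}_{G} = 0.99:\quad
\mathrm{Sp}_{\text{measured}}
  = \frac{0.02}{0.02 + 0.98 \times 0.01}
  = \frac{0.02}{0.0298} \approx 0.67
\]
```

Under these assumptions the expected value is close to the reported figure of < 67%. Any small difference may reflect sampling variation or design details not given in the abstract.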
Conclusions: New validation research, and review of existing validation studies, should consider the prevalence of the conditions assessed by a measure, and the possible impact of an imperfect gold standard on measured sensitivity and specificity.
About the journal:
BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.