A model-free framework for evaluating the reliability of a new device with multiple imperfect reference standards.

IF 1.4 | Region 4, Mathematics | JCR Q3, BIOLOGY
Biometrics Pub Date : 2025-01-07 DOI:10.1093/biomtc/ujaf025
Ying Cui, Qi Yu, Amita Manatunga, Jeong Hoon Jang
Biometrics, Volume 81, Issue 1 | Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11911720/pdf/
Citations: 0

Abstract

A common practice for establishing the reliability of a new computer-aided diagnostic (CAD) device is to evaluate how well its clinical measurements agree with those of a gold standard test. However, in many clinical studies, a gold standard is unavailable, and one needs to aggregate information from multiple imperfect reference standards for evaluation. A key challenge here is the heterogeneity in diagnostic accuracy across different reference standards, which may lead to biased evaluation of a device if improperly accounted for during the aggregation process. We propose an intuitive and easy-to-use statistical framework for evaluation of a device by assessing agreement between its measurements and the weighted sum of measurements from multiple imperfect reference standards, where weights representing relative reliability of each reference standard are determined by a model-free, unsupervised inductive procedure. Specifically, the inductive procedure recursively assigns higher weights to reference standards whose assessments are more consistent with each other and form a majority opinion, while assigning lower weights to those with greater discrepancies. Unlike existing methods, our approach does not require any modeling assumptions or external data to quantify heterogeneous accuracy levels of reference standards. It only requires specifying an appropriate agreement index used for weight assignment and device evaluation. The framework is applied to evaluate a CAD device for kidney obstruction by comparing its diagnostic ratings with those of multiple nuclear medicine physicians.
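
The abstract describes the weighting idea only in words, so the following is a minimal sketch of one way such a scheme could look. It is not the authors' procedure. It assumes continuous ratings, uses Lin's concordance correlation coefficient as the agreement index, and replaces the paper's inductive procedure with a simple fixed-point iteration in which each reference standard is scored by its agreement with the weighted consensus of the others. The function names `ccc`, `consensus_weights`, and `evaluate_device`, and the simulated data, are all hypothetical.

```python
import numpy as np


def ccc(x, y):
    """Lin's concordance correlation coefficient for two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)


def consensus_weights(ratings, agreement=ccc, max_iter=200, tol=1e-10):
    """Hypothetical fixed-point weighting (an assumption, not the paper's method).

    ratings: array of shape (n_subjects, n_references), at least 3 columns.
    Returns non-negative weights that sum to 1.
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    weights = np.full(k, 1.0 / k)  # start from equal trust in every reference
    for _ in range(max_iter):
        scores = np.empty(k)
        for j in range(k):
            others = np.delete(np.arange(k), j)
            w_others = weights[others]
            if w_others.sum() <= 0:  # fall back to equal weights if all are zero
                w_others = np.ones(k - 1)
            w_others = w_others / w_others.sum()
            consensus = ratings[:, others] @ w_others
            # Negative agreement is treated as no agreement.
            scores[j] = max(agreement(ratings[:, j], consensus), 0.0)
        if scores.sum() <= 0:
            return np.full(k, 1.0 / k)
        new_weights = scores / scores.sum()
        converged = np.abs(new_weights - weights).max() < tol
        weights = new_weights
        if converged:
            break
    return weights


def evaluate_device(device, ratings, agreement=ccc):
    """Agreement between the device and the weighted sum of the references."""
    weights = consensus_weights(ratings, agreement)
    reference = np.asarray(ratings, dtype=float) @ weights
    return agreement(device, reference), weights


# Simulated example: three fairly accurate references and one noisy one.
rng = np.random.default_rng(0)
truth = rng.normal(size=200)
noise_sd = [0.2, 0.2, 0.2, 2.0]
ratings = np.column_stack(
    [truth + rng.normal(scale=s, size=200) for s in noise_sd]
)
device = truth + rng.normal(scale=0.3, size=200)

score, weights = evaluate_device(device, ratings)
print("weights:", np.round(weights, 3))
print("device agreement:", round(float(score), 3))
```

In this simulation the noisy fourth reference should end up with the smallest weight, which mirrors the abstract's description of down-weighting references that deviate from the majority. Any other agreement index with the same call signature could be passed through the `agreement` argument.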

Source journal
Biometrics (category: Biology)
CiteScore: 2.70
Self-citation rate: 5.30%
Articles published: 178
Review time: 4-8 weeks
Journal description: The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also sponsors regional and local meetings.