Clinical Trial Design Approach to Auditing Language Models in Health Care Setting.

Impact factor: 3.3 (JCR Q2, Oncology)
JCO Clinical Cancer Informatics Pub Date : 2025-06-01 Epub Date: 2025-06-03 DOI:10.1200/CCI-24-00331
Lovedeep Gondara, Jonathan Simkin, Shebnum Devji
Volume 9, article e2400331. Publication type: Journal Article. https://doi.org/10.1200/CCI-24-00331
Citations: 0

Abstract

Purpose: Rapid advancements in natural language processing have led to the development of sophisticated language models. Following this success, these models are now used in health care for tasks such as clinical documentation and medical record classification. However, language models are prone to errors, which can have serious consequences in critical domains such as health care. Ensuring their reliability is therefore essential to maintaining patient safety and data integrity.

Methods: To address this, we propose an auditing process based on principles from clinical trial design. In our approach, subject matter experts (SMEs) manually review pathology reports without prior knowledge of the model's classification. This single-blind setup minimizes bias and allows us to apply statistical rigor when assessing model performance.
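
The abstract does not specify how reports are sampled or which statistics are reported, so the following is only a minimal sketch of what a single-blind audit step could look like. It assumes simple random sampling and a Wilson score interval for model-SME agreement. The function names (`blinded_sample`, `wilson_interval`, `audit`) and the dictionary-based data layout are hypothetical and are not taken from the paper.

```python
import math
import random


def blinded_sample(report_ids, n, seed=0):
    """Draw a simple random sample of report IDs for SME review.

    Only IDs are returned, so reviewers never see the model's labels
    (assumed sampling scheme; the paper may use a different one).
    """
    rng = random.Random(seed)
    return rng.sample(list(report_ids), n)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def audit(model_labels, sme_labels):
    """Compare SME labels with model labels once the review is unblinded.

    Both arguments map report ID -> label; only IDs reviewed by the SME
    are scored.
    """
    ids = list(sme_labels)
    n = len(ids)
    agree = sum(model_labels[i] == sme_labels[i] for i in ids)
    low, high = wilson_interval(agree, n)
    return {"n": n, "agreement": agree / n, "ci95": (low, high)}
```

In this sketch, blinding is enforced by giving reviewers only the sampled IDs and joining their labels to the model's labels afterwards. The Wilson interval is used here because it behaves better than the normal approximation when agreement is close to 100%, which is the usual situation for a deployed classifier.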

Results: Deployed at the British Columbia Cancer Registry, our audit process effectively identified the core issues in the operational models. Early interventions addressed these issues, maintaining data integrity and patient care standards.

Conclusion: The audit provides real-world performance metrics and underscores the importance of human-in-the-loop machine learning. Even advanced models require SME oversight to ensure accuracy and reliability. To our knowledge, we have developed the first continuous audit process for language models in health care, modeled after clinical trial principles. This methodology ensures that audits are statistically sound and operationally feasible, setting a new standard for evaluating language models in critical applications.

Source journal: CiteScore 6.20; self-citation rate 4.80%; articles published: 190