Evaluation of Vision-Language Models for Detection and Deidentification of Medical Images with Burned-In Protected Health Information.

Radiology (IF 12.1; CAS Tier 1, Medicine; JCR Q1, Radiology, Nuclear Medicine & Medical Imaging)
Publication date: 2025-06-01. DOI: 10.1148/radiol.243664
Authors: Taehee Lee, Hyungjin Kim, Seong Ho Park, Seonhye Chae, Soon Ho Yoon

Abstract

Background: Advances in vision-language models (VLMs) may enable detection and deidentification of burned-in protected health information (PHI) on medical images.

Purpose: To investigate the ability of commercial and open-source VLMs to detect burned-in PHI on medical images, confirm full deidentification, and obscure PHI where present.

Materials and Methods: In this retrospective study, records of deceased patients aged 18 years or older who died during admission at a tertiary hospital between January and June 2021 were randomly selected, and one study per imaging modality was randomly selected. Images were preprocessed to ensure the presence of burned-in PHI and to test four deidentification scenarios: all PHI text visible, PHI text redacted with asterisks, PHI text removed, and all text removed. Real PHI was replaced with fictitious data to protect privacy. Four VLMs (three commercial: ChatGPT-4o [OpenAI], Gemini 1.5 Pro [Google], and Claude-3 Haiku [Anthropic]; one open-source: Llama 3.2 Vision 11B [Meta]) were tested on three tasks: task 1, overall confirmation of deidentification; task 2, detection and specification of any identifiable PHI items; and task 3, detection and specification of five preselected PHI items (name, identification number, date of birth, age, and sex). Text was also extracted from the images with the open-source Tesseract optical character recognition (OCR) software and input into the VLMs for the same tasks. Additionally, the capability of each VLM to mask detected PHI fields was evaluated. Statistical comparisons were conducted using χ2 tests, independent t tests, or generalized estimating equations.

Results: Data from 100 deceased patients (mean age, 71.1 years ± 10.1 [SD]; 57 male) with 709 imaging studies were randomly included. Among 6696 PHI occurrences, ChatGPT-4o achieved deidentification verification accuracy of 95.0% (n = 6362) for task 1, 61.2% (n = 4098) for task 2, and 96.2% (n = 6441) for task 3, outperforming Gemini 1.5 Pro (68.1%, 55.2%, and 86.3% for tasks 1-3, respectively), Claude-3 Haiku (75.8%, 86.9%, and 79.4%), and Llama 3.2 Vision 11B (51.6%, 66.9%, and 74.3%) (P < .001 for all). Direct image analysis by ChatGPT-4o and Gemini 1.5 Pro was more accurate for PHI detection than the OCR software across all three deidentification verification tasks (P < .001 for all). Among 375 PHI occurrences on 100 images, ChatGPT-4o successfully obscured 81.1% (n = 304).

Conclusion: ChatGPT-4o demonstrated substantial potential in detecting, verifying, and obscuring burned-in PHI on medical images.

© RSNA, 2025. Supplemental material is available for this article. See also the editorial by Pinto dos Santos in this issue.
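Tasks 2 and 3 amount to scanning image-derived text for identifiable PHI fields. As a minimal sketch of the kind of rule-based baseline the VLMs are compared against, the code below flags the five preselected PHI items in OCR-extracted header text with regular expressions; the field labels and patterns are hypothetical illustrations, not those used in the study, and real burned-in annotations vary widely by vendor.

```python
import re

# Hypothetical label/value patterns for the five preselected PHI items
# (name, identification number, date of birth, age, sex); illustrative only.
PHI_PATTERNS = {
    "name": re.compile(r"(?:Name|Pt)\s*[:=]\s*([A-Z][a-z]+(?:\s[A-Z][a-z]+)+)"),
    "identification_number": re.compile(r"(?:ID|MRN)\s*[:=]\s*(\d{6,10})"),
    "date_of_birth": re.compile(r"(?:DOB|Birth)\s*[:=]\s*(\d{4}-\d{2}-\d{2})"),
    "age": re.compile(r"Age\s*[:=]\s*(\d{1,3})"),
    "sex": re.compile(r"(?:Sex|Gender)\s*[:=]\s*([MF])"),
}

def detect_phi(ocr_text: str) -> dict[str, list[str]]:
    """Task-2/3-style check: return every PHI item found in OCR text."""
    found = {}
    for item, pattern in PHI_PATTERNS.items():
        matches = pattern.findall(ocr_text)
        if matches:
            found[item] = matches
    return found

def is_deidentified(ocr_text: str) -> bool:
    """Task-1-style overall check: no detectable PHI means deidentified."""
    return not detect_phi(ocr_text)
```

For example, `detect_phi("Name: John Doe  ID: 12345678")` returns the name and identification number, while a header containing only the study description passes `is_deidentified`.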
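The masking step (obscuring detected PHI, as evaluated for each VLM) can be sketched as painting opaque boxes over the text regions a detector returns. This is a minimal sketch using Pillow with hypothetical bounding boxes; the study itself evaluated each model's own masking output rather than this procedure.

```python
from PIL import Image, ImageDraw

def mask_phi_regions(image: Image.Image,
                     boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Return a copy of the image with each (left, top, right, bottom)
    PHI bounding box painted over with a solid black rectangle."""
    masked = image.copy()  # leave the source image untouched
    draw = ImageDraw.Draw(masked)
    for box in boxes:
        draw.rectangle(box, fill="black")
    return masked
```

Working on a copy keeps the original pixels available, so the masked output can be compared against the source when verifying that all flagged regions were obscured.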
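The between-model comparisons rely in part on χ2 tests of accuracy counts. As a worked example under assumed counts (ChatGPT-4o's reported 6362 of 6696 for task 1, and a Gemini 1.5 Pro count of 4560 of 6696 reconstructed from its reported 68.1%; the paper's exact contingency tables are not given in the abstract), a 2 × 2 Pearson χ2 statistic can be computed directly:

```python
def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], e.g. correct/incorrect counts for two models."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Task 1: ChatGPT-4o 6362 correct / 334 incorrect vs. a reconstructed
# Gemini 1.5 Pro 4560 correct / 2136 incorrect (68.1% of 6696).
chi2 = chi2_2x2(6362, 334, 4560, 2136)
# The statistic lands far above 10.83, the critical value for
# P = .001 with 1 degree of freedom, consistent with P < .001.
```

With one degree of freedom, any statistic above 10.83 corresponds to P < .001, matching the significance level reported for all pairwise comparisons.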

Source journal: Radiology (Medicine - Nuclear Medicine)
CiteScore: 35.20. Self-citation rate: 3.00%. Articles per year: 596. Review time: 3.6 months.
About the journal: Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant, and highest-quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies. Radiology publishes cutting-edge and impactful imaging research articles in radiology and medical imaging to help improve human health.