EVALUATING ARTIFICIAL INTELLIGENCE DATA EXTRACTION FROM PROSTATE MRI REPORTS: A COMPARATIVE STUDY WITH TRADITIONAL METHODS

Journal impact factor: 2.4; Zone 3 (Medicine); JCR Q3 (Oncology)
Eugene Lee, Ruben Blachman-Braun, Charles Hesswani, William Azar, Braden Millan, Mitchell Hwang, Dylan Junkin, Christopher Koller, Sahil Parikh, Kyle Schuppe, Daniel Nethala, Neil Mendhiratta, Alexander Kenigsberg, Baris Turkbey, Maria Merino, George Zaki, Janelle Cortner, Sandeep Gurram, Peter Pinto
Journal: Urologic Oncology: Seminars and Original Investigations, volume 43 (3), page 38
Publication date: 2025-03-01
Publication type: Journal Article
DOI: 10.1016/j.urolonc.2024.12.095
URL: https://www.sciencedirect.com/science/article/pii/S1078143924008755

Abstract

Introduction

Integrating large language models (LLMs) into healthcare is set to transform medical research. Most clinical research relies on data manually extracted by data managers, a laborious and time-consuming process. To streamline such tasks, the National Institutes of Health Integrated Data Analysis Platform (NIDAP) Text Extraction Program (NTEP) was developed. This artificial intelligence data aggregation platform, powered by LLMs, can output a collection of data within seconds after a prompt engineering process by the clinician. In this study, we aim to compare the accuracy of data extracted by NTEP with data that was manually extracted by NIH data managers in patients with prostate cancer enrolled in our institution's prospective trial.

Methods

We conducted a comparative analysis of datasets extracted by data managers and by NTEP. Both were tasked with extracting four MRI-related variables for patients enrolled in the prostate cancer natural history trial (NCT02594202): prostate volume, PSA density, number of lesions, and PI-RADS score. Urologists built custom GPT-4 prompts to extract the data directly from electronic medical record (EMR) documents. Both datasets then underwent minor processing and formatting to allow comparison between extraction methods. Prostate volumes were rounded to the appropriate absolute value, PSA density was rounded to three decimal places, and only the highest PI-RADS lesion reported by data managers was evaluated. Statistical analysis was performed with SPSS 29.0: Spearman's rho was used to evaluate the correlation between paired observations for continuous variables, and Cohen's kappa was used to quantify the level of agreement for categorical variables.
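
The two agreement statistics named above can be sketched in a few lines. The following is a minimal, standard-library Python illustration and not the authors' SPSS workflow; the function names (`cohen_kappa`, `average_ranks`, `spearman_rho`) and the toy values are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = sum((p - mx) ** 2 for p in rx) ** 0.5
    sy = sum((q - my) ** 2 for q in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy values for illustration only; NOT data from the study.
manual_pirads = [3, 4, 5, 4, 2]
llm_pirads = [3, 4, 5, 3, 2]
manual_volume = [45, 60, 38, 72, 51]
llm_volume = [45, 61, 38, 72, 50]

kappa = cohen_kappa(manual_pirads, llm_pirads)
rho = spearman_rho(manual_volume, llm_volume)
print(f"kappa={kappa:.3f}, rho={rho:.3f}")
```

Kappa corrects raw agreement for agreement expected by chance, which is why it suits categorical variables such as PI-RADS score, while a rank correlation suits continuous variables such as prostate volume.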

Results

A total of 1728 MRIs from 1289 patients were evaluated. Comparing the datasets extracted by NIDAP and by the data managers, we found that values agreed in 1598 instances (92.5%) for prostate volume, 1705 (98.7%) for PSA density, 1221 (70.7%) for number of lesions, and 1577 (91.3%) for PI-RADS score. In reports with paired observations, the NIDAP and data manager results appeared highly concordant; however, the disagreement between the two ranged from 0.5% to 6.8%. There were also cases where data were missing from a dataset entirely; notably, for the number of lesions on MRI, the data managers did not report data in 488 (28.2%) instances. (Table 1)
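
As a quick arithmetic check, each reported percentage follows from dividing the stated count by the 1728 MRIs evaluated. The sketch below uses only the counts given above; the variable and function names are illustrative.

```python
TOTAL_MRIS = 1728  # MRIs evaluated, as reported in the Results

# Agreement counts per variable, as reported in the Results.
reported = {
    "prostate volume": 1598,
    "PSA density": 1705,
    "number of lesions": 1221,
    "PI-RADS score": 1577,
}


def percent(count, total):
    """Share of total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


agreement = {name: percent(n, TOTAL_MRIS) for name, n in reported.items()}

# Lesion counts not reported by the data managers (488 instances).
missing_lesion_counts = percent(488, TOTAL_MRIS)

for name, pct in agreement.items():
    print(f"{name}: {pct}%")
print(f"lesion count missing from manual dataset: {missing_lesion_counts}%")
```

For number of lesions, the 1221 agreements and the 488 instances missing from the manual dataset sum to 1709, so 19 MRIs fall into neither reported group; the abstract does not break these down further.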

Conclusions

NTEP is a useful tool to facilitate data extraction from EMRs. Although concordance was high when data were reported by both NIDAP and the data managers, NIDAP was able to extract more information, leading to fewer missing variables. Future research should involve larger cohorts to validate the platform's scalability and efficiency compared with traditional manual extraction methods, and the quality of data extracted by NTEP should be further assessed. We anticipate that the integration of LLMs will significantly enhance and transform the data extraction process.

Source journal
CiteScore: 4.80
Self-citation rate: 3.70%
Articles published: 297
Review time: 7.6 weeks
About the journal: Urologic Oncology: Seminars and Original Investigations is the official journal of the Society of Urologic Oncology. The journal publishes practical, timely, and relevant clinical and basic science research articles that address any aspect of urologic oncology. Each issue comprises original research, news and topics, survey articles providing short commentaries on other important articles in the urologic oncology literature, and reviews, including an in-depth Seminar examining a specific clinical dilemma. The journal periodically publishes supplement issues devoted to areas of current interest to the urologic oncology community. Articles published are of interest to researchers and clinicians involved in the practice of urologic oncology, including urologists, oncologists, and radiologists.