Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models

IF 3.8, Zone 2 (Medicine), JCR Q1 (Surgery)

Journal of the American College of Surgeons. Pub Date: 2025-07-10. DOI: 10.1097/XCS.0000000000001478

Ankur P Choubey, Emanuel Eguia, Alexander Hollingsworth, Subrata Chatterjee, Michael I D'Angelica, William R Jarnagin, Alice C Wei, Mark A Schattner, Richard K G Do, Kevin C Soares


Citations: 0

Abstract


Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models.

Introduction: Manual curation of radiographic features in pancreatic cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits widespread implementation. We examined the feasibility and accuracy of using large language models (LLMs) to extract clinical variables from radiology reports.

Methods: A single-center retrospective study included patients under surveillance for pancreatic cysts. Nine radiographic elements used to monitor cyst progression were included: cyst size, main pancreatic duct (MPD) size (continuous variable), number of lesions, MPD dilation ≥5 mm (categorical), branch duct dilation, presence of solid component, calcific lesion, pancreatic atrophy, and pancreatitis. LLMs (GPT) on the OpenAI GPT-4 platform were used to extract the elements of interest with a zero-shot approach, in which prompting alone drives annotation without any training data. A manually annotated institutional cyst database was used as the ground truth (GT) for comparison.
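The zero-shot setup described in the Methods can be sketched as a prompt that carries only instructions and an output schema, with no labeled examples. This is a minimal illustration, not the study's prompt: the field names, the wording of the instructions, and the helpers `build_prompt` and `parse_response` are all assumptions, and the call to the model is deliberately left out because the abstract does not describe it.

```python
import json

# Hypothetical field names for the nine radiographic elements named in the
# Methods. The study's real prompt and output schema are not in the abstract.
FIELDS = {
    "cyst_size_mm": "number or null",
    "mpd_size_mm": "number or null",
    "number_of_lesions": "integer or null",
    "mpd_dilation_ge_5mm": "true, false, or null",
    "branch_duct_dilation": "true, false, or null",
    "solid_component": "true, false, or null",
    "calcific_lesion": "true, false, or null",
    "pancreatic_atrophy": "true, false, or null",
    "pancreatitis": "true, false, or null",
}


def build_prompt(report_text: str) -> str:
    """Build a zero-shot prompt: instructions and schema only, no examples."""
    schema = "\n".join(f'- "{name}": {kind}' for name, kind in FIELDS.items())
    return (
        "Extract the following elements from the radiology report below.\n"
        "Reply with a single JSON object containing exactly these keys:\n"
        f"{schema}\n"
        "Use null when the report does not mention an element.\n\n"
        f"Report:\n{report_text}"
    )


def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply and check that every field is present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the expected keys, in schema order.
    return {name: data[name] for name in FIELDS}
```

Validating the reply before it is stored is one way to keep malformed or incomplete model output out of a longitudinal database; whether the authors did something similar is not stated.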

Results: Overall, 3198 longitudinal scans from 991 patients were included. GPT successfully extracted the selected radiographic elements with high accuracy. Among categorical variables, accuracy ranged from 97% for solid component to 99% for calcific lesions. Among continuous variables, accuracy ranged from 92% for cyst size to 97% for MPD size. However, Cohen's kappa was higher for cyst size (0.92) than for MPD size (0.82). The lowest accuracy (81%) was noted for the multi-class variable, number of cysts.
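The Results report both accuracy and Cohen's kappa, and the two need not move together: kappa discounts the agreement expected by chance, so a variable dominated by one value can show high accuracy with a lower kappa. The sketch below computes both from paired labels. It assumes exact-match agreement between ground truth and extracted values, which the abstract does not specify, and the sample data are invented for illustration rather than taken from the study.

```python
from collections import Counter


def accuracy(truth, pred):
    """Fraction of items where the extracted label equals the ground truth."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohens_kappa(truth, pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two sets of labels."""
    n = len(truth)
    p_o = accuracy(truth, pred)  # observed agreement
    t_counts, p_counts = Counter(truth), Counter(pred)
    # Chance agreement: product of the two marginal frequencies per label.
    p_e = sum(t_counts[c] * p_counts[c] for c in t_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sides used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented, imbalanced example: 95 negatives, 5 positives, 3 positives missed.
truth = [0] * 95 + [1] * 5
pred = [0] * 95 + [1] * 2 + [0] * 3

print(f"accuracy = {accuracy(truth, pred):.2f}")
print(f"kappa    = {cohens_kappa(truth, pred):.2f}")
```

In this toy case accuracy is 0.97 while kappa is about 0.56, because most of the agreement comes from the majority class.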

Conclusion: LLMs can accurately extract and curate data from radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future application of this work may potentiate the development of artificial intelligence-based surveillance models.


Source journal

Journal of the American College of Surgeons (Medicine: Surgery)

CiteScore: 6.90
Self-citation rate: 5.80%
Articles published: 1515
Review time: 3-6 weeks

About the journal: The Journal of the American College of Surgeons (JACS) is a monthly journal publishing peer-reviewed original contributions on all aspects of surgery. These contributions include, but are not limited to, original clinical studies, review articles, and experimental investigations with clear clinical relevance. In general, case reports are not considered for publication. As the official scientific journal of the American College of Surgeons, JACS has the goal of providing its readership the highest quality rapid retrieval of information relevant to surgeons.