Ankur P Choubey, Emanuel Eguia, Alexander Hollingsworth, Subrata Chatterjee, Michael I D'Angelica, William R Jarnagin, Alice C Wei, Mark A Schattner, Richard K G Do, Kevin C Soares
Journal of the American College of Surgeons, published 2025-07-10. DOI: 10.1097/XCS.0000000000001478. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12252180/pdf/
Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models.
Introduction: Manual curation of radiographic features in pancreatic cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits the widespread implementation of such registries. We examined the feasibility and accuracy of using large language models (LLMs) to extract clinical variables from radiology reports.
Methods: This single-center retrospective study included patients under surveillance for pancreatic cysts. Nine radiographic elements used to monitor cyst progression were assessed: two continuous variables (cyst size and main pancreatic duct [MPD] size), one multi-class variable (number of lesions), and six categorical variables (MPD dilation ≥5 mm, branch duct dilation, presence of a solid component, calcific lesion, pancreatic atrophy, and pancreatitis). LLMs (GPT) on the OpenAI GPT-4 platform were used to extract these elements with a zero-shot approach, in which the model is guided by prompting alone and no training data are required. A manually annotated institutional cyst database served as the ground truth (GT) for comparison.
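The zero-shot extraction step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt wording, the JSON key names (such as `cyst_size_mm` and `mpd_dilation_ge_5mm`), and the helpers `build_prompt` and `parse_response` are all hypothetical. The call that sends the prompt to GPT-4 is omitted; a hand-written reply stands in for the model's output. "Zero-shot" here means the prompt carries only instructions and the report, with no labeled examples.

```python
import json

# Hypothetical output schema for the nine radiographic elements; the study's
# actual prompt and field names are not given in the abstract.
CONTINUOUS_FIELDS = ["cyst_size_mm", "mpd_size_mm"]
COUNT_FIELD = "number_of_lesions"
CATEGORICAL_FIELDS = [
    "mpd_dilation_ge_5mm",
    "branch_duct_dilation",
    "solid_component",
    "calcific_lesion",
    "pancreatic_atrophy",
    "pancreatitis",
]


def build_prompt(report_text: str) -> str:
    """Build a zero-shot prompt: instructions plus the report, no labeled examples."""
    keys = CONTINUOUS_FIELDS + [COUNT_FIELD] + CATEGORICAL_FIELDS
    return (
        "Extract data from the pancreatic imaging report below.\n"
        "Return only a JSON object with these keys: " + ", ".join(keys) + ".\n"
        "Give sizes in millimeters, number_of_lesions as an integer, "
        "true/false for the remaining keys, and null if a value is not reported.\n\n"
        "Report:\n" + report_text
    )


def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply and check each value's type."""
    data = json.loads(raw)
    record = {}
    for key in CONTINUOUS_FIELDS:
        value = data.get(key)
        record[key] = None if value is None else float(value)
    count = data.get(COUNT_FIELD)
    record[COUNT_FIELD] = None if count is None else int(count)
    for key in CATEGORICAL_FIELDS:
        value = data.get(key)
        if value is not None and not isinstance(value, bool):
            raise ValueError(f"{key} must be true, false, or null, got {value!r}")
        record[key] = value
    return record


# A hand-written reply stands in for GPT-4's output (the API call is omitted).
prompt = build_prompt("3.1 cm cyst in the pancreatic head. The MPD measures 4 mm.")
example_reply = json.dumps({
    "cyst_size_mm": 31, "mpd_size_mm": 4, "number_of_lesions": 1,
    "mpd_dilation_ge_5mm": False, "branch_duct_dilation": False,
    "solid_component": False, "calcific_lesion": None,
    "pancreatic_atrophy": False, "pancreatitis": False,
})
record = parse_response(example_reply)
```

Checking types before a record enters the database is one design choice among several. It makes a malformed reply fail loudly instead of silently adding a bad value to a longitudinal registry.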
Results: Overall, 3198 longitudinal scans from 991 patients were included. GPT extracted the selected radiographic elements with high accuracy. Among the categorical variables, accuracy ranged from 97% for the presence of a solid component to 99% for calcific lesions. Among the continuous variables, accuracy ranged from 92% for cyst size to 97% for MPD size. However, Cohen's kappa was higher for cyst size (0.92) than for MPD size (0.82). The lowest accuracy (81%) was observed for the multi-class variable, number of cysts.
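The two agreement statistics reported above can be computed as follows. This Python sketch assumes that each extracted value and its ground-truth counterpart are compared as discrete labels; the abstract does not state how agreement was scored for the continuous variables (for example, after rounding or binning). Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The example labels are synthetic, not study data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def accuracy(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of scans where the extracted value equals the ground-truth value."""
    if len(pred) != len(truth) or len(truth) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def cohens_kappa(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the agreement expected by
    chance, from how often each rater (model vs. ground truth) uses each label.
    """
    n = len(truth)
    p_o = accuracy(pred, truth)
    pred_counts, truth_counts = Counter(pred), Counter(truth)
    p_e = sum(pred_counts[label] * truth_counts[label] for label in truth_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Synthetic, imbalanced binary labels (not study data): 90 "no", 10 "yes".
truth = ["no"] * 90 + ["yes"] * 10
pred = ["no"] * 88 + ["yes"] * 2 + ["yes"] * 6 + ["no"] * 4
acc = accuracy(pred, truth)        # 0.94
kappa = cohens_kappa(pred, truth)  # about 0.63
```

With these labels, accuracy is 0.94 but kappa is only about 0.63, because chance agreement (p_e = 0.836) is already high when one label dominates. This general property of kappa may help explain how accuracy and kappa can rank variables differently, as with cyst size and MPD size above, but the abstract does not give the authors' explanation.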
Conclusion: LLMs can accurately extract and curate data from radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future applications of this work may facilitate the development of artificial intelligence-based surveillance models.
About the journal:
The Journal of the American College of Surgeons (JACS) is a monthly journal publishing peer-reviewed original contributions on all aspects of surgery. These contributions include, but are not limited to, original clinical studies, review articles, and experimental investigations with clear clinical relevance. In general, case reports are not considered for publication. As the official scientific journal of the American College of Surgeons, JACS aims to give its readers rapid access to the highest-quality information relevant to surgeons.