Youmei Chen, Mengshi Dong, Jie Sun, Zhanao Meng, Yiqing Yang, Abudushalamu Muhetaier, Chao Li, Jie Qin
{"title":"利用gpt - 40从自由文本冠状动脉CT血管造影报告中自动提取和分类CAD-RADS特征:诊断研究。","authors":"Youmei Chen, Mengshi Dong, Jie Sun, Zhanao Meng, Yiqing Yang, Abudushalamu Muhetaier, Chao Li, Jie Qin","doi":"10.2196/70967","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.</p><p><strong>Objective: </strong>To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.</p><p><strong>Methods: </strong>This retrospective study analyzed CCTA reports from January 2024 and July 2024. A subset of 25 reports was used for prompt engineering to instruct the large language models (LLMs) in extracting CAD-RADS categories, P categories, and the presence of myocardial bridges and noncalcified plaques. Reports were processed using the GPT-4o API (application programming interface) and custom Python scripts. The ground truth was established by radiologists based on the CAD-RADS 2.0 guidelines. Model performance was assessed using accuracy, sensitivity, specificity, and F1-score. Intrarater reliability was assessed using Cohen κ coefficient.</p><p><strong>Results: </strong>Among 999 patients (median age 66 y, range 58-74; 650 males), CAD-RADS categorization showed accuracy of 0.98-1.00 (95% CI 0.9730-1.0000), sensitivity of 0.95-1.00 (95% CI 0.9191-1.0000), specificity of 0.98-1.00 (95% CI 0.9669-1.0000), and F1-score of 0.96-1.00 (95% CI 0.9253-1.0000). P categories demonstrated accuracy of 0.97-1.00 (95% CI 0.9569-0.9990), sensitivity from 0.90 to 1.00 (95% CI 0.8085-1.0000), specificity from 0.97 to 1.00 (95% CI 0.9533-1.0000), and F1-score from 0.91 to 0.99 (95% CI 0.8377-0.9967). Myocardial bridge detection achieved an accuracy of 0.98 (95% CI 0.9680-0.9870), and noncalcified coronary plaques detection showed an accuracy of 0.98 (95% CI 0.9680-0.9870). Cohen κ values for all classifications exceeded 0.98.</p><p><strong>Conclusions: </strong>The GPT-4o model efficiently and accurately converts CCTA free-text reports into structured data, excelling in CAD-RADS classification, plaque burden assessment, and detection of myocardial bridges and calcified plaques.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e70967"},"PeriodicalIF":3.8000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12422720/pdf/","citationCount":"0","resultStr":"{\"title\":\"Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.\",\"authors\":\"Youmei Chen, Mengshi Dong, Jie Sun, Zhanao Meng, Yiqing Yang, Abudushalamu Muhetaier, Chao Li, Jie Qin\",\"doi\":\"10.2196/70967\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.</p><p><strong>Objective: </strong>To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.</p><p><strong>Methods: </strong>This retrospective study analyzed CCTA reports from January 2024 and July 2024. A subset of 25 reports was used for prompt engineering to instruct the large language models (LLMs) in extracting CAD-RADS categories, P categories, and the presence of myocardial bridges and noncalcified plaques. Reports were processed using the GPT-4o API (application programming interface) and custom Python scripts. The ground truth was established by radiologists based on the CAD-RADS 2.0 guidelines. Model performance was assessed using accuracy, sensitivity, specificity, and F1-score. Intrarater reliability was assessed using Cohen κ coefficient.</p><p><strong>Results: </strong>Among 999 patients (median age 66 y, range 58-74; 650 males), CAD-RADS categorization showed accuracy of 0.98-1.00 (95% CI 0.9730-1.0000), sensitivity of 0.95-1.00 (95% CI 0.9191-1.0000), specificity of 0.98-1.00 (95% CI 0.9669-1.0000), and F1-score of 0.96-1.00 (95% CI 0.9253-1.0000). P categories demonstrated accuracy of 0.97-1.00 (95% CI 0.9569-0.9990), sensitivity from 0.90 to 1.00 (95% CI 0.8085-1.0000), specificity from 0.97 to 1.00 (95% CI 0.9533-1.0000), and F1-score from 0.91 to 0.99 (95% CI 0.8377-0.9967). Myocardial bridge detection achieved an accuracy of 0.98 (95% CI 0.9680-0.9870), and noncalcified coronary plaques detection showed an accuracy of 0.98 (95% CI 0.9680-0.9870). Cohen κ values for all classifications exceeded 0.98.</p><p><strong>Conclusions: </strong>The GPT-4o model efficiently and accurately converts CCTA free-text reports into structured data, excelling in CAD-RADS classification, plaque burden assessment, and detection of myocardial bridges and calcified plaques.</p>\",\"PeriodicalId\":56334,\"journal\":{\"name\":\"JMIR Medical Informatics\",\"volume\":\"13 \",\"pages\":\"e70967\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12422720/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/70967\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/70967","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0
摘要
背景:尽管冠状动脉报告和数据系统(CAD-RADS)提供了一种标准化的方法,放射科医生仍然倾向于自由文本报告。这种偏好为纵向研究中的数据提取和分析带来了重大挑战,潜在地限制了大规模研究和质量评估计划。目的:评估生成式预训练变压器(GPT)- 40模型将真实冠状动脉计算机断层血管造影(CCTA)自由文本报告转换为结构化数据并自动识别CAD-RADS类别和P类别的能力。方法:回顾性分析2024年1月和2024年7月的CCTA报告。25份报告的子集被用于快速工程,以指导大型语言模型(LLMs)提取CAD-RADS类别,P类别以及心肌桥和非钙化斑块的存在。使用gpt - 40 API(应用程序编程接口)和自定义Python脚本处理报告。放射科医生根据CAD-RADS 2.0指南确定了基本事实。使用准确性、敏感性、特异性和f1评分来评估模型的性能。采用Cohen κ系数评估后信度。结果:在999例患者中(中位年龄66岁,范围58 ~ 74岁,男性650例),CAD-RADS分类准确率为0.98 ~ 1.00 (95% CI 0.973 ~ 1.0000),灵敏度为0.95 ~ 1.00 (95% CI 0.9191 ~ 1.0000),特异性为0.98 ~ 1.00 (95% CI 0.9669 ~ 1.0000), f1评分为0.96 ~ 1.00 (95% CI 0.9253 ~ 1.0000)。P分类的准确度为0.97-1.00 (95% CI 0.9569-0.9990),灵敏度为0.90 -1.00 (95% CI 0.8085-1.0000),特异性为0.97-1.00 (95% CI 0.9533-1.0000), f1评分为0.91 -0.99 (95% CI 0.8377-0.9967)。心肌桥检测准确率为0.98 (95% CI 0.9680 ~ 0.9870),非钙化冠状动脉斑块检测准确率为0.98 (95% CI 0.9680 ~ 0.9870)。所有分类的Cohen κ值均大于0.98。结论:gpt - 40模型高效准确地将CCTA自由文本报告转换为结构化数据,在CAD-RADS分类、斑块负担评估、心肌桥和钙化斑块检测方面表现出色。
Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.
Background: Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.
Objective: To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.
Methods: This retrospective study analyzed CCTA reports from January 2024 and July 2024. A subset of 25 reports was used for prompt engineering to instruct the large language models (LLMs) in extracting CAD-RADS categories, P categories, and the presence of myocardial bridges and noncalcified plaques. Reports were processed using the GPT-4o API (application programming interface) and custom Python scripts. The ground truth was established by radiologists based on the CAD-RADS 2.0 guidelines. Model performance was assessed using accuracy, sensitivity, specificity, and F1-score. Intrarater reliability was assessed using Cohen κ coefficient.
Results: Among 999 patients (median age 66 y, range 58-74; 650 males), CAD-RADS categorization showed accuracy of 0.98-1.00 (95% CI 0.9730-1.0000), sensitivity of 0.95-1.00 (95% CI 0.9191-1.0000), specificity of 0.98-1.00 (95% CI 0.9669-1.0000), and F1-score of 0.96-1.00 (95% CI 0.9253-1.0000). P categories demonstrated accuracy of 0.97-1.00 (95% CI 0.9569-0.9990), sensitivity from 0.90 to 1.00 (95% CI 0.8085-1.0000), specificity from 0.97 to 1.00 (95% CI 0.9533-1.0000), and F1-score from 0.91 to 0.99 (95% CI 0.8377-0.9967). Myocardial bridge detection achieved an accuracy of 0.98 (95% CI 0.9680-0.9870), and noncalcified coronary plaques detection showed an accuracy of 0.98 (95% CI 0.9680-0.9870). Cohen κ values for all classifications exceeded 0.98.
Conclusions: The GPT-4o model efficiently and accurately converts CCTA free-text reports into structured data, excelling in CAD-RADS classification, plaque burden assessment, and detection of myocardial bridges and calcified plaques.
期刊介绍:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.