Cynthia L. Monroe, Yasser G. Abdelhafez, Kwame Atsina, Edris Aman, Lorenzo Nardo, Mohammad H. Madani
{"title":"人工智能大型语言模型 ChatGPT 对心脏成像问题回答的评估","authors":"Cynthia L. Monroe , Yasser G. Abdelhafez , Kwame Atsina , Edris Aman , Lorenzo Nardo , Mohammad H. Madani","doi":"10.1016/j.clinimag.2024.110193","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.</p></div><div><h3>Methods</h3><p>30 questions were posed to ChatGPT-3.5 and ChatGPT-4 three times in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading categories by three observers—two board certified cardiologists and one board certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization was based on majority vote between at least two of the three observers.</p></div><div><h3>Results</h3><p>ChatGPT-3.5 answered seventeen of twenty eight questions correctly (61 %) by majority vote. Twenty one of twenty eight questions were answered correctly (75 %) by ChatGPT-4 by majority vote. Majority vote for correctness was not achieved for two questions. Twenty six of thirty questions were answered consistently by ChatGPT-3.5 (87 %). Twenty nine of thirty questions were answered consistently by ChatGPT-4 (97 %). ChatGPT-3.5 had both consistent and correct responses to seventeen of twenty eight questions (61 %). ChatGPT-4 had both consistent and correct responses to twenty of twenty eight questions (71 %).</p></div><div><h3>Conclusion</h3><p>ChatGPT-4 had overall better performance than ChatGTP-3.5 when answering cardiac imaging questions with regard to correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answers over half of cardiac imaging questions correctly, inaccurate, clinically misleading and inconsistent responses suggest the need for further refinement before its application for educating patients about cardiac imaging.</p></div>","PeriodicalId":50680,"journal":{"name":"Clinical Imaging","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0899707124001232/pdfft?md5=1511764978ad04d7187878b982fe9d58&pid=1-s2.0-S0899707124001232-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT\",\"authors\":\"Cynthia L. Monroe , Yasser G. Abdelhafez , Kwame Atsina , Edris Aman , Lorenzo Nardo , Mohammad H. Madani\",\"doi\":\"10.1016/j.clinimag.2024.110193\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><p>To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.</p></div><div><h3>Methods</h3><p>30 questions were posed to ChatGPT-3.5 and ChatGPT-4 three times in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading categories by three observers—two board certified cardiologists and one board certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. 
Final categorization was based on majority vote between at least two of the three observers.</p></div><div><h3>Results</h3><p>ChatGPT-3.5 answered seventeen of twenty eight questions correctly (61 %) by majority vote. Twenty one of twenty eight questions were answered correctly (75 %) by ChatGPT-4 by majority vote. Majority vote for correctness was not achieved for two questions. Twenty six of thirty questions were answered consistently by ChatGPT-3.5 (87 %). Twenty nine of thirty questions were answered consistently by ChatGPT-4 (97 %). ChatGPT-3.5 had both consistent and correct responses to seventeen of twenty eight questions (61 %). ChatGPT-4 had both consistent and correct responses to twenty of twenty eight questions (71 %).</p></div><div><h3>Conclusion</h3><p>ChatGPT-4 had overall better performance than ChatGTP-3.5 when answering cardiac imaging questions with regard to correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answers over half of cardiac imaging questions correctly, inaccurate, clinically misleading and inconsistent responses suggest the need for further refinement before its application for educating patients about cardiac imaging.</p></div>\",\"PeriodicalId\":50680,\"journal\":{\"name\":\"Clinical Imaging\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-05-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0899707124001232/pdfft?md5=1511764978ad04d7187878b982fe9d58&pid=1-s2.0-S0899707124001232-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Imaging\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0899707124001232\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Imaging","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0899707124001232","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT
Purpose
To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.
Methods
Thirty questions were posed to ChatGPT-3.5 and ChatGPT-4 three times each, in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization was based on agreement between at least two of the three observers.
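To make the scoring protocol concrete, here is a minimal Python sketch of the majority-vote categorization and the consistency check. The category labels, function names, and data layout are illustrative assumptions, not the authors' actual analysis code.

```python
from collections import Counter

# Categories used by the observers, per the Methods description.
CATEGORIES = {"correct", "incorrect", "clinically misleading"}

def majority_category(ratings):
    """Return the category assigned by at least two of the three observers,
    or None when all three observers disagree (no majority reached)."""
    assert len(ratings) == 3 and all(r in CATEGORIES for r in ratings)
    category, votes = Counter(ratings).most_common(1)[0]
    return category if votes >= 2 else None

def is_consistent(session_answers):
    """A question is answered consistently when the model gives the same
    categorized answer in all three separate chat sessions."""
    return len(set(session_answers)) == 1
```

Under this sketch, `majority_category(["correct", "correct", "incorrect"])` returns `"correct"`, while three mutually different ratings return `None`, which is how a question can end up unscorable for correctness.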
Results
ChatGPT-3.5 answered seventeen of twenty-eight questions correctly (61 %) by majority vote, and ChatGPT-4 answered twenty-one of twenty-eight questions correctly (75 %). A majority vote on correctness was not reached for two questions, so twenty-eight of the thirty questions were scorable for correctness. Twenty-six of thirty questions were answered consistently by ChatGPT-3.5 (87 %), and twenty-nine of thirty by ChatGPT-4 (97 %). ChatGPT-3.5 gave both consistent and correct responses to seventeen of twenty-eight questions (61 %); ChatGPT-4 did so for twenty of twenty-eight (71 %).
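A quick check of the arithmetic (a reader's reconstruction, not the authors' code) clarifies the denominators: correctness percentages are computed over the twenty-eight questions that reached a majority vote, while consistency percentages use all thirty questions.

```python
# Reported rates from the Results section. Correctness uses 28 as the
# denominator (two of the 30 questions never reached a majority vote);
# consistency uses all 30 questions.
rates = {
    "ChatGPT-3.5 correct":              17 / 28,  # ~0.607 -> 61 %
    "ChatGPT-4 correct":                21 / 28,  #  0.750 -> 75 %
    "ChatGPT-3.5 consistent":           26 / 30,  # ~0.867 -> 87 %
    "ChatGPT-4 consistent":             29 / 30,  # ~0.967 -> 97 %
    "ChatGPT-3.5 consistent + correct": 17 / 28,  # ~0.607 -> 61 %
    "ChatGPT-4 consistent + correct":   20 / 28,  # ~0.714 -> 71 %
}
for label, rate in rates.items():
    print(f"{label}: {rate:.0%}")
```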
Conclusion
ChatGPT-4 performed better overall than ChatGPT-3.5 when answering cardiac imaging questions, with regard to both correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses suggest the need for further refinement before these models are used to educate patients about cardiac imaging.
Journal introduction:
The mission of Clinical Imaging is to publish, in a timely manner, the very best radiology research from the United States and around the world, with special attention to the impact of medical imaging on patient care. The journal's publications cover all imaging modalities, radiology issues related to patients, policy and practice improvements, and clinically oriented imaging physics and informatics. The journal is a valuable resource for practicing radiologists, radiologists-in-training, and other clinicians with an interest in imaging. Papers are carefully peer-reviewed and selected by our experienced subject editors, who are leading experts spanning the range of imaging subspecialties, which include:
- Body Imaging
- Breast Imaging
- Cardiothoracic Imaging
- Imaging Physics and Informatics
- Molecular Imaging and Nuclear Medicine
- Musculoskeletal and Emergency Imaging
- Neuroradiology
- Practice, Policy & Education
- Pediatric Imaging
- Vascular and Interventional Radiology