Evaluating ChatGPT-4’s correctness in patient-focused informing and awareness for atrial fibrillation
Ivan Zeljkovic MD, PhD, Matea Novak JD, Ana Jordan MD, Ante Lisicic MD, Tatjana Nemeth-Blažić MD, PhD, Nikola Pavlovic MD, PhD, Šime Manola MD, PhD
Heart Rhythm O2, volume 6, issue 1, pages 58-63. Published 2025-01-01. DOI: 10.1016/j.hroo.2024.10.005
https://www.sciencedirect.com/science/article/pii/S2666501824003301
Abstract
Background
As artificial intelligence and large language models continue to evolve, their application in health care is expanding. OpenAI’s Chat Generative Pre-trained Transformer 4 (ChatGPT-4) represents the latest advancement in this technology, capable of engaging in complex dialogues and providing information.
Objective
This study explores the correctness of ChatGPT-4 in informing patients about atrial fibrillation.
Methods
In this cross-sectional observational study, ChatGPT-4 responded to a structured set of 108 questions across 10 categories related to atrial fibrillation. These categories included basic information, treatment options, lifestyle adjustments, and more, reflecting common patient inquiries. A panel of 3 cardiologists evaluated the model's responses on the basis of accuracy, comprehensiveness, clarity, relevance to clinical practice, and patient safety. The total correctness of ChatGPT-4 was quantitatively assessed through scores assigned in each category, and statistical analysis was performed to identify significant differences in performance across categories.
Results
ChatGPT-4 provided correct and relevant answers, with considerable variability across categories. It excelled in "Lifestyle Adjustments" and "Daily Life and Management," with perfect and near-perfect scores, but scored lower in "Miscellaneous Concerns." Statistical analysis confirmed significant differences in total scores across categories (P = .020).
Conclusion
Our results suggest that while ChatGPT-4 is reliable in categories with structured and direct queries, it shows limitations when handling complex medical queries that require in-depth explanations or clinical judgment. ChatGPT-4 demonstrates promising potential as a tool for patient-focused informing in atrial fibrillation, particularly for straightforward informational content.