Comparative Analysis of AI Tools for Disseminating ADA 2025 Diabetes Care Standards: Implications for Cardiovascular Physicians

Impact Factor 3.0 · CAS Medicine, Tier 2 · JCR Q2, Endocrinology & Metabolism
Tengfei Zheng
Journal of Diabetes, vol. 17, no. 3. Published 2025-03-06. DOI: 10.1111/1753-0407.70072. Citations: 0.

Abstract

Artificial intelligence (AI) models are increasingly used in clinical practice, including medical education and the dissemination of updated clinical guidelines. In this study, we evaluated four AI tools—ChatGPT-4o, ChatGPT-o1, ChatGPT-o3Mini, and DeepSeek—to assess their ability to summarize the Standards of Care in Diabetes—2025 from the American Diabetes Association (ADA) for cardiovascular physicians in primary care settings [1].

Using a standardized prompt, we compared the AI-generated summaries across 10 key metrics: accuracy (alignment with ADA 2025 guidelines), completeness (inclusion of core topics such as glycemic targets, blood pressure management, lipid control, and pharmacologic strategies), clarity (readability and conciseness for cardiovascular physicians), clinical relevance (utility for real-world cardiovascular practice), consistency (logical coherence and uniformity in recommendations), evidence support (reference to supporting studies and ADA standards), ethics (neutral and evidence-based recommendations), timeliness (inclusion of the latest ADA updates), actionability (practical guidance for cardiovascular physicians), and fluency (professional language and structure). Each AI tool was rated on a 0–5 scale for each category, yielding a total possible score of 50 points. All summaries were anonymized to remove identifiers. Each model (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3Mini, and DeepSeek) was then tasked with evaluating all four anonymized summaries, including its own output, using the 10 predefined metrics. For each model, the four scores assigned by the evaluators (including self-evaluation) were averaged to calculate the final score per metric.
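The cross-evaluation and averaging step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual workflow; the function name and data layout are assumptions.

```python
# Sketch of the cross-evaluation averaging described above (illustrative only):
# each of the four models scores all four anonymized summaries on 10 metrics,
# and each summary's final per-metric score is the mean of its ratings.

METRICS = ["accuracy", "completeness", "clarity", "clinical_relevance",
           "consistency", "evidence_support", "ethics", "timeliness",
           "actionability", "fluency"]

def aggregate_scores(ratings):
    """ratings: {evaluator: {summary: {metric: score on a 0-5 scale}}}.
    Returns {summary: {metric: mean score, ..., "total": 0-50 sum}}."""
    summaries = {s for per_eval in ratings.values() for s in per_eval}
    final = {}
    for summary in summaries:
        per_metric = {}
        for metric in METRICS:
            scores = [ratings[e][summary][metric] for e in ratings]
            per_metric[metric] = sum(scores) / len(scores)
        # Total score is the sum of the 10 averaged metrics (max 50).
        per_metric["total"] = sum(per_metric[m] for m in METRICS)
        final[summary] = per_metric
    return final
```

With four evaluators this yields one averaged 0–5 value per metric per summary, and a 0–50 total matching the scoring scheme in the text.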

Our evaluation showed that ChatGPT-o1 performed best (48.3/50), excelling in completeness (5.0), clinical relevance (5.0), and actionability (5.0), with comprehensive coverage of diabetes screening, cardiovascular risk assessment, hypertension/lipid management, and multidisciplinary collaboration (Table 1). However, its evidence support (4.0) required improvement. ChatGPT-4o (45.5/50) demonstrated strengths in clarity (4.8) and structure but had limitations in timeliness (4.5) and evidence support (3.3), as it failed to incorporate 2025 guideline updates and lacked specific research references. The free models, O3Mini (47.3/50) and DeepSeek (47.3/50), performed comparably to paid tools. O3Mini excelled in consistency (5.0) and CKD/heart failure monitoring, while DeepSeek prioritized concise cardiovascular risk management (clarity: 5.0). Both free models, however, scored lower in completeness (O3Mini: 4.8; DeepSeek: 4.5) and evidence support (O3Mini: 4.0; DeepSeek: 3.8), reflecting insufficient integration of 2025 updates and trial data (Table 1).
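Using only the total scores reported above, the ranking can be reproduced in a few lines (values are taken directly from the text; the variable names are illustrative):

```python
# Total scores (out of 50) for each model, as reported in the text / Table 1.
totals = {
    "ChatGPT-o1": 48.3,
    "ChatGPT-o3Mini": 47.3,
    "DeepSeek": 47.3,
    "ChatGPT-4o": 45.5,
}

# Rank models from highest to lowest total; Python's sort is stable,
# so the tied free models keep their insertion order.
ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

This makes the headline result easy to verify: ChatGPT-o1 leads, the two free models tie, and ChatGPT-4o trails despite being a paid tool.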

Among the most critical takeaways for cardiovascular physicians were the importance of individualized glycemic targets, the use of SGLT2 inhibitors and GLP-1 receptor agonists for cardiovascular protection, and the necessity of multidisciplinary collaboration for diabetes management. However, while AI-generated summaries provide a convenient way to access guidelines, the lack of explicit reference to primary sources remains a limitation that requires human oversight.

Given the potential for AI to support clinical decision-making, integrating these models with validated medical sources and interactive decision-support systems could further enhance their utility [2]. Cost considerations are notable: DeepSeek and O3Mini are freely accessible, whereas ChatGPT-4o and ChatGPT-o1 require paid subscriptions. Despite its higher cost, ChatGPT-o1 outperformed all other tools, while ChatGPT-4o lagged behind the free models in total score, raising questions about its cost-effectiveness (Table 1). Future studies should explore the integration of AI-generated summaries with interactive decision-support systems to optimize patient care.

T.Z. conceived and designed the study, performed all analyses, interpreted the results, and wrote the manuscript. The author approved the final version of the manuscript and takes full responsibility for all aspects of the work.

The author has nothing to report.

The author declares no conflicts of interest.

Source Journal

Journal of Diabetes (Endocrinology & Metabolism)
CiteScore: 6.50
Self-citation rate: 2.20%
Articles per year: 94
Review time: >12 weeks

About the journal: Journal of Diabetes (JDB) devotes itself to diabetes research, therapeutics, and education. It aims to involve researchers and practitioners in a dialogue between East and West via all aspects of epidemiology, etiology, pathogenesis, management, complications, and prevention of diabetes, including the molecular, biochemical, and physiological aspects of diabetes. The Editorial team is international, with a unique mix of Asian and Western participation. The Editors welcome submissions in the form of original research articles, images, novel case reports and correspondence, and will solicit reviews, point-counterpoint, commentaries, editorials, news highlights, and educational content.