Assessing online chat-based artificial intelligence models for weight loss recommendation appropriateness and bias in the presence of guideline incongruence.

Impact factor: 4.2; Zone 2 (Medicine); JCR Q1 (Endocrinology & Metabolism)

International Journal of Obesity Pub Date : 2025-05-01 Epub Date: 2025-01-27 DOI:10.1038/s41366-025-01717-5

Eugene Annor, Joseph Atarere, Nneoma Ubah, Oladoyin Jolaoye, Bryce Kunkle, Olachi Egbo, Daniel K Martin

Pages: 896-901. Publication type: Journal Article.

Citation count: 0

Abstract

Background and aim: Managing obesity requires a comprehensive approach that involves therapeutic lifestyle changes, medications, or metabolic surgery. Many patients seek health information from online sources and artificial intelligence models like ChatGPT, Google Gemini, and Microsoft Copilot before consulting health professionals. This study aims to evaluate the appropriateness of the responses of Google Gemini and Microsoft Copilot to questions on pharmacologic and surgical management of obesity and to assess whether their responses are biased toward either the ADA or the AACE guidelines.

Methods: Ten questions were compiled into a set and posed separately to the free editions of Google Gemini and Microsoft Copilot. Recommendations for the questions were extracted from the ADA and the AACE websites, and the responses were graded by reviewers for appropriateness, completeness, and bias toward either guideline.

Results: All responses from Microsoft Copilot and 8/10 (80%) responses from Google Gemini were appropriate. There were no inappropriate responses. Google Gemini declined to answer two questions and instead advised consulting a physician. Microsoft Copilot (10/10; 100%) provided a higher proportion of complete responses than Google Gemini (5/10; 50%). Of the eight responses from Google Gemini, none were biased toward either guideline, while two of the responses from Microsoft Copilot were biased.
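The proportions in the Results paragraph can be re-derived from the raw counts. The sketch below is illustrative only: the dictionary layout, key names, and helper functions are assumptions made for this example, not the authors' data format or analysis code, and the zero "declined" count for Microsoft Copilot is inferred from the statement that all of its responses were appropriate.

```python
from fractions import Fraction

TOTAL_QUESTIONS = 10  # ten questions were posed to each model

# Counts taken from the Results paragraph; the structure and key names are hypothetical.
reported = {
    "Microsoft Copilot": {"appropriate": 10, "complete": 10, "declined": 0, "biased": 2},
    "Google Gemini": {"appropriate": 8, "complete": 5, "declined": 2, "biased": 0},
}


def percent(count, total=TOTAL_QUESTIONS):
    """Return count/total as a whole-number percentage."""
    return int(Fraction(count, total) * 100)


def summarize(counts):
    """Derive the headline figures for one model from its raw counts."""
    answered = TOTAL_QUESTIONS - counts["declined"]
    return {
        "answered": answered,
        "appropriate_pct": percent(counts["appropriate"]),
        "complete_pct": percent(counts["complete"]),
        "biased_of_answered": f'{counts["biased"]}/{answered}',
    }


summary = {model: summarize(counts) for model, counts in reported.items()}

for model, row in summary.items():
    print(model, row)
```

Note that the bias figure for Google Gemini is expressed over the eight questions it answered, whereas the appropriateness and completeness percentages use all ten questions as the denominator, matching how the abstract reports them.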

Conclusion: The study highlights the role of Microsoft Copilot and Google Gemini in weight loss management. The differences in their responses may be attributed to the variation in the quality and scope of their training data and design.

Source journal

International Journal of Obesity (Medicine: Endocrinology & Metabolism)

CiteScore: 10.00

Self-citation rate: 2.00%

Articles published: 221

Review time: 3 months

Journal description: The International Journal of Obesity is a multi-disciplinary forum for research describing basic, clinical and applied studies in biochemistry, physiology, genetics and nutrition, molecular, metabolic, psychological and epidemiological aspects of obesity and related disorders. We publish a range of content types including original research articles, technical reports, reviews, correspondence and brief communications that elaborate on significant advances in the field and cover topical issues.