Training ChatGPT for surgical decisions: Bariatric surgery analysis using algorithms and evidence

IF 2.5 · Zone 4 (Medicine) · JCR Q3 ENDOCRINOLOGY & METABOLISM

Obesity Research & Clinical Practice. Pub Date: 2025-07-01. DOI: 10.1016/j.orcp.2025.08.002

Sergi Sanchez-Cordero , Ruth Lopez-Gonzalez , Helena Fernandez , Jordi Pujol-Gebellí

Volume 19, Issue 4, Pages 352-355

Citations: 0

Abstract

Background

Selecting the most appropriate bariatric surgery (BS) technique is a complex, individualized process. Artificial intelligence (AI) tools like ChatGPT may assist, but their clinical utility is unclear. This study evaluates whether ChatGPT’s recommendations for BS improve after exposure to scientific literature and how they align with real-world clinical decisions.

Methods

A retrospective single-center study included 283 patients who underwent primary BS between 2023 and 2025. No exclusion criteria were applied. Clinical variables (age, sex, BMI, comorbidities, and preoperative data) were collected. ChatGPT was asked to recommend the most suitable BS technique for each patient profile, first without context and then after being exposed to 412 open-access scientific articles. Recommendations were compared with actual clinical decisions using percentage agreement and Cohen’s Kappa.
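The two agreement measures named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the six-patient example are hypothetical, and the labels stand in for whatever technique categories the study actually used. Percentage agreement is the share of patients for whom ChatGPT's recommendation matched the surgery performed; Cohen's kappa corrects that share for the agreement expected by chance from each rater's marginal frequencies, κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of cases in which both raters chose the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same cases."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b) / 100.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over every label either rater used.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label for every case: kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical labels for six patients (not study data).
    performed = ["SG", "RYGB", "OAGB", "SG", "SADI-S", "RYGB"]
    suggested = ["SG", "SG", "RYGB", "SG", "RYGB", "OAGB"]
    agreement = percent_agreement(performed, suggested)
    kappa = cohens_kappa(performed, suggested)
    print(f"{agreement:.1f}% agreement, kappa = {kappa:.3f}")
    # prints "33.3% agreement, kappa = 0.040"
```

In this toy example the raters match on two of six patients, but because both lean towards SG, most of that agreement is expected by chance and kappa stays close to zero. This is the same pattern the Results report, where 20.0 % raw agreement corresponds to a kappa of only 0.003.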

Results

Initially, ChatGPT favored sleeve gastrectomy (SG, 56.8 %), followed by Roux-en-Y gastric bypass (RYGB, 26.8 %) and one-anastomosis gastric bypass (OAGB, 16.4 %); single-anastomosis duodeno-ileal bypass with sleeve gastrectomy (SADI-S) was never suggested. Concordance with clinical practice was 20.0 % (Kappa = 0.003; p = 0.96). After training, SG recommendations decreased (35.7 %), RYGB increased (30.3 %), SADI-S emerged (17.1 %), and dual options appeared in 4 %. Concordance improved modestly to 25.8 % (Kappa = 0.068; p = 0.29), with a significant shift in global distribution (p < 0.00001).
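The abstract does not name the test behind the reported shift in the overall distribution of recommendations (p < 0.00001). One plausible choice, assumed here and not confirmed by the source, is a Pearson chi-square test of homogeneity on the before and after recommendation counts. The sketch below uses hypothetical counts for 100 patients and a closed-form chi-square tail probability that holds only for even degrees of freedom (five categories give four). Because both distributions come from the same patients, this test ignores the pairing; a marginal-homogeneity test such as Stuart-Maxwell would respect it.

```python
import math


def chi_square_homogeneity(counts_before, counts_after):
    """Pearson chi-square test of homogeneity for a 2 x k table of counts.

    Returns (statistic, degrees_of_freedom, p_value). The p-value uses the
    closed-form chi-square survival function, valid only for even degrees
    of freedom.
    """
    if len(counts_before) != len(counts_after):
        raise ValueError("both rows must list the same categories")
    n_before, n_after = sum(counts_before), sum(counts_after)
    if n_before == 0 or n_after == 0:
        raise ValueError("each row needs at least one observation")
    column_totals = [a + b for a, b in zip(counts_before, counts_after)]
    # Skip categories that neither distribution ever used (expected count 0).
    used = [i for i, total in enumerate(column_totals) if total > 0]
    grand_total = n_before + n_after
    statistic = 0.0
    for i in used:
        for observed, row_total in ((counts_before[i], n_before), (counts_after[i], n_after)):
            expected = row_total * column_totals[i] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = len(used) - 1
    if dof < 2 or dof % 2:
        raise ValueError("this sketch only computes p-values for even dof >= 2")
    # For dof = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    half = statistic / 2.0
    p_value = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))
    return statistic, dof, p_value


if __name__ == "__main__":
    # Hypothetical counts for 100 patients (not study data), in the order
    # SG, RYGB, OAGB, SADI-S, dual option.
    before = [57, 27, 16, 0, 0]
    after = [36, 30, 13, 17, 4]
    stat, dof, p = chi_square_homogeneity(before, after)
    print(f"chi2 = {stat:.2f}, df = {dof}, p = {p:.1e}")
```

With counts this sparse (the hypothetical dual-option column has an expected count of 2 per row), the chi-square approximation is rough; in practice a library routine such as `scipy.stats.chi2_contingency`, together with a check of expected counts, would be preferable.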

Conclusions

ChatGPT adapts its recommendations after contextual training, but concordance with clinical judgment remains low. While potentially useful as an educational tool, ChatGPT is not yet reliable for autonomous surgical decision-making.


Source journal

Obesity Research & Clinical Practice (Medicine: Endocrinology & Metabolism)

CiteScore: 7.10

Self-citation rate: 0.00%

Review time: 49 days

Journal description: The aim of Obesity Research & Clinical Practice (ORCP) is to publish high-quality clinical and basic research relating to the epidemiology, mechanisms, complications and treatment of obesity. Studies relating to the Asia Oceania region are particularly welcome, given the increasing burden of obesity in the Asia Pacific, compounded by specific regional population-based and genetic issues, and the devastating personal and economic consequences. The journal aims to expose health care practitioners, clinical researchers, basic scientists, epidemiologists, and public health officials in the region to all areas of obesity research and practice. In addition to original research, ORCP publishes reviews, patient reports, short communications, and letters to the editor (including comments on published papers). The proceedings and abstracts of the Annual Meeting of the Asia Oceania Association for the Study of Obesity are published as a supplement each year.