Evaluating Google Gemini's AI-generated responses to questions after breast reconstruction

Aaron N. Hendizadeh, Nicholas Schmitz, Annie Fritsch, Lauren Martin, Jaime Pardo Palau, Christodoulos Kaoutzanis, Mamtha Raj, David E. Kurlander, Darren Nin, George Kokosis

Journal of Plastic Reconstructive and Aesthetic Surgery, Volume 105, Pages 185-188. Published 2025-04-17. DOI: 10.1016/j.bjps.2025.04.021
Citations: 0
Abstract
Background
The purpose of this study was to evaluate Google Gemini’s responses to common post-operative questions pertaining to breast reconstruction surgery.
Methods
Google Gemini AI was prompted with 14 common post-operative questions related to breast reconstruction surgery. Four experienced breast reconstructive surgeons and four advanced practice providers (APPs) evaluated the responses for accuracy, completeness, relevance, and overall quality on a 4-point Likert scale. Median scores were calculated and utilized as the final score. Responses were further categorized as accurate vs inaccurate, complete vs incomplete, relevant vs irrelevant, and high vs low quality. Readability was evaluated using the Flesch-Kincaid reading scale.
Results
Attending surgeons classified 12/14 responses (86%) as accurate, 12/14 (86%) as complete, 13/14 (93%) as relevant, and 12/14 (86%) as high quality. APPs rated 11/14 responses (79%) as accurate, 12/14 (86%) as complete, 14/14 (100%) as relevant, and 10/14 (71%) as high quality. APPs assigned lower median scores for overall quality than physicians (p=0.003). The mean Flesch-Kincaid readability score was 52.3.
Conclusion
Google Gemini provided relevant and complete responses to common post-operative questions pertaining to breast reconstruction. The “fairly difficult” readability score may pose challenges for certain patient populations. Differences in scores between physicians and APPs for “overall quality” may be due to higher levels of experience among physicians, allowing them to evaluate answers in a broader context. While Google Gemini demonstrates potential as a tool for patient education, patients should be advised to consult with clinicians for the most reliable and personalized medical advice.
Journal description:
JPRAS, An International Journal of Surgical Reconstruction, is one of the world's leading international journals, covering all the reconstructive and aesthetic aspects of plastic surgery.
The journal presents the latest surgical procedures with audit and outcome studies of new and established techniques in plastic surgery, including: cleft lip and palate and other head and neck surgery, hand surgery, lower limb trauma, burns, skin cancer, breast surgery, and aesthetic surgery.