Evaluating Google Gemini's AI-generated responses to questions after breast reconstruction

Aaron N. Hendizadeh, Nicholas Schmitz, Annie Fritsch, Lauren Martin, Jaime Pardo Palau, Christodoulos Kaoutzanis, Mamtha Raj, David E. Kurlander, Darren Nin, George Kokosis

Journal of Plastic Reconstructive and Aesthetic Surgery, Volume 105, Pages 185-188. Published 2025-04-17. DOI: 10.1016/j.bjps.2025.04.021
Citations: 0
Abstract
Background
The purpose of this study was to evaluate Google Gemini’s responses to common post-operative questions pertaining to breast reconstruction surgery.
Methods
Google Gemini AI was prompted with 14 common post-operative questions related to breast reconstruction surgery. Four experienced breast reconstructive surgeons and four advanced practice providers (APPs) evaluated the responses for accuracy, completeness, relevance, and overall quality on a 4-point Likert scale. Median scores were calculated and utilized as the final score. Responses were further categorized as accurate vs inaccurate, complete vs incomplete, relevant vs irrelevant, and high vs low quality. Readability was evaluated using the Flesch-Kincaid reading scale.
Results
Attending surgeons classified 12/14 responses (86%) as accurate, 12/14 (86%) as complete, 13/14 (93%) as relevant, and 12/14 (86%) as high quality. APPs rated 11/14 responses (79%) as accurate, 12/14 (86%) as complete, 14/14 (100%) as relevant, and 10/14 (71%) as high quality. APPs assigned lower median scores for overall quality than physicians (p=0.003). The mean Flesch-Kincaid readability score was 52.3.
Conclusion
Google Gemini provided relevant and complete responses to common post-operative questions pertaining to breast reconstruction. The “fairly difficult” readability score may pose challenges for certain patient populations. Differences in scores between physicians and APPs for “overall quality” may be due to higher levels of experience among physicians, allowing them to evaluate answers in a broader context. While Google Gemini demonstrates potential as a tool for patient education, patients should be advised to consult with clinicians for the most reliable and personalized medical advice.
Journal description:
JPRAS, An International Journal of Surgical Reconstruction, is one of the world's leading international journals, covering all the reconstructive and aesthetic aspects of plastic surgery.
The journal presents the latest surgical procedures with audit and outcome studies of new and established techniques in plastic surgery, including: cleft lip and palate and other head and neck surgery, hand surgery, lower limb trauma, burns, skin cancer, breast surgery, and aesthetic surgery.