Artificial Intelligence for Patient Support: Assessing Retrieval-Augmented Generation for Answering Postoperative Rhinoplasty Questions.

IF 3.0 | Zone 2, Medicine | JCR Q1, Surgery

Aesthetic Surgery Journal, pp. 735-744. Published: 2025-06-16. DOI: 10.1093/asj/sjaf038

Ariana Genovese, Srinivasagam Prabha, Sahar Borna, Cesar A Gomez-Cabello, Syed Ali Haider, Maissa Trabilsy, Cui Tao, Keith T Aziz, Peter M Murray, Antonio Jorge Forte



Abstract

Background: Although artificial intelligence (AI) is revolutionizing healthcare, inaccurate or incomplete information from pretrained large language models (LLMs) like ChatGPT poses significant risks to patient safety. Retrieval-augmented generation (RAG) offers a promising solution by leveraging curated knowledge bases to enhance accuracy and reliability, especially in high-demand specialties like plastic surgery.

Objectives: This study evaluates the performance of RAG-enabled AI models in addressing postoperative rhinoplasty questions, aiming to assess their safety and identify necessary improvements for effective implementation into clinical care.

Methods: Four RAG models (Gemini-1.0-Pro-002, Gemini-1.5-Flash-001, Gemini-1.5-Pro-001, and PaLM 2) were tested on 30 common patient inquiries. Responses, sourced from authoritative rhinoplasty texts, were evaluated for accuracy (1-5 scale), comprehensiveness (1-3 scale), readability (Flesch Reading Ease [FRE], Flesch-Kincaid Grade Level), and understandability/actionability (Patient Education Materials Assessment Tool). Statistical analyses included Wilcoxon rank sum, Armitage trend tests, and pairwise comparisons.
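The two readability measures named in the Methods are standard published formulas based on average sentence length and average syllables per word. The Python sketch below shows how they are computed. The syllable counter is a rough vowel-group heuristic assumed here for illustration, and the function names are hypothetical; the abstract does not say which tool the authors used, so values from this sketch should not be expected to match theirs.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption): count vowel groups, drop one for a trailing 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a non-empty text, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        flesch_reading_ease(len(words), len(sentences), syllables),
        flesch_kincaid_grade(len(words), len(sentences), syllables),
    )


if __name__ == "__main__":
    # Hypothetical model response, not taken from the study.
    sample = "Keep your head elevated while you sleep. Mild swelling is expected for several weeks."
    fre, fkgl = readability(sample)
    print(f"FRE={fre:.1f}, FKGL={fkgl:.1f}")
```

On the Flesch scale, scores between 30 and 50 are conventionally labelled difficult, which is the band containing the FRE range of 40-49 reported in the Results.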

Results: When responses were generated, they were generally accurate (41.7% completely accurate); however, a 30.8% nonresponse rate revealed potential challenges with query context interpretation and retrieval. Gemini-1.0-Pro-002 demonstrated superior comprehensiveness (P < .001), but readability (FRE: 40-49) and understandability (mean: 0.7) fell below patient education standards. PaLM 2 scored lowest in actionability (P < .007).

Conclusions: This first application of RAG to postoperative rhinoplasty patient care highlights its strengths in accuracy alongside its limitations, including nonresponse and contextual understanding. Addressing these challenges will enable safer, more effective implementation of RAG models across diverse surgical and medical contexts, with the potential to revolutionize patient care by reducing physician workload while enhancing patient engagement.




Source journal

Aesthetic Surgery Journal (Surgery)

CiteScore: 6.20
Self-citation rate: 20.70%
Articles published: 309
Review time: 6-12 weeks

About the journal: Aesthetic Surgery Journal is a peer-reviewed international journal focusing on scientific developments and clinical techniques in aesthetic surgery. The official publication of The Aesthetic Society, ASJ is also the official English-language journal of many major international societies of plastic, aesthetic and reconstructive surgery representing South America, Central America, Europe, Asia, and the Middle East. It is also the official journal of the British Association of Aesthetic Plastic Surgeons, the Canadian Society for Aesthetic Plastic Surgery and The Rhinoplasty Society.