Evaluating large language models as a supplementary patient information resource on antimalarial use in systemic lupus erythematosus
Pamela Munguía-Realpozo, Claudia Mendoza-Pinto, Ivet Etchegaray-Morales, Edith Ramírez-Lara, Juan Carlos Solis-Poblano, Socorro Méndez-Martínez, Laura Serrano Vertiz, Jorge Ayón-Aguilar
Lupus (JCR Q3, Rheumatology; impact factor 1.9). Journal article, published 2025-02-27. DOI: 10.1177/09612033251324501 (https://doi.org/10.1177/09612033251324501)
Citations: 0
Abstract
Objective: To assess the accuracy, completeness, and reproducibility of responses from large language models (LLMs; Copilot, GPT-3.5, and GPT-4) to questions on antimalarial use in systemic lupus erythematosus (SLE).
Materials and methods: We used 13 questions derived from patient surveys and common inquiries from the National Health Service. Two independent rheumatologists assessed the LLMs' responses using predefined Likert scales for accuracy, completeness, and reproducibility.
Results: The GPT models and Copilot achieved high accuracy scores. However, completeness was rated at 38.5%, 55.9%, and 92.3% for Copilot, GPT-3.5, and GPT-4, respectively. When questions related to "mechanism of action" and "lifestyle" were analyzed for completeness (n = 8), GPT-4 scored significantly higher (100%) than Copilot (37.5%). For questions related to "side-effects" (n = 5), the GPT models scored higher than Copilot, but the differences were not statistically significant. All three LLMs demonstrated high reproducibility, with rates ranging from 84.6% to 92.3%.
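As a rough illustration of how the reported percentages relate to the number of questions, the sketch below searches for the whole-number count out of n that rounds to a given percentage. It assumes each figure is a simple share of questions given the top rating, which the abstract does not state; the function name `matching_counts` is hypothetical and not taken from the study.

```python
def matching_counts(pct, n, decimals=1):
    """Return every count k in 0..n whose share k/n rounds to pct (in percent).

    Assumption: the percentage is a plain proportion of n questions.
    """
    return [
        k for k in range(n + 1)
        if abs(round(100 * k / n, decimals) - pct) < 1e-9
    ]


if __name__ == "__main__":
    # (label, reported percentage, number of questions) taken from the Results text
    reported = [
        ("Copilot completeness, all questions", 38.5, 13),
        ("GPT-4 completeness, all questions", 92.3, 13),
        ("Copilot completeness, mechanism/lifestyle", 37.5, 8),
        ("GPT-4 completeness, mechanism/lifestyle", 100.0, 8),
        ("Reproducibility, lower bound", 84.6, 13),
    ]
    for label, pct, n in reported:
        print(f"{label}: {pct}% of {n} -> counts {matching_counts(pct, n)}")
```

Under this assumption, 38.5% and 92.3% correspond to 5 and 12 of 13 questions, and 37.5% corresponds to 3 of 8. A figure that is not a whole-number share of n, such as 55.9% of 13, returns an empty list, which would point to a different denominator or an averaged score rather than a simple count.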
Conclusions: Advanced LLMs such as GPT-4 offer significant promise in enhancing patients' understanding of antimalarial therapy in SLE. Although chatbots could help bridge the information gap patients face, their performance and limitations need further exploration to optimize their use in clinical settings.
About the journal:
The only fully peer-reviewed international journal devoted exclusively to research on lupus and related diseases. Lupus includes the most promising new clinical and laboratory-based studies from leading specialists in all lupus-related disciplines, with extended coverage spanning rheumatology, dermatology, immunology, obstetrics, psychiatry, and cardiovascular research…