{"title":"AI-Driven Large Language Models in Health Consultations for HIV Patients.","authors":"Chun-Yan Zhao, Chang Song, Tong Yang, Ai-Chun Huang, Hang-Biao Qiang, Chun-Ming Gong, Jing-Song Chen, Qing-Dong Zhu","doi":"10.2147/JMDH.S533621","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This study endeavors to conduct a comprehensive assessment on the performance of large language models (LLMs) in health consultation for individuals living with HIV, delve into their applicability across a diverse array of dimensions, and provide evidence-based support for clinical deployment.</p><p><strong>Patients and methods: </strong>A 23-question multi-dimensional HIV-specific question bank was developed, covering fundamental knowledge, diagnosis, treatment, prognosis, and case analysis. Four advanced LLMs-ChatGPT-4o, Copilot, Gemini, and Claude-were tested using a multi-dimensional evaluation system assessing medical accuracy, comprehensiveness, understandability, reliability, and humanistic care (which encompasses elements such as individual needs attention, emotional support, and ethical considerations). A five-point Likert scale was employed, with three experts independently scoring. Statistical metrics (mean, standard deviation, standard error) were calculated, followed by consistency analysis, difference analysis, and post-hoc testing.</p><p><strong>Results: </strong>Claude obtained the most outstanding performance with regard to information comprehensiveness (mean score 4.333), understandability (mean score 3.797), and humanistic care (mean score 2.855); Copilot demonstrated proficiency in diagnostic questions (mean score 3.880); Gemini illustrated exceptional performance in case analysis (mean score 4.111). Based on the post-hoc analysis, Claude outperformed other models in thoroughness and humanistic care (P < 0.05). Copilot showed better performance than ChatGPT in understandability (P = 0.045), while Gemini performed significantly better than ChatGPT in case analysis (P < 0.001). It is important to note that performance varied across tasks, and humanistic care remained a consistent weak point across all models.</p><p><strong>Conclusion: </strong>The superiority of diverse models in specific tasks suggest that LLMs hold extensive application potential in the management of HIV patients. Nevertheless, their efficacy in the realm of humanistic care still needs improvement.</p>","PeriodicalId":16357,"journal":{"name":"Journal of Multidisciplinary Healthcare","volume":"18 ","pages":"5187-5198"},"PeriodicalIF":2.4000,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12396217/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Multidisciplinary Healthcare","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2147/JMDH.S533621","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0
Abstract
Purpose: This study aims to comprehensively assess the performance of large language models (LLMs) in health consultations for individuals living with HIV, examine their applicability across multiple dimensions, and provide evidence-based support for clinical deployment.
Patients and methods: A 23-question, multi-dimensional HIV-specific question bank was developed, covering fundamental knowledge, diagnosis, treatment, prognosis, and case analysis. Four advanced LLMs (ChatGPT-4o, Copilot, Gemini, and Claude) were tested using a multi-dimensional evaluation system assessing medical accuracy, comprehensiveness, understandability, reliability, and humanistic care (which encompasses attention to individual needs, emotional support, and ethical considerations). Responses were rated on a five-point Likert scale by three experts scoring independently. Statistical metrics (mean, standard deviation, standard error) were calculated, followed by consistency analysis, difference analysis, and post-hoc testing.
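The descriptive statistics described above can be illustrated with a short sketch. This is not the authors' code: the ratings below are fabricated placeholders, and the snippet only shows how a mean, standard deviation, and standard error per model might be computed from three experts' five-point Likert ratings over the 23 questions.

```python
import numpy as np

rng = np.random.default_rng(0)
models = ["ChatGPT-4o", "Copilot", "Gemini", "Claude"]
n_questions, n_experts = 23, 3

# ratings[model] has shape (n_questions, n_experts); values are 1-5 Likert scores
# (randomly generated here purely for illustration)
ratings = {m: rng.integers(1, 6, size=(n_questions, n_experts)) for m in models}

for m in models:
    scores = ratings[m].astype(float).ravel()   # pool all expert ratings for this model
    mean = scores.mean()
    sd = scores.std(ddof=1)                     # sample standard deviation
    se = sd / np.sqrt(scores.size)              # standard error of the mean
    print(f"{m}: mean={mean:.3f}, SD={sd:.3f}, SE={se:.3f}")
```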
Results: Claude achieved the strongest performance in information comprehensiveness (mean score 4.333), understandability (mean score 3.797), and humanistic care (mean score 2.855); Copilot performed best on diagnostic questions (mean score 3.880); Gemini excelled in case analysis (mean score 4.111). In the post-hoc analysis, Claude outperformed the other models in thoroughness and humanistic care (P < 0.05). Copilot outperformed ChatGPT in understandability (P = 0.045), while Gemini performed significantly better than ChatGPT in case analysis (P < 0.001). Notably, performance varied across tasks, and humanistic care remained a consistent weak point for all models.
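The abstract does not name the specific difference and post-hoc tests used. As a hedged illustration only, the sketch below follows one common workflow for comparing four groups of pooled scores: a one-way ANOVA followed by Tukey's HSD pairwise comparisons, using scipy and statsmodels on fabricated data; the paper's actual procedure may differ.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
models = ["ChatGPT-4o", "Copilot", "Gemini", "Claude"]

# 69 pooled ratings per model (23 questions x 3 experts); values are fabricated
scores = {m: rng.normal(loc=mu, scale=0.6, size=69)
          for m, mu in zip(models, [3.4, 3.7, 3.9, 4.1])}

# One-way ANOVA across the four models
f_stat, p_value = stats.f_oneway(*scores.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}")

# Tukey's HSD post-hoc pairwise comparisons (an assumed choice; the abstract
# does not specify which post-hoc test was applied)
if p_value < 0.05:
    values = np.concatenate(list(scores.values()))
    groups = np.repeat(models, 69)
    print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```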
Conclusion: The task-specific strengths of the different models suggest that LLMs hold broad application potential in the management of HIV patients. Nevertheless, their performance in humanistic care still needs improvement.
About the journal:
The Journal of Multidisciplinary Healthcare (JMDH) aims to represent and publish research in healthcare areas delivered by practitioners of different disciplines. This includes studies and reviews conducted by multidisciplinary teams as well as research that evaluates or reports the results or conduct of such teams or healthcare processes in general. The journal covers a very wide range of areas, and we welcome submissions from practitioners at all levels and from all over the world. Good healthcare is not bounded by person, place, or time, and the journal aims to reflect this. The JMDH is published as an open-access journal so that this wide range of practical, patient-relevant research is immediately available to practitioners, who can access and use it upon publication.