Development and evaluation of a lightweight large language model chatbot for medication enquiry.

IF 7.7
PLOS digital health. Pub Date: 2025-09-04; eCollection Date: 2025-09-01; DOI: 10.1371/journal.pdig.0000961
Kabilan Elangovan, Jasmine Chiat Ling Ong, Liyuan Jin, Benjamin Jun Jie Seng, Yu Heng Kwan, Lit Soo Ng, Ryan Jian Zhong, Justina Koi Li Ma, Yu He Ke, Nan Liu, Kathleen M Giacomini, Daniel Shu Wei Ting
Volume 4, issue 9, article e0000961. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12410746/pdf/

Abstract

Large Language Models (LLMs) show promise in augmenting digital health applications. However, the development and scaling of large models are constrained by computational cost, data security concerns and limited internet access in some regions. We developed and tested Med-Pal, a medical domain-specific LLM chatbot fine-tuned on a fine-grained, expert-curated medication-enquiry dataset of 1,100 question-and-answer pairs. We trained five lightweight, open-source LLMs of smaller parameter size (7 billion or fewer) and validated them on a dataset of 231 medication-related enquiries. We introduce SCORE, a set of LLM-specific evaluation criteria for clinical adjudication of LLM responses, applied by a multidisciplinary expert team. The best-performing lightweight LLM was chosen as Med-Pal and further engineered with guardrails against adversarial prompts. Med-Pal outperformed Biomistral and Meerkat, achieving 71.9% high-quality responses on a separate testing dataset. Med-Pal's lightweight architecture, clinical alignment and safety guardrails enable implementation in varied settings, including those with limited digital infrastructure.
