Development and evaluation of a lightweight large language model chatbot for medication enquiry

Kabilan Elangovan, Jasmine Chiat Ling Ong, Liyuan Jin, Benjamin Jun Jie Seng, Yu Heng Kwan, Lit Soo Ng, Ryan Jian Zhong, Justina Koi Li Ma, Yu He Ke, Nan Liu, Kathleen M Giacomini, Daniel Shu Wei Ting

PLOS Digital Health 4(9): e0000961. Published 2025-09-04 (eCollection). DOI: 10.1371/journal.pdig.0000961. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12410746/pdf/
Abstract
Large Language Models (LLMs) show promise in augmenting digital health applications. However, the development and scaling of large models face computational constraints, data security concerns, and limited internet accessibility in some regions. We developed and tested Med-Pal, a medical domain-specific LLM chatbot fine-tuned on a fine-grained, expert-curated medication-enquiry dataset of 1,100 question-and-answer pairs. We trained and validated five lightweight, open-source LLMs of smaller parameter size (7 billion parameters or fewer) on a validation dataset of 231 medication-related enquiries. We introduce SCORE, a set of LLM-specific evaluation criteria for clinical adjudication of LLM responses, applied by a multidisciplinary expert team. The best-performing lightweight LLM was chosen as Med-Pal and further engineered with guardrails against adversarial prompts. Med-Pal outperformed Biomistral and Meerkat, achieving 71.9% high-quality responses on a separate testing dataset. Med-Pal's lightweight architecture, clinical alignment, and safety guardrails enable implementation in varied settings, including those with limited digital infrastructure.
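The abstract does not specify the training stack used for Med-Pal. As an illustration only, the sketch below shows one common way to fine-tune a lightweight (≤7B-parameter) open-source model on question-and-answer pairs using the Hugging Face trl and peft libraries with LoRA adapters; the base-model choice, the file name medication_qa.jsonl, and all hyperparameters are assumptions, not details taken from the paper.

```python
# Minimal sketch of lightweight domain-specific fine-tuning on Q&A pairs.
# Assumptions (not from the paper): trl/peft/datasets stack, Mistral-7B base
# model, and a JSONL file with {"question": ..., "answer": ...} records.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Load the hypothetical medication-enquiry Q&A dataset.
dataset = load_dataset("json", data_files="medication_qa.jsonl", split="train")

def to_text(example):
    # Flatten each Q&A pair into a single training string.
    return {"text": f"Question: {example['question']}\nAnswer: {example['answer']}"}

dataset = dataset.map(to_text)

# LoRA keeps the fine-tune lightweight: only low-rank adapter
# weights are trained, while the base model stays frozen.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # hypothetical base model
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="med-pal-sft",
        num_train_epochs=3,                 # illustrative values only
        per_device_train_batch_size=2,
        dataset_text_field="text",
    ),
)
trainer.train()
```

A parameter-efficient method such as LoRA is one plausible fit for the paper's stated goal of deployment in settings with limited digital infrastructure, since the resulting adapters are small and the base model can be served quantized on modest hardware.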