Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot

Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain
{"title":"从大规模部署 LLM 驱动的专家在线医疗聊天机器人中汲取经验","authors":"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain","doi":"arxiv-2409.10354","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) are widely used in healthcare, but limitations\nlike hallucinations, incomplete information, and bias hinder their reliability.\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\nplatform, enabling developers to create LLM-powered chatbots with integrated\nexpert verification. CataractBot, its first implementation, provides\nexpert-verified responses to cataract surgery questions. A pilot evaluation\nshowed its potential; however the study had a small sample size and was\nprimarily qualitative. In this work, we conducted a large-scale 24-week\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\nmessages, with 91.71\\% of responses verified by seven experts. Analysis of\ninteraction logs revealed that medical questions significantly outnumbered\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\% of\nmedical answers as accurate. As the knowledge base expanded with expert\ncorrections, system performance improved by 19.02\\%, reducing expert workload.\nThese insights guide the design of future LLM-powered chatbots.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot\",\"authors\":\"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain\",\"doi\":\"arxiv-2409.10354\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large Language Models (LLMs) are widely used in healthcare, but limitations\\nlike hallucinations, incomplete information, and bias hinder their reliability.\\nTo address these, researchers released the Build Your Own expert Bot (BYOeB)\\nplatform, enabling developers to create LLM-powered chatbots with integrated\\nexpert verification. CataractBot, its first implementation, provides\\nexpert-verified responses to cataract surgery questions. A pilot evaluation\\nshowed its potential; however the study had a small sample size and was\\nprimarily qualitative. In this work, we conducted a large-scale 24-week\\ndeployment of CataractBot involving 318 patients and attendants who sent 1,992\\nmessages, with 91.71\\\\% of responses verified by seven experts. Analysis of\\ninteraction logs revealed that medical questions significantly outnumbered\\nlogistical ones, hallucinations were negligible, and experts rated 84.52\\\\% of\\nmedical answers as accurate. 
As the knowledge base expanded with expert\\ncorrections, system performance improved by 19.02\\\\%, reducing expert workload.\\nThese insights guide the design of future LLM-powered chatbots.\",\"PeriodicalId\":501541,\"journal\":{\"name\":\"arXiv - CS - Human-Computer Interaction\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Human-Computer Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.10354\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Human-Computer Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10354","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
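The abstract outlines an expert-in-the-loop workflow: the LLM drafts an answer grounded on a knowledge base, an expert verifies or corrects it, and corrections are folded back into the knowledge base so later answers require less expert effort. The sketch below illustrates that loop in Python. It is a minimal, hypothetical rendering of the idea, not the BYOeB or CataractBot implementation; every name in it (KnowledgeBase, generate_draft_answer, expert_review, answer_with_verification) is an assumption introduced here for illustration.

```python
# Minimal, hypothetical sketch of an expert-in-the-loop chatbot turn, loosely
# following the workflow described in the abstract. Names and structure are
# assumptions for illustration, not the BYOeB/CataractBot code.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Verified question-answer pairs; grows as experts correct draft answers."""
    entries: dict[str, str] = field(default_factory=dict)

    def lookup(self, question: str) -> str | None:
        # Toy retrieval: exact match. A real system would retrieve relevant
        # document chunks (e.g. via embeddings) rather than exact questions.
        return self.entries.get(question)

    def add(self, question: str, verified_answer: str) -> None:
        self.entries[question] = verified_answer


def generate_draft_answer(question: str, context: str | None) -> str:
    # Stand-in for an LLM call grounded on the retrieved context.
    return f"Draft answer to {question!r} (context: {context or 'none found'})"


def expert_review(question: str, draft: str) -> tuple[bool, str]:
    # Stand-in for asynchronous expert verification: returns whether the draft
    # was approved and the final (possibly corrected) answer.
    return True, draft


def answer_with_verification(question: str, kb: KnowledgeBase) -> str:
    context = kb.lookup(question)
    draft = generate_draft_answer(question, context)
    approved, final_answer = expert_review(question, draft)
    if not approved or final_answer != draft:
        # Expert corrections flow back into the knowledge base, so similar
        # future questions can be answered with less expert effort.
        kb.add(question, final_answer)
    return final_answer


if __name__ == "__main__":
    kb = KnowledgeBase()
    print(answer_with_verification("When can I resume normal activities after cataract surgery?", kb))
```

The design point worth noting is the feedback path: expert corrections become new knowledge-base entries, which is the mechanism the abstract credits with the 19.02% performance improvement and the reduction in expert workload.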