Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot

Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain
arXiv:2409.10354 · arXiv - CS - Human-Computer Interaction · Published 2024-09-16

Abstract

Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
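The abstract describes a feedback loop: the bot drafts an answer, an expert verifies or corrects it, and corrections are folded back into the knowledge base so that later questions need less expert effort. The sketch below illustrates that loop in minimal form; the class and method names, and the use of a simple dictionary as the knowledge base, are illustrative assumptions, not the BYOeB implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExpertInLoopBot:
    """Hypothetical sketch of an expert-in-the-loop chatbot loop."""
    knowledge_base: dict = field(default_factory=dict)

    def draft_answer(self, question: str) -> str:
        # Placeholder for the LLM call; if an expert-verified answer
        # already exists in the knowledge base, serve it directly.
        if question in self.knowledge_base:
            return self.knowledge_base[question]
        return f"[LLM draft for: {question}]"

    def verify(self, question: str, draft: str,
               expert_correction: Optional[str] = None) -> str:
        # The expert either approves the draft or supplies a correction.
        final = expert_correction if expert_correction else draft
        # Fold the verified answer back into the knowledge base, so the
        # next occurrence of this question bypasses the draft stage.
        self.knowledge_base[question] = final
        return final

bot = ExpertInLoopBot()
q = "When can I resume normal activity after surgery?"
answer = bot.verify(q, bot.draft_answer(q), "Expert-corrected answer text.")
# Subsequent drafts for the same question come from the knowledge base.
assert bot.draft_answer(q) == "Expert-corrected answer text."
```

In the deployed system this accumulation is what the abstract measures: as expert corrections expanded the knowledge base, performance improved by 19.02% and expert workload fell.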