Optimizing theranostics chatbots with context-augmented large language models.

Impact factor: 13.3; Zone 1 (Medicine); JCR Q1 (MEDICINE, RESEARCH & EXPERIMENTAL)
Theranostics. Pub Date: 2025-04-21. eCollection Date: 2025-01-01. DOI: 10.7150/thno.107757
Pia Koller, Christoph Clement, Albert van Eijk, Robert Seifert, Jingjing Zhang, George Prenosil, Mike M Sathekge, Ken Herrmann, Richard Baum, Wolfgang A Weber, Axel Rominger, Kuangyu Shi
Theranostics, vol. 15, no. 12, pp. 5693-5704. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12068303/pdf/
Citations: 0

Abstract

Optimizing theranostics chatbots with context-augmented large language models.

Introduction: Nuclear medicine theranostics is rapidly emerging as an interdisciplinary therapy option with multi-dimensional considerations. Healthcare professionals do not have the time to do in-depth research on every therapy option, and personalized chatbots might help to educate them. Chatbots built on Large Language Models (LLMs), such as ChatGPT, are gaining interest as a way to address these challenges. However, chatbot performance often falls short in specialized domains, a critical weakness in healthcare applications.

Methods: This study develops a framework for examining whether contextual augmentation improves the performance of medical theranostic chatbots, with the aim of creating the first theranostic chatbot. Contextual augmentation means providing additional relevant information to an LLM to improve its responses. We evaluate five state-of-the-art LLMs on questions translated into English and German. We compare answers generated with and without contextual augmentation, where the LLMs access pre-selected research papers via Retrieval Augmented Generation (RAG). We use two RAG techniques: Naive RAG and Advanced RAG.

Results: A user study and an LLM-based evaluation assess answer quality across different metrics. Results show that Advanced RAG techniques considerably enhance LLM performance. The best-performing models are Claude 3 Opus and GPT-4o, which consistently achieve the highest scores, indicating robust integration and use of contextual information. The most notable improvements from Naive RAG to Advanced RAG are observed in the Gemini 1.5 and Command R+ variants.

Conclusion: This study demonstrates that contextual augmentation addresses the complexities inherent in theranostics. Despite promising results, key limitations include the biased selection of questions, which focus primarily on PRRT, and the need for comprehensive context documents. Future research should include a broader range of theranostics questions, explore additional RAG methods, and compare human and LLM evaluations more directly to further enhance LLM performance.
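
To make the idea of contextual augmentation concrete, the following is a minimal sketch of a naive retrieval step, not the pipeline used in the study. It ranks passages by bag-of-words cosine similarity to the question and places the top matches in the prompt. The function names, the prompt wording, and the placeholder passages are all assumptions made for illustration; a real system would typically use learned embeddings and send the prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that places the retrieved context before the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical placeholder passages, not taken from the study's documents.
PASSAGES = [
    "Passage A describes kidney function monitoring during radionuclide therapy.",
    "Passage B describes patient scheduling in an outpatient clinic.",
    "Passage C describes imaging before radionuclide therapy.",
]

QUESTION = "How is kidney function monitored during therapy?"
prompt = build_prompt(QUESTION, PASSAGES, k=1)
print(prompt)
```

The resulting prompt string is what would be passed to the model in place of the bare question; the "without augmentation" baseline corresponds to sending the question alone.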

Source journal: Theranostics (MEDICINE, RESEARCH & EXPERIMENTAL)
CiteScore: 25.40
Self-citation rate: 1.60%
Articles published: 433
Review time: 1 month
About the journal: Theranostics serves as a pivotal platform for the exchange of clinical and scientific insights within the diagnostic and therapeutic molecular and nanomedicine community, along with allied professions engaged in integrating molecular imaging and therapy. As a multidisciplinary journal, Theranostics showcases innovative research articles spanning fields such as in vitro diagnostics and prognostics, in vivo molecular imaging, molecular therapeutics, image-guided therapy, biosensor technology, nanobiosensors, bioelectronics, system biology, translational medicine, point-of-care applications, and personalized medicine. Encouraging a broad spectrum of biomedical research with potential theranostic applications, the journal rigorously peer-reviews primary research, alongside publishing reviews, news, and commentary that aim to bridge the gap between the laboratory, clinic, and biotechnology industries.