LeanContext: Cost-efficient domain-specific question answering using LLMs

Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar
{"title":"LeanContext: Cost-efficient domain-specific question answering using LLMs","authors":"Md Adnan Arefeen ,&nbsp;Biplob Debnath ,&nbsp;Srimat Chakradhar","doi":"10.1016/j.nlp.2024.100065","DOIUrl":null,"url":null,"abstract":"<div><p>Question-answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration presents a challenge for small businesses due to the high expenses of LLM API usage. Costs rise rapidly when domain-specific data (context) is used alongside queries for accurate domain-specific LLM responses. Extracting context from domain-specific data is implemented by a Retrieval Augmented Generation (RAG) approach. One option is to summarize the RAG context by using LLMs and reduce the context. However, this can also filter out useful information that is necessary to answer some domain-specific queries. In this paper, we shift from human-oriented summarizers to AI model-friendly summaries. Our approach, LeanContext, efficiently extracts <em>k</em> key sentences from the context that are closely aligned with the query. The choice of <em>k</em> is neither static nor random; we introduce a reinforcement learning technique that dynamically determines <em>k</em> based on the query and context. The rest of the less important sentences are either reduced using a free open-source text reduction method or eliminated. We evaluate LeanContext against several recent query-aware and query-unaware context reduction approaches on prominent datasets (arxiv papers and BBC news articles, NarrativeQA). Despite cost reductions of 37.29% to 67.81%, LeanContext’s ROUGE-1 score decreases only by 1.41% to 2.65% compared to a baseline that retains the entire context (no summarization). LeanContext stands out for its ability to provide precise responses, outperforming competitors by leveraging open-source summarization techniques. Human evaluations of the responses further confirm and validate this superiority. Additionally, if open-source pre-trained LLM-based summarizers are used to reduce context (into human consumable summaries), LeanContext can further modify the reduced context to enhance the accuracy (ROUGE-1 score) by 13.22% to 24.61%.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"7 ","pages":"Article 100065"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S294971912400013X/pdfft?md5=635c034287e104fec6128cc735fdc367&pid=1-s2.0-S294971912400013X-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S294971912400013X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Question-answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration presents a challenge for small businesses due to the high cost of LLM API usage. Costs rise rapidly when domain-specific data (context) is sent alongside queries to obtain accurate domain-specific LLM responses. The context is typically extracted from domain-specific data with a Retrieval-Augmented Generation (RAG) approach. One option is to use an LLM to summarize the RAG context and thereby shrink it; however, this can also filter out useful information that is necessary to answer some domain-specific queries. In this paper, we shift from human-oriented summaries to AI-model-friendly summaries. Our approach, LeanContext, efficiently extracts k key sentences from the context that are closely aligned with the query. The choice of k is neither static nor random; we introduce a reinforcement learning technique that dynamically determines k based on the query and context. The remaining, less important sentences are either compressed with a free, open-source text reduction method or eliminated. We evaluate LeanContext against several recent query-aware and query-unaware context reduction approaches on prominent datasets (arXiv papers, BBC news articles, and NarrativeQA). Despite cost reductions of 37.29% to 67.81%, LeanContext's ROUGE-1 score decreases by only 1.41% to 2.65% compared to a baseline that retains the entire context (no summarization). LeanContext stands out for its ability to provide precise responses, outperforming competitors by leveraging open-source summarization techniques. Human evaluations of the responses further confirm and validate this superiority. Additionally, if open-source pre-trained LLM-based summarizers are used to reduce context (into human-consumable summaries), LeanContext can further modify the reduced context to enhance accuracy (ROUGE-1 score) by 13.22% to 24.61%.
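To make the extraction step concrete, below is a minimal sketch of query-aware top-k sentence selection, the core mechanism the abstract describes. This is not the authors' implementation: TF-IDF cosine similarity stands in for the paper's embedding model, k is a fixed parameter here rather than the output of the reinforcement-learning policy, and the `lean_context` function name and toy context are illustrative only.

```python
# Minimal sketch of query-aware top-k sentence selection (assumptions:
# TF-IDF similarity instead of a learned embedding model; fixed k
# instead of the RL-chosen k described in the paper).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lean_context(sentences: list[str], query: str, k: int = 3) -> str:
    """Keep the k sentences most similar to the query, in their original
    order. The paper instead compresses or drops the remaining sentences
    with an open-source text-reduction method."""
    vectorizer = TfidfVectorizer().fit(sentences + [query])
    sent_vecs = vectorizer.transform(sentences)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(sent_vecs, query_vec).ravel()
    top_k = sorted(np.argsort(scores)[-k:])  # restore document order
    return " ".join(sentences[i] for i in top_k)

context = [
    "LLM APIs are billed per token.",
    "The Eiffel Tower is in Paris.",
    "Reducing context length lowers the per-query cost.",
    "Top-k sentence selection keeps query-relevant content.",
]
print(lean_context(context, "How can LLM API cost be reduced?", k=2))
```

In the full system, a learned policy would pick k per query and context, and the discarded sentences would be reduced rather than simply dropped; the sketch only illustrates why query-aware selection preserves answer-relevant content while cutting tokens.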
