A goal-oriented document-grounded dialogue based on evidence generation

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering | Pub Date: 2024-11-22 | DOI: 10.1016/j.datak.2024.102378

Yong Song, Hongjie Fan, Junfei Liu, Yunxin Liu, Xiaozhou Ye, Ye Ouyang

Volume 155, Article 102378. Source: https://www.sciencedirect.com/science/article/pii/S0169023X24001022

Citations: 0

Abstract

Goal-oriented Document-grounded Dialogue (DGD) is used to retrieve domain-specific documents and to assist users with document content retrieval, question answering, and document management. Existing methods typically employ keyword extraction and vector space models to understand document content and identify the intent of questions, then rely on generation models to produce answers. However, challenges remain in semantic understanding, long-text processing, and context understanding. The emergence of Large Language Models (LLMs) has brought new capabilities in context learning and step-by-step reasoning. Combined with Retrieval Augmented Generation (RAG) methods, these models have made significant breakthroughs in text comprehension, intent detection, and language organization, offering exciting prospects for DGD research. However, the “hallucination” issue arising from LLMs requires complementary methods to ensure the credibility of their outputs. In this paper, we propose a goal-oriented document-grounded dialogue approach based on evidence generation using LLMs. We design and implement methods for document content retrieval and reranking, fine-tuning and inference, and evidence generation. In experiments, we compare against two baselines: LLMs combined with a vector space model, and LLMs combined with a key information matching technique. Relative to these two baselines, the proposed method improves accuracy by 21.91% and 12.81%, comprehensiveness by 10.89% and 69.83%, coherence by 38.98% and 53.27%, and completeness by 16.13% and 36.97%, respectively, on average. Additionally, an ablation analysis reveals that the evidence generation method also contributes significantly to comprehensiveness and completeness.
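The abstract names the pipeline stages (retrieval and reranking, then answer generation backed by evidence) without giving implementation details. The following is only a toy sketch of that general shape, not the authors' method: it assumes a plain TF-IDF vector space retriever, a query-term-coverage reranker, and a prompt that asks a model to cite passage ids. All function names, the prompt wording, and the sample passages are hypothetical, and no LLM is actually called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized doc, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; an illustrative choice, not taken from the paper.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, passages, k=3):
    """Stage 1 (assumed): vector space retrieval of up to k passage indices."""
    vecs, idf = tfidf_vectors([tokenize(p) for p in passages])
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vecs)), reverse=True)
    return [i for score, i in scored[:k] if score > 0]


def rerank(query, passages, candidates):
    """Stage 2 (assumed): reorder candidates by the fraction of query terms they cover."""
    q_terms = set(tokenize(query))

    def coverage(i):
        return len(q_terms & set(tokenize(passages[i]))) / max(len(q_terms), 1)

    return sorted(candidates, key=lambda i: (-coverage(i), i))


def build_prompt(query, passages, ranked):
    """Stage 3 (assumed): ask the model to answer only from numbered evidence."""
    evidence = "\n".join(f"[{i}] {passages[i]}" for i in ranked)
    return (
        "Answer using only the evidence below and cite passage ids.\n"
        f"Evidence:\n{evidence}\n"
        f"Question: {query}\n"
    )


# Hypothetical sample data, for illustration only.
passages = [
    "Refunds are issued within 14 days of a returned order.",
    "Orders ship from the warehouse every weekday morning.",
    "A returned order must include the original receipt.",
]
query = "How long do refunds take for a returned order?"

candidates = retrieve(query, passages)
ranked = rerank(query, passages, candidates)
prompt = build_prompt(query, passages, ranked)
print(prompt)
```

Numbering the evidence passages in the prompt is one simple way to make a generated answer checkable against its sources, which is the concern the abstract raises about hallucination. The paper's actual evidence generation and fine-tuning steps may differ substantially from this sketch.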




Source journal

Data & Knowledge Engineering (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

5.00

Self-citation rate

0.00%

Number of publications

Review time

6 months

About the journal: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.