Retriever-generator-verification: A novel approach to enhancing factual coherence in open-domain question answering
Shiqi Sun, Kun Zhang, Jingyuan Li, Min Yu, Kun Hou, Yuanzhuo Wang, Xueqi Cheng
Information Processing & Management, Vol. 62, No. 4, Article 104147. Published 2025-03-26.
DOI: 10.1016/j.ipm.2025.104147
URL: https://www.sciencedirect.com/science/article/pii/S0306457325000883
Citations: 0
Abstract
In recent research on open-domain question answering (ODQA), significant advances have been achieved by merging document retrieval techniques with large language models (LLMs) to answer questions. However, current ODQA methods present two challenges: the introduction of noise during retrieval and granularity errors during generation. To address these challenges, we propose the Retriever-Generator-Verification (RGV) framework, which consists of the Evidence Document Generator (EDG), the Candidate Entities Generator (CEG), and the Candidate Subgraphs Validator and Ranker (CSVR). EDG combines retrieval and generative mechanisms to construct comprehensive reference documents, ensuring broad coverage of potential answers. CEG then extracts and expands multi-dimensional candidate answer entities from these reference documents, capturing finer-grained information. Finally, CSVR verifies the candidate subgraphs against external knowledge sources and ranks them based on relevance, refining the final answers to enhance their accuracy and reliability. By systematically integrating these components, the RGV framework improves the completeness of retrieved information while effectively mitigating noise during retrieval and granularity errors during generation, thereby enhancing the overall reliability of ODQA. We assessed the efficacy of our method on three widely used datasets, and the experimental results demonstrate that our method exhibits competitive performance in benchmark tests. Compared to the state-of-the-art method, our approach achieves a 2.3% improvement in F1 score on the WebQSP dataset and a 1.3% increase in Hits@1 on the CWQ dataset.
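The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all function names, the stub retriever/LLM/knowledge-base components, and the scoring heuristic are assumptions introduced for this example. EDG merges retrieved passages with an LLM-generated document, CEG extracts and deduplicates candidate answer entities, and CSVR scores each candidate against an external knowledge source and ranks them.

```python
# Hypothetical sketch of the Retriever-Generator-Verification (RGV)
# pipeline from the abstract. All names and components here are
# illustrative assumptions, not the paper's actual code.

from dataclasses import dataclass


@dataclass
class Candidate:
    entity: str
    score: float  # relevance score against an external knowledge source


def generate_evidence_documents(question, retrieved_docs, llm_generate):
    """EDG: combine retrieved passages with an LLM-generated reference
    document to broaden coverage of potential answers."""
    return retrieved_docs + [llm_generate(question)]


def generate_candidate_entities(documents, extract_entities):
    """CEG: extract candidate answer entities from the evidence
    documents, deduplicating while preserving order."""
    seen, candidates = set(), []
    for doc in documents:
        for entity in extract_entities(doc):
            if entity not in seen:
                seen.add(entity)
                candidates.append(entity)
    return candidates


def verify_and_rank(candidates, score_against_kb, top_k=1):
    """CSVR: verify each candidate against external knowledge (here a
    stub scoring function) and rank by relevance."""
    scored = [Candidate(e, score_against_kb(e)) for e in candidates]
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored[:top_k]


# Toy end-to-end run with stubs standing in for the real retriever,
# LLM, entity extractor, and knowledge-base scorer.
docs = generate_evidence_documents(
    "Who wrote Hamlet?",
    retrieved_docs=["Hamlet is a tragedy by William Shakespeare."],
    llm_generate=lambda q: "William Shakespeare wrote Hamlet around 1600.",
)
cands = generate_candidate_entities(
    docs,
    extract_entities=lambda d: [
        w for w in d.replace(".", "").split() if w.istitle()
    ],
)
best = verify_and_rank(cands, score_against_kb=lambda e: len(e))
print(best[0].entity)  # -> Shakespeare
```

The stubs make the control flow concrete: noise mitigation corresponds to the verification stage filtering candidates that score poorly against the knowledge source, while the entity-level extraction in CEG addresses the granularity problem by ranking answers finer than whole documents.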
About the journal
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.