Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs
Authors: Kexin Ma, Ruochun Jin, Xi Wang, Huan Chen, Jing Ren, Yuhua Tang
arXiv - CS - Databases, 2024-08-10
https://doi.org/arxiv-2408.05524
Abstract
Retrieval-Augmented Large Language Models (RALMs) have made significant
strides in enhancing the accuracy of generated responses. However, existing
research often overlooks data quality issues within retrieval results,
which typically stem from imprecise vector-distance-based retrieval methods. We
propose to boost the precision of RALMs' answers from a data quality
perspective through the Context-Driven Index Trimming (CDIT) framework, where
Context Matching Dependencies (CMDs) are employed as logical data quality rules
to capture and regulate the consistency between retrieved contexts. Based on the
semantic comprehension capabilities of Large Language Models (LLMs), CDIT can
effectively identify and discard retrieval results that are inconsistent with
the query context and further modify indexes in the database, thereby improving
answer quality. Experiments on challenging question-answering tasks
demonstrate the effectiveness of CDIT. In addition, its flexibility is verified through its compatibility with
various language models and indexing methods, which offers a promising approach
to bolster RALMs' data quality and retrieval precision jointly.
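
The retrieve, check, and trim loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: `ToyIndex`, `retrieve_and_trim`, and `digits_judge` are hypothetical names, retrieval uses token overlap in place of vector distance, and a simple rule that requires the query's numeric tokens to appear in the context stands in for both the CMD rules and the LLM judgment. Trimming is assumed here to mean deleting the rejected entry from the index.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical judge type: True when `context` is consistent with `query`.
# In CDIT this decision would come from CMD rules evaluated by an LLM.
Judge = Callable[[str, str], bool]


@dataclass
class ToyIndex:
    """In-memory stand-in for a vector index: doc id -> text."""
    docs: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, query: str, k: int) -> List[str]:
        # Placeholder ranking by shared lowercase tokens (ties broken by id);
        # a real system would rank by vector distance.
        q = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: (-len(q & set(self.docs[d].lower().split())), d),
        )
        return ranked[:k]

    def trim(self, doc_ids: List[str]) -> None:
        # Assumption: "modifying the index" is modeled as deleting entries.
        for doc_id in doc_ids:
            self.docs.pop(doc_id, None)


def digits_judge(query: str, context: str) -> bool:
    """Toy consistency rule: every numeric token in the query must occur in the context."""
    required = {t for t in query.split() if t.isdigit()}
    return required <= set(context.split())


def retrieve_and_trim(index: ToyIndex, query: str, k: int, judge: Judge) -> List[str]:
    """Retrieve top-k ids, keep the consistent ones, and trim the rest from the index."""
    kept: List[str] = []
    rejected: List[str] = []
    for doc_id in index.retrieve(query, k):
        (kept if judge(query, index.docs[doc_id]) else rejected).append(doc_id)
    index.trim(rejected)
    return kept


if __name__ == "__main__":
    index = ToyIndex({
        "d1": "france won the world cup in 2018",
        "d2": "germany won the world cup in 2014",
        "d3": "paris is the capital of france",
    })
    print(retrieve_and_trim(index, "who won the world cup in 2018", 2, digits_judge))
    print(sorted(index.docs))
```

In this toy run, the entry about 2014 ranks second by token overlap but fails the consistency rule, so it is dropped from both the returned contexts and the index. Passing the judge as a parameter mirrors the abstract's claim that the approach works with various language models and indexing methods.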