{"title":"An Evaluation Framework for Attributed Information Retrieval using Large Language Models","authors":"Hanane Djeddal, Pierre Erbacher, Raouf Toukal, Laure Soulier, Karen Pinel-Sauvagnat, Sophia Katrenko, Lynda Tamine","doi":"arxiv-2409.08014","DOIUrl":null,"url":null,"abstract":"With the growing success of Large Language models (LLMs) in\ninformation-seeking scenarios, search engines are now adopting generative\napproaches to provide answers along with in-line citations as attribution.\nWhile existing work focuses mainly on attributed question answering, in this\npaper, we target information-seeking scenarios which are often more challenging\ndue to the open-ended nature of the queries and the size of the label space in\nterms of the diversity of candidate-attributed answers per query. We propose a\nreproducible framework to evaluate and benchmark attributed information\nseeking, using any backbone LLM, and different architectural designs: (1)\nGenerate (2) Retrieve then Generate, and (3) Generate then Retrieve.\nExperiments using HAGRID, an attributed information-seeking dataset, show the\nimpact of different scenarios on both the correctness and attributability of\nanswers.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the growing success of Large Language Models (LLMs) in information-seeking scenarios, search engines are now adopting generative approaches to provide answers along with in-line citations as attribution. While existing work focuses mainly on attributed question answering, in this paper we target information-seeking scenarios, which are often more challenging due to the open-ended nature of the queries and the size of the label space, i.e., the diversity of candidate attributed answers per query. We propose a reproducible framework to evaluate and benchmark attributed information seeking, using any backbone LLM and different architectural designs: (1) Generate, (2) Retrieve then Generate, and (3) Generate then Retrieve. Experiments using HAGRID, an attributed information-seeking dataset, show the impact of the different scenarios on both the correctness and attributability of answers.
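
Below is a minimal sketch, in Python, of how the three architectural designs named in the abstract could be wired together. The `generate` and `retrieve` callables, the `AttributedAnswer` container, and the toy corpus are all hypothetical stand-ins introduced here for illustration; they are not the paper's actual prompts, models, or retriever, and any backbone LLM or passage retriever could be plugged in their place.

```python
# Sketch of the three scenarios: (1) Generate, (2) Retrieve then Generate,
# (3) Generate then Retrieve. All names below are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttributedAnswer:
    """An answer paired with the passages cited as attribution."""
    text: str
    citations: List[str] = field(default_factory=list)


def generate_only(query: str, generate: Callable[[str], str]) -> AttributedAnswer:
    """(1) Generate: the LLM answers from its parametric knowledge alone."""
    answer = generate(f"Answer the question: {query}")
    return AttributedAnswer(text=answer, citations=[])


def retrieve_then_generate(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> AttributedAnswer:
    """(2) Retrieve then Generate: ground the answer on retrieved passages."""
    passages = retrieve(query)[:k]
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    answer = generate(
        f"Using the passages below, answer the question and cite passages "
        f"in-line as [i].\n{context}\n\nQuestion: {query}"
    )
    return AttributedAnswer(text=answer, citations=passages)


def generate_then_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> AttributedAnswer:
    """(3) Generate then Retrieve: answer first, then retrieve passages
    post hoc to attribute the generated statements."""
    answer = generate(f"Answer the question: {query}")
    citations = retrieve(answer)[:k]
    return AttributedAnswer(text=answer, citations=citations)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without external services.
    toy_corpus = [
        "The Eiffel Tower is located in Paris, France.",
        "Paris is the capital of France.",
    ]
    toy_generate = lambda prompt: "The Eiffel Tower is in Paris."
    toy_retrieve = lambda text: [p for p in toy_corpus if "Paris" in p]

    query = "Where is the Eiffel Tower?"
    for result in (
        generate_only(query, toy_generate),
        retrieve_then_generate(query, toy_retrieve, toy_generate),
        generate_then_retrieve(query, toy_retrieve, toy_generate),
    ):
        print(result)
```

In an evaluation harness along these lines, the returned answer text would feed correctness metrics while the returned citations would feed attributability metrics, which is the split of concerns the abstract describes.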