{"title":"使用大型语言模型进行归属式信息检索的评估框架","authors":"Hanane Djeddal, Pierre Erbacher, Raouf Toukal, Laure Soulier, Karen Pinel-Sauvagnat, Sophia Katrenko, Lynda Tamine","doi":"arxiv-2409.08014","DOIUrl":null,"url":null,"abstract":"With the growing success of Large Language models (LLMs) in\ninformation-seeking scenarios, search engines are now adopting generative\napproaches to provide answers along with in-line citations as attribution.\nWhile existing work focuses mainly on attributed question answering, in this\npaper, we target information-seeking scenarios which are often more challenging\ndue to the open-ended nature of the queries and the size of the label space in\nterms of the diversity of candidate-attributed answers per query. We propose a\nreproducible framework to evaluate and benchmark attributed information\nseeking, using any backbone LLM, and different architectural designs: (1)\nGenerate (2) Retrieve then Generate, and (3) Generate then Retrieve.\nExperiments using HAGRID, an attributed information-seeking dataset, show the\nimpact of different scenarios on both the correctness and attributability of\nanswers.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Evaluation Framework for Attributed Information Retrieval using Large Language Models\",\"authors\":\"Hanane Djeddal, Pierre Erbacher, Raouf Toukal, Laure Soulier, Karen Pinel-Sauvagnat, Sophia Katrenko, Lynda Tamine\",\"doi\":\"arxiv-2409.08014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the growing success of Large Language models (LLMs) in\\ninformation-seeking scenarios, search engines are now adopting generative\\napproaches to provide answers along with in-line citations as attribution.\\nWhile existing work focuses mainly on attributed question answering, in this\\npaper, we target information-seeking scenarios which are often more challenging\\ndue to the open-ended nature of the queries and the size of the label space in\\nterms of the diversity of candidate-attributed answers per query. 
We propose a\\nreproducible framework to evaluate and benchmark attributed information\\nseeking, using any backbone LLM, and different architectural designs: (1)\\nGenerate (2) Retrieve then Generate, and (3) Generate then Retrieve.\\nExperiments using HAGRID, an attributed information-seeking dataset, show the\\nimpact of different scenarios on both the correctness and attributability of\\nanswers.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.08014\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Evaluation Framework for Attributed Information Retrieval using Large Language Models
Abstract: With the growing success of Large Language Models (LLMs) in information-seeking scenarios, search engines are now adopting generative approaches to provide answers along with in-line citations as attribution. While existing work focuses mainly on attributed question answering, in this paper we target information-seeking scenarios, which are often more challenging due to the open-ended nature of the queries and the size of the label space, i.e., the diversity of candidate attributed answers per query. We propose a reproducible framework to evaluate and benchmark attributed information seeking, using any backbone LLM and three architectural designs: (1) Generate, (2) Retrieve then Generate, and (3) Generate then Retrieve. Experiments on HAGRID, an attributed information-seeking dataset, show the impact of the different scenarios on both the correctness and attributability of answers.
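
The following is a minimal illustrative sketch, not the authors' implementation, of how the three architectural designs named in the abstract differ in where retrieval happens relative to generation. The helpers llm() and retrieve() are hypothetical stand-ins for any backbone LLM and any retriever; only the control flow of the three pipelines is meant to be informative.

```python
# Sketch of the three attributed information-seeking designs from the abstract.
# llm() and retrieve() are hypothetical placeholders, not a real API.
from typing import List, Tuple

def llm(prompt: str) -> str:
    """Hypothetical backbone LLM call; any generative model could back this."""
    return f"<answer to: {prompt[:40]}...>"

def retrieve(query: str, k: int = 3) -> List[str]:
    """Hypothetical retriever returning k candidate passages for attribution."""
    return [f"<passage {i} for: {query[:40]}...>" for i in range(k)]

def generate(query: str) -> Tuple[str, List[str]]:
    """(1) Generate: the LLM answers from its parametric knowledge alone,
    so no retrieved passages are available for attribution."""
    return llm(query), []

def retrieve_then_generate(query: str) -> Tuple[str, List[str]]:
    """(2) Retrieve then Generate: passages are retrieved first and the LLM
    is prompted to ground its answer (and in-line citations) on them."""
    passages = retrieve(query)
    answer = llm(query + "\n\nCite these passages:\n" + "\n".join(passages))
    return answer, passages

def generate_then_retrieve(query: str) -> Tuple[str, List[str]]:
    """(3) Generate then Retrieve: the LLM answers first, then passages are
    retrieved post hoc to attribute the generated statements."""
    answer = llm(query)
    return answer, retrieve(answer)

if __name__ == "__main__":
    query = "What causes tides?"
    for pipeline in (generate, retrieve_then_generate, generate_then_retrieve):
        answer, attributions = pipeline(query)
        print(pipeline.__name__, "->", answer, attributions)
```

In such a setup, the framework would score each pipeline's output along the two axes the abstract mentions: correctness of the answer text and attributability of the cited passages.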