Passage retrieval vs. document retrieval for factoid question answering

C. Clarke, E. Terra
{"title":"段落检索与伪题回答的文档检索","authors":"C. Clarke, E. Terra","doi":"10.1145/860435.860534","DOIUrl":null,"url":null,"abstract":"Question answering (QA) systems often contain an information retrieval subsystem that identifies documents or passages where the answer to a question might appear [1–3, 5, 6, 10]. The QA system generates queries from the questions and submits them to the IR subsystem. The IR subsystem returns the top-ranked documents or passages, and the QA system selects the answers from them. In many QA systems, the IR component retrieves entire documents. Then, in a post-retrieval step, the system scans the retrieved documents and locates groups of sentences that contain most or all of the question keywords [3,10, and others]. These sentences are subjected to further analysis to select the answer. In other QA systems, a passage-retrieval technique is employed to directly identify locations within the document collection where the answer might be found, avoiding the post-retrieval step [1, 2, 5, 6, and others]. In this context, a “relevant” document or passage is one that contains an answer. We utilize this notion of relevance to evaluate an IR subsystem in isolation from the rest of its QA system by applying standard measures of IR effectiveness. By restricting our evaluation to a single subsystem we hope to gain experience that is applicable to QA systems beyond our own. An assumption inherent in this approach is that improved precision in the IR subsystem will translate to improved performance of the QA system as a whole. This assumption holds for our own system, and should (at least) hold for any system that exploits redundancy—that takes advantage of the observation that answers tend to occur in more than one retrieved passage [1, 2, 5]. In this paper we compare a successful passage-retrieval method [1, 5] with a well-known and effective documentretrieval method: Okapi BM25 [7]. Our goal is to examine","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"54","resultStr":"{\"title\":\"Passage retrieval vs. document retrieval for factoid question answering\",\"authors\":\"C. Clarke, E. Terra\",\"doi\":\"10.1145/860435.860534\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Question answering (QA) systems often contain an information retrieval subsystem that identifies documents or passages where the answer to a question might appear [1–3, 5, 6, 10]. The QA system generates queries from the questions and submits them to the IR subsystem. The IR subsystem returns the top-ranked documents or passages, and the QA system selects the answers from them. In many QA systems, the IR component retrieves entire documents. Then, in a post-retrieval step, the system scans the retrieved documents and locates groups of sentences that contain most or all of the question keywords [3,10, and others]. These sentences are subjected to further analysis to select the answer. In other QA systems, a passage-retrieval technique is employed to directly identify locations within the document collection where the answer might be found, avoiding the post-retrieval step [1, 2, 5, 6, and others]. In this context, a “relevant” document or passage is one that contains an answer. 
We utilize this notion of relevance to evaluate an IR subsystem in isolation from the rest of its QA system by applying standard measures of IR effectiveness. By restricting our evaluation to a single subsystem we hope to gain experience that is applicable to QA systems beyond our own. An assumption inherent in this approach is that improved precision in the IR subsystem will translate to improved performance of the QA system as a whole. This assumption holds for our own system, and should (at least) hold for any system that exploits redundancy—that takes advantage of the observation that answers tend to occur in more than one retrieved passage [1, 2, 5]. In this paper we compare a successful passage-retrieval method [1, 5] with a well-known and effective documentretrieval method: Okapi BM25 [7]. Our goal is to examine\",\"PeriodicalId\":209809,\"journal\":{\"name\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"54\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/860435.860534\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/860435.860534","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 54

Abstract

Question answering (QA) systems often contain an information retrieval subsystem that identifies documents or passages where the answer to a question might appear [1–3, 5, 6, 10]. The QA system generates queries from the questions and submits them to the IR subsystem. The IR subsystem returns the top-ranked documents or passages, and the QA system selects the answers from them. In many QA systems, the IR component retrieves entire documents. Then, in a post-retrieval step, the system scans the retrieved documents and locates groups of sentences that contain most or all of the question keywords [3, 10, and others]. These sentences are subjected to further analysis to select the answer. In other QA systems, a passage-retrieval technique is employed to directly identify locations within the document collection where the answer might be found, avoiding the post-retrieval step [1, 2, 5, 6, and others]. In this context, a “relevant” document or passage is one that contains an answer.

We utilize this notion of relevance to evaluate an IR subsystem in isolation from the rest of its QA system by applying standard measures of IR effectiveness. By restricting our evaluation to a single subsystem we hope to gain experience that is applicable to QA systems beyond our own. An assumption inherent in this approach is that improved precision in the IR subsystem will translate to improved performance of the QA system as a whole. This assumption holds for our own system, and should (at least) hold for any system that exploits redundancy, that is, one that takes advantage of the observation that answers tend to occur in more than one retrieved passage [1, 2, 5]. In this paper we compare a successful passage-retrieval method [1, 5] with a well-known and effective document-retrieval method: Okapi BM25 [7]. Our goal is to examine
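Okapi BM25 [7], the document-retrieval baseline named in the abstract, ranks a document by summing, over the query terms it contains, an inverse-document-frequency weight times a saturated, length-normalized term frequency. The sketch below is a minimal reference implementation of a standard BM25 formulation, not code from the paper; the function name, the whitespace tokenization, the parameter values k1 = 1.2 and b = 0.75, and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`
    using a standard Okapi BM25 formulation (illustrative parameters)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # The +1 inside the log keeps the weight positive for very
            # common terms, as in widely used BM25 variants.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

# Toy usage: rank three short "documents" for a factoid-style query.
docs = [
    "the capital of france is paris".split(),
    "paris is a city in france".split(),
    "okapi bm25 is a ranking function".split(),
]
print(bm25_scores("capital france".split(), docs))
```

Running this prints one score per toy document; the first, which contains both query terms, ranks highest, and the third, which contains neither, scores zero. In a QA pipeline of the kind described above, such scores would be used to select the top-ranked documents (or passages) handed to the answer-selection stage.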