{"title":"段落检索与伪题回答的文档检索","authors":"C. Clarke, E. Terra","doi":"10.1145/860435.860534","DOIUrl":null,"url":null,"abstract":"Question answering (QA) systems often contain an information retrieval subsystem that identifies documents or passages where the answer to a question might appear [1–3, 5, 6, 10]. The QA system generates queries from the questions and submits them to the IR subsystem. The IR subsystem returns the top-ranked documents or passages, and the QA system selects the answers from them. In many QA systems, the IR component retrieves entire documents. Then, in a post-retrieval step, the system scans the retrieved documents and locates groups of sentences that contain most or all of the question keywords [3,10, and others]. These sentences are subjected to further analysis to select the answer. In other QA systems, a passage-retrieval technique is employed to directly identify locations within the document collection where the answer might be found, avoiding the post-retrieval step [1, 2, 5, 6, and others]. In this context, a “relevant” document or passage is one that contains an answer. We utilize this notion of relevance to evaluate an IR subsystem in isolation from the rest of its QA system by applying standard measures of IR effectiveness. By restricting our evaluation to a single subsystem we hope to gain experience that is applicable to QA systems beyond our own. An assumption inherent in this approach is that improved precision in the IR subsystem will translate to improved performance of the QA system as a whole. This assumption holds for our own system, and should (at least) hold for any system that exploits redundancy—that takes advantage of the observation that answers tend to occur in more than one retrieved passage [1, 2, 5]. In this paper we compare a successful passage-retrieval method [1, 5] with a well-known and effective documentretrieval method: Okapi BM25 [7]. Our goal is to examine","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"54","resultStr":"{\"title\":\"Passage retrieval vs. document retrieval for factoid question answering\",\"authors\":\"C. Clarke, E. Terra\",\"doi\":\"10.1145/860435.860534\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Question answering (QA) systems often contain an information retrieval subsystem that identifies documents or passages where the answer to a question might appear [1–3, 5, 6, 10]. The QA system generates queries from the questions and submits them to the IR subsystem. The IR subsystem returns the top-ranked documents or passages, and the QA system selects the answers from them. In many QA systems, the IR component retrieves entire documents. Then, in a post-retrieval step, the system scans the retrieved documents and locates groups of sentences that contain most or all of the question keywords [3,10, and others]. These sentences are subjected to further analysis to select the answer. In other QA systems, a passage-retrieval technique is employed to directly identify locations within the document collection where the answer might be found, avoiding the post-retrieval step [1, 2, 5, 6, and others]. In this context, a “relevant” document or passage is one that contains an answer. 
We utilize this notion of relevance to evaluate an IR subsystem in isolation from the rest of its QA system by applying standard measures of IR effectiveness. By restricting our evaluation to a single subsystem we hope to gain experience that is applicable to QA systems beyond our own. An assumption inherent in this approach is that improved precision in the IR subsystem will translate to improved performance of the QA system as a whole. This assumption holds for our own system, and should (at least) hold for any system that exploits redundancy—that takes advantage of the observation that answers tend to occur in more than one retrieved passage [1, 2, 5]. In this paper we compare a successful passage-retrieval method [1, 5] with a well-known and effective documentretrieval method: Okapi BM25 [7]. Our goal is to examine\",\"PeriodicalId\":209809,\"journal\":{\"name\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"54\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/860435.860534\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/860435.860534","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Passage retrieval vs. document retrieval for factoid question answering
Question answering (QA) systems often contain an information retrieval subsystem that identifies documents or passages where the answer to a question might appear [1–3, 5, 6, 10]. The QA system generates queries from the questions and submits them to the IR subsystem. The IR subsystem returns the top-ranked documents or passages, and the QA system selects the answers from them. In many QA systems, the IR component retrieves entire documents. Then, in a post-retrieval step, the system scans the retrieved documents and locates groups of sentences that contain most or all of the question keywords [3, 10, and others]. These sentences are subjected to further analysis to select the answer. In other QA systems, a passage-retrieval technique is employed to directly identify locations within the document collection where the answer might be found, avoiding the post-retrieval step [1, 2, 5, 6, and others]. In this context, a "relevant" document or passage is one that contains an answer.

We utilize this notion of relevance to evaluate an IR subsystem in isolation from the rest of its QA system by applying standard measures of IR effectiveness. By restricting our evaluation to a single subsystem we hope to gain experience that is applicable to QA systems beyond our own. An assumption inherent in this approach is that improved precision in the IR subsystem will translate to improved performance of the QA system as a whole. This assumption holds for our own system, and should (at least) hold for any system that exploits redundancy, that is, takes advantage of the observation that answers tend to occur in more than one retrieved passage [1, 2, 5].

In this paper we compare a successful passage-retrieval method [1, 5] with a well-known and effective document-retrieval method: Okapi BM25 [7]. Our goal is to examine
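The abstract contrasts a passage-retrieval method with Okapi BM25 document retrieval as the baseline. As a rough illustration of how BM25 ranks candidate documents for a question's keywords, here is a minimal Python sketch; this is not the authors' implementation, and the tokenization, toy collection, and the default parameters k1 = 1.2 and b = 0.75 are illustrative assumptions only.

    import math
    from collections import Counter

    def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
        """Score each document (a list of tokens) against the query with Okapi BM25.

        Illustrative sketch: k1 and b are the usual free parameters,
        not values reported in the paper.
        """
        N = len(documents)
        avgdl = sum(len(d) for d in documents) / N
        # Document frequency of each distinct query term.
        df = {t: sum(1 for d in documents if t in d) for t in set(query_terms)}
        scores = []
        for doc in documents:
            tf = Counter(doc)
            dl = len(doc)
            s = 0.0
            for t in query_terms:
                if df[t] == 0:
                    continue
                idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
                s += idf * (tf[t] * (k1 + 1)) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
            scores.append(s)
        return scores

    if __name__ == "__main__":
        # Toy collection: each "document" is a pre-tokenized word list.
        docs = [
            "who wrote hamlet shakespeare wrote hamlet".split(),
            "hamlet is a play set in denmark".split(),
            "the capital of denmark is copenhagen".split(),
        ]
        query = "who wrote hamlet".split()
        ranked = sorted(zip(docs, bm25_scores(query, docs)),
                        key=lambda pair: pair[1], reverse=True)
        for doc, score in ranked:
            print(f"{score:.3f}  {' '.join(doc)}")

In a QA pipeline of the kind described above, the top-ranked documents (or, for the passage-retrieval alternative, top-ranked passages) would then be handed to the answer-selection stage; under the redundancy assumption, a correct answer is expected to appear in several of the retrieved items.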