Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval
Authors: Yasir Hadi Farhan, S. Noah, M. Mohd, Jaffar Atwan
Journal: Journal of Information Science Theory and Practice, pp. 1-17 (published 2021-01-01)
DOI: 10.1633/JISTAP.2021.9.2.1 (https://doi.org/10.1633/JISTAP.2021.9.2.1)
Pseudo relevance feedback (PRF) is a powerful query expansion (QE) technique that reformulates a query by selecting expansion terms from the top k pseudo-relevant documents. Traditional PRF frameworks handle the vocabulary mismatch between user queries and relevant documents robustly; however, they choose expansion terms without regard to their similarity to the original query terms. Word embedding (WE) techniques are of significant interest for QE within the information retrieval domain. A deep averaging network (DAN) is a framework that averages the word vectors of its input and passes the result through multiple linear layers. The complete query can therefore be represented by the average of its term vectors, and this vector can be used to identify expansion terms relevant to the query as a whole. In this study, we propose a DAN-based technique that augments PRF frameworks with WE similarities to support Arabic information retrieval. The technique rests on the premise that the top pseudo-relevant documents are analyzed to determine the distribution of candidate terms, and expansion terms are then selected according to their similarity to the average vector representing the original query terms. The experiments use the Word2Vec model on the standard Arabic TREC 2001/2002 collection. Most of the evaluations indicate that the proposed PRF implementation offers a significant performance improvement over the baseline PRF frameworks.
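The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the averaging and similarity-ranking idea and leaves out the linear layers of a full DAN. The three-dimensional embeddings, the English toy tokens, the function names, and the use of relative frequency in the feedback documents as a tie-breaker are all assumptions made for illustration; in the study the vectors come from a Word2Vec model trained on Arabic text.

```python
import math
from collections import Counter


def average_vector(terms, embeddings):
    """Mean of the embedding vectors of the terms that have an embedding."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        raise ValueError("no query term has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_expansion_terms(query_terms, feedback_docs, embeddings, n_terms=2):
    """Rank candidate terms from the feedback documents by their similarity
    to the averaged query vector; relative frequency breaks ties (assumption)."""
    query_vec = average_vector(query_terms, embeddings)
    counts = Counter(term for doc in feedback_docs for term in doc)
    total = sum(counts.values())
    scored = []
    for term, count in counts.items():
        # Skip original query terms and terms without an embedding.
        if term in query_terms or term not in embeddings:
            continue
        scored.append((term, cosine(query_vec, embeddings[term]), count / total))
    scored.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return scored[:n_terms]


# Hypothetical toy embeddings; a real system would load Word2Vec vectors.
toy_embeddings = {
    "oil": [0.9, 0.1, 0.0],
    "price": [0.8, 0.3, 0.1],
    "petroleum": [0.85, 0.2, 0.05],
    "market": [0.7, 0.4, 0.1],
    "football": [0.0, 0.1, 0.9],
}
query = ["oil", "price"]
# Stand-in for the top k pseudo-relevant documents, already tokenized.
top_docs = [
    ["oil", "petroleum", "market"],
    ["price", "market", "football"],
    ["petroleum", "oil", "football"],
]

expanded = select_expansion_terms(query, top_docs, toy_embeddings, n_terms=2)
print([term for term, _, _ in expanded])
```

Comparing each candidate with the averaged vector, rather than with individual query terms, favors terms related to the query as a whole, which is the motivation the abstract gives for the approach.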
About the journal:
The Journal of Information Science Theory and Practice (JISTaP) is an international journal that aims at publishing original studies, review papers and brief communications on information science theory and practice. The journal provides an international forum for practical as well as theoretical research in the interdisciplinary areas of information science, such as information processing and management, knowledge organization, scholarly communication and bibliometrics. To foster scholarly communication among researchers and practitioners of library and information science around the globe, JISTaP offers a no-fee open access publishing venue where a team of dedicated editors, reviewers and staff members volunteer their services to ensure rapid dissemination and communication of scholarly works that make significant contributions. In a modern society, where information production and consumption grow at an astronomical rate, the science of information management, organization, and analysis is invaluable in effective utilization of information. The key objective of the journal is to foster research that can contribute to advancements and innovations in the theory and practice of information and library science so as to promote timely application of the findings from scientific investigations to everyday life. Recognizing the importance of the global perspective with understanding of region-specific issues, JISTaP encourages submissions of manuscripts that discuss global implications of regional findings as well as regional implications of global findings.