Bridging the gap: A survey of document retrieval techniques for high-resource and low-resource languages
Authors: Samreen Kazi, Shakeel Khoja, Ali Daud
Journal: Computer Science Review, volume 57, Article 100756
Published: 2025-04-16
DOI: 10.1016/j.cosrev.2025.100756
URL: https://www.sciencedirect.com/science/article/pii/S1574013725000322
Impact factor: 13.3 (JCR Q1, Computer Science, Information Systems)
As the need for efficient document retrieval in low-resource languages (LRLs) grows, traditional retrieval methods struggle to overcome linguistic challenges such as data scarcity, morphological complexity, and orthographic variation. To address this, hybrid and neural ranking approaches have been explored that integrate statistical retrieval with transformer-based models to improve search accuracy. Unlike retrieval in high-resource languages, LRL retrieval requires specialized strategies, including cross-lingual retrieval, domain adaptation, and culturally aware search mechanisms. This article provides a comprehensive review of document retrieval in LRLs, covering classical models, deep learning-based techniques, and their adaptation to resource-constrained languages. A structured taxonomy is introduced that classifies retrieval methods by model architecture, linguistic processing, and ranking strategy. The paper concludes by highlighting key challenges, benchmarking efforts, and future directions for improving retrieval effectiveness in LRLs.
About the journal:
Computer Science Review publishes research surveys and expository overviews of open problems in computer science, aimed at a broad audience within the field seeking comprehensive insight into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if they advance the fundamental understanding of those methods.