Bridging the gap: A survey of document retrieval techniques for high-resource and low-resource languages

Impact factor 13.3 · Zone 1 (Computer Science) · JCR Q1: Computer Science, Information Systems
Samreen Kazi, Shakeel Khoja, Ali Daud
Computer Science Review, vol. 57, Article 100756. Published 2025-04-16. DOI: 10.1016/j.cosrev.2025.100756
https://www.sciencedirect.com/science/article/pii/S1574013725000322
Citation count: 0

Abstract

With the increasing need for efficient document retrieval in low-resource languages (LRLs), traditional retrieval methods struggle to overcome linguistic challenges such as data scarcity, morphological complexity, and orthographic variations. To address this, hybrid and neural ranking approaches have been explored, integrating statistical retrieval with transformer-based models to enhance search accuracy. Unlike high-resource languages, LRL retrieval requires specialized strategies, including cross-lingual retrieval, domain adaptation, and culturally aware search mechanisms. This article provides a comprehensive review of document retrieval in LRLs, covering classical models, deep learning-based techniques, and their adaptation to resource-constrained languages. A structured taxonomy is introduced, classifying retrieval methods based on model architectures, linguistic processing, and ranking strategies. The paper concludes by highlighting key challenges, benchmarking efforts, and future directions for improving retrieval effectiveness in LRLs.
Source journal
Computer Science Review (Computer Science: General Computer Science)
CiteScore: 32.70 · Self-citation rate: 0.00% · Articles published: 26 · Review time: 51 days
Journal description: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if these articles advance the fundamental understanding of those methods.