Propeller: A Scalable Real-Time File-Search Service in Distributed Systems

Lei Xu, Hong Jiang, Lei Tian, Ziling Huang
Published in: 2014 IEEE 34th International Conference on Distributed Computing Systems, 2014-06-30. DOI: 10.1109/ICDCS.2014.46 (https://doi.org/10.1109/ICDCS.2014.46)
Citations: 12

Abstract

A file-search service can accelerate many analytics applications because it drastically reduces the scale of their input data. The main challenge in designing a large-scale, accurate file-search service is supporting real-time indexing in an efficient and scalable way. To address this challenge, we propose Propeller, a distributed file-search service that uses a particular file-access pattern, called access-causality, to partition file indices. This partitioning exposes substantial access locality and parallelism, which accelerates the file-indexing process. Extensive evaluations show that Propeller indexes files in real time, returns accurate search results, and scales to large datasets. It achieves significantly better file-indexing and file-search performance (up to 250x) than a centralized solution (MySQL), and much higher accuracy and substantially lower query latency (up to 22x) than a state-of-the-art desktop search engine (Spotlight).
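The abstract does not spell out how access-causality is turned into index partitions, so the following is only a minimal sketch of one plausible reading: files touched by the same process are treated as causally related, and each connected group of related files becomes one index partition. The function name `partition_by_access_causality`, the `(pid, path)` log format, and the use of union-find are all assumptions made for illustration, not Propeller's actual design.

```python
from collections import defaultdict


def partition_by_access_causality(access_log):
    """Group files into partitions from (pid, path) access records.

    Hypothetical illustration: two files land in the same partition if
    some chain of processes accessed them together (connected components).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_file_of = {}  # pid -> first file that process touched
    for pid, path in access_log:
        find(path)  # register the file even if it stays alone
        if pid in first_file_of:
            union(first_file_of[pid], path)
        else:
            first_file_of[pid] = path

    groups = defaultdict(set)
    for path in list(parent):
        groups[find(path)].add(path)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    log = [
        (1, "a.c"), (1, "a.h"),      # process 1 (e.g. a compiler)
        (2, "b.py"), (2, "b.cfg"),   # process 2 (an unrelated script)
        (3, "a.h"), (3, "a.o"),      # process 3 shares a.h with process 1
    ]
    for part in partition_by_access_causality(log):
        print(part)
```

Under this assumption, an index update triggered by one application touches only its own partition, which is the kind of locality and parallelism the abstract refers to; independent partitions could then be indexed on different nodes.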