Hardware Accelerator for Full-Text Search (HAFTS) with Succinct Data Structure

N. Tanida, M. Inaba, K. Hiraki, Takeshi Yoshino
{"title":"Hardware Accelerator for Full-Text Search (HAFTS) with Succinct Data Structure","authors":"N. Tanida, M. Inaba, K. Hiraki, Takeshi Yoshino","doi":"10.1109/ReConFig.2009.74","DOIUrl":null,"url":null,"abstract":"Efficient utilization of massive data, such as full-text search has become important in view of the growing needs for Web search and genome analysis. High-speed search and small storage space are required to handle massive amounts of data. For high-speed search, generally, a data structure such as index which needs additional storage space is required. Recently, compressed suffix array, which is a data structure with an indexable dictionary that can be used to compress data to its information-theoretic lower bound, has been proposed. The distinctive feature of this array is that it enables direct data retrieval without decompression from the compressed data. Further, theoretically, the computational complexity of data retrieval is the same for both compressed and uncompressed data when we assume that rank operation involving the bit vector can be executed in constant time; this rank operation returns the number of occurrences of smaller elements. Practically, rank operation involves many bit-manipulations and random access to the memory. Hence, this constant time is not negligible, and as a result, data retrieval using compressed suffix array is relatively slower than that using plain suffix array. Although compression to create an indexable dictionary is performed only once, data retrieval queries occur repeatedly. Hence, high speed rank operations involving bit vectors are essential for compressed suffix arrays. We propose a FPGA-based hardware accelerator for full-text search (HAFTS) with compressed suffix array. FPGA helps speedup rank operation for compressed suffix array by enabling many bit calculations performed simultaneously and controlling the order of memory accesses. We conduct performance simulations of HAFTS. We consider a development board on which FPGA is connected to DDR2-800 SDRAM by a 64-bit bus as our model. We evaluate the performance of HAFTS by comparing it with that of software implementation. As a result, we conclude that the search speed of FPGA-based HAFTS is seven times faster than that of software implementation.","PeriodicalId":325631,"journal":{"name":"2009 International Conference on Reconfigurable Computing and FPGAs","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Reconfigurable Computing and FPGAs","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ReConFig.2009.74","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Efficient utilization of massive data, such as full-text search has become important in view of the growing needs for Web search and genome analysis. High-speed search and small storage space are required to handle massive amounts of data. For high-speed search, generally, a data structure such as index which needs additional storage space is required. Recently, compressed suffix array, which is a data structure with an indexable dictionary that can be used to compress data to its information-theoretic lower bound, has been proposed. The distinctive feature of this array is that it enables direct data retrieval without decompression from the compressed data. Further, theoretically, the computational complexity of data retrieval is the same for both compressed and uncompressed data when we assume that rank operation involving the bit vector can be executed in constant time; this rank operation returns the number of occurrences of smaller elements. Practically, rank operation involves many bit-manipulations and random access to the memory. Hence, this constant time is not negligible, and as a result, data retrieval using compressed suffix array is relatively slower than that using plain suffix array. Although compression to create an indexable dictionary is performed only once, data retrieval queries occur repeatedly. Hence, high speed rank operations involving bit vectors are essential for compressed suffix arrays. We propose a FPGA-based hardware accelerator for full-text search (HAFTS) with compressed suffix array. FPGA helps speedup rank operation for compressed suffix array by enabling many bit calculations performed simultaneously and controlling the order of memory accesses. We conduct performance simulations of HAFTS. We consider a development board on which FPGA is connected to DDR2-800 SDRAM by a 64-bit bus as our model. We evaluate the performance of HAFTS by comparing it with that of software implementation. As a result, we conclude that the search speed of FPGA-based HAFTS is seven times faster than that of software implementation.
具有简洁数据结构的全文检索(HAFTS)硬件加速器
鉴于网络搜索和基因组分析的需求日益增长,高效利用海量数据(如全文检索)变得非常重要。处理海量数据需要高速搜索和小存储空间。对于高速搜索,通常需要一个需要额外存储空间的数据结构,如索引。压缩后缀数组是一种具有可索引字典的数据结构,可以将数据压缩到其信息论下界。该数组的独特之处在于,它支持直接数据检索,而无需对压缩数据进行解压缩。此外,理论上,当我们假设涉及位向量的秩运算可以在恒定时间内执行时,压缩和未压缩数据的数据检索的计算复杂度是相同的;这个排序操作返回较小元素出现的次数。实际上,排序操作涉及许多位操作和对内存的随机访问。因此,这个常数时间是不可忽略的,因此,使用压缩后缀数组的数据检索比使用普通后缀数组的数据检索要慢。虽然创建可索引字典的压缩只执行一次,但数据检索查询会重复执行。因此,涉及位向量的高速排序操作对于压缩后缀数组是必不可少的。提出了一种基于fpga的基于压缩后缀数组的全文检索硬件加速器。FPGA通过同时执行多位计算和控制内存访问顺序,加快了压缩后缀数组的秩运算速度。我们对HAFTS进行了性能模拟。我们考虑将FPGA通过64位总线连接到DDR2-800 SDRAM的开发板作为我们的模型。我们通过与软件实现的性能进行比较来评估HAFTS的性能。结果表明,基于fpga的HAFTS的搜索速度比软件实现的搜索速度快7倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信