A parallel accelerator for semantic search

Abhinandan Majumdar, S. Cadambi, S. Chakradhar, H. Graf
{"title":"A parallel accelerator for semantic search","authors":"Abhinandan Majumdar, S. Cadambi, S. Chakradhar, H. Graf","doi":"10.1109/SASP.2011.5941090","DOIUrl":null,"url":null,"abstract":"Semantic text analysis is a technique used in advertisement placement, cognitive databases and search engines. With increasing amounts of data and stringent response-time requirements, improving the underlying implementation of semantic analysis becomes critical. To this end, we look at Supervised Semantic Indexing (SSI), a recently proposed algorithm for semantic analysis. SSI ranks a large number of documents based on their semantic similarity to a text query. For each query, it computes millions of dot products on unstructured data, generates a large intermediate result, and then performs ranking. SSI underperforms on both state-of-the-art multi-cores as well as GPUs. Its performance scalability on multi-cores is hampered by their limited support for fine-grained data parallelism. GPUs, though beat multi-cores by running thousands of threads, cannot handle large intermediate data because of their small on-chip memory. Motivated by this, we present an FPGA-based hardware accelerator for semantic analysis. As a key feature, the accelerator combines hundreds of simple processing elements together with in-memory processing to simultaneously generate and process (consume) the large intermediate data. It also supports “dynamic parallelism” - a feature that configures the PEs differently for full utilization of the available processin logic after the FPGA is programmed. Our FPGA prototype is 10–13x faster than a 2.5 GHz quad-core Xeon, and 1.5–5x faster than a 240 core 1.3 GHz Tesla GPU, despite operating at a modest frequency of 125 MHz.","PeriodicalId":375788,"journal":{"name":"2011 IEEE 9th Symposium on Application Specific Processors (SASP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 9th Symposium on Application Specific Processors (SASP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SASP.2011.5941090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Semantic text analysis is a technique used in advertisement placement, cognitive databases and search engines. With increasing amounts of data and stringent response-time requirements, improving the underlying implementation of semantic analysis becomes critical. To this end, we look at Supervised Semantic Indexing (SSI), a recently proposed algorithm for semantic analysis. SSI ranks a large number of documents based on their semantic similarity to a text query. For each query, it computes millions of dot products on unstructured data, generates a large intermediate result, and then performs ranking. SSI underperforms on both state-of-the-art multi-cores and GPUs. Its performance scalability on multi-cores is hampered by their limited support for fine-grained data parallelism. GPUs, though they beat multi-cores by running thousands of threads, cannot handle the large intermediate data because of their small on-chip memory. Motivated by this, we present an FPGA-based hardware accelerator for semantic analysis. As a key feature, the accelerator combines hundreds of simple processing elements (PEs) with in-memory processing to simultaneously generate and process (consume) the large intermediate data. It also supports “dynamic parallelism”, a feature that configures the PEs differently for full utilization of the available processing logic after the FPGA is programmed. Our FPGA prototype is 10–13x faster than a 2.5 GHz quad-core Xeon, and 1.5–5x faster than a 240-core 1.3 GHz Tesla GPU, despite operating at a modest frequency of 125 MHz.
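
As a rough illustration of the computation the abstract describes (many dot products over a document corpus followed by ranking), the sketch below implements an SSI-style bilinear scorer in NumPy. The vocabulary size, embedding dimension, corpus, and the random "learned" projection matrices are all assumptions made for this example; they are not taken from the paper or its hardware implementation.

```python
import numpy as np

# Minimal sketch of SSI-style ranking: score each document against a query
# with a bilinear form f(q, d) = q^T W d, where W is factored low-rank,
# then rank documents by score. All sizes and matrices below are illustrative.

V = 2_000        # vocabulary size (assumed)
K = 50           # low-rank embedding dimension (assumed)
N_DOCS = 5_000   # corpus size (assumed)

rng = np.random.default_rng(0)
U = rng.standard_normal((K, V)).astype(np.float32) * 0.01   # query-side projection
W = rng.standard_normal((K, V)).astype(np.float32) * 0.01   # document-side projection
docs = (rng.random((N_DOCS, V)) < 0.01).astype(np.float32)  # toy bag-of-words corpus


def rank_documents(query_vec: np.ndarray, top_n: int = 10) -> np.ndarray:
    """Return indices of the top_n documents by SSI-style similarity score."""
    q_emb = U @ query_vec          # project the query into the K-dim space
    d_emb = docs @ W.T             # project every document: (N_DOCS, K)
    scores = d_emb @ q_emb         # one dot product per document
    # 'scores' is the large intermediate result; the accelerator described in
    # the paper generates and consumes it on the fly instead of storing it off-chip.
    top = np.argpartition(scores, -top_n)[-top_n:]
    return top[np.argsort(scores[top])[::-1]]


query = np.zeros(V, dtype=np.float32)
query[[3, 42, 777]] = 1.0          # toy query containing three vocabulary terms
print(rank_documents(query))
```

Note that the size of the intermediate `scores` array grows with the corpus, which is exactly the data volume that limits GPUs with small on-chip memory and motivates the in-memory processing design.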