基于GPU的ho- corasick算法的高吞吐量并行实现

Nhat-Phuong Tran, Myungho Lee, Sugwon Hong, Jaeyoung Choi
{"title":"基于GPU的ho- corasick算法的高吞吐量并行实现","authors":"Nhat-Phuong Tran, Myungho Lee, Sugwon Hong, Jaeyoung Choi","doi":"10.1109/IPDPSW.2013.116","DOIUrl":null,"url":null,"abstract":"Pattern matching is an important operation in various applications such as computer and network security, bioinformatics, image processing, among many others. Aho-Corasick (AC) algorithm is a multiple patterns matching algorithm commonly used for such applications. In order to meet the highly demanding performance requirements imposed on these applications, achieving high performance for AC algorithm is crucial. In this paper, we present a high performance parallel implementation of AC algorithm on a Graphic Processing Unit (GPU) which efficiently utilizes the high degree of on-chip parallelism and the memory hierarchy of the GPU so that the aggregate performance (or throughput) of the GPU can be maximized. For this purpose our approach carefully places and caches the input text data and the reference pattern data used for pattern matching in the on-chip shared memories and the texture caches of the GPU. Furthermore, it efficiently schedules the off-chip global memory loads and the shared memory stores in order to minimize the overheads in loading the input data to the shared memories and also to minimize the shared memory bank conflicts. The proposed approach leads to a significant cut-down of the effective memory access latencies and leads to impressive performance improvements. Experimental results on Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127Gbps throughput performance and up to 222-times speedup compared with a serial version running on 2.2Ghz Core2Duo Intel processor.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"High Throughput Parallel Implementation of Aho-Corasick Algorithm on a GPU\",\"authors\":\"Nhat-Phuong Tran, Myungho Lee, Sugwon Hong, Jaeyoung Choi\",\"doi\":\"10.1109/IPDPSW.2013.116\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pattern matching is an important operation in various applications such as computer and network security, bioinformatics, image processing, among many others. Aho-Corasick (AC) algorithm is a multiple patterns matching algorithm commonly used for such applications. In order to meet the highly demanding performance requirements imposed on these applications, achieving high performance for AC algorithm is crucial. In this paper, we present a high performance parallel implementation of AC algorithm on a Graphic Processing Unit (GPU) which efficiently utilizes the high degree of on-chip parallelism and the memory hierarchy of the GPU so that the aggregate performance (or throughput) of the GPU can be maximized. For this purpose our approach carefully places and caches the input text data and the reference pattern data used for pattern matching in the on-chip shared memories and the texture caches of the GPU. Furthermore, it efficiently schedules the off-chip global memory loads and the shared memory stores in order to minimize the overheads in loading the input data to the shared memories and also to minimize the shared memory bank conflicts. The proposed approach leads to a significant cut-down of the effective memory access latencies and leads to impressive performance improvements. Experimental results on Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127Gbps throughput performance and up to 222-times speedup compared with a serial version running on 2.2Ghz Core2Duo Intel processor.\",\"PeriodicalId\":234552,\"journal\":{\"name\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2013.116\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

摘要

模式匹配在计算机和网络安全、生物信息学、图像处理等许多应用中都是一项重要的操作。Aho-Corasick (AC)算法是一种多模式匹配算法。为了满足这些应用的高性能要求,实现AC算法的高性能是至关重要的。在本文中,我们提出了一种在图形处理单元(GPU)上的高性能并行实现AC算法,该算法有效地利用了GPU的高度片上并行性和内存层次结构,从而使GPU的总性能(或吞吐量)最大化。为此,我们的方法小心地将输入文本数据和用于模式匹配的参考模式数据放置和缓存在片上共享内存和GPU的纹理缓存中。此外,它有效地调度片外全局内存负载和共享内存存储,以尽量减少将输入数据加载到共享内存中的开销,并尽量减少共享内存库冲突。所建议的方法导致有效内存访问延迟的显著减少,并导致令人印象深刻的性能改进。在Nvidia GeForce GTX 285 GPU上的实验结果表明,与在2.2Ghz Core2Duo Intel处理器上运行的串行版本相比,我们的方法提供了高达127Gbps的吞吐量性能和高达222倍的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
High Throughput Parallel Implementation of Aho-Corasick Algorithm on a GPU
Pattern matching is an important operation in various applications such as computer and network security, bioinformatics, image processing, among many others. Aho-Corasick (AC) algorithm is a multiple patterns matching algorithm commonly used for such applications. In order to meet the highly demanding performance requirements imposed on these applications, achieving high performance for AC algorithm is crucial. In this paper, we present a high performance parallel implementation of AC algorithm on a Graphic Processing Unit (GPU) which efficiently utilizes the high degree of on-chip parallelism and the memory hierarchy of the GPU so that the aggregate performance (or throughput) of the GPU can be maximized. For this purpose our approach carefully places and caches the input text data and the reference pattern data used for pattern matching in the on-chip shared memories and the texture caches of the GPU. Furthermore, it efficiently schedules the off-chip global memory loads and the shared memory stores in order to minimize the overheads in loading the input data to the shared memories and also to minimize the shared memory bank conflicts. The proposed approach leads to a significant cut-down of the effective memory access latencies and leads to impressive performance improvements. Experimental results on Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127Gbps throughput performance and up to 222-times speedup compared with a serial version running on 2.2Ghz Core2Duo Intel processor.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信