{"title":"基于GPU的ho- corasick算法的高吞吐量并行实现","authors":"Nhat-Phuong Tran, Myungho Lee, Sugwon Hong, Jaeyoung Choi","doi":"10.1109/IPDPSW.2013.116","DOIUrl":null,"url":null,"abstract":"Pattern matching is an important operation in various applications such as computer and network security, bioinformatics, image processing, among many others. Aho-Corasick (AC) algorithm is a multiple patterns matching algorithm commonly used for such applications. In order to meet the highly demanding performance requirements imposed on these applications, achieving high performance for AC algorithm is crucial. In this paper, we present a high performance parallel implementation of AC algorithm on a Graphic Processing Unit (GPU) which efficiently utilizes the high degree of on-chip parallelism and the memory hierarchy of the GPU so that the aggregate performance (or throughput) of the GPU can be maximized. For this purpose our approach carefully places and caches the input text data and the reference pattern data used for pattern matching in the on-chip shared memories and the texture caches of the GPU. Furthermore, it efficiently schedules the off-chip global memory loads and the shared memory stores in order to minimize the overheads in loading the input data to the shared memories and also to minimize the shared memory bank conflicts. The proposed approach leads to a significant cut-down of the effective memory access latencies and leads to impressive performance improvements. Experimental results on Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127Gbps throughput performance and up to 222-times speedup compared with a serial version running on 2.2Ghz Core2Duo Intel processor.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"High Throughput Parallel Implementation of Aho-Corasick Algorithm on a GPU\",\"authors\":\"Nhat-Phuong Tran, Myungho Lee, Sugwon Hong, Jaeyoung Choi\",\"doi\":\"10.1109/IPDPSW.2013.116\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pattern matching is an important operation in various applications such as computer and network security, bioinformatics, image processing, among many others. Aho-Corasick (AC) algorithm is a multiple patterns matching algorithm commonly used for such applications. In order to meet the highly demanding performance requirements imposed on these applications, achieving high performance for AC algorithm is crucial. In this paper, we present a high performance parallel implementation of AC algorithm on a Graphic Processing Unit (GPU) which efficiently utilizes the high degree of on-chip parallelism and the memory hierarchy of the GPU so that the aggregate performance (or throughput) of the GPU can be maximized. For this purpose our approach carefully places and caches the input text data and the reference pattern data used for pattern matching in the on-chip shared memories and the texture caches of the GPU. Furthermore, it efficiently schedules the off-chip global memory loads and the shared memory stores in order to minimize the overheads in loading the input data to the shared memories and also to minimize the shared memory bank conflicts. 
The proposed approach leads to a significant cut-down of the effective memory access latencies and leads to impressive performance improvements. Experimental results on Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127Gbps throughput performance and up to 222-times speedup compared with a serial version running on 2.2Ghz Core2Duo Intel processor.\",\"PeriodicalId\":234552,\"journal\":{\"name\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2013.116\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
High Throughput Parallel Implementation of Aho-Corasick Algorithm on a GPU
Pattern matching is an important operation in a variety of applications, including computer and network security, bioinformatics, and image processing. The Aho-Corasick (AC) algorithm is a multiple-pattern matching algorithm commonly used in such applications. To meet the demanding performance requirements these applications impose, achieving high performance for the AC algorithm is crucial. In this paper, we present a high-performance parallel implementation of the AC algorithm on a Graphics Processing Unit (GPU) that efficiently exploits the GPU's high degree of on-chip parallelism and its memory hierarchy so that the aggregate performance (throughput) of the GPU is maximized. To this end, our approach carefully places and caches the input text data and the reference pattern data used for pattern matching in the GPU's on-chip shared memories and texture caches. Furthermore, it schedules the off-chip global memory loads and the shared memory stores so as to minimize the overhead of loading the input data into the shared memories and to minimize shared memory bank conflicts. The proposed approach significantly reduces the effective memory access latencies and yields impressive performance improvements. Experimental results on an Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127 Gbps of throughput and up to a 222-fold speedup over a serial version running on a 2.2 GHz Intel Core 2 Duo processor.
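To make the data placement described above concrete, the sketch below shows one way such a matching kernel could be organized; it is an illustrative assumption, not the authors' actual implementation. Each thread block cooperatively stages its chunk of the input text, plus an overlap of at least the longest pattern length minus one byte, into shared memory with coalesced loads, and each thread then walks a dense AC state-transition table over its own sub-segment. All names and parameters (ac_match_kernel, THREADS_PER_BLOCK, BYTES_PER_THREAD, OVERLAP, pattern_len, and so on) are hypothetical; the paper binds the transition table to the texture cache of a GTX 285, whereas here plain const __restrict__ loads stand in for that read-only path. Host-side construction of the goto/failure automaton, its compilation into a dense table, and the kernel launch are omitted.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Tunable, illustrative parameters (not taken from the paper).
#define THREADS_PER_BLOCK 256   // threads cooperating on one text chunk
#define BYTES_PER_THREAD   32   // sub-segment each thread scans
#define OVERLAP            32   // assumed >= longest pattern length - 1

// Hypothetical matching kernel. next_state is the AC automaton compiled into
// a dense DFA (failure transitions folded in): next_state[state * 256 + byte].
// pattern_len[state] is > 0 if a pattern ends at that state; for brevity the
// sketch ignores additional patterns reachable through output/suffix links.
// Launch with THREADS_PER_BLOCK threads per block and enough blocks to cover
// the input text.
__global__ void ac_match_kernel(const uint8_t* __restrict__ text,
                                size_t text_len,
                                const int* __restrict__ next_state,
                                const int* __restrict__ pattern_len,
                                unsigned long long* match_count)
{
    // One block stages its chunk of text, plus the overlap needed for
    // patterns crossing the chunk boundary, into shared memory.
    __shared__ uint8_t tile[THREADS_PER_BLOCK * BYTES_PER_THREAD + OVERLAP];

    const size_t chunk = (size_t)THREADS_PER_BLOCK * BYTES_PER_THREAD;
    const size_t base  = (size_t)blockIdx.x * chunk;

    // Cooperative, coalesced load: in each round, consecutive threads read
    // consecutive bytes of global memory.
    for (size_t i = threadIdx.x; i < chunk + OVERLAP; i += blockDim.x) {
        const size_t g = base + i;
        tile[i] = (g < text_len) ? text[g] : 0;
    }
    __syncthreads();

    // Each thread walks the DFA over its own sub-segment plus the overlap;
    // starting from state 0 is valid because the overlap guarantees every
    // match beginning in this sub-segment is fully contained in the scan.
    const size_t seg_begin = (size_t)threadIdx.x * BYTES_PER_THREAD;
    const size_t seg_end   = seg_begin + BYTES_PER_THREAD;

    unsigned long long local_hits = 0;
    int state = 0;
    for (size_t i = seg_begin; i < seg_end + OVERLAP && base + i < text_len; ++i) {
        state = next_state[state * 256 + tile[i]];
        const int len = pattern_len[state];
        // Attribute a match to this thread only if it starts inside its own
        // sub-segment; matches starting later are counted by the next thread.
        if (len > 0 && i + 1 - (size_t)len < seg_end)
            ++local_hits;
    }
    if (local_hits)
        atomicAdd(match_count, local_hits);
}
```

Splitting the text into overlapping sub-segments is what lets every thread start the automaton from state 0 independently, at the cost of rescanning OVERLAP bytes per thread; attributing each match to the thread in whose sub-segment it starts avoids double counting across the overlap.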