GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Published 2022-05-01. DOI: 10.1109/ipdps53621.2022.00053

Yuguang Wang, Robbie Watling, Junqiao Qiu, Zhenlin Wang


Abstract

Finite State Machines (FSMs) play a critical role in many real-world applications, ranging from pattern matching to network security. In recent years, significant research efforts have been made to accelerate FSM computations on different parallel platforms, including multicores, GPUs, and DRAM-based accelerators. One popular direction is speculation-centric parallelization. Despite the abundance of such work and its promising results, the benefits of speculation-centric FSM parallelization on GPUs depend heavily on high speculation accuracy and are greatly limited by inefficient sequential recovery. Inspired by the speculative data forwarding used in Thread-Level Speculation (TLS), this work addresses these bottlenecks by introducing speculative recovery with two thread-scheduling heuristics, which can effectively remove redundant computations and increase GPU thread utilization. To maximize the performance of running FSMs on GPUs, this work integrates different speculative parallelization schemes into a latency-sensitive framework, GSpecPal, together with a scheme selector that aims to automatically configure the optimal GPU-based parallelization for a given FSM. Evaluation on a set of real-world FSMs with diverse characteristics confirms the effectiveness of GSpecPal. Experimental results show that GSpecPal can obtain a 7.2× speedup on average (up to 20×) over the state of the art on an Nvidia GeForce RTX 3090 GPU.
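To make the contrast between sequential and speculative recovery concrete, here is a minimal CPU-only sketch in Python. It is not GSpecPal's implementation. The GPU threads are simulated by plain loops, and the toy DFA, the lookback guess heuristic, and every function name (`run`, `speculate`, `sequential_recovery`, `forwarding_recovery`) are hypothetical and chosen only for illustration. `forwarding_recovery` is our own reading of "speculative recovery inspired by TLS data forwarding": all mispredicted chunks re-execute in the same round, starting from their predecessor's current, possibly still speculative, end state.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical DFA encoding: table[state][symbol] -> next state.
Table = Dict[int, Dict[str, int]]


def run(table: Table, state: int, chunk: str) -> int:
    """Run the FSM sequentially over one chunk and return the end state."""
    for sym in chunk:
        state = table[state][sym]
    return state


def speculate(table: Table, start: int, chunks: Sequence[str],
              lookback: int = 2) -> Tuple[List[int], List[int]]:
    """Phase 1: guess each chunk's start state, then run every chunk.

    On a GPU each chunk would map to its own thread; here the loop is serial.
    The guess for chunk i (assumed heuristic) runs the last `lookback` symbols
    of chunk i-1 from state 0.
    """
    starts = [start] + [run(table, 0, chunks[i - 1][-lookback:])
                        for i in range(1, len(chunks))]
    ends = [run(table, s, c) for s, c in zip(starts, chunks)]
    return starts, ends


def sequential_recovery(table: Table, chunks: Sequence[str],
                        starts: List[int], ends: List[int]) -> Tuple[int, int]:
    """Baseline: verify chunks left to right and re-run each misprediction
    from its predecessor's now-correct end state.
    Returns (final state, number of serial chunk re-executions)."""
    starts, ends = list(starts), list(ends)
    reruns = 0
    for i in range(1, len(chunks)):
        if starts[i] != ends[i - 1]:
            starts[i] = ends[i - 1]
            ends[i] = run(table, starts[i], chunks[i])
            reruns += 1
    return ends[-1], reruns


def forwarding_recovery(table: Table, chunks: Sequence[str],
                        starts: List[int], ends: List[int]) -> Tuple[int, int]:
    """Speculative recovery (illustrative reading, not the paper's code):
    in each round every mismatched chunk re-runs concurrently from its
    predecessor's current end state; rounds repeat until all boundaries agree.
    Returns (final state, number of rounds)."""
    starts, ends = list(starts), list(ends)
    rounds = 0
    while True:
        bad = [i for i in range(1, len(chunks)) if starts[i] != ends[i - 1]]
        if not bad:
            return ends[-1], rounds
        rounds += 1
        # Snapshot forwarded states first so the round behaves like
        # concurrent threads that all read the previous round's results.
        forwarded = {i: ends[i - 1] for i in bad}
        for i in bad:
            starts[i] = forwarded[i]
            ends[i] = run(table, starts[i], chunks[i])


if __name__ == "__main__":
    # Toy 3-state DFA over {a, b}; state 2 means "just saw 'ab'".
    table = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}
    text = "abbaabab" * 4
    chunks = [text[i:i + 4] for i in range(0, len(text), 4)]
    starts, ends = speculate(table, 0, chunks, lookback=1)
    expected = run(table, 0, text)
    print(sequential_recovery(table, chunks, starts, ends)[0] == expected)  # prints True
    print(forwarding_recovery(table, chunks, starts, ends)[0] == expected)  # prints True
```

Both recovery routines reach the same final state as a purely sequential run. The earliest chunk whose start is still wrong always has a correct predecessor, so each forwarding round fixes at least one more chunk, and the loop ends within `len(chunks) - 1` rounds. Forwarding is not guaranteed to need fewer rounds than the baseline's serial re-executions on every input. Its appeal is that each round keeps many threads busy, and for FSMs whose states converge quickly, a forwarded speculative state is often already correct.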

