Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2020-02-01. DOI: 10.1109/HPCA47549.2020.00017

Elaheh Sadredini, Reza Rahimi, Marzieh Lenjani, M. Stan, K. Skadron


Citations: 25

Abstract

High-throughput, concurrent processing of thousands of patterns on each byte of an input stream is critical for many applications with real-time processing needs, such as network intrusion detection, spam filters, and virus scanners. The demand for accelerated pattern matching has motivated several recent in-memory accelerator architectures for automata processing, which is an efficient computation model for pattern matching. Our key observations are: (1) all these architectures are based on 8-bit symbol processing (derived from ASCII), and our analysis of a large set of real-world automata benchmarks reveals that 8-bit processing dramatically underutilizes hardware resources, and (2) multi-stride symbol processing, a major source of throughput growth, is not explored in the existing in-memory solutions. This paper presents Impala, a multi-stride in-memory automata processing architecture that builds on these observations. The key insight of our work is that transforming 8-bit processing to 4-bit processing exponentially reduces hardware resources for state-matching and improves resource utilization. This, in turn, allows a denser design that can use more memory columns to process multiple symbols per cycle with only a linear increase in state-matching resources. Impala thus introduces three-fold area, throughput, and energy benefits at the expense of increased offline compilation time. Our empirical evaluations on a wide range of automata benchmarks reveal that Impala has on average 2.7X (up to 3.7X) higher throughput per unit area and 1.22X lower power consumption than Cache Automaton, which is the best performing prior work.
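
The "exponential reduction" and "linear increase" claims in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes a one-hot state-matching layout in which each state stores one bit per possible symbol value (2^b bits for a b-bit symbol), and it counts only match bits per state, ignoring any extra states the 8-bit to 4-bit transformation may introduce. The function names are hypothetical.

```python
def match_bits_per_state(symbol_bits: int, symbols_per_cycle: int = 1) -> int:
    """Match bits for one state under an assumed one-hot layout:
    2**symbol_bits bits per symbol column, one column per symbol in the stride."""
    return symbols_per_cycle * (2 ** symbol_bits)


def split_into_nibbles(data: bytes) -> list[int]:
    """Re-express a byte stream as 4-bit symbols, high nibble first."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # upper 4 bits
        nibbles.append(byte & 0x0F)  # lower 4 bits
    return nibbles


# One byte per cycle as a single 8-bit symbol.
bits_8bit = match_bits_per_state(8)              # 256
# The same byte per cycle as two 4-bit symbols.
bits_two_nibbles = match_bits_per_state(4, 2)    # 32
# Doubling the stride to four 4-bit symbols (16 input bits per cycle).
bits_four_nibbles = match_bits_per_state(4, 4)   # 64
# For contrast: 16 input bits per cycle treated as one wide symbol.
bits_16bit = match_bits_per_state(16)            # 65536

print(bits_8bit, bits_two_nibbles, bits_four_nibbles, bits_16bit)
print(split_into_nibbles(b"\x4a\xf0"))
```

Under these assumptions, halving the symbol width shrinks each column from 256 to 16 bits, while each additional 4-bit symbol per cycle adds only another 16 bits per state.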

