SIMD-accelerated regular expression matching

Evangelia A. Sitaridi, Orestis Polychroniou, K. A. Ross
{"title":"SIMD-accelerated regular expression matching","authors":"Evangelia A. Sitaridi, Orestis Polychroniou, K. A. Ross","doi":"10.1145/2933349.2933357","DOIUrl":null,"url":null,"abstract":"String processing tasks are common in analytical queries powering business intelligence. Besides substring matching, provided in SQL by the like operator, popular DBMSs also support regular expressions as selective filters. Substring matching can be optimized by using specialized SIMD instructions on mainstream CPUs, reaching the performance of numeric column scans. However, generic regular expressions are harder to evaluate, being dependent on both the DFA size and the irregularity of the input. Here, we optimize matching string columns against regular expressions using SIMD-vectorized code. Our approach avoids accessing the strings in lockstep without branching, to exploit cases when some strings are accepted or rejected early by looking at the first few characters. On common string lengths, our implementation is up to 2X faster than scalar code on a mainstream CPU and up to 5X faster on the Xeon Phi co-processor, improving regular expression support in DBMSs.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Data Management on New Hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2933349.2933357","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

String processing tasks are common in analytical queries powering business intelligence. Besides substring matching, provided in SQL by the like operator, popular DBMSs also support regular expressions as selective filters. Substring matching can be optimized by using specialized SIMD instructions on mainstream CPUs, reaching the performance of numeric column scans. However, generic regular expressions are harder to evaluate, being dependent on both the DFA size and the irregularity of the input. Here, we optimize matching string columns against regular expressions using SIMD-vectorized code. Our approach avoids accessing the strings in lockstep without branching, to exploit cases when some strings are accepted or rejected early by looking at the first few characters. On common string lengths, our implementation is up to 2X faster than scalar code on a mainstream CPU and up to 5X faster on the Xeon Phi co-processor, improving regular expression support in DBMSs.
simd加速正则表达式匹配
字符串处理任务在支持业务智能的分析查询中很常见。除了SQL中由like操作符提供的子字符串匹配之外,流行的dbms还支持正则表达式作为选择性过滤器。可以通过在主流cpu上使用专门的SIMD指令来优化子字符串匹配,从而达到数字列扫描的性能。然而,通用正则表达式更难求值,这取决于DFA大小和输入的不规则性。在这里,我们使用simd矢量化代码针对正则表达式优化匹配字符串列。我们的方法避免在没有分支的情况下同步访问字符串,以便通过查看前几个字符来利用某些字符串被接受或拒绝的情况。对于常见的字符串长度,我们的实现比主流CPU上的标量代码快2倍,在Xeon Phi协处理器上快5倍,从而改进了dbms中的正则表达式支持。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信