A Dichotomy for Regular Expression Membership Testing

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) | Pub Date: 2016-11-03 | DOI: 10.1109/FOCS.2017.36

K. Bringmann, A. Jørgensen, Kasper Green Larsen


Citations: 47

Abstract



We study regular expression membership testing: Given a regular expression of size m and a string of size n, decide whether the string is in the language described by the regular expression. Its classic O(nm) algorithm is one of the big success stories of the 70s, which allowed pattern matching to develop into the standard tool that it is today.

Many special cases of pattern matching have been studied that can be solved faster than in quadratic time. However, a systematic study of tractable cases was made possible only recently, with the first conditional lower bounds reported by Backurs and Indyk [FOCS16]. Restricted to any type of homogeneous regular expressions of depth 2 or 3, they either presented a near-linear time algorithm or a quadratic conditional lower bound, with one exception known as the Word Break problem.

In this paper we complete their work as follows:

• We present two almost-linear time algorithms that generalize all known almost-linear time algorithms for special cases of regular expression membership testing.
• We classify all types, except for the Word Break problem, into almost-linear time or quadratic time assuming the Strong Exponential Time Hypothesis. This extends the classification from depth 2 and 3 to any constant depth.
• For the Word Break problem we give an improved O(nm^{1/3} + m) algorithm. Surprisingly, we also prove a matching conditional lower bound for combinatorial algorithms. This establishes Word Break as the only intermediate problem.

In total, we prove matching upper and lower bounds for any type of bounded-depth homogeneous regular expressions, which yields a full dichotomy for regular expression membership testing.
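For readers unfamiliar with the Word Break problem named in the abstract, the following is a minimal sketch of the textbook dynamic program for it, assuming the usual formulation: given a string s and a dictionary of words, decide whether s is a concatenation of one or more dictionary words, i.e. whether s matches a pattern of the form (w1|w2|...|wk)+. This is a simple baseline for illustration only and is not the O(nm^{1/3} + m) algorithm from the paper. The function name `word_break` and its parameters are hypothetical.

```python
def word_break(s: str, words: list[str]) -> bool:
    """Return True if s is a concatenation of one or more dictionary words.

    Textbook dynamic program (illustrative baseline, NOT the paper's
    improved algorithm).
    """
    if not s:
        # A pattern of the form (w1|...|wk)+ needs at least one word.
        return False

    dictionary = {w for w in words if w}          # ignore empty words
    lengths = sorted({len(w) for w in dictionary})  # distinct word lengths

    n = len(s)
    # reachable[i] is True when the prefix s[:i] splits into dictionary words.
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix is trivially split

    for i in range(1, n + 1):
        for length in lengths:
            if length > i:
                break  # lengths are sorted, so longer words cannot fit
            # Extend a reachable prefix by one dictionary word ending at i.
            if reachable[i - length] and s[i - length:i] in dictionary:
                reachable[i] = True
                break

    return reachable[n]
```

For instance, `word_break("abcab", ["ab", "c"])` holds because the string splits as ab, c, ab, while `word_break("abca", ["ab", "c"])` does not, since the trailing "a" is not covered by any word.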
