RegexClassifier: A GNN-Based Recognition Method for State-Explosive Regular Expressions
Authors: Yuhai Lu, Xiaolin Wang, Fangfang Yuan, Cong Cao, Xiaoliang Zhang, Yanbing Liu
Published in: 2023 IEEE Symposium on Computers and Communications (ISCC), 2023-07-09
DOI: https://doi.org/10.1109/ISCC58397.2023.10218248
Citation count: 0
Abstract
Regular expression (regex) matching is widely used in many applications. Because it offers low time complexity and stable performance, the Deterministic Finite Automaton (DFA) has become the first choice for fast regex matching. However, DFAs suffer from the state explosion problem: the number of DFA states may grow exponentially when certain regexes are compiled to a DFA, and the resulting memory consumption restricts practical use. Many works have addressed the DFA state explosion problem; however, none has met the requirements of both fast recognition and a small memory image. In this paper, we propose RegexClassifier to recognize state-explosive regexes intelligently and efficiently. It first transforms regexes into Non-deterministic Finite Automata (NFAs), then uses Graph Neural Network (GNN) models to classify the NFAs in order to recognize regexes that may cause DFA state explosion. Experiments on typical rule sets show that the proposed model achieves a classification accuracy of up to 98%.
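The state explosion problem described in the abstract can be illustrated with a textbook example that is not taken from the paper: the regex (a|b)*a(a|b){n-1}, which accepts strings whose n-th symbol from the end is 'a'. Its NFA needs only n+1 states, while the equivalent DFA needs 2^n. The minimal sketch below hand-builds that NFA and counts the DFA states reachable through the standard subset construction. The function names `build_nfa` and `count_dfa_states` are hypothetical and chosen only for this illustration; this is not RegexClassifier's implementation.

```python
from collections import deque


def build_nfa(n):
    """Transition table of an NFA for (a|b)*a(a|b){n-1}.

    States are 0..n; state 0 is the start state and state n is accepting.
    The table maps (state, symbol) to the set of successor states.
    """
    # State 0 loops on every symbol and "guesses" that an 'a' is the
    # n-th symbol from the end by also moving to state 1.
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    # States 1..n-1 simply count the remaining symbols.
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def count_dfa_states(delta, start=0, alphabet="ab"):
    """Count DFA states reachable via subset construction (no epsilon moves)."""
    start_set = frozenset({start})
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # The DFA successor is the union of all NFA successors.
            nxt = frozenset(
                t for s in current for t in delta.get((s, symbol), ())
            )
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        nfa_states = n + 1
        dfa_states = count_dfa_states(build_nfa(n))
        print(f"n={n}: NFA states={nfa_states}, DFA states={dfa_states}")
```

Running the subset construction is itself the expensive step for such regexes, which is the cost that predicting explosiveness directly from the NFA graph, as the abstract proposes, would avoid.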