Arext: Automatic Regular Expression Testing Tool Based on Generating Strings With Full Coverage
N. Hoan, Pham Ngoc Hung
2021 13th International Conference on Knowledge and Systems Engineering (KSE), published 2021-11-10
DOI: 10.1109/KSE53942.2021.9648604
Abstract
This paper introduces AREXT, a testing tool for regular expressions (regexes). AREXT automatically extracts regexes from C++ source files and visualizes them as DFA graphs. Given a regex, AREXT generates a set of positive and negative strings that achieves 100% node, edge, and edge-pair coverage. To build benchmarks for evaluating AREXT, we leverage prior work on synthesizing regexes from natural language. We also evaluate and compare two existing tools, EGRET and MUTREX. Experiments show that AREXT outperforms EGRET, detecting more unexpected synthesized regexes in almost all benchmarks. The results also indicate that neither strings with 100% coverage (generated by AREXT) nor strings with the maximum mutation score (generated by MUTREX) are sufficient to ensure the correctness of the regexes under test. Finally, experiments show that combining AREXT, EGRET, and MUTREX detects most (87–91%) of the unwanted synthesized regexes.
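The abstract does not say how AREXT derives its strings, so the following is only a minimal sketch, under our own assumptions, of one straightforward way to obtain positive and negative strings with full node, edge, and edge-pair coverage of a DFA. It assumes the regex has already been compiled into a complete DFA (every state has a transition on every symbol, with a dead state where needed). The dict layout and the names `run`, `bfs_words`, `completion`, and `coverage_strings` are hypothetical. For each item to cover, the sketch takes a shortest word that reaches it from the start state. It then appends the shortest continuation to an accepting state, which yields a positive string, and the shortest continuation to a rejecting state, which yields a negative string.

```python
from collections import deque

# Hypothetical DFA encoding (an assumption, not AREXT's format): a complete DFA
# with a start state, a set of accepting states, an alphabet, and a total
# transition map delta[(state, symbol)] -> state.


def run(dfa, state, word):
    """Follow `word` from `state` and return the state reached."""
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state


def bfs_words(dfa, source):
    """Breadth-first search: a shortest word from `source` to each reachable state."""
    words = {source: ""}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for symbol in sorted(dfa["alphabet"]):
            t = dfa["delta"][(s, symbol)]
            if t not in words:
                words[t] = words[s] + symbol
                queue.append(t)
    return words


def completion(dfa, state, accept):
    """Shortest word leading from `state` to an accepting (accept=True) or
    rejecting (accept=False) state, or None if no such state is reachable."""
    words = bfs_words(dfa, state)
    candidates = [w for s, w in words.items() if (s in dfa["accept"]) == accept]
    return min(candidates, key=lambda w: (len(w), w)) if candidates else None


def coverage_strings(dfa):
    """Return (positives, negatives) that together cover every reachable node,
    edge, and edge pair (two consecutive edges) of a complete DFA."""
    prefix = bfs_words(dfa, dfa["start"])  # reachable states only
    symbols = sorted(dfa["alphabet"])
    # Each target word visits the item to cover when run from the start state.
    targets = list(prefix.values())                              # nodes
    targets += [prefix[p] + a for p in prefix for a in symbols]  # edges
    targets += [prefix[p] + a + b                                # edge pairs
                for p in prefix for a in symbols for b in symbols]
    positives, negatives = set(), set()
    for w in targets:
        end = run(dfa, dfa["start"], w)
        to_accept = completion(dfa, end, True)
        to_reject = completion(dfa, end, False)
        if to_accept is not None:
            positives.add(w + to_accept)
        if to_reject is not None:
            negatives.add(w + to_reject)
    order = lambda w: (len(w), w)
    return sorted(positives, key=order), sorted(negatives, key=order)


# Example: complete DFA for the regex a+b over {a, b}; state 3 is a dead state.
dfa = {
    "start": 0,
    "accept": {2},
    "alphabet": {"a", "b"},
    "delta": {(0, "a"): 1, (0, "b"): 3, (1, "a"): 1, (1, "b"): 2,
              (2, "a"): 3, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3},
}
pos, neg = coverage_strings(dfa)
print(pos)  # ['ab', 'aab', 'aaab']
print(neg)
```

Because the DFA is complete, the end state of every target word is either accepting or rejecting. At least one of the two completions is therefore the empty word, so every node, edge, and edge pair is exercised by some generated string. As the abstract notes, full coverage of this kind is still not enough on its own to expose every incorrect regex.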