Arext: Automatic Regular Expression Testing Tool Based on Generating Strings With Full Coverage

2021 13th International Conference on Knowledge and Systems Engineering (KSE). Pub Date: 2021-11-10. DOI: 10.1109/KSE53942.2021.9648604

N. Hoan, Pham Ngoc Hung

Abstract

This paper introduces AREXT, a testing tool for regular expressions (regexes). AREXT automatically extracts regexes from C++ source files and renders them visually as DFA graphs. Given a regex, AREXT generates a set of positive and negative strings that achieves 100% coverage of nodes, edges, and edge pairs. We leverage prior work on synthesizing regexes from natural language to create benchmarks for evaluating AREXT. Two existing tools, EGRET and MUTREX, are evaluated on the same benchmarks for comparison. Experiments show that AREXT outperforms EGRET, detecting more unexpected synthesized regexes in almost all benchmarks. The evaluation results also indicate that neither strings achieving 100% coverage (generated by AREXT) nor strings achieving the maximum mutation score (generated by MUTREX) are sufficient to ensure the correctness of the regexes under test. Finally, combining AREXT, EGRET, and MUTREX detects a majority of unwanted synthesized regexes (87–91%).
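To make the coverage idea concrete, the following is a minimal sketch of edge-covering string generation over a DFA. It is not AREXT's algorithm, which the abstract does not describe. The DFA (hand-built for the hypothetical regex ab*c), the dictionary layout, and all function names are assumptions made for illustration. For each edge, it joins a shortest path to the edge's source state, the edge symbol, and a shortest path from the edge's target to an accepting state. Negative strings here are simply shortest inputs that stop in a non-accepting state. Edge-pair coverage is not shown.

```python
from collections import deque

# Hypothetical hand-built DFA for the regex ab*c (illustration only,
# not taken from the paper). A missing transition means "reject".
DFA = {
    "start": 0,
    "accepting": {2},
    "transitions": {
        0: {"a": 1},
        1: {"b": 1, "c": 2},
        2: {},
    },
}


def shortest_paths_from(dfa, source):
    """BFS: map every state reachable from `source` to a shortest input reaching it."""
    paths = {source: ""}
    queue = deque([source])
    while queue:
        state = queue.popleft()
        for symbol, target in dfa["transitions"][state].items():
            if target not in paths:
                paths[target] = paths[state] + symbol
                queue.append(target)
    return paths


def shortest_suffix_to_accept(dfa, state):
    """Return a shortest input leading from `state` to an accepting state, or None."""
    paths = shortest_paths_from(dfa, state)
    candidates = [paths[s] for s in dfa["accepting"] if s in paths]
    return min(candidates, key=len) if candidates else None


def edge_covering_strings(dfa):
    """Positive strings that together traverse every edge able to reach acceptance."""
    prefixes = shortest_paths_from(dfa, dfa["start"])
    positives = set()
    for state, edges in dfa["transitions"].items():
        if state not in prefixes:
            continue  # unreachable state: no input can exercise its edges
        for symbol, target in edges.items():
            suffix = shortest_suffix_to_accept(dfa, target)
            if suffix is not None:
                positives.add(prefixes[state] + symbol + suffix)
    return sorted(positives)


def negative_strings(dfa):
    """Shortest inputs that end in a reachable non-accepting state."""
    prefixes = shortest_paths_from(dfa, dfa["start"])
    return sorted(p for s, p in prefixes.items() if s not in dfa["accepting"])


def accepts(dfa, text):
    """Run the DFA on `text`; a missing transition rejects immediately."""
    state = dfa["start"]
    for symbol in text:
        state = dfa["transitions"][state].get(symbol)
        if state is None:
            return False
    return state in dfa["accepting"]


if __name__ == "__main__":
    print("positive:", edge_covering_strings(DFA))
    print("negative:", negative_strings(DFA))
```

In this sketch the generated strings would be run against the regex under test: a regex that rejects a positive string or accepts a negative one differs from the intended language. As the abstract notes, passing such a coverage-complete set does not by itself establish that a regex is correct.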
