Multi-modal synthesis of regular expressions

Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, Işıl Dillig
{"title":"Multi-modal synthesis of regular expressions","authors":"Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, Işıl Dillig","doi":"10.1145/3385412.3385988","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a multi-modal synthesis technique for automatically constructing regular expressions (regexes) from a combination of examples and natural language. Using multiple modalities is useful in this context because natural language alone is often highly ambiguous, whereas examples in isolation are often not sufficient for conveying user intent. Our proposed technique first parses the English description into a so-called hierarchical sketch that guides our programming-by-example (PBE) engine. Since the hierarchical sketch captures crucial hints, the PBE engine can leverage this information to both prioritize the search as well as make useful deductions for pruning the search space. We have implemented the proposed technique in a tool called Regel and evaluate it on over three hundred regexes. Our evaluation shows that Regel achieves 80 % accuracy whereas the NLP-only and PBE-only baselines achieve 43 % and 26 % respectively. We also compare our proposed PBE engine against an adaptation of AlphaRegex, a state-of-the-art regex synthesis tool, and show that our proposed PBE engine is an order of magnitude faster, even if we adapt the search algorithm of AlphaRegex to leverage the sketch. Finally, we conduct a user study involving 20 participants and show that users are twice as likely to successfully come up with the desired regex using Regel compared to without it.","PeriodicalId":20580,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"61","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385412.3385988","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 61

Abstract

In this paper, we propose a multi-modal synthesis technique for automatically constructing regular expressions (regexes) from a combination of examples and natural language. Using multiple modalities is useful in this context because natural language alone is often highly ambiguous, whereas examples in isolation are often not sufficient for conveying user intent. Our proposed technique first parses the English description into a so-called hierarchical sketch that guides our programming-by-example (PBE) engine. Since the hierarchical sketch captures crucial hints, the PBE engine can leverage this information to both prioritize the search as well as make useful deductions for pruning the search space. We have implemented the proposed technique in a tool called Regel and evaluate it on over three hundred regexes. Our evaluation shows that Regel achieves 80 % accuracy whereas the NLP-only and PBE-only baselines achieve 43 % and 26 % respectively. We also compare our proposed PBE engine against an adaptation of AlphaRegex, a state-of-the-art regex synthesis tool, and show that our proposed PBE engine is an order of magnitude faster, even if we adapt the search algorithm of AlphaRegex to leverage the sketch. Finally, we conduct a user study involving 20 participants and show that users are twice as likely to successfully come up with the desired regex using Regel compared to without it.
正则表达式的多模态合成
在本文中,我们提出了一种多模态合成技术,用于从示例和自然语言的组合中自动构造正则表达式(regexes)。在这种情况下,使用多种模态是有用的,因为自然语言本身往往是高度模糊的,而孤立的例子往往不足以传达用户意图。我们提出的技术首先将英文描述解析为一个所谓的分层草图,该草图指导我们的示例编程(PBE)引擎。由于分层草图捕获了关键提示,因此PBE引擎可以利用这些信息来确定搜索的优先级,并为修剪搜索空间做出有用的推断。我们已经在一个名为Regel的工具中实现了所提出的技术,并在超过300个regex上对其进行了评估。我们的评估表明,Regel达到80%的准确率,而nlp和pbe基线分别达到43%和26%。我们还将我们提出的PBE引擎与AlphaRegex(一种最先进的正则表达式合成工具)的适配进行了比较,并表明我们提出的PBE引擎要快一个数量级,即使我们调整了AlphaRegex的搜索算法来利用草图。最后,我们进行了一项涉及20名参与者的用户研究,结果表明,使用Regel的用户成功找到所需正则表达式的可能性是不使用Regel的两倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信