{"title":"Oregano: staging regular expressions with Moore Cayley fusion","authors":"Jamie Willis, Nicolas Wu, T. Schrijvers","doi":"10.1145/3546189.3549916","DOIUrl":null,"url":null,"abstract":"Regular expressions are a tool for recognising regular languages, historically implemented using derivatives or non-deterministic finite automata. They are convenient for many light-weight parsing workloads, but their traditional formulation only lends them to matching text, not returning fully-structured results. This contrasts with other forms of parsing, where the aim is to extract meaningful data, for example abstract syntax trees. Yet, most regular expression libraries do not support this useful output, and those that do are often slower, and backed by parser combinator libraries. This paper presents Oregano, a redesign of regular expressions to make use of Moore machines as the underlying machinery; this way the regular expression matcher can produce results. We further show how to produce heterogeneous results, providing a classic applicative interface. To make this representation performant, we leverage the relationship between Cayley representations, continuation-passing style, and staged meta-programming to generate performant code for regular expression matching with fully-structured results.","PeriodicalId":118757,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN International Haskell Symposium","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM SIGPLAN International Haskell Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3546189.3549916","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Regular expressions are a tool for recognising regular languages, historically implemented using derivatives or non-deterministic finite automata. They are convenient for many light-weight parsing workloads, but their traditional formulation only lends them to matching text, not returning fully-structured results. This contrasts with other forms of parsing, where the aim is to extract meaningful data, for example abstract syntax trees. Yet, most regular expression libraries do not support this useful output, and those that do are often slower, and backed by parser combinator libraries. This paper presents Oregano, a redesign of regular expressions to make use of Moore machines as the underlying machinery; this way the regular expression matcher can produce results. We further show how to produce heterogeneous results, providing a classic applicative interface. To make this representation performant, we leverage the relationship between Cayley representations, continuation-passing style, and staged meta-programming to generate performant code for regular expression matching with fully-structured results.