Building Efficient Regular Expression Matchers Through GA Optimization With ML Surrogates
Jonathan Hillblom, Johan Garcia, Anders Waldenborg
2021 12th International Conference on Network of the Future (NoF), published 2021-10-06
DOI: 10.1109/NoF52522.2021.9609828 (https://doi.org/10.1109/NoF52522.2021.9609828)
Citations: 0
Abstract
Important network functions such as traffic classification and intrusion detection often depend on high-throughput regular expression matching. To achieve high performance, regular expressions can be represented as state machines, which are then merged. However, determining which individual state machines should ideally be merged together is a challenging optimization problem. We address this problem by using genetic algorithms with novel problem-specific operators. To allow large-scale evaluation of the new operators, we devise two ML-based surrogate models for the expensive fitness evaluation function. Our results from a set of production-scale regular expressions show that using the most appropriate operations provides large gains over a naive baseline, but also that no universally best combination of operators exists. We provide some insights into which operators perform best for different objectives, and show the variation between TCP- and UDP-specific regular expressions.
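To make the optimization setting concrete, the following is a minimal sketch of a genetic algorithm that assigns regular expressions to merge groups and scores candidates with a cheap stand-in for the expensive fitness function. It is an illustration under stated assumptions, not the authors' method: the cost formula in `surrogate_cost`, the `sizes` input (a made-up per-expression state count), and the generic crossover and mutation steps are all hypothetical. The paper's problem-specific operators and its two ML-based surrogate models are not described in the abstract and are not reproduced here.

```python
import random


def surrogate_cost(assignment, sizes, per_group_overhead=50.0):
    """Hypothetical cheap stand-in for the expensive fitness evaluation.

    assignment[i] is the merge group of regular expression i, and sizes[i]
    is an assumed state count for that expression on its own.
    """
    groups = {}
    for regex_idx, group_id in enumerate(assignment):
        groups.setdefault(group_id, []).append(sizes[regex_idx])
    cost = 0.0
    for members in groups.values():
        # Toy assumption: a merged machine grows superlinearly with its
        # members, while every extra group adds a fixed matching overhead.
        cost += sum(members) ** 1.5 + per_group_overhead
    return cost


def evolve(sizes, n_groups, pop_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Search for a low-cost grouping with a plain generational GA."""
    rng = random.Random(seed)
    n = len(sizes)
    population = [[rng.randrange(n_groups) for _ in range(n)]
                  for _ in range(pop_size)]
    # Seed the naive baseline (everything merged into one group) so the
    # result can never be worse than it under this cost model.
    population[0] = [0] * n

    for _ in range(generations):
        population.sort(key=lambda ind: surrogate_cost(ind, sizes))
        survivors = population[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            # Generic uniform crossover: pick each gene from either parent.
            child = [ga if rng.random() < 0.5 else gb
                     for ga, gb in zip(parent_a, parent_b)]
            # Generic mutation: occasionally move an expression to a random group.
            child = [rng.randrange(n_groups) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children

    return min(population, key=lambda ind: surrogate_cost(ind, sizes))


if __name__ == "__main__":
    example_sizes = [12, 40, 7, 33, 21, 5, 18, 26]  # made-up state counts
    best = evolve(example_sizes, n_groups=3)
    print("grouping:", best)
    print("surrogate cost:", surrogate_cost(best, example_sizes))
    print("baseline cost:", surrogate_cost([0] * len(example_sizes), example_sizes))
```

In a realistic setting the body of `surrogate_cost` would be replaced by a trained model's prediction, with the true merge-and-measure evaluation reserved for checking the final candidates.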