Jonathan Hillblom, Johan Garcia, Anders Waldenborg
Published in: 2021 12th International Conference on Network of the Future (NoF), 2021-10-06
DOI: 10.1109/NoF52522.2021.9609828 (https://doi.org/10.1109/NoF52522.2021.9609828)
Building Efficient Regular Expression Matchers Through GA Optimization With ML Surrogates
Important network functions such as traffic classification and intrusion detection often depend on high-throughput regular expression matching. To achieve high performance, regular expressions can be represented as state machines, which are then merged. However, determining which individual state machines should ideally be merged together is a challenging optimization problem. We address this problem by using genetic algorithms with novel problem-specific operators. To allow large-scale evaluation of the new operators, we devise two ML-based surrogate models for the expensive fitness evaluation function. Our results from a set of production-scale regular expressions show that using the most appropriate operations provides large gains over a naive baseline, but also that no universally best combination of operators exists. We provide some insights into which operators perform best for different objectives, and show the variation between TCP- and UDP-specific regular expressions.
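
The abstract does not specify the operators or the surrogate models, so the following is only a rough sketch of the general idea: a genetic algorithm searches over assignments of regular expressions to merge groups, and a cheap cost function stands in for the expensive fitness evaluation. Everything here is an assumption made for illustration. The names `surrogate_cost`, `evolve`, and `sizes` are hypothetical, the cost formula is hand-written rather than a trained ML model, and uniform crossover with random reassignment replaces the paper's problem-specific operators.

```python
import random
from typing import List

Assignment = List[int]  # assignment[i] is the merge group of regex i


def surrogate_cost(assignment: Assignment, sizes: List[float], n_groups: int) -> float:
    """Hypothetical stand-in for a learned surrogate (lower is better).

    Assumes merged state machines grow quadratically with the summed size of
    their members, and that every non-empty group adds a fixed matching cost.
    """
    totals = [0.0] * n_groups
    for regex_idx, group in enumerate(assignment):
        totals[group] += sizes[regex_idx]
    merged_size_penalty = sum(t * t for t in totals)
    per_group_penalty = 50.0 * sum(1 for t in totals if t > 0)
    return merged_size_penalty + per_group_penalty


def evolve(sizes: List[float], n_groups: int, pop_size: int = 30,
           generations: int = 40, mutation_rate: float = 0.1,
           seed: int = 0) -> Assignment:
    """Generic elitist GA over group assignments (not the paper's operators)."""
    rng = random.Random(seed)
    n = len(sizes)

    def cost(a: Assignment) -> float:
        return surrogate_cost(a, sizes, n_groups)

    # Start from random assignments plus the naive "merge everything" baseline.
    population = [[rng.randrange(n_groups) for _ in range(n)]
                  for _ in range(pop_size - 1)]
    population.append([0] * n)

    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally move a regex to a random group.
            child = [rng.randrange(n_groups) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children

    return min(population, key=cost)


if __name__ == "__main__":
    example_sizes = [5.0, 3.0, 8.0, 2.0, 7.0, 4.0]  # made-up per-regex sizes
    best = evolve(example_sizes, n_groups=3)
    print(best, surrogate_cost(best, example_sizes, 3))
```

Because the baseline assignment is seeded into the initial population and the better half of each generation survives unchanged, the returned grouping never scores worse than merging everything into one group under this toy cost. In a setting like the one the abstract describes, `surrogate_cost` would be replaced by a trained model that predicts the outcome of the real, expensive evaluation.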