Learning Regexes to Extract Network Names from Hostnames
M. Luckie, Alexander Marder, B. Huffaker, K. Claffy
Proceedings of the 16th Asian Internet Engineering Conference, 14 December 2021. DOI: 10.1145/3497777.3498545
We present the design, implementation, evaluation, and validation of a system that automatically learns regular expressions (regexes) to extract network names from Internet hostnames assigned by operators using their own conventions. Our fully automated method does not rely on a human to provide a starting regex, labeled examples of valid extractions, or a dictionary of network names. Our method first learns the dictionary of network names, and then automatically generates and evaluates regexes that extract these names. We validate our dictionary against ground truth, finding that 97.3% of the names our regexes extract are valid names for the networks.
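The abstract does not say how candidate regexes are generated or scored. The sketch below is only a toy illustration of the general idea, and everything in it is a hypothetical assumption rather than the authors' method: the operator suffix `example.net`, the sample hostnames, a network-name dictionary that has already been learned, the rule of proposing one regex per hostname label position, and a simple "true positives minus false positives" score. Candidates are proposed from the hostnames themselves. Each candidate is then judged by whether its extractions appear in the dictionary.

```python
import re

# Hypothetical inputs: hostnames under one operator's suffix, and a
# dictionary of network names assumed to have been learned already.
SUFFIX = "example.net"
HOSTNAMES = [
    "xe-0-0-1.acme.lon1.example.net",
    "xe-1-2-0.globex.par2.example.net",
    "ae3.initech.fra1.example.net",
    "et-0-0-0.core.lon1.example.net",
]
DICTIONARY = {"acme", "globex", "initech"}


def candidate_regexes(hostname: str, suffix: str) -> set[str]:
    """Propose one regex per label position (hypothetical generation rule).

    Each regex matches the same number of labels before the suffix and
    captures the label at one position, if it is purely alphabetic.
    """
    prefix = hostname[: -len(suffix) - 1]  # drop "." + suffix
    labels = prefix.split(".")
    cands = set()
    for i in range(len(labels)):
        parts = [r"[^.]+"] * len(labels)
        parts[i] = r"([a-z]+)"  # capture group for the candidate name
        cands.add("^" + r"\.".join(parts) + r"\." + re.escape(suffix) + "$")
    return cands


def evaluate(regex: str, hostnames, dictionary) -> tuple[int, int]:
    """Count extractions that are (TP) or are not (FP) in the dictionary."""
    tp = fp = 0
    pattern = re.compile(regex)
    for h in hostnames:
        m = pattern.match(h)
        if m is None:
            continue  # regex does not apply to this hostname
        if m.group(1) in dictionary:
            tp += 1
        else:
            fp += 1
    return tp, fp


def best_regex(hostnames, suffix, dictionary):
    """Generate candidates from all hostnames and keep the best-scoring one."""
    cands = set()
    for h in hostnames:
        if h.endswith("." + suffix):
            cands |= candidate_regexes(h, suffix)
    scored = {r: evaluate(r, hostnames, dictionary) for r in cands}
    # Hypothetical score: TP minus FP; sorting first makes ties deterministic.
    best = max(sorted(scored), key=lambda r: scored[r][0] - scored[r][1])
    return best, scored


if __name__ == "__main__":
    regex, scores = best_regex(HOSTNAMES, SUFFIX, DICTIONARY)
    print(regex, scores[regex])
```

On these toy inputs, the regex that captures the second label wins. It extracts three dictionary names and one non-name (`core`). The other two candidates extract nothing, because those labels contain digits or dashes. A real system would need richer candidate shapes and a more careful scoring rule than this sketch.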