Junlang Zhan, X. Liao, Yukun Bao, Lu Gan, Zhiwen Tan, Mengxue Zhang, Ruan He, Jialiang Lu
Proceedings of the ACM Turing Celebration Conference - China, published 2019-05-17. DOI: 10.1145/3321408.3321568 (https://doi.org/10.1145/3321408.3321568)
An effective feature representation of web log data by leveraging byte pair encoding and TF-IDF
Web log data analysis is important in intrusion detection, and various machine learning techniques have been applied to it. However, compared with the abundant research on machine learning itself, ways to extract features from log data remain underexplored. In this paper, we present an effective feature extraction approach that leverages Byte Pair Encoding (BPE) and Term Frequency-Inverse Document Frequency (TF-IDF). We have applied this approach to various downstream machine learning algorithms and demonstrated its usefulness.
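To make the idea concrete, here is a minimal sketch of how BPE tokens might be combined with TF-IDF weights to turn log lines into feature vectors. It is not the authors' implementation: the abstract does not state the number of merges, the granularity at which BPE is learned, or the TF-IDF variant. The function names (`learn_bpe`, `apply_merge`, `tokenize`, `tfidf`), the character-level start, the smoothed IDF formula, and the sample log lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(lines, num_merges):
    """Learn up to `num_merges` BPE merges, starting from single characters.

    Assumption: each log line is treated as one character sequence.
    """
    seqs = [list(line) for line in lines]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # no pair repeats, so further merges add nothing
            break
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges


def tokenize(line, merges):
    """Split a log line into BPE tokens by replaying the learned merges in order."""
    seq = list(line)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def tfidf(token_lists):
    """Return (vocab, vectors) with one TF-IDF vector per tokenized line.

    Assumption: tf is the relative frequency within the line and
    idf = ln((1 + N) / (1 + df)) + 1, one common smoothed variant.
    """
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical request lines, not data from the paper.
logs = [
    "GET /index.html?id=1",
    "GET /index.html?id=2",
    "GET /search?q=' OR 1=1 --",
]
merges = learn_bpe(logs, num_merges=20)
tokens = [tokenize(line, merges) for line in logs]
vocab, vectors = tfidf(tokens)
```

Each row of `vectors` could then be passed to a downstream classifier. In this sketch, subword tokens shared by most lines get a low IDF, while tokens specific to an unusual request get a higher weight.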