An effective feature representation of web log data by leveraging byte pair encoding and TF-IDF
Junlang Zhan, X. Liao, Yukun Bao, Lu Gan, Zhiwen Tan, Mengxue Zhang, Ruan He, Jialiang Lu
Proceedings of the ACM Turing Celebration Conference - China, published 2019-05-17
DOI: 10.1145/3321408.3321568 (https://doi.org/10.1145/3321408.3321568)
Citations: 9
Abstract
Web log data analysis is important in intrusion detection, and various machine learning techniques have been applied to it. However, compared with the abundant research on machine learning itself, ways to extract features from log data remain underexplored. In this paper, we present an effective feature extraction approach that leverages Byte Pair Encoding (BPE) and Term Frequency-Inverse Document Frequency (TF-IDF). We have applied this approach to various downstream machine learning algorithms and demonstrated its usefulness.
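
The abstract does not spell out the pipeline, so the following is only a minimal sketch of how BPE and TF-IDF could be combined on web log lines; it is not the authors' implementation. It assumes character-level BPE learned directly on raw request strings, and one common TF-IDF variant (relative term frequency times log(N / df)). The function names (`learn_bpe`, `apply_merge`, `encode`, `tfidf`) and the sample log lines are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(lines, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    seqs = [list(line) for line in lines]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # no pair repeats any more, so further merges add nothing
            break
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges


def encode(line, merges):
    """Tokenize one log line by replaying the learned merges in order."""
    seq = list(line)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def tfidf(docs):
    """Return one sparse {token: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


# Hypothetical request lines; the third resembles an SQL injection attempt.
logs = [
    "GET /index.php?id=1",
    "GET /index.php?id=2",
    "GET /index.php?id=1' OR '1'='1",
]
merges = learn_bpe(logs, num_merges=20)
docs = [encode(line, merges) for line in logs]
vectors = tfidf(docs)
```

Under these assumptions, subword tokens shared by every request receive a weight of zero, while tokens specific to an unusual request keep a positive weight. The resulting sparse vectors could then be passed to a downstream classifier.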