{"title":"Catching Unusual Traffic Behavior using TF–IDF-based Port Access Statistics Analysis","authors":"K. Shima","doi":"10.1109/CCCI52664.2021.9583212","DOIUrl":null,"url":null,"abstract":"Detecting the anomalous behavior of traffic is one of the important actions for network operators. In this study, we applied term frequency – inverse document frequency (TF–IDF), which is a popular method used in natural language processing, to detect unusual behavior from network access logs. We mapped the term and document concept to the port number and daily access history, respectively, and calculated the TF–IDF. With this approach, we could obtain ports frequently observed in fewer days compared to other port access activities. Such access behaviors are not always malicious activities; however, such information is a good indicator for starting a deeper analysis of traffic behavior. Using a real-life dataset, we could detect two bot-oriented accesses and one unique UDP traffic.","PeriodicalId":136382,"journal":{"name":"2021 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCCI52664.2021.9583212","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Detecting the anomalous behavior of traffic is one of the important actions for network operators. In this study, we applied term frequency – inverse document frequency (TF–IDF), which is a popular method used in natural language processing, to detect unusual behavior from network access logs. We mapped the term and document concept to the port number and daily access history, respectively, and calculated the TF–IDF. With this approach, we could obtain ports frequently observed in fewer days compared to other port access activities. Such access behaviors are not always malicious activities; however, such information is a good indicator for starting a deeper analysis of traffic behavior. Using a real-life dataset, we could detect two bot-oriented accesses and one unique UDP traffic.