Identification of User Application by an External Eavesdropper using Machine Learning Analysis on Network Traffic
Sina Fathi Kazerooni, Yagiz Kaymak, R. Rojas-Cessa
2019 IEEE International Conference on Communications Workshops (ICC Workshops), published 2019-05-20
DOI: 10.1109/ICCW.2019.8756709 (https://doi.org/10.1109/ICCW.2019.8756709)
Citations: 6
Abstract
An eavesdropper may infer which computer applications a person uses by collecting and analyzing the network traffic those applications generate. Such inference is possible even when the generated packets are encrypted. In this paper, we investigate how well several machine learning algorithms can perform this privacy breach on the network traffic generated by a user. We measure their accuracy in identifying different applications from several statistical properties of the generated traffic rather than from the encrypted content. We compare the performance of these algorithms and select the one with the highest precision, random forest. We also evaluate packet padding, which modifies packet lengths, as a countermeasure against identification, and we test its effect on the identification ability of each machine learning algorithm. We then examine the performance of the random forest algorithm in detail on both intact and padded traffic. We show that padding may decrease the efficacy of a machine learning algorithm used for application classification.
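The idea behind packet padding can be illustrated with a minimal sketch. The abstract does not say which statistical properties or which padding scheme the authors use, so everything below is an assumption made for illustration: the feature set, the function names `flow_features` and `pad_lengths`, the 512-byte block size, the 1500-byte cap, and the packet lengths of the two made-up applications.

```python
import statistics


def flow_features(lengths):
    """Summarize a flow by simple statistics of its packet lengths (bytes).

    Illustrative feature set only; not the one used in the paper.
    """
    return {
        "count": len(lengths),
        "mean": statistics.mean(lengths),
        "stdev": statistics.pstdev(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def pad_lengths(lengths, block=512, mtu=1500):
    """Round each length up to the next multiple of `block`, capped at `mtu`.

    Hypothetical padding scheme; block size and cap are assumptions.
    """
    padded = []
    for n in lengths:
        target = -(-n // block) * block  # ceiling to a multiple of block
        padded.append(min(target, mtu))
    return padded


# Made-up packet lengths for two hypothetical applications.
app_a = [120, 300, 410, 95, 260]
app_b = [480, 505, 330, 450, 500]

print("intact A:", flow_features(app_a))
print("intact B:", flow_features(app_b))
print("padded A:", flow_features(pad_lengths(app_a)))
print("padded B:", flow_features(pad_lengths(app_b)))
```

In this toy case the intact flows have clearly different length statistics, while the padded flows produce identical ones, so a classifier that relies only on these features could no longer separate them. Real traffic also exposes timing, direction, and packet counts, which is consistent with the abstract's more cautious claim that padding may decrease, rather than eliminate, classification efficacy.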