On the Analysis of Network Measurements Through Machine Learning: The Power of the Crowd
P. Casas
2018 Network Traffic Measurement and Analysis Conference (TMA), pp. 1-8, published 2018-06-01
DOI: 10.23919/TMA.2018.8506486 (https://doi.org/10.23919/TMA.2018.8506486)
Citations: 5
Abstract
The application of Machine Learning (ML) models to network measurement problems has grown considerably over the last decade; however, there is still no clear best practice or silver-bullet approach for addressing these problems in a general context, and only ad hoc, narrowly tailored approaches have been evaluated so far. While deep-learning models have delivered a major breakthrough in high-dimensional problems such as image processing, it remains difficult to say which model, or which category of models, is best suited to analyzing the large volumes of high-dimensional data collected in operational networks. In this paper we evaluate and benchmark different ML models on three distinct network measurement problems: detection of network attacks, detection of smartphone-app anomalies, and QoE prediction in cellular networks. We consider an extensive battery of ML models, including both supervised and semi-supervised techniques, as well as ML ensembles such as bagging, boosting and stacking. The proposed models are evaluated using real measurements from operational networks. Results suggest that neural networks and decision-tree-based models generally provide better results in terms of accuracy and prediction, with decision trees incurring a much smaller computational overhead than models based on neural networks or support vector machines. In addition, collaborative models that combine multiple machine learning algorithms, and stacking models in particular, are more robust and perform better than single ML models, highlighting the benefits of a crowd over individual models.