Scalable analysis of network measurements with Hadoop and Pig
T. Samak, D. Gunter, V. Hendrix
2012 IEEE Network Operations and Management Symposium, 2012-04-16
DOI: 10.1109/NOMS.2012.6212060 (https://doi.org/10.1109/NOMS.2012.6212060)
Citations: 18
Abstract
The deployment of ubiquitous distributed monitoring infrastructure such as perfSONAR is greatly increasing the availability and quality of network performance data. Cross-cutting analyses that detect anomalies and provide real-time automated alerts to network management services are now possible. However, scaling these analyses to the volumes of available data remains a difficult task. Although there is significant research into offline analysis techniques, most of these approaches do not address systems and scalability issues. This work presents an analysis framework that incorporates industry best practices and tools to perform large-scale analyses. Our framework integrates the expressiveness of Pig, the scalability of Hadoop, and the analysis and visualization capabilities of R to achieve a significant increase in both the speed and the power of analysis. Evaluation of our framework on a large dataset of real measurements from perfSONAR demonstrates a large speedup and novel statistical capabilities.