On optimality of data clustering for packet-level memory-assisted compression of network traffic
Ahmad Beirami, Liling Huang, Mohsen Sardari, F. Fekri
2014 IEEE 15th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2014-06-22
DOI: 10.1109/SPAWC.2014.6941726
Citations: 2
Abstract
Recently, we proposed a framework called memory-assisted compression that learns the statistical properties of the sequence-generating server at intermediate network nodes and then leverages the learnt models to overcome the inevitable redundancy (overhead) in the universal compression of the payloads of short network packets. In this paper, we prove that when the content-generating server is a mixture of parametric sources, label-based clustering of the data according to their original sequence-generating models in the mixture is optimal almost surely, as it achieves the mixture entropy (the lower bound on the average codeword length). Motivated by this result, we present a K-means clustering technique as a proof of concept to demonstrate the performance benefits of memory-assisted compression. Simulation results confirm the effectiveness of the proposed approach by matching the improvements predicted by theory on synthetic mixture sources. Finally, the benefits of cluster-based memory-assisted compression are validated on real traffic traces, demonstrating more than 50% traffic reduction on average in data gathered from wireless users.
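The idea of clustering packets and compressing each one against the memory of its own cluster can be illustrated with a small sketch. This is not the authors' implementation. It assumes byte-frequency histograms as clustering features, uses zlib's preset dictionary (`zdict`) as a stand-in for a learnt source model, seeds K-means with a deterministic farthest-first rule, and runs on made-up sample packets. All function names and sample data are hypothetical.

```python
import zlib

def byte_histogram(packet):
    """Normalized 256-bin byte-frequency vector (an assumed feature choice)."""
    hist = [0.0] * 256
    for b in packet:
        hist[b] += 1.0
    n = max(len(packet), 1)
    return [h / n for h in hist]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iters=20):
    """Lloyd iterations with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_memories(train_packets, k):
    """Cluster training packets; each cluster's memory is its packets concatenated."""
    centroids = kmeans([byte_histogram(p) for p in train_packets], k)
    memories = [b""] * k
    for p in train_packets:
        c = nearest(byte_histogram(p), centroids)
        memories[c] += p
    return centroids, memories

def compress(packet, memory=b""):
    """DEFLATE the packet, priming the encoder with memory when one is given."""
    enc = zlib.compressobj(level=9, zdict=memory) if memory else zlib.compressobj(level=9)
    return enc.compress(packet) + enc.flush()

def decompress(blob, memory=b""):
    """Inverse of compress; the decoder must hold the same memory."""
    dec = zlib.decompressobj(zdict=memory) if memory else zlib.decompressobj()
    return dec.decompress(blob) + dec.flush()

def assisted_compress(packet, centroids, memories):
    """Return (cluster index, payload); the index tells the decoder which memory to use."""
    c = nearest(byte_histogram(packet), centroids)
    return c, compress(packet, memories[c])

# Hypothetical two-source mixture: HTTP-like requests and JSON-like sensor reports.
web = [b"GET /images/item%d.png HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n" % i
       for i in range(20)]
iot = [b'{"sensor_id": %d, "temperature": %d, "status": "ok"}' % (i, 20 + i % 5)
       for i in range(20)]
centroids, memories = build_memories(web + iot, k=2)

packet = b"GET /images/item77.png HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
c, blob = assisted_compress(packet, centroids, memories)
print(len(packet), len(compress(packet)), len(blob))  # raw, no memory, cluster memory
```

In this toy setting, the packet compressed against its cluster's memory should come out much shorter than the same packet compressed alone, because the short payload gives a universal compressor too little data to learn from. The cluster index has to travel with the payload (or be recomputed at the decoder), and that side information is not counted in the sizes printed here.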