样本:用于分类移动应用流量的持久词法片段的自适应挖掘

Proceedings of the 21st Annual International Conference on Mobile Computing and Networking Pub Date : 2015-09-07 DOI:10.1145/2789168.2790097

Hongyi Yao, Gyan Ranjan, A. Tongaonkar, Yong Liao, Z. Morley Mao

{"title":"样本:用于分类移动应用流量的持久词法片段的自适应挖掘","authors":"Hongyi Yao, Gyan Ranjan, A. Tongaonkar, Yong Liao, Z. Morley Mao","doi":"10.1145/2789168.2790097","DOIUrl":null,"url":null,"abstract":"We present SAMPLES: Self Adaptive Mining of Persistent LExical Snippets; a systematic framework for classifying network traffic generated by mobile applications. SAMPLES constructs conjunctive rules, in an automated fashion, through a supervised methodology over a set of labeled flows (the training set). Each conjunctive rule corresponds to the lexical context, associated with an application identifier found in a snippet of the HTTP header, and is defined by: (a) the identifier type, (b) the HTTP header-field it occurs in, and (c) the prefix/suffix surrounding its occurrence. Subsequently, these conjunctive rules undergo an aggregate-and-validate step for improving accuracy and determining a priority order. The refined rule-set is then loaded into an application-identification engine where it operates at a per flow granularity, in an extract-and-lookup paradigm, to identify the application responsible for a given flow. Thus, SAMPLES can facilitate important network measurement and management tasks --- e.g. behavioral profiling [29], application-level firewalls [21,22] etc. --- which require a more detailed view of the underlying traffic than that afforded by traditional protocol/port based methods. We evaluate SAMPLES on a test set comprising 15 million flows (approx.) generated by over 700 K applications from the Android, iOS and Nokia market-places. SAMPLES successfully identifies over 90% of these applications with 99% accuracy on an average. This, in spite of the fact that fewer than 2% of the applications are required during the training phase, for each of the three market places. This is a testament to the universality and the scalability of our approach. We, therefore, expect SAMPLES to work with reasonable coverage and accuracy for other mobile platforms --- e.g. BlackBerry and Windows Mobile --- as well.","PeriodicalId":424497,"journal":{"name":"Proceedings of the 21st Annual International Conference on Mobile Computing and Networking","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"88","resultStr":"{\"title\":\"SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for Classifying Mobile Application Traffic\",\"authors\":\"Hongyi Yao, Gyan Ranjan, A. Tongaonkar, Yong Liao, Z. Morley Mao\",\"doi\":\"10.1145/2789168.2790097\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present SAMPLES: Self Adaptive Mining of Persistent LExical Snippets; a systematic framework for classifying network traffic generated by mobile applications. SAMPLES constructs conjunctive rules, in an automated fashion, through a supervised methodology over a set of labeled flows (the training set). Each conjunctive rule corresponds to the lexical context, associated with an application identifier found in a snippet of the HTTP header, and is defined by: (a) the identifier type, (b) the HTTP header-field it occurs in, and (c) the prefix/suffix surrounding its occurrence. Subsequently, these conjunctive rules undergo an aggregate-and-validate step for improving accuracy and determining a priority order. The refined rule-set is then loaded into an application-identification engine where it operates at a per flow granularity, in an extract-and-lookup paradigm, to identify the application responsible for a given flow. Thus, SAMPLES can facilitate important network measurement and management tasks --- e.g. behavioral profiling [29], application-level firewalls [21,22] etc. --- which require a more detailed view of the underlying traffic than that afforded by traditional protocol/port based methods. We evaluate SAMPLES on a test set comprising 15 million flows (approx.) generated by over 700 K applications from the Android, iOS and Nokia market-places. SAMPLES successfully identifies over 90% of these applications with 99% accuracy on an average. This, in spite of the fact that fewer than 2% of the applications are required during the training phase, for each of the three market places. This is a testament to the universality and the scalability of our approach. We, therefore, expect SAMPLES to work with reasonable coverage and accuracy for other mobile platforms --- e.g. BlackBerry and Windows Mobile --- as well.\",\"PeriodicalId\":424497,\"journal\":{\"name\":\"Proceedings of the 21st Annual International Conference on Mobile Computing and Networking\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"88\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st Annual International Conference on Mobile Computing and Networking\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2789168.2790097\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st Annual International Conference on Mobile Computing and Networking","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2789168.2790097","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 88

摘要

我们提供的样本包括:持久词法片段的自适应挖掘;对移动应用程序产生的网络流量进行分类的系统框架。通过对一组标记流(训练集)的监督方法，以自动化的方式构建合取规则。每个连接规则对应于词法上下文，与HTTP报头片段中的应用程序标识符相关联，并由以下方式定义:(a)标识符类型，(b)它出现的HTTP报头字段，以及(c)围绕其出现的前缀/后缀。随后，这些连接规则经历一个聚合和验证步骤，以提高准确性并确定优先顺序。然后将精炼的规则集加载到应用程序识别引擎中，它在每个流粒度中以提取和查找范式进行操作，以识别负责给定流的应用程序。因此，样本可以促进重要的网络测量和管理任务——例如行为分析[29]，应用程序级防火墙[21,22]等——这需要比传统的基于协议/端口的方法提供的更详细的底层流量视图。我们在一个测试集上评估样本，该测试集由来自Android、iOS和Nokia市场的超过700k个应用程序生成的1500万流(大约)组成。样本成功地识别了90%以上的应用程序，平均准确率为99%。尽管在三个市场中，每个市场在培训阶段只需要不到2%的应用程序。这证明了我们方法的普遍性和可扩展性。因此，我们希望样本能够在其他移动平台(如黑莓和Windows mobile)上具有合理的覆盖率和准确性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for Classifying Mobile Application Traffic

We present SAMPLES: Self Adaptive Mining of Persistent LExical Snippets; a systematic framework for classifying network traffic generated by mobile applications. SAMPLES constructs conjunctive rules, in an automated fashion, through a supervised methodology over a set of labeled flows (the training set). Each conjunctive rule corresponds to the lexical context, associated with an application identifier found in a snippet of the HTTP header, and is defined by: (a) the identifier type, (b) the HTTP header-field it occurs in, and (c) the prefix/suffix surrounding its occurrence. Subsequently, these conjunctive rules undergo an aggregate-and-validate step for improving accuracy and determining a priority order. The refined rule-set is then loaded into an application-identification engine where it operates at a per flow granularity, in an extract-and-lookup paradigm, to identify the application responsible for a given flow. Thus, SAMPLES can facilitate important network measurement and management tasks --- e.g. behavioral profiling [29], application-level firewalls [21,22] etc. --- which require a more detailed view of the underlying traffic than that afforded by traditional protocol/port based methods. We evaluate SAMPLES on a test set comprising 15 million flows (approx.) generated by over 700 K applications from the Android, iOS and Nokia market-places. SAMPLES successfully identifies over 90% of these applications with 99% accuracy on an average. This, in spite of the fact that fewer than 2% of the applications are required during the training phase, for each of the three market places. This is a testament to the universality and the scalability of our approach. We, therefore, expect SAMPLES to work with reasonable coverage and accuracy for other mobile platforms --- e.g. BlackBerry and Windows Mobile --- as well.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 21st Annual International Conference on Mobile Computing and Networking

自引率

0.00%

发文量