SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for Classifying Mobile Application Traffic

Hongyi Yao, Gyan Ranjan, A. Tongaonkar, Yong Liao, Z. Morley Mao
Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, 2015-09-07
DOI: 10.1145/2789168.2790097 (https://doi.org/10.1145/2789168.2790097)
Citations: 88

Abstract

We present SAMPLES (Self Adaptive Mining of Persistent LExical Snippets), a systematic framework for classifying network traffic generated by mobile applications. SAMPLES automatically constructs conjunctive rules through a supervised methodology over a set of labeled flows (the training set). Each conjunctive rule corresponds to the lexical context associated with an application identifier found in a snippet of the HTTP header, and is defined by: (a) the identifier type, (b) the HTTP header field it occurs in, and (c) the prefix/suffix surrounding its occurrence. These conjunctive rules then undergo an aggregate-and-validate step that improves accuracy and determines a priority order. The refined rule set is loaded into an application-identification engine, where it operates at per-flow granularity in an extract-and-lookup paradigm to identify the application responsible for a given flow. SAMPLES can thus facilitate important network measurement and management tasks, e.g. behavioral profiling [29] and application-level firewalls [21,22], which require a more detailed view of the underlying traffic than traditional protocol/port-based methods afford. We evaluate SAMPLES on a test set of approximately 15 million flows generated by over 700 K applications from the Android, iOS and Nokia marketplaces. SAMPLES successfully identifies over 90% of these applications with 99% accuracy on average, even though fewer than 2% of the applications are required during the training phase for each of the three marketplaces. This is a testament to the universality and scalability of our approach. We therefore expect SAMPLES to work with reasonable coverage and accuracy on other mobile platforms as well, e.g. BlackBerry and Windows Mobile.
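The rule structure and the extract-and-lookup step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rule` class, the `extract` and `identify` functions, the identifier-type labels, the `Request-URI` pseudo-field, the sample rules and the application names are all hypothetical, chosen only to show how a rule made of (identifier type, header field, prefix/suffix) might be applied to one flow in priority order. The training and aggregate-and-validate stages are not modeled, and header names are matched case-sensitively for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical conjunctive rule: all three parts must match."""
    id_type: str   # (a) identifier type, e.g. "app_name" (illustrative label)
    field: str     # (b) HTTP header field the identifier occurs in
    prefix: str    # (c) text immediately before the identifier
    suffix: str    # (c) text immediately after the identifier
    priority: int  # assumed convention: lower value is tried first


def extract(rule: Rule, headers: Dict[str, str]) -> Optional[str]:
    """Return the token found between prefix and suffix, or None."""
    value = headers.get(rule.field)
    if value is None:
        return None
    start = value.find(rule.prefix)
    if start < 0:
        return None
    start += len(rule.prefix)
    # An empty suffix means "identifier runs to the end of the field".
    end = value.find(rule.suffix, start) if rule.suffix else len(value)
    if end <= start:  # suffix missing (-1) or empty token
        return None
    return value[start:end]


def identify(headers: Dict[str, str],
             rules: List[Rule],
             lookup: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Extract-and-lookup for one flow: first rule whose token is known wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        token = extract(rule, headers)
        if token is None:
            continue
        app = lookup.get((rule.id_type, token))
        if app is not None:
            return app
    return None


# Illustrative rule set and identifier table (made-up values).
RULES = [
    Rule("app_name", "User-Agent", "", "/", priority=0),
    Rule("app_id", "Request-URI", "app_id=", "&", priority=1),
]
LOOKUP = {
    ("app_name", "WeatherNow"): "com.example.weathernow",
    ("app_id", "12345"): "com.example.game",
}

if __name__ == "__main__":
    flow = {"User-Agent": "WeatherNow/2.1 CFNetwork/711.1.12"}
    print(identify(flow, RULES, LOOKUP))
```

In this sketch a flow whose extracted token is not in the lookup table falls through to the next rule and is left unclassified if no rule yields a known identifier.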