{"title":"POSTER: A PU Learning based System for Potential Malicious URL Detection","authors":"Ya-Lin Zhang, Longfei Li, Jun Zhou, Xiaolong Li, Yujiang Liu, Yuanchao Zhang, Zhi-Hua Zhou","doi":"10.1145/3133956.3138825","DOIUrl":null,"url":null,"abstract":"This paper describes a PU learning (Positive and Unlabeled learning) based system for potential URL attack detection. Previous machine learning based solutions for this task mainly formalize it as a supervised learning problem. However, in some scenarios, the data obtained always contains only a handful of known attack URLs, along with a large number of unlabeled instances, making the supervised learning paradigms infeasible. In this work, we formalize this setting as a PU learning problem, and solve it by combining two different strategies (two-stage strategy and cost-sensitive strategy). Experimental results show that the developed system can effectively find potential URL attacks. This system can either be deployed as an assistance for existing system or be employed to help cyber-security engineers to effectively discover potential attack mode so that they can improve the existing system with significantly less efforts.","PeriodicalId":191367,"journal":{"name":"Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3133956.3138825","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27
Abstract
This paper describes a PU learning (Positive and Unlabeled learning) based system for potential URL attack detection. Previous machine learning based solutions for this task mainly formalize it as a supervised learning problem. However, in some scenarios, the data obtained always contains only a handful of known attack URLs, along with a large number of unlabeled instances, making the supervised learning paradigms infeasible. In this work, we formalize this setting as a PU learning problem, and solve it by combining two different strategies (two-stage strategy and cost-sensitive strategy). Experimental results show that the developed system can effectively find potential URL attacks. This system can either be deployed as an assistance for existing system or be employed to help cyber-security engineers to effectively discover potential attack mode so that they can improve the existing system with significantly less efforts.
本文描述了一种基于PU学习(Positive and Unlabeled learning)的潜在URL攻击检测系统。之前针对该任务的基于机器学习的解决方案主要将其形式化为监督学习问题。然而,在某些情况下,获得的数据总是只包含少数已知的攻击url,以及大量未标记的实例,这使得监督学习范式不可行。在这项工作中,我们将这种设置形式化为PU学习问题,并通过结合两种不同的策略(两阶段策略和成本敏感策略)来解决它。实验结果表明,该系统能够有效地发现潜在的URL攻击。该系统既可以作为现有系统的辅助部署,也可以用来帮助网络安全工程师有效地发现潜在的攻击模式,从而大大减少他们对现有系统的改进。