Classification of Landing and Distribution Domains Using Whois’ Text Mining

Tran Thao Phuong, A. Yamada, Kosuke Murakami, J. Urakawa, Y. Sawaya, A. Kubota
Published in: 2017 IEEE Trustcom/BigDataSE/ICESS, 2017-08-01
DOI: 10.1109/Trustcom/BigDataSE/ICESS.2017.213 (https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.213)
Citations: 6

Abstract

Detection of drive-by download attacks has become a focus of security research, since the attack has turned into the most popular and serious threat to web infrastructure. The attack exploits vulnerabilities in web browsers and their extensions to download malicious software without the user noticing. Often, the victim is sent through a long chain of redirections in order to make the offending pages harder to take down. Concretely, the attack is triggered when a user visits a benign webpage that has been compromised by the attacker and injected with malicious code (called the landing page). The user is then automatically redirected, without his/her consent or knowledge, to the page that actually installs malware on the user's computer (called the distribution page). While there is a large body of work on detecting drive-by download attacks, little attention has been paid to redirection, which is a crucial characteristic of the attack. In this paper, for the first time, we propose an approach to classifying landing and distribution domains, which form the head and tail of a redirection chain in the attack. Our approach uses machine learning for text mining on the registration information of the domains, known as whois. We implemented our approach with six popular supervised learning algorithms, compared the results, and concluded that the linear Support Vector Machine and the CART-based Decision Tree are the best models for our dataset, giving respectively 98.55% and 99.28% accuracy, 97.78% and 98.95% F1 score, and 98.35% and 99.45% average precision.
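To make the text-mining step and the reported metrics more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a simple bag-of-words representation of whois records and shows how accuracy and F1 are computed for a two-class problem (landing vs. distribution). The tokenization rule, the function names, and the sample strings are all hypothetical; the abstract does not specify the feature extraction used in the paper.

```python
import re
from collections import Counter


def tokenize(whois_text):
    """Lowercase a whois record and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", whois_text.lower())


def build_vocabulary(texts):
    """Map every distinct token in the corpus to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(whois_text, vocab):
    """Return a term-count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(whois_text))
    # dicts preserve insertion order, so columns follow the sorted vocabulary
    return [counts[tok] for tok in vocab]


def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented whois-like snippets, for illustration only.
    corpus = [
        "Registrar: Example Registrar Inc. Registrant Country: US",
        "Registrar: Another Registrar Ltd. Registrant Country: JP",
    ]
    vocab = build_vocabulary(corpus)
    print(vectorize(corpus[0], vocab))

    y_true = ["distribution", "distribution", "landing", "landing"]
    y_pred = ["distribution", "landing", "landing", "landing"]
    print(binary_metrics(y_true, y_pred, positive="distribution"))
```

The count vectors produced this way could then be passed to any supervised learner. In scikit-learn, for instance, `LinearSVC` and `DecisionTreeClassifier` (which is documented as an optimized version of CART) would be the natural counterparts of the two best models named in the abstract. Average precision is not covered here because it additionally requires a ranking score per sample, not just a predicted label.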