Boosting Homograph Attack Classification Using Ensemble Learning and N-gram Model

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). Pub Date: 2020-12-01. DOI: 10.1109/TrustCom50675.2020.00271

Tran Thao Phuong, Hoang-Quoc Nguyen-Son, R. Yamaguchi, Toshiyuki Nakata


Abstract

A visual homograph attack deceives web users about which domain they are visiting by exploiting forged domains that look similar to genuine ones. T. Thao et al. (IFIP SEC'19) proposed a homograph classification method that applies conventional supervised learning algorithms to features extracted from a single-character-based Structural Similarity Index (SSIM). This paper aims to improve the classification accuracy by combining their SSIM features with 199 features extracted from an N-gram model and applying advanced ensemble learning algorithms. The experimental results showed that our proposed method improved accuracy by as much as 1.81% and reduced the false-positive rate by 2.15%. Furthermore, existing work applied machine learning to a set of features without being able to explain why doing so improves accuracy. Even when accuracy can be improved, understanding the ground truth behind the improvement is also crucial. Therefore, in this paper, we conducted an empirical error analysis and obtained several findings that explain our proposed approach.
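The abstract does not say how the 199 N-gram features are constructed, so the following is only a minimal sketch of the general idea of turning a domain name into character N-gram counts. The function names `char_ngrams` and `ngram_features`, the choice of bigrams (n = 2), and the example domains are all assumptions made for illustration; this is not the authors' feature extractor.

```python
from collections import Counter


def char_ngrams(domain, n=2):
    """Return all overlapping character n-grams of a domain string."""
    return [domain[i:i + n] for i in range(len(domain) - n + 1)]


def ngram_features(domain, vocabulary, n=2):
    """Count how often each vocabulary n-gram occurs in the domain.

    The vocabulary fixes the feature order, so every domain maps to a
    vector of the same length that a classifier could consume.
    """
    counts = Counter(char_ngrams(domain, n))
    return [counts[gram] for gram in vocabulary]


# Hypothetical example pair: a genuine name and a look-alike forgery.
genuine = "google"
forged = "goog1e"

# Assumed vocabulary: every bigram seen in either string, in sorted order.
vocabulary = sorted(set(char_ngrams(genuine)) | set(char_ngrams(forged)))

genuine_vector = ngram_features(genuine, vocabulary)
forged_vector = ngram_features(forged, vocabulary)
```

In a pipeline like the one the abstract describes, vectors of this kind would be concatenated with the SSIM-based features before being passed to an ensemble classifier; the exact combination used in the paper is not given here.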
