SemanticPhish: A Semantic-based Scanning System for Early Detection of Phishing Attacks

2020 APWG Symposium on Electronic Crime Research (eCrime). Pub Date: 2020-11-16. DOI: 10.1109/eCrime51433.2020.9493252

Q. Cui, Guy-Vincent Jourdan, G. Bochmann, Iosif-Viorel Onut


Citations: 2

Abstract

In the fight against phishing attacks, time is of the essence. Each individual attack is usually short-lived, but many people are still victimized during that short timeframe. One way to curb the problem is to detect the attack shortly after the site is deployed, before victims have a chance to access it. Monitoring every new URL on the internet is clearly not a practical option, but monitoring sites that have a good chance of hosting an attack can be done. One way to spot such a site is to monitor domain names. It is known that a growing number of phishing attacks are hosted by the attackers themselves [1], [2], using their own domain names. Therefore, domain names might help spot likely attacks. In this paper, we look at the following questions: can we currently tell apart domain names used in phishing attacks from other domains? If so, can we train a system to automatically detect these domains? And can such a system find attacks before they are reported by victims? We show that the semantics of the words used by many phishing domains differ from the semantics of the words used by benign domain names, and that we can train a classifier to reliably flag these domains. We propose a system, SemanticPhish, which efficiently monitors these domains and is able to detect many phishing attacks without requiring the attack to be reported first. SemanticPhish can find attacks several days before Google’s “safe browsing” starts flagging them.
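
The abstract does not describe the features or the classifier that SemanticPhish uses, so the following is only a minimal sketch of the general idea of classifying domain names by the words they contain. It assumes a simple bag-of-words naive Bayes model with Laplace smoothing, which is a stand-in and not the authors' method. The names `tokenize` and `NaiveBayes` are hypothetical, and the training domains are made-up toy examples under the reserved `.example` TLD, not data from the paper.

```python
import math
import re
from collections import Counter


def tokenize(domain):
    """Split a domain into lowercase alphabetic tokens.

    Simplification (assumption): the last label is treated as the TLD and
    dropped; multi-label suffixes such as "co.uk" are not handled.
    """
    labels = domain.lower().split(".")[:-1]
    tokens = []
    for label in labels:
        tokens.extend(t for t in re.split(r"[^a-z]+", label) if t)
    return tokens


class NaiveBayes:
    """Toy multinomial naive Bayes over domain-name tokens."""

    def fit(self, domains, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {}
        for domain, label in zip(domains, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(domain))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_score(self, domain, label):
        counts = self.word_counts[label]
        # Laplace smoothing: add one pseudo-count per vocabulary word.
        total = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for token in tokenize(domain):
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[token] + 1) / total)
        return score

    def predict(self, domain):
        return max(self.class_counts, key=lambda label: self.log_score(domain, label))


# Hypothetical toy training set, for illustration only.
train = [
    ("secure-login-bank.example", "phishing"),
    ("account-verify-update.example", "phishing"),
    ("login-verify-account.example", "phishing"),
    ("garden-tools.example", "benign"),
    ("city-library.example", "benign"),
    ("bike-repair-shop.example", "benign"),
]
model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])

print(model.predict("verify-secure-account.example"))
print(model.predict("city-bike-tools.example"))
```

A real system would need a far larger labeled corpus and a way to segment concatenated words (for example, "securelogin"), neither of which this sketch attempts.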


