Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild

Proceedings of the Internet Measurement Conference 2018. Pub Date: 2018-10-31. DOI: 10.1145/3278532.3278569

K. Tian, Steve T. K. Jan, Hang Hu, D. Yao, G. Wang


Citations: 104

Abstract

Today's phishing websites are constantly evolving to deceive users and evade detection. In this paper, we perform a measurement study on squatting phishing domains, where the websites impersonate trusted entities not only at the page content level but also at the web domain level. To search for squatting phishing pages, we scanned five types of squatting domains over 224 million DNS records and identified 657K domains that are likely impersonating 702 popular brands. We then built a novel machine learning classifier to detect phishing pages among both the web and mobile pages under the squatting domains. A key novelty is that our classifier is built on a careful measurement of the evasive behaviors of phishing pages in practice. We introduce new features from visual analysis and optical character recognition (OCR) to overcome heavy content obfuscation by attackers. In total, we discovered and verified 1,175 squatting phishing pages. We show that these phishing pages are used for various targeted scams and are highly effective at evading detection. More than 90% of them successfully evaded popular blacklists for at least a month.
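The domain-scanning step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the five squatting types, so the four categories below (typo, bit-flip, combo, wrong TLD) are assumptions based on commonly described squatting styles, and homograph-style lookalikes are left out. The function names, the `paypal` brand, and the simple "second-to-last label" domain parsing are all hypothetical simplifications and not the authors' implementation.

```python
import string

# Characters allowed in a hostname label (lowercase only; DNS is case-insensitive).
VALID = set(string.ascii_lowercase + string.digits + "-")


def typo_variants(name):
    """Names reachable by one character omission or one adjacent swap."""
    out = set()
    for i in range(len(name)):
        out.add(name[:i] + name[i + 1:])  # omission
    for i in range(len(name) - 1):
        out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # swap
    out.discard(name)
    out.discard("")
    return out


def bit_variants(name):
    """Names reachable by flipping one bit in one character."""
    out = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in VALID and flipped != ch:
                out.add(name[:i] + flipped + name[i + 1:])
    return out


def classify(domain, brand, tld="com"):
    """Label `domain` relative to `brand`.`tld`, or return None.

    Assumption: the registered name is the second-to-last label, which
    ignores multi-label suffixes such as co.uk.
    """
    parts = domain.lower().split(".")
    if len(parts) < 2:
        return None
    name, dom_tld = parts[-2], parts[-1]
    if name == brand:
        return "wrong-tld" if dom_tld != tld else None
    if name in typo_variants(brand):
        return "typo"
    if name in bit_variants(brand):
        return "bits"
    if brand in name:
        return "combo"
    return None
```

In a scan of this kind, `classify` would be applied to each domain in a DNS record set once per target brand, and the domains with a non-None label would become candidates for the later page-level phishing classifier.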

