How to Learn Klingon without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy

Hao Yang, Xiulin Ma, Kun Du, Zhou Li, Haixin Duan, XiaoDong Su, Guang Liu, Zhifeng Geng, Jianping Wu
Published in: 2017 IEEE Symposium on Security and Privacy (SP), pp. 751-769. Publication date: 2017-05-22. DOI: 10.1109/SP.2017.11 (https://doi.org/10.1109/SP.2017.11)
Citations: 33

Abstract

The online underground economy is an important channel connecting merchants of illegal products with their buyers, and it is constantly monitored by legal authorities. As a common means of evasion, merchants and buyers jointly create a vocabulary of jargon (called "black keywords" in this paper) to disguise their transactions (e.g., "smack" is a street name for "heroin" [1]). Black keywords are often "unfriendly" to outsiders: they are created either by distorting the original meaning of common words or by tweaking other black keywords. Understanding black keywords is of great importance for tracking and disrupting the underground economy, but it is also prohibitively difficult: investigators have to infiltrate the inner circle of criminals to learn their meanings, a task that is both risky and time-consuming. In this paper, we make the first attempt at capturing and understanding the ever-changing black keywords. We investigate the underground business promoted through blackhat SEO (search engine optimization) and demonstrate that the black keywords targeted by SEOers can be discovered through a fully automated approach. Our insights are two-fold: first, pages indexed under black keywords are more likely to contain malicious or fraudulent content (e.g., SEO pages) and to be flagged by off-the-shelf detectors; second, people tend to query multiple similar black keywords to find the merchandise. Therefore, we can infer whether a search keyword is "black" by inspecting the associated search results, and then use the related search queries to extend our findings. To this end, we built a system called KDES (Keywords Detection and Expansion System) and applied it to the search results of Baidu, China's top search engine. So far, we have identified 478,879 black keywords, which were clustered under 1,522 core words based on text similarity. We further extracted information such as email addresses, mobile phone numbers, and instant messenger IDs from the pages and domains relevant to the underground business. Such information helps us gain a better understanding of the underground economy, that of China in particular. In addition, our work could help search engine vendors purify their search results and disrupt the channel of the underground market. Our co-authors from Baidu compared our results with their blacklist, found that many of them (e.g., long-tail and obfuscated keywords) were not in it, and then added them to Baidu's internal blacklist.
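
The detect-then-expand idea described in the abstract can be illustrated with a minimal sketch. This is not the KDES implementation: the three callables (`fetch_results`, `is_flagged`, `related_queries`), the 0.5 threshold, and the toy data are all assumptions introduced here for illustration. The sketch marks a keyword as "black" when a large enough share of its search results is flagged by a detector, and then follows the related queries of that keyword.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def expand_black_keywords(
    seeds: Iterable[str],
    fetch_results: Callable[[str], list[str]],    # hypothetical: keyword -> result URLs
    is_flagged: Callable[[str], bool],            # hypothetical: detector verdict for a URL
    related_queries: Callable[[str], list[str]],  # hypothetical: keyword -> related searches
    threshold: float = 0.5,                       # assumed cutoff, not taken from the paper
    max_keywords: int = 1000,                     # safety bound for the sketch
) -> set[str]:
    """Breadth-first expansion from seed keywords via related queries."""
    black: set[str] = set()
    seen: set[str] = set()
    queue = deque(seeds)
    while queue and len(black) < max_keywords:
        keyword = queue.popleft()
        if keyword in seen:
            continue
        seen.add(keyword)
        urls = fetch_results(keyword)
        if not urls:
            continue
        # Detection: share of search results flagged as malicious or fraudulent.
        flagged_ratio = sum(is_flagged(u) for u in urls) / len(urls)
        if flagged_ratio >= threshold:
            black.add(keyword)
            # Expansion: only keywords judged "black" contribute related queries.
            queue.extend(q for q in related_queries(keyword) if q not in seen)
    return black


# Toy in-memory data standing in for a search engine and a detector.
TOY_RESULTS = {
    "kw_a": ["bad1.example", "bad2.example", "ok1.example"],
    "kw_b": ["bad1.example", "bad3.example"],
    "weather": ["ok1.example", "ok2.example"],
}
TOY_FLAGGED = {"bad1.example", "bad2.example", "bad3.example"}
TOY_RELATED = {"kw_a": ["kw_b", "weather"], "kw_b": ["kw_a"]}

found = expand_black_keywords(
    seeds=["kw_a"],
    fetch_results=lambda k: TOY_RESULTS.get(k, []),
    is_flagged=lambda u: u in TOY_FLAGGED,
    related_queries=lambda k: TOY_RELATED.get(k, []),
)
print(sorted(found))  # ['kw_a', 'kw_b']
```

In this toy run, "weather" is reached through a related query but is not kept, because none of its results are flagged. Passing the search, detection, and related-query steps in as callables keeps the sketch independent of any particular search engine or detector.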