Lost in Translation: AI-based Generator of Cross-Language Sound-squatting

R. Valentim, I. Drago, M. Mellia, Federico Cerutti
{"title":"Lost in Translation: AI-based Generator of Cross-Language Sound-squatting","authors":"R. Valentim, I. Drago, M. Mellia, Federico Cerutti","doi":"10.1109/EuroSPW59978.2023.00063","DOIUrl":null,"url":null,"abstract":"Sound-squatting is a phishing attack that tricks users into accessing malicious resources by exploiting similarities in the pronunciation of words. It is an understudied threat that gains traction with the popularity of smart-speakers and the resurgence of content consumption exclusively via audio, such as podcasts. Defending against sound-squatting is complex, and existing solutions rely on manually curated lists of homophones, which limits the search to a few (and mostly existing) words only. We introduce Sound-squatter, a multi-language AI-based system that generates sound-squatting candidates for proactive defense that covers over 80% of exact homophones and further generating thousands of high-quality approximated homophones. Sound-squatter relies on a state-of-art Transformer Network to learn transliteration. We search for Sound-squatter generated cross-language sound-squatting domains over hundreds of millions of emitted TLS certificates comparing with other types of squatting candidates. Our finding reveals that around 6% of generated sound-squatting candidates have emitted TLS certificates, compared to 8% of other types of squatting candidates. We believe Sound-squatter uncovers the usage of multilingual sound-squatting phenomenon on the Internet and it is a crucial asset for proactive protection against sound-squatting.","PeriodicalId":220415,"journal":{"name":"2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EuroSPW59978.2023.00063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Sound-squatting is a phishing attack that tricks users into accessing malicious resources by exploiting similarities in the pronunciation of words. It is an understudied threat that gains traction with the popularity of smart-speakers and the resurgence of content consumption exclusively via audio, such as podcasts. Defending against sound-squatting is complex, and existing solutions rely on manually curated lists of homophones, which limits the search to a few (and mostly existing) words only. We introduce Sound-squatter, a multi-language AI-based system that generates sound-squatting candidates for proactive defense that covers over 80% of exact homophones and further generating thousands of high-quality approximated homophones. Sound-squatter relies on a state-of-art Transformer Network to learn transliteration. We search for Sound-squatter generated cross-language sound-squatting domains over hundreds of millions of emitted TLS certificates comparing with other types of squatting candidates. Our finding reveals that around 6% of generated sound-squatting candidates have emitted TLS certificates, compared to 8% of other types of squatting candidates. We believe Sound-squatter uncovers the usage of multilingual sound-squatting phenomenon on the Internet and it is a crucial asset for proactive protection against sound-squatting.
迷失在翻译中:基于ai的跨语言蹲音生成器
Sound-squatting是一种网络钓鱼攻击,通过利用单词发音的相似性来诱骗用户访问恶意资源。随着智能音箱的普及和专门通过音频(如播客)进行的内容消费的复苏,这是一个尚未得到充分研究的威胁。防止“抢音”是很复杂的,现有的解决方案依赖于人工管理的同音异义词列表,这将搜索限制在少数(而且大多数是现有的)单词上。我们介绍了一种基于多语言人工智能的系统,该系统可以生成用于主动防御的候选音蹲,覆盖80%以上的精确同音异义字,并进一步生成数千个高质量的近似同音异义字。声音霸占者依靠最先进的变压器网络来学习音译。我们在数以亿计的发出的TLS证书中,与其他类型的抢注候选证书进行比较,搜索由抢注生成的跨语言抢注域名。我们的研究结果显示,大约6%的生成的语音抢注候选对象发出了TLS证书,而其他类型的抢注候选对象的这一比例为8%。我们认为,“拼音”揭示了互联网上多语言拼音现象的使用,这是主动防止拼音的重要资产。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信