Vishing: Detecting social engineering in spoken communication — A first survey & urgent roadmap to address an emerging societal challenge

Impact Factor: 3.1; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence

Computer Speech and Language. Publication date: 2025-04-15. DOI: 10.1016/j.csl.2025.101802

Andreas Triantafyllopoulos, Anika A. Spiesberger, Iosif Tsangko, Xin Jing, Verena Distler, Felix Dietz, Florian Alt, Björn W. Schuller

Volume 94, Article 101802. Full text: https://www.sciencedirect.com/science/article/pii/S0885230825000270

Citations: 0

Abstract

Vishing, the use of voice calls for phishing, is a form of Social Engineering (SE) attack. SE attacks have become a pervasive challenge in modern societies, with over 300,000 yearly victims in the US alone. An increasing number of those attacks are conducted via voice communication, be it through machine-generated ‘robocalls’ or human actors. The goals of ‘social engineers’ can be manifold, from outright fraud to more subtle forms of persuasion. Accordingly, social engineers adopt multi-faceted strategies for voice-based attacks, utilising a variety of ‘tricks’ to exert influence and achieve their goals. Importantly, while organisations have put in place a series of guardrails against other types of SE attacks, voice calls remain ‘open ground’ for potential bad actors. In the present contribution, we provide an overview of the existing speech technology subfields that need to coalesce into a protective net against one of the major challenges to societies worldwide. Given the dearth of speech science and technology work targeting this issue, we have opted for a narrative review that bridges the gap between the existing psychological literature on the topic and research that the speech community has pursued in parallel on some of the constituent constructs. Our review reveals that very little literature addresses this important topic from a speech technology perspective, an omission further exacerbated by the lack of available data. Our main goal is therefore to highlight this gap and sketch out a roadmap to mitigate it. We begin with the psychological underpinnings of vishing, which primarily include deception and persuasion strategies, continue with the speech-based approaches that can be used to detect them, as well as the generation and detection of AI-based vishing attempts, and close with a discussion of ethical and legal considerations.

Source journal

Computer Speech and Language (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Review time: 22.9 weeks

Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but only relatively recently has large-scale implementation of, and experimentation with, complex models of speech and language processing become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.