ScamGen: Unveiling psychological patterns in tele-scam through advanced template-augmented corpus generation

IF 9 · Zone 1 (Psychology) · JCR Q1 (PSYCHOLOGY, EXPERIMENTAL)
Xu Han, Qiang Li, Yaling Qi, Hongbo Cao, Witold Pedrycz, Wei Wang
Computers in Human Behavior. Published 2024-09-21. DOI: 10.1016/j.chb.2024.108451
https://www.sciencedirect.com/science/article/pii/S0747563224003194
Citations: 0

Abstract

Telephone scams, with their profound psychological impact, often compel victims to make hasty and severe decisions. Studying these scams is challenging due to the scarcity of comprehensive datasets, a result of the private nature of telephone interactions. In this paper, we introduce ScamGen, a template-based data augmentation technique designed to enhance Chinese telephone scam data. ScamGen leverages psychological insights to generate diverse and realistic scam scenarios, focusing on the psychological dynamics between scammers and victims. This novel approach integrates psychological theory with data augmentation, diverging from traditional methods by emphasizing scammer–victim interactions. Our method begins with a multi-source data collection framework, compiling an initial seed dataset of tele-scam samples. Using sentence- and word-level perturbations, we expand this seed data to create a comprehensive and diverse dataset covering a wide range of scam scenarios. Rigorous evaluations demonstrate that ScamGen outperforms large language models in generating high-quality, varied datasets. Additionally, we develop five deep learning models for intent detection on this dataset, with BERT achieving the highest precision at 86.68%. The dataset, which will be made publicly available, marks a significant step toward understanding scammer tactics and improving tele-scam detection systems.
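
The abstract describes expanding seed samples through sentence- and word-level perturbations of templates. The following is a minimal sketch of that general idea, not the authors' implementation: the template text, the slot names, the slot values, and the functions `fill_slots`, `perturb_sentences`, and `augment` are all hypothetical, and English text stands in for the Chinese data the paper targets.

```python
import random

# Hypothetical seed template: a scam script split into sentences,
# with {slot} placeholders marking words that may be substituted.
TEMPLATES = [
    [
        "Hello, this is {agency}.",
        "Your {asset} has been flagged.",
        "Please {action} within {deadline}.",
    ],
]

# Hypothetical substitution candidates for each slot.
SLOTS = {
    "agency": ["the bank", "the police"],
    "asset": ["account", "parcel"],
    "action": ["transfer the funds", "confirm your identity"],
    "deadline": ["one hour", "today"],
}


def fill_slots(sentence, rng):
    """Word-level perturbation: replace each slot with a random candidate."""
    out = sentence
    for name, options in SLOTS.items():
        token = "{" + name + "}"
        if token in out:
            out = out.replace(token, rng.choice(options))
    return out


def perturb_sentences(sentences, rng, drop_prob=0.2):
    """Sentence-level perturbation: keep the opening, randomly drop the rest."""
    return [sentences[0]] + [s for s in sentences[1:] if rng.random() >= drop_prob]


def augment(templates, n, seed=0):
    """Generate up to n distinct samples from the seed templates."""
    rng = random.Random(seed)
    samples = set()
    attempts = 0
    # Cap the attempts so the loop ends even if n exceeds the number
    # of distinct samples the templates can produce.
    while len(samples) < n and attempts < 100 * n:
        attempts += 1
        template = rng.choice(templates)
        sentences = perturb_sentences(template, rng)
        samples.add(" ".join(fill_slots(s, rng) for s in sentences))
    return sorted(samples)


if __name__ == "__main__":
    for sample in augment(TEMPLATES, 5):
        print(sample)
```

Fixing the seed makes the generated corpus reproducible, and collecting samples in a set keeps the output free of exact duplicates. A real pipeline would also need the psychological annotation and quality evaluation that the abstract mentions, which this sketch omits.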
Source journal metrics: CiteScore 19.10; self-citation rate 4.00%; articles published 381; review time 40 days.
About the journal: Computers in Human Behavior is a scholarly journal that explores the psychological aspects of computer use. It covers original theoretical works, research reports, literature reviews, and software and book reviews. The journal examines both the use of computers in psychology, psychiatry, and related fields, and the psychological impact of computer use on individuals, groups, and society. Articles discuss topics such as professional practice, training, research, human development, learning, cognition, personality, and social interactions. It focuses on human interactions with computers, considering the computer as a medium through which human behaviors are shaped and expressed. Professionals interested in the psychological aspects of computer use will find this journal valuable, even with limited knowledge of computers.