Xu Han, Qiang Li, Yaling Qi, Hongbo Cao, Witold Pedrycz, Wei Wang
DOI: 10.1016/j.chb.2024.108451
Journal: Computers in Human Behavior (journal article; impact factor 9.0; JCR Q1, Psychology, Experimental)
Published: 2024-09-21
URL: https://www.sciencedirect.com/science/article/pii/S0747563224003194
ScamGen: Unveiling psychological patterns in tele-scam through advanced template-augmented corpus generation
Telephone scams, with their profound psychological impact, often compel victims to make hasty and severe decisions. Studying these scams is challenging due to the scarcity of comprehensive datasets, a result of the private nature of telephone interactions. In this paper, we introduce ScamGen, a template-based data augmentation technique designed to enhance Chinese telephone scam data. ScamGen leverages psychological insights to generate diverse and realistic scam scenarios, focusing on the psychological dynamics between scammers and victims. This novel approach integrates psychological theory with data augmentation, diverging from traditional methods by emphasizing scammer–victim interactions. Our method begins with a multi-source data collection framework, compiling an initial seed dataset of tele-scam samples. Using sentence- and word-level perturbations, we expand this seed data to create a comprehensive and diverse dataset covering a wide range of scam scenarios. Rigorous evaluations demonstrate that ScamGen outperforms large language models in generating high-quality, varied datasets. Additionally, we develop five deep learning models for intent detection on this dataset, with BERT achieving the highest precision at 86.68%. The dataset, which will be made publicly available, marks a significant step toward understanding scammer tactics and improving tele-scam detection systems.
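The abstract does not spell out how the sentence- and word-level perturbations work, so the following is only a minimal sketch of what template-based augmentation of this general kind might look like. It is not the authors' implementation. The template, the slot vocabulary, and the function names (`TEMPLATE`, `SLOTS`, `sentence_level`, `augment`) are all hypothetical, and the example text is in English for readability even though ScamGen targets Chinese data.

```python
import random

# Hypothetical seed template; {slots} are filled at the word level.
TEMPLATE = [
    "Hello, this is {agency}.",
    "Your {account} has been flagged for suspicious activity.",
    "You must {action} within {deadline} to avoid penalties.",
]

# Hypothetical slot vocabulary (illustrative only, not taken from the paper).
SLOTS = {
    "agency": ["the bank's security team", "the tax office"],
    "account": ["bank account", "tax record"],
    "action": ["verify your identity", "confirm your details"],
    "deadline": ["one hour", "today"],
}


def sentence_level(sentences, rng):
    """Sentence-level perturbation: sometimes drop one non-opening sentence."""
    kept = list(sentences)
    if len(kept) > 2 and rng.random() < 0.5:
        del kept[rng.randrange(1, len(kept))]
    return kept


def augment(template, n, seed=0):
    """Generate up to n distinct samples from a single seed template."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    samples = set()
    for _ in range(n):
        # Word-level perturbation: pick one filler per slot for this sample.
        choices = {name: rng.choice(options) for name, options in SLOTS.items()}
        sentences = sentence_level(template, rng)
        samples.add(" ".join(s.format(**choices) for s in sentences))
    return sorted(samples)


if __name__ == "__main__":
    for sample in augment(TEMPLATE, 5):
        print(sample)
```

Collecting samples in a set removes exact duplicates, which matters when a small template is sampled many times. A real pipeline would presumably also attach intent labels to each generated utterance, which this sketch omits.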
About the journal:
Computers in Human Behavior is a scholarly journal that explores the psychological aspects of computer use. It covers original theoretical works, research reports, literature reviews, and software and book reviews. The journal examines both the use of computers in psychology, psychiatry, and related fields, and the psychological impact of computer use on individuals, groups, and society. Articles discuss topics such as professional practice, training, research, human development, learning, cognition, personality, and social interactions. It focuses on human interactions with computers, considering the computer as a medium through which human behaviors are shaped and expressed. Professionals interested in the psychological aspects of computer use will find this journal valuable, even with limited knowledge of computers.