{"title":"LMs go Phishing: Adapting Pre-trained Language Models to Detect Phishing Emails","authors":"Kanishka Misra, J. Rayz","doi":"10.1109/WI-IAT55865.2022.00028","DOIUrl":null,"url":null,"abstract":"Despite decades of research, the problem of Phishing in everyday email communication is ever so prevalent. Traditionally viewed as a text-classification task, the task of phishing detection is an active defense against phishing attempts. Mean-while, progress in natural language processing has established the universal usefulness of adapting pre-trained language models to perform downstream tasks, in a paradigm known as pre-train-then-fine-tune. In this work, we build on this paradigm, and propose two language models that are adapted on 725k emails containing phishing and legitimate messages. We use these two models in two ways: 1) by performing classification-based fine-tuning, and 2) by developing a simple priming-based approach. Our approaches achieve empirical gains over a good deal of prior work, achieving near perfect performance on in-domain data, and relative improvements on out-of-domain emails.","PeriodicalId":345445,"journal":{"name":"2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI-IAT55865.2022.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Despite decades of research, phishing remains prevalent in everyday email communication. Phishing detection, traditionally framed as a text-classification task, serves as an active defense against phishing attempts. Meanwhile, progress in natural language processing has established the broad usefulness of adapting pre-trained language models to downstream tasks, in a paradigm known as pre-train-then-fine-tune. In this work, we build on this paradigm and propose two language models adapted on 725k emails containing phishing and legitimate messages. We use these two models in two ways: 1) by performing classification-based fine-tuning, and 2) by developing a simple priming-based approach. Our approaches achieve empirical gains over much prior work, reaching near-perfect performance on in-domain data and relative improvements on out-of-domain emails.
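To make the classification-based fine-tuning setup concrete, the sketch below shows the general pre-train-then-fine-tune recipe applied to phishing-vs-legitimate email classification. It is a minimal illustration, not the authors' code: the base model (bert-base-uncased), the toy examples standing in for the 725k-email corpus, and all hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch of classification-based fine-tuning for phishing detection.
# Model name, example emails, and hyperparameters are illustrative assumptions,
# not details taken from the paper.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Toy stand-in for a corpus of phishing and legitimate emails.
emails = Dataset.from_dict({
    "text": [
        "Your account has been locked. Verify your password at http://example.bad",
        "The meeting is moved to 3pm; the updated agenda is attached.",
    ],
    "label": [1, 0],  # 1 = phishing, 0 = legitimate
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Truncate/pad email bodies to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

emails = emails.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phish-clf",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=emails,
)
trainer.train()
```

In practice, the same fine-tuned classifier would then be evaluated on held-out in-domain and out-of-domain emails, which is the comparison the abstract reports.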