BEAM - An Algorithm for Detecting Phishing Link

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Pub Date: 2022-11-07. DOI: 10.23919/APSIPAASC55919.2022.9979860

Sea Ran Cleon Liew, N. F. Law


Citations: 2

Abstract

This paper develops an attention-based phishing detector by performing sub-word tokenization and fine-tuning the Bidirectional Encoder Representations from Transformers (BERT) model. We call it the BERT embedding attention model (BEAM). Our proposed BEAM method contains five building blocks: a data pre-processing block that extracts components according to the URL structure, a tokenization block that splits the individual URL components into sub-words, an embedding block that produces a numerical sequence representation, an encoding block that yields a context feature vector, and a classification block for phishing URL detection. Sub-word tokenization allows us to characterize the relationships among connected sub-words, while the attention mechanism in BERT allows the proposed model to focus selectively on the parts of a URL that contribute to phishing behavior. We compare BEAM with existing state-of-the-art phishing detection methods, including CNN, Bi-LSTM, and machine learning models (random forest and XGBoost). Experimental results confirm that BEAM effectively detects phishing links and outperforms the other methods.
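The first two building blocks (URL component extraction and sub-word tokenization) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which URL components are extracted or which vocabulary is used, so the component set, the function names `split_url`, `wordpiece`, and `tokenize_component`, and the tiny vocabulary `TOY_VOCAB` are all assumptions made for illustration. The sub-word split follows the greedy longest-match-first WordPiece scheme that BERT tokenizers use, with `##` marking a continuation piece.

```python
import re
from urllib.parse import urlsplit

# Hypothetical toy vocabulary (assumption); a real BERT vocabulary
# contains tens of thousands of entries.
TOY_VOCAB = {"secure", "pay", "##pal", "login", "##s", "com", "www", "[UNK]"}


def split_url(url: str) -> dict:
    """Block 1 (sketch): split a URL into structural components.

    The choice of components is an assumption, not taken from the paper.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }


def wordpiece(word: str, vocab: set) -> list:
    """Block 2 (sketch): greedy longest-match-first sub-word split."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matches: whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


def tokenize_component(text: str, vocab: set) -> list:
    """Lowercase a URL component, split on non-alphanumerics, then sub-word split."""
    words = [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]
    tokens = []
    for word in words:
        tokens.extend(wordpiece(word, vocab))
    return tokens


if __name__ == "__main__":
    components = split_url("http://secure-paypal.com/logins?id=1")
    print(components)
    print(tokenize_component(components["host"], TOY_VOCAB))
    print(tokenize_component(components["path"], TOY_VOCAB))
```

In a full pipeline the resulting sub-word tokens would be mapped to vocabulary IDs and passed to the BERT embedding and encoder layers, with a classification head on top; those stages are omitted here because they depend on a pretrained model.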
