Phishing website detection using weighted feature line embedding

ISC Int. J. Inf. Secur., vol. 11, no. 1 · Published: 2017-07-31 · DOI: 10.22042/ISECURE.2017.83439.377

M. Imani, G. Montazer


Citations: 4

Abstract



The aim of phishing is to obtain users' private information without their permission by building a website that mimics a trusted one. Information technology specialists do not agree on a single definition of the discriminative features that characterize phishing websites; consequently, the number of reliable training samples in phishing detection problems is limited. Moreover, the available training samples include abnormal samples that cause classification errors: for instance, some phishing samples may have features similar to those of legitimate ones, and vice versa. To address these problems, this paper proposes a supervised feature extraction method called weighted feature line embedding. The proposed method virtually generates training samples using the feature line metric, which addresses the small sample size problem. In addition, by assigning appropriate weights to each pair of feature points, it corrects for the undesirable influence of abnormal samples. The features extracted by our method improve phishing website detection performance, especially with small training sets.
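The abstract relies on the feature line metric without defining it. The sketch below assumes the common nearest-feature-line definition: a sample x is projected onto the line through two prototypes of the same class, and that projection acts as a virtual training sample. The function names and the heat-kernel weight exp(-d²/sigma) are hypothetical illustrations, not the paper's weighting scheme, and the embedding step itself (scatter matrices and projection learning) is not reproduced here. Plain Python is used so the sketch has no dependencies.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = List[float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [ai - bi for ai, bi in zip(a, b)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def feature_line_projection(
    x: Sequence[float], xi: Sequence[float], xj: Sequence[float]
) -> Tuple[Vector, float]:
    """Project x onto the feature line through prototypes xi and xj.

    Returns the projected point p = xi + t * (xj - xi) and the position t.
    0 <= t <= 1 lies between the prototypes; other t values extrapolate.
    """
    direction = _sub(xj, xi)
    denom = _dot(direction, direction)
    if denom == 0.0:
        # Identical prototypes do not define a line; fall back to xi itself.
        return list(xi), 0.0
    t = _dot(_sub(x, xi), direction) / denom
    return [a + t * d for a, d in zip(xi, direction)], t


def feature_line_distance(
    x: Sequence[float], xi: Sequence[float], xj: Sequence[float]
) -> float:
    """Euclidean distance from x to its projection on the feature line."""
    p, _ = feature_line_projection(x, xi, xj)
    r = _sub(x, p)
    return math.sqrt(_dot(r, r))


def weighted_virtual_samples(
    x: Sequence[float], same_class: Sequence[Sequence[float]], sigma: float = 1.0
) -> List[Tuple[Vector, float, float]]:
    """Return one (virtual sample, distance, weight) triple per prototype pair.

    HYPOTHETICAL weighting: exp(-d^2 / sigma) down-weights pairs whose
    feature line lies far from x, e.g. lines spanned by abnormal samples.
    """
    samples = []
    for xi, xj in combinations(same_class, 2):
        p, _ = feature_line_projection(x, xi, xj)
        r = _sub(x, p)
        d = math.sqrt(_dot(r, r))
        samples.append((p, d, math.exp(-d * d / sigma)))
    return samples
```

For x = [1.0, 1.0] and same-class prototypes [0, 0], [2, 0] and [0, 2], the three feature lines yield virtual samples [1.0, 0.0], [0.0, 1.0] and [1.0, 1.0] at distances 1, 1 and 0. Even three real samples therefore produce three extra points, which illustrates how a feature line approach can enlarge a small training set.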

