Detection of Phishing Webpages Using Heterogeneous Transfer Learning

2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC). Pub Date: 2017-10-01. DOI: 10.1109/CIC.2017.00034

Karl R. Weiss, T. Khoshgoftaar


Citations: 4

Abstract

Previous studies have demonstrated the detection of phishing websites using traditional machine learning methods. These methods assume that the training and testing data share the same input feature space. In some scenarios, however, the available labeled training data has a different input feature space than the testing data, and traditional machine learning methods cannot be applied directly. Heterogeneous transfer learning methods transform the different input feature spaces of the training and testing data into a common set of input features. In our experiment, we construct numerous phishing website detection scenarios in which the features of the training and testing data differ. The experiment starts with a baseline dataset for phishing website detection. We split the features of this baseline dataset to create separate training and testing datasets whose features are mutually exclusive. A heterogeneous transfer learning technique called Canonical Correlation Analysis is then used to align the input feature spaces of the training and testing data. The feature-aligned training and testing data are used with various traditional machine learning methods and homogeneous transfer learning methods to predict phishing websites. The performance results of the different scenarios and algorithms are reported and analyzed.
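The pipeline described in the abstract (split the features into mutually exclusive sets, align them with Canonical Correlation Analysis, then classify in the aligned space) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: the data is synthetic, the helper names `fit_cca` and `nearest_centroid_predict` are hypothetical, a nearest-centroid rule stands in for the "traditional machine learning methods", and CCA is fit on a small unlabeled set of instances for which both feature sets are observed. The abstract does not say how the authors obtain the paired data needed to fit CCA, so that part in particular should be read as one possible setup.

```python
import numpy as np

def fit_cca(X, Y, n_components, reg=1e-6):
    """Fit CCA on paired views X (n, p) and Y (n, q) via whitening + SVD."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly @ Ly.T
    # Whitened cross-covariance: T = Lx^{-1} @ Sxy @ Ly^{-T}
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature coordinates.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mx, my, Wx, Wy, s[:n_components]

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Assign each test row to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(0)

# Synthetic stand-in for a baseline dataset: 11 features driven by 3 latent factors.
n, n_latent, n_features = 1500, 3, 11
latent = rng.normal(size=(n, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
data = latent @ mixing + 0.1 * rng.normal(size=(n, n_features))
labels = (latent[:, 0] > 0).astype(int)  # synthetic labels: 1 = phishing, 0 = legitimate

# Split the columns into two mutually exclusive feature sets.
cols = rng.permutation(n_features)
src_cols, tgt_cols = cols[:6], cols[6:]

# Row roles (assumed setup): paired unlabeled rows, labeled training rows, test rows.
pair_idx = np.arange(0, 500)
train_idx = np.arange(500, 1000)
test_idx = np.arange(1000, 1500)

# Fit CCA on the paired rows, where both feature sets are observed.
mx, my, Wx, Wy, corrs = fit_cca(
    data[np.ix_(pair_idx, src_cols)], data[np.ix_(pair_idx, tgt_cols)], n_components=3
)

# Project training data (source features only) and test data (target features only)
# into the shared canonical space.
Zs = (data[np.ix_(train_idx, src_cols)] - mx) @ Wx
Zt = (data[np.ix_(test_idx, tgt_cols)] - my) @ Wy

# Train on aligned source data, evaluate on aligned target data.
pred = nearest_centroid_predict(Zs, labels[train_idx], Zt)
accuracy = float((pred == labels[test_idx]).mean())
print("canonical correlations:", np.round(corrs, 3))
print("accuracy in aligned space:", round(accuracy, 3))
```

The classifier never sees the target features during training and never sees the source features at test time; the two sides meet only through the projections `Wx` and `Wy`. Any classifier that accepts a fixed-length feature vector could replace the nearest-centroid rule once both datasets live in the same canonical space.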
