Feature selection for an SVM based webpage classifier

2017 IEEE 4th International Conference on Soft Computing & Machine Intelligence (ISCMI), Pub Date: 2017-11-01, DOI: 10.1109/ISCMI.2017.8279603

Nhamo Mtetwa, M. Yousefi, V. Reddy


Citations: 6

Abstract

Machine-learning techniques are a handy tool for deriving insights from data extracted from the web. Because of the structure of the data that web crawlers extract, the data must be preprocessed to obtain features that can be used to train a machine-learning classifier. The number of available features that can be linked to a website is huge. Narrowing these down to the minimum number of features required to drive a classifier has large benefits. This paper presents a workflow that uses a set of metrics to reduce the number of features used to train a support vector machine (SVM) that classifies webpages as fraudulent or not. The paper reports that reducing the feature set size by three quarters incurs only a 5% reduction in classification accuracy while bringing large computational benefits.
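The abstract does not name the metrics in the workflow, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Fisher-style score as a stand-in filter metric, keeps the top quarter of features, and trains a small Pegasos-style linear SVM (without a bias term) on synthetic data. All function names, parameters, and the data generator are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature: squared gap between class means over summed class variances."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        gap = (mean(pos) - mean(neg)) ** 2
        scores.append(gap / (pvariance(pos) + pvariance(neg) + 1e-12))
    return scores


def select_top_fraction(scores, fraction=0.25):
    """Return the sorted indices of the best-scoring fraction of features (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for row, label in zip(X, y):
        predicted = 1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else -1
        hits += predicted == label
    return hits / len(y)


def make_synthetic(n=200, n_features=16, n_informative=4, seed=0):
    """Hypothetical data: only the first n_informative features depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

    # Rank features on the training split only, then keep the top quarter.
    keep = select_top_fraction(fisher_scores(X_train, y_train), fraction=0.25)

    w_full = train_linear_svm(X_train, y_train)
    w_small = train_linear_svm(project(X_train, keep), y_train)

    print("kept features:", keep)
    print("accuracy, all features:", accuracy(w_full, X_test, y_test))
    print("accuracy, top quarter: ", accuracy(w_small, project(X_test, keep), y_test))
```

Scoring features on the training split alone avoids leaking test labels into the selection step. The accuracy figures this toy prints are unrelated to the 5% figure reported in the paper.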
