Pornpon Thamrongrat, L. Preechaveerakul, W. Wettayaprasit
A novel Voting Algorithm of multi-class SVM for web page classification. 2009 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT), published 2009-09-11. DOI: 10.1109/ICCSIT.2009.5234603
Citations: 4
Abstract
The growing number of web pages on the Web reduces the effectiveness of retrieving documents that match users' needs. Web page classification is one way to address this problem. This paper proposes the VAMSVM_WPC model, a novel voting algorithm for classifying web pages that is built on a multi-class SVM. First, features are generated from the page text and title, and the number of features is then reduced with two feature selection techniques. These two types of features are given as input to a multi-class SVM. Finally, a voting algorithm is applied to the SVM outputs to determine the category of each web page. Results on the CMU benchmark dataset show that using text and title features with the 1vsAll_Voting algorithm gives the highest F-measure.
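The following minimal Python sketch makes the final voting step concrete under stated assumptions. It is not the authors' VAMSVM_WPC or 1vsAll_Voting algorithm, because the abstract does not specify their exact voting rule. The sketch assumes that one-vs-all SVMs have already produced a decision value per class for each feature view (text and title). Each view votes for its highest-scoring class, and ties are broken by the summed decision values. The function names `one_vs_all_predict` and `vote`, as well as the class labels, are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def one_vs_all_predict(scores: dict[str, float]) -> str:
    """One-vs-all rule: pick the class whose binary SVM gives the largest decision value."""
    return max(scores, key=scores.get)


def vote(view_scores: list[dict[str, float]]) -> str:
    """Hypothetical voting rule (assumption, not the paper's exact method).

    Each feature view (e.g. text features, title features) casts one vote
    for its one-vs-all winner; a tie is broken by summing the decision
    values of the tied classes across all views.
    """
    votes = Counter(one_vs_all_predict(s) for s in view_scores)
    top = max(votes.values())
    tied = [c for c, n in votes.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: fall back to the summed decision value across views.
    totals = {c: sum(s.get(c, 0.0) for s in view_scores) for c in tied}
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical decision values from per-class SVMs for one web page.
    text_scores = {"course": 0.8, "faculty": -0.2, "student": 0.1}
    title_scores = {"course": -0.5, "faculty": 0.3, "student": 0.6}
    # Text votes "course", title votes "student"; the tie goes to the larger sum.
    print(vote([text_scores, title_scores]))  # → student
```

Summing decision values is only one plausible tie-breaker; a real implementation could instead weight the views or use calibrated probabilities.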