Online News Extraction and Multiclass Classification Using Linear Support Vector Machines
Authors: Apoorva Gupta, Smriti Arora, Niyati Baliyan
Published in: 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI), 2022-11-26
DOI: 10.1109/ISCMI56532.2022.10068460 (https://doi.org/10.1109/ISCMI56532.2022.10068460)
Citation count: 0
Abstract
Online news articles, blogs, and websites are a rich source of diverse text data. However, the volume of this data is too large to extract, record, and catalogue manually, and accurately mapping each news item to its correct category remains challenging. Several news classification methods have been proposed for the case where training documents are readily available for each predefined class, but these methods were evaluated only on small datasets. This research aims to propose a method that remains usable when hundreds of thousands of instances (lakhs) are present. The study performs news classification with two multiclass strategies, OneVsRest and OneVsOne, applied over Linear Support Vector Classification, in order to assess multiclass news categorization performance. The proposed methodology, the "Keyword Based Classification Technique (KBCT)", was implemented in Python and deployed using Google Colaboratory. Results are reported for four news classes on a multivariate dataset of 422,419 instances from the uci-news-aggregator dataset. The OneVsRestClassifier achieved an accuracy of 95.76%, which is 0.09 percentage points higher than the OneVsOneClassifier's accuracy of 95.67%. The proposed prototype was compared with several related studies and algorithms, and the OneVsRest model produced the highest accuracy among them.
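The two multiclass strategies named in the abstract can be illustrated with a minimal sketch in scikit-learn. This is not the authors' KBCT pipeline: the toy headlines, the class names, and the use of TF-IDF features are all assumptions made for illustration, and the sketch only shows how OneVsRest and OneVsOne wrap a linear SVC and how many binary classifiers each trains for four classes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Hypothetical toy data: eight invented headlines with four placeholder classes.
headlines = [
    "Stocks rally as central bank holds interest rates",
    "Quarterly profits rise at major retail bank",
    "New smartphone chip doubles battery life",
    "Researchers release open source software toolkit",
    "Award winning film tops weekend box office",
    "Pop singer announces world concert tour",
    "Study links daily exercise to lower heart risk",
    "Hospital trial tests new flu vaccine",
]
labels = [
    "business", "business",
    "technology", "technology",
    "entertainment", "entertainment",
    "health", "health",
]

# Assumption: TF-IDF bag-of-words features; the abstract does not state the features used.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(headlines)

# OneVsRest trains one binary LinearSVC per class (class vs. all others).
ovr = OneVsRestClassifier(LinearSVC()).fit(X, labels)

# OneVsOne trains one binary LinearSVC per pair of classes.
ovo = OneVsOneClassifier(LinearSVC()).fit(X, labels)

n_classes = len(set(labels))
print("classes:", n_classes)
print("OneVsRest binary classifiers:", len(ovr.estimators_))  # K
print("OneVsOne binary classifiers:", len(ovo.estimators_))   # K * (K - 1) / 2

# Classify an unseen, equally hypothetical headline with both strategies.
query = vectorizer.transform(["Bank shares fall after profits warning"])
print("OneVsRest prediction:", ovr.predict(query)[0])
print("OneVsOne prediction:", ovo.predict(query)[0])
```

With K classes, OneVsRest fits K binary models and OneVsOne fits K(K-1)/2, so for the four classes here the counts are 4 and 6. On a dataset of the size reported in the abstract, that difference in the number of fitted models is one practical consideration when choosing between the two strategies.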