A Comparative Study of Classifying English News Articles Using Machine Learning Algorithms
N. Disayiram, R. Rupasingha
2022 Trends in Electrical, Electronics, Computer Engineering Conference (TEECCON), published 2022-05-26
DOI: 10.1109/TEECCON54414.2022.9854832
Citations: 2
Abstract
News helps people understand what is happening around them. It also serves as a social gathering space, so newspapers, whether online or in print, put an emphasis on it. Most people now read news on websites. Every reader has preferred news categories, and not all readers read every category; preferences differ from person to person. The category of a news article is therefore very important. It is hard to get all the news from a single website, and a given site may lack the category a reader is searching for. To overcome these issues, classification techniques are used to categorize news articles. The English news classification method presented here extracts news articles from a variety of online news sites and classifies them by domain category: business, health, politics, technology, and sports. The datasets are converted into a machine-trainable format using Term Frequency-Inverse Document Frequency (TF-IDF). Several classification algorithms are trained and tested on the resulting dataset to build models that assign news articles to their relevant categories. Five individual classification algorithms are applied: Decision Tree, Random Forest, Naïve Bayes, Multilayer Perceptron (MLP), and Support Vector Machine (SVM). Among them, SVM gives the best results. Combining the five algorithms with an ensemble learning method yields an accuracy of 92.75% and a lower error rate than any individual algorithm when predicting the category of a news article.
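
The abstract names TF-IDF weighting and an ensemble of five classifiers, but it does not state the exact TF-IDF variant or how the ensemble combines the classifiers' outputs. The following is a minimal sketch under two assumptions: a smoothed IDF term and hard majority voting. The function names `tf_idf` and `majority_vote` and the sample sentences are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant (not stated in the abstract): term frequency normalised
    by document length, and idf = ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def majority_vote(predictions):
    """Hard-voting ensemble (an assumption): return the most common label.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        "stocks rally as markets open",
        "team wins the cup final",
        "markets fall as stocks slide",
    ]
    vectors = tf_idf(docs)
    # "cup" occurs in one document and "stocks" in two,
    # so "cup" receives the larger weight.
    print(vectors[1]["cup"] > vectors[0]["stocks"])

    # Hypothetical labels from the five classifiers for one article.
    votes = ["sports", "business", "sports", "sports", "politics"]
    print(majority_vote(votes))
```

Terms that appear in fewer documents receive larger weights, which is what lets category-specific vocabulary stand out in the feature vectors. With hard voting, the ensemble can be correct even when some individual classifiers are wrong, as long as most of them agree on the right label.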