A Comparative Study of Classifying English News Articles Using Machine Learning Algorithms

2022 Trends in Electrical, Electronics, Computer Engineering Conference (TEECCON). Published: 2022-05-26. DOI: 10.1109/TEECCON54414.2022.9854832

N. Disayiram, R. Rupasingha

Citations: 2

Abstract

News mainly helps people learn what is happening around them. It also serves as an important social gathering space, so daily newspapers, whether online or in print, place great emphasis on it. Today most people read news on websites. Each reader has preferred news categories, and few readers follow every category; preferences differ from person to person, which makes the category of an article important. It is also difficult to find all the news on a single website, and a given site may not offer the category a reader is looking for. To address these issues, classification techniques are used to categorize news articles. The proposed English news classification method collects news articles from a variety of online news sites and classifies them into domain categories such as business, health, politics, technology, and sports. The datasets are converted into a machine-trainable format using Term Frequency-Inverse Document Frequency (TF-IDF). Classification algorithms are then trained on the training set and evaluated on the test set to build models that assign each news article to its category. Five individual classification algorithms are applied: Decision Tree, Random Forest, Naïve Bayes, Multilayer Perceptron (MLP), and Support Vector Machine (SVM). Among them, SVM performs best. Combining all five algorithms with an ensemble learning method yields the highest accuracy, 92.75%, with a lower error rate than any individual algorithm.
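The abstract names TF-IDF and an ensemble of five classifiers but does not specify the toolkit, the TF-IDF variant, or the rule used to combine the classifiers. As a minimal sketch under those gaps, the plain-Python code below (standard library only) illustrates two of the steps: turning articles into TF-IDF vectors and combining the labels predicted by five base classifiers through hard majority voting. The smoothed IDF formula with L2 normalization is an assumption (it matches the defaults of scikit-learn's `TfidfVectorizer`), majority voting is an assumed ensemble rule, and the helper names (`tokenize`, `fit_idf`, `tfidf`, `majority_vote`) and toy data are hypothetical. The base classifiers themselves are not implemented here, and this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating TF-IDF and hard voting; not the paper's code.


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split only; a real pipeline would
    # also strip punctuation and possibly remove stop words.
    return text.lower().split()


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF learned from the training documents:
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, where df(t) is the number of
    documents containing term t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalized TF-IDF vector (raw term count * idf).
    Terms not seen by fit_idf are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def majority_vote(predictions: list[str]) -> str:
    """Assumed ensemble rule (hard voting): return the most frequent label.
    On a tie, Counter keeps the label encountered first, i.e. the earlier
    classifier in the list wins."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy corpus; "news" occurs in every document, so its IDF is lowest.
train_docs = [
    "business news stocks rally",
    "sports news team wins final",
    "health news vaccine trial results",
]
idf = fit_idf(train_docs)
vec = tfidf("stocks news tomorrow", idf)  # "tomorrow" is unseen, so it is dropped
print(sorted(vec.items(), key=lambda kv: -kv[1]))

# Hypothetical labels predicted for one article by the five base classifiers
# (Decision Tree, Random Forest, Naive Bayes, MLP, SVM, in that order).
votes = ["business", "sports", "business", "business", "politics"]
print(majority_vote(votes))  # → business
```

In a full pipeline each base classifier would be trained on the TF-IDF vectors; in scikit-learn, for example, `TfidfVectorizer` and `VotingClassifier` cover these two steps. Hard voting is only one way to combine models: soft voting (averaging predicted class probabilities) and stacking are common alternatives, and the abstract does not say which one produced the reported 92.75%.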
