A Comparative Study of Classifying English News Articles Using Machine Learning Algorithms

N. Disayiram, R. Rupasingha
2022 Trends in Electrical, Electronics, Computer Engineering Conference (TEECCON), published 2022-05-26
DOI: 10.1109/TEECCON54414.2022.9854832
Citations: 2

Abstract

News mainly helps people learn what is happening around them. It also serves as a shared social space, which is why daily papers, both online and in print, place such emphasis on it. Most people now read the news on websites. Each reader has preferred news categories, and few readers follow every category; preferences vary from person to person. The category of a news article is therefore important. It is also hard to get all the news from a single website, and a given site may not cover the category a reader is looking for. To address these issues, classification techniques are used to categorize news articles. The proposed English news classification method extracts articles from a variety of online news sites and classifies them into domain categories such as business, health, politics, technology, and sports. The datasets are converted into a machine-trainable format using Term Frequency-Inverse Document Frequency (TF-IDF). The classification algorithms are applied to the training and testing datasets to build models, which then assign each news article to its relevant category. Five individual classification algorithms are applied: Decision Tree, Random Forest, Naïve Bayes, Multilayer Perceptron (MLP), and Support Vector Machine (SVM). Among these, SVM gives the best results. Combining all five algorithms with an ensemble learning method yields a higher accuracy (92.75%) and a lower error rate than any individual algorithm in predicting the category of news articles.
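The abstract names two concrete mechanisms: TF-IDF weighting, which turns articles into vectors, and an ensemble that combines the five classifiers' predictions. The paper does not state which TF-IDF variant or which ensemble scheme it used. The following is therefore only a minimal standard-library sketch under stated assumptions: lowercase whitespace tokenization, raw term counts with the smoothed IDF ln((1 + N) / (1 + df)) + 1 and no normalization, and hard majority voting. The five base classifiers are not implemented. Their predicted labels are taken as input. The function names `tfidf`, `majority_vote`, and `accuracy_and_error` are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Convert raw article texts into sparse TF-IDF vectors ({term: weight} dicts).

    Assumptions (not stated in the paper): lowercase whitespace tokenization,
    raw term counts as TF, and the smoothed IDF
        idf(t) = ln((1 + N) / (1 + df(t))) + 1,
    with no length normalization applied afterwards.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def majority_vote(predictions_per_model):
    """Hard-voting ensemble (an assumed scheme; the paper only says "Ensemble Learning").

    predictions_per_model holds one list of predicted labels per base classifier
    (e.g. Decision Tree, Random Forest, Naive Bayes, MLP, SVM), all of equal length.
    For each article the most frequent label wins; ties go to the label seen first,
    i.e. the one from the earliest model in the list.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions_per_model)]


def accuracy_and_error(y_true, y_pred):
    """Accuracy = fraction of correctly labelled articles; error rate = 1 - accuracy."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1 - accuracy


if __name__ == "__main__":
    # Toy, made-up inputs purely for illustration (not the paper's data or results).
    vectors = tfidf(["market market sports", "market health", "market politics"])
    print(vectors[0])  # "market" occurs in every document, so its IDF is 1.0
    hypothetical_preds = [
        ["sports", "business", "health"],      # model 1
        ["sports", "politics", "health"],      # model 2
        ["technology", "business", "health"],  # model 3
        ["sports", "business", "politics"],    # model 4
        ["sports", "business", "health"],      # model 5
    ]
    ensemble = majority_vote(hypothetical_preds)
    print(ensemble, accuracy_and_error(["sports", "business", "politics"], ensemble))
```

In practice, scikit-learn's `TfidfVectorizer` and `VotingClassifier` (with `voting="hard"`) cover the same two steps and can wrap the five base classifiers directly. The smoothed IDF above matches `TfidfVectorizer`'s default, although that class also L2-normalizes each vector by default. The stdlib version only makes the arithmetic explicit.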