A Comparative Study of Different Text Classification Approaches for Bangla News Classification

Kamrus Salehin, M. Alam, Md. Ashifun Nabi, Fahim Ahmed, Faisal Bin Ashraf
Published in: 2021 24th International Conference on Computer and Information Technology (ICCIT)
Publication date: 2021-12-18
DOI: 10.1109/ICCIT54785.2021.9689843
URL: https://doi.org/10.1109/ICCIT54785.2021.9689843
Citations: 4

Abstract

Almost everything is now being digitized, and technology shapes nearly every part of daily life. As a result, online platforms generate a massive number of text documents, and news articles are no exception. Readers favor online news portals because they are updated every hour. News articles span many categories, such as politics, sports, business, and entertainment. The number of Bangla online news portals has grown rapidly in recent years. Recommending a preferred news category would help online readers locate the articles they want, but categorizing news articles manually takes a great deal of time and effort. Automatic text categorization is therefore needed to handle the enormous volume of uncategorized data. Although research on news categorization has advanced greatly for languages such as English, Arabic, Chinese, Urdu, and Hindi, Bangla has seen little development. Some earlier approaches categorized Bangla news articles with machine learning algorithms, but with minimal resources. We applied five machine learning classifiers and two neural networks to categorize Bangla news articles; the LSTM neural network performed best. To compare the algorithms and show which performs better, we used four performance metrics.
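The comparison described above can be illustrated with a small sketch. The abstract does not name its four metrics; the code below assumes accuracy and macro-averaged precision, recall, and F1, which are a common choice for multi-class text classification. The function name `evaluate`, the labels, and the two sets of predictions are hypothetical toy values, not data or results from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: these four metrics stand in for the paper's unnamed
    "four metrics"; the source does not confirm which ones were used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Avoid division by zero for classes never predicted or never seen.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy labels and predictions, for illustration only.
    y_true = ["politics", "sports", "business", "sports", "entertainment"]
    predictions = {
        "classifier_a": ["politics", "sports", "sports", "sports", "entertainment"],
        "classifier_b": ["politics", "business", "business", "sports", "politics"],
    }
    for name, y_pred in predictions.items():
        scores = evaluate(y_true, y_pred)
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Macro averaging weights every news category equally, which matters when some categories have far fewer articles than others. A weighted or micro average would be an equally defensible choice.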