Automated News Classification using N-gram Model and Key Features of Nepali Language

Dinesh Dangol, Rupesh Dahi Shrestha, Arun K. Timalsina
SCITECH Nepal, Journal Article, published 2018-09-30. DOI: https://doi.org/10.3126/SCITECH.V13I1.23504

Abstract

With the growing trend of publishing news online, automatic text processing is becoming increasingly important. Automatic text classification has been a focus of researchers working on many languages for decades. A large body of research exists on the features of the English language and their use in automated text processing. This research applies key features of the Nepali language to the automatic classification of Nepali news. Studying the impact of Nepali-specific features, which differ greatly from those of English, is more challenging because of the higher level of complexity to be resolved. Experiments using a vector space model, an n-gram model, and key-feature-based processing specific to Nepali show promising results compared with a bag-of-words model for the task of automated Nepali news classification.
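To make the contrast between a bag-of-words model and an n-gram vector space model concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the tokenizer, the weighting scheme, the classifier, or the Nepali-specific features, so everything here is an assumption for illustration: whitespace tokenization, word unigrams plus bigrams, a smoothed TF-IDF weighting, cosine similarity, and a nearest-document classifier. The function names, the two toy training sentences, and the labels `sports` and `politics` are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Return contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """Count all word 1..max_n grams of a whitespace-tokenized text.

    With max_n=1 this reduces to a plain bag-of-words representation.
    """
    tokens = text.split()
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(word_ngrams(tokens, n))
    return Counter(grams)


def fit_tfidf(texts, max_n=2):
    """Build TF-IDF vectors (as dicts) for the training texts."""
    counts = [features(t, max_n) for t in texts]
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each term counted once per document
    # Smoothed inverse document frequency (an assumed weighting choice).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, vectors, labels, idf, max_n=2):
    """Return the label of the most similar training document."""
    counts = features(text, max_n)
    # Terms never seen in training carry no weight.
    vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(vec, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Toy training data (hypothetical, for illustration only).
TRAIN_TEXTS = ["नेपाल क्रिकेट टोली खेल", "सरकार बजेट घोषणा"]
TRAIN_LABELS = ["sports", "politics"]

VECTORS, IDF = fit_tfidf(TRAIN_TEXTS)
PREDICTED = classify("क्रिकेट टोली", VECTORS, TRAIN_LABELS, IDF)
```

Setting `max_n=1` gives the bag-of-words baseline, while `max_n=2` adds bigram terms to the same vector space, which is one simple way word order can enter the representation. Language-specific preprocessing would be applied to the tokens before `word_ngrams` is called; the abstract does not say what that processing is, so none is shown here.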