News Topic Classification Using Mutual Information and Bayesian Network

Fahmi Salman Nurfikri, M. S. Mubarok, Adiwijaya
2018 6th International Conference on Information and Communication Technology (ICoICT), published 2018-05-03
DOI: 10.1109/ICOICT.2018.8528806 (https://doi.org/10.1109/ICOICT.2018.8528806)
Citations: 19

Abstract

News topic classification, as studied in this research, assigns a news article in textual form to a particular category based on the information it contains. One method for this task is the Bayesian Network, an uncertainty-reasoning method that uses probability and a directed acyclic graph to model conditional dependencies among variables. However, textual data normally contains a large number of variables, which is a problem for a Bayesian Network: many variables make learning both its structure and its parameters highly complex, particularly in time. A large number of variables can also degrade accuracy, since some variables may be irrelevant. In this research, we use Mutual Information as a text feature selection method to provide relevant features to a Bayesian Network classifier. Our experiments show that Mutual Information as a feature selector improves the classification performance of the Bayesian Network. The highest classification rate obtained with Mutual Information is 75.34%, whereas the classification rate without it is 45.95%, both measured as micro-averaged F1-score.
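
The feature selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how Mutual Information is estimated, so the sketch assumes binary term presence per document, maximum-likelihood probability estimates without smoothing, and a simple top-k cutoff. The function names `mutual_information` and `select_top_k` and the toy corpus are hypothetical, and the Bayesian Network classifier itself is not shown.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {term: MI(term presence; class)} in nats.

    Assumption: each document is a list of tokens and a term is treated
    as a binary variable (present / absent in the document).
    """
    n = len(docs)
    class_count = Counter(labels)
    joint = Counter()       # (term, label) -> docs of that class containing term
    term_count = Counter()  # term -> docs containing term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            joint[(term, label)] += 1
            term_count[term] += 1

    scores = {}
    for term, n_t in term_count.items():
        mi = 0.0
        for label, n_c in class_count.items():
            n_tc = joint[(term, label)]
            # Two outcomes per class: term present, term absent.
            for n_joint, n_marg in ((n_tc, n_t), (n_c - n_tc, n - n_t)):
                if n_joint > 0:  # 0 * log(0) is taken as 0
                    mi += (n_joint / n) * math.log(n * n_joint / (n_marg * n_c))
        scores[term] = mi
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["election", "vote", "team"],
    ["vote", "party", "election"],
]
labels = ["sport", "sport", "politics", "politics"]

scores = mutual_information(docs, labels)
selected = select_top_k(scores, 3)
print(selected)
```

On this toy corpus, terms that occur in every document of one class and in none of the other (such as "goal" or "vote") receive the highest score, while "team", which appears in both classes, scores lower. Only the selected terms would then be passed to the classifier as variables, which is what reduces the cost of structure and parameter learning.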