Automatic multilabel classification for Indonesian news articles

Dyah Rahmawati, M. L. Khodra
{"title":"Automatic multilabel classification for Indonesian news articles","authors":"Dyah Rahmawati, M. L. Khodra","doi":"10.1109/ICAICTA.2015.7335382","DOIUrl":null,"url":null,"abstract":"Problem transformation and algorithm adaptation are the two main approaches in machine learning to solve multilabel classification problem. The purpose of this paper is to investigate both approaches in multilabel classification for Indonesian news articles. Since this classification deals with a large number of features, we also employ some feature selection methods to reduce feature dimension. There are four factors as the focuses of this paper, i.e., feature weighting method, feature selection method, multilabel classification approach, and single-label classification algorithm. These factors will be combined to determine the best combination. The experiments show that the best performer for multilabel classification of Indonesian news articles is the combination of TF-IDF feature weighting method, Symmetrical Uncertainty feature selection method, Calibrated Label Ranking - which belongs to problem transformation approach -, and SVM algorithm. This best combination achieves F-measure of 85.13% in 10-fold cross-validation, but the F-measure decreases to 76.73% in testing because of OOV.","PeriodicalId":319020,"journal":{"name":"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAICTA.2015.7335382","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Problem transformation and algorithm adaptation are the two main approaches in machine learning to solve multilabel classification problem. The purpose of this paper is to investigate both approaches in multilabel classification for Indonesian news articles. Since this classification deals with a large number of features, we also employ some feature selection methods to reduce feature dimension. There are four factors as the focuses of this paper, i.e., feature weighting method, feature selection method, multilabel classification approach, and single-label classification algorithm. These factors will be combined to determine the best combination. The experiments show that the best performer for multilabel classification of Indonesian news articles is the combination of TF-IDF feature weighting method, Symmetrical Uncertainty feature selection method, Calibrated Label Ranking - which belongs to problem transformation approach -, and SVM algorithm. This best combination achieves F-measure of 85.13% in 10-fold cross-validation, but the F-measure decreases to 76.73% in testing because of OOV.
印尼新闻文章的自动多标签分类
问题变换和算法自适应是机器学习中解决多标签分类问题的两种主要方法。本文的目的是研究这两种方法在印尼新闻文章的多标签分类。由于这种分类处理大量的特征,我们还采用了一些特征选择方法来降低特征维数。本文重点研究了特征加权方法、特征选择方法、多标签分类方法和单标签分类算法。这些因素将综合起来确定最佳组合。实验表明,结合TF-IDF特征加权法、对称不确定性特征选择法、校准标签排序法(属于问题变换方法)和支持向量机算法对印尼语新闻文章进行多标签分类效果最好。该最佳组合在10倍交叉验证中F-measure达到85.13%,但在检验中由于OOV的影响,F-measure下降到76.73%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信