Kevin Djajadinata, Hussein Faisol, G. F. Shidik, Muljono, A. Z. Fanani
Evaluation of Feature Extraction for Indonesian News Classification
2020 International Seminar on Application for Technology of Information and Communication (iSemantic)
Publication date: 2020-09-19
DOI: https://doi.org/10.1109/iSemantic50169.2020.9234252
Citation count: 1
Abstract
News is information about knowledge or events that occur within a certain period. Text news can be classified into several categories. This research evaluates feature extraction methods for classifying Indonesian-language news. The datasets come from www.cnnindonesia.com (May 2018 to July 2018), with 4 categories and a total of 3677 articles, and from www.liputan6.com, with 4 categories and a total of 3415 articles. All data are processed into structured form, and features are then extracted with 8 feature extraction methods (TF, TF-IDF, TF-RF, TF-Prob, TF-CHI, TF-IDF-ISCDF, TF-IGM, and RTF-IGM), each combined with 6 classification algorithms (Gaussian Naïve Bayes, k-NN, Decision Tree, Neural Network, Logistic Regression, and Support Vector Machine). The results show that the Gaussian Naïve Bayes algorithm with TF-Prob obtained the best accuracy, 99.701% (CNN Indonesia) and 99.824% (Liputan6), under 5-fold cross-validation.
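The evaluation loop described in the abstract (extract features, train a classifier, score it with 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It covers only two of the eight weighting schemes (TF and TF-IDF, using the common `log(N / df)` form of IDF as an assumption), and it uses a 1-nearest-neighbour classifier with cosine similarity as a simple stand-in for the k-NN classifier named above. The toy corpus, the labels, and all function names are hypothetical. For brevity the weights are computed over the whole corpus rather than refit on each training fold.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Raw term-frequency vector (term -> count) for each tokenized document."""
    return [Counter(doc) for doc in docs]

def tfidf_vectors(docs):
    """TF-IDF with idf = log(N / df); one common variant, assumed here."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def k_fold_accuracy(vectors, labels, k):
    """Mean accuracy of a 1-nearest-neighbour classifier over k folds.

    Document i goes to fold i % k; assumes k <= len(vectors).
    """
    n = len(vectors)
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        correct = 0
        for i in test_idx:
            # Predict the label of the most similar training document.
            nearest = max(train_idx, key=lambda j: cosine(vectors[i], vectors[j]))
            if labels[nearest] == labels[i]:
                correct += 1
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / k

# Hypothetical toy corpus: already tokenized, two made-up categories.
docs = [
    ["tim", "gol", "liga"],      ["pasar", "saham", "bank"],
    ["gol", "pemain", "liga"],   ["bank", "rupiah", "saham"],
    ["tim", "pemain", "gol"],    ["pasar", "rupiah", "bank"],
    ["liga", "tim", "pemain"],   ["saham", "pasar", "rupiah"],
    ["gol", "liga", "tim"],      ["bank", "saham", "pasar"],
]
labels = ["sport", "economy"] * 5

# Compare the feature extraction methods under the same classifier and folds.
for name, extract in [("TF", tf_vectors), ("TF-IDF", tfidf_vectors)]:
    print(name, k_fold_accuracy(extract(docs), labels, k=5))
```

Keeping the classifier and fold assignment fixed while swapping only the `extract` function is what makes the comparison between weighting schemes meaningful; the other schemes and classifiers from the abstract would slot into the same loop.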