Metric Comparison For Text Classification
A. Muhaimin, Tresna Maulana Fahrudin, Trimono, P. Riyantoko, K. M. Hindrayani
International Journal of Data Science, Engineering, and Analytics, published 2022-05-28. DOI: 10.33005/ijdasea.v2i1.34
Abstract
Text classification has become popular in recent years. The first step in classifying text is to convert it into numerical values. Values that can be used include term frequency (TF), inverse document frequency (IDF), term frequency-inverse document frequency (TF-IDF), and the raw frequency of the word itself. This study aims to determine which of these metrics works best for text classification. The classifiers used are Naïve Bayes, Logistic Regression, and Random Forest, and they are evaluated by accuracy and area under the curve (AUC). The results show that some metrics produce similar evaluation scores. Random Forest performs best among the three methods, and TF-IDF is the best metric for text classification.
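The four text representations compared above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes lowercased whitespace tokenization, TF defined as a word's count divided by the document length, and unsmoothed IDF defined as ln(N / df). The function names are hypothetical, and the paper's exact definitions and preprocessing may differ.

```python
import math
from collections import Counter


def raw_counts(doc):
    """Raw frequency: how many times each (lowercased) word occurs in the document."""
    return Counter(doc.lower().split())


def term_frequency(doc):
    """TF: a word's raw count divided by the total number of words in the document."""
    counts = raw_counts(doc)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty document: no words, no weights
    return {w: c / total for w, c in counts.items()}


def inverse_document_frequency(corpus):
    """IDF: ln(N / df), where df is the number of documents containing the word."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))  # count each word once per document
    return {w: math.log(n_docs / d) for w, d in df.items()}


def tf_idf(doc, idf):
    """TF-IDF: a word's TF in this document times its IDF over the corpus."""
    return {w: tf * idf.get(w, 0.0) for w, tf in term_frequency(doc).items()}


def to_vector(weights, vocabulary):
    """Align a word -> weight mapping to a fixed vocabulary order (0.0 for absent words)."""
    return [float(weights.get(w, 0)) for w in vocabulary]


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    idf = inverse_document_frequency(corpus)
    vocab = sorted(idf)
    for doc in corpus:
        print(doc, to_vector(tf_idf(doc, idf), vocab))
```

With this toy corpus, "the" occurs in every document, so its IDF, and therefore its TF-IDF weight, is 0, while "dog" (present in only one document) receives the largest weight. This downweighting of ubiquitous words is what distinguishes TF-IDF from raw counts or TF alone. Library implementations differ in detail: scikit-learn's `TfidfVectorizer`, for example, uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each document vector by default, so its weights will not match this sketch exactly.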