{"title":"A comparative approach for multiclass text analysis","authors":"Semuel Franko, I. B. Parlak","doi":"10.1109/ISDFS.2018.8355325","DOIUrl":null,"url":null,"abstract":"This paper presents multiclass text analysis for the classification problem in Spanish documents. Even if Spanish language is considered as one the most spoken language, text classification problem has not yet been carried out for different problems in multiclass analysis. Two different approaches; Naive Bayes and Maximum Entropy were used as machine learning techniques. The corpus was created with 10 different categories. Smoothing parameters and three different document models were integrated to the study. During the comparative analysis, optimal parameters were determined using their sensitivity on the accuracy, the precision and the recall. Consequently, Maximum Entropy was found as the best technique even if both techniques were relevant in multiclass classification.","PeriodicalId":154279,"journal":{"name":"2018 6th International Symposium on Digital Forensic and Security (ISDFS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 6th International Symposium on Digital Forensic and Security (ISDFS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISDFS.2018.8355325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
This paper presents multiclass text analysis for the classification problem in Spanish documents. Even if Spanish language is considered as one the most spoken language, text classification problem has not yet been carried out for different problems in multiclass analysis. Two different approaches; Naive Bayes and Maximum Entropy were used as machine learning techniques. The corpus was created with 10 different categories. Smoothing parameters and three different document models were integrated to the study. During the comparative analysis, optimal parameters were determined using their sensitivity on the accuracy, the precision and the recall. Consequently, Maximum Entropy was found as the best technique even if both techniques were relevant in multiclass classification.