{"title":"Automatic classification of documents by formality","authors":"Fadi Abu Sheikha, D. Inkpen","doi":"10.1109/NLPKE.2010.5587767","DOIUrl":null,"url":null,"abstract":"This paper addresses the task of classifying documents into formal or informal style. We studied the main characteristics of each style in order to choose features that allowed us to train classifiers that can distinguish between the two styles. We built our data set by collecting documents for both styles, from different sources. We tested several classification algorithms, namely Decision Trees, Naïve Bayes, and Support Vector Machines, to choose the classifier that leads to the best classification results. We performed attribute selection in order to determine the contribution of each feature to our model.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NLPKE.2010.5587767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30
Abstract
This paper addresses the task of classifying documents into formal or informal style. We studied the main characteristics of each style in order to choose features that allowed us to train classifiers that can distinguish between the two styles. We built our data set by collecting documents for both styles, from different sources. We tested several classification algorithms, namely Decision Trees, Naïve Bayes, and Support Vector Machines, to choose the classifier that leads to the best classification results. We performed attribute selection in order to determine the contribution of each feature to our model.