Rogerio C. P. Fragoso, Roberto H. W. Pinheiro, George D. C. Cavalcanti
2016 5th Brazilian Conference on Intelligent Systems (BRACIS), published 2016-10-01. DOI: 10.1109/BRACIS.2016.055 (https://doi.org/10.1109/BRACIS.2016.055)
A Method for Automatic Determination of the Feature Vector Size for Text Categorization
In this paper, we propose the Automatic Feature Subsets Analyzer (AFSA), a filter-based feature selection method for text categorization. AFSA extends the Class-dependent Maximum Features per Document (cMFDR) algorithm and automatically determines the best number of features per document. In cMFDR, the number of features is chosen only after the method has been applied repeatedly, which is a time-consuming strategy. In contrast, AFSA finds the best number of features in a data-driven way, which is faster than cMFDR. Experiments with the Naïve Bayes Multinomial classifier, using four benchmark datasets and three Feature Evaluation Functions, showed that AFSA outperforms cMFDR or achieves similar results.
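To make the cost of the repeated-application strategy concrete, here is a minimal sketch of per-document feature selection tried over several candidate sizes. It assumes a simplified rule (keep the f highest-scoring terms of each document and take the union), which may differ from the paper's exact procedure. The function name, the toy documents, and the scores are hypothetical. Class dependence and AFSA's data-driven choice of f are not modeled, since the abstract does not describe them.

```python
from typing import Dict, List, Set


def select_features_per_document(
    documents: List[List[str]],
    scores: Dict[str, float],
    f: int,
) -> Set[str]:
    """Return the union of the f highest-scoring terms of each document.

    `scores` stands in for the output of any Feature Evaluation Function;
    terms without a score are treated as scoring 0.0 (an assumption).
    """
    selected: Set[str] = set()
    for doc in documents:
        # Rank the distinct terms of this document, best score first;
        # ties are broken alphabetically so the result is deterministic.
        ranked = sorted(set(doc), key=lambda t: (-scores.get(t, 0.0), t))
        selected.update(ranked[:f])
    return selected


# Hypothetical toy corpus and term scores, for illustration only.
documents = [
    ["goal", "match", "team", "the"],
    ["vote", "party", "the", "team"],
]
scores = {
    "goal": 0.9, "match": 0.7, "team": 0.4,
    "vote": 0.8, "party": 0.6, "the": 0.0,
}

# The repeated-application strategy: rerun selection for every candidate f.
# In practice each run would also train and evaluate a classifier.
for f in (1, 2, 3):
    features = select_features_per_document(documents, scores, f)
    print(f, sorted(features))
```

Each candidate value of f requires a full pass over the corpus plus, in a real experiment, a classifier evaluation. That per-candidate cost is what a method that picks f directly from the data would avoid.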