Hybrid IG and GA based Feature Selection Approach for Text Categorization
Manda Thejaswee, P. Srilakshmi, G. Karuna, K. Anuradha
2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), published 2020-11-05
DOI: 10.1109/ICECA49313.2020.9297468
Citations: 5
Abstract
Feature selection is one of the most important research areas in text classification because it affects both accuracy and running time. When the initial feature set is large, selecting the necessary features becomes essential. Text classification is a typical example, since the feature set can contain hundreds or even thousands of features. Many studies have proposed different feature selection approaches for text classification. Although numerous studies address feature selection, there is no substantial work that evaluates combining the features chosen by different methods. This analysis evaluates the redundancy of the text features selected by different methods, with respect to data set characteristics, classification algorithms, and evaluation metrics, and proposes a hybrid feature selection method. The test results show that the combination of features chosen by different methods is more accurate than the features chosen by any single selection method. However, the performance of the proposed hybrid feature selection depends on the data set characteristics, the choice of classification algorithm, and the evaluation metrics.
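The abstract does not describe the selection procedure itself. As a rough illustration of the filter stage that the "IG" in the title presumably refers to (information gain), the sketch below scores each term by how much knowing its presence reduces class entropy, then keeps the top-scoring terms. The toy corpus, the function names, and the binary present/absent term representation are all assumptions made for illustration, not the authors' setup, and the genetic algorithm stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of a binary 'term present / absent' feature.

    docs is a list of sets of terms; labels is the parallel list of classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"},
        {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

selected = top_k_terms(docs, labels, 2)
```

In this toy corpus, "goal" and "vote" separate the two classes perfectly and receive the highest score, while "team" occurs equally in both classes and scores zero. In a hybrid scheme, a ranking like this would typically shrink the vocabulary before a more expensive search, such as a genetic algorithm, explores subsets of the remaining terms.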