Authors: M. Das, M. Panda
Venue: 2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON)
DOI: https://doi.org/10.1109/ODICON50556.2021.9428979
Publication date: 2021-01-08
An ensemble method of feature selection and classification of Odia characters
Offline handwritten character recognition of Odia script has drawn remarkable attention from researchers. In pattern recognition, the most challenging task is to classify and recognize characters with human-like accuracy. Feature selection methods are widely used in machine learning, bioinformatics, and pattern recognition. In high-dimensional datasets, not all features are relevant to the problem; some interfere with learning, reduce accuracy, and contribute to the curse of dimensionality. To overcome these problems, we experimented with different feature selection techniques, namely filter and wrapper methods, to select a subset of relevant and non-redundant features that have the strongest relationship with the output variable. Filter- and wrapper-based methods are applied to the vowels of the Odia script in the OHCS 1.0 database for Odia character recognition by selecting the top 'k' important features, and their accuracy is analyzed. The Random Forest (RF) technique, which is an ensemble of decision trees, incorporates Gini-importance-based feature selection, which greatly influences the model's accuracy. By varying the number of decision trees in the Random Forest, the accuracy and the execution time for building the model before and after feature selection are compared. After discarding many features that carry no information for class prediction, the ensemble model achieves an average accuracy of 99.2% with the reduced feature set.
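The idea of ranking features by Gini-based importance and keeping the top 'k' can be illustrated with a minimal sketch. This is not the authors' pipeline: as a simplification, each feature is scored by the largest Gini impurity decrease achievable with a single threshold split (a decision stump), whereas a Random Forest accumulates such decreases over many trees. The function names (`gini`, `best_gini_decrease`, `top_k_features`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on a single feature."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weighted impurity of the two children after the split.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(X[0])
    scores = [
        best_gini_decrease([row[j] for row in X], y) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (assumed): feature 0 separates the classes, feature 1 is noisy,
# feature 2 is constant and therefore carries no information.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [0.9, 4.0, 1.0],
    [0.8, 6.0, 1.0],
]
y = ["a", "a", "b", "b"]

selected, scores = top_k_features(X, y, k=1)
print(selected, scores)
```

On this toy data the constant feature receives a score of zero and would be discarded, which mirrors the abstract's point that features carrying no class information can be dropped. In practice one would more likely read importances from a trained forest in an established library rather than from single-split scores.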