An ensemble method of feature selection and classification of Odia characters

2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON). Pub Date: 2021-01-08. DOI: 10.1109/ODICON50556.2021.9428979

M. Das, M. Panda

Citations: 3

Abstract

Offline handwritten character recognition of Odia script has drawn remarkable attention from researchers. In pattern recognition, the most challenging task is to classify and recognize characters with human-like accuracy. Feature selection methods are widely used in machine learning, bioinformatics, and pattern recognition. In high-dimensional datasets, not all features are relevant to the problem; some interfere with learning, reduce accuracy, and contribute to the curse of dimensionality. To overcome these problems, we experimented with different feature selection techniques, namely filter and wrapper methods, to select a subset of relevant and non-redundant features that have the strongest relationship with the output variable. Filter- and wrapper-based methods are applied to the vowels of the Odia script in the OHCS 1.0 database by selecting the top 'k' important features, and their accuracy is analyzed. The Random Forest (RF) technique, an ensemble of decision trees, incorporates Gini-importance-based feature selection, which greatly influences the model's accuracy. By varying the number of decision trees in the Random Forest, the accuracy and the execution time for building the trees before and after feature selection are compared. After discarding many features that carry no information for class prediction, the ensemble model achieves an average accuracy of 99.2% with a reduced feature set.
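
The abstract does not give the authors' implementation, so the following is only a minimal sketch of the general idea of ranking features by Gini impurity decrease and keeping the top 'k'. It is a simplified stand-in, not a full Random Forest: each feature is scored by the best single-threshold split (a decision stump), averaged over bootstrap resamples. The function names (`gini`, `best_gini_decrease`, `rank_features`, `top_k`) and the synthetic data are assumptions made for illustration; the data is not OHCS 1.0.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_decrease(column, labels):
    """Largest impurity decrease achievable by one threshold split on a feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest,
    # since splitting at the largest value leaves the right side empty.
    for t in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(X, y, n_rounds=25, seed=0):
    """Average each feature's best Gini decrease over bootstrap resamples."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    scores = [0.0] * d
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        for j in range(d):
            scores[j] += best_gini_decrease([X[i][j] for i in idx], yb)
    return [s / n_rounds for s in scores]


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Synthetic toy data (hypothetical): feature 0 determines the class,
# features 1 to 3 are uniform noise.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[label + rng.uniform(-0.2, 0.2)] + [rng.uniform(0, 1) for _ in range(3)]
     for label in y]

scores = rank_features(X, y)
selected = top_k(scores, k=2)
print("scores:", [round(s, 3) for s in scores])
print("selected feature indices:", selected)
```

In this toy setup the informative feature should rank first, and a classifier could then be retrained on only the selected columns to compare accuracy and build time before and after selection, in the spirit of the comparison the abstract describes.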
