An ensemble method of feature selection and classification of Odia characters

M. Das, M. Panda
DOI: 10.1109/ODICON50556.2021.9428979 (https://doi.org/10.1109/ODICON50556.2021.9428979)
Published in: 2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON)
Publication date: 2021-01-08
Citations: 3

Abstract

Offline handwritten character recognition of Odia script has drawn considerable attention from researchers. In pattern recognition, the most challenging task is to classify and recognize characters with human-like accuracy. Feature selection methods are widely used in machine learning, bioinformatics, and pattern recognition. In high-dimensional datasets, not all features are relevant to the problem; some interfere and reduce accuracy, and they contribute to the curse of dimensionality. To overcome these problems, we experimented with different feature selection techniques, namely filter and wrapper methods, to select a subset of relevant and non-redundant features that have the strongest relationship with the output variable. Filter- and wrapper-based methods are applied to the vowels of the Odia script in the OHCS 1.0 database for Odia character recognition by selecting the top 'k' important features, and their accuracy is analyzed. The Random Forest (RF) technique, an ensemble of decision trees, incorporates Gini-importance-based feature selection, which greatly influences the model's accuracy. By varying the number of decision trees in the Random Forest, the accuracy and the execution time for building the trees are compared before and after feature selection. After discarding many features that carry no information for class prediction, the ensemble model achieves an average accuracy of 99.2% with the reduced feature set.
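
The abstract does not give implementation details, so the following is only a minimal sketch of the idea behind Gini-based top-'k' feature ranking, not the authors' method. It scores each feature by the best Gini impurity decrease a single threshold split (a decision stump) can achieve, whereas a Random Forest averages such decreases over many trees. The function names (`gini`, `best_split_decrease`, `top_k_features`) and the toy data are hypothetical and are not taken from OHCS 1.0.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_decrease(values, labels):
    """Largest weighted Gini decrease over all thresholds on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Try every distinct value except the largest as a "value <= t" threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def top_k_features(X, y, k):
    """Rank features by stump Gini decrease; return top-k indices and all scores."""
    n_features = len(X[0])
    scores = [
        best_split_decrease([row[j] for row in X], y) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical): feature 0 separates the two classes,
# feature 1 is uninformative, feature 2 is constant.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [0.3, 4.0, 1.0],
    [0.9, 5.0, 1.0],
    [0.8, 3.0, 1.0],
    [0.7, 4.0, 1.0],
]
y = ["a", "a", "a", "i", "i", "i"]

selected, scores = top_k_features(X, y, k=1)
print("selected feature indices:", selected)
print("Gini decrease per feature:", scores)
```

On this toy data only feature 0 reduces impurity, so it is the single feature kept. In practice one would compare classifier accuracy and training time on the full and the reduced feature sets, as the abstract describes.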