BiSign-Net: Fine-grained Static Sign Language Recognition based on Bilinear CNN

Arezoo Sadeghzadeh, Md Baharul Islam
{"title":"BiSign-Net: Fine-grained Static Sign Language Recognition based on Bilinear CNN","authors":"Arezoo Sadeghzadeh, Md Baharul Islam","doi":"10.1109/ISPACS57703.2022.10082808","DOIUrl":null,"url":null,"abstract":"Sign language (SL) is a type of communication language used by deaf and hard-of-hearing people. Large varieties in different SLs and lack of knowledge in general public to interpret them bring an inevitable necessity for breaking down the communication barriers by automatic sign language recognition (SLR) systems. Despite the existence of numerous approaches with satisfactory performance, they still suffer from severe challenges in dealing with large intra-class and slight inter-class variations, which make them infeasible for real-world applications. To address this issue, a novel end-to-end fine-grained static SLR (SSLR) system is proposed, namely BiSign-Net, based on Bilinear Convolutional Neural Network (Bi-CNN) to efficiently model the variations both in the location and appearance of the hands in the images for enhancing the accuracy, speed, and robustness against the translation. To this end, fine-grained orderless bilinear features are generated by pooled outer product of the extracted features from two identical novel CNN-based feature extractors. Bilinear features pass a normalization module including the signed square root and l2 normalization through which the accuracy of the model is further improved. A dropout layer is deployed in the classification module to aid the model in dealing with small-scale datasets by preventing overfitting. The number of layers, hyper-parameters, and optimization technique of the proposed CNN are adjusted to achieve high performance and faster convergence with low number of parameters. Experimental results on four datasets of Static ASL, NUS I, Massey, and ArASL from two SLs (i.e. American and Arabic) with an accuracy of 100%, 100%, 99.20%, and 99.35%, respectively, demonstrate that the proposed model surpasses the existing approaches with high robustness and generalization ability.","PeriodicalId":410603,"journal":{"name":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPACS57703.2022.10082808","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Sign language (SL) is a type of communication language used by deaf and hard-of-hearing people. Large varieties in different SLs and lack of knowledge in general public to interpret them bring an inevitable necessity for breaking down the communication barriers by automatic sign language recognition (SLR) systems. Despite the existence of numerous approaches with satisfactory performance, they still suffer from severe challenges in dealing with large intra-class and slight inter-class variations, which make them infeasible for real-world applications. To address this issue, a novel end-to-end fine-grained static SLR (SSLR) system is proposed, namely BiSign-Net, based on Bilinear Convolutional Neural Network (Bi-CNN) to efficiently model the variations both in the location and appearance of the hands in the images for enhancing the accuracy, speed, and robustness against the translation. To this end, fine-grained orderless bilinear features are generated by pooled outer product of the extracted features from two identical novel CNN-based feature extractors. Bilinear features pass a normalization module including the signed square root and l2 normalization through which the accuracy of the model is further improved. A dropout layer is deployed in the classification module to aid the model in dealing with small-scale datasets by preventing overfitting. The number of layers, hyper-parameters, and optimization technique of the proposed CNN are adjusted to achieve high performance and faster convergence with low number of parameters. Experimental results on four datasets of Static ASL, NUS I, Massey, and ArASL from two SLs (i.e. American and Arabic) with an accuracy of 100%, 100%, 99.20%, and 99.35%, respectively, demonstrate that the proposed model surpasses the existing approaches with high robustness and generalization ability.
BiSign-Net:基于双线性CNN的细粒度静态手语识别
手语(SL)是聋哑人和听力障碍者使用的一种交流语言。由于手语种类繁多,而普通大众又缺乏对手语的理解能力,因此手语自动识别系统打破交流障碍的必要性是必然的。尽管存在许多性能令人满意的方法,但它们在处理大的类内变化和小的类间变化方面仍然面临严峻的挑战,这使得它们在实际应用中不可行。为了解决这一问题,提出了一种基于双线性卷积神经网络(Bi-CNN)的端到端细粒度静态单反(SSLR)系统,即BiSign-Net,以有效地模拟图像中手的位置和外观的变化,以提高准确性、速度和对翻译的鲁棒性。为此,从两个相同的新型基于cnn的特征提取器中提取特征的外积池生成细粒度无序双线性特征。双线性特征通过一个归一化模块,包括有符号的平方根和l2归一化,从而进一步提高模型的精度。在分类模块中部署dropout层,通过防止过拟合来帮助模型处理小规模数据集。调整所提出的CNN的层数、超参数和优化技术,以实现低参数下的高性能和更快的收敛。在两种语言(美国语和阿拉伯语)的静态ASL、NUS I、Massey和ArASL四个数据集上的实验结果表明,该模型的准确率分别为100%、100%、99.20%和99.35%,优于现有的方法,具有较高的鲁棒性和泛化能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信