BiSign-Net: Fine-grained Static Sign Language Recognition based on Bilinear CNN

2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS). Pub Date: 2022-11-22. DOI: 10.1109/ISPACS57703.2022.10082808

Arezoo Sadeghzadeh, Md Baharul Islam

Abstract

Sign language (SL) is a form of communication used by deaf and hard-of-hearing people. The wide variety of SLs and the general public's limited ability to interpret them make automatic sign language recognition (SLR) systems necessary for breaking down communication barriers. Although numerous approaches achieve satisfactory performance, they still struggle with large intra-class and slight inter-class variations, which makes them infeasible for real-world applications. To address this issue, a novel end-to-end fine-grained static SLR (SSLR) system, BiSign-Net, is proposed. It is based on a Bilinear Convolutional Neural Network (Bi-CNN) and efficiently models variations in both the location and the appearance of the hands in the images, improving accuracy, speed, and robustness to translation. To this end, fine-grained orderless bilinear features are generated by the pooled outer product of the features extracted by two identical novel CNN-based feature extractors. The bilinear features then pass through a normalization module consisting of a signed square root and l2 normalization, which further improves the accuracy of the model. A dropout layer in the classification module helps the model handle small-scale datasets by preventing overfitting. The number of layers, hyper-parameters, and optimization technique of the proposed CNN are tuned to achieve high performance and faster convergence with a low number of parameters. Experimental results on four datasets from two SLs (American and Arabic), namely Static ASL, NUS I, Massey, and ArASL, with accuracies of 100%, 100%, 99.20%, and 99.35%, respectively, demonstrate that the proposed model surpasses existing approaches and offers high robustness and generalization ability.
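The bilinear feature step described in the abstract (pooled outer product, then signed square root, then l2 normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `bilinear_features`, the feature-map shapes, the use of average pooling over spatial locations, and the `eps` stabilizer are all assumptions made for illustration, and random arrays stand in for the outputs of the two CNN feature extractors.

```python
import numpy as np


def bilinear_features(feat_a, feat_b, eps=1e-12):
    """Hypothetical helper: bilinear descriptor from two feature maps.

    feat_a has shape (C_a, H, W) and feat_b has shape (C_b, H, W);
    both are assumed to share the same spatial grid.
    """
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]

    # Flatten the spatial grid so each column is one image location.
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(c_b, h * w)

    # a @ b.T sums the outer products over all locations; dividing by
    # the number of locations gives average pooling (an assumption here).
    pooled = (a @ b.T) / (h * w)

    # Vectorize, then apply the signed square root and l2 normalization.
    x = pooled.reshape(-1)
    x = np.sign(x) * np.sqrt(np.abs(x))
    return x / (np.linalg.norm(x) + eps)


# Random stand-ins for the two extractors' outputs (assumed 8 channels, 5x5 grid).
rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 5, 5))
fb = rng.standard_normal((8, 5, 5))
desc = bilinear_features(fa, fb)
print(desc.shape)  # (64,)

# Shuffle the spatial locations identically in both maps and recompute.
perm = rng.permutation(25)
fa_p = fa.reshape(8, 25)[:, perm].reshape(8, 5, 5)
fb_p = fb.reshape(8, 25)[:, perm].reshape(8, 5, 5)
desc_p = bilinear_features(fa_p, fb_p)
```

Because the pooling sums over locations, `desc` and `desc_p` agree up to floating-point rounding, which is one way to read the term "orderless": the descriptor records which feature pairs co-occur, not where in the image they occur.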
