EI-RNN-based text generation for the static and dynamic isolated sign language videos

IF 1.7 · CAS Region 4, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
S. Subburaj, S. Murugavalli, B. Muthusenthil
{"title":"EI-RNN-based text generation for the static and dynamic isolated sign language videos","authors":"S. Subburaj, S. Murugavalli, B. Muthusenthil","doi":"10.3233/jifs-233610","DOIUrl":null,"url":null,"abstract":"SLR, which assists hearing-impaired people to communicate with other persons by sign language, is considered as a promising method. However, as the features of some of the static SL could be the same as the feature in a single frame of dynamic Isolated Sign Language (ISL), the generation of accurate text corresponding to the SL is necessary during the SLR. Therefore, Edge-directed Interpolation-based Recurrent Neural Network (EI-RNN)-centered text generation with varied features of the static and dynamic Isolated SL is proposed in this article. Primarily, ISL videos are converted to frames and pre-processed with key frame extraction and illumination control. After that, the foreground is separated with the Symmetric Normalised Laplacian-centered Otsu Thresholding (SLOT) technique for finding accurate key points in the human pose. The human pose’s key points are extracted with the Media Pipeline Holistic (MPH) pipeline approach and to improve the features of the face and hand sign, the resultant frame is fused with the depth image. After that, to differentiate the static and dynamic actions, the action change in the fused frames is determined with a correlation matrix. After that, to engender the output text for the respective SL, features are extracted individually as of the static and dynamic frames. It is obtained from the analysis that when analogized to the prevailing models, the proposed EI-RNN’s translation accuracy is elevated by 2.05% in INCLUDE 50 Indian SL based Dataset and Top 1 Accuracy 2.44% and Top 10 accuracy, 1.71% improved in WLASL 100 American SL.","PeriodicalId":54795,"journal":{"name":"Journal of Intelligent & Fuzzy Systems","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Fuzzy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jifs-233610","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Sign Language Recognition (SLR), which helps hearing-impaired people communicate with others through sign language, is considered a promising method. However, because the features of some static signs can be identical to the features of a single frame of a dynamic Isolated Sign Language (ISL) sequence, generating accurate text corresponding to the sign is necessary during SLR. Therefore, Edge-directed Interpolation-based Recurrent Neural Network (EI-RNN)-centered text generation using the varied features of static and dynamic isolated signs is proposed in this article. First, the ISL videos are converted to frames and pre-processed with key frame extraction and illumination control. The foreground is then separated with the Symmetric Normalised Laplacian-centered Otsu Thresholding (SLOT) technique to find accurate key points in the human pose. The pose key points are extracted with the MediaPipe Holistic (MPH) approach, and, to enhance the face and hand-sign features, the resulting frame is fused with the depth image. Next, to differentiate static from dynamic actions, the action change across the fused frames is determined with a correlation matrix. Finally, to generate the output text for the respective sign, features are extracted separately from the static and dynamic frames. The analysis shows that, compared with prevailing models, the proposed EI-RNN improves translation accuracy by 2.05% on the INCLUDE 50 Indian SL dataset, and improves Top-1 accuracy by 2.44% and Top-10 accuracy by 1.71% on the WLASL 100 American SL dataset.
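The two pipeline stages that map most directly onto off-the-shelf tooling are the MPH key-point extraction and the correlation-based static/dynamic split. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the flattened key-point vector, the `MOTION_THRESH` value, and the use of a pairwise Pearson correlation between consecutive frames are illustrative stand-ins for the paper's correlation-matrix criterion, and the SLOT foreground separation, depth fusion, and EI-RNN decoder are omitted.

```python
# Minimal sketch: MediaPipe Holistic key-point extraction plus a
# correlation-based static/dynamic frame test, loosely following the
# pipeline in the abstract. MOTION_THRESH and the key-point flattening
# are illustrative assumptions, not values from the paper.
import cv2
import numpy as np
import mediapipe as mp

MOTION_THRESH = 0.99  # assumed: correlation >= this => frame pair counts as static


def keypoint_vector(results):
    """Flatten pose and hand landmarks into one 1-D vector (zeros if undetected)."""
    parts = []
    for landmarks, count in ((results.pose_landmarks, 33),
                             (results.left_hand_landmarks, 21),
                             (results.right_hand_landmarks, 21)):
        if landmarks is None:
            parts.append(np.zeros(count * 3))
        else:
            parts.append(np.array([[p.x, p.y, p.z]
                                   for p in landmarks.landmark]).ravel())
    return np.concatenate(parts)


def label_frames(video_path):
    """Yield (frame_index, 'static' | 'dynamic') for consecutive frame pairs."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    prev, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        vec = keypoint_vector(results)
        if prev is not None:
            # np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the
            # correlation between the previous and current key-point vectors.
            corr = np.corrcoef(prev, vec)[0, 1]
            yield idx, 'static' if corr >= MOTION_THRESH else 'dynamic'
        prev, idx = vec, idx + 1
    cap.release()
    holistic.close()
```

A faithful reimplementation would add the key-frame extraction, illumination control, SLOT foreground separation, depth-image fusion, and EI-RNN text decoder described in the abstract.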
Source Journal
Journal of Intelligent & Fuzzy Systems (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 3.40
Self-citation rate: 10.00%
Articles per year: 965
Review time: 5.1 months
Journal description: The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.