Segmentation of word gestures in sign language video

Q4 Engineering

Scientific and Technical Journal of Information Technologies, Mechanics and Optics. Pub Date: 2023-10-01. DOI: 10.17586/2226-1494-2023-23-5-980-988

Dang Khanh, I.A. Bessmertny


Citations: 0

Abstract


Segmentation of word gestures in sign language video

Despite the widespread use of automatic speech recognition and video subtitles, sign language remains an important communication channel for people with hearing impairments. A key task in automatic sign language recognition is segmenting video into fragments that correspond to individual words. Unlike known methods for segmenting sign language words, the paper proposes an approach that does not require sensors (accelerometers). To segment the video into words, the study estimates the dynamics of the image and determines the boundary between words using a threshold value. Because moving objects other than the speaker may appear in the frame and create noise, the dynamics are estimated as the average frame-to-frame change of the Euclidean distance between the coordinate characteristics of the hand, forearm, eyes and mouth. The coordinate characteristics of the hands and head are computed with the MediaPipe library. The algorithm was tested on Vietnamese sign language using an open set of 4364 videos collected at the Vietnamese Sign Language Training Center. It demonstrated accuracy comparable to manual segmentation by an operator and low resource consumption, which will allow the algorithm to be used for automatic gesture recognition in real time. The experiments showed that, unlike with the known methods, sign language segmentation can be solved effectively without sensors. Like other gesture segmentation methods, the proposed algorithm does not work satisfactorily when signing is fast and words overlap. This problem is the subject of further research.
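The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each frame has already been reduced to a list of (x, y) landmark coordinates (for example, points exported from MediaPipe, which is not called here), and it adopts one possible reading of the abstract: the dynamics value is the mean absolute frame-to-frame change of the pairwise Euclidean distances between landmarks. The function names, the toy data and the threshold of 0.5 are all hypothetical.

```python
from itertools import combinations
from math import dist


def pairwise_distances(landmarks):
    """Euclidean distance between every pair of landmark points in one frame."""
    return [dist(a, b) for a, b in combinations(landmarks, 2)]


def motion_dynamics(frames):
    """Mean absolute change of the pairwise distances between consecutive frames.

    Returns one value per frame transition, so len(result) == len(frames) - 1.
    """
    per_frame = [pairwise_distances(f) for f in frames]
    dynamics = []
    for prev, curr in zip(per_frame, per_frame[1:]):
        changes = [abs(c - p) for p, c in zip(prev, curr)]
        dynamics.append(sum(changes) / len(changes))
    return dynamics


def segment_words(dynamics, threshold):
    """Return (start, end) index pairs, end exclusive, of runs at or above the threshold.

    Assumption: runs of high dynamics are treated as word gestures and
    low-dynamics transitions as the pauses that separate them.
    """
    segments, start = [], None
    for i, value in enumerate(dynamics):
        if value >= threshold and start is None:
            start = i                      # motion begins
        elif value < threshold and start is not None:
            segments.append((start, i))    # motion ends
            start = None
    if start is not None:                  # motion continues to the last transition
        segments.append((start, len(dynamics)))
    return segments


# Toy input: two landmarks per frame, the second one moves in two bursts.
frames = [
    [(0, 0), (1, 0)],
    [(0, 0), (1, 0)],
    [(0, 0), (2, 0)],
    [(0, 0), (3, 0)],
    [(0, 0), (3, 0)],
    [(0, 0), (3, 0)],
    [(0, 0), (1, 0)],
    [(0, 0), (1, 0)],
]
dynamics = motion_dynamics(frames)
print(segment_words(dynamics, threshold=0.5))  # [(1, 3), (5, 6)]
```

Using distances between the speaker's own landmarks, rather than raw pixel differences, is what lets such a measure ignore other moving objects in the frame. A real threshold would have to be tuned on annotated video.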
