PereiraASLNet: ASL letter recognition with YOLOX taking Mean Average Precision and Inference Time considerations

2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP). Pub Date: 2022-02-12. DOI: 10.1109/AISP53593.2022.9760665

Noel Pereira


Citations: 2

Abstract



Sign language allows people to communicate without speaking. American Sign Language (ASL) originated at the American School for the Deaf in the early 19th century. It is a natural language that combines facial movements and hand gestures to convey thoughts and ideas, and today it is used predominantly by people who are deaf or hard of hearing. Unlike most languages, ASL is not widely taught, which makes it difficult for the general population to communicate effectively with people who rely on ASL as their sole means of communication. Hence the need for a system that detects and predicts ASL letters from images and can be used in real time to overcome this language barrier. This research develops a sign language recognition system on top of YOLOX, a convolutional-neural-network-based object detector that itself builds on YOLOv3. Using the various YOLOX backbones, this paper provides six models spanning the accuracy versus inference-time spectrum, from the least accurate with the fastest response to the most accurate with the slowest response. I propose PereiraASLNet, which trains YOLOX on custom classes for the letters A-Z using an American Sign Language dataset in Pascal VOC XML format developed by Roboflow. Variants were developed for each of the YOLOX backbone architectures, namely YOLOX-nano, YOLOX-tiny, YOLOX-small, YOLOX-medium, YOLOX-large and YOLOX-xlarge, taking into consideration the mean average precision and inference time of each. The testing mean average precision of these models was found to be 0.9046, 0.9070, 0.9227, 0.9304, 0.9329 and 0.9578, and the testing inference time 3.50 ms, 12.97 ms, 34.86 ms, 64.56 ms, 83.23 ms and 97.56 ms, respectively.
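The accuracy versus inference-time trade-off reported in the abstract can be made concrete with a short sketch. The Python below is only an illustration, not code from the paper: it copies the six reported figures into a table and picks the most accurate variant that fits a given latency budget. The names `Variant`, `REPORTED`, `frames_per_second` and `best_within_budget` are hypothetical. The frames-per-second figure assumes one image per inference and ignores pre- and post-processing, neither of which the abstract specifies.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    test_map: float    # testing mean average precision, as reported in the abstract
    latency_ms: float  # testing inference time in milliseconds, as reported


# Figures copied from the abstract, in the order the variants are listed there.
REPORTED = [
    Variant("YOLOX-nano", 0.9046, 3.50),
    Variant("YOLOX-tiny", 0.9070, 12.97),
    Variant("YOLOX-small", 0.9227, 34.86),
    Variant("YOLOX-medium", 0.9304, 64.56),
    Variant("YOLOX-large", 0.9329, 83.23),
    Variant("YOLOX-xlarge", 0.9578, 97.56),
]


def frames_per_second(latency_ms: float) -> float:
    """Rough throughput bound, ASSUMING one image per inference call."""
    return 1000.0 / latency_ms


def best_within_budget(budget_ms: float) -> Optional[Variant]:
    """Return the highest-mAP variant whose latency fits the budget, if any."""
    candidates = [v for v in REPORTED if v.latency_ms <= budget_ms]
    return max(candidates, key=lambda v: v.test_map, default=None)


if __name__ == "__main__":
    for v in REPORTED:
        fps = frames_per_second(v.latency_ms)
        print(f"{v.name:<13} mAP={v.test_map:.4f}  {v.latency_ms:6.2f} ms  ~{fps:6.1f} FPS")
    # Example: a hypothetical 40 ms per-frame budget.
    print(best_within_budget(40.0))
```

Under a hypothetical 40 ms budget this selects YOLOX-small. With no budget that any variant can meet, the function returns `None` rather than guessing.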
