S. Mohanty, Supriya Prasad, Tanvi Sinha, B. N. Krupa
2020 IEEE REGION 10 CONFERENCE (TENCON), published 2020-11-16. DOI: 10.1109/TENCON50793.2020.9293763 (https://doi.org/10.1109/TENCON50793.2020.9293763)
German Sign Language Translation using 3D Hand Pose Estimation and Deep Learning
Sign language is the primary medium of communication for the majority of the world's population with disabling hearing loss, a condition that creates a communication barrier between hearing and hearing-impaired people. In this paper, German Sign Language (GSL) characters are translated from a single image by leveraging 3D object detection techniques. We use a three-network architecture that performs segmentation, keypoint localization, and lifting of the keypoints from the two-dimensional image plane into three-dimensional space, all from a single RGB image containing the signed gesture. Thirty gestures were used, and the best results were obtained by classifying on a combination of pose representation coordinates, joint angles, and pooling-layer features from AlexNet. The system achieves a character error rate of 0.29, a 12.12% reduction in error rate compared with the state-of-the-art approach.
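To make the feature combination named in the abstract more concrete, the following is a minimal sketch of how 3D keypoint coordinates, joint angles, and CNN features might be concatenated into one vector for a classifier. It is not the authors' implementation. The 21-keypoint hand layout, the `FINGER_CHAINS` indexing, and the names `joint_angle` and `hand_features` are all assumptions made for illustration, and the AlexNet pooling-layer features are represented by a placeholder sequence rather than computed.

```python
import math

def joint_angle(a, b, c):
    """Angle in radians at point b between the segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate segment: no meaningful angle
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos_theta)

# Assumed layout (hypothetical): index 0 is the wrist, and each of the five
# fingers owns four consecutive indices ordered from base to tip.
FINGER_CHAINS = [[0] + [1 + 4 * f + k for k in range(4)] for f in range(5)]

def hand_features(keypoints, cnn_features=()):
    """Concatenate flattened 3D coordinates, joint angles, and CNN features.

    keypoints: sequence of 21 (x, y, z) tuples.
    cnn_features: placeholder for pooling-layer activations (assumption).
    """
    coords = [value for point in keypoints for value in point]
    angles = []
    for chain in FINGER_CHAINS:
        # One angle at each interior joint of the chain.
        for i in range(1, len(chain) - 1):
            angles.append(
                joint_angle(keypoints[chain[i - 1]],
                            keypoints[chain[i]],
                            keypoints[chain[i + 1]])
            )
    return coords + angles + list(cnn_features)
```

Under these assumptions, 21 keypoints yield 63 coordinate values and 15 joint angles, so the vector has 78 entries before any CNN features are appended. Angles are invariant to translation and rotation of the hand, which is one plausible reason to use them alongside raw coordinates.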