使用深度学习网络的短句子自动唇读

International Journal of Advances in Intelligent Informatics Pub Date : 2023-03-15 DOI:10.26555/ijain.v9i1.920

M. Rajab, Kadhim M. Hashim

{"title":"使用深度学习网络的短句子自动唇读","authors":"M. Rajab, Kadhim M. Hashim","doi":"10.26555/ijain.v9i1.920","DOIUrl":null,"url":null,"abstract":"One study whose importance has significantly grown in recent years is lip-reading, particularly with the widespread of using deep learning techniques. Lip reading is essential for speech recognition in noisy environments or for those with hearing impairments. It refers to recognizing spoken sentences using visual information acquired from lip movements. Also, the lip area, especially for males, suffers from several problems, such as the mouth area containing the mustache and beard, which may cover the lip area. This paper proposes an automatic lip-reading system to recognize and classify short English sentences spoken by speakers using deep learning networks. The input video extracts frames and each frame is passed to the Viola-Jones to detect the face area. Then 68 landmarks of the facial area are determined, and the landmarks from 48 to 68 represent the lip area extracted based on building a binary mask. Then, the contrast is enhanced to improve the quality of the lip image by applying contrast adjustment. Finally, sentences are classified using two deep learning models, the first is AlexNet, and the second is VGG-16 Net. The database consists of 39 participants (32 males and 7 females). Each participant repeats the short sentences five times. The outcomes demonstrate the accuracy rate of AlexNet is 90.00%, whereas the accuracy rate for VGG-16 Net is 82.34%. We concluded that AlexNet performs better for classifying short sentences than VGG-16 Net.","PeriodicalId":52195,"journal":{"name":"International Journal of Advances in Intelligent Informatics","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An automatic lip reading for short sentences using deep learning nets\",\"authors\":\"M. Rajab, Kadhim M. Hashim\",\"doi\":\"10.26555/ijain.v9i1.920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One study whose importance has significantly grown in recent years is lip-reading, particularly with the widespread of using deep learning techniques. Lip reading is essential for speech recognition in noisy environments or for those with hearing impairments. It refers to recognizing spoken sentences using visual information acquired from lip movements. Also, the lip area, especially for males, suffers from several problems, such as the mouth area containing the mustache and beard, which may cover the lip area. This paper proposes an automatic lip-reading system to recognize and classify short English sentences spoken by speakers using deep learning networks. The input video extracts frames and each frame is passed to the Viola-Jones to detect the face area. Then 68 landmarks of the facial area are determined, and the landmarks from 48 to 68 represent the lip area extracted based on building a binary mask. Then, the contrast is enhanced to improve the quality of the lip image by applying contrast adjustment. Finally, sentences are classified using two deep learning models, the first is AlexNet, and the second is VGG-16 Net. The database consists of 39 participants (32 males and 7 females). Each participant repeats the short sentences five times. The outcomes demonstrate the accuracy rate of AlexNet is 90.00%, whereas the accuracy rate for VGG-16 Net is 82.34%. We concluded that AlexNet performs better for classifying short sentences than VGG-16 Net.\",\"PeriodicalId\":52195,\"journal\":{\"name\":\"International Journal of Advances in Intelligent Informatics\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Advances in Intelligent Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.26555/ijain.v9i1.920\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advances in Intelligent Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26555/ijain.v9i1.920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

唇读是近年来重要性显著提高的一项研究，尤其是随着深度学习技术的广泛使用。唇读对于嘈杂环境中的语音识别或听力受损的人来说是必不可少的。它指的是利用从嘴唇运动中获得的视觉信息来识别口语句子。此外，嘴唇区域，尤其是男性，有几个问题，比如嘴巴区域包含胡子和胡须，可能会覆盖嘴唇区域。本文提出了一种自动唇读系统，利用深度学习网络对说话者所说的英语短句进行识别和分类。输入视频提取帧，每一帧被传递到维奥拉-琼斯检测人脸区域。然后确定面部区域的68个标志，48 ~ 68个标志代表基于构建二值掩模提取的唇区域。然后，通过对比度调整来增强对比度，提高唇形图像的质量。最后，使用两个深度学习模型对句子进行分类，第一个是AlexNet，第二个是VGG-16 Net。该数据库包括39名参与者(32名男性和7名女性)。每个参与者重复五次短句子。结果表明，AlexNet的准确率为90.00%，而VGG-16 Net的准确率为82.34%。我们得出结论，AlexNet在对短句的分类方面比VGG-16 Net表现得更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An automatic lip reading for short sentences using deep learning nets

One study whose importance has significantly grown in recent years is lip-reading, particularly with the widespread of using deep learning techniques. Lip reading is essential for speech recognition in noisy environments or for those with hearing impairments. It refers to recognizing spoken sentences using visual information acquired from lip movements. Also, the lip area, especially for males, suffers from several problems, such as the mouth area containing the mustache and beard, which may cover the lip area. This paper proposes an automatic lip-reading system to recognize and classify short English sentences spoken by speakers using deep learning networks. The input video extracts frames and each frame is passed to the Viola-Jones to detect the face area. Then 68 landmarks of the facial area are determined, and the landmarks from 48 to 68 represent the lip area extracted based on building a binary mask. Then, the contrast is enhanced to improve the quality of the lip image by applying contrast adjustment. Finally, sentences are classified using two deep learning models, the first is AlexNet, and the second is VGG-16 Net. The database consists of 39 participants (32 males and 7 females). Each participant repeats the short sentences five times. The outcomes demonstrate the accuracy rate of AlexNet is 90.00%, whereas the accuracy rate for VGG-16 Net is 82.34%. We concluded that AlexNet performs better for classifying short sentences than VGG-16 Net.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Advances in Intelligent Informatics Computer Science-Computer Vision and Pattern Recognition

CiteScore

3.00

自引率

0.00%

发文量