A Lipreading Model Based on Fine-Grained Global Synergy of Lip Movement

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2022-10-01. DOI: 10.1109/ICTAI56018.2022.00130

Baosheng Sun, Dongliang Xie, Dawei Luo, Xiaojie Yin


Citations: 0

Abstract

Lipreading is a form of speech recognition based on visual information. It is instructive to design a lipreading model around the laws of lip movement. General computer vision algorithms do not fully match the characteristics of lipreading, and applying them directly does not necessarily improve lipreading performance. In this paper, by comparing lipreading with other computer vision tasks and analyzing lip muscle motion patterns, we propose that lipreading exhibits fine-grained global synergy. To exploit this property, we propose a tailored model named Fine-Grained Global Synergy Lipreading (FGSLip), which aims to make features synergistic in order to improve lipreading performance. We introduce global features to represent the overall characteristics of the lip, and local features to learn coarse-grained and fine-grained correlations between features. Diffusion and fusion methods are then used to make the local and global features synergistic. On this basis, we construct several different feature extraction structures to demonstrate the fine-grained global synergy of lipreading. To verify the effectiveness of the proposed model, we conduct extensive experiments on the laboratory-recorded dataset ICSLR and the public dataset CMLR; the results show that the proposed method effectively improves lipreading accuracy.
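The abstract does not specify how the diffusion and fusion steps are computed, so the following is only a toy sketch of the general idea of combining one global descriptor with many local ones. Everything in it is an assumption made for illustration: the function names `global_feature` and `fuse`, the use of a plain average as the global feature, and the blending weight `alpha` are hypothetical and are not taken from FGSLip.

```python
def global_feature(local_feats):
    """Average all local vectors into a single global vector (assumed pooling)."""
    count = len(local_feats)
    dim = len(local_feats[0])
    return [sum(vec[d] for vec in local_feats) / count for d in range(dim)]


def fuse(local_feats, alpha=0.5):
    """Copy the global vector to every position ("diffusion", as assumed here)
    and blend it with each local vector ("fusion", as assumed here)."""
    g = global_feature(local_feats)
    return [
        [(1 - alpha) * x + alpha * gx for x, gx in zip(vec, g)]
        for vec in local_feats
    ]


# Three hypothetical local lip-region features, each of dimension 2.
local_feats = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
fused = fuse(local_feats, alpha=0.5)
print(fused)
```

With `alpha=0.0` the local features pass through unchanged, and with `alpha=1.0` every position collapses to the global average, so in this sketch the parameter controls how strongly the global context influences each local feature.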



