Multi-Scale Feature Fusion Network for Lip Recognition

2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), pp. 541-545. Published: 2024-01-26. DOI: 10.1109/ICPECA60615.2024.10471068

Haohuai Lin, Bowen Liu, Gangdong Zhang, Qiang Yin, Liuqing Yang, Ping Lan

Abstract

Visual speech recognition (VSR), also known as lip recognition, has been widely explored in recent years owing to advances in deep learning. Lip recognition is a fine-grained discrimination problem in which the most informative cues come from subtle movements of the lips, so the model must be able to extract features that capture small variations around the mouth. This paper proposes a multi-branch feature fusion network built on three-dimensional convolutional networks (3D CNNs) for extracting spatiotemporal features from continuous image sequences. The multi-branch feature fusion network is used to fully extract both local and global characteristics of the image sequence and to further enhance the feature information, delivering more accurate features to the back-end classification network. Because many methods owe their strong performance to very large training sets, the experiments are designed to test the effect on a small-scale dataset and are conducted on the OuluVS2 dataset, with promising results. Across 20 experimental iterations, the maximum accuracy improves by 0.8% in absolute terms and the average accuracy improves by 1%.
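The abstract does not give the layer configuration, so the following is only a minimal NumPy sketch of the general multi-branch, multi-scale 3D-convolution idea, under our own assumptions: parallel 3D-convolution branches that all span 3 frames but use different spatial kernel sizes (3x3, 5x5, 7x7) to capture more local or more global lip-region context, with their outputs fused by channel concatenation followed by a 1x1x1 convolution. The names `conv3d_same` and `MultiScaleFusion3D`, the kernel sizes, the channel counts and the random (untrained) weights are hypothetical illustrations, not the authors' network.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(x, w):
    """Stride-1, zero-padded ('same') 3D cross-correlation, as used by DL conv layers.

    x: (C_in, T, H, W) clip; w: (C_out, C_in, kt, kh, kw) with odd kernel sizes.
    Returns an array of shape (C_out, T, H, W).
    """
    _, _, kt, kh, kw = w.shape
    assert kt % 2 == kh % 2 == kw % 2 == 1, "odd kernel sizes keep the output size unchanged"
    xp = np.pad(x, ((0, 0), (kt // 2,) * 2, (kh // 2,) * 2, (kw // 2,) * 2))
    # Every (kt, kh, kw) neighbourhood as a view: (C_in, T, H, W, kt, kh, kw)
    windows = sliding_window_view(xp, (kt, kh, kw), axis=(1, 2, 3))
    # Sum over input channels and kernel offsets for each output channel
    return np.einsum("cthwxyz,ocxyz->othw", windows, w)


class MultiScaleFusion3D:
    """Hypothetical multi-scale block: parallel 3D-conv branches, channel concat, 1x1x1 fusion."""

    def __init__(self, c_in, c_branch, c_out,
                 kernel_sizes=((3, 3, 3), (3, 5, 5), (3, 7, 7)), seed=0):
        rng = np.random.default_rng(seed)
        # One randomly initialised (untrained) kernel per branch; only the spatial size differs
        self.branches = [rng.normal(0.0, 0.1, size=(c_branch, c_in, *k)) for k in kernel_sizes]
        # 1x1x1 convolution that mixes the concatenated branch outputs
        self.fuse = rng.normal(0.0, 0.1, size=(c_out, c_branch * len(kernel_sizes), 1, 1, 1))

    def __call__(self, x):
        # Each branch sees the same clip at a different spatial receptive field (ReLU after each)
        feats = [np.maximum(conv3d_same(x, w), 0.0) for w in self.branches]
        stacked = np.concatenate(feats, axis=0)  # fuse by channel concatenation
        return np.maximum(conv3d_same(stacked, self.fuse), 0.0)


# Usage: one grayscale clip of 8 frames with 16x24 mouth crops (random stand-in data)
clip = np.random.default_rng(1).random((1, 8, 16, 24))
block = MultiScaleFusion3D(c_in=1, c_branch=4, c_out=8)
out = block(clip)
print(out.shape)  # (8, 8, 16, 24)
```

Concatenation keeps each branch's features separate until the 1x1x1 projection mixes them; summation or attention-weighted fusion are common alternatives. In a full lip-reading model the fused spatiotemporal features would be passed to a back-end classification network, which this sketch does not include.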