Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal

Yohei Abe, A. Ito
DOI: 10.1109/IIH-MSP.2013.76
Published in: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Publication date: 2013-10-16
Citations: 1

Abstract

Lip movement has a close relationship with speech because the lips move when we talk. The idea behind this work is to extract lip movement features from facial video and embed them into the speech signal using an information hiding technique. With the proposed framework, we can provide advanced speech communication using only a speech signal that carries the lip movement features, without increasing the bitrate of the signal. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using a support vector machine, we obtained better performance than audio-only VAD in a noisy environment. In addition, we investigated how embedding data into the speech signal affects sound quality and detection performance.
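The abstract does not specify the embedding scheme. As a rough illustration of the general idea (hiding a lip-movement feature stream in the speech samples without raising the bitrate), the sketch below uses simple LSB substitution on integer PCM samples; the 8-bit quantizer, bit layout, and all function names are assumptions for demonstration, not the authors' method.

```python
# Illustrative sketch only: embed quantized lip-movement features into the
# least-significant bits of PCM speech samples, then recover them. The
# actual paper's information-hiding scheme is not described in the abstract.

def features_to_bits(features, bits_per_value=8):
    """Quantize features in [0, 1] to fixed-width bit strings (assumed layout)."""
    bits = []
    for f in features:
        q = min(255, max(0, round(f * 255)))  # 8-bit uniform quantizer
        bits.extend((q >> k) & 1 for k in range(bits_per_value - 1, -1, -1))
    return bits

def bits_to_features(bits, bits_per_value=8):
    """Inverse of features_to_bits: reassemble bit groups and dequantize."""
    feats = []
    for i in range(0, len(bits), bits_per_value):
        q = 0
        for b in bits[i:i + bits_per_value]:
            q = (q << 1) | b
        feats.append(q / 255)
    return feats

def embed_bits(samples, bits):
    """Overwrite the LSB of each PCM sample with one payload bit (<=1 LSB distortion)."""
    assert len(bits) <= len(samples), "payload exceeds carrier capacity"
    out = list(samples)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract_bits(samples, n):
    """Read back the first n payload bits from the sample LSBs."""
    return [s & 1 for s in samples[:n]]
```

Because only the least-significant bit of each sample changes, the carrier keeps its original bitrate and the distortion is bounded by one quantization step, which matches the paper's motivation of adding the visual channel without enlarging the signal.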