基于i向量和DNN混合系统的音频流鲁棒主播检测

Yun-Fan Chang, Payton Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Y. Zeng, Chia-Wei Liao, Wen-Tsung Chang, Y. Wang, Yu Tsao
{"title":"基于i向量和DNN混合系统的音频流鲁棒主播检测","authors":"Yun-Fan Chang, Payton Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Y. Zeng, Chia-Wei Liao, Wen-Tsung Chang, Y. Wang, Yu Tsao","doi":"10.1109/APSIPA.2014.7041717","DOIUrl":null,"url":null,"abstract":"Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity due to lower computational complexity and satisfactory performance. This paper presents a robust framework using a hybrid I-vector and deep neural network (DNN) system to perform anchorperson detection based on audio streams of video content. The proposed system first applies I-vector to extract speaker identity features from the audio data. With the extracted speaker identity features, a DNN classifier is then used to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction to compensate the audio mismatch issues caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms the state-of-the-art hybrid I-vector and support vector machine (SVM) system. Moreover, the proposed system was further enhanced by integrating SFN to effectively compensate the audio mismatch issues in anchorperson detection tasks.","PeriodicalId":231382,"journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system\",\"authors\":\"Yun-Fan Chang, Payton Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Y. Zeng, Chia-Wei Liao, Wen-Tsung Chang, Y. Wang, Yu Tsao\",\"doi\":\"10.1109/APSIPA.2014.7041717\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity due to lower computational complexity and satisfactory performance. This paper presents a robust framework using a hybrid I-vector and deep neural network (DNN) system to perform anchorperson detection based on audio streams of video content. The proposed system first applies I-vector to extract speaker identity features from the audio data. With the extracted speaker identity features, a DNN classifier is then used to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction to compensate the audio mismatch issues caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms the state-of-the-art hybrid I-vector and support vector machine (SVM) system. Moreover, the proposed system was further enhanced by integrating SFN to effectively compensate the audio mismatch issues in anchorperson detection tasks.\",\"PeriodicalId\":231382,\"journal\":{\"name\":\"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APSIPA.2014.7041717\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPA.2014.7041717","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

主播片段检测使高效的视频内容索引信息检索。基于音频分析的主播检测以其较低的计算复杂度和令人满意的性能得到了广泛的应用。本文提出了一种鲁棒框架,利用混合i向量和深度神经网络(DNN)系统来执行基于视频内容音频流的主播检测。该系统首先利用i向量从音频数据中提取说话人身份特征。使用提取的说话人身份特征,然后使用DNN分类器来验证所声明的主播身份。此外,在混合系统中引入了子空间特征归一化(SFN)来进行鲁棒特征提取,以补偿由录音设备引起的音频不匹配问题。通过主播验证实验对该混合系统的等错误率进行了评价。实验结果表明,该系统优于当前最先进的i向量和支持向量机(SVM)混合系统。此外,该系统通过集成SFN进一步增强,有效补偿主播检测任务中的音频不匹配问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system
Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity due to lower computational complexity and satisfactory performance. This paper presents a robust framework using a hybrid I-vector and deep neural network (DNN) system to perform anchorperson detection based on audio streams of video content. The proposed system first applies I-vector to extract speaker identity features from the audio data. With the extracted speaker identity features, a DNN classifier is then used to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction to compensate the audio mismatch issues caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms the state-of-the-art hybrid I-vector and support vector machine (SVM) system. Moreover, the proposed system was further enhanced by integrating SFN to effectively compensate the audio mismatch issues in anchorperson detection tasks.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信