鲁棒ASR的直接掩蔽方法

William Hartmann, A. Narayanan, E. Fosler-Lussier, Deliang Wang
{"title":"鲁棒ASR的直接掩蔽方法","authors":"William Hartmann, A. Narayanan, E. Fosler-Lussier, Deliang Wang","doi":"10.1109/TASL.2013.2263802","DOIUrl":null,"url":null,"abstract":"Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both a small and large vocabulary dataset. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing feature techniques. We also investigate the reasons why other researchers may have not come to this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing feature ASR.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2263802","citationCount":"43","resultStr":"{\"title\":\"A Direct Masking Approach to Robust ASR\",\"authors\":\"William Hartmann, A. Narayanan, E. Fosler-Lussier, Deliang Wang\",\"doi\":\"10.1109/TASL.2013.2263802\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both a small and large vocabulary dataset. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing feature techniques. We also investigate the reasons why other researchers may have not come to this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing feature ASR.\",\"PeriodicalId\":55014,\"journal\":{\"name\":\"IEEE Transactions on Audio Speech and Language Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TASL.2013.2263802\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Audio Speech and Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TASL.2013.2263802\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2263802","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43

摘要

近年来,人们对用于语音隔离的二进制掩码的计算进行了大量的研究。ASR领域的传统观点认为,这些二元掩模不能直接使用;缺失能量显著影响ASR中常用的倒谱特征的计算。我们表明,这种普遍持有的信念可能是一种误解;我们演示了在小型和大型词汇表数据集上直接使用屏蔽数据的有效性。事实上,这种方法(我们称之为直接掩蔽方法)的性能与之前提出的两种缺失特征技术相当。我们还调查了其他研究人员可能没有得出这一结论的原因;特征的方差归一化是影响性能的一个重要因素。这项工作为缺失特征ASR的未来工作提供了比未增强语音更好的基线。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Direct Masking Approach to Robust ASR
Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both a small and large vocabulary dataset. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing feature techniques. We also investigate the reasons why other researchers may have not come to this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing feature ASR.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Audio Speech and Language Processing
IEEE Transactions on Audio Speech and Language Processing 工程技术-工程:电子与电气
自引率
0.00%
发文量
0
审稿时长
24.0 months
期刊介绍: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信