An Investigation into the Multi-channel Time Domain Speaker Extraction Network

Catalin Zorila, Mohan Li, R. Doddipatla
DOI: 10.1109/SLT48900.2021.9383582
Published in: 2021 IEEE Spoken Language Technology Workshop (SLT), 2021-01-19
Citations: 4

Abstract

This paper presents an investigation into the effectiveness of spatial features for improving time-domain speaker extraction systems. A two-dimensional Convolutional Neural Network (CNN) based encoder is proposed to capture the spatial information within the multichannel input; the resulting spatial features are then combined with the spectral features of a single-channel extraction network. Two variants of the target speaker extraction method were tested: one that employs a pre-trained i-vector system to compute a speaker embedding (System A), and one that employs a jointly trained neural network to extract the embeddings directly from time-domain enrolment signals (System B). The evaluation was performed on the spatialized WSJ0-2mix dataset using the Signal-to-Distortion Ratio (SDR) metric and ASR accuracy. In the anechoic condition, absolute SDR gains of more than 10 dB and 7 dB were achieved when the 2-D CNN spatial encoder features were included in Systems A and B, respectively. The performance gains were lower in the reverberant condition; however, we have demonstrated that retraining the systems with dereverberation preprocessing can significantly boost both target speaker extraction and ASR performance.
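The results above are stated as absolute SDR gains in dB. As a rough illustration of that arithmetic only, the sketch below computes a simplified SDR (reference energy divided by residual energy, in dB) on toy signals and takes the difference between two systems. The function names `sdr_db` and `sdr_gain_db` and the toy data are hypothetical; the abstract does not say which SDR implementation was used (BSS Eval's SDR, for example, also tolerates a short distortion filter on the reference), so values from this sketch are not comparable to the reported numbers.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Simplified SDR in dB (assumption): the entire difference between the
    estimate and the reference is counted as distortion."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction: no residual at all
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_gain_db(reference: Sequence[float],
                baseline_estimate: Sequence[float],
                improved_estimate: Sequence[float]) -> float:
    """Absolute SDR gain: SDR of the improved output minus SDR of the baseline."""
    return sdr_db(reference, improved_estimate) - sdr_db(reference, baseline_estimate)


# Hypothetical toy signals: residual amplitude 0.5 (baseline) vs 0.05 (improved).
ref = [1.0, -1.0, 1.0, -1.0]
base = [1.5, -0.5, 1.5, -0.5]
imp = [1.05, -0.95, 1.05, -0.95]
gain = sdr_gain_db(ref, base, imp)
print(round(gain, 3))  # 20.0
```

Because SDR is logarithmic, shrinking the residual amplitude tenfold raises SDR by 20 dB; for a fixed reference, an absolute gain of more than 10 dB (as reported for System A in the anechoic condition) corresponds to reducing the residual energy by more than a factor of 10.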