Deep encoder and decoder for time-domain speech separation

Impact factor: 0.4 · JCR: Q4 (ENGINEERING, MECHANICAL)
Kohei TAKAHASHI, Toshihiko SHIRAISHI
Mechanical Engineering Journal, published 2023-01-01
DOI: 10.1299/mej.23-00124 (https://doi.org/10.1299/mej.23-00124)

Abstract

Previous research on speech separation has significantly improved separation performance using the time-domain approach, which consists of an encoder, a separator, and a decoder. Most of this research has focused on revising the separator architecture, whereas the encoder and decoder have remained a single 1-D convolution layer and a single 1-D transposed convolution layer, respectively. This study proposes deep encoder and decoder architectures for time-domain speech separation, consisting of stacked 1-D convolution layers, 1-D transposed convolution layers, or residual blocks. The aims of revising them are to improve separation performance and, by enhancing their mapping ability, to overcome the tradeoff between separation performance and computational cost that arises from their stride. We applied them to Conv-TasNet, a typical model for time-domain speech separation. Our results indicate that separation performance improves as the number of encoder and decoder layers increases, and that increasing the number of layers from 1 to 12 yields an SI-SDR improvement of more than 1 dB on WSJ0-2mix. The results also suggest that the encoder and decoder should be made deeper as their stride increases, since their task may become more difficult as the stride becomes larger. This study demonstrates the importance of improving the encoder and decoder architectures as well as the separator.
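The abstract relies on two quantities without defining them: SI-SDR, the metric behind the reported improvement, and the encoder stride, which sets how many frames the separator must process. The sketch below is not taken from the paper. It uses the commonly cited definition of SI-SDR and the standard output-length formula for one unpadded 1-D convolution. The function names `si_sdr` and `num_frames`, the signals, and the kernel and stride values are illustrative assumptions.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sequences of floats.

    Assumes the common definition: project the zero-mean estimate onto the
    zero-mean reference, then compare projection energy to residual energy.
    Raises ZeroDivisionError if the reference is silent or the estimate is a
    perfectly scaled copy of it.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Optimal scaling of the reference toward the estimate.
    alpha = sum(a * b for a, b in zip(e, r)) / sum(b * b for b in r)
    target = [alpha * b for b in r]
    residual = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


def num_frames(num_samples, kernel_size, stride):
    """Frames produced by one unpadded 1-D convolution over num_samples."""
    return (num_samples - kernel_size) // stride + 1


# Hypothetical signals: a reference plus a small orthogonal error.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.9, -1.1]
print(si_sdr(estimate, reference))  # about 20 dB

# Illustrative values only: 4 s of audio at 8 kHz with a kernel of 16 samples.
print(num_frames(32000, 16, 8))   # 3999 frames
print(num_frames(32000, 16, 16))  # 2000 frames
```

Doubling the stride roughly halves the number of frames passed to the separator, which lowers computational cost but leaves the encoder and decoder with a coarser representation to work from. This is the tradeoff the deeper encoder and decoder are intended to ease.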
Source journal: Mechanical Engineering Journal (ENGINEERING, MECHANICAL)
Self-citation rate: 20.00%
Articles published: 42