Dual-Path RNN for Long Recording Speech Separation

2021 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2021-01-19. DOI: 10.1109/SLT48900.2021.9383514

Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, K. Kinoshita, Christoph Böddeker, Y. Qian, Shinji Watanabe, Zhuo Chen


Citations: 16

Abstract

Continuous speech separation (CSS) is an emerging task in speech separation that aims to separate overlap-free targets from a long, partially overlapped recording. A straightforward way to extend previously proposed sentence-level separation models to this task is to segment the long recording into fixed-length blocks and separate each block independently. However, such a simple extension does not fully address cross-block dependencies, and the separation performance may not be satisfactory. In this paper, we focus on how block-level separation performance can be improved by exploring methods that utilize cross-block information. Based on the recently proposed dual-path RNN (DPRNN) architecture, we investigate how DPRNN can help block-level separation through its interleaved intra- and inter-block modules. Experimental results show that DPRNN significantly outperforms the baseline block-level model in both offline and block-online configurations under certain settings.
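The block segmentation and the interleaved intra- and inter-block passes described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`segment`, `simple_rnn`, `dual_path_layer`), the block length, the hop size, the residual connections, and the plain tanh recurrence standing in for the RNN modules of an actual DPRNN are all assumptions made for illustration.

```python
import numpy as np

def segment(x, block_len, hop):
    """Split a (T, F) feature sequence into blocks of shape
    (num_blocks, block_len, F), zero-padding the tail."""
    T, F = x.shape
    num_blocks = max(1, int(np.ceil((T - block_len) / hop)) + 1)
    padded_len = (num_blocks - 1) * hop + block_len
    padded = np.zeros((padded_len, F), dtype=x.dtype)
    padded[:T] = x
    return np.stack([padded[i * hop: i * hop + block_len]
                     for i in range(num_blocks)])

def simple_rnn(seq, w_in, w_rec):
    """Toy unidirectional tanh RNN over axis 0 of seq (steps, batch, F).
    A hypothetical stand-in for a trained recurrent module."""
    h = np.zeros_like(seq[0])
    outputs = []
    for step in seq:
        h = np.tanh(step @ w_in + h @ w_rec)
        outputs.append(h)
    return np.stack(outputs)

def dual_path_layer(blocks, params):
    """One intra-block pass followed by one inter-block pass.
    blocks has shape (num_blocks, block_len, F)."""
    # Intra-block: recur over positions inside each block (blocks act as batch).
    intra = simple_rnn(blocks.transpose(1, 0, 2), *params["intra"])
    blocks = blocks + intra.transpose(1, 0, 2)  # residual connection (assumed)
    # Inter-block: recur across blocks (in-block positions act as batch),
    # which is where cross-block information is exchanged.
    inter = simple_rnn(blocks, *params["inter"])
    return blocks + inter

rng = np.random.default_rng(0)
feat_dim = 8
# Random, untrained weights: (input matrix, recurrent matrix) per path.
params = {name: (0.1 * rng.standard_normal((feat_dim, feat_dim)),
                 0.1 * rng.standard_normal((feat_dim, feat_dim)))
          for name in ("intra", "inter")}

x = rng.standard_normal((1000, feat_dim))   # a "long recording" of 1000 frames
blocks = segment(x, block_len=100, hop=50)  # 50% overlap between blocks
out = dual_path_layer(blocks, params)
print(blocks.shape, out.shape)
```

Dropping the inter-block pass from `dual_path_layer` would reduce the sketch to the independent block-level processing that the abstract treats as the baseline. With the pass in place, a change in one block can influence the outputs of later blocks.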
