DPARNet-RSE: Toward Angular Region-Customizable Speech Extraction
Authors: Yi Yang; Caigen Zhou
IEEE Signal Processing Letters, vol. 32, pp. 3779-3783, published 2025-09-22
DOI: 10.1109/LSP.2025.3613271
URL: https://ieeexplore.ieee.org/document/11175497/
Citations: 0
Abstract
Most existing angular region-wise speech extraction methods face two critical limitations: inflexibility when handling different region boundaries, and performance degradation due to the varying numbers of speakers within the target regions. To address these issues, we adapt our recently proposed DPARNet, a lightweight dual-path attention and recurrent network for speech separation, into DPARNet-RSE, to perform angular region-customizable speech extraction. The key innovations include: (1) a boundary-conditioned attention module that encodes target boundaries into dynamic queries for robust region modeling; (2) a curriculum learning-based training approach that stabilizes convergence by progressively introducing data diversity; (3) a silence probability prediction module that directly triggers silent outputs when no target speaker is detected, effectively reducing speech and noise residuals in zero-target cases. The experimental results demonstrate its superior performance, robustness, generalization capability, and scalability in complex scenarios.
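Two of the ideas named in the abstract, encoding region boundaries into an attention query and forcing a silent output when no target speaker is predicted, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sin/cos boundary encoding, the single-query attention, and the 0.5 threshold are all assumptions chosen for illustration, and the real DPARNet-RSE modules are learned networks described in the paper itself.

```python
import math


def encode_boundaries(theta_lo_deg, theta_hi_deg):
    """Map two region boundaries (in degrees) to a 4-dim query vector.

    Assumption: a sin/cos encoding, so that 0 and 360 degrees coincide.
    The paper's actual boundary encoder is not specified here.
    """
    lo = math.radians(theta_lo_deg)
    hi = math.radians(theta_hi_deg)
    return [math.sin(lo), math.cos(lo), math.sin(hi), math.cos(hi)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one boundary-derived query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def gate_output(waveform, silence_prob, threshold=0.5):
    """Return an all-zero signal when the predicted silence probability
    exceeds the (hypothetical) threshold; otherwise pass the estimate through."""
    if silence_prob > threshold:
        return [0.0] * len(waveform)
    return list(waveform)


if __name__ == "__main__":
    query = encode_boundaries(0.0, 90.0)   # target region: 0 to 90 degrees
    keys = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    print(attend(query, keys, values))
    print(gate_output([0.1, -0.2, 0.3], silence_prob=0.9))
```

In this sketch the query changes whenever the requested boundaries change, which is the sense in which the attention is "boundary-conditioned"; the gate replaces the extracted signal with exact zeros rather than relying on the separator to suppress residuals on its own.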
About the journal:
IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and also at several workshops organized by the Signal Processing Society.