Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization
Elisa Tengan, Thomas Dietzen, Filip Elvander, Toon van Waterschoot
Journal on Audio Speech and Music Processing, 2023-10-04. DOI: 10.1186/s13636-023-00304-8

Abstract: In this paper, two approaches are proposed for estimating the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources using a single, rotating, directional microphone. Both build on a method previously presented by the authors, in which point-source DOAs are estimated using a broadband signal model and solving a group-sparse optimization problem, where the number of observations made by the rotating directional microphone can be lower than the number of candidate DOAs in the angular grid. DOA estimation is followed by estimation of the sources' PSDs through the solution of an overdetermined least-squares problem. The first proposed approach adds a nonnegativity constraint on the residual noise term when solving the group-sparse optimization problem and is referred to as the Group Lasso Least Squares (GL-LS) approach. The second proposed approach, in addition to the new nonnegativity constraint, employs a narrowband signal model when building the linear system of equations used to formulate the group-sparse optimization problem, so that the DOAs and PSDs can be jointly estimated by iterative, group-wise reweighting; this is referred to as the Group Lasso with l1-reweighting (GL-L1) approach. Both proposed approaches are implemented using the alternating direction method of multipliers (ADMM), and their performance is evaluated through simulations covering different setup conditions, ranging from different types of model mismatch to variations in the acoustic scene and microphone directivity pattern. The results show that, in a scenario involving a microphone response mismatch between the observed data and the signal model, the additional nonnegativity constraint on the residual noise can improve DOA estimation for GL-LS and PSD estimation for GL-L1. Moreover, GL-L1 can outperform GL-LS in terms of DOA estimation in scenarios with low SNR or where multiple sources are located close to each other. Finally, the least-squares PSD re-estimation step proves beneficial in most scenarios, with GL-LS outperforming GL-L1 in terms of PSD estimation error.
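The group-sparse optimization at the heart of these approaches can be illustrated with a minimal proximal-gradient group-lasso solver. This is a generic sketch, not the authors' ADMM implementation: the dictionary A, the grouping of coefficients per candidate DOA, and the regularization weight lam are all placeholder assumptions.

```python
import numpy as np

def group_soft_threshold(x, thresh):
    # Block soft-thresholding: shrink the whole group toward zero;
    # groups whose norm falls below the threshold are zeroed entirely.
    norm = np.linalg.norm(x)
    if norm <= thresh:
        return np.zeros_like(x)
    return (1.0 - thresh / norm) * x

def group_lasso(A, b, groups, lam, n_iter=500):
    """Proximal-gradient (ISTA) solver for
    min_x 0.5*||A x - b||^2 + lam * sum_g ||x_g||_2,
    where each group g collects the coefficients of one candidate DOA."""
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        z = x - step * grad
        for g in groups:
            z[g] = group_soft_threshold(z[g], step * lam)
        x = z
    return x
```

Because the penalty acts on whole groups, entire candidate DOAs are switched off jointly, which is what allows fewer observations than grid points.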
Cascade algorithms for combined acoustic feedback cancelation and noise reduction
Santiago Ruiz, Toon van Waterschoot, Marc Moonen
Journal on Audio Speech and Music Processing, 2023-09-21. DOI: 10.1186/s13636-023-00296-5

Abstract: This paper presents three cascade algorithms for combined acoustic feedback cancelation (AFC) and noise reduction (NR) in speech applications. A prediction error method (PEM)-based adaptive feedback cancelation (PEM-based AFC) algorithm is used in the AFC stage, while a multichannel Wiener filter (MWF) is applied in the NR stage. A scenario with M microphones and one loudspeaker is considered, without loss of generality. The first algorithm is the baseline, namely the cascade M-channel rank-1 MWF and PEM-AFC, where the NR stage is performed first using a rank-1 MWF, followed by a single-channel AFC stage using a PEM-based AFC algorithm. The second algorithm is the cascade (M+1)-channel rank-2 MWF and PEM-AFC, where again the NR stage is applied first, followed by a single-channel AFC stage. The novelty of this algorithm is to consider an (M+1)-channel data model in the MWF formulation with two different desired signals, i.e., the speech component in the reference microphone signal and in the loudspeaker signal, both driven by the speech source signal but not equal to each other. The two desired signal estimates are then used in a single-channel PEM-based AFC stage. The third algorithm is the cascade M-channel PEM-AFC and rank-1 MWF, where an M-channel AFC stage is performed first, followed by an M-channel NR stage. Although in cascade algorithms where NR precedes AFC the estimation of the feedback path is usually affected by the NR stage, it is shown here that by performing a rank-2 approximation of the speech correlation matrix this issue can be avoided and the feedback path can be correctly estimated. The performance of the algorithms is assessed by means of closed-loop simulations, which show that, for the considered input signal-to-noise ratios (iSNRs), the cascade (M+1)-channel rank-2 MWF and PEM-AFC and the cascade M-channel PEM-AFC and rank-1 MWF algorithms outperform the cascade M-channel rank-1 MWF and PEM-AFC algorithm in terms of added stable gain (ASG) and misadjustment (Mis), as well as perceptual metrics such as short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and signal distortion (SD).
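As a rough illustration of the NR stage, the following is a minimal textbook rank-1 MWF sketch, not the paper's (M+1)-channel rank-2 variant; it assumes estimates of the speech-plus-noise and noise-only correlation matrices are available, which in practice come from voice-activity-driven averaging.

```python
import numpy as np

def rank1_mwf(Ry, Rn, ref=0):
    """Rank-1 multichannel Wiener filter for a single speech source.

    Ry  : (M, M) speech-plus-noise correlation matrix estimate
    Rn  : (M, M) noise-only correlation matrix estimate
    ref : index of the reference microphone
    Returns the filter w such that w^T y estimates the speech
    component in the reference microphone signal.
    """
    Rs = Ry - Rn  # speech correlation matrix estimate
    # Enforce the rank-1 single-source model: keep the dominant eigenpair.
    vals, vecs = np.linalg.eigh(Rs)
    Rs1 = vals[-1] * np.outer(vecs[:, -1], vecs[:, -1])
    e = np.zeros(Ry.shape[0])
    e[ref] = 1.0
    # MMSE filter: w = Ry^{-1} Rs1 e_ref
    return np.linalg.solve(Ry, Rs1 @ e)
```

The rank-1 (or, in the paper, rank-2) approximation of the speech correlation matrix is the step that keeps the subsequent feedback path estimation well behaved.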
Learning-based robust speaker counting and separation with the aid of spatial coherence
Yicheng Hsu, Mingsian R. Bai
Journal on Audio Speech and Music Processing, 2023-09-20. DOI: 10.1186/s13636-023-00298-3

Abstract: A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction stage, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions (wRTFs) across time frames. The global activity function of each speaker is estimated from a simplex constructed using the eigenvectors of the SCM, while the local coherence functions are computed from the coherence between the wRTFs of a time-frequency bin and the global activity function-weighted RTF of the target speaker. In speaker counting, the eigenvalues of the SCM and the maximum similarity of the interframe global activity distributions between two speakers are used as input features to the speaker counting network (SCnet). In speaker separation, a global and local activity-driven network (GLADnet) is used to extract each individual speaker signal, which is particularly useful for highly overlapping speech. Experimental results obtained on real meeting recordings show that the proposed system achieves superior speaker counting and separation performance compared to previously published methods, without prior knowledge of the array configuration.
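The intuition behind using SCM eigenvalues as counting features can be sketched with the classical eigenvalue-threshold heuristic below. The paper instead feeds such features to the learned SCnet, so this sketch (including the relative threshold value) is purely motivational.

```python
import numpy as np

def count_sources(snapshots, threshold=0.1):
    """Toy eigenvalue-based source counting.

    snapshots : (M, T) array of multichannel observations.
    Forms the spatial correlation matrix and counts eigenvalues
    exceeding `threshold` times the largest one; each dominant
    eigenvalue corresponds to one active directional source.
    """
    M, T = snapshots.shape
    R = snapshots @ snapshots.conj().T / T
    vals = np.linalg.eigvalsh(R)[::-1]  # sort descending
    return int(np.sum(vals > threshold * vals[0]))
```

In noisy, reverberant conditions this hard threshold becomes unreliable, which is precisely why a learned classifier over the eigenvalue features is attractive.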
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization
Takao Kawamura, Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono, Ryoichi Miyazaki
Journal on Audio Speech and Music Processing, 2023-09-11. DOI: 10.1186/s13636-023-00300-y

Abstract: In this paper, we propose a technique for removing a specific type of interference from a monaural recording. Nonstationary interference is generally challenging to eliminate from such recordings. However, if the interference is a known sound, such as a cell phone ringtone, music from a CD or streaming service, or a radio or TV broadcast, its source signal can easily be obtained. In our method, we define such interference as an acoustic object. Even if the sampling frequencies of the recording and the acoustic object do not match, we compensate for the mismatch and use maximum likelihood estimation with an auxiliary function to remove the interference from the recording. We compare several probabilistic models for representing the object-canceled signal. Experimental evaluations confirm the effectiveness of the proposed method.
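Under the simplifying assumption that the recording and the known source signal are already synchronized (the paper's blind synchronization and maximum-likelihood machinery handle the general, mismatched-sampling-rate case), cancelling a known acoustic object reduces to a least-squares FIR fit, sketched below with an assumed tap count.

```python
import numpy as np

def cancel_known_signal(y, s, n_taps=8):
    """Least-squares cancellation of a known interference `s`
    from a monaural recording `y` (assumed already synchronized).

    Builds a convolution matrix S of delayed copies of s, fits an
    FIR filter h minimizing ||y - S h||^2, and returns the residual
    y - S h, i.e., the recording with the object removed.
    """
    T = len(y)
    S = np.column_stack([np.concatenate([np.zeros(k), s[:T - k]])
                         for k in range(n_taps)])
    h, *_ = np.linalg.lstsq(S, y, rcond=None)
    return y - S @ h
```

The residual keeps whatever in the recording is not explained by a filtered copy of the known signal; the probabilistic models compared in the paper refine how that residual is weighted.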
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study
Lekai Zhang, Yingfan Wang, Kailun He, Hailong Zhang, Baixi Xing, Xiaofeng Liu, Fo Hu
Journal on Audio Speech and Music Processing, 2023-09-07. DOI: 10.1186/s13636-023-00302-w
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning
Zhiyong Chen, Shugong Xu
Journal on Audio Speech and Music Processing, 2023-09-05. DOI: 10.1186/s13636-023-00299-2
Training audio transformers for cover song identification
Te Zeng, F. Lau
Journal on Audio Speech and Music Processing, 2023-08-25. DOI: 10.1186/s13636-023-00297-4
Channel and temporal-frequency attention UNet for monaural speech enhancement
Shibiao Xu, Zehua Zhang, Mingjiang Wang
Journal on Audio Speech and Music Processing, 2023-08-14. DOI: 10.1186/s13636-023-00295-6
Dual input neural networks for positional sound source localization
Eric Grinstein, Vincent W. Neo, P. Naylor
Journal on Audio Speech and Music Processing, 2023-08-08. DOI: 10.1186/s13636-023-00301-x
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
Xingwei Liang, Zehua Zhang, Ruifeng Xu
Journal on Audio Speech and Music Processing, 2023-07-01. DOI: 10.1186/s13636-023-00293-8