Using a blind EC mechanism for modelling the interaction between binaural and temporal speech processing

IF 1.4, Zone 3 (Physics and Astrophysics), Q4 (Acoustics)

Acta Acustica Pub Date : 2022-01-01 DOI:10.1051/aacus/2022009

Saskia Röttges, C. Hauth, J. Rennies, T. Brand


Citations: 2

Abstract

We reanalyzed a study that investigated binaural and temporal integration of speech reflections with different amplitudes, delays, and interaural phase differences. We used a blind binaural speech intelligibility model (bBSIM), which applies an equalization-cancellation (EC) process to model binaural release from masking. bBSIM is blind in that it requires only the mixed binaural speech and noise signals and no auxiliary information about the listening conditions. bBSIM was combined with two non-blind back-ends, the speech intelligibility index (SII) and the speech transmission index (STI), resulting in hybrid models. Furthermore, bBSIM was combined with the non-intrusive short-time objective intelligibility (NI-STOI), resulting in a fully blind model. The fully non-blind reference model used in the previous study achieved the best prediction accuracy (R² = 0.91 and RMSE = 1 dB). The fully blind model yielded a coefficient of determination (R² = 0.87) similar to that of the reference model but also the highest root mean square error of the models tested in this study (RMSE = 4.4 dB). By adjusting the binaural processing errors of bBSIM as done in the reference model, the RMSE could be decreased to 1.9 dB. Furthermore, in this study, the dynamic range of the SII had to be adjusted to predict the low SRTs of the speech material used.
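The abstract reports a model with a high R² but also a high RMSE. A minimal sketch of how the two metrics can diverge is given below. It assumes that R² is computed as the squared Pearson correlation between measured and predicted thresholds, which the abstract does not state. The threshold values and the 4 dB offset are made-up illustration data, not results from the study, and the function names are hypothetical.

```python
import numpy as np


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values (same unit, here dB)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))


def r_squared(measured, predicted):
    """Squared Pearson correlation (an assumed definition of R^2, see lead-in)."""
    r = np.corrcoef(measured, predicted)[0, 1]
    return float(r ** 2)


# Hypothetical measured thresholds in dB SNR (illustration only, not study data).
measured = np.array([-20.0, -18.0, -15.0, -12.0, -10.0])

# Hypothetical predictions that follow the measured trend but sit 4 dB too high.
biased = measured + 4.0

print("R^2 :", r_squared(measured, biased))
print("RMSE:", rmse(measured, biased), "dB")
```

Under this definition, a constant offset leaves R² close to 1 while the RMSE equals the size of the offset. A model can therefore track the trend across conditions well and still miss the absolute threshold values, which is one possible reading of the pattern reported for the fully blind model.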


Source journal: Acta Acustica (Acoustics)

CiteScore: 2.80

Self-citation rate: 21.40%

Review time: 12 weeks

Journal description: Acta Acustica is the journal of the European Acoustics Association (EAA). After publishing its journal Acta Acustica from 1993 to 1995, the EAA published Acta Acustica united with Acustica from 1996 to 2019. From 2020, the EAA decided to publish the journal in full Open Access. Acta Acustica reports on original scientific research in acoustics and on engineering applications. The journal considers review papers, scientific papers, technical and applied papers, short communications, and letters to the editor. From time to time, special issues and review articles are also published. For book reviews or doctoral thesis abstracts, please contact the Editor in Chief.