Time Delay Estimation of Reverberant Meeting Speech: On the Use of Multichannel Linear Prediction

2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System. Pub Date: 2007-12-16. DOI: 10.1109/SITIS.2007.96

E. Cheng, I. Burnett, C. Ritz


Citations: 3

Abstract

Effective and efficient access to multiparty meeting recordings requires techniques for meeting analysis and indexing. Since meeting participants are generally stationary, speaker location information may be used to identify meeting events, e.g., to detect speaker changes. Time-delay estimation (TDE) based on cross-correlation of multichannel speech recordings is a common approach for deriving speech source location information. Previous research improved TDE by computing it from linear prediction (LP) residual signals obtained from LP analysis of each individual speech channel. This paper investigates the use of LP residuals for speech TDE, where the residuals are obtained by jointly modeling the multiple speech channels. Experiments conducted with a simulated reverberant room and real room recordings show that jointly modeled LP better predicts the LP coefficients than LP applied to individual channels. Both the individually and jointly modeled LP exhibit similar TDE performance, and both outperform TDE on the speech signal alone, especially with the real recordings.
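The baseline the abstract refers to, cross-correlation TDE applied to per-channel LP residuals, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses the autocorrelation method with a Levinson-Durbin recursion on each channel independently, and it does not implement the joint multichannel LP that the paper investigates. All function names, the LP order, and the synthetic test signal (`synthetic_pair`) are hypothetical choices made for illustration.

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lp_coefficients(x, order):
    """Levinson-Durbin recursion; returns a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) input: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lp_residual(x, order):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n-k], for one channel on its own."""
    a = lp_coefficients(x, order)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(order, n)))
            for n in range(len(x))]

def estimate_delay(x1, x2, max_lag):
    """Lag in samples at the cross-correlation peak; positive means x2 trails x1."""
    n = min(len(x1), len(x2))
    def xcorr(lag):
        lo, hi = max(0, -lag), min(n, n - lag)
        return sum(x1[i] * x2[i + lag] for i in range(lo, hi))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def synthetic_pair(delay, n=800, seed=0):
    """Hypothetical test data: an AR(2) 'speech-like' signal and a delayed copy."""
    rng = random.Random(seed)
    s = []
    for i in range(n):
        p1 = s[i - 1] if i >= 1 else 0.0
        p2 = s[i - 2] if i >= 2 else 0.0
        s.append(rng.gauss(0.0, 1.0) + 1.3 * p1 - 0.6 * p2)
    return s, [0.0] * delay + s[: n - delay]

x1, x2 = synthetic_pair(delay=5)
delay_speech = estimate_delay(x1, x2, 20)
delay_residual = estimate_delay(lp_residual(x1, 10), lp_residual(x2, 10), 20)
```

Here `delay_speech` is the estimate taken directly from the two channels and `delay_residual` the estimate taken from their LP residuals. The synthetic pair contains a pure delay with no reverberation, so it only exercises the mechanics; it says nothing about the relative performance reported in the paper.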


