HMM-based separation of acoustic transfer function for single-channel sound source localization

2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2010-03-14. DOI: 10.1109/ICASSP.2010.5496188

R. Takashima, T. Takiguchi, Y. Ariki

Citations: 16

Abstract

This paper presents a sound source (talker) localization method that uses only a single microphone. An HMM (Hidden Markov Model) of clean speech is introduced to estimate the acoustic transfer function from the user's position, so the estimation can be carried out without measuring impulse responses. The frame sequence of the acoustic transfer function is estimated by maximizing the likelihood of training data uttered from a given position, with cepstral parameters used to represent clean speech effectively. From the estimated frame sequence, a GMM (Gaussian Mixture Model) of the acoustic transfer function is built for each position to model the influence of the room impulse response. Then, for each test data set, the maximum-likelihood GMM is selected from among the GMMs estimated for the individual positions. The effectiveness of the method was confirmed by talker localization experiments in a room environment.
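The pipeline in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In the cepstral domain, convolution with a room impulse response becomes approximately additive for short-time frames, so an observed frame is modeled as O(t) = S(t) + H(t), where S is clean speech and H is the acoustic transfer function (ATF). H(t) is estimated frame by frame by EM under a clean-speech model. A GMM is then fit to the estimated H frames of each position, and a test utterance is assigned to the position whose GMM scores highest. Several assumptions apply: a diagonal-covariance clean-speech GMM replaces the paper's clean-speech HMM, so mixture posteriors stand in for HMM state posteriors. The 3-dimensional "cepstra" are synthetic. All function names (`estimate_transfer_cepstrum`, `fit_diag_gmm`, `localize`, `demo`) are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log-density of every frame under every diagonal Gaussian: (T, D) -> (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))


def posteriors(x, weights, means, variances):
    """Mixture posteriors gamma (T, M) and per-frame log-likelihoods (T,)."""
    logp = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    peak = logp.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(logp - peak).sum(axis=1))
    return np.exp(logp - frame_ll[:, None]), frame_ll


def estimate_transfer_cepstrum(obs, clean_gmm, n_iter=5):
    """EM estimate of the frame sequence H(t) in the model O(t) = S(t) + H(t).

    Assumption: a clean-speech GMM stands in for the paper's clean-speech HMM.
    """
    weights, means, variances = clean_gmm
    h = np.zeros_like(obs)
    for _ in range(n_iter):
        # E-step: which clean-speech component explains O(t) - H(t)?
        gamma, _ = posteriors(obs - h, weights, means, variances)
        # M-step: precision-weighted average of O(t) - mu_m over components
        num = np.einsum("tm,tmd->td", gamma,
                        (obs[:, None, :] - means[None]) / variances[None])
        h = num / (gamma @ (1.0 / variances))
    return h


def fit_diag_gmm(x, n_mix=2, n_iter=20, seed=0):
    """Plain EM for a diagonal-covariance GMM (one per position, fit on ATF frames)."""
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), n_mix, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-6, (n_mix, 1))
    weights = np.full(n_mix, 1.0 / n_mix)
    for _ in range(n_iter):
        gamma, _ = posteriors(x, weights, means, variances)
        nk = gamma.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (gamma.T @ x) / nk[:, None]
        variances = np.maximum((gamma.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances


def localize(obs, clean_gmm, position_gmms):
    """Return the position whose ATF GMM gives the highest total log-likelihood."""
    h = estimate_transfer_cepstrum(obs, clean_gmm)
    scores = {pos: posteriors(h, *gmm)[1].sum() for pos, gmm in position_gmms.items()}
    return max(scores, key=scores.get)


def demo(seed=0):
    """Synthetic check: two positions whose ATF cepstra are different constants."""
    rng = np.random.default_rng(seed)
    clean = (np.array([0.6, 0.4]),                          # mixture weights
             np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]),  # clean-speech means
             np.full((2, 3), 0.25))                         # clean-speech variances
    atf = {"A": np.array([1.0, -1.0, 0.5]), "B": np.array([-1.0, 1.0, -0.5])}

    def utterance(pos, n=200):
        comp = rng.choice(2, size=n, p=clean[0])
        s = clean[1][comp] + rng.normal(0.0, 0.5, size=(n, 3))
        return s + atf[pos]  # additive ATF in the cepstral domain

    models = {p: fit_diag_gmm(estimate_transfer_cepstrum(utterance(p), clean))
              for p in atf}
    return localize(utterance("B", n=50), clean, models)
```

Calling `demo()` trains ATF GMMs for two synthetic positions, "A" and "B", and then localizes a held-out utterance from "B". With a single clean-speech component, the update reduces to H(t) = O(t) − μ. The estimated ATF is therefore the residual left after removing the expected clean-speech cepstrum. This residual still carries speech variability, which is one reason to model each position with a GMM rather than a single fixed vector.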
