Multi-speaker beamforming for voice activity classification
Thuy N. Tran, W. Cowley, A. Pollok
2013 Australian Communications Theory Workshop (AusCTW), 2 May 2013
DOI: 10.1109/AusCTW.2013.6510055
Citations: 1
Abstract
In a multi-speaker environment, voice activity classification (VAC) attempts to identify the active speaker(s) in each recording period. Using the beamformer output ratio (BOR) of a multi-beamforming system, VAC can be performed efficiently by comparing the computed BOR with pre-specified thresholds. For the two-speaker case, this paper derives the statistics of the BOR, including its probability density function and cumulative distribution function (c.d.f.), under the assumption that the narrow-band signal power in the frequency domain is Gamma distributed. From the c.d.f. of the BOR, the VAC thresholds can be calculated automatically via a closed-form expression for given acceptable mis-detection rates. The method is tested with simulated recordings in a non-reverberant environment and in an environment with a reverberation time of 0.3 s. Both simulations show high classification accuracy.
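The abstract does not reproduce the derivation, so the following is only a minimal sketch under stated assumptions, not the authors' method. Suppose, hypothetically, that in one frequency bin the two beamformer output powers P1 and P2 are independent Gamma variables with integer shapes k1, k2 and scales theta1, theta2. Then BOR = P1/P2 is a scaled beta-prime variable, and P(BOR <= r) = I_z(k1, k2) with x = r * theta2 / theta1 and z = x / (1 + x), where I_z is the regularized incomplete beta function. A threshold for an acceptable mis-detection rate alpha then follows by inverting this c.d.f. The helper names (`reg_inc_beta_int`, `bor_cdf`, `bor_quantile`), the independence assumption, and the decision rule "speaker 1 is missed if BOR < T" are illustrative assumptions.

```python
import math
import random


def reg_inc_beta_int(z: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_z(a, b) for positive integers a, b.

    Uses the binomial-sum identity
    I_z(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) z^j (1-z)^(a+b-1-j).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * z**j * (1.0 - z) ** (n - j) for j in range(a, n + 1))


def bor_cdf(r: float, k1: int, k2: int, theta1: float, theta2: float) -> float:
    """P(BOR <= r) for BOR = P1/P2 under the ASSUMED model
    P1 ~ Gamma(k1, theta1), P2 ~ Gamma(k2, theta2), independent.
    (P1/theta1) / (P2/theta2) is beta-prime(k1, k2)."""
    if r <= 0.0:
        return 0.0
    x = r * theta2 / theta1
    return reg_inc_beta_int(x / (1.0 + x), k1, k2)


def bor_quantile(p: float, k1: int, k2: int, theta1: float, theta2: float,
                 tol: float = 1e-12) -> float:
    """Return r with P(BOR <= r) = p, by bisection on z = x / (1 + x) in (0, 1).

    I_z(k1, k2) is increasing in z, so bisection converges."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_inc_beta_int(mid, k1, k2) < p:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    # Map z back to x = z / (1 - z), then undo the scale ratio.
    return (z / (1.0 - z)) * theta1 / theta2


if __name__ == "__main__":
    # Hypothetical "speaker 1 only" case: beam 1 collects about 10x the mean power.
    k, theta1, theta2, alpha = 2, 10.0, 1.0, 0.05
    T = bor_quantile(alpha, k, k, theta1, theta2)  # miss speaker 1 if BOR < T
    rng = random.Random(0)
    n = 100_000
    misses = sum(
        rng.gammavariate(k, theta1) / rng.gammavariate(k, theta2) < T
        for _ in range(n)
    )
    print(f"threshold T = {T:.3f}, empirical mis-detection rate = {misses / n:.4f}")
```

For shape 1 (exponentially distributed power) the inversion reduces to the closed form T = alpha / (1 - alpha) * theta1 / theta2. The integer-shape restriction only serves the binomial-sum identity; for non-integer shapes, SciPy's `scipy.stats.betaprime` (with `scale=theta1/theta2`) provides the same c.d.f. and its inverse via `ppf`.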