Voice activity classification using beamformer-output-ratio
N. T. Tran, W. Cowley, A. Pollok
2012 Australian Communications Theory Workshop (AusCTW), 2012-03-09
DOI: 10.1109/AusCTW.2012.6164913 (https://doi.org/10.1109/AusCTW.2012.6164913)
In a conversation between multiple speakers, each person speaks at different times, so the active speakers in any given speech segment are not known in advance. However, adaptive beamforming techniques such as minimum variance distortionless response (MVDR) beamforming and adaptive blocking beamforming (AB) require the voice activity (VA) of the speakers of interest to be identified. Considering the case of two speakers, this paper addresses the voice activity classification (VAC) problem of identifying the active speaker(s) in each speech segment. The proposed method is based on a new concept, the beamformer-output-ratio (BOR), which is calculated from the outputs of two different beamformers steered toward the two speakers. The first part of the paper defines the BOR, describes the BOR-based VAC method, and presents simulation results. The simulations are based on real recordings and show high classification accuracy. The second part of the paper presents theoretical results for the BOR of delay-and-sum (DS) beamforming, including the BOR formula derived for different environments and the behaviour of the BOR in the presence of parameter errors.
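The abstract does not give the exact BOR definition or the decision rule, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes the BOR is the ratio of the output powers of two delay-and-sum beamformers, one steered at each speaker, and it assumes a narrowband far-field model with a uniform linear array. The function names, the array geometry, the speaker angles, and the 3 dB decision margin are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed value)


def steering_vector(theta_deg, n_mics, spacing_m, freq_hz):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    theta = np.deg2rad(theta_deg)
    delays = np.arange(n_mics) * spacing_m * np.sin(theta) / C_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def ds_output_power(snapshots, steer):
    """Mean output power of a delay-and-sum beamformer with weights steer / M."""
    weights = steer / steer.size
    y = weights.conj() @ snapshots  # one output sample per snapshot
    return float(np.mean(np.abs(y) ** 2))


def bor_db(snapshots, steer_1, steer_2):
    """Assumed BOR in dB: power of beam 1 divided by power of beam 2."""
    ratio = ds_output_power(snapshots, steer_1) / ds_output_power(snapshots, steer_2)
    return float(10.0 * np.log10(ratio))


def classify(value_db, margin_db=3.0):
    """Hypothetical threshold rule on the BOR; the margin is an arbitrary choice."""
    if value_db > margin_db:
        return "speaker 1"
    if value_db < -margin_db:
        return "speaker 2"
    return "both"


def simulate(active_1, active_2, n_snapshots=2000, seed=0):
    """Synthetic narrowband segment; returns the assumed BOR in dB."""
    rng = np.random.default_rng(seed)
    n_mics, spacing_m, freq_hz = 8, 0.05, 1000.0
    a1 = steering_vector(-30.0, n_mics, spacing_m, freq_hz)
    a2 = steering_vector(40.0, n_mics, spacing_m, freq_hz)

    def complex_gaussian(shape, scale):
        # Circular complex Gaussian samples with standard deviation `scale`.
        real = rng.standard_normal(shape)
        imag = rng.standard_normal(shape)
        return scale * (real + 1j * imag) / np.sqrt(2.0)

    s1 = complex_gaussian(n_snapshots, 1.0) * float(active_1)
    s2 = complex_gaussian(n_snapshots, 1.0) * float(active_2)
    noise = complex_gaussian((n_mics, n_snapshots), 0.1)
    snapshots = np.outer(a1, s1) + np.outer(a2, s2) + noise
    return bor_db(snapshots, a1, a2)


for label, flags in [("1 only", (True, False)),
                     ("2 only", (False, True)),
                     ("both", (True, True))]:
    value = simulate(*flags)
    print(label, round(value, 1), "dB ->", classify(value))
```

In this toy setup the BOR is large when only the first speaker is active, small when only the second is active, and near 0 dB when both are active with similar power. A segment with no active speaker would also give a BOR near 0 dB, so this sketch assumes that silent segments have already been removed by a separate detector.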