Ofer Schwartz, Aviv David, Ofer Shahen-Tov, S. Gannot
{"title":"基于转向响应功率输出熵的多麦克风语音活动和单话音检测器","authors":"Ofer Schwartz, Aviv David, Ofer Shahen-Tov, S. Gannot","doi":"10.1109/ICSEE.2018.8646089","DOIUrl":null,"url":null,"abstract":"Voice activity detection (VAD), namely determining whether a speech signal is active or inactive, and single talk detector (STD), namely detecting that only one speaker is active, are important building blocks in many speech processing applications. A speaker-localization stage (such as the steered response power (SRP)) is often concurrently implemented on the same device.In this paper, the spatial properties of the SRP are utilized for improving the performance of both the voice activity detector (VAD) and the STD. We propose to measure the entropy at the SRP output and compare with the typical entropy of noise-only frames. This feature utilizes spatial information and may therefore become advantageous in nonstationary noise environments. The STD can then be implemented by determining local minimum values of the entropy measure of the SRP.The proposed VAD was tested for a single speaker with two cases, directional background noise with changing level and with a background music source. The proposed STD was tested using real recordings of two concurrent speakers.","PeriodicalId":254455,"journal":{"name":"2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Multi-microphone voice activity and single-talk detectors based on steered-response power output entropy\",\"authors\":\"Ofer Schwartz, Aviv David, Ofer Shahen-Tov, S. Gannot\",\"doi\":\"10.1109/ICSEE.2018.8646089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Voice activity detection (VAD), namely determining whether a speech signal is active or inactive, and single talk detector (STD), namely detecting that only one speaker is active, are important building blocks in many speech processing applications. A speaker-localization stage (such as the steered response power (SRP)) is often concurrently implemented on the same device.In this paper, the spatial properties of the SRP are utilized for improving the performance of both the voice activity detector (VAD) and the STD. We propose to measure the entropy at the SRP output and compare with the typical entropy of noise-only frames. This feature utilizes spatial information and may therefore become advantageous in nonstationary noise environments. The STD can then be implemented by determining local minimum values of the entropy measure of the SRP.The proposed VAD was tested for a single speaker with two cases, directional background noise with changing level and with a background music source. The proposed STD was tested using real recordings of two concurrent speakers.\",\"PeriodicalId\":254455,\"journal\":{\"name\":\"2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSEE.2018.8646089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSEE.2018.8646089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-microphone voice activity and single-talk detectors based on steered-response power output entropy
Voice activity detection (VAD), namely determining whether a speech signal is active or inactive, and single talk detector (STD), namely detecting that only one speaker is active, are important building blocks in many speech processing applications. A speaker-localization stage (such as the steered response power (SRP)) is often concurrently implemented on the same device.In this paper, the spatial properties of the SRP are utilized for improving the performance of both the voice activity detector (VAD) and the STD. We propose to measure the entropy at the SRP output and compare with the typical entropy of noise-only frames. This feature utilizes spatial information and may therefore become advantageous in nonstationary noise environments. The STD can then be implemented by determining local minimum values of the entropy measure of the SRP.The proposed VAD was tested for a single speaker with two cases, directional background noise with changing level and with a background music source. The proposed STD was tested using real recordings of two concurrent speakers.