Robust voice activity detection for DTX operation of speech coders
F. Basbug, S. Nandkumar, K. Swaminathan
1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351), 1999-06-20
DOI: 10.1109/SCFT.1999.781483 (https://doi.org/10.1109/SCFT.1999.781483)
Robust voice activity detection (VAD) on short-term speech frames is essential for the discontinuous transmission (DTX) mode of vocoders such as IS-641. The reference VAD chosen for the IS-641 coder is based on the GSM-EFR (Enhanced Full Rate) VAD. Using a comprehensive evaluation procedure that we develop, we show that the reference VAD is sensitive to variations in speech level. For example, the number of frames falsely classified as active increases significantly when the speech level is 10 dB above or below the nominal level. We propose a solution based on automatic gain control to reduce this level sensitivity. Objective performance measures confirm the robustness of the proposed VAD.
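To illustrate why gain control can reduce level sensitivity, here is a minimal sketch. It is not the reference VAD or the method proposed in the paper. The fixed-threshold energy detector, the peak-hold gain rule, the 160-sample frame, the -33 dB threshold, the -26 dB target level, and all function names are assumptions chosen for illustration, and the input is a synthetic tone-plus-noise signal rather than speech. Because the toy detector uses a fixed threshold, its failure modes also differ from those reported for the reference VAD.

```python
import math
import random
from collections import deque

FRAME_LEN = 160        # assumed framing: 20 ms at 8 kHz
THRESHOLD_DB = -33.0   # hypothetical fixed decision threshold
TARGET_DB = -26.0      # hypothetical nominal speech level


def level_db(frame):
    """Frame level in dB relative to full scale (amplitude 1.0)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def energy_vad(frames, threshold_db=THRESHOLD_DB):
    """Toy VAD: a frame is active if its level exceeds a fixed threshold."""
    return [level_db(f) > threshold_db for f in frames]


def peak_hold_agc(frames, target_db=TARGET_DB, window=100, max_gain_db=20.0):
    """Causal gain control: scale each frame so that the loudest frame
    seen in the last `window` frames would sit at `target_db`.
    Limitation: if the input starts with noise only, the noise is boosted
    (up to `max_gain_db`) until louder frames arrive."""
    recent = deque(maxlen=window)
    out = []
    for f in frames:
        recent.append(level_db(f))
        gain_db = target_db - max(recent)
        gain_db = max(-max_gain_db, min(max_gain_db, gain_db))
        g = 10.0 ** (gain_db / 20.0)
        out.append([x * g for x in f])
    return out


def make_test_signal(shift_db, seed=0):
    """Three bursts of 20 'speech' frames (a 200 Hz tone) each followed by
    30 noise frames; every level is offset by `shift_db` from nominal."""
    rng = random.Random(seed)
    speech_amp = math.sqrt(2.0) * 10.0 ** ((TARGET_DB + shift_db) / 20.0)
    noise_sigma = 10.0 ** ((-40.0 + shift_db) / 20.0)
    frames, labels = [], []
    for _ in range(3):
        for _ in range(20):
            frames.append([speech_amp * math.sin(2.0 * math.pi * n / 40.0)
                           for n in range(FRAME_LEN)])
            labels.append(True)
        for _ in range(30):
            frames.append([rng.gauss(0.0, noise_sigma)
                           for _ in range(FRAME_LEN)])
            labels.append(False)
    return frames, labels


def count_errors(decisions, labels):
    """Return (false-active frames, missed speech frames)."""
    false_active = sum(d and not l for d, l in zip(decisions, labels))
    missed = sum(l and not d for d, l in zip(decisions, labels))
    return false_active, missed


if __name__ == "__main__":
    for shift in (-10.0, 0.0, 10.0):
        frames, labels = make_test_signal(shift)
        plain = count_errors(energy_vad(frames), labels)
        gained = count_errors(energy_vad(peak_hold_agc(frames)), labels)
        print(f"shift {shift:+.0f} dB: no AGC {plain}, with AGC {gained}")
```

On this synthetic input, the plain detector marks the noise frames as active at +10 dB and misses the tone frames at -10 dB, while the gain-controlled path classifies every frame correctly at all three levels. A practical VAD would use adaptive thresholds and a level estimate driven by real speech statistics. The sketch only shows that normalizing the input level removes the dependence of the decision on absolute level.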