M. A. Sankar, M. Aiswariya, Dominic Anna Rose, B. Anushree, D. Shree, P. Lakshmipriya, P. S. Sathidevi
{"title":"Speech Sound Classification and Estimation of Optimal Order of LPC Using Neural Network","authors":"M. A. Sankar, M. Aiswariya, Dominic Anna Rose, B. Anushree, D. Shree, P. Lakshmipriya, P. S. Sathidevi","doi":"10.1145/3271553.3271611","DOIUrl":null,"url":null,"abstract":"Speech codec which is an integral part of most of the communication standards consists of a Voice activity detector (VAD) module followed by an encoder that uses Linear Predictive Coding (LPC). These two modules have a lot of potential for improvements that can yield low bit-rates without compromising quality. VAD is used for detecting voice activity in the input signal, which is an important step in achieving high efficiency speech coding. LPC analysis of input speech at an optimal order can assure maximum SNR and thereby perceptual quality while reducing the transmission bit-rate. This paper proposes a novel method to classify speech into Voiced/ Unvoiced/ Silence/ Music/ Background noise (V/UV/S/M/BN) frames and to find optimal order of LPC for each frame using neural network. The speech sound classifier module gives classification of frames into five categories with very high accuracy. Choosing the order predicted by neural network as the optimal LPC order for voiced frames while keeping a low order for unvoiced frames maintains the reconstruction quality and brings down the bit-rate.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3271553.3271611","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
A speech codec, an integral part of most communication standards, consists of a voice activity detector (VAD) module followed by an encoder that uses Linear Predictive Coding (LPC). Both modules offer considerable room for improvement toward lower bit-rates without compromising quality. The VAD detects voice activity in the input signal, an important step in achieving high-efficiency speech coding. LPC analysis of the input speech at an optimal order can ensure maximum SNR, and thereby perceptual quality, while reducing the transmission bit-rate. This paper proposes a novel method to classify speech into Voiced/Unvoiced/Silence/Music/Background-noise (V/UV/S/M/BN) frames and to find the optimal LPC order for each frame using a neural network. The speech sound classifier module assigns frames to the five categories with very high accuracy. Choosing the order predicted by the neural network as the optimal LPC order for voiced frames, while keeping a low order for unvoiced frames, maintains the reconstruction quality and brings down the bit-rate.
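To make the per-frame order-selection idea concrete, below is a minimal sketch (not the authors' implementation) of the abstract's policy: a frame label and a predicted order, which in the paper would come from the neural network, drive the LPC order used for analysis. The class names, the `low_order` value, and the helper functions are illustrative assumptions; the LPC coefficients are computed with the standard autocorrelation method via the Levinson-Durbin recursion.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] of one frame (autocorrelation / Levinson-Durbin)."""
    n = len(frame)
    # Autocorrelation lags r[0..order]
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        k = (r[i] - np.dot(a[:i - 1], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i - 1] = k
        if i > 1:
            new_a[:i - 1] = a[:i - 1] - k * a[i - 2::-1]
        a = new_a
        err *= (1.0 - k * k)          # prediction-error energy after this order
    return a, err

def encode_frame(frame, frame_class, predicted_order, low_order=4):
    """Illustrative order-selection policy per the abstract (names are assumptions)."""
    if frame_class == "voiced":
        order = predicted_order        # order suggested by the neural network
    elif frame_class == "unvoiced":
        order = low_order              # a low fixed order suffices for unvoiced frames
    else:
        return None                    # silence / music / background noise handled separately
    windowed = frame * np.hamming(len(frame))
    coeffs, residual_energy = lpc(windowed, order)
    return order, coeffs, residual_energy
```

In this sketch, a higher `residual_energy` at a given order indicates a poorer fit, which is one way an order-selection criterion (or a network trained to predict it) could be grounded; the paper's actual classifier and order predictor are not reproduced here.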