Detection of Vowel-Like Speech Using Variance of Sample Magnitudes
N. Srinivas, G. Pradhan, P. Kumar
2019 National Conference on Communications (NCC), pp. 1-5, published 2019-02-01
DOI: 10.1109/NCC.2019.8732268 (https://doi.org/10.1109/NCC.2019.8732268)
Vowel, semivowel, and diphthong sound units are collectively referred to as vowel-like speech (VLS). VLS regions are the dominant voiced regions in a speech signal. Consequently, within a short analysis frame, the variance of sample magnitudes (VSM) is significantly higher for VLS than for other speech regions. In this work, a signal processing approach is proposed to robustly extract the VSM within an analysis frame. The VSM at each time instant is then non-linearly mapped (NLM) using a negative exponential function to suppress fluctuations. The resulting NLM-VSM values are nearly constant and significantly smaller in magnitude for VLS than for other speech, silence, and noise regions. The NLM-VSM is used as a front-end feature for detecting VLS in a given speech signal. The experimental results presented in this paper show that, for both clean and noisy speech signals, the proposed feature outperforms some previously reported features in detecting VLS and the corresponding onset and offset points.
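The abstract does not give the exact extraction procedure, so the following is only a minimal sketch of the general idea: compute the variance of the sample magnitudes in each short frame, then pass it through a negative exponential so that high-variance (vowel-like) frames map to small values. The function name `nlm_vsm`, the 20 ms frame and 10 ms hop at 16 kHz, the normalisation by the mean VSM, and the 0.5 decision threshold are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def nlm_vsm(signal, frame_len=320, hop=160, scale=None):
    """Hypothetical helper: one NLM-VSM value per analysis frame."""
    x = np.abs(np.asarray(signal, dtype=float))  # sample magnitudes
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Variance of the sample magnitudes inside each frame (the VSM).
    vsm = np.array([x[i * hop:i * hop + frame_len].var()
                    for i in range(n_frames)])
    # Assumed normalisation: divide by the mean VSM of the utterance.
    if scale is None:
        scale = vsm.mean() + 1e-12
    # Negative exponential mapping: high VSM -> near 0, low VSM -> near 1.
    return np.exp(-vsm / scale)

# Synthetic check: low-level noise with a 150 Hz tone in the middle
# standing in for a vowel-like region (not real speech).
fs = 16000
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(3 * fs // 2)
t = np.arange(fs // 2) / fs
sig[fs // 2:fs] += np.sin(2 * np.pi * 150 * t)

feat = nlm_vsm(sig)
is_vls = feat < 0.5  # assumed threshold, for illustration only
```

On this toy signal the feature stays close to 1 in the noise-only frames and drops toward 0 in the tone region, which mirrors the behaviour the abstract describes for NLM-VSM. Real speech would need the robust extraction step the paper proposes, which this sketch does not attempt to reproduce.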