Evaluating audio features for speech/non-speech discrimination
H. Redelinghuys, Zenghui Wang
2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR), published 2022-03-10
DOI: 10.1109/ICAITPR51569.2022.9844226
Citations: 0
Abstract
This paper evaluates the suitability of audio features for speech-music discrimination, with the aim of selecting a feature set that yields high mean classification accuracy while also reducing the total feature space. The first four standardized moments, namely the mean, variance, skewness and kurtosis, of twelve audio features were evaluated: Root Mean Square value, Short Time Energy Ratio, Zero Crossing Rate, Spectral Rolloff, Spectral Flux, Spectral Centroid, Energy Entropy, Spectral Entropy, the first 13 Mel Frequency Cepstral Coefficients (MFCC), Percentage Low Energy Frames, Modified Low Energy Ratio and 4 Hz Modulation Energy. The 4 Hz Modulation Energy feature was computed by two different methods: first as a by-product of the MFCC feature, and second using the Hilbert transform for envelope detection. This resulted in an 88-dimensional feature space. It was demonstrated that a thorough feature selection process achieved both a higher mean accuracy and a 50% reduction in dimensionality.
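As an illustration of how clip-level statistics of this kind can be built from frame-level features, the following is a minimal sketch rather than the authors' implementation. It covers only two of the listed features (Root Mean Square value and Zero Crossing Rate) and summarizes each with its mean, variance, skewness and kurtosis. The function names, the frame length of 400 samples, the hop of 200 samples, and the use of population (non-excess) moments are all assumptions made for the example; the abstract does not specify them.

```python
import math


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def rms(frame):
    """Root Mean Square value of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def four_moments(values):
    """Mean, variance, skewness and kurtosis of a list of numbers.

    Assumption: population (1/n) estimates and non-excess kurtosis.
    When the variance is zero, skewness and kurtosis are undefined;
    this sketch returns 0.0 for both as a convention.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return mean, 0.0, 0.0, 0.0
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, var, skew, kurt


def clip_features(signal, frame_len=400, hop=200):
    """Concatenate the four moments of each frame-level feature."""
    fs = frames(signal, frame_len, hop)
    out = []
    for feature in (rms, zero_crossing_rate):
        out.extend(four_moments([feature(f) for f in fs]))
    return out


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
vector = clip_features(tone)  # 2 features x 4 moments = 8 values
```

Each frame-level feature contributes four values to the clip-level vector, so adding further features from the list would extend the vector in the same way before any feature selection is applied.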