Kuansan Wang, S. Shamma, W. Byrne
1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993-04-27
DOI: 10.1109/ICASSP.1993.319306 (https://doi.org/10.1109/ICASSP.1993.319306)
Noise robustness in the auditory representation of speech signals
A common sequence of operations in the early stages of most biological sensory systems is a wavelet transform followed by a compressive nonlinearity. The contribution of these operations to the formation of robust and perceptually significant representations in the auditory system is explored. It is demonstrated that the neural representation of a complex signal such as speech is derived from a highly reduced version of its wavelet transform, specifically, from the distribution of its locally averaged zero-crossing rates along the temporal and scale axes. It is shown analytically that such encoding of the wavelet transform results in mutual suppressive interactions across its different scale representations. Suppression in turn endows the representation with enhanced spectral peaks and superior robustness in noisy environments. Examples using natural speech vowels are presented to illustrate the results.
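To make the idea of "locally averaged zero-crossing rates along the temporal and scale axes" more concrete, the following is a minimal sketch and not the authors' model. It assumes a bank of real Gabor-like kernels at log-spaced center frequencies as a stand-in for the wavelet transform, reduces each channel to its sign changes, and averages them with a boxcar window. The function names (`gabor_bank`, `wavelet_channels`, `local_zero_crossing_rate`) and all parameter values (`cycles`, `win_s`) are hypothetical choices made for illustration.

```python
import numpy as np

def gabor_bank(center_freqs, fs, cycles=6.0):
    """Real Gabor-like kernels, one per center frequency (roughly constant Q).

    Assumption: this is a crude stand-in for a wavelet transform,
    not the cochlear filters of any specific auditory model.
    """
    kernels = []
    for fc in center_freqs:
        sigma = cycles / (2.0 * np.pi * fc)       # envelope width shrinks as 1/fc
        half = int(np.ceil(4.0 * sigma * fs))     # truncate at 4 standard deviations
        t = np.arange(-half, half + 1) / fs
        k = np.exp(-0.5 * (t / sigma) ** 2) * np.cos(2.0 * np.pi * fc * t)
        kernels.append(k / np.sum(np.abs(k)))     # keep output amplitudes bounded
    return kernels

def wavelet_channels(x, kernels):
    """Filter x with every kernel; returns an array of shape (scales, time)."""
    return np.stack([np.convolve(x, k, mode="same") for k in kernels])

def local_zero_crossing_rate(channels, fs, win_s=0.02):
    """Locally averaged zero-crossing rate (crossings per second) per channel."""
    signs = np.signbit(channels)
    crossings = (signs[:, 1:] != signs[:, :-1]).astype(float)  # 1 where sign flips
    win = max(1, int(round(win_s * fs)))
    box = np.ones(win) / win                       # boxcar local average
    smoothed = np.stack([np.convolve(c, box, mode="same") for c in crossings])
    return smoothed * fs                           # fraction per sample -> per second

# Usage: a 500 Hz tone analysed at half-octave spacing from 250 Hz upward.
fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
tone = np.sin(2.0 * np.pi * 500.0 * t + 0.3)
freqs = 250.0 * 2.0 ** np.arange(0, 4, 0.5)
rates = local_zero_crossing_rate(wavelet_channels(tone, gabor_bank(freqs, fs)), fs)
```

Each row of `rates` corresponds to one scale and each column to one time step, so the array is a simple time-scale map of zero-crossing density. A sinusoid crosses zero twice per cycle, so a channel dominated by a 500 Hz component should report a rate near 1000 crossings per second.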