Audio surveillance under noisy conditions using time-frequency image feature

2014 19th International Conference on Digital Signal Processing | Pub Date: 2014-09-18 | DOI: 10.1109/ICDSP.2014.6900815

R. Sharan, T. Moir

Citations: 9

Abstract

In this paper, we apply a novel method that uses features extracted from the time-frequency image representation of a sound signal to an audio surveillance application. In particular, we investigate two image representations: linear grayscale and log grayscale. We first divide a sound signal into smaller frames and apply a windowing function. The absolute value of the Discrete Fourier Transform (DFT) of each frame is then computed and normalized to obtain the intensity values of the linear grayscale image. The log grayscale image is generated in a similar way, except that we take the log power of the values before normalization. Each image is then divided into blocks, and central moments are computed in each block. We carry out experiments under different noise conditions and at varying signal-to-noise ratios (SNRs), using support vector machines (SVMs) for classification. Based on classification accuracy, the linear grayscale image approach is found to be more robust to noise than the log grayscale image approach. It was also found to perform better than mel-frequency cepstral coefficients (MFCCs), a common baseline feature in most sound recognition applications.
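The feature pipeline summarized in the abstract (framing, windowing, DFT magnitude, normalization, block-wise central moments) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Hamming window, the 512-sample frame with a 256-sample hop, min-max normalization to [0, 1], 10·log10 of the squared magnitude as the "log power", the 4 × 4 block grid, and the use of second- and third-order central moments are all assumptions. The function names `frame_signal`, `tf_image` and `block_central_moments` are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def tf_image(x, frame_len=512, hop=256, log_scale=False, eps=1e-10):
    """Grayscale time-frequency image with intensities in [0, 1].

    Rows are frequency bins and columns are frames. log_scale=False gives the
    'linear grayscale' image; log_scale=True takes the log power before
    normalization ('log grayscale').
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    frames = frames * np.hamming(frame_len)      # windowing (Hamming is an assumption)
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # |DFT| per frame -> (bins, frames)
    if log_scale:
        mag = 10.0 * np.log10(mag ** 2 + eps)    # log power; eps avoids log(0)
    lo, hi = mag.min(), mag.max()
    return (mag - lo) / (hi - lo + eps)          # min-max normalization (assumed)


def block_central_moments(img, n_rows=4, n_cols=4, orders=(2, 3)):
    """Split img into an n_rows x n_cols grid and return central moments per block."""
    feats = []
    for band in np.array_split(img, n_rows, axis=0):
        for block in np.array_split(band, n_cols, axis=1):
            v = block.ravel()
            mu = v.mean()
            for k in orders:
                feats.append(np.mean((v - mu) ** k))  # k-th central moment of the block
    return np.array(feats)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)           # 1 s synthetic test tone
    for log_scale in (False, True):
        img = tf_image(tone, log_scale=log_scale)
        print(img.shape, block_central_moments(img).shape)
```

With these defaults each image yields a 32-dimensional vector (16 blocks × 2 moments) that could be passed to an SVM classifier. The image needs at least as many rows and columns as the block grid so that no block is empty.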
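The abstract does not state how the noisy test signals were produced at each SNR. A common convention, shown here as an assumption rather than the paper's procedure, is to scale a noise recording by a gain g so that 10·log10(Ps / (g²·Pn)) equals the target SNR in dB, where Ps and Pn are the mean powers of the clean signal and of the noise segment. The helper name `add_noise_at_snr` is hypothetical.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db, rng=None):
    """Return clean + g * noise_segment with the requested SNR in dB.

    The noise is tiled if it is shorter than the clean signal, and a random
    segment of matching length is used.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    rng = np.random.default_rng() if rng is None else rng
    start = int(rng.integers(0, len(noise) - len(clean) + 1))
    segment = noise[start:start + len(clean)]
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(segment ** 2)
    if p_noise == 0:
        raise ValueError("noise segment has zero power")
    # Choose g so that 10*log10(p_signal / (g**2 * p_noise)) == snr_db
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * segment


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * np.arange(8000) / 40)
    noise = rng.standard_normal(20000)
    for snr in (20, 10, 0, -5):
        residual = add_noise_at_snr(clean, noise, snr, rng) - clean
        measured = 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))
        print(f"target {snr:>3} dB, measured {measured:.2f} dB")
```

Solving the SNR expression for the gain gives g = sqrt(Ps / (Pn · 10^(SNR/10))), which is exactly the `gain` line above; the measured SNR therefore matches the target up to floating-point error.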
