Automatic recognition of environmental sound events using all-pole group delay features

2015 23rd European Signal Processing Conference (EUSIPCO) | Pub Date: 2015-12-28 | DOI: 10.1109/EUSIPCO.2015.7362479

Aleksandr Diment, Emre Çakir, T. Heittola, T. Virtanen


Citations: 13

Abstract



A feature based on the group delay function from all-pole models (APGD) is proposed for environmental sound event recognition. The commonly used spectral features take into account merely the magnitude information, whereas the phase is overlooked due to the complications related to its interpretation. Additional information concealed in the phase is hypothesised to be beneficial for sound event recognition. The APGD is an approach to inferring phase information, which has shown applicability for speech and music analysis and is now studied in environmental audio. The evaluation is performed within a multi-label deep neural network (DNN) framework on a diverse real-life dataset of environmental sounds. It shows performance improvement compared to the baseline log mel-band energy case. Combined with the magnitude-based features, APGD demonstrates further improvement.
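As a rough illustration of what a group delay function from an all-pole model looks like in practice, the sketch below fits a linear-prediction (all-pole) model 1/A(z) to one audio frame and evaluates the group delay of that model on an FFT grid. It is not the authors' implementation: the autocorrelation method, the Hamming window, the model order, the FFT size, and the function names `lpc_autocorr` and `all_pole_group_delay` are all assumptions made for this example. The group delay uses the standard identity that, for a coefficient sequence a[n], the group delay of A is Re{F(n·a[n]) / F(a[n])}, so the group delay of 1/A is its negative.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Fit an all-pole model 1/A(z) to a frame (autocorrelation method).

    Returns the coefficients of A(z) with a[0] = 1. The window and the tiny
    diagonal regularisation are illustrative choices, not from the paper.
    """
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    # Autocorrelation at lags 0..order (lag 0 sits at index len - 1).
    r = full[len(windowed) - 1 : len(windowed) + order]
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + (1e-9 * r[0] + 1e-12) * np.eye(order)
    coeffs = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], coeffs))


def all_pole_group_delay(a, n_fft=512):
    """Group delay (in samples) of H(z) = 1/A(z) at n_fft // 2 + 1 frequencies."""
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)       # spectrum of a[n]
    B = np.fft.rfft(n * a, n_fft)   # spectrum of n * a[n]
    # Group delay of A is Re(B / A); inverting the filter flips the sign.
    return -np.real(B / A)


# Hypothetical usage on a synthetic noise frame (order 12 is an assumption).
rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)
a = lpc_autocorr(frame, order=12)
apgd = all_pole_group_delay(a, n_fft=512)
```

Because the all-pole fit smooths the spectrum before the phase derivative is taken, the resulting curve tends to show peaks near the model's resonances, which is one reason such a representation can be used as a frame-level feature alongside magnitude-based ones. How the frames are chosen and how the values are post-processed before being fed to a classifier is not specified in the abstract and is left open here.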

