Feature-level Attention Pooling for Overlapping Sound Event Detection

2022 International Conference on Networking and Network Applications (NaNA), Pub Date: 2022-12-01, DOI: 10.1109/NaNA56854.2022.00098

Mei Wang, Zhengyi An, Yuh-Lin Yao, Xiyu Song



Abstract

Overlapping sound event detection (SED) aims to identify the classes of sound events present in an audio clip together with their time stamps. Real-world recordings are usually complex: multiple sound events overlap and alternate, which makes overlapping SED considerably harder. In our previous research, we found that the attention mechanism copes well with this difficulty, provided that its need for training on a large amount of strongly labeled data is met. Strongly labeled data, however, is usually hard to obtain, so making use of weakly labeled data has become the mainstream approach. Against this background, we propose a new overlapping SED method that exploits weakly labeled data effectively by applying a feature-level attention pooling strategy within the multiple-instance learning (MIL) framework. We also study feature-level attention pooling with different learnable parameters and compare it with decision-level attention pooling, in order to show the advantages of feature-level attention pooling in overlapping SED more clearly. The two strategies are compared in the DCASE 2021 Task 4 scenario (Sound Event Detection and Separation in Domestic Environments). Test results show that, with feature-level attention pooling, the PSDS1 score (which emphasizes sensitivity to detection speed) remains unchanged, indicating that training the learnable parameters does not burden overlapping SED, while the PSDS2 score (which emphasizes classification performance) improves by 3%, slightly above the score obtained with decision-level attention pooling. Although the improvement over decision-level attention pooling is small, it offers some guidance for further research on improving the classification results of overlapping SED. We conclude that feature-level attention pooling is well suited to exploiting weakly labeled data and can achieve better classification without increasing computational complexity.
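To make the contrast between the two pooling strategies concrete, the following is a minimal sketch of generic attention pooling over time in a MIL setting, where a clip is a bag of frames and only a clip-level (weak) label is available. It is not the authors' architecture: the function names, the single linear layers, and all weight matrices are assumptions chosen for illustration. In the decision-level variant, attention weights combine per-frame class probabilities; in the feature-level variant, attention weights combine per-frame hidden features, and a classifier is applied to the pooled feature vector.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(x, W, b):
    """Apply a linear layer to every frame. x: [T][Din], W: [Din][Dout], b: [Dout]."""
    return [
        [sum(row[i] * W[i][j] for i in range(len(W))) + b[j] for j in range(len(b))]
        for row in x
    ]


def softmax_over_time(scores):
    """Normalize a [T][D] score matrix along the time axis, one column at a time."""
    T, D = len(scores), len(scores[0])
    weights = [[0.0] * D for _ in range(T)]
    for d in range(D):
        col = [scores[t][d] for t in range(T)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in col]
        z = sum(exps)
        for t in range(T):
            weights[t][d] = exps[t] / z
    return weights


def attention_pool(values, scores):
    """Attention-weighted sum over time: [T][D] values and scores -> [D]."""
    w = softmax_over_time(scores)
    T, D = len(values), len(values[0])
    return [sum(w[t][d] * values[t][d] for t in range(T)) for d in range(D)]


def decision_level_pooling(frames, W_cls, b_cls, W_att, b_att):
    """Pool per-frame class probabilities (hypothetical single-layer heads)."""
    probs = [[sigmoid(v) for v in row] for row in linear(frames, W_cls, b_cls)]
    scores = linear(frames, W_att, b_att)  # one attention score per frame and class
    return attention_pool(probs, scores)   # clip-level class probabilities


def feature_level_pooling(frames, W_val, b_val, W_att, b_att, W_cls, b_cls):
    """Pool per-frame hidden features, then classify the pooled vector."""
    values = linear(frames, W_val, b_val)    # frame-level hidden features
    scores = linear(frames, W_att, b_att)    # one attention score per frame and feature
    pooled = attention_pool(values, scores)  # clip-level feature vector
    logits = linear([pooled], W_cls, b_cls)[0]
    return [sigmoid(v) for v in logits]      # clip-level class probabilities
```

In both variants the clip-level output can be trained against the weak label alone, while the frame-level terms (`probs` in the decision-level case) remain available for localizing events in time. The abstract's comparison concerns which of these two representations the attention weights are applied to.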

