Weakly Labeled Learning Using BLSTM-CTC for Sound Event Detection

Taiki Matsuyoshi, Tatsuya Komatsu, Reishi Kondo, Takeshi Yamada, S. Makino
DOI: 10.23919/APSIPA.2018.8659528
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Publication date: 2018-11-01
Citations: 4

Abstract

In this paper, we propose a method of weakly labeled learning of a bidirectional long short-term memory (BLSTM) network using connectionist temporal classification (BLSTM-CTC) to reduce the hand-labeling cost of learning samples. BLSTM-CTC enables us to update the parameters of the BLSTM through a loss calculation using CTC, instead of the exact error calculation that cannot be conducted with weakly labeled samples, which carry only the event class of each individual sound event. In the proposed method, we first conduct strongly labeled learning of the BLSTM using a small number of strongly labeled samples, which have the timestamps of the beginning and end of each individual sound event as well as its event class, as initial learning. We then conduct weakly labeled learning based on BLSTM-CTC using a large number of weakly labeled samples as additional learning. To evaluate the performance of the proposed method, we conducted a sound event detection experiment using the dataset provided by Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 2. As a result, the proposed method improved the segment-based F1 score by 1.9% compared with the initial learning mentioned above. Furthermore, it reduced the labeling cost by 95%, at the cost of a 1.3% degradation in F1 score, compared with additional learning using a large number of strongly labeled samples. These results confirm that our weakly labeled learning is effective for learning a BLSTM at a low hand-labeling cost.
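The key mechanism the abstract describes is replacing frame-wise error calculation with a CTC loss, which scores a label sequence without timestamps against frame-wise network outputs. As a minimal illustration (a pure-Python sketch of the standard CTC forward algorithm, not the authors' implementation; the class indices and the blank symbol at index 0 are illustrative assumptions), the loss for a weak label sequence can be computed by summing the probability of all frame alignments consistent with it:

```python
import math

def ctc_neg_log_likelihood(probs, labels, blank=0):
    """CTC forward algorithm: -log P(labels | frame-wise posteriors).

    probs:  per-frame probability distributions over classes
            (list of lists; index `blank` is the CTC blank symbol)
    labels: target event-class sequence without any timing
            information, as in weakly labeled samples
    """
    # Extended label sequence with blanks interleaved:
    # [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for l in labels:
        ext.extend([l, blank])
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial alignments that
    # end at extended symbol s after the current frame.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                    # stay on the same symbol
            if s > 0:
                a += alpha[s - 1]           # advance by one symbol
            # Skip a blank only between two different labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the final blank.
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return -math.log(p)
```

Minimizing this quantity over the weakly labeled set is what lets the BLSTM parameters be updated from event classes alone; in practice the gradient is obtained from the same forward-backward recursion rather than computed by hand.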