Acoustic Scene Classification Using Reduced MobileNet Architecture

Jun-Xiang Xu, Tzu-Ching Lin, Tsai-Ching Yu, Tzu-Chiang Tai, P. Chang
{"title":"基于简化MobileNet架构的声学场景分类","authors":"Jun-Xiang Xu, Tzu-Ching Lin, Tsai-Ching Yu, Tzu-Chiang Tai, P. Chang","doi":"10.1109/ISM.2018.00038","DOIUrl":null,"url":null,"abstract":"Sounds are ubiquitous in our daily lives, for instance, sounds of vehicles or sounds of conversations between people. Therefore, it is easy to collect all these soundtracks and categorize them into different groups. By doing so, we can use these assets to recognize the scene. Acoustic scene classification allows us to do so by training our machine which can further be installed on devices such as smartphones. This provides people with convenience which improves our lives. Our goal is to maximize our validation rate of our machine learning results and also optimize our usage of hardware. We utilize the dataset from IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) to train our machine. The data of DCASE 2017 contains 15 different kinds of outdoor audio recordings, including beach, bus, restaurant etc. In this work, we use two different types of signal processing techniques which are Log Mel and HPSS (Harmonic-Percussive Sound Separation). Next we modify and reduce the MobileNet structure to train our dataset. We also make use of fine-tuning and late fusion to make our results more accurate and to improve our performances. With the structure aforementioned, we succeed in reaching the validation rate of 75.99% which is approximately the seventh highest performing group of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2017, with less computational complexity comparing with others having higher accuracy. We deem it a worthy trade-off.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Acoustic Scene Classification Using Reduced MobileNet Architecture\",\"authors\":\"Jun-Xiang Xu, Tzu-Ching Lin, Tsai-Ching Yu, Tzu-Chiang Tai, P. Chang\",\"doi\":\"10.1109/ISM.2018.00038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sounds are ubiquitous in our daily lives, for instance, sounds of vehicles or sounds of conversations between people. Therefore, it is easy to collect all these soundtracks and categorize them into different groups. By doing so, we can use these assets to recognize the scene. Acoustic scene classification allows us to do so by training our machine which can further be installed on devices such as smartphones. This provides people with convenience which improves our lives. Our goal is to maximize our validation rate of our machine learning results and also optimize our usage of hardware. We utilize the dataset from IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) to train our machine. The data of DCASE 2017 contains 15 different kinds of outdoor audio recordings, including beach, bus, restaurant etc. In this work, we use two different types of signal processing techniques which are Log Mel and HPSS (Harmonic-Percussive Sound Separation). Next we modify and reduce the MobileNet structure to train our dataset. We also make use of fine-tuning and late fusion to make our results more accurate and to improve our performances. 
With the structure aforementioned, we succeed in reaching the validation rate of 75.99% which is approximately the seventh highest performing group of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2017, with less computational complexity comparing with others having higher accuracy. We deem it a worthy trade-off.\",\"PeriodicalId\":308698,\"journal\":{\"name\":\"2018 IEEE International Symposium on Multimedia (ISM)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Symposium on Multimedia (ISM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISM.2018.00038\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Symposium on Multimedia (ISM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2018.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

Sounds are ubiquitous in our daily lives, for instance the sounds of vehicles or of conversations between people. It is therefore easy to collect such recordings and categorize them into different groups, and these categorized recordings can then be used to recognize the surrounding scene. Acoustic scene classification enables this by training a model that can subsequently be deployed on devices such as smartphones, a convenience that improves daily life. Our goal is to maximize the validation accuracy of our machine learning results while also optimizing hardware usage. We train our models on the dataset from the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge; the DCASE 2017 data contains 15 classes of outdoor audio recordings, including beach, bus, restaurant, and others. In this work we use two different signal processing techniques: Log Mel and HPSS (Harmonic-Percussive Sound Separation). We then modify and reduce the MobileNet architecture to train on our dataset, and we apply fine-tuning and late fusion to make the results more accurate and improve performance. With this structure we reach a validation rate of 75.99%, approximately the seventh-highest-performing result in the DCASE Challenge 2017, with less computational complexity than entries with higher accuracy. We deem this a worthy trade-off.
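
The abstract names two signal processing front-ends: Log Mel and HPSS. As a rough illustration of how such features are commonly computed, the following is a minimal sketch using librosa; the sample rate, mel-band count, FFT parameters, and file name are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of the two front-ends (assumed librosa-based pipeline;
# parameter values are illustrative, not the paper's).
import numpy as np
import librosa

def log_mel(y, sr, n_mels=128, n_fft=2048, hop_length=512):
    """Log-scaled mel spectrogram of a mono waveform."""
    S = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

def hpss_log_mel(y, sr):
    """Split the waveform into harmonic and percussive components
    (HPSS), then compute a log-mel spectrogram per component and
    stack them as two input channels."""
    y_harm, y_perc = librosa.effects.hpss(y)
    return np.stack([log_mel(y_harm, sr), log_mel(y_perc, sr)])

# "beach_001.wav" is a hypothetical DCASE-style clip name.
y, sr = librosa.load("beach_001.wav", sr=None, mono=True)
features = hpss_log_mel(y, sr)  # shape: (2, n_mels, n_frames)
```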
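
The classifier is a modified, reduced MobileNet. MobileNet is built from depthwise separable convolutions: a 3x3 depthwise convolution followed by a 1x1 pointwise convolution, each with batch normalization and ReLU. Below is a minimal Keras sketch of a reduced network in that style; the block count, channel widths, and input shape are assumptions for illustration, since the paper's exact reduced topology is not reproduced here.

```python
# Hedged sketch of a reduced MobileNet-style classifier in Keras.
# Block count, channel widths, and input shape are assumptions.
from tensorflow.keras import layers, models

def sep_block(x, filters, stride=1):
    """MobileNet building block: 3x3 depthwise convolution followed by
    a 1x1 pointwise convolution, each with batch norm and ReLU."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 1, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def reduced_mobilenet(input_shape=(128, 431, 2), n_classes=15):
    """Shallow, narrow MobileNet variant; 15 outputs for the 15
    DCASE 2017 scene classes."""
    inp = layers.Input(shape=input_shape)  # e.g. stacked HPSS log-mels
    x = layers.Conv2D(32, 3, strides=2, padding="same", use_bias=False)(inp)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for filters, stride in [(64, 1), (128, 2), (128, 1), (256, 2)]:
        x = sep_block(x, filters, stride)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = reduced_mobilenet()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The appeal of this building block is the compute saving: a depthwise-plus-pointwise pair costs roughly an order of magnitude fewer multiply-accumulates than a standard convolution with the same output width, which is what makes the accuracy-versus-complexity trade-off described in the abstract possible.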
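
The abstract also mentions fine-tuning and late fusion. A common realization of late fusion is to combine the class posteriors of separately trained models, for example one trained on Log Mel features and one on HPSS features. The sketch below uses a weighted average; the 0.5/0.5 weighting and the model names are assumptions, not the paper's fusion rule.

```python
# Hedged sketch of late fusion by averaging class posteriors; the
# weighting and the model names are assumptions.
import numpy as np

def late_fusion(probs_log_mel, probs_hpss, w=0.5):
    """Weighted average of two (n_clips, n_classes) softmax outputs,
    returning the fused class prediction per clip."""
    fused = w * probs_log_mel + (1.0 - w) * probs_hpss
    return np.argmax(fused, axis=1)

# Usage with hypothetical per-feature models:
# preds = late_fusion(model_log_mel.predict(x_log_mel),
#                     model_hpss.predict(x_hpss))
```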