Developing a Deep Learning-Based Model for Recognizing Environmental Sounds in Videos (Videolardaki Çevresel Sesleri Tanımak İçin Derin Öğrenme Tabanlı Bir Model Geliştirme)

Bedirhan Karakaya, Emre Beray Boztepe, Bahadır Karasulu
5th International Symposium on Innovative Approaches in Smart Technologies Proceedings, published 2022-05-28
DOI: https://doi.org/10.36287/setsci.5.1.011

Abstract

Separating and classifying environmental sounds for environment recognition has recently gained popularity. Deep learning and machine learning techniques can classify the various background sounds in videos with high success, which makes it possible to describe video scenes in a semantically richer way. This work describes the development of a deep learning neural network model suited to environmental sound recognition. To test the model's predictive power experimentally, ten main categories were selected from a dataset containing varied data. Because the available data were limited, spectrograms were first produced and then enriched with data augmentation techniques, which increased the diversity of the features obtained from the data. Source code was written for three different design approaches to training the model, using deep learning methods such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). Seven different neural network models were trained experimentally and evaluated by testing. The best of these models reached an accuracy of approximately 87%. This work presents the experimental results obtained and their scientific evaluation.
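The spectrogram and augmentation steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the spectrogram parameters or which augmentation techniques were used, so everything here is an assumption made for illustration: the frame length, the hop size, the synthetic test tone, and the choice of random frequency and time masking as the augmentation. The function names `log_spectrogram` and `mask_augment` are hypothetical and are not taken from the paper.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not the paper's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (frames, freq_bins)
    return np.log1p(magnitude).T


def mask_augment(spec, rng, max_freq_width=8, max_time_width=8):
    """Zero out one random frequency band and one random time band.

    This masking scheme is an assumed stand-in for the unspecified
    augmentation techniques mentioned in the abstract.
    """
    out = spec.copy()
    n_freq, n_time = out.shape
    f_width = rng.integers(1, max_freq_width + 1)
    f_start = rng.integers(0, n_freq - f_width + 1)
    out[f_start : f_start + f_width, :] = 0.0
    t_width = rng.integers(1, max_time_width + 1)
    t_start = rng.integers(0, n_time - t_width + 1)
    out[:, t_start : t_start + t_width] = 0.0
    return out


# Synthetic one-second clip: a 440 Hz tone plus a little noise (assumed data).
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(sample_rate)

spec = log_spectrogram(clip)
# Several augmented variants of the same spectrogram enlarge the training set.
augmented = [mask_augment(spec, rng) for _ in range(4)]
```

Each augmented copy keeps the shape of the original spectrogram, so the variants could be fed to the same CNN, LSTM, or GRU input layer without further changes.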