Developing a Deep Learning-Based Model for Recognizing Environmental Sounds in Videos (original title: Videolardaki Çevresel Sesleri Tanımak İçin Derin Öğrenme Tabanlı Bir Model Geliştirme)
Authors: Bedirhan Karakaya, Emre Beray Boztepe, Bahadır Karasulu
Published in: 5th International Symposium on Innovative Approaches in Smart Technologies Proceedings, 2022-05-28
DOI: https://doi.org/10.36287/setsci.5.1.011
Abstract
The decomposition of environmental sounds for environment recognition has recently gained popularity. Background sounds in videos can be classified with high success using deep learning and machine learning techniques, which makes it possible to describe video scenes in a semantically enriched way. This work describes the development of a deep learning neural network model suited to environmental sound recognition. To test the model's predictive power experimentally, ten main categories were chosen from a dataset containing a variety of data. Spectrograms were first produced from the limited data available and then enriched using data augmentation techniques, which increased the diversity of the features obtained from the data. Source code was written for three different design approaches to training the model, based on deep learning methods such as Convolutional Neural Networks, Long Short-Term Memory, and Gated Recurrent Unit networks. Seven different neural network models were designed and trained experimentally, and their performance was verified by testing. The best of these models reached an accuracy of approximately 87%. This work presents the experimental results obtained and their scientific evaluation.
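The spectrogram and augmentation steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which spectrogram parameters or augmentation techniques the authors used, so everything here is an assumption chosen for illustration: the frame length, the hop size, the synthetic test tone, and the random time and frequency masking. The function names `log_spectrogram` and `mask_augment` are hypothetical, and only NumPy is used.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # shape: (freq_bins, time_frames)

def mask_augment(spec, rng, max_freq_width=8, max_time_width=8):
    """Return a copy with one random frequency band and one time span zeroed.

    This masking scheme is an assumed example of augmentation,
    not the method reported in the paper.
    """
    out = spec.copy()
    n_freq, n_time = out.shape
    f_width = rng.integers(1, max_freq_width + 1)
    f_start = rng.integers(0, n_freq - f_width + 1)
    out[f_start:f_start + f_width, :] = 0.0
    t_width = rng.integers(1, max_time_width + 1)
    t_start = rng.integers(0, n_time - t_width + 1)
    out[:, t_start:t_start + t_width] = 0.0
    return out

# Synthetic one-second clip (assumed 8 kHz sample rate): a 440 Hz tone plus noise.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sample_rate)

spec = log_spectrogram(clip)
augmented = [mask_augment(spec, rng) for _ in range(4)]
```

Each augmented copy has the same shape as the original spectrogram, so the copies could be fed to a convolutional or recurrent classifier alongside the unmodified input to enlarge a small training set.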