Implementation of Constant-Q Transform (CQT) and Mel Spectrogram to converting Bird’s Sound
Authors: Silvester Dian Handy Permana, Ketut Bayu Yogha Bintoro
Published in: 2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT)
Publication date: 2021-07-17
DOI: 10.1109/COMNETSAT53002.2021.9530779
URL: https://doi.org/10.1109/COMNETSAT53002.2021.9530779
Citations: 7
Abstract
Bird sounds can be classified with a variety of methods. One of them is the Convolutional Neural Network (CNN), an algorithm commonly used for image classification. For a CNN to classify bird sounds, the analogue sound must first be converted into digital images objectively and accurately. This study discusses the conversion of analogue bird sounds into spectrogram images using the Constant-Q Transform (CQT) and the mel spectrogram. Bird sounds are recorded with a voice recorder, and the recording represents the audio signal digitally. The Constant-Q Transform maps the audio signal from the time domain to the frequency domain. The frequency axis is converted to a log scale and the colour dimension (amplitude) to decibels to form a spectrogram. The spectrogram is then mapped onto the mel scale to form a mel spectrogram. The contribution of this research is the conversion of analogue bird sounds into mel spectrograms that a CNN can classify. The resulting images can be used as CNN input to help classify bird sounds.
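The building blocks named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the 1000 Hz lowest bin, the 12 bins per octave, the Hann window, and the HTK-style mel formula are all assumptions chosen for illustration, and the synthetic sine tone stands in for a recorded bird sound. A practical pipeline would more likely rely on an audio library than on this naive per-bin loop.

```python
import cmath
import math


def cqt_frequencies(f_min, n_bins, bins_per_octave=12):
    """Geometrically (log) spaced centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def naive_cqt_frame(signal, sample_rate, freqs, bins_per_octave=12):
    """Magnitudes of one constant-Q frame taken from the start of `signal`.

    Each bin uses a Hann window whose length shrinks as the centre frequency
    grows, so the ratio Q = centre frequency / bandwidth stays constant.
    The signal must be at least as long as the window of the lowest bin.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for f in freqs:
        n = int(math.ceil(q * sample_rate / f))  # window length for this bin
        if n > len(signal):
            raise ValueError("signal too short for the lowest CQT bin")
        acc = 0j
        for i in range(n):
            window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)
            acc += signal[i] * window * cmath.exp(-2j * math.pi * f * i / sample_rate)
        magnitudes.append(abs(acc) / n)
    return magnitudes


def amplitude_to_db(amplitude, ref=1.0, floor=1e-10):
    """Convert a linear amplitude to decibels relative to `ref`."""
    return 20.0 * math.log10(max(amplitude, floor) / ref)


def hz_to_mel(f_hz):
    """HTK-style mel formula (an assumption; the abstract does not name one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Synthetic stand-in for a recording: a 2000 Hz tone sampled at 22050 Hz.
sample_rate = 22050
signal = [math.sin(2.0 * math.pi * 2000.0 * i / sample_rate) for i in range(512)]

freqs = cqt_frequencies(f_min=1000.0, n_bins=36)       # three octaves from 1 kHz
magnitudes = naive_cqt_frame(signal, sample_rate, freqs)
column_db = [amplitude_to_db(m) for m in magnitudes]   # one spectrogram column
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)

print("peak bin:", peak_bin, "centre frequency (Hz):", round(freqs[peak_bin], 1))
print("peak level (dB):", round(column_db[peak_bin], 2))
print("peak frequency in mels:", round(hz_to_mel(freqs[peak_bin]), 1))
```

Repeating `naive_cqt_frame` at successive offsets into the signal would give the columns of a spectrogram image, with bins on a log-frequency axis and colour taken from the decibel values. The mel functions only show how a frequency in hertz relates to the mel scale; how the paper maps its spectrogram onto that scale is not specified in the abstract.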