Bird Species Identification using Audio Processing and AlexNet Neural Network

Authors: Sanjay Gandhi Gundabatini, Sangam Sai, Sri Vinay, Reddy, Thota Chandrika, Somarouthu Kaarthikeya, Pavana Kumaar, Shaik Siddik, Torlikonda Satya Akhil
DOI: 10.48047/ijfans/v11/i12/173
Journal: International Journal of Food and Nutritional Sciences
Published: 2023-04-05
This research identifies bird species from audio recorded in real-world environments. Most existing methods use images to detect birds, but some species look alike, so we took audio as the basis for classification. Each recording is plotted as a spectrogram, which is inspected to extract patterns and classify the bird. Legacy practice relied on manual inspection of spectrograms plotted from the frequency content of audio signals; this is a time-consuming process and often produces inaccurate results. We therefore created a computerized process to inspect the spectrograms: the system learns spectrogram patterns during training and can then classify new, unseen audio. The entire procedure involved two crucial phases. The first stage was to create a dataset of audio files collected from websites such as xeno-canto.org, which hosts bird sound recordings. In this work, we considered four woodpecker species from the Germany region and collected approximately 120 recordings per species, for a total of about 500 recordings. The collected sounds underwent a series of pre-processing steps, including reconstruction, framing, silence removal, and pre-emphasis, to remove noise such as human activity, wind, and tree sounds. For every processed sound clip, a spectrogram is plotted and given as input to the neural network in the second stage, which in turn classifies the recording. Since the input is an image, we used a Convolutional Neural Network (CNN), one of the strongest deep-learning architectures for image-based tasks. The CNN categorizes each sound clip and determines the species of bird from the input features. A model was created and put into practice.