Bird Species Identification using Audio Processing and AlexNet Neural Network

Authors: Sanjay Gandhi Gundabatini, Sangam Sai, Sri Vinay, Reddy, Thota Chandrika, Somarouthu Kaarthikeya, Pavana Kumaar, Shaik Siddik, Torlikonda Satya Akhil
DOI: 10.48047/ijfans/v11/i12/173
Journal: International Journal of Food and Nutritional Sciences
Published: 2023-04-05
This research identifies bird species from audio recorded in real-world environments. Most existing methods use images to detect birds, but some species look alike, so we took audio as the basis for classification. Each recording is plotted as a spectrogram, which is inspected to extract patterns and classify the bird. Legacy practice relied on manual inspection of spectrograms plotted from the frequency content of audio signals; this is a time-consuming process and often produces inaccurate results. We therefore created a computerized process to inspect the spectrograms: the system learns spectrogram patterns during training and can then classify new, unseen audio. The entire procedure involved two crucial phases. The first stage was to create a dataset of audio files collected from websites such as xeno-canto.org, which hosts bird sound recordings. In this work, we considered four woodpecker species from the Germany region and collected approximately 120 recordings per species, for a total of about 500 recordings. The collected sounds underwent a series of pre-processing steps, including reconstruction, framing, silence removal, and pre-emphasis, to remove noise such as human activity, wind, and tree sounds. For every processed sound clip, a spectrogram is plotted and given as input to the neural network in the second stage, which in turn classifies the recording. Since the input is an image, we used a Convolutional Neural Network (CNN), one of the strongest deep-learning architectures for image-based tasks. The CNN categorizes each sound clip and determines the species of bird from the input features. A model was created and put into practice.