Saroj Anand Tripathy, Adeel Abdul Sakkeer, Utkarsh Utkarsh, Deepanshu Saini, S. Narayanan, Sourabh Tiwari, Kalyanaraman Pattabiraman, Rashmi T. Shankarappa
{"title":"Sound AI Engine for Detection and Classification of Overlapping Sounds in Home Environment","authors":"Saroj Anand Tripathy, Adeel Abdul Sakkeer, Utkarsh Utkarsh, Deepanshu Saini, S. Narayanan, Sourabh Tiwari, Kalyanaraman Pattabiraman, Rashmi T. Shankarappa","doi":"10.1109/DELCON57910.2023.10127311","DOIUrl":null,"url":null,"abstract":"Classification of sounds present in the environment is a major issue in the field of audio pattern recognition and classification. For gaining insights into the frequency and time features of Log-Mel Spectrogram with higher efficiency, the Neural Network model is implemented to give the classified audio output which can be further used for various use cases. In this research work, we have considered, both overlapping and non-overlapping audio data, and considered generic input features such as MFCC, bit depth, sampling rate etc. We conducted experiments with various deep learning models based on variants of Convolutional Neural Networks (CNN). Performance evaluation is carried out on the most popular datasets such as UrbanSound8k, DCASE, and AudioSet. Our proposed model using 13-layered CNN has achieved a mean Average Precision (mAP) of up to 0.42. 
In this paper we have shown the significance of our model using overlapping sounds compared to the state-of-the-art models.","PeriodicalId":193577,"journal":{"name":"2023 2nd Edition of IEEE Delhi Section Flagship Conference (DELCON)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 2nd Edition of IEEE Delhi Section Flagship Conference (DELCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DELCON57910.2023.10127311","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Classification of sounds present in the environment is a major challenge in the field of audio pattern recognition and classification. To capture the time and frequency features of the log-Mel spectrogram efficiently, a neural network model is implemented that produces a classified audio output, which can be used for various downstream applications. In this research work we consider both overlapping and non-overlapping audio data, along with generic input features such as MFCCs, bit depth, and sampling rate. We conducted experiments with various deep learning models based on variants of convolutional neural networks (CNNs). Performance evaluation is carried out on popular datasets including UrbanSound8k, DCASE, and AudioSet. Our proposed 13-layer CNN achieves a mean average precision (mAP) of up to 0.42. In this paper we show the significance of our model on overlapping sounds compared with state-of-the-art models.
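The mAP metric reported above is the standard evaluation measure for multi-label (overlapping) sound tagging, e.g. on AudioSet: for each class, average precision is computed over clips ranked by predicted score, and mAP is the unweighted mean across classes. The paper does not give its evaluation code, so the following is a minimal illustrative sketch in plain Python; the toy scores and labels are invented for demonstration only.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over ranks k where a positive occurs."""
    # Rank clips by descending score for this class.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, ap = 0, 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            ap += hits / rank  # precision at this rank
    return ap / hits if hits else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes; rows are clips, columns are sound classes.
    A clip may be positive for several classes at once (overlapping sounds)."""
    n_classes = len(score_matrix[0])
    aps = [average_precision([row[c] for row in score_matrix],
                             [row[c] for row in label_matrix])
           for c in range(n_classes)]
    return sum(aps) / n_classes

# Toy example: 3 clips, 2 classes; clip 3 contains both sounds (overlap).
scores = [[0.9, 0.1], [0.8, 0.6], [0.7, 0.4]]
labels = [[1, 0], [0, 1], [1, 1]]
print(mean_average_precision(scores, labels))  # → 0.9166666666666666
```

Class-wise averaging keeps rare sound classes on equal footing with common ones, which is why mAP (rather than plain accuracy) is used for heavily imbalanced datasets such as AudioSet.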