EHTNet: Twin-pooled CNN with Empirical Mode Decomposition and Hilbert Spectrum for Acoustic Scene Classification
Aswathy Madhu, K. Suresh
2022 IEEE International Conference on Signal Processing and Communications (SPCOM), published 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840514 (https://doi.org/10.1109/SPCOM55316.2022.9840514)
Abstract
The objective of Acoustic Scene Classification (ASC) is to help machines identify the unique acoustic characteristics that define an environment. In recent years, Convolutional Neural Networks (CNNs) have contributed significantly to the success of many state-of-the-art frameworks for ASC. The overall accuracy of an ASC framework depends on two factors: the signal representation and the learning model. In this work, we address these two factors as follows. First, we propose a time-frequency representation that employs empirical mode decomposition and the Hilbert spectrum for a meaningful characterization of the acoustic signal. Second, we introduce EHTNet, a framework for ASC that uses twin-pooled CNNs for classification and the proposed time-frequency representation to characterize the acoustic signal. Experiments on a benchmark ASC dataset indicate that EHTNet outperforms state-of-the-art approaches for ASC as well as a log mel spectrum-based baseline. Specifically, the proposed framework improves the classification accuracy by 91.04% and the f1-score by 93.61% relative to the baseline.
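To make the representation concrete, the following is a toy sketch (not the authors' implementation, whose details the abstract does not specify) of the two stages the abstract names: empirical mode decomposition (EMD), which sifts a signal into intrinsic mode functions (IMFs), followed by a Hilbert spectrum, which accumulates each IMF's instantaneous amplitude into a time-frequency grid. The simplified sifting rule, the fixed iteration counts, and the binning scheme are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.interpolate import CubicSpline

def envelope_mean(x):
    """Mean of upper and lower cubic-spline envelopes through local extrema."""
    n = np.arange(len(x))
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 4 or len(minima) < 4:
        return None  # too few extrema: treat x as the final residue
    upper = CubicSpline(maxima, x[maxima])(n)
    lower = CubicSpline(minima, x[minima])(n)
    return (upper + lower) / 2.0

def emd(x, max_imfs=5, sift_iters=10):
    """Toy EMD: repeatedly sift out IMFs (fixed iteration count, no SD stop)."""
    imfs, residue = [], x.astype(float).copy()
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(sift_iters):
            m = envelope_mean(h)
            if m is None:
                return imfs, residue
            h = h - m
        imfs.append(h)
        residue = residue - h
    return imfs, residue

def hilbert_spectrum(imfs, fs, n_bins=64):
    """Accumulate per-IMF instantaneous amplitude into a (freq, time) grid."""
    T = len(imfs[0])
    H = np.zeros((n_bins, T))
    for imf in imfs:
        analytic = hilbert(imf)
        amp = np.abs(analytic)
        phase = np.unwrap(np.angle(analytic))
        inst_freq = np.diff(phase) / (2 * np.pi) * fs     # Hz, length T-1
        bins = np.clip((inst_freq / (fs / 2) * n_bins).astype(int), 0, n_bins - 1)
        H[bins, np.arange(T - 1)] += amp[:-1]
    return H

# Illustration: a 5 Hz + 50 Hz mixture; EMD peels off the fast component first.
fs = 1000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
imfs, residue = emd(x)
H = hilbert_spectrum(imfs, fs)   # time-frequency image, e.g. a CNN input
```

In the paper's pipeline a representation of this kind replaces the log mel spectrogram as the CNN input; real EMD implementations add boundary handling and a sifting stop criterion omitted here for brevity.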