Audio feature enhancement based on quaternion filtering and deep hashing
Authors: Xun Jin, Bingkui Sun, De Li
Journal: Neurocomputing (impact factor 5.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-10-18
DOI: 10.1016/j.neucom.2024.128727
URL: https://www.sciencedirect.com/science/article/pii/S092523122401498X
Citation count: 0
Abstract
This paper addresses three problems in audio modeling: training that is difficult to converge, heavy data requirements, and the high dimensionality (and hence storage cost) of feature vectors generated from audio. We propose quaternion Gabor filtering to suppress background information in the spectrogram, which reduces interference when audio data, once shifted into the image domain, is insufficiently aligned with image data. In addition, multiple scales of window length and frame shift are used to capture the connections between different vocal objects. To reduce the dimensionality of the generated feature vectors, a deep hash module maps high-dimensional features to low-dimensional ones, and a probability function makes the learned samples more consistent with the overall distribution. The method was evaluated on an environmental sound classification dataset and a music genre classification dataset. Using only a common backbone network, it improves the accuracy of audio recognition.
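The multi-scale analysis mentioned in the abstract can be illustrated with a minimal sketch: the same signal is turned into several magnitude spectrograms, each with its own window length and frame shift. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the Hann window, the three (window, shift) pairs, and the synthetic test tone. The quaternion Gabor filtering and the deep hash module are not modeled. The code uses a naive DFT from the standard library to stay self-contained; a real pipeline would use an FFT routine.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(
    signal: list[float], win_length: int, hop_length: int
) -> list[list[float]]:
    """Naive STFT magnitude: one row per frame, win_length // 2 + 1 bins per row."""
    window = hann(win_length)
    n_bins = win_length // 2 + 1
    rows = []
    # Only full frames are kept; no padding is applied at the signal edges.
    for start in range(0, len(signal) - win_length + 1, hop_length):
        frame = [signal[start + i] * window[i] for i in range(win_length)]
        row = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / win_length)
                for n in range(win_length)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


def multi_scale_spectrograms(
    signal: list[float], scales: list[tuple[int, int]]
) -> dict[tuple[int, int], list[list[float]]]:
    """Compute one spectrogram per (window length, frame shift) pair."""
    return {
        (win, hop): magnitude_spectrogram(signal, win, hop) for win, hop in scales
    }


# Hypothetical input: one second of a 64 Hz tone sampled at 512 Hz.
SAMPLE_RATE = 512
TONE_HZ = 64
signal = [
    math.sin(2.0 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)
]

# Hypothetical scales: (window length, frame shift) in samples.
SCALES = [(32, 16), (64, 32), (128, 64)]

specs = multi_scale_spectrograms(signal, SCALES)
shapes = {scale: (len(spec), len(spec[0])) for scale, spec in specs.items()}
print(shapes)
```

For this 512-sample tone, the three scales yield 31, 15, and 7 frames with 17, 33, and 65 frequency bins respectively: short windows give finer time resolution, long windows give finer frequency resolution. That trade-off is the usual motivation for combining several scales.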
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.