Enhancing music audio signal recognition through CNN-BiLSTM fusion with De-noising autoencoder for improved performance

IF 5.5 | CAS Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing, Pub Date: 2025-02-01, DOI: 10.1016/j.neucom.2025.129607

Xiaoying Mao, Ye Tian, Tairan Jin, Bo Di

Neurocomputing, Volume 625, Article 129607 (2025)
https://www.sciencedirect.com/science/article/pii/S0925231225002796

Citations: 0

Abstract

This study presents a framework for music audio signal recognition that combines Convolutional Neural Networks (CNNs), Bidirectional Long Short-Term Memory (BiLSTM) networks, and a denoising autoencoder to significantly improve accuracy and robustness. The core innovation is a novel denoising autoencoder that integrates CNN and BiLSTM architectures, enabling superior recognition performance under varying noise levels and environmental conditions. Validated on several datasets, including Zhvoice, Common Voice, and LibriSpeech, the proposed framework achieves higher accuracy than existing methods. In addition, an optimized CNN architecture, Faster Region-based CNN with Multi-scale Information (FRCNN-MSI), is developed for efficient speech feature extraction and yields significant improvements in noisy environments. The BiLSTM model is further enhanced with an attention mechanism that improves sequence modeling and the capture of contextual relationships. Together, these advances make our approach a robust solution to real-world speech recognition challenges, with the potential to improve speech recognition systems across diverse applications.
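The abstract does not say how the noisy training inputs for the denoising autoencoder are produced, and it does not say which attention variant is added to the BiLSTM. The sketch below is a hedged illustration of two common building blocks, written in dependency-free Python. It is not the authors' implementation. The first block builds (noisy, clean) training pairs by mixing noise into a clean signal at a chosen signal-to-noise ratio (SNR), which is one standard way to cover "varying noise levels". The second block is softmax attention pooling over a sequence of BiLSTM hidden states using a query vector. Several elements are assumptions made for illustration: the function names `mix_at_snr`, `snr_db`, and `attention_pool`, the SNR levels, the 16 kHz sample rate, the toy sine "signal", and the toy hidden states.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db_target):
    """Return clean + scaled noise so the mixture has the requested SNR in dB.

    SNR_dB = 10 * log10(P_clean / P_noise), where P is mean squared amplitude.
    """
    if len(clean) != len(noise) or not clean:
        raise ValueError("clean and noise must be non-empty and equal in length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    # Noise power implied by the target SNR, then the amplitude gain to reach it.
    target_p_noise = p_clean / (10.0 ** (snr_db_target / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    return [x + gain * n for x, n in zip(clean, noise)]


def snr_db(clean, noisy):
    """Measured SNR (dB) of `noisy` against the `clean` reference."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_res = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return math.inf if p_res == 0.0 else 10.0 * math.log10(p_clean / p_res)


def attention_pool(hidden_states, query):
    """Softmax attention pooling over a sequence of hidden-state vectors.

    score_t = h_t . query, alpha = softmax(scores), context = sum_t alpha_t * h_t
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    sr = 16000  # assumed sample rate (Hz)
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s toy tone
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    # (noisy input, clean target) pairs at several noise levels for DAE training.
    for level in (0, 5, 10, 20):
        noisy = mix_at_snr(clean, noise, level)
        print(f"target {level} dB -> measured {snr_db(clean, noisy):.2f} dB")

    # Toy "BiLSTM outputs": 3 time steps, 4-dim states (forward + backward halves).
    states = [[0.1, 0.2, 0.0, 0.3], [0.9, 0.1, 0.4, 0.0], [0.2, 0.0, 0.1, 0.1]]
    context, weights = attention_pool(states, query=[1.0, 0.0, 1.0, 0.0])
    print("attention weights:", [round(w, 3) for w in weights])
```

In a trained model, `query` would be a learned parameter, and the hidden states would come from a BiLSTM run over the CNN features. The pooled `context` vector would then feed the classifier head. Plain lists are used here only to keep the sketch self-contained.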

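Source journal

Neurocomputing (Engineering & Technology / Computer Science: Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days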

Journal description: Neurocomputing publishes articles describing recent fundamental contributions to the field of neurocomputing. Its core topics are neurocomputing theory, practice, and applications.