Environmental sound recognition on embedded devices using deep learning: a review

IF 13.9 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review Pub Date : 2025-03-15 DOI:10.1007/s10462-025-11106-z

Pau Gairí, Tomàs Pallejà, Marcel Tresanchez

Artificial Intelligence Review, volume 58, issue 6. Article page: https://link.springer.com/article/10.1007/s10462-025-11106-z (PDF: https://link.springer.com/content/pdf/10.1007/s10462-025-11106-z.pdf)

Citations: 0

Abstract

Sound recognition has a wide range of applications beyond speech and music, including environmental monitoring, sound source classification, mechanical fault diagnosis, audio fingerprinting, and event detection. These applications often require real-time data processing, making them well suited to embedded systems. However, embedded devices face significant challenges because computational power and memory are limited and power consumption must stay low. Despite these constraints, achieving high performance in environmental sound recognition typically requires complex algorithms. Deep Learning models have demonstrated high accuracy on existing datasets, making them a popular choice for such tasks. However, these models are resource-intensive, posing challenges for real-time edge applications. This paper presents a comprehensive review of integrating Deep Learning models into embedded systems, examining their state-of-the-art applications, key components, and the steps involved. It also explores strategies to optimise performance in resource-constrained environments through a comparison of various implementation approaches such as knowledge distillation, pruning, and quantization, with studies achieving a reduction in complexity of up to 97% compared to the unoptimized model. Overall, we conclude that in spite of the availability of lightweight deep learning models, input features, and compression techniques, their integration into low-resource devices, such as microcontrollers, remains limited. Furthermore, more complex tasks, such as general sound classification, especially with expanded frequency bands and real-time operation, have yet to be effectively implemented on these devices. These findings highlight the need for a standardised research framework to evaluate these technologies applied to resource-constrained devices, and for further development to realise the wide range of potential applications.
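
To make two of the compression approaches named in the abstract more concrete, the following is a minimal sketch of magnitude pruning and 8-bit affine quantization applied to a plain list of weights. It is not taken from the paper or from any of the reviewed studies; the function names (prune_by_magnitude, quantize_int8, dequantize) and the sample weights are assumptions chosen for illustration, and real deployments would normally rely on a framework's own tooling.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's method.

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to int8 values with an affine (scale + zero point) scheme."""
    # Extend the range so that 0.0 is always representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 if all weights are 0
    zero_point = round(-128 - w_min / scale)
    # Round to the nearest step, shift by the zero point, clamp to the int8 range.
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float weights from their int8 representation."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale, zero_point = quantize_int8(weights)
    restored = dequantize(q, scale, zero_point)
    print("int8 values:", q)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
    print("pruned:", prune_by_magnitude([0.9, -0.05, 0.3, 0.01], 0.5))
```

Storing each weight in one byte instead of a 32-bit float cuts weight memory to roughly a quarter, at the cost of a reconstruction error on the order of one quantization step; pruning trades accuracy for sparsity in a similar way. Neither figure should be read as reproducing the reductions reported in the review.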

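Source journal

Artificial Intelligence Review (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 22.00

Self-citation rate: 3.30%

Articles published: 194

Review time: 5.3 months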

About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.