Determination of Low-Level Audio Descriptors of a Musical Instrument Sound Using Neural Network
Maciej Blaszke, Damian Koszewski
2020 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), September 23, 2020
DOI: 10.23919/spa50552.2020.9241264
Cited by: 1
Abstract
Audio files and the audio channel of video files can be described with temporal, spectral, cepstral, and perceptual audio descriptors. The so-called low-level descriptors are closely related to the signal characteristics. One can discern at least three levels of extraction granularity from the signal: at any point in the signal, in small arbitrary regions (i.e., frames), and in longer pre-segmented regions. Even though there are tools (e.g., MIRToolbox, Python/libROSA) available for computing these descriptors, the resulting feature vector is always redundant, as it contains many highly correlated descriptors, and these tools also have performance limitations. That is why, in this study, a method for obtaining those descriptors using an Artificial Neural Network (ANN) with a deep structure (i.e., a DNN) is proposed. In such a scheme, the raw audio signal representing a given musical instrument is fed to the DNN input. Such a network can be used as a standalone module or as a pre-trained part of a larger architecture. The results of the deep network's performance in the context of MPEG-7 descriptor derivation are shown, along with the convergence and behavior of the loss function.
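To make the notion of a frame-level low-level descriptor concrete, the following is a minimal NumPy sketch (not the authors' DNN method, and not using libROSA) of one of the standard MPEG-7-style spectral descriptors, the spectral centroid, computed on a single windowed audio frame. The frame length, sample rate, and test tone are illustrative choices, not values from the paper.

```python
import numpy as np

def spectral_centroid(frame: np.ndarray, sr: int) -> float:
    """Spectral centroid of one audio frame: the magnitude-weighted
    mean frequency of its spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(frame))           # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    if mag.sum() == 0.0:                        # silent frame: centroid undefined
        return 0.0
    return float((freqs * mag).sum() / mag.sum())

# Illustrative check: a 440 Hz sine sampled at 16 kHz. A Hann window
# limits spectral leakage, so the centroid lands close to 440 Hz.
sr = 16000
n = 1024
t = np.arange(n) / sr
frame = np.hanning(n) * np.sin(2 * np.pi * 440.0 * t)
c = spectral_centroid(frame, sr)
```

A DNN trained as in the paper's scheme would regress such per-frame descriptor values directly from the raw waveform, replacing this explicit FFT-based computation.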