A Model You Can Hear: Audio Identification with Playable Prototypes

International Society for Music Information Retrieval Conference (ISMIR). Published: 2022-08-05. DOI: 10.48550/arXiv.2208.03311

Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loïc Landrieu

Citations: 1

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
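
As a rough illustration of the prototype-based identification described above, the sketch below assigns a spectrogram to the prototype whose transformed version reconstructs it with the lowest squared error. It is a hypothetical simplification, not the authors' implementation: the learned transformation networks are replaced by a single closed-form global gain per prototype. The function names `best_gain`, `reconstruction_errors`, and `classify` are illustrative, and spectrograms are assumed to be NumPy arrays of shape (frequency, time).

```python
import numpy as np


def best_gain(prototype, spec):
    """Least-squares gain a minimizing ||a * prototype - spec||^2.

    Assumption: a single global gain stands in for the paper's learned
    transformation networks.
    """
    denom = float(np.sum(prototype * prototype))
    if denom == 0.0:
        return 0.0
    return float(np.sum(prototype * spec)) / denom


def reconstruction_errors(prototypes, spec):
    """Squared error between spec and each gain-adjusted prototype."""
    errors = []
    for proto in prototypes:
        a = best_gain(proto, spec)
        errors.append(float(np.sum((a * proto - spec) ** 2)))
    return np.array(errors)


def classify(prototypes, spec):
    """Index of the prototype whose transformed version best reconstructs spec."""
    return int(np.argmin(reconstruction_errors(prototypes, spec)))


# Usage: two random "spectrogram" prototypes; the input is a louder copy of the second.
rng = np.random.default_rng(0)
prototypes = rng.random((2, 8, 4))
spec = 2.5 * prototypes[1]
print(classify(prototypes, spec))  # 1
```

With the prototypes fixed, the same nearest-prototype assignment is what a k-means-style loop would use to cluster unlabeled samples; in a supervised setting, each prototype would instead be tied to a class label.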

