Harmonic Adaptive Latent Component Analysis of Audio and Application to Music Transcription

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-09-01 DOI:10.1109/TASL.2013.2260741

Benoit Fuentes, R. Badeau, G. Richard

{"title":"Harmonic Adaptive Latent Component Analysis of Audio and Application to Music Transcription","authors":"Benoit Fuentes, R. Badeau, G. Richard","doi":"10.1109/TASL.2013.2260741","DOIUrl":null,"url":null,"abstract":"Recently, new methods for smart decomposition of time-frequency representations of audio have been proposed in order to address the problem of automatic music transcription. However those techniques are not necessarily suitable for notes having variations of both pitch and spectral envelope over time. The HALCA (Harmonic Adaptive Latent Component Analysis) model presented in this article allows considering those two kinds of variations simultaneously. Each note in a constant-Q transform is locally modeled as a weighted sum of fixed narrowband harmonic spectra, spectrally convolved with some impulse that defines the pitch. All parameters are estimated by means of the expectation-maximization (EM) algorithm, in the framework of Probabilistic Latent Component Analysis. Interesting priors over the parameters are also introduced in order to help the EM algorithm converging towards a meaningful solution. We applied this model for automatic music transcription: the onset time, duration and pitch of each note in an audio file are inferred from the estimated parameters. The system has been evaluated on two different databases and obtains very promising results.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"1854-1866"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2260741","citationCount":"41","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2260741","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 41

Abstract

Recently, new methods for smart decomposition of time-frequency representations of audio have been proposed in order to address the problem of automatic music transcription. However those techniques are not necessarily suitable for notes having variations of both pitch and spectral envelope over time. The HALCA (Harmonic Adaptive Latent Component Analysis) model presented in this article allows considering those two kinds of variations simultaneously. Each note in a constant-Q transform is locally modeled as a weighted sum of fixed narrowband harmonic spectra, spectrally convolved with some impulse that defines the pitch. All parameters are estimated by means of the expectation-maximization (EM) algorithm, in the framework of Probabilistic Latent Component Analysis. Interesting priors over the parameters are also introduced in order to help the EM algorithm converging towards a meaningful solution. We applied this model for automatic music transcription: the onset time, duration and pitch of each note in an audio file are inferred from the estimated parameters. The system has been evaluated on two different databases and obtains very promising results.

查看原文本刊更多论文

音频谐波自适应潜分量分析及其在音乐转写中的应用

近年来，为了解决音乐自动转录的问题，提出了一种智能分解音频时频表示的新方法。然而，这些技术不一定适用于音调和频谱包络随时间变化的音符。本文提出的HALCA(谐波自适应潜成分分析)模型允许同时考虑这两种变化。常数q变换中的每个音符都局部建模为固定窄带谐波谱的加权和，与定义音高的脉冲进行频谱卷积。在概率潜在成分分析的框架下，采用期望最大化(EM)算法对所有参数进行估计。为了帮助EM算法收敛到有意义的解，还引入了参数上的有趣先验。我们将这个模型应用于自动音乐转录:音频文件中每个音符的开始时间、持续时间和音高都是从估计的参数中推断出来的。该系统在两个不同的数据库上进行了测试，取得了很好的效果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Audio Speech and Language Processing 工程技术-工程：电子与电气

自引率

0.00%

发文量

审稿时长

24.0 months

期刊介绍： The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.