基于自回归多态音符模型的钢琴复调转写

International Society for Music Information Retrieval Conference Pub Date : 2020-10-02 DOI:10.5281/ZENODO.4245466

Taegyun Kwon, Dasaem Jeong, Juhan Nam

{"title":"基于自回归多态音符模型的钢琴复调转写","authors":"Taegyun Kwon, Dasaem Jeong, Juhan Nam","doi":"10.5281/ZENODO.4245466","DOIUrl":null,"url":null,"abstract":"Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and also they handle the temporal evolution of note states by abstract connections between the state-wise neural networks or using a post-processing module. In this paper, we propose a unified neural network architecture where multiple note states are predicted as a softmax output with a single loss function and the temporal order is learned by an auto-regressive connection within the single neural network. This compact model allows to increase note states without architectural complexity. Using the MAESTRO dataset, we examine various combinations of multiple note states including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-arts with fewer parameters.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model\",\"authors\":\"Taegyun Kwon, Dasaem Jeong, Juhan Nam\",\"doi\":\"10.5281/ZENODO.4245466\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and also they handle the temporal evolution of note states by abstract connections between the state-wise neural networks or using a post-processing module. In this paper, we propose a unified neural network architecture where multiple note states are predicted as a softmax output with a single loss function and the temporal order is learned by an auto-regressive connection within the single neural network. This compact model allows to increase note states without architectural complexity. Using the MAESTRO dataset, we examine various combinations of multiple note states including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-arts with fewer parameters.\",\"PeriodicalId\":309903,\"journal\":{\"name\":\"International Society for Music Information Retrieval Conference\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Society for Music Information Retrieval Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5281/ZENODO.4245466\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Society for Music Information Retrieval Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5281/ZENODO.4245466","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

最近在复调钢琴转录方面取得的进展主要是通过故意设计神经网络架构来检测不同的音符状态，如开始或持续，并对状态的时间演变进行建模。然而，它们中的大多数对每个音符状态使用单独的神经网络，从而优化多个损失函数，并且它们还通过状态神经网络之间的抽象连接或使用后处理模块来处理音符状态的时间演变。在本文中，我们提出了一种统一的神经网络架构，其中多个音符状态被预测为具有单个损失函数的softmax输出，并且通过单个神经网络内的自回归连接学习时间顺序。这种紧凑的模型允许在不增加架构复杂性的情况下增加注释状态。使用MAESTRO数据集，我们检查多个音符状态的各种组合，包括开、开始、维持、重新开始、偏移和关闭。我们还证明了自回归模块有效地学习了音符的状态间依赖。最后，我们证明了我们提出的模型在参数较少的情况下实现了与最先进的性能相当的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model

Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and also they handle the temporal evolution of note states by abstract connections between the state-wise neural networks or using a post-processing module. In this paper, we propose a unified neural network architecture where multiple note states are predicted as a softmax output with a single loss function and the temporal order is learned by an auto-regressive connection within the single neural network. This compact model allows to increase note states without architectural complexity. Using the MAESTRO dataset, we examine various combinations of multiple note states including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-arts with fewer parameters.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Society for Music Information Retrieval Conference

自引率

0.00%

发文量