SinTra: Learning an inspiration model from a single multi-track music segment

International Society for Music Information Retrieval Conference. Pub Date: 2022-04-21. DOI: 10.48550/arXiv.2204.09917

Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng


Citations: 2

Abstract

In this paper, we propose SinTra, an auto-regressive sequential generative model that can learn from a single multi-track music segment to generate coherent, aesthetic, and variable polyphonic music of multi-instruments with an arbitrary length of bar. For this task, to ensure the relevance of generated samples and training music, we present a novel pitch-group representation. SinTra, consisting of a pyramid of Transformer-XL with a multi-scale training strategy, can learn both the musical structure and the relative positional relationship between notes of the single training music segment. Additionally, to maintain the inter-track correlation, we use the convolution operation to process multi-track music, and when decoding, the tracks are independent of each other to prevent interference. We evaluate SinTra with both a subjective study and objective metrics. The comparison results show that our framework can learn information from a single music segment more sufficiently than Music Transformer. Also, the comparison between SinTra and its variant, i.e., the single-stage SinTra with the first stage only, shows that the pyramid structure can effectively suppress overly-fragmented notes.
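
The abstract does not say how the scales of the multi-scale strategy are constructed. As a rough illustration of the general idea only, the sketch below builds a coarse-to-fine pyramid from a binary multi-track piano roll by merging adjacent time steps. The function name `time_pyramid`, the `(tracks, time_steps, pitches)` layout, and the pooling factor are all assumptions made for this example; they are not taken from the paper and this is not the authors' pitch-group representation.

```python
import numpy as np

def time_pyramid(roll, num_scales=3, factor=2):
    """Return coarse-to-fine versions of a binary piano roll (hypothetical helper).

    roll: array of shape (tracks, time_steps, pitches) with 0/1 entries.
    Each coarser level merges `factor` adjacent time steps with a max,
    so a note active anywhere in the window stays active.
    """
    levels = [roll]
    for _ in range(num_scales - 1):
        cur = levels[-1]
        tracks, steps, pitches = cur.shape
        usable = steps - steps % factor  # drop a ragged tail, if any
        pooled = cur[:, :usable].reshape(
            tracks, usable // factor, factor, pitches
        ).max(axis=2)
        levels.append(pooled)
    return levels[::-1]  # coarsest level first

# Toy input: 2 tracks, 8 time steps, 4 pitches, one note per track.
roll = np.zeros((2, 8, 4), dtype=np.int8)
roll[0, 3, 1] = 1
roll[1, 6, 2] = 1
pyramid = time_pyramid(roll, num_scales=3, factor=2)
shapes = [level.shape for level in pyramid]
```

In a pyramid of this kind, a model at each stage would see the same segment at a finer time resolution than the stage before it, which is one plausible reading of "multi-scale training" but remains an assumption here.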
