Melody transcription via generative pre-training

International Society for Music Information Retrieval Conference. Pub Date: 2022-12-04. DOI: 10.48550/arXiv.2212.01884

Chris Donahue, John Thickstun, Percy Liang


Citations: 3

Abstract


Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles; existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data; we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio. Audio examples can be found at https://chrisdonahue.com/sheetsage and code at https://github.com/chrisdonahue/sheetsage.
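As a reading aid, the 20% and 77% figures in the abstract are relative gains: the difference between two scores divided by the baseline score. The abstract does not state the underlying metric or the absolute scores, so the sketch below only illustrates the arithmetic. The function name `relative_improvement` and both score values are hypothetical placeholders, not numbers from the paper.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Return the gain of new_score over baseline_score as a fraction of the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (new_score - baseline_score) / baseline_score


# Placeholder scores, NOT taken from the paper; chosen only so the
# arithmetic is easy to follow.
baseline = 0.50
improved = 0.60

gain = relative_improvement(improved, baseline)
print(f"{gain:.0%} relative improvement")
```

With these placeholder values, an absolute gain of 0.10 on a baseline of 0.50 corresponds to a relative gain of about 20%, which is why relative figures should be read alongside the baseline they refer to.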

