Comparative Evaluation in the Wild: Systems for the Expressive Rendering of Music

Kyle Worrall;Zongyu Yin;Tom Collins
Journal: IEEE Transactions on Artificial Intelligence, vol. 5, no. 10, pp. 5290-5303. Published: 2024-06-04. DOI: 10.1109/TAI.2024.3408717. URL: https://ieeexplore.ieee.org/document/10547570/
Citations: 0

Abstract

There have been many attempts to model the ability of human musicians to take a score and perform or render it expressively by adding tempo, timing, loudness, and articulation changes to nonexpressive music data. Although expressive rendering models exist in academic research, most are not open source or otherwise accessible, so they are difficult to evaluate empirically and have not been widely adopted in professional music software. Systematic comparative evaluation of such algorithms stopped after the last performance rendering contest (RENCON) in 2013, making it difficult to compare newer models with existing work in a fair and valid way. In this article, we introduce the first transformer-based model for expressive rendering, cue-free express + pedal (CFE + P), which predicts expressive attributes such as notewise dynamics and micro-timing adjustments, and beatwise tempo and sustain pedal use, based only on the start and end times and pitches of notes (e.g., inexpressive musical instrument digital interface (MIDI) input). We perform two comparative evaluations of our model against a non-machine-learning baseline taken from professional music software and two open-source algorithms: a feedforward neural network (FFNN) and a hierarchical recurrent neural network (HRNN). The results of two listening studies indicate that our model renders passages that outperform what can be done in professional music software such as Logic Pro and Ableton Live.

All data and preexisting hypotheses can be accessed via the Open Science Framework: https://osf.io/6uwjk/.
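
To make the input and output described in the abstract concrete, the following is a minimal sketch of how notewise dynamics, notewise micro-timing adjustments, and a beatwise tempo curve could be applied to inexpressive note data. It is not the authors' CFE + P model or code: the names `ScoreNote`, `PerformedNote`, `beat_to_seconds`, and `render` are hypothetical, the "predicted" values are hand-written placeholders rather than model output, and sustain pedal handling is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoreNote:
    """Inexpressive input: pitch plus start and end times, in beats."""
    pitch: int            # MIDI pitch number, 0-127
    onset_beats: float    # assumed non-negative
    offset_beats: float   # assumed non-negative


@dataclass
class PerformedNote:
    """Expressive output: times in seconds plus a MIDI velocity."""
    pitch: int
    onset_sec: float
    offset_sec: float
    velocity: int


def beat_to_seconds(beat: float, tempo_bpm: List[float]) -> float:
    """Convert a beat position to seconds, given one tempo value per beat.

    Beats beyond the end of the tempo list reuse the last tempo value.
    """
    last = len(tempo_bpm) - 1
    whole = int(beat)
    # Sum the durations of all fully elapsed beats.
    seconds = sum(60.0 / tempo_bpm[min(i, last)] for i in range(whole))
    # Add the elapsed fraction of the current beat.
    seconds += (beat - whole) * 60.0 / tempo_bpm[min(whole, last)]
    return seconds


def render(
    notes: List[ScoreNote],
    velocities: List[int],
    timing_offsets_sec: List[float],
    tempo_bpm: List[float],
) -> List[PerformedNote]:
    """Apply per-note velocity and onset shifts and a per-beat tempo curve."""
    performed = []
    for note, velocity, shift in zip(notes, velocities, timing_offsets_sec):
        onset = max(0.0, beat_to_seconds(note.onset_beats, tempo_bpm) + shift)
        offset = max(onset, beat_to_seconds(note.offset_beats, tempo_bpm))
        clamped = max(1, min(127, velocity))  # keep velocity in MIDI range
        performed.append(PerformedNote(note.pitch, onset, offset, clamped))
    return performed


# Placeholder values standing in for model predictions (illustrative only).
notes = [ScoreNote(60, 0.0, 1.0), ScoreNote(64, 1.0, 2.0)]
performed = render(
    notes,
    velocities=[70, 200],
    timing_offsets_sec=[0.0, -0.02],
    tempo_bpm=[120.0, 60.0],
)
for p in performed:
    print(p)
```

In this sketch the second note starts slightly early because of its negative timing offset and lasts longer because the tempo of the second beat is slower. A real system would produce these attribute values from a trained model instead of fixed lists.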
