Interactive Control of Explicit Musical Features in Generative LSTM-based Systems

Maximos A. Kaliakatsos-Papakostas, Aggelos Gkiokas, V. Katsouros
{"title":"生成式lstm系统中显式音乐特征的交互控制","authors":"Maximos A. Kaliakatsos-Papakostas, Aggelos Gkiokas, V. Katsouros","doi":"10.1145/3243274.3243296","DOIUrl":null,"url":null,"abstract":"Long Short-Term Memory (LSTM) neural networks have been effectively applied on learning and generating musical sequences, powered by sophisticated musical representations and integrations into other deep learning models. Deep neural networks, alongside LSTM-based systems, learn implicitly: given a sufficiently large amount of data, they transform information into high-level features that, however, do not relate with the high-level features perceived by humans. For instance, such models are able to compose music in the style of the Bach chorales, but they are not able to compose a less rhythmically dense version of them, or a Bach choral that begins with low and ends with high pitches -- even more so in an interactive way in real-time. This paper presents an approach to creating such systems. A very basic LSTM-based architecture is developed that can compose music that corresponds to user-provided values of rhythm density and pitch height/register. A small initial dataset is augmented to incorporate more intense variations of these two features and the system learns and generates music that not only reflects the style, but also (and most importantly) reflects the features that are explicitly given as input at each specific time. This system -- and future versions that will incorporate more advanced architectures and representation -- is suitable for generating music the features of which are defined in real-time and/or interactively.","PeriodicalId":129628,"journal":{"name":"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion","volume":"108 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Interactive Control of Explicit Musical Features in Generative LSTM-based Systems\",\"authors\":\"Maximos A. Kaliakatsos-Papakostas, Aggelos Gkiokas, V. Katsouros\",\"doi\":\"10.1145/3243274.3243296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Long Short-Term Memory (LSTM) neural networks have been effectively applied on learning and generating musical sequences, powered by sophisticated musical representations and integrations into other deep learning models. Deep neural networks, alongside LSTM-based systems, learn implicitly: given a sufficiently large amount of data, they transform information into high-level features that, however, do not relate with the high-level features perceived by humans. For instance, such models are able to compose music in the style of the Bach chorales, but they are not able to compose a less rhythmically dense version of them, or a Bach choral that begins with low and ends with high pitches -- even more so in an interactive way in real-time. This paper presents an approach to creating such systems. A very basic LSTM-based architecture is developed that can compose music that corresponds to user-provided values of rhythm density and pitch height/register. A small initial dataset is augmented to incorporate more intense variations of these two features and the system learns and generates music that not only reflects the style, but also (and most importantly) reflects the features that are explicitly given as input at each specific time. 
This system -- and future versions that will incorporate more advanced architectures and representation -- is suitable for generating music the features of which are defined in real-time and/or interactively.\",\"PeriodicalId\":129628,\"journal\":{\"name\":\"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion\",\"volume\":\"108 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3243274.3243296\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3243274.3243296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Long Short-Term Memory (LSTM) neural networks have been effectively applied to learning and generating musical sequences, powered by sophisticated musical representations and integration into other deep learning models. Deep neural networks, including LSTM-based systems, learn implicitly: given a sufficiently large amount of data, they transform information into high-level features that, however, do not relate to the high-level features perceived by humans. For instance, such models are able to compose music in the style of the Bach chorales, but they are not able to compose a less rhythmically dense version of them, or a Bach chorale that begins with low and ends with high pitches -- even more so in an interactive way in real-time. This paper presents an approach to creating such systems. A very basic LSTM-based architecture is developed that can compose music corresponding to user-provided values of rhythm density and pitch height/register. A small initial dataset is augmented to incorporate more intense variations of these two features, and the system learns and generates music that not only reflects the style but also (and most importantly) reflects the features that are explicitly given as input at each specific time. This system -- and future versions that will incorporate more advanced architectures and representations -- is suitable for generating music whose features are defined in real-time and/or interactively.
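
The abstract does not describe an implementation, but the core idea of conditioning each generation step on explicit feature values can be illustrated with a minimal sketch. The PyTorch code below is an assumption-laden illustration, not the authors' system: the class name, dimensions, token vocabulary, and the way rhythm density and pitch register are encoded as two per-step scalars are all hypothetical choices made for clarity.

```python
# Minimal sketch (assumed, not the paper's code): an LSTM whose per-step input is a
# one-hot note token concatenated with two user-controlled conditioning values,
# rhythm density and pitch register, so the values can be changed while sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionedMusicLSTM(nn.Module):
    def __init__(self, vocab_size=130, cond_size=2, hidden_size=256, num_layers=2):
        super().__init__()
        # vocab_size is an assumed token set, e.g. 128 MIDI pitches + rest + hold.
        self.vocab_size = vocab_size
        self.lstm = nn.LSTM(vocab_size + cond_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, conditions, state=None):
        # tokens:     (batch, time) integer note events
        # conditions: (batch, time, 2) user-provided feature values per step
        x = F.one_hot(tokens, self.vocab_size).float()
        x = torch.cat([x, conditions], dim=-1)
        out, state = self.lstm(x, state)
        return self.proj(out), state  # logits over the next note event

    @torch.no_grad()
    def generate(self, seed_token, conditions, temperature=1.0):
        # conditions: (time, 2) -- e.g. two sliders moved interactively in real time
        tokens, state = [seed_token], None
        for t in range(conditions.shape[0]):
            inp = torch.tensor([[tokens[-1]]])
            cond = conditions[t].view(1, 1, -1)
            logits, state = self.forward(inp, cond, state)
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            tokens.append(torch.multinomial(probs, 1).item())
        return tokens[1:]

if __name__ == "__main__":
    # Drive an (untrained) model with a rising pitch-register curve and sparse rhythm.
    model = ConditionedMusicLSTM()
    steps = 32
    register = torch.linspace(0.0, 1.0, steps)   # low -> high pitches over time
    density = torch.full((steps,), 0.3)          # constant, low rhythm density
    conds = torch.stack([density, register], dim=-1)
    print(model.generate(seed_token=60, conditions=conds))
```

Training such a model would pair each step of the (augmented) training sequences with the rhythm-density and pitch-register values measured over a local window, so that at generation time the same two inputs can be supplied by the user instead.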