Interactive Control of Explicit Musical Features in Generative LSTM-based Systems
Maximos A. Kaliakatsos-Papakostas, Aggelos Gkiokas, V. Katsouros
Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion, 12 September 2018. DOI: 10.1145/3243274.3243296
Long Short-Term Memory (LSTM) neural networks have been applied effectively to learning and generating musical sequences, powered by sophisticated musical representations and integration with other deep learning models. Deep neural networks, including LSTM-based systems, learn implicitly: given a sufficiently large amount of data, they transform information into high-level features that, however, do not correspond to the high-level features perceived by humans. For instance, such models are able to compose music in the style of the Bach chorales, but they cannot compose a less rhythmically dense version of them, or a Bach chorale that begins with low pitches and ends with high ones, let alone do so interactively and in real time. This paper presents an approach to creating such systems. A very basic LSTM-based architecture is developed that composes music corresponding to user-provided values of rhythm density and pitch height/register. A small initial dataset is augmented to incorporate more intense variations of these two features, and the system learns to generate music that not only reflects the style but also (and most importantly) reflects the feature values explicitly given as input at each specific time. This system -- and future versions that will incorporate more advanced architectures and representations -- is suitable for generating music whose features are defined in real time and/or interactively.
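A minimal sketch of how such explicit conditioning could be wired into an LSTM is given below. This is not the authors' implementation: the framework (PyTorch), the layer sizes, the note-event vocabulary, and the assumption that rhythm density and pitch register arrive as two per-step values in [0, 1] are all illustrative choices; the abstract only states that these two features are given as explicit inputs alongside the musical sequence.

# Hypothetical sketch: an LSTM whose per-step input is a note embedding
# concatenated with two explicit conditioning values (rhythm density,
# pitch register). Sizes and vocabulary are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionedLSTM(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=64, cond_dim=2, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The conditioning values are appended to the note embedding at
        # every time step, so the model can follow changes immediately.
        self.lstm = nn.LSTM(embed_dim + cond_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, notes, conditions, state=None):
        # notes: (batch, time) integer note events
        # conditions: (batch, time, 2) user-provided feature values in [0, 1]
        x = torch.cat([self.embed(notes), conditions], dim=-1)
        h, state = self.lstm(x, state)
        return self.out(h), state

# Generation step: the user may change the conditioning values at any
# step, which is what allows interactive, real-time steering.
model = ConditionedLSTM()
note = torch.tensor([[60]])          # previous note event (hypothetical token)
cond = torch.tensor([[[0.8, 0.2]]])  # e.g. dense rhythm, low register
logits, state = model(note, cond)
next_note = torch.argmax(logits[:, -1], dim=-1)

Under this reading, training would pair each sequence with the feature values extracted (or intensified through augmentation) from it, so that at generation time the same two inputs act as control knobs rather than learned, opaque latent features.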