Tsung-Min Huang, Hunter Hsieh, Jiaqi Qin, Hsien-Fung Liu, M. Eirinaki
{"title":"再放一遍!适合你心情的音乐作品","authors":"Tsung-Min Huang, Hunter Hsieh, Jiaqi Qin, Hsien-Fung Liu, M. Eirinaki","doi":"10.1109/TransAI49837.2020.00008","DOIUrl":null,"url":null,"abstract":"Relating sounds to visuals, like photographs, is something humans do subconsciously every day. Deep learning has allowed for several image-related applications, with some focusing on generating labels for images, or synthesize images from a text description. Similarly, it has been employed to create new music scores from existing ones, or add lyrics to a song. In this work, we bring sight and sound together and present IMuCo, an intelligent music composer that creates original music for any given image, taking into consideration what its implied mood is. Our music augmentation and composing methodology attempts to translate image “linguistics” into music “linguistics” without any intermediate natural language translation steps. We propose an encoder-decoder architecture to translate an image into music, first classifying it into one of predefined moods, then generating music to match it. We discuss in detail how we created the training dataset, including several feature engineering decisions in terms of representing music. We also introduce an evaluation classifier framework used for validation and evaluation of the system, and present experimental results of IMuCo’s prototype for two moods: happy and sad. IMuCo can be the core component of a framework that composes the soundtrack for longer video clips, used in advertising, art, and entertainment industries.","PeriodicalId":151527,"journal":{"name":"2020 Second International Conference on Transdisciplinary AI (TransAI)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Play it again IMuCo! 
Music Composition to Match your Mood\",\"authors\":\"Tsung-Min Huang, Hunter Hsieh, Jiaqi Qin, Hsien-Fung Liu, M. Eirinaki\",\"doi\":\"10.1109/TransAI49837.2020.00008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Relating sounds to visuals, like photographs, is something humans do subconsciously every day. Deep learning has allowed for several image-related applications, with some focusing on generating labels for images, or synthesize images from a text description. Similarly, it has been employed to create new music scores from existing ones, or add lyrics to a song. In this work, we bring sight and sound together and present IMuCo, an intelligent music composer that creates original music for any given image, taking into consideration what its implied mood is. Our music augmentation and composing methodology attempts to translate image “linguistics” into music “linguistics” without any intermediate natural language translation steps. We propose an encoder-decoder architecture to translate an image into music, first classifying it into one of predefined moods, then generating music to match it. We discuss in detail how we created the training dataset, including several feature engineering decisions in terms of representing music. We also introduce an evaluation classifier framework used for validation and evaluation of the system, and present experimental results of IMuCo’s prototype for two moods: happy and sad. 
IMuCo can be the core component of a framework that composes the soundtrack for longer video clips, used in advertising, art, and entertainment industries.\",\"PeriodicalId\":151527,\"journal\":{\"name\":\"2020 Second International Conference on Transdisciplinary AI (TransAI)\",\"volume\":\"159 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 Second International Conference on Transdisciplinary AI (TransAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TransAI49837.2020.00008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Second International Conference on Transdisciplinary AI (TransAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TransAI49837.2020.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Play it again IMuCo! Music Composition to Match your Mood
Relating sounds to visuals, such as photographs, is something humans do subconsciously every day. Deep learning has enabled a range of image-related applications, some focusing on generating labels for images or synthesizing images from a text description. Similarly, it has been employed to create new music scores from existing ones or to add lyrics to a song. In this work, we bring sight and sound together and present IMuCo, an intelligent music composer that creates original music for any given image, taking into consideration the image's implied mood. Our music augmentation and composition methodology attempts to translate image “linguistics” into music “linguistics” without any intermediate natural-language translation step. We propose an encoder-decoder architecture that translates an image into music: it first classifies the image into one of several predefined moods and then generates music to match that mood. We discuss in detail how we created the training dataset, including several feature-engineering decisions about how music is represented. We also introduce an evaluation-classifier framework used to validate and evaluate the system, and we present experimental results from IMuCo's prototype for two moods: happy and sad. IMuCo can serve as the core component of a framework that composes soundtracks for longer video clips used in the advertising, art, and entertainment industries.
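The pipeline the abstract describes, classify an image's mood, then generate music conditioned on that mood, can be sketched at a very high level as follows. This is a toy illustration, not the paper's actual architecture: the real system uses a learned encoder-decoder, whereas here the "encoder" is a hypothetical linear softmax classifier over precomputed image features and the "decoder" samples MIDI pitches from a hand-picked mood-conditioned pitch pool.

```python
import numpy as np

MOODS = ["happy", "sad"]  # the two moods covered by the IMuCo prototype

def classify_mood(image_features, W, b):
    """Stand-in for the encoder: a linear softmax classifier over moods.
    (Hypothetical — the paper uses a learned image encoder.)"""
    logits = image_features @ W + b
    exp = np.exp(logits - logits.max())  # stabilized softmax
    probs = exp / exp.sum()
    return MOODS[int(np.argmax(probs))], probs

def generate_notes(mood, length=8, seed=0):
    """Stand-in for the decoder: sample MIDI pitches from a
    mood-conditioned pool. (Hypothetical — the paper generates
    music with a learned sequence decoder.)"""
    rng = np.random.default_rng(seed)
    # happy -> C-major pitches; sad -> A-minor pitches in a lower register
    pitch_pool = {"happy": [60, 62, 64, 65, 67, 69, 71],
                  "sad":   [57, 59, 60, 62, 63, 65, 67]}[mood]
    return [int(rng.choice(pitch_pool)) for _ in range(length)]

# Usage with random placeholder "image features" and untrained weights
rng = np.random.default_rng(42)
features = rng.normal(size=16)
W = rng.normal(size=(16, len(MOODS)))
b = np.zeros(len(MOODS))
mood, probs = classify_mood(features, W, b)
melody = generate_notes(mood)
```

The point of the sketch is only the control flow: the mood label produced by the classifier is the sole signal passed to the generator, which mirrors the two-stage encoder-decoder design the abstract outlines.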