Tsung-Min Huang, Hunter Hsieh, Jiaqi Qin, Hsien-Fung Liu, M. Eirinaki
{"title":"再放一遍!适合你心情的音乐作品","authors":"Tsung-Min Huang, Hunter Hsieh, Jiaqi Qin, Hsien-Fung Liu, M. Eirinaki","doi":"10.1109/TransAI49837.2020.00008","DOIUrl":null,"url":null,"abstract":"Relating sounds to visuals, like photographs, is something humans do subconsciously every day. Deep learning has allowed for several image-related applications, with some focusing on generating labels for images, or synthesize images from a text description. Similarly, it has been employed to create new music scores from existing ones, or add lyrics to a song. In this work, we bring sight and sound together and present IMuCo, an intelligent music composer that creates original music for any given image, taking into consideration what its implied mood is. Our music augmentation and composing methodology attempts to translate image “linguistics” into music “linguistics” without any intermediate natural language translation steps. We propose an encoder-decoder architecture to translate an image into music, first classifying it into one of predefined moods, then generating music to match it. We discuss in detail how we created the training dataset, including several feature engineering decisions in terms of representing music. We also introduce an evaluation classifier framework used for validation and evaluation of the system, and present experimental results of IMuCo’s prototype for two moods: happy and sad. IMuCo can be the core component of a framework that composes the soundtrack for longer video clips, used in advertising, art, and entertainment industries.","PeriodicalId":151527,"journal":{"name":"2020 Second International Conference on Transdisciplinary AI (TransAI)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Play it again IMuCo! 
Music Composition to Match your Mood\",\"authors\":\"Tsung-Min Huang, Hunter Hsieh, Jiaqi Qin, Hsien-Fung Liu, M. Eirinaki\",\"doi\":\"10.1109/TransAI49837.2020.00008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Relating sounds to visuals, like photographs, is something humans do subconsciously every day. Deep learning has allowed for several image-related applications, with some focusing on generating labels for images, or synthesize images from a text description. Similarly, it has been employed to create new music scores from existing ones, or add lyrics to a song. In this work, we bring sight and sound together and present IMuCo, an intelligent music composer that creates original music for any given image, taking into consideration what its implied mood is. Our music augmentation and composing methodology attempts to translate image “linguistics” into music “linguistics” without any intermediate natural language translation steps. We propose an encoder-decoder architecture to translate an image into music, first classifying it into one of predefined moods, then generating music to match it. We discuss in detail how we created the training dataset, including several feature engineering decisions in terms of representing music. We also introduce an evaluation classifier framework used for validation and evaluation of the system, and present experimental results of IMuCo’s prototype for two moods: happy and sad. 
IMuCo can be the core component of a framework that composes the soundtrack for longer video clips, used in advertising, art, and entertainment industries.\",\"PeriodicalId\":151527,\"journal\":{\"name\":\"2020 Second International Conference on Transdisciplinary AI (TransAI)\",\"volume\":\"159 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 Second International Conference on Transdisciplinary AI (TransAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TransAI49837.2020.00008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Second International Conference on Transdisciplinary AI (TransAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TransAI49837.2020.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Play it again IMuCo! Music Composition to Match your Mood
Relating sounds to visuals, such as photographs, is something humans do subconsciously every day. Deep learning has enabled a range of image-related applications, some focusing on generating labels for images or synthesizing images from a text description. Similarly, it has been employed to create new music scores from existing ones or to add lyrics to a song. In this work, we bring sight and sound together and present IMuCo, an intelligent music composer that creates original music for any given image, taking into consideration the image's implied mood. Our music augmentation and composition methodology attempts to translate image “linguistics” into music “linguistics” without any intermediate natural-language translation step. We propose an encoder-decoder architecture that translates an image into music: it first classifies the image into one of several predefined moods and then generates music to match that mood. We discuss in detail how we created the training dataset, including several feature-engineering decisions about how music is represented. We also introduce an evaluation-classifier framework used to validate and evaluate the system, and we present experimental results from IMuCo's prototype for two moods: happy and sad. IMuCo can serve as the core component of a framework that composes soundtracks for longer video clips used in the advertising, art, and entertainment industries.
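The pipeline the abstract describes, classify an image's mood, then generate music conditioned on that mood, can be sketched at a very high level as follows. This is a toy illustration, not the paper's actual architecture: the real system uses a learned encoder-decoder, whereas here the "encoder" is a hypothetical linear softmax classifier over precomputed image features and the "decoder" samples MIDI pitches from a hand-picked mood-conditioned pitch pool.

```python
import numpy as np

MOODS = ["happy", "sad"]  # the two moods covered by the IMuCo prototype

def classify_mood(image_features, W, b):
    """Stand-in for the encoder: a linear softmax classifier over moods.
    (Hypothetical — the paper uses a learned image encoder.)"""
    logits = image_features @ W + b
    exp = np.exp(logits - logits.max())  # stabilized softmax
    probs = exp / exp.sum()
    return MOODS[int(np.argmax(probs))], probs

def generate_notes(mood, length=8, seed=0):
    """Stand-in for the decoder: sample MIDI pitches from a
    mood-conditioned pool. (Hypothetical — the paper generates
    music with a learned sequence decoder.)"""
    rng = np.random.default_rng(seed)
    # happy -> C-major pitches; sad -> A-minor pitches in a lower register
    pitch_pool = {"happy": [60, 62, 64, 65, 67, 69, 71],
                  "sad":   [57, 59, 60, 62, 63, 65, 67]}[mood]
    return [int(rng.choice(pitch_pool)) for _ in range(length)]

# Usage with random placeholder "image features" and untrained weights
rng = np.random.default_rng(42)
features = rng.normal(size=16)
W = rng.normal(size=(16, len(MOODS)))
b = np.zeros(len(MOODS))
mood, probs = classify_mood(features, W, b)
melody = generate_notes(mood)
```

The point of the sketch is only the control flow: the mood label produced by the classifier is the sole signal passed to the generator, which mirrors the two-stage encoder-decoder design the abstract outlines.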