Remixing Music with Visual Conditioning

2020 IEEE International Symposium on Multimedia (ISM) Pub Date : 2020-10-27 DOI:10.1109/ISM.2020.00039

Li-Chia Yang, Alexander Lerch

Citations: 3

Abstract

We propose a visually conditioned music remixing system that combines deep visual and audio models. The method builds on a state-of-the-art audio-visual source separation model that separates musical instrument sources using video information. We modified the model to accept user-selected images instead of videos as visual input during inference, which enables separation of audio-only content. Furthermore, we propose a remixing engine that generalizes the task of source separation to music remixing. The proposed method achieves improved audio quality compared to remixing performed by the separate-and-add method with a state-of-the-art audio-visual source separation model.
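For orientation, the separate-and-add baseline named in the abstract can be sketched as follows: each separated source is scaled by a user-chosen gain and the results are summed. This is only an illustrative sketch, not the authors' implementation. The function name `separate_and_add`, the instrument names, and the sine waves standing in for separator output are all assumptions made for the example, and the proposed remixing engine itself is not shown.

```python
import numpy as np


def separate_and_add(sources, gains):
    """Remix by scaling each separated source and summing the results.

    sources: shape (n_sources, n_samples); hypothetical output of a
             source separation model (stubbed with sine waves below).
    gains:   one linear gain per source, chosen by the user.
    """
    sources = np.asarray(sources, dtype=float)
    gains = np.asarray(gains, dtype=float)
    if sources.ndim != 2 or gains.shape != (sources.shape[0],):
        raise ValueError(
            "expected sources of shape (n_sources, n_samples) "
            "and exactly one gain per source"
        )
    # Weighted sum over the source axis gives a (n_samples,) remix.
    return gains @ sources


# Toy stand-ins for separated stems (assumed, not real separator output).
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
guitar = np.sin(2 * np.pi * 220.0 * t)
violin = np.sin(2 * np.pi * 440.0 * t)

# Boost the first stem and attenuate the second.
remix = separate_and_add([guitar, violin], gains=[1.5, 0.5])
```

Because the remix is a plain weighted sum, any error in the estimated sources is carried directly into the output, scaled by the corresponding gain.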

