Learning Music Emotion Primitives via Supervised Dynamic Clustering
Yang Liu, Yan Liu, Xiang Zhang, Gong Chen, Ke-jun Zhang
Proceedings of the 24th ACM International Conference on Multimedia (MM '16), October 2016
DOI: 10.1145/2964284.2967215
Citations: 1
Abstract
This paper explores a fundamental problem in music emotion analysis: how to segment a music sequence into a set of basic emotive units, which we call emotion primitives. Existing work on music emotion analysis is mainly based on fixed-length music segments, which often makes accurate emotion recognition difficult. A short segment, such as an individual music frame, may fail to evoke an emotional response; a long segment, such as an entire song, may convey several emotions over time. Moreover, the minimum segment length required varies with the type of emotion. To address these problems, we propose a novel method, dubbed supervised dynamic clustering (SDC), that automatically decomposes a music sequence into meaningful segments of varying lengths. First, the music sequence is represented as a set of music frames. The frames are then clustered according to their valence-arousal values in the emotion space, and the clustering result is used to initialize the segmentation. Finally, a dynamic programming scheme jointly optimizes the subsequent segmentation and grouping in the music feature space. Experimental results on a standard dataset show both the effectiveness and the rationality of the proposed method.
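The abstract does not give implementation details, but the two-stage idea (cluster frames in valence-arousal space to initialize labels, then use dynamic programming to pick segment boundaries) can be sketched. The following is a minimal Python sketch under stated assumptions, not the authors' actual formulation: the inputs `va` (per-frame valence-arousal estimates) and `feats` (per-frame acoustic features), the segment-cost definition, the penalty weight `lam`, and the cluster count are all illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_valence_arousal(va, n_clusters=4, seed=0):
    # va: (n_frames, 2) array of per-frame valence-arousal values.
    # K-means here is an assumed stand-in for the paper's clustering step.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(va)


def segment_cost(feats, labels, i, j, lam):
    # Within-segment scatter in the acoustic feature space ...
    seg = feats[i:j]
    scatter = np.sum((seg - seg.mean(axis=0)) ** 2)
    # ... plus a penalty for frames disagreeing with the segment's
    # majority emotion cluster (an assumed coupling of the two spaces).
    counts = np.bincount(labels[i:j])
    disagreement = (j - i) - counts.max()
    return scatter + lam * disagreement


def dp_segment(feats, labels, n_segments, lam=1.0):
    """Split n frames into n_segments contiguous segments of varying
    lengths, minimizing the total segment cost via dynamic programming."""
    n = len(feats)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i, j] = segment_cost(feats, labels, i, j, lam)

    # D[k, j]: best cost of covering the first j frames with k segments.
    D = np.full((n_segments + 1, n + 1), np.inf)
    back = np.zeros((n_segments + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = D[k - 1, i] + cost[i, j]
                if c < D[k, j]:
                    D[k, j] = c
                    back[k, j] = i

    # Recover boundary indices by walking back from the full sequence.
    bounds, j = [n], n
    for k in range(n_segments, 0, -1):
        j = back[k, j]
        bounds.append(j)
    return bounds[::-1]


# Toy usage: 200 frames, 2-D valence-arousal, 12-D acoustic features.
rng = np.random.default_rng(0)
va = rng.normal(size=(200, 2))
feats = rng.normal(size=(200, 12))
labels = cluster_valence_arousal(va)
print(dp_segment(feats, labels, n_segments=5))
```

Note that this sketch fixes the number of segments in advance for simplicity; the paper's joint optimization over segmentation and grouping may determine segment counts and lengths differently.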