{"title":"一种融合多种音乐表示的语义音乐标注层次关注深度神经网络模型","authors":"Qianqian Wang, Feng Su, Yuyang Wang","doi":"10.1145/3323873.3325031","DOIUrl":null,"url":null,"abstract":"Automatically assigning a group of appropriate semantic tags to one music piece provides an effective way for people to efficiently utilize the massive and ever increasing on-line and off-line music data. In this paper, we propose a novel content-based automatic music annotation model that hierarchically combines attentive convolutional networks and recurrent networks for music representation learning, structure modelling and tag prediction. The model first exploits two separate attentive convolutional networks composed of multiple gated linear units (GLUs) to learn effective representations from both 1-D raw waveform signals and 2-D Mel-spectrogram of the music, which better captures informative features of the music for the annotation task than exploiting any single representation channel. The model then exploits bidirectional Long Short-Term Memory (LSTM) networks to depict the time-varying structures embedded in the description sequences of the music, and further introduces a dual-state LSTM network to encode temporal correlations between two representation channels, which effectively enriches the descriptions of the music. Finally, the model adaptively aggregates music descriptions generated at every time step with a self-attentive multi-weighting mechanism for music tag prediction. The proposed model achieves state-of-the-art results on the public MagnaTagATune music dataset, demonstrating its effectiveness on music annotation.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A Hierarchical Attentive Deep Neural Network Model for Semantic Music Annotation Integrating Multiple Music Representations\",\"authors\":\"Qianqian Wang, Feng Su, Yuyang Wang\",\"doi\":\"10.1145/3323873.3325031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatically assigning a group of appropriate semantic tags to one music piece provides an effective way for people to efficiently utilize the massive and ever increasing on-line and off-line music data. In this paper, we propose a novel content-based automatic music annotation model that hierarchically combines attentive convolutional networks and recurrent networks for music representation learning, structure modelling and tag prediction. The model first exploits two separate attentive convolutional networks composed of multiple gated linear units (GLUs) to learn effective representations from both 1-D raw waveform signals and 2-D Mel-spectrogram of the music, which better captures informative features of the music for the annotation task than exploiting any single representation channel. The model then exploits bidirectional Long Short-Term Memory (LSTM) networks to depict the time-varying structures embedded in the description sequences of the music, and further introduces a dual-state LSTM network to encode temporal correlations between two representation channels, which effectively enriches the descriptions of the music. Finally, the model adaptively aggregates music descriptions generated at every time step with a self-attentive multi-weighting mechanism for music tag prediction. 
The proposed model achieves state-of-the-art results on the public MagnaTagATune music dataset, demonstrating its effectiveness on music annotation.\",\"PeriodicalId\":149041,\"journal\":{\"name\":\"Proceedings of the 2019 on International Conference on Multimedia Retrieval\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2019 on International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3323873.3325031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3323873.3325031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Hierarchical Attentive Deep Neural Network Model for Semantic Music Annotation Integrating Multiple Music Representations
Automatically assigning a group of appropriate semantic tags to a music piece provides an effective way for people to efficiently utilize the massive and ever-increasing body of online and offline music data. In this paper, we propose a novel content-based automatic music annotation model that hierarchically combines attentive convolutional networks and recurrent networks for music representation learning, structure modelling, and tag prediction. The model first exploits two separate attentive convolutional networks composed of multiple gated linear units (GLUs) to learn effective representations from both the 1-D raw waveform signal and the 2-D Mel-spectrogram of the music, which captures informative features of the music for the annotation task better than any single representation channel. The model then exploits bidirectional Long Short-Term Memory (LSTM) networks to model the time-varying structures embedded in the description sequences of the music, and further introduces a dual-state LSTM network to encode temporal correlations between the two representation channels, which effectively enriches the descriptions of the music. Finally, the model adaptively aggregates the music descriptions generated at every time step with a self-attentive multi-weighting mechanism for music tag prediction. The proposed model achieves state-of-the-art results on the public MagnaTagATune music dataset, demonstrating its effectiveness for music annotation.
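To make the GLU-based convolutional front end concrete, below is a minimal PyTorch sketch of a gated convolutional block of the kind the abstract describes applied to the raw-waveform channel. This is an illustrative reconstruction, not the authors' implementation: all layer sizes, kernel widths, and strides are assumptions, and the paper's attention components are omitted for brevity.

```python
# Illustrative sketch of a GLU convolutional block (assumed hyperparameters,
# not the paper's exact architecture).
import torch
import torch.nn as nn

class GLUConv1d(nn.Module):
    """1-D convolution with a GLU gate: out = linear_half * sigmoid(gate_half)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        # One convolution producing 2 * out_ch channels; nn.GLU then splits
        # them into a linear half and a gating half along the channel axis.
        self.conv = nn.Conv1d(in_ch, 2 * out_ch, kernel_size, stride=stride,
                              padding=kernel_size // 2)
        self.glu = nn.GLU(dim=1)

    def forward(self, x):  # x: (batch, in_ch, time)
        return self.glu(self.conv(x))

# Example: a raw-waveform front end stacking two GLU blocks.
waveform = torch.randn(4, 1, 16000)            # four 1-second clips at 16 kHz
frontend = nn.Sequential(GLUConv1d(1, 32, 9, stride=4),
                         GLUConv1d(32, 64, 9, stride=4))
features = frontend(waveform)                  # shape: (4, 64, 1000)
```

An analogous block built on nn.Conv2d would serve the Mel-spectrogram channel; the GLU gating lets each block learn which of its own activations to pass forward, which is one reason such units suit noisy audio features.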
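The final aggregation step can likewise be sketched. The snippet below shows a simple self-attentive weighting that pools per-time-step description vectors (e.g., bidirectional LSTM outputs) into a single clip-level vector for multi-label tag prediction. It is a plausible single-head reading of the abstract's "self-attentive multi-weighting mechanism" under assumed dimensions, not the paper's exact formulation.

```python
# Illustrative self-attentive pooling over time-step descriptions
# (assumed single attention head; the paper uses a multi-weighting variant).
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, dim, num_tags):
        super().__init__()
        self.score = nn.Linear(dim, 1)          # one attention score per time step
        self.classifier = nn.Linear(dim, num_tags)

    def forward(self, h):                       # h: (batch, time, dim)
        w = torch.softmax(self.score(h), dim=1)  # (batch, time, 1) attention weights
        pooled = (w * h).sum(dim=1)              # weighted sum over time -> (batch, dim)
        return torch.sigmoid(self.classifier(pooled))  # per-tag probabilities

h = torch.randn(4, 1000, 128)                   # e.g., bidirectional LSTM outputs
tags = AttentivePooling(128, 50)(h)             # (4, 50) multi-label tag scores
```

The sigmoid output reflects that music annotation is multi-label: each of the (here, 50) tags is predicted independently rather than via a softmax over mutually exclusive classes.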