{"title":"A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability","authors":"Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang","doi":"10.5281/zenodo.10265251","DOIUrl":"https://doi.org/10.5281/zenodo.10265251","url":null,"abstract":"Nowadays, humans are constantly exposed to music, whether through voluntary streaming services or incidental encounters during commercial breaks. Despite the abundance of music, certain pieces remain more memorable and often gain greater popularity. Inspired by this phenomenon, we focus on measuring and predicting music memorability. To achieve this, we collect a new music piece dataset with reliable memorability labels using a novel interactive experimental procedure. We then train baselines to predict and analyze music memorability, leveraging both interpretable features and audio mel-spectrograms as inputs. To the best of our knowledge, we are the first to explore music memorability using data-driven deep learning-based methods. Through a series of experiments and ablation studies, we demonstrate that while there is room for improvement, predicting music memorability with limited data is possible. Certain intrinsic elements, such as higher valence, arousal, and faster tempo, contribute to memorable music. As prediction techniques continue to evolve, real-life applications like music recommendation systems and music style transfer will undoubtedly benefit from this new area of research.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"120 24","pages":"174-181"},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141115248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Singer Identity Representation Learning Using Self-Supervised Techniques","authors":"Bernardo Torres, S. Lattner, Gaël Richard","doi":"10.5281/zenodo.10265323","DOIUrl":"https://doi.org/10.5281/zenodo.10265323","url":null,"abstract":"Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks and apply data augmentations during training to ensure that the representations are invariant to pitch and content variations. We evaluate the quality of the resulting representations on singer similarity and identification tasks across multiple datasets, with a particular emphasis on out-of-domain generalization. Our proposed framework produces high-quality embeddings that outperform both speaker verification and wav2vec 2.0 pre-trained baselines on singing voice while operating at 44.1 kHz. We release our code and trained models to facilitate further research on singing voice and related areas.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"8 32","pages":"448-456"},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139440060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Symbolic Music Alignment With Offline Reinforcement Learning","authors":"Silvan David Peter","doi":"10.5281/zenodo.10265367","DOIUrl":"https://doi.org/10.5281/zenodo.10265367","url":null,"abstract":"Symbolic Music Alignment is the process of matching performed MIDI notes to corresponding score notes. In this paper, we introduce a reinforcement learning (RL)-based online symbolic music alignment technique. The RL agent - an attention-based neural network - iteratively estimates the current score position from local score and performance contexts. For this symbolic alignment task, environment states can be sampled exhaustively and the reward is dense, rendering a formulation as a simplified offline RL problem straightforward. We evaluate the trained agent in three ways. First, in its capacity to identify correct score positions for sampled test contexts; second, as the core technique of a complete algorithm for symbolic online note-wise alignment; and finally, as a real-time symbolic score follower. We further investigate the pitch-based score and performance representations used as the agent's inputs. To this end, we develop a second model, a two-step Dynamic Time Warping (DTW)-based offline alignment algorithm leveraging the same input representation. The proposed model outperforms a state-of-the-art reference model of offline symbolic music alignment.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"113 28","pages":"634-641"},"PeriodicalIF":0.0,"publicationDate":"2023-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139133506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timbre Transfer Using Image-to-Image Denoising Diffusion Implicit Models","authors":"Luca Comanducci, F. Antonacci, A. Sarti","doi":"10.5281/zenodo.10265271","DOIUrl":"https://doi.org/10.5281/zenodo.10265271","url":null,"abstract":"Timbre transfer techniques aim at converting the sound of a musical piece generated by one instrument into the same one as if it was played by another instrument, while maintaining as much as possible the content in terms of musical characteristics such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perform timbre transfer. Specifically, we apply the recently proposed Denoising Diffusion Implicit Models (DDIMs) that enable to accelerate the sampling procedure. Inspired by the recent application of DDMs to image translation problems we formulate the timbre transfer task similarly, by first converting the audio tracks into log mel spectrograms and by conditioning the generation of the desired timbre spectrogram through the input timbre spectrogram. We perform both one-to-one and many-to-many timbre transfer, by converting audio waveforms containing only single instruments and multiple instruments, respectively. We compare the proposed technique with existing state-of-the-art methods both through listening tests and objective measures in order to demonstrate the effectiveness of the proposed model.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"29 1","pages":"257-263"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139361161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adapting Meter Tracking Models to Latin American Music","authors":"Lucas Maia, Martín Rocamora, L. Biscainho, Magdalena Fuentes","doi":"10.48550/arXiv.2304.07186","DOIUrl":"https://doi.org/10.48550/arXiv.2304.07186","url":null,"abstract":"Beat and downbeat tracking models have improved significantly in recent years with the introduction of deep learning methods. However, despite these improvements, several challenges remain. Particularly, the adaptation of available models to underrepresented music traditions in MIR is usually synonymous with collecting and annotating large amounts of data, which is impractical and time-consuming. Transfer learning, data augmentation, and fine-tuning techniques have been used quite successfully in related tasks and are known to alleviate this bottleneck. Furthermore, when studying these music traditions, models are not required to generalize to multiple mainstream music genres but to perform well in more constrained, homogeneous conditions. In this work, we investigate simple yet effective strategies to adapt beat and downbeat tracking models to two different Latin American music traditions and analyze the feasibility of these adaptations in real-world applications concerning the data and computational requirements. Contrary to common belief, our findings show it is possible to achieve good performance by spending just a few minutes annotating a portion of the data and training a model in a standard CPU machine, with the precise amount of resources needed depending on the task and the complexity of the dataset.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122322289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling the Rhythm from Lyrics for Melody Generation of Pop Song","authors":"Daiyu Zhang, Ju-Chiang Wang, K. Kosta, Jordan B. L. Smith, Shicen Zhou","doi":"10.48550/arXiv.2301.01361","DOIUrl":"https://doi.org/10.48550/arXiv.2301.01361","url":null,"abstract":"Creating a pop song melody according to pre-written lyrics is a typical practice for composers. A computational model of how lyrics are set as melodies is important for automatic composition systems, but an end-to-end lyric-to-melody model would require enormous amounts of paired training data. To mitigate the data constraints, we adopt a two-stage approach, dividing the task into lyric-to-rhythm and rhythm-to-melody modules. However, the lyric-to-rhythm task is still challenging due to its multimodality. In this paper, we propose a novel lyric-to-rhythm framework that includes part-of-speech tags to achieve better text setting, and a Transformer architecture designed to model long-term syllable-to-note associations. For the rhythm-to-melody task, we adapt a proven chord-conditioned melody Transformer, which has achieved state-of-the-art results. Experiments for Chinese lyric-to-melody generation show that the proposed framework is able to model key characteristics of rhythm and pitch distributions in the dataset, and in a subjective evaluation, the melodies generated by our system were rated as similar to or better than those of a state-of-the-art alternative.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129233970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating music with sentiment using Transformer-GANs","authors":"Pedro Neves, José Fornari, J. Florindo","doi":"10.48550/arXiv.2212.11134","DOIUrl":"https://doi.org/10.48550/arXiv.2212.11134","url":null,"abstract":"The field of Automatic Music Generation has seen significant progress thanks to the advent of Deep Learning. However, most of these results have been produced by unconditional models, which lack the ability to interact with their users, not allowing them to guide the generative process in meaningful and practical ways. Moreover, synthesizing music that remains coherent across longer timescales while still capturing the local aspects that make it sound ``realistic'' or ``human-like'' is still challenging. This is due to the large computational requirements needed to work with long sequences of data, and also to limitations imposed by the training schemes that are often employed. In this paper, we propose a generative model of symbolic music conditioned by data retrieved from human sentiment. The model is a Transformer-GAN trained with labels that correspond to different configurations of the valence and arousal dimensions that quantitatively represent human affective states. We try to tackle both of the problems above by employing an efficient linear version of Attention and using a Discriminator both as a tool to improve the overall quality of the generated music and its ability to follow the conditioning signals.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132264760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Melody transcription via generative pre-training","authors":"Chris Donahue, John Thickstun, Percy Liang","doi":"10.48550/arXiv.2212.01884","DOIUrl":"https://doi.org/10.48550/arXiv.2212.01884","url":null,"abstract":"Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by $20$% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data - we derive a new dataset containing $50$ hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in $77$% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio. Audio examples can be found at https://chrisdonahue.com/sheetsage and code at https://github.com/chrisdonahue/sheetsage .","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114248668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations","authors":"Jaidev Shriram, Makarand Tapaswi, Vinoo Alluri","doi":"10.48550/arXiv.2212.01033","DOIUrl":"https://doi.org/10.48550/arXiv.2212.01033","url":null,"abstract":"Reading, much like music listening, is an immersive experience that transports readers while taking them on an emotional journey. Listening to complementary music has the potential to amplify the reading experience, especially when the music is stylistically cohesive and emotionally relevant. In this paper, we propose the first fully automatic method to build a dense soundtrack for books, which can play high-quality instrumental music for the entirety of the reading duration. Our work employs a unique text processing and music weaving pipeline that determines the context and emotional composition of scenes in a chapter. This allows our method to identify and play relevant excerpts from the soundtrack of the book's movie adaptation. By relying on the movie composer's craftsmanship, our book soundtracks include expert-made motifs and other scene-specific musical characteristics. We validate the design decisions of our approach through a perceptual study. Our readers note that the book soundtrack greatly enhanced their reading experience, due to high immersiveness granted via uninterrupted and style-consistent music, and a heightened emotional state attained via high precision emotion and scene context recognition.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"269 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132950848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dataset for Greek Traditional and Folk Music: Lyra","authors":"Charilaos Papaioannou, Ioannis Valiantzas, Theodoros Giannakopoulos, Maximos A. Kaliakatsos-Papakostas, A. Potamianos","doi":"10.48550/arXiv.2211.11479","DOIUrl":"https://doi.org/10.48550/arXiv.2211.11479","url":null,"abstract":"Studying under-represented music traditions under the MIR scope is crucial, not only for developing novel analysis tools, but also for unveiling musical functions that might prove useful in studying world musics. This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, summing in around 80 hours of data. The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata information with regards to instrumentation, geography and genre, among others. The content has been collected from a Greek documentary series that is available online, where academics present music traditions of Greece with live music and dance performance during the show, along with discussions about social, cultural and musicological aspects of the presented music. Therefore, this procedure has resulted in a significant wealth of descriptions regarding a variety of aspects, such as musical genre, places of origin and musical instruments. In addition, the audio recordings were performed under strict production-level specifications, in terms of recording equipment, leading to very clean and homogeneous audio content. In this work, apart from presenting the dataset in detail, we propose a baseline deep-learning classification approach to recognize the involved musicological attributes. The dataset, the baseline classification methods and the models are provided in public repositories. Future directions for further refining the dataset are also discussed.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129472415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}