{"title":"Kiite Cafe: A Web Service for Getting Together Virtually to Listen to Music","authors":"Kosetsu Tsukuda, Keisuke Ishida, Masahiro Hamasaki, Masataka Goto","doi":"10.5281/ZENODO.5624491","DOIUrl":"https://doi.org/10.5281/ZENODO.5624491","url":null,"abstract":"In light of the COVID-19 pandemic making it difficult for people to get together in person, this paper describes a public web service called Kiite Cafe that lets users get together virtually to listen to music. When users listen to music on Kiite Cafe, their experiences are characterized by two architectures: (i) visualization of each user’s reactions, and (ii) selection of songs from users’ favorite songs. These architectures enable users to feel social connection with others and the joy of introducing others to their favorite songs as if they were together in person to listen to music. In addition, the architectures provide three user experiences: (1) motivation to react to played songs, (2) the opportunity to listen to a diverse range of songs, and (3) the opportunity to contribute as curators. By analyzing the behavior logs of 1,760 Kiite Cafe users over about five months, we quantitatively show that these user experiences can generate various effects (e.g., users react to a more diverse range of songs on Kiite Cafe than when listening alone). We also discuss how our proposed architectures can continue to enrich music listening experiences with others even after the pandemic’s resolution.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121824062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Music Tagging Transformer","authors":"Minz Won, Keunwoo Choi, Xavier Serra","doi":"10.5281/ZENODO.5624405","DOIUrl":"https://doi.org/10.5281/ZENODO.5624405","url":null,"abstract":"We present Music Tagging Transformer that is trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of the extracted features using stacked self-attention layers. Through a careful model assessment, we first show that the proposed architecture outperforms the previous state-of-the-art music tagging models that are based on convolutional neural networks under a supervised scheme. \u0000The Music Tagging Transformer is further improved by noisy student training, a semi-supervised approach that leverages both labeled and unlabeled data combined with data augmentation. To our best knowledge, this is the first attempt to utilize the entire audio of the million song dataset.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113942839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation","authors":"Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang","doi":"10.5281/ZENODO.5624475","DOIUrl":"https://doi.org/10.5281/ZENODO.5624475","url":null,"abstract":"Deep neural network based methods have been successfully applied to music source separation. They typically learn a mapping from a mixture spectrogram to a set of source spectrograms, all with magnitudes only. This approach has several limitations: 1) its incorrect phase reconstruction degrades the performance, 2) it limits the magnitude of masks between 0 and 1 while we observe that 22% of time-frequency bins have ideal ratio mask values of over~1 in a popular dataset, MUSDB18, 3) its potential on very deep architectures is under-explored. Our proposed system is designed to overcome these. First, we propose to estimate phases by estimating complex ideal ratio masks (cIRMs) where we decouple the estimation of cIRMs into magnitude and phase estimations. Second, we extend the separation method to effectively allow the magnitude of the mask to be larger than 1. Finally, we propose a residual UNet architecture with up to 143 layers. Our proposed system achieves a state-of-the-art MSS result on the MUSDB18 dataset, especially, a SDR of 8.98~dB on vocals, outperforming the previous best performance of 7.24~dB. The source code is available at: this https URL","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130986258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation","authors":"Hsiao-Tzu Hung, Joann Ching, Seungheon Doh, Nabin Kim, Juhan Nam, Yi-Hsuan Yang","doi":"10.5281/ZENODO.5090631","DOIUrl":"https://doi.org/10.5281/ZENODO.5090631","url":null,"abstract":"While there are many music datasets with emotion labels in the literature, they cannot be used for research on symbolic-domain music analysis or generation, as there are usually audio files only. In this paper, we present the EMOPIA (pronounced `yee-mo-pi-uh') dataset, a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs and clip-level emotion labels annotated by four dedicated annotators. Since the clips are not restricted to one clip per song, they can also be used for song-level analysis. We present the methodology for building the dataset, covering the song list curation, clip selection, and emotion annotation processes. Moreover, we prototype use cases on clip-level music emotion classification and emotion-based symbolic music generation by training and evaluating corresponding models using the dataset. The result demonstrates the potential of EMOPIA for being used in future exploration on piano emotion-related MIR tasks.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125713276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent User Interfaces for Music Discovery: The Past 20 Years and What's to Come","authors":"Peter Knees, M. Schedl, Masataka Goto","doi":"10.5334/TISMIR.60","DOIUrl":"https://doi.org/10.5334/TISMIR.60","url":null,"abstract":"Assisting the user in finding music is one of the original motivations that led to the establishment of Music Information Retrieval (MIR) as a research field. This encompasses classic Information Retrieval inspired access to music repositories that aims at meeting an information need of an expert user. Beyond this, however, music as a cultural art form is also connected to an entertainment need of potential listeners, requiring more intuitive and engaging means for music discovery. A central aspect in this process is the user interface. In this article, we reflect on the evolution of MIR-driven intelligent user interfaces for music browsing and discovery over the past two decades. We argue that three major developments have transformed and shaped user interfaces during this period, each connected to a phase of new listening practices. Phase 1 has seen the development of content-based music retrieval interfaces built upon audio processing and content description algorithms facilitating the automatic organization of repositories and finding music according to sound qualities. These interfaces are primarily connected to personal music collections or (still) small commercial catalogs. Phase 2 comprises interfaces incorporating collaborative and automatic semantic description of music, exploiting knowledge captured in user-generated metadata. These interfaces are connected to collective web platforms. Phase 3 is dominated by recommender systems built upon the collection of online music interaction traces on a large scale. These interfaces are connected to streaming services. We review and contextualize work from all three phases and extrapolate current developments to outline possible scenarios of music recommendation and listening interfaces of the future.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114822792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Should we consider the users in contextual music auto-tagging models?","authors":"Karim M. Ibrahim, Elena V. Epure, G. Peeters, G. Richard","doi":"10.5281/ZENODO.3961560","DOIUrl":"https://doi.org/10.5281/ZENODO.3961560","url":null,"abstract":"Music tags are commonly used to describe and categorize music. Various auto-tagging models and datasets have been proposed for the automatic music annotation with tags. However, the past approaches often neglect the fact that many of these tags largely depend on the user, especially the tags related to the context of music listening. In this paper, we address this problem by proposing a user-aware music auto-tagging system and evaluation protocol. Specifically, we use both the audio content and user information extracted from the user listening history to predict contextual tags for a given user/track pair. We propose a new dataset of music tracks annotated with contextual tags per user. We compare our model to the traditional audio-based model and study the influence of user embeddings on the classification quality. Our work shows that explicitly modeling the user listening history into the automatic tagging process could lead to more accurate estimation of contextual tags.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125702390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Perceptions Underlying Social Music Behavior","authors":"Louis Spinelli, Josephine Lau, Jin Ha Lee","doi":"10.5281/ZENODO.4245474","DOIUrl":"https://doi.org/10.5281/ZENODO.4245474","url":null,"abstract":"","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116630601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pandemics, music, and collective sentiment: evidence from the outbreak of COVID-19","authors":"Meijun Liu, Eva Zangerle, Xiao Hu, Alessandro B. Melchiorre, M. Schedl","doi":"10.5281/ZENODO.4245394","DOIUrl":"https://doi.org/10.5281/ZENODO.4245394","url":null,"abstract":"The COVID-19 pandemic causes a massive global health crisis and produces substantial economic and social distress, which in turn may cause stress and anxiety among people. Real-world events play a key role in shaping collective sentiment in a society. As people listen to music daily everywhere in the world, the sentiment of music being listened to can reflect the mood of the listeners and serve as a measure of collective sentiment. However, the exact relationship between real-world events and the sentiment of music being listened to is not clear. Driven by this research gap, we use the unexpected outbreak of COVID-19 as a natural experiment to explore how users' sentiment of music being listened to evolves before and during the outbreak of the pandemic. We employ causal inference approaches on an extended version of the LFM-1b dataset of listening events shared on Last.fm, to examine the impact of the pandemic on the sentiment of music listened to by users in different countries. We find that, after the first COVID-19 case in a country was confirmed, the sentiment of artists users listened to becomes more negative. This negative effect is pronounced for males while females' music emotion is less influenced by the outbreak of the COVID-19 pandemic. We further find a negative association between the number of new weekly COVID-19 cases and users' music sentiment. Our results provide empirical evidence that public sentiment can be monitored based on collective music listening behaviors, which can contribute to research in related disciplines.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129554504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Instrument Music Transcription Based on Deep Spherical Clustering of Spectrograms and Pitchgrams","authors":"Keitaro Tanaka, Takayuki Nakatsuka, Ryo Nishikimi, Kazuyoshi Yoshii, S. Morishima","doi":"10.5281/ZENODO.4245436","DOIUrl":"https://doi.org/10.5281/ZENODO.4245436","url":null,"abstract":"This paper describes a clustering-based music transcription method that estimates the piano rolls of arbitrary musical instrument parts from multi-instrument polyphonic music signals. If target musical pieces are always played by particular kinds of musical instruments, a way to obtain piano rolls is to compute the pitchgram (pitch saliency spectrogram) of each musical instrument by using a deep neural network (DNN). However, this approach has a critical limitation that it has no way to deal with musical pieces including undefined musical instruments. To overcome this limitation, we estimate a condensed pitchgram with an existing instrument-independent neural multi-pitch estimator and then separate the pitchgram into a specified number of musical instrument parts with a deep spherical clustering technique. To improve the performance of transcription, we propose a joint spectrogram and pitchgram clustering method based on the timbral and pitch characteristics of musical instruments. The experimental results show that the proposed method can transcribe musical pieces including unknown musical instruments as well as those containing only predefined instruments, at the state-of-the-art transcription accuracy.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132081400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voice-Leading Schema Recognition Using Rhythm and Pitch Features","authors":"Christoph Finkensiep, Ken Déguernel, M. Neuwirth, M. Rohrmeier","doi":"10.5281/ZENODO.4245482","DOIUrl":"https://doi.org/10.5281/ZENODO.4245482","url":null,"abstract":"Musical schemata constitute important structural building blocks used across historical styles and periods. They consist of two or more melodic lines that are combined to form specific successions of intervals. This paper tackles the problem of recognizing voice-leading schemata in polyphonic music. Since schema types and subtypes can be realized in a wide variety of ways on the musical surface, finding schemata in an automated fashion is a challenging task. To perform schema inference we employ a skipgram model that computes schema candidates, which are then classified using a binary classifier on musical features related to pitch and rhythm. This model is evaluated on a novel dataset of schema annotations in Mozart’s pi-ano sonatas produced by expert annotators, which is published alongside this paper. The features are chosen to encode music-theoretically predicted properties of schema instances. We assess the relevance of each feature for the classification task, thus contributing to the theoretical understanding of complex musical objects.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125034643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}