{"title":"Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts","authors":"Karim M. Ibrahim, Elena V. Epure, G. Peeters, G. Richard","doi":"10.48550/arXiv.2211.07250","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07250","url":null,"abstract":"As music has become more available especially on music streaming platforms, people have started to have distinct preferences to fit to their varying listening situations, also known as context. Hence, there has been a growing interest in considering the user's situation when recommending music to users. Previous works have proposed user-aware autotaggers to infer situation-related tags from music content and user's global listening preferences. However, in a practical music retrieval system, the autotagger could be only used by assuming that the context class is explicitly provided by the user. In this work, for designing a fully automatised music retrieval system, we propose to disambiguate the user's listening information from their stream data. Namely, we propose a system which can generate a situational playlist for a user at a certain time 1) by leveraging user-aware music autotaggers, and 2) by automatically inferring the user's situation from stream data (e.g. device, network) and user's general profile information (e.g. age). Experiments show that such a context-aware personalized music retrieval system is feasible, but the performance decreases in the case of new users, new tracks or when the number of context classes increases.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125129131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations","authors":"Eunjin Choi, Y. Chung, Seolhee Lee, JongIk Jeon, Taegyun Kwon, Juhan Nam","doi":"10.48550/arXiv.2211.07131","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07131","url":null,"abstract":"Existing multi-instrumental datasets tend to be biased toward pop and classical music. In addition, they generally lack high-level annotations such as emotion tags. In this paper, we propose YM2413-MDB, an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from Sega and MSX PC games in the 80s using YM2413, a programmable sound generator based on FM. The collected game music is arranged with a subset of 15 monophonic instruments and one drum instrument. They were converted from binary commands of the YM2413 sound chip. Each song was labeled with 19 emotion tags by two annotators and validated by three verifiers to obtain refined tags. We provide the baseline models and results for emotion recognition and emotion-conditioned symbolic music generation using YM2413-MDB.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131598224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and Detection of Singing Techniques in Repertoires of J-POP Solo Singers","authors":"Yuya Yamamoto, Juhan Nam, Hiroko Terasawa","doi":"10.48550/arXiv.2210.17367","DOIUrl":"https://doi.org/10.48550/arXiv.2210.17367","url":null,"abstract":"In this paper, we focus on singing techniques within the scope of music information retrieval research. We investigate how singers use singing techniques using real-world recordings of famous solo singers in Japanese popular music songs (J-POP). First, we built a new dataset of singing techniques. The dataset consists of 168 commercial J-POP songs, and each song is annotated using various singing techniques with timestamps and vocal pitch contours. We also present descriptive statistics of singing techniques on the dataset to clarify what and how often singing techniques appear. We further explored the difficulty of the automatic detection of singing techniques using previously proposed machine learning techniques. In the detection, we also investigate the effectiveness of auxiliary information (i.e., pitch and distribution of label duration), not only providing the baseline. The best result achieves 40.4% at macro-average F-measure on nine-way multi-class detection. We provide the annotation of the dataset and its detail on the appendix website 0 .","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117275118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Based Audio Embeddings for Query-by-Example","authors":"Anup Singh, Kris Demuynck, Vipul Arora","doi":"10.48550/arXiv.2210.08624","DOIUrl":"https://doi.org/10.48550/arXiv.2210.08624","url":null,"abstract":"An ideal audio retrieval system efficiently and robustly recognizes a short query snippet from an extensive database. However, the performance of well-known audio fingerprinting systems falls short at high signal distortion levels. This paper presents an audio retrieval system that generates noise and reverberation robust audio fingerprints using the contrastive learning framework. Using these fingerprints, the method performs a comprehensive search to identify the query audio and precisely estimate its timestamp in the reference audio. Our framework involves training a CNN to maximize the similarity between pairs of embeddings extracted from clean audio and its corresponding distorted and time-shifted version. We employ a channel-wise spectral-temporal attention mechanism to better discriminate the audio by giving more weight to the salient spectral-temporal patches in the signal. Experimental results indicate that our system is efficient in computation and memory usage while being more accurate, particularly at higher distortion levels, than competing state-of-the-art systems and scalable to a larger database.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"558 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123390238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE","authors":"Yueh-Kao Wu, Ching-Yu Chiu, Yi-Hsuan Yang","doi":"10.48550/arXiv.2210.06007","DOIUrl":"https://doi.org/10.48550/arXiv.2210.06007","url":null,"abstract":"This paper proposes a model that generates a drum track in the audio domain to play along to a user-provided drum-free recording. Specifically, using paired data of drumless tracks and the corresponding human-made drum tracks, we train a Transformer model to improvise the drum part of an unseen drumless recording. We combine two approaches to encode the input audio. First, we train a vector-quantized variational autoencoder (VQ-VAE) to represent the input audio with discrete codes, which can then be readily used in a Transformer. Second, using an audio-domain beat tracking model, we compute beat-related features of the input audio and use them as embeddings in the Transformer. Instead of generating the drum track directly as waveforms, we use a separate VQ-VAE to encode the mel-spectrogram of a drum track into another set of discrete codes, and train the Transformer to predict the sequence of drum-related discrete codes. The output codes are then converted to a mel-spectrogram with a decoder, and then to the waveform with a vocoder. We report both objective and subjective evaluations of variants of the proposed model, demonstrating that the model with beat information generates drum accompaniment that is rhythmically and stylistically consistent with the input audio.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127281703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised and Unsupervised Learning of Audio Representations for Music Understanding","authors":"Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, F. Gouyon, Andreas F. Ehmann","doi":"10.48550/arXiv.2210.03799","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03799","url":null,"abstract":"In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affects the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done in an efficient manner with models containing less than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art in unsupervised learning -- and in some cases, supervised learning -- for music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127135746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Melody Infilling with User-Provided Structural Context","authors":"Chih-Pin Tan, A. Su, Yi-Hsuan Yang","doi":"10.48550/arXiv.2210.02829","DOIUrl":"https://doi.org/10.48550/arXiv.2210.02829","url":null,"abstract":"This paper proposes a novel Transformer-based model for music score infilling, to generate a music passage that fills in the gap between given past and future contexts. While existing infilling approaches can generate a passage that connects smoothly locally with the given contexts, they do not take into account the musical form or structure of the music and may therefore generate overly smooth results. To address this issue, we propose a structure-aware conditioning approach that employs a novel attention-selecting module to supply user-provided structure-related information to the Transformer for infilling. With both objective and subjective evaluations, we show that the proposed model can harness the structural information effectively and generate melodies in the style of pop of higher quality than the two existing structure-agnostic infilling models.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127999335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"And what if two musical versions don't share melody, harmony, rhythm, or lyrics ?","authors":"M. Abrassart, Guillaume Doras","doi":"10.48550/arXiv.2210.01256","DOIUrl":"https://doi.org/10.48550/arXiv.2210.01256","url":null,"abstract":"Version identification (VI) has seen substantial progress over the past few years. On the one hand, the introduction of the metric learning paradigm has favored the emergence of scalable yet accurate VI systems. On the other hand, using features focusing on specific aspects of musical pieces, such as melody, harmony, or lyrics, yielded interpretable and promising performances. In this work, we build upon these recent advances and propose a metric learning-based system systematically leveraging four dimensions commonly admitted to convey musical similarity between versions: melodic line, harmonic structure, rhythmic patterns, and lyrics. We describe our deliberately simple model architecture, and we show in particular that an approximated representation of the lyrics is an efficient proxy to discriminate between versions and non-versions. We then describe how these features complement each other and yield new state-of-the-art performances on two publicly available datasets. We finally suggest that a VI system using a combination of melodic, harmonic, rhythmic and lyrics features could theoretically reach the optimal performances obtainable on these datasets.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123069048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Hierarchical Metrical Structure Beyond Measures","authors":"Junyan Jiang, Daniel Chin, Yixiao Zhang, Gus G. Xia","doi":"10.48550/arXiv.2209.10259","DOIUrl":"https://doi.org/10.48550/arXiv.2209.10259","url":null,"abstract":"Music contains hierarchical structures beyond beats and measures. While hierarchical structure annotations are helpful for music information retrieval and computer musicology, such annotations are scarce in current digital music databases. In this paper, we explore a data-driven approach to automatically extract hierarchical metrical structures from scores. We propose a new model with a Temporal Convolutional Network-Conditional Random Field (TCN-CRF) architecture. Given a symbolic music score, our model takes in an arbitrary number of voices in a beat-quantized form, and predicts a 4-level hierarchical metrical structure from downbeat-level to section-level. We also annotate a dataset using RWC-POP MIDI files to facilitate training and evaluation. We show by experiments that the proposed method performs better than the rule-based approach under different orchestration settings. We also perform some simple musicological analysis on the model predictions. All demos, datasets and pre-trained models are publicly available on Github.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123481823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Perceptual Loudness of Piano Tone: Theory and Applications","authors":"Yang Qu, Yutian Qin, Lecheng Chao, Hangkai Qian, Ziyu Wang, Gus G. Xia","doi":"10.48550/arXiv.2209.10674","DOIUrl":"https://doi.org/10.48550/arXiv.2209.10674","url":null,"abstract":"The relationship between perceptual loudness and physical attributes of sound is an important subject in both computer music and psychoacoustics. Early studies of\"equal-loudness contour\"can trace back to the 1920s and the measured loudness with respect to intensity and frequency has been revised many times since then. However, most studies merely focus on synthesized sound, and the induced theories on natural tones with complex timbre have rarely been justified. To this end, we investigate both theory and applications of natural-tone loudness perception in this paper via modeling piano tone. The theory part contains: 1) an accurate measurement of piano-tone equal-loudness contour of pitches, and 2) a machine-learning model capable of inferring loudness purely based on spectral features trained on human subject measurements. As for the application, we apply our theory to piano control transfer, in which we adjust the MIDI velocities on two different player pianos (in different acoustic environments) to achieve the same perceptual effect. Experiments show that both our theoretical loudness modeling and the corresponding performance control transfer algorithm significantly outperform their baselines.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132099082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}