International Society for Music Information Retrieval Conference: Latest Publications

Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts
International Society for Music Information Retrieval Conference Pub Date: 2022-11-14 DOI: 10.48550/arXiv.2211.07250
Karim M. Ibrahim, Elena V. Epure, G. Peeters, G. Richard
Abstract: As music has become more widely available, especially on streaming platforms, people have developed distinct preferences for their varying listening situations, also known as contexts. Hence, there has been growing interest in considering the user's situation when recommending music. Previous works have proposed user-aware autotaggers to infer situation-related tags from music content and a user's global listening preferences. However, in a practical music retrieval system, such an autotagger can only be used if the context class is explicitly provided by the user. In this work, toward a fully automated music retrieval system, we propose to disambiguate the user's listening context from their stream data. Namely, we propose a system that can generate a situational playlist for a user at a certain time 1) by leveraging user-aware music autotaggers, and 2) by automatically inferring the user's situation from stream data (e.g., device, network) and the user's general profile information (e.g., age). Experiments show that such a context-aware personalized music retrieval system is feasible, but performance decreases for new users, new tracks, or when the number of context classes increases.
Citations: 0
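A minimal sketch (not the authors' implementation) of the late-fusion idea in this abstract: a context classifier predicts a distribution over context classes from device and profile features, and that distribution weights the per-context tag scores of an audio autotagger. The feature columns, context names, and both models are hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical context classes inferred from stream/profile data.
CONTEXTS = ["commute", "workout", "relax", "party"]

# Toy training data: device/profile features -> context label.
# Assumed columns: [is_mobile, is_wifi, hour_of_day/24, age/100]
X_stream = np.array([
    [1, 0, 8 / 24, 0.25],   # mobile, cellular, morning  -> commute
    [1, 1, 18 / 24, 0.30],  # mobile, wifi, evening      -> workout
    [0, 1, 22 / 24, 0.40],  # desktop, wifi, late night  -> relax
    [1, 1, 23 / 24, 0.22],  # mobile, wifi, late night   -> party
])
y_context = np.array([0, 1, 2, 3])

context_clf = LogisticRegression(max_iter=1000).fit(X_stream, y_context)

def situational_scores(stream_features, autotag_scores):
    """Weight per-context tag scores by the inferred context distribution.

    autotag_scores: (n_tracks, n_contexts) matrix from a user-aware
    autotagger (placeholder values here).
    """
    p_context = context_clf.predict_proba(stream_features.reshape(1, -1))[0]
    return autotag_scores @ p_context  # one relevance score per track

# Example: three candidate tracks, per-context scores from the autotagger.
autotag_scores = np.array([
    [0.9, 0.1, 0.2, 0.1],
    [0.2, 0.8, 0.1, 0.3],
    [0.1, 0.2, 0.9, 0.2],
])
ranking = np.argsort(-situational_scores(X_stream[0], autotag_scores))
print("Track ranking for the inferred context:", ranking)
```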
YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations
International Society for Music Information Retrieval Conference Pub Date: 2022-11-14 DOI: 10.48550/arXiv.2211.07131
Eunjin Choi, Y. Chung, Seolhee Lee, JongIk Jeon, Taegyun Kwon, Juhan Nam
Abstract: Existing multi-instrumental datasets tend to be biased toward pop and classical music. In addition, they generally lack high-level annotations such as emotion tags. In this paper, we propose YM2413-MDB, an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from 80s Sega and MSX PC games that use the YM2413, a programmable sound generator based on FM synthesis. The collected game music is arranged with a subset of 15 monophonic instruments and one drum instrument, converted from the binary commands of the YM2413 sound chip. Each song was labeled with 19 emotion tags by two annotators and validated by three verifiers to obtain refined tags. We provide baseline models and results for emotion recognition and emotion-conditioned symbolic music generation using YM2413-MDB.
Citations: 1
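As an illustration of the multi-label emotion-recognition setup such a dataset enables (not the authors' baseline), here is a minimal sketch with placeholder features and a 19-tag multi-hot label matrix:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
N_TRACKS, N_FEATURES, N_TAGS = 200, 64, 19  # 19 emotion tags, as in the paper

# Placeholder audio/MIDI features and multi-hot emotion labels.
X = rng.normal(size=(N_TRACKS, N_FEATURES))
Y = rng.integers(0, 2, size=(N_TRACKS, N_TAGS))

X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

# One independent binary classifier per emotion tag.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)

Y_pred = clf.predict(X_test)
print("macro-F1:", f1_score(Y_test, Y_pred, average="macro", zero_division=0))
```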
Analysis and Detection of Singing Techniques in Repertoires of J-POP Solo Singers
International Society for Music Information Retrieval Conference Pub Date: 2022-10-31 DOI: 10.48550/arXiv.2210.17367
Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
Abstract: In this paper, we focus on singing techniques within the scope of music information retrieval research. We investigate how singers use singing techniques in real-world recordings of famous solo singers of Japanese popular music (J-POP). First, we built a new dataset of singing techniques. The dataset consists of 168 commercial J-POP songs, each annotated with various singing techniques, timestamps, and vocal pitch contours. We also present descriptive statistics of singing techniques on the dataset to clarify which techniques appear and how often. We further explore the difficulty of automatically detecting singing techniques using previously proposed machine learning methods. Beyond providing a baseline, we investigate the effectiveness of auxiliary information (i.e., pitch and the distribution of label durations). The best result achieves a macro-average F-measure of 40.4% on nine-way multi-class detection. The dataset annotations and further details are provided on the appendix website.
Citations: 2
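For reference, the macro-average F-measure reported above averages per-class F1 scores with equal weight per class; a small illustration with made-up labels for nine hypothetical technique classes:

```python
import numpy as np
from sklearn.metrics import f1_score

# Nine hypothetical singing-technique classes, indexed 0-8 (made-up labels).
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2]
y_pred = [0, 1, 2, 3, 0, 5, 6, 8, 8, 0, 2, 2]

per_class = f1_score(y_true, y_pred, average=None, labels=range(9))
print("per-class F1:", np.round(per_class, 2))

# Macro averaging weights every class equally, so rare techniques
# count as much as common ones.
print("macro-F1:", per_class.mean())
print("check:", f1_score(y_true, y_pred, average="macro"))
```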
Attention-Based Audio Embeddings for Query-by-Example
International Society for Music Information Retrieval Conference Pub Date: 2022-10-16 DOI: 10.48550/arXiv.2210.08624
Anup Singh, Kris Demuynck, Vipul Arora
Abstract: An ideal audio retrieval system efficiently and robustly recognizes a short query snippet from an extensive database. However, the performance of well-known audio fingerprinting systems falls short at high signal distortion levels. This paper presents an audio retrieval system that generates noise- and reverberation-robust audio fingerprints using the contrastive learning framework. Using these fingerprints, the method performs a comprehensive search to identify the query audio and precisely estimate its timestamp in the reference audio. Our framework involves training a CNN to maximize the similarity between pairs of embeddings extracted from clean audio and its corresponding distorted and time-shifted version. We employ a channel-wise spectral-temporal attention mechanism to better discriminate the audio by giving more weight to the salient spectral-temporal patches in the signal. Experimental results indicate that our system is efficient in computation and memory usage while being more accurate, particularly at higher distortion levels, than competing state-of-the-art systems, and scalable to a larger database.
Citations: 1
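A minimal sketch of the contrastive objective described above, written as a generic NT-Xent-style loss pairing each clean embedding with its distorted counterpart; batch size, dimensions, and the encoder are placeholders, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z_clean, z_distorted, temperature=0.1):
    """Pull each clean embedding toward its distorted counterpart and
    push it away from all other items in the batch."""
    z_clean = F.normalize(z_clean, dim=1)
    z_distorted = F.normalize(z_distorted, dim=1)
    logits = z_clean @ z_distorted.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_clean.size(0))            # positives on the diagonal
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random "embeddings" standing in for CNN outputs.
B, D = 8, 128
z_a = torch.randn(B, D, requires_grad=True)
z_b = torch.randn(B, D)
loss = ntxent_loss(z_a, z_b)
loss.backward()
print("contrastive loss:", loss.item())
```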
JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE
International Society for Music Information Retrieval Conference Pub Date: 2022-10-12 DOI: 10.48550/arXiv.2210.06007
Yueh-Kao Wu, Ching-Yu Chiu, Yi-Hsuan Yang
Abstract: This paper proposes a model that generates a drum track in the audio domain to play along to a user-provided drum-free recording. Specifically, using paired data of drumless tracks and the corresponding human-made drum tracks, we train a Transformer model to improvise the drum part of an unseen drumless recording. We combine two approaches to encode the input audio. First, we train a vector-quantized variational autoencoder (VQ-VAE) to represent the input audio with discrete codes, which can then be readily used in a Transformer. Second, using an audio-domain beat tracking model, we compute beat-related features of the input audio and use them as embeddings in the Transformer. Instead of generating the drum track directly as waveforms, we use a separate VQ-VAE to encode the mel-spectrogram of a drum track into another set of discrete codes, and train the Transformer to predict the sequence of drum-related discrete codes. The output codes are then converted to a mel-spectrogram with a decoder, and then to the waveform with a vocoder. We report both objective and subjective evaluations of variants of the proposed model, demonstrating that the model with beat information generates drum accompaniment that is rhythmically and stylistically consistent with the input audio.
Citations: 3
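The discrete codes mentioned above come from vector quantization: each encoder frame is snapped to its nearest codebook entry. A minimal, self-contained sketch of that quantization step (codebook size and dimensions are arbitrary, not the paper's configuration):

```python
import torch

def vector_quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry.

    z:        (batch, time, dim) encoder outputs
    codebook: (num_codes, dim)   learned embedding table
    Returns the code indices and the quantized latents with a
    straight-through gradient, as in VQ-VAE.
    """
    flat = z.reshape(-1, z.shape[-1])                  # (B*T, D)
    dists = torch.cdist(flat, codebook)                # pairwise L2 distances
    codes = dists.argmin(dim=1)                        # nearest code per frame
    quantized = codebook[codes].reshape(z.shape)
    # Straight-through estimator: gradients flow to the encoder as if
    # quantization were the identity.
    quantized = z + (quantized - z).detach()
    return codes.reshape(z.shape[:-1]), quantized

# Toy usage: 2 clips, 16 frames, 64-dim latents, 512 codes.
z = torch.randn(2, 16, 64, requires_grad=True)
codebook = torch.randn(512, 64)
codes, zq = vector_quantize(z, codebook)
print(codes.shape, zq.shape)  # torch.Size([2, 16]) torch.Size([2, 16, 64])
```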
Supervised and Unsupervised Learning of Audio Representations for Music Understanding
International Society for Music Information Retrieval Conference Pub Date: 2022-10-07 DOI: 10.48550/arXiv.2210.03799
Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, F. Gouyon, Andreas F. Ehmann
Abstract: In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affects the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done in an efficient manner with models containing less than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art in unsupervised learning -- and in some cases, supervised learning -- for music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.
Citations: 15
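The downstream evaluation style described above (frozen embeddings, no fine-tuning or reparameterization) is typically a shallow probe trained on precomputed vectors. A minimal sketch with placeholder embeddings and genre labels, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Placeholder: precomputed, frozen audio embeddings and genre labels.
N_TRACKS, EMB_DIM, N_GENRES = 500, 256, 10
embeddings = rng.normal(size=(N_TRACKS, EMB_DIM))
genres = rng.integers(0, N_GENRES, size=N_TRACKS)

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, genres, test_size=0.2, random_state=0)

# Linear probe: the embedding model stays untouched; only this
# classifier is trained for the downstream task.
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("probe accuracy:", accuracy_score(y_te, probe.predict(X_te)))
```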
Melody Infilling with User-Provided Structural Context
International Society for Music Information Retrieval Conference Pub Date: 2022-10-06 DOI: 10.48550/arXiv.2210.02829
Chih-Pin Tan, A. Su, Yi-Hsuan Yang
Abstract: This paper proposes a novel Transformer-based model for music score infilling, to generate a music passage that fills in the gap between given past and future contexts. While existing infilling approaches can generate a passage that connects smoothly with the given contexts locally, they do not take into account the musical form or structure of the music and may therefore generate overly smooth results. To address this issue, we propose a structure-aware conditioning approach that employs a novel attention-selecting module to supply user-provided structure-related information to the Transformer for infilling. With both objective and subjective evaluations, we show that the proposed model can harness the structural information effectively and generate pop-style melodies of higher quality than two existing structure-agnostic infilling models.
Citations: 2
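A generic token-level view of conditional infilling, to make the setup concrete: the model attends over the past context, the future context, and a structure hint while generating the gap. All token names and the input layout below are invented placeholders, not the paper's attention-selecting module.

```python
# Special tokens marking each conditioning segment (hypothetical vocabulary).
PAST, FUTURE, STRUCT, FILL, SEP = "<past>", "<future>", "<struct>", "<fill>", "<sep>"

def build_infilling_input(past_tokens, future_tokens, structure_tokens):
    """Concatenate the conditioning segments into a single sequence that a
    decoder-style Transformer can attend over while generating the gap."""
    return ([PAST] + past_tokens + [SEP] +
            [FUTURE] + future_tokens + [SEP] +
            [STRUCT] + structure_tokens + [SEP] +
            [FILL])

past = ["C4_q", "E4_q", "G4_h"]        # made-up melody tokens before the gap
future = ["G4_q", "F4_q", "E4_h"]      # made-up melody tokens after the gap
structure = ["section=A", "bar=5-8"]   # user-provided structure hint (assumed form)

print(build_infilling_input(past, future, structure))
```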
And what if two musical versions don't share melody, harmony, rhythm, or lyrics?
International Society for Music Information Retrieval Conference Pub Date: 2022-10-03 DOI: 10.48550/arXiv.2210.01256
M. Abrassart, Guillaume Doras
Abstract: Version identification (VI) has seen substantial progress over the past few years. On the one hand, the introduction of the metric learning paradigm has favored the emergence of scalable yet accurate VI systems. On the other hand, using features focusing on specific aspects of musical pieces, such as melody, harmony, or lyrics, yielded interpretable and promising performances. In this work, we build upon these recent advances and propose a metric learning-based system systematically leveraging four dimensions commonly admitted to convey musical similarity between versions: melodic line, harmonic structure, rhythmic patterns, and lyrics. We describe our deliberately simple model architecture, and we show in particular that an approximated representation of the lyrics is an efficient proxy to discriminate between versions and non-versions. We then describe how these features complement each other and yield new state-of-the-art performances on two publicly available datasets. We finally suggest that a VI system using a combination of melodic, harmonic, rhythmic and lyrics features could theoretically reach the optimal performances obtainable on these datasets.
Citations: 0
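One simple way to combine the four similarity dimensions named above is to concatenate L2-normalized per-dimension embeddings and rank candidates by cosine similarity. A toy sketch; the dimension sizes and the equal-weight fusion rule are assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fuse(melody, harmony, rhythm, lyrics):
    """Concatenate normalized per-dimension embeddings so each dimension
    contributes equally to the cosine similarity."""
    return l2norm(np.concatenate(
        [l2norm(melody), l2norm(harmony), l2norm(rhythm), l2norm(lyrics)], axis=-1))

# Placeholder embeddings: 1 query and 100 reference tracks, 4 views each.
def views(n):
    return [rng.normal(size=(n, 32)) for _ in range(4)]

query = fuse(*views(1))
refs = fuse(*views(100))

similarity = refs @ query.T            # cosine similarity (all vectors unit-norm)
print("top-5 candidate versions:", np.argsort(-similarity[:, 0])[:5])
```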
Learning Hierarchical Metrical Structure Beyond Measures
International Society for Music Information Retrieval Conference Pub Date: 2022-09-21 DOI: 10.48550/arXiv.2209.10259
Junyan Jiang, Daniel Chin, Yixiao Zhang, Gus G. Xia
Abstract: Music contains hierarchical structures beyond beats and measures. While hierarchical structure annotations are helpful for music information retrieval and computer musicology, such annotations are scarce in current digital music databases. In this paper, we explore a data-driven approach to automatically extract hierarchical metrical structures from scores. We propose a new model with a Temporal Convolutional Network-Conditional Random Field (TCN-CRF) architecture. Given a symbolic music score, our model takes in an arbitrary number of voices in a beat-quantized form, and predicts a 4-level hierarchical metrical structure from downbeat-level to section-level. We also annotate a dataset using RWC-POP MIDI files to facilitate training and evaluation. We show by experiments that the proposed method performs better than the rule-based approach under different orchestration settings. We also perform some simple musicological analysis on the model predictions. All demos, datasets and pre-trained models are publicly available on Github.
Citations: 3
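A minimal sketch of the TCN side of such an architecture: stacked dilated 1D convolutions over a beat-quantized input, producing per-beat logits for each metrical level. The layer sizes are assumptions and the CRF decoding stage is omitted; this is not the authors' model.

```python
import torch
import torch.nn as nn

class TinyTCN(nn.Module):
    """Dilated temporal convolutions over a beat-quantized, piano-roll-like
    input, emitting logits for a 4-level metrical hierarchy per beat."""

    def __init__(self, in_dim=128, hidden=64, levels=4):
        super().__init__()
        layers = []
        for i, dilation in enumerate([1, 2, 4, 8]):
            layers += [
                nn.Conv1d(in_dim if i == 0 else hidden, hidden,
                          kernel_size=3, padding=dilation, dilation=dilation),
                nn.ReLU(),
            ]
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, levels, kernel_size=1)

    def forward(self, x):                # x: (batch, in_dim, n_beats)
        return self.head(self.tcn(x))    # (batch, levels, n_beats)

model = TinyTCN()
beats = torch.randn(1, 128, 256)         # placeholder beat-quantized features
logits = model(beats)
print(logits.shape)                       # torch.Size([1, 4, 256])
# A CRF (as in the paper's TCN-CRF) would decode these per-beat logits
# into a consistent hierarchical labeling; omitted here for brevity.
```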
Modeling Perceptual Loudness of Piano Tone: Theory and Applications
International Society for Music Information Retrieval Conference Pub Date: 2022-09-21 DOI: 10.48550/arXiv.2209.10674
Yang Qu, Yutian Qin, Lecheng Chao, Hangkai Qian, Ziyu Wang, Gus G. Xia
Abstract: The relationship between perceptual loudness and physical attributes of sound is an important subject in both computer music and psychoacoustics. Early studies of the "equal-loudness contour" date back to the 1920s, and the measured loudness with respect to intensity and frequency has been revised many times since then. However, most studies merely focus on synthesized sound, and the induced theories have rarely been validated on natural tones with complex timbre. To this end, we investigate both the theory and applications of natural-tone loudness perception in this paper via modeling the piano tone. The theory part contains: 1) an accurate measurement of the piano-tone equal-loudness contour of pitches, and 2) a machine-learning model capable of inferring loudness purely from spectral features, trained on human subject measurements. As for the application, we apply our theory to piano control transfer, in which we adjust the MIDI velocities on two different player pianos (in different acoustic environments) to achieve the same perceptual effect. Experiments show that both our theoretical loudness modeling and the corresponding performance control transfer algorithm significantly outperform their baselines.
Citations: 0
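The control-transfer application can be framed per note as: find the velocity on the target piano whose predicted loudness matches the loudness predicted for the source piano. A sketch using bisection with stand-in loudness models; both model functions are hypothetical monotonic placeholders, not the paper's learned models.

```python
def loudness_source(velocity):
    # Stand-in for a learned loudness model of the source piano.
    return 0.6 * velocity + 5.0

def loudness_target(velocity):
    # Stand-in for a learned loudness model of the target piano.
    return 0.45 * velocity + 12.0

def transfer_velocity(src_velocity, lo=1, hi=127):
    """Bisection: pick the target-piano velocity whose predicted loudness
    first reaches the source note's predicted loudness (assumes the
    loudness model is monotonic in velocity)."""
    goal = loudness_source(src_velocity)
    while lo < hi:
        mid = (lo + hi) // 2
        if loudness_target(mid) < goal:
            lo = mid + 1
        else:
            hi = mid
    return max(1, min(127, lo))

for v in (30, 64, 100):
    print(f"source velocity {v:3d} -> target velocity {transfer_velocity(v):3d}")
```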