International Society for Music Information Retrieval Conference: Latest Publications

Generating Coherent Drum Accompaniment With Fills And Improvisations
International Society for Music Information Retrieval Conference Pub Date : 2022-09-01 DOI: 10.48550/arXiv.2209.00291
Rishabh A. Dahale, Vaibhav Talwadker, P. Rao, Prateek Verma
Abstract: Creating a complex work of art like music necessitates profound creativity. With recent advancements in deep learning and powerful models such as transformers, there has been huge progress in automatic music generation. In an accompaniment generation context, creating a coherent drum pattern with apposite fills and improvisations at proper locations in a song is a challenging task even for an experienced drummer. Drum beats tend to follow a repetitive pattern through stanzas, with fills or improvisation at section boundaries. In this work, we tackle the task of drum pattern generation conditioned on the accompanying music played by four melodic instruments: piano, guitar, bass, and strings. We use a transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, possibly because of its relatively low representation in the training data. We propose a novelty function to capture the extent of improvisation in a bar relative to its neighbors. We train a model to predict improvisation locations from the melodic accompaniment tracks. Finally, we use a novel BERT-inspired in-filling architecture to learn the structure of both the drums and the melody, and to in-fill elements of improvised music.
Citations: 1
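The abstract does not give the exact form of the proposed novelty function. As a rough sketch under assumptions of our own (bars as binary onset patterns, Hamming distance to neighboring bars; all names hypothetical), scoring each bar by its dissimilarity to its neighbors could look like:

```python
def bar_novelty(bars, k=1):
    """Score each bar by its average Hamming distance to its k neighboring
    bars on each side (hypothetical novelty function; the paper's exact
    formulation may differ).

    bars: list of equal-length tuples of 0/1 onset flags, i.e. a flattened
    instruments-by-timesteps drum pattern per bar.
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)

    scores = []
    for i, bar in enumerate(bars):
        neighbors = [bars[j]
                     for j in range(max(0, i - k), min(len(bars), i + k + 1))
                     if j != i]
        scores.append(sum(hamming(bar, n) for n in neighbors) / len(neighbors))
    return scores

# A repetitive groove with a denser "fill" bar at index 3.
groove = [(1, 0, 1, 0)] * 3 + [(1, 1, 1, 1)] + [(1, 0, 1, 0)] * 3
scores = bar_novelty(groove)
assert max(scores) == scores[3]  # the fill bar scores as the most novel
```

Peaks of such a score would then mark candidate fill/improvisation locations at section boundaries.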
Cadence Detection in Symbolic Classical Music using Graph Neural Networks
International Society for Music Information Retrieval Conference Pub Date : 2022-08-31 DOI: 10.48550/arXiv.2208.14819
E. Karystinaios, G. Widmer
Abstract: Cadences are complex structures that have driven music from the beginnings of contrapuntal polyphony until today. Detecting such structures is vital for numerous MIR tasks such as musicological analysis, key detection, and music segmentation. However, automatic cadence detection remains challenging, mainly because it involves a combination of high-level musical elements like harmony, voice leading, and rhythm. In this work, we present a graph representation of symbolic scores as an intermediate means of solving the cadence detection task. We approach cadence detection as an imbalanced node classification problem using a Graph Convolutional Network. We obtain results that are roughly on par with the state of the art, and we present a model capable of making predictions at multiple levels of granularity, from individual notes to beats, thanks to the fine-grained, note-by-note representation. Moreover, our experiments suggest that graph convolution can learn non-local features that assist in cadence detection, freeing us from the need to devise specialized features that encode non-local context. We argue that this general approach to modeling musical scores and classification tasks has a number of potential advantages beyond the specific recognition task presented here.
Citations: 5
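The model classifies nodes of a note graph with a Graph Convolutional Network. A minimal numpy sketch of one Kipf-and-Welling-style graph-convolution layer, with a toy note graph and hypothetical per-note features (not the paper's actual architecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)       # symmetric degree normalization
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy note graph: 4 notes, edges between consecutive/overlapping notes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 3))  # features: e.g. pitch, onset, duration
W = np.random.default_rng(1).normal(size=(3, 2))  # learned weights (random here)
out = gcn_layer(A, H, W)
assert out.shape == (4, 2) and (out >= 0).all()
```

Stacking such layers lets each note's prediction (cadence vs. non-cadence) draw on non-local context through the graph.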
Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-Supervised Learning
International Society for Music Information Retrieval Conference Pub Date : 2022-08-31 DOI: 10.48550/arXiv.2208.14867
Seungyeon Rhyu, Sarah Kim, Kyogu Lee
Abstract: We propose a system for rendering a symbolic piano performance with flexible musical expression. Actively controlling musical expression is necessary for creating a new music performance that conveys various emotions or nuances. However, previous approaches were limited to following the composer's guidelines on musical expression, or dealt with only a subset of the musical attributes. We aim to disentangle the entire musical expression and the structural attributes of piano performance using a conditional VAE framework, which stochastically generates expressive parameters from latent representations and given note structures. In addition, we employ self-supervised approaches that force the latent variables to represent target attributes. Finally, we leverage a two-step encoder and decoder that learn hierarchical dependencies to enhance the naturalness of the output. Experimental results show that our system can stably generate performance parameters relevant to the given musical scores, learn disentangled representations, and control musical attributes independently of each other.
Citations: 1
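The conditional VAE stochastically generates expressive parameters from latent representations. A generic sketch of that sampling step via the reparameterization trick, with hypothetical parameter names and a stand-in decoder (the paper's decoder and attribute set differ):

```python
import numpy as np

def sample_expressive_params(mu, log_var, decode, rng):
    """Reparameterization trick: z = mu + sigma * eps, then decode z into
    expressive parameters. Generic conditional-VAE sketch."""
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    return decode(z)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.full(4, -2.0)   # a narrow latent posterior
# Hypothetical decoder mapping latents to performance parameters.
decode = lambda z: {"tempo_scale": 1 + 0.1 * z[0], "velocity_offset": 5 * z[1]}
params = sample_expressive_params(mu, log_var, decode, rng)
assert set(params) == {"tempo_scale", "velocity_offset"}
```

Sampling different z (or moving along a disentangled latent dimension) yields different renderings of the same score, which is what makes the expression controllable.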
Evaluating generative audio systems and their metrics
International Society for Music Information Retrieval Conference Pub Date : 2022-08-31 DOI: 10.48550/arXiv.2209.00130
Ashvala Vinay, Alexander Lerch
Abstract: Recent years have seen considerable advances in audio synthesis with deep generative models. However, the state of the art is very difficult to quantify; different studies often use different evaluation methodologies and different metrics when reporting results, making a direct comparison with other systems difficult if not impossible. Furthermore, the perceptual relevance and meaning of the reported metrics are in most cases unknown, prohibiting any conclusive insights with respect to practical usability and audio quality. This paper presents a study that investigates state-of-the-art approaches side by side with (i) a set of previously proposed objective metrics for audio reconstruction and (ii) a listening study. The results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
Citations: 7
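Objective metrics for audio reconstruction, one of the two evaluation tracks in the study, are often spectral distances. A hedged sketch of one such metric (not necessarily in the paper's metric set): the mean L2 distance between log-magnitude STFT frames of the reference and the generated audio.

```python
import numpy as np

def log_spectral_distance(ref, est, n_fft=256, hop=128, eps=1e-8):
    """Mean L2 distance between log-magnitude STFT frames (sketch metric)."""
    def stft_mag(x):
        frames = [x[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(frames, axis=-1))
    R, E = stft_mag(ref), stft_mag(est)
    return float(np.mean(np.linalg.norm(np.log(R + eps) - np.log(E + eps), axis=-1)))

t = np.arange(4096) / 16000
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=t.size)
assert log_spectral_distance(clean, clean) == 0.0
assert log_spectral_distance(clean, noisy) > 0.0
```

The paper's point is precisely that scores from metrics like this one do not necessarily track what listeners hear, which is why the listening study is needed.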
MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks
International Society for Music Information Retrieval Conference Pub Date : 2022-08-30 DOI: 10.48550/arXiv.2208.14345
Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
Abstract: Humans usually compose music by organizing elements according to musical form to express musical ideas. For neural network-based music generation, however, this is difficult because of the lack of labelled data on musical form. In this paper, we develop MeloForm, a system that generates melody with musical form using expert systems and neural networks. Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases and then to sections, with repetitions and variations according to a pre-given musical form; 2) since the generated melody can lack musical richness, we design a Transformer-based refinement model to improve the melody without changing its musical form. MeloForm enjoys the advantages of precise musical form control via expert systems and of musical richness learned via neural models. Both subjective and objective evaluations demonstrate that MeloForm generates melodies with precise musical form control (97.79% accuracy) and outperforms baseline systems in subjective evaluation scores by 0.75, 0.50, 0.86, and 0.89 in structure, theme, richness, and overall quality, without any labelled musical form data. Besides, MeloForm can support various kinds of forms, such as verse-and-chorus form, rondo form, variational form, sonata form, etc.
Citations: 5
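The expert system develops musical elements from motifs to phrases through repetition and variation. A toy illustration of that idea only; the letter scheme and the transposition rule are invented for this sketch and are not MeloForm's actual rules:

```python
def develop_phrase(motif, form="AAB"):
    """Expand a motif (list of MIDI pitches) into a phrase following a
    letter scheme: 'A' repeats the motif, 'B' is a variation (here, a
    simple transposition up a major second; illustrative only)."""
    def variation(m):
        return [p + 2 for p in m]
    sections = {"A": motif, "B": variation(motif)}
    return [p for letter in form for p in sections[letter]]

motif = [60, 62, 64, 62]              # C D E D
phrase = develop_phrase(motif)        # A A B
assert phrase == [60, 62, 64, 62, 60, 62, 64, 62, 62, 64, 66, 64]
```

In MeloForm, the output of such a rule-based expansion is then handed to the Transformer refinement model, which enriches the melody while preserving the form.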
HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
International Society for Music Information Retrieval Conference Pub Date : 2022-08-30 DOI: 10.48550/arXiv.2208.14339
Weixing Wei, P. Li, Yi Yu, Wei Li
Abstract: While neural network models are making significant progress in piano transcription, they are becoming more resource-consuming, requiring larger model sizes and more computing power. In this paper, we apply more prior knowledge about the piano to reduce model size and improve transcription performance. The sound of a piano note contains various overtones, and the pitch of a key does not change over time. To make full use of this latent information, we propose HPPNet, which uses a harmonic dilated convolution to capture harmonic structures and a frequency-grouped recurrent neural network to model pitch invariance over time. Experimental results on the MAESTRO dataset show that our piano transcription system achieves state-of-the-art performance in both frame and note scores (frame F1 93.15%, note F1 97.18%). Moreover, the model size is much smaller than that of the previous state-of-the-art deep learning models.
Citations: 5
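On a log-frequency axis (e.g. a CQT), the k-th harmonic of any pitch sits at a fixed bin offset of roughly bins_per_octave * log2(k), which is the observation behind harmonic dilated convolution. A sketch of gathering energies at those offsets (layer details differ from HPPNet's actual convolutions):

```python
import numpy as np

def harmonic_stack(spec, n_harmonics=4, bins_per_octave=12):
    """For each (frame, bin), stack the energies at the bins where the first
    n_harmonics of that bin's pitch fall on a log-frequency axis.
    spec: (frames, bins). Returns (frames, bins, n_harmonics);
    out-of-range harmonics are zero-padded."""
    frames, bins = spec.shape
    offsets = [round(bins_per_octave * np.log2(k)) for k in range(1, n_harmonics + 1)]
    out = np.zeros((frames, bins, n_harmonics))
    for h, off in enumerate(offsets):
        out[:, :bins - off, h] = spec[:, off:]   # shift spectrum down by the offset
    return out

spec = np.zeros((2, 24))
spec[:, [0, 12, 19]] = 1.0        # fundamental plus harmonics 2 and 3 of bin 0
stacked = harmonic_stack(spec)
assert stacked[0, 0, :3].tolist() == [1.0, 1.0, 1.0]  # harmonics line up at bin 0
```

Because the offsets are the same for every pitch, one small set of dilated kernels can capture harmonic structure across the whole keyboard, which is how such priors shrink the model.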
Towards robust music source separation on loud commercial music
International Society for Music Information Retrieval Conference Pub Date : 2022-08-30 DOI: 10.48550/arXiv.2208.14355
Chang-Bin Jeon, Kyogu Lee
Abstract: Nowadays, commercial music has extreme loudness and a heavily compressed dynamic range compared to the past. Yet in music source separation these characteristics have not been thoroughly considered, resulting in a domain mismatch between the laboratory and the real world. In this paper, we confirm that this domain mismatch negatively affects the performance of music source separation networks. To this end, we first create the out-of-domain evaluation datasets musdb-L and musdb-XL by mimicking the music mastering process. We then quantitatively verify that the performance of state-of-the-art algorithms significantly deteriorates on our datasets. Lastly, we propose the LimitAug data augmentation method, which utilizes an online limiter during the training data sampling process, to reduce the domain mismatch. We confirm that it not only alleviates the performance degradation on our out-of-domain datasets but also results in higher performance on in-domain data.
Citations: 4
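LimitAug applies an online limiter while sampling training data. The sketch below is a deliberately crude stand-in using input gain plus hard clipping; a real mastering-style limiter uses lookahead with attack/release smoothing, and the paper's implementation details are not reproduced here.

```python
import numpy as np

def limit_aug(x, target_peak=0.99, gain_db=12.0):
    """Boost the signal, then hard-limit its peaks: a crude stand-in for
    the mastering-style limiting that makes commercial music loud."""
    boosted = x * 10 ** (gain_db / 20)          # apply input gain
    return np.clip(boosted, -target_peak, target_peak)

t = np.arange(1000) / 44100
x = 0.3 * np.sin(2 * np.pi * 440 * t)
y = limit_aug(x)
assert np.max(np.abs(y)) <= 0.99                # peaks are limited
assert np.mean(y ** 2) > np.mean(x ** 2)        # but the signal is louder
```

Training on mixtures processed this way exposes the separator to the compressed dynamics of real commercial masters instead of only the unprocessed stems found in research datasets.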
Music Separation Enhancement with Generative Modeling
International Society for Music Information Retrieval Conference Pub Date : 2022-08-26 DOI: 10.48550/arXiv.2208.12387
N. Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo
Abstract: Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model, the Make it Sound Good (MSG) post-processor, to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-art waveform-based and spectrogram-based music source separators, including a separator unseen by MSG during training. Our analysis of the errors produced by source separators shows that waveform models tend to introduce more high-frequency noise, while spectrogram models tend to lose transients and high-frequency content. We introduce objective measures to quantify both kinds of errors and show that MSG improves source reconstruction with respect to both. Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.
Citations: 6
MuLan: A Joint Embedding of Music Audio and Natural Language
International Society for Music Information Retrieval Conference Pub Date : 2022-08-26 DOI: 10.48550/arXiv.2208.12415
Qingqing Huang, A. Jansen, Joonseok Lee, R. Ganti, Judith Yue Li, D. Ellis
Abstract: Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan, a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained on 44 million music recordings (370,000 hours) with weakly associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionality. We demonstrate the versatility of the MuLan embeddings with a range of experiments, including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.
Citations: 46
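Two-tower audio-text models of this kind are typically trained with a batch-wise contrastive objective; the abstract does not spell out MuLan's loss, so the following CLIP-style symmetric InfoNCE sketch is an assumption:

```python
import numpy as np

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch: each audio clip should match its
    own text description and vice versa (CLIP-style sketch, assumed loss)."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature              # batch-pairwise similarities
    def xent(l):                                # matched pairs on the diagonal
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.mean(np.log(np.diag(p)))
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = contrastive_loss(emb, emb)            # towers agree perfectly
mismatched = contrastive_loss(emb, rng.normal(size=(8, 16)))
assert aligned < mismatched
```

Once trained, either tower can be used alone: embedding a text query and ranking audio embeddings by cosine similarity gives the zero-shot tagging and retrieval behavior the paper demonstrates.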
Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings
International Society for Music Information Retrieval Conference Pub Date : 2022-08-26 DOI: 10.48550/arXiv.2208.12724
Massimo Quadrana, Antoine Larreche-Mouly, Matthias Mauch
Abstract: Song embeddings are a key component of most music recommendation engines. In this work, we study the hyper-parameter optimization of behavioral song embeddings based on Word2Vec on a selection of downstream tasks, namely next-song recommendation, false neighbor rejection, and artist and genre clustering. We present new optimization objectives and metrics to monitor the effects of hyper-parameter optimization. We show that single-objective optimization can cause side effects on the non-optimized metrics and propose a simple multi-objective optimization to mitigate these effects. We find that the next-song recommendation quality of Word2Vec is anti-correlated with song popularity, and we show how song embedding optimization can balance performance across different popularity levels. We then show potential positive downstream effects on the task of play prediction. Finally, we provide useful insights into the effects of training dataset scale by testing hyper-parameter optimization on an industry-scale dataset.
Citations: 2
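Given behavioral song embeddings (Word2Vec trained on listening sequences, with songs as "words"), next-song recommendation, one of the downstream tasks studied, usually reduces to nearest-neighbor search in embedding space. A sketch with hypothetical song IDs and embeddings:

```python
import numpy as np

def recommend_next(song_id, embeddings, ids, k=2):
    """Return the k songs whose embeddings are most cosine-similar to the
    seed song, excluding the seed itself."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ E[ids.index(song_id)]
    order = [ids[i] for i in np.argsort(-sims) if ids[i] != song_id]
    return order[:k]

ids = ["song_a", "song_b", "song_c", "song_d"]
embeddings = np.array([[1.0, 0.0],    # song_a
                       [0.9, 0.1],    # song_b: close to song_a
                       [0.0, 1.0],    # song_c
                       [-1.0, 0.0]])  # song_d
assert recommend_next("song_a", embeddings, ids) == ["song_b", "song_c"]
```

The paper's popularity finding can be read in these terms: for unpopular seed songs, the nearest neighbors under a single-objective-optimized embedding are less likely to be the songs users actually play next.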