Latest Publications from the International Society for Music Information Retrieval Conference

A Model You Can Hear: Audio Identification with Playable Prototypes
International Society for Music Information Retrieval Conference Pub Date : 2022-08-05 DOI: 10.48550/arXiv.2208.03311
Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loïc Landrieu
{"title":"A Model You Can Hear: Audio Identification with Playable Prototypes","authors":"Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loïc Landrieu","doi":"10.48550/arXiv.2208.03311","DOIUrl":"https://doi.org/10.48550/arXiv.2208.03311","url":null,"abstract":"Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124077463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
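The prototype idea above lends itself to a compact illustration. The sketch below assumes toy spectrogram shapes and a deliberately simple transformation network (per-class scale and bias); it shows one way learnable spectral prototypes could be deformed, compared against an input, and used for classification. It is not the authors' architecture, which is available in the linked repository.

```python
# Minimal sketch of prototype-based audio identification (not the authors' exact model;
# see https://github.com/romainloiseau/a-model-you-can-hear for the real implementation).
# Each class owns a learnable spectral prototype; a small "transformation network"
# predicts a per-input scale and bias that deforms the prototype before comparison.
import torch
import torch.nn as nn

class PlayablePrototypes(nn.Module):
    def __init__(self, n_classes, n_bins, n_frames):
        super().__init__()
        # One learnable (log-)spectrogram prototype per class.
        self.prototypes = nn.Parameter(torch.randn(n_classes, n_bins, n_frames) * 0.01)
        # Transformation network: predicts per-class scale and bias from the input.
        self.transform = nn.Sequential(
            nn.Flatten(), nn.Linear(n_bins * n_frames, 128), nn.ReLU(),
            nn.Linear(128, 2 * n_classes),
        )

    def forward(self, spec):                              # spec: (batch, n_bins, n_frames)
        scale, bias = self.transform(spec).chunk(2, dim=-1)         # (batch, n_classes) each
        protos = self.prototypes.unsqueeze(0)                       # (1, K, bins, frames)
        deformed = protos * scale[..., None, None] + bias[..., None, None]
        # Reconstruction error of each deformed prototype against the input.
        err = ((deformed - spec.unsqueeze(1)) ** 2).mean(dim=(-2, -1))   # (batch, K)
        return -err                                       # higher score = better match

model = PlayablePrototypes(n_classes=10, n_bins=64, n_frames=32)
scores = model(torch.randn(4, 64, 32))
pred = scores.argmax(dim=-1)                              # predicted class per sample
```

With labels available, such a model could be trained with a standard cross-entropy loss on the returned scores; without labels, the reconstruction errors alone can drive clustering.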
SampleMatch: Drum Sample Retrieval by Musical Context
International Society for Music Information Retrieval Conference Pub Date : 2022-08-01 DOI: 10.48550/arXiv.2208.01141
S. Lattner
{"title":"SampleMatch: Drum Sample Retrieval by Musical Context","authors":"S. Lattner","doi":"10.48550/arXiv.2208.01141","DOIUrl":"https://doi.org/10.48550/arXiv.2208.01141","url":null,"abstract":"Modern digital music production typically involves combining numerous acoustic elements to compile a piece of music. Important types of such elements are drum samples, which determine the characteristics of the percussive components of the piece. Artists must use their aesthetic judgement to assess whether a given drum sample fits the current musical context. However, selecting drum samples from a potentially large library is tedious and may interrupt the creative flow. In this work, we explore the automatic drum sample retrieval based on aesthetic principles learned from data. As a result, artists can rank the samples in their library by fit to some musical context at different stages of the production process (i.e., by fit to incomplete song mixtures). To this end, we use contrastive learning to maximize the score of drum samples originating from the same song as the mixture. We conduct a listening test to determine whether the human ratings match the automatic scoring function. We also perform objective quantitative analyses to evaluate the efficacy of our approach.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114518067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
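The contrastive objective described above can be sketched with a standard InfoNCE-style loss, where the drum sample taken from the same song as the (incomplete) mixture is the positive and the other drum samples in the batch are negatives. The embedding size and temperature below are placeholders, not the paper's actual setup, and the encoders are omitted.

```python
# Sketch of an InfoNCE-style contrastive objective for mixture/drum-sample matching.
# Embeddings are assumed to come from two (unspecified) encoders.
import torch
import torch.nn.functional as F

def contrastive_loss(mix_emb, drum_emb, temperature=0.1):
    """mix_emb[i] and drum_emb[i] come from the same song (positive pair);
    all other drum samples in the batch act as negatives."""
    mix = F.normalize(mix_emb, dim=-1)
    drums = F.normalize(drum_emb, dim=-1)
    logits = mix @ drums.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(mix.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# At retrieval time, the same similarity score can rank a drum library against a mixture.
loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```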
Learning Unsupervised Hierarchies of Audio Concepts
International Society for Music Information Retrieval Conference Pub Date : 2022-07-21 DOI: 10.48550/arXiv.2207.11231
Darius Afchar, Romain Hennequin, Vincent Guigue
{"title":"Learning Unsupervised Hierarchies of Audio Concepts","authors":"Darius Afchar, Romain Hennequin, Vincent Guigue","doi":"10.48550/arXiv.2207.11231","DOIUrl":"https://doi.org/10.48550/arXiv.2207.11231","url":null,"abstract":"Music signals are difficult to interpret from their low-level features, perhaps even more than images: e.g. highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was therein proposed to adjust explanations to the right abstraction level (e.g. detect clinical concepts from radiographs). These methods have yet to be used for MIR. In this paper, we adapt concept learning to the realm of music, with its particularities. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), unlike previous work that assumed disentangled concepts. We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, serving as a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies are aligned with both ground-truth hierarchies of concepts -- when available -- and with proxy sources of concept similarity in the general case.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121455636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
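One simple way to expose mutual relationships between learned concepts, assuming each concept has already been summarized as a vector, is agglomerative clustering over pairwise concept distances. The sketch below only illustrates that hierarchisation step with made-up concept vectors; it is not the paper's specific procedure.

```python
# Sketch: once each concept is represented by a vector (e.g., a learned direction in an
# audio embedding space), a hierarchy over concepts can be exposed with agglomerative
# clustering. Concept names and vectors here are toy stand-ins.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

concept_names = ["rock", "metal", "jazz", "piano", "guitar"]   # toy labels
concept_vecs = np.random.randn(len(concept_names), 64)         # stand-in concept vectors

dists = pdist(concept_vecs, metric="cosine")   # pairwise concept dissimilarities
tree = linkage(dists, method="average")        # bottom-up merge order = concept hierarchy
print(dendrogram(tree, labels=concept_names, no_plot=True)["ivl"])  # leaf order
```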
Multi-instrument Music Synthesis with Spectrogram Diffusion
International Society for Music Information Retrieval Conference Pub Date : 2022-06-11 DOI: 10.48550/arXiv.2206.05408
Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel
{"title":"Multi-instrument Music Synthesis with Spectrogram Diffusion","authors":"Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel","doi":"10.48550/arXiv.2206.05408","DOIUrl":"https://doi.org/10.48550/arXiv.2206.05408","url":null,"abstract":"An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fr'echet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130646273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
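The DDPM decoder stage can be illustrated with the standard diffusion training step: corrupt the target spectrogram with Gaussian noise at a random timestep and regress the injected noise, conditioned on the MIDI encoder output. The toy denoiser, shapes, and noise schedule below are assumptions for illustration; the paper's decoder is an encoder-decoder Transformer.

```python
# Sketch of a single DDPM training step for a spectrogram decoder: noise the target
# spectrogram at a random timestep and regress the added noise, conditioned on the
# MIDI encoder output. The denoiser and all shapes are placeholders.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # standard linear noise schedule
alphas_cum = torch.cumprod(1.0 - betas, dim=0)    # cumulative product of (1 - beta_t)

def ddpm_loss(denoiser, spec, midi_cond):
    """spec: (batch, bins, frames) target spectrogram; midi_cond: MIDI encoder output."""
    b = spec.size(0)
    t = torch.randint(0, T, (b,))
    a = alphas_cum[t].view(b, 1, 1)
    noise = torch.randn_like(spec)
    noisy = a.sqrt() * spec + (1 - a).sqrt() * noise   # forward diffusion q(x_t | x_0)
    pred = denoiser(noisy, t, midi_cond)               # predict the injected noise
    return nn.functional.mse_loss(pred, noise)

class ToyDenoiser(nn.Module):                     # stand-in for the real Transformer decoder
    def __init__(self, bins, frames, cond_dim):
        super().__init__()
        self.net = nn.Linear(bins * frames + 1 + cond_dim, bins * frames)
    def forward(self, noisy, t, cond):
        x = torch.cat([noisy.flatten(1), t.float().unsqueeze(1) / T, cond], dim=1)
        return self.net(x).view_as(noisy)

denoiser = ToyDenoiser(bins=64, frames=32, cond_dim=16)
loss = ddpm_loss(denoiser, torch.randn(4, 64, 32), torch.randn(4, 16))
```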
SinTra: Learning an inspiration model from a single multi-track music segment
International Society for Music Information Retrieval Conference Pub Date : 2022-04-21 DOI: 10.48550/arXiv.2204.09917
Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng
{"title":"SinTra: Learning an inspiration model from a single multi-track music segment","authors":"Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng","doi":"10.48550/arXiv.2204.09917","DOIUrl":"https://doi.org/10.48550/arXiv.2204.09917","url":null,"abstract":"In this paper, we propose SinTra, an auto-regressive sequential generative model that can learn from a single multi-track music segment, to generate coherent, aesthetic, and variable polyphonic music of multi-instruments with an arbitrary length of bar. For this task, to ensure the relevance of generated samples and training music, we present a novel pitch-group representation. SinTra, consisting of a pyramid of Transformer-XL with a multi-scale training strategy, can learn both the musical structure and the relative positional relationship between notes of the single training music segment. Additionally, for maintaining the inter-track correlation, we use the convolution operation to process multi-track music, and when decoding, the tracks are independent to each other to prevent interference. We evaluate SinTra with both subjective study and objective metrics. The comparison results show that our framework can learn information from a single music segment more sufficiently than Music Transformer. Also the comparison between SinTra and its variant, i.e., the single-stage SinTra with the first stage only, shows that the pyramid structure can effectively suppress overly-fragmented notes.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130991350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
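A rough sketch of multi-scale training on a single token sequence is shown below: the same segment is viewed at several temporal resolutions, and one autoregressive model is fitted per scale, coarse to fine. A GRU stands in for Transformer-XL, random tokens replace the pitch-group representation, and the cross-scale conditioning of the actual pyramid is omitted, so this is only a schematic of the training strategy, not SinTra itself.

```python
# Schematic of coarse-to-fine multi-scale training on a single token sequence.
# Model classes, scale factors, and the token sequence are assumptions.
import torch
import torch.nn as nn

class TinyAR(nn.Module):
    """A minimal autoregressive model standing in for one Transformer-XL stage."""
    def __init__(self, vocab, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

tokens = torch.randint(0, 128, (1, 256))       # stand-in for a tokenized music segment
scales = [4, 2, 1]                             # coarse-to-fine downsampling factors (assumed)
models = [TinyAR(128) for _ in scales]

for factor, model in zip(scales, models):      # train the coarsest stage first, then finer ones
    seq = tokens[:, ::factor]                  # coarser view of the same segment
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):
        logits = model(seq[:, :-1])            # next-token prediction at this scale
        loss = nn.functional.cross_entropy(logits.transpose(1, 2), seq[:, 1:])
        opt.zero_grad()
        loss.backward()
        opt.step()
```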
Does Track Sequence in User-generated Playlists Matter?
International Society for Music Information Retrieval Conference Pub Date : 2021-11-08 DOI: 10.5072/ZENODO.940616
Harald Schweiger, Emilia Parada-Cabaleiro, M. Schedl
{"title":"Does Track Sequence in User-generated Playlists Matter?","authors":"Harald Schweiger, Emilia Parada-Cabaleiro, M. Schedl","doi":"10.5072/ZENODO.940616","DOIUrl":"https://doi.org/10.5072/ZENODO.940616","url":null,"abstract":"The extent to which the sequence of tracks in music playlists matters to listeners is a disputed question, nevertheless a very important one for tasks such as music recommendation (e. g., automatic playlist generation or continuation). While several user studies already approached this question, results are largely inconsistent. In contrast, in this paper we take a data-driven approach and investigate 704,166 user-generated playlists of a major music streaming provider. In particular, we study the consistency (in terms of variance) of a variety of audio features and metadata between subsequent tracks in playlists, and we relate this variance to the corresponding variance computed on a position-independent set of tracks. Our results show that some features vary on average up to 16% less among subsequent tracks in comparison to position-independent pairs of tracks. Furthermore, we show that even pairs of tracks that lie up to 11 positions apart in the playlist are significantly more consistent in several audio features and genres. Our findings yield a better understanding of how users create playlists and will stimulate further progress in sequential music recommenders.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"55 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130839340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
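The core comparison in the study, variability of features between subsequent tracks versus position-independent pairs, can be reproduced in miniature as below; the feature values are synthetic stand-ins rather than real playlist data.

```python
# Sketch of the core comparison: are audio features more consistent between
# subsequent tracks in a playlist than between randomly paired tracks?
import numpy as np

rng = np.random.default_rng(0)
playlist_features = rng.random((50, 4))   # 50 tracks x 4 audio features (e.g., tempo, energy)

consecutive_diff = np.abs(np.diff(playlist_features, axis=0))         # adjacent track pairs
perm = rng.permutation(len(playlist_features))                        # position-independent pairing
random_diff = np.abs(playlist_features[perm[:-1]] - playlist_features[perm[1:]])

print("mean |diff| consecutive: ", consecutive_diff.mean(axis=0))
print("mean |diff| random pairs:", random_diff.mean(axis=0))
# In the paper's data, some features vary up to ~16% less between consecutive tracks.
```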
Let's agree to disagree: Consensus Entropy Active Learning for Personalized Music Emotion Recognition
International Society for Music Information Retrieval Conference Pub Date : 2021-11-07 DOI: 10.5281/ZENODO.5624399
Juan Sebastián Gómez Cañón, Estefanía Cano, Yi-Hsuan Yang, P. Herrera, E. Gómez
{"title":"Let's agree to disagree: Consensus Entropy Active Learning for Personalized Music Emotion Recognition","authors":"Juan Sebastián Gómez Cañón, Estefanía Cano, Yi-Hsuan Yang, P. Herrera, E. Gómez","doi":"10.5281/ZENODO.5624399","DOIUrl":"https://doi.org/10.5281/ZENODO.5624399","url":null,"abstract":"Previous research in music emotion recognition (MER) has tackled the inherent problem of subjectivity through the use of personalized models – models which predict the emotions that a particular user would perceive from music. Personalized models are trained in a supervised manner, and are tested exclusively with the annotations provided by a specific user. While past research has focused on model adaptation or reducing the amount of annotations required from a given user, we propose a methodology based on uncertainty sampling and query-by-committee, adopting prior knowledge from the agreement of human annotations as an oracle for active learning (AL). We assume that our disagreements define our personal opinions and should be considered for personalization. We use the DEAM dataset, the current benchmark dataset for MER, to pre-train our models. We then use the AMG1608 dataset, the largest MER dataset containing multiple annotations per musical excerpt, to re-train diverse machine learning models using AL and evaluate personalization. Our results suggest that our methodology can be beneficial to produce personalized classification models that exhibit different results depending on the algorithms’ complexity.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130693171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
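A minimal sketch of the query-by-committee idea with a consensus-entropy acquisition criterion is given below: several classifiers are trained on the labeled excerpts, their predicted class distributions are averaged, and the unlabeled excerpts whose consensus distribution has the highest entropy are queried next. The committee members, features, and emotion classes are placeholders, not the authors' pipeline.

```python
# Sketch of consensus-entropy sample selection with a committee of classifiers
# (query-by-committee). Models, features, and labels are toy stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_lab = rng.random((40, 10))               # labeled excerpts (features)
y_lab = np.arange(40) % 4                  # 4 stand-in emotion classes
X_pool = rng.random((200, 10))             # unlabeled excerpts

committee = [
    RandomForestClassifier(random_state=0).fit(X_lab, y_lab),
    LogisticRegression(max_iter=1000).fit(X_lab, y_lab),
    SVC(probability=True, random_state=0).fit(X_lab, y_lab),
]

probs = np.mean([m.predict_proba(X_pool) for m in committee], axis=0)  # consensus distribution
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
query_idx = np.argsort(entropy)[-5:]       # ask the user to annotate these excerpts next
```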
Piano Sheet Music Identification Using Marketplace Fingerprinting
International Society for Music Information Retrieval Conference Pub Date : 2021-11-07 DOI: 10.5281/ZENODO.5624375
Kevin Ji, Daniel Yang, T. Tsai
{"title":"Piano Sheet Music Identification Using Marketplace Fingerprinting","authors":"Kevin Ji, Daniel Yang, T. Tsai","doi":"10.5281/ZENODO.5624375","DOIUrl":"https://doi.org/10.5281/ZENODO.5624375","url":null,"abstract":"This paper studies the problem of identifying piano sheet music based on a cell phone image of all or part of a physical page. We re-examine current best practices for large-scale sheet music retrieval through an economics perspective. In our analogy, the runtime search is like a consumer shopping in a store. The items on the shelves correspond to fingerprints, and purchasing an item corresponds to doing a fingerprint lookup in the database. From this perspective, we show that previous approaches are extremely inefficient marketplaces in which the consumer has very few choices and adopts an irrational buying strategy. The main contribution of this work is to propose a novel fingerprinting scheme called marketplace fingerprinting. This approach redesigns the system to be an efficient marketplace in which the consumer has many options and adopts a rational buying strategy that explicitly considers the cost and expected utility of each item. We also show that de-ciding which fingerprints to include in the database poses a type of minimax problem in which the store and the consumer have competing interests. On experiments using all solo piano sheet music images in IMSLP as a searchable database, we show that marketplace fingerprinting substantially outperforms previous approaches and achieves a mean reciprocal rank of 0 . 905 with sub-second average runtime.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133598657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
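The economics analogy can be made concrete with a toy budgeted-selection sketch: each candidate fingerprint lookup has a cost and an expected utility, and a rational consumer buys the best utility-per-cost items until the budget is spent. All numbers below are made up, and this greedy rule only illustrates the analogy; it is not the paper's actual fingerprint design or retrieval algorithm.

```python
# Toy sketch of the "rational buying strategy" analogy: rank candidate fingerprint
# lookups by expected utility per unit cost and spend a fixed budget greedily.
import numpy as np

rng = np.random.default_rng(0)
cost = rng.uniform(0.5, 2.0, size=100)      # per-fingerprint lookup cost (e.g., candidate list length)
utility = rng.uniform(0.0, 1.0, size=100)   # expected usefulness of the lookup

order = np.argsort(utility / cost)[::-1]    # best value-for-money first
budget, spent, chosen = 20.0, 0.0, []
for i in order:
    if spent + cost[i] <= budget:
        chosen.append(i)
        spent += cost[i]

print(f"bought {len(chosen)} fingerprints, total expected utility {utility[chosen].sum():.2f}")
```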
A case study of deep enculturation and sensorimotor synchronization to real music
International Society for Music Information Retrieval Conference Pub Date : 2021-11-07 DOI: 10.5281/ZENODO.5624537
Olof Misgeld, Torbjörn Gulz, Jura Miniotaite, A. Holzapfel
{"title":"A case study of deep enculturation and sensorimotor synchronization to real music","authors":"Olof Misgeld, Torbjörn Gulz, Jura Miniotaite, A. Holzapfel","doi":"10.5281/ZENODO.5624537","DOIUrl":"https://doi.org/10.5281/ZENODO.5624537","url":null,"abstract":"Synchronization of movement to music is a behavioural capacity that separates humans from most other species. Whereas such movements have been studied using a wide range of methods, only few studies have investigated synchronisation to real music stimuli in a cross-culturally comparative setting. The present study employs beat tracking evaluation metrics and accent histograms to analyze the differences in the ways participants from two cultural groups synchronize their tapping with either familiar or unfamiliar music stimuli. Instead of choosing two apparently remote cultural groups, we selected two groups of musicians that share cultural backgrounds, but that differ regarding the music style they specialize in. The employed method to record tapping responses in audio format facilitates a fine-grained analysis of metrical accents that emerge from the responses. The identified differences between groups are related to the metrical structures inherent to the two musical styles, such as non-isochronicity of the beat, and differences between the groups document the influence of the deep enculturation of participants to their style of expertise. Besides these findings, our study sheds light on a conceptual weakness of a common beat tracking evaluation metric, when applied to human tapping instead of machine generated beat estimations.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123922555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
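A typical beat tracking evaluation metric of the kind repurposed in this study is the F-measure with a tolerance window, commonly around ±70 ms. The hand-rolled sketch below scores tap times against annotated beat times on synthetic data; established libraries such as mir_eval provide reference implementations of such metrics.

```python
# Minimal F-measure-style evaluation of tap times against annotated beat times,
# using a +/-70 ms tolerance window. Data here is synthetic.
import numpy as np

def tap_f_measure(beats, taps, tol=0.070):
    beats, taps = np.asarray(beats, float), np.asarray(taps, float)
    matched_beats = set()
    hits = 0
    for tap in taps:
        idx = np.argmin(np.abs(beats - tap))             # nearest annotated beat
        if abs(beats[idx] - tap) <= tol and idx not in matched_beats:
            matched_beats.add(idx)                       # each beat may be matched once
            hits += 1
    precision = hits / len(taps) if len(taps) else 0.0
    recall = hits / len(beats) if len(beats) else 0.0
    return 2 * precision * recall / (precision + recall) if hits else 0.0

beats = np.arange(0.0, 10.0, 0.5)                                      # beats every 500 ms
taps = beats + np.random.default_rng(0).normal(0, 0.02, beats.size)    # slightly jittered taps
print(round(tap_f_measure(beats, taps), 3))
```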
The Music Performance Markup Format and Ecosystem
International Society for Music Information Retrieval Conference Pub Date : 2021-11-07 DOI: 10.5281/ZENODO.5624429
Axel Berndt
{"title":"The Music Performance Markup Format and Ecosystem","authors":"Axel Berndt","doi":"10.5281/ZENODO.5624429","DOIUrl":"https://doi.org/10.5281/ZENODO.5624429","url":null,"abstract":"Music Performance Markup (MPM) is a new XML format that offers a model-based, systematic approach to describing and analysing musical performances. Its foundation is a set of mathematical models that capture the characteristics of performance features such as tempo, rubato, dynamics, articulations, and metrical accentuations. After a brief introduction to MPM, this paper will put the focus on the infrastructure of documentations, software tools and ongoing development activities around the format.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130941350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
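To give a flavour of what a model-based performance description can capture, the sketch below maps symbolic beat positions to performance time under a linear tempo transition, i.e., a ritardando from 120 to 80 BPM. This is only an illustrative tempo model with assumed parameters; MPM's actual model family and XML encoding are defined in its documentation and are not reproduced here.

```python
# Tiny illustration of a mathematical tempo model: a continuous linear tempo ramp
# from bpm_start to bpm_end over n_beats, mapped from symbolic beat positions to
# performance time in seconds via the integral of 60/tempo.
import math

def beat_to_seconds(beat, n_beats, bpm_start, bpm_end):
    """Time (s) of `beat` in [0, n_beats] under a linear tempo transition."""
    if math.isclose(bpm_start, bpm_end):
        return 60.0 * beat / bpm_start
    slope = (bpm_end - bpm_start) / n_beats
    tempo_here = bpm_start + slope * beat
    return (60.0 / slope) * math.log(tempo_here / bpm_start)

# A 16-beat ritardando from 120 to 80 BPM: later beats take progressively longer.
onsets = [round(beat_to_seconds(b, 16, 120, 80), 3) for b in range(17)]
print(onsets)
```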