{"title":"A Model You Can Hear: Audio Identification with Playable Prototypes","authors":"Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loïc Landrieu","doi":"10.48550/arXiv.2208.03311","DOIUrl":"https://doi.org/10.48550/arXiv.2208.03311","url":null,"abstract":"Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124077463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SampleMatch: Drum Sample Retrieval by Musical Context","authors":"S. Lattner","doi":"10.48550/arXiv.2208.01141","DOIUrl":"https://doi.org/10.48550/arXiv.2208.01141","url":null,"abstract":"Modern digital music production typically involves combining numerous acoustic elements to compile a piece of music. Important types of such elements are drum samples, which determine the characteristics of the percussive components of the piece. Artists must use their aesthetic judgement to assess whether a given drum sample fits the current musical context. However, selecting drum samples from a potentially large library is tedious and may interrupt the creative flow. In this work, we explore the automatic drum sample retrieval based on aesthetic principles learned from data. As a result, artists can rank the samples in their library by fit to some musical context at different stages of the production process (i.e., by fit to incomplete song mixtures). To this end, we use contrastive learning to maximize the score of drum samples originating from the same song as the mixture. We conduct a listening test to determine whether the human ratings match the automatic scoring function. We also perform objective quantitative analyses to evaluate the efficacy of our approach.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114518067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Unsupervised Hierarchies of Audio Concepts","authors":"Darius Afchar, Romain Hennequin, Vincent Guigue","doi":"10.48550/arXiv.2207.11231","DOIUrl":"https://doi.org/10.48550/arXiv.2207.11231","url":null,"abstract":"Music signals are difficult to interpret from their low-level features, perhaps even more than images: e.g. highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was therein proposed to adjust explanations to the right abstraction level (e.g. detect clinical concepts from radiographs). These methods have yet to be used for MIR. In this paper, we adapt concept learning to the realm of music, with its particularities. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), unlike previous work that assumed disentangled concepts. We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, serving as a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies are aligned with both ground-truth hierarchies of concepts -- when available -- and with proxy sources of concept similarity in the general case.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121455636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-instrument Music Synthesis with Spectrogram Diffusion","authors":"Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel","doi":"10.48550/arXiv.2206.05408","DOIUrl":"https://doi.org/10.48550/arXiv.2206.05408","url":null,"abstract":"An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fr'echet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130646273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SinTra: Learning an inspiration model from a single multi-track music segment","authors":"Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng","doi":"10.48550/arXiv.2204.09917","DOIUrl":"https://doi.org/10.48550/arXiv.2204.09917","url":null,"abstract":"In this paper, we propose SinTra, an auto-regressive sequential generative model that can learn from a single multi-track music segment, to generate coherent, aesthetic, and variable polyphonic music of multi-instruments with an arbitrary length of bar. For this task, to ensure the relevance of generated samples and training music, we present a novel pitch-group representation. SinTra, consisting of a pyramid of Transformer-XL with a multi-scale training strategy, can learn both the musical structure and the relative positional relationship between notes of the single training music segment. Additionally, for maintaining the inter-track correlation, we use the convolution operation to process multi-track music, and when decoding, the tracks are independent to each other to prevent interference. We evaluate SinTra with both subjective study and objective metrics. The comparison results show that our framework can learn information from a single music segment more sufficiently than Music Transformer. Also the comparison between SinTra and its variant, i.e., the single-stage SinTra with the first stage only, shows that the pyramid structure can effectively suppress overly-fragmented notes.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130991350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does Track Sequence in User-generated Playlists Matter?","authors":"Harald Schweiger, Emilia Parada-Cabaleiro, M. Schedl","doi":"10.5072/ZENODO.940616","DOIUrl":"https://doi.org/10.5072/ZENODO.940616","url":null,"abstract":"The extent to which the sequence of tracks in music playlists matters to listeners is a disputed question, nevertheless a very important one for tasks such as music recommendation (e. g., automatic playlist generation or continuation). While several user studies already approached this question, results are largely inconsistent. In contrast, in this paper we take a data-driven approach and investigate 704,166 user-generated playlists of a major music streaming provider. In particular, we study the consistency (in terms of variance) of a variety of audio features and metadata between subsequent tracks in playlists, and we relate this variance to the corresponding variance computed on a position-independent set of tracks. Our results show that some features vary on average up to 16% less among subsequent tracks in comparison to position-independent pairs of tracks. Furthermore, we show that even pairs of tracks that lie up to 11 positions apart in the playlist are significantly more consistent in several audio features and genres. Our findings yield a better understanding of how users create playlists and will stimulate further progress in sequential music recommenders.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"55 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130839340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Let's agree to disagree: Consensus Entropy Active Learning for Personalized Music Emotion Recognition","authors":"Juan Sebastián Gómez Cañón, Estefanía Cano, Yi-Hsuan Yang, P. Herrera, E. Gómez","doi":"10.5281/ZENODO.5624399","DOIUrl":"https://doi.org/10.5281/ZENODO.5624399","url":null,"abstract":"Previous research in music emotion recognition (MER) has tackled the inherent problem of subjectivity through the use of personalized models – models which predict the emotions that a particular user would perceive from music. Personalized models are trained in a supervised manner, and are tested exclusively with the annotations provided by a specific user. While past research has focused on model adaptation or reducing the amount of annotations required from a given user, we propose a methodology based on uncertainty sampling and query-by-committee, adopting prior knowledge from the agreement of human annotations as an oracle for active learning (AL). We assume that our disagreements define our personal opinions and should be considered for personalization. We use the DEAM dataset, the current benchmark dataset for MER, to pre-train our models. We then use the AMG1608 dataset, the largest MER dataset containing multiple annotations per musical excerpt, to re-train diverse machine learning models using AL and evaluate personalization. Our results suggest that our methodology can be beneficial to produce personalized classification models that exhibit different results depending on the algorithms’ complexity.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130693171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Piano Sheet Music Identification Using Marketplace Fingerprinting","authors":"Kevin Ji, Daniel Yang, T. Tsai","doi":"10.5281/ZENODO.5624375","DOIUrl":"https://doi.org/10.5281/ZENODO.5624375","url":null,"abstract":"This paper studies the problem of identifying piano sheet music based on a cell phone image of all or part of a physical page. We re-examine current best practices for large-scale sheet music retrieval through an economics perspective. In our analogy, the runtime search is like a consumer shopping in a store. The items on the shelves correspond to fingerprints, and purchasing an item corresponds to doing a fingerprint lookup in the database. From this perspective, we show that previous approaches are extremely inefficient marketplaces in which the consumer has very few choices and adopts an irrational buying strategy. The main contribution of this work is to propose a novel fingerprinting scheme called marketplace fingerprinting. This approach redesigns the system to be an efficient marketplace in which the consumer has many options and adopts a rational buying strategy that explicitly considers the cost and expected utility of each item. We also show that de-ciding which fingerprints to include in the database poses a type of minimax problem in which the store and the consumer have competing interests. On experiments using all solo piano sheet music images in IMSLP as a searchable database, we show that marketplace fingerprinting substantially outperforms previous approaches and achieves a mean reciprocal rank of 0 . 905 with sub-second average runtime.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133598657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case study of deep enculturation and sensorimotor synchronization to real music","authors":"Olof Misgeld, Torbjörn Gulz, Jura Miniotaite, A. Holzapfel","doi":"10.5281/ZENODO.5624537","DOIUrl":"https://doi.org/10.5281/ZENODO.5624537","url":null,"abstract":"Synchronization of movement to music is a behavioural capacity that separates humans from most other species. Whereas such movements have been studied using a wide range of methods, only few studies have investigated synchronisation to real music stimuli in a cross-culturally comparative setting. The present study employs beat tracking evaluation metrics and accent histograms to analyze the differences in the ways participants from two cultural groups synchronize their tapping with either familiar or unfamiliar music stimuli. Instead of choosing two apparently remote cultural groups, we selected two groups of musicians that share cultural backgrounds, but that differ regarding the music style they specialize in. The employed method to record tapping responses in audio format facilitates a fine-grained analysis of metrical accents that emerge from the responses. The identified differences between groups are related to the metrical structures inherent to the two musical styles, such as non-isochronicity of the beat, and differences between the groups document the influence of the deep enculturation of participants to their style of expertise. Besides these findings, our study sheds light on a conceptual weakness of a common beat tracking evaluation metric, when applied to human tapping instead of machine generated beat estimations.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123922555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Music Performance Markup Format and Ecosystem","authors":"Axel Berndt","doi":"10.5281/ZENODO.5624429","DOIUrl":"https://doi.org/10.5281/ZENODO.5624429","url":null,"abstract":"Music Performance Markup (MPM) is a new XML format that offers a model-based, systematic approach to describing and analysing musical performances. Its foundation is a set of mathematical models that capture the characteristics of performance features such as tempo, rubato, dynamics, articulations, and metrical accentuations. After a brief introduction to MPM, this paper will put the focus on the infrastructure of documentations, software tools and ongoing development activities around the format.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130941350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}