{"title":"Deconstruct, Analyse, Reconstruct: How to improve Tempo, Beat, and Downbeat Estimation","authors":"Sebastian Böck, M. Davies","doi":"10.5281/ZENODO.4245498","DOIUrl":"https://doi.org/10.5281/ZENODO.4245498","url":null,"abstract":"In this paper, we undertake a critical assessment of a state-of-the-art deep neural network approach for computational rhythm analysis. Our methodology is to deconstruct this approach, analyse its constituent parts, and then reconstruct it. To this end, we devise a novel multi-task approach for the simultaneous estimation of tempo, beat, and downbeat. In particular, we seek to embed more explicit musical knowledge into the design decisions in building the network. We additionally reflect this outlook when training the network, and include a simple data augmentation strategy to increase the network's exposure to a wider range of tempi, and hence beat and downbeat information. Via an in-depth comparative evaluation, we present state-of-the-art results over all three tasks, with performance increases of up to 6% points over existing systems.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127674713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining musical features for cover detection","authors":"Guillaume Doras, Furkan Yesiler, Joan Serrà, E. Gómez, G. Peeters","doi":"10.5281/ZENODO.4245424","DOIUrl":"https://doi.org/10.5281/ZENODO.4245424","url":null,"abstract":"Recent works have addressed the automatic cover detection problem from a metric learning perspective. They employ different input representations, aiming to exploit melodic or harmonic characteristics of songs and yield promising performances. In this work, we propose a comparative study of these different representations and show that systems combining melodic and harmonic features drastically outperform those relying on a single input representation. We illustrate how these features complement each other with both quantitative and qualitative analyses. We finally investigate various fusion schemes and propose methods yielding state-of-the-art performances on two publicly-available large datasets.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131248254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SuPP & MaPP: Adaptable Structure-Based Representations for MIR Tasks","authors":"C. Savard, Erin H. Bugbee, Melissa R. McGuirl, Katherine M. Kinnaird","doi":"10.5281/ZENODO.4245438","DOIUrl":"https://doi.org/10.5281/ZENODO.4245438","url":null,"abstract":"Accurate and flexible representations of music data are paramount to addressing MIR tasks, yet many of the existing approaches are difficult to interpret or rigid in nature. This work introduces two new song representations for structure-based retrieval methods: Surface Pattern Preservation (SuPP), a continuous song representation, and Matrix Pattern Preservation (MaPP), SuPP’s discrete counterpart. These representations come equipped with several user-defined parameters so that they are adaptable for a range of MIR tasks. Experimental results show MaPP as successful in addressing the cover song task on a set of Mazurka scores, with a mean precision of 0.965 and recall of 0.776. SuPP and MaPP also show promise in other MIR applications, such as novel-segment detection and genre classification, the latter of which demonstrates their suitability as inputs for machine learning problems.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115844261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bistate reduction and comparison of drum patterns","authors":"O. Lartillot, Fred Bruford","doi":"10.5281/ZENODO.4245432","DOIUrl":"https://doi.org/10.5281/ZENODO.4245432","url":null,"abstract":"This paper develops the hypothesis that symbolic drum patterns can be represented in a reduced form as a simple oscillation between two states, a Low state (commonly associated with kick drum events) and a High state (often associated with either snare drum or high hat). Both an onset time and an accent time is associated to each state. The systematic inference of the reduced form is formal-ized. This enables the specification of a rhythmic struc-tural similarity measure on drum patterns, where reduced patterns are compared through alignment. The two-state representation allows a low computational cost alignment, once the complex topological formalization is fully taken into account. A comparison with the Hamming distance, as well as similarity ratings collected from listeners on a drum loop dataset, indicates that the bistate reduction enables to convey subtle aspects that goes beyond surface-level comparison of rhythmic textures.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131555318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method for Analysis of Shared Structure in Large Music Collections using Techniques from Genetic Sequencing and Graph Theory","authors":"F. Thalmann, Kazuyoshi Yoshii, Thomas Wilmering, Geraint A. Wiggins, M. Sandler","doi":"10.5281/ZENODO.4245440","DOIUrl":"https://doi.org/10.5281/ZENODO.4245440","url":null,"abstract":"While common approaches to automatic structural analysis of music typically focus on individual audio files, our approach collates audio features of large sets of related files in order to find a shared musical temporal structure. The content of each individual file and the differences between them can then be described in relation to this shared structure. We first construct a large similarity graph of temporal segments, such as beats or bars, based on self-alignments and selected pair-wise alignments between the given input files. Part of this graph is then partitioned into groups of corresponding segments using multiple sequence alignment. This partitioned graph is searched for recurring sections which can be organized hierarchically based on their co-occurrence. We apply our approach to discover shared harmonic structure in a dataset containing a large number of different live performances of a number of songs. Our evaluation shows that using the joint information from a number of files has the advantage of evening out the noisiness or inaccuracy of the underlying feature data and leads to a robust estimate of shared musical material.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127663662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joyful for you and tender for us: the influence of individual characteristics and language on emotion labeling and classification","authors":"Juan Sebastián Gómez Cañón, Estefanía Cano, P. Herrera, E. Gómez","doi":"10.5281/ZENODO.4245568","DOIUrl":"https://doi.org/10.5281/ZENODO.4245568","url":null,"abstract":"Tagging a musical excerpt with an emotion label may result in a vague and ambivalent exercise. This subjectivity entangles several high-level music description tasks when the computational models built to address them produce predictions on the basis of a \"ground truth\". In this study, we investigate the relationship between emotions perceived in pop and rock music (mainly in Euro-American styles) and personal characteristics from the listener, using language as a key feature. Our goal is to understand the influence of lyrics comprehension on music emotion perception and use this knowledge to improve Music Emotion Recognition (MER) models. We systematically analyze over 30K annotations of 22 musical fragments to assess the impact of individual differences on agreement, as defined by Krippendorff's coefficient. We employ personal characteristics to form group-based annotations by assembling ratings with respect to listeners' familiarity, preference, lyrics comprehension, and music sophistication. Finally, we study our group-based annotations in a two-fold approach: (1) assessing the similarity within annotations using manifold learning algorithms and unsupervised clustering, and (2) analyzing their performance by training classification models with diverse \"ground truths\". Our results suggest that a) applying a broader categorization of taxonomies and b) using multi-label, group-based annotations based on language, can be beneficial for MER models.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126113213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Music and Code Knowledge to Support a Co-creative AI Agent for Education","authors":"J. Smith, E. Truesdell, Jason Freeman, Brian Magerko, K. Boyer, Tom McKlin","doi":"10.5281/ZENODO.4245386","DOIUrl":"https://doi.org/10.5281/ZENODO.4245386","url":null,"abstract":"EarSketch is an online environment for learning intro-ductory computing concepts through code-driven, sample-based music production. This paper details the design and implementation of a module to perform code and music analyses on projects on the EarSketch platform. This analysis module combines inputs in the form of symbolic metadata, audio feature analysis, and user code to produce com-prehensive models of user projects. The module performs a detailed analysis of the abstract syntax tree of a user’s code to model use of computational concepts. It uses music information retrieval (MIR) and symbolic metadata to analyze users’ musical design choices. These analyses produce a model containing users’ coding and musical deci-sions, as well as qualities of the algorithmic music created by those decisions. The models produced by this module will support future development of CAI, a Co-creative Artificial Intelligence. CAI is designed to collaborate with learners and promote increased competency and engagement with topics in the EarSketch curriculum. Our module combines code analysis and MIR to further the educational goals of CAI and EarSketch and to explore the application of multimodal analysis tools to education.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129921562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Less is more: Faster and better music version identification with embedding distillation","authors":"Furkan Yesiler, J. Serrà, E. Gómez","doi":"10.5281/ZENODO.4245570","DOIUrl":"https://doi.org/10.5281/ZENODO.4245570","url":null,"abstract":"Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs). By learning to encode entire recordings into plain vector embeddings, recent systems have made significant progress in bridging the gap between accuracy and scalability, which has been a key challenge for nearly two decades. In this work, we propose to further narrow this gap by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model. We compare a wide range of techniques and propose new ones, from classical dimensionality reduction to more sophisticated distillation schemes. With those, we obtain 99% smaller embeddings that, moreover, yield up to a 3% accuracy increase. Such small embeddings can have an important impact in retrieval time, up to the point of making a real-world system practical on a standalone laptop.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121757269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model","authors":"Taegyun Kwon, Dasaem Jeong, Juhan Nam","doi":"10.5281/ZENODO.4245466","DOIUrl":"https://doi.org/10.5281/ZENODO.4245466","url":null,"abstract":"Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and also they handle the temporal evolution of note states by abstract connections between the state-wise neural networks or using a post-processing module. In this paper, we propose a unified neural network architecture where multiple note states are predicted as a softmax output with a single loss function and the temporal order is learned by an auto-regressive connection within the single neural network. This compact model allows to increase note states without architectural complexity. Using the MAESTRO dataset, we examine various combinations of multiple note states including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-arts with fewer parameters.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116289001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction","authors":"Andrew Mcleod, James Owers, Kazuyoshi Yoshii","doi":"10.5281/ZENODO.4245566","DOIUrl":"https://doi.org/10.5281/ZENODO.4245566","url":null,"abstract":"In this paper, we introduce the MIDI Degradation Toolkit (MDTK), containing functions which take as input a musical excerpt (a set of notes with pitch, onset time, and duration), and return a \"degraded\" version of that excerpt with some error (or errors) introduced. Using the toolkit, we create the Altered and Corrupted MIDI Excerpts dataset version 1.0 (ACME v1.0), and propose four tasks of increasing difficulty to detect, classify, locate, and correct the degradations. We hypothesize that models trained for these tasks can be useful in (for example) improving automatic music transcription performance if applied as a post-processing step. To that end, MDTK includes a script that measures the distribution of different types of errors in a transcription, and creates a degraded dataset with similar properties. MDTK's degradations can also be applied dynamically to a dataset during training (with or without the above script), generating novel degraded excerpts each epoch. MDTK could also be used to test the robustness of any system designed to take MIDI (or similar) data as input (e.g. systems designed for voice separation, metrical alignment, or chord detection) to such transcription errors or otherwise noisy data. The toolkit and dataset are both publicly available online, and we encourage contribution and feedback from the community.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114726662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}