Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction
Matt McVicar, Raúl Santos-Rodríguez, T. D. Bie
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2016-05-19
DOI: https://doi.org/10.1109/ICASSP.2016.7471715
Citations: 8
Abstract
Separating the singing voice from a polyphonic mixed audio signal is a challenging but important task, with a wide range of applications across the music industry and music informatics research. Various methods have been devised over the years, ranging from deep learning approaches to dedicated ad hoc solutions. In this paper, we present a novel machine learning method for the task, using a Conditional Random Field (CRF) approach for structured output prediction. We exploit the diversity of previously proposed approaches by using their predictions as input features to our method, thus effectively developing an ensemble method. Our empirical results demonstrate the potential of integrating predictions from different previously proposed methods into one ensemble method, and additionally show that more complex CRF models generally lead to superior performance.
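The idea of feeding the predictions of existing separators into a CRF can be illustrated with a small sketch. The abstract does not specify the CRF structure, features, or training procedure, so everything below is an assumption made for illustration: a linear chain over time frames with binary labels (1 = vocal, 0 = accompaniment), hand-picked rather than learned weights, a hypothetical `unary_scores` helper that combines three made-up base predictions, and Viterbi decoding for the most likely label sequence. It is not the authors' model.

```python
import numpy as np


def unary_scores(base_preds, weights, bias=0.0):
    """Combine base-method predictions into per-frame label scores.

    base_preds: (n_methods, n_frames) soft vocal-activity values in [0, 1].
    weights:    (n_methods,) ensemble weights (hand-picked here, not learned).
    Returns an (n_frames, 2) array: column 0 scores "accompaniment",
    column 1 scores "vocal".
    """
    s1 = weights @ (base_preds - 0.5) + bias
    return np.stack([-s1, s1], axis=1)


def viterbi(unary, pairwise):
    """Most likely label sequence of a linear-chain model.

    unary:    (n_frames, n_labels) per-frame scores.
    pairwise: (n_labels, n_labels) transition scores, indexed [prev, cur].
    """
    n, k = unary.shape
    delta = np.empty((n, k))
    back = np.zeros((n, k), dtype=int)
    delta[0] = unary[0]
    for t in range(1, n):
        cand = delta[t - 1][:, None] + pairwise  # rows: prev, cols: cur
        back[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + unary[t]
    labels = np.empty(n, dtype=int)
    labels[-1] = delta[-1].argmax()
    for t in range(n - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels


# Made-up outputs of three hypothetical base separators over six frames.
base_preds = np.array([
    [0.9, 0.8, 0.2, 0.9, 0.1, 0.1],
    [0.8, 0.9, 0.6, 0.7, 0.2, 0.1],
    [0.7, 0.9, 0.4, 0.8, 0.3, 0.2],
])
weights = np.ones(3)

unary = unary_scores(base_preds, weights)
# Reward staying in the same label, penalise switching (assumed values).
pairwise = np.array([[0.5, -0.5], [-0.5, 0.5]])

independent = unary.argmax(axis=1)   # frame-by-frame ensemble vote
smoothed = viterbi(unary, pairwise)  # structured (chain) prediction
print(independent.tolist(), smoothed.tolist())
```

In this toy input the frame-by-frame vote flips to "accompaniment" for a single frame in the middle of a vocal passage, while the chain decoding keeps that frame as "vocal" because the transition scores outweigh the weak per-frame evidence. That is the kind of benefit structured output prediction is meant to provide over independent per-frame decisions.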