Jaime Garcia-Martinez, David Diaz-Guerra, Archontis Politis, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas
{"title":"SynthSOD:为管弦乐队音乐源分离开发异构数据集","authors":"Jaime Garcia-Martinez, David Diaz-Guerra, Archontis Politis, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas","doi":"arxiv-2409.10995","DOIUrl":null,"url":null,"abstract":"Recent advancements in music source separation have significantly progressed,\nparticularly in isolating vocals, drums, and bass elements from mixed tracks.\nThese developments owe much to the creation and use of large-scale, multitrack\ndatasets dedicated to these specific components. However, the challenge of\nextracting similarly sounding sources from orchestra recordings has not been\nextensively explored, largely due to a scarcity of comprehensive and clean (i.e\nbleed-free) multitrack datasets. In this paper, we introduce a novel multitrack\ndataset called SynthSOD, developed using a set of simulation techniques to\ncreate a realistic (i.e. using high-quality soundfonts), musically motivated,\nand heterogeneous training set comprising different dynamics, natural tempo\nchanges, styles, and conditions. Moreover, we demonstrate the application of a\nwidely used baseline music separation model trained on our synthesized dataset\nw.r.t to the well-known EnsembleSet, and evaluate its performance under both\nsynthetic and real-world conditions.","PeriodicalId":501284,"journal":{"name":"arXiv - EE - Audio and Speech Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation\",\"authors\":\"Jaime Garcia-Martinez, David Diaz-Guerra, Archontis Politis, Tuomas Virtanen, Julio J. 
Carabias-Orti, Pedro Vera-Candeas\",\"doi\":\"arxiv-2409.10995\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advancements in music source separation have significantly progressed,\\nparticularly in isolating vocals, drums, and bass elements from mixed tracks.\\nThese developments owe much to the creation and use of large-scale, multitrack\\ndatasets dedicated to these specific components. However, the challenge of\\nextracting similarly sounding sources from orchestra recordings has not been\\nextensively explored, largely due to a scarcity of comprehensive and clean (i.e\\nbleed-free) multitrack datasets. In this paper, we introduce a novel multitrack\\ndataset called SynthSOD, developed using a set of simulation techniques to\\ncreate a realistic (i.e. using high-quality soundfonts), musically motivated,\\nand heterogeneous training set comprising different dynamics, natural tempo\\nchanges, styles, and conditions. Moreover, we demonstrate the application of a\\nwidely used baseline music separation model trained on our synthesized dataset\\nw.r.t to the well-known EnsembleSet, and evaluate its performance under both\\nsynthetic and real-world conditions.\",\"PeriodicalId\":501284,\"journal\":{\"name\":\"arXiv - EE - Audio and Speech Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - EE - Audio and Speech Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.10995\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - 
EE - Audio and Speech Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10995","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation
Music source separation has advanced significantly in recent years, particularly in isolating vocals, drums, and bass from mixed tracks. These developments owe much to the creation and use of large-scale multitrack datasets dedicated to those specific components. The challenge of extracting similar-sounding sources from orchestra recordings, however, has not been extensively explored, largely because comprehensive and clean (i.e., bleed-free) multitrack datasets are scarce. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic (i.e., rendered with high-quality soundfonts), musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions. Moreover, we train a widely used baseline music separation model on our synthesized dataset, compare it against the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.
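Source-separation benchmarks of this kind are commonly scored with signal-to-distortion metrics. As a hedged illustration only (the paper's exact evaluation metric is not specified in this abstract), the sketch below computes scale-invariant SDR (SI-SDR), a standard choice in recent separation work, for a synthetic estimate/reference pair:

```python
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant SDR in dB: project the estimate onto the
    reference, then compare target energy to residual energy."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(noise, noise))

# Toy example: a "separated" signal equal to the source plus 10% noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                 # 1 s of audio at 16 kHz
noisy = clean + 0.1 * rng.standard_normal(16000)   # imperfect estimate
print(f"{si_sdr(noisy, clean):.1f} dB")            # roughly 20 dB here
```

Higher SI-SDR indicates a cleaner separation; a bleed-free multitrack dataset such as SynthSOD is exactly what makes reference signals for this kind of metric available.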