{"title":"音乐拼接:使用NMF2D源分离音频拼接","authors":"H. F. Aarabi, G. Peeters","doi":"10.1145/3243274.3243299","DOIUrl":null,"url":null,"abstract":"Musaicing (music mosaicing) aims at reconstructing a target music track by superimposing audio samples selected from a collection. This selection is based on their acoustic similarity to the target. The baseline technique to perform this is concatenative synthesis in which the superposition only occurs in time. Non-Negative Matrix Factorization has also been proposed for this task. In this, a target spectrogram is factorized into an activation matrix and a predefined basis matrix which represents the sample collection. The superposition therefore occurs in time and frequency. However, in both methods the samples used for the reconstruction represent isolated sources (such as bees) and remain unchanged during the musaicing (samples need to be pre-pitch-shifted). This reduces the applicability of these methods. We propose here a variation of the musaicing in which the samples used for the reconstruction are obtained by applying a NMF2D separation algorithm to a music collection (such as a collection of Reggae tracks). Using these separated samples, a second NMF2D algorithm is then used to automatically find the best transposition factors to represent the target. We performed an online perceptual experiment of our method which shows that it outperforms the NMF algorithm when the sources are polyphonic and multi-source.","PeriodicalId":129628,"journal":{"name":"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Music retiler: Using NMF2D source separation for audio mosaicing\",\"authors\":\"H. F. Aarabi, G. Peeters\",\"doi\":\"10.1145/3243274.3243299\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Musaicing (music mosaicing) aims at reconstructing a target music track by superimposing audio samples selected from a collection. This selection is based on their acoustic similarity to the target. The baseline technique to perform this is concatenative synthesis in which the superposition only occurs in time. Non-Negative Matrix Factorization has also been proposed for this task. In this, a target spectrogram is factorized into an activation matrix and a predefined basis matrix which represents the sample collection. The superposition therefore occurs in time and frequency. However, in both methods the samples used for the reconstruction represent isolated sources (such as bees) and remain unchanged during the musaicing (samples need to be pre-pitch-shifted). This reduces the applicability of these methods. We propose here a variation of the musaicing in which the samples used for the reconstruction are obtained by applying a NMF2D separation algorithm to a music collection (such as a collection of Reggae tracks). Using these separated samples, a second NMF2D algorithm is then used to automatically find the best transposition factors to represent the target. 
We performed an online perceptual experiment of our method which shows that it outperforms the NMF algorithm when the sources are polyphonic and multi-source.\",\"PeriodicalId\":129628,\"journal\":{\"name\":\"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3243274.3243299\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3243274.3243299","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Musaicing (music mosaicing) aims at reconstructing a target music track by superimposing audio samples selected from a collection. The selection is based on the samples' acoustic similarity to the target. The baseline technique for this is concatenative synthesis, in which the superposition occurs only in time. Non-Negative Matrix Factorization (NMF) has also been proposed for this task: a target spectrogram is factorized into an activation matrix and a predefined basis matrix that represents the sample collection, so the superposition occurs in both time and frequency. However, in both methods the samples used for the reconstruction represent isolated sources (such as bees) and remain unchanged during the musaicing (samples need to be pre-pitch-shifted), which limits the applicability of these methods. We propose here a variation of musaicing in which the samples used for the reconstruction are obtained by applying an NMF2D separation algorithm to a music collection (such as a collection of Reggae tracks). Using these separated samples, a second NMF2D algorithm is then used to automatically find the best transposition factors to represent the target. An online perceptual experiment shows that our method outperforms the NMF algorithm when the sources are polyphonic and multi-source.
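
To illustrate the baseline NMF musaicing setup described in the abstract (not the authors' NMF2D method), the following is a minimal sketch, assuming a target magnitude spectrogram V and a fixed, predefined basis matrix W built from the sample collection. Only the activation matrix H is updated, using the standard Lee-Seung multiplicative rule for the Euclidean cost; the function name, shapes, and usage comments are illustrative assumptions.

```python
import numpy as np

def nmf_musaicing_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate activations H so that V ~= W @ H, with W held fixed.

    V : (F, T) magnitude spectrogram of the target track.
    W : (F, K) predefined basis matrix (spectra of the sample collection).
    Returns H : (K, T), indicating when each sample is activated and how strongly.
    Standard multiplicative updates (Euclidean cost), applied to H only,
    because the sample collection itself is kept unchanged during musaicing.
    """
    K, T = W.shape[1], V.shape[1]
    rng = np.random.default_rng(0)
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + eps)
    return H

# Hypothetical usage: V_target and W_samples would come from STFT magnitudes.
# H = nmf_musaicing_activations(V_target, W_samples)
# V_mosaic = W_samples @ H   # spectrogram of the reconstructed (musaiced) track
```

Because W is fixed, the factorization can only place the given samples in time; this is exactly the limitation the abstract points out, which the proposed NMF2D variant addresses by additionally searching over transposition (shift) factors.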