Audio Source Separation using Wave-U-Net with Spectral Loss

Varun Patkar, Tanish Parmar, Parth Narvekar, Vedant Pawar, Joanne Gomes
{"title":"Audio Source Separation using Wave-U-Net with Spectral Loss","authors":"Varun Patkar, Tanish Parmar, Parth Narvekar, Vedant Pawar, Joanne Gomes","doi":"10.1109/CSCITA55725.2023.10104853","DOIUrl":null,"url":null,"abstract":"Existing Audio Source Separation models usually operate using magnitude spectrum and neglect the phase information which results in long-range temporal correlations because of its high sampling rates. Audio source separation has been a problem since long and only a handful of solutions have been presented for it. This research work presents a Wave-U-Net architecture with Spectral Loss Function which separates input audio into multiple audio file of different instrument sounds along with vocals. Existing Wave-U-Net Architecture with Mean Square Error (MSE) loss function provides poor quality results due to lack of training on only specific instruments and use of MSE as an evaluation parameter. While commenting about the loss functions, shift invariance is an important aspect that should be taken into consideration. This research work makes use of Spectral Loss Function in coordination with Wave-U-Net architecture, which automatically syncs the phase even if two audio sources are asynchronised. Spectral Loss Function solves the problem of shift invariance. The MUSDB18 Dataset is used to train the proposed model and the results are compared using evaluation metrics such as Signal to Distortion Ratio (SDR). After successful implementation of the Wave-U-Net Architecture with Spectral Loss Function it is observed that the accuracy of the system has been improved significantly.","PeriodicalId":224479,"journal":{"name":"2023 International Conference on Communication System, Computing and IT Applications (CSCITA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Communication System, Computing and IT Applications (CSCITA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSCITA55725.2023.10104853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Existing audio source separation models usually operate on the magnitude spectrum and neglect phase information, and the high sampling rate of audio introduces long-range temporal correlations that are difficult to model. Audio source separation has long been an open problem, and only a handful of solutions have been presented for it. This research work presents a Wave-U-Net architecture with a spectral loss function that separates input audio into multiple audio files, one for each instrument sound along with the vocals. The existing Wave-U-Net architecture with a mean square error (MSE) loss function yields poor-quality results because it is trained only on specific instruments and uses MSE as the evaluation parameter. When discussing loss functions, shift invariance is an important aspect to take into consideration. This research work uses a spectral loss function in combination with the Wave-U-Net architecture, which automatically synchronizes the phase even if two audio sources are out of sync; the spectral loss function thereby solves the problem of shift invariance. The MUSDB18 dataset is used to train the proposed model, and the results are compared using evaluation metrics such as the Signal-to-Distortion Ratio (SDR). After successful implementation of the Wave-U-Net architecture with the spectral loss function, the accuracy of the system is observed to improve significantly.
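The abstract does not give the exact form of the spectral loss. Below is a minimal sketch, assuming a single-resolution STFT-magnitude L1 loss implemented in PyTorch; the n_fft and hop_length values are illustrative choices, not taken from the paper. Because the magnitude spectrum discards phase, a small time offset between the estimate and the target perturbs this loss far less than it perturbs sample-wise MSE, which is the shift-invariance property the abstract refers to.

```python
import torch


def spectral_loss(estimate: torch.Tensor, target: torch.Tensor,
                  n_fft: int = 1024, hop_length: int = 256) -> torch.Tensor:
    """L1 distance between STFT magnitude spectrograms.

    Comparing magnitudes rather than raw samples makes the loss largely
    insensitive to small time shifts, since a shift mostly changes the
    phase, which the magnitude spectrum discards.
    """
    window = torch.hann_window(n_fft, device=estimate.device)
    est_mag = torch.stft(estimate, n_fft, hop_length=hop_length,
                         window=window, return_complex=True).abs()
    tgt_mag = torch.stft(target, n_fft, hop_length=hop_length,
                         window=window, return_complex=True).abs()
    return (est_mag - tgt_mag).abs().mean()


# Demonstration: a 5-sample shift barely moves the spectral loss,
# while sample-wise MSE on the same pair stays large.
x = torch.randn(1, 44100)                # 1 s of noise at 44.1 kHz
x_shifted = torch.roll(x, shifts=5, dims=-1)
print(spectral_loss(x, x_shifted))       # close to 0
print(torch.mean((x - x_shifted) ** 2))  # roughly twice the signal variance
```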
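The abstract reports results in terms of the Signal-to-Distortion Ratio. In its basic form, SDR is the log-ratio of the reference signal's energy to the energy of the estimation error. The sketch below implements that simplified definition in NumPy; note that MUSDB18 benchmarks typically report SDR from the full BSS Eval toolkit (e.g., the museval package), which additionally fits an allowed distortion filter, so this function is an approximation for illustration only.

```python
import numpy as np


def sdr_db(target: np.ndarray, estimate: np.ndarray,
           eps: float = 1e-9) -> float:
    """Simplified Signal-to-Distortion Ratio in dB:
    10 * log10(||target||^2 / ||target - estimate||^2)."""
    signal_energy = np.sum(target ** 2)
    error_energy = np.sum((target - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
```

A higher SDR indicates less residual distortion in the separated stem relative to the reference.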