Detection of Speech Overlapped with Low-Energy Music using Pyknograms
Mrinmoy Bhattacharjee, S. Prasanna, P. Guha
2021 National Conference on Communications (NCC), published 2021-07-27
DOI: 10.1109/NCC52529.2021.9530150 (https://doi.org/10.1109/NCC52529.2021.9530150)
Citations: 1
Abstract
Detecting speech overlapped with music is a challenging task. This work addresses the discrimination of clean speech from speech overlapped with low-energy music. The overlapped signals are generated synthetically. An enhanced spectrogram representation called the Pyknogram, previously used in overlapped speech detection, is explored for this task. Classification is performed using a neural network designed with only convolutional layers. The performance of Pyknograms at various high SNR levels is compared with that of discrete Fourier transform (DFT) based spectrograms. The classification system is benchmarked on three publicly available datasets: GTZAN, Scheirer-Slaney, and MUSAN. The Pyknogram representation with the fully convolutional classifier performs well, both individually and in combination with spectrograms.
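The abstract does not say how the synthetic overlaps are produced. One common approach, shown here only as a minimal sketch and not as the authors' procedure, is to scale the music so that the speech-to-music power ratio matches a chosen SNR before adding the two signals. The function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, and the toy tone and noise merely stand in for real speech and music recordings.

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, music, snr_db):
    """Add `music` to `speech`, scaled so the speech-to-music ratio is `snr_db`.

    Assumption: both inputs are equal-length sequences of float samples.
    Returns the mixture and the gain applied to the music.
    """
    if len(speech) != len(music):
        raise ValueError("speech and music must have the same length")
    p_speech = mean_power(speech)
    p_music = mean_power(music)
    if p_speech == 0.0 or p_music == 0.0:
        raise ValueError("both signals must have non-zero power")
    # Solve 10*log10(p_speech / (gain**2 * p_music)) = snr_db for gain.
    gain = math.sqrt(p_speech / (p_music * 10.0 ** (snr_db / 10.0)))
    mixed = [s + gain * m for s, m in zip(speech, music)]
    return mixed, gain


def measured_snr_db(speech, mixed):
    """SNR of the mixture, treating (mixed - speech) as the interfering music."""
    residual = [y - s for s, y in zip(speech, mixed)]
    return 10.0 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for real recordings: a 200 Hz tone and white noise at 16 kHz.
    speech = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(16000)]
    music = [rng.uniform(-1.0, 1.0) for _ in range(16000)]
    mixed, gain = mix_at_snr(speech, music, snr_db=20.0)
    print(f"music gain: {gain:.4f}")
    print(f"measured SNR (dB): {measured_snr_db(speech, mixed):.3f}")
```

A higher `snr_db` yields a smaller gain, which corresponds to the low-energy music condition the abstract describes. The speech is left unscaled so that the clean and overlapped classes share the same speech level.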