Detecting the Number of Speakers in Speech Mixtures by Human and Machine
T. Maka, Miroslaw Lazoryszczak
2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), published 2018-09-01
DOI: 10.23919/SPA.2018.8563405 (https://doi.org/10.23919/SPA.2018.8563405)
Citations: 2
Abstract
Estimating the sound sources in an acoustic scene, together with their properties, plays an important role in many voice-based interaction systems. Interference between sources can significantly degrade system performance. This paper compares objective and subjective methods for identifying the number of speakers in speech mixtures. The audio data set used for the computational and subjective tests consists of utterances spoken by two to seven simultaneous speakers. Two approaches are applied to the speech mixtures to determine the number of speakers: the first uses spectrogram factorization with the NMF (non-negative matrix factorization) algorithm, and the second is based on perceptual evaluation by a group of listeners. The two techniques are compared in terms of classification accuracy.
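The spectrogram factorization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which NMF variant or cost function the authors used, nor how the factorization is turned into a speaker count, so the code below makes its own assumptions: it uses the standard multiplicative updates for the squared Euclidean cost, works on a small synthetic non-negative matrix standing in for a magnitude spectrogram, and shows only the factorization V ≈ WH. All function and variable names are hypothetical, and plain Python lists are used to keep the example self-contained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (F x T) into W (F x rank) and H (rank x T).

    Assumption: multiplicative updates for the squared Euclidean cost;
    the paper's actual NMF configuration is not given in the abstract.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random initialisation keeps the updates well defined.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Toy "spectrogram": 6 frequency bins x 8 frames built from 2 synthetic sources.
toy_rng = random.Random(1)
w_true = [[toy_rng.random() for _ in range(2)] for _ in range(6)]
h_true = [[toy_rng.random() for _ in range(8)] for _ in range(2)]
v = matmul(w_true, h_true)

w, h, errors = nmf(v, rank=2)
print("initial error:", errors[0], "final error:", errors[-1])
```

In this sketch the columns of `w` play the role of spectral basis vectors and the rows of `h` their activations over time. With a real recording, `v` would be a magnitude spectrogram, and some further decision rule (not described in the abstract) would be needed to map the factorization to an estimated number of speakers.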