Detecting the Number of Speakers in Speech Mixtures by Human and Machine
Authors: T. Maka, Miroslaw Lazoryszczak
Published in: 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2018-09-01
DOI: 10.23919/SPA.2018.8563405
Estimating the sound sources in an acoustic scene, and their properties, plays an important role in many voice-based interaction systems. Interference between sources can significantly degrade system performance. This paper compares objective and subjective methods for identifying the number of speakers in speech mixtures. The audio data set used for the computational and subjective tests consists of utterances spoken by two to seven simultaneous speakers. To determine the number of speakers, two approaches are applied to the speech mixtures: the first uses spectrogram factorization with the non-negative matrix factorization (NMF) algorithm, and the second is based on perceptual evaluation by a group of listeners. The two techniques are compared in terms of classification accuracy.
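
To make the spectrogram factorization step concrete, the following is a minimal sketch of NMF applied to a non-negative matrix standing in for a magnitude spectrogram. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which NMF variant is used or how the factorization is turned into a speaker count. The function names `nmf` and `relative_error`, the choice of multiplicative updates for the Frobenius-norm objective, and the synthetic Gaussian-bump "spectrogram" are all hypothetical choices made for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    This variant is an assumption; the abstract does not specify one.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # time activations
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Relative Frobenius reconstruction error of the factorization."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Synthetic stand-in for a magnitude spectrogram (hypothetical data):
# three spectral templates with peaks at different frequency bins,
# each switched on and off by random non-negative activations.
rng = np.random.default_rng(1)
freq_bins = np.arange(64)
centers = np.array([10, 30, 50])
templates = np.exp(-0.5 * ((freq_bins[:, None] - centers) / 3.0) ** 2)
activations = rng.random((3, 100))
V = templates @ activations  # shape (64, 100)

# Reconstruction error for several candidate ranks.
for rank in range(1, 5):
    W, H = nmf(V, rank)
    print(rank, relative_error(V, W, H))
```

In this toy setting the reconstruction error can be inspected as the rank grows. Whether and how such a quantity relates to the number of speakers in real mixtures is exactly what the paper evaluates, so no decision rule is assumed here.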