An extension to Fisher Linear Semi-Discriminant analysis for Speaker Diarization
Authors: S. Montazzolli, Andre Gustavo Adami, D. Barone
Published in: 2014 International Telecommunications Symposium (ITS), 2014-11-06
DOI: 10.1109/ITS.2014.6947969 (https://doi.org/10.1109/ITS.2014.6947969)
Citations: 1
Abstract
Fisher Linear Semi-Discriminant Analysis is used in speaker diarization to project acoustic features into a discriminative, lower-dimensional space. Because the analysis estimates its scatter matrices from short segments, the projection could be improved by using longer segments, which carry more information. Since a speaker change is more likely to occur during periods of non-speech, we propose using speech segments delimited by the boundaries estimated by a voice activity detection method based on Hidden Markov Models. Using datasets from the NIST Speaker Recognition Evaluations, we show that the estimated segments yield better scatter matrices for the analysis. The results show a 21% relative improvement in Speaker Error Time on the Switchboard corpus used in the evaluations.
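To make the role of the segments concrete, the following is a minimal sketch of how a semi-discriminant projection could be estimated from segment boundaries. It is not the authors' implementation. It assumes one common formulation in which each segment is treated as a single-speaker class, the within-class scatter is accumulated per segment, and the between-class scatter is taken as the total scatter minus the within-class scatter. The names `flsd_projection`, `boundaries`, and `ridge` are hypothetical, and the boundaries are plain inputs here; in the setting described above they would come from the voice activity detector.

```python
import numpy as np


def flsd_projection(features, boundaries, n_components, ridge=1e-6):
    """Estimate a discriminant projection from unlabeled segments.

    features:   (n_frames, n_dims) array of acoustic feature vectors.
    boundaries: list of (start, end) frame indices, end exclusive. Each
                segment is ASSUMED to contain a single speaker.
    Returns an (n_dims, n_components) projection matrix.
    """
    # Keep only segments with at least two frames so a scatter can be computed.
    segments = [features[start:end] for start, end in boundaries if end - start > 1]
    frames = np.vstack(segments)
    dim = frames.shape[1]

    # Total scatter over all frames that belong to some segment.
    centered = frames - frames.mean(axis=0)
    total_scatter = centered.T @ centered

    # Within-class scatter: each segment plays the role of one class.
    within_scatter = np.zeros((dim, dim))
    for seg in segments:
        seg_centered = seg - seg.mean(axis=0)
        within_scatter += seg_centered.T @ seg_centered

    # Between-class scatter follows from the decomposition S_t = S_w + S_b.
    between_scatter = total_scatter - within_scatter

    # Small ridge term (an assumption of this sketch) keeps S_w invertible.
    within_scatter += ridge * np.trace(within_scatter) / dim * np.eye(dim)

    # Solve S_b v = lambda * S_w v by whitening with the Cholesky factor of S_w.
    chol = np.linalg.cholesky(within_scatter)
    half = np.linalg.solve(chol, between_scatter)
    whitened = np.linalg.solve(chol, half.T)
    whitened = (whitened + whitened.T) / 2.0  # remove numerical asymmetry

    eigvals, eigvecs = np.linalg.eigh(whitened)
    top = np.argsort(eigvals)[::-1][:n_components]

    # Map the eigenvectors back from the whitened space.
    return np.linalg.solve(chol.T, eigvecs[:, top])
```

Projected features are then obtained as `features @ W`, where `W` is the returned matrix. Under this formulation, longer segments contribute more frames to each per-segment mean and scatter term, which is the intuition behind replacing short fixed-length segments with segments bounded by detected non-speech.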