Srikanth Raj Chetupalli, T. Sreenivas, Anand Gopalakrishnan
2019 National Conference on Communications (NCC), pp. 1-5. Published 2019-02-01. DOI: 10.1109/NCC.2019.8732210 (https://doi.org/10.1109/NCC.2019.8732210)
Comparison of low-dimension speech segment embeddings: Application to speaker diarization
Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as hierarchical agglomerative clustering, are traditionally used for segment clustering. In this paper, we consider the top-down approach to clustering, in which a speaker-sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low-dimensional segment representation: (i) multi-dimensional scaling (MDS) based on segment-to-segment stochastic distances; (ii) traditional principal component analysis (PCA) of GMM mean super-vectors; and (iii) factor analysis (i-vectors) of GMM mean super-vectors. We found that MDS-based embeddings yield a better representation, and hence better diarization performance, than PCA and even i-vector embeddings.
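
The top-down pipeline described in the abstract (pairwise segment distances, then an MDS embedding, then GMM clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: each segment is modeled by a single diagonal Gaussian, the stochastic distance is a symmetric KL divergence, the embedding uses classical (Torgerson) MDS, the GMM has spherical covariances with a farthest-point initialization, and the data are synthetic two-speaker features. The function names `symmetric_kl`, `classical_mds`, and `gmm_cluster` are hypothetical.

```python
import numpy as np

def symmetric_kl(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal Gaussians (assumed distance)."""
    d2 = (mu1 - mu2) ** 2
    return 0.5 * np.sum(var1 / var2 + var2 / var1 - 2.0
                        + d2 * (1.0 / var1 + 1.0 / var2))

def classical_mds(dist, n_dims):
    """Classical (Torgerson) MDS: double-center squared distances, keep top eigenvectors."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_dims]
    # The dissimilarity need not be a metric, so clip negative eigenvalues.
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def gmm_cluster(X, k, n_iter=50):
    """EM for a spherical-covariance GMM; returns the most likely component per point."""
    n, d = X.shape
    # Deterministic farthest-point initialization of the component means.
    means = [X[0]]
    for _ in range(k - 1):
        nearest = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(nearest)])
    means = np.array(means)
    var = np.full(k, X.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities computed in the log domain for stability.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(weights) - 0.5 * (sq / var + d * np.log(2 * np.pi * var))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variance.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
    return resp.argmax(axis=1)

# Synthetic stand-in for speech features: 2 speakers, 10 segments each,
# 200 frames of 5-dimensional features per segment.
rng = np.random.default_rng(0)
speaker_means = [np.zeros(5), np.full(5, 3.0)]
true_speaker = np.repeat([0, 1], 10)
segments = [rng.normal(speaker_means[s], 1.0, size=(200, 5)) for s in true_speaker]

# Per-segment Gaussian statistics, then the pairwise distance matrix.
stats = [(seg.mean(axis=0), seg.var(axis=0)) for seg in segments]
n_seg = len(stats)
dist = np.zeros((n_seg, n_seg))
for i in range(n_seg):
    for j in range(i + 1, n_seg):
        dist[i, j] = dist[j, i] = symmetric_kl(*stats[i], *stats[j])

emb = classical_mds(dist, n_dims=2)   # low-dimensional "speaker space"
labels = gmm_cluster(emb, k=2)        # one cluster per hypothesized speaker
print(labels)
```

On this toy data the two speakers are far apart, so the embedding separates them along its first dimension and the clustering is easy; real diarization data would need a known or estimated speaker count and a distance suited to short, noisy segments.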