低维语音片段嵌入的比较:在说话人拨号化中的应用

2019 National Conference on Communications (NCC) Pub Date : 2019-02-01 DOI:10.1109/NCC.2019.8732210

Srikanth Raj Chetupalli, T. Sreenivas, Anand Gopalakrishnan

{"title":"低维语音片段嵌入的比较:在说话人拨号化中的应用","authors":"Srikanth Raj Chetupalli, T. Sreenivas, Anand Gopalakrishnan","doi":"10.1109/NCC.2019.8732210","DOIUrl":null,"url":null,"abstract":"Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as, hierarchical agglomerative clustering technique are used traditionally for segment clustering. In this paper, we consider the top-down approach to clustering, in which a speaker sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low dimension segment representation: (i) multi-dimensional scaling (MDS) based on segment to segment stochastic distances; (ii) traditional principal component analysis (PCA), and (iii) factor analysis (i-vectors), of GMM mean super-vectors. We found that, MDS based embeddings result in better representation and hence result in better diarization performance compared to PCA and even i-vector embeddings.","PeriodicalId":6870,"journal":{"name":"2019 National Conference on Communications (NCC)","volume":"103 1","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of low-dimension speech segment embeddings: Application to speaker diarization\",\"authors\":\"Srikanth Raj Chetupalli, T. Sreenivas, Anand Gopalakrishnan\",\"doi\":\"10.1109/NCC.2019.8732210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as, hierarchical agglomerative clustering technique are used traditionally for segment clustering. In this paper, we consider the top-down approach to clustering, in which a speaker sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low dimension segment representation: (i) multi-dimensional scaling (MDS) based on segment to segment stochastic distances; (ii) traditional principal component analysis (PCA), and (iii) factor analysis (i-vectors), of GMM mean super-vectors. We found that, MDS based embeddings result in better representation and hence result in better diarization performance compared to PCA and even i-vector embeddings.\",\"PeriodicalId\":6870,\"journal\":{\"name\":\"2019 National Conference on Communications (NCC)\",\"volume\":\"103 1\",\"pages\":\"1-5\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 National Conference on Communications (NCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NCC.2019.8732210\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 National Conference on Communications (NCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NCC.2019.8732210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

分词聚类是无监督说话人划分的关键步骤。传统的分段聚类方法采用自底向上的方法，如分层聚类技术。本文采用自顶向下的聚类方法，首先获得说话人敏感的低维片段(说话人空间)表示，然后基于高斯混合模型(GMM)进行聚类。我们探索了三种获得低维段表示的方法:(i)基于段到段随机距离的多维缩放(MDS);(ii) GMM均值超向量的传统主成分分析(PCA)和(iii)因子分析(i-vectors)。我们发现，与PCA甚至i向量嵌入相比，基于MDS的嵌入具有更好的表示效果，因此具有更好的diarization性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Comparison of low-dimension speech segment embeddings: Application to speaker diarization

Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as, hierarchical agglomerative clustering technique are used traditionally for segment clustering. In this paper, we consider the top-down approach to clustering, in which a speaker sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low dimension segment representation: (i) multi-dimensional scaling (MDS) based on segment to segment stochastic distances; (ii) traditional principal component analysis (PCA), and (iii) factor analysis (i-vectors), of GMM mean super-vectors. We found that, MDS based embeddings result in better representation and hence result in better diarization performance compared to PCA and even i-vector embeddings.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 National Conference on Communications (NCC)

自引率

0.00%

发文量