Spoken language clustering in the i-vectors space
Stanisław Kacprzak
2017 International Conference on Systems, Signals and Image Processing (IWSSIP), May 2017
DOI: 10.1109/IWSSIP.2017.7965607
Citations: 1
Abstract
This paper presents the results of spoken language clustering in the i-vector space. The method determines, in an unsupervised manner, how many languages a data set contains and which recordings share the same language. The densest i-vector clusters are found by applying the DBSCAN algorithm in a low-dimensional space obtained with the t-SNE method. The clustering quality of spherical k-means and of the proposed method is evaluated on data from the NIST 2015 i-Vector Challenge, and the usefulness of the obtained clustering is assessed with the challenge's evaluation system. The results show that the proposed method finds 109 dense clusters with low impurity for the 50 target languages.
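The density-based step described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation. It assumes the t-SNE projection has already produced 2-D points (for instance with scikit-learn's `sklearn.manifold.TSNE`) and implements textbook DBSCAN with the Python standard library. It scores the result with an impurity measure defined here as one minus cluster purity, with noise points excluded. That impurity definition, the helper names `region_query`, `dbscan` and `impurity`, the toy data, and the `eps`/`min_pts` values are all assumptions for illustration.

```python
import math
from collections import Counter

# Illustrative sketch (assumption): textbook DBSCAN on already-embedded 2-D
# points plus an impurity score; this is not the paper's code.
NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], i included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """O(n^2) DBSCAN: one cluster id per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


def impurity(cluster_labels, true_labels):
    """Assumed definition: 1 - purity, i.e. the fraction of clustered points whose
    language differs from their cluster's majority language. Noise is ignored."""
    members = {}
    for c, y in zip(cluster_labels, true_labels):
        if c != NOISE:
            members.setdefault(c, []).append(y)
    total = sum(len(ys) for ys in members.values())
    if total == 0:
        raise ValueError("no points were assigned to a cluster")
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return 1 - majority / total


if __name__ == "__main__":
    # Hypothetical toy data standing in for a 2-D t-SNE embedding of i-vectors.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
           (10.0, 0.0)]
    langs = ["eng", "eng", "eng", "pol", "fra", "fra", "fra", "eng"]
    labels = dbscan(pts, eps=0.5, min_pts=3)
    print(labels)                             # [0, 0, 0, 0, 1, 1, 1, -1]
    print(len(set(labels) - {NOISE}))         # 2
    print(round(impurity(labels, langs), 3))  # 0.143
```

DBSCAN does not take the number of clusters as input, so the count of non-noise labels is itself an estimate of how many dense groups the data contains. It is not necessarily the number of languages: the abstract reports 109 dense clusters for 50 languages, so several clusters can share one language. In practice `eps` and `min_pts` must be tuned on the embedded data. scikit-learn's `sklearn.cluster.DBSCAN` offers an optimized implementation of the same algorithm.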