Text-dependent speaker identification using LPC and DTW for Thai language
C. Wutiwiwatchai, V. Achariyakulporn, C. Tanprasert
Proceedings of IEEE. IEEE Region 10 Conference. TENCON 99. 'Multimedia Technology for Asia-Pacific Information Infrastructure' (Cat. No.99CH37030)
Publication date: 1999-09-15
DOI: https://doi.org/10.1109/TENCON.1999.818504
Citation count: 14
Abstract
This paper proposes a text-dependent speaker identification system for the Thai language. The spoken text consists of the isolated digits 0-9 and their concatenations. Linear prediction coefficients (LPC) are extracted and assembled into feature vectors representing each speech signal. Dynamic time warping (DTW) is used to measure the distances between reference vectors and the vectors under evaluation. These distances, which indicate how close the unknown vectors are to the references, are combined with the K-nearest neighbor (KNN) decision technique to decide which speaker the unknown vectors belong to. The experimental results show that the best identification rate for a single digit is 95.83%, and that the highest rates for concatenated digits of top-3, top-5, and top-7 are 98.75%, 100%, and 99.20%, respectively.
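The DTW-plus-KNN decision stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the local distance, the DTW step pattern, the LPC order, or the value of K, so the Euclidean frame distance, the symmetric three-way step pattern, the default `k=3`, and the tie-breaking rule below are all assumptions. The function names (`frame_distance`, `dtw_distance`, `identify_speaker`) are hypothetical, and the LPC extraction step is omitted, so each utterance is assumed to arrive already as a sequence of feature vectors.

```python
import math
from collections import Counter


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses the basic step pattern (i-1, j), (i, j-1), (i-1, j-1) with no
    slope constraint or path-length normalisation (both are assumptions).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def identify_speaker(unknown, references, k=3):
    """Return the speaker label chosen by majority vote over the k nearest references.

    `references` is a list of (speaker_label, sequence) pairs. Ties are broken
    in favour of the tied speaker with the closest single reference (assumption).
    """
    scored = [(dtw_distance(unknown, seq), label) for label, seq in references]
    ranked = sorted(scored, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in ranked)
    best_count = max(votes.values())
    for _, label in ranked:  # ranked is ordered nearest first
        if votes[label] == best_count:
            return label


# Toy usage with one-dimensional "feature vectors" standing in for LPC frames.
toy_references = [
    ("speaker_A", [[0.0], [1.0], [2.0]]),
    ("speaker_A", [[0.0], [1.0], [1.0], [2.0]]),
    ("speaker_B", [[5.0], [6.0], [7.0]]),
    ("speaker_B", [[5.0], [5.0], [6.0], [7.0]]),
]
toy_unknown = [[0.0], [0.0], [1.0], [2.0]]
print(identify_speaker(toy_unknown, toy_references, k=3))
```

DTW is what lets utterances of different durations be compared: the toy unknown utterance has four frames while one reference has three, and the warping path absorbs the repeated frame at no cost.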