Piotr Żelasko, B. Ziółko, T. Jadczyk, Tomasz Pedzimaz
2015 IEEE 2nd International Conference on Cybernetics (CYBCONF), published 2015-06-24. DOI: https://doi.org/10.1109/CYBConf.2015.7175941
Linguistically motivated tied-state triphones for Polish speech recognition
This paper presents one possible approach to building a triphone model for automatic speech recognition of Polish. Even though classifiers are well developed and well described, the task is not trivial, because training data is scarce and the computation time needed to train the model matters. To overcome this problem, some states are typically tied using data-driven criteria. We investigate a linguistically motivated approach, in which phonetically related contexts are tied. We compared the recognition results of a system using this approach with those of a system with no context tying on around 15,000 utterances. The results indicate a small improvement in system performance.
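To make the idea of tying phonetically related contexts concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give their phonetic grouping, so the `PHONE_CLASS` table, the HTK-style `l-c+r` triphone notation, and the helper names `tied_label` and `tie_contexts` are all assumptions made for illustration. The sketch replaces the left and right context of each triphone with a broad phonetic class, so triphones whose contexts fall into the same classes end up sharing one model label.

```python
from collections import defaultdict

# Hypothetical broad phonetic classes, chosen only for illustration.
# The grouping actually used in the paper is not stated in the abstract.
PHONE_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop", "k": "stop", "g": "stop",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "z": "fricative", "f": "fricative", "v": "fricative",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
}


def tied_label(triphone: str) -> str:
    """Map an 'l-c+r' triphone to a label shared by phonetically related contexts."""
    left, rest = triphone.split("-")
    center, right = rest.split("+")
    # Phones missing from the table keep their own identity (no tying).
    left_cls = PHONE_CLASS.get(left, left)
    right_cls = PHONE_CLASS.get(right, right)
    return f"{left_cls}-{center}+{right_cls}"


def tie_contexts(triphones):
    """Group triphones that would share parameters under the class-based tying."""
    groups = defaultdict(list)
    for t in triphones:
        groups[tied_label(t)].append(t)
    return dict(groups)


groups = tie_contexts(["p-a+t", "b-a+d", "k-a+n", "s-a+t"])
for label, members in groups.items():
    print(label, members)
```

In this toy input, `p-a+t` and `b-a+d` land in the same group because both have stop consonants on either side of `a`, which means fewer distinct models have to be trained from the same amount of data. A real system would tie at the level of HMM states rather than whole triphone labels.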