Ali Orkan Bayer, Tolga Çiloğlu†, Meltem Turhan Yöndem
Department of Computer Engineering; †Department of Electrical and Electronics Engineering
2006 IEEE 14th Signal Processing and Communications Applications. Published 2006-04-17. DOI: 10.1109/SIU.2006.1659779
Investigation of Different Language Models for Turkish Speech Recognition
Large vocabulary continuous speech recognition achieves high accuracy for languages such as English that do not have a rich morphological structure. For agglutinative languages, however, the performance of these systems is very low. The major reason is that language models built on words do not perform well for agglutinative languages. In this study, three language models that take the structure of agglutinative languages into account are investigated. Two of the models use subword units as the units of language modeling: the first uses only the stems of words, and the second uses stems and endings as separate units. The third model first places words into classes based on word co-occurrences and then uses these classes as the units of the language model. The performance of the models is tested using two-stage decoding: in the first stage, lattices are generated with bigram models, and in the second stage, trigram models are used for recognition over these lattices. The study shows that the vocabulary coverage of the system seriously affects recognition performance. For this reason, the model that uses stems and endings as modeling units performs better, since its vocabulary coverage is higher. In addition, a decoder that can perform single-pass decoding over these models is believed to increase recognition performance.
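The link between modeling unit and vocabulary coverage can be illustrated with a small sketch. This is not the authors' implementation: the word list is toy data invented for illustration, the `+` boundary marker stands in for the output of a morphological analyzer, and the names `split_units`, `build_vocab`, and `coverage` are hypothetical. It only shows why splitting words into stems and endings can cover unseen word forms that a word-level vocabulary misses.

```python
from collections import Counter

def split_units(word, mode):
    """Turn one pre-segmented word such as 'ev+ler' into modeling units.

    The '+' marker is an assumption of this sketch; a real system would
    obtain the stem/ending boundary from a morphological analyzer.
    """
    stem, _, ending = word.partition("+")
    if mode == "word":
        return [stem + ending]
    if mode == "stem+ending":
        # Keep the '+' on the ending so stems and endings stay distinct units.
        return [stem, "+" + ending] if ending else [stem]
    raise ValueError(f"unknown mode: {mode}")

def build_vocab(train_words, mode, max_size):
    """Keep the max_size most frequent units seen in the training words."""
    counts = Counter(u for w in train_words for u in split_units(w, mode))
    return {unit for unit, _ in counts.most_common(max_size)}

def coverage(test_words, vocab, mode):
    """Fraction of test units that are present in the vocabulary."""
    units = [u for w in test_words for u in split_units(w, mode)]
    return sum(u in vocab for u in units) / len(units)

# Toy data (invented for illustration, not from the paper).
train = ["ev+de", "ev+ler", "okul+da", "okul+lar", "kitap+lar", "araba+da"]
test = ["araba+lar", "okul+da", "ev+ler", "kitap+ta"]

word_cov = coverage(test, build_vocab(train, "word", 100), "word")
sub_cov = coverage(test, build_vocab(train, "stem+ending", 100), "stem+ending")

print(f"word units:        {word_cov:.3f}")
print(f"stem+ending units: {sub_cov:.3f}")
```

In this toy setup the unseen form `arabalar` is out of vocabulary for the word model, but both of its parts (`araba` and `+lar`) were seen in other words, so the stem-and-ending vocabulary covers it. Only the unseen ending `+ta` remains uncovered.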