Authors: Ali Sadiqui, Ahmed Zinedine
Venue: 2014 Third IEEE International Colloquium in Information Science and Technology (CIST)
Publication date: 2014-10-01
DOI: 10.1109/CIST.2014.7016635 (https://doi.org/10.1109/CIST.2014.7016635)
A new method to construct a statistical model for Arabic language
Language models are a key component of modern automatic language processing systems. In this study we present a new approach to building a statistical language model of Arabic for non-vocalized texts. The approach makes it possible to overcome the morphological complexity of Arabic and to address the limitations of existing morphological analyzers. The classic approach adopted by most morphological analyzers takes each word out of its context and therefore generates several segmentation options. Our solution uses lattices (trellises) both to retain the segmentation possibilities generated by the morphological analyzer and then to build the language model. To implement this solution, we used the following tools: AraMorph, Lattice-Tool from the SRILM toolkit, and AT & WSF. The language model was estimated from a corpus of 100 K words and tested on a corpus of 7 K words. The results and their analysis are presented in this paper.
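The idea of keeping every segmentation in a lattice, rather than committing to one, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which relies on AraMorph and SRILM); it is a toy illustration under stated assumptions: the `TOY_ANALYSES` table is hypothetical stand-in analyzer output in Latin transliteration, all lattice paths are weighted uniformly, and the function names are invented for this example. Each path contributes fractional bigram counts, from which unsmoothed probabilities are estimated.

```python
from collections import defaultdict
from itertools import product

# Hypothetical analyzer output (assumption): each surface word maps to its
# candidate segmentations. Toy transliterated strings, not real AraMorph output.
TOY_ANALYSES = {
    "wktb": [["w+", "ktb"], ["wktb"]],
    "alwld": [["al+", "wld"]],
}

def sentence_lattice(words, analyses):
    """One slot per word; each slot lists the alternative morpheme sequences."""
    return [analyses.get(w, [[w]]) for w in words]

def expected_bigram_counts(lattice):
    """Fractional bigram counts over all lattice paths, weighted uniformly
    (uniform weighting is an assumption of this sketch)."""
    counts = defaultdict(float)
    paths = list(product(*lattice))
    weight = 1.0 / len(paths)
    for path in paths:
        tokens = ["<s>"] + [m for seg in path for m in seg] + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[(prev, cur)] += weight
    return counts

def bigram_prob(counts, prev, cur):
    """Maximum-likelihood P(cur | prev) from fractional counts, no smoothing."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts.get((prev, cur), 0.0) / total if total else 0.0

lattice = sentence_lattice(["wktb", "alwld"], TOY_ANALYSES)
counts = expected_bigram_counts(lattice)
```

Here the ambiguous first word yields two paths, so the bigrams starting each path receive half a count each, while bigrams shared by both paths receive a full count. Enumerating paths is only practical for short sentences; a real system would use a forward-backward pass over the lattice and a smoothed estimator.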