{"title":"Statistical Tool for Arabic Text","authors":"Fayçal Imedjdouben","doi":"10.1109/NTIC55069.2022.10100607","journal":{"name":"2022 2nd International Conference on New Technologies of Information and Communication (NTIC)"},"publicationDate":"2022-12-21","ListUrlMain":"https://doi.org/10.1109/NTIC55069.2022.10100607"}
We present a statistical tool for the Arabic language. The tool works on text encoded according to the Unicode standard and was implemented in the MATLAB environment. Statistical processing of Arabic is a fundamental step in building and analyzing Arabic corpora for applications such as speech synthesis, speech recognition, and natural language processing. The system takes a sequence of diacritized Arabic text as input and converts it into Unicode-coded data so that the base of statistical rules we have developed can process it. For the processed text, the tool reports useful information such as the number of words, the occurrence frequency of each grapheme, and the occurrence frequency of the syllable types CV, CVV, and CVC.
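The abstract does not describe the rule base itself, so the following is only a minimal sketch of how such statistics could be computed from diacritized, Unicode-encoded text. It is written in Python rather than the MATLAB used for the actual tool, and every name in it (`cv_skeleton`, `syllable_types`, `text_statistics`) is hypothetical. The syllabification rules are simplifying assumptions: a letter carrying a short vowel is CV, sukun leaves a bare C, shadda doubles the consonant, tanwin adds a vowel plus /n/, and an unmarked alif, waw, or ya after a vowel lengthens it. Cases such as hamzat al-wasl, the assimilated definite article, and alif after tanwin are not handled.

```python
from collections import Counter

# Arabic diacritics, by Unicode code point.
SHORT_VOWELS = {"\u064E", "\u064F", "\u0650"}   # fatha, damma, kasra
TANWIN = {"\u064B", "\u064C", "\u064D"}         # fathatan, dammatan, kasratan
SHADDA = "\u0651"
SUKUN = "\u0652"
DIACRITICS = SHORT_VOWELS | TANWIN | {SHADDA, SUKUN}
LONG_VOWEL_LETTERS = {"\u0627", "\u0648", "\u064A"}  # alif, waw, ya


def is_arabic_letter(ch):
    """True for base letters U+0621..U+064A, excluding tatweel (U+0640)."""
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def cv_skeleton(word):
    """Map one diacritized word to a string over {C, V} (assumed rules)."""
    units = []  # each unit: (base letter, set of diacritics attached to it)
    for ch in word:
        if ch in DIACRITICS:
            if units:
                units[-1][1].add(ch)
        elif is_arabic_letter(ch):
            units.append((ch, set()))
    skeleton = ""
    for letter, marks in units:
        if not marks and letter in LONG_VOWEL_LETTERS and skeleton.endswith("V"):
            skeleton += "V"  # unmarked alif/waw/ya lengthens the previous vowel
            continue
        skeleton += "CC" if SHADDA in marks else "C"
        if marks & SHORT_VOWELS:
            skeleton += "V"
        elif marks & TANWIN:
            skeleton += "VC"  # tanwin: short vowel followed by /n/
    return skeleton


def syllable_types(skeleton):
    """Split a C/V skeleton into syllable patterns such as CV, CVV, CVC."""
    syllables = []
    i, n = 0, len(skeleton)
    while i < n:
        if skeleton[i] != "C" or i + 1 >= n or skeleton[i + 1] != "V":
            i += 1  # segment without a vowel nucleus: skipped in this sketch
            continue
        j = i + 2
        while j < n and skeleton[j] == "V":
            j += 1  # long vowel
        # A consonant closes the syllable unless a vowel follows it.
        while j < n and skeleton[j] == "C" and (j + 1 == n or skeleton[j + 1] != "V"):
            j += 1
        syllables.append(skeleton[i:j])
        i = j
    return syllables


def text_statistics(text):
    """Word count, grapheme frequencies, and syllable-type frequencies."""
    words = text.split()
    graphemes = Counter(
        ch for ch in text if is_arabic_letter(ch) or ch in DIACRITICS
    )
    syllables = Counter()
    for word in words:
        syllables.update(syllable_types(cv_skeleton(word)))
    return {"words": len(words), "graphemes": graphemes, "syllables": syllables}
```

Calling `text_statistics` on a diacritized string returns the three kinds of figures the abstract lists. Here graphemes are counted per code point, so letters and diacritics are tallied separately; whether the original tool does the same is not stated in the source.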