M. B. Mohamed, Sarra Zrigui, A. Zouaghi, M. Zrigui
N-scheme model: An approach towards reducing Arabic language sparseness
2015 5th International Conference on Information & Communication Technology and Accessibility (ICTA), December 2015
DOI: 10.1109/ICTA.2015.7426895
Citations: 4
Abstract
In addition to the usual characteristics of natural languages, such as implicitness, ambiguity, and imprecision, Arabic is known for its sparseness, which explains the difficulty of processing it automatically. On the other hand, Arabic has an interesting property: lemmas are generated by derivation from roots and schemes. Schemes are molds that change the form of a root through elongation, repetition, or the addition of characters, and a scheme can also contribute meaning to the generated word. In this work we studied the statistical characteristics of Arabic at the level of schemes and highlighted that sparseness is attenuated at this level. We then explored the possibility of building natural language processing tools for Arabic that rely on schemes, and found that schemes have great potential for building accurate tools. Relying entirely or partially on schemes, we built an n-scheme statistical model and a text classification system.
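The root-and-scheme derivation and the idea of an n-scheme model can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes triliteral roots, writes schemes in the traditional notation where ف, ع and ل stand for the three root radicals, and uses a small hand-built lexicon (hypothetical) that maps each derived word back to its scheme, whereas real text would need a morphological analyzer to recover schemes. The helper names `apply_scheme`, `build_lexicon`, `to_schemes`, `bigrams` and `bigram_mle` are hypothetical. On a toy token stream, six distinct words collapse to two schemes and five distinct word bigrams collapse to two scheme bigrams, showing in miniature why counts are less sparse at the scheme level.

```python
from collections import Counter

# Traditional scheme notation writes the three root radicals as ف, ع, ل (f-ʿ-l).
RADICALS = "فعل"


def apply_scheme(root: str, scheme: str) -> str:
    """Pour a triliteral root into a scheme (mold) to derive a word.

    Each radical placeholder in the scheme is replaced by the matching root
    letter; every other scheme letter (prefixes, long vowels, suffixes) is kept.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    # str.translate substitutes all three placeholders simultaneously.
    return scheme.translate(str.maketrans(RADICALS, root))


def build_lexicon(roots, schemes):
    """Hypothetical toy lexicon: map each derived word back to its scheme."""
    return {apply_scheme(r, s): s for r in roots for s in schemes}


def to_schemes(tokens, lexicon):
    """Replace each word by its scheme; words missing from the lexicon are kept."""
    return [lexicon.get(t, t) for t in tokens]


def bigrams(seq):
    """Consecutive pairs of a sequence."""
    return list(zip(seq, seq[1:]))


def bigram_mle(seq):
    """Maximum-likelihood estimates P(b | a) = c(a, b) / c(a) over a sequence."""
    pair_counts = Counter(bigrams(seq))
    left_counts = Counter(seq[:-1])  # how often each item starts a bigram
    return {(a, b): c / left_counts[a] for (a, b), c in pair_counts.items()}


roots = ["كتب", "درس", "علم"]        # k-t-b, d-r-s, ʿ-l-m
schemes = ["فاعل", "مفعول", "مفعلة"]  # fāʿil, mafʿūl, mafʿala
lexicon = build_lexicon(roots, schemes)

# Toy token stream (illustrative only, not data from the paper).
tokens = ["كاتب", "مكتوب", "دارس", "مدروس", "عالم", "معلوم"]
scheme_seq = to_schemes(tokens, lexicon)

print(len(set(tokens)), len(set(scheme_seq)))                    # 6 2
print(len(set(bigrams(tokens))), len(set(bigrams(scheme_seq))))  # 5 2
print(bigram_mle(scheme_seq))
```

Using `str.translate` substitutes all radicals at once, so a root such as علم, which itself contains ع and ل, is handled correctly; chained `str.replace` calls would overwrite letters already substituted and corrupt the word.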