Creating Arabic Lexical Resources in TEI: A Schema for Discontinuous Morphology Encoding
Ouafae Nahli, A. D. Grosso
2020 6th IEEE Congress on Information Science and Technology (CiSt)
DOI: 10.1109/CiSt49399.2021.9357273 (https://doi.org/10.1109/CiSt49399.2021.9357273)
Publication date: 2020-06-05
Citation count: 0
Abstract
An Arabic word can be described in terms of its lexical and morphological information. Lexical analysis gathers both semantic information (meaning and translation) and syntactic properties (parts of speech). Morphological analysis, by contrast, identifies word patterns, each of which groups words that share the same syntactic, inflectional, and semantic behaviour. These descriptions constitute two distinct but complementary levels of study. This paper presents our work on creating an exhaustive resource that comprises both levels: lexical and morphological. The lexical level collects information extracted from the dictionary $al=q\bar{a}m\bar{u}s\ al=mu\underset{.}{h}\bar{\imath}\underset{.}{t}$. The morphological level describes the word patterns. The two levels are autonomous, but each word described at the lexical level is linked to its corresponding pattern. Formalizing the word patterns makes it possible to enrich word descriptions with additional morphosyntactic and inflectional information. To obtain a systematic digital resource, we followed the guidelines of the Text Encoding Initiative (TEI). We adopted the TEI module for encoding digital dictionaries and lexicons to formally represent the medieval primary source $al=q\bar{a}m\bar{u}s\ al=mu\underset{.}{h}\bar{\imath}\underset{.}{t}$. We also used the TEI interpretation approach to encode the morphological word patterns, keeping the two levels separate while allowing them to be linked.
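The two-level design described in the abstract can be illustrated with a minimal sketch. The abstract does not show the paper's actual schema, so everything below is an assumption made for illustration: the use of the TEI `ana` attribute on `<entry>` as the link, the `<interpGrp>`/`<interp>` layout for patterns, the identifier `pat.faail`, the `type` values, and the helper names `build_pattern`, `build_entry`, and `resolve_pattern`. The example word كاتب (kātib, "writer") and the pattern فاعل (fāʿil) are chosen only because their relationship is well known.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_pattern(pattern_id, template, pos):
    """Morphological level: one word pattern stored as an <interp>.

    Hypothetical layout; the paper's own pattern encoding may differ.
    """
    group = ET.Element(tei("interpGrp"), {"type": "wordPattern"})
    interp = ET.SubElement(
        group, tei("interp"), {f"{{{XML_NS}}}id": pattern_id, "type": pos}
    )
    interp.text = template
    return group


def build_entry(lemma, pos, gloss, pattern_id):
    """Lexical level: a dictionary <entry> pointing at its pattern via @ana."""
    entry = ET.Element(tei("entry"), {"ana": f"#{pattern_id}"})
    form = ET.SubElement(entry, tei("form"), {"type": "lemma"})
    ET.SubElement(form, tei("orth")).text = lemma
    gram = ET.SubElement(entry, tei("gramGrp"))
    ET.SubElement(gram, tei("pos")).text = pos
    sense = ET.SubElement(entry, tei("sense"))
    ET.SubElement(sense, tei("def")).text = gloss
    return entry


def resolve_pattern(entry, patterns):
    """Follow the entry's @ana pointer into the separate pattern inventory."""
    target = entry.get("ana", "").lstrip("#")
    for interp in patterns.iter(tei("interp")):
        if interp.get(f"{{{XML_NS}}}id") == target:
            return interp
    return None


# Illustrative data: the two levels are built independently, then linked.
patterns = build_pattern("pat.faail", "فاعل", "noun")
entry = build_entry("كاتب", "noun", "writer", "pat.faail")

print(ET.tostring(entry, encoding="unicode"))
print(resolve_pattern(entry, patterns).text)
```

Keeping the pattern inventory in its own element and referencing it by identifier mirrors the abstract's claim that the levels stay autonomous yet linked: a pattern can be enriched with further morphosyntactic or inflectional information without touching any dictionary entry that points to it.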