SANTI-morf dictionaries
Prihantoro
International Journal of Lexicography, 18(1). Published 2022-11-25.
DOI: 10.1558/lexi.23569 (https://doi.org/10.1558/lexi.23569)

This article describes the structure of the dictionaries used in SANTI-morf (Sistem Analisis Teks Indonesia – morfologi), a multi-module pipeline system, built using NooJ (Silberztein, 2003, 2016), that annotates an Indonesian corpus at the morpheme level. SANTI-morf dictionaries, together with other SANTI-morf components, enable the system to tokenize each word in an Indonesian corpus into morphemes (e.g., cliticized and non-cliticized roots, affixes, reduplications) and to associate these morphemes with their corresponding tags. Each entry in a SANTI-morf dictionary is encoded with a tag composed of morphological analysis (MA) labels. In most cases, these labels are combined with system implementation (SI) labels. MA labels encode formal and functional morphological criteria (e.g., root part of speech (POS) labels) and are typically used for searching the annotated corpus. SI labels serve the system's internal processing and are mostly of interest to developers rather than end users. They include morphotactic and morphophonemic constraint labels, which are processed when the monomorphemic entries in the dictionaries work together with SANTI-morf grammars (rules).
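
The two-layer tag described above can be pictured with a minimal Python sketch. It is an illustration only, not SANTI-morf's actual format: the `Entry` class, the `+` separator, and every label name (`ROOT`, `VERB`, `PREFIX`, `ACTIVE`, `TAKES_PREFIX`, `NASAL_ALTERNATION`) are hypothetical and chosen for readability. The real label inventory and NooJ encoding are defined in the article itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One monomorphemic dictionary entry (hypothetical model)."""
    form: str
    ma_labels: tuple[str, ...]        # morphological analysis labels, for end users
    si_labels: tuple[str, ...] = ()   # system implementation labels, for developers

    def tag(self) -> str:
        # Assumed layout: MA labels first, then SI labels, joined by "+".
        return "+".join(self.ma_labels + self.si_labels)


def search_by_ma(entries, label):
    """Return the forms whose MA labels include `label`; SI labels are ignored."""
    return [e.form for e in entries if label in e.ma_labels]


# Hypothetical entries; the last one carries MA labels only.
ENTRIES = [
    Entry("ajar", ("ROOT", "VERB"), ("TAKES_PREFIX",)),
    Entry("meN-", ("PREFIX", "ACTIVE"), ("NASAL_ALTERNATION",)),
    Entry("buku", ("ROOT", "NOUN")),
]

if __name__ == "__main__":
    for entry in ENTRIES:
        print(entry.form, entry.tag())
    print(search_by_ma(ENTRIES, "ROOT"))
```

Keeping the two label sets in separate fields mirrors the distinction drawn in the abstract: a corpus search touches only the MA labels, while the SI labels remain available to the rules that combine morphemes.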
About the journal:
The International Journal of Lexicography was launched in 1988. Interdisciplinary as well as international, it is concerned with all aspects of lexicography, including issues of design, compilation and use, and with dictionaries of all languages, though the chief focus is on dictionaries of the major European languages - monolingual and bilingual, synchronic and diachronic, pedagogical and encyclopedic. The Journal recognizes the vital role of lexicographical theory and research, and of developments in related fields such as computational linguistics, and welcomes contributions in these areas.