Naïve Terminological Annotation of Legal Texts in Slovak
R. Garabík, J. Levická
Rasprave (journal article), published 2022-07-29
DOI: 10.31724/rihjj.48.1.2
Citations: 1
Abstract
Correct automatic terminological annotation of corpus texts can sometimes be a challenging task, especially for moderately or heavily inflected languages with relatively free word order. We explore a simple annotation method, based on sequence matching of lemmatized texts, for annotating a Slovak-language corpus with IATE terminological entries. The accuracy of annotating legal language is very good for multiword terms, while the accuracy for single-word terms can be increased by applying simple filters based on word length and by blacklisting the most frequent false positives.
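The matching idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that both the corpus text and the IATE entries have already been lemmatized into lowercase lemma sequences (the lemmatizer itself is out of scope). The longest-match-first strategy, the function names `build_index` and `annotate`, the threshold `min_single_len`, and the blacklist contents are all hypothetical choices made for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple

# A term is a sequence of lemmas, e.g. ("zmluvný", "strana").
Term = Tuple[str, ...]


def build_index(terms: Iterable[Term]) -> Tuple[Set[Term], int]:
    """Store lemmatized terms in a set and record the longest term length."""
    term_set = {tuple(t) for t in terms if t}
    max_len = max((len(t) for t in term_set), default=0)
    return term_set, max_len


def annotate(
    lemmas: List[str],
    term_set: Set[Term],
    max_len: int,
    min_single_len: int = 4,  # hypothetical threshold, not the paper's value
    blacklist: Optional[Set[str]] = None,  # hypothetical frequent false positives
) -> List[Tuple[int, int, Term]]:
    """Return (start, end, term) spans over a lemma sequence.

    At each position the longest matching term wins (an assumed strategy).
    Single-word matches are kept only if the lemma has at least
    min_single_len characters and is not blacklisted; multiword
    matches are not filtered.
    """
    blacklist = blacklist or set()
    spans: List[Tuple[int, int, Term]] = []
    i = 0
    while i < len(lemmas):
        match: Optional[Term] = None
        # Try candidate sequences from longest to shortest.
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            candidate = tuple(lemmas[i:i + n])
            if candidate not in term_set:
                continue
            # Apply the simple filters to single-word terms only.
            if n == 1 and (len(candidate[0]) < min_single_len
                           or candidate[0] in blacklist):
                continue
            match = candidate
            break
        if match:
            spans.append((i, i + len(match), match))
            i += len(match)  # skip past the matched term
        else:
            i += 1
    return spans


if __name__ == "__main__":
    terms, longest = build_index([
        ("zmluvný", "strana"), ("strana",), ("návrh",), ("súd",), ("podať",),
    ])
    # Lemmas of "Zmluvná strana podá návrh na súd."
    sentence = ["zmluvný", "strana", "podať", "návrh", "na", "súd"]
    print(annotate(sentence, terms, longest, blacklist={"podať"}))
    # -> [(0, 2, ('zmluvný', 'strana')), (3, 4, ('návrh',))]
```

In this toy example the multiword term is matched as a single span, while the blacklisted `podať` and the too-short `súd` are dropped as single-word matches. This reflects the abstract's point that the filters matter mainly for single-word terms.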