Margit Langemets, Jelena Kallas, Kaisa Norak, Indrek Hein
New Estonian Words and Senses: Detection and Description
Dictionaries, 41(1), pp. 69-82. Published 2020-05-15.
DOI: 10.1353/dic.2020.0005 (https://doi.org/10.1353/dic.2020.0005)
Citations: 2
Abstract
The Web era has intensified the need for automatic monitoring of language, including the extraction of new words and senses. In this paper, we first give a brief overview of the unified dictionary system Ekilex, the starting point for all new lexicographic tasks at the Institute of the Estonian Language since 2019. We describe the existing databases meant for manually collecting and registering new words and meanings. Next, we describe an experimental study on semi-automatic new word detection, based on a small media corpus and existing dictionaries, carried out in 2018 at the Institute of the Estonian Language. The goal of the experiment was to develop a workflow for new word detection, to test the reliability of the tools for Estonian language processing, and to compile a list of new word candidates. The experiment focused on single-word detection. The results revealed that more effective new word discovery requires more advanced tools for automatic language processing, and we perceive an urgent need to set up an infrastructure for (semi-)automatic new word detection. This is the first study for Estonian aimed at developing a tool to supply lexicographers with new word candidates for inclusion in a dictionary. We end the paper by discussing some aspects of the lexicographic treatment of new words and meanings in the near future.