New Estonian Words and Senses: Detection and Description

Q2 Arts and Humanities

Dictionaries. Pub Date: 2020-05-15. DOI: 10.1353/dic.2020.0005

Margit Langemets, Jelena Kallas, Kaisa Norak, Indrek Hein

Vol. 41, No. 1, pp. 69-82


Abstract

The Web era has intensified the need for the automatic monitoring of language, including the extraction of new words and senses. In this paper, we first give a brief overview of the unified dictionary system Ekilex, the starting point for all new lexicographic tasks at the Institute of the Estonian Language since 2019. We describe the existing databases meant for manually collecting and registering new words and meanings. Next, we describe an experimental study on semi-automatic new word detection, based on a small media corpus and existing dictionaries, carried out in 2018 at the Institute of the Estonian Language. The goal of the experiment was to develop a workflow for new word detection, to test the reliability of the tools for Estonian language processing, and to compile a list of new word candidates. The experiment focused on single-word detection. The results revealed that, in order to make new word discovery more effective, we need more advanced tools for automatic language processing, and we perceive an urgent need to set up an infrastructure for (semi-)automatic new word detection. This is the first study for Estonian aimed at the development of a tool to supply lexicographers with new word candidates for inclusion in a dictionary. We end the paper by discussing some aspects of the lexicographic treatment of new words and meanings in the near future.
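The abstract does not spell out the workflow itself, so the following is only a minimal sketch of the general idea of comparing a corpus against existing dictionaries, not the authors' method. It assumes that a candidate is any single word form that occurs in the corpus at least `min_freq` times and is absent from a set of known headwords. The function name `candidate_new_words` and its parameters are hypothetical, and the sketch skips lemmatization, which a realistic pipeline for a highly inflected language such as Estonian would presumably need.

```python
import re
from collections import Counter


def candidate_new_words(corpus_texts, known_headwords, min_freq=2):
    """Return (word, frequency) pairs for corpus words missing from the dictionary.

    Illustrative assumption: plain word forms are compared directly against
    headwords, with no lemmatization or morphological analysis.
    """
    known = {w.lower() for w in known_headwords}
    counts = Counter()
    for text in corpus_texts:
        # [^\W\d_]+ matches runs of Unicode letters, so õ, ä, ö, ü are kept.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    candidates = [
        (word, n) for word, n in counts.items()
        if word not in known and n >= min_freq
    ]
    # Most frequent candidates first; ties broken alphabetically.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    texts = ["uus sõna blogima", "blogima on tore", "sõna on vana"]
    headwords = {"uus", "sõna", "on", "tore", "vana"}
    print(candidate_new_words(texts, headwords))
```

In a semi-automatic setting, a list like this would only be a starting point: a lexicographer would still review each candidate before it is considered for inclusion in a dictionary.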

