New Estonian Words and Senses: Detection and Description

Q2 Arts and Humanities
Dictionaries. Pub Date: 2020-05-15. DOI: 10.1353/dic.2020.0005
Margit Langemets, Jelena Kallas, Kaisa Norak, Indrek Hein
Dictionaries, volume 41, issue 1, pages 69-82. Publication type: Journal Article.
Citations: 2

Abstract

The Web era has intensified the need for automatic monitoring of language, including the extraction of new words and senses. In this paper, we first give a brief overview of the unified dictionary system Ekilex, the starting point for all new lexicographic tasks at the Institute of the Estonian Language since 2019. We describe the existing databases used for manually collecting and registering new words and meanings. Next, we describe an experimental study on semi-automatic new word detection, based on a small media corpus and existing dictionaries, carried out in 2018 at the Institute of the Estonian Language. The goals of the experiment were to develop a workflow for new word detection, to test the reliability of the tools for Estonian language processing, and to compile a list of new word candidates. The experiment focused on single-word detection. The results revealed that more effective new word discovery requires more advanced tools for automatic language processing, and we perceive an urgent need to set up an infrastructure for (semi-)automatic new word detection. This is the first study for Estonian aimed at developing a tool to supply lexicographers with new word candidates for inclusion in a dictionary. We end the paper by discussing some aspects of the lexicographic treatment of new words and meanings in the near future.
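The abstract does not spell out the detection workflow, so the following is only a minimal sketch of the general idea it names: comparing words found in a corpus against an existing dictionary word list and ranking the leftovers as candidates. The function name `candidate_new_words`, the `min_freq` threshold, and the sample data are all assumptions made for illustration and are not taken from the paper. The sketch compares lowercased surface forms; a realistic pipeline for a highly inflected language such as Estonian would need lemmatization first, which is omitted here.

```python
import re
from collections import Counter

# Hypothetical helper, not the authors' tool: words are runs of letters,
# optionally joined by hyphens; digits and underscores are excluded.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def candidate_new_words(corpus_texts, known_words, min_freq=2):
    """Return (word, count) pairs for corpus words absent from known_words.

    Results are sorted by descending frequency, then alphabetically.
    """
    counts = Counter()
    for text in corpus_texts:
        counts.update(TOKEN_PATTERN.findall(text.lower()))

    known = {word.lower() for word in known_words}
    candidates = [
        (word, count)
        for word, count in counts.items()
        if word not in known and count >= min_freq
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    texts = ["Säutsuja säutsus jälle.", "Uus säutsuja liitus."]
    dictionary = {"uus", "jälle", "liitus", "säutsus"}
    print(candidate_new_words(texts, dictionary))
```

The frequency threshold is one simple way to suppress typos and one-off forms before a lexicographer reviews the list; the right value would depend on corpus size.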