Algorithm of Automatic Search for Non-Standard Vocabulary Units when Creating a Comprehensive Dictionary

E. A. Gorobets, A. V. Mamontova
Journal: Aktual'nye problemy filologii i pedagogicheskoi lingvistiki. Publication date: 2022-06-25. Publication type: Journal Article. DOI: 10.29025/2079-6021-2022-2-131-142 (https://doi.org/10.29025/2079-6021-2022-2-131-142)
Citations: 0

Abstract

The article discusses the development and use of an automatic tool that streamlines lexicographic work on the creation of comprehensive dictionaries. Despite the high level of automatic processing of linguistic information in modern lexicography, a number of issues remain unresolved. The main problem in creating comprehensive lexicographic sources is combining different dictionaries: their heading units (headwords) can appear in different forms while referring to the same lexeme, so lexicographers spend a lot of time on the matching procedure, and this material has to be processed manually. The aim of the study was to solve the problem of identifying non-standard words by using a morphological analyzer. The program developed by the authors automatically selects non-standard words from a list of heading units, which can significantly reduce the chance of errors and the time spent on creating a summary dictionary, and minimizes the need to process and interpret units manually. Development was carried out in Python 3.8.2 using version 0.9.1 of the pymorphy2 morphological analyzer library. The algorithm and program can be applied to any list of words from which non-initial word forms must be selected automatically. The program was tested on a list of 22,738 words from the Comprehensive etymological dictionary; 979 non-standard units were identified among them. The average processing time for this number of words was 1.5 seconds, which the authors take as proof of the algorithm's effectiveness and of the expediency of its further use in lexicographic practice.
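The abstract does not give the selection criterion itself, so the following is only a minimal sketch of how non-initial word forms might be picked out of a headword list with a morphological analyzer. It assumes that a headword counts as non-standard when it does not appear among its own candidate lemmas. The function names `make_pymorphy2_lemmatizer` and `find_non_initial_forms`, and the small stub dictionary, are hypothetical and used for illustration only; the pymorphy2 calls are limited to `MorphAnalyzer().parse(word)` and the `normal_form` attribute of each parse.

```python
from typing import Callable, Iterable, List, Set, Tuple


def make_pymorphy2_lemmatizer() -> Callable[[str], Set[str]]:
    """Build a lemmatizer backed by pymorphy2 (hypothetical helper).

    The import is lazy so the rest of the sketch runs without the library.
    """
    import pymorphy2  # third-party; the abstract cites version 0.9.1

    morph = pymorphy2.MorphAnalyzer()

    def lemmas(word: str) -> Set[str]:
        # Each parse proposed by the analyzer contributes one candidate lemma.
        return {parse.normal_form for parse in morph.parse(word)}

    return lemmas


def find_non_initial_forms(
    headwords: Iterable[str],
    lemmas: Callable[[str], Set[str]],
) -> List[Tuple[str, List[str]]]:
    """Return (headword, candidate lemmas) for every suspected non-initial form.

    Assumed criterion (not confirmed by the source): a headword is flagged
    when none of its candidate lemmas equals the headword itself.
    """
    flagged = []
    for word in headwords:
        key = word.strip().lower()
        if not key:
            continue  # skip empty lines in the input list
        candidates = lemmas(key)
        if key not in candidates:
            flagged.append((word, sorted(candidates)))
    return flagged


# Stub analyzer with hand-written lemmas, so the demo needs no installation.
STUB_LEMMAS = {
    "стол": {"стол"},
    "столы": {"стол"},             # plural form of "стол"
    "стали": {"сталь", "стать"},   # ambiguous: noun or verb form
    "печь": {"печь"},
}


def stub_lemmatizer(word: str) -> Set[str]:
    # Unknown words are treated as their own lemma.
    return STUB_LEMMAS.get(word, {word})


demo = find_non_initial_forms(["стол", "столы", "стали", "печь"], stub_lemmatizer)
for headword, candidates in demo:
    print(headword, "->", ", ".join(candidates))
```

To run the same check against real data, `stub_lemmatizer` could be swapped for `make_pymorphy2_lemmatizer()`, provided pymorphy2 and its dictionaries are installed. Passing the analyzer in as a callable keeps the selection logic testable on its own and makes the flagging criterion easy to tighten, for example by looking only at the highest-scoring parse.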