Algorithm of Automatic Search for Non-Standard Vocabulary Units when Creating a Comprehensive Dictionary

Aktual'nye problemy filologii i pedagogicheskoi lingvistiki. Pub Date: 2022-06-25. DOI: 10.29025/2079-6021-2022-2-131-142

E. A. Gorobets, A. V. Mamontova


Citations: 0

Abstract



The article describes the development and use of an automatic tool that streamlines lexicographic work on the creation of comprehensive dictionaries. Despite the high level of automatic processing of linguistic information in modern lexicography, a number of issues remain unresolved. The main problem in creating comprehensive lexicographic sources is combining different dictionaries: their heading units (headwords) may appear in different forms while referring to the same lexeme, so lexicographers spend a great deal of time on matching, and the material has to be processed manually. The aim of the study was to identify non-standard words with the help of a morphological analyzer. The program developed by the authors automatically selects non-standard words from a list of heading units, which can significantly reduce the chance of errors and the time spent on creating a summary dictionary, and minimizes the need to process and interpret units manually. Development was carried out in Python 3.8.2 using version 0.9.1 of the pymorphy2 morphological analyzer library. The algorithm and program can be applied to any list of words from which non-initial word forms must be selected automatically. The program was tested on a list of 22738 words from the comprehensive etymological dictionary, among which 979 non-standard units were identified. The average processing time for this number of words was 1.5 seconds, which demonstrates the effectiveness of the algorithm and the expediency of its further use in lexicographic practice.

