English–Georgian Parallel Corpus and Its Application in Georgian Lexicography

IF 0.9; Zone 2 (Literature); JCR: 0, LANGUAGE & LINGUISTICS
Lexikos Pub Date : 2022-01-01 DOI:10.5788/32-2-1701
T. Margalitadze, G. Meladze, Z. Pourtskhvanidze
Citations: 0

Abstract

English–Georgian Parallel Corpus and Its Application in Georgian Lexicography
The Georgian language, the official language of Georgia, is the only written member of the Kartvelian language family, the indigenous language family of the Caucasus region. Georgian philology and lexicography have a long-standing tradition, English–Georgian lexicography being no exception. Given the increasing use of ample electronic text corpora for lexicographical purposes, the team of Georgian lexicographers working on the Comprehensive English–Georgian Dictionary (CEGD), and subsequently the Comprehensive English–Georgian Online Dictionary (CEGOD), decided to compile an English–Georgian Parallel Corpus (EGPC). The aim of the project was to develop a methodology for building a parallel corpus of Georgian and to assess its efficiency for Georgian bilingual lexicography. Work on the corpus has been going on for over a decade. The ultimate aim is to create a standard for Georgian bilingual corpora that will be compiled in the future. The article describes the content and composition of the EGPC, its structure, functionalities, search engines and so on. The article also deals with various studies conducted over the years to assess and enhance the value, applicability and efficiency of the EGPC for the automatic or semi-automatic recognition, tagging and extraction of terminology, the compilation of terminological entries, as well as the entries for the English–Georgian Dictionary and those for the Georgian–English Learner's Dictionary, etc. Particular emphasis is laid upon the actual or potential applicability of the corpus for lexicographical activities and for machine translation projects. The findings of the study may be of interest for other under-resourced languages like Georgian.

Keywords: parallel corpus, terminological entries, English–Georgian dictionary, Georgian–English dictionary
Source journal: Lexikos
CiteScore: 1.00
Self-citation rate: 25.00%
Publication volume: 15
Review time: 7 weeks
About the journal: Lexikos (Greek for "of or for words") is a journal for the lexicographical specialist. It is the only journal in Africa which is exclusively devoted to lexicography. Articles dealing with all aspects of lexicography and terminology, or with the implications that research in related disciplines such as linguistics and computer and information science has for lexicography, will be considered for publication. Articles may be written in Afrikaans, English, Dutch, German and French.