Methodological Framework for the Development of an English-Lithuanian Cybersecurity Termbase

Q2 Arts and Humanities
Sigita Rackevičienė, Liudmila Mockienė, A. Utka, A. Rokas
DOI: 10.5755/j01.sal.1.39.29156 (https://doi.org/10.5755/j01.sal.1.39.29156)
Published: 2021-11-27
Publication type: Journal Article
Citations: 2

Abstract

Methodological Framework for the Development of an English-Lithuanian Cybersecurity Termbase
The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and other specialised domains. It is argued that the presented methodological approach can ensure the creation of high-quality bilingual termbases even with limited available resources. The paper touches upon the methods and problems of dataset (corpora) compilation, terminology annotation, automatic bilingual term extraction (BiTE) and alignment, knowledge-rich context extraction, and linguistic linked open data (LLOD) technologies. The paper presents theoretical considerations as well as arguments on the effectiveness of the described methods. The theoretical analysis and a pilot study suggest that: 1) a combination of parallel and comparable corpora considerably expands the amount and variety of data sources that can be used for terminology extraction; this methodology is especially important for less-resourced languages, which often lack parallel data; 2) deep learning systems trained on manually annotated data (gold standard corpora) allow effective automation of the extraction of terminological data and metadata, which makes it possible to update termbases regularly with minimal manual input; 3) LLOD technologies make it possible to integrate the terminological data into the global linguistic data ecosystem and make it reusable, searchable and discoverable across the Web.
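The bilingual term alignment step mentioned in the abstract can be illustrated with a small sketch. This is not the deep learning approach the paper describes; it swaps in a much simpler co-occurrence baseline (the Dice coefficient) purely to show what pairing English and Lithuanian term candidates from aligned sentences involves. The toy data, the function name `align_terms`, and the `threshold` parameter are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: for each aligned sentence pair, the set of English
# term candidates and the set of Lithuanian term candidates found in it.
SENTENCE_PAIRS = [
    ({"firewall", "password"}, {"ugniasienė", "slaptažodis"}),
    ({"firewall", "encryption"}, {"ugniasienė", "šifravimas"}),
    ({"password", "encryption"}, {"slaptažodis", "šifravimas"}),
]


def align_terms(pairs, threshold=0.5):
    """Pair each English term with its best Lithuanian candidate.

    Scores every (English, Lithuanian) candidate pair with the Dice
    coefficient over sentence-level co-occurrence:
        dice = 2 * joint_count / (english_count + lithuanian_count)
    and keeps, per English term, the highest-scoring candidate whose
    score is at least `threshold`.
    """
    en_count, lt_count, joint = Counter(), Counter(), Counter()
    for en_terms, lt_terms in pairs:
        en_count.update(en_terms)
        lt_count.update(lt_terms)
        # Every English candidate co-occurs with every Lithuanian
        # candidate found in the same aligned sentence pair.
        joint.update(product(en_terms, lt_terms))

    best = {}
    for (en, lt), both in joint.items():
        dice = 2 * both / (en_count[en] + lt_count[lt])
        if dice >= threshold and dice > best.get(en, ("", 0.0))[1]:
            best[en] = (lt, dice)
    return best


if __name__ == "__main__":
    for en, (lt, score) in sorted(align_terms(SENTENCE_PAIRS).items()):
        print(f"{en} -> {lt} (Dice = {score:.2f})")
```

On this toy input each English term co-occurs with its Lithuanian equivalent in every sentence where it appears, so the correct pairs score 1.0 and the spurious ones score 0.5. Real corpora are far noisier, which is one reason a trained extraction and alignment model, as argued in the abstract, is attractive.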
Source journal
Studies About Languages (Social Sciences: Linguistics and Language)
CiteScore: 0.60
Self-citation rate: 0.00%
Articles published: 8
Review time: 32 weeks
About the journal: The journal aims to bring together scholars interested in languages and technology, the development of linguistic theory, and empirical research into different aspects of how languages function within a society. The articles published in the journal focus on theoretical and empirical research, including General Linguistics, Applied Linguistics (Translation Studies, Computational Linguistics, Sociolinguistics, Media Linguistics, etc.), and Comparative and Contrastive Linguistics. The journal aims to become a multidisciplinary venue for sharing ideas and experience among scholars working in the field.