LancsBox software options for the prospective investigation of the multilingual corpus for European studies

О. I. Andrushenko
Journal: MESSENGER of Kyiv National Linguistic University. Series Philology
Published: 2023-08-31 (Journal Article)
DOI: 10.32589/2311-0821.1.2023.286180 (https://doi.org/10.32589/2311-0821.1.2023.286180)

Abstract

The paper presents a comparative analysis of the lexeme European in two varieties of English (British and American), based on the built-in corpora that are licensed through the LancsBox software (BE06 and AmE06 respectively) and comprise newspapers, fiction, etc. The investigation describes the procedures for carrying out linguistic research as part of a project taught during the course “Multilingual Corpus and its Resources for European Studies (KNLU)” (Erasmus+ Program). LancsBox, a user-friendly tool that runs on the major operating systems, has proved to be a powerful manager for compiling corpora and using existing ones. It makes it possible to visualize textual data with the following tools of the software package: KWIC, GraphColl, Words, Ngrams, Wizard, etc., which are essential for the study of a specific linguistic unit. The statistical analysis of both corpora has revealed that the word European belongs to the lexemes that are seldom employed in the language. The comparison of the two varieties has shown that the top ten most frequent collocates of the word are similar; however, the GraphColl visualization has indicated major differences between the two corpora. Thus, N+N structures are more commonly employed and more vibrant in the British English corpus than in the American English corpus. The t-test has shown a statistically significant difference between the corpora with regard to the linguistic variable European. These data may testify to cultural differences between the users of the two varieties, given that both corpora represent the same time frame.
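The frequency comparison and t-test mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the unit of comparison is the relative frequency of the word in each text and that an unequal-variance (Welch) t-test is used; the abstract does not state how the test was set up, and LancsBox performs such analyses through its own interface rather than through code like this. The function names and the sample sentences are hypothetical placeholders and are not taken from BE06 or AmE06.

```python
import math
import re
from statistics import mean, variance


def relative_frequency(text, word, per=1000):
    """Occurrences of `word` per `per` tokens of `text` (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) * per / len(tokens)


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Invented placeholder texts (assumption: NOT real corpus data).
british_texts = [
    "The European market grew while European policy changed",
    "A European court ruled on the case today",
    "European leaders met European ministers in London again",
]
american_texts = [
    "The senate debated the budget for many long hours",
    "A European visitor toured the national park last week",
    "The team won the final game of the season",
]

british_freqs = [relative_frequency(t, "European") for t in british_texts]
american_freqs = [relative_frequency(t, "European") for t in american_texts]

t_stat, df = welch_t(british_freqs, american_freqs)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

Normalizing to a common base (here per 1,000 tokens) keeps texts of different lengths comparable. Turning the t statistic into a p-value additionally requires the t distribution, which a statistics library would supply.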