Investigating the linguistic representativeness of Early Modern Greek Corpora

E. Karantzola, Yannis Kostopoulos, K. Sampanis
Digital Lexis and Beyond. DOI: 10.1553/EMODERN_GREEKS1

Abstract

Following a poorly documented period in the history of vernacular Greek (6th-12th c.), the late 15th century marks the beginning of a linguistic era characterized by an unprecedented production, in both quantity and quality, of prose texts written in the "common" language. It is at this point that classicizing Greek ceases to dominate writing, and a new linguistic variety, Early Modern Greek (EMG), albeit a very diverse and fluid one, begins to grow rapidly as a language of literacy. The development of this new variety is manifested both in its widespread use as a literary language (in texts with an aesthetic function) and in its use as a simple scripta, namely a written vernacular for legal, administrative, commercial, and other purposes. Despite its significance in the history of Greek, this period remains largely unexplored and underrepresented in Greek language corpora. Consequently, our understanding of EMG depends crucially on the representativeness of the few available corpora. The aim of this paper is to investigate the linguistic representativeness of EMG corpora and to explore possible associations between observed linguistic patterns and corpus design. Focusing on the distribution of contrastive and reformulation markers, our study reveals that the linguistic data attested in the available EMG corpora diverge and depend largely on how variables such as text form (poetry/prose), period, geographical region, and genre are represented.
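The abstract does not specify how marker distributions were counted. As a hedged illustration of the kind of comparison it describes, the Python sketch below normalizes raw marker counts per 10,000 tokens and groups them by one of the metadata variables the study names (text form, period, region, genre). Everything here is an assumption: texts are taken to be already tokenized, the `Text` record and the `classify` and `normalized_rates` functions are hypothetical, and the marker sets are illustrative examples, not the authors' inventory.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical marker inventories, chosen only for illustration;
# they are NOT the marker lists used in the paper.
CONTRASTIVE = {"αλλά", "όμως", "μα"}
REFORMULATION = {"ήγουν", "δηλαδή"}
CLASSES = ("contrastive", "reformulation")


@dataclass
class Text:
    """A pre-tokenized text with the metadata variables named in the abstract."""
    tokens: list[str]
    form: str    # e.g. "poetry" or "prose"
    period: str
    region: str
    genre: str


def classify(token: str) -> str | None:
    """Map a token to a marker class, or None if it is not a marker."""
    word = token.lower()
    if word in CONTRASTIVE:
        return "contrastive"
    if word in REFORMULATION:
        return "reformulation"
    return None


def normalized_rates(
    texts: list[Text], variable: str, per: int = 10_000
) -> dict[str, dict[str, float]]:
    """Marker frequency per `per` tokens for each level of a metadata variable."""
    hits: defaultdict[str, Counter] = defaultdict(Counter)
    sizes: Counter = Counter()
    for text in texts:
        level = getattr(text, variable)  # e.g. variable="form" -> "poetry"
        sizes[level] += len(text.tokens)
        for token in text.tokens:
            cls = classify(token)
            if cls is not None:
                hits[level][cls] += 1
    # Normalize so that subcorpora of different sizes can be compared.
    return {
        level: {cls: hits[level][cls] * per / n for cls in CLASSES}
        for level, n in sizes.items()
        if n > 0
    }
```

Calling `normalized_rates(texts, "period")` or `normalized_rates(texts, "genre")` on the same collection shows whether marker rates shift with the chosen variable. Normalizing matters because raw counts from subcorpora of unequal size are not directly comparable.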