Strategies in tracing linguistic variation in a corpus of Old Irish texts (CorPH)

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics Pub Date: 2022-09-20 DOI: 10.1075/ijcl.22018.sti

D. Stifter, Fangzhe Qiu, M. Aquino-López, Bernhard Bauer, E. Lash, Nora White

Citations: 0

Abstract

This article introduces Corpus PalaeoHibernicum (CorPH), a corpus currently consisting of 78 texts in Early Irish (c. 7th–10th cent.) created by the ERC-funded Chronologicon Hibernicum (ChronHib) project by bringing together pre-existing lexical and syntactic databases and adding further crucial texts from the period. In addition to being annotated for POS, morphological and syntactic information, another layer of annotation has been developed for CorPH – ‘Variation Tagging’, i.e. a tagset that numerically encodes synchronic language variation during the Early Irish period, thus allowing for much improved research on the chronological variation among the material. Another new pillar of studying linguistic variation is Bayesian Language Variation Analysis (BLaVA), in order to address the challenge that “not-so-big data” poses to statistical corpus methods. Instead of reflecting feature frequencies, BLaVA models language variation as probabilities of variation.

Source journal

International Journal of Corpus Linguistics Multiple-

CiteScore

3.30

Self-citation rate

0.00%

Publication volume

Journal description: The International Journal of Corpus Linguistics (IJCL) publishes original research covering methodological, applied and theoretical work in any area of corpus linguistics. Through its focus on empirical language research, IJCL provides a forum for the presentation of new findings and innovative approaches in any area of linguistics (e.g. lexicology, grammar, discourse analysis, stylistics, sociolinguistics, morphology, contrastive linguistics), applied linguistics (e.g. language teaching, forensic linguistics), and translation studies. Based on its interest in corpus methodology, IJCL also invites contributions on the interface between corpus and computational linguistics.