Historical Corpus of Dutch: A new multi-genre corpus of Early and Late Modern Dutch

Iris Van de Voorde, Gijsbert Rutten, Rik Vosters, Marijke van der Wal, Wim Vandenbussche
Journal: Taal en Tongval Language Variation in the Low Countries
Published: 2023-09-01
DOI: 10.5117/tet2023.1.006.vand (https://doi.org/10.5117/tet2023.1.006.vand)

Abstract

In this contribution, we present the Historical Corpus of Dutch (HCD), a new multi-genre, diachronic corpus of Early and Late Modern Dutch (ca. 1550-1850). It consists of a digitised collection of handwritten administrative texts (e.g. town council meeting reports), handwritten ego-documents (e.g. diaries and travelogues), and printed pamphlets (e.g. of a political or religious nature). The corpus is also balanced between northern and southern material, with data from the provinces of Holland and Zeeland for the North, and from Flanders and Brabant for the South. After discussing its structure and composition, we illustrate the value of the new corpus with a number of smaller case studies. Based on our experiences with the corpus, we conclude with a plea for historical corpus building to focus less on the quantity of data (‘big data’) and more on data quality.