Florian Petran, Marcel Bollmann, Stefanie Dipper, Thomas Klein
ReM: A reference corpus of Middle High German - corpus compilation, annotation, and access
J. Lang. Technol. Comput. Linguistics, published 2016-07-01
DOI: 10.21248/jlcl.31.2016.208 (https://doi.org/10.21248/jlcl.31.2016.208)
Citations: 5
Abstract
This paper describes ReM and the results of the ReM project and its predecessors. All projects collaborate closely in developing common annotation standards to allow for diachronic investigations. ReA has already been published and made available via the corpus search tool ANNIS2 (Krause and Zeldes, 2016), while ReF and ReN are still being annotated. The ReM project builds on several earlier annotation efforts, such as the corpus of the new Middle High German Grammar (MiGraKo, Klein et al. (2009)), expanding them and adding further texts to produce a reference corpus for Middle High German, which we will also call "ReM" for short. The combined corpus, which consists of around two million tokens, provides a mostly complete collection of written records from Early Middle High German (1050–1200) as well as a selection of Middle High German texts from 1200 to 1350. Texts have been digitized and annotated with parts of speech and morphology (using the HiTS tagset, cf. Dipper et al. (2013)) as well as lemma information. Release 1.0 of ReM was published in December 2016 and is also accessible via the ANNIS tool. The project website at https://www.linguistics.ruhr-uni-bochum.de/rem/ offers extensive documentation of the project and the corpus. The corpus …