Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment
Authors: E. M. Arévalo, P. Petré
Published in: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage
Publication date: 2017-06-01
DOI: 10.1145/3078081.3078089 (https://doi.org/10.1145/3078081.3078089)
Citation count: 2
Abstract
Current research in Corpus Linguistics and related disciplines within the multi-disciplinary field of Digital Humanities involves computer-aided manual processing of large text corpora. Typically, corpus instances are retrieved with the help of concordancers and textual search engines and subsequently labeled by hand before being submitted to quantitative analysis. While well-established software solutions already exist for corpus data retrieval, less attention has been paid to the annotation process in terms of both software facilities and best practices, especially in the context of collaborative research. However, with the increase in size and scope of research projects, we envisage new needs for synchronizing interdependent annotations by different researchers. Current ad-hoc solutions to collaborative corpus analysis and annotation typically involve general-purpose Real-Time Editing (RTE) and cloud storage software, whose functionality is arguably sub-optimal for research purposes. In the present paper, we discuss potential problems related to synchronizing annotations in large-scale projects, as well as the potential benefits that can be derived from a dedicated approach to annotation data management. As a proof of concept, we showcase our current solution in the form of Cosycat (Collaborative Synchronized Corpus Annotation Tool), a collaborative asynchronous application that has grown out of a Historical Linguistics research project involving several parallel studies and multiple researchers.