Modeling and annotating complex data structures
Authors: Piotr Banski, A. Witt
DOI: 10.4324/9781315552941-11 (https://doi.org/10.4324/9781315552941-11)
In: The Shape of Data in the Digital Humanities
Publication date: 2018-11-02
Citations: 2
Abstract
Although it is possible to associate an unlimited number of arbitrary, complex layers of annotations with a text, an image, or an audio/video file, the most common applications almost always follow the classical approach: additional information associated with primary data is expressed in an ordered hierarchy, using a tree structure as its underlying data model. The present contribution offers a brief review of the more popular ways of data structuring and highlights some of the problems that each of them is meant to handle. The first part of the present chapter focuses on the most relevant issues of data modeling for researchers in the humanities and reviews the basic kinds of the relevant data models. The second part addresses ways to capture these abstract models in concrete encoding formats available to digital humanists. We focus here on approaches that use XML, but the models can also be applied more generally.
Information and communication are tightly related: communication relies on the exchange of information, but just as the individual information containers are determined by many kinds of variables, organizing these containers into higher level structures is vital for ensuring success in transmitting complete and compact messages. Finding the appropriate level of complexity for the structuring of information is one of the key problems in the field of digital humanities. Simple information packages are quick to set up, process and visualize, but as the individual fields of study develop, more and more information needs to be accommodated within a vertically tight space of electronic documents. Packaging of complex information raises new theoretical questions and demands new, more efficient, technological solutions.
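As a rough illustration of the classical approach described above, where annotation is an ordered hierarchy over the primary data, the following minimal Python sketch builds a small XML tree with the standard library. The element names (`s`, `phrase`, `w`), the `type` attribute, and the sample sentence are assumptions made for this example only; they are not taken from the chapter or from any particular encoding standard.

```python
import xml.etree.ElementTree as ET


def annotate(phrases):
    """Build an ordered hierarchy: sentence -> phrases -> words.

    `phrases` is a list of (phrase_type, [words]) pairs. The element
    names used here are hypothetical, chosen only for illustration.
    """
    sentence = ET.Element("s")
    for phrase_type, words in phrases:
        phrase = ET.SubElement(sentence, "phrase", type=phrase_type)
        for word in words:
            ET.SubElement(phrase, "w").text = word
    return sentence


# Invented sample data: two phrases, each holding two words.
tree = annotate([("NP", ["The", "editor"]), ("VP", ["annotates", "texts"])])

# Serialize the hierarchy as an XML string.
xml_text = ET.tostring(tree, encoding="unicode")

# Reading the word leaves in document order recovers the primary data.
primary = " ".join(w.text for w in tree.iter("w"))

print(xml_text)
print(primary)
```

Every word here has exactly one parent phrase and every phrase exactly one parent sentence. That single-parent constraint is what makes the model a tree, and it is also why annotation layers that cut across one another are harder to express in this form.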
For the purpose of an introductory example, let us assume that the “information containers” are words, subject to the choice of the natural language but also, on the technological plane, to, for example, the selection of the character encoding, such as ISO 8859-1 (known as “Latin-1”) or Unicode. These words are grouped into larger units: phrases, sentences, or utterances. The structure of these larger units, on the one hand, is dictated by the internal syntactic rules of the given language but, on the other, it is also modeled technologically by the selection of …
Originally published in: Flanders, Julia/Jannidis, Fotis (Eds.): The shape of data in digital humanities. Modeling texts and text-based resources. London [et al.]: Routledge, 2019. Pp. 217-235. (Digital Research in the Arts and Humanities)