Modeling and annotating complex data structures

The Shape of Data in the Digital Humanities. Pub Date: 2018-11-02. DOI: 10.4324/9781315552941-11

Piotr Banski, A. Witt


Citation count: 2

Abstract

Although it is possible to associate an unlimited number of arbitrary, complex layers of annotations with a text, an image, or an audio/video file, the most common applications almost always follow the classical approach: additional information associated with primary data is expressed in an ordered hierarchy, using a tree structure as its underlying data model. The present contribution offers a brief review of the more popular ways of structuring data and highlights some of the problems that each of them is meant to handle. The first part of the chapter focuses on the data modeling issues most relevant to researchers in the humanities and reviews the basic kinds of data models. The second part addresses ways to capture these abstract models in concrete encoding formats available to digital humanists. We focus here on approaches that use XML, but the models can also be applied more generally.

Information and communication are tightly related: communication relies on the exchange of information, but just as the individual information containers are determined by many kinds of variables, organizing these containers into higher-level structures is vital for ensuring success in transmitting complete and compact messages. Finding the appropriate level of complexity for the structuring of information is one of the key problems in the field of digital humanities. Simple information packages are quick to set up, process, and visualize, but as the individual fields of study develop, more and more information needs to be accommodated within a vertically tight space of electronic documents. Packaging of complex information raises new theoretical questions and demands new, more efficient technological solutions.

For the purpose of an introductory example, let us assume that the “information containers” are words, subject to the choice of the natural language but also, on the technological plane, to choices such as the character encoding, for example ISO 8859-1 (known as “Latin-1”) or Unicode. These words are grouped into larger units: phrases, sentences, or utterances. The structure of these larger units, on the one hand, is dictated by the internal syntactic rules of the given language but, on the other, it is also modeled technologically by the selection of [...]

Originally published in: Flanders, Julia/Jannidis, Fotis (Eds.): The shape of data in digital humanities. Modeling texts and text-based resources. London [et al.]: Routledge, 2019. Pp. 217-235. (Digital Research in the Arts and Humanities)
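The introductory example of words grouped into larger units can be made concrete with a small sketch. The following Python snippet is an illustration only, not the encoding used in the chapter: the element names `s`, `phrase`, and `w`, the `type` attribute, and the sample sentence are all assumptions chosen for this example. It builds an ordered hierarchy (a tree) with the standard-library XML module and then shows how the same word occupies a different number of bytes under ISO 8859-1 and under UTF-8, one of the Unicode encodings.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names (s, phrase, w); they are not taken from the chapter.
sentence = ET.Element("s")

# A phrase groups words; its position among its siblings is significant.
noun_phrase = ET.SubElement(sentence, "phrase", type="NP")
for token in ("the", "naïve", "reader"):
    ET.SubElement(noun_phrase, "w").text = token

verb_phrase = ET.SubElement(sentence, "phrase", type="VP")
ET.SubElement(verb_phrase, "w").text = "smiles"

# Walking the tree in document order recovers the original word sequence.
words = [w.text for w in sentence.iter("w")]

# Serialize the hierarchy as an XML string.
xml_text = ET.tostring(sentence, encoding="unicode")

# The same word, stored under two different character encodings.
latin1_bytes = "naïve".encode("iso-8859-1")  # one byte per character
utf8_bytes = "naïve".encode("utf-8")         # "ï" takes two bytes

print(xml_text)
print(words)
print(len(latin1_bytes), len(utf8_bytes))
```

The tree fixes both containment (which words belong to which phrase) and order (which word comes first), which is what makes it an ordered hierarchy. Annotation layers that do not nest cleanly inside such a tree are the kind of case the more complex models address.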
