Transmission problems? An embedded approach for unification of Latin prefixes and text variants during text matching

IF 0.7 · Journal ranking: Zone 3 (Literature) · HUMANITIES, MULTIDISCIPLINARY

Digital Scholarship in the Humanities · Pub Date: 2023-10-11 · DOI: 10.1093/llc/fqad069

Franziska Schropp, Thomas E Konrad, Marie Revellio, Barbara Feichtinger

Citations: 0

Abstract

The manuscript tradition of pre-modern texts poses a specific problem for scholars in the field of Digital Humanities: before printing made the production of standardized editions of texts feasible, copying texts by hand (and often by different people) was inherently an error-prone process, which led to differences not only in wording but also in spelling across multiple transmitted variants. This applies especially to ancient texts, where the temporal distance to the archetypes tends to be fairly large. In computerized research, especially in text matching for citation research and text mining, these differences in wording and spelling, however small they might be, may prevent texts from being matched successfully. This case study presents a solution for the problem of textual differences arising from (non-)assimilated prefixes in Latin, a feature in which modern editions mostly differ from author to author and sometimes even between two editions of the same text. Using the letters of the church father Jerome as well as Virgil's Eclogues, Georgics, and Aeneid, two approaches are compared in terms of error rate and efficiency for a given set of prefixes: (1) performing and (2) reversing corpus-wide assimilation. Moreover, the broader implications of the (in-)accessibility of text-critical data in digital editions are discussed. Finally, general desiderata regarding text-critical data for computerized research on classical texts are elaborated.
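The two approaches named in the abstract, performing versus reversing corpus-wide assimilation, can be illustrated with a short Python sketch. It assumes a hypothetical rule table of prefix spellings (e.g. adfero/affero, inmitto/immitto, conloquor/colloquor) and plain whitespace tokenization; the functions `assimilate`, `dissimilate`, and `normalize_tokens` are illustrative names, not the authors' implementation. Approach (1) rewrites every non-assimilated prefix into its assimilated spelling; approach (2) rewrites in the opposite direction. In both cases the two texts pass through the same function before comparison, so variant spellings collapse into one form.

```python
"""Sketch: unifying (non-)assimilated Latin prefixes before text matching.

ASSUMPTION: the rule table below is a small illustrative subset chosen for
this example; it is not the prefix set evaluated in the paper.
"""
from collections.abc import Callable

# Hypothetical rule table of (non-assimilated, assimilated) word beginnings,
# e.g. ad- + f -> aff- (adfero / affero), con- + l -> coll- (conloquor / colloquor).
PREFIX_RULES: list[tuple[str, str]] = [
    ("adf", "aff"), ("adc", "acc"), ("adp", "app"), ("adt", "att"),
    ("adl", "all"), ("ads", "ass"), ("adg", "agg"),
    ("inm", "imm"), ("inp", "imp"), ("inl", "ill"), ("inr", "irr"),
    ("conl", "coll"), ("conr", "corr"), ("conm", "comm"), ("conp", "comp"),
    ("subf", "suff"), ("subc", "succ"), ("subp", "supp"),
    ("obf", "off"), ("obc", "occ"), ("obp", "opp"),
]


def assimilate(word: str) -> str:
    """Approach (1): rewrite a non-assimilated beginning to its assimilated form."""
    lower = word.lower()
    for plain, assimilated in PREFIX_RULES:
        if lower.startswith(plain):
            return assimilated + lower[len(plain):]
    return lower


def dissimilate(word: str) -> str:
    """Approach (2): reverse assimilation, restoring the non-assimilated prefix."""
    lower = word.lower()
    for plain, assimilated in PREFIX_RULES:
        if lower.startswith(assimilated):
            return plain + lower[len(assimilated):]
    return lower


def normalize_tokens(text: str, rule: Callable[[str], str]) -> list[str]:
    """Apply one normalization direction to every whitespace-separated token."""
    return [rule(token) for token in text.split()]


if __name__ == "__main__":
    # Two hypothetical editions spelling the same words differently.
    edition_a = "adfero inmitto conloquor"
    edition_b = "affero immitto colloquor"
    print(normalize_tokens(edition_a, assimilate) == normalize_tokens(edition_b, assimilate))    # True
    print(normalize_tokens(edition_a, dissimilate) == normalize_tokens(edition_b, dissimilate))  # True
```

Either direction makes the variant spellings match, provided both texts use the same function. Both can also fire where the letter sequence is not a prefix at all: with these rules, `dissimilate("allium")` returns `"adlium"`. The study compares the two directions in terms of error rate and efficiency; this sketch does not reproduce that comparison.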

Source journal: Digital Scholarship in the Humanities

CiteScore: 1.80

Self-citation rate: 25.00%

Journal description: DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the Humanities, including, but not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research, and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.