Franziska Schropp, Thomas E Konrad, Marie Revellio, Barbara Feichtinger
Digital Scholarship in the Humanities, journal article, published 2023-10-11. DOI: 10.1093/llc/fqad069
Transmission problems? An embedded approach for unification of Latin prefixes and text variants during text matching
Abstract
The manuscript tradition of pre-modern texts poses a specific problem for scholars in the Digital Humanities. Before printing made standardized editions feasible, texts were copied by hand, often by different people. This was an inherently error-prone process, and it led to differences not only in wording but also in spelling across multiple transmitted variants. This applies especially to ancient texts, where the temporal distance to the archetypes tends to be fairly large. In computerized research, especially in text matching for citation research and text mining, these differences in wording and spelling, however small, may prevent texts from being matched successfully. This case study presents a solution to the problem of textual differences arising from (non-)assimilated prefixes in Latin. Modern editions handle this feature differently from one author to another, and sometimes even two editions of the same text differ. For the letters of the church father Jerome and for Virgil’s Eclogues, Georgics, and Aeneid, two approaches are compared in terms of error rate and efficiency for a given set of prefixes: (1) performing and (2) reversing corpus-wide assimilation. Moreover, the broader implications of the (in-)accessibility of text-critical data in digital editions are discussed. Finally, general desiderata regarding text-critical data for computerized research on classical texts are elaborated.
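To make the two directions named in the abstract concrete, here is a minimal sketch of token-level prefix normalization applied before matching. It is not the authors' implementation. The rule table `PREFIX_RULES` and the functions `normalize_token` and `normalize_text` are hypothetical illustrations. The table holds a handful of common Latin prefix pairs (ad-, con-, in-, ob-), not the prefix set evaluated in the paper. Each word-initial spelling is rewritten either toward the assimilated form (performing assimilation) or away from it (reversing assimilation).

```python
# Hypothetical rule table of (non-assimilated, assimilated) word-initial spellings.
# Illustrative only; NOT the prefix set evaluated in the paper.
PREFIX_RULES = [
    ("adf", "aff"),    # adfero    / affero
    ("adp", "app"),    # adpono    / appono
    ("conl", "coll"),  # conligo   / colligo
    ("inl", "ill"),    # inlustris / illustris
    ("inm", "imm"),    # inmitto   / immitto
    ("obp", "opp"),    # obpono    / oppono
]


def normalize_token(token: str, assimilate: bool = True) -> str:
    """Rewrite the first matching word-initial prefix spelling of a token.

    assimilate=True  performs assimilation (e.g. adf- -> aff-).
    assimilate=False reverses it          (e.g. aff- -> adf-).
    The token is lower-cased; unmatched tokens are returned lower-cased.
    """
    lower = token.lower()
    for plain, assimilated in PREFIX_RULES:
        source, target = (plain, assimilated) if assimilate else (assimilated, plain)
        if lower.startswith(source):
            return target + lower[len(source):]
    return lower


def normalize_text(text: str, assimilate: bool = True) -> list[str]:
    """Split on whitespace and normalize every token in the same direction."""
    return [normalize_token(t, assimilate) for t in text.split()]


if __name__ == "__main__":
    # Two editions spelling the same phrase differently now yield equal token lists.
    print(normalize_text("Inlustris vir") == normalize_text("illustris vir"))
```

Both passages must be normalized in the same direction for a match to succeed. Because the rules are plain string prefixes, they can also fire on a word that merely begins with the same letters. Applying them uniformly keeps matching consistent, but it may create spurious matches. The trade-off between the two directions in error rate and efficiency is what the paper measures, and this sketch does not reproduce that evaluation.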
About the journal:
DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the Humanities. Its scope includes, but is not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research. They include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.