{"title":"Lexical diversity as a lens into the classification of Slavic languages: A quantitative typology perspective","authors":"Chenliang Zhou, Haitao Liu","doi":"10.1093/llc/fqad042","DOIUrl":"https://doi.org/10.1093/llc/fqad042","url":null,"abstract":"This study proposes a linguistic classification method based on quantitative typology, which leverages a large-scale multilingual parallel corpus to obtain valid language classification results by excluding the influence of covariates such as text genre and semantic content in cross-language comparison. To achieve this, we model the type–token relationships of each Slavic parallel text and calculate the lexical diversity to approximate the morphological complexity of the language. We perform automatic clustering of languages based on these lexical diversity metrics. Our findings show that (1) the lexical diversity metrics reflect well where a language is located on the ‘analytism-synthetism’ continuum; (2) the automatic clustering based on these metrics effectively reflects the genealogical classification of Slavic languages; and (3) the geographical distribution of lexical diversity in the region where Slavic languages are spoken shows a monotonically increasing trend from southwest to northeast, which is consistent with the pattern found by previous authors on a global scale. The methodological approach taken in this study is data-driven, with the benefit of being independent of theoretical assumptions and easy to process by computer. 
This approach can offer a better insight into corpus-based typology and may shed light on the understanding of language as a human-driven complex adaptive system.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49490507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to Digital Humanities: Enhancing Scholarship with the Use of Technology. Kathryn C. Wymer","authors":"Yali Shi","doi":"10.1093/llc/fqad043","DOIUrl":"https://doi.org/10.1093/llc/fqad043","url":null,"abstract":"","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"61620118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Unravelling interlanguage facts via explainable machine learning","authors":"","doi":"10.1093/llc/fqad035","DOIUrl":"https://doi.org/10.1093/llc/fqad035","url":null,"abstract":"","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44502058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“I would I had that corporal soundness”: Pervez Rizvi's Analysis of the Word Adjacency Network Method of Authorship Attribution","authors":"G. Egan, Mark Eisen, Alejandro Ribeiro, Santiago Segarra","doi":"10.1093/llc/fqad032","DOIUrl":"https://doi.org/10.1093/llc/fqad032","url":null,"abstract":"In his two-part article ‘An Analysis of the Word Adjacency Network Method—Part 1—The evidence of its unsoundness’ and ‘Part 2—A true understanding of the method’ Digital Scholarship in the Humanities, 38: 347-78 (2022), Pervez Rizvi attempts to replicate the Word Adjacency Network (WAN) method for authorship attribution and show that it does not produce the new knowledge that we, its inventors, claim for it. In the present essay, we will show that Rizvi misrepresents fundamental aspects of the WAN method, that his attempted replication fails not because the method is flawed but because he erred in replicating it, and that Rizvi misunderstands key aspects of the mathematics of Information Theory that the method uses.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45784964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Provenance visualization: Tracing people, processes, and practices through a data-driven approach to provenance","authors":"T. Vancisin, Loraine Clarke, M. Orr, Uta Hinrichs","doi":"10.1093/llc/fqad020","DOIUrl":"https://doi.org/10.1093/llc/fqad020","url":null,"abstract":"Provenance disclosure—the documentation of an artifact’s origin and how it was produced—is an important aspect to consider when working with historical records which undergo multiple transformations in preparation for and during digitization. Provenance in this context is commonly communicated through explanatory text or static diagrams. However, the methodological and curatorial decisions that have influenced the records’ data are easily overlooked, in particular when exploring the records through visualization as a result of digitization processes. We propose a data-driven approach to provenance disclosure which (1) traces provenance back to when the records were created, (2) documents and categorizes the records’ transformations (transcriptions, content modifications, changes in organization, and representational form), and (3) uses data visualization to disclose provenance in interactive ways. We reflect on how this approach can be practically applied in the context of historical record collections, and we present findings from a qualitative study we conducted to investigate the merits and limitations of provenance-driven visualization. 
Our findings suggest that data-driven provenance disclosure has the potential to (1) promote transparency and deeper interpretations of historical records, (2) provide rigor in researching historical document collections and underlying production processes, and (3) encourage ethical considerations by making visible labor and implicit bias that influence the production and curation of historical records.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45272916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proverbs as indicators of proficiency for art-generating AI","authors":"Luis J. Tosina Fernández","doi":"10.1093/llc/fqad034","DOIUrl":"https://doi.org/10.1093/llc/fqad034","url":null,"abstract":"Art generated by Artificial Intelligence (AI) is currently attracting considerable attention online. The reason is that it allows people without creative talent to produce outstanding works by just typing in the description of what they want to illustrate. However, the appearance of this technology has also caused some discomfort among artists and graphic designers, who see their craft threatened by a service that is available to anyone free of charge. In this article, the capability of some of these platforms to process figurative language will be assessed with the help of five well-known proverbs found in almost identical terms across a number of Western languages. These proverbs were used as the prompts on five of the most popular AI art generators accessible at present. After analyzing the results, we conclude that AI shows significant deficiencies in the processing of proverbs and, therefore, of figurative language. Consequently, AI does not yet seem able to substitute human agency completely in artistic creation. This exposes an aspect that needs improvement not just for the creative applications of AI but for other applications that it may have in the future. 
To achieve this, disciplines such as psycholinguistics should be integrated into the teams that develop AI.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47391299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new approach for the construction of historical databases—NoSQL Document-oriented databases: the example of AtlantoCracies","authors":"Manuel Díaz-Ordóñez, Domingo Savio Rodríguez Baena, Bartolomé Yun-Casalilla","doi":"10.1093/llc/fqad033","DOIUrl":"https://doi.org/10.1093/llc/fqad033","url":null,"abstract":"This article proposes, and justifies, the use of document-oriented databases as a flexible, easy-to-use, and powerful digital tool in the field of historical research. First, the reasons that have made relational databases the predominant instrument among historians are studied, while detailing the problems involved in their use. Next, the way in which historians have tried to face these problems by using other digital tools is explained, as well as the limitations that such use entails. Through a case study—that of European aristocratic networks in early modern times—it is shown, however, that document-oriented databases present notable advantages and have greater explanatory power for the historian’s work. Thanks to their flexibility, they are better adapted to the often-unpredictable nature of historical sources without diminishing their ease of use or their analytical potential.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43264481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web archive analytics: Blind spots and silences in distant readings of the archived web","authors":"Simon Donig, Markus Eckl, S. Gassner, Malte Rehbein","doi":"10.1093/llc/fqad014","DOIUrl":"https://doi.org/10.1093/llc/fqad014","url":null,"abstract":"In this article, we discuss epistemological and methodological aspects of web archive analytics, a recent development towards more data-centred access to web archives. More specifically, we suggest understanding both the process of archiving and subsequent steps of analysis at scale as acts of observation that can be questioned for their epistemological a priori. Therefore, we propose the concepts of ‘blind spots’ (features of the live web not included upon creation in the archive) and ‘silences’ (latent features present in the archive but requiring a particular method to be made articulate). In particular, we address two forms of silences playing a structural role in web archive analytics, crucial to historians and social scientists alike: abundance (or scale) and time. We trace epistemological implications of web archive analytics across an exemplary case study workflow and suggest methodological answers to the issues raised in this process. On the data extraction side, we introduce warc2corpus (w2c), a new tool for extracting granular, structured data, especially temporal information on the creation, modification, and publication of webpages. 
For data analysis, we demonstrate how distant reading techniques—more specifically structural topic modelling (STM)—can contribute to providing a rich, temporally structured representation of textual web archive content that in turn can be subjected to scholarly inquiry, interpretation, and re-contextualization.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46386901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NEAT—Named Entities in Archaeological Texts: A semantic approach to term extraction and classification","authors":"Maria Pia di Buono, Gennaro Nolano, J. Monti","doi":"10.1093/llc/fqad017","DOIUrl":"https://doi.org/10.1093/llc/fqad017","url":null,"abstract":"The lack of annotated datasets affects the development of Natural Language Processing applications and heavily impacts the access to textual data, in particular for specific domains and specific languages. In this paper, we propose a methodology to annotate texts concerning domain-specific knowledge, to provide a reliable source of data for the task of Named Entity Recognition (NER) in the domain of archaeology for the Italian language. This method integrates syntactic and semantic information from several structured sources to annotate entity mentions in unstructured texts. Furthermore, we make use of an ontology to label entities with the specific type they refer to. By using a corpus made up of item descriptions from Europeana’s Archaeology Collection, we first test our proposed methodology on a mock dataset composed of 1,000 texts. After several steps of improvements, we use the final process to create a complete dataset composed of 5,000 descriptions. 
The resulting dataset, Named Entities in Archaeological Texts, has a total of 41,002 spans of text annotated with their domain-specific entity classification according to the CIDOC Conceptual Reference Model.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44252712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic sentence segmentation for classical Chinese: The Spring and Autumn Annals as an example","authors":"Wenjie Fan, Dongbo Wang, Shuiqing Huang","doi":"10.1093/llc/fqad016","DOIUrl":"https://doi.org/10.1093/llc/fqad016","url":null,"abstract":"Most classical Chinese literary texts contain no sentence boundaries. Because such texts are difficult to read, experts in literature or linguistics segment the sentences manually. This article explores the effectiveness of classical Chinese sentence segmentation methods so as to provide a reference for classical Chinese punctuation. We compare learning results across three components of machine learning, namely models, tagging schemes, and features. The models include conditional random field (CRF) models, long short-term memory (LSTM) models, BiLSTM–CRF models, and three Bidirectional Encoder Representations from Transformers (BERT) models. We examine five tagging schemes and three features: the statistical feature, Guangyun, and Fanqie. Finally, the performance of the combined feature template is evaluated by ten-fold cross-validation on four classical Chinese texts in different genres. The SikuBERT model proves to be the most effective model for sentence segmentation at present. Different tagging schemes and various features are introduced. The results show that 5-tag-J tagging schemes can improve performance. The statistical feature, an important clue for classical Chinese sentence segmentation, is useful in related tasks, but Guangyun and Fanqie have little impact. 
Other important factors of sentence segmentation are genres and writing styles.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43547289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}