Digital Scholarship in the Humanities: Latest Articles, Page 9

“I would I had that corporal soundness”: Pervez Rizvi's Analysis of the Word Adjacency Network Method of Authorship Attribution

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-28 DOI: 10.1093/llc/fqad032

G. Egan, Mark Eisen, Alejandro Ribeiro, Santiago Segarra

Citations: 0

Provenance visualization: Tracing people, processes, and practices through a data-driven approach to provenance

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-24 DOI: 10.1093/llc/fqad020

T. Vancisin, Loraine Clarke, M. Orr, Uta Hinrichs

Abstract: Provenance disclosure (the documentation of an artifact’s origin and how it was produced) is an important aspect to consider when working with historical records which undergo multiple transformations in preparation for and during digitization. Provenance in this context is commonly communicated through explanatory text or static diagrams. However, the methodological and curatorial decisions that have influenced the records’ data are easily overlooked, in particular when exploring the records through visualization as a result of digitization processes. We propose a data-driven approach to provenance disclosure which (1) traces provenance back to when the records were created, (2) documents and categorizes the records’ transformations (transcriptions, content modifications, changes in organization, and representational form), and (3) uses data visualization to disclose provenance in interactive ways. We reflect on how this approach can be practically applied in the context of historical record collections, and we present findings from a qualitative study we conducted to investigate the merits and limitations of provenance-driven visualization. Our findings suggest that data-driven provenance disclosure has the potential to (1) promote transparency and deeper interpretations of historical records, (2) provide rigor in researching historical document collections and underlying production processes, and (3) encourage ethical considerations by making visible labor and implicit bias that influence the production and curation of historical records.

Citations: 0

Proverbs as indicators of proficiency for art-generating AI

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-22 DOI: 10.1093/llc/fqad034

Luis J. Tosina Fernández

Abstract: Art generated by Artificial Intelligence (AI) is currently having great repercussion online. The reason for this is the fact that it allows people without creative talent to produce outstanding works by just typing in the description of what they want to illustrate. However, the appearance of this technology has also caused some discomfort among artists and graphic designers, who see their craft threatened by a service that is available to anyone free of charge. In this article, the capability of some of these platforms to process figurative language will be assessed with the help of five well-known proverbs found in almost identical terms across a number of Western languages. These proverbs were used as the prompts on five of the most popular AI art generators accessible at present. After analyzing the results, our experiment concludes that AI evidences significant deficiencies in the processing of proverbs and, therefore, of figurative language. Consequently, AI does not seem able to substitute human agency completely in artistic creation yet. This exposes an aspect that needs improvement not just for the creative applications of AI but for other applications that it may have in the future. To achieve this, disciplines such as psycholinguistics should be integrated into the teams that develop AI.

Citations: 0

A new approach for the construction of historical databases—NoSQL Document-oriented databases: the example of AtlantoCracies

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-22 DOI: 10.1093/llc/fqad033

Manuel Díaz-Ordóñez, Domingo Savio Rodríguez Baena, Bartolomé Yun-Casalilla

Citations: 0

Web archive analytics: Blind spots and silences in distant readings of the archived web

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-19 DOI: 10.1093/llc/fqad014

Simon Donig, Markus Eckl, S. Gassner, Malte Rehbein

Abstract: In this article, we discuss epistemological and methodological aspects of web archive analytics, a recent development towards more data-centred access to web archives. More specifically, we suggest understanding both the process of archiving and subsequent steps of analysis at scale as acts of observation that can be questioned for their epistemological priori. Therefore, we propose the concepts of ‘blind spots’ (features of the live web not included upon creation in the archive) and ‘silences’ (latent features present in the archive but requiring a particular method to be made articulate). In particular, we address two forms of silences playing a structural role in web archive analytics, crucial to both historians and social scientists alike: abundance (or scale) and time. We trace epistemological implications of web archive analytics across an exemplary case study workflow and suggest methodological answers to the issues raised in this process. On the data extraction side, we introduce warc2corpus (w2c), a new tool for extracting granular, structured data, especially temporal information related to the creation, modification, and publication specifically of webpages. For data analysis, we demonstrate how distant reading techniques, more specifically structural topic modelling (STM), can contribute to providing a rich, temporally structured representation of textual web archive content that in turn can be subjected to scholarly inquiry, interpretation, and re-contextualization.

Citations: 0

NEAT—Named Entities in Archaeological Texts: A semantic approach to term extraction and classification

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-13 DOI: 10.1093/llc/fqad017

Maria Pia di Buono, Gennaro Nolano, J. Monti

Abstract: The lack of annotated datasets affects the development of Natural Language Processing applications and heavily impacts the access to textual data, in particular for specific domains and specific languages. In this paper, we propose a methodology to annotate texts concerning domain-specific knowledge, to provide a reliable source of data for the task of Named Entity Recognition (NER) in the domain of archaeology for the Italian language. This method integrates syntactic and semantic information from several structured sources to annotate entities’ mentions in unstructured texts. Furthermore, we make use of an ontology to label entities with the specific type they refer to. By using a corpus made up of item descriptions from Europeana’s Archaeology Collection, we first test our proposed methodology on a mock dataset composed of 1,000 texts. After several steps of improvements, we use the final process to create a complete dataset composed of 5,000 descriptions. The resulting dataset, Named Entities in Archaeological Texts, has a total of 41,002 spans of texts annotated with their domain-specific entity classification according to the CIDOC Conceptual Reference Model.

Citations: 0

Automatic sentence segmentation for classical Chinese: The Spring and Autumn Annals as an example

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-12 DOI: 10.1093/llc/fqad016

Wenjie Fan, Dongbo Wang, Shuiqing Huang

Abstract: There exists no sentence boundary in most classical Chinese literature texts. Since it is difficult to read literature of this kind, experts in literature or linguistics would segment the sentence manually. This article explores the effectiveness of classical Chinese sentence segmentation method so as to provide a reference for classical Chinese punctuation. On the basis of the machine learning methods, we chose three components of machine learning, namely models, tagging schemes, and features, to compare the learning results. The models include conditional random field (CRF) models, long short term memory (LSTM) models, BiLSTM–CRF models, and three Bidirectional Encoder Representation from Transformers (BERT) models. There are five tagging schemes in this article and three features including the statistical feature, Guangyun, and Fanqie. Finally, the performance of the combined feature template is evaluated by ten-fold cross-validation on four classical Chinese texts in different genres. The SikuBERT model is proved to be the most effective model for sentence segmentation at present. Different tagging schemes and various features are introduced. The results show that 5-tag-J tagging schemes can improve performance. Statistical feature, as an important clue for classical Chinese sentence segmentation, is useful in related tasks, but Guangyun and Fanqie have little impact. Other important factors of sentence segmentation are genres and writing styles.

Citations: 0

Unravelling interlanguage facts via explainable machine learning

Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-10 DOI: 10.1093/llc/fqad019

Barbara Berti, Andrea Esuli, Fabrizio Sebastiani

Abstract: Native language identification (NLI) is the task of training (via supervised machine learning) a classifier that guesses the native language of the author of a text. This task has been extensively researched in the last decade, and the performance of NLI systems has steadily improved over the years. We focus on a different facet of the NLI task, i.e. that of analysing the internals of an NLI classifier trained by an explainable machine learning (EML) algorithm, in order to obtain explanations of its classification decisions, with the ultimate goal of gaining insight into which linguistic phenomena ‘give a speaker’s native language away’. We use this perspective in order to tackle both NLI and a (much less researched) companion task, i.e. guessing whether a text has been written by a native or a non-native speaker. Using three datasets of different provenance (two datasets of English learners’ essays and a dataset of social media posts), we investigate which kind of linguistic traits (lexical, morphological, syntactic, and statistical) are most effective for solving our two tasks, namely, are most indicative of a speaker’s L1; our experiments indicate that the most discriminative features are the lexical ones, followed by the morphological, syntactic, and statistical features, in this order. We also present two case studies, one on Italian and one on Spanish learners of English, in which we analyse individual linguistic traits that the classifiers have singled out as most important for spotting these L1s; we show that the traits identified as most discriminative well align with our intuition, i.e. represent typical patterns of language misuse, underuse, or overuse, by speakers of the given L1. Overall, our study shows that the use of EML can be a valuable tool for the scholar who investigates interlanguage facts and language transfer.

Citations: 0

Hacking stylometry with multiple voices: Imaginary writers can override authorial signal in Delta

Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-08 DOI: 10.1093/llc/fqad012

Daniil Skorinkin, Boris Orekhov

Abstract: It is a basic assumption of stylometry that texts written by the same person show greater stylometric similarity even if published under multiple pennames. Statistical authorship attribution strongly relies on the ability of Burrows’s Delta and its variants to cluster one author together regardless of pseudonyms. At the same time, the very first computational discoveries by the founder of modern stylometry showed that a single author is capable of producing multiple voices (Burrows, 1987, Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Clarendon Press). We investigate two authors whose stylistically autonomous pennames seem to deceive Delta and override authorial signals: a Portuguese poet Fernando Pessoa and a French novelist Romain Gary. Pessoa managed to create at least three pennames (the author himself used the term ‘heteronym’) who exhibit all traits of individual human beings from the stylometric point of view. Gary’s alter ego Emile Ajar, who was an intentional literary mystification, also demonstrates traits of stylometric autonomy. At the same time, other pseudonyms used by Gary lack that autonomy completely. Our investigation shows that there appears to be a continuum between a purely formal use of a penname, which brings almost no distinction from the real name of an author, and a strong literary sub-personality such as those created by Pessoa.

Citations: 1

Sagas and genre: A case for application of network analysis to manuscripts preserving Old Norse-Icelandic saga literature

IF 0.8, Zone 3 (Literature)

Digital Scholarship in the Humanities Pub Date : 2023-04-07 DOI: 10.1093/llc/fqad013

K. Kapitan, Tarrin Wills

Abstract: This study applies statistical approaches to the analysis of the genre relationships of Old Norse-Icelandic literature in order to expand our understanding of the relationships between works, their transmission, and their possible modes of reception, as manifested in the extant manuscripts. This article contributes to the ongoing discussion of the genre boundaries of Old Norse-Icelandic literature and presents an alternative method of engaging with this material in the form of computer-assisted analysis, i.e. data visualization and network analysis. Using data collected from major online databases of Old Norse-Icelandic manuscripts, we present the most complete to date network of co-occurrences in manuscripts of works belonging to a number of literary genres. The present study empirically demonstrates the manifoldness of the connections between the Old Norse-Icelandic works which transcend traditional scholarly genre boundaries. The study identifies two main communities within the network: a community of romances, or works of narrative fiction, which includes mainly legendary sagas (fornaldarsögur) and chivalric sagas (riddarasögur), and a community of historicizing narratives, or pseudo-history, which includes mainly sagas of Icelanders (Íslendingasögur) and kings’ sagas (konungasögur).

Citations: 0