Shakespeare Machine: New AI-Based Technologies for Textual Analysis

IF 0.7; Zone 3 (Literature); JCR category: HUMANITIES, MULTIDISCIPLINARY

Digital Scholarship in the Humanities. Pub Date: 2024-06-04. DOI: 10.1093/llc/fqae021

Carl Ehrett, Lucian Ghita, Dillon Ranwala, Alison Menezes


Citations: 0

Abstract

This article demonstrates a method that uses tools from the field of Natural Language Processing (NLP) to aid in analyzing theatrical texts and similar works. The method deploys pre-trained large language model neural networks to gather metadata for a text; these metadata are amenable to downstream statistical analyses that surface patterns of interest in character dialogue. We focus specifically on Shakespeare’s works, collecting metadata in the form of sentiment and emotion scores for each line of his plays. In addition to the sentiment and emotion scores produced by NLP models, we directly gather metadata such as genre, line length, and character gender. We show how these metadata may be used to illuminate a number of patterns in Shakespearean character that may be difficult to detect from a direct reading of the texts. We use these metadata to expose statistically significant relationships in Shakespeare between character gender and the emotional content of that character’s dialogue, controlling for genre. We also present the publicly available dataset that we compiled to perform these analyses. The dataset collects text from Shakespeare’s plays along with a variety of metadata useful for this and other forms of analysis of Shakespeare’s works. The methodology demonstrated here may be extended to other varieties of metadata provided by large NLP models.
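The abstract does not spell out the statistical model behind "controlling for genre". As a minimal sketch of one way such a control can work, the Python below compares emotion scores by character gender within each genre and then averages the within-genre gaps, weighted by line count. This is a simple stratified comparison, not necessarily the authors' method. The function name `genre_adjusted_gap`, the row layout, the two gender labels, and every score in `ROWS` are hypothetical placeholders for illustration; none of them come from the paper or its dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (genre, character_gender, emotion_score).
# The scores are made-up placeholders, NOT values from the paper's dataset.
ROWS = [
    ("comedy", "female", 0.60),
    ("comedy", "female", 0.40),
    ("comedy", "male", 0.30),
    ("comedy", "male", 0.50),
    ("tragedy", "female", 0.20),
    ("tragedy", "male", 0.10),
    ("tragedy", "male", 0.30),
]


def genre_adjusted_gap(rows):
    """Return the line-count-weighted average, over genres, of
    mean(female score) - mean(male score) computed within each genre."""
    by_cell = defaultdict(list)
    for genre, gender, score in rows:
        by_cell[(genre, gender)].append(score)

    weighted_sum = 0.0
    total_weight = 0
    for genre in sorted({g for g, _ in by_cell}):
        female = by_cell.get((genre, "female"), [])
        male = by_cell.get((genre, "male"), [])
        if not female or not male:
            # A genre with lines from only one gender has no within-genre contrast.
            continue
        weight = len(female) + len(male)
        weighted_sum += weight * (mean(female) - mean(male))
        total_weight += weight

    if total_weight == 0:
        raise ValueError("no genre contains lines from both genders")
    return weighted_sum / total_weight


if __name__ == "__main__":
    print(f"genre-adjusted gap: {genre_adjusted_gap(ROWS):.4f}")
```

Comparing within genres first keeps a genre-level difference (for example, tragedies scoring lower overall) from being mistaken for a gender-level one. A regression with genre as a covariate would serve the same purpose and would also yield significance tests.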




Source journal

Digital Scholarship in the Humanities Multiple-

CiteScore: 1.80

Self-citation rate: 25.00%

About the journal: DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.