Carl Ehrett, Lucian Ghita, Dillon Ranwala, Alison Menezes
Digital Scholarship in the Humanities, published 2024-06-04. DOI: 10.1093/llc/fqae021 (https://doi.org/10.1093/llc/fqae021)
Shakespeare Machine: New AI-Based Technologies for Textual Analysis
This article demonstrates a method that uses tools from the field of Natural Language Processing (NLP) to aid in analyzing theatrical texts and similar works. The method deploys pre-trained large language model neural networks to gather metadata for a text; these metadata are amenable to downstream statistical analyses that surface patterns of interest in character dialogue. We focus specifically on Shakespeare’s works, collecting metadata in the form of sentiment and emotion scores for each line of his plays. In addition to the sentiment and emotion scores produced by NLP models, we also directly gather metadata such as genre, line length, and character gender. We show how these metadata may be used to illuminate a number of patterns in Shakespearean character that may be difficult to detect from a direct reading of the texts. We use these metadata to expose statistically significant relationships in Shakespeare between character gender and the emotional content of that character’s dialogue, controlling for genre. We also present the publicly available dataset that we compiled to perform these analyses. The dataset collects text from Shakespeare’s plays along with a variety of metadata useful for this and other forms of analysis of Shakespeare’s works. The methodology demonstrated here may be extended to other varieties of metadata provided by large NLP models.
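As a rough illustration of the kind of downstream analysis the abstract describes, the sketch below tests whether an emotion score differs by character gender while holding genre fixed. The abstract does not say which statistical test the authors use, so the stratified permutation test here is a swapped-in stand-in, not their method. The rows are invented toy values, and the names `ROWS`, `stratified_gap`, and `permutation_p_value` are hypothetical. In a real pipeline each score would come from a pre-trained NLP model applied to one line of a play.

```python
import random
from collections import defaultdict
from statistics import mean

# Toy rows invented for illustration only: (genre, gender, emotion_score).
# They are NOT taken from the paper's dataset.
ROWS = [
    ("comedy", "F", 0.6), ("comedy", "F", 0.7), ("comedy", "F", 0.8),
    ("comedy", "M", 0.4), ("comedy", "M", 0.5), ("comedy", "M", 0.6),
    ("tragedy", "F", 0.3), ("tragedy", "F", 0.4), ("tragedy", "F", 0.5),
    ("tragedy", "M", 0.1), ("tragedy", "M", 0.2), ("tragedy", "M", 0.3),
]


def stratified_gap(rows):
    """Mean over genres of (mean F score - mean M score) within each genre."""
    by_cell = defaultdict(list)
    for genre, gender, score in rows:
        by_cell[(genre, gender)].append(score)
    genres = sorted({genre for genre, _, _ in rows})
    gaps = [mean(by_cell[(g, "F")]) - mean(by_cell[(g, "M")]) for g in genres]
    return mean(gaps)


def permutation_p_value(rows, n_perm=2000, seed=0):
    """Two-sided permutation test that shuffles gender labels within each genre.

    Shuffling inside a genre keeps the genre composition fixed, which is one
    simple way to control for genre (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = stratified_gap(rows)
    by_genre = defaultdict(list)
    for genre, gender, score in rows:
        by_genre[genre].append((gender, score))
    extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for genre, items in by_genre.items():
            genders = [gender for gender, _ in items]
            rng.shuffle(genders)
            shuffled.extend(
                (genre, g, score) for g, (_, score) in zip(genders, items)
            )
        if abs(stratified_gap(shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    gap, p = permutation_p_value(ROWS)
    print(f"within-genre gap (F - M): {gap:.3f}, permutation p-value: {p:.4f}")
```

Shuffling gender labels only within a genre means any remaining gap cannot be explained by one gender appearing more often in a particular genre. A regression with genre as a covariate would be another reasonable choice.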
About the journal:
Digital Scholarship in the Humanities (DSH) is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the humanities, including, but not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research. They include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.