Shakespeare Machine: New AI-Based Technologies for Textual Analysis

IF 0.7 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY
Carl Ehrett, Lucian Ghita, Dillon Ranwala, Alison Menezes
Journal: Digital Scholarship in the Humanities
Publication type: Journal Article
Publication date: 2024-06-04
DOI: 10.1093/llc/fqae021
URL: https://doi.org/10.1093/llc/fqae021
Open access: no
Platform: Semanticscholar
Citations: 0

Abstract

Shakespeare Machine: New AI-Based Technologies for Textual Analysis
This article demonstrates a method using tools from the field of Natural Language Processing (NLP) to aid in analyzing theatrical texts and similar works. The method deploys pre-trained large language model neural networks to gather metadata for a text that is amenable to downstream statistical analyses surfacing patterns of interest in character dialogue. We specifically focus on Shakespeare’s works, collecting metadata in the form of sentiment and emotion scores for each line of his plays. In addition to sentiment and emotion scores produced by NLP models, we also directly gather metadata such as genre, line length, and character gender. We show how these metadata may be used to illuminate a number of interesting patterns in Shakespearean character which may be difficult to detect from a direct reading of the texts. We use these metadata to expose statistically significant relationships in Shakespeare between character gender and the emotional content of that character’s dialogue, controlling for genre. We also present here the publicly available dataset that we have compiled to perform these analyses. The data collects text from Shakespeare’s plays along with a variety of metadata useful for this and other forms of analysis of Shakespeare’s works. The methodology demonstrated here may be extended to other varieties of metadata provided by large NLP models.
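
The idea of comparing emotion scores by character gender while controlling for genre can be illustrated with a minimal sketch. It assumes the per-line scores have already been produced by some pre-trained model; the records below are invented placeholders, not values from the published dataset, and the names `LINES`, `gender_gap_by_genre`, and `pooled_gap` are hypothetical. A simple within-genre difference of means stands in for the statistical analysis described in the abstract, which is not specified here, and no significance test is performed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-line records: (genre, character gender, emotion score).
# Placeholder numbers for illustration only, not taken from the paper's dataset.
LINES = [
    ("comedy", "female", 0.62),
    ("comedy", "female", 0.58),
    ("comedy", "male", 0.50),
    ("comedy", "male", 0.54),
    ("tragedy", "female", 0.30),
    ("tragedy", "female", 0.34),
    ("tragedy", "male", 0.26),
    ("tragedy", "male", 0.22),
]


def gender_gap_by_genre(lines):
    """Return {genre: mean(female scores) - mean(male scores)}.

    Comparing within each genre keeps genre from confounding the gap.
    Genres that lack lines for either gender are skipped.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for genre, gender, score in lines:
        buckets[genre][gender].append(score)
    gaps = {}
    for genre, by_gender in buckets.items():
        female = by_gender.get("female", [])
        male = by_gender.get("male", [])
        if female and male:
            gaps[genre] = mean(female) - mean(male)
    return gaps


def pooled_gap(lines):
    """Average the within-genre gaps, weighting each genre by its line count."""
    gaps = gender_gap_by_genre(lines)
    counts = defaultdict(int)
    for genre, _gender, _score in lines:
        counts[genre] += 1
    total = sum(counts[genre] for genre in gaps)
    return sum(gaps[genre] * counts[genre] for genre in gaps) / total


if __name__ == "__main__":
    for genre, gap in sorted(gender_gap_by_genre(LINES).items()):
        print(f"{genre}: female minus male = {gap:.3f}")
    print(f"pooled: {pooled_gap(LINES):.3f}")
```

With real data, the placeholder tuples would be replaced by rows from the dataset, and the difference of means would normally give way to a model that also reports uncertainty, such as a regression with genre as a covariate.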
Source journal
CiteScore: 1.80
Self-citation rate: 25.00%
Articles published: 78
About the journal: DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the humanities, including, but not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects; descriptions and evaluations of tools, techniques, and methodologies; and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.