CapekDraCor: A New Contribution to the European Programmable Drama Corpora

Journal of Linguistics/Jazykovedný časopis | Pub. date: 2023-06-01 | DOI: 10.2478/jazcas-2023-0042

Petr Pořízka

Journal article | Pages 244-253




Abstract

The aim of this paper is to present the new CapekDraCor corpus and the DraCor project, with its research-oriented concept of programmable corpora focused on quantitative analysis within the framework of computational literary studies. This digital platform extends the possibilities of large-scale drama analysis, with a focus on dramatic characters. The basic operationalisation is interaction within a dramatic configuration, i.e., the co-presence of two speakers in the same scene. From this, network data are automatically extracted: both global interaction networks of whole plays and data characterising individual actors, i.e., literary characters. The paper presents the CapekDraCor corpus, a new addition to the extensive DraCor database, and describes how the data are processed with respect to their specific multi-layered structure. The corpus contains all the plays written by Karel and Josef Čapek, encoded in a standardised XML format that follows the general TEI Guidelines for drama, with a defined basic drama tagset. CapekDraCor also uses the newly created EZdrama format, a lightweight YAML-like markup language that serves as an intermediate step between plain .txt files and .xml files. A file in this format can be automatically converted into a DraCor-ready XML file with a TEI header. The advantage of the programmable corpora concept is that suitably structured data can be used for drama research outside the DraCor platform and with other methods or tools for textual analysis. At the same time, this approach shifts the researcher's focus from the technical requirements of the analysis to operationalised computational analysis driven by research questions and supported by pre-prepared, flexible tools. DraCor is a unique open infrastructure (in terms of both data and tools) for the analysis of European drama, currently comprising 15 corpora in 10 languages with a total of about 3,000 plays from a wide range of periods.
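The basic operationalisation described above, linking two characters whenever they are present in the same scene, can be illustrated with a short sketch. It is not DraCor's own extraction code: it assumes each scene (segment) has already been reduced to the set of speaker IDs who speak in it, and it weights each edge by the number of scenes the pair shares. The function names and speaker IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def copresence_network(segments: list[set[str]]) -> Counter:
    """Build a weighted, undirected co-presence network.

    Assumption for this sketch: each segment is the set of speaker IDs
    who speak in one scene. Two speakers are linked once per segment
    they share; the Counter value is the edge weight.
    """
    edges: Counter = Counter()
    for speakers in segments:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(speakers), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges: Counter) -> Counter:
    """Count how many distinct characters each character is linked to."""
    deg: Counter = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    # Hypothetical speaker IDs and scenes, not data from CapekDraCor.
    scenes = [{"anna", "ben"}, {"anna", "ben", "clara"}, {"clara"}]
    net = copresence_network(scenes)
    print(net[("anna", "ben")])     # 2
    print(degree(net)["clara"])     # 2
```

Counting shared scenes is one common weighting choice; the metrics DraCor actually reports may be derived with different rules, so this approximates the idea rather than the platform's output.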
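EZdrama is described as a lightweight, YAML-like intermediate markup between plain text and TEI XML. The sketch below illustrates that conversion step with Python's standard `xml.etree.ElementTree`. It is not the project's converter: the line prefixes (`#`, `##`, `@`, `$`) are an assumed, simplified subset of the format, `ezdrama_to_tei_body` is a hypothetical name, and the output is a bare TEI-style `<body>` without the TEI namespace or `<teiHeader>`.

```python
import xml.etree.ElementTree as ET


def _innermost(body, act, scene):
    """Return the innermost open division, falling back to <body>."""
    if scene is not None:
        return scene
    if act is not None:
        return act
    return body


def ezdrama_to_tei_body(text: str) -> ET.Element:
    """Convert a small EZdrama-style text into a TEI-like <body> element.

    Assumed markers (a simplified reading, not the full EZdrama spec):
      '#Act'          act heading
      '##Scene'       scene heading
      '@NAME speech'  a speech by NAME
      '$direction'    a stage direction between speeches
    Any other non-empty line continues the current speech as a new <p>.
    """
    body = ET.Element("body")
    act = scene = sp = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            parent = act if act is not None else body
            scene = ET.SubElement(parent, "div", type="scene")
            ET.SubElement(scene, "head").text = line[2:].strip()
            sp = None
        elif line.startswith("#"):
            act = ET.SubElement(body, "div", type="act")
            ET.SubElement(act, "head").text = line[1:].strip()
            scene = sp = None
        elif line.startswith("@"):
            # Speaker name runs up to the first space; the rest is speech.
            name, _, speech = line[1:].partition(" ")
            sp = ET.SubElement(_innermost(body, act, scene), "sp",
                               who="#" + name.lower())
            ET.SubElement(sp, "speaker").text = name
            if speech.strip():
                ET.SubElement(sp, "p").text = speech.strip()
        elif line.startswith("$"):
            stage = ET.SubElement(_innermost(body, act, scene), "stage")
            stage.text = line[1:].strip()
            sp = None  # a stage direction closes the current speech
        elif sp is not None:
            ET.SubElement(sp, "p").text = line
        else:
            raise ValueError(f"text outside any speech: {line!r}")
    return body


if __name__ == "__main__":
    sample = "#Act 1\n##Scene 1\n$A room.\n@ANNA Good morning.\n@BEN Morning.\nAnd welcome.\n"
    print(ET.tostring(ezdrama_to_tei_body(sample), encoding="unicode"))
```

A full pipeline would presumably also generate the `<teiHeader>` and a cast list from the collected speaker IDs, which is what makes a file usable on the DraCor platform.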
