Quantitative analysis of adjectives in the Russian literary corpus of realism and romanticism

INFuture2019: Knowledge in the Digital Age. DOI: 10.17234/infuture.2019.3

Lorena Kasunić, Petra Bago



Abstract


Computational analysis of text is an increasingly important approach used by researchers in the field of digital humanities. A much-debated question is whether computational techniques such as text analysis, which is essentially a quantitative approach, are adequate for analyzing literary texts, since literature is considered a form of artistic expression. In this paper we highlight the importance of applying computational analysis through a study of a corpus of selected Russian literary texts from the periods of Realism and Romanticism. The romantic subcorpus comprises "Eugene Onegin" by Alexander Pushkin and "A Hero of Our Time" by Mikhail Lermontov; the realist subcorpus comprises "Anna Karenina" by Leo Tolstoy and "Crime and Punishment" by Fyodor Dostoevsky. The analyzed texts are Croatian translations. The paper presents current methods and approaches used in computational literary analysis. The research focuses on adjective usage in romantic and realist texts, because the two literary periods rest on distinct poetic principles. The texts were analyzed in the Python programming language, and part-of-speech (POS) tagging was performed with an online tagger for Croatian. Since all texts are historical (originating in the 19th or early 20th century), difficulties with POS tagging were to be expected. The results show more similarities in adjective usage between the two subcorpora than expected. The paper points out how quantitative methods "borrowed" from natural language processing and statistics can be significant in drawing conclusions about literature, and that numbers can be meaningful if interpreted competently.
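The kind of comparison described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the tagger's output has already been parsed into (word form, lemma, tag) triples using MULTEXT-East-style tags, where adjective tags begin with "A" and punctuation tags begin with "Z". The function names and the tiny token lists are hypothetical toy data, not results from the paper. The abstract does not say which statistical test was used, so Pearson's chi-square test on a 2x2 table (adjectives vs. other words, per subcorpus) is shown only as one common option.

```python
from collections import Counter

# Assumption: each token is (word form, lemma, MULTEXT-East-style tag);
# adjective tags start with "A", punctuation tags start with "Z".
Token = tuple[str, str, str]


def adjective_profile(tokens: list[Token]) -> dict:
    """Count adjectives relative to all non-punctuation tokens in a subcorpus."""
    words = [t for t in tokens if not t[2].startswith("Z")]
    adjectives = [t for t in words if t[2].startswith("A")]
    return {
        "words": len(words),
        "adjectives": len(adjectives),
        # Normalized frequency, so subcorpora of different sizes are comparable.
        "per_1000": 1000 * len(adjectives) / len(words) if words else 0.0,
        "top_lemmas": Counter(lemma for _, lemma, _ in adjectives).most_common(5),
    }


def chi_square_2x2(a_adj: int, a_total: int, b_adj: int, b_total: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for
    adjective vs. non-adjective counts in two subcorpora."""
    observed = [[a_adj, a_total - a_adj], [b_adj, b_total - b_adj]]
    row = [a_total, b_total]
    col = [a_adj + b_adj, (a_total - a_adj) + (b_total - b_adj)]
    n = a_total + b_total
    if 0 in row or 0 in col:
        raise ValueError("every row and column total must be non-zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical toy tokens with illustrative tags (not taken from the corpus).
romantic = [("mladi", "mlad", "Agpmsny"), ("Onjegin", "Onjegin", "Npmsn"),
            ("je", "biti", "Var3s"), ("bio", "biti", "Vmp-sm"),
            ("dosadan", "dosadan", "Agpmsnn"), (".", ".", "Z")]
realist = [("Ana", "Ana", "Npfsn"), ("je", "biti", "Var3s"),
           ("tužna", "tužan", "Agpfsny"), ("žena", "žena", "Ncfsn"), (",", ",", "Z")]

rom, rea = adjective_profile(romantic), adjective_profile(realist)
print(rom["per_1000"], rea["per_1000"])
print(chi_square_2x2(rom["adjectives"], rom["words"], rea["adjectives"], rea["words"]))
```

Normalizing per 1,000 words matters here because the four novels differ greatly in length. On real data, tagging errors on historical language (noted in the abstract) would feed directly into these counts, so a manual check of a tagged sample is advisable before interpreting any difference.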
