Generalized context-dependent graph-theoretic model of folklore and literary texts

Trudy Instituta sistemnogo programmirovaniia RAN. Pub Date: 2022-01-01. DOI: 10.15514/ispras-2022-34(1)-6

N. Moskin, A. Rogov, R. V. Voronov

Abstract

One of the problems of automatic text processing is text attribution, that is, establishing the attributes of a text (its authorship, time of creation, place of recording, etc.). The article presents a generalized context-dependent graph-theoretic model designed for the analysis of folklore and literary texts. The minimal structural unit (primitive) of the model is a word. Sets of words are combined into vertices, and the same word can belong to different vertices. Edges and graph substructures reflect the lexical, syntactic and semantic links in the text. The model is characterized by fuzziness, hierarchy and temporality. Three examples are given: a hierarchical graph-theoretic model of components (illustrated with literary works by A. S. Pushkin), a temporal graph-theoretic model of a fairy-tale plot (illustrated with Russian fairy tales collected by A. M. Afanasyev) and a fuzzy graph-theoretic model of «strong» connections between grammatical classes (illustrated with anonymous articles from the pre-revolutionary magazines «Time» and «Epoch» and the weekly «Citizen», edited by F. M. Dostoevsky). The model is built so that it can be further explored using artificial intelligence methods (for example, decision trees or neural networks). For this purpose, a format for storing such data was implemented in the information system «Folklore», along with procedures for entering, editing and analyzing texts and their graph-theoretic models.
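The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the storage format of the «Folklore» system, which the abstract does not specify. All names here (`Vertex`, `Edge`, `TextGraph`, `level`, `weight`, `time`, and the sample sentence) are hypothetical and chosen only for illustration. The sketch assumes that a vertex is a set of word positions, that fuzziness is a membership degree in [0, 1] on each edge, that hierarchy is an integer level on each vertex, and that temporality is an order index on each edge.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Vertex:
    name: str
    word_indices: FrozenSet[int]  # positions of the words grouped into this vertex
    level: int = 0                # hierarchy level (assumed representation)


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str                     # "lexical", "syntactic" or "semantic"
    weight: float = 1.0           # fuzzy membership degree in [0, 1] (assumed)
    time: Optional[int] = None    # order index for temporal models (assumed)


@dataclass
class TextGraph:
    words: List[str]              # the primitives of the model
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, name, word_indices, level=0):
        indices = frozenset(word_indices)
        if any(i < 0 or i >= len(self.words) for i in indices):
            raise ValueError("word index out of range")
        self.vertices[name] = Vertex(name, indices, level)

    def add_edge(self, source, target, kind, weight=1.0, time=None):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("unknown vertex")
        if not 0.0 <= weight <= 1.0:
            raise ValueError("fuzzy weight must lie in [0, 1]")
        self.edges.append(Edge(source, target, kind, weight, time))

    def vertices_of_word(self, index):
        # The same word may belong to several vertices.
        return sorted(v.name for v in self.vertices.values()
                      if index in v.word_indices)

    def strong_edges(self, threshold):
        # Keep only edges whose fuzzy weight reaches the threshold.
        return [e for e in self.edges if e.weight >= threshold]


# Made-up sample sentence, not taken from the article's corpora.
words = ["the", "old", "king", "called", "his", "son"]
graph = TextGraph(words)
graph.add_vertex("king_phrase", [0, 1, 2], level=1)
graph.add_vertex("calling", [2, 3], level=1)   # shares the word "king"
graph.add_vertex("son_phrase", [4, 5], level=1)
graph.add_edge("king_phrase", "calling", "syntactic", weight=0.9, time=1)
graph.add_edge("calling", "son_phrase", "semantic", weight=0.6, time=2)

shared = graph.vertices_of_word(2)
strong = graph.strong_edges(0.7)
```

Here the word at position 2 ("king") belongs to two vertices, and filtering by a weight threshold is one simple way to read "strong" connections from a fuzzy graph. The threshold value 0.7 is arbitrary.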
