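Literary heritage of the 19–20 centuries: classification of raster images for intellectual analysis and thematic modeling of the corpus of handwritten texts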

IF 0.1 · JCR category: LANGUAGE & LINGUISTICS

Filologicheskie Nauki-Nauchnye Doklady Vysshei Shkoly-Philological Sciences-Scientific Essays of Higher Education

Pub date: 2023-09-01

DOI: 10.20339/phs.5-23.160

Elena N. Penskaya, Lyubov V. Khachaturian

Citations: 0

Abstract

The article examines current trends in working with the digitized handwritten heritage related to the history of Russian literature from the second half of the 19th century to the mid-20th century. The formation of virtual archives is analyzed as a gradual accumulation of research “big data”: an unrecognized information array of raster documents comprising tens of thousands of digitized archival records. The authors propose new approaches to classifying raster images of handwritten documents for use in intelligent analysis systems, experimental methods for visualizing archival documents, and previously unused capabilities of the search engine. Particular attention is paid to the architectonics of the manuscript: the transition from the graphic elements of a raster image to semantic ones, which makes it possible to apply elements of data mining to an unrecognized data array.

Source journal

Filologicheskie Nauki-Nauchnye Doklady Vysshei Shkoly-Philological Sciences-Scientific Essays of Higher Education (LANGUAGE & LINGUISTICS)

Self-citation rate: 50.00%

Articles published: 100