Literary heritage of the 19–20 centuries: classification of raster images for intellectual analysis and thematic modeling of the corpus of handwritten texts
Authors: Elena N. Penskaya, Lyubov V. Khachaturian
Journal: Filologicheskie Nauki-Nauchnye Doklady Vysshei Shkoly-Philological Sciences-Scientific Essays of Higher Education
Publication date: 2023-09-01
DOI: 10.20339/phs.5-23.160 (https://doi.org/10.20339/phs.5-23.160)
Abstract
The article examines current trends in working with digitized handwritten heritage relating to the history of Russian literature from the second half of the 19th to the mid-20th century. The formation of virtual archives is analyzed as a gradual accumulation of research “big data”: an array of raster documents whose text has not been recognized, comprising tens of thousands of digital copies of archival documents. The authors propose new approaches to classifying raster images of handwritten documents for use in intelligent analysis systems, experimental methods for visualizing archival documents, and previously unused search engine capabilities. Particular attention is paid to the architectonics of the manuscript: the transition from the graphic elements of a raster image to semantic ones, which makes it possible to apply data mining techniques to an unrecognized data array.