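Literary heritage of the 19–20 centuries: classification of raster images for intellectual analysis and thematic modeling of the corpus of handwritten texts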

IF 0.1 0 LANGUAGE & LINGUISTICS
Elena N. Penskaya, Lyubov V. Khachaturian
Journal: Filologicheskie Nauki-Nauchnye Doklady Vysshei Shkoly-Philological Sciences-Scientific Essays of Higher Education
Publication date: 2023-09-01
DOI: https://doi.org/10.20339/phs.5-23.160
Citations: 0

Abstract

The article examines current trends in working with digitized handwritten heritage relating to the history of Russian literature from the second half of the 19th century to the mid-20th century. The formation of virtual archives is analyzed as a gradual accumulation of the "big data" of scholarly research: an unrecognized information array of raster documents comprising tens of thousands of digitized archival documents. The authors propose new approaches to classifying raster images of handwritten documents for use in intelligent analysis systems, experimental methods for visualizing archival documents, and previously unused search-engine capabilities. Particular attention is paid to the architectonics of the manuscript: the transition from the graphic elements of a raster image to semantic ones, which makes it possible to apply data mining techniques to an unrecognized data array.