From Electronic Publication of a Medieval Manuscript to Big Data, or What Artificial Intelligence Knows about the Beginning of Slavic Books

Victor Baranov
Palaeobulgarica-Starobalgaristika, 2023 (published 23 October 2023)
DOI: 10.59076/2603-2899.2023.3.10
The article describes how machine-readable linguistic resources are prepared from medieval Slavic written monuments and how these resources are used in systems for the automated and fully automatic processing of large text datasets. The history of this area of applied Paleoslavistics is briefly outlined through several projects that created electronic editions, collections and corpora of Slavic manuscripts. Particular attention is paid to the stages of development and the material of the historical corpus Manuscript (manuscripts.ru), which contains marked-up transliterations of Glagolitic manuscripts and transcriptions of Cyrillic manuscripts of the 10th–15th centuries, together with specialized tools for processing, displaying and analyzing non-standard graphic and orthographic features and the structure of texts. Unfortunately, the labor-intensive and complex process of preparing copies of manuscripts and marking them up is still the only way to convert a manuscript image into machine-readable form. The article notes that annotated collections built from Slavic manuscripts can be used both to train recognition models in existing HTR (handwritten text recognition) systems and to develop new specialized tools for recognizing and analyzing the Slavic manuscript heritage.