{"title":"Methods of Semantic Analysis of Texts","authors":"A. V. Kan, A. A. Khoroshilov, Y. V. Nikitin, S. A. Stupnikov, I. A. Chechulin, A. A. Khoroshilov","doi":"10.3103/S0005105525701560","DOIUrl":"10.3103/S0005105525701560","url":null,"abstract":"<p>This article examines methods and models of semantic analysis that have been developed based on the concept of phraseological conceptual analysis of texts (PCAT) for solving analytical and synthetic information processing problems using the system of inflectional classes of the Russian language, the principle of linguistic analogy, the centroid-context model (CCM), and a wide range of submodels developed based on the CCM. The proposed methods make it possible to create a suite of software and declarative tools to implement diverse technologies for the processing and analysis of scientific and technical documents at a large multisector information center. This new scientific approach to multilevel semantic analysis expands the range of domestic technologies for processing Russian-language scientific and technical texts.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"37 - 57"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Campus Twins: Concepts, Technologies, and Development Prospects","authors":"K. V. Pitelinsky, A. A. Ryzhov, E. V. Shchepetkova, A. G. Gorshkov","doi":"10.3103/S0005105526700032","DOIUrl":"10.3103/S0005105526700032","url":null,"abstract":"<p>This article analyzes and compares types of digital twins (DTs), highlighting their advantages and disadvantages. Examples of DT application in construction and urban development that are used to create DT architectures are presented and analyzed. The use of DTs as part of a smart city is examined in detail, using Singapore as an example. Information on the use of DTs at campuses and universities in the Russian Federation is reviewed. The feasibility of using DT technology in combination with mobile robotic systems to improve campus reliability and service quality is substantiated. An approach to implementing a typical DT for a smart campus is described.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"24 - 36"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socio-Cyber-Physical Systems Models Based on Digital Twins","authors":"A. I. Vodyaho, N. A. Zhukova, V. Ya. Ananeva","doi":"10.3103/S0005105525701559","DOIUrl":"10.3103/S0005105525701559","url":null,"abstract":"<div><p>This article considers an approach to using digital twins, including human digital twins, as part of socio-cyber-physical systems. Existing approaches to the construction of digital twins and digital twin systems are analyzed. Possible approaches to building a digital twin of a human, as well as of a collective or group of people, represented as elements of a socio-cyber-physical system, are examined. It is proposed to use a multilevel polymodel. A generalized model of a socio-cyber-physical system based on digital twins, an ontological model, and a generalized structure of a human digital twin are presented. An example of applying the developed approach to a university’s educational process using competence models is given.</p></div>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"1 - 10"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Decision Support Algorithm for the Commercialization of Intellectual Property","authors":"S. I. Prudnikov, E. Yu. Dorozhkin","doi":"10.3103/S0005105526700044","DOIUrl":"10.3103/S0005105526700044","url":null,"abstract":"<p>This article presents a decision support algorithm for the commercialization of intellectual property that integrates regression analysis and machine learning methods. The algorithm takes a range of factors into account, including patent features, market indicators, technological trends, and economic conditions. A formalized problem statement with an objective function for minimizing the mean square error is proposed, and the algorithm implementation stages are detailed: from data collection and preprocessing to the construction, validation, and dynamic updating of the predictive model. Particular attention is paid to the implementation of a dynamic assessment mechanism for technological trends to improve the model’s adaptability.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"15 - 23"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
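The record above states that the algorithm's objective function minimizes the mean square error over patent, market, and economic factors. A minimal sketch of that objective, assuming a plain linear model and purely hypothetical patent features and scores (the article's actual algorithm combines regression with machine-learning methods and dynamic trend updating, which this sketch omits):

```python
import numpy as np

# Hypothetical feature matrix: rows are patents, columns are illustrative
# factors (e.g., claim count, citation count, a market-size index).
X = np.array([
    [12, 30, 0.8],
    [5, 10, 0.3],
    [20, 55, 0.9],
    [8, 22, 0.5],
], dtype=float)
y = np.array([1.0, 0.2, 1.5, 0.6])  # hypothetical commercialization scores

# Augment with a bias column and solve min_w ||Xw - y||^2 in closed form.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def predict(features):
    """Score a new patent from its (hypothetical) feature vector."""
    return float(np.append(np.asarray(features, dtype=float), 1.0) @ w)

# The quantity the formalized problem statement minimizes:
mse = float(np.mean((Xb @ w - y) ** 2))
```

Dynamic adaptation, as described in the abstract, would amount to refitting `w` as new market and trend observations arrive.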
{"title":"Assessing the Quality of Large Language Models in Machine Translation Tasks","authors":"A. V. Mylnikova, L. A. Mylnikov","doi":"10.3103/S0005105526700020","DOIUrl":"10.3103/S0005105526700020","url":null,"abstract":"<p>This article presents a comparative evaluation of machine translation quality across several large language models (LLMs), namely DeepSeek, Grok, Mistral, Qwen, GigaChat, and Yandex, based on translations of expressive linguistic means (phraseologisms, homonyms, puns, etc.) and texts of various functional styles. Translation quality is assessed quantitatively using automatic metrics (BLEU, METEOR, and chrF) and qualitatively through expert analysis based on adequacy, equivalence, and harmony criteria against reference translations, with additional comparison to Google Translate. The findings demonstrate that modern LLMs can overcome classical machine translation challenges and represent a new paradigm for developing human–AI hybrid systems.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"58 - 67"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
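One of the metrics named in the record above, chrF, scores translations by character n-gram overlap. As an illustration of that idea only, here is a simplified, unofficial character n-gram F-beta score; it is not the reference chrF implementation (for real evaluations one would use a maintained implementation such as sacreBLEU's):

```python
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams, with spaces removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf_like(hypothesis, reference, max_n=6, beta=2.0):
    """Illustrative character n-gram F-beta score in the spirit of chrF.

    Averages clipped n-gram precision and recall over orders 1..max_n,
    then combines them with F-beta (beta=2 weights recall twice as much)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped (min-count) overlap
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

An identical hypothesis and reference score 1.0, fully disjoint strings score 0.0, and partial overlap falls in between, which is the ordering such metrics rely on.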
{"title":"Speech Synthesis Software Architecture for the Chechen Language","authors":"E. S. Izrailova","doi":"10.3103/S0005105526700019","DOIUrl":"10.3103/S0005105526700019","url":null,"abstract":"<p>This paper describes the modeling of a speech synthesis system in which available acoustic data are used for machine learning to obtain a model that reproduces the natural characteristics of speech. The stages of creating a textual and phonetic–acoustic database adapted for training an automatic speech synthesis system are described. We present the architecture of the resulting speech synthesis software system, which consists of several functional modules. Information is given on the preparation of an experimental training database, the system’s training process, the setting of the neural network parameters, and the results of the experiment on training the speech synthesis system. The problem of graphical homonymy in transcribing Chechen texts and ways to eliminate it are also considered.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"68 - 74"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Method for Testing the Hypothesis of the Homogeneity of Distribution Laws of Multidimensional Random Variables Based on Their Nonparametric Estimates and a Classification Algorithm","authors":"A. V. Lapko, V. A. Lapko","doi":"10.3103/S0005105525701572","DOIUrl":"10.3103/S0005105525701572","url":null,"abstract":"<div><p>This article describes a methodology and algorithmic means for testing the hypothesis of the homogeneity of two distribution laws of multidimensional random variables using a pattern recognition algorithm based on kernel estimates of probability densities. The introduced classes are characterized according to the domain of definition of the probability densities under consideration. On this basis, a training sample is formed from the initial statistical data, and a pattern recognition algorithm corresponding to the maximum likelihood criterion is synthesized using kernel estimates of probability densities. The original task of testing the hypothesis is replaced by testing the hypothesis that the pattern recognition error is equal to one half, which holds when the analyzed distribution laws are close.</p></div>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"60 1","pages":"11 - 14"},"PeriodicalIF":0.5,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
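The idea summarized in the record above can be illustrated in one dimension: estimate each class density with a Gaussian kernel, classify the pooled points by the maximum-likelihood rule, and read an error rate near one half as evidence of homogeneity. A sketch under those assumptions (the fixed bandwidth and leave-one-out scheme are illustrative choices, not the authors' exact procedure):

```python
import numpy as np

def gauss_kde(sample, x, h=0.5):
    """Gaussian kernel density estimate at point x from a 1-D sample."""
    return float(np.mean(np.exp(-0.5 * ((x - sample) / h) ** 2))) / (h * np.sqrt(2.0 * np.pi))

def recognition_error(a, b, h=0.5):
    """Leave-one-out misclassification rate of the max-likelihood rule
    that assigns each pooled point to the class with the larger density.

    An error rate near 0.5 suggests the two samples share one
    distribution law; a rate near 0 suggests they differ."""
    errors = 0
    for i, x in enumerate(a):
        if gauss_kde(np.delete(a, i), x, h) < gauss_kde(b, x, h):
            errors += 1
    for i, x in enumerate(b):
        if gauss_kde(np.delete(b, i), x, h) <= gauss_kde(a, x, h):
            errors += 1
    return errors / (len(a) + len(b))
```

Two samples from N(0, 1) yield an error near 0.5, while N(0, 1) against N(6, 1) yields an error near 0, reproducing the substitution of hypotheses the abstract describes.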
{"title":"A Tool for the Rapid Diagnostics of Memory in Neural Network Architectures of Language Models","authors":"P. A. Gavrikov, A. K. Usmanov, D. Revaev, S. N. Buzykanov","doi":"10.3103/S0005105525701493","DOIUrl":"10.3103/S0005105525701493","url":null,"abstract":"<p>Large language models (LLMs) have evolved from simple <i>n</i>-gram systems to modern universal architectures; however, a key limitation remains the quadratic complexity of the self-attention mechanism with respect to input sequence length. This significantly increases memory consumption and computational costs and, with the emergence of tasks requiring extremely long contexts, creates the need for new architectural solutions. Since evaluating a proposed architecture typically requires long and expensive full-scale training, it is necessary to develop a tool that allows for a rapid preliminary assessment of a model’s internal memory capacity. This paper presents a method for quantitative evaluation of the internal memory of neural network architectures based on synthetic tests that do not require large data corpora. Internal memory is defined as the amount of information a model can reproduce without direct access to its original inputs. To validate the approach, a software framework was developed and tested on the GPT-2 and Mamba architectures. The experiments employed copy, inversion, and associative retrieval tasks. Comparison of prediction accuracy, error distribution, and computational cost enables a fast assessment of the efficiency and potential of various LLM architectures.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 6","pages":"S513 - S520"},"PeriodicalIF":0.5,"publicationDate":"2026-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
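The synthetic copy and inversion tasks mentioned in the record above can be sketched model-agnostically: generate prompt/target pairs, run any callable "model", and measure per-token accuracy by sequence length. Everything below (vocabulary size, lengths, the echo baseline) is illustrative and not the authors' framework:

```python
import random

def make_copy_task(vocab_size, length, rng):
    """One copy sample: the target is the prompt itself."""
    prompt = [rng.randrange(vocab_size) for _ in range(length)]
    return prompt, list(prompt)

def make_inversion_task(vocab_size, length, rng):
    """One inversion sample: the target is the reversed prompt."""
    prompt = [rng.randrange(vocab_size) for _ in range(length)]
    return prompt, prompt[::-1]

def memory_score(model, task_fn, vocab_size=50, lengths=(8, 16, 32, 64),
                 trials=20, seed=0):
    """Per-token accuracy of `model` (a callable: prompt -> token list)
    on freshly generated synthetic tasks, reported per sequence length."""
    rng = random.Random(seed)
    scores = {}
    for length in lengths:
        correct = total = 0
        for _ in range(trials):
            prompt, target = task_fn(vocab_size, length, rng)
            out = model(prompt)
            correct += sum(int(o == t) for o, t in zip(out, target))
            total += length
        scores[length] = correct / total
    return scores

# A perfect "echo" baseline solves the copy task at every length...
perfect = memory_score(lambda p: list(p), make_copy_task)
# ...but fails the inversion task, which is the point of having both.
echo_on_inversion = memory_score(lambda p: list(p), make_inversion_task)
```

A real diagnostic run would replace the lambda with a trained model's decoding function and watch where accuracy degrades as length grows.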
{"title":"SciLibRu, the Library of Scientific Subject Domains","authors":"O. M. Ataeva, N. P. Tuchkova, K. B. Teymurazov, A. Abdyshov, M. G. Kobuk","doi":"10.3103/S000510552570147X","DOIUrl":"10.3103/S000510552570147X","url":null,"abstract":"<p>The work is devoted to the problem of data integration for representing scientific subject areas based on their semantic description in the SciLibRu digital library. The LibMeta library’s ontology and knowledge graph are used as the data model. SciLibRu is populated by adding data from scientific journals. This paper demonstrates how the stages of processing semistructured scientific publications for their integration into the library’s ontology are implemented. Completion of all stages of the data preprocessing yields a dataset that can be used to train language models for queries in Russian-language scientific subject areas.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 6","pages":"S505 - S512"},"PeriodicalIF":0.5,"publicationDate":"2026-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Archival Handwritten Letter Attribution Using Siamese Neural Networks","authors":"N. M. Pronina","doi":"10.3103/S0005105525701481","DOIUrl":"10.3103/S0005105525701481","url":null,"abstract":"<p>This paper presents a method for the automated attribution of archival handwritten letters based on a Siamese neural network, addressing a key challenge in the digital humanities: the authentication of historical documents. The research is motivated by the mass digitization of 17th- to 19th-century archives, where attribution is often hindered by incomplete or inaccurate metadata about the authors. The method is designed for real-world document collections and accounts for challenges typical of archival materials: poor-quality scans, significant handwriting variation, and substantial class imbalance (from 1 to over 50 samples per author). The use of a Siamese network architecture enables the extraction of discriminative vector representations (embeddings). Based on these embeddings, the method not only classifies documents by known authors but also effectively identifies manuscripts that do not match any known author in the archive. This significantly narrows down the pool of candidates for subsequent expert verification. The study introduces a data preprocessing algorithm and provides a comparative analysis of two approaches to text analysis: at the image fragment level (300×300 px) and at the individual text line level. The developed tool offers archivists and philologists an effective solution for the preliminary sorting and attribution of handwritten documents in large collections.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 6","pages":"S557 - S570"},"PeriodicalIF":0.5,"publicationDate":"2026-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147612880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
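Downstream of the Siamese network described in the record above, attribution reduces to matching a document embedding against per-author references and rejecting weak matches for expert review. A sketch of that matching step only, with hypothetical centroids and an arbitrary cosine threshold (the paper's embeddings and decision rule may differ):

```python
import numpy as np

def attribute(embedding, author_centroids, threshold=0.7):
    """Match a document embedding to the closest author centroid by
    cosine similarity; return None when no author clears the threshold
    (a candidate for expert review as a possibly unknown hand)."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    best_author, best_sim = None, -1.0
    for author, centroid in author_centroids.items():
        sim = cos(embedding, centroid)
        if sim > best_sim:
            best_author, best_sim = author, sim
    return best_author if best_sim >= threshold else None

# Hypothetical 3-D centroids; real Siamese embeddings are much wider
# and would be averaged over each author's known samples.
centroids = {
    "author_A": np.array([1.0, 0.0, 0.0]),
    "author_B": np.array([0.0, 1.0, 0.0]),
}
```

The rejection branch is what narrows the candidate pool: documents far from every centroid are routed to experts instead of being force-assigned to the nearest known author.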