{"title":"Abstractive Summarization for Trade News Analysis Based on a New Domain-Specific Dataset","authors":"D. A. Liutova, V. A. Malykh","doi":"10.3103/S0005105525701377","DOIUrl":"10.3103/S0005105525701377","url":null,"abstract":"<p>We present TradeNewsSum—a corpus for abstractive summarization of international trade news—covering Russian- and English-language publications from domain-specific sources. All summaries are manually prepared following unified guidelines. We conducted experiments with fine-tuning transformer and seq2seq models and performed automatic evaluation using the LLM-as-a-judge scheme. LLaMA 3.1 in instruction-prompting mode achieved the best results, showing high scores across metrics, including factual completeness.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S430 - S436"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hiding in Meaning: Semantic Encoding for Generative Text Steganography","authors":"O. Y. Rogov, D. E. Indenbom, D. S. Korzh, D. V. Pugacheva, V. A. Voronov, E. V. Tutubalina","doi":"10.3103/S0005105525701390","DOIUrl":"10.3103/S0005105525701390","url":null,"abstract":"<p>We propose a novel framework for steganographic text generation that hides binary messages within semantically coherent natural language using latent-space conditioning of large language models (LLMs). Secret messages are first encoded into continuous vectors via a learned binary-to-latent mapping, which is used to guide text generation through prefix tuning. Unlike prior token-level or syntactic steganography, our method avoids explicit word manipulation and instead operates entirely within the latent semantic space, enabling more fluent and less detectable outputs. On the receiver side, the latent representation is recovered from the generated text and decoded back into the original message. As a key theoretical contribution, we provide a robustness guarantee: if the recovered latent vector lies within a bounded distance of the original, exact message recovery is ensured, with the bound determined by the decoder’s Lipschitz continuity and the minimum logit margin. This formal result offers a principled view of the reliability–capacity trade-off in latent steganographic systems. Empirical evaluation on both synthetic data and real-world domains such as Amazon reviews shows that our method achieves high message recovery accuracy (above 91%), strong text fluency and competitive capacity up to 6 bits per sentence element while maintaining resilience against neural steganalysis. These findings demonstrate that latent conditioned generation offers a secure and practical pathway for embedding information in modern LLMs.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S447 - S452"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Modeling of the Thematic Field of Studying Cultural Congruence in a Psychological Context","authors":"A. M. Ganieva","doi":"10.3103/S000510552570133X","DOIUrl":"10.3103/S000510552570133X","url":null,"abstract":"<p>The aim of this work is to identify key topics in modern psychological research of cultural congruence using the method of thematic digital modeling of an array of scientific publications. The modernity and significance of the conducted research lie in the growing importance of cultural congruence in the context of the digital transformation of society, which is changing the ways of socialization and interaction. Modern technologies require rethinking the psychological mechanisms of individual adaptation to the cultural environment, especially in childhood and adolescence. Despite the active study of this phenomenon, there is a noticeable shortage of research on the cultural congruence of adults. The use of digital modeling and artificial intelligence allows us to systematize knowledge and identify the structure of the thematic field with high accuracy. The obtained data opens up the prospect for further study of cultural congruence throughout the entire life cycle. The thematic field review of cultural congruence research was conducted based on an analysis of digital archives comprising a curated collection of 112 scholarly publications on the topic. This review employed a topic modeling algorithm implemented in the Python programming language and leveraged digital platforms incorporating multimodal neural network–based tools (GigaChat, Qwen, and DeepSeek). The data analysis yielded four distinct age groups that reflect the developmental specificity of cultural congruence manifestations: preschoolers, primary school–age children, adolescents, and adults.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S405 - S409"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalization of Text Recognized by Optical Character Recognition Using Lightweight LLMs","authors":"V. K. Vershinin, I. V. Khodnenko, S. V. Ivanov","doi":"10.3103/S0005105525701328","DOIUrl":"10.3103/S0005105525701328","url":null,"abstract":"<p>Despite recent progress, Optical Character Recognition (OCR) on historical newspapers still leaves 5–10% character errors. We present a fully automated post-OCR normalization pipeline that combines lightweight 7–8B instruction-tuned LLMs quantized to 4-bit (INT4) with a small set of regex rules. On the BLN600 benchmark (600 pages of 19th-century British newspapers), our best model YandexGPT-5-Instruct Q4 reduces the Character Error Rate (CER) from 8.4 to 4.0% (–52.5%) and the Word Error Rate (WER) from 20.2 to 6.5% (–67.8%), while raising semantic similarity to 0.962. The system runs on consumer hardware (RTX-4060 Ti and 8 GB VRAM) at about 35 s per page and requires no fine-tuning or parallel training data. These results indicate that compact INT4 LLMs are a practical alternative to large checkpoints for post-OCR cleanup of historical documents.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S397 - S404"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Uncertainty in Transformer Circuits with Effective Information Consistency","authors":"A. A. Krasnovsky","doi":"10.3103/S0005105525701365","DOIUrl":"10.3103/S0005105525701365","url":null,"abstract":"<p>Mechanistic interpretability has identified functional subgraphs within large language models (LLMs), known as transformer circuits (TCs), that appear to implement specific algorithms. Yet we lack a formal, single-pass way to quantify when an active circuit is behaving coherently and thus likely trustworthy. Building on the author’s prior sheaf-theoretic formulation of causal emergence (Krasnovsky, 2025) we specialize it to transformer circuits and introduce the single-pass, dimensionless effective-information consistency score (EICS). EICS combines a normalized sheaf inconsistency computed from local Jacobians and activations with a Gaussian EI proxy for circuit-level causal emergence derived from the same forward state. The construction is white-box, single-pass, and makes units explicit so that the score is dimensionless. We further provide practical guidance on score interpretation, computational overhead (with fast and exact modes), and a toy sanity-check analysis.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S423 - S429"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a Dynamic Expert System for Analyzing the Impact of Climate Effects on Small and Medium-Sized Enterprises","authors":"R. A. Burnashev, Y. V. Sergeev","doi":"10.3103/S0005105525701316","DOIUrl":"10.3103/S0005105525701316","url":null,"abstract":"<p>Growing climate instability is creating new challenges and risks for the resilience of small and medium-sized enterprises (SMEs). This article proposes a prototype architecture for a dynamic expert system comprising several key modules: a user interface, a knowledge base, a server application, and a dynamic data update module with real-time APIs. A distinctive feature of the system is the application of <i>Z</i><sup>+</sup>-number calculus, implemented using the scikit-fuzzy library, which allows for the accounting of graded confidence in evaluations. This approach provides more robust and adaptive risk assessments that are sensitive to changes in the quality of input data. Interactive visualization of the results is built upon OpenStreetMap. The system’s methodology for self-adaptation of confidence measures based on historical data is described.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S389 - S396"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Chemist Robot: Towards an Autonomous Laboratory","authors":"M. S. Adygamov, A. O. Golub, E. R. Saifullin, T. R. Gimadiev, N. Yu. Serov","doi":"10.3103/S0005105525701304","DOIUrl":"10.3103/S0005105525701304","url":null,"abstract":"<p>This paper describes a hardware and software platform that enables automated chemical syntheses, including the preparation, heating, and mixing of reaction mixtures, as well as postsynthesis dilution sampling and sending for high-performance liquid chromatography (HPLC) analysis, followed by automated processing of the results. A custom Python library, ChemBot, was developed to control individual robotic devices, and a client web server was created to manage the entire system. A web interface was created to view the system status and the progress of syntheses. The performance of the entire platform for performing experiments was tested by performing aldol condensation syntheses, where the ratio of reagents, the catalyst and its amount, the temperature and time of synthesis were varied. Writing custom code to monitor and control the entire system is an important step toward integrating the robotic system with artificial intelligence (AI), which will ultimately enable the transition to an autonomous laboratory, where target molecule prediction and synthesis, experimental execution and analysis, and, if necessary, refinement or modification of the model will be performed automatically, without the need for human intervention.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S383 - S388"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI in Cancer Prevention: A Retrospective Study","authors":"Petr Philonenko, Vladimir Kokh, Pavel Blinov","doi":"10.3103/S0005105525701432","DOIUrl":"10.3103/S0005105525701432","url":null,"abstract":"<p>This study investigates the feasibility of effectively solving population-scale cancer-screening problems using artificial intelligence (AI) methods that predict malignant neoplasm risk based on minimal electronic health record (EHR) data, namely, medical diagnosis and service codes. To address the formulated problem, we considered a broad spectrum of modern approaches, including classical machine learning methods, survival analysis, deep learning, and large language models (LLMs). Numerical experiments demonstrated that gradient boosting using survival analysis models as additional predictors possesses the best ability to rank patients by cancer risk level, enabling consideration of both population-level and individual risk factors for cancers. Predictors constructed from EHR data include demographic characteristics, healthcare utilization patterns, and clinical markers. This solution was tested in retrospective experiments under the supervision of specialized oncologists. In the retrospective experiment involving more than 1.9 million patients, we established that the risk group captures up to 5.4 times as many patients with cancer at the same level of medical examination. The investigated method represents a scalable solution using exclusively diagnosis and service codes, requiring no specialized infrastructure and integrable into oncological vigilance processes, making it applicable to population-scale cancer screening.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S479 - S483"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147612879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Electrocardiogram Generation Using Hierarchical Variational Autoencoders","authors":"I. A. Sviridov, K. S. Egorov","doi":"10.3103/S0005105525701407","DOIUrl":"10.3103/S0005105525701407","url":null,"abstract":"<p>Cardiovascular diseases remain the leading cause of mortality. Automated electrocardiogram (ECG) analysis can ease clinical workloads but is limited by scarce and imbalanced data. Synthetic ECGs can mitigate these issues, and while most methods use generative adversarial networks (GANs), recent work has shown that variational autoencoders (VAEs) perform comparably. This paper presents a cNVAE-ECG model, a conditional Nouveau VAE (NVAE) that generates high-resolution 10-s 12-lead ECGs with multiple pathologies. Leveraging a compact channel-generation scheme and class embeddings for multilabel conditioning, cNVAE-ECG improves downstream binary and multilabel classification, achieving an AUROC gain of up to 2% in transfer learning over GAN-based models. The model is publicly available at https://github.com/univanxx/cNVAE_ECG.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S453 - S460"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neuro-Symbolic Approach to Augmented Text Generation via Automated Induction of Morphotactic Rules","authors":"M. V. Isangulov, A. M. Elizarov, A. R. Kunafin, A. R. Gatiatullin, N. A. Prokopyev","doi":"10.3103/S0005105525701353","DOIUrl":"10.3103/S0005105525701353","url":null,"abstract":"<p>The work presents a hybrid neuro-symbolic method that combines a large language model (LLM) and a finite-state transducer (FST) to ensure morphological correctness in text generation for agglutinative languages. The system automatically extracts rules from corpus data: for local examples of word forms, the LLM produces sequences of morphological analyses, which are then aggregated and organized into compact descriptions of morphotactic rules (LEXC) and allomorph selection (regex). During generation, the LLM and FST operate jointly: if a token is not recognized by the automaton, the LLM derives a “lemma + tags” pair from the context, and the FST produces the correct surface form. A literary corpus (~1600 sentences) was used as the dataset. For a list of 50 nouns, 250 word forms were extracted. Using the proposed algorithm, the LLM generated 110 context-sensitive regex rules along with LEXC morphotactics, from which an FST was compiled that recognized 170/250 forms (~70%). In an applied machine translation test on a subcorpus of 300 sentences, integrating this FST into the LLM cycle improved quality from BLEU 16.14/ChrF 45.13 to BLEU 25.71/ChrF 50.87 without retraining the translator. The approach scales to other parts of speech (verbs, adjectives, etc.) as well as to other agglutinative and low-resource languages, where it can accelerate the development of lexical and grammatical resources.</p>","PeriodicalId":42995,"journal":{"name":"AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS","volume":"59 5","pages":"S415 - S422"},"PeriodicalIF":0.5,"publicationDate":"2026-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147614773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}