Activations and Gradients Compression for Model-Parallel Training
M. I. Rudakov, A. N. Beznosikov, Ya. A. Kholodov, A. V. Gasnikov
Doklady Mathematics 108 (Suppl. 2): S272–S281 (2024). DOI: 10.1134/S1064562423701314

Abstract: Large neural networks require enormous computational clusters of machines. Model-parallel training, in which the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Compression of the transmitted information can reduce the workers' communication time, which is often the bottleneck in such systems. This work explores how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence. We analyze compression methods such as quantization and TopK compression, and we also experiment with error compensation techniques. Moreover, we combine TopK with the AQ-SGD per-batch error feedback approach. We conduct experiments on image classification and language-model fine-tuning tasks. Our findings demonstrate that gradients require milder compression rates than activations, and that K = 10% is the lowest TopK compression level that does not severely harm model convergence. Experiments also show that models trained with TopK perform well only when compression is applied during inference as well. We find that error feedback techniques do not improve model-parallel training compared to plain compression, but they allow model inference without compression with almost no quality drop. Finally, when combined with the AQ-SGD approach, TopK compression stronger than K = 30% significantly worsens model performance.
{"title":"Neural Network Approach to the Problem of Predicting Interest Rate Anomalies under the Influence of Correlated Noise","authors":"G. A. Zotov, P. P. Lukianchenko","doi":"10.1134/S1064562423701521","DOIUrl":"10.1134/S1064562423701521","url":null,"abstract":"<p>The aim of this study is to analyze bifurcation points in financial models using colored noise as a stochastic component. The research investigates the impact of colored noise on change-points and approach to their detection via neural networks. The paper presents a literature review on the use of colored noise in complex systems. The Vasicek stochastic model of interest rates is the object of the research. The research methodology involves approximating numerical solutions of the model using the Euler–Maruyama method, calibrating model parameters, and adjusting the integration step. Methods for detecting bifurcation points and their application to the data are discussed. The study results include the outcomes of an LSTM model trained to detect change-points for models with different types of noise. Results are provided for comparison with various change-point windows and forecast step sizes.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"108 2 supplement","pages":"S293 - S299"},"PeriodicalIF":0.5,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142413766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Do we Benefit from the Categorization of the News Flow in the Stock Price Prediction Problem?","authors":"T. D. Kulikova, E. Yu. Kovtun, S. A. Budennyy","doi":"10.1134/S1064562423701648","DOIUrl":"10.1134/S1064562423701648","url":null,"abstract":"<p>The power of machine learning is widely leveraged in the task of company stock price prediction. It is essential to incorporate historical stock prices and relevant external world information for constructing a more accurate predictive model. The sentiments of the financial news connected with the company can become such valuable knowledge. However, financial news has different topics, such as <i>Macro</i>, <i>Markets</i>, or <i>Product news</i>. The adoption of such categorization is usually out of scope in a market research. In this work, we aim to close this gap and explore the effect of capturing the news topic differentiation in the stock price prediction problem. Initially, we classify the financial news stream into 20 pre-defined topics with the pre-trained model. Then, we get sentiments and explore the topic of news group sentiment labeling. Moreover, we conduct the experiments with the several well-proved models for time series forecasting, including the Temporal Convolutional Network (TCN), the D-Linear, the Transformer, and the Temporal Fusion Transformer (TFT). In the results of our research, utilizing the information from separate topic groups contributes to a better performance of deep learning models compared to the approach when we consider all news sentiments without any division.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"108 2 supplement","pages":"S503 - S510"},"PeriodicalIF":0.5,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140884599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning as a Tool to Accelerate the Search for New Materials for Metal-Ion Batteries
V. T. Osipov, M. I. Gongola, Ye. A. Morkhova, A. P. Nemudryi, A. A. Kabanov
Doklady Mathematics 108 (Suppl. 2): S476–S483 (2024). DOI: 10.1134/S1064562423701612

Abstract: The search for new solid ionic conductors is an important topic in materials science that requires significant resources but can be accelerated using machine learning (ML) techniques. In this work, ML methods are applied to predict the migration energy of working ions. The training set is based on data for 225 lithium-ion migration channels in 23 ionic conductors. The descriptors are the free-space parameters of the crystal obtained by the Voronoi partitioning method. The accuracy of migration energy prediction is evaluated by comparison with data obtained by the density functional theory method. Two ML methods are applied: support vector regression and ordinal regression. It is shown that the free-space parameters of a crystal correlate with the migration energy, with ordinal regression giving the best results. The developed ML models can be used as an additional filter in the analysis of ionic conductivity in solids.
{"title":"Statistical Online Learning in Recurrent and Feedforward Quantum Neural Networks","authors":"S. V. Zuev","doi":"10.1134/S1064562423701557","DOIUrl":"10.1134/S1064562423701557","url":null,"abstract":"<p>For adaptive artificial intelligence systems, the question of the possibility of online learning is especially important, since such training provides adaptation. The purpose of the work is to consider methods of quantum machine online learning for the two most common architectures of quantum neural networks: feedforward and recurrent. The work uses the quantumz module available on PyPI to emulate quantum computing and create artificial quantum neural networks. In addition, the genser module is used to transform data dimensions, which provides reversible transformation of dimensions without loss of information. The data for the experiments are taken from open sources. The paper implements the machine learning method without optimization, proposed by the author earlier. Online learning algorithms for recurrent and feedforward quantum neural network are presented and experimentally confirmed. The proposed learning algorithms can be used as data science tools, as well as a part of adaptive intelligent control systems. The developed software can fully unleash its potential only on quantum computers, but, in the case of a small number of quantum registers, it can also be used in systems that emulate quantum computing, or in photonic computers.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"108 2 supplement","pages":"S317 - S324"},"PeriodicalIF":0.5,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142413768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MTS Kion Implicit Contextualised Sequential Dataset for Movie Recommendation
I. Safilo, D. Tikhonovich, A. V. Petrov, D. I. Ignatov
Doklady Mathematics 108 (Suppl. 2): S456–S464 (2024). DOI: 10.1134/S1064562423701594

Abstract: We present a new movie and TV show recommendation dataset collected from real users of the MTS Kion video-on-demand platform. In contrast to other popular movie recommendation datasets, such as MovieLens or Netflix, our dataset is based on implicit interactions registered at watching time, rather than on explicit ratings. We also provide rich contextual and side information, including interaction characteristics (such as temporal information, watch duration, and watch percentage), user demographics, and rich movie meta-information. In addition, we describe the MTS Kion Challenge, an online recommender systems competition based on this dataset, and give an overview of the winners' best-performing solutions. We keep the competition sandbox open, so researchers are welcome to try their own recommendation algorithms and measure their quality on the private part of the dataset.
Optimal Data Splitting in Distributed Optimization for Machine Learning
D. Medyakov, G. Molodtsov, A. Beznosikov, A. Gasnikov
Doklady Mathematics 108 (Suppl. 2): S465–S475 (2024). DOI: 10.1134/S1064562423701600

Abstract: The distributed optimization problem has become increasingly relevant recently. It offers many advantages, such as processing large amounts of data in less time than non-distributed methods. However, most distributed approaches suffer from a significant bottleneck: the cost of communication. Therefore, a large amount of research has recently been directed at this problem. One such approach uses local data similarity; in particular, there exists an algorithm that provably exploits the similarity property optimally. But this result, like the results of other works, addresses the communication bottleneck by assuming only that communication is significantly more expensive than local computation, without taking into account the varying capacities of network devices and the differing ratios between communication time and local computation cost. We consider this setup, and the objective of this study is to achieve an optimal split of the data between the server and the local machines for arbitrary costs of communication and local computation. The running times of the network under the uniform and the optimal distributions are compared. The superior theoretical performance of our solutions is experimentally validated.
{"title":"1-Dimensional Topological Invariants to Estimate Loss Surface Non-Convexity","authors":"D. S. Voronkova, S. A. Barannikov, E. V. Burnaev","doi":"10.1134/S1064562423701569","DOIUrl":"10.1134/S1064562423701569","url":null,"abstract":"<p>We utilize the framework of topological data analysis to examine the geometry of loss landscape. With the use of topology and Morse theory, we propose to analyse 1-dimensional topological invariants as a measure of loss function non-convexity up to arbitrary re-parametrization. The proposed approach uses optimization of 2-dimensional simplices in network weights space and allows to conduct both qualitative and quantitative evaluation of loss landscape to gain insights into behavior and optimization of neural networks. We provide geometrical interpretation of the topological invariants and describe the algorithm for their computation. We expect that the proposed approach can complement the existing tools for analysis of loss landscape and shed light on unresolved issues in the field of deep learning.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"108 2 supplement","pages":"S325 - S332"},"PeriodicalIF":0.5,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140884602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Safe Pretraining of Deep Language Models in a Synthetic Pseudo-Language","authors":"T. E. Gorbacheva, I. Y. Bondarenko","doi":"10.1134/S1064562423701636","DOIUrl":"10.1134/S1064562423701636","url":null,"abstract":"<p>This paper compares the pretraining of a transformer on natural language texts and on sentences of a synthetic pseudo-language. The artificial texts are automatically generated according to the rules written in a context-free grammar. The results of fine-tuning to complete tasks of the RussianSuperGLUE project statistically reliably showed that the models had the same scores. That is, the use of artificial texts facilitates the AI safety, because it can completely control the composition of the dataset. In addition, at the pretraining stage of a RoBERTa-like model, it is enough to learn recognizing only the syntactic and morphological patterns of the language, which can be successfully created in a fairly simple way, such as a context-free grammar.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"108 2 supplement","pages":"S494 - S502"},"PeriodicalIF":0.5,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140884547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal Analysis of Method with Batching for Monotone Stochastic Finite-Sum Variational Inequalities
A. Pichugin, M. Pechin, A. Beznosikov, A. Savchenko, A. Gasnikov
Doklady Mathematics 108 (Suppl. 2): S348–S359 (2024). DOI: 10.1134/S1064562423701582

Abstract: Variational inequalities are a universal optimization paradigm that is interesting in its own right and also incorporates classical minimization and saddle point problems. Modern realities encourage consideration of stochastic formulations of optimization problems. In this paper, we present an analysis of a method that gives optimal convergence estimates for monotone stochastic finite-sum variational inequalities. In contrast to previous works, our method supports batching without losing oracle complexity optimality. The effectiveness of the algorithm, especially for small but non-single batch sizes, is confirmed experimentally.