{"title":"Towards understanding the optimization mechanisms in deep learning","authors":"Binchuan Qi, Wei Gong, Li Li","doi":"10.1007/s10489-025-06875-7","DOIUrl":"10.1007/s10489-025-06875-7","url":null,"abstract":"<div><p>In this paper, we adopt a probability distribution estimation perspective to explore the optimization mechanisms of supervised classification using deep neural networks. We demonstrate that, when employing the Fenchel-Young loss, despite the non-convex nature of the fitting error with respect to the model’s parameters, global optimal solutions can be approximated by simultaneously minimizing both the gradient norm and the structural error. The former can be controlled through gradient descent algorithms. For the latter, we prove that it can be managed by increasing the number of parameters and ensuring parameter independence, thereby providing theoretical insights into mechanisms such as overparameterization and random initialization. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, illustrating its practical effectiveness.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced inverted transformer: advancing variate token encoding and blending for time series forecasting","authors":"Xin-Yi Li, Yu-Bin Yang","doi":"10.1007/s10489-025-06886-4","DOIUrl":"10.1007/s10489-025-06886-4","url":null,"abstract":"<div><p>Recent advancements in channel-dependent Transformer-based forecasters highlight the efficacy of variate tokenization for time series forecasting. Despite this progress, challenges remain in handling complex time series. The vanilla Transformer, while effective in certain scenarios, faces limitations in addressing intricate cross-variate interactions and diverse temporal patterns. This paper presents the Enhanced Inverted Transformer (EiT for short), enhancing standard Transformer blocks for advanced modeling and blending of variate tokens. EiT incorporates three key innovations: First, a hybrid multi-patch attention mechanism that adaptively fuses global and local attention maps, capturing both stable and volatile correlations to mitigate overfitting and enrich inter-channel communication. Second, a multi-head feed-forward network with specialized heads for various temporal patterns, enhancing parameter efficiency and contributing to robust multivariate predictions. Third, paired channel normalization applied to each layer, preserving crucial channel-specific statistics and boosting forecasting performance. By integrating these innovations, EiT effectively overcomes limitations and unlocks the potential of variate tokens for accurate and robust multivariate time series forecasting. 
Extensive evaluations demonstrate that EiT achieves state-of-the-art (SOTA) performance, surpassing the previous method, the inverted Transformer, by an average of 4.4% in Mean Squared Error (MSE) and 3.4% in Mean Absolute Error (MAE) across five challenging long-term forecasting datasets.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A graph convolutional network for time series classification using recurrence plots","authors":"Hyewon Kang, Taek-Ho Lee, Junghye Lee","doi":"10.1007/s10489-025-06841-3","DOIUrl":"10.1007/s10489-025-06841-3","url":null,"abstract":"<div><p>Time series classification (TSC) is a crucial task across various domains, and its performance heavily depends on the quality of input representations. Among various representations, the recurrence plot (RP) effectively captures topological recurrence, the unique property of time series data. However, conventional convolutional neural networks (CNNs) cannot fully exploit this property since they treat the RP as grid-like data. In this study, we propose RP-GCN, a novel approach that uses a graph convolutional network (GCN) to exploit topological recurrence inherent in the RP, thereby improving TSC performance. Our method transforms a multivariate time series into graphs where state matrices act as node feature matrices and RPs serve as adjacency matrices, enabling graph convolution to utilize recurrence relationships. We evaluated RP-GCN on 35 benchmark multivariate time series classification datasets and demonstrated superior accuracy and efficient inference time compared to existing methods.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-025-06841-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145078884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PretopoMD: pretopology-based mixed data hierarchical clustering","authors":"Loup-Noé Levy, Guillaume Guerard, Sonia Djebali, Soufian Ben Amor","doi":"10.1007/s10489-025-06770-1","DOIUrl":"10.1007/s10489-025-06770-1","url":null,"abstract":"<div><p>This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Leveraging Disjunctive Normal Form, our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction and facilitate tailored solutions for heterogeneous datasets. Through hierarchical dendrogram analysis and comparative clustering metrics, our method demonstrates superior performance by accurately and interpretably delineating clusters directly from raw data, thus preserving data integrity. Empirical findings highlight the algorithm’s robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to clustered data explainability. The novelty of this work lies in its departure from traditional dimensionality reduction techniques and its innovative use of logical rules that enhance both cluster formation and clarity, thereby contributing a significant advancement to the discourse on clustering mixed data.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145078959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preprocessing method for shield operational parameters adaptable to geological survey data characteristic for predicting disc cutter wear","authors":"Deyun Mo, Liping Bai, Wenjiang Liao, Xinyuan Tian, Weiran Huang","doi":"10.1007/s10489-025-06846-y","DOIUrl":"10.1007/s10489-025-06846-y","url":null,"abstract":"<div><p>Shield operational parameters are inherently noisy and, relative to concurrent geological exploration data, contain considerable redundancy, they must be pre-processed before the datasets input to artificial intelligence models. This paper presents a denoising and compression method for preprocessing shield operational parameters, integrating it with the stratal slicing method for predicting disc cutter wear. The operational parameter signals affecting cutter wear are first denoised using wavelet transform, Fourier transform, rolling average, and autoencoder techniques. The proposed Ring-based Summation Averaging (RSA) and Piecewise Aggregate Averaging (PAA) methods are then used to compress the denoised signals, resulting in compressed sequences composed of key points equal to the number of tunnel rings, effectively matching the geological parameters expanded by the stratal slicing method. Furthermore, the prepared data were tested using the long short-term memory (LSTM) + attention mechanism (AM) model to evaluate its application effectiveness in the Guangzhou Metro Line 18 railway. The results show that data compressed using PAA not only better tracks signal variations but also allows for flexible control of the output length of the compressed sequence. The combination of wavelet transforms denoising (WTD) with PAA exhibited the best wear prediction results, achieving <i>R</i><sup><i>2</i></sup> / <i>MSE</i> = 0.95 / 2.21 mm. 
By integrating WTD, PAA, stratal slicing method, and sequence models, a comprehensive and universal methodology is established that can predict disc cutter wear based on initial geological data and shield operational parameters.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-discriminator generative adversarial networks with dynamic penalty to over-sample imbalanced credit datasets","authors":"Xiaogang Dong, Lifei Wang, Xiwen Qin, Hongyu Shi","doi":"10.1007/s10489-025-06836-0","DOIUrl":"10.1007/s10489-025-06836-0","url":null,"abstract":"<div><p>The problem of credit risk data imbalance reduces the effectiveness of assessment models. Existing oversampling methods focus only on a partial sample of a few classes, resulting in a lack of diversity in the types of data generated. This paper proposes an innovative GAN variant called Magnify-GAN. The originality of Magnify-GAN lies in the fact that it is equipped with a primary discriminator and multiple secondary discriminators, each of which employs a different loss function. This multi-discriminator approach not only improves the learning results, but also enriches the feedback received during the training process. In addition, we integrate an innovative dynamic coefficient mechanism to enable the model to dynamically adapt to changes in data distribution. To further improve stability and address the common modal collapse problem in GAN, a gradient penalty method is embedded in the training protocol. This integrated strategy ensures that Magnify-GAN can effectively generate samples representing various minority classes within the real data. Compared to ten classical imbalanced sampling methods, Magnify-GAN demonstrates superior performance in precision, F1-score, and AUC values across six synthetic and four real-world imbalanced datasets. Ablation studies, visualized through heatmaps, reveal the complementary synergy between the core modules. 
Furthermore, a complexity analysis shows that Magnify-GAN offers significant performance gains with moderate increases in computational cost compared to state-of-the-art methods.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145062214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic gradient accelerated by negative momentum for training deep neural networks","authors":"Xiaotian Li, Zhuang Yang, Yang Wang","doi":"10.1007/s10489-025-06900-9","DOIUrl":"10.1007/s10489-025-06900-9","url":null,"abstract":"<div><p>The fast and robust stochastic optimization algorithms for training deep neural networks (DNNs) are still a topic of heated discussion. As a simple but effective way, the momentum technique, which utilizes historical gradient information, shows significant promise in training DNNs both theoretically and empirically. Nonetheless, the accumulation of error gradients in stochastic settings leads to the failure of momentum techniques, e.g., Nesterov’s accelerated gradient (NAG), in accelerating stochastic optimization algorithms. To address this problem, a novel type of stochastic optimization algorithm based on negative momentum (NM) is developed and analyzed. In this work, we applied NM to vanilla stochastic gradient descent (SGD), leading to SGD-NM. Although a convex combination of previous and current historical information is adopted in SGD-NM, fewer hyperparameters are introduced than those of the existing NM techniques. Meanwhile, we establish a theoretical guarantee for the resulting SGD-NM and show that SGD-NM enjoys the same low computational cost as vanilla SGD. To further show the superiority of NM in stochastic optimization algorithms, we propose a variant of stochastically controlled stochastic gradient (SCSG) based on the negative momentum technique, termed SCSG-NM, which achieves faster convergence compared to SCSG. Finally, we conduct experiments on various DNN architectures and benchmark datasets. 
The comparison results with state-of-the-art stochastic optimization algorithms show the great potential of NM in accelerating stochastic optimization, including more robust to large learning rates and better generalization.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145062215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regret-theory-based three-way decision making in hesitant fuzzy environments: A multi-attribute approach and its applications","authors":"Weihua Xu, Wenxiu Luo","doi":"10.1007/s10489-025-06801-x","DOIUrl":"10.1007/s10489-025-06801-x","url":null,"abstract":"<div><p>Decision-making is intricately linked to the psychological behavior of decision-makers, particularly their susceptibility to risk uncertainty and the consequent emergence of regret psychology. The hesitant fuzzy information system is an effective mechanism for encapsulating the substantial uncertainty inherent in real-world data. While existing three-way multi-attribute decision-making (TWD-MADM) methods have made significant progress in handling uncertainty, they often overlook the psychological factors of decision-makers, such as regret aversion. This paper introduces a three-way decision-making method (TWD-MADM-RT-HFS), grounded in regret theory, for multi-attribute decision-making in a hesitant fuzzy environment. Unlike traditional TWD-MADM approaches, our method explicitly incorporates regret theory to model decision-makers’ psychological behavior, providing a more realistic framework for decision-making under uncertainty. The methodology involves computing a relative outcome matrix using the PROMETHEE-II method to assess the gains and losses of objectives. A novel regret-based perceived utility function is proposed to quantify decision-makers’ aversion to regret, followed by calculating satisfaction-based weight functions for different events across various states. The integration of these weight functions with the perceived utility function yields a new expected utility function, pivotal for ranking and classifying alternatives. To validate the effectiveness of the proposed methodology, the Algerian Forest Fires Dataset was selected for application testing and successfully classified into three categories: fire, possible fire and no fire. 
The results were then ranked in detail based on the probability of their occurrence. It is anticipated that this classification will help to predict fire risk more accurately in the future, so that timely measures can be taken to prevent and control fire hazards. The method’s feasibility, effectiveness, and superiority are validated through a comparative analysis with existing methods in real-case scenarios. The stability of the model is further confirmed by conducting sensitivity analyses under different parameter settings.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 14","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145037159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physics-informed epidemic prediction for irregularly sampled spatio-temporal sequence with missing values","authors":"Haodong Cheng, Yingchi Mao","doi":"10.1007/s10489-025-06802-w","DOIUrl":"10.1007/s10489-025-06802-w","url":null,"abstract":"<div><p>In the task of predicting the spatiotemporal spread of the epidemic, a deep learning framework based on the discrete physics-informed neural network has been proposed, which integrates spatio-temporal dependency relationships and physical constraint mechanisms to address the limitations of traditional physics-informed neural networks. However, these methods typically assume that the spatiotemporal sequence is normally sampled at regular intervals and there are no missing values, without modeling the asynchronous spatiotemporal correlation present in irregularly sampled multivariate spatio-temporal sequences with missing values. The presence of missing values and variable time intervals in node variables in different regions may blur or distort the actual relationships between variables, which in turn affects the quality of loss-constrained learning of unknown parameters based on physical models. Therefore, this paper proposes a novel method for physics-informed spatiotemporal sequence prediction, named PEPIST. It utilizes a designed spatio-temporal sparse graph structure to effectively represent the irregularity of sampling time intervals and spatiotemporal missing values, and combines mechanisms such as graph spatiotemporal pattern capture and attention based physical spatiotemporal parameter interpolation to generate unknown parameter variable representations required for multi-region SEIR-informed loss constraints, as well as spatiotemporal characteristics of the variables to be predicted. 
Experimental results have shown that the method proposed in this paper exhibits high prediction accuracy in real COVID-19 epidemic prediction cases.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 14","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145037143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating noisy labels in long-tailed image classification via multi-level collaborative learning","authors":"Xinyang Zhou, Zhijie Wen, Yuandi Zhao, Jun Shi, Shihui Ying","doi":"10.1007/s10489-025-06809-3","DOIUrl":"10.1007/s10489-025-06809-3","url":null,"abstract":"<div><p>Label noise and class imbalance are two types of data bias that have attracted widespread attention in the past, but few methods can address both of them simultaneously. Recently, some works have begun to explore handling the two biases concurrently. In this article, we combine feature-level sample selection with logit-level knowledge distillation and logit adjustment to form a more complete collaborative training framework using two neural networks, which is termed <b>D</b>ynamic <b>N</b>oise and <b>I</b>mbalance <b>W</b>eighted <b>D</b>istillation (DNIWD). Firstly, we construct two types of sample sets, which are dynamic high-confidence set and basic confidence set. Based on the former, we estimate the centroids for each class in the latent space and select clean and easy examples for the peer network based on the uncertainty. Secondly, based on the latter, we perform knowledge distillation between the existing two networks to facilitate the learning of all classes, letting the network adaptively adjust the weight of distillation loss based on its own outputs. Meanwhile, we add an auxiliary classifier to each network and apply an improved balanced loss to train it, in order to boost the generalization performance of tail classes in more severe cases of class imbalance and provide balanced predictions for constructing confidence sample sets. 
Compared to state-of-the-art methods, <b>DNIWD</b> achieves significant improvement on synthetic and real-world datasets.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 14","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}