{"title":"Multi-domain masked reconstruction self-supervised learning for lithology identification using well-logging data","authors":"Qingwei Pang , Chenglizhao Chen , Wenhao Li , Shanchen Pang","doi":"10.1016/j.knosys.2025.113843","DOIUrl":"10.1016/j.knosys.2025.113843","url":null,"abstract":"<div><div>Lithology identification is crucial in the fields of energy exploration and oil and gas drilling, particularly for unconventional reservoirs, wherein the complexity and high heterogeneity of rock formations pose significant challenges for prospecting and exploration. To address the dual challenges of scarce labeled data and the low accuracy of lithology identification models, in this study we propose a novel multi-domain masked reconstruction self-supervised learning (MR-SSL) framework. This framework comprises two stages, self-supervised pretraining and supervised fine-tuning, to significantly improve the accuracy of lithology identification using only a small number of labeled samples. In the pretraining stage, we designed three innovative tasks: time-domain masked reconstruction, frequency-domain masked reconstruction, and time–frequency contrastive learning, each supported by a specifically designed loss function. The time- and frequency-domain masked reconstruction tasks achieved multi-dimensional feature modeling through differentiated designs: the former combined cross-depth and cross-parameter dynamic masking strategies to adaptively capture stratigraphic non-stationarity based on periodic analysis, whereas the latter synchronously learned single-parameter specificity and multi-parameter correlation through a shared-private embedding mechanism. These tasks, in conjunction with the time–frequency contrastive learning task, provided the model with enhanced complementarity through cross-domain feature consistency constraints.
In the supervised fine-tuning stage, the pretrained encoder was frozen, time–frequency features were integrated, and a classification head was trained, further enhancing the model’s capability for lithology classification under the target geological conditions. Experimental validation demonstrated that the MR-SSL model achieved high accuracies of 98.7% and 97.07% on two different oilfield datasets while using only 20% of the labeled data, surpassing the performance of conventional supervised and self-supervised methods. The proposed model presents a unique advantage: it enables the deep decoupling and complementary utilization of time–frequency features in logging data through multi-task collaboration, thereby providing an efficient low-label solution for lithology identification in unconventional reservoirs.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113843"},"PeriodicalIF":7.2,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
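The masked-reconstruction pretraining idea described in the MR-SSL abstract can be sketched minimally. The windowed depth masking, the mask ratio, and the mask-only MSE loss below are illustrative assumptions, not the authors' implementation (which uses dynamic, periodicity-driven masking):

```python
import numpy as np

def mask_windows(logs, mask_ratio=0.3, window=8, seed=0):
    """Zero out random depth windows of a multi-parameter well-log matrix.

    logs: (depth, n_params) array of log curves.
    Returns the masked copy and a boolean mask of hidden depth positions.
    """
    rng = np.random.default_rng(seed)
    depth = logs.shape[0]
    n_windows = depth // window
    n_hide = max(1, int(n_windows * mask_ratio))
    hidden = rng.choice(n_windows, size=n_hide, replace=False)
    mask = np.zeros(depth, dtype=bool)
    for w in hidden:
        mask[w * window:(w + 1) * window] = True
    masked = logs.copy()
    masked[mask] = 0.0          # hide the selected depth windows
    return masked, mask

def reconstruction_loss(pred, target, mask):
    """MSE computed only on masked positions, as in masked-autoencoder training."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

logs = np.random.default_rng(1).normal(size=(64, 5))
masked, mask = mask_windows(logs)
# A perfect reconstruction drives the masked-position loss to zero.
loss_perfect = reconstruction_loss(logs, logs, mask)
```

A pretraining loop would feed `masked` to an encoder-decoder and minimize `reconstruction_loss` against the original `logs`.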
{"title":"Diff-RCformer: A diffusion-augmented Recursive Context Transformer for image super-resolution","authors":"Shuo Wang , Shuzhen Xu , Cuicui Lv , Chaoqing Ma , Fangbo Cai","doi":"10.1016/j.knosys.2025.113758","DOIUrl":"10.1016/j.knosys.2025.113758","url":null,"abstract":"<div><div>Diffusion models have recently exhibited strong potential in single-image super-resolution (SISR) by effectively modeling complex data distributions and generating high-quality reconstructions. However, existing diffusion-based SISR methods often suffer from excessive iterative steps, resulting in a high computational overhead and slow convergence. In addition, traditional convolutional neural networks and Transformer-based architectures have difficulty in capturing complex global information, thereby limiting the reconstruction quality. To address these issues, we propose Diff-RCformer, which is a novel SISR framework that integrates diffusion-based prior generation with the Recursive Context Transformer (RCformer) to achieve robust and efficient super-resolution. Specifically, we use the diffusion model to generate high-quality prior features for super-resolution by iteratively refining Gaussian noise in a compressed latent space. These prior features are then injected into the RCformer, guiding it to reconstruct the high-resolution image. In the RCformer, we introduce Prior-Guided Recursive Generalization Network (PG-RGN) blocks. These blocks recursively aggregate the input features into representative feature maps, enabling them to adapt flexibly to input features of different dimensions and extract global information through cross-attention. We also combine the PG-RGN with Prior-Guided Local Self-Attention (PG-LSA) to enable the model to capture local detail features accurately and enhance the utilization of the global context. 
To achieve an optimal combination of local and global features, we propose Adaptive Feature Integration (AFI), which efficiently fuses local and global features across multiple attention layers. Our method also supports cascaded super-resolution, enabling flexible multi-stage refinement, which is particularly useful for complex scenarios. Comprehensive experiments on standard benchmarks indicate that Diff-RCformer surpasses recent state-of-the-art methods both quantitatively and qualitatively. The source code is available at <span><span>https://github.com/SureT-T/Diff-RCformer</span></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113758"},"PeriodicalIF":7.2,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal-spectral-spatial synchronization attention-based network for EEG emotion recognition","authors":"Zhifen Guo, Jiao Wang, Hongchen Luo, Fengbin Ma, Yiying Zhang","doi":"10.1016/j.knosys.2025.113762","DOIUrl":"10.1016/j.knosys.2025.113762","url":null,"abstract":"<div><div>Electroencephalogram (EEG) provides an objective and precise representation of human emotional states, establishing EEG-based emotion recognition as a pivotal area in affective computing and intelligent systems. Nevertheless, EEG signals contain temporal-spectral-spatial features, exhibiting dynamic variations, frequency-band correlations, and spatial dependencies, with varying resolutions across domains. The challenge lies in adapting to resolution differences between domains while improving the model’s ability to integrate complementary information across these domains. Moreover, processing multi-domain features often leads to complex model structures and excessive feature fusion, resulting in information loss. To tackle these challenges, we propose a unified framework: the Temporal-Spectral-Spatial Synchronization Attention-Based Network, which facilitates efficient modeling of multi-domain data. Specifically, the proposed network consists of a temporal-spectral-spatial attention encoder and a categorical decoder. The encoder adapts to resolution differences across temporal-spectral-spatial domains and synchronizes the fusion of spatiotemporal and spectral data, thus simplifying the model structure. Furthermore, we introduce a gating mechanism to adaptively balance the weights across domains and prevent excessive fusion that results in information loss.
Finally, extensive experimental comparisons, along with both subjective and objective analyses, demonstrate that our proposed network outperforms state-of-the-art models on the SEED, SEED-IV, and DEAP datasets.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113762"},"PeriodicalIF":7.2,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
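The gating mechanism this abstract describes, which adaptively balances weights across the temporal, spectral, and spatial domains, can be sketched generically. The linear gate and softmax form below are a common assumed design, not the paper's exact module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(domain_feats, W, b):
    """Adaptively weight per-domain features with a learned gate.

    domain_feats: list of (d,) vectors, one per domain (e.g. temporal/spectral/spatial).
    W: (n_domains, n_domains * d) gate weights; b: (n_domains,) gate bias.
    Returns the fused (d,) vector and the per-domain gate weights.
    """
    concat = np.concatenate(domain_feats)   # (n_domains * d,)
    gate = softmax(W @ concat + b)          # one convex weight per domain
    fused = sum(g * f for g, f in zip(gate, domain_feats))
    return fused, gate

rng = np.random.default_rng(0)
d, n = 16, 3
feats = [rng.normal(size=d) for _ in range(n)]
W, b = rng.normal(scale=0.1, size=(n, n * d)), np.zeros(n)
fused, gate = gated_fusion(feats, W, b)
```

Because the gate is a softmax, the fusion is always a convex combination: no single domain can be over-amplified, which is one simple way to avoid the "excessive fusion" the abstract warns about.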
{"title":"An automated and Lightweight Recursive Kernel Optimized Network (LReKON) learning model for cervical cancer diagnosis","authors":"G. Saranya , C. Sujatha","doi":"10.1016/j.knosys.2025.113742","DOIUrl":"10.1016/j.knosys.2025.113742","url":null,"abstract":"<div><div>Cervical cancer is one of the most prevalent long-term illnesses affecting women around the world. Pap smear images are a widely used means of cervical cancer screening and diagnosis. Even when an affected sample is present, human error can lead to false-negative results when Pap smears are examined manually. This challenge has been addressed by automated image-processing diagnostics, which are crucial for identifying abnormal tissues impacted by cervical cancer. Therefore, this study aims to develop an automated and lightweight cervical cancer diagnosis system, the Lightweight Recursive Kernel Optimized Network (LReKON), for fast and accurate cervical cancer diagnosis. The aberrant region is segmented more accurately from the raw cervical images using the Squeeze and Excitation Instance Segmentation Network (SEI-SN). A novel algorithm termed the Optimal Hyperplane based Kernel Neural Network (OHyKN) is used to classify the segmented region as either healthy or cancer-affected. This work also employs a novel Hybrid Heap based Diffusion Vector Optimizer (H<sup>2</sup>DVO) technique to enhance the training and testing performance of the classifier and expedite the prediction process. Additionally, the proposed LReKON model’s segmentation and classification performance is tested and verified on publicly available benchmark datasets, including the Mendeley LBC and the SIPaKMed Pap smear image datasets, across several evaluation criteria.
The trained LReKON model achieves 99.10 % accuracy, 99 % precision, 98.9 % recall, and a 98.9 % F1-score in cervical cancer diagnosis, with an extremely fast inference time of 0.28 s. The SEI-SN segmentation module plays a crucial role in performance enhancement, as its removal reduces accuracy to 95.15 %, and the removal of the OHyKN classification module reduces accuracy to 96.25 %. The H<sup>2</sup>DVO optimization step improves efficiency: its elimination increases the inference time to 0.32 s and reduces accuracy to 97.35 %. Moreover, when the three components SEI-SN, OHyKN, and H<sup>2</sup>DVO are all eliminated, accuracy drops to 93.55 %, confirming their combined contribution to segmentation, classification, and optimization.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113742"},"PeriodicalIF":7.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144139119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized pixel-aware deep function-mixture network for effective spectral super-resolution","authors":"Jiangtao Nie , Lei Zhang , Chongxing Song , Zhiqiang Lang , Weixin Ren , Wei Wei , Chen Ding , Yanning Zhang","doi":"10.1016/j.knosys.2025.113743","DOIUrl":"10.1016/j.knosys.2025.113743","url":null,"abstract":"<div><div>Recent progress on spectral super-resolution (SSR) mainly focuses on directly mapping an RGB image to its hyperspectral image (HSI) counterpart using deep convolutional neural networks, <em>i.e.,</em> non-linearly transforming the RGB context within a size-fixed receptive field centered at each pixel to its spectrum using a universal deep mapping function. However, in real scenarios, pixels in HSIs inevitably require size-different receptive fields and distinct mapping functions due to their differences in object category or spatial position; consequently, these existing methods show limited generalization capacity, especially when the imaging scene is complicated. To tackle this issue, we introduce a pixel-aware deep function-mixture network (PADFMN) for SSR, which consists of a novel class of modules called function-mixture (FM) blocks. Each FM block contains several basis functions, represented by parallel subnets with varying receptive field sizes. Additionally, a separate subnet serves as a mixing function, generating pixel-level weights that linearly combine the outputs of the basis functions. This approach allows the network to dynamically adjust the receptive field size and mapping function for each pixel based on its specific characteristics. By stacking several such FM blocks and fusing their intermediate feature representations, we obtain an effective SSR network with flexibility in learning pixel-wise deep mapping functions as well as better generalization capacity.
Moreover, to enable the proposed PADFMN to cope with two more challenging SSR tasks, namely cross-sensor SSR (<em>i.e.,</em> testing on RGB images shot by a new sensor with an unseen spectral response function) and scale-arbitrary SSR (<em>i.e.,</em> the spectral resolution of the HSI to reconstruct can be arbitrarily determined), we extend the core FM block to two more generalized versions: the sensor-guided FM block and the scale-guided FM block. The former casts sensor-related information (<em>e.g.,</em> the spectral response function) into guidance via dynamic filters to assist spectral reconstruction with the basic FM block. This reduces the distribution shift between training and test images incurred by unseen RGB sensors when establishing the deep mapping function, leading to strong performance in cross-sensor SSR tasks. The latter encodes the user-determined spectral resolution to precisely control the channel dimension of the features output by the last basic FM block via dynamically generated convolution filters, so that the network can reconstruct an HSI at an arbitrarily determined scale while preserving spectral accuracy.
We test the proposed method on three benchmark datasets, and it achieves state-of-the-art performance in SSR, cross-sensor SSR, and scale-arbitrary SSR tasks.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113743"},"PeriodicalIF":7.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
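The core FM-block idea, a pixel-wise convex mixture of basis functions with different receptive fields, can be illustrated with simple mean filters standing in for the paper's learned subnets. Kernel sizes and the softmax mixing map below are illustrative assumptions:

```python
import numpy as np

def box_filter(img, k):
    """Mean filter with a k x k receptive field (edge-padded); a stand-in basis function."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def function_mixture(img, kernel_sizes, logits):
    """Pixel-wise convex mixture of basis outputs with different receptive fields.

    logits: (n_basis, H, W) unnormalized per-pixel mixing scores
    (produced by a separate mixing subnet in the actual model).
    """
    basis = np.stack([box_filter(img, k) for k in kernel_sizes])  # (n, H, W)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)                    # softmax over bases
    return (weights * basis).sum(axis=0)

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
out = function_mixture(img, kernel_sizes=[1, 3, 5], logits=rng.normal(size=(3, 8, 8)))
```

Each pixel thus selects its own effective receptive field: where the mixing weight for the 1x1 basis dominates, the pixel keeps fine detail; where the 5x5 basis dominates, it aggregates a wider context.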
{"title":"Dual space multi-granular model for multi-interest sequential recommendation","authors":"Yijun Sheng , Puiieng Lei , Yanyan Liu , Ximing Chen , Qiwen Xu , Zhiguo Gong","doi":"10.1016/j.knosys.2025.113764","DOIUrl":"10.1016/j.knosys.2025.113764","url":null,"abstract":"<div><div>Sequential recommendation aims to predict the next item for a user given their historical interaction sequence. Recently, multi-interest and Graph Neural Network (GNN) based paradigms have emerged as two new directions in this task. Multi-interest learning involves extracting diverse user interests through historical item clustering, while GNNs refine user preferences through correlations among historical items. Recent research suggests the synergistic potential of combining these methods to aggregate user preferences at multiple levels, enhancing the accuracy of multi-interest extraction for improved recommendations. However, existing GNN-based multi-interest models can only achieve local smoothing of node embeddings by aggregating neighbor information; thus, they cannot match remote items (far-apart items on the item–item graph) even when those items show similar local patterns and may reflect niche preferences. This is problematic in the context of multi-interest recommendation, where the objective is to capture the user’s interests in a comprehensive manner. To tackle this issue, we propose a <strong>D</strong>ual <strong>S</strong>pace <strong>M</strong>ulti-<strong>G</strong>ranular <strong>Rec</strong>ommendation model (<strong>DSMGRec</strong>), where a Graph Deconvolutional Network (GDcN) is designed to disentangle local structure-based patterns of items as their additional embeddings. Then, we adopt a dual framework that combines the traditional GNN with our novel GDcN to encode multi-granular representations for items in the dual space.
Such dual item representations can match items by not only their primary patterns but also their secondary patterns. Experiments on four real-world datasets with different densities show that our model outperforms state-of-the-art baselines.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113764"},"PeriodicalIF":7.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
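The dual-space idea, pairing a smoothing GNN with a complementary operator that exposes local structural patterns, can be sketched. The abstract does not define GDcN, so the `(2I - Â)` first-order inverse approximation below is an assumed high-pass stand-in, chosen only to contrast with the low-pass GCN aggregation:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X):
    """Low-pass smoothing: aggregate neighbor embeddings (the usual GNN step)."""
    return A_norm @ X

def deconv_layer(A_norm, X):
    """Assumed high-pass counterpart (2I - A_norm) X: emphasizes how a node
    deviates from its smoothed neighborhood, so items with similar local
    patterns get similar 'detail' embeddings even when far apart."""
    n = A_norm.shape[0]
    return (2 * np.eye(n) - A_norm) @ X

# Toy item-item graph: edges (0,1), (0,2), (2,3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)                       # one-hot item features
A_norm = normalized_adjacency(A)
smooth = gcn_layer(A_norm, X)
detail = deconv_layer(A_norm, X)
dual = np.concatenate([smooth, detail], axis=1)  # dual-space item representation
```

Concatenating the two views lets downstream matching use both the neighborhood-smoothed ("primary") and the structure-detail ("secondary") signals.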
{"title":"DPNet: A dual prototype few-shot semantic segmentation network for crack detection","authors":"Xiaoming Chen, Zhangyan Zhao, Jingjing Cao, Yuhang Zou, Haipeng Liu","doi":"10.1016/j.knosys.2025.113733","DOIUrl":"10.1016/j.knosys.2025.113733","url":null,"abstract":"<div><div>Road crack detection is crucial for maintaining the aesthetics and safety of roads. The varying morphology of cracks often results in insufficient road crack samples, limiting the effectiveness of existing detection methods in few-sample scenarios. Further, when visual samples are insufficient, leveraging textual information to extract visual cues from images is a promising direction. In this paper, we propose a Dual Prototype Network (DPNet) for few-shot crack detection. First, we introduce an Improved Pixel Weight (IPW) data enhancement method to strengthen the foreground and edges of cropped samples, improving learning efficiency when samples are insufficient. Next, we design a dual prototype prediction method. Specifically, we employ domain-related text input to generate a Language-Image Prototype (LIP) with general domain knowledge through Contrastive Language-Image Pre-training (CLIP). Then, we generate a Support Prototype (SuP) with specialized domain knowledge from crack dataset images. The final prediction is obtained by linearly combining the predictions of the two prototypes. Additionally, we design an Embedding Attention Module (EAM), which leverages the characteristics of the embedding dimension to simultaneously satisfy both spatial and channel attention mechanisms in the Transformer structure. Finally, our DPNet achieves superior performance on the FCrack-i and MixCrack few-shot datasets, with average mIoU improvements of 8.52% and 1.44% over the baseline.
Moreover, we demonstrate the zero-shot capability of DPNet on the CFD crack dataset.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113733"},"PeriodicalIF":7.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
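The dual-prototype prediction step, linearly combining similarities to a language-image prototype (LIP) and a support prototype (SuP), can be sketched at the level of a single feature vector. The cosine scoring, the `alpha` weight, and the toy prototypes are illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def dual_prototype_score(feat, lip, sup, alpha=0.5):
    """Linearly combine similarity to the LIP (general knowledge from text via
    CLIP) and to the SuP (specialized knowledge from support images)."""
    return alpha * cosine(feat, lip) + (1 - alpha) * cosine(feat, sup)

def predict(feat, prototypes, alpha=0.5):
    """prototypes: {class_name: (lip_vector, sup_vector)} -> best-scoring class."""
    scores = {c: dual_prototype_score(feat, lip, sup, alpha)
              for c, (lip, sup) in prototypes.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
d = 32
crack_lip, crack_sup = rng.normal(size=d), rng.normal(size=d)
bg_lip, bg_sup = rng.normal(size=d), rng.normal(size=d)
protos = {"crack": (crack_lip, crack_sup), "background": (bg_lip, bg_sup)}
pixel_feat = 0.6 * crack_lip + 0.4 * crack_sup   # a feature close to the crack prototypes
label = predict(pixel_feat, protos)
```

Setting `alpha=1.0` would rely on the language-image prototype alone, which is roughly the regime that enables the zero-shot behavior mentioned above (no support images needed).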
{"title":"FedMDD: Multi-deliberation based calibration for federated long-tailed learning","authors":"Yiwen Wang , Jiaxin Li , Heye Zhang , Jingfeng Zhang , Feng Wan , Anqi Qiu , Zhifan Gao","doi":"10.1016/j.knosys.2025.113741","DOIUrl":"10.1016/j.knosys.2025.113741","url":null,"abstract":"<div><div>Federated learning is a decentralized framework enabling collaborative training of machine learning models across distributed data clients while ensuring privacy protection. Despite its advantages, traditional federated learning suffers from global long-tailed imbalance, which degrades performance by overemphasizing head classes and under-representing tail classes. While planned (pre-hoc) and post-hoc imbalance adjustments have been explored, post-hoc methods often require auxiliary data or suffer from overconfident decision boundaries, which limits their effectiveness. To address the overconfidence and out-of-distribution issues in existing solutions, we propose a multi-deliberation based post-hoc calibration method (FedMDD) tailored to the federated long-tailed problem. FedMDD calibrates the global decision boundary for balance. It incorporates a local–global feature contrast constraint to generate effective features and uses consistency across client models to deliberate a model-aware margin. This margin promotes a large relative distance between tail classes and the decision boundary, preserving privacy by leveraging model performance without requiring access to local class distributions.
Extensive experiments demonstrate that FedMDD outperforms existing methods in balancing decision boundaries and enhancing privacy protection, achieving superior performance on long-tailed data distributions.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113741"},"PeriodicalIF":7.2,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144155069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
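The effect of a per-class margin on the decision boundary can be sketched with a generic logit-adjustment-style calibration. FedMDD deliberates its margins from cross-client model consistency; the fixed margins and the 3-class setup below are purely hypothetical:

```python
import numpy as np

def margin_calibrate(logits, margins):
    """Post-hoc boundary calibration: subtract a per-class margin from the logits.

    Larger margins on head classes push the decision boundary away from tail
    classes, giving tail classes relatively more room (logit-adjustment style).
    """
    return logits - margins

# Hypothetical 3-class example: class 0 is a head class, class 2 a tail class.
logits = np.array([2.0, 1.6, 1.9])    # raw scores slightly favor the head class
margins = np.array([0.5, 0.2, 0.0])   # assumed per-class margins (head > tail)
calibrated = margin_calibrate(logits, margins)
pred_raw = int(np.argmax(logits))      # head class wins before calibration
pred_cal = int(np.argmax(calibrated))  # tail class recovered after calibration
```

Note that the calibration needs only the margins and the model outputs, not the clients' local class counts, which is the privacy property the abstract emphasizes.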
{"title":"Deepfake audio detection with spectral features and ResNeXt-based architecture","authors":"Gul Tahaoglu , Daniele Baracchi , Dasara Shullani , Massimo Iuliani , Alessandro Piva","doi":"10.1016/j.knosys.2025.113726","DOIUrl":"10.1016/j.knosys.2025.113726","url":null,"abstract":"<div><div>The increasing prevalence of deepfake audio technologies and their potential for malicious use in fields such as politics and media has raised significant concerns regarding the ability to distinguish fake from authentic audio recordings. This study proposes a robust technique for detecting synthetic audio by leveraging three spectral features: Linear Frequency Cepstral Coefficients (LFCC), Mel Frequency Cepstral Coefficients (MFCC), and Constant Q Cepstral Coefficients (CQCC). These features are processed using an enhanced ResNeXt architecture to improve classification accuracy between genuine and spoofed audio. Additionally, a Multi-Layer Perceptron (MLP)-based fusion technique is employed to further boost the model’s performance. Extensive experiments were conducted on the ASVspoof 2019 Logical Access (LA) dataset (text-to-speech and voice conversion attacks), the ASVspoof 2019 Physical Access (PA) dataset (replay attacks), and the ASVspoof 2021 LA, PA, and DF datasets. The proposed approach demonstrated superior performance compared to state-of-the-art methods across all datasets, particularly in detecting fake audio generated by text-to-speech (TTS) attacks. Its overall performance is summarized as follows: the system achieved an Equal Error Rate (EER) of 1.05% and a minimum tandem Detection Cost Function (min-tDCF) of 0.028 on the ASVspoof 2019 LA dataset, and an EER of 1.14% and a min-tDCF of 0.03 on the ASVspoof 2019 PA dataset, demonstrating its robustness in detecting various types of audio spoofing attacks.
Finally, on the ASVspoof 2021 LA dataset, the method achieved an EER of 7.44% and a min-tDCF of 0.35.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113726"},"PeriodicalIF":7.2,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
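The Equal Error Rate (EER) metric reported throughout this abstract is the error rate at the threshold where the false-acceptance rate (spoofs scored as genuine) equals the false-rejection rate (genuine trials scored as spoofs). A minimal threshold-sweep computation on toy scores, not the ASVspoof evaluation protocol:

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """EER via a sweep over candidate thresholds: pick the threshold where the
    false-acceptance rate (spoofs >= threshold) is closest to the
    false-rejection rate (genuine < threshold), and average the two rates."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(spoof_scores >= t)    # spoofs accepted as genuine
        frr = np.mean(genuine_scores < t)   # genuine trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)

# Hypothetical detector scores (higher = more likely genuine).
genuine = np.array([0.9, 0.8, 0.85, 0.7, 0.95])
spoof = np.array([0.1, 0.2, 0.3, 0.75, 0.05])
eer = equal_error_rate(genuine, spoof)
```

Perfectly separated score distributions give an EER of 0; the 1.05% EER reported above means the genuine and spoofed score distributions overlap only slightly.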
{"title":"Auto-StyleMixer: A universal adaptive N-to-One framework for cross-domain data augmentation","authors":"Huihuang Zhang, Haigen Hu, Bin Cao, Xiaoqin Zhang","doi":"10.1016/j.knosys.2025.113616","DOIUrl":"10.1016/j.knosys.2025.113616","url":null,"abstract":"<div><div>Existing domain generalization (DG) approaches that rely on traditional techniques like the Fourier transform and normalization can extract style information for cross-domain data augmentation by confusing styles to enhance model generalization. However, these one-to-one methods face two significant challenges: (1) They cannot effectively extract pure style information in deep layers, potentially disrupting the ability to learn content information. (2) Due to the unknown purity of the extracted style information, considerable resources are required to find the optimal style-mixing configuration based on manual experience. To address these challenges, we propose a universal N-to-one cross-domain data augmentation framework, named Auto-StyleMixer, which not only extracts purer style information but also adapts to learn style-mixing configurations without any manual intervention. The proposed framework can embed any traditional style extraction techniques and can be integrated as a plug-and-play module into any architecture, whether CNNs or Transformers. Extensive experiments demonstrate the effectiveness of the proposed method, showing that it achieves state-of-the-art performance on five DG benchmarks. 
The source code is available at <span><span>https://github.com/Jin-huihuang/AutoStyleMixer</span></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113616"},"PeriodicalIF":7.2,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144139115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
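The N-to-one style mixing this abstract describes can be sketched with the classic channel-statistics formulation (AdaIN-style): strip a content feature map's own style (its channel-wise mean and standard deviation) and re-apply a weighted mixture of several other domains' statistics. This is a generic stand-in for whatever extraction technique Auto-StyleMixer embeds, and the weights here are fixed rather than learned:

```python
import numpy as np

def mix_styles(content, styles, weights, eps=1e-6):
    """N-to-one style mixing via channel statistics.

    content: (C, H, W) feature map; styles: list of (C, H, W) maps from other
    domains; weights: per-style mixing coefficients summing to 1.
    The content map is re-normalized with a weighted mixture of the style
    maps' channel-wise means and standard deviations.
    """
    mu_c = content.mean(axis=(1, 2), keepdims=True)
    sd_c = content.std(axis=(1, 2), keepdims=True) + eps
    normalized = (content - mu_c) / sd_c            # strip the content's own style
    mu_mix = sum(w * s.mean(axis=(1, 2), keepdims=True) for w, s in zip(weights, styles))
    sd_mix = sum(w * s.std(axis=(1, 2), keepdims=True) for w, s in zip(weights, styles))
    return normalized * sd_mix + mu_mix             # apply the mixed style

rng = np.random.default_rng(0)
content = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
styles = [rng.normal(size=(4, 8, 8)), rng.normal(loc=5.0, size=(4, 8, 8))]
mixed = mix_styles(content, styles, weights=[0.5, 0.5])
```

The spatial structure of `content` is preserved while its channel statistics move to the mixture, which is exactly the kind of cross-domain augmentation a DG model can then train on.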