Qi Guo, Qi Tan, Yue Peng, Long Xiao, Miao Liu, Benyun Shi
{"title":"Model-enhanced spatial-temporal attention networks for traffic density prediction","authors":"Qi Guo, Qi Tan, Yue Peng, Long Xiao, Miao Liu, Benyun Shi","doi":"10.1007/s40747-024-01669-9","DOIUrl":"https://doi.org/10.1007/s40747-024-01669-9","url":null,"abstract":"<p>Traffic density is a crucial indicator for evaluating the level of service, as it directly reflects the degree of road congestion and driving comfort. However, accurately predicting real-time traffic density has been a significant challenge in Intelligent Transportation Systems (ITS) due to the nonlinear and spatial-temporal dynamic complexity of traffic density. In this paper, we propose a novel Model-enhanced Spatial-Temporal Attention Network (MSTAN), which constructs a spatial-temporal traffic kernel density model using the Kernel Density Estimation (KDE) method to process the spatiotemporal data and calculate the probabilities of various spatiotemporal events. These probabilities are input into the attention mechanism, enabling the model to recognize the inherent connection between dynamic and distant events. Through this fusion, the network can deeply learn and analyze the spatial-temporal properties of traffic features. Furthermore, this paper utilizes the attention mechanism to dynamically model spatial-temporal dependencies, capturing real-time traffic conditions and density, and constructs a spatial-temporal attention module for learning. To validate the performance of the proposed MSTAN model, experiments are conducted on two public datasets of California highways (PeMS04 and PeMS08). 
The experimental results demonstrate that the MSTAN model outperforms existing state-of-the-art baseline models in terms of prediction accuracy, thus proving the effectiveness of the model both theoretically and practically.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"9 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nengjun Zhu, Jieyun Huang, Jian Cao, Liang Hu, Siji Zhu
{"title":"Toward medical test recommendation from optimal attribute selection perspectives: a backward reasoning approach","authors":"Nengjun Zhu, Jieyun Huang, Jian Cao, Liang Hu, Siji Zhu","doi":"10.1007/s40747-024-01629-3","DOIUrl":"https://doi.org/10.1007/s40747-024-01629-3","url":null,"abstract":"<p>Medical tests are crucial for treatment decision making. However, over-testing can often occur in any medical speciality or level of expertise. Since over-testing usually results in a financial burden for patients and is also a waste of medical resources, this naturally leads to the question: which medical test items (MTIs) are necessary and should be prioritized for the target patients? It is a nontrivial task to identify the right MTIs due to the diversified health status of patients and the complicated prerequisites of therapies. To this end, in this paper, we propose a data-driven approach to evaluate the priority which should be given to MTIs by modeling the relationships between MTIs and therapies. Specifically, we first develop a dual hierarchical topic model (DHTM), which views the adopted hierarchical therapies as labeled topics and the MTI reports, i.e., the set of hierarchical attribute-value pairs (AVPs), as documents. Then, with the therapy-AVP distribution and the partial MTI reports of the target patient, we can scope the candidate therapies, which are further utilized to evaluate the accumulated gain of MTIs to be tested. Moreover, the next MTI recommendation is conducted based on the gains. 
Finally, extensive experiments on real-world medical data validate the effectiveness of our approach, and some interesting observations are also provided.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"105 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoyao Ding, Dongyan Ding, Gang Zhou, Jicang Lu, Taojie Zhu
{"title":"Document-level relation extraction via dual attention fusion and dynamic asymmetric loss","authors":"Xiaoyao Ding, Dongyan Ding, Gang Zhou, Jicang Lu, Taojie Zhu","doi":"10.1007/s40747-024-01632-8","DOIUrl":"https://doi.org/10.1007/s40747-024-01632-8","url":null,"abstract":"<p>Document-level relation extraction (RE) requires integrating and reasoning over information to identify multiple possible relations among entities. However, previous research typically performed reasoning on heterogeneous graphs and set a global threshold for multi-relation classification, disregarding the interaction reasoning information among multiple relations and the positive–negative sample imbalance in databases. This paper proposes a novel framework for document-level RE with two techniques: dual attention fusion and dynamic asymmetric loss. Concretely, to learn richer interdependency features, we construct entity pairs and contextual matrices using multi-head axial attention and a co-attention mechanism to deeply learn the interactions among entity pairs. To alleviate the influence of hard thresholds under positive–negative sample imbalance, we dynamically adjust weights to optimize the probabilities of different labels. We evaluate our model on two benchmark document-level RE datasets, DocRED and CDR. 
Experimental results show that our DASL (Dual Attention fusion and dynamic aSymmetric Loss) obtains superior performance on two public datasets. We further provide extensive experiments to analyze how dual attention fusion and dynamic asymmetric loss guide the model to better extract multi-label relations among entities.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"154 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moor: Model-based offline policy optimization with a risk dynamics model","authors":"Xiaolong Su, Peng Li, Shaofei Chen","doi":"10.1007/s40747-024-01621-x","DOIUrl":"https://doi.org/10.1007/s40747-024-01621-x","url":null,"abstract":"<p>Offline reinforcement learning (RL) has been widely used in safety-critical domains by avoiding dangerous and costly online interaction. A significant challenge is addressing uncertainties and risks outside of offline data. Risk-sensitive offline RL attempts to solve this issue by risk aversion. However, current model-based approaches only extract state transition information and reward information using dynamics models, which cannot capture risk information implicit in offline data and may result in the misuse of high-risk data. In this work, we propose a model-based offline policy optimization approach with a risk dynamics model (MOOR). Specifically, we construct a risk dynamics model using a quantile network that can learn the risk information of data, then we reshape model-generated data based on errors of the risk dynamics model and the risk information of data. Finally, we use a risk-averse algorithm to learn the policy on the combined dataset of offline and generated data. 
We theoretically prove that MOOR can identify risk information of data and avoid utilizing high-risk data. Our experiments show that MOOR outperforms existing approaches and achieves state-of-the-art results in risk-sensitive D4RL and risky navigation tasks.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"71 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCNet: a human pose compensation network based on incremental learning for sports actions estimation","authors":"Jia-Hong Jiang, Nan Xia","doi":"10.1007/s40747-024-01647-1","DOIUrl":"https://doi.org/10.1007/s40747-024-01647-1","url":null,"abstract":"<p>Human pose estimation has a wide range of applications. Existing methods perform well in conventional domains, but there are certain defects when they are applied to sports activities. The first is lack of estimation of the extremity posture, making it impossible to comprehensively evaluate the movement posture; the second is insufficient occlusion handling. Therefore, we propose a human pose compensation network based on incremental learning, which obtains shared weights to extract detailed features under the premise of limited extremity training data. We propose a higher-order feature compensator (HOF-compensator) to embed the attributes of the extremity into the torso and limbs topology structure, building a complete higher-order feature. In addition, to improve the occlusion handling performance, we propose an occlusion feature enhancement attention mechanism (OFE-attention) that can identify occluded keypoints and enhance attention to occlusion areas. We design comparative experiments on three public datasets and a self-built sports dataset, achieving the highest mean accuracy among all comparative methods. 
In addition, we design a series of ablation analyses and visualizations to verify that our method performs best in sports pose estimation.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"14 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hanxian Duan, Qian Jiang, Xin Jin, Michal Wozniak, Yi Zhao, Liwen Wu, Shaowen Yao, Wei Zhou
{"title":"Mf-net: multi-feature fusion network based on two-stream extraction and multi-scale enhancement for face forgery detection","authors":"Hanxian Duan, Qian Jiang, Xin Jin, Michal Wozniak, Yi Zhao, Liwen Wu, Shaowen Yao, Wei Zhou","doi":"10.1007/s40747-024-01634-6","DOIUrl":"https://doi.org/10.1007/s40747-024-01634-6","url":null,"abstract":"<p>Due to the increasing sophistication of face forgery techniques, the images generated are becoming more and more realistic and difficult for human eyes to distinguish. These face forgery techniques can cause problems such as fraud and social engineering attacks in facial recognition and identity verification areas. Therefore, researchers have worked on face forgery detection studies and have made significant progress. Current face forgery detection algorithms achieve high within-dataset detection accuracy. However, it is difficult to achieve satisfactory generalization performance in cross-dataset scenarios. In order to improve the cross-dataset detection performance of the model, this paper proposes a multi-feature fusion network based on two-stream extraction and multi-scale enhancement. First, we design a two-stream feature extraction module to obtain richer feature information. Second, the multi-scale feature enhancement module is proposed to focus the model more on information related to the current sub-region from different scales. Finally, the forgery detection module calculates the overlap between the features of the input image and real images during the training phase to determine the forgery regions. The method encourages the model to mine forgery features and learn generic and robust features not limited to a particular feature. Thus, the model achieves high detection accuracy and performance. We achieve AUCs of 99.70% and 90.71% on the FaceForensics++ and WildDeepfake datasets, respectively. The generalization experiments on the Celeb-DF-v2 and WildDeepfake datasets achieve AUCs of 80.16% and 65.15%, respectively. 
Comparison experiments with multiple methods on other benchmark datasets confirm the superior generalization performance of our proposed method while ensuring model detection accuracy. Our code can be found at: https://github.com/1241128239/MFNet.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theoretical knowledge enhanced genetic algorithm for mine ventilation system optimization considering main fan adjustment","authors":"Wentian Shang, Jinzhang Jia","doi":"10.1007/s40747-024-01619-5","DOIUrl":"https://doi.org/10.1007/s40747-024-01619-5","url":null,"abstract":"<p>Mining safety heavily depends on ventilation, which constitutes a significant portion of the energy costs in operations. Optimizing mine ventilation systems (MVSO) is crucial for minimizing this energy expenditure. However, current algorithms encounter challenges when applied to large-scale mines, primarily due to the complexity of variables and limited attention to optimizing main fans. This study introduces a theoretical knowledge enhanced genetic algorithm for MVSO, incorporating main fan adjustments. The algorithm models changes in the main fan’s operational status and integrates ventilation network equivalent simplification (VNES) and the minimum spanning tree (MST) to reduce the number of variables in the mine ventilation network. Additionally, leveraging mine ventilation sensitivity theory (MVST) enhances the quality of the initial algorithmic population. A simple case and two engineering cases collectively validated that the algorithm consistently provides effective and reliable optimization solutions for mine ventilation systems across varying scales. Specifically, the algorithm reduced energy consumption from 326.94 to 186.99 kW, 433.14 to 239.48 kW, and 520.53 to 324.90 kW across three different scales of mine ventilation systems. 
Comparative analysis with four other algorithms shows that, although this algorithm has a longer runtime due to the need to identify the minimum spanning tree during iterations, its ability to reduce problem dimensionality and improve population quality results in more stable and superior convergence performance, especially for large-scale mine ventilation systems.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"18 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pufen Zhang, Jiaxiang Wang, Meng Wan, Song Zhang, Jie Jing, Lianhong Ding, Peng Shi
{"title":"Audio-visual event localization with dual temporal-aware scene understanding and image-text knowledge bridging","authors":"Pufen Zhang, Jiaxiang Wang, Meng Wan, Song Zhang, Jie Jing, Lianhong Ding, Peng Shi","doi":"10.1007/s40747-024-01654-2","DOIUrl":"https://doi.org/10.1007/s40747-024-01654-2","url":null,"abstract":"<p>The audio-visual event localization (AVEL) task aims to judge and classify an audible and visible event. Existing methods pursue this goal by transferring pre-trained knowledge as well as understanding temporal dependencies and cross-modal correlations of the audio-visual scene. However, most works comprehend the audio-visual scene from an entangled temporal-aware perspective, ignoring the learning of temporal dependency and cross-modal correlation in both forward and backward temporal-aware views. Recently, transferring the pre-trained knowledge from the Contrastive Language-Image Pre-training (CLIP) model has shown remarkable results across various tasks. Nevertheless, since a heterogeneous gap exists between the audio-visual knowledge of the AVEL task and the image-text alignment knowledge of CLIP, how to transfer the image-text alignment knowledge of CLIP into the AVEL field has barely been investigated. To address these challenges, a novel Dual Temporal-aware scene understanding and image-text Knowledge Bridging (DTKB) model is proposed in this paper. DTKB consists of forward and backward temporal-aware scene understanding streams, in which temporal dependencies and cross-modal correlations are explicitly captured from dual temporal-aware perspectives. Consequently, DTKB can achieve fine-grained scene understanding for event localization. Additionally, a knowledge bridging (KB) module is proposed to simultaneously transfer the image-text representation and alignment knowledge of CLIP to the AVEL task. 
This module regulates the ratio between audio-visual fusion features and CLIP’s visual features, thereby bridging the image-text alignment knowledge of CLIP and the new audio-visual knowledge for event category prediction. Besides, the KB module is compatible with previous models. Extensive experimental results demonstrate that DTKB significantly outperforms state-of-the-art models.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"34 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional time series classification with multiple attention mechanism","authors":"Chen Liu, Zihan Wei, Lixin Zhou, Ying Shao","doi":"10.1007/s40747-024-01630-w","DOIUrl":"https://doi.org/10.1007/s40747-024-01630-w","url":null,"abstract":"<p>The classification of multidimensional time series holds significant importance across various domains, including action classification, medical diagnosis, and credit assessment. Within multidimensional time series data, features pertinent to classification exhibit variance in their positional distribution along the entirety of the sequence. Moreover, the relative significance of features across distinct dimensions also fluctuates, contributing to suboptimal performance in multidimensional time series classification. Consequently, the proposition of tailored deep learning models for feature extraction specific to multidimensional time series data becomes imperative. This paper introduces attention mechanisms applied to the temporal dimension, graph attention mechanisms for inter-dimensional relationships within multidimensional data, and attention mechanisms applied between channels post-convolutional calculations. These mechanisms are deployed for feature extraction across temporal, variational, and channel dimensions of multidimensional time series data, respectively. Furthermore, attention is directed towards inter-channel interactions within the squeeze-and-excitation network to enhance the model’s representational capacity. 
Experimental findings substantiate the viability of integrating attention mechanisms into multidimensional time series classification endeavors.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wang Zhong, Wang Yue, Wang Haoran, Tang Nan, Wang Shuyue
{"title":"Integrating fast iterative filtering and ensemble neural network structure with attention mechanism for carbon price forecasting","authors":"Wang Zhong, Wang Yue, Wang Haoran, Tang Nan, Wang Shuyue","doi":"10.1007/s40747-024-01609-7","DOIUrl":"https://doi.org/10.1007/s40747-024-01609-7","url":null,"abstract":"<p>Accurate carbon price forecasts are crucial for policymakers and enterprises to understand the dynamics of carbon price fluctuations, enabling them to formulate informed policies and investment strategies. However, due to the non-linear and non-stationary nature of carbon prices, traditional models often struggle to achieve high prediction accuracy. To address this challenge, this study proposes a novel integrated prediction framework designed to enhance forecast accuracy. First, the carbon price series is decomposed into a series of smoother subsequences using fast iterative filtering (FIF). Subsequently, an integrated prediction model, AM-TCN-LSTM, is constructed, incorporating the attention mechanism (AM), temporal convolutional networks (TCN), and long short-term memory (LSTM) neural networks. The attention mechanism adaptively captures complex features from multiple factors, while the TCN-LSTM efficiently extracts temporal features from the sequences. Finally, the results from each subsequence are aggregated to generate the final prediction. Five carbon markets in China (Guangdong, Hubei, Shenzhen, Beijing, and Shanghai) were selected to verify the validity of the proposed model. Various comparative models and evaluation metrics were employed to assess performance. The results demonstrate that: (1) The TCN-LSTM model achieves higher prediction accuracy compared to single models. (2) FIF is a more effective decomposition method with superior performance compared to EMD-based methods. 
(3) The proposed model exhibits the highest predictive capability, with MAE values of 0.0964, 0.1403, 1.9476, 2.0848, and 0.5029 for the five carbon markets, significantly outperforming comparison models. (4) The attention mechanism effectively captures the influence of multiple factors on carbon price, particularly within the short-term components.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"9 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}