{"title":"Refined feature enhancement network for object detection","authors":"Zonghui Li, Yongsheng Dong","doi":"10.1007/s40747-024-01622-w","DOIUrl":"https://doi.org/10.1007/s40747-024-01622-w","url":null,"abstract":"<p>Convolutional neural networks-based object detection techniques have achieved positive performances. However, due to the limitations of local receptive field, some existing object detection methods cannot effectively capture global information in feature extraction phases, and thus lead to unsatisfactory detection performance. Moreover, the feature information extracted by the backbone network may be redundant. To alleviate these problems, in this paper we propose a refined feature enhancement network (RFENet) for object detection. Specifically, we first propose a feature enhancement module (FEM) to capture more global and local information from feature maps with certain long-range dependencies. We further propose a multi-branch dilated attention mechanism (MDAM) to refine the extracted features in a weighted form, which can select more important spatial and channel information and broaden the receptive field of the network. Finally, we validate RFENet on MS-COCO2017, PASCAL VOC2012, and PASCAL VOC07+12 datasets, respectively. Compared to the baseline network, our RFENet improves by 2.4 AP on MS-COCO2017 dataset, 3.4 mAP on PASCAL VOC2012 dataset, and 2.7 mAP on PASCAL VOC07+12 dataset. Extensive experiments show that our RFENet can perform competitively on different datasets. 
The code is available at https://github.com/object9detection/RFENet.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"70 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
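The weighted refinement of spatial and channel information that the abstract attributes to MDAM can be illustrated with a generic channel-attention gate (squeeze-and-excitation style). This is a stand-in sketch under assumed shapes and names, not the authors' MDAM implementation:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Generic channel-attention reweighting (illustrative stand-in, not the
    paper's MDAM). features: (C, H, W); w1: (C, C//r); w2: (C//r, C)."""
    squeeze = features.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0)             # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate per channel
    return features * weights[:, None, None]         # reweight each channel map
```

With zero-initialised gate weights every channel receives the neutral gate value 0.5, which makes the reweighting easy to verify.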
{"title":"Travel route recommendation with a trajectory learning model","authors":"Xiangping Wu, Zheng Zhang, Wangjun Wan","doi":"10.1007/s40747-024-01611-z","DOIUrl":"https://doi.org/10.1007/s40747-024-01611-z","url":null,"abstract":"<p>This study addresses a critical issue in location-based services: travel route recommendation. It leverages historical trajectory data to predict the actual route on a road network from a starting point to a destination, given a specific departure time. However, capturing the latent patterns in complex trajectory data for accurate route planning presents a significant challenge. Existing route recommendation methods commonly face two major problems: first, inadequate integration of multi-source data, which fails to fully consider the potential factors affecting route choice; and second, limited capability to capture road network characteristics, which restricts the effective application of node features and negatively impacts recommendation accuracy. To address these issues, this research introduces a Trajectory Learning Model for Route Recommendation (TLMR) based on deep learning techniques. TLMR enhances the understanding of user route choice behavior in complex environments by integrating multi-source data. Moreover, by incorporating road network features, TLMR more effectively captures and utilizes the structural and dynamic information of the road network. Specifically, TLMR first employs a Position-aware Graph Neural Network to learn features of intersections from the road network, incorporating context features like weather and traffic conditions. Then, it integrates this information through neural networks to predict the next intersection. Finally, a beam search algorithm is applied to generate and recommend multiple candidate routes. Extensive experiments on four large real-world datasets demonstrate that TLMR outperforms existing methods in four key performance metrics. 
These results prove the effectiveness and superiority of TLMR in route recommendation.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"23 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
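The final stage the abstract describes, a beam search that generates multiple candidate routes from the model's per-edge scores, can be sketched as follows. The graph representation, scoring dictionary, and all names are illustrative assumptions, not the TLMR implementation:

```python
def beam_search_routes(graph, scores, start, dest, k=2, max_len=10):
    """graph: node -> list of neighbour nodes;
    scores: (node, next_node) -> log-probability from a trained model.
    Keeps the top-k partial routes by cumulative score until they reach dest.
    Note: routes still incomplete when max_len is hit are discarded."""
    beams = [([start], 0.0)]            # (partial route, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for route, score in beams:
            node = route[-1]
            if node == dest:            # route complete: move to results
                finished.append((route, score))
                continue
            for nxt in graph.get(node, []):
                if nxt in route:        # avoid cycles
                    continue
                candidates.append((route + [nxt], score + scores[(node, nxt)]))
        if not candidates:
            break
        # keep only the k highest-scoring partial routes
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return sorted(finished, key=lambda c: c[1], reverse=True)
```

On a toy four-node road network this returns the candidate routes ranked by cumulative log-probability, mirroring how multiple recommendations would be surfaced.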
{"title":"Residual trio feature network for efficient super-resolution","authors":"Junfeng Chen, Mao Mao, Azhu Guan, Altangerel Ayush","doi":"10.1007/s40747-024-01624-8","DOIUrl":"https://doi.org/10.1007/s40747-024-01624-8","url":null,"abstract":"<p>Deep learning-based approaches have demonstrated impressive performance in single-image super-resolution (SISR). Efficient super-resolution compromises the reconstructed image’s quality to have fewer parameters and Flops. Ensured efficiency in image reconstruction and improved reconstruction quality of the model are significant challenges. This paper proposes a trio branch module (TBM) based on structural reparameterization. TBM achieves equivalence transformation through structural reparameterization operations, which use a complex network structure in the training phase and convert it to a more lightweight structure in the inference, achieving efficient inference while maintaining accuracy. Based on the TBM, we further design a lightweight version of the enhanced spatial attention mini (ESA-mini) and the residual trio feature block (RTFB). Moreover, the multiple RTFBs are combined to construct the residual trio network (RTFN). Finally, we introduce a localized contrast loss for better applicability to the super-resolution task, which enhances the reconstruction quality of the super-resolution model. 
Experiments show that the RTFN framework proposed in this paper outperforms other state-of-the-art efficient super-resolution methods in terms of inference speed and reconstruction quality.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"32 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
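The core idea behind structural reparameterization, training with several parallel branches and then folding them into one equivalent kernel for inference, can be shown with a minimal single-channel sketch. The particular branch mix here (3x3 conv, 1x1 conv, identity) is an assumption for illustration, not TBM's exact topology:

```python
import numpy as np

def merge_branches(k3, k1):
    """Fold a 1x1-conv branch and an identity branch into a 3x3 kernel.
    Works because convolution is linear: a 1x1 conv is a 3x3 kernel whose
    only nonzero entry is the centre, and identity is a centre entry of 1."""
    merged = k3.copy()
    merged[1, 1] += k1 + 1.0
    return merged

def conv2d(x, k):
    """Stride-1, 'same'-padded cross-correlation with a 3x3 kernel."""
    pad = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out
```

The training-time three-branch output and the inference-time single-kernel output agree to numerical precision, which is the equivalence transformation the abstract refers to.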
{"title":"Enhancing adversarial transferability with local transformation","authors":"Yang Zhang, Jinbang Hong, Qing Bai, Haifeng Liang, Peican Zhu, Qun Song","doi":"10.1007/s40747-024-01628-4","DOIUrl":"https://doi.org/10.1007/s40747-024-01628-4","url":null,"abstract":"<p>Robust deep learning models have demonstrated significant applicability in real-world scenarios. The utilization of adversarial attacks plays a crucial role in assessing the robustness of these models. Among such attacks, transfer-based attacks, which leverage white-box models to generate adversarial examples, have garnered considerable attention. These transfer-based attacks have demonstrated remarkable efficiency, particularly under the black-box setting. Notably, existing transfer attacks often exploit input transformations to amplify their effectiveness. However, prevailing input transformation-based methods typically modify input images indiscriminately, overlooking regional disparities. To bolster the transferability of adversarial examples, we propose the Local Transformation Attack (LTA) based on forward class activation maps. Specifically, we first obtain future examples through accumulated momentum and compute forward class activation maps. Subsequently, we utilize these maps to identify crucial areas and apply pixel scaling for transformation. Finally, we update the adversarial examples by using the average gradient of the transformed image. Extensive experiments convincingly demonstrate the effectiveness of our proposed LTA. Compared to the current state-of-the-art attack approaches, LTA achieves an increase of 7.9% in black-box attack performance. 
Particularly, in the case of ensemble attacks, our method achieved an average attack success rate of 98.3%.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"9 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of kill webs with heterogeneous UAV swarms in dynamic contested environments","authors":"Wenlin Liu, Zishuang Pan, Wei Han, Xichao Su, Dazhao Yu, Bing Wan","doi":"10.1007/s40747-024-01644-4","DOIUrl":"https://doi.org/10.1007/s40747-024-01644-4","url":null,"abstract":"<p>With the concept of \"mosaic warfare,\" a novel combat style that involves constructing \"kill webs\" with unmanned aerial vehicle (UAV) swarms has emerged. However, little research has focused on this specific task scenario, particularly concerning the self-organization and adaptive collaboration of heterogeneous combat units in dynamic contested environments. Considering the scales and highly dynamic natures of such swarms, an adaptive communication network mechanism is developed based on the Molloy-Reed criterion. In contrast with common offline/noncombat task scenarios, the self-organization process is refined through agent-based modeling, and a combat effectiveness evaluation is introduced to provide enhanced task execution incentives. The proposed dynamic consensus-based coalition algorithm (DCBCA) addresses UAV intelligence defects such as \"confusion,\" \"forgetfulness,\" and \"recklessness\" during the dynamic target selection process, enabling effective bottom-up kill webs construction. Extensive simulation results demonstrate that the algorithmic system outlined in this paper can support the efficient and resilient operations of large-scale heterogeneous UAV swarms. 
The DCBCA outperforms the dynamically improved consensus-based grouping algorithm (CBGA) and the consensus-based timetable algorithm (CBTA) in terms of performance and convergence speed.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"2 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empowering Urdu sentiment analysis: an attention-based stacked CNN-Bi-LSTM DNN with multilingual BERT","authors":"Lal Khan, Atika Qazi, Hsien-Tsung Chang, Mousa Alhajlah, Awais Mahmood","doi":"10.1007/s40747-024-01631-9","DOIUrl":"https://doi.org/10.1007/s40747-024-01631-9","url":null,"abstract":"<p>Sentiment analysis (SA) as a research field has gained popularity among the researcher throughout the globe over the past 10 years. Deep neural networks (DNN) and word vector models are employed nowadays and perform well in sentiment analysis. Among the different deep neural networks utilized for SA globally, Bi-directional long short-term memory (Bi-LSTM), BERT, and CNN models have received much attention. Even though these models can process a wide range of text types, Because DNNs treat different features the same, using these models in the feature learning phase of a DNN model leads to the creation of a feature space with very high dimensionality. We suggest an attention-based, stacked, two-layer CNN-Bi-LSTM DNN to overcome these glitches. After local feature extraction, by applying stacked two-layer Bi-LSTM, our proposed model extracted coming and outgoing sequences by seeing sequential data streams in backward and forward directions. The output of the stacked two-layer Bi-LSTM is supplied to the attention layer to assign various words with varying values. A second Bi-LSTM layer is constructed atop the initial layer in the suggested network to increase performance. 
Various experiments were conducted to evaluate the effectiveness of our proposed model on two Urdu sentiment analysis datasets, UCSA-21 and UCSA, achieving accuracies of 83.12% and 78.91%, respectively.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"245 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
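The attention layer described in the abstract, which assigns different weights to different words over the Bi-LSTM outputs, can be sketched as softmax-weighted pooling. The scoring vector `v` and the shapes are illustrative assumptions, not the paper's exact attention formulation:

```python
import numpy as np

def attention_pool(hidden_states, v):
    """Attention pooling over per-word hidden states.
    hidden_states: (T, D) Bi-LSTM outputs; v: (D,) learned scoring vector.
    Returns the per-word weights and the weighted context vector."""
    scores = hidden_states @ v                 # one relevance score per word
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights, weights @ hidden_states    # (T,), (D,)
```

A word whose hidden state aligns strongly with `v` dominates the pooled representation, which is exactly the "varying values for various words" behaviour the abstract describes.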
{"title":"RMGANets: reinforcement learning-enhanced multi-relational attention graph-aware network for anti-money laundering detection","authors":"Qianyu Wang, Wei-Tek Tsai, Bowen Du","doi":"10.1007/s40747-024-01615-9","DOIUrl":"https://doi.org/10.1007/s40747-024-01615-9","url":null,"abstract":"<p>Given the anonymity and complexity of illegal transactions, traditional deep-learning methods struggle to establish correlations between transaction addresses, cash flows, and physical users. Additionally, the limited number of labels for illegal transactions results in severe class imbalance and other challenges. To overcome these limitations, we propose a reinforcement learning-enhanced, multi-relational, attention graph-aware framework to detect anti-money laundering and illegal trading activities. On the one hand, a data-driven, graph-aware layer establishes long-term dependencies and correlations between transaction graph nodes. Similarity among graph nodes divides the topological graph into three subgraphs. Learning from these subgraphs and converging nodes enriches local, global, and contextual details. Simultaneously, using repeated nodes across the subgraphs enhances interactivity between them, reduces intra-class ambiguity, and accentuates inter-class differences. On the other hand, a reinforcement learning module embedded in the graph-aware layer compensates for the missing details in node features caused by masking operations. Furthermore, the reconstructed loss function addresses significant classification inaccuracies by reducing the weight assigned to easily classified samples. Balancing these issues and individually supervising each component enables the detection framework to achieve optimal performance. 
The evaluation results demonstrate that our proposed model achieves strong detection performance and robustness, with F1 scores of 93.85% and 94.39%.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"5 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
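The loss reweighting the abstract describes, reducing the weight assigned to easily classified samples, behaves like a focal-style loss. The sketch below shows that general mechanism in an assumed binary setting; it is not the paper's exact reconstructed loss:

```python
import math

def focal_style_loss(p, y, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma so that confidently correct
    (easy) samples contribute little and hard samples dominate training.
    p: predicted probability of class 1; y: true label in {0, 1}."""
    pt = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -((1.0 - pt) ** gamma) * math.log(pt)
```

With gamma = 2, a sample predicted at 0.95 for its true class incurs a loss orders of magnitude below both its plain cross-entropy and the loss of a hard sample, which is how the imbalance between easy licit transactions and rare illicit ones is counteracted.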
{"title":"Open-world disaster information identification from multimodal social media","authors":"Chen Yu, Bin Hu, Zhiguo Wang","doi":"10.1007/s40747-024-01635-5","DOIUrl":"https://doi.org/10.1007/s40747-024-01635-5","url":null,"abstract":"<p>The application of multimodal deep learning for emergency response and recovery, specifically in disaster social media analysis, is of utmost importance. It is worth noting that in real-world scenarios, sudden disaster events may differ from the training data, which may require the multimodal network to predict them as unknown classes instead of misclassifying them to known ones. Previous studies have primarily focused on model accuracy in a closed environment and may not be able to directly detect unknown classes. Thus, we propose a novel multimodal model for categorizing social media related to disasters in an open-world environment. Our methodology entails utilizing pre-trained unimodal models as encoders for each modality and performing information fusion with a cross-attention module to obtain the joint representation. For open-world detection, we use a multitask classifier that encompasses both a closed-world and an open-world classifier. The closed-world classifier is trained on the original data to classify known classes, whereas the open-world classifier is used to determine whether the input belongs to a known class. Furthermore, we propose a sample generation strategy that models the distribution of unknown samples using known data, which allows the open-world classifier to identify unknown samples. Our experiments were conducted on two public datasets, namely CrisisMMD and MHII. 
According to the experimental results, the proposed method outperforms other baselines and approaches in crisis information classification.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"150 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
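The multitask decision rule described above, an open-world head judging known vs. unknown before the closed-world head assigns a class, can be sketched as follows; the threshold rule and names are assumptions for illustration:

```python
def open_world_predict(closed_probs, known_score, threshold=0.5):
    """Two-headed open-world decision (sketch of the multitask idea).
    closed_probs: closed-world class probabilities for the input;
    known_score: the open-world head's confidence that the input is known."""
    if known_score < threshold:
        return "unknown"                # reject: likely an unseen disaster class
    # accept: fall back to the closed-world classifier's argmax
    return max(range(len(closed_probs)), key=lambda i: closed_probs[i])
```

Inputs the open-world head scores below the threshold are surfaced as unknown rather than being forced into a known disaster category.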
{"title":"Advancing buffet onset prediction: a deep learning approach with enhanced interpretability for aerodynamic engineering","authors":"Jing Wang, Wei Liu, Hairun Xie, Miao Zhang","doi":"10.1007/s40747-024-01612-y","DOIUrl":"https://doi.org/10.1007/s40747-024-01612-y","url":null,"abstract":"<p>The interaction between the shock wave and boundary layer of transonic wings can trigger periodic self-excited oscillations, resulting in transonic buffet. Buffet severely restricts the flight envelope of civil aircraft and is directly related to their aerodynamic performance and safety. Developing efficient and reliable techniques for buffet onset prediction is crucial for the advancement of civil aircraft. In this study, utilizing a comprehensive database of supercritical airfoils generated through numerical simulations, a convolutional neural network (CNN) model is firstly developed to perform buffet classification based on the flow fields. After that, employing explainable machine learning techniques, including Gradient-weighted Class Activation Mapping (Grad-CAM), random forest algorithms, and statistical analysis, the research investigates the correlations between supervised CNN features and key physical characteristics related with the separation region, shock wave, leading edge suction peak, and post-shock loading. 
Finally, a physics-based buffet onset metric is established with good generalization and accuracy, providing valuable guidance for the engineering design of civil aircraft.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"147 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
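Grad-CAM, one of the interpretability techniques named above, weights each channel of the last convolutional layer by its spatially averaged gradient and takes the ReLU of the weighted activation sum. A minimal NumPy sketch of that standard procedure (array shapes are assumed for illustration):

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) from the target conv layer for one
    input and one class score. Returns a normalized (H, W) saliency map."""
    weights = gradients.mean(axis=(1, 2))             # channel importance (C,)
    cam = np.einsum("c,chw->hw", weights, activations)  # weighted activation sum
    cam = np.maximum(cam, 0)                          # keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                              # normalize to [0, 1]
    return cam
```

Overlaying this map on the flow field would indicate which regions (e.g., near the shock or separation region) drove the buffet classification.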
{"title":"Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification","authors":"Yongkang Ding, Jiechen Li, Hao Wang, Ziang Liu, Anqi Wang","doi":"10.1007/s40747-024-01646-2","DOIUrl":"https://doi.org/10.1007/s40747-024-01646-2","url":null,"abstract":"<p>Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features. These challenges include the inconsistency of clothing and the difficulty in learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net effectively mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, global features capture the overall appearance of the person; grayscale image features help eliminate the interference of color in recognition; and clothing-irrelevant features derived from semantic segmentation enforce the model to learn features independent of the person’s clothing. Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model’s ability to capture both detailed and global structures, thereby improving recognition accuracy and robustness. Extensive experimental results demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes. 
On the PRCC and LTCC datasets, AE-Net achieves Top-1 accuracy rates of 60.4% and 42.9%, respectively.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"18 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142597507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}