{"title":"Deep Bi-Directional Adaptive Gating Graph Convolutional Networks for Spatio-Temporal Traffic Forecasting","authors":"Xin Wang;Jianhui Lv;Madini O. Alassafi;Fawaz E. Alsaadi;B. D. Parameshachari;Longhao Zou;Gang Feng;Zhonghua Liu","doi":"10.26599/TST2024.9010134","DOIUrl":"https://doi.org/10.26599/TST2024.9010134","url":null,"abstract":"With the advent of deep learning, various deep neural network architectures have been proposed to capture the complex spatio-temporal dependencies in traffic data. This paper introduces a novel Deep Bi-directional Adaptive Gating Graph Convolutional Network (DBAG-GCN) model for spatio-temporal traffic forecasting. The proposed model leverages the power of graph convolutional networks to capture the spatial dependencies in the road network topology and incorporates bi-directional gating mechanisms to control the information flow adaptively. Furthermore, we introduce a multi-scale temporal convolution module to capture multi-scale temporal dynamics and a contextual attention mechanism to integrate external factors such as weather conditions and event information. Extensive experiments on real-world traffic datasets demonstrate the superior performance of DBAG-GCN compared to state-of-the-art baselines, achieving significant improvements in prediction accuracy and computational efficiency. The DBAG-GCN model provides a powerful and flexible framework for spatio-temporal traffic forecasting, paving the way for intelligent transportation management and urban planning.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2060-2080"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979652","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Modality Integration Attention with Graph-Based Feature Extraction for Visual Question and Answering","authors":"Jing Lu;Chunlei Wu;Leiquan Wang;Ran Li;Xiuxuan Shen","doi":"10.26599/TST.2024.9010093","DOIUrl":"https://doi.org/10.26599/TST.2024.9010093","url":null,"abstract":"Visual Question and Answering (VQA) has garnered significant attention as a domain that requires the synthesis of visual and textual information to produce accurate responses. While existing methods often rely on Convolutional Neural Networks (CNNs) for feature extraction and attention mechanisms for embedding learning, they frequently fail to capture the nuanced interactions between entities within images, leading to potential ambiguities in answer generation. In this paper, we introduce a novel network architecture, Dual-modality Integration Attention with Graph-based Feature Extraction (DIAGFE), which addresses these limitations by incorporating two key innovations: a Graph-based Feature Extraction (GFE) module that enhances the precision of visual semantics extraction, and a Dual-modality Integration Attention (DIA) mechanism that efficiently fuses visual and question features to guide the model towards more accurate answer generation. Our model is trained with a composite loss function to refine its predictive accuracy. Rigorous experiments on the VQA2.0 dataset demonstrate that DIAGFE outperforms existing methods, underscoring the effectiveness of our approach in advancing VQA research and its potential for cross-modal understanding.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2133-2145"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979795","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation","authors":"Ling Zhou;Qirong Mao;Ming Dong","doi":"10.26599/TST.2024.9010095","DOIUrl":"https://doi.org/10.26599/TST.2024.9010095","url":null,"abstract":"Micro-Expression Recognition (MER) is a challenging task as the subtle changes occur over different action regions of a face. Changes in facial action regions are formed as Action Units (AUs), and AUs in micro-expressions can be seen as the actors in cooperative group activities. In this paper, we propose a novel deep neural network model for objective class-based MER, which simultaneously detects AUs and aggregates AU-level features into micro-expression-level representation through Graph Convolutional Networks (GCN). Specifically, we propose two new strategies in our AU detection module for more effective AU feature learning: the attention mechanism and the balanced detection loss function. With these two strategies, features are learned for all the AUs in a unified model, eliminating the error-prune landmark detection process and tedious separate training for each AU. Moreover, our model incorporates a tailored objective class-based AU knowledge-graph, which facilitates the GCN to aggregate the AU-level features into a micro-expression-level feature representation. Extensive experiments on two tasks in MEGC 2018 show that our approach outperforms the current state-of-the-art methods in MER. Additionally, we also report our single model-based micro-expression AU detection results.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2114-2132"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979653","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Nonconvex Activated Fuzzy RNN with Noise-Immune for Time-Varying Quadratic Programming Problems: Application to Plant Leaf Disease Identification","authors":"Yating Hu;Qingwen Du;Jun Luo;Changlin Yu;Bo Zhao;Yingyi Sun","doi":"10.26599/TST.2024.9010127","DOIUrl":"https://doi.org/10.26599/TST.2024.9010127","url":null,"abstract":"Nonconvex Activated Fuzzy Zeroing Neural Network-based (NAFZNN) and Nonconvex Activated Fuzzy Noise-Tolerant Zeroing Neural Network-based (NAFNTZNN) models are devised and analyzed, drawing inspiration from the classical ZNN/NTZNN-based model for online addressing Time-Varying Quadratic Programming Problems (TVQPPs) with Equality and Inequality Constraints (EICs) in noisy circumstances, respectively. Furthermore, the proposed NAFZNN model and NAFNTZNN model are considered as general proportion-differentiation controller, along with general proportion-integration-differentiation controller. Besides, theoretical results demonstrate the global convergence of both the NAFZNN and NAFNTZNN models for TVQPPs with EIC under noisy conditions. Moreover, numerical results illustrate the efficiency, robustness, and ascendancy of the NAFZNN and NAFZNN models in addressing TVQPPs online, exhibiting inherent noise tolerance. Ultimately, an application example to plant leaf disease identification is conducted to support the feasibility and efficacy of the designed NAFNTZNN model, which shows its potential practical value in the field of image recognition.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"1994-2013"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979779","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Envisioning a Future Beyond Tomorrow with Script Event Stream Prediction","authors":"Zhiyi Fang;Zhuofeng Li;Qingyong Zhang;Changhua Xu;Pinzhuo Tian;Shaorong Xie","doi":"10.26599/TST.2024.9010158","DOIUrl":"https://doi.org/10.26599/TST.2024.9010158","url":null,"abstract":"Script event stream prediction is a task that predicts events based on a given context or script. Most existing methods predict one subsequent event, limiting the ability to make a longer inference about the future. Moreover, external knowledge has been proven to be beneficial for event prediction and used in many methods in the form of relations between events. However, these methods focus mainly on the continuity of actions while ignoring the other components of events. To tackle these issues, we propose a Multi-step Script Event Prediction (MuSEP) method that can make a longer inference according to the given events. We adopt reinforcement learning to implement the multi-step prediction by treating the process as a Markov chain and setting the reward considering both chain-level and event-level thus ensuring the overall quality of prediction results. Additionally, we learn the representations of events with external knowledge which could better understand events and their components. Experimental results on four datasets demonstrate that our method not only outperforms state-of-the-art methods on one-step prediction but is also capable of making multi-step prediction.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2048-2059"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979651","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grayscale-Assisted RGB Image Conversion from Near-Infrared Images","authors":"Yunyi Gao;Qiankun Liu;Lin Gu;Ying Fu","doi":"10.26599/TST.2024.9010115","DOIUrl":"https://doi.org/10.26599/TST.2024.9010115","url":null,"abstract":"Near-InfraRed (NIR) imaging technology plays a pivotal role in assisted driving and safety surveillance systems, yet its monochromatic nature and deficiency in detail limit its further application. Recent methods aim to recover the corresponding RGB image directly from the NIR image using Convolutional Neural Networks (CNN). However, these methods struggle with accurately recovering both luminance and chrominance information and the inherent deficiencies in NIR image details. In this paper, we propose grayscale-assisted RGB image restoration from NIR images to recover luminance and chrominance information in two stages. We address the complex NIR-to-RGB conversion challenge by decoupling it into two separate stages. First, it converts NIR to grayscale images, focusing on luminance learning. Then, it transforms grayscale to RGB images, concentrating on chrominance information. In addition, we incorporate frequency domain learning to shift the image processing from the spatial domain to the frequency domain, facilitating the restoration of the detailed textures often lost in NIR images. Empirical evaluations of our grayscale-assisted framework and existing state-of-the-art methods demonstrate its superior performance and yield more visually appealing results. Code is accessible at: https://github.com/Yiiclass/RING","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2215-2226"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979784","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gesture Recognition with Focuses Using Hierarchical Body Part Combination","authors":"Cheng Zhang;Yibin Hou;Jian He;Xiaoyang Xie","doi":"10.26599/TST.2024.9010059","DOIUrl":"https://doi.org/10.26599/TST.2024.9010059","url":null,"abstract":"Human gesture recognition is an important research field of human-computer interaction due to its potential applications in various fields, but existing methods still face challenges in achieving high levels of accuracy. To address this issue, some existing researches propose to fuse the global features with the cropped features called focuses on vital body parts like hands. However, most methods rely on experience when choosing the focus, the scheme of focus selection is not discussed in detail. In this paper, a hierarchical body part combination method is proposed to take into account the number, combinations, and logical relationships between body parts. The proposed method generates multiple focuses using this method and employs chart-based surface modality alongside red-green-blue and optical flow modalities to enhance each focus. A feature-level fusion scheme based on the residual connection structure is proposed to fuse different modalities at convolution stages, and a focus fusion scheme is proposed to learn the relevancy of focus channels for each gesture class individually. Experiments conducted on ChaLearn isolated gesture dataset show that the use of multiple focuses in conjunction with multi-modal features and fusion strategies leads to better gesture recognition accuracy.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1583-1599"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908593","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social Media-Driven User Community Finding with Privacy Protection","authors":"Jianye Xie;Xudong Wang;Yuwen Liu;Wenwen Gong;Chao Yan;Wajid Rafique;Maqbool Khan;Arif Ali Khan","doi":"10.26599/TST.2024.9010065","DOIUrl":"https://doi.org/10.26599/TST.2024.9010065","url":null,"abstract":"In the digital era, social media platforms play a crucial role in forming user communities, yet the challenge of protecting user privacy remains paramount. This paper proposes a novel framework for identifying and analyzing user communities within social media networks, emphasizing privacy protection. In detail, we implement a social media-driven user community finding approach with hashing named MCF to ensure that the extracted information cannot be traced back to specific users, thereby maintaining confidentiality. Finally, we design a set of experiments to verify the effectiveness and efficiency of our proposed MCF approach by comparing it with other existing approaches, demonstrating its effectiveness in community detection while upholding stringent privacy standards. This research contributes to the growing field of social network analysis by providing a balanced solution that respects user privacy while uncovering valuable insights into community dynamics on social media platforms.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1782-1792"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908665","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Label Prototype-Aware Structured Contrastive Distillation","authors":"Yuelong Xia;Yihang Tong;Jing Yang;Xiaodi Sun;Yungang Zhang;Huihua Wang;Lijun Yun","doi":"10.26599/TST.2024.9010182","DOIUrl":"https://doi.org/10.26599/TST.2024.9010182","url":null,"abstract":"Knowledge distillation has demonstrated considerable success in scenarios involving multi-class single-label learning. However, its direct application to multi-label learning proves challenging due to complex correlations in multi-label structures, causing student models to overlook more finely structured semantic relations present in the teacher model. In this paper, we present a solution called multi-label prototype-aware structured contrastive distillation, comprising two modules: Prototype-aware Contrastive Representation Distillation (PCRD) and prototype-aware cross-image structure distillation. The PCRD module maximizes the mutual information of prototype-aware representation between the student and teacher, ensuring semantic representation structure consistency to improve the compactness of intra-class and dispersion of inter-class representations. In the PCSD module, we introduce sample-to-sample and sample-to-prototype structured contrastive distillation to model prototype-aware cross-image structure consistency, guiding the student model to maintain a coherent label semantic structure with the teacher across multiple instances. To enhance prototype guidance stability, we introduce batch-wise dynamic prototype correction for updating class prototypes. Experimental results on three public benchmark datasets validate the effectiveness of our proposed method, demonstrating its superiority over state-of-the-art methods.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1808-1830"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908678","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143535439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}