{"title":"A Survey of Human-Object Interaction Detection With Deep Learning","authors":"Geng Han;Jiachen Zhao;Lele Zhang;Fang Deng","doi":"10.1109/TETCI.2024.3518613","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3518613","url":null,"abstract":"Human-object interaction (HOI) detection has attracted significant attention due to its wide applications, including human-robot interactions, security monitoring, automatic sports commentary, etc. HOI detection aims to detect humans, objects, and their interactions in a given image or video, so it needs a higher-level semantic understanding of the image than regular object recognition or detection tasks. It is also more challenging technically because of some unique difficulties, such as multi-object interactions, long-tail distribution of interaction categories, etc. Currently, deep learning methods have achieved great performance in HOI detection, but there are few reviews describing the recent advance of deep learning-based HOI detection. Moreover, the current stage-based category of HOI detection methods is causing confusion in community discussion and beginner learning. To fill this gap, this paper summarizes, categorizes, and compares methods using deep learning for HOI detection over the last nine years. Firstly, we summarize the pipeline of HOI detection methods. Then, we divide existing methods into three categories (two-stage, one-stage, and transformer-based), distinguish them in formulas and schematics, and qualitatively compare their advantages and disadvantages. After that, we review each category of methods in detail, focusing on HOI detection methods for images. Moreover, we explore the development process of using foundation models for HOI detection. We also quantitatively compare the performance of existing methods on public HOI datasets. 
Finally, we point out future research directions for HOI detection.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 1","pages":"3-26"},"PeriodicalIF":5.3,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneity-Aware Clustering and Intra-Cluster Uniform Data Sampling for Federated Learning","authors":"Jian Chen;Peifeng Zhang;Jiahui Chen;Terry Shue Chien Lau","doi":"10.1109/TETCI.2024.3515007","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3515007","url":null,"abstract":"Federated learning (FL) is an innovative privacy-preserving machine learning paradigm that enables clients to train a global model without sharing their local data. However, the coexistence of category distribution heterogeneity and quantity imbalance frequently occurs in real-world FL scenarios. On the one side, due to the category distribution heterogeneity, local models are optimized based on distinct local objectives, resulting in divergent optimization directions. On the other side, quantity imbalance in widely used uniform client sampling of FL may hinder the active participation of clients with larger datasets in model training, and potentially make the model get suboptimal performance. To tackle this, we propose a framework that incorporates heterogeneity-aware clustering and intra-cluster uniform data sampling. More precisely, we firstly do heterogeneity-aware clustering that performs clustering on clients based on category distribution vectors. Then, we implement intra-cluster uniform data sampling, where local data from each client within a cluster is randomly selected based on a predetermined probability. Furthermore, to address privacy concerns, we incorporate homomorphic encryption to protect clients' category distribution vectors and sample sizes. 
Finally, the experimental results on multiple benchmark datasets demonstrate the superiority of our approach.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2545-2556"},"PeriodicalIF":5.3,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Data Jointly Driven Method for Airborne Particulate Matter Monitoring","authors":"Ke Gu;Yuchen Liu;Hongyan Liu;Bo Liu;Lai-Kuan Wong;Weisi Lin;Junfei Qiao","doi":"10.1109/TETCI.2024.3502433","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3502433","url":null,"abstract":"In this paper we propose a novel model-data jointly driven (MDJD) method from a single picture for airborne particulate matter (APM) monitoring, towards assisting the decision-making for government and reducing the health risks for individuals. The MDJD method is mainly composed of three steps. First, we create a vector of 'distance' as the model-driven natural scene statistic (NSS) features through comparing the sparsity features that are extracted from one picture in five transform domains with their corresponding benchmark features that are derived by using a huge number of pictures with the extremely low APM concentrations in advance. Second, we produce a vector of 'distance' as the data-driven NSS features through comparing the contrast-sensitive features that are chosen from hundreds of deep features with their associated benchmark features that are derived based on the same feature generation method as used in model-driven NSS features. Lastly, we fuse the aforesaid model- and data-driven NSS features by introducing a nonlinear regressor to estimate the APM concentration. Extensive experiments conducted on two large-size APM picture datasets validate the superiority of our proposed MDJD method over the state-of-the-art model-driven methods and data-driven methods by a sizable gain of 7.4% in terms of peak signal to noise ratio. 
Via a series of ablation studies, we can observe that fusing model- and data-driven NSS features is beneficial to improving the model's generalization and fitting abilities and leads to the gains of over 15.1% compared with using either type of features in isolation.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2557-2571"},"PeriodicalIF":5.3,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Exercise Group Assembly Using a Two Archive Evolutionary Algorithm","authors":"Yifei Sun;Yifei Cao;Ziang Wang;Sicheng Hou;Weifeng Gao;Zhi-Hui Zhan","doi":"10.1109/TETCI.2024.3514976","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3514976","url":null,"abstract":"Traditional exercise recommendation algorithms generate exercise groups according to the features of exercises for all students. However, as different students may have different knowledge proficiencies, this paper focuses on the personalized exercise group assembly (PEGA) to select exercises for each student based on their knowledge proficiencies, which is formulated as a constrained multi-objective problem. In order to solve the constrained multi-objective PEGA problem efficiently, this paper proposes a two-archive evolutionary algorithm (TAEA) with three novel designs. Firstly, as the number of exercises is very large in PEGA, the traditional binary-number encoding method will result in high consumption in both memory and computation. To this end, a new integer-number encoding (INE) method is designed for solution representation. It saves memory for exercise subset representation, speeds up evaluations, and generates solutions that satisfy some constraints. Secondly, based on the INE method, the TAEA adopts two archives, named the convergence-oriented archive (CA) and the diversity-oriented archive (DA). The CA ensures the convergence, driving force, and feasibility of the solutions. The DA aims to provide as diverse a solution as possible, including exploration of infeasible regions. Thirdly, a classification-based offspring co-reproduction strategy is proposed to solve the issue of too much infeasible space exploration. Experimental results show that our INE method can help to reduce the running time and improve the optimization results. 
The effectiveness of the TAEA is demonstrated by comparing it with some recent existing algorithms.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2572-2583"},"PeriodicalIF":5.3,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PFPS: Polymerized Feature Panoptic Segmentation Based on Fully Convolutional Networks","authors":"Shucheng Ji;Xiaochen Yuan;Junqi Bao;Tong Liu;Yang Lian;Guoheng Huang;Guo Zhong","doi":"10.1109/TETCI.2024.3515004","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3515004","url":null,"abstract":"Panoptic segmentation requires the prediction of a pixel-level mask with a category label in an image. In recent years, panoptic segmentation has been gaining more attention since it can help us understand objects and the environment in many fields, such as medical images, remote sensing images, and autonomous driving. However, existing panoptic segmentation methods are usually challenging for multi-scale object segmentation and boundary localization. In this paper, we propose a Polymerized Feature Panoptic Segmentation (PFPS) to enhance the network's feature representation ability by polymerizing the extracted stage features. Specifically, we propose a Generalization-Enhanced Stage Feature Generation Module (GSFGM) to extract and enhance the stage features. In the GSFGM, a novel Sampled and Concated Feature Generation (SCFG) is designed as an individual component, which polymerizes the convoluted backbone features to enhance multi-scale feature representation. Thereafter, we propose a Stage Feature Re-weight Module (SFRM) to ensure the network can learn efficient information from the massive channels. Moreover, we further propose a Unified Encoder Module (UEM) to provide spatial information and compress the high-dimensional features by coordinating convolution operations and channel attention. To demonstrate the superiority of the proposed PFPS, we conduct experiments on the COCO-2017 and the Cityscapes validation datasets. 
The experimental results indicate that the PFPS achieves a better performance in PQ of 43.0%, SQ of 80.4%, RQ of 51.9%, PQ<sup>th</sup> of 48.6%, SQ<sup>th</sup> of 82.6%, RQ<sup>th</sup> of 58.1%, PQ<sup>st</sup> of 34.6% on COCO-2017 validation dataset, while PQ of 61.7%, and PQ<sup>st</sup> of 67.9% on Cityscapes validation dataset.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2584-2596"},"PeriodicalIF":5.3,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Bit Mixed-Precision Quantization and Acceleration of CNN for FPGA Deployment","authors":"JianRong Wang;Zhijun He;Hongbo Zhao;Rongke Liu","doi":"10.1109/TETCI.2024.3510295","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3510295","url":null,"abstract":"Nowadays, the deployment of intelligent networks on hardware devices for real-time applications is gaining popularity in both academia and industry. However, on-chip resources and power consumption are usually limited, making quantization a crucial step due to its ability to reduce the computational footprint. To this point, mixed-precision bit-width allocation for weights is an effective way to reduce the overall memory footprint while maximizing model accuracy, which can generally be divided into two schemes: per-layer quantization and per-channel quantization. However, the latter has a large searching space, making it hard to obtain optimal solutions, so currently most research focuses on the former scheme. Additionally, there is almost no research targeting the design and optimization of FPGA accelerator structures for per-channel quantization. Motivated by these considerations, this paper first proposes a mixed-precision bit allocation method, called Hierarchical Bit Programming (HBP), which reduces the magnitude of the search space by applying group optimization on channel dimension and consequently reduces the computational complexity of the solving process. Then a loop optimization strategy is presented based on quantization manner, and models are established to evaluate FPGA performance and resource requirement, enabling the evaluation and analysis of accelerator performance bottlenecks and optimization boundaries in the early phase of system design. Based on the optimization results, a hardware accelerator design structure is presented. Several mainstream CNN models are used for evaluation, and on-board tests are conducted on the Zynq MPSoC XCZU15EG FPGA platform. 
The experimental results show that our HBP method could achieve an improvement of more than 2% in accuracy compared with other related methods. Compared with CPU and GPU, the proposed FPGA accelerator yields speedups of 28.8%, 46.2%, 31.0%, and 35.9% in energy efficiency on VGG-16, ResNet18, ResNet34, and ResNet50, respectively, and the processing latency could be 25% lower than state-of-the-art methods.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2597-2617"},"PeriodicalIF":5.3,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Inference of Hidden Markov Models Through Probabilistic Boolean Operations in Spiking Neuronal Networks","authors":"Ayan Chakraborty;Saswat Chakrabarti","doi":"10.1109/TETCI.2024.3502472","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3502472","url":null,"abstract":"Recurrent neural networks (RNN) have been extensively used to address the problem of Bayesian inference of a hidden Markov model (HMM). However, such artificial neural architectures are prone to computationally exhaustive training procedures and high energy dissipation. Spiking neural networks (SNNs) are recently explored for performing similar tasks. An interesting problem on Bayesian inference of hidden Markov models (HMM) on SNN paradigm is addressed in this paper. A population based stochastic temporal encoding (PSTE) scheme has been introduced to establish that a spiking neuron behaves as a probabilistic Boolean operator. Using this property the posterior of a hidden state is mapped to probability of firing a logic <inline-formula><tex-math>$HIGH$</tex-math></inline-formula> by a spiking neuron. Two new algorithms are presented for fixing synaptic strengths denoted by a random variable <inline-formula><tex-math>$q$</tex-math></inline-formula>. The first algorithm uses a sigmoidal relationship from pre-statistical analysis to select the values of <inline-formula><tex-math>$q$</tex-math></inline-formula> such that the probability of a neuron producing a logic HIGH becomes equal to the posterior probability of a hidden state. The second algorithm considers data for appropriately determining <inline-formula><tex-math>$q$</tex-math></inline-formula> through in-network-training. It has been demonstrated that Bayesian inference of both two-state HMMs as well as multi-state HMMs are implementable using the concept of PSTE. 
Two examples are presented, one on inferring the trend of a time series and the other related to deciphering the correct digit of a seven segment LED display with noisy bits. Our framework has performed very close to traditional Bayesian inference (difference in accuracy <inline-formula><tex-math>$< 2\\%$</tex-math></inline-formula>) and traditional RNNs.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2618-2632"},"PeriodicalIF":5.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSE-GCN: A Multiscale Spatiotemporal Feature Aggregation Enhanced Efficient Graph Convolutional Network for Dynamic Sign Language Recognition","authors":"Neelma Naz;Hasan Sajid;Sara Ali;Osman Hasan;Muhammad Khurram Ehsan","doi":"10.1109/TETCI.2024.3509500","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3509500","url":null,"abstract":"Graph convolution networks have emerged as an active area of research for skeleton-based sign language recognition (SLR). One essential problem in this approach is to efficiently extract the most discriminative features capable of modeling short-range and long-range spatial and temporal information over all skeleton joints while ensuring low inference costs. To address this issue, we propose a novel multi-scale efficient graph convolutional network (MSE-GCN) for skeleton-based SLR. The proposed network makes use of separable convolution layers set in a multi-scale setting and embedded in a multi branch (MB) network along with an early fusion scheme, resulting in an accurate, computationally efficient, and faster system. In addition, we have proposed a novel hybrid attention module, named Spatial Temporal Joint Part attention (ST-JPA) to distinguish the most important body parts as well as most informative joints in the specific frames from the whole sign sequence. 
The performance of the proposed network (MSE-GCN) is evaluated on five challenging sign language datasets, WLASL-100, WLASL-300, WLASL-1000, MINDS-Libras, and LIBRAS-UFOP, achieving state-of-the-art (SOTA) accuracies of 85.27%, 81.59%, 71.75%, 97.442 ± 1.01%, and 88.59 ± 3.60%, respectively, while incurring lower computational costs.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 4","pages":"2979-2994"},"PeriodicalIF":5.3,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144687693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Non-Homogeneous Granulation-Aided Density-Based Deep Feature Clustering for Far Infrared Sign Language Images","authors":"Pritam Paral;Saibal Ghosh;Sankar K. Pal;Amitava Chatterjee","doi":"10.1109/TETCI.2024.3510292","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3510292","url":null,"abstract":"In image clustering applications, deep feature clustering has recently demonstrated impressive performance, which employs deep neural networks for feature learning that favors clustering exercises. In this context, density-based methods have emerged as the preferred choice for the clustering mechanism within the framework of deep feature clustering. However, as the performance of these clustering algorithms is primarily effective on the low-dimensional feature data, deep feature learning models play a crucial role here. With far infrared (FIR) thermal imaging systems working in real-world scenarios, the images captured are largely affected by blurred edges, background noise, thermal irregularities, few details, etc. In this work, we demonstrate the effectiveness of granular computing-based techniques in such scenarios, where the input data contains indiscernible image regions and vague boundary regions. We propose a novel adaptive non-homogeneous granulation (ANHG) technique here that can adaptively select the smallest possible size of granules within a purview of unequally-sized granulation, based on a segmentation assessment index. Proposed ANHG in combination with deep feature learning helps in extracting complex, indiscernible information from the image data and capturing the local intensity variation of the data. 
Experimental results show significant performance improvement of the density-based deep feature clustering method after the incorporation of the proposed granulation scheme.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"1269-1280"},"PeriodicalIF":5.3,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143716356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Light Image Enhancement Network Based on Multi-Scale Residual Feature Integration","authors":"Shuying Huang;Hebin Liu;Yong Yang;Weiguo Wan","doi":"10.1109/TETCI.2024.3508834","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3508834","url":null,"abstract":"Owing to insufficient light, images captured in low-light environment have a series of image degradation problems such as low visibility, color deviation, and noise. To address these problems, an image enhancement network based on multi-scale residual feature integration (IEN-MRFI) is proposed, which includes two modules: a shallow feature extraction module (SFEM) and a multi-scale feature integration module (MFIM). First, the SFEM is constructed for extracting multi-scale shallow features through three-scale convolutional layers and smooth convolution residual blocks (SCRBs). The constructed SCRB runs through the entire network to extract features and avoid gridding artifacts. Then, the MFIM is constructed by cascading multiple feature integration residual blocks to fuse the shallow and deep features of the same scale. Finally, the fused features are passed through a convolutional layer to obtain an enhanced result. In addition, to improve the generalization ability of our network, this study constructs an outdoor dataset for the training of the low-light image enhancement network. Experiments on indoor and outdoor images show that the enhancement results of our method can provide more accurate color saturation and richer details than those of some state-of-the-art methods. 
We intend to make the constructed dataset public after our paper is accepted for publication.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 4","pages":"2965-2978"},"PeriodicalIF":5.3,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144687743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}