CAAI Transactions on Intelligence Technology: Latest Articles

Terahertz image denoising via multiscale hybrid-convolution residual network
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-10-02, DOI: 10.1049/cit2.12380
Heng Wu, Zijie Guo, Chunhua He, Shaojuan Luo, Bofang Song
Abstract: Terahertz imaging technology has great potential in areas such as remote sensing, navigation, and security checks. However, terahertz images usually suffer from heavy noise and low resolution. Previous terahertz image denoising methods are mainly based on traditional image processing methods, which have limited denoising effects on terahertz noise. Existing deep learning-based image denoising methods are mostly used on natural images and easily cause a large amount of detail loss when denoising terahertz images. Here, a residual-learning-based multiscale hybrid-convolution residual network (MHRNet) is proposed for terahertz image denoising, which can remove noise while preserving detail features in terahertz images. Specifically, a multiscale hybrid-convolution residual block (MHRB) is designed to extract rich detail features and local prediction residual noise from terahertz images. MHRB is a residual structure composed of a multiscale dilated convolution block, a bottleneck layer, and a multiscale convolution block. MHRNet uses the MHRB and global residual learning to achieve terahertz image denoising. Ablation studies are performed to validate the effectiveness of MHRB. A series of experiments are conducted on the public terahertz image datasets. The experimental results demonstrate that MHRNet has an excellent denoising effect on synthetic and real noisy terahertz images. Compared with existing methods, MHRNet achieves comprehensive competitive results.
Volume 10, Issue 1, pp. 235-252. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12380
Citations: 0
Bilingual phrase induction with local hard negative sampling
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-10-01, DOI: 10.1049/cit2.12383
Hailong Cao, Hualin Miao, Weixuan Wang, Liangyou Li, Wei Peng, Tiejun Zhao
Abstract: Bilingual lexicon induction focuses on learning word translation pairs, also known as bitexts, from monolingual corpora by establishing a mapping between the source and target embedding spaces. Despite recent advancements, bilingual lexicon induction is limited to inducing bitexts consisting of individual words, lacking the ability to handle semantics-rich phrases. To bridge this gap and support downstream cross-lingual tasks, it is practical to develop a method for bilingual phrase induction that extracts bilingual phrase pairs from monolingual corpora without relying on cross-lingual knowledge. In this paper, the authors propose a novel phrase embedding training method based on the skip-gram structure. Specifically, a local hard negative sampling strategy that utilises negative samples of central tokens in sliding windows to enhance phrase embedding learning is introduced. The proposed method achieves competitive or superior performance compared to baseline approaches, with exceptional results recorded for distant languages. Additionally, we develop a phrase representation learning method that leverages multilingual pre-trained language models (mPLMs). These mPLM-based representations can be combined with the above-mentioned static phrase embeddings to further improve the accuracy of the bilingual phrase induction task. We manually construct a dataset of bilingual phrase pairs and integrate it with MUSE to facilitate the bilingual phrase induction task.
Volume 10, Issue 1, pp. 147-159. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12383
Citations: 0
MultiJSQ: Direct joint segmentation and quantification of left ventricle with deep multitask-derived regression network
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-09-27, DOI: 10.1049/cit2.12382
Xiuquan Du, Zheng Pei, Ying Liu, Xinzhi Cao, Lei Li, Shuo Li
Abstract: Quantitative analysis of clinical function parameters from MRI images is crucial for diagnosing and assessing cardiovascular disease. However, the manual calculation of these parameters is challenging due to the high variability among patients and the time-consuming nature of the process. In this study, the authors introduce a framework named MultiJSQ, comprising the feature presentation network (FRN) and the indicator prediction network (IEN), which is designed for simultaneous joint segmentation and quantification. The FRN is tailored for representing global image features, facilitating the direct acquisition of left ventricle (LV) contour images through pixel classification. Additionally, the IEN incorporates specifically designed modules to extract relevant clinical indices. The authors’ method considers the interdependence of different tasks, demonstrating the validity of these relationships and yielding favourable results. Through extensive experiments on cardiac MR images from 145 patients, MultiJSQ achieves impressive outcomes, with low mean absolute errors of 124 mm², 1.72 mm, and 1.21 mm for areas, dimensions, and regional wall thicknesses, respectively, along with a Dice metric score of 0.908. The experimental findings underscore the excellent performance of our framework in LV segmentation and quantification, highlighting its promising clinical application prospects.
Volume 10, Issue 1, pp. 175-192. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12382
Citations: 0
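The MultiJSQ abstract above reports segmentation quality as a Dice metric score. As a reminder of what that number measures, here is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on binary masks. The function name `dice_score`, the nested-list mask format, and the convention that two empty masks score 1.0 are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    if len(pred) != len(truth) or any(
        len(row_p) != len(row_t) for row_p, row_t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    total = 0         # pixels set in pred plus pixels set in truth
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1
            total += (1 if p else 0) + (1 if t else 0)

    # Assumption: two empty masks are treated as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: the prediction marks two pixels, one of which is correct.
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
print(dice_score(pred, truth))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.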
Hyperspectral imagery quality assessment and band reconstruction using the prophet model
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-09-25, DOI: 10.1049/cit2.12373
Ping Ma, Jinchang Ren, Zhi Gao, Yinhe Li, Rongjun Chen
Abstract: In Hyperspectral Imaging (HSI), the detrimental influence of noise and distortions on data quality is profound, which has severely affected follow-on analytics and decision-making such as land mapping. This study presents an innovative framework for assessing HSI band quality and reconstructing the low-quality bands, based on the Prophet model. By introducing a comprehensive quality metric to start, the authors’ approach factors in both spatial and spectral characteristics across local and global scales. This metric effectively captures the intricate noise and distortions inherent in the HSI data. Subsequently, the authors employ the Prophet model to forecast information within the low-quality bands, leveraging insights from neighbouring high-quality bands. To validate the effectiveness of the authors’ proposed model, extensive experiments on three publicly available uncorrected datasets are conducted. In a head-to-head comparison, the framework is benchmarked against six state-of-the-art band reconstruction algorithms, including three spectral methods, two spatial-spectral methods and one deep learning method. The authors’ experiments also delve into strategies for band selection based on quality metrics and the quality evaluation of the reconstructed bands. In addition, the authors assess the classification accuracy utilising these reconstructed bands. In various experiments, the results consistently affirm the efficacy of the authors’ method in HSI quality assessment and band reconstruction. Notably, the authors’ approach obviates the need for manual prefiltering of noisy bands. This comprehensive framework holds promise in addressing HSI data quality concerns whilst enhancing the overall utility of HSI.
Volume 10, Issue 1, pp. 47-61. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12373
Citations: 0
Visible and near-infrared image fusion based on information complementarity
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-09-18, DOI: 10.1049/cit2.12378
Zhuo Li, Shiliang Pu, Mengqi Ji, Feng Zeng, Bo Li
Abstract: Images with complementary spectral information can be recorded using image sensors that can identify the visible and near-infrared spectrum. The fusion of visible and near-infrared (NIR) images aims to enhance the quality of images acquired by video monitoring systems for the ease of user observation and data processing. Unfortunately, current fusion algorithms produce artefacts and colour distortion since they cannot make use of spectrum properties and are lacking in information complementarity. Therefore, an information complementarity fusion (ICF) model is designed based on physical signals. In order to separate high-frequency noise from important information in distinct frequency layers, the authors first extracted texture-scale and edge-scale layers using a two-scale filter. Second, the difference map between visible and near-infrared was filtered using the extended-DoG filter to produce the initial visible-NIR complementary weight map. Then, to generate a guide map, the near-infrared image with night adjustment was processed as well. The final complementarity weight map was subsequently derived via an arctanI function mapping using the guide map and the initial weight maps. Finally, fusion images were generated with the complementarity weight maps. The experimental results demonstrate that the proposed approach outperforms the state-of-the-art both in avoiding artificial colours and in effectively utilising information complementarity.
Volume 10, Issue 1, pp. 193-206. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12378
Citations: 0
Optimisation of sparse deep autoencoders for dynamic network embedding
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-08-29, DOI: 10.1049/cit2.12367
Huimei Tang, Yutao Zhang, Lijia Ma, Qiuzhen Lin, Liping Huang, Jianqiang Li, Maoguo Gong
Abstract: Network embedding (NE) tries to learn the potential properties of complex networks represented in a low-dimensional feature space. However, the existing deep learning-based NE methods are time-consuming as they need to train a dense architecture for deep neural networks with extensive unknown weight parameters. A sparse deep autoencoder (called SPDNE) for dynamic NE is proposed, aiming to learn the network structures while preserving the node evolution with a low computational complexity. SPDNE tries to use an optimal sparse architecture to replace the fully connected architecture in the deep autoencoder while maintaining the performance of these models in the dynamic NE. Then, an adaptive simulated algorithm to find the optimal sparse architecture for the deep autoencoder is proposed. The performance of SPDNE over three dynamical NE models (i.e. sparse architecture-based deep autoencoder method, DynGEM, and ElvDNE) is evaluated on three well-known benchmark networks and five real-world networks. The experimental results demonstrate that SPDNE can reduce about 70% of weight parameters of the architecture for the deep autoencoder during the training process while preserving the performance of these dynamical NE models. The results also show that SPDNE achieves the highest accuracy on 72 out of 96 edge prediction and network reconstruction tasks compared with the state-of-the-art dynamical NE algorithms.
Volume 9, Issue 6, pp. 1361-1376. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12367
Citations: 0
Extraction of typical operating scenarios of new power system based on deep time series aggregation
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-08-22, DOI: 10.1049/cit2.12369
Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long
Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. A novel deep time series aggregation scheme (DTSAs) is proposed to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyse the intrinsic mechanisms of different scheduling operational scenario switching to mathematically represent typical operational scenarios. A Gramian angular summation field-based operational scenario image encoder was designed to convert operational scenario sequences into high-dimensional spaces. This enables DTSAs to fully capture the spatiotemporal characteristics of new power systems using deep feature iterative aggregation models. The encoder also facilitates the generation of typical operational scenarios that conform to historical data distributions while ensuring the integrity of grid operational snapshots. Case studies demonstrate that the proposed method extracted new fine-grained power system dispatch schemes and outperformed the latest high-dimensional feature-screening methods. In addition, experiments with different new energy access ratios were conducted to verify the robustness of the proposed method. DTSAs enable dispatchers to master the operation experience of the power system in advance, and actively respond to the dynamic changes of the operation scenarios under the high access rate of new energy.
Volume 10, Issue 1, pp. 283-299. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12369
Citations: 0
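The abstract above builds its image encoder on the Gramian angular summation field (GASF). The sketch below shows the textbook GASF construction only: rescale a series to [-1, 1], map each value to an angle phi = arccos(x), and fill a matrix with cos(phi_i + phi_j). The function names `rescale` and `gasf` and the choice to reject constant series are assumptions for illustration; the authors' encoder is not described in enough detail here to reproduce.

```python
import math


def rescale(series):
    """Linearly rescale a series to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Assumption: a constant series has no defined rescaling, so reject it.
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]


def gasf(series):
    """Gramian angular summation field: entry (i, j) is cos(phi_i + phi_j)."""
    x = rescale(series)  # x_i = cos(phi_i)
    # sin(arccos(x)) = sqrt(1 - x^2); the clamp guards against rounding error.
    s = [math.sqrt(max(0.0, 1.0 - v * v)) for v in x]
    n = len(x)
    # cos(a + b) = cos(a)cos(b) - sin(a)sin(b)
    return [[x[i] * x[j] - s[i] * s[j] for j in range(n)] for i in range(n)]


# Toy series of three readings; the result is a symmetric 3 x 3 matrix.
for row in gasf([0.0, 1.0, 2.0]):
    print(row)
```

The resulting matrix is symmetric, and its diagonal holds cos(2·phi_i), so the rescaled series can be recovered from it up to sign.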
Bayesian network structure learning by dynamic programming algorithm based on node block sequence constraints
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-08-20, DOI: 10.1049/cit2.12363
Chuchao He, Ruohai Di, Bo Li, Evgeny Neretin
Abstract: The use of dynamic programming (DP) algorithms to learn Bayesian network structures is limited by their high space complexity and difficulty in learning the structure of large-scale networks. Therefore, this study proposes a DP algorithm based on node block sequence constraints. The proposed algorithm constrains the traversal process of the parent graph by using the M-sequence matrix to considerably reduce the time consumption and space complexity by pruning the traversal process of the order graph using the node block sequence. Experimental results show that compared with existing DP algorithms, the proposed algorithm can obtain learning results more efficiently with less than 1% loss of accuracy, and can be used for learning larger-scale networks.
Volume 9, Issue 6, pp. 1605-1622. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12363
Citations: 0
Hyperspectral image restoration using noise gradient and dual priors under mixed noise conditions
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-08-19, DOI: 10.1049/cit2.12355
Hazique Aetesam, Suman Kumar Maji, V. B. Surya Prasath
Abstract: Images obtained from hyperspectral sensors provide information about the target area that extends beyond the visible portions of the electromagnetic spectrum. However, due to sensor limitations and imperfections during the image acquisition and transmission phases, noise is introduced into the acquired image, which can have a negative impact on downstream analyses such as classification, target tracking, and spectral unmixing. Noise in hyperspectral images (HSI) is modelled as a combination from several sources, including Gaussian/impulse noise, stripes, and deadlines. An HSI restoration method for such a mixed noise model is proposed. First, a joint optimisation framework is proposed for recovering hyperspectral data corrupted by mixed Gaussian-impulse noise by estimating both the clean data as well as the sparse/impulse noise levels. Second, a hyper-Laplacian prior is used along both the spatial and spectral dimensions to express sparsity in clean image gradients. Third, to model the sparse nature of impulse noise, an ℓ1-norm over the impulse noise gradient is used. Because the proposed methodology employs two distinct priors, the authors refer to it as the hyperspectral dual prior (HySpDualP) denoiser. To the best of the authors' knowledge, this joint optimisation framework is the first attempt in this direction. To handle the non-smooth and non-convex nature of the general ℓp-norm-based regularisation term, a generalised shrinkage/thresholding (GST) solver is employed. Finally, an efficient split-Bregman approach is used to solve the resulting optimisation problem. Experimental results on synthetic data and real HSI datacube obtained from hyperspectral sensors demonstrate that the authors’ proposed model outperforms state-of-the-art methods, both visually and in terms of various image quality assessment metrics.
Volume 10, Issue 1, pp. 72-93. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12355
Citations: 0
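The abstract above relies on shrinkage/thresholding to handle ℓp-norm regularisers. For the p = 1 case, the shrinkage step is the standard soft-thresholding operator, which is the proximal operator of the ℓ1 norm: sign(x) · max(|x| − t, 0). The sketch below illustrates only that special case; the names `soft_threshold` and `shrink` are assumptions, and this is not the authors' GST solver or their split-Bregman scheme.

```python
def soft_threshold(value, threshold):
    """Soft thresholding: sign(value) * max(|value| - threshold, 0)."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    magnitude = abs(value) - threshold
    if magnitude <= 0:
        return 0.0  # small entries are set exactly to zero
    return magnitude if value > 0 else -magnitude


def shrink(values, threshold):
    """Apply soft thresholding element-wise to a list of numbers."""
    return [soft_threshold(v, threshold) for v in values]


# Toy gradient vector: large entries shrink towards zero, small ones vanish.
print(shrink([3.0, -0.5, -2.5, 0.2], 1.0))
```

Setting small entries exactly to zero is what makes this step suited to sparse quantities such as an impulse noise component.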
Explore human parsing modality for action recognition
IF 8.4, Zone 2, Computer Science
CAAI Transactions on Intelligence Technology, Pub Date: 2024-08-16, DOI: 10.1049/cit2.12366
Jinfu Liu, Runwei Ding, Yuhang Wen, Nan Dai, Fanyang Meng, Fang-Lue Zhang, Shen Zhao, Mengyuan Liu
Abstract: Multimodal-based action recognition methods have achieved high success using pose and RGB modality. However, skeleton sequences lack appearance depiction and RGB images suffer from irrelevant noise due to modality limitations. To address this, the authors introduce human parsing feature map as a novel modality, since it can selectively retain effective semantic features of the body parts while filtering out most irrelevant noise. The authors propose a new dual-branch framework called ensemble human parsing and pose network (EPP-Net), which is the first to leverage both skeletons and human parsing modalities for action recognition. The first human pose branch feeds robust skeletons in the graph convolutional network to model pose features, while the second human parsing branch also leverages depictive parsing feature maps to model parsing features via convolutional backbones. The two high-level features will be effectively combined through a late fusion strategy for better action recognition. Extensive experiments on NTU RGB + D and NTU RGB + D 120 benchmarks consistently verify the effectiveness of our proposed EPP-Net, which outperforms the existing action recognition methods. Our code is available at https://github.com/liujf69/EPP-Net-Action.
Volume 9, Issue 6, pp. 1623-1633. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12366
Citations: 0
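The EPP-Net abstract above combines its pose and parsing branches through a late fusion strategy but does not say how the branch outputs are weighted. One common form of late fusion is a weighted sum of per-class scores, sketched below. The function `late_fusion`, the `pose_weight` parameter, and the example scores are all hypothetical; the authors' actual fusion rule is in their linked repository.

```python
def late_fusion(pose_scores, parsing_scores, pose_weight=0.5):
    """Fuse two per-class score lists by a weighted sum; return scores and argmax."""
    if not pose_scores or len(pose_scores) != len(parsing_scores):
        raise ValueError("both branches must score the same non-empty class set")
    if not 0.0 <= pose_weight <= 1.0:
        raise ValueError("pose_weight must lie in [0, 1]")

    # Hypothetical weighting: the parsing branch gets the remaining weight.
    parsing_weight = 1.0 - pose_weight
    fused = [
        pose_weight * p + parsing_weight * q
        for p, q in zip(pose_scores, parsing_scores)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Made-up scores for two action classes from each branch.
fused, label = late_fusion([0.2, 0.8], [0.6, 0.4])
print(fused, label)
```

Because fusion happens on the scores, each branch can be trained and evaluated separately before the two are combined.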