Multimedia Tools and Applications: Latest Articles

A hybrid diabetes risk prediction model XGB-ILSO-1DCNN
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-12 DOI: 10.1007/s11042-024-20155-5
Huifang Feng, Yanan Hui
Abstract: Accurately predicting the risk of diabetes is of paramount importance for early intervention and prevention. To achieve precise diabetes risk prediction, we propose a hybrid diabetes risk prediction model, XGB-ILSO-1DCNN, which combines the Extreme Gradient Boosting (XGBoost) algorithm, the Improved Lion Swarm Optimization (ILSO) algorithm, and the deep learning model 1DCNN. First, an XGBoost model is trained on the raw data, and its prediction is treated as a new feature that is concatenated with the original features to form a new feature set. Then, we introduce a hybrid approach for diabetes risk prediction called ILSO-1DCNN, which is based on ILSO and a one-dimensional convolutional neural network (1DCNN). The ILSO-1DCNN algorithm uses the optimization capabilities of ILSO to automatically determine the hyperparameters of the 1DCNN network. Finally, we conducted comprehensive experiments on the PIMA dataset and compared our model with baseline models. The experimental results not only demonstrate our model's exceptional predictive performance across various evaluation criteria but also highlight its efficiency and low complexity. This study introduces a novel and effective diabetes risk prediction approach, making it a valuable tool for clinical analysis in the care of diabetic patients.
Citations: 0
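The first step described in the abstract above (treating a first-stage model's prediction as an extra feature) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_risk_score` is a hypothetical stand-in for a trained XGBoost classifier, the two-column rows are made-up values, and nothing here reproduces the authors' actual pipeline.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def augment_with_prediction(
    rows: Sequence[Row],
    predict: Callable[[Row], float],
) -> List[List[float]]:
    """Return copies of the rows with the first-stage prediction appended."""
    return [list(row) + [float(predict(row))] for row in rows]


def toy_risk_score(row: Row) -> float:
    # Hypothetical stand-in for a trained first-stage model (assumption):
    # flags a row as high risk when both toy thresholds are exceeded.
    glucose, bmi = row
    return 1.0 if glucose > 125.0 and bmi > 30.0 else 0.0


# Made-up (glucose, BMI) rows used only to show the shape of the result.
raw_rows = [[148.0, 33.6], [85.0, 26.6], [183.0, 23.3]]
augmented_rows = augment_with_prediction(raw_rows, toy_risk_score)

for row in augmented_rows:
    print(row)
```

Each augmented row has one more column than the original, and that wider feature set is what a second-stage model (the 1DCNN in the abstract) would consume.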
Underwater images enhancement using contrast limited adaptive parameter settings histogram equalization
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-12 DOI: 10.1007/s11042-024-20210-1
Yahui Chen, Yitao Liang
Abstract: CLAHE is widely used in underwater image processing because of its excellent performance in contrast enhancement. The selection of the clip point formula is the core problem of CLAHE methods, and choosing a suitable clipping value has become the focus of several extended methods. In this paper, an automatic CLAHE underwater image enhancement algorithm is proposed. The method determines the clipping value according to the high-order moment dynamic features of each block of the underwater image. By quantifying the dynamic features of each block more precisely and incorporating them into the clipping value formula, the contrast and details of the underwater image can be effectively enhanced. To improve the saturation and brightness of underwater images, this paper uses the more accurate and intuitive HSV model. Experimental results show that our method enhances contrast subjectively while suppressing noise amplification well, and also increases the saturation of underwater images. In objective metrics, our method obtains the best values in the underwater image quality measure (UIQM), SSIM, and PSNR.
Citations: 0
A systematic review of multilabel chest X-ray classification using deep learning
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-12 DOI: 10.1007/s11042-024-20172-4
Uswatun Hasanah, Jenq-Shiou Leu, Cries Avian, Ihsanul Azmi, Setya Widyawan Prakosa
Abstract: Chest X-ray scans are one of the most often used diagnostic tools for identifying chest diseases. However, identifying diseases in X-ray images requires experienced technicians and is frequently noted as a time-consuming process with varying levels of interpretation. In particular circumstances, disease identification through images is a challenge for human observers. Recent advances in deep learning have opened up new possibilities for using this technique to diagnose diseases. However, further implementation requires prior knowledge of strategy and appropriate architecture design. Revealing this information will enable faster implementation and help address potential issues produced by specific designs, especially in multilabel classification, which is challenging compared to single-label tasks. This systematic review of all the approaches published in the literature will assist researchers in developing improved methods of whole chest disease detection. The study focuses on the deep learning methods, publicly accessible datasets, hyperparameters, and performance metrics employed by various researchers in classifying multilabel chest X-ray images. The findings of this study provide a complete overview of the current state of the art, highlighting significant practical aspects of the approaches studied. Distinctive results highlighting the potential enhancements and beneficial uses of deep learning in multilabel chest disease identification are presented.
Citations: 0
Advancements in brain tumor analysis: a comprehensive review of machine learning, hybrid deep learning, and transfer learning approaches for MRI-based classification and segmentation
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-12 DOI: 10.1007/s11042-024-20203-0
Surajit Das, Rajat Subhra Goswami
Abstract: Brain tumors, whether cancerous or noncancerous, can be life-threatening due to abnormal cell growth, potentially causing organ dysfunction and mortality in adults. Brain tumor segmentation (BTS) and brain tumor classification (BTC) technologies are crucial in diagnosing and treating brain tumors. They assist doctors in locating and measuring tumors and developing treatment and rehabilitation strategies. Despite their importance in the medical field, BTC and BTS remain challenging. This comprehensive review specifically analyses machine and deep learning methodologies, including convolutional neural networks (CNN), transfer learning (TL), and hybrid models for BTS and BTC. We discuss CNN architectures like U-Net++, which is known for its high segmentation accuracy in 2D and 3D medical images. Additionally, transfer learning utilises models such as ResNet and Inception, pre-trained on ImageNet and fine-tuned on brain tumor-specific datasets, to enhance classification performance and sensitivity despite limited medical data. Hybrid models combine deep learning techniques with machine learning, using CNN for initial segmentation and traditional classification methods, improving accuracy. We discuss commonly used benchmark datasets in brain tumor research, including the BraTS dataset and the TCIA database, and evaluate performance metrics, such as the F1-score, accuracy, sensitivity, specificity, and the Dice coefficient, emphasising their significance and standard thresholds in brain tumor analysis. The review addresses current machine learning (ML) and deep learning (DL) based BTS and BTC challenges and proposes solutions such as explainable deep learning models and multi-task learning frameworks. These insights aim to guide future advancements in fostering the development of accurate and efficient tools for improved patient care in brain tumor analysis.
Citations: 0
Investigating the impact of sensor axis combinations on activity recognition and fall detection: an empirical study
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-12 DOI: 10.1007/s11042-024-20136-8
Erhan Kavuncuoğlu, Ahmet Turan Özdemir, Esma Uzunhisarcıklı
Abstract: Activity recognition is a fundamental concept widely embraced within the realm of healthcare. Leveraging sensor fusion techniques, particularly involving accelerometers (A), gyroscopes (G), and magnetometers (M), this technology has undergone extensive development to effectively distinguish between various activity types, improve tracking systems, and attain high classification accuracy. This research is dedicated to augmenting the effectiveness of activity recognition by investigating diverse sensor axis combinations while underscoring the advantages of this approach. In pursuit of this objective, we gathered data from two distinct sources. The first covers 20 types of falls and 16 daily life activities, recorded with the Motion Tracker Wireless (MTw), a commercial product. This dataset comprises 2520 tests, collected with the voluntary participation of 14 individuals (7 females and 7 males). Additionally, data on 7 types of falls and 8 daily life activities were captured using a cost-effective, environment-independent Activity Tracking Device (ATD). This alternative dataset comprises 1350 tests, with the participation of 30 volunteers (15 females and 15 males). Within the framework of this research, we conducted comparative analyses on the complete dataset, which encompasses 3870 tests in total. The findings obtained from these analyses convincingly establish the efficacy of recognizing both fall incidents and routine daily activities. This investigation underscores the potential of leveraging affordable IoT technologies to enhance the quality of everyday life and their practical utility in real-world scenarios.
Citations: 0
Crowd dynamics analysis and behavior recognition in surveillance videos based on deep learning
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-12 DOI: 10.1007/s11042-024-20161-7
Anum Ilyas, Narmeen Bawany
Abstract: Video surveillance is widely adopted across various sectors for purposes such as law enforcement, COVID-19 isolation monitoring, and analyzing crowds for potential threats like flash mobs or violence. The vast amount of data generated daily from surveillance devices holds significant potential but requires effective analysis to extract value. Detecting anomalous crowd behavior, which can lead to chaos and casualties, is particularly challenging in video surveillance due to its labor-intensive nature and susceptibility to errors. To address these challenges, this research contributes in two key areas: first, by creating a diverse and representative video dataset that accurately reflects real-world crowd dynamics across eight different categories; second, by developing a reliable framework, ‘CRAB-NET,’ for automated behavior recognition. Extensive experimentation and evaluation, using Convolutional Long Short-Term Memory networks (ConvLSTM) and Long-Term Recurrent Convolutional Networks (LRCN), validated the effectiveness of the proposed approach in accurately categorizing behaviors observed in surveillance videos. The employed models achieved accuracy scores of 99.46% for celebratory crowds, 99.98% for formal crowds, and 96.69% for violent crowds. The accuracy of 97.20% achieved by the LRCN on the comprehensive dataset underscores its potential to revolutionize crowd behavior analysis, supporting safer mass gatherings and more effective security interventions. Incorporating AI-powered crowd behavior recognition like ‘CRAB-NET’ into security measures not only safeguards public gatherings but also paves the way for proactive event management and predictive safety strategies.
Citations: 0
Enhancing eyeglasses removal in facial images: a novel approach using translation models for eyeglasses mask completion
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-11 DOI: 10.1007/s11042-024-20101-5
Zahra Esmaily, Hossein Ebrahimpour-Komleh
Abstract: Accurately removing eyeglasses from facial images is crucial for improving the performance of various face-related tasks such as verification, identification, and reconstruction. This paper presents a novel approach to enhancing eyeglasses removal by integrating a mask completion technique into the existing framework. Our method focuses on improving the accuracy of eyeglasses masks, which is essential for subsequent eyeglasses and shadow removal steps. We introduce a unique dataset specifically designed for eyeglasses mask image completion. This dataset is generated by applying Top-Hat morphological operations to existing eyeglasses mask datasets, creating a collection of images containing eyeglasses masks in two states: damaged (incomplete) and complete (ground truth). A Pix2Pix image-to-image translation model is trained on this newly created dataset to restore incomplete eyeglasses mask predictions. This restoration step significantly improves the accuracy of eyeglasses frame extraction and leads to more realistic results in subsequent eyeglasses and shadow removal. Our method incorporates a post-processing step to refine the completed mask, preventing the formation of artifacts in the background or outside of the eyeglasses frame box, further enhancing the overall quality of the processed image. Experimental results on the CelebA, FFHQ, and MeGlass datasets showcase the effectiveness of our method, which outperforms state-of-the-art approaches in quantitative metrics (FID, KID, MOS) and qualitative evaluations.
Citations: 0
Cyber-XAI-Block: an end-to-end cyber threat detection & fl-based risk assessment framework for iot enabled smart organization using xai and blockchain technologies
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-11 DOI: 10.1007/s11042-024-20059-4
Omar Abboosh Hussein Gwassi, Osman Nuri Uçan, Enrique A. Navarro
Abstract: The growing integration of the Internet of Things (IoT) in smart organizations is increasing their vulnerability to cyber threats, necessitating advanced frameworks for effective threat detection and risk assessment. Existing works provide achievable results but lack effective solutions for problems such as detecting Social Engineering Attacks (SEA). Approaches that use Deep Learning (DL) and Machine Learning (ML) methods are limited to validating user behaviors and suffer from problems such as high false positive rates, attack recurrence, and a growing number of attacks. To overcome these problems, we use explainable DL techniques to increase cyber security in an IoT-enabled smart organization environment. First, this paper implements a Capsule Network (CapsNet) to process employee fingerprints and blink patterns. Second, the Quantum Key Secure Communication Protocol (QKSCP) is used to reduce communication channel vulnerabilities such as Man In The Middle (MITM) and replay attacks. A Dual Q Network-based Asynchronous Advantage Actor-Critic (DQN-A3C) algorithm then detects and prevents attacks. Third, we employ the explainable DQN-A3C model and the Siamese Inter Lingual Transformer (SILT) for natural language explanations to boost social engineering security by fostering trust between the Artificial Intelligence (AI) model and human users. We then build a Hopping Intrusion Detection & Prevention System (IDS/IPS) using an explainable Harmonized Google Net (HGN) model with SHAP and SILT explanations to appropriately categorize dangerous external traffic flows. Finally, to improve global cyberattack comprehension, we create a Federated Learning (FL)-based knowledge-sharing mechanism between the Cyber Threat Repository (CTR) and cloud servers, known as global risk assessment. To evaluate the suggested approach, the new method is compared with existing ones in terms of malicious traffic (65 bytes/sec), detection rate (97%), false positive rate (45%), prevention accuracy (98%), end-to-end response time (97 s), recall (96%), false negative rate (42%) and resource consumption (41). Our strategy's performance is examined using numerical analysis, and the results demonstrate that it outperforms other methods in all metrics.
Citations: 0
SMOTE-Based deep network with adaptive boosted sooty for the detection and classification of type 2 diabetes mellitus
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-11 DOI: 10.1007/s11042-024-19770-z
Phani Kumar Immadisetty, C. Rajabhushanam
Abstract: Type 2 diabetes (T2D) is a prolonged disease caused by an abnormal rise in glucose levels due to poor insulin production in the pancreas. However, the detection and classification of this type of disease is very challenging and requires effective techniques for learning the T2D features. Therefore, this study proposes a novel hybridized deep learning-based technique to automatically detect and categorize T2D by effectively learning disease attributes. First, a pre-processing phase based on missing value imputation and normalization is introduced to improve the quality of the data. The Adaptive Boosted Sooty Tern Optimization (Adap-BSTO) approach is then used to select the best features while minimizing complexity. After that, the Synthetic Minority Oversampling Technique (SMOTE) is used to ensure that the database classes are evenly distributed. Finally, the Deep Convolutional Attention-based Bidirectional Recurrent Neural Network (DCA-BiRNN) technique is proposed to accurately detect and classify the presence and absence of T2D. The proposed method is implemented on the Python platform, and two publicly available databases, PIMA Indian and HFD, are used in this study. Accuracy, NPV, kappa score, Matthews correlation coefficient (MCC), false discovery rate (FDR), and time complexity are among the assessment metrics examined and compared to prior research. For the PIMA Indian dataset, the proposed method obtains an overall accuracy of 99.6%, FDR of 0.0038, kappa of 99.24%, and NPV of 99.6%. For the HFD dataset, the proposed method achieves an overall accuracy of 99.5%, FDR of 0.0052, kappa of 99%, and NPV of 99.4%.
Citations: 0
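For readers unfamiliar with the evaluation metrics named in the abstract above (accuracy, NPV, FDR, Cohen's kappa, MCC), the sketch below shows how they are commonly computed from a binary confusion matrix. The counts are made-up illustration values, not results from the paper, and the function assumes every denominator is non-zero.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes all denominators are non-zero (no guard for degenerate cases).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    npv = tn / (tn + fn)   # negative predictive value
    fdr = fp / (fp + tp)   # false discovery rate

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1.0 - p_chance)

    # Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den

    return {"accuracy": accuracy, "npv": npv, "fdr": fdr, "kappa": kappa, "mcc": mcc}


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that kappa and MCC are computed here on a scale from -1 to 1, whereas the abstract reports kappa as a percentage.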
Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation
IF 3.6 Zone 4 Computer Science
Multimedia Tools and Applications Pub Date : 2024-09-10 DOI: 10.1007/s11042-024-20179-x
Sathiyamoorthi Arthanari, Jae Hoon Jeong, Young Hoon Joo
Abstract: The transformer architecture has consistently achieved cutting-edge performance in the task of 2D-to-3D lifting human pose estimation. Despite advances in transformer-based methods, they still suffer from issues related to sequential data processing, depth ambiguity, and effective handling of sensitive noisy data. As a result, transformer encoders encounter difficulties in precisely estimating human positions. To solve this problem, a novel multi-transformer encoder with a multiple-hypothesis aggregation (MHAFormer) module is proposed in this study. To do this, a diffusion module is first introduced that generates multiple 3D pose hypotheses and gradually adds Gaussian noise to ground-truth 3D poses. Subsequently, the denoiser is employed within the diffusion module to restore the feasible 3D poses by leveraging the information from the 2D keypoints. Moreover, we propose the multiple-hypothesis aggregation with joint-level reprojection (MHAJR) approach, which reprojects the 3D hypotheses into 2D positions and selects the optimal hypothesis by considering reprojection errors. In particular, the multiple-hypothesis aggregation approach tackles depth ambiguity and sequential data processing by considering various possible poses and combining their strengths for a more accurate final estimation. Next, we present the improved spatial-temporal transformer encoder, which helps improve the accuracy and reduce the ambiguity of 3D pose estimation by explicitly modeling the spatial and temporal relationships between different body joints. Specifically, the temporal-transformer encoder introduces the temporal constriction & proliferation (TCP) attention mechanism and the feature aggregation refinement (FAR) module into the refined temporal constriction & proliferation (RTCP) transformer, which enhances intra-block temporal modeling and further refines inter-block feature interaction. Finally, the superiority of the proposed approach is demonstrated through comparison with existing methods using the Human3.6M and MPI-INF-3DHP benchmark datasets.
Citations: 0
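The idea of choosing among several 3D pose hypotheses by their reprojection error, mentioned in the abstract above, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: a simple pinhole camera with a hypothetical focal length of 1, per-joint selection by smallest 2D error, and made-up coordinates. It is not the authors' MHAJR implementation.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]
Pose3D = Sequence[Point3D]


def project(point: Point3D, focal: float = 1.0) -> Point2D:
    """Pinhole projection onto the image plane (assumes z > 0)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def select_joints_by_reprojection(
    hypotheses: Sequence[Pose3D],
    keypoints_2d: Sequence[Point2D],
    focal: float = 1.0,
) -> List[Point3D]:
    """For each joint, keep the hypothesis whose projection is closest to the 2D keypoint."""
    selected: List[Point3D] = []
    for j, (u, v) in enumerate(keypoints_2d):

        def reprojection_error(pose: Pose3D) -> float:
            pu, pv = project(pose[j], focal)
            return math.hypot(pu - u, pv - v)

        best_pose = min(hypotheses, key=reprojection_error)
        selected.append(best_pose[j])
    return selected


# Two made-up hypotheses for a two-joint "skeleton" and the observed 2D keypoints.
hypothesis_a = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
hypothesis_b = [(0.2, 0.0, 2.0), (1.0, 1.0, 2.0)]
observed_2d = [(0.1, 0.0), (0.5, 0.0)]

aggregated_pose = select_joints_by_reprojection([hypothesis_a, hypothesis_b], observed_2d)
print(aggregated_pose)
```

In this toy case the first joint is taken from the second hypothesis and the second joint from the first, which is the sense in which joint-level selection can combine the strengths of different hypotheses.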