Multimedia Systems: latest articles, page 3

BSP-Net: automatic skin lesion segmentation improved by boundary enhancement and progressive decoding methods

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-29 DOI: 10.1007/s00530-024-01453-2

Chengyun Ma, Qimeng Yang, Shengwei Tian, Long Yu, Shirong Yu

Abstract: Automatic skin lesion segmentation from dermoscopy images is of great significance in the early treatment of skin cancers, which is yet challenging even for dermatologists due to the inherent issues, i.e., considerable size, shape and color variation, and ambiguous boundaries. In this paper, we propose a network BSP-Net that implements the combination of critical boundary information and segmentation tasks to simultaneously solve the variation and boundary problems in skin lesion segmentation. The architecture of BSP-Net primarily consists of a multi-scale boundary enhancement (MBE) module and a progressive fusion decoder (PD). The MBE module, by deeply extracting boundary information in both multi-axis frequency and multi-scale spatial domains, generates precise boundary key-point prediction maps. This process not only accurately models local boundary information but also effectively retains global contextual information. On the other hand, the PD employs an asymmetric decoding strategy, guiding the generation of refined segmentation results by combining boundary-enhanced features rich in geometric details with global features containing semantic information about lesions. This strategy progressively fuses boundary and semantic information at different levels, effectively enabling high-performance collaboration between cross-level contextual features. To assess the effectiveness of BSP-Net, we conducted extensive experiments on two public datasets (ISIC-2016 & PH2, ISIC-2018) and one private dataset (XJUSKin). BSP-Net achieved Dice coefficients of 90.81, 92.41, and 83.88%, respectively. Additionally, it demonstrated precise boundary delineation with Average Symmetric Surface Distance (ASSD) scores of 7.96, 6.88, and 10.92%, highlighting its strong performance in skin lesion segmentation.
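
Illustrative note (not from the paper): the Dice coefficient reported above measures the overlap between a predicted mask and a ground-truth mask. The sketch below shows the standard definition on small binary masks. The function name `dice_coefficient`, the nested-list mask format, and the handling of two empty masks are assumptions made for this example, not details of BSP-Net.

```python
def dice_coefficient(pred, target):
    """Dice = 2*|A and B| / (|A| + |B|) for two binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    ground_truth = [[1, 0, 0], [0, 1, 1]]
    print(f"Dice: {dice_coefficient(predicted, ground_truth):.4f}")
```

A value of 1.0 means the two masks coincide exactly; a value of 0.0 means they share no foreground pixel.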

Citations: 0

Gateinst: instance segmentation with multi-scale gated-enhanced queries in transformer decoder

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-20 DOI: 10.1007/s00530-024-01438-1

Chih-Wei Lin, Ye Lin, Shangtai Zhou, Lirong Zhu

Abstract: Recently, a popular query-based end-to-end framework has been used for instance segmentation. However, queries are updated from individual layers or scales of feature maps at each stage of Transformer decoding, which leaves them unable to gather sufficient multi-scale feature information. Querying these features may therefore yield inconsistent information due to disparities among feature maps and lead to erroneous updates. This study proposes a new network called GateInst, which employs a dual-path auto-select mechanism based on gate structures to overcome these issues. Firstly, we design a block-wise multi-scale feature fusion module that combines features of different scales while maintaining low computational cost. Secondly, we introduce the gated-enhanced queries Transformer decoder that utilizes a gating mechanism to filter and merge the queries generated at different stages to compensate for the inaccuracies in updating queries. GateInst addresses the issue of insufficient feature information and compensates for the problem of cumulative errors in queries. Experiments have shown that GateInst achieves significant gains of 8.4 AP and 5.5 AP50 over Mask2Former on the self-collected Tree Species Instance Dataset and performs well compared to non-Mask2Former-like and Mask2Former-like networks on self-collected and public COCO datasets, with only a tiny amount of additional computational cost and fast convergence. Code and models are available at https://github.com/FAFU-IMLab/GateInst.
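
Illustrative note (not from the paper): the abstract describes a gating mechanism that filters and merges queries from different decoding stages. The sketch below shows one generic element-wise gate that blends a previous query vector with an updated one. The function `gated_merge`, its per-element weights, and the blending formula are assumptions chosen for illustration; the linked repository is the reference for the actual GateInst design.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(q_prev, q_new, w_prev, w_new, bias):
    """Blend two query vectors element-wise.

    gate   = sigmoid(w_prev * q_prev + w_new * q_new + bias)
    merged = gate * q_new + (1 - gate) * q_prev
    All weights here are hypothetical per-element parameters.
    """
    merged = []
    for qp, qn, wp, wn, b in zip(q_prev, q_new, w_prev, w_new, bias):
        gate = sigmoid(wp * qp + wn * qn + b)
        merged.append(gate * qn + (1.0 - gate) * qp)
    return merged


if __name__ == "__main__":
    previous = [0.0, 2.0]
    updated = [1.0, 4.0]
    zeros = [0.0, 0.0]
    # With zero weights and bias the gate is 0.5, so the result is the plain average.
    print(gated_merge(previous, updated, zeros, zeros, zeros))
```

A gate near 1 trusts the updated query, while a gate near 0 keeps the earlier one, which is how a learned gate can damp an erroneous update.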

Citations: 0

SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-20 DOI: 10.1007/s00530-024-01450-5

Jianming Zhang, Wentao Chen, Yufan He, Li-Dan Kuang, Arun Kumar Sangaiah

Abstract: Visual object tracking can be divided into the object classification and bounding-box regression tasks, but sharing a single correlation map between them leads to inaccuracy. Siamese trackers compute the correlation map with a cross-correlation operation that has a high computational cost, and performing this operation either on channels or in the spatial domain results in weak perception of global information. In addition, some Siamese trackers with a centerness branch ignore the associations between the centerness branch and the bounding-box regression branch. To alleviate these problems, we propose a visual object tracker based on Spatial-Channel Cross-Correlation and Centerness-Guided Regression. Firstly, we propose a spatial-channel cross-correlation module (SC3M) that combines the search region feature and the template feature both on channels and in the spatial domain, which suppresses the interference of distractors. As a lightweight module, SC3M can compute dual independent correlation maps that are fed to different subnetworks. Secondly, we propose a centerness-guided regression subnetwork consisting of the centerness branch and the bounding-box regression branch. The centerness guides the whole regression subnetwork to enhance the association of the two branches and further suppress low-quality predicted bounding boxes. Thirdly, we have conducted extensive experiments on five challenging benchmarks, including GOT-10k, VOT2018, TrackingNet, OTB100 and UAV123. The results show the excellent performance of our tracker, which meets the real-time requirement at 48.52 fps.
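
Illustrative note (not from the paper): Siamese trackers locate the target by sliding a template feature over a search-region feature and recording the response at each offset. The sketch below shows that basic operation for a single channel in the spatial domain only. The function `cross_correlate` and the toy 4x4 input are assumptions for this example and do not reproduce the SC3M module.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (both 2D lists) and return the response map.

    Each output value is the sum of element-wise products at that offset
    (valid positions only, stride 1, no padding).
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            row.append(sum(search[i + di][j + dj] * template[di][dj]
                           for di in range(th) for dj in range(tw)))
        response.append(row)
    return response


if __name__ == "__main__":
    search_region = [[0, 0, 0, 0],
                     [0, 1, 2, 0],
                     [0, 3, 4, 0],
                     [0, 0, 0, 0]]
    template = [[1, 2],
                [3, 4]]
    for row in cross_correlate(search_region, template):
        print(row)
```

The largest response falls at the offset where the template pattern sits in the search region, which is what a tracker reads off as the target location.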

Citations: 0

3D model watermarking using surface integrals of generated random vector fields

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-20 DOI: 10.1007/s00530-024-01455-0

Luke Vandenberghe, Chris Joslin

Citations: 0

Anomaly detection in surveillance videos using Transformer with margin learning

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-16 DOI: 10.1007/s00530-024-01443-4

Dicong Wang, Kaijun Wu

Abstract: Weakly supervised video anomaly detection (WSVAD) is an actively researched and challenging task within the domains of image and video processing. In prior studies, WSVAD has typically been formulated as a multiple-instance learning (MIL) problem. However, quite a few of these methods concentrate primarily on time periods when anomalies occur discernibly. To recognize anomalous events, they rely solely on detecting significant changes in appearance or motion, ignoring the temporal completeness or continuity that anomalous events possess by nature. In addition, they disregard the subtle correlations at the transitional boundaries between normal and abnormal states. Therefore, we propose a weakly supervised learning approach based on Transformer with margin learning for video anomaly detection. First, our network effectively captures temporal changes around the occurrence of anomalies by utilizing the benefits of Transformer blocks, which are adept at capturing long-range dependencies in anomalous events. Second, to tackle challenging cases, i.e., normal events with high similarity to anomalous events, we employ a hard score memory. The purpose of this memory is to store the anomaly scores of hard samples, enabling iterative optimization training on those hard instances. Additionally, to bolster the discriminative capability of the model at the score level, we utilize pseudo-labels for anomalous events to provide supplementary support in detection. Experiments were conducted on two large-scale datasets, namely the ShanghaiTech dataset and the UCF-Crime dataset, and they achieved highly favorable results. The results of the experiments demonstrate that the proposed method is sensitive to anomalous events while performing competitively against state-of-the-art methods.
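
Illustrative note (not from the paper): in the MIL setting mentioned above, each video is a bag of snippet scores and only the video-level label is known. A widely used margin objective pushes the highest score in an abnormal bag above the highest score in a normal bag. The sketch below shows that generic hinge form. The function `mil_margin_loss` and the default margin of 1.0 are assumptions; the abstract does not state the paper's exact loss.

```python
def mil_margin_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge-style MIL ranking loss on two bags of snippet anomaly scores.

    loss = max(0, margin - max(abnormal bag) + max(normal bag))
    The loss is zero once the top abnormal score exceeds the top normal
    score by at least `margin`.
    """
    return max(0.0, margin - max(abnormal_scores) + max(normal_scores))


if __name__ == "__main__":
    abnormal_video = [0.1, 0.9, 0.2]   # one snippet looks anomalous
    normal_video = [0.1, 0.3]          # scores for a video labeled normal
    print(f"loss: {mil_margin_loss(abnormal_video, normal_video):.3f}")
```

Hard normal samples, i.e. normal snippets with high scores, are the ones that keep this loss positive, which matches the abstract's motivation for tracking hard instances.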

Citations: 0

Remote sensing image cloud removal based on multi-scale spatial information perception

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-16 DOI: 10.1007/s00530-024-01442-5

Aozhe Dou, Yang Hao, Weifeng Liu, Liangliang Li, Zhenzhong Wang, Baodi Liu

Abstract: Remote sensing imagery is indispensable in diverse domains, including geographic information systems, climate monitoring, agricultural planning, and disaster management. Nonetheless, cloud cover can drastically degrade the utility and quality of these images. Current deep learning-based cloud removal methods rely on convolutional neural networks to extract features at the same scale, which can overlook detailed and global information, resulting in suboptimal cloud removal performance. To overcome these challenges, we develop a method for cloud removal that leverages multi-scale spatial information perception. Our technique employs convolution kernels of various sizes, enabling the integration of both global semantic information and local detail information. An attention mechanism enhances this process by targeting key areas within the images and dynamically adjusting channel weights to improve feature reconstruction. We compared our method with current popular cloud removal methods across three datasets, and the results show that our proposed method improves metrics such as PSNR, SSIM, and cosine similarity, verifying the effectiveness of our method in cloud removal.
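
Illustrative note (not from the paper): two of the metrics named above, PSNR and cosine similarity, have short closed forms. The sketch below computes both on flat lists of pixel values. The function names, the 8-bit peak value of 255, and the toy inputs are assumptions for this example; SSIM is omitted because it needs windowed statistics.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    # Identical images have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(a, b):
    """Cosine of the angle between two pixel vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    cloud_free = [0, 255, 128, 64]
    restored = [0, 245, 130, 60]
    print(f"PSNR: {psnr(cloud_free, restored):.2f} dB")
    print(f"cosine similarity: {cosine_similarity(cloud_free, restored):.4f}")
```

Higher is better for both metrics: PSNR grows as the mean squared error shrinks, and cosine similarity approaches 1 as the restored image aligns with the reference.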

Citations: 0

Exploiting multi-level consistency learning for source-free domain adaptation

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-16 DOI: 10.1007/s00530-024-01444-3

Jihong Ouyang, Zhengjie Zhang, Qingyi Meng, Ximing Li, Jinjin Chi

Abstract: Due to data privacy concerns, a more practical task known as Source-free Unsupervised Domain Adaptation (SFUDA) has gained significant attention recently. SFUDA adapts a pre-trained source model to the target domain without access to the source domain data. Existing SFUDA methods typically rely on per-class cluster structure to refine labels. However, these clusters often contain samples with different ground truth labels, leading to label noise. To address this issue, we propose a novel Multi-level Consistency Learning (MLCL) method. MLCL focuses on learning discriminative class-wise target feature representations, resulting in more accurate cluster structures. Specifically, at the inter-domain level, we construct pseudo-source domain data based on the entropy criterion. We align each pseudo-labeled target-domain sample with the corresponding pseudo-source-domain prototype by introducing a prototype contrastive loss. This loss ensures that our model can learn discriminative class-wise feature representations effectively. At the intra-domain level, we enforce consistency among different views of the same image by employing consistency-based self-training. The self-training further enhances the feature representation ability of our model. Additionally, we apply information maximization regularization to facilitate target sample clustering and promote diversity. Our extensive experiments conducted on four benchmark datasets for classification demonstrate the superior performance of the proposed MLCL method. The code is here.
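
Illustrative note (not from the paper): information maximization regularization is commonly written as two entropy terms, one that makes each prediction confident and one that keeps the average prediction spread across classes. The sketch below shows that common form on softmax outputs. The function names and the exact combination of terms are assumptions, since the abstract does not give the formula used in MLCL.

```python
import math


def entropy(probabilities):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def information_maximization_loss(batch_probs):
    """Mean per-sample entropy minus the entropy of the batch-mean prediction.

    Minimizing the first term sharpens individual predictions (clustering);
    subtracting the second term rewards a diverse class distribution.
    """
    n = len(batch_probs)
    num_classes = len(batch_probs[0])
    mean_entropy = sum(entropy(p) for p in batch_probs) / n
    mean_pred = [sum(p[c] for p in batch_probs) / n for c in range(num_classes)]
    return mean_entropy - entropy(mean_pred)


if __name__ == "__main__":
    confident_and_diverse = [[1.0, 0.0], [0.0, 1.0]]
    collapsed_to_one_class = [[1.0, 0.0], [1.0, 0.0]]
    print(information_maximization_loss(confident_and_diverse))
    print(information_maximization_loss(collapsed_to_one_class))
```

The confident and diverse batch scores lower than the collapsed one, so minimizing this quantity discourages assigning every target sample to the same class.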

Citations: 0

Integrate encryption of multiple images based on a new hyperchaotic system and Baker map

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-14 DOI: 10.1007/s00530-024-01449-y

Xingbin Liu

Citations: 0

3D human pose estimation method based on multi-constrained dilated convolutions

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-14 DOI: 10.1007/s00530-024-01441-6

Huaijun Wang, Bingqian Bai, Junhuai Li, Hui Ke, Wei Xiang

Abstract: In recent years, research on 2D to 3D human pose estimation methods has gained increasing attention. However, problems in these methods, such as depth ambiguity and self-occlusion, still need to be addressed. To address these problems, we propose a 3D human pose estimation method based on multi-constrained dilated convolutions. This approach involves using a local constraint based on graph convolution and a global constraint based on a fully connected network. It also utilizes a dilated temporal convolution network to capture long-term temporal correlations of human poses. Taking 2D joint coordinate sequences as input, the local constraint module constructs cross-joint and equipotential connections for the human skeleton. The global constraint module encodes global semantic information about posture. Finally, the constraint modules and the temporal correlation of human posture are alternately connected to achieve 3D human posture estimation. The method was validated on the public datasets Human3.6M and MPI-INF-3DHP, and the results show that the proposed method effectively reduces the error in 3D human pose estimation and demonstrates a certain degree of generalization ability.
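
Illustrative note (not from the paper): a dilated temporal convolution skips frames between kernel taps, so stacked layers can cover long sequences with few parameters. The sketch below applies a 1D dilated convolution to a scalar sequence and computes the receptive field of a stack of layers. The function names, the kernel values, and the dilation schedule are assumptions for this example, not the paper's configuration.

```python
def dilated_conv1d(sequence, kernel, dilation):
    """1D convolution (correlation form) whose taps are `dilation` frames apart.

    Only valid positions are computed (no padding), so the output is shorter
    than the input by (len(kernel) - 1) * dilation.
    """
    span = (len(kernel) - 1) * dilation
    return [sum(kernel[k] * sequence[t + k * dilation] for k in range(len(kernel)))
            for t in range(len(sequence) - span)]


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of stacked dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    joint_x_over_time = [1, 2, 3, 4, 5, 6, 7]   # one 2D joint coordinate per frame
    print(dilated_conv1d(joint_x_over_time, [1, 1, 1], dilation=2))
    print(receptive_field(kernel_size=3, dilations=[1, 3, 9, 27]))
```

Growing the dilation from layer to layer makes the receptive field expand much faster than stacking undilated layers of the same kernel size, which is how such networks capture long-term temporal correlations.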

Citations: 0

Exploring multi-dimensional interests for session-based recommendation

IF 3.9 · Zone 3, Computer Science

Multimedia Systems Pub Date : 2024-08-13 DOI: 10.1007/s00530-024-01437-2

Yuhan Yang, Jing Sun, Guojia An

Abstract: Session-based recommendation (SBR) aims to recommend the next clicked item to users by mining the user’s interaction sequences in the current session. It has received widespread attention recently due to its excellent privacy protection capabilities. However, existing SBR methods have the following limitations: (1) there exists noisy information in session sequences; (2) it is a challenge to simultaneously model both the long-term stable and dynamic changing interests of users; (3) the internal relationships between different interest representations are often neglected. To address the above issues, we propose a session-based recommendation model, Exploring Multi-Dimensional Interests (EMDI), which attempts to predict more accurate and complete user intentions from multiple dimensions of user interests. Specifically, EMDI comprises three modules: (1) the interest enhancement module aims to filter noise and enhance the interest expressions in the user’s behavior sequences, providing high-quality item embeddings; (2) the interest mining module separately mines users’ multi-dimensional interests, including static interests, local dynamic interests, and global dynamic interests, to capture users’ tendencies in different dimensions of interest; (3) the interest fusion module is designed to dynamically aggregate users’ interest representations from different dimensions through a novel multi-layer gated fusion network so that the implicit association between interest representations can be captured. Extensive experimental results show that EMDI performs significantly better than other state-of-the-art methods.

Citations: 0