Multimedia Systems: Latest Articles

HCT: a hybrid CNN and transformer network for hyperspectral image super-resolution
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-20, DOI: 10.1007/s00530-024-01387-9
Huapeng Wu, Chenyun Wang, Chenyang Lu, Tianming Zhan
{"title":"HCT: a hybrid CNN and transformer network for hyperspectral image super-resolution","authors":"Huapeng Wu, Chenyun Wang, Chenyang Lu, Tianming Zhan","doi":"10.1007/s00530-024-01387-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01387-9","url":null,"abstract":"<p>Recently, convolutional neural network (CNN) and transformer based on hyperspectral image super-resolution methods have achieved superior performance. Nevertheless, this is still an important problem how to effectively extract local and global features and improve spectral representation of hyperspectral image. In this paper, we propose a hybrid CNN and transformer network (HCT) for hyperspectral image super-resolution, which consists of a transformer module with local–global spatial attention mechanism (LSMSAformer) and a convolution module with 3D convolution (3DDWTC) to process high and low frequency information, respectively. Specifically, in the transformer branch, the introduced attention mechanism module (LSMSA) is used to extract local–global spatial features at different scales. In the convolution branch, 3DDWTC is proposed to learn local spatial information and preserve the spectral features, which can enhance the representation of the network. Extensive experimental results show that the proposed method can obtain better results than some state-of-the-art hyperspectral image super-resolution methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Low-parameter GAN inversion framework based on hypernetwork
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-17, DOI: 10.1007/s00530-024-01379-9
Hongyang Wang, Ting Wang, Dong Xiang, Wenjie Yang, Jia Li
{"title":"Low-parameter GAN inversion framework based on hypernetwork","authors":"Hongyang Wang, Ting Wang, Dong Xiang, Wenjie Yang, Jia Li","doi":"10.1007/s00530-024-01379-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01379-9","url":null,"abstract":"<p>In response to the significant parameter overhead in current Generative Adversarial Networks (GAN) inversion methods when balancing high fidelity and editability, we propose a novel lightweight inversion framework based on an optimized generator. We aim to balance fidelity and editability within the StyleGAN latent space. To achieve this, the study begins by mapping raw data to the <span>({W}^{+})</span> latent space, enhancing the quality of the resulting inverted images. Following this mapping step, we introduce a carefully designed lightweight hypernetwork. This hypernetwork operates to selectively modify primary detailed features, thereby leading to a notable reduction in the parameter count essential for model training. By learning parameter variations, the precision of subsequent image editing is augmented. Lastly, our approach integrates a multi-channel parallel optimization computing module into the above structure to decrease the time needed for model image processing. Extensive experiments were conducted in facial and automotive imagery domains to validate our lightweight inversion framework. Results demonstrate that our method achieves equivalent or superior inversion and editing quality, utilizing fewer parameters.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SenseMLP: a parallel MLP architecture for sensor-based human activity recognition
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-17, DOI: 10.1007/s00530-024-01384-y
Weilin Li, Jiaming Guo, Hong Wu
{"title":"SenseMLP: a parallel MLP architecture for sensor-based human activity recognition","authors":"Weilin Li, Jiaming Guo, Hong Wu","doi":"10.1007/s00530-024-01384-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01384-y","url":null,"abstract":"<p>Human activity recognition (HAR) with wearable inertial sensors is a burgeoning field, propelled by advances in sensor technology. Deep learning methods for HAR have notably enhanced recognition accuracy in recent years. Nonetheless, the complexity of previous models often impedes their use in real-life scenarios, particularly in online applications. Addressing this gap, we introduce SenseMLP, a novel approach employing a multi-layer perceptron (MLP) neural network architecture. SenseMLP features three parallel MLP branches that independently process and integrate features across the time, channel, and frequency dimensions. This structure not only simplifies the model but also significantly reduces the number of required parameters compared to previous deep learning HAR frameworks. We conducted comprehensive evaluations of SenseMLP against benchmark HAR datasets, including PAMAP2, OPPORTUNITY, USC-HAD, and SKODA. Our findings demonstrate that SenseMLP not only achieves state-of-the-art performance in terms of accuracy but also boasts fewer parameters and lower floating-point operations per second. For further research and application in the field, the source code of SenseMLP is available at https://github.com/forfrees/SenseMLP.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
More accurate heatmap generation method for human pose estimation
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-16, DOI: 10.1007/s00530-024-01390-0
Yongfeng Qi, Hengrui Zhang, Jia Liu
{"title":"More accurate heatmap generation method for human pose estimation","authors":"Yongfeng Qi, Hengrui Zhang, Jia Liu","doi":"10.1007/s00530-024-01390-0","DOIUrl":"https://doi.org/10.1007/s00530-024-01390-0","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141335692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-view and region reasoning semantic enhancement for image-text retrieval
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-15, DOI: 10.1007/s00530-024-01383-z
Wengang Cheng, Ziyi Han, Di He, Lifang Wu
{"title":"Multi-view and region reasoning semantic enhancement for image-text retrieval","authors":"Wengang Cheng, Ziyi Han, Di He, Lifang Wu","doi":"10.1007/s00530-024-01383-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01383-z","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141336936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Correction to: MFDAT: a stock trend prediction of the doublegraph attention network based on multisource information fusion
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-15, DOI: 10.1007/s00530-024-01376-y
Kun Huang, Xiaoming Li, Neal Xiong, Yihe Yang
{"title":"Correction to: MFDAT: a stock trend prediction of the doublegraph attention network based on multisource information fusion","authors":"Kun Huang, Xiaoming Li, Neal Xiong, Yihe Yang","doi":"10.1007/s00530-024-01376-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01376-y","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141337406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BCRA: bidirectional cross-modal implicit relation reasoning and aligning for text-to-image person retrieval
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-15, DOI: 10.1007/s00530-024-01372-2
Zhaoqi Li, Yongping Xie
{"title":"BCRA: bidirectional cross-modal implicit relation reasoning and aligning for text-to-image person retrieval","authors":"Zhaoqi Li, Yongping Xie","doi":"10.1007/s00530-024-01372-2","DOIUrl":"https://doi.org/10.1007/s00530-024-01372-2","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141336926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-14, DOI: 10.1007/s00530-024-01367-z
Qihan He, Zhongxu Li, Wenyuan Yang
{"title":"LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network","authors":"Qihan He, Zhongxu Li, Wenyuan Yang","doi":"10.1007/s00530-024-01367-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01367-z","url":null,"abstract":"<p>Road damage detection using computer vision and deep learning to automatically identify all kinds of road damage is an efficient application in object detection, which can significantly improve the efficiency of road maintenance planning and repair work and ensure road safety. However, due to the complexity of target recognition, the existing road damage detection models usually carry a large number of parameters and a large amount of computation, resulting in a slow inference speed, which limits the actual deployment of the model on the equipment with limited computing resources to a certain extent. In this study, we propose a road damage detector named LMFE-RDD for balancing speed and accuracy, which constructs a Lightweight Multi-Feature Extraction Network (LMFE-Net) as the backbone network and an Efficient Semantic Fusion Network (ESF-Net) for multi-scale feature fusion. First, as the backbone feature extraction network, LMFE-Net inputs road damage images to obtain three different scale feature maps. Second, ESF-Net fuses these three feature graphs and outputs three fusion features. Finally, the detection head is sent for target identification and positioning, and the final result is obtained. In addition, we use WDB loss, a multi-task loss function with a non-monotonic dynamic focusing mechanism, to pay more attention to bounding box regression losses. The experimental results show that the proposed LMFE-RDD model has competitive accuracy while ensuring speed. In the Multi-Perspective Road Damage Dataset, combining the data from all perspectives, LMFE-RDD achieves the detection speed of 51.0 FPS and 64.2% mAP@0.5, but the parameters are only 13.5 M.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-14, DOI: 10.1007/s00530-024-01377-x
P. V. Yeswanth, S. Deivalakshmi
{"title":"ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution","authors":"P. V. Yeswanth, S. Deivalakshmi","doi":"10.1007/s00530-024-01377-x","DOIUrl":"https://doi.org/10.1007/s00530-024-01377-x","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141344631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ConIS: controllable text-driven image stylization with semantic intensity
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-13, DOI: 10.1007/s00530-024-01381-1
Gaoming Yang, Changgeng Li, Ji Zhang
{"title":"ConIS: controllable text-driven image stylization with semantic intensity","authors":"Gaoming Yang, Changgeng Li, Ji Zhang","doi":"10.1007/s00530-024-01381-1","DOIUrl":"https://doi.org/10.1007/s00530-024-01381-1","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141348641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0