Multimedia Systems最新文献

Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles. 适应食物：一个智能系统，可以根据特定的饮食和健康的生活方式调整食谱。

IF 3.5 3区计算机科学

Multimedia Systems Pub Date : 2025-01-01 Epub Date: 2025-02-01 DOI: 10.1007/s00530-025-01667-y

Andrea Morales-Garzón, Karel Gutiérrez-Batista, Maria J Martin-Bautista

{"title":"Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles.","authors":"Andrea Morales-Garzón, Karel Gutiérrez-Batista, Maria J Martin-Bautista","doi":"10.1007/s00530-025-01667-y","DOIUrl":"10.1007/s00530-025-01667-y","url":null,"abstract":"This paper presents AdaptaFood, a system to adapt recipes to specific dietary constraints. This is a common societal issue due to various dietary needs arising from medical conditions, allergies, or nutritional preferences. AdaptaFood provides recipe adaptations from two inputs: a recipe image (a fine-tuned image-captioning model allows us to extract the ingredients) or a recipe object (we extract the ingredients from the recipe features). For the adaptation, we propose to use an attention-based language sentence model based on BERT to learn the semantics of the ingredients and, therefore, discover the hidden relations among them. Specifically, we use them to perform two tasks: (1) align the food items from several sources to expand recipe information; (2) use the semantic features embedded in the representation vector to detect potential food substitutes for the ingredients. The results show that the model successfully learns domain-specific knowledge after re-training it to the food computing domain. Combining this acquired knowledge with the adopted strategy for sentence representation and food replacement enables the generation of high-quality recipe versions and dealing with the heterogeneity of different-origin food data.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"31 1","pages":"87"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787260/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143124098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Generating generalized zero-shot learning based on dual-path feature enhancement 基于双路径特征增强生成广义零点学习

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-19 DOI: 10.1007/s00530-024-01485-8

Xinyi Chang, Zhen Wang, Wenhao Liu, Limeng Gao, Bingshuai Yan

{"title":"Generating generalized zero-shot learning based on dual-path feature enhancement","authors":"Xinyi Chang, Zhen Wang, Wenhao Liu, Limeng Gao, Bingshuai Yan","doi":"10.1007/s00530-024-01485-8","DOIUrl":"https://doi.org/10.1007/s00530-024-01485-8","url":null,"abstract":"Generalized zero-shot learning (GZSL) can classify both seen and unseen class samples, which plays a significant role in practical applications such as emerging species recognition and medical image recognition. However, most existing GZSL methods directly use the pre-trained deep model to learn the image feature. Due to the data distribution inconsistency between the GZSL dataset and the pre-training dataset, the obtained image features have an inferior performance. The distribution of different class image features is similar, which makes them difficult to distinguish. To solve this problem, we propose a dual-path feature enhancement (DPFE) model, which consists of four modules: the feature generation network (FGN), the local fine-grained feature enhancement (LFFE) module, the global coarse-grained feature enhancement (GCFE) module, and the feedback module (FM). The feature generation network can synthesize unseen class image features. We enhance the image features’ discriminative and semantic relevance from both local and global perspectives. To focus on the image’s local discriminative regions, the LFFE module processes the image in blocks and minimizes the semantic cycle-consistency loss to ensure that the region block features contain key classification semantic information. To prevent information loss caused by image blocking, we design the GCFE module. It ensures the consistency between the global image features and the semantic centers, thereby improving the discriminative power of the features. In addition, the feedback module feeds the discriminator network’s middle layer information back to the generator network. As a result, the synthesized image features are more similar to the real features. Experimental results demonstrate that the proposed DPFE method outperforms the state-of-the-arts on four zero-shot learning benchmark datasets.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"8 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142251278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Triple fusion and feature pyramid decoder for RGB-D semantic segmentation 用于 RGB-D 语义分割的三重融合和特征金字塔解码器

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-16 DOI: 10.1007/s00530-024-01459-w

Bin Ge, Xu Zhu, Zihan Tang, Chenxing Xia, Yiming Lu, Zhuang Chen

{"title":"Triple fusion and feature pyramid decoder for RGB-D semantic segmentation","authors":"Bin Ge, Xu Zhu, Zihan Tang, Chenxing Xia, Yiming Lu, Zhuang Chen","doi":"10.1007/s00530-024-01459-w","DOIUrl":"https://doi.org/10.1007/s00530-024-01459-w","url":null,"abstract":"Current RGB-D semantic segmentation networks incorporate depth information as an extra modality and merge RGB and depth features using methods such as equal-weighted concatenation or simple fusion strategies. However, these methods hinder the effective utilization of cross-modal information. Aiming at the problem that existing RGB-D semantic segmentation networks fail to fully utilize RGB and depth features, we propose an RGB-D semantic segmentation network, based on triple fusion and feature pyramid decoding, which achieves bidirectional interaction and fusion of RGB and depth features via the proposed three-stage cross-modal fusion module (TCFM). The TCFM proposes utilizing cross-modal cross-attention to intermix the data from two modalities into another modality. It fuses the RGB attributes and depth features proficiently, utilizing the channel-adaptive weighted fusion module. Furthermore, this paper introduces a lightweight feature pyramidal decoder network to fuse the multi-scale parts taken out by the encoder effectively. Experiments on NYU Depth V2 and SUN RGB-D datasets demonstrate that the cross-modal feature fusion network proposed in this study efficiently segments intricate scenes.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"38 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142251277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet 利用深度并行挤压和激励以及关注 Unet 自动进行淋巴结分割

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-13 DOI: 10.1007/s00530-024-01465-y

Zhaorui Liu, Hao Chen, Caiyin Tang, Quan Li, Tao Peng

引用次数: 0

CAFIN: cross-attention based face image repair network CAFIN：基于交叉注意力的人脸图像修复网络

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-13 DOI: 10.1007/s00530-024-01466-x

Yaqian Li, Kairan Li, Haibin Li, Wenming Zhang

引用次数: 0

A survey on deep learning-based camouflaged object detection 基于深度学习的伪装物体检测调查

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-11 DOI: 10.1007/s00530-024-01478-7

Junmin Zhong, Anzhi Wang, Chunhong Ren, Jintao Wu

{"title":"A survey on deep learning-based camouflaged object detection","authors":"Junmin Zhong, Anzhi Wang, Chunhong Ren, Jintao Wu","doi":"10.1007/s00530-024-01478-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01478-7","url":null,"abstract":"Camouflaged object detection (COD) is an emerging visual detection task that aims to identify objects that conceal themselves in the surrounding environment. The high intrinsic similarities between the camouflaged objects and their backgrounds make COD far more challenging than traditional object detection. Recently, COD has attracted increasing research interest in the computer vision community, and numerous deep learning-based methods have been proposed, showing great potential. However, most of the existing work focuses on analyzing the structure of COD models, with few overview works summarizing deep learning-based models. To address this gap, we provide a comprehensive analysis and summary of deep learning-based COD models. Specifically, we first classify 48 deep learning-based COD models and analyze their advantages and disadvantages. Second, we introduce widely available datasets for COD and performance evaluation metrics. Then, we evaluate the performance of existing deep learning-based COD models on these four datasets. Finally, we indicate relevant applications and discuss challenges and future research directions for the COD task.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"24 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142210787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method 基于改进的 YOLOv8 方法的人脸实例分割和张口度分析

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-11 DOI: 10.1007/s00530-024-01472-z

Yuhe Fan, Lixun Zhang, Canxing Zheng, Xingyuan Wang, Jinghui Zhu, Lan Wang

{"title":"Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method","authors":"Yuhe Fan, Lixun Zhang, Canxing Zheng, Xingyuan Wang, Jinghui Zhu, Lan Wang","doi":"10.1007/s00530-024-01472-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01472-z","url":null,"abstract":"Instance segmentation of faces and mouth-opening degrees is an important technology for meal-assisting robotics in food delivery safety. However, due to the diversity in in shape, color, and posture of faces and the mouth with small area contour, easy to deform, and occluded, it is challenging to real-time and accurate instance segmentation. In this paper, we proposed a novel method for instance segmentation of faces and mouth-opening degrees. Specifically, in backbone network, deformable convolution was introduced to enhance the ability to capture finer-grained spatial information and the CloFormer module was introduced to improve the ability to capture high-frequency local and low-frequency global information. In neck network, classical convolution and C2f modules are replaced by GSConv and VoV-GSCSP aggregation modules, respectively, to reduce the complexity and floating-point operations of models. Finally, in localization loss, CIOU loss was replaced by WIOU loss to reduce the competitiveness of high-quality anchor frames and mask the influence of low-quality samples, which in turn improves localization accuracy and generalization ability. It is abbreviated as the DCGW-YOLOv8n-seg model. The DCGW-YOLOv8n-seg model was compared with the baseline YOLOv8n-seg model and several state-of-the-art instance segmentation models on datasets, respectively. The results show that the DCGW-YOLOv8n-seg model is characterized by high accuracy, speed, robustness, and generalization ability. The effectiveness of each improvement in improving the model performance was verified by ablation experiments. Finally, the DCGW-YOLOv8n-seg model was applied to the instance segmentation experiment of meal-assisting robotics. The results show that the DCGW-YOLOv8n-seg model can better realize the instance segmentation effect of faces and mouth-opening degrees. The novel method proposed can provide a guiding theoretical basis for meal-assisting robotics in food delivery safety and can provide a reference value for computer vision and image instance segmentation.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"11 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142210785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Implicit neural representation steganography by neuron pruning 通过神经元修剪实现隐式神经表征隐写术

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-10 DOI: 10.1007/s00530-024-01476-9

Weina Dong, Jia Liu, Lifeng Chen, Wenquan Sun, Xiaozhong Pan, Yan Ke

{"title":"Implicit neural representation steganography by neuron pruning","authors":"Weina Dong, Jia Liu, Lifeng Chen, Wenquan Sun, Xiaozhong Pan, Yan Ke","doi":"10.1007/s00530-024-01476-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01476-9","url":null,"abstract":"Recently, implicit neural representation (INR) has started to be applied in image steganography. However, the quality of stego and secret images represented by INR is generally low. In this paper, we propose an implicit neural representation steganography method by neuron pruning. Initially, we randomly deactivate a portion of neurons to train an INR function for implicitly representing the secret image. Subsequently, we prune the neurons that are deemed unimportant for representing the secret image in a unstructured manner to obtain a secret function, while marking the positions of neurons as the key. Finally, based on a partial optimization strategy, we reactivate the pruned neurons to construct a stego function for representing the cover image. The recipient only needs the shared key to recover the secret function from the stego function in order to reconstruct the secret image. Experimental results demonstrate that this method not only allows for lossless recovery of the secret image, but also performs well in terms of capacity, fidelity, and undetectability. The experiments conducted on images of different resolutions validate that our proposed method exhibits significant advantages in image quality over existing implicit representation steganography methods.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"258 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142210788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multi-scale motion contrastive learning for self-supervised skeleton-based action recognition 基于自我监督骨架的动作识别多尺度运动对比学习

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-10 DOI: 10.1007/s00530-024-01463-0

Yushan Wu, Zengmin Xu, Mengwei Yuan, Tianchi Tang, Ruxing Meng, Zhongyuan Wang

{"title":"Multi-scale motion contrastive learning for self-supervised skeleton-based action recognition","authors":"Yushan Wu, Zengmin Xu, Mengwei Yuan, Tianchi Tang, Ruxing Meng, Zhongyuan Wang","doi":"10.1007/s00530-024-01463-0","DOIUrl":"https://doi.org/10.1007/s00530-024-01463-0","url":null,"abstract":"People process things and express feelings through actions, action recognition has been able to be widely studied, yet under-explored. Traditional self-supervised skeleton-based action recognition focus on joint point features, ignoring the inherent semantic information of body structures at different scales. To address this problem, we propose a multi-scale Motion Contrastive Learning of Visual Representations (MsMCLR) model. The model utilizes the Multi-scale Motion Attention (MsM Attention) module to divide the skeletal features into three scale levels, extracting cross-frame and cross-node motion features from them. To obtain more motion patterns, a combination of strong data augmentation is used in the proposed model, which motivates the model to utilize more motion features. However, the feature sequences generated by strong data augmentation make it difficult to maintain identity of the original sequence. Hence, we introduce a dual distributional divergence minimization method, proposing a multi-scale motion loss function. It utilizes the embedding distribution of the ordinary augmentation branch to supervise the loss computation of the strong augmentation branch. Finally, the proposed method is evaluated on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. The accuracy of our method is 1.4–3.0% higher than the frontier models.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"83 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142210786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

C2IENet: Multi-branch medical image fusion based on contrastive constraint features and information exchange C2IENet：基于对比约束特征和信息交换的多分支医学影像融合

IF 3.9 3区计算机科学

Multimedia Systems Pub Date : 2024-09-09 DOI: 10.1007/s00530-024-01473-y

Jing Di, Chan Liang, Li Ren, Wenqing Guo, Jizhao Liu, Jing Lian

{"title":"C2IENet: Multi-branch medical image fusion based on contrastive constraint features and information exchange","authors":"Jing Di, Chan Liang, Li Ren, Wenqing Guo, Jizhao Liu, Jing Lian","doi":"10.1007/s00530-024-01473-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01473-y","url":null,"abstract":"In the field of medical image fusion, traditional approaches often fail to differentiate between the unique characteristics of each raw image, leading to fused images with compromised texture and structural clarity. Addressing this, we introduce an advanced multi-branch fusion method characterized by contrast-enhanced features and interactive information exchange. This method integrates a multi-scale residual module and a gradient-dense module within a private branch to precisely extract and enrich texture details from individual raw images. In parallel, a common feature extraction branch, equipped with an information interaction module, processes paired raw images to synergistically capture complementary and shared functional information across modalities. Additionally, we implement a sophisticated attention mechanism tailored for both the private and public branches to enhance global feature extraction, thereby significantly improving the contrast and contour definition of the fused image. A novel correlation consistency loss function further refines the fusion process by optimizing the information sharing between modalities, promoting the correlation among basic cross-modal features while minimizing the correlation of high-frequency details across different modalities. Objective evaluations demonstrate substantial improvements in indices such as EN, MI, QMI, SSIM, AG, SF, and (text {Q}^{text {AB/F}}), with average increases of 23.67%, 12.35%, 4.22%, 20.81%, 8.96%, 6.38%, and 25.36%, respectively. These results underscore our method’s superiority in achieving enhanced texture detail and contrast in fused images compared to conventional algorithms, as validated by both subjective assessments and objective performance metrics.","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"19 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142210789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0