{"title":"A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution","authors":"Feiqiang Liu, Aiwen Jiang, Lihui Chen","doi":"10.1007/s00530-024-01415-8","DOIUrl":"https://doi.org/10.1007/s00530-024-01415-8","url":null,"abstract":"<p>Magnetic resonance (MR) images are widely used for clinical diagnosis, whereas some surrounding factors always limit the resolution, so under-sampled data is usually generated during imaging. Since high-resolution (HR) MR images contribute to the clinic diagnosis, reconstructing HR MR images from these under-sampled data is pretty important. Recently, deep learning (DL) methods for HR reconstruction of MR images have achieved impressive performance. However, it is difficult to collect enough data for training DL models in practice due to medical data privacy regulations. Fortunately, federated learning (FL) is proposed to eliminate this issue by local/distributed training and encryption. In this paper, we propose a multi-scale channel attention network (MSCAN) for MR image super-resolution (SR) and integrate it into an FL framework named FedAve to make use of data from multiple institutions and avoid privacy risk. Specifically, to utilize multi-scale information in MR images, we introduce a multi-scale feature block (MSFB), in which multi-scale features are extracted and attention among features at different scales is captured to re-weight these multi-scale features. Then, a spatial gradient profile loss is integrated into MSCAN to facilitate the recovery of textures in MR images. Last, we incorporate MSCAN into FedAve to simulate the scenery of collaborated training among multiple institutions. Ablation studies show the effectiveness of the multi-scale features, the multi-scale channel attention, and the texture loss. Comparative experiments with some state-of-the-art (SOTA) methods indicate that the proposed MSCAN is superior to the compared methods and the model with FL has close results to the one trained by centralized data.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes","authors":"Qian Liu, Zhensheng Li, Youwei Qi, Cunbao Wang","doi":"10.1007/s00530-024-01429-2","DOIUrl":"https://doi.org/10.1007/s00530-024-01429-2","url":null,"abstract":"<p>Semantic segmentation of street scenes is important for the vision-based application of autonomous driving. Recently, high-accuracy networks based on deep learning have been widely applied to semantic segmentation, but their inference speeds are slow. In order to achieve faster speed, most popular real-time network architectures adopt stepwise downsampling operation in the backbone to obtain features with different sizes. However, they ignore the misalignment between feature maps from different levels, and their simple feature aggregation using element-wise addition or channel-wise concatenation may submerge the useful information in a large number of useless information. To deal with these problems, we propose a gated feature aggregation and alignment network (GFAANet) for real-time semantic segmentation of street scenes. In GFAANet, a feature alignment aggregation module is developed to effectively align and aggregate the feature maps from different levels. And we present a gated feature aggregation module to selectively aggregate and refine effective information from multi-stage features of the backbone network using gates. Furthermore, a depthwise separable pyramid pooling module based on low-resolution feature maps is designed as a context extractor to expand the effective receptive fields and fuse multi-scale contexts. Experimental results on two challenging street scene benchmark datasets show that GFAANet achieves highest accuracy in real-time semantic segmentation of street scenes, as compared with the state-of-the-art. We conclude that our GFAANet can quickly and effectively segment street scene images, which may provide technical support for autonomous driving.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive client selection and model aggregation for heterogeneous federated learning","authors":"Rui Zhai, Haozhe Jin, Wei Gong, Ke Lu, Yanhong Liu, Yalin Song, Junyang Yu","doi":"10.1007/s00530-024-01386-w","DOIUrl":"https://doi.org/10.1007/s00530-024-01386-w","url":null,"abstract":"<p>Federated Learning (FL) is a distributed machine learning method that allows multiple clients to collaborate on model training without sharing raw data. However, FL faces challenges with data heterogeneity, leading to reduced model accuracy and slower convergence. Although existing client selection methods can alleviate the above problems, there is still room to improve FL performance. To tackle these problems, we first propose a novel client selection method based on Multi-Armed Bandit (MAB). The method uses the historical training information uploaded by each client to calculate its correlation and contribution. The calculated values are then used to select a set of clients that can bring the most benefit, i.e., maximizing both model accuracy and convergence speed. Second, we propose an adaptive global model aggregation method that utilizes the local training information of selected clients to dynamically assign weights to local model parameters. Extensive experiments on various datasets with different heterogeneous settings demonstrate that our proposed method is effectively improving FL performance compared to several benchmarks.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A method of hybrid dilated and global convolution networks for pavement crack detection","authors":"Zhong Qu, Ming Li, Bin Yuan, Guoqing Mu","doi":"10.1007/s00530-024-01408-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01408-7","url":null,"abstract":"<p>Automatic crack detection is important for efficient and economical pavement maintenance. With the development of Convolutional Neural Networks (CNNs), crack detection methods have been mostly based on CNNs. In this paper, we propose a novel automatic crack detection network architecture, named hybrid dilated and global convolutional networks. Firstly, we integrate the hybrid dilated convolution module into ResNet-152 network, which can effectively aggregate global features. Then, we use the global convolution module to enhance the classification and localization ability of the extracted features. Finally, the feature fusion module is introduced to fuse multi-scale and multi-level feature maps. The proposed network can capture crack features from a global perspective and generate the corresponding feature maps. In order to demonstrate the effectiveness of our proposed method, we evaluate it on the four public crack datasets, DeepCrack, CFD, Cracktree200 and CRACK500, which achieves <i>ODS</i> values as 87.12%, 83.96%, 82.66%, 81.35% and <i>OIS</i> values as 87.55%, 84.82%, 83.56% and 82.98%. Compared with HED, RCF, DeepCrackT, FPHBN, ResNet-152 and DeepCrack, the <i>ODS</i> value performance improvement made in our method is 1.21%, 3.35%, 3.07%, 3.36%, 4.79% and 1% on DeepCrack dataset. Sufficient experimental statistics certificate that our proposed method outperforms other state-of-the-art crack detection, edge detection and image segmentation methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised masked face inpainting based on contrastive learning and attention mechanism","authors":"Weiguo Wan, Shunming Chen, Li Yao, Yingmei Zhang","doi":"10.1007/s00530-024-01411-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01411-y","url":null,"abstract":"<p>Masked face inpainting, aiming to restore realistic facial details and complete textures, remains a challenging task. In this paper, an unsupervised masked face inpainting method based on contrastive learning and attention mechanism is proposed. First, to overcome the constraint of a paired training dataset, a contrastive learning network framework is constructed by comparing features extracted from inpainted face image patches with those from input masked face image patches. Subsequently, to extract more effective facial features, a feature attention module is designed, which can focus on the significant feature information and establish long-range dependency relationships. In addition, a PatchGAN-based discriminator is refined with spectral normalization to enhance the stability of training the proposed network and guide the generator in producing more realistic face images. Numerous experiment results indicate that our approach can obtain better masked face inpainting results than the comparison approaches overall in terms of both subjective and objective evaluations, as well as face recognition accuracy.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Deng, Shaojuan Luo, Chunhua He, Huapan Xiao, Heng Wu
{"title":"Underwater small and occlusion object detection with feature fusion and global context decoupling head-based YOLO","authors":"Lei Deng, Shaojuan Luo, Chunhua He, Huapan Xiao, Heng Wu","doi":"10.1007/s00530-024-01410-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01410-z","url":null,"abstract":"<p>The underwater light scattering, absorption, and camera or target moving often bring issues such as blurring, distortion, and color deviation in underwater imaging, which poses significant challenges to underwater target detection. Numerous detectors have been proposed to address these challenges, such as YOLO series models, RCNN-based variants, and Transformer-based variants. However, the previous detectors often have poor detection results when encountering small targets and target occlusion problems. To tackle these issues, We propose a feature fusion and global semantic decoupling head-based YOLO detection method. Specifically, we propose an efficient feature fusion module to solve the problem of small target feature information being lost and difficult to detect accurately. We also use self-supervision to recalibrate the feature information between each level, which achieves full integration of semantic information between different levels. We design a decoupling head that focuses on global context information, which can better filter out complex background information, thereby achieving effective detection of targets under occluded backgrounds. Finally, we replace simple upsampling with a content-aware reassembly module in the YOLO backbone, alleviating the problem of imprecise localization and identification of small targets caused by feature loss to some extent. The experimental results indicate that the proposed method achieves superior performance compared to other state-of-the-art single-stage and two-stage detection networks. Specifically, on the UTDAC2020 dataset, the proposed method attains mAP50-95 and mAP50 scores of 54.4% and 87.7%, respectively.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141719136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junyi Tang, Simin An, Yuanwei Liu, Yong Su, Jin Chen
{"title":"M2AST:MLP-mixer-based adaptive spatial-temporal graph learning for human motion prediction","authors":"Junyi Tang, Simin An, Yuanwei Liu, Yong Su, Jin Chen","doi":"10.1007/s00530-024-01351-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01351-7","url":null,"abstract":"<p>Human motion prediction is a challenging task in human-centric computer vision, involving forecasting future poses based on historical sequences. Despite recent progress in modeling spatial-temporal relationships of motion sequences using complex structured graphs, few approaches have provided an adaptive and lightweight representation for varying graph structures of human motion. Taking inspiration from the advantages of MLP-Mixer, a lightweight architecture designed for learning complex interactions in multi-dimensional data, we explore its potential as a backbone for motion prediction. To this end, we propose a novel MLP-Mixer-based adaptive spatial-temporal pattern learning framework (M<span>(^2)</span>AST). Our framework includes an adaptive spatial mixer to model the spatial relationships between joints, an adaptive temporal mixer to learn temporal smoothness, and a local dynamic mixer to capture fine-grained cross-dependencies between joints of adjacent poses. The final method achieves a compact representation of human motion dynamics by adaptively considering spatial-temporal dependencies from coarse to fine. Unlike the trivial spatial-temporal MLP-Mixer, our proposed approach can more effectively capture both local and global spatial-temporal relationships simultaneously. We extensively evaluated our proposed framework on three commonly used benchmarks (Human3.6M, AMASS, 3DPW MoCap), demonstrating comparable or better performance than existing state-of-the-art methods in both short and long-term predictions, despite having significantly fewer parameters. Overall, our proposed framework provides a novel and efficient solution for human motion prediction with adaptive graph learning.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinren Min, Yang Liu, Shengjing Zhou, Huihua Huang, Li Zhang, Xiaojun Gong, Dongshan Yang, Menghao Wang, Rui Yang, Mingyang Zhong
{"title":"Global adaptive histogram feature network for automatic segmentation of infection regions in CT images","authors":"Xinren Min, Yang Liu, Shengjing Zhou, Huihua Huang, Li Zhang, Xiaojun Gong, Dongshan Yang, Menghao Wang, Rui Yang, Mingyang Zhong","doi":"10.1007/s00530-024-01392-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01392-y","url":null,"abstract":"<p>Accurate and timely diagnosis of COVID-like virus is of paramount importance for lifesaving. In this work, deep learning techniques are applied to lung CT image segmentation for accurate disease diagnosis. We discuss the limitations of current diagnostic methods, such as RT-PCR, and highlights the advantages of deep learning, including its ability to automatically learn features and handle complex lesion morphology and texture. We, therefore, propose a novel deep learning framework, GAHFNet, specifically designed for automatic segmentation of COVID-19 lung CT images. The proposed method addresses the challenges in lung CT image segmentation, such as the complex image structure and difficulties of distinguishing COVID-19 pneumonia lesions from other pathologies. We provide the detailed description of the proposed GAHFNet. Finally, comprehensive experiments are carried out to evaluate the performance of GAHFNet, and the proposed method outperforms other traditional and the state-of-the-art methods in various evaluation metrics, demonstrating the effectiveness and the efficiency of the proposed method in this task. GAHFNet is able to facilitate the application of artificial intelligence in COVID-19 diagnosis and achieve accurate automatic segmentation of infected areas in COVID-19 lung CT images.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart contract vulnerabilities detection with bidirectional encoder representations from transformers and control flow graph","authors":"Peng Su, Jingyuan Hu","doi":"10.1007/s00530-024-01406-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01406-9","url":null,"abstract":"<p>Up to now, the smart contract vulnerabilities detection methods based on sequence modal data and sequence models have been the most commonly used. However, existing state-of-the-art methods disregard the issue of sequence modal data loses structural information and control flow information. Additionally, it is hard for sequence models to extract global features of smart contracts. Moreover, these methods rarely consider the impact of noise data on vulnerabilities detection. To tackle these issues, we propose a smart contract vulnerabilities detection model based on bidirectional encoder representation from transformers (BERT) and control flow graph (CFG). On the one hand, we design a denoising method suitable for control flow graphs to reduce the impact of noisy data on vulnerabilities detection. On the other hand, we design a novel method to parse the control flow graph into a BERT input form that retains control flow information and structural information. The BERT learns the potential vulnerability characteristics of smart contracts to fine-tune itself. Through an empirical evaluation of a large-scale real-world dataset and compare 5 state-of-the-art baseline methods. Our method achieves (1) optimal performance over all baseline methods; (2) 0.6–17.1% higher F1-score than baseline methods; (3) 0.7–16.7% higher accuracy than baseline methods; (4) 0.6–17% higher precision than baseline methods; (5) 0.2–19.5% higher recall than baseline methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141570969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}