{"title":"HCT: a hybrid CNN and transformer network for hyperspectral image super-resolution","authors":"Huapeng Wu, Chenyun Wang, Chenyang Lu, Tianming Zhan","doi":"10.1007/s00530-024-01387-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01387-9","url":null,"abstract":"<p>Recently, convolutional neural network (CNN) and transformer based on hyperspectral image super-resolution methods have achieved superior performance. Nevertheless, this is still an important problem how to effectively extract local and global features and improve spectral representation of hyperspectral image. In this paper, we propose a hybrid CNN and transformer network (HCT) for hyperspectral image super-resolution, which consists of a transformer module with local–global spatial attention mechanism (LSMSAformer) and a convolution module with 3D convolution (3DDWTC) to process high and low frequency information, respectively. Specifically, in the transformer branch, the introduced attention mechanism module (LSMSA) is used to extract local–global spatial features at different scales. In the convolution branch, 3DDWTC is proposed to learn local spatial information and preserve the spectral features, which can enhance the representation of the network. Extensive experimental results show that the proposed method can obtain better results than some state-of-the-art hyperspectral image super-resolution methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-parameter GAN inversion framework based on hypernetwork","authors":"Hongyang Wang, Ting Wang, Dong Xiang, Wenjie Yang, Jia Li","doi":"10.1007/s00530-024-01379-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01379-9","url":null,"abstract":"<p>In response to the significant parameter overhead in current Generative Adversarial Networks (GAN) inversion methods when balancing high fidelity and editability, we propose a novel lightweight inversion framework based on an optimized generator. We aim to balance fidelity and editability within the StyleGAN latent space. To achieve this, the study begins by mapping raw data to the <span>({W}^{+})</span> latent space, enhancing the quality of the resulting inverted images. Following this mapping step, we introduce a carefully designed lightweight hypernetwork. This hypernetwork operates to selectively modify primary detailed features, thereby leading to a notable reduction in the parameter count essential for model training. By learning parameter variations, the precision of subsequent image editing is augmented. Lastly, our approach integrates a multi-channel parallel optimization computing module into the above structure to decrease the time needed for model image processing. Extensive experiments were conducted in facial and automotive imagery domains to validate our lightweight inversion framework. 
Results demonstrate that our method achieves equivalent or superior inversion and editing quality, utilizing fewer parameters.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
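The core idea above is a small hypernetwork that predicts per-image offsets to a frozen generator's weights instead of fine-tuning the generator itself. This toy NumPy sketch illustrates one common way to keep such a predictor lightweight, a low-rank offset; the `hypernetwork` function, its rank-1 factorization, and the 4×4 "generator" are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "generator" weights (toy stand-in for a StyleGAN layer).
W_gen = rng.standard_normal((4, 4))

def hypernetwork(feat, rank=1):
    """Hypothetical low-rank hypernetwork: maps image features to a weight
    offset dW = u @ v. A rank-1 offset for a 4x4 layer needs only 8 predicted
    numbers instead of 16, which is the low-parameter idea in miniature."""
    u = feat[:4].reshape(4, rank)
    v = feat[4:8].reshape(rank, 4)
    return u @ v

feat = rng.standard_normal(8)        # features extracted from the input image
dW = hypernetwork(feat)
W_adapted = W_gen + dW               # generator runs with per-image adjusted weights
```

At inference, only the hypernetwork's own parameters are trained; the generator stays frozen, so fidelity-improving adjustments come cheaply per image.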
{"title":"SenseMLP: a parallel MLP architecture for sensor-based human activity recognition","authors":"Weilin Li, Jiaming Guo, Hong Wu","doi":"10.1007/s00530-024-01384-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01384-y","url":null,"abstract":"<p>Human activity recognition (HAR) with wearable inertial sensors is a burgeoning field, propelled by advances in sensor technology. Deep learning methods for HAR have notably enhanced recognition accuracy in recent years. Nonetheless, the complexity of previous models often impedes their use in real-life scenarios, particularly in online applications. Addressing this gap, we introduce SenseMLP, a novel approach employing a multi-layer perceptron (MLP) neural network architecture. SenseMLP features three parallel MLP branches that independently process and integrate features across the time, channel, and frequency dimensions. This structure not only simplifies the model but also significantly reduces the number of required parameters compared to previous deep learning HAR frameworks. We conducted comprehensive evaluations of SenseMLP against benchmark HAR datasets, including PAMAP2, OPPORTUNITY, USC-HAD, and SKODA. Our findings demonstrate that SenseMLP not only achieves state-of-the-art performance in terms of accuracy but also boasts fewer parameters and lower floating-point operations per second. 
For further research and application in the field, the source code of SenseMLP is available at https://github.com/forfrees/SenseMLP.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
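The three-branch idea above — separate MLPs mixing along the time, channel, and frequency axes of a sensor window — can be sketched in a few lines of NumPy. All layer sizes, the single-layer `mlp` helper, and the mean-pooling fusion are illustrative assumptions; the released code at the link above defines the real architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w):
    """Hypothetical one-layer MLP branch: linear map followed by ReLU."""
    return np.maximum(x @ w, 0.0)

T, C, H = 8, 3, 4  # time steps, sensor channels, hidden size (toy values)
x = rng.standard_normal((T, C))  # one window of inertial-sensor readings

# Three parallel branches, each mixing along exactly one dimension.
w_time = rng.standard_normal((T, H))
w_chan = rng.standard_normal((C, H))
w_freq = rng.standard_normal((T // 2 + 1, H))

time_feat = mlp(x.T, w_time).mean(axis=0)         # mix across time steps
chan_feat = mlp(x, w_chan).mean(axis=0)           # mix across sensor channels
spectrum  = np.abs(np.fft.rfft(x, axis=0))        # per-channel magnitude spectrum
freq_feat = mlp(spectrum.T, w_freq).mean(axis=0)  # mix across frequency bins

fused = np.concatenate([time_feat, chan_feat, freq_feat])  # to the classifier
```

Because each branch is a plain matrix product over one axis, the parameter count grows linearly in T, C, and the number of frequency bins, rather than with their products, which is where the parameter savings over convolutional or attention-based HAR models come from.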
{"title":"Multi-view and region reasoning semantic enhancement for image-text retrieval","authors":"Wengang Cheng, Ziyi Han, Di He, Lifang Wu","doi":"10.1007/s00530-024-01383-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01383-z","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141336936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: MFDAT: a stock trend prediction of the doublegraph attention network based on multisource information fusion","authors":"Kun Huang, Xiaoming Li, Neal Xiong, Yihe Yang","doi":"10.1007/s00530-024-01376-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01376-y","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141337406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network","authors":"Qihan He, Zhongxu Li, Wenyuan Yang","doi":"10.1007/s00530-024-01367-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01367-z","url":null,"abstract":"<p>Road damage detection using computer vision and deep learning to automatically identify all kinds of road damage is an efficient application in object detection, which can significantly improve the efficiency of road maintenance planning and repair work and ensure road safety. However, due to the complexity of target recognition, the existing road damage detection models usually carry a large number of parameters and a large amount of computation, resulting in a slow inference speed, which limits the actual deployment of the model on the equipment with limited computing resources to a certain extent. In this study, we propose a road damage detector named LMFE-RDD for balancing speed and accuracy, which constructs a Lightweight Multi-Feature Extraction Network (LMFE-Net) as the backbone network and an Efficient Semantic Fusion Network (ESF-Net) for multi-scale feature fusion. First, as the backbone feature extraction network, LMFE-Net inputs road damage images to obtain three different scale feature maps. Second, ESF-Net fuses these three feature graphs and outputs three fusion features. Finally, the detection head is sent for target identification and positioning, and the final result is obtained. In addition, we use WDB loss, a multi-task loss function with a non-monotonic dynamic focusing mechanism, to pay more attention to bounding box regression losses. The experimental results show that the proposed LMFE-RDD model has competitive accuracy while ensuring speed. 
In the Multi-Perspective Road Damage Dataset, combining the data from all perspectives, LMFE-RDD achieves the detection speed of 51.0 FPS and 64.2% mAP@0.5, but the parameters are only 13.5 M.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
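The backbone-then-fusion pipeline above (three feature maps at different scales, fused before the detection head) follows the general FPN-style pattern. This NumPy sketch shows only that generic top-down fusion step with toy single-channel maps; the actual operations inside ESF-Net are not specified in the abstract, and the `upsample2x` helper and addition-based merge are assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling, a stand-in for the fusion resize step."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Three backbone feature maps at coarsening scales (toy sizes, one channel).
p3 = np.ones((8, 8))          # finest scale
p4 = np.full((4, 4), 2.0)
p5 = np.full((2, 2), 3.0)     # coarsest scale

# Top-down fusion: upsample the coarser map and merge it into the finer one,
# so each output level carries both fine detail and coarse semantics.
f4 = p4 + upsample2x(p5)
f3 = p3 + upsample2x(f4)
fused_levels = [f3, f4, p5]   # three fused features for the detection head
```

Each fused level then feeds the detection head at its own resolution, letting small surface cracks be localized on the fine map while large damage regions are picked up on the coarse ones.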
{"title":"ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution","authors":"P. V. Yeswanth, S. Deivalakshmi","doi":"10.1007/s00530-024-01377-x","DOIUrl":"https://doi.org/10.1007/s00530-024-01377-x","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141344631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}