Multimedia Systems: Latest Articles

HCT: a hybrid CNN and transformer network for hyperspectral image super-resolution
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-20, DOI: 10.1007/s00530-024-01387-9
Huapeng Wu, Chenyun Wang, Chenyang Lu, Tianming Zhan
{"title":"HCT: a hybrid CNN and transformer network for hyperspectral image super-resolution","authors":"Huapeng Wu, Chenyun Wang, Chenyang Lu, Tianming Zhan","doi":"10.1007/s00530-024-01387-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01387-9","url":null,"abstract":"<p>Recently, convolutional neural network (CNN) and transformer based on hyperspectral image super-resolution methods have achieved superior performance. Nevertheless, this is still an important problem how to effectively extract local and global features and improve spectral representation of hyperspectral image. In this paper, we propose a hybrid CNN and transformer network (HCT) for hyperspectral image super-resolution, which consists of a transformer module with local–global spatial attention mechanism (LSMSAformer) and a convolution module with 3D convolution (3DDWTC) to process high and low frequency information, respectively. Specifically, in the transformer branch, the introduced attention mechanism module (LSMSA) is used to extract local–global spatial features at different scales. In the convolution branch, 3DDWTC is proposed to learn local spatial information and preserve the spectral features, which can enhance the representation of the network. Extensive experimental results show that the proposed method can obtain better results than some state-of-the-art hyperspectral image super-resolution methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Low-parameter GAN inversion framework based on hypernetwork
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-17, DOI: 10.1007/s00530-024-01379-9
Hongyang Wang, Ting Wang, Dong Xiang, Wenjie Yang, Jia Li
{"title":"Low-parameter GAN inversion framework based on hypernetwork","authors":"Hongyang Wang, Ting Wang, Dong Xiang, Wenjie Yang, Jia Li","doi":"10.1007/s00530-024-01379-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01379-9","url":null,"abstract":"<p>In response to the significant parameter overhead in current Generative Adversarial Networks (GAN) inversion methods when balancing high fidelity and editability, we propose a novel lightweight inversion framework based on an optimized generator. We aim to balance fidelity and editability within the StyleGAN latent space. To achieve this, the study begins by mapping raw data to the <span>({W}^{+})</span> latent space, enhancing the quality of the resulting inverted images. Following this mapping step, we introduce a carefully designed lightweight hypernetwork. This hypernetwork operates to selectively modify primary detailed features, thereby leading to a notable reduction in the parameter count essential for model training. By learning parameter variations, the precision of subsequent image editing is augmented. Lastly, our approach integrates a multi-channel parallel optimization computing module into the above structure to decrease the time needed for model image processing. Extensive experiments were conducted in facial and automotive imagery domains to validate our lightweight inversion framework. Results demonstrate that our method achieves equivalent or superior inversion and editing quality, utilizing fewer parameters.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SenseMLP: a parallel MLP architecture for sensor-based human activity recognition
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-17, DOI: 10.1007/s00530-024-01384-y
Weilin Li, Jiaming Guo, Hong Wu
{"title":"SenseMLP: a parallel MLP architecture for sensor-based human activity recognition","authors":"Weilin Li, Jiaming Guo, Hong Wu","doi":"10.1007/s00530-024-01384-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01384-y","url":null,"abstract":"<p>Human activity recognition (HAR) with wearable inertial sensors is a burgeoning field, propelled by advances in sensor technology. Deep learning methods for HAR have notably enhanced recognition accuracy in recent years. Nonetheless, the complexity of previous models often impedes their use in real-life scenarios, particularly in online applications. Addressing this gap, we introduce SenseMLP, a novel approach employing a multi-layer perceptron (MLP) neural network architecture. SenseMLP features three parallel MLP branches that independently process and integrate features across the time, channel, and frequency dimensions. This structure not only simplifies the model but also significantly reduces the number of required parameters compared to previous deep learning HAR frameworks. We conducted comprehensive evaluations of SenseMLP against benchmark HAR datasets, including PAMAP2, OPPORTUNITY, USC-HAD, and SKODA. Our findings demonstrate that SenseMLP not only achieves state-of-the-art performance in terms of accuracy but also boasts fewer parameters and lower floating-point operations per second. For further research and application in the field, the source code of SenseMLP is available at https://github.com/forfrees/SenseMLP.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
More accurate heatmap generation method for human pose estimation
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-16, DOI: 10.1007/s00530-024-01390-0
Yongfeng Qi, Hengrui Zhang, Jia Liu
{"title":"More accurate heatmap generation method for human pose estimation","authors":"Yongfeng Qi, Hengrui Zhang, Jia Liu","doi":"10.1007/s00530-024-01390-0","DOIUrl":"https://doi.org/10.1007/s00530-024-01390-0","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141335692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-view and region reasoning semantic enhancement for image-text retrieval
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-15, DOI: 10.1007/s00530-024-01383-z
Wengang Cheng, Ziyi Han, Di He, Lifang Wu
{"title":"Multi-view and region reasoning semantic enhancement for image-text retrieval","authors":"Wengang Cheng, Ziyi Han, Di He, Lifang Wu","doi":"10.1007/s00530-024-01383-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01383-z","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141336936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Correction to: MFDAT: a stock trend prediction of the doublegraph attention network based on multisource information fusion
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-15, DOI: 10.1007/s00530-024-01376-y
Kun Huang, Xiaoming Li, Neal Xiong, Yihe Yang
{"title":"Correction to: MFDAT: a stock trend prediction of the doublegraph attention network based on multisource information fusion","authors":"Kun Huang, Xiaoming Li, Neal Xiong, Yihe Yang","doi":"10.1007/s00530-024-01376-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01376-y","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141337406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BCRA: bidirectional cross-modal implicit relation reasoning and aligning for text-to-image person retrieval
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-15, DOI: 10.1007/s00530-024-01372-2
Zhaoqi Li, Yongping Xie
{"title":"BCRA: bidirectional cross-modal implicit relation reasoning and aligning for text-to-image person retrieval","authors":"Zhaoqi Li, Yongping Xie","doi":"10.1007/s00530-024-01372-2","DOIUrl":"https://doi.org/10.1007/s00530-024-01372-2","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141336926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-14, DOI: 10.1007/s00530-024-01367-z
Qihan He, Zhongxu Li, Wenyuan Yang
{"title":"LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network","authors":"Qihan He, Zhongxu Li, Wenyuan Yang","doi":"10.1007/s00530-024-01367-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01367-z","url":null,"abstract":"<p>Road damage detection using computer vision and deep learning to automatically identify all kinds of road damage is an efficient application in object detection, which can significantly improve the efficiency of road maintenance planning and repair work and ensure road safety. However, due to the complexity of target recognition, the existing road damage detection models usually carry a large number of parameters and a large amount of computation, resulting in a slow inference speed, which limits the actual deployment of the model on the equipment with limited computing resources to a certain extent. In this study, we propose a road damage detector named LMFE-RDD for balancing speed and accuracy, which constructs a Lightweight Multi-Feature Extraction Network (LMFE-Net) as the backbone network and an Efficient Semantic Fusion Network (ESF-Net) for multi-scale feature fusion. First, as the backbone feature extraction network, LMFE-Net inputs road damage images to obtain three different scale feature maps. Second, ESF-Net fuses these three feature graphs and outputs three fusion features. Finally, the detection head is sent for target identification and positioning, and the final result is obtained. In addition, we use WDB loss, a multi-task loss function with a non-monotonic dynamic focusing mechanism, to pay more attention to bounding box regression losses. The experimental results show that the proposed LMFE-RDD model has competitive accuracy while ensuring speed. In the Multi-Perspective Road Damage Dataset, combining the data from all perspectives, LMFE-RDD achieves the detection speed of 51.0 FPS and 64.2% mAP@0.5, but the parameters are only 13.5 M.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-14, DOI: 10.1007/s00530-024-01377-x
P. V. Yeswanth, S. Deivalakshmi
{"title":"ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution","authors":"P. V. Yeswanth, S. Deivalakshmi","doi":"10.1007/s00530-024-01377-x","DOIUrl":"https://doi.org/10.1007/s00530-024-01377-x","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141344631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ConIS: controllable text-driven image stylization with semantic intensity
IF 3.9, Zone 3 (Computer Science)
Multimedia Systems Pub Date: 2024-06-13, DOI: 10.1007/s00530-024-01381-1
Gaoming Yang, Changgeng Li, Ji Zhang
{"title":"ConIS: controllable text-driven image stylization with semantic intensity","authors":"Gaoming Yang, Changgeng Li, Ji Zhang","doi":"10.1007/s00530-024-01381-1","DOIUrl":"https://doi.org/10.1007/s00530-024-01381-1","url":null,"abstract":"","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141348641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0