{"title":"Filter-deform attention GAN: constructing human motion videos from few images","authors":"Jianjun Zhu, Huihuang Zhao, Yudong Zhang","doi":"10.1007/s00371-024-03595-w","DOIUrl":"https://doi.org/10.1007/s00371-024-03595-w","url":null,"abstract":"<p>Human motion transfer is challenging due to the complexity and diversity of human motion and clothing textures. Existing methods use 2D pose estimation to obtain poses, which can easily lead to unsmooth motion and artifacts. Therefore, this paper proposes a highly robust motion transmission model based on image deformation, called the Filter-Deform Attention Generative Adversarial Network (FDA GAN). This method can transmit complex human motion videos using only few human images. First, we use a 3D pose shape estimator instead of traditional 2D pose estimation to address the problem of unsmooth motion. Then, to tackle the artifact problem, we design a new attention mechanism and integrate it with the GAN, proposing a new network capable of effectively extracting image features and generating human motion videos. Finally, to further transfer the style of the source human, we propose a two-stream style loss, which enhances the model’s learning ability. Experimental results demonstrate that the proposed method outperforms recent methods in overall performance and various evaluation metrics. Project page: https://github.com/mioyeah/FDA-GAN.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GLDC: combining global and local consistency of multibranch depth completion","authors":"Yaping Deng, Yingjiang Li, Zibo Wei, Keying Li","doi":"10.1007/s00371-024-03609-7","DOIUrl":"https://doi.org/10.1007/s00371-024-03609-7","url":null,"abstract":"<p>Depth completion aims to generate dense depth maps from sparse depth maps and corresponding RGB images. In this task, the locality based on the convolutional layer poses challenges for the network in obtaining global information. While the Transformer-based architecture performs well in capturing global information, it may lead to the loss of local detail features. Consequently, improving the simultaneous attention to global and local information is crucial for achieving effective depth completion. This paper proposes a novel and effective dual-encoder–three-decoder network, consisting of local and global branches. Specifically, the local branch uses a convolutional network, and the global branch utilizes a Transformer network to extract rich features. Meanwhile, the local branch is dominated by color image and the global branch is dominated by depth map to thoroughly integrate and utilize multimodal information. In addition, a gate fusion mechanism is used in the decoder stage to fuse local and global information, to achieving high-performance depth completion. This hybrid architecture is conducive to the effective fusion of local detail information and contextual information. Experimental results demonstrated the superiority of our method over other advanced methods on KITTI Depth Completion and NYU v2 datasets.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orhlr-net: one-stage residual learning network for joint single-image specular highlight detection and removal","authors":"Wenzhe Shi, Ziqi Hu, Hao Chen, Hengjia Zhang, Jiale Yang, Li Li","doi":"10.1007/s00371-024-03607-9","DOIUrl":"https://doi.org/10.1007/s00371-024-03607-9","url":null,"abstract":"<p>Detecting and removing specular highlights is a complex task that can greatly enhance various visual tasks in real-world environments. Although previous works have made great progress, they often ignore specular highlight areas or produce unsatisfactory results with visual artifacts such as color distortion. In this paper, we present a framework that utilizes an encoder–decoder structure for the combined task of specular highlight detection and removal in single images, employing specular highlight mask guidance. The encoder uses EfficientNet as a feature extraction backbone network to convert the input RGB image into a series of feature maps. The decoder gradually restores these feature maps to their original size through up-sampling. In the specular highlight detection module, we enhance the network by utilizing residual modules to extract additional feature information, thereby improving detection accuracy. For the specular highlight removal module, we introduce the Convolutional Block Attention Module, which dynamically captures the importance of each channel and spatial location in the input feature map. This enables the model to effectively distinguish between foreground and background, resulting in enhanced adaptability and accuracy in complex scenes. We evaluate the proposed method on the publicly available SHIQ dataset, and its superiority is demonstrated through a comparative analysis of the experimental results. The source code will be available at https://github.com/hzq2333/ORHLR-Net.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Slot-VTON: subject-driven diffusion-based virtual try-on with slot attention","authors":"Jianglei Ye, Yigang Wang, Fengmao Xie, Qin Wang, Xiaoling Gu, Zizhao Wu","doi":"10.1007/s00371-024-03603-z","DOIUrl":"https://doi.org/10.1007/s00371-024-03603-z","url":null,"abstract":"<p>Virtual try-on aims to transfer clothes from one image to another while preserving intricate wearer and clothing details. Tremendous efforts have been made to facilitate the task based on deep generative models such as GAN and diffusion models; however, the current methods have not taken into account the influence of the natural environment (background and unrelated impurities) on clothing image, leading to issues such as loss of detail, intricate textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot attention-based inpainting approach for seamless image generation in a subject-driven way. Specifically, we adopt an attention mechanism, termed slot attention, that can unsupervisedly separate the various subjects within images. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents a subject. Guided by the extracted clothing slot, our method is capable of eliminating the interference of other unnecessary factors, thereby better preserving the complex details of the clothing. To further enhance the seamless generation of the diffusion model, we design a fusion adapter that integrates multiple conditions, including the slot and other added clothing conditions. In addition, a non-garment inpainting module is used to further fix visible seams and preserve non-clothing area details (hands, neck, etc.). Multiple experiments on VITON-HD datasets validate the efficacy of our methods, showcasing state-of-the-art generation performances. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EGCT: enhanced graph convolutional transformer for 3D point cloud representation learning","authors":"Gang Chen, Wenju Wang, Haoran Zhou, Xiaolin Wang","doi":"10.1007/s00371-024-03600-2","DOIUrl":"https://doi.org/10.1007/s00371-024-03600-2","url":null,"abstract":"<p>It is an urgent problem of high-precision 3D environment perception to carry out representation learning on point cloud data, which complete the synchronous acquisition of local and global feature information. However, current representation learning methods either only focus on how to efficiently learn local features, or capture long-distance dependencies but lose the fine-grained features. Therefore, we explore transformer on topological structures of point cloud graphs, proposing an enhanced graph convolutional transformer (EGCT) method. EGCT construct graph topology for disordered and unstructured point cloud. Then it uses the enhanced point feature representation method to further aggregate the feature information of all neighborhood points, which can compactly represent the features of this local neighborhood graph. Subsequent process, the graph convolutional transformer simultaneously performs self-attention calculations and convolution operations on the point coordinates and features of the neighborhood graph. It efficiently utilizes the spatial geometric information of point cloud objects. Therefore, EGCT learns comprehensive geometric information of point cloud objects, which can help to improve segmentation and classification accuracy. On the ShapeNetPart and ModelNet40 datasets, our EGCT method achieves a mIoU of 86.8%, OA and AA of 93.5% and 91.2%, respectively. On the large-scale indoor scene point cloud dataset (S3DIS), the OA of EGCT method is 90.1%, and the mIoU is 67.8%. Experimental results demonstrate that our EGCT method can achieve comparable point cloud segmentation and classification performance to state-of-the-art methods while maintaining low model complexity. Our source code is available at https://github.com/shepherds001/EGCT.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mfpenet: multistage foreground-perception enhancement network for remote-sensing scene classification","authors":"Junding Sun, Chenxu Wang, Haifeng Sima, Xiaosheng Wu, Shuihua Wang, Yudong Zhang","doi":"10.1007/s00371-024-03587-w","DOIUrl":"https://doi.org/10.1007/s00371-024-03587-w","url":null,"abstract":"<p>Scene classification plays a vital role in the field of remote-sensing (RS). However, remote-sensing images have the essential properties of complex scene information and large-scale spatial changes, as well as the high similarity between various classes and the significant differences within the same class, which brings great challenges to scene classification. To address these issues, a multistage foreground-perception enhancement network (MFPENet) is proposed to enhance the ability to perceive foreground features, thereby improving classification accuracy. Firstly, to enrich the scene semantics of feature information, a multi-scale feature aggregation module is specifically designed using dilated convolution, which takes the features of different stages of the backbone network as input data to obtain enhanced multiscale features. Then, a novel foreground-perception enhancement module is designed to capture foreground information. Unlike the previous methods, we separate foreground features by designing feature masks and then innovatively explore the symbiotic relationship between foreground features and scene features to improve the recognition ability of foreground features further. Finally, a hierarchical attention module is designed to reduce the interference of redundant background details on classification. By embedding the dependence between adjacent level features into the attention mechanism, the model can pay more accurate attention to the key information. Redundancy is reduced, and the loss of useful information is minimized. Experiments on three public RS scene classification datasets [UC-Merced, Aerial Image Dataset, and NWPU-RESISC45] show that our method achieves highly competitive results. Future work will focus on utilizing the background features outside the effective foreground features in the image as a decision aid to improve the distinguishability between similar scenes. The source code of our proposed algorithm and the related datasets are available at https://github.com/Hpu-wcx/MFPENet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Directional latent space representation for medical image segmentation","authors":"Xintao Liu, Yan Gao, Changqing Zhan, Qiao Wangr, Yu Zhang, Yi He, Hongyan Quan","doi":"10.1007/s00371-024-03589-8","DOIUrl":"https://doi.org/10.1007/s00371-024-03589-8","url":null,"abstract":"<p>Excellent medical image segmentation plays an important role in computer-aided diagnosis. Deep mining of pixel semantics is crucial for medical image segmentation. However, previous works on medical semantic segmentation usually overlook the importance of embedding subspace, and lacked the mining of latent space direction information. In this work, we construct global orthogonal bases and channel orthogonal bases in the latent space, which can significantly enhance the feature representation. We propose a novel distance-based segmentation method that decouples the embedding space into sub-embedding spaces of different classes, and then implements pixel level classification based on the distance between its embedding features and the origin of the subspace. Experiments on various public medical image segmentation benchmarks have shown that our model is superior compared to state-of-the-art methods. The code will be published at https://github.com/lxt0525/LSDENet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward robust visual tracking for UAV with adaptive spatial-temporal weighted regularization","authors":"Zhi Chen, Lijun Liu, Zhen Yu","doi":"10.1007/s00371-024-03290-w","DOIUrl":"https://doi.org/10.1007/s00371-024-03290-w","url":null,"abstract":"<p>The unmanned aerial vehicles (UAV) visual object tracking method based on the discriminative correlation filter (DCF) has gained extensive research and attention due to its superior computation and extraordinary progress, but is always suffers from unnecessary boundary effects. To solve the aforementioned problems, a spatial-temporal regularization correlation filter framework is proposed, which is achieved by introducing a constant regularization term to penalize the coefficients of the DCF filter. The tracker can substantially improve the tracking performance but increase computational complexity. However, these kinds of methods make the object fail to adapt to specific appearance variations, and we need to pay much effort in fine-tuning the spatial-temporal regularization weight coefficients. In this work, an adaptive spatial-temporal weighted regularization (ASTWR) model is proposed. An ASTWR module is introduced to obtain the weighted spatial-temporal regularization coefficients automatically. The proposed ASTWR model can deal effectively with complex situations and substantially improve the credibility of tracking results. In addition, an adaptive spatial-temporal constraint adjusting mechanism is proposed. By repressing the drastic appearance changes between adjacent frames, the tracker enables smooth filter learning in the detection phase. Substantial experiments show that the proposed tracker performs favorably against homogeneous UAV-based and DCF-based trackers. Moreover, the ASTWR tracker reaches over 35 FPS on a single CPU platform, and gains an AUC score of 57.9% and 49.7% on the UAV123 and VisDrone2020 datasets, respectively.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data visualization in healthcare and medicine: a survey","authors":"Xunan Tan, Xiang Suo, Wenjun Li, Lei Bi, Fangshu Yao","doi":"10.1007/s00371-024-03586-x","DOIUrl":"https://doi.org/10.1007/s00371-024-03586-x","url":null,"abstract":"<p>Visualization analysis is crucial in healthcare as it provides insights into complex data and aids healthcare professionals in efficiency. Information visualization leverages algorithms to reduce the complexity of high-dimensional heterogeneous data, thereby enhancing healthcare professionals’ understanding of the hidden associations among data structures. In the field of healthcare visualization, efforts have been made to refine and enhance the utility of data through diverse algorithms and visualization techniques. This review aims to summarize the existing research in this domain and identify future research directions. We searched Web of Science, Google Scholar and IEEE Xplore databases, and ultimately, 76 articles were included in our analysis. We collected and synthesized the research findings from these articles, with a focus on visualization, artificial intelligence and supporting tasks in healthcare. Our study revealed that researchers from diverse fields have employed a wide range of visualization techniques to visualize various types of data. We summarized these visualization methods and proposed recommendations for future research. We anticipate that our findings will promote further development and application of visualization techniques in healthcare.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient minor defects detection on steel surface via res-attention and position encoding","authors":"Chuang Wu, Tingqin He","doi":"10.1007/s00371-024-03583-0","DOIUrl":"https://doi.org/10.1007/s00371-024-03583-0","url":null,"abstract":"<p>Impurities and complex manufacturing processes result in many minor, dense steel defects. This situation requires precise defect detection models for effective protection. The single-stage model (based on YOLO) is a popular choice among current models, renowned for its computational efficiency and suitability for real-time online applications. However, existing YOLO-based models often fail to detect small features. To address this issue, we introduce an efficient steel surface defect detection model in YOLOv7, incorporating a feature preservation block (FPB) and location awareness feature pyramid network (LAFPN). The FPB uses shortcut connections that allow the upper layers to access detailed information directly, thus capturing minor defect features more effectively. Furthermore, LAFPN integrates coordinate data during the feature fusion phase, enhancing the detection of minor defects. We introduced a new loss function to identify and locate minor defects accurately. Extensive testing on two public datasets has demonstrated the superior performance of our model compared to five baseline models, achieving an impressive 80.8 mAP on the NEU-DET dataset and 72.6 mAP on the GC10-DET dataset.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}