Computer Vision and Image Understanding: Latest Articles

Multi-modal transformer with language modality distillation for early pedestrian action anticipation
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-10 DOI: 10.1016/j.cviu.2024.104144
Abstract: Language-vision integration has become an increasingly popular research direction in computer vision. In recent years, there has been growing recognition of the importance of incorporating linguistic information into visual tasks, particularly in domains such as action anticipation. This integration allows anticipation models to leverage textual descriptions to gain deeper contextual understanding, leading to more accurate predictions. In this work, we focus on pedestrian action anticipation, where the objective is the early prediction of pedestrians' future actions in urban environments. Our method relies on a multi-modal transformer model that encodes past observations and produces predictions at different anticipation times, employing a learned mask technique to filter out redundancy in the observed frames. Instead of relying solely on visual cues extracted from images or videos, we explore the impact of integrating textual information to enrich the input modalities of our pedestrian action anticipation model. We investigate various techniques for generating descriptive captions corresponding to input images, aiming to enhance the anticipation performance. Evaluation results on available public benchmarks demonstrate the effectiveness of our method in improving the prediction performance at different anticipation times compared to previous works. Additionally, incorporating the language modality in our anticipation model yielded a significant improvement, reaching a 29.5% increase in the F1 score at 1-second anticipation and a 16.66% increase at 4-second anticipation. These results underscore the potential of language-vision integration in advancing pedestrian action anticipation in complex urban environments.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S107731422400225X/pdfft?md5=56f12e2679069b787f5e626421a0e104&pid=1-s2.0-S107731422400225X-main.pdf
Citations: 0
HBANet: A hybrid boundary-aware attention network for infrared and visible image fusion
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-10 DOI: 10.1016/j.cviu.2024.104161
Abstract: Infrared and visible image fusion is an extensively investigated problem in infrared image processing, aiming to extract useful information from source images. However, the automatic fusion of these images presents a significant challenge due to the large domain difference and ambiguous boundaries. In this article, we propose a novel image fusion approach based on hybrid boundary-aware attention, termed HBANet, which models global dependencies across the image and leverages boundary-wise prior knowledge to supplement local details. Specifically, we design a novel mixed boundary-aware attention module that is capable of leveraging spatial information to the fullest extent and integrating long-range dependencies across different domains. To preserve the integrity of texture and structural information, we introduce a loss function that comprises structure, intensity, and variation losses. In our experiments on public datasets, our method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. Furthermore, our approach also exhibits strong generalization capability, achieving satisfactory results in CT and MRI image fusion tasks.
Citations: 0
Human–object interaction detection algorithm based on graph structure and improved cascade pyramid network
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-07 DOI: 10.1016/j.cviu.2024.104162
Abstract: To address the insufficient use of human–object interaction (HOI) information and spatial location information in images, we propose a human–object interaction detection network based on graph structure and an improved cascade pyramid. The network is composed of three branches: a graph branch, a human–object branch, and a human pose branch. In the graph branch, we propose a Graph-based Interactive Feature Generation Algorithm (GIFGA) to address the inadequate utilization of interaction information. GIFGA constructs an initial dense graph model by taking humans and objects as nodes and their interaction relationships as edges. Then, by traversing each node, the graph model is updated to generate the final interaction features. In the human pose branch, we propose an Improved Cascade Pyramid Network (ICPN) to tackle the underutilization of spatial location information. ICPN extracts human pose features and maps both the object bounding boxes and the extracted human pose maps onto the global feature map to capture the most discriminative interaction-related region features within the global context. Finally, the features from the three branches are fed into a Multi-Layer Perceptron (MLP) for fusion and then classified for recognition. Experimental results demonstrate that our network achieves mAP of 54.93% and 28.69% on the V-COCO and HICO-DET datasets, respectively.
Citations: 0
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-07 DOI: 10.1016/j.cviu.2024.104164
Abstract: In recent years, multi-modal fusion methods, which select voxel centers and fuse them globally with image features across the scene, have shown excellent performance in 3D object detection. However, these approaches have two issues. First, the distribution of voxel density is highly heterogeneous due to the discrete volumes. Second, there are significant differences between the features of images and point clouds; global fusion does not take into account the correspondence between these two modalities, which leads to insufficient fusion. In this paper, we propose a new multi-modal fusion method named Voxel-Image Dynamic Fusion (VIDF). Specifically, VIDF-Net is composed of the Voxel Centroid Mapping module (VCM) and the Deformable Attention Fusion module (DAF). The Voxel Centroid Mapping module calculates the centroid of voxel features and maps it onto the image plane, which locates the position of voxel features more effectively. We then use the Deformable Attention Fusion module to dynamically calculate the offset of each voxel centroid from the image position and combine the two modalities. Furthermore, we propose a Region Proposal Network with Channel-Spatial Aggregate to combine channel and spatial attention maps for improved multi-scale feature interaction. We conduct extensive experiments on the KITTI dataset to demonstrate the performance of the proposed VIDF network. In particular, significant improvements are observed in the Hard categories of Cars and Pedestrians, which shows the effectiveness of our approach in dealing with complex scenarios.
Citations: 0
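The VIDF-Net abstract above describes computing a voxel centroid and mapping it onto the image plane. A minimal sketch of that general idea follows; it is not the authors' implementation. It assumes the points are already expressed in camera coordinates and that a 3x4 pinhole projection matrix is available; the function names `voxel_centroid` and `project_to_image` and the sample matrix `P` are hypothetical.

```python
def voxel_centroid(points):
    """Return the mean (x, y, z) of the points that fall inside one voxel."""
    if not points:
        raise ValueError("voxel contains no points")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def project_to_image(point, P):
    """Project a 3D point in camera coordinates with a 3x4 matrix P.

    Returns pixel coordinates (u, v). Assumption: the point has already
    been transformed from the LiDAR frame into the camera frame.
    """
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    u, v, w = (sum(row[i] * homogeneous[i] for i in range(4)) for row in P)
    if w <= 0:
        raise ValueError("point is behind the camera")
    return (u / w, v / w)


# Hypothetical pinhole camera: focal length 100 px, principal point (50, 40).
P = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 40.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
centroid = voxel_centroid([(1.0, 0.0, 4.0), (3.0, 2.0, 6.0)])
print(centroid)                       # (2.0, 1.0, 5.0)
print(project_to_image(centroid, P))  # (90.0, 60.0)
```

Using the centroid rather than the geometric voxel center places the sample where the points actually are, which matters when the point density inside a voxel is uneven.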
HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-07 DOI: 10.1016/j.cviu.2024.104151
Abstract:
Objectives: Accurate extraction of regions of interest (ROI) with variable shapes and scales is one of the primary challenges in medical image segmentation. Current U-based networks mostly aggregate multi-stage encoding outputs as an improved multi-scale skip connection. Although this design has been proven to provide scale diversity and contextual integrity, several limitations remain: (i) the encoding outputs are simply resampled to the same size, which destroys fine-grained information, so the advantages of using multiple scales are not fully exploited; (ii) redundant information proportional to the feature dimension size is introduced and causes multi-stage interference; and (iii) the precision of information delivery relies on the up-sampling and down-sampling layers, but guidance on maintaining consistency in feature locations and trends between them is lacking.
Methods: To address these limitations, this paper proposes a U-based CNN named HAD-Net, which assembles a new hyper-scale shifted aggregating module (HSAM) paradigm and progressive reusing attention (PRA) for skip connections, and employs a novel pair of dual-branch parameter-free sampling layers, i.e. max-diagonal pooling (MDP) and max-diagonal un-pooling (MDUP). The aggregating scheme additionally combines five subregions with certain offsets in the shallower stage, since the lower scale-down ratios of subregions enrich scales and fine-grained context. The attention scheme contains a partial-to-global channel attention (PGCA) and a multi-scale reusing spatial attention (MRSA); it builds reusing connections internally and adjusts the focus toward more useful dimensions. Finally, MDP and MDUP are used in pairs to improve texture delivery and feature consistency, enhancing information retention and avoiding positional confusion.
Results: Compared to state-of-the-art networks, HAD-Net achieves comparable or better performance, with Dice of 90.13%, 81.51%, and 75.43% for each class on BraTS20, 89.59% Dice and 98.56% AUC on Kvasir-SEG, and 82.17% Dice and 98.05% AUC on DRIVE.
Conclusions: The scheme of HSAM+PRA+MDP+MDUP has been shown to be a remarkable improvement and leaves room for further research.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1077314224002327/pdfft?md5=8776295cbe51596acb5f3c2feb76b9bf&pid=1-s2.0-S1077314224002327-main.pdf
Citations: 0
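The HAD-Net results above are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient on flat binary masks. The function name `dice_coefficient` and the empty-mask convention are assumptions made for illustration; the paper's own evaluation code is not shown in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
# Two overlapping pixels, three foreground pixels in each mask: 4 / 6.
print(round(dice_coefficient(pred, target), 4))  # 0.6667
```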
Targeted adversarial attack on classic vision pipelines
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-06 DOI: 10.1016/j.cviu.2024.104140
Abstract: Deep networks are susceptible to adversarial attacks. End-to-end differentiability of deep networks provides the analytical formulation that has aided the proliferation of diverse adversarial attacks. In contrast, handcrafted pipelines (local feature matching, bag-of-words based place recognition, and visual tracking) consist of intuitive approaches and perhaps lack an end-to-end formal description. In this work, we show that classic handcrafted pipelines are also susceptible to adversarial attacks. We propose a novel targeted adversarial attack for multiple well-known handcrafted pipelines and datasets. Our attack is able to match an image with any given target image, which can be completely different from the original image. Our approach manages to attack simple (image registration) as well as sophisticated multi-stage (place recognition (FAB-MAP), visual tracking (ORB-SLAM3)) pipelines. We outperform multiple baselines over different public datasets (Places, KITTI and HPatches). Our analysis shows that, although these pipelines are vulnerable, achieving true imperceptibility is harder in the case of a targeted attack on handcrafted pipelines. To this end, we propose a stealthy attack where the noise is perceptible but appears benign. To assist the community in further examining the weaknesses of popular handcrafted pipelines, we release our code.
Citations: 0
DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-06 DOI: 10.1016/j.cviu.2024.104147
Abstract: The estimation of 3D human poses from monocular videos presents a significant challenge. Existing methods face the problems of depth ambiguity and self-occlusion. To overcome these problems, we propose a Double-Branch Multi-Hypothesis Transformer (DBMHT). In detail, we use a double-branch architecture to capture temporal and spatial information and generate multiple hypotheses. To merge these hypotheses, we adopt a lightweight module to integrate spatial and temporal representations. DBMHT can not only capture spatial information from each joint in the human body and temporal information from each frame in the video but also merge multiple hypotheses that carry different spatio-temporal information. Comprehensive evaluation on two challenging datasets (i.e. Human3.6M and MPI-INF-3DHP) demonstrates the superior performance of DBMHT, marking it as a robust and efficient approach for accurate 3D human pose estimation (HPE) in dynamic scenarios. The results show that our model surpasses the state-of-the-art approach by 1.9% MPJPE with ground truth 2D keypoints as input.
Citations: 0
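The DBMHT result above is stated in terms of MPJPE (mean per-joint position error). A minimal sketch of the usual definition, the average Euclidean distance between predicted and ground-truth joints, is given below. The function name `mpjpe` and the toy coordinates are illustrative assumptions; details such as root-joint alignment in the paper's protocol are not covered.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between matching predicted and true joints."""
    if len(pred_joints) != len(gt_joints) or not pred_joints:
        raise ValueError("need two non-empty joint lists of equal length")
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)


# Toy pose with two joints; per-joint errors are 3.0 and 4.0.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(mpjpe(pred, gt))  # 3.5
```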
Continuous fake media detection: Adapting deepfake detectors to new generative techniques
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-06 DOI: 10.1016/j.cviu.2024.104143
Abstract: Generative techniques continue to evolve at an impressively high rate, driven by the hype about these technologies. This rapid advancement severely limits the application of deepfake detectors, which, despite numerous efforts by the scientific community, struggle to achieve sufficiently robust performance against the ever-changing content. To address these limitations, in this paper, we propose an analysis of two continual learning techniques on a Short and a Long sequence of fake media. Both sequences include a complex and heterogeneous range of deepfakes (generated images and videos) from GANs, computer graphics techniques, and unknown sources. Our experiments show that continual learning could be important in mitigating the need for generalizability. In fact, we show that, although with some limitations, continual learning methods help to maintain good performance across the entire training sequence. For these techniques to work in a sufficiently robust way, however, the tasks in the sequence must share similarities. According to our experiments, the order and similarity of the tasks can affect the performance of the models over time. To address this problem, we show that it is possible to group tasks based on their similarity. This small measure allows for a significant improvement even in longer sequences. This result suggests that continual techniques can be combined with the most promising detection methods, allowing them to catch up with the latest generative techniques. In addition, we propose an overview of how this learning approach can be integrated into a deepfake detection pipeline for continuous integration and continuous deployment (CI/CD). This makes it possible to keep track of different sources, such as social networks, new generative tools, or third-party datasets, and, through the integration of continual learning, allows constant maintenance of the detectors.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1077314224002248/pdfft?md5=055418833f110c748b5c22d95d3c42b9&pid=1-s2.0-S1077314224002248-main.pdf
Citations: 0
Agglomerator++: Interpretable part-whole hierarchies and latent space representations in neural networks
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-06 DOI: 10.1016/j.cviu.2024.104159
Abstract: Deep neural networks achieve outstanding results in a large variety of tasks, often outperforming human experts. However, a known limitation of current neural architectures is the difficulty of understanding and interpreting the network's response to a given input. This is directly related to the huge number of variables and the associated non-linearities of neural models, which are often used as black boxes. This lack of transparency, particularly in crucial areas like autonomous driving, security, and healthcare, can trigger skepticism and limit trust, despite the networks' high performance. In this work, we aim to advance interpretability in neural networks. We present Agglomerator++, a framework capable of providing a representation of part-whole hierarchies from visual cues and organizing the input distribution to match the conceptual-semantic hierarchical structure between classes. We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100, showing that our solution delivers a more interpretable model compared to other state-of-the-art approaches. Our code is available at https://mmlab-cv.github.io/Agglomeratorplusplus/.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1077314224002406/pdfft?md5=ad401203069cc93800237abddffe0b0d&pid=1-s2.0-S1077314224002406-main.pdf
Citations: 0
Pyramid transformer-based triplet hashing for robust visual place recognition
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2024-09-06 DOI: 10.1016/j.cviu.2024.104167
Abstract: Deep hashing is used for approximate nearest neighbor search in large-scale image recognition problems. However, CNN architectures have dominated such applications. In this study, we present a Pyramid Transformer-based Triplet Hashing architecture to handle large-scale place recognition challenges, leveraging the capabilities of the Vision Transformer (ViT). For feature representation, we create a Siamese Pyramid Transformer backbone. We present a multi-scale feature aggregation technique to learn discriminative, scale-invariant features. In addition, we observe that binary codes suitable for place recognition are sub-optimal. To overcome this issue, we use a self-restraint triplet loss deep learning network to create compact hash codes, further increasing recognition accuracy. To the best of our knowledge, this is the first study to use a triplet loss deep learning network to handle the deep hashing learning problem. We conduct extensive experiments on four challenging place datasets: KITTI, Nordland, VPRICE, and EuRoC. The experimental findings show that the proposed technique achieves state-of-the-art performance on large-scale visual place recognition.
Citations: 0
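The triplet hashing abstract above combines a triplet loss with compact binary codes. The sketch below illustrates the generic ingredients only: a standard margin-based triplet loss on real-valued features, sign binarization, and Hamming distance for retrieval. It does not reproduce the paper's "self-restraint" variant, whose definition is not given in the abstract; all function names and sample vectors are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull positive closer than negative."""
    return max(0.0, squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin)


def binarize(features):
    """Threshold real-valued features at zero to obtain a 0/1 hash code."""
    return [1 if x > 0 else 0 for x in features]


def hamming(code_a, code_b):
    """Number of bit positions in which two hash codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))


# Hypothetical features: the positive shows the same place as the anchor.
anchor = [0.9, -0.8, 0.7, -0.6]
positive = [0.8, -0.9, 0.6, -0.7]
negative = [-0.7, 0.9, -0.8, 0.5]

print(triplet_loss(anchor, positive, negative))        # 0.0, margin satisfied
print(hamming(binarize(anchor), binarize(positive)))   # 0
print(hamming(binarize(anchor), binarize(negative)))   # 4
```

At query time only the binary codes are compared, so retrieval reduces to cheap bit operations; the triplet loss is what encourages same-place images to land on nearby codes during training.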