International Journal of Computer Vision: Latest Articles

StyleAdapter: A Unified Stylized Image Generation Model
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-25 | DOI: 10.1007/s11263-024-02253-x
Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo
{"title":"StyleAdapter: A Unified Stylized Image Generation Model","authors":"Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo","doi":"10.1007/s11263-024-02253-x","DOIUrl":"https://doi.org/10.1007/s11263-024-02253-x","url":null,"abstract":"<p>This work focuses on generating high-quality images with specific style of reference images and content of provided textual descriptions. Current leading algorithms, i.e., DreamBooth and LoRA, require fine-tuning for each style, leading to time-consuming and computationally expensive processes. In this work, we propose StyleAdapter, a unified stylized image generation model capable of producing a variety of stylized images that match both the content of a given prompt and the style of reference images, without the need for per-style fine-tuning. It introduces a two-path cross-attention (TPCA) module to separately process style information and textual prompt, which cooperate with a semantic suppressing vision model (SSVM) to suppress the semantic content of style images. In this way, it can ensure that the prompt maintains control over the content of the generated images, while also mitigating the negative impact of semantic information in style references. This results in the content of the generated image adhering to the prompt, and its style aligning with the style references. Besides, our StyleAdapter can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet, to attain a more controllable and stable generation process. Extensive experiments demonstrate the superiority of our method over previous works.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"60 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142489489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
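The abstract describes the TPCA module only at a high level. The following is a minimal PyTorch sketch of what a two-path cross-attention block could look like: the latent image tokens query the text prompt and the style reference through two independent attention paths. The class name, the residual fusion, and the learnable style gate are assumptions, not the paper's released architecture.

```python
import torch
import torch.nn as nn

class TwoPathCrossAttention(nn.Module):
    """Hypothetical TPCA-style block: latent tokens attend to the text
    prompt and the style reference separately, then the two results are
    fused so the prompt keeps control over content."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.style_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.tensor(0.5))  # learnable style weight

    def forward(self, latent, text_emb, style_emb):
        # latent: (B, N, dim); text_emb: (B, T, dim); style_emb: (B, S, dim)
        t, _ = self.text_attn(latent, text_emb, text_emb)
        s, _ = self.style_attn(latent, style_emb, style_emb)
        return latent + t + self.gate * s  # residual fusion of both paths
```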
Sample Correlation for Fingerprinting Deep Face Recognition
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-25 | DOI: 10.1007/s11263-024-02254-w
Jiyang Guan, Jian Liang, Yanbo Wang, Ran He
{"title":"Sample Correlation for Fingerprinting Deep Face Recognition","authors":"Jiyang Guan, Jian Liang, Yanbo Wang, Ran He","doi":"10.1007/s11263-024-02254-w","DOIUrl":"https://doi.org/10.1007/s11263-024-02254-w","url":null,"abstract":"<p>Face recognition has witnessed remarkable advancements in recent years, thanks to the development of deep learning techniques. However, an off-the-shelf face recognition model as a commercial service could be stolen by model stealing attacks, posing great threats to the rights of the model owner. Model fingerprinting, as a model stealing detection method, aims to verify whether a suspect model is stolen from the victim model, gaining more and more attention nowadays. Previous methods always utilize transferable adversarial examples as the model fingerprint, but this method is known to be sensitive to adversarial defense and transfer learning techniques. To address this issue, we consider the pairwise relationship between samples instead and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-JC that selects JPEG compressed samples as model inputs and calculates the correlation matrix among their model outputs. Extensive results validate that SAC successfully defends against various model stealing attacks in deep face recognition, encompassing face verification and face emotion recognition, exhibiting the highest performance in terms of AUC, <i>p</i>-value and F1 score. Furthermore, we extend our evaluation of SAC-JC to object recognition datasets including Tiny-ImageNet and CIFAR10, which also demonstrates the superior performance of SAC-JC to previous methods. The code will be available at https://github.com/guanjiyang/SAC_JC.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"75 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142490657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
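The core of SAC-JC is concrete enough to sketch: JPEG-compress a set of probe samples, run them through both models, and compare the pairwise correlation structure of the outputs rather than the outputs themselves. In the sketch below, the cosine-similarity choice, the JPEG quality, and the mean-absolute-difference distance are assumptions; the paper may use a different correlation measure and decision rule.

```python
import io
import numpy as np
from PIL import Image

def jpeg_compress(img: Image.Image, quality: int = 60) -> Image.Image:
    """Re-encode an image as JPEG in memory; SAC-JC feeds such compressed
    probes to both models. The quality setting is an assumption."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def correlation_matrix(outputs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities among model outputs for n probes
    (outputs: (n, d)); this is the 'sample correlation' fingerprint."""
    z = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    return z @ z.T

def fingerprint_distance(victim_out: np.ndarray, suspect_out: np.ndarray) -> float:
    """Distance between the two correlation structures; a suspiciously
    small value suggests the suspect was derived from the victim."""
    diff = correlation_matrix(victim_out) - correlation_matrix(suspect_out)
    return float(np.abs(diff).mean())
```

A plausible decision protocol would compare this distance against the distances to independently trained reference models and flag the suspect when its distance is clearly smaller.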
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-24 | DOI: 10.1007/s11263-024-02271-9
David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou
{"title":"Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation","authors":"David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou","doi":"10.1007/s11263-024-02271-9","DOIUrl":"https://doi.org/10.1007/s11263-024-02271-9","url":null,"abstract":"<p>Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video Diffusion Models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high computational costs, or on latent-based VDMs, which often struggle with precise text-video alignment. In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation. Our model first uses pixel-based VDMs to produce a low-resolution video of strong text-video correlation. After that, we propose a novel expert translation method that employs the latent-based VDMs to further upsample the low-resolution video to high resolution, which can also remove potential artifacts and corruptions from low-resolution videos. Compared to latent VDMs, Show-1 can produce high-quality videos of precise text-video alignment; Compared to pixel VDMs, Show-1 is much more efficient (GPU memory usage during inference is 15 G vs. 72 G). Furthermore, our Show-1 model can be readily adapted for motion customization and video stylization applications through simple temporal attention layer finetuning. Our model achieves state-of-the-art performance on standard video generation benchmarks. Code of Show-1 is publicly available and more videos can be found here.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"98 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142489488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
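The two-stage design reduces to a simple composition, sketched below under assumed interfaces: `pixel_vdm` and `latent_upsampler` are hypothetical callables standing in for the pixel-based low-resolution generator and the latent-based expert-translation upsampler, not Show-1's released API; the real pipeline may include further stages (e.g., temporal interpolation) not shown here.

```python
def generate_video(prompt: str, pixel_vdm, latent_upsampler,
                   num_frames: int = 16, low_res: tuple = (40, 64)):
    """Hypothetical composition of the two stages described in the abstract."""
    # Stage 1: pixel-space VDM -> cheap low-res video with strong text alignment.
    low_res_video = pixel_vdm(prompt, num_frames=num_frames, size=low_res)
    # Stage 2: latent-space expert translation -> upsample and clean artifacts.
    return latent_upsampler(prompt, video=low_res_video, scale_factor=8)
```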
Neural Vector Fields for Implicit Surface Representation and Inference
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-22 | DOI: 10.1007/s11263-024-02251-z
Edoardo Mello Rella, Ajad Chhatkuli, Ender Konukoglu, Luc Van Gool
{"title":"Neural Vector Fields for Implicit Surface Representation and Inference","authors":"Edoardo Mello Rella, Ajad Chhatkuli, Ender Konukoglu, Luc Van Gool","doi":"10.1007/s11263-024-02251-z","DOIUrl":"https://doi.org/10.1007/s11263-024-02251-z","url":null,"abstract":"<p>Neural implicit fields have recently shown increasing success in representing, learning and analysis of 3D shapes. Signed distance fields and occupancy fields are still the preferred choice of implicit representations with well-studied properties, despite their restriction to closed surfaces. With neural networks, unsigned distance fields as well as several other variations and training principles have been proposed with the goal to represent all classes of shapes. In this paper, we develop a novel and yet a fundamental representation considering unit vectors in 3D space and call it Vector Field (VF). At each point in <span>(mathbb {R}^3)</span>, VF is directed to the closest point on the surface. We theoretically demonstrate that VF can be easily transformed to surface density by computing the flux density. Unlike other standard representations, VF directly encodes an important physical property of the surface, its normal. We further show the advantages of VF representation, in learning open, closed, or multi-layered surfaces. We show that, thanks to the continuity property of the neural optimization with VF, a separate distance field becomes unnecessary for extracting surfaces from the implicit field via Marching Cubes. We compare our method on several datasets including ShapeNet where the proposed new neural implicit field shows superior accuracy in representing any type of shape, outperforming other standard methods. Codes are available at https://github.com/edomel/ImplicitVF.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"66 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
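Since VF is defined as the unit vector toward the closest surface point, ground-truth regression targets can be computed directly from a sampled surface. A minimal sketch with SciPy follows; the nearest-neighbor construction is generic and not taken from the paper's training code.

```python
import numpy as np
from scipy.spatial import cKDTree

def vector_field_targets(queries: np.ndarray, surface_pts: np.ndarray) -> np.ndarray:
    """For each query point in R^3, return the unit vector pointing to its
    nearest point on the (sampled) surface -- the VF value at that point.
    queries: (m, 3); surface_pts: (k, 3)."""
    tree = cKDTree(surface_pts)
    _, idx = tree.query(queries)            # nearest surface sample per query
    v = surface_pts[idx] - queries          # direction toward the surface
    n = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.clip(n, 1e-9, None)       # normalize (zero-safe)
```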
Learning Text-to-Video Retrieval from Image Captioning
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-22 | DOI: 10.1007/s11263-024-02202-8
Lucas Ventura, Cordelia Schmid, Gül Varol
{"title":"Learning Text-to-Video Retrieval from Image Captioning","authors":"Lucas Ventura, Cordelia Schmid, Gül Varol","doi":"10.1007/s11263-024-02202-8","DOIUrl":"https://doi.org/10.1007/s11263-024-02202-8","url":null,"abstract":"<p>We describe a protocol to study text-to-video retrieval training with unlabeled videos, where we assume (i) no access to labels for any videos, i.e., no access to the set of ground-truth captions, but (ii) access to labeled images in the form of text. Using image expert models is a realistic scenario given that annotating images is cheaper therefore scalable, in contrast to expensive video labeling schemes. Recently, zero-shot image experts such as CLIP have established a new strong baseline for video understanding tasks. In this paper, we make use of this progress and instantiate the image experts from two types of models: a text-to-image retrieval model to provide an initial backbone, and image captioning models to provide supervision signal into unlabeled videos. We show that automatically labeling video frames with image captioning allows text-to-video retrieval training. This process adapts the features to the target domain at no manual annotation cost, consequently outperforming the strong zero-shot CLIP baseline. During training, we sample captions from multiple video frames that best match the visual content, and perform a temporal pooling over frame representations by scoring frames according to their relevance to each caption. We conduct extensive ablations to provide insights and demonstrate the effectiveness of this simple framework by outperforming the CLIP zero-shot baselines on text-to-video retrieval on three standard datasets, namely ActivityNet, MSR-VTT, and MSVD. Code and models will be made publicly available.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"13 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
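The relevance-weighted temporal pooling is simple to make concrete. A minimal sketch, assuming the frame and caption features already come from a CLIP-style encoder and a captioner upstream; the softmax temperature is an assumed value.

```python
import torch
import torch.nn.functional as F

def pool_frames(frame_feats: torch.Tensor, caption_feat: torch.Tensor,
                tau: float = 0.07) -> torch.Tensor:
    """Relevance-weighted temporal pooling as described in the abstract:
    frames are scored against a (pseudo-)caption and their features are
    averaged with softmax weights.
    frame_feats: (T, D) per-frame features; caption_feat: (D,)."""
    frame_feats = F.normalize(frame_feats, dim=-1)
    caption_feat = F.normalize(caption_feat, dim=-1)
    scores = frame_feats @ caption_feat               # cosine sim per frame
    w = torch.softmax(scores / tau, dim=0)            # relevance weights
    return (w.unsqueeze(-1) * frame_feats).sum(dim=0) # (D,) video feature
```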
CogCartoon: Towards Practical Story Visualization
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-21 | DOI: 10.1007/s11263-024-02267-5
Zhongyang Zhu, Jie Tang
{"title":"CogCartoon: Towards Practical Story Visualization","authors":"Zhongyang Zhu, Jie Tang","doi":"10.1007/s11263-024-02267-5","DOIUrl":"https://doi.org/10.1007/s11263-024-02267-5","url":null,"abstract":"<p>The state-of-the-art methods for story visualization demonstrate a significant demand for training data and storage, as well as limited flexibility in story presentation, thereby rendering them impractical for real-world applications. We introduce CogCartoon, a practical story visualization method based on pre-trained diffusion models. To alleviate dependence on data and storage, we propose an innovative strategy of character-plugin generation that can represent a specific character as a compact 316 KB plugin by using a few training samples. To facilitate enhanced flexibility, we employ a strategy of plugin-guided and layout-guided inference, enabling users to seamlessly incorporate new characters and custom layouts into the generated image results at their convenience. We have conducted comprehensive qualitative and quantitative studies, providing compelling evidence for the superiority of CogCartoon over existing methodologies. Moreover, CogCartoon demonstrates its power in tackling challenging tasks, including long story visualization and realistic style story visualization.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"45 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-21 | DOI: 10.1007/s11263-024-02252-y
Hanbo Bi, Yingchao Feng, Yongqiang Mao, Jianning Pei, Wenhui Diao, Hongqi Wang, Xian Sun
{"title":"AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing","authors":"Hanbo Bi, Yingchao Feng, Yongqiang Mao, Jianning Pei, Wenhui Diao, Hongqi Wang, Xian Sun","doi":"10.1007/s11263-024-02252-y","DOIUrl":"https://doi.org/10.1007/s11263-024-02252-y","url":null,"abstract":"<p>Few-shot Segmentation aims to segment the interested objects in the query image with just a handful of labeled samples (i.e., support images). Previous schemes would leverage the similarity between support-query pixel pairs to construct the pixel-level semantic correlation. However, in remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce tremendous mismatches, resulting in semantic ambiguity between the query foreground (FG) and background (BG) pixels. To tackle this problem, we propose a novel Agent Mining Transformer, which adaptively mines a set of local-aware agents to construct agent-level semantic correlation. Compared with pixel-level semantics, the given agents are equipped with local-contextual information and possess a broader receptive field. At this point, different query pixels can selectively aggregate the fine-grained local semantics of different agents, thereby enhancing the semantic clarity between query FG and BG pixels. Concretely, the Agent Learning Encoder is first proposed to erect the optimal transport plan that arranges different agents to aggregate support semantics under different local regions. Then, for further optimizing the agents, the Agent Aggregation Decoder and the Semantic Alignment Decoder are constructed to break through the limited support set for mining valuable class-specific semantics from unlabeled data sources and the query image itself, respectively. Extensive experiments on the remote sensing benchmark iSAID indicate that the proposed method achieves state-of-the-art performance. Surprisingly, our method remains quite competitive when extended to more common natural scenarios, i.e., PASCAL-<span>(5^i)</span> and COCO-<span>(20^{i})</span>.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"13 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
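Agent-level correlation can be sketched as two attention passes: a small set of learned agents first collects support semantics, then query pixels attend to the agents instead of to raw support pixels, which widens the effective receptive field. The sketch below omits AgMTR's optimal-transport agent learning and its two extra decoders; the module name and sizes are assumptions.

```python
import torch
import torch.nn as nn

class AgentCorrelation(nn.Module):
    """Minimal two-pass agent attention in the spirit of the abstract."""

    def __init__(self, dim: int, num_agents: int = 8, heads: int = 4):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(num_agents, dim))
        self.collect = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.distribute = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_feats, support_feats):
        # query_feats: (B, Nq, dim); support_feats: (B, Ns, dim)
        b = query_feats.size(0)
        agents = self.agents.unsqueeze(0).expand(b, -1, -1)
        # Pass 1: agents aggregate local support semantics.
        agents, _ = self.collect(agents, support_feats, support_feats)
        # Pass 2: each query pixel selectively reads from the agents.
        out, _ = self.distribute(query_feats, agents, agents)
        return query_feats + out
```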
Interweaving Insights: High-Order Feature Interaction for Fine-Grained Visual Recognition
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-20 | DOI: 10.1007/s11263-024-02260-y
Arindam Sikdar, Yonghuai Liu, Siddhardha Kedarisetty, Yitian Zhao, Amr Ahmed, Ardhendu Behera
{"title":"Interweaving Insights: High-Order Feature Interaction for Fine-Grained Visual Recognition","authors":"Arindam Sikdar, Yonghuai Liu, Siddhardha Kedarisetty, Yitian Zhao, Amr Ahmed, Ardhendu Behera","doi":"10.1007/s11263-024-02260-y","DOIUrl":"https://doi.org/10.1007/s11263-024-02260-y","url":null,"abstract":"<p>This paper presents a novel approach for Fine-Grained Visual Classification (FGVC) by exploring Graph Neural Networks (GNNs) to facilitate high-order feature interactions, with a specific focus on constructing both inter- and intra-region graphs. Unlike previous FGVC techniques that often isolate global and local features, our method combines both features seamlessly during learning via graphs. Inter-region graphs capture long-range dependencies to recognize global patterns, while intra-region graphs delve into finer details within specific regions of an object by exploring high-dimensional convolutional features. A key innovation is the use of shared GNNs with an attention mechanism coupled with the Approximate Personalized Propagation of Neural Predictions (APPNP) message-passing algorithm, enhancing information propagation efficiency for better discriminability and simplifying the model architecture for computational efficiency. Additionally, the introduction of residual connections improves performance and training stability. Comprehensive experiments showcase state-of-the-art results on benchmark FGVC datasets, affirming the efficacy of our approach. This work underscores the potential of GNN in modeling high-level feature interactions, distinguishing it from previous FGVC methods that typically focus on singular aspects of feature representation. Our source code is available at https://github.com/Arindam-1991/I2-HOFI.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"107 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
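APPNP itself is a published, well-defined propagation rule, so that building block can be shown exactly: features H are propagated over the normalized graph for K steps while a teleport term keeps a fraction alpha of the initial features at every step. The dense-matrix form below is for clarity; a real implementation over large graphs would use sparse operations.

```python
import torch

def appnp(h: torch.Tensor, adj_norm: torch.Tensor,
          alpha: float = 0.1, k: int = 10) -> torch.Tensor:
    """APPNP propagation: Z <- (1 - alpha) * A_hat @ Z + alpha * H.
    h: (N, D) initial node features/predictions; adj_norm: (N, N)
    symmetrically normalized adjacency with self-loops."""
    z = h
    for _ in range(k):
        z = (1.0 - alpha) * adj_norm @ z + alpha * h
    return z
```

The defaults alpha = 0.1 and K = 10 follow the original APPNP paper; the settings used in this FGVC model may differ.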
On the Generalization and Causal Explanation in Self-Supervised Learning
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-19 | DOI: 10.1007/s11263-024-02263-9
Wenwen Qiang, Zeen Song, Ziyin Gu, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong
{"title":"On the Generalization and Causal Explanation in Self-Supervised Learning","authors":"Wenwen Qiang, Zeen Song, Ziyin Gu, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong","doi":"10.1007/s11263-024-02263-9","DOIUrl":"https://doi.org/10.1007/s11263-024-02263-9","url":null,"abstract":"<p>Self-supervised learning (SSL) methods learn from unlabeled data and achieve high generalization performance on downstream tasks. However, they may also suffer from overfitting to their training data and lose the ability to adapt to new tasks. To investigate this phenomenon, we conduct experiments on various SSL methods and datasets and make two observations: (1) Overfitting occurs abruptly in later layers and epochs, while generalizing features are learned in early layers for all epochs; (2) Coding rate reduction can be used as an indicator to measure the degree of overfitting in SSL models. Based on these observations, we propose Undoing Memorization Mechanism (UMM), a plug-and-play method that mitigates overfitting of the pre-trained feature extractor by aligning the feature distributions of the early and the last layers to maximize the coding rate reduction of the last layer output. The learning process of UMM is a bi-level optimization process. We provide a causal analysis of UMM to explain how UMM can help the pre-trained feature extractor overcome overfitting and recover generalization. We also demonstrate that UMM significantly improves the generalization performance of SSL methods on various downstream tasks. The source code is to be released at https://github.com/ZeenSong/UMM.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"16 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
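The coding-rate indicator has a standard closed form in the MCR² literature, which the sketch below follows: the coding rate of a feature batch grows with the volume the features span, so a drop at the last layer signals collapsing (overfitting) representations. Whether UMM uses exactly this estimator and this epsilon is an assumption.

```python
import torch

def coding_rate(z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Coding rate of a feature batch z of shape (n, d):
    R = 1/2 * logdet(I_d + d / (n * eps^2) * z^T z),
    equivalent to the (I_n + ... z z^T) form by the determinant identity.
    Larger values mean the features span more volume."""
    n, d = z.shape
    eye = torch.eye(d, dtype=z.dtype, device=z.device)
    return 0.5 * torch.logdet(eye + (d / (n * eps ** 2)) * z.T @ z)

# Hypothetical monitoring use: track the last-layer coding rate per epoch
# and watch for the abrupt drop described in observation (2).
rate = coding_rate(torch.randn(256, 128))
```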
Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample
IF 19.5 | CAS Q2 | Computer Science
International Journal of Computer Vision | Pub Date: 2024-10-17 | DOI: 10.1007/s11263-024-02258-6
Zhiwen Shao, Hancheng Zhu, Yong Zhou, Xiang Xiang, Bing Liu, Rui Yao, Lizhuang Ma
{"title":"Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample","authors":"Zhiwen Shao, Hancheng Zhu, Yong Zhou, Xiang Xiang, Bing Liu, Rui Yao, Lizhuang Ma","doi":"10.1007/s11263-024-02258-6","DOIUrl":"https://doi.org/10.1007/s11263-024-02258-6","url":null,"abstract":"<p>Facial action unit (AU) detection remains a challenging task, due to the subtlety, dynamics, and diversity of AUs. Recently, the prevailing techniques of self-attention and causal inference have been introduced to AU detection. However, most existing methods directly learn self-attention guided by AU detection, or employ common patterns for all AUs during causal intervention. The former often captures irrelevant information in a global range, and the latter ignores the specific causal characteristic of each AU. In this paper, we propose a novel AU detection framework called <span>(textrm{AC}^{2})</span>D by adaptively constraining self-attention weight distribution and causally deconfounding the sample confounder. Specifically, we explore the mechanism of self-attention weight distribution, in which the self-attention weight distribution of each AU is regarded as spatial distribution and is adaptively learned under the constraint of location-predefined attention and the guidance of AU detection. Moreover, we propose a causal intervention module for each AU, in which the bias caused by training samples and the interference from irrelevant AUs are both suppressed. Extensive experiments show that our method achieves competitive performance compared to state-of-the-art AU detection approaches on challenging benchmarks, including BP4D, DISFA, GFT, and BP4D+ in constrained scenarios and Aff-Wild2 in unconstrained scenarios.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"232 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142448787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
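One plausible reading of the "constraint of location-predefined attention" is a divergence penalty pulling each AU's learned attention map toward a predefined spatial prior (e.g., a normalized Gaussian placed around the AU's associated facial landmarks), while the detection loss keeps the map adaptive. The sketch below implements that reading; both the KL choice and the prior construction are assumptions, not the paper's stated loss.

```python
import torch
import torch.nn.functional as F

def attention_constraint_loss(attn: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """attn:  (B, N) learned self-attention distribution over N positions.
    prior: (B, N) predefined spatial prior for the AU (rows sum to 1).
    Returns KL(prior || attn), averaged over the batch."""
    log_attn = attn.clamp_min(1e-8).log()
    return F.kl_div(log_attn, prior, reduction="batchmean")
```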