arXiv - CS - Computer Vision and Pattern Recognition: Latest Articles

MagicStyle: Portrait Stylization Based on Reference Image
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08156
Zhaoli Deng, Kaibin Zhou, Fanyi Wang, Zhenpeng Mi
{"title":"MagicStyle: Portrait Stylization Based on Reference Image","authors":"Zhaoli Deng, Kaibin Zhou, Fanyi Wang, Zhenpeng Mi","doi":"arxiv-2409.08156","DOIUrl":"https://doi.org/arxiv-2409.08156","url":null,"abstract":"The development of diffusion models has significantly advanced the research\u0000on image stylization, particularly in the area of stylizing a content image\u0000based on a given style image, which has attracted many scholars. The main\u0000challenge in this reference image stylization task lies in how to maintain the\u0000details of the content image while incorporating the color and texture features\u0000of the style image. This challenge becomes even more pronounced when the\u0000content image is a portrait which has complex textural details. To address this\u0000challenge, we propose a diffusion model-based reference image stylization\u0000method specifically for portraits, called MagicStyle. MagicStyle consists of\u0000two phases: Content and Style DDIM Inversion (CSDI) and Feature Fusion Forward\u0000(FFF). The CSDI phase involves a reverse denoising process, where DDIM\u0000Inversion is performed separately on the content image and the style image,\u0000storing the self-attention query, key and value features of both images during\u0000the inversion process. The FFF phase executes forward denoising, harmoniously\u0000integrating the texture and color information from the pre-stored feature\u0000queries, keys and values into the diffusion generation process based on our\u0000Well-designed Feature Fusion Attention (FFA). We conducted comprehensive\u0000comparative and ablation experiments to validate the effectiveness of our\u0000proposed MagicStyle and FFA.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
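The FFF phase described above hinges on injecting the stored content and style self-attention features back into the generation pass. The abstract does not give the exact form of FFA, so the snippet below is only a minimal PyTorch sketch of one common way to fuse pre-stored key/value pairs into an attention step; the function name and tensor shapes are assumptions for illustration.

```python
import torch

def feature_fusion_attention(q_gen, kv_content, kv_style):
    """Hypothetical sketch: attend from the current denoising queries over the
    keys/values saved during content- and style-image DDIM inversion.

    q_gen:      (B, N, D) queries from the generation pass
    kv_content: (K, V) pair saved for the content image, each (B, N, D)
    kv_style:   (K, V) pair saved for the style image, each (B, N, D)
    """
    k = torch.cat([kv_content[0], kv_style[0]], dim=1)   # (B, 2N, D)
    v = torch.cat([kv_content[1], kv_style[1]], dim=1)   # (B, 2N, D)
    attn = torch.softmax(q_gen @ k.transpose(-2, -1) * q_gen.shape[-1] ** -0.5, dim=-1)
    return attn @ v                                       # (B, N, D) fused features
```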
High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08167
Takuto Onikubo, Yusuke Matsui
{"title":"High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis","authors":"Takuto Onikubo, Yusuke Matsui","doi":"arxiv-2409.08167","DOIUrl":"https://doi.org/arxiv-2409.08167","url":null,"abstract":"Recently, text-to-image generative models have been misused to create\u0000unauthorized malicious images of individuals, posing a growing social problem.\u0000Previous solutions, such as Anti-DreamBooth, add adversarial noise to images to\u0000protect them from being used as training data for malicious generation.\u0000However, we found that the adversarial noise can be removed by adversarial\u0000purification methods such as DiffPure. Therefore, we propose a new adversarial\u0000attack method that adds strong perturbation on the high-frequency areas of\u0000images to make it more robust to adversarial purification. Our experiment\u0000showed that the adversarial images retained noise even after adversarial\u0000purification, hindering malicious image generation.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
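The core idea, concentrating the protective perturbation in high spatial frequencies, can be illustrated with a short NumPy sketch. This is not the paper's optimization-based attack; it only shows how an FFT mask restricts a perturbation to high-frequency bands, and the `eps` and `cutoff` parameters are illustrative, not values from the paper.

```python
import numpy as np

def high_frequency_perturbation(image, eps=8 / 255, cutoff=0.25, seed=0):
    """Sketch only: confine a random perturbation to high spatial frequencies
    with an FFT mask. `image` is a float array in [0, 1] of shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.sqrt(fy ** 2 + fx ** 2)
    hf_mask = (radius > cutoff * radius.max()).astype(float)[..., None]  # keep only high freqs
    noise = rng.uniform(-1.0, 1.0, size=image.shape)
    hf_noise = np.real(np.fft.ifft2(np.fft.fft2(noise, axes=(0, 1)) * hf_mask, axes=(0, 1)))
    hf_noise = eps * hf_noise / (np.abs(hf_noise).max() + 1e-8)       # bound the perturbation
    return np.clip(image + hf_noise, 0.0, 1.0)
```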
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08240
Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang
{"title":"IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation","authors":"Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang","doi":"arxiv-2409.08240","DOIUrl":"https://doi.org/arxiv-2409.08240","url":null,"abstract":"While Text-to-Image (T2I) diffusion models excel at generating visually\u0000appealing images of individual instances, they struggle to accurately position\u0000and control the features generation of multiple instances. The Layout-to-Image\u0000(L2I) task was introduced to address the positioning challenges by\u0000incorporating bounding boxes as spatial control signals, but it still falls\u0000short in generating precise instance features. In response, we propose the\u0000Instance Feature Generation (IFG) task, which aims to ensure both positional\u0000accuracy and feature fidelity in generated instances. To address the IFG task,\u0000we introduce the Instance Feature Adapter (IFAdapter). The IFAdapter enhances\u0000feature depiction by incorporating additional appearance tokens and utilizing\u0000an Instance Semantic Map to align instance-level features with spatial\u0000locations. The IFAdapter guides the diffusion process as a plug-and-play\u0000module, making it adaptable to various community models. For evaluation, we\u0000contribute an IFG benchmark and develop a verification pipeline to objectively\u0000compare models' abilities to generate instances with accurate positioning and\u0000features. Experimental results demonstrate that IFAdapter outperforms other\u0000models in both quantitative and qualitative evaluations.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
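The abstract ties instance-level appearance tokens to spatial locations through an Instance Semantic Map, but its exact construction is not given here. The following is a speculative sketch of one plausible reading: scatter each instance's token into its bounding-box region; the function name, shapes, and averaging rule are all assumptions, not the paper's definition.

```python
import torch

def instance_semantic_map(boxes, tokens, height, width):
    """Hypothetical sketch of an instance semantic map.

    boxes:  (N, 4) tensor of (x0, y0, x1, y1) in normalized [0, 1] coordinates
    tokens: (N, D) per-instance appearance embeddings
    Returns a (D, height, width) map; overlapping boxes are averaged.
    """
    d = tokens.shape[1]
    acc = torch.zeros(d, height, width)
    cnt = torch.zeros(1, height, width)
    for (x0, y0, x1, y1), tok in zip(boxes, tokens):
        r0, r1 = int(y0 * height), max(int(y1 * height), int(y0 * height) + 1)
        c0, c1 = int(x0 * width), max(int(x1 * width), int(x0 * width) + 1)
        acc[:, r0:r1, c0:c1] += tok[:, None, None]   # paint the token into the box
        cnt[:, r0:r1, c0:c1] += 1
    return acc / cnt.clamp(min=1)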
UGAD: Universal Generative AI Detector utilizing Frequency Fingerprints
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.07913
Inzamamul Alam, Muhammad Shahid Muneer, Simon S. Woo
{"title":"UGAD: Universal Generative AI Detector utilizing Frequency Fingerprints","authors":"Inzamamul Alam, Muhammad Shahid Muneer, Simon S. Woo","doi":"arxiv-2409.07913","DOIUrl":"https://doi.org/arxiv-2409.07913","url":null,"abstract":"In the wake of a fabricated explosion image at the Pentagon, an ability to\u0000discern real images from fake counterparts has never been more critical. Our\u0000study introduces a novel multi-modal approach to detect AI-generated images\u0000amidst the proliferation of new generation methods such as Diffusion models.\u0000Our method, UGAD, encompasses three key detection steps: First, we transform\u0000the RGB images into YCbCr channels and apply an Integral Radial Operation to\u0000emphasize salient radial features. Secondly, the Spatial Fourier Extraction\u0000operation is used for a spatial shift, utilizing a pre-trained deep learning\u0000network for optimal feature extraction. Finally, the deep neural network\u0000classification stage processes the data through dense layers using softmax for\u0000classification. Our approach significantly enhances the accuracy of\u0000differentiating between real and AI-generated images, as evidenced by a 12.64%\u0000increase in accuracy and 28.43% increase in AUC compared to existing\u0000state-of-the-art methods.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
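The first two steps (color-space conversion and frequency-domain radial features) can be roughly illustrated as below. This is not UGAD's Integral Radial Operation or Spatial Fourier Extraction; it is just a generic radially averaged FFT-magnitude profile of the luma channel, which is the usual starting point for frequency-fingerprint detectors, with assumed names and bin count.

```python
import numpy as np

def radial_frequency_profile(rgb, n_bins=64):
    """Sketch: convert RGB to luma (the Y of YCbCr), take the 2D FFT magnitude,
    and average it over concentric radial bins. rgb: float (H, W, 3) in [0, 1]."""
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(y)))
    h, w = y.shape
    yy, xx = np.indices((h, w))
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    bins = np.linspace(0, radius.max() + 1e-6, n_bins + 1)
    idx = np.digitize(radius, bins) - 1
    return np.array([spectrum[idx == i].mean() if np.any(idx == i) else 0.0
                     for i in range(n_bins)])
```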
Cross-Attention Based Influence Model for Manual and Nonmanual Sign Language Analysis
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08162
Lipisha Chaudhary, Fei Xu, Ifeoma Nwogu
{"title":"Cross-Attention Based Influence Model for Manual and Nonmanual Sign Language Analysis","authors":"Lipisha Chaudhary, Fei Xu, Ifeoma Nwogu","doi":"arxiv-2409.08162","DOIUrl":"https://doi.org/arxiv-2409.08162","url":null,"abstract":"Both manual (relating to the use of hands) and non-manual markers (NMM), such\u0000as facial expressions or mouthing cues, are important for providing the\u0000complete meaning of phrases in American Sign Language (ASL). Efforts have been\u0000made in advancing sign language to spoken/written language understanding, but\u0000most of these have primarily focused on manual features only. In this work,\u0000using advanced neural machine translation methods, we examine and report on the\u0000extent to which facial expressions contribute to understanding sign language\u0000phrases. We present a sign language translation architecture consisting of\u0000two-stream encoders, with one encoder handling the face and the other handling\u0000the upper body (with hands). We propose a new parallel cross-attention decoding\u0000mechanism that is useful for quantifying the influence of each input modality\u0000on the output. The two streams from the encoder are directed simultaneously to\u0000different attention stacks in the decoder. Examining the properties of the\u0000parallel cross-attention weights allows us to analyze the importance of facial\u0000markers compared to body and hand features during a translating task.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
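The influence analysis rests on reading cross-attention weights per input stream. The paper routes the two streams to separate attention stacks; the sketch below simplifies this to a single joint attention over concatenated face and body memories so that the per-stream attention mass can be split directly. Shapes, names, and the single-head formulation are assumptions.

```python
import torch

def parallel_cross_attention(q, face_kv, body_kv):
    """Sketch (not the paper's exact decoder): one decoder query set attends
    jointly over face and body encoder memories; splitting the attention mass
    between the two streams gives a rough per-modality influence score.

    q: (B, T, D); face_kv, body_kv: (K, V) pairs with K, V of shape (B, S, D)."""
    k = torch.cat([face_kv[0], body_kv[0]], dim=1)
    v = torch.cat([face_kv[1], body_kv[1]], dim=1)
    w = torch.softmax(q @ k.transpose(-2, -1) * q.shape[-1] ** -0.5, dim=-1)
    s_face = face_kv[0].shape[1]
    face_mass = w[..., :s_face].sum(-1).mean()   # average attention on the face stream
    body_mass = w[..., s_face:].sum(-1).mean()   # average attention on the body stream
    return w @ v, face_mass, body_mass
```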
Style Based Clustering of Visual Artworks
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08245
Abhishek Dangeti, Pavan Gajula, Vivek Srivastava, Vikram Jamwal
{"title":"Style Based Clustering of Visual Artworks","authors":"Abhishek Dangeti, Pavan Gajula, Vivek Srivastava, Vikram Jamwal","doi":"arxiv-2409.08245","DOIUrl":"https://doi.org/arxiv-2409.08245","url":null,"abstract":"Clustering artworks based on style has many potential real-world applications\u0000like art recommendations, style-based search and retrieval, and the study of\u0000artistic style evolution in an artwork corpus. However, clustering artworks\u0000based on style is largely an unaddressed problem. A few present methods for\u0000clustering artworks principally rely on generic image feature representations\u0000derived from deep neural networks and do not specifically deal with the\u0000artistic style. In this paper, we introduce and deliberate over the notion of\u0000style-based clustering of visual artworks. Our main objective is to explore\u0000neural feature representations and architectures that can be used for\u0000style-based clustering and observe their impact and effectiveness. We develop\u0000different methods and assess their relative efficacy for style-based clustering\u0000through qualitative and quantitative analysis by applying them to four artwork\u0000corpora and four curated synthetically styled datasets. Our analysis provides\u0000some key novel insights on architectures, feature representations, and\u0000evaluation methods suitable for style-based clustering.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
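As a concrete baseline for style-based clustering (not one of the paper's specific methods), a classic choice is to cluster Gram-matrix statistics of early CNN feature maps, which capture texture and style rather than content. The sketch below assumes torchvision's pretrained VGG16 (weights are downloaded on first use), scikit-learn's KMeans, and ImageNet-normalized inputs; function names and the layer cutoff are illustrative.

```python
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

def gram_style_features(images, layer_index=8):
    """Gram matrices of early VGG16 feature maps, flattened into style vectors.
    images: (N, 3, H, W) tensor, ImageNet-normalized."""
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:layer_index].eval()
    with torch.no_grad():
        feats = vgg(images)                        # (N, C, h, w)
    n, c, h, w = feats.shape
    f = feats.reshape(n, c, h * w)
    gram = f @ f.transpose(1, 2) / (c * h * w)     # (N, C, C) style statistics
    return gram.reshape(n, -1).numpy()

def cluster_by_style(images, n_clusters=4):
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(gram_style_features(images))
```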
Control+Shift: Generating Controllable Distribution Shifts
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.07940
Roy Friedman, Rhea Chowers
{"title":"Control+Shift: Generating Controllable Distribution Shifts","authors":"Roy Friedman, Rhea Chowers","doi":"arxiv-2409.07940","DOIUrl":"https://doi.org/arxiv-2409.07940","url":null,"abstract":"We propose a new method for generating realistic datasets with distribution\u0000shifts using any decoder-based generative model. Our approach systematically\u0000creates datasets with varying intensities of distribution shifts, facilitating\u0000a comprehensive analysis of model performance degradation. We then use these\u0000generated datasets to evaluate the performance of various commonly used\u0000networks and observe a consistent decline in performance with increasing shift\u0000intensity, even when the effect is almost perceptually unnoticeable to the\u0000human eye. We see this degradation even when using data augmentations. We also\u0000find that enlarging the training dataset beyond a certain point has no effect\u0000on the robustness and that stronger inductive biases increase robustness.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
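The central knob here is a scalar shift intensity applied through a decoder-based generator. The abstract does not spell out the construction, so the sketch below only illustrates the general recipe of moving latent codes a controlled distance along a fixed direction before decoding; `decoder`, `z_target_direction`, and the intensity values are placeholders, not the paper's method.

```python
import torch

def shifted_datasets(decoder, z_source, z_target_direction, intensities=(0.0, 0.5, 1.0, 2.0)):
    """Sketch: generate one dataset per shift intensity by decoding shifted latents.

    decoder:            any decoder-based generative model, decoder(z) -> images
    z_source:           (N, D) latent codes for the in-distribution data
    z_target_direction: (D,) vector defining the shift direction
    """
    datasets = {}
    for alpha in intensities:
        z = z_source + alpha * z_target_direction   # alpha controls shift intensity
        with torch.no_grad():
            datasets[alpha] = decoder(z)
    return datasets
```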
Scribble-Guided Diffusion for Training-free Text-to-Image Generation
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08026
Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim
{"title":"Scribble-Guided Diffusion for Training-free Text-to-Image Generation","authors":"Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim","doi":"arxiv-2409.08026","DOIUrl":"https://doi.org/arxiv-2409.08026","url":null,"abstract":"Recent advancements in text-to-image diffusion models have demonstrated\u0000remarkable success, yet they often struggle to fully capture the user's intent.\u0000Existing approaches using textual inputs combined with bounding boxes or region\u0000masks fall short in providing precise spatial guidance, often leading to\u0000misaligned or unintended object orientation. To address these limitations, we\u0000propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that\u0000utilizes simple user-provided scribbles as visual prompts to guide image\u0000generation. However, incorporating scribbles into diffusion models presents\u0000challenges due to their sparse and thin nature, making it difficult to ensure\u0000accurate orientation alignment. To overcome these challenges, we introduce\u0000moment alignment and scribble propagation, which allow for more effective and\u0000flexible alignment between generated images and scribble inputs. Experimental\u0000results on the PASCAL-Scribble dataset demonstrate significant improvements in\u0000spatial control and consistency, showcasing the effectiveness of scribble-based\u0000guidance in diffusion models. Our code is available at\u0000https://github.com/kaist-cvml-lab/scribble-diffusion.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
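"Moment alignment" suggests matching low-order spatial moments of the model's attention with those of the user scribble. The loss below is an assumption-labeled sketch of that idea (matching centers of mass), not ScribbleDiff's actual formulation; consult the linked repository for the real guidance terms.

```python
import torch

def moment_alignment_loss(attn_map, scribble_mask):
    """Speculative sketch: penalize the distance between the centers of mass of
    an object's cross-attention map and the corresponding user scribble.
    attn_map, scribble_mask: (H, W) non-negative float tensors."""
    def center_of_mass(m):
        h, w = m.shape
        ys = torch.arange(h, dtype=m.dtype)
        xs = torch.arange(w, dtype=m.dtype)
        total = m.sum() + 1e-8
        return torch.stack([(m.sum(1) * ys).sum() / total,    # row (y) centroid
                            (m.sum(0) * xs).sum() / total])   # column (x) centroid
    return torch.norm(center_of_mass(attn_map) - center_of_mass(scribble_mask))
```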
VI3DRM: Towards Meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.08207
Hao Chen, Jiafu Wu, Ying Jin, Jinlong Peng, Xiaofeng Mao, Mingmin Chi, Mufeng Yao, Bo Peng, Jian Li, Yun Cao
{"title":"VI3DRM:Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis","authors":"Hao Chen, Jiafu Wu, Ying Jin, Jinlong Peng, Xiaofeng Mao, Mingmin Chi, Mufeng Yao, Bo Peng, Jian Li, Yun Cao","doi":"arxiv-2409.08207","DOIUrl":"https://doi.org/arxiv-2409.08207","url":null,"abstract":"Recently, methods like Zero-1-2-3 have focused on single-view based 3D\u0000reconstruction and have achieved remarkable success. However, their predictions\u0000for unseen areas heavily rely on the inductive bias of large-scale pretrained\u0000diffusion models. Although subsequent work, such as DreamComposer, attempts to\u0000make predictions more controllable by incorporating additional views, the\u0000results remain unrealistic due to feature entanglement in the vanilla latent\u0000space, including factors such as lighting, material, and structure. To address\u0000these issues, we introduce the Visual Isotropy 3D Reconstruction Model\u0000(VI3DRM), a diffusion-based sparse views 3D reconstruction model that operates\u0000within an ID consistent and perspective-disentangled 3D latent space. By\u0000facilitating the disentanglement of semantic information, color, material\u0000properties and lighting, VI3DRM is capable of generating highly realistic\u0000images that are indistinguishable from real photographs. By leveraging both\u0000real and synthesized images, our approach enables the accurate construction of\u0000pointmaps, ultimately producing finely textured meshes or point clouds. On the\u0000NVS task, tested on the GSO dataset, VI3DRM significantly outperforms\u0000state-of-the-art method DreamComposer, achieving a PSNR of 38.61, an SSIM of\u00000.929, and an LPIPS of 0.027. Code will be made available upon publication.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
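For reference, the reported PSNR of 38.61 follows the standard definition below; this is the conventional metric used for novel-view synthesis evaluation, not code from the paper, and it assumes images scaled to [0, 1].

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Standard peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```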
Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2024-09-12 DOI: arxiv-2409.07896
Shun Zou, Zhuo Zhang, Yi Zou, Guangwei Gao
{"title":"Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters","authors":"Shun Zou, Zhuo Zhang, Yi Zou, Guangwei Gao","doi":"arxiv-2409.07896","DOIUrl":"https://doi.org/arxiv-2409.07896","url":null,"abstract":"In the field of medical microscopic image classification (MIC), CNN-based and\u0000Transformer-based models have been extensively studied. However, CNNs struggle\u0000with modeling long-range dependencies, limiting their ability to fully utilize\u0000semantic information in images. Conversely, Transformers are hampered by the\u0000complexity of quadratic computations. To address these challenges, we propose a\u0000model based on the Mamba architecture: Microscopic-Mamba. Specifically, we\u0000designed the Partially Selected Feed-Forward Network (PSFFN) to replace the\u0000last linear layer of the Visual State Space Module (VSSM), enhancing Mamba's\u0000local feature extraction capabilities. Additionally, we introduced the\u0000Modulation Interaction Feature Aggregation (MIFA) module to effectively\u0000modulate and dynamically aggregate global and local features. We also\u0000incorporated a parallel VSSM mechanism to improve inter-channel information\u0000interaction while reducing the number of parameters. Extensive experiments have\u0000demonstrated that our method achieves state-of-the-art performance on five\u0000public datasets. Code is available at\u0000https://github.com/zs1314/Microscopic-Mamba","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
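The "4M parameters" figure in the title is the kind of claim that is easy to check against any released checkpoint. The helper below is a generic PyTorch parameter counter, not code from the linked repository.

```python
import torch.nn as nn

def count_parameters_m(model: nn.Module) -> float:
    """Total trainable parameter count, in millions, for any nn.Module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```

For example, `count_parameters_m(model)` on the released Microscopic-Mamba weights should come out close to 4 if the title's claim refers to trainable parameters.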