arXiv - CS - Computer Vision and Pattern Recognition最新文献

PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba PhysMamba：利用慢-快时差曼巴进行高效远程生理测量

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12031

Chaoqi Luo, Yiping Xie, Zitong Yu

{"title":"PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba","authors":"Chaoqi Luo, Yiping Xie, Zitong Yu","doi":"arxiv-2409.12031","DOIUrl":"https://doi.org/arxiv-2409.12031","url":null,"abstract":"Facial-video based Remote photoplethysmography (rPPG) aims at measuring\u0000physiological signals and monitoring heart activity without any contact,\u0000showing significant potential in various applications. Previous deep learning\u0000based rPPG measurement are primarily based on CNNs and Transformers. However,\u0000the limited receptive fields of CNNs restrict their ability to capture\u0000long-range spatio-temporal dependencies, while Transformers also struggle with\u0000modeling long video sequences with high complexity. Recently, the state space\u0000models (SSMs) represented by Mamba are known for their impressive performance\u0000on capturing long-range dependencies from long sequences. In this paper, we\u0000propose the PhysMamba, a Mamba-based framework, to efficiently represent\u0000long-range physiological dependencies from facial videos. Specifically, we\u0000introduce the Temporal Difference Mamba block to first enhance local dynamic\u0000differences and further model the long-range spatio-temporal context. Moreover,\u0000a dual-stream SlowFast architecture is utilized to fuse the multi-scale\u0000temporal features. Extensive experiments are conducted on three benchmark\u0000datasets to demonstrate the superiority and efficiency of PhysMamba. The codes\u0000are available at https://github.com/Chaoqi31/PhysMamba","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Massively Multi-Person 3D Human Motion Forecasting with Scene Context 利用场景语境进行大规模多人三维人体运动预测

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12189

Felix B Mueller, Julian Tanke, Juergen Gall

引用次数: 0

ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation ChefFusion：整合食谱和食物图像生成的多模态基础模型

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12010

Peiyu Li, Xiaobao Huang, Yijun Tian, Nitesh V. Chawla

引用次数: 0

Distilling Channels for Efficient Deep Tracking 提炼通道，实现高效深度跟踪

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11785

Shiming Ge, Zhao Luo, Chunhui Zhang, Yingying Hua, Dacheng Tao

{"title":"Distilling Channels for Efficient Deep Tracking","authors":"Shiming Ge, Zhao Luo, Chunhui Zhang, Yingying Hua, Dacheng Tao","doi":"arxiv-2409.11785","DOIUrl":"https://doi.org/arxiv-2409.11785","url":null,"abstract":"Deep trackers have proven success in visual tracking. Typically, these\u0000trackers employ optimally pre-trained deep networks to represent all diverse\u0000objects with multi-channel features from some fixed layers. The deep networks\u0000employed are usually trained to extract rich knowledge from massive data used\u0000in object classification and so they are capable to represent generic objects\u0000very well. However, these networks are too complex to represent a specific\u0000moving object, leading to poor generalization as well as high computational and\u0000memory costs. This paper presents a novel and general framework termed channel\u0000distillation to facilitate deep trackers. To validate the effectiveness of\u0000channel distillation, we take discriminative correlation filter (DCF) and ECO\u0000for example. We demonstrate that an integrated formulation can turn feature\u0000compression, response map generation, and model update into a unified energy\u0000minimization problem to adaptively select informative feature channels that\u0000improve the efficacy of tracking moving objects on the fly. Channel\u0000distillation can accurately extract good channels, alleviating the influence of\u0000noisy channels and generally reducing the number of channels, as well as\u0000adaptively generalizing to different channels and networks. The resulting deep\u0000tracker is accurate, fast, and has low memory requirements. Extensive\u0000experimental evaluations on popular benchmarks clearly demonstrate the\u0000effectiveness and generalizability of our framework.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

ORB-SfMLearner: ORB-Guided Self-supervised Visual Odometry with Selective Online Adaptation ORB-SfMLearner：具有选择性在线适应功能的 ORB 引导的自监督视觉测距仪

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11692

Yanlin Jin, Rui-Yang Ju, Haojun Liu, Yuzhong Zhong

引用次数: 0

On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery 关于侧扫声纳图像分类任务中的视觉变换器

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12026

BW Sheffield, Jeffrey Ellen, Ben Whitmore

{"title":"On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery","authors":"BW Sheffield, Jeffrey Ellen, Ben Whitmore","doi":"arxiv-2409.12026","DOIUrl":"https://doi.org/arxiv-2409.12026","url":null,"abstract":"Side-scan sonar (SSS) imagery presents unique challenges in the\u0000classification of man-made objects on the seafloor due to the complex and\u0000varied underwater environments. Historically, experts have manually interpreted\u0000SSS images, relying on conventional machine learning techniques with\u0000hand-crafted features. While Convolutional Neural Networks (CNNs) significantly\u0000advanced automated classification in this domain, they often fall short when\u0000dealing with diverse seafloor textures, such as rocky or ripple sand bottoms,\u0000where false positive rates may increase. Recently, Vision Transformers (ViTs)\u0000have shown potential in addressing these limitations by utilizing a\u0000self-attention mechanism to capture global information in image patches,\u0000offering more flexibility in processing spatial hierarchies. This paper\u0000rigorously compares the performance of ViT models alongside commonly used CNN\u0000architectures, such as ResNet and ConvNext, for binary classification tasks in\u0000SSS imagery. The dataset encompasses diverse geographical seafloor types and is\u0000balanced between the presence and absence of man-made objects. ViT-based models\u0000exhibit superior classification performance across f1-score, precision, recall,\u0000and accuracy metrics, although at the cost of greater computational resources.\u0000CNNs, with their inductive biases, demonstrate better computational efficiency,\u0000making them suitable for deployment in resource-constrained environments like\u0000underwater vehicles. Future research directions include exploring\u0000self-supervised learning for ViTs and multi-modal fusion to further enhance\u0000performance in challenging underwater environments.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models 通过扩散模型的时空组合生成复杂的三维人体运动

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11920

Lorenzo Mandelli, Stefano Berretti

引用次数: 0

GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation GUNet：用于生成稳定和多样化姿势的图卷积网络联合扩散模型

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11689

Shuowen Liang, Sisi Li, Qingyun Wang, Cen Zhang, Kaiquan Zhu, Tian Yang

{"title":"GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation","authors":"Shuowen Liang, Sisi Li, Qingyun Wang, Cen Zhang, Kaiquan Zhu, Tian Yang","doi":"arxiv-2409.11689","DOIUrl":"https://doi.org/arxiv-2409.11689","url":null,"abstract":"Pose skeleton images are an important reference in pose-controllable image\u0000generation. In order to enrich the source of skeleton images, recent works have\u0000investigated the generation of pose skeletons based on natural language. These\u0000methods are based on GANs. However, it remains challenging to perform diverse,\u0000structurally correct and aesthetically pleasing human pose skeleton generation\u0000with various textual inputs. To address this problem, we propose a framework\u0000with GUNet as the main model, PoseDiffusion. It is the first generative\u0000framework based on a diffusion model and also contains a series of variants\u0000fine-tuned based on a stable diffusion model. PoseDiffusion demonstrates\u0000several desired properties that outperform existing methods. 1) Correct\u0000Skeletons. GUNet, a denoising model of PoseDiffusion, is designed to\u0000incorporate graphical convolutional neural networks. It is able to learn the\u0000spatial relationships of the human skeleton by introducing skeletal information\u0000during the training process. 2) Diversity. We decouple the key points of the\u0000skeleton and characterise them separately, and use cross-attention to introduce\u0000textual conditions. Experimental results show that PoseDiffusion outperforms\u0000existing SoTA algorithms in terms of stability and diversity of text-driven\u0000pose skeleton generation. Qualitative analyses further demonstrate its\u0000superiority for controllable generation in Stable Diffusion.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

MitoSeg: Mitochondria Segmentation Tool MitoSeg：线粒体分割工具

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11974

Faris Serdar Taşel, Efe Çiftci

引用次数: 0

Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation 寻找主观真相：为人工智能模型综合评估收集 200 万张选票

arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11904

Dimitrios Christodoulou, Mads Kuhlmann-Jørgensen

引用次数: 0