arXiv - CS - Computer Vision and Pattern Recognition: Latest Articles

SFDA-rPPG: Source-Free Domain Adaptive Remote Physiological Measurement with Spatio-Temporal Consistency
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12040
Yiping Xie, Zitong Yu, Bingjie Wu, Weicheng Xie, Linlin Shen
{"title":"SFDA-rPPG: Source-Free Domain Adaptive Remote Physiological Measurement with Spatio-Temporal Consistency","authors":"Yiping Xie, Zitong Yu, Bingjie Wu, Weicheng Xie, Linlin Shen","doi":"arxiv-2409.12040","DOIUrl":"https://doi.org/arxiv-2409.12040","url":null,"abstract":"Remote Photoplethysmography (rPPG) is a non-contact method that uses facial\u0000video to predict changes in blood volume, enabling physiological metrics\u0000measurement. Traditional rPPG models often struggle with poor generalization\u0000capacity in unseen domains. Current solutions to this problem is to improve its\u0000generalization in the target domain through Domain Generalization (DG) or\u0000Domain Adaptation (DA). However, both traditional methods require access to\u0000both source domain data and target domain data, which cannot be implemented in\u0000scenarios with limited access to source data, and another issue is the privacy\u0000of accessing source domain data. In this paper, we propose the first\u0000Source-free Domain Adaptation benchmark for rPPG measurement (SFDA-rPPG), which\u0000overcomes these limitations by enabling effective domain adaptation without\u0000access to source domain data. Our framework incorporates a Three-Branch\u0000Spatio-Temporal Consistency Network (TSTC-Net) to enhance feature consistency\u0000across domains. Furthermore, we propose a new rPPG distribution alignment loss\u0000based on the Frequency-domain Wasserstein Distance (FWD), which leverages\u0000optimal transport to align power spectrum distributions across domains\u0000effectively and further enforces the alignment of the three branches. Extensive\u0000cross-domain experiments and ablation studies demonstrate the effectiveness of\u0000our proposed method in source-free domain adaptation settings. Our findings\u0000highlight the significant contribution of the proposed FWD loss for\u0000distributional alignment, providing a valuable reference for future research\u0000and applications. The source code is available at\u0000https://github.com/XieYiping66/SFDA-rPPG","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
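The abstract describes the FWD loss only at a high level. A minimal sketch of a frequency-domain Wasserstein distance between two rPPG waveforms is shown below, assuming the closed-form 1-D optimal-transport distance over normalized power spectra; the function name, the normalization, and the toy signals are illustrative, not the paper's exact formulation.

```python
import torch

def frequency_wasserstein_distance(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Sketch of an FWD-style loss: 1-D Wasserstein-1 distance between the
    normalized power spectra of two signals (last dimension is time)."""
    # Power spectra via the real FFT.
    p_pred = torch.fft.rfft(pred, dim=-1).abs() ** 2
    p_tgt = torch.fft.rfft(target, dim=-1).abs() ** 2

    # Normalize each spectrum into a probability distribution over frequency bins.
    p_pred = p_pred / (p_pred.sum(dim=-1, keepdim=True) + 1e-8)
    p_tgt = p_tgt / (p_tgt.sum(dim=-1, keepdim=True) + 1e-8)

    # Closed-form 1-D optimal transport: area between the two CDFs.
    cdf_pred = torch.cumsum(p_pred, dim=-1)
    cdf_tgt = torch.cumsum(p_tgt, dim=-1)
    return (cdf_pred - cdf_tgt).abs().mean()

# Toy usage: two batches of 10-second rPPG waveforms sampled at 30 Hz.
rppg_student = torch.randn(4, 300)
rppg_reference = torch.randn(4, 300)
print(frequency_wasserstein_distance(rppg_student, rppg_reference).item())
```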
Applications of Knowledge Distillation in Remote Sensing: A Survey
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12111
Yassine Himeur, Nour Aburaed, Omar Elharrouss, Iraklis Varlamis, Shadi Atalla, Wathiq Mansoor, Hussain Al Ahmad
{"title":"Applications of Knowledge Distillation in Remote Sensing: A Survey","authors":"Yassine Himeur, Nour Aburaed, Omar Elharrouss, Iraklis Varlamis, Shadi Atalla, Wathiq Mansoor, Hussain Al Ahmad","doi":"arxiv-2409.12111","DOIUrl":"https://doi.org/arxiv-2409.12111","url":null,"abstract":"With the ever-growing complexity of models in the field of remote sensing\u0000(RS), there is an increasing demand for solutions that balance model accuracy\u0000with computational efficiency. Knowledge distillation (KD) has emerged as a\u0000powerful tool to meet this need, enabling the transfer of knowledge from large,\u0000complex models to smaller, more efficient ones without significant loss in\u0000performance. This review article provides an extensive examination of KD and\u0000its innovative applications in RS. KD, a technique developed to transfer\u0000knowledge from a complex, often cumbersome model (teacher) to a more compact\u0000and efficient model (student), has seen significant evolution and application\u0000across various domains. Initially, we introduce the fundamental concepts and\u0000historical progression of KD methods. The advantages of employing KD are\u0000highlighted, particularly in terms of model compression, enhanced computational\u0000efficiency, and improved performance, which are pivotal for practical\u0000deployments in RS scenarios. The article provides a comprehensive taxonomy of\u0000KD techniques, where each category is critically analyzed to demonstrate the\u0000breadth and depth of the alternative options, and illustrates specific case\u0000studies that showcase the practical implementation of KD methods in RS tasks,\u0000such as instance segmentation and object detection. Further, the review\u0000discusses the challenges and limitations of KD in RS, including practical\u0000constraints and prospective future directions, providing a comprehensive\u0000overview for researchers and practitioners in the field of RS. Through this\u0000organization, the paper not only elucidates the current state of research in KD\u0000but also sets the stage for future research opportunities, thereby contributing\u0000significantly to both academic research and real-world applications.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
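As background for the KD methods surveyed above, the canonical logit-distillation objective (Hinton et al.'s temperature-softened KL term plus the usual task loss) is the starting point most variants build on. A minimal PyTorch sketch, with illustrative temperature and weighting values:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Classic knowledge distillation: alpha * soft-target KL + (1 - alpha) * cross-entropy."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # T^2 keeps the gradient magnitude of the soft term comparable to the hard term.
    soft_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    hard_term = F.cross_entropy(student_logits, labels)
    return alpha * soft_term + (1.0 - alpha) * hard_term

# Toy usage: 8 samples, 10 classes.
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student, teacher, labels).item())
```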
EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11817
Shaojie Li, Zhaoshuo Diao
{"title":"EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis","authors":"Shaojie Li, Zhaoshuo Diao","doi":"arxiv-2409.11817","DOIUrl":"https://doi.org/arxiv-2409.11817","url":null,"abstract":"The recent development of deep learning large models in medicine shows\u0000remarkable performance in medical image analysis and diagnosis, but their large\u0000number of parameters causes memory and inference latency challenges. Knowledge\u0000distillation offers a solution, but the slide-level gradients cannot be\u0000backpropagated for student model updates due to high-resolution pathological\u0000images and slide-level labels. This study presents an Efficient Fine-tuning on\u0000Compressed Models (EFCM) framework with two stages: unsupervised feature\u0000distillation and fine-tuning. In the distillation stage, Feature Projection\u0000Distillation (FPD) is proposed with a TransScan module for adaptive receptive\u0000field adjustment to enhance the knowledge absorption capability of the student\u0000model. In the slide-level fine-tuning stage, three strategies (Reuse CLAM,\u0000Retrain CLAM, and End2end Train CLAM (ETC)) are compared. Experiments are\u0000conducted on 11 downstream datasets related to three large medical models:\u0000RETFound for retina, MRM for chest X-ray, and BROW for histopathology. The\u0000experimental results demonstrate that the EFCM framework significantly improves\u0000accuracy and efficiency in handling slide-level pathological image problems,\u0000effectively addressing the challenges of deploying large medical models.\u0000Specifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC\u0000compared to the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. The\u0000analysis of model inference efficiency highlights the high efficiency of the\u0000distillation fine-tuning method.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
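FPD and the TransScan module are not specified beyond the abstract above; the following is only a generic feature-distillation sketch in which student features are linearly projected into the teacher's embedding space and matched with an MSE loss. The module name, dimensions, and loss choice are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class FeatureProjectionDistiller(nn.Module):
    """Generic feature distillation: project student features to the teacher's
    dimension and penalize their mean-squared disagreement (illustrative only)."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        # student_feats: (batch, patches, student_dim); teacher_feats: (batch, patches, teacher_dim)
        return nn.functional.mse_loss(self.proj(student_feats), teacher_feats)

# Toy usage: 2 slides, 196 patch tokens each.
distiller = FeatureProjectionDistiller(student_dim=384, teacher_dim=768)
loss = distiller(torch.randn(2, 196, 384), torch.randn(2, 196, 768))
print(loss.item())
```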
Neural Encoding for Image Recall: Human-Like Memory
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11750
Virgile Foussereau, Robin Dumas
{"title":"Neural Encoding for Image Recall: Human-Like Memory","authors":"Virgile Foussereau, Robin Dumas","doi":"arxiv-2409.11750","DOIUrl":"https://doi.org/arxiv-2409.11750","url":null,"abstract":"Achieving human-like memory recall in artificial systems remains a\u0000challenging frontier in computer vision. Humans demonstrate remarkable ability\u0000to recall images after a single exposure, even after being shown thousands of\u0000images. However, this capacity diminishes significantly when confronted with\u0000non-natural stimuli such as random textures. In this paper, we present a method\u0000inspired by human memory processes to bridge this gap between artificial and\u0000biological memory systems. Our approach focuses on encoding images to mimic the\u0000high-level information retained by the human brain, rather than storing raw\u0000pixel data. By adding noise to images before encoding, we introduce variability\u0000akin to the non-deterministic nature of human memory encoding. Leveraging\u0000pre-trained models' embedding layers, we explore how different architectures\u0000encode images and their impact on memory recall. Our method achieves impressive\u0000results, with 97% accuracy on natural images and near-random performance (52%)\u0000on textures. We provide insights into the encoding process and its implications\u0000for machine learning memory systems, shedding light on the parallels between\u0000human and artificial intelligence memory mechanisms.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
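The pipeline outlined in the abstract (add noise, encode with a pre-trained embedding layer, then test recall against stored codes) can be sketched as below. The frozen random-projection encoder stands in for the pre-trained backbones, and the noise level and cosine-similarity retrieval are assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def encode(images: torch.Tensor, encoder: torch.nn.Module, noise_std: float = 0.1) -> torch.Tensor:
    """Encode images after adding Gaussian noise, mimicking non-deterministic memory encoding."""
    noisy = images + noise_std * torch.randn_like(images)
    return F.normalize(encoder(noisy.flatten(1)), dim=-1)

# Stand-in for a pre-trained embedding layer: a frozen random projection.
encoder = torch.nn.Linear(3 * 64 * 64, 512, bias=False)
encoder.requires_grad_(False)

gallery = torch.rand(100, 3, 64, 64)            # "studied" images
memory = encode(gallery, encoder)               # stored high-level codes

probe = gallery[:10]                            # re-shown images
queries = encode(probe, encoder)                # independently noised encodings
recalled = (queries @ memory.T).argmax(dim=-1)  # recall by nearest stored code
accuracy = (recalled == torch.arange(10)).float().mean()
print(f"recall accuracy on seen images: {accuracy:.2f}")
```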
SymFace: Additional Facial Symmetry Loss for Deep Face Recognition
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11816
Pritesh Prakash, Koteswar Rao Jerripothula, Ashish Jacob Sam, Prinsh Kumar Singh, S Umamaheswaran
{"title":"SymFace: Additional Facial Symmetry Loss for Deep Face Recognition","authors":"Pritesh Prakash, Koteswar Rao Jerripothula, Ashish Jacob Sam, Prinsh Kumar Singh, S Umamaheswaran","doi":"arxiv-2409.11816","DOIUrl":"https://doi.org/arxiv-2409.11816","url":null,"abstract":"Over the past decade, there has been a steady advancement in enhancing face\u0000recognition algorithms leveraging advanced machine learning methods. The role\u0000of the loss function is pivotal in addressing face verification problems and\u0000playing a game-changing role. These loss functions have mainly explored\u0000variations among intra-class or inter-class separation. This research examines\u0000the natural phenomenon of facial symmetry in the face verification problem. The\u0000symmetry between the left and right hemi faces has been widely used in many\u0000research areas in recent decades. This paper adopts this simple approach\u0000judiciously by splitting the face image vertically into two halves. With the\u0000assumption that the natural phenomena of facial symmetry can enhance face\u0000verification methodology, we hypothesize that the two output embedding vectors\u0000of split faces must project close to each other in the output embedding space.\u0000Inspired by this concept, we penalize the network based on the disparity of\u0000embedding of the symmetrical pair of split faces. Symmetrical loss has the\u0000potential to minimize minor asymmetric features due to facial expression and\u0000lightning conditions, hence significantly increasing the inter-class variance\u0000among the classes and leading to more reliable face embedding. This loss\u0000function propels any network to outperform its baseline performance across all\u0000existing network architectures and configurations, enabling us to achieve SoTA\u0000results.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
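The core idea above, splitting each aligned face vertically and penalizing the distance between the embeddings of the two halves, can be sketched as follows. The mirroring of the right half, the cosine-based penalty, and the stand-in embedding network are assumptions; the paper's exact loss is not reproduced here.

```python
import torch
import torch.nn.functional as F

def facial_symmetry_loss(faces: torch.Tensor, embed: torch.nn.Module) -> torch.Tensor:
    """Penalize disagreement between embeddings of the left and right hemi-faces.

    faces: (batch, channels, height, width) aligned face crops.
    """
    width = faces.shape[-1]
    left = faces[..., : width // 2]
    # Mirror the right half so both halves share the same orientation (an assumption).
    right = torch.flip(faces[..., width - width // 2 :], dims=[-1])
    emb_left = F.normalize(embed(left.flatten(1)), dim=-1)
    emb_right = F.normalize(embed(right.flatten(1)), dim=-1)
    # 1 - cosine similarity: zero when the two half-face embeddings coincide.
    return (1.0 - (emb_left * emb_right).sum(dim=-1)).mean()

# Toy usage with a stand-in embedding network on 112x112 crops.
embed = torch.nn.Linear(3 * 112 * 56, 256)
faces = torch.rand(8, 3, 112, 112)
print(facial_symmetry_loss(faces, embed).item())
```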
Unveiling the Black Box: Independent Functional Module Evaluation for Bird's-Eye-View Perception Model
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11969
Ludan Zhang, Xiaokang Ding, Yuqi Dai, Lei He, Keqiang Li
{"title":"Unveiling the Black Box: Independent Functional Module Evaluation for Bird's-Eye-View Perception Model","authors":"Ludan Zhang, Xiaokang Ding, Yuqi Dai, Lei He, Keqiang Li","doi":"arxiv-2409.11969","DOIUrl":"https://doi.org/arxiv-2409.11969","url":null,"abstract":"End-to-end models are emerging as the mainstream in autonomous driving\u0000perception. However, the inability to meticulously deconstruct their internal\u0000mechanisms results in diminished development efficacy and impedes the\u0000establishment of trust. Pioneering in the issue, we present the Independent\u0000Functional Module Evaluation for Bird's-Eye-View Perception Model (BEV-IFME), a\u0000novel framework that juxtaposes the module's feature maps against Ground Truth\u0000within a unified semantic Representation Space to quantify their similarity,\u0000thereby assessing the training maturity of individual functional modules. The\u0000core of the framework lies in the process of feature map encoding and\u0000representation aligning, facilitated by our proposed two-stage Alignment\u0000AutoEncoder, which ensures the preservation of salient information and the\u0000consistency of feature structure. The metric for evaluating the training\u0000maturity of functional modules, Similarity Score, demonstrates a robust\u0000positive correlation with BEV metrics, with an average correlation coefficient\u0000of 0.9387, attesting to the framework's reliability for assessment purposes.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
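The abstract describes encoding a module's feature map and the Ground Truth into a shared representation space and scoring their similarity. The sketch below shows only that final scoring step, assuming both inputs are already aligned embeddings and using a plain cosine similarity; the two-stage Alignment AutoEncoder itself is not reproduced.

```python
import torch
import torch.nn.functional as F

def similarity_score(module_repr: torch.Tensor, gt_repr: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between a functional module's encoded features and the
    encoded ground truth, averaged over the batch (a stand-in for the paper's metric)."""
    module_repr = F.normalize(module_repr.flatten(1), dim=-1)
    gt_repr = F.normalize(gt_repr.flatten(1), dim=-1)
    return (module_repr * gt_repr).sum(dim=-1).mean()

# Toy usage: 4 samples of 256-dimensional aligned representations.
print(similarity_score(torch.randn(4, 256), torch.randn(4, 256)).item())
```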
EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11813
Yukun Tian, Hao Chen, Yongjian Deng, Feihong Shen, Kepan Liu, Wei You, Ziyang Zhang
{"title":"EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning","authors":"Yukun Tian, Hao Chen, Yongjian Deng, Feihong Shen, Kepan Liu, Wei You, Ziyang Zhang","doi":"arxiv-2409.11813","DOIUrl":"https://doi.org/arxiv-2409.11813","url":null,"abstract":"The event camera has demonstrated significant success across a wide range of\u0000areas due to its low time latency and high dynamic range. However, the\u0000community faces challenges such as data deficiency and limited diversity, often\u0000resulting in over-fitting and inadequate feature learning. Notably, the\u0000exploration of data augmentation techniques in the event community remains\u0000scarce. This work aims to address this gap by introducing a systematic\u0000augmentation scheme named EventAug to enrich spatial-temporal diversity. In\u0000particular, we first propose Multi-scale Temporal Integration (MSTI) to\u0000diversify the motion speed of objects, then introduce Spatial-salient Event\u0000Mask (SSEM) and Temporal-salient Event Mask (TSEM) to enrich object variants.\u0000Our EventAug can facilitate models learning with richer motion patterns, object\u0000variants and local spatio-temporal relations, thus improving model robustness\u0000to varied moving speeds, occlusions, and action disruptions. Experiment results\u0000show that our augmentation method consistently yields significant improvements\u0000across different tasks and backbones (e.g., a 4.87% accuracy gain on DVS128\u0000Gesture). Our code will be publicly available for this community.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
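MSTI, SSEM, and TSEM are only named in the abstract above. The sketch below shows one plausible reading of a temporal event mask: dropping all events that fall inside a randomly placed time window of an (x, y, t, polarity) event stream. The array layout, window length, and drop strategy are assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_event_mask(events: np.ndarray, drop_ratio: float = 0.1) -> np.ndarray:
    """Drop all events inside one randomly placed time window.

    events: (N, 4) array with columns (x, y, t, polarity).
    """
    t = events[:, 2]
    duration = t.max() - t.min()
    window = drop_ratio * duration
    start = rng.uniform(t.min(), t.max() - window)
    keep = (t < start) | (t >= start + window)
    return events[keep]

# Toy usage: 10k synthetic events over a 1-second recording on a 128x128 sensor.
events = np.column_stack([
    rng.integers(0, 128, 10_000),    # x
    rng.integers(0, 128, 10_000),    # y
    rng.uniform(0.0, 1.0, 10_000),   # t (seconds)
    rng.integers(0, 2, 10_000),      # polarity
])
print(events.shape, temporal_event_mask(events).shape)
```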
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.12191
Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin
{"title":"Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution","authors":"Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin","doi":"arxiv-2409.12191","DOIUrl":"https://doi.org/arxiv-2409.12191","url":null,"abstract":"We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL\u0000models that redefines the conventional predetermined-resolution approach in\u0000visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism,\u0000which enables the model to dynamically process images of varying resolutions\u0000into different numbers of visual tokens. This approach allows the model to\u0000generate more efficient and accurate visual representations, closely aligning\u0000with human perceptual processes. The model also integrates Multimodal Rotary\u0000Position Embedding (M-RoPE), facilitating the effective fusion of positional\u0000information across text, images, and videos. We employ a unified paradigm for\u0000processing both images and videos, enhancing the model's visual perception\u0000capabilities. To explore the potential of large multimodal models, Qwen2-VL\u0000investigates the scaling laws for large vision-language models (LVLMs). By\u0000scaling both the model size-with versions at 2B, 8B, and 72B parameters-and the\u0000amount of training data, the Qwen2-VL Series achieves highly competitive\u0000performance. Notably, the Qwen2-VL-72B model achieves results comparable to\u0000leading models such as GPT-4o and Claude3.5-Sonnet across various multimodal\u0000benchmarks, outperforming other generalist models. Code is available at\u0000url{https://github.com/QwenLM/Qwen2-VL}.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
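The Naive Dynamic Resolution idea, mapping an image at roughly its native resolution to a variable number of visual tokens rather than a fixed grid, can be illustrated with a simple token-count calculation. The patch size and merge factor below are illustrative assumptions, not the model's published configuration.

```python
def visual_token_count(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    """Approximate visual-token count for an image processed at native resolution:
    the image is cut into patch x patch pieces, then merge x merge patches are fused
    into one token. Values of `patch` and `merge` are illustrative."""
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    return (grid_h * grid_w) // (merge * merge)

# The same model sees few tokens for a thumbnail and many for a large photo.
for h, w in [(224, 224), (448, 672), (1080, 1920)]:
    print(f"{h}x{w} -> {visual_token_count(h, w)} visual tokens")
```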
Intraoperative Registration by Cross-Modal Inverse Neural Rendering
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11983
Maximilian Fehrentz, Mohammad Farid Azampour, Reuben Dorent, Hassan Rasheed, Colin Galvin, Alexandra Golby, William M. Wells, Sarah Frisken, Nassir Navab, Nazim Haouchine
{"title":"Intraoperative Registration by Cross-Modal Inverse Neural Rendering","authors":"Maximilian Fehrentz, Mohammad Farid Azampour, Reuben Dorent, Hassan Rasheed, Colin Galvin, Alexandra Golby, William M. Wells, Sarah Frisken, Nassir Navab, Nazim Haouchine","doi":"arxiv-2409.11983","DOIUrl":"https://doi.org/arxiv-2409.11983","url":null,"abstract":"We present in this paper a novel approach for 3D/2D intraoperative\u0000registration during neurosurgery via cross-modal inverse neural rendering. Our\u0000approach separates implicit neural representation into two components, handling\u0000anatomical structure preoperatively and appearance intraoperatively. This\u0000disentanglement is achieved by controlling a Neural Radiance Field's appearance\u0000with a multi-style hypernetwork. Once trained, the implicit neural\u0000representation serves as a differentiable rendering engine, which can be used\u0000to estimate the surgical camera pose by minimizing the dissimilarity between\u0000its rendered images and the target intraoperative image. We tested our method\u0000on retrospective patients' data from clinical cases, showing that our method\u0000outperforms state-of-the-art while meeting current clinical standards for\u0000registration. Code and additional resources can be found at\u0000https://maxfehrentz.github.io/style-ngp/.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
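The registration step described above, estimating the camera pose by minimizing the dissimilarity between a differentiable renderer's output and the intraoperative image, follows a standard render-and-compare loop. The sketch below uses a placeholder linear "renderer" and a plain MSE dissimilarity; the actual neural-rendering model and similarity measure are not specified here.

```python
import torch

def register_pose(render, target: torch.Tensor, init_pose: torch.Tensor,
                  steps: int = 500, lr: float = 1e-2) -> torch.Tensor:
    """Render-and-compare pose estimation: gradient descent on the camera pose so
    that the differentiable renderer's output matches the target image."""
    pose = init_pose.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(render(pose), target)
        loss.backward()
        optimizer.step()
    return pose.detach()

# Toy differentiable "renderer": a fixed linear map from a 6-DoF pose to an 8x8 image.
torch.manual_seed(0)
W = torch.randn(64, 6)
render = lambda pose: (W @ pose).reshape(8, 8)

true_pose = torch.randn(6)
target = render(true_pose)
estimate = register_pose(render, target, init_pose=torch.zeros(6))
print((estimate - true_pose).abs().max().item())
```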
RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2024-09-18 DOI: arxiv-2409.11706
Jinrang Jia, Guangqi Yi, Yifeng Shi
{"title":"RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View","authors":"Jinrang Jia, Guangqi Yi, Yifeng Shi","doi":"arxiv-2409.11706","DOIUrl":"https://doi.org/arxiv-2409.11706","url":null,"abstract":"Multi-camera perception methods in Bird's-Eye-View (BEV) have gained wide\u0000application in autonomous driving. However, due to the differences between\u0000roadside and vehicle-side scenarios, there currently lacks a multi-camera BEV\u0000solution in roadside. This paper systematically analyzes the key challenges in\u0000multi-camera BEV perception for roadside scenarios compared to vehicle-side.\u0000These challenges include the diversity in camera poses, the uncertainty in\u0000Camera numbers, the sparsity in perception regions, and the ambiguity in\u0000orientation angles. In response, we introduce RopeBEV, the first dense\u0000multi-camera BEV approach. RopeBEV introduces BEV augmentation to address the\u0000training balance issues caused by diverse camera poses. By incorporating\u0000CamMask and ROIMask (Region of Interest Mask), it supports variable camera\u0000numbers and sparse perception, respectively. Finally, camera rotation embedding\u0000is utilized to resolve orientation ambiguity. Our method ranks 1st on the\u0000real-world highway dataset RoScenes and demonstrates its practical value on a\u0000private urban dataset that covers more than 50 intersections and 600 cameras.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
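CamMask, which lets the network handle a variable number of cameras per site, can be read as masking out the feature contributions of absent cameras before fusing them into the BEV grid. The sketch below shows only such a masked fusion; the feature shapes and the mean-style fusion are assumptions, not RopeBEV's actual architecture.

```python
import torch

def fuse_with_cam_mask(cam_feats: torch.Tensor, cam_mask: torch.Tensor) -> torch.Tensor:
    """Fuse per-camera BEV features while ignoring absent (padded) cameras.

    cam_feats: (batch, max_cams, channels, bev_h, bev_w)
    cam_mask:  (batch, max_cams) with 1 for a real camera, 0 for a padded slot.
    """
    mask = cam_mask[:, :, None, None, None].to(cam_feats.dtype)
    summed = (cam_feats * mask).sum(dim=1)
    count = mask.sum(dim=1).clamp(min=1.0)
    return summed / count  # masked mean over the available cameras

# Toy usage: 2 scenes, up to 6 cameras, with only 4 and 6 of them present.
feats = torch.randn(2, 6, 64, 50, 50)
mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1, 1]])
print(fuse_with_cam_mask(feats, mask).shape)  # (2, 64, 50, 50)
```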