Conference on Computer Vision and Pattern Recognition Workshops (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops): Latest Publications
Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation
Jia Wei, Xiaoqi Zhao, Jonghye Woo, Jinsong Ouyang, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu
DOI: 10.1109/cvprw67362.2025.00642 | Vol. 2025, pp. 6450-6460 | Published: 2025-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12506896/pdf/
Abstract: Single domain generalization (SDG) has recently attracted growing attention in medical image segmentation. One promising strategy for SDG is to leverage consistent semantic shape priors across different imaging protocols, scanner vendors, and clinical sites. However, existing dictionary learning methods that encode shape priors often suffer from limited representational power with a small set of offline computed shape elements, or from overfitting when the dictionary size grows. Moreover, they are not readily compatible with large foundation models such as the Segment Anything Model (SAM). In this paper, we propose a novel Mixture-of-Shape-Experts (MoSE) framework that seamlessly integrates the idea of mixture-of-experts (MoE) training into dictionary learning to efficiently capture diverse and robust shape priors. Our method conceptualizes each dictionary atom as a "shape expert" which specializes in encoding distinct semantic shape information. A gating network dynamically fuses these shape experts into a robust shape map, with sparse activation guided by SAM encoding to prevent overfitting. We further provide this shape map as a prompt to SAM, utilizing the powerful generalization capability of SAM through bidirectional integration. All modules, including the shape dictionary, are trained in an end-to-end manner. Extensive experiments on multiple public datasets demonstrate its effectiveness.
FM-LoRA: Factorized Low-Rank Meta-Prompting for Continual Learning
Xiaobing Yu, Jin Yang, Xiao Wu, Peijie Qiu, Xiaofeng Liu
DOI: 10.1109/cvprw67362.2025.00637 | Vol. 2025, pp. 6399-6408 | Published: 2025-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12498382/pdf/
Abstract: How to continuously adapt a pre-trained model for sequential tasks with different prediction class labels and/or domains, and finally learn a generalizable model across diverse tasks, is a long-lasting challenge. Continual learning (CL) has emerged as a promising approach to leverage pre-trained models (e.g., Transformers) for sequential tasks. Many existing CL methods incrementally store additional learned structures, such as Low-Rank Adaptation (LoRA) adapters or prompts, and sometimes even preserve features from previous samples to maintain performance. This leads to unsustainable parameter growth and escalating storage costs as the number of tasks increases. Moreover, current approaches often lack task similarity awareness, which further hinders the model's ability to effectively adapt to new tasks without interfering with previously acquired knowledge. To address these challenges, we propose FM-LoRA, a novel and efficient low-rank adaptation method that integrates both a dynamic rank selector (DRS) and dynamic meta-prompting (DMP). This framework allocates model capacity more effectively across tasks by leveraging a shared low-rank subspace critical for preserving knowledge, thereby avoiding continual parameter expansion. Extensive experiments on various CL benchmarks, including ImageNet-R, CIFAR100, and CUB200 for class-incremental learning (CIL) and DomainNet for domain-incremental learning (DIL), with a Transformer backbone demonstrate that FM-LoRA effectively mitigates catastrophic forgetting while delivering robust performance across a diverse range of tasks and domains.
Refining Biologically Inconsistent Segmentation Masks with Masked Autoencoders
Alexander Sauer, Yuan Tian, Joerg Bewersdorf, Jens Rittscher
DOI: 10.1109/CVPRW63382.2024.00684 | pp. 6904-6912 | Published: 2024-06-17 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617224/pdf/
Abstract: Microscopy images often feature regions of low signal-to-noise ratio (SNR), which leads to considerable ambiguity in the corresponding segmentation. This ambiguity can introduce inconsistencies in the segmentation mask which violate known biological constraints. In this work, we present a methodology which identifies areas of low SNR and refines the segmentation masks such that they are consistent with biological structures. Low SNR regions with uncertain segmentation are detected using model ensembling and selectively restored by a masked autoencoder (MAE) which leverages information about well-imaged surrounding areas. The prior knowledge of biologically consistent segmentation masks is directly learned from the data. We validate our approach in the context of analysing intracellular structures, specifically by refining segmentation masks of mitochondria in expansion microscopy images with a global staining.
Super-resolution of biomedical volumes with 2D supervision
Cheng Jiang, Alexander Gedeon, Yiwei Lyu, Eric Landgraf, Yufeng Zhang, Xinhai Hou, Akhil Kondepudi, Asadur Chowdury, Honglak Lee, Todd Hollon
DOI: 10.1109/cvprw63382.2024.00690 | Vol. 2024, pp. 6966-6977 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11444667/pdf/
Abstract: Volumetric biomedical microscopy has the potential to increase the diagnostic information extracted from clinical tissue specimens and improve the diagnostic accuracy of both human pathologists and computational pathology models. Unfortunately, barriers to integrating 3-dimensional (3D) volumetric microscopy into clinical medicine include long imaging times, poor depth/z-axis resolution, and an insufficient amount of high-quality volumetric data. Leveraging the abundance of high-resolution 2D microscopy data, we introduce masked slice diffusion for super-resolution (MSDSR), which exploits the inherent equivalence in the data-generating distribution across all spatial dimensions of biological specimens. This intrinsic characteristic allows super-resolution models trained on high-resolution images from one plane (e.g., XY) to effectively generalize to others (XZ, YZ), overcoming the traditional dependency on orientation. We focus on the application of MSDSR to stimulated Raman histology (SRH), an optical imaging modality for biological specimen analysis and intraoperative diagnosis, characterized by its rapid acquisition of high-resolution 2D images but slow and costly optical z-sectioning. To evaluate MSDSR's efficacy, we introduce a new performance metric, SliceFID, and demonstrate MSDSR's superior performance over baseline models through extensive evaluations. Our findings reveal that MSDSR not only significantly enhances the quality and resolution of 3D volumetric data, but also addresses major obstacles hindering the broader application of 3D volumetric microscopy in clinical diagnostics and biomedical research.
nnMobileNet: Rethinking CNN for Retinopathy Research
Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M Dumitrascu, Yalin Wang
DOI: 10.1109/CVPRW63382.2024.00234 | Vol. 2024, pp. 2285-2294 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12068684/pdf/
Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET.
Uncertainty Estimation for Tumor Prediction with Unlabeled Data
Juyoung Yun, Shahira Abousamra, Chen Li, Rajarsi Gupta, Tahsin Kurc, Dimitris Samaras, Alison Van Dyke, Joel Saltz, Chao Chen
DOI: 10.1109/cvprw63382.2024.00688 | Vol. 2024, pp. 6946-6954 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11567674/pdf/
Abstract: Estimating the uncertainty of a neural network is crucial for providing transparency and trustworthiness. In this paper, we focus on uncertainty estimation for digital pathology prediction models. To exploit the large amount of unlabeled data in digital pathology, we propose a novel learning method that can fully exploit unlabeled data. The proposed method achieves superior performance compared with different baselines, including the celebrated Monte-Carlo Dropout. Close-up inspection of uncertain regions reveals insights into the model and improves the trustworthiness of the models.
NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer
Sai Kumar Reddy Manne, Brendan Martin, Tyler Roy, Ryan Neilson, Rebecca Peters, Meghana Chillara, Christine W Lary, Katherine J Motyl, Michael Wan
DOI: 10.1109/cvprw63382.2024.00686 | Vol. 2024, pp. 6926-6935 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629985/pdf/
Abstract: Osteoclast cell image analysis plays a key role in osteoporosis research, but it typically involves extensive manual image processing and hand annotations by a trained expert. In the last few years, a handful of machine learning approaches for osteoclast image analysis have been developed, but none have addressed the full instance segmentation task required to produce the same output as that of the human expert led process. Furthermore, none of the prior, fully automated algorithms have publicly available code, pretrained models, or annotated datasets, inhibiting reproduction and extension of their work. We present a new dataset with ~2 × 10^5 expert annotated mouse osteoclast masks, together with a deep learning instance segmentation method which works for both in vitro mouse osteoclast cells on plastic tissue culture plates and human osteoclast cells on bone chips. To our knowledge, this is the first work to automate the full osteoclast instance segmentation task. Our method achieves a performance of 0.82 mAP@0.5 (mean average precision at an intersection-over-union threshold of 0.5) in cross validation for mouse osteoclasts. We present a novel nuclei-aware osteoclast instance segmentation training strategy (NOISe) based on the unique biology of osteoclasts, to improve the model's generalizability and boost the mAP@0.5 from 0.60 to 0.82 on human osteoclasts. We publish our annotated mouse osteoclast image dataset, instance segmentation models, and code at github.com/michaelwwan/noise to enable reproducibility and to provide a public tool to accelerate osteoporosis research.
Focusing on What Matters: Fine-grained Medical Activity Recognition for Trauma Resuscitation via Actor Tracking
Wenjin Zhang, Keyi Li, Sen Yang, Sifan Yuan, Ivan Marsic, Genevieve J Sippel, Mary S Kim, Randall S Burd
DOI: 10.1109/cvprw63382.2024.00500 | Vol. 2024, pp. 4950-4958 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12238853/pdf/
Abstract: Trauma is a leading cause of mortality worldwide, with about 20% of these deaths being preventable. Most of these preventable deaths result from errors during the initial resuscitation of injured patients. Decision support has been evaluated as an approach to support teams during this phase and reduce errors. Existing systems require manual data entry and monitoring, which makes these tasks challenging to accomplish in a time-critical setting. This paper identified the specific challenges of achieving effective computer vision-based decision support in trauma resuscitation: complex backgrounds, crowded scenes, fine-grained activities, and a scarcity of labeled data. To address the first three challenges, the proposed system used an actor tracker that identifies individuals, allowing the system to focus on actor-specific features. A Video Masked Autoencoder (Video-MAE) was used to overcome the issue of insufficient labeled data; this approach enables self-supervised learning from unlabeled video content, improving feature representations for medical activities. For more reliable performance, an ensemble fusion method was introduced that combines predictions from consecutive video clips and different actors. Our method outperformed existing approaches in identifying fine-grained activities, providing a solution for activity recognition in trauma resuscitation and similar complex domains.
{"title":"Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation.","authors":"Swati Jindal, Mohit Yadav, Roberto Manduchi","doi":"10.1109/cvprw63382.2024.00065","DOIUrl":"10.1109/cvprw63382.2024.00065","url":null,"abstract":"<p><p><i>Gaze is an essential prompt for analyzing human behavior and attention. Recently, there has been an increasing interest in determining gaze direction from facial videos. However, video gaze estimation faces significant challenges, such as understanding the dynamic evolution of gaze in video sequences, dealing with static backgrounds, and adapting to variations in illumination. To address these challenges, we propose a simple and novel deep learning model designed to estimate gaze from videos, incorporating a specialized attention module. Our method employs a spatial attention mechanism that tracks spatial dynamics within videos. This technique enables accurate gaze direction prediction through a temporal sequence model, adeptly transforming spatial observations into temporal insights, thereby significantly improving gaze estimation accuracy. Additionally, our approach integrates Gaussian processes to include individual-specific traits, facilitating the personalization of our model with just a few labeled samples. Experimental results confirm the efficacy of the proposed approach, demonstrating its success in both within-dataset and cross-dataset settings. Specifically, our proposed approach achieves state-of-the-art performance on the Gaze360 dataset, improving by</i> 2.5° <i>without personalization. Further, by personalizing the model with just three samples, we achieved an additional improvement of</i> 0.8°. <i>The code and pre-trained models are available at</i> https://github.com/jswati31/stage.</p>","PeriodicalId":89346,"journal":{"name":"Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops","volume":"2024 ","pages":"604-614"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529379/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142570650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 4–7, 2022, Proceedings, Part II","authors":"","doi":"10.1007/978-3-031-18910-4","DOIUrl":"https://doi.org/10.1007/978-3-031-18910-4","url":null,"abstract":"","PeriodicalId":89346,"journal":{"name":"Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83723766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}