{"title":"Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization.","authors":"Yilin Liu, Yunkui Pang, Jiang Li, Yong Chen, Pew-Thian Yap","doi":"10.1007/978-3-031-72630-9_20","DOIUrl":"10.1007/978-3-031-72630-9_20","url":null,"abstract":"<p><p>Untrained networks inspired by deep image priors have shown promising capabilities in recovering high-quality images from noisy or partial measurements <i>without requiring training sets</i>. Their success is widely attributed to implicit regularization due to the spectral bias of suitable network architectures. However, the application of such network-based priors often entails superfluous architectural decisions, risks of overfitting, and lengthy optimization processes, all of which hinder their practicality. To address these challenges, we propose efficient architecture-agnostic techniques to directly modulate the spectral bias of network priors: 1) bandwidth-constrained input, 2) bandwidth-controllable upsamplers, and 3) Lipschitz-regularized convolutional layers. We show that, with <i>just a few lines of code</i>, we can reduce overfitting in underperforming architectures and close performance gaps with high-performing counterparts, minimizing the need for extensive architecture tuning. This makes it possible to employ a more <i>compact</i> model to achieve performance similar or superior to larger models while reducing runtime. Demonstrated on inpainting-like MRI reconstruction task, our results signify for the first time that architectural biases, overfitting, and runtime issues of untrained network priors can be simultaneously addressed without architectural modifications. Our code is publicly available .</p>","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"15072 ","pages":"341-358"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670387/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142904254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data
Zhengfeng Lai, Joohi Chauhan, Brittany N Dugger, Chen-Nee Chuah
Computer Vision - ECCV (European Conference on Computer Vision), vol. 15122, pp. 256-273, 2025. DOI: 10.1007/978-3-031-73039-9_15. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11949240/pdf/

Abstract: Contrastive Language-Image Pre-training (CLIP) has shown proficiency in acquiring distinctive visual representations and exhibits strong generalization across diverse vision tasks. However, its effectiveness in pathology image analysis, particularly with limited labeled data, remains an open question due to significant domain shifts and catastrophic forgetting, making efficient adaptation strategies imperative for scalable analysis in this domain. In this study, we introduce Path-CLIP, a framework tailored for swift adaptation of CLIP to various pathology tasks. First, we propose Residual Feature Refinement (RFR) with a dynamically adjustable ratio to effectively integrate and balance source and task-specific knowledge. Second, we introduce Hidden Representation Perturbation (HRP) and Dual-view Vision Contrastive (DVC) techniques to mitigate overfitting. Finally, we present the Doublet Multimodal Contrastive Loss (DMCL) for fine-tuning CLIP on pathology tasks. We demonstrate that Path-CLIP adeptly adapts pre-trained CLIP to downstream pathology tasks, yielding competitive results. Specifically, Path-CLIP achieves an over 19% improvement in accuracy when using a mere 0.1% of the labeled data in PCam, with only 10 minutes of fine-tuning on a single GPU.
{"title":"Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems.","authors":"Yaşar Utku Alçalar, Mehmet Akçakaya","doi":"10.1007/978-3-031-73010-8_26","DOIUrl":"10.1007/978-3-031-73010-8_26","url":null,"abstract":"<p><p>Diffusion models have emerged as powerful generative techniques for solving inverse problems. Despite their success in a variety of inverse problems in imaging, these models require many steps to converge, leading to slow inference time. Recently, there has been a trend in diffusion models for employing sophisticated noise schedules that involve more frequent iterations of timesteps at lower noise levels, thereby improving image generation and convergence speed. However, application of these ideas for solving inverse problems with diffusion models remain challenging, as these noise schedules do not perform well when using empirical tuning for the forward model log-likelihood term weights. To tackle these challenges, we propose zero-shot approximate posterior sampling (ZAPS) that leverages connections to zero-shot physics-driven deep learning. ZAPS fixes the number of sampling steps, and uses zero-shot training with a physics-guided loss function to learn log-likelihood weights at each irregular timestep. We apply ZAPS to the recently proposed diffusion posterior sampling method as baseline, though ZAPS can also be used with other posterior sampling diffusion models. We further approximate the Hessian of the logarithm of the prior using a diagonalization approach with learnable diagonal entries for computational efficiency. These parameters are optimized over a fixed number of epochs with a given computational budget. Our results for various noisy inverse problems, including Gaussian and motion deblurring, inpainting, and super-resolution show that ZAPS reduces inference time, provides robustness to irregular noise schedules and improves reconstruction quality. Code is available at https://github.com/ualcalar17/ZAPS.</p>","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"15141 ","pages":"444-460"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11736016/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143017349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
Keenon Werling, Janelle Kaneda, Tian Tan, Rishi Agarwal, Six Skov, Tom Van Wouwe, Scott Uhlrich, Nicholas Bianco, Carmichael Ong, Antoine Falisse, Shardul Sapkota, Aidan Chandra, Joshua Carter, Ezio Preatoni, Benjamin Fregly, Jennifer Hicks, Scott Delp, C Karen Liu
Computer Vision - ECCV (European Conference on Computer Vision), vol. 15146, pp. 490-508, 2025. DOI: 10.1007/978-3-031-73223-2_27. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948690/pdf/

Abstract: While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior attempts to estimate physics from reconstructed human poses have been hampered by a lack of datasets with high-quality pose and force data for a variety of movements. We present the AddBiomechanics Dataset 1.0, which includes physically accurate human dynamics for 273 subjects and over 70 hours of motion and force-plate data, totaling more than 24 million frames. Constructing this dataset required novel analytical methods, which we also report. We propose a benchmark for estimating human dynamics from motion using this dataset and present several baseline results. The AddBiomechanics Dataset is publicly available at addbiomechanics.org/download_data.html.
{"title":"Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding","authors":"Cheng Shi, Sibei Yang","doi":"10.1007/978-3-031-20059-5_12","DOIUrl":"https://doi.org/10.1007/978-3-031-20059-5_12","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"19 1","pages":"201-218"},"PeriodicalIF":0.0,"publicationDate":"2023-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78452878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry
Yu Zhang, Junle Yu, Xiaolin Huang, Wenhui Zhou, Ji Hou
Computer Vision - ECCV (European Conference on Computer Vision), pp. 443-459, 2023. DOI: 10.1007/978-3-031-20080-9_26.
Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
Sanghyun Woo, KwanYong Park, Seoung Wug Oh, In-So Kweon, Joon-Young Lee
Computer Vision - ECCV (European Conference on Computer Vision), pp. 238-258, 2022. DOI: 10.1007/978-3-031-19806-9_14.