{"title":"A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning","authors":"Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang","doi":"10.1109/tpami.2024.3498346","DOIUrl":"https://doi.org/10.1109/tpami.2024.3498346","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"3 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142637282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Solve Hard Minimal Problems.","authors":"Petr Hruby, Timothy Duff, Anton Leykin, Tomas Pajdla","doi":"10.1109/TPAMI.2023.3307898","DOIUrl":"10.1109/TPAMI.2023.3307898","url":null,"abstract":"<p><p>We present an approach to solving hard geometric optimization problems in the RANSAC framework. The hard minimal problems arise from relaxing the original geometric optimization problem into a minimal problem with many spurious solutions. Our approach avoids computing large numbers of spurious solutions. We design a learning strategy for selecting a starting problem-solution pair that can be numerically continued to the problem and the solution of interest. We demonstrate our approach by developing a RANSAC solver for the problem of computing the relative pose of three calibrated cameras, via a minimal relaxation using four points in each view. On average, we can solve a single problem in under 70 μs. We also benchmark and study our engineering choices on the very familiar problem of computing the relative pose of two calibrated cameras, via the minimal case of five points in two views.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10055768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoIR: Compressive Implicit Radar.","authors":"Sean M Farrell, Vivek Boominathan, Nathaniel Raymondi, Ashutosh Sabharwal, Ashok Veeraraghavan","doi":"10.1109/TPAMI.2023.3301553","DOIUrl":"10.1109/TPAMI.2023.3301553","url":null,"abstract":"<p><p>Using millimeter wave (mmWave) signals for imaging has an important advantage in that they can penetrate through poor environmental conditions such as fog, dust, and smoke that severely degrade optical-based imaging systems. However, mmWave radars, contrary to cameras and LiDARs, suffer from low angular resolution because of small physical apertures and conventional signal processing techniques. Sparse radar imaging, on the other hand, can increase the aperture size while minimizing the power consumption and readout bandwidth. This paper presents CoIR, an analysis-by-synthesis method that leverages the implicit neural network bias in convolutional decoders and compressed sensing to perform high-accuracy sparse radar imaging. The proposed system is dataset-agnostic and does not require any auxiliary sensors for training or testing. We introduce a sparse array design that allows for a 5.5× reduction in the number of antenna elements needed compared to conventional MIMO array designs. We demonstrate our system's improved imaging performance over standard mmWave radars and other competitive untrained methods on both simulated and experimental mmWave radar data.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9971344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer-Empowered Invariant Grounding for Video Question Answering.","authors":"Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua","doi":"10.1109/TPAMI.2023.3303451","DOIUrl":"10.1109/TPAMI.2023.3303451","url":null,"abstract":"<p><p>Video Question Answering (VideoQA) is the task of answering questions about a video. At its core is the understanding of the alignments between video scenes and question semantics to yield the answer. In leading VideoQA models, the typical learning objective, empirical risk minimization (ERM), tends to over-exploit the spurious correlations between question-irrelevant scenes and answers, instead of inspecting the causal effect of question-critical scenes, which undermines the prediction with unreliable reasoning. In this work, we take a causal look at VideoQA and propose a modal-agnostic learning framework, named Invariant Grounding for VideoQA (IGV), to ground the question-critical scene, whose causal relations with answers are invariant across different interventions on the complement. With IGV, leading VideoQA models are forced to shield the answering from the negative influence of spurious correlations, which significantly improves their reasoning ability. To unleash the potential of this framework, we further provide a Transformer-Empowered Invariant Grounding for VideoQA (TIGV), a substantial instantiation of the IGV framework that naturally integrates the idea of invariant grounding into a transformer-style backbone. Experiments on four benchmark datasets validate our design in terms of accuracy, visual explainability, and generalization ability over the leading baselines. Our code is available at https://github.com/yl3800/TIGV.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9968656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Count-Free Single-Photon 3D Imaging with Race Logic.","authors":"Atul Ingle, David Maier","doi":"10.1109/TPAMI.2023.3302822","DOIUrl":"10.1109/TPAMI.2023.3302822","url":null,"abstract":"<p><p>Single-photon cameras (SPCs) have emerged as a promising new technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by precisely capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires a large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth- and power-hungry. Can we estimate distances without explicitly storing photon counts? Yes: here we present an online approach for distance estimation suitable for resource-constrained settings with limited bandwidth, memory and compute. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms as opposed to conventional equi-width histograms. Equi-depth histograms are a more succinct representation for \"peaky\" distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, on another k-quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can provide at least an order of magnitude reduction in bandwidth and power consumption while maintaining distance reconstruction accuracy similar to that of conventional histogram-based processing methods.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9953599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aberration-Aware Depth-From-Focus.","authors":"Xinge Yang, Qiang Fu, Mohamed Elhoseiny, Wolfgang Heidrich","doi":"10.1109/TPAMI.2023.3301931","DOIUrl":"10.1109/TPAMI.2023.3301931","url":null,"abstract":"<p><p>Computer vision methods for depth estimation usually use simple camera models with idealized optics. For modern machine learning approaches, this creates an issue when attempting to train deep networks with simulated data, especially for focus-sensitive tasks like Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that will affect the decision of the best-focused frame in a focal stack. We then explore bridging this domain gap through aberration-aware training (AAT). Our approach involves a lightweight network that models lens aberrations at different positions and focus distances, which is then integrated into the conventional network training pipeline. We evaluate the generality of network models on both synthetic and real-world data. The experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model for different datasets. The code will be available at github.com/vccimaging/Aberration-Aware-Depth-from-Focus.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9951494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Isolating Signals in Passive Non-Line-of-Sight Imaging using Spectral Content.","authors":"Connor Hashemi, Rafael Avelar, James Leger","doi":"10.1109/TPAMI.2023.3301336","DOIUrl":"10.1109/TPAMI.2023.3301336","url":null,"abstract":"<p><p>In real-life passive non-line-of-sight (NLOS) imaging there is an overwhelming amount of undesired scattered radiance, called clutter, that impedes reconstruction of the desired NLOS scene. This paper explores using the spectral domain of the scattered light field to separate the desired scattered radiance from the clutter. We propose two techniques: The first separates the multispectral scattered radiance into a collection of objects each with their own uniform color. The objects which correspond to clutter can then be identified and removed based on how well they can be reconstructed using NLOS imaging algorithms. This technique requires very few priors and uses off-the-shelf algorithms. For the second technique, we derive and solve a convex optimization problem assuming we know the desired signal's spectral content. This method is quicker and can be performed with fewer spectral measurements. We demonstrate both techniques using realistic scenarios. In the presence of clutter that is 50 times stronger than the desired signal, the proposed reconstruction of the NLOS scene is 23 times more accurate than typical reconstructions and 5 times more accurate than using the leading clutter rejection method.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9969727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervision by Denoising.","authors":"Sean I Young, Adrian V Dalca, Enzo Ferrante, Polina Golland, Christopher A Metzler, Bruce Fischl, Juan Eugenio Iglesias","doi":"10.1109/TPAMI.2023.3299789","DOIUrl":"10.1109/TPAMI.2023.3299789","url":null,"abstract":"<p><p>Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is a problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to some given imaging problem, which can be extremely time-consuming. In this work, we propose \"supervision by denoising\" (SUD), a framework to supervise reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging, anatomical brain reconstruction (3D) and cortical parcellation (2D), to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines. Our code is available at https://github.com/seannz/sud.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9958188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}