{"title":"Count-Free Single-Photon 3D Imaging with Race Logic.","authors":"Atul Ingle, David Maier","doi":"10.1109/TPAMI.2023.3302822","DOIUrl":"10.1109/TPAMI.2023.3302822","url":null,"abstract":"<p><p>Single-photon cameras (SPCs) have emerged as a promising new technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by precisely capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth and power hungry. Can we estimate distances without explicitly storing photon counts? Yes-here we present an online approach for distance estimation suitable for resource-constrained settings with limited bandwidth, memory and compute. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms as opposed to conventional equi-width histograms. Equi-depth histograms are a more succinct representation for \"peaky\" distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, to another k-quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can provide at least an order of magnitude reduction in bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional histogram-based processing methods.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9953599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aberration-Aware Depth-From-Focus.","authors":"Xinge Yang, Qiang Fu, Mohamed Elhoseiny, Wolfgang Heidrich","doi":"10.1109/TPAMI.2023.3301931","DOIUrl":"10.1109/TPAMI.2023.3301931","url":null,"abstract":"<p><p>Computer vision methods for depth estimation usually use simple camera models with idealized optics. For modern machine learning approaches, this creates an issue when attempting to train deep networks with simulated data, especially for focus-sensitive tasks like Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that will affect the decision of the best-focused frame in a focal stack. We then explore bridging this domain gap through aberration-aware training (AAT). Our approach involves a lightweight network that models lens aberrations at different positions and focus distances, which is then integrated into the conventional network training pipeline. We evaluate the generality of network models on both synthetic and real-world data. The experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model for different datasets. The code will be available in github.com/vccimaging/Aberration-Aware-Depth-from-Focus.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9951494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Isolating Signals in Passive Non-Line-of-Sight Imaging using Spectral Content.","authors":"Connor Hashemi, Rafael Avelar, James Leger","doi":"10.1109/TPAMI.2023.3301336","DOIUrl":"10.1109/TPAMI.2023.3301336","url":null,"abstract":"<p><p>In real-life passive non-line-of-sight (NLOS) imaging there is an overwhelming amount of undesired scattered radiance, called clutter, that impedes reconstruction of the desired NLOS scene. This paper explores using the spectral domain of the scattered light field to separate the desired scattered radiance from the clutter. We propose two techniques: The first separates the multispectral scattered radiance into a collection of objects each with their own uniform color. The objects which correspond to clutter can then be identified and removed based on how well they can be reconstructed using NLOS imaging algorithms. This technique requires very few priors and uses off-the-shelf algorithms. For the second technique, we derive and solve a convex optimization problem assuming we know the desired signal's spectral content. This method is quicker and can be performed with fewer spectral measurements. We demonstrate both techniques using realistic scenarios. In the presence of clutter that is 50 times stronger than the desired signal, the proposed reconstruction of the NLOS scene is 23 times more accurate than typical reconstructions and 5 times more accurate than using the leading clutter rejection method.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9969727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digging Into Uncertainty-based Pseudo-label for Robust Stereo Matching","authors":"Zhelun Shen, Xibin Song, Yuchao Dai, Dingfu Zhou, Zhibo Rao, Liangjun Zhang","doi":"10.48550/arXiv.2307.16509","DOIUrl":"https://doi.org/10.48550/arXiv.2307.16509","url":null,"abstract":"Due to the domain differences and unbalanced disparity distribution across multiple datasets, current stereo matching approaches are commonly limited to a specific dataset and generalize poorly to others. Such domain shift issue is usually addressed by substantial adaptation on costly target-domain ground-truth data, which cannot be easily obtained in practical settings. In this paper, we propose to dig into uncertainty estimation for robust stereo matching. Specifically, to balance the disparity distribution, we employ a pixel-level uncertainty estimation to adaptively adjust the next stage disparity searching space, in this way driving the network progressively prune out the space of unlikely correspondences. Then, to solve the limited ground truth data, an uncertainty-based pseudo-label is proposed to adapt the pre-trained model to the new domain, where pixel-level and area-level uncertainty estimation are proposed to filter out the high-uncertainty pixels of predicted disparity maps and generate sparse while reliable pseudo-labels to align the domain gap. Experimentally, our method shows strong cross-domain, adapt, and joint generalization and obtains 1st place on the stereo task of Robust Vision Challenge 2020. Additionally, our uncertainty-based pseudo-labels can be extended to train monocular depth estimation networks in an unsupervised way and even achieves comparable performance with the supervised methods.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47838193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervision by Denoising.","authors":"Sean I Young, Adrian V Dalca, Enzo Ferrante, Polina Golland, Christopher A Metzler, Bruce Fischl, Juan Eugenio Iglesias","doi":"10.1109/TPAMI.2023.3299789","DOIUrl":"10.1109/TPAMI.2023.3299789","url":null,"abstract":"<p><p>Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to some given imaging problem, which can be extremely time-consuming. In this work, we propose \"supervision by denoising\" (SUD), a framework to supervise reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging-anatomical brain reconstruction (3D) and cortical parcellation (2D)-to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines. Our code available at https://github.com/seannz/sud.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9958188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Zodiacal Light For Spaceborne Calibration Of Polarimetric Imagers.","authors":"Or Avitan, Yoav Y Schechner, Ehud Behar","doi":"10.1109/TPAMI.2023.3299526","DOIUrl":"10.1109/TPAMI.2023.3299526","url":null,"abstract":"<p><p>We propose that spaceborne polarimetric imagers can be calibrated, or self-calibrated using zodiacal light (ZL). ZL is created by a cloud of interplanetary dust particles. It has a significant degree of polarization in a wide field of view. From space, ZL is unaffected by terrestrial disturbances. ZL is insensitive to the camera location, so it is suited for simultaneous cross-calibration of satellite constellations. ZL changes on a scale of months, thus being a quasi-constant target in realistic calibration sessions. We derive a forward model for polarimetric image formation. Based on it, we formulate an inverse problem for polarimetric calibration and self-calibration, as well as an algorithm for the solution. The methods here are demonstrated in simulations. Towards these simulations, we render polarized images of the sky, including ZL from space, polarimetric disturbances, and imaging noise.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9885615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Motion Generation: A Survey","authors":"Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang","doi":"10.48550/arXiv.2307.10894","DOIUrl":"https://doi.org/10.48550/arXiv.2307.10894","url":null,"abstract":"Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"63 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139357487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Count-Free Single-Photon 3D Imaging with Race Logic","authors":"A. Ingle, David Maier","doi":"10.48550/arXiv.2307.04924","DOIUrl":"https://doi.org/10.48550/arXiv.2307.04924","url":null,"abstract":"Single-photon cameras (SPCs) have emerged as a promising new technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by precisely capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth and power hungry. Can we estimate distances without explicitly storing photon counts? Yes-here we present an online approach for distance estimation suitable for resource-constrained settings with limited bandwidth, memory and compute. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms as opposed to conventional equi-width histograms. Equi-depth histograms are a more succinct representation for \"peaky\" distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, to another k-quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can provide at least an order of magnitude reduction in bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional histogram-based processing methods.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43712581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Scalable Multi-View Reconstruction of Geometry and Materials","authors":"Carolin Schmitt, B. Anti'c, Andrei Neculai, J. Lee, Andreas Geiger","doi":"10.48550/arXiv.2306.03747","DOIUrl":"https://doi.org/10.48550/arXiv.2306.03747","url":null,"abstract":"In this paper, we propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object-scale and hence cannot be captured with stationary light stages. The input are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study on the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and allows for expansion to spatially larger scenes. We believe that this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45725916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN","authors":"Yang‐Zhi Hu, Qian Zheng, Xudong Jiang, Gang Pan","doi":"10.48550/arXiv.2305.19868","DOIUrl":"https://doi.org/10.48550/arXiv.2305.19868","url":null,"abstract":"Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace weight multiplications in ANNs with additions, which are more energy-efficient and less computationally intensive. However, it remains a challenge to train deep SNNs due to the discrete spiking function. A popular approach to circumvent this challenge is ANN-to-SNN conversion. However, due to the quantization error and accumulating error, it often requires lots of time steps (high inference latency) to achieve high performance, which negates SNN's advantages. To this end, this paper proposes Fast-SNN that achieves high performance with low latency. We demonstrate the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the minimization of the quantization error, we show that the sequential error is the primary cause of the accumulating error, which is addressed by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Codes are available at: https://github.com/yangfan-hu/Fast-SNN.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46962526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}