IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Publications

Unsupervised Degradation Representation Learning for Unpaired Restoration of Images and Point Clouds
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-30. DOI: 10.1109/TPAMI.2024.3471571
Longguang Wang, Yulan Guo, Yingqian Wang, Xiaoyu Dong, Qingyu Xu, Jungang Yang, Wei An
Abstract: Restoration tasks in low-level vision aim to restore high-quality (HQ) data from low-quality (LQ) observations. To circumvent the difficulty of acquiring paired data in real scenarios, unpaired approaches, which aim to restore HQ data using only unpaired data, are drawing increasing interest. Since restoration tasks are tightly coupled with the degradation model, the unknown and highly diverse degradations found in real scenarios make learning from unpaired data quite challenging. In this paper, we propose a degradation representation learning scheme to address this challenge. By learning to distinguish various degradations in the representation space, our degradation representations can extract implicit degradation information in an unsupervised manner. Moreover, to handle diverse degradations, we develop degradation-aware (DA) convolutions that flexibly adapt to various degradations to fully exploit the degradation information in the learned representations. Based on our degradation representations and DA convolutions, we introduce a generic framework for unpaired restoration tasks. Based on this framework, we propose UnIRnet and UnPRnet for unpaired image and point cloud restoration, respectively. We demonstrate that our degradation representation learning scheme extracts discriminative representations that yield accurate degradation information. Experiments on unpaired image and point cloud restoration tasks show that UnIRnet and UnPRnet achieve state-of-the-art performance.
Citations: 0
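As a rough illustration of distinguishing degradations in a representation space, the sketch below trains a small encoder with a contrastive objective: two patches from the same degraded image form a positive pair, while patches from other images act as negatives. The encoder architecture, loss form, and hyperparameters are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch only: a tiny degradation encoder trained contrastively (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
)

def degradation_contrastive_loss(query, positive, negatives, tau=0.07):
    """InfoNCE-style loss: patches from the same degraded image should map close together."""
    q = F.normalize(encoder(query), dim=1)          # (B, 128)
    k_pos = F.normalize(encoder(positive), dim=1)   # (B, 128)
    k_neg = F.normalize(encoder(negatives), dim=1)  # (N, 128)
    pos_logit = (q * k_pos).sum(dim=1, keepdim=True)
    neg_logits = q @ k_neg.t()
    logits = torch.cat([pos_logit, neg_logits], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits, labels)

# query/positive: two patches cropped from the same degraded image; negatives: patches from others.
loss = degradation_contrastive_loss(
    torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32), torch.randn(16, 3, 32, 32))
```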
Noise Self-Regression: A New Learning Paradigm to Enhance Low-Light Images Without Task-Related Data
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-28. DOI: 10.1109/TPAMI.2024.3487361
Zhao Zhang, Suiyi Zhao, Xiaojie Jin, Mingliang Xu, Yi Yang, Shuicheng Yan, Meng Wang
Abstract: Deep learning-based low-light image enhancement (LLIE) is the task of leveraging deep neural networks to enhance image illumination while keeping the image content unchanged. From the perspective of training data, existing methods complete the LLIE task driven by one of three data types: paired data, unpaired data, and zero-reference data. Each of these data-driven approaches has its own advantages; e.g., zero-reference methods place very low requirements on training data and can meet human needs in many scenarios. In this paper, we leverage pure Gaussian noise to complete the LLIE task, which further reduces the requirements for training data and can serve as another practical alternative. Specifically, we propose Noise SElf-Regression (NoiSER), which, without access to any task-related data, simply learns a convolutional neural network equipped with an instance-normalization layer by taking a random noise image, N(0, σ²) for each pixel, as both input and output of each training pair; the low-light image is then fed to the trained network to predict the normal-light image. Technically, an intuitive explanation of its effectiveness is as follows: 1) the self-regression reconstructs the contrast between adjacent pixels of the input image, 2) the instance-normalization layer naturally remediates the overall magnitude/lighting of the input image, and 3) the N(0, σ²) assumption for each pixel enforces the output image to follow the well-known gray-world hypothesis [1] when the image size is big enough. Compared to current state-of-the-art LLIE methods with access to various task-related data, NoiSER is highly competitive in enhancement quality, yet with a much smaller model size and much lower training and inference cost. In addition, the experiments also demonstrate that NoiSER has great potential for overexposure suppression and joint processing with other restoration tasks.
Citations: 0
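The training recipe lends itself to a compact sketch: a few convolution + instance-normalization layers are fit to regress pure Gaussian noise onto itself, and the trained network is then applied to a dark image. The architecture, σ, and step count below are placeholder assumptions, not the authors' settings.

```python
# Hedged sketch of noise self-regression; all hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.InstanceNorm2d(32, affine=True), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.InstanceNorm2d(32, affine=True), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
sigma = 0.25

for _ in range(200):                                  # training uses only task-free noise
    noise = sigma * torch.randn(4, 3, 64, 64)         # each pixel drawn from N(0, sigma^2)
    loss = F.mse_loss(net(noise), noise)              # input and target are the same noise image
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Inference: the instance-normalization layers re-normalize the global magnitude of a dark input.
dark = torch.rand(1, 3, 64, 64) * 0.2                 # stand-in for a low-light image
with torch.no_grad():
    enhanced = net(dark).clamp(0, 1)
```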
Disentangling Before Composing: Learning Invariant Disentangled Features for Compositional Zero-Shot Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-28. DOI: 10.1109/TPAMI.2024.3487222
Tian Zhang, Kongming Liang, Ruoyi Du, Wei Chen, Zhanyu Ma
Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from the seen attribute-object compositions in the training set. Previous works mainly project an image and its corresponding composition into a common embedding space to measure their compatibility score. However, both attributes and objects share the visual representations learned in this way, leading the model to exploit spurious correlations and to be biased towards seen compositions. Instead, we reconsider CZSL as an out-of-distribution generalization problem. If an object is treated as a domain, we can learn object-invariant features to reliably recognize attributes attached to any object, and vice versa. Specifically, we propose an invariant feature learning framework that aligns different domains at the representation and gradient levels to capture the intrinsic characteristics associated with the tasks. To further facilitate and encourage the disentanglement of attributes and objects, we propose an "encoding-reshuffling-decoding" process that helps the model avoid spurious correlations by randomly regrouping the disentangled features into synthetic features. Ultimately, our method improves generalization by learning to disentangle features that represent the two independent factors of attributes and objects. Experiments demonstrate that the proposed method achieves state-of-the-art or competitive performance in both closed-world and open-world scenarios. Code is available at https://github.com/PRIS-CV/Disentangling-before-Composing.
Citations: 0
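The reshuffling step can be pictured as re-pairing disentangled attribute and object features within a batch to manufacture synthetic compositions. The sketch below is a bare-bones illustration of that regrouping; the function and tensor names are invented for the example.

```python
# Bare-bones illustration of reshuffling disentangled features into synthetic compositions.
import torch

def reshuffle_compose(attr_feat: torch.Tensor, obj_feat: torch.Tensor) -> torch.Tensor:
    """Randomly re-pair attribute and object features across the batch."""
    perm = torch.randperm(obj_feat.size(0))
    return torch.cat([attr_feat, obj_feat[perm]], dim=1)   # synthetic composition features

attr = torch.randn(8, 256)      # disentangled attribute features (illustrative size)
obj = torch.randn(8, 256)       # disentangled object features
synthetic = reshuffle_compose(attr, obj)    # shape (8, 512), decoded/classified downstream
```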
PSRR-MaxpoolNMS++: Fast Non-Maximum Suppression with Discretization and Pooling
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-28. DOI: 10.1109/TPAMI.2024.3485898
Tianyi Zhang, Chunyun Chen, Yun Liu, Xue Geng, Mohamed M Sabry Aly, Jie Lin
Abstract: Non-maximum suppression (NMS) is an essential post-processing step for object detection. The de facto standard for NMS, GreedyNMS, is not parallelizable and can thus be the performance bottleneck in object detection pipelines. MaxpoolNMS was introduced as a fast and parallelizable alternative to GreedyNMS. However, MaxpoolNMS can only replace GreedyNMS at the first stage of two-stage detectors like Faster R-CNN. To address this issue, we observe that MaxpoolNMS employs box coordinate discretization followed by local score argmax calculation to discard the nested-loop pipeline of GreedyNMS and enable parallelizable implementations. In this paper, we introduce a simple Relationship Recovery module and a Pyramid Shifted MaxpoolNMS module to improve these two stages, respectively. With these two modules, our PSRR-MaxpoolNMS is a generic and parallelizable approach that can completely replace GreedyNMS at all stages in all detectors. Furthermore, we extend PSRR-MaxpoolNMS to the more powerful PSRR-MaxpoolNMS++. For box coordinate discretization, we propose Density-based Discretization for better adherence to the target density of the suppression. For local score argmax calculation, we propose an Adjacent Scale Pooling scheme that mines out duplicated box pairs more accurately and efficiently. Extensive experiments demonstrate that both PSRR-MaxpoolNMS and PSRR-MaxpoolNMS++ outperform MaxpoolNMS by a large margin. Additionally, PSRR-MaxpoolNMS++ not only surpasses PSRR-MaxpoolNMS but also attains competitive accuracy and much better efficiency compared with GreedyNMS. Therefore, PSRR-MaxpoolNMS++ is a parallelizable NMS solution that can effectively replace GreedyNMS at all stages in all detectors.
Citations: 0
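To make the discretize-then-argmax idea concrete, the sketch below scatters detection scores onto a coarse grid and keeps only boxes whose score survives a max-pooling over that grid. It is a toy version of plain MaxpoolNMS, not the full PSRR pipeline, and the grid size, pooling kernel, and box encoding are arbitrary choices.

```python
# Toy sketch of maxpool-style NMS: discretize box centers, then a local score argmax via pooling.
import torch
import torch.nn.functional as F

def maxpool_nms(boxes: torch.Tensor, scores: torch.Tensor, grid: int = 64, kernel: int = 3):
    """boxes: (N, 4) normalized (x1, y1, x2, y2); returns a boolean keep mask."""
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2
    ix = (centers * (grid - 1)).long().clamp(0, grid - 1)        # coordinate discretization
    idx = ix[:, 1] * grid + ix[:, 0]
    flat = torch.zeros(grid * grid).scatter_reduce(0, idx, scores, reduce="amax")
    pooled = F.max_pool2d(flat.view(1, 1, grid, grid), kernel, stride=1, padding=kernel // 2)
    keep = scores >= pooled.view(grid, grid)[ix[:, 1], ix[:, 0]] - 1e-6   # local argmax test
    return keep

xy = torch.rand(100, 2) * 0.8
boxes = torch.cat([xy, xy + torch.rand(100, 2) * 0.2], dim=1)   # dummy normalized boxes
scores = torch.rand(100)
print(maxpool_nms(boxes, scores).sum())    # number of surviving boxes
```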
FLAC: Fairness-Aware Representation Learning by Suppressing Attribute-Class Associations
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-28. DOI: 10.1109/TPAMI.2024.3487254
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou
Abstract: Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them rely heavily on the availability of protected attribute labels in the dataset, which limits their applicability, while label-unaware approaches, i.e., approaches operating without such labels, exhibit considerably lower performance. To overcome these limitations, this work introduces FLAC, a methodology that minimizes the mutual information between the features extracted by the model and a protected attribute, without using attribute labels. To do so, FLAC proposes a sampling strategy that highlights underrepresented samples in the dataset, and casts the problem of learning fair representations as a probability matching problem that leverages representations extracted by a bias-capturing classifier. We show theoretically that FLAC can indeed lead to fair representations that are independent of the protected attributes. FLAC surpasses the current state of the art on Biased-MNIST, CelebA, and UTKFace by 29.1%, 18.1%, and 21.9%, respectively. Additionally, FLAC exhibits 2.2% higher accuracy on ImageNet-A and up to 4.2% higher accuracy on Corrupted-Cifar10. Finally, in most experiments, FLAC even outperforms bias-label-aware state-of-the-art methods.
Citations: 0
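The sketch below gives one loose reading of the probability-matching idea: penalize the main model's features for reproducing the pairwise-similarity structure of a frozen bias-capturing classifier. This is a simplification for illustration only; FLAC's actual sampling and pairing rules are more specific, and all names here are made up.

```python
# Loose, simplified illustration (not FLAC's exact objective): decorrelate the model's
# pairwise similarity structure from that of a frozen bias-capturing classifier.
import torch
import torch.nn.functional as F

def similarity_decorrelation_penalty(feats: torch.Tensor, bias_feats: torch.Tensor) -> torch.Tensor:
    f = F.normalize(feats, dim=1)                  # features of the model being trained
    b = F.normalize(bias_feats.detach(), dim=1)    # features of the bias-capturing classifier
    sf, sb = f @ f.t(), b @ b.t()
    sim_f, sim_b = sf - sf.mean(), sb - sb.mean()  # centered pairwise similarities
    corr = (sim_f * sim_b).sum() / (sim_f.norm() * sim_b.norm() + 1e-8)
    return corr.pow(2)                             # drive the correlation toward zero

penalty = similarity_decorrelation_penalty(torch.randn(16, 128), torch.randn(16, 64))
```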
Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-10. DOI: 10.1109/TPAMI.2024.3467709
Huachen Fang, Jinjian Wu, Qibin Hou, Weisheng Dong, Guangming Shi
Abstract: Previous deep learning-based event denoising methods mostly suffer from poor interpretability and difficulty in real-time processing due to their complex architecture designs. In this paper, we propose window-based event denoising, which processes a stack of events simultaneously, whereas existing element-based denoising handles one event at a time. In addition, we give a theoretical analysis based on probability distributions in both the temporal and spatial domains to improve interpretability. In the temporal domain, we use timestamp deviations between the processed events and the central event to judge temporal correlation and filter out temporally irrelevant events. In the spatial domain, we choose maximum a posteriori (MAP) estimation to discriminate real-world events from noise and use learned convolutional sparse coding to optimize the objective function. Based on this theoretical analysis, we build a Temporal Window (TW) module and a Soft Spatial Feature Embedding (SSFE) module to process temporal and spatial information separately, and construct a novel multi-scale window-based event denoising network, named WedNet. The high denoising accuracy and fast running speed of WedNet enable real-time denoising in complex scenes. Extensive experimental results verify the effectiveness and robustness of WedNet. Our algorithm removes event noise effectively and efficiently and improves the performance of downstream tasks.
Citations: 0
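The temporal-domain test reduces to a simple rule: within a window, keep only events whose timestamps are close to the central event's. Below is a minimal NumPy sketch of that filter; the event field layout and the threshold are illustrative, not WedNet's exact formulation.

```python
# Minimal sketch of the temporal-window test (event layout and threshold are illustrative).
import numpy as np

def temporal_window_filter(events: np.ndarray, tau: float) -> np.ndarray:
    """events: (N, 4) rows of (x, y, t, polarity), sorted by timestamp."""
    t = events[:, 2]
    t_center = t[len(t) // 2]                 # timestamp of the window's central event
    keep = np.abs(t - t_center) <= tau        # temporal correlation test
    return events[keep]

window = np.array([[10.0, 12.0, 0.001,  1.0],
                   [11.0, 12.0, 0.002, -1.0],
                   [40.0,  7.0, 0.050,  1.0]])
print(temporal_window_filter(window, tau=0.010))   # the 0.050 s event is filtered out
```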
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-10. DOI: 10.1109/TPAMI.2024.3476487
Md Kaykobad Reza, Ashley Prater-Bennette, M Salman Asif
Abstract: Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. Ideally, redundancies in the data make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks deteriorates significantly if one or more modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge the performance drop due to missing modalities and, in some cases, outperform independent, dedicated networks trained for the available modality combinations. The proposed adaptation requires an extremely small number of parameters (e.g., fewer than 1% of the total parameters) and is applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing-modality robustness of our proposed method on five different multimodal tasks across seven datasets. Our proposed method demonstrates versatility across tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
Citations: 0
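One way to picture "modulation of intermediate features" is a tiny adapter that maps the modality-availability mask to per-channel scale and shift parameters, as in the hedged sketch below. The adapter design, shapes, and names are assumptions rather than the paper's exact formulation.

```python
# Hedged sketch: intermediate features modulated by a modality-availability mask (PyTorch assumed).
import torch
import torch.nn as nn

class MissingModalityAdapter(nn.Module):
    def __init__(self, num_modalities: int, channels: int):
        super().__init__()
        self.to_scale = nn.Linear(num_modalities, channels)   # few parameters vs. the backbone
        self.to_shift = nn.Linear(num_modalities, channels)

    def forward(self, feat: torch.Tensor, available: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) intermediate features; available: (B, M) mask, 1 = modality present.
        scale = self.to_scale(available.float())[:, :, None, None]
        shift = self.to_shift(available.float())[:, :, None, None]
        return feat * (1 + scale) + shift

adapter = MissingModalityAdapter(num_modalities=3, channels=64)
feat = torch.randn(2, 64, 16, 16)
mask = torch.tensor([[1, 0, 1], [1, 1, 0]])      # e.g., the second modality missing for sample 0
out = adapter(feat, mask)                        # same shape, compensated features
```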
Continuous-time Object Segmentation using High Temporal Resolution Event Camera
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-10. DOI: 10.1109/TPAMI.2024.3477591
Lin Zhu, Xianzhang Chen, Lizhi Wang, Xiao Wang, Yonghong Tian, Hua Huang
Abstract: Event cameras are novel bio-inspired sensors in which individual pixels operate independently and asynchronously, generating intensity changes as events. Leveraging the microsecond resolution (no motion blur) and high dynamic range (compatibility with extreme lighting conditions) of events, there is considerable promise in directly segmenting objects from sparse, asynchronous event streams in various applications. However, unlike video object segmentation with its rich cues, segmenting complete objects from a sparse event stream is challenging. In this paper, we present the first framework for continuous-time object segmentation from an event stream. Given the object mask at the initial time, our task aims to segment the complete object at any subsequent time in the event stream. Specifically, our framework consists of a Recurrent Temporal Embedding Extraction (RTEE) module based on a novel ResLSTM, a Cross-time Spatiotemporal Feature Modeling (CSFM) module, which is a transformer architecture with long-term and short-term matching modules, and a segmentation head. The historical events and masks (reference sets) are recurrently fed into the framework along with current-time events. The temporal embedding is updated as new events arrive, enabling the framework to process the event stream continuously. To train and test our model, we construct both real-world and simulated event-based object segmentation datasets, each comprising event streams, APS images, and object annotations. Extensive experiments on these datasets demonstrate the effectiveness of the proposed recurrent architecture. Our code and dataset are available at https://sites.google.com/view/ecos-net/.
Citations: 0
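The recurrent update of the temporal embedding can be sketched as an LSTM cell wrapped with a residual connection, consuming features of successive event-stream slices. Whether this matches the paper's ResLSTM exactly is not stated here; treat the cell design and dimensions as assumptions that only illustrate the recurrent-update pattern.

```python
# Assumed-form sketch of a residual LSTM update for a recurrent temporal embedding.
import torch
import torch.nn as nn

class ResidualLSTMCell(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.cell = nn.LSTMCell(dim, dim)

    def forward(self, x, state):
        h, c = self.cell(x, state)
        return x + h, (h, c)          # residual connection around the hidden state

cell = ResidualLSTMCell(64)
state = (torch.zeros(1, 64), torch.zeros(1, 64))
embedding = torch.zeros(1, 64)
for slice_feat in torch.randn(5, 1, 64):     # features of successive event-stream slices
    embedding, state = cell(slice_feat, state)
# The embedding can be refreshed whenever new events arrive, enabling continuous-time queries.
```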
Dual-Grained Lightweight Strategy
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10228-10245. Pub Date: 2024-10-10. DOI: 10.1109/TPAMI.2024.3437421
Debin Liu, Xiang Bai, Ruonan Zhao, Xianjun Deng, Laurence T. Yang
Abstract: Removing redundant parameters and computations before model training has attracted great interest, as it can effectively reduce the storage space of the model, speed up training and inference, and save energy while the model runs. In addition, simplifying deep neural network models enables high-performance networks to be deployed on resource-constrained edge devices, promoting the development of the intelligent world. However, current pruning-at-initialization methods perform poorly at extreme sparsity. To improve model performance under extreme sparsity, this paper proposes a dual-grained lightweight strategy, TEDEPR. To our knowledge, TEDEPR is the first pruning-at-initialization method to use tensor theory to optimize the structure of a sparse sub-network model and improve its performance. Specifically, at the coarse-grained level, we represent the weight matrix or weight tensor of the model in a low-rank tensor decomposition form and use multi-step chain operations to enhance the feature-extraction capability of the base module, constructing a low-rank compact network model. At the fine-grained level, unimportant weights are pruned before training based on their trainability in the low-rank model, yielding the final compressed model. To evaluate TEDEPR, we conducted extensive experiments on the MNIST, UCF11, CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet datasets with LeNet, LSTM, VGGNet, ResNet, and Transformer architectures, and compared against state-of-the-art methods. The experimental results show that, under extreme sparsity, TEDEPR achieves higher accuracy, faster training and inference, and a smaller memory footprint than other pruning-at-initialization methods.
Citations: 0
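The two granularities can be mocked up as (1) storing a weight matrix in factorized low-rank form and (2) zeroing low-saliency weights before training. The sketch below uses a plain rank-r factorization and weight magnitude as a stand-in saliency; the paper's tensor decomposition and trainability criterion are more elaborate.

```python
# Illustrative mock-up of the dual granularity; rank and the magnitude criterion are stand-ins.
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Coarse grain: weight stored as a rank-r factorization W ≈ B @ A."""
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.A = nn.Linear(in_features, rank, bias=False)
        self.B = nn.Linear(rank, out_features, bias=True)

    def forward(self, x):
        return self.B(self.A(x))

def prune_at_init(module: nn.Module, sparsity: float) -> None:
    """Fine grain: zero the lowest-saliency weights before any training step."""
    for p in module.parameters():
        if p.dim() > 1:
            k = max(1, int(sparsity * p.numel()))
            threshold = p.abs().flatten().kthvalue(k).values
            p.data.mul_((p.abs() > threshold).float())

layer = LowRankLinear(512, 512, rank=32)
prune_at_init(layer, sparsity=0.9)       # extreme sparsity applied at initialization
```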
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-10. DOI: 10.1109/TPAMI.2024.3475824
Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong
Abstract: Our understanding of the temporal dynamics of the Earth's surface has been significantly advanced by deep vision models, which often require massive amounts of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial, since it is expensive and knowledge-intensive. In this paper, we present scalable multi-temporal change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model, namely the generative probabilistic change model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems: condition-level change event simulation and image-level semantic change synthesis. To solve these two problems, we present Changen2, a GPCM implemented with a resolution-scalable diffusion transformer that can generate time series of remote sensing images and corresponding semantic and change labels from labeled, and even unlabeled, single-temporal images. Changen2 is a "generative change foundation model" that can be trained at scale via self-supervision and is capable of producing change supervisory signals from unlabeled single-temporal images. Unlike existing "foundation models", our generative change foundation model synthesizes change data to train task-specific foundation models for change detection. The resulting models possess inherent zero-shot change detection capabilities and excellent transferability. Comprehensive experiments suggest that Changen2 has superior spatiotemporal scalability in data generation; e.g., a Changen2 model trained on 256² pixel single-temporal images can yield time series of any length and resolutions of 1,024² pixels. Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to the fully supervised counterpart) and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment. The model and datasets are available at https://github.com/Z-Zheng/pytorch-change-models.
Citations: 0
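The GPCM factorization can be read as a two-step pipeline: first simulate a change event on the semantic mask, then synthesize an image consistent with the new mask. The sketch below is purely schematic; `simulate_change_event` uses a trivial random relabeling and `synthesize_image` a trivial blend, standing in for the learned diffusion components described in the paper.

```python
# Purely schematic sketch of the GPCM two-step factorization; both steps are toy stand-ins.
import numpy as np

def simulate_change_event(mask_t0: np.ndarray, p_change: float = 0.2) -> np.ndarray:
    """Condition-level step: stochastically relabel some semantic regions."""
    mask_t1 = mask_t0.copy()
    labels = np.unique(mask_t0)
    for lbl in labels:
        if np.random.rand() < p_change:
            mask_t1[mask_t0 == lbl] = np.random.choice(labels)
    return mask_t1

def synthesize_image(image_t0: np.ndarray, mask_t1: np.ndarray) -> np.ndarray:
    """Image-level step: a generative model would render an image consistent with mask_t1;
    here a naive blend acts as a placeholder."""
    return 0.7 * image_t0 + 0.3 * (mask_t1[..., None] / max(mask_t1.max(), 1))

image_t0 = np.random.rand(64, 64, 3)
mask_t0 = np.random.randint(0, 4, size=(64, 64))
mask_t1 = simulate_change_event(mask_t0)
image_t1 = synthesize_image(image_t0, mask_t1)
change_label = (mask_t0 != mask_t1).astype(np.uint8)   # change supervision comes for free
```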