Yuqing Ma, Wei Liu, Yajun Gao, Yang Yuan, Shihao Bai, Haotong Qin, Xianglong Liu
Title: SeeMore: a spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation
Journal: Science China Information Sciences (Impact Factor 7.3; JCR Q1, Computer Science, Information Systems)
Publication date: 2024-07-22
DOI: https://doi.org/10.1007/s11432-022-3859-8
Citations: 0
Abstract
Predicting future frames from historical spatiotemporal data sequences is both challenging and important, and it has drawn considerable attention from academic and industrial researchers. Most spatiotemporal predictive algorithms ignore the valuable backward reasoning ability and the disparate learning complexities among different layers, and hence cannot build good long-term dependencies and spatial correlations, resulting in suboptimal solutions. To address these issues, we propose a two-stage coarse-to-fine spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation (SeeMore), which includes a bidirectional distillation network (BDN) and a level-specific meta-adapter (LMA), to gain bidirectional multilevel reasoning. In the first stage, BDN concentrates on bidirectional dynamics modeling and coarsely constructs the spatial correlations of different layers, while LMA is introduced in the second, fine-tuning stage to refine the multilevel spatial correlations from a meta-learning perspective. In particular, BDN mimics the forward and backward reasoning abilities of humans in a distillation manner, which aids in the development of long-term dependencies. The LMA views the learning of different layers as disparate but related tasks and guides the transfer of learning experiences among these tasks through their learning complexities. Thus, each layer can move closer to its solution and extract more informative spatial correlations. By capturing the enhanced short-term spatial correlations and long-term temporal dependencies, the proposed model can extract adequate knowledge from sequential historical observations and accurately predict future frames whose backtracking preconditions are consistent with the historical sequence. Our work is general and robust enough to be integrated into most spatiotemporal predictive models without requiring additional computation or memory cost during inference. Extensive experiments on four widely used predictive learning benchmarks validated the proposed model's effectiveness in comparison to state-of-the-art approaches (e.g., a 10.6% improvement in Mean Squared Error on the Moving MNIST dataset).
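The abstract reports its headline result as a percentage improvement in Mean Squared Error. As a reading aid, the following minimal Python sketch shows how such a figure is typically derived. It is not the authors' evaluation code: the function names (`frame_mse`, `sequence_mse`, `relative_improvement`) and the toy numbers are hypothetical. The abstract does not state how the error is normalized, so averaging over pixels and then over frames is an assumption.

```python
from typing import Sequence

# One grayscale frame as rows of pixel values (illustrative type alias).
Frame = Sequence[Sequence[float]]


def frame_mse(pred: Frame, target: Frame) -> float:
    """Mean squared error over all pixels of one frame (assumed normalization)."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def sequence_mse(preds: Sequence[Frame], targets: Sequence[Frame]) -> float:
    """Average the per-frame MSE over a predicted sequence."""
    errors = [frame_mse(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Percentage reduction in error relative to a baseline (lower is better)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy 2x2 frames with made-up values, not data from the paper.
targets = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
preds = [[[1.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 0.0]]]
print(sequence_mse(preds, targets))
print(relative_improvement(baseline_error=2.0, model_error=1.5))
```

Under this reading, a "10.6% improvement" means the model's error is 10.6% lower than that of the compared baseline, not that the error dropped by 10.6 absolute units.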
About the journal:
Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.