MEAformer: An all-MLP transformer with temporal external attention for long-term time series forecasting
Siyuan Huang, Yepeng Liu, Haoyi Cui, Fan Zhang, Jinjiang Li, Xiaofeng Zhang, Mingli Zhang, Caiming Zhang
Information Sciences, Volume 669 (2024), Article 120605
DOI: 10.1016/j.ins.2024.120605
Published: 2024-04-15
URL: https://www.sciencedirect.com/science/article/pii/S0020025524005188
Citations: 0
Abstract
Transformer-based models have significantly improved performance in Long-term Time Series Forecasting (LTSF). These models employ various self-attention mechanisms to discover long-term dependencies. However, their computational efficiency is hampered by the quadratic complexity of self-attention, whose inherent permutation invariance further limits its ability and flexibility in LTSF, and they focus primarily on relationships within a sequence while neglecting potential relationships between different sample sequences. In addition, the Transformer decoder outputs sequences autoregressively, leading to slow inference and error accumulation, especially over long horizons. To address these issues, we propose MEAformer, a model better suited for LTSF. MEAformer adopts a fully connected Multi-Layer Perceptron (MLP) architecture consisting of two types of layers: encoder layers and MLP layers. Unlike the encoder layers of most Transformer-based models, MEAformer replaces self-attention with temporal external attention, which explores potential relationships between different sample sequences in the training dataset and has efficient linear complexity, in contrast to the quadratic complexity of self-attention. Encoder layers can be stacked to capture time-dependent relationships at different scales. Furthermore, MEAformer replaces the intricate decoder of the original Transformer with simpler MLP layers, which speeds up inference and generates the output sequence in a single pass, effectively mitigating error accumulation. For long-term forecasting, MEAformer achieves state-of-the-art performance on six benchmark datasets covering five real-world domains: energy, transportation, economy, weather, and disease. Code is available at: https://github.com/huangsiyuan924/MEAformer.
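For readers unfamiliar with the mechanism, the short Python sketch below illustrates the general idea behind a temporal external attention block: instead of computing pairwise attention within each input sequence, the sequence attends to two small learnable memory units that are shared across the whole training set, which keeps the cost linear in sequence length and lets the memories encode patterns across different sample sequences. This is a minimal sketch assuming the standard external-attention formulation (softmax over time followed by a second normalization); the class name, the memory size d_mem, and the example tensor shapes are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TemporalExternalAttention(nn.Module):
        """Hypothetical external-attention block: two shared learnable memories
        replace the per-sequence key/value projections of self-attention."""
        def __init__(self, d_model: int, d_mem: int = 64):
            super().__init__()
            # Memory units learned over the whole training set, so they can
            # capture relationships between different sample sequences.
            self.mem_k = nn.Linear(d_model, d_mem, bias=False)  # plays the role of M_k
            self.mem_v = nn.Linear(d_mem, d_model, bias=False)  # plays the role of M_v

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            attn = self.mem_k(x)                                   # (batch, seq_len, d_mem)
            attn = F.softmax(attn, dim=1)                          # normalize over time steps
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # second normalization over memory slots
            return self.mem_v(attn)                                # (batch, seq_len, d_model)

    # Example: a batch of 8 series, 96-step input window, 512 features per step.
    x = torch.randn(8, 96, 512)
    out = TemporalExternalAttention(d_model=512)(x)
    print(out.shape)  # torch.Size([8, 96, 512])

Because d_mem is a fixed constant, the cost of this block grows linearly with the sequence length, in contrast to the quadratic cost of standard self-attention over the same window.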
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences, along with a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.