MEAformer: An all-MLP transformer with temporal external attention for long-term time series forecasting
Siyuan Huang, Yepeng Liu, Haoyi Cui, Fan Zhang, Jinjiang Li, Xiaofeng Zhang, Mingli Zhang, Caiming Zhang
Information Sciences, Volume 669 (2024), Article 120605
DOI: 10.1016/j.ins.2024.120605
Published: 2024-04-15
URL: https://www.sciencedirect.com/science/article/pii/S0020025524005188
Citations: 0
Abstract
Transformer-based models have significantly improved performance in Long-term Time Series Forecasting (LTSF). These models employ various self-attention mechanisms to discover long-term dependencies. However, their computational efficiency is hampered by the quadratic complexity of self-attention, whose inherent permutation invariance further limits its ability and flexibility in LTSF, and they focus primarily on relationships within a sequence while neglecting potential relationships between different sample sequences. In addition, the Transformer decoder outputs sequences autoregressively, leading to slow inference and error accumulation, especially over long horizons. To address these issues, we propose MEAformer, a model better suited for LTSF. MEAformer adopts a fully connected Multi-Layer Perceptron (MLP) architecture consisting of two types of layers: encoder layers and MLP layers. Unlike the encoder layers of most Transformer-based models, MEAformer replaces self-attention with temporal external attention, which explores potential relationships between different sample sequences in the training dataset and has efficient linear complexity, in contrast to the quadratic complexity of self-attention. Encoder layers can be stacked to capture time-dependent relationships at different scales. Furthermore, MEAformer replaces the intricate decoder of the original Transformer with simpler MLP layers, which speeds up inference and generates the output sequence in a single pass, effectively mitigating error accumulation. For long-term forecasting, MEAformer achieves state-of-the-art performance on six benchmark datasets covering five real-world domains: energy, transportation, economy, weather, and disease. Code is available at: https://github.com/huangsiyuan924/MEAformer.
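For readers unfamiliar with the mechanism, the short Python sketch below illustrates the general idea behind a temporal external attention block: instead of computing pairwise attention within each input sequence, the sequence attends to two small learnable memory units that are shared across the whole training set, which keeps the cost linear in sequence length and lets the memories encode patterns across different sample sequences. This is a minimal sketch assuming the standard external-attention formulation (softmax over time followed by a second normalization); the class name, the memory size d_mem, and the example tensor shapes are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TemporalExternalAttention(nn.Module):
        """Hypothetical external-attention block: two shared learnable memories
        replace the per-sequence key/value projections of self-attention."""
        def __init__(self, d_model: int, d_mem: int = 64):
            super().__init__()
            # Memory units learned over the whole training set, so they can
            # capture relationships between different sample sequences.
            self.mem_k = nn.Linear(d_model, d_mem, bias=False)  # plays the role of M_k
            self.mem_v = nn.Linear(d_mem, d_model, bias=False)  # plays the role of M_v

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            attn = self.mem_k(x)                                   # (batch, seq_len, d_mem)
            attn = F.softmax(attn, dim=1)                          # normalize over time steps
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # second normalization over memory slots
            return self.mem_v(attn)                                # (batch, seq_len, d_model)

    # Example: a batch of 8 series, 96-step input window, 512 features per step.
    x = torch.randn(8, 96, 512)
    out = TemporalExternalAttention(d_model=512)(x)
    print(out.shape)  # torch.Size([8, 96, 512])

Because d_mem is a fixed constant, the cost of this block grows linearly with the sequence length, in contrast to the quadratic cost of standard self-attention over the same window.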
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences, along with a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.