Adaptively Sparse Transformers Hawkes Process

IF: 1 | Zone 4, Computer Science | JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. Pub Date: 2023-08-01. DOI: 10.1142/s0218488523500319

Yue Gao, Jian-Wei Liu


Citations: 0

Abstract

Many sequences of events are now generated in areas as diverse as healthcare, finance, and social networks, and such data have long been studied with the goal of predicting the type and occurrence time of the next event from the relationships among past events. Recently, following the success of Recurrent Neural Networks (RNNs) in natural language processing, RNNs have been introduced into point process modeling. However, RNNs cannot capture long-term dependencies among events well, a problem that self-attention can partially mitigate. The Transformer Hawkes Process (THP), which uses self-attention, greatly improves the performance of the Hawkes process, but THP cannot ignore the influence of irrelevant events, which affects the model's computational complexity and prediction accuracy. In this paper, we propose an Adaptively Sparse Transformers Hawkes Process (ASTHP). In its time encoding, ASTHP accounts for the periodicity and nonlinearity of event times. ASTHP achieves sparsity by replacing Softmax with α-entmax, a differentiable generalization of Softmax that allows unrelated events to receive exactly zero weight. By optimizing the neural network parameters, different attention heads can adaptively select their degree of sparsity, ranging from Softmax to Sparsemax. Compared with existing models, ASTHP not only maintains prediction performance but also improves interpretability. For example, its accuracy on the MIMIC-II dataset improves by nearly 3 percentage points, and its goodness of fit and stability also improve significantly.
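The abstract's key mechanism replaces Softmax with entmax attention so that irrelevant events can receive exactly zero weight. The sketch below is a minimal illustration, not the authors' code: it assumes the standard α-entmax definition from the sparse-attention literature, p_i = [(α−1)z_i − τ]_+^{1/(α−1)} with the threshold τ chosen so the weights sum to 1, and finds τ by bisection. The function name `entmax_bisect` and its parameters are hypothetical choices made for this example. It also uses a fixed α, whereas ASTHP as described learns the sparsity level separately for each attention head.

```python
import math
from typing import List


def entmax_bisect(scores: List[float], alpha: float = 1.5, n_iter: int = 60) -> List[float]:
    """Hypothetical helper: map attention scores to weights with alpha-entmax.

    alpha == 1 is treated as ordinary softmax; any alpha > 1 can give
    exactly zero weight to low-scoring entries (alpha == 2 is sparsemax).
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1")
    if alpha == 1.0:
        # Limit case: alpha-entmax reduces to softmax (shifted for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    n = len(scores)
    x = [(alpha - 1.0) * s for s in scores]
    x_max = max(x)
    # The threshold tau lies in [x_max - 1, x_max - (1/n)^(alpha-1)]:
    # at the lower bound the top entry alone has mass 1, and at the
    # upper bound every entry has mass at most 1/n.
    lo = x_max - 1.0
    hi = x_max - (1.0 / n) ** (alpha - 1.0)

    def probs(tau: float) -> List[float]:
        # Entries at or below the threshold get exactly zero weight.
        return [max(xi - tau, 0.0) ** (1.0 / (alpha - 1.0)) for xi in x]

    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) >= 1.0:
            lo = mid  # mass >= 1 at mid, so the true tau is >= mid
        else:
            hi = mid  # mass < 1 at mid, so the true tau is < mid
    p = probs(lo)
    total = sum(p)
    return [pi / total for pi in p]
```

With scores [1.5, 1.0, 0.1, -1.0], α = 2 (sparsemax) returns approximately [0.75, 0.25, 0.0, 0.0], α = 1.5 zeroes out only the last entry, and α = 1 keeps every entry strictly positive. Bisection is used here because it works for any α > 1; exact sort-based algorithms are known for particular cases such as α = 2 and α = 1.5.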


Source journal

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 2.70

Self-citation rate: 0.00%

Review time: 13.5 months

Journal description: The International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems is a forum for research on various methodologies for the management of imprecise, vague, uncertain or incomplete information. The aim of the journal is to promote theoretical or methodological works dealing with all kinds of methods to represent and manipulate imperfectly described pieces of knowledge, excluding results on pure mathematics or simple applications of existing theoretical results. It is published bimonthly, with worldwide distribution to researchers, engineers, decision-makers, and educators.