Adaptively Sparse Transformers Hawkes Process

Impact Factor: 1.0 · CAS Tier 4 (Computer Science) · JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yue Gao, Jian-Wei Liu
{"title":"Adaptively Sparse Transformers Hawkes Process","authors":"Yue Gao, Jian-Wei Liu","doi":"10.1142/s0218488523500319","DOIUrl":null,"url":null,"abstract":"Nowadays, many sequences of events are generated in areas as diverse as healthcare, finance, and social network. People have been studying these data for a long time. They hope to predict the type and occurrence time of the next event by using relationships among events in the data. recently, with the successful application of Recurrent Neural Network (RNN) in natural language processing, it has been introduced into point process. However, RNN cannot capture the long-term dependence among events well, and self-attention can partially mitigate this problem precisely. Transformer Hawkes Process (THP) using self-attention greatly improves the performance of the Hawkes Process, but THP cannot ignore the effect of irrelevant events, which will affect the computational complexity and prediction accuracy of the model. In this paper, we propose an Adaptively Sparse Transformers Hawkes Process (ASTHP). ASTHP considers the periodicity and nonlinearity of event time in the time encoding process. The sparsity of the ASTHP is achieved by substituting Softmax with [Formula: see text]-entmax: [Formula: see text]-entmax is a differentiable generalization of Softmax that allows unrelated events to gain exact zero weight. By optimizing the neural network parameters, different attention heads can adaptively select sparse modes (from Softmax to Sparsemax). Compared with the existing models, ASTHP model not only ensures the prediction performance but also improves the interpretability of the model. For example, the accuracy of ASTHP model on MIMIC-II dataset is improved by nearly 3 percentage points, and the model fitting degree and stability are also improved significantly.","PeriodicalId":50283,"journal":{"name":"International Journal of Uncertainty Fuzziness and Knowledge-Based Systems","volume":"19 1","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Uncertainty Fuzziness and Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1142/s0218488523500319","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Nowadays, many sequences of events are generated in areas as diverse as healthcare, finance, and social networks. These data have long been studied with the aim of predicting the type and occurrence time of the next event from the relationships among past events. Recently, following the success of Recurrent Neural Networks (RNNs) in natural language processing, they have been introduced into point process modeling. However, RNNs cannot capture long-term dependencies among events well, a problem that self-attention can partially mitigate. The Transformer Hawkes Process (THP), which uses self-attention, greatly improves the performance of the Hawkes process, but THP cannot ignore the influence of irrelevant events, which harms both the computational cost and the prediction accuracy of the model. In this paper, we propose an Adaptively Sparse Transformers Hawkes Process (ASTHP). ASTHP considers the periodicity and nonlinearity of event times in the time encoding process. The sparsity of ASTHP is achieved by replacing Softmax with α-entmax: α-entmax is a differentiable generalization of Softmax that allows unrelated events to receive exactly zero weight. By optimizing the neural network parameters, different attention heads can adaptively select their degree of sparsity (ranging from Softmax to Sparsemax). Compared with existing models, ASTHP not only maintains prediction performance but also improves interpretability. For example, the accuracy of ASTHP on the MIMIC-II dataset improves by nearly 3 percentage points, and the model's goodness of fit and stability also improve significantly.
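The paper itself ships no code, but the mechanism the abstract leans on, α-entmax attention, can be illustrated with a short sketch. The NumPy snippet below (the function name `entmax_bisect`, the toy scores, and the iteration count are our own illustrative choices, not from the paper) solves the α-entmax mapping by bisecting on the normalization threshold: α = 1 recovers Softmax, α = 2 gives Sparsemax, and intermediate values give attention weights in which weakly scored events receive exactly zero weight.

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Alpha-entmax via bisection on the threshold tau.

    Solves p_i = [(alpha - 1) * z_i - tau]_+ ** (1 / (alpha - 1)) with sum(p) = 1.
    alpha -> 1 recovers Softmax; alpha = 2 is Sparsemax.
    """
    z = np.asarray(z, dtype=float)
    if alpha == 1.0:                               # Softmax limit
        e = np.exp(z - z.max())
        return e / e.sum()
    zs = (alpha - 1.0) * z
    tau_lo, tau_hi = zs.max() - 1.0, zs.max()      # bracket for the threshold
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        p = np.clip(zs - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:                         # threshold too small -> raise it
            tau_lo = tau
        else:                                      # threshold too large -> lower it
            tau_hi = tau
    p = np.clip(zs - tau_lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()                             # remove residual bisection error

scores = np.array([1.6, 1.0, 0.2, -1.5])           # toy attention scores
print(entmax_bisect(scores, alpha=1.0))            # dense, Softmax-like weights
print(entmax_bisect(scores, alpha=1.5))            # sparser: weak events shrink toward 0
print(entmax_bisect(scores, alpha=2.0))            # Sparsemax: weak events get exactly 0
```

Per the abstract, ASTHP treats the sparsity level as learnable, so each attention head can settle anywhere between the dense Softmax regime and the fully sparse Sparsemax regime during training.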
Source journal
CiteScore: 2.70
Self-citation rate: 0.00%
Annual publications: 48
Review time: 13.5 months
Journal description: The International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems is a forum for research on various methodologies for the management of imprecise, vague, uncertain or incomplete information. The aim of the journal is to promote theoretical or methodological works dealing with all kinds of methods to represent and manipulate imperfectly described pieces of knowledge, excluding results on pure mathematics or simple applications of existing theoretical results. It is published bimonthly, with worldwide distribution to researchers, engineers, decision-makers, and educators.