Nonlinear Regression With Hierarchical Recurrent Neural Networks Under Missing Data

S. Onur Sahin;Suleyman S. Kozat
{"title":"Nonlinear Regression With Hierarchical Recurrent Neural Networks Under Missing Data","authors":"S. Onur Sahin;Suleyman S. Kozat","doi":"10.1109/TAI.2024.3404414","DOIUrl":null,"url":null,"abstract":"We study regression (or prediction) of sequential data, which may have missing entries and/or different lengths. This problem is heavily investigated in the machine learning literature since such missingness is a common occurrence in most real-life applications due to data corruption, measurement errors, and similar. To this end, we introduce a novel hierarchical architecture involving a set of long short-term memory (LSTM) networks, which use only the existing inputs in the sequence without any imputations or statistical assumptions on the missing data. To incorporate the missingness information, we partition the input space into different regions in a hierarchical manner based on the “presence-pattern” of the previous inputs and then assign different LSTM networks to these regions. In this sense, we use the LSTM networks as our experts for these regions and adaptively combine their outputs to generate our final output. Our method is generic so that the set of partitioned regions (presence-patterns) that are modeled by the LSTM networks can be customized, and one can readily use other sequential architectures such as gated recurrent unit (GRU) networks and recurrent neural networks (RNNs) as shown in the article. We also provide the computational complexity analysis of the proposed architecture, which is in the same order as a conventional LSTM architecture. In our experiments, our algorithm achieves significant performance improvements on the well-known financial and real-life datasets with respect to the state-of-the-art methods. We also share the source code of our algorithm to facilitate other research and the replicability of our results.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10536892/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We study regression (or prediction) of sequential data, which may have missing entries and/or different lengths. This problem is heavily investigated in the machine learning literature, since such missingness is common in most real-life applications due to data corruption, measurement errors, and similar causes. To this end, we introduce a novel hierarchical architecture involving a set of long short-term memory (LSTM) networks, which use only the existing inputs in the sequence, without any imputation or statistical assumptions on the missing data. To incorporate the missingness information, we partition the input space into different regions in a hierarchical manner based on the "presence-pattern" of the previous inputs and then assign different LSTM networks to these regions. In this sense, we use the LSTM networks as our experts for these regions and adaptively combine their outputs to generate our final output. Our method is generic in that the set of partitioned regions (presence-patterns) modeled by the LSTM networks can be customized, and one can readily substitute other sequential architectures such as gated recurrent unit (GRU) networks and recurrent neural networks (RNNs), as shown in the article. We also provide a computational complexity analysis of the proposed architecture, which is of the same order as that of a conventional LSTM architecture. In our experiments, our algorithm achieves significant performance improvements on well-known financial and real-life datasets with respect to state-of-the-art methods. We also share the source code of our algorithm to facilitate further research and the replicability of our results.