Deep Regression Modeling for Imbalanced and Incomplete Time-Series Data

IF 5.3 | CAS Zone 3, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Emerging Topics in Computational Intelligence | Pub Date: 2024-03-19 | DOI: 10.1109/TETCI.2024.3372435

Murtadha D. Hssayeni; Behnaz Ghoraani

Volume 8, issue 6, pp. 3767-3778. Source: https://ieeexplore.ieee.org/document/10475374/

Citations: 0

Abstract

Deep Regression Modeling for Imbalanced and Incomplete Time-Series Data

During the collection of time-series data, many factors lead to imbalanced and incomplete datasets. Consequently, it becomes challenging to develop deep convolutional models that do not overfit. Our objective in this paper was to investigate an emerging but rather underutilized framework, Conditional Generative Adversarial Networks (cGANs), for improving deep regression models for time-series data with an imbalanced and incomplete distribution. First, we investigated the potential of using a vanilla cGAN as a data imputation method to improve the generalizability of the developed models to unseen data in such datasets. Next, we proposed a modified cGAN architecture that improves the extrapolation and generalizability of the regression models. Our investigations used an imbalanced synthetic non-stationary dataset, a real-world dataset from the Parkinson's disease (PD) application domain, and one publicly available dataset for Negative Affect (NA) estimation. We found that the vanilla cGAN failed to generate realistic time-series data due to severe mode collapse, limiting its application as a data imputation method for imbalanced and incomplete data. Importantly, the proposed cGAN framework significantly improved extrapolation and generalizability for the prediction of regression scores, with an average improvement in mean absolute error of 56%, 34%, and 18% for the synthetic, PD, and NA datasets, respectively, when compared with traditional Convolutional Neural Networks. The code is publicly available on GitHub.
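The abstract reports its gains as percentage improvements in mean absolute error (MAE) over a CNN baseline but does not state the formula behind those percentages. The sketch below assumes the usual reading, a relative reduction in MAE with respect to the baseline. The function names and all numbers are hypothetical and chosen only for illustration; none of them come from the paper or its code.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(mae_baseline: float, mae_model: float) -> float:
    """Percentage reduction in MAE relative to the baseline (assumed definition)."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


# Toy values, invented purely for illustration (not results from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.0, 3.0, 4.0, 5.0]  # every prediction is off by 1.0
model_pred = [1.5, 2.5, 3.5, 4.5]     # every prediction is off by 0.5

mae_baseline = mean_absolute_error(y_true, baseline_pred)
mae_model = mean_absolute_error(y_true, model_pred)
improvement = relative_improvement(mae_baseline, mae_model)
print(f"baseline MAE={mae_baseline:.2f}, model MAE={mae_model:.2f}, "
      f"improvement={improvement:.0f}%")
```

Under this definition, a reported 56% improvement would mean the model's MAE is 44% of the baseline's MAE on that dataset.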

Source journal

IEEE Transactions on Emerging Topics in Computational Intelligence (subject area: Mathematics, Control and Optimization)

CiteScore: 10.30

Self-citation rate: 7.50%

Articles published: 147

About the journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.