Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI:10.1109/CIDM.2011.5949447

F. Cismondi, André S. Fialho, S. Vieira, J. Sousa, S. Reti, M. Howell, S. Finkelstein

{"title":"Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data","authors":"F. Cismondi, André S. Fialho, S. Vieira, J. Sousa, S. Reti, M. Howell, S. Finkelstein","doi":"10.1109/CIDM.2011.5949447","DOIUrl":null,"url":null,"abstract":"One consequence of the increasing amount of data stored during acquisition processes is that sampled time series are more prone to be collected in a misaligned uneven fashion and/or be partly lost or unavailable (missing data). Due to their severe impact on data mining techniques, this work proposes methods to (a) align misaligned unevenly sampled data, (b) differentiate absent values related to low sampling frequencies, compared to those resulting from missingness mechanisms, and (c) to classify recoverable and non-recoverable segments of missing data by using statistical and fuzzy modeling approaches. These methods were evaluated against randomly simulated test datasets containing different amounts of missing data. Results show that: (1) using the variable most frequently sampled as a template, combined with cubic interpolation, allowed to unshift misaligned uneven data without significant errors; (2) the differentiation of absent values due to low sampling frequencies from those truly missing, can be succesfully performed using 95% confidence intervals relative to the mean sampling time; (3) fuzzy modeling returned better classification results for recoverable segments, while the statistical approach performed better in classifying non-recoverable segments. All three methods proposed in this work decreased their performance when the amount of missing data was increased in the test datasets.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIDM.2011.5949447","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 33

Abstract

One consequence of the increasing amount of data stored during acquisition processes is that sampled time series are more prone to be collected in a misaligned uneven fashion and/or be partly lost or unavailable (missing data). Due to their severe impact on data mining techniques, this work proposes methods to (a) align misaligned unevenly sampled data, (b) differentiate absent values related to low sampling frequencies, compared to those resulting from missingness mechanisms, and (c) to classify recoverable and non-recoverable segments of missing data by using statistical and fuzzy modeling approaches. These methods were evaluated against randomly simulated test datasets containing different amounts of missing data. Results show that: (1) using the variable most frequently sampled as a template, combined with cubic interpolation, allowed to unshift misaligned uneven data without significant errors; (2) the differentiation of absent values due to low sampling frequencies from those truly missing, can be succesfully performed using 95% confidence intervals relative to the mean sampling time; (3) fuzzy modeling returned better classification results for recoverable segments, while the statistical approach performed better in classifying non-recoverable segments. All three methods proposed in this work decreased their performance when the amount of missing data was increased in the test datasets.

查看原文本刊更多论文

处理包含缺失数据的不对齐、不均匀采样时间序列的计算智能方法

在采集过程中存储的数据量不断增加的一个后果是，采样时间序列更容易以不对齐的不均匀方式收集和/或部分丢失或不可用(丢失数据)。由于它们对数据挖掘技术的严重影响，本工作提出了以下方法:(a)对齐未对齐的不均匀采样数据;(b)与缺失机制造成的缺失值相比，区分与低采样频率相关的缺失值;(c)通过统计和模糊建模方法对缺失数据的可恢复和不可恢复部分进行分类。这些方法对随机模拟的测试数据集进行了评估，这些数据集包含不同数量的缺失数据。结果表明:(1)以采样频率最高的变量为模板，结合三次插值，可以在不显著误差的情况下对不均匀数据进行偏移;(2)使用相对于平均采样时间的95%置信区间，可以成功地将低采样频率导致的缺失值与真正缺失值区分开来;(3)模糊建模对可恢复段的分类效果较好，而统计方法对不可恢复段的分类效果较好。当测试数据集中缺失数据的数量增加时，本文提出的三种方法的性能都会下降。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

自引率

0.00%

发文量