F. Cismondi, André S. Fialho, S. Vieira, J. Sousa, S. Reti, M. Howell, S. Finkelstein
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). Published 2011-04-11. DOI: 10.1109/CIDM.2011.5949447 (https://doi.org/10.1109/CIDM.2011.5949447)
Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data
One consequence of the increasing amount of data stored during acquisition processes is that sampled time series are more likely to be collected in a misaligned, uneven fashion and/or to be partly lost or unavailable (missing data). Because these problems severely affect data mining techniques, this work proposes methods to (a) align misaligned, unevenly sampled data, (b) differentiate absent values related to low sampling frequencies from those resulting from missingness mechanisms, and (c) classify recoverable and non-recoverable segments of missing data using statistical and fuzzy modeling approaches. These methods were evaluated against randomly simulated test datasets containing different amounts of missing data. Results show that: (1) using the most frequently sampled variable as a template, combined with cubic interpolation, made it possible to unshift misaligned uneven data without significant errors; (2) absent values due to low sampling frequencies can be successfully differentiated from those truly missing using 95% confidence intervals relative to the mean sampling time; (3) fuzzy modeling returned better classification results for recoverable segments, while the statistical approach performed better in classifying non-recoverable segments. The performance of all three proposed methods decreased as the amount of missing data in the test datasets increased.
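Result (2) can be illustrated with a minimal sketch. The abstract does not spell out how the 95% interval is built, so the code below assumes a normal approximation: the interval is the mean inter-sample time plus or minus 1.96 standard deviations, and any gap longer than the upper bound is treated as truly missing rather than as a consequence of slow sampling. The function name `flag_gaps` and the parameter `z` are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def flag_gaps(timestamps, z=1.96):
    """Label each gap between consecutive samples as 'regular' or 'missing'.

    Assumption (not from the paper): the 95% interval is approximated as
    mean + z * standard deviation of the inter-sample times, with z = 1.96.
    """
    pairs = list(zip(timestamps, timestamps[1:]))
    intervals = [later - earlier for earlier, later in pairs]
    if len(intervals) < 2:
        raise ValueError("need at least three timestamps")

    # Upper bound of the interval around the mean sampling time.
    upper = mean(intervals) + z * stdev(intervals)

    # A gap above the bound is not explained by the sampling rate alone.
    return [
        (earlier, later, "missing" if (later - earlier) > upper else "regular")
        for earlier, later in pairs
    ]


if __name__ == "__main__":
    # One sample per time unit, with a long pause between t=19 and t=50.
    ts = list(range(20)) + list(range(50, 60))
    print([gap for gap in flag_gaps(ts) if gap[2] == "missing"])
```

Because the mean and standard deviation are computed over all gaps, long pauses inflate the bound. A variant that estimates the sampling time from a known clean stretch of the signal may be more robust, but that is a design choice beyond what the abstract states.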