Unsupervised Classification of Categorical Time Series through Innovative Distances

Proceedings of the 4th International Conference on Statistics: Theory and Applications. Pub Date: 2022-08-01. DOI: 10.11159/icsta22.111

Ángel López-Oriona, J. A. Vilar, P. D’Urso



Abstract

In this paper, two novel distances for nominal time series are introduced. Both are based on features describing the serial dependence patterns between each pair of categories. The first dissimilarity employs so-called association measures, whereas the second computes correlation quantities between indicator processes whose uniqueness is guaranteed under standard stationarity conditions. The metrics are used to construct crisp algorithms for clustering categorical series. The approaches are able to group series generated from similar underlying stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. An extensive simulation study shows that the devised clustering algorithms outperform several alternative procedures proposed in the literature. Specifically, they achieve better results than approaches based on maximum likelihood estimation, which take advantage of knowing the true underlying processes. Both innovative dissimilarities could be useful for practitioners in the field of time series clustering.
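The second dissimilarity can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so everything below is an assumption made for illustration: the function names (`indicator_correlations`, `feature_vector`, `distance`), the use of lags 1 to `max_lag`, the handling of constant indicators, and the Euclidean distance between feature vectors. It should not be read as the authors' estimator. The idea shown is only the general one: turn each category into a 0/1 indicator process, estimate lagged correlations between every pair of indicators, and compare two series through those features.

```python
from math import sqrt


def indicator_correlations(series, categories, lag):
    """Estimate corr(1{X_t = a}, 1{X_{t-lag} = b}) for every ordered pair (a, b).

    Illustrative assumption: plain relative frequencies are used as estimates.
    """
    n = len(series)
    # Marginal relative frequency of each category.
    p = {c: sum(1 for x in series if x == c) / n for c in categories}
    feats = []
    for a in categories:
        for b in categories:
            # Relative frequency of seeing b at time t-lag and a at time t.
            joint = sum(
                1 for t in range(lag, n)
                if series[t] == a and series[t - lag] == b
            ) / (n - lag)
            denom = sqrt(p[a] * (1 - p[a]) * p[b] * (1 - p[b]))
            # A category that never (or always) occurs has a constant indicator,
            # so its correlation is undefined; this sketch sets it to 0.
            feats.append((joint - p[a] * p[b]) / denom if denom > 0 else 0.0)
    return feats


def feature_vector(series, categories, max_lag=2):
    """Stack the pairwise indicator correlations for lags 1..max_lag."""
    feats = []
    for lag in range(1, max_lag + 1):
        feats.extend(indicator_correlations(series, categories, lag))
    return feats


def distance(s1, s2, categories, max_lag=2):
    """Euclidean distance between the two feature vectors (an assumed choice)."""
    f1 = feature_vector(s1, categories, max_lag)
    f2 = feature_vector(s2, categories, max_lag)
    return sqrt(sum((u - v) ** 2 for u, v in zip(f1, f2)))


# Toy usage: two series with the same alternating pattern and one with a
# different dependence structure.
cats = ["a", "b"]
s1 = ["a", "b"] * 50
s2 = ["a", "b"] * 50
s3 = ["a", "a", "b", "b"] * 25
print(distance(s1, s2, cats))
print(distance(s1, s3, cats))
```

A matrix of such pairwise distances could then be passed to any crisp (hard-assignment) clustering procedure. Which procedure the paper actually uses is not stated in the abstract.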
