Odyssey:一个支持时间序列集群旅程的引擎

IF 3.3 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI:10.14778/3611540.3611622

John Paparrizos, Sai Prasanna Teja Reddy

{"title":"Odyssey:一个支持时间序列集群旅程的引擎","authors":"John Paparrizos, Sai Prasanna Teja Reddy","doi":"10.14778/3611540.3611622","DOIUrl":null,"url":null,"abstract":"Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":3.3000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Odyssey: An Engine Enabling the Time-Series Clustering Journey\",\"authors\":\"John Paparrizos, Sai Prasanna Teja Reddy\",\"doi\":\"10.14778/3611540.3611622\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.\",\"PeriodicalId\":54220,\"journal\":{\"name\":\"Proceedings of the Vldb Endowment\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Vldb Endowment\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14778/3611540.3611622\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Vldb Endowment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3611540.3611622","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 1

摘要

聚类是最流行的时间序列任务之一，因为它支持无监督的数据探索，并且通常用作其他任务的子例程或预处理步骤。尽管几十年来一直是跨学科研究的活跃主题，但只有有限的努力集中在时间序列的基准聚类方法上。不幸的是，这些研究(i)忽略了流行的方法和整个类别的方法;(ii)考虑了基础距离测量的有限选择;(iii)对少数数据集进行评估;或者(iv)避免对研究结果进行严格的统计验证。此外，突然出现的热情和最近提出的深度学习方法强调了对全面研究的迫切需要。基于上述限制，我们提出了Odyssey，一个模块化和可扩展的web引擎，以全面评估80种时间序列聚类方法，涵盖数据挖掘，机器学习和深度学习文献中的9个不同类别。Odyssey能够对128个不同的时间序列数据集进行严格的统计分析。通过它的交互界面，Odyssey (i)揭示了每个类的最佳执行方法;(ii)确定以前被省略的表现特别好的班级;(iii)质疑关于在聚类中使用弹性度量的主张;(iv)突出参数调整的影响;(v)揭穿深度学习方法优越性的说法。奥德赛不仅促进了在这一领域进行的最广泛的研究，而且重要的是，它揭示了一种进步的错觉，而实际上，所评估的方法都无法超越传统方法，即k -Shape，并具有统计学上的显著差异。总的来说，Odyssey为时间序列集群技术的发展奠定了基础。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Odyssey: An Engine Enabling the Time-Series Clustering Journey

Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Vldb Endowment Computer Science-General Computer Science

CiteScore

7.70

自引率

0.00%

发文量

期刊介绍： The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.