{"title":"Odyssey:一个支持时间序列集群旅程的引擎","authors":"John Paparrizos, Sai Prasanna Teja Reddy","doi":"10.14778/3611540.3611622","DOIUrl":null,"url":null,"abstract":"Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":2.6000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Odyssey: An Engine Enabling the Time-Series Clustering Journey\",\"authors\":\"John Paparrizos, Sai Prasanna Teja Reddy\",\"doi\":\"10.14778/3611540.3611622\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.\",\"PeriodicalId\":54220,\"journal\":{\"name\":\"Proceedings of the Vldb Endowment\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Vldb Endowment\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14778/3611540.3611622\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Vldb Endowment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3611540.3611622","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Odyssey: An Engine Enabling the Time-Series Clustering Journey
Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.
期刊介绍:
The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.