Authors: Mahsa Ashouri, G. Shmueli, Chor-yiu Sin
Journal: Journal of Business Analytics, pages 1 - 23, published 2019-01-02
DOI: https://doi.org/10.1080/2573234X.2019.1645574
Tree-based methods for clustering time series using domain-relevant attributes
ABSTRACT We propose two methods for time-series clustering that capture temporal information (trend, seasonality, autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as automated yet transparent tools for clustering large collections of time series. We address the challenge of using common time-series models in MOB by instead utilising least squares regression. The single-step method clusters series using trend, seasonality, lags and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality and cross-sectional attributes, and then clusters the residuals by autocorrelation and domain-relevant attributes. Both methods produce clusters interpretable by domain experts. We illustrate our approach with one-step-ahead forecasting and compare it to autoregressive integrated moving average (ARIMA) models for forecasting many Wikipedia pageview time series. The tree-based approach produces forecasts on par with ARIMA, yet is significantly faster and more efficient, and is therefore suitable for large collections of time series. The simple parametric forecasting models allow for interpretable time-series clusters.
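As a rough illustration of the least squares regression the abstract refers to, the sketch below fits a single series on an intercept, a linear trend, seasonal dummies and lagged values, then produces a one-step-ahead forecast. It is not the authors' implementation and does not build the MOB tree itself. The function names (`build_design`, `fit_ols`, `forecast_next`), the weekly period of 7 and the single lag are assumptions chosen for the example.

```python
import numpy as np

def build_design(y, period=7, n_lags=1):
    """Return (X, target): intercept, linear trend, seasonal dummies, lags."""
    y = np.asarray(y, dtype=float)
    t = np.arange(n_lags, len(y))               # time points with all lags available
    cols = [np.ones(len(t)), t.astype(float)]   # intercept and trend
    for s in range(1, period):                  # season 0 is the baseline
        cols.append((t % period == s).astype(float))
    for k in range(1, n_lags + 1):              # lagged values y[t-k]
        cols.append(y[t - k])
    return np.column_stack(cols), y[t]

def fit_ols(y, period=7, n_lags=1):
    """Ordinary least squares coefficients for the design above."""
    X, target = build_design(y, period, n_lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

def forecast_next(y, beta, period=7, n_lags=1):
    """One-step-ahead forecast for time index len(y)."""
    y = np.asarray(y, dtype=float)
    t = len(y)
    row = [1.0, float(t)]
    row += [1.0 if t % period == s else 0.0 for s in range(1, period)]
    row += [y[t - k] for k in range(1, n_lags + 1)]
    return float(np.dot(row, beta))

if __name__ == "__main__":
    # Synthetic series (assumed for illustration): trend + weekly pattern + noise.
    rng = np.random.default_rng(0)
    t = np.arange(140)
    weekly = np.array([3.0, 1.0, 0.0, -1.0, -2.0, 0.0, 2.0])
    series = 10 + 0.2 * t + weekly[t % 7] + rng.normal(scale=0.5, size=t.size)
    beta = fit_ols(series)
    print("one-step-ahead forecast:", forecast_next(series, beta))
```

In a tree-based setting of the kind the abstract describes, a regression like this would presumably be fitted within each cluster, with the cross-sectional attributes used to decide how series are partitioned. The sketch covers only the per-series regression and forecast.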