Luke W. Yerbury, Ricardo J.G.B. Campello, G.C. Livingston Jr, Mark Goldsworthy, Lachlan O’Neil
Applied Energy, Volume 391, Article 125811. Published 2025-04-18. DOI: 10.1016/j.apenergy.2025.125811. https://www.sciencedirect.com/science/article/pii/S0306261925005410
Comparing clustering approaches for smart meter time series: Investigating the influence of dataset properties on performance
The widespread adoption of smart meters for monitoring energy consumption has generated vast quantities of high-resolution time series data which remain underutilised. While clustering has emerged as a fundamental tool for mining smart meter time series (SMTS) data, selecting appropriate clustering methods remains challenging despite numerous comparative studies. These studies often rely on problematic methodologies and consider a limited scope of methods, frequently overlooking compelling methods from the broader time series clustering literature. Consequently, they struggle to provide dependable guidance for practitioners designing their own clustering approaches.
This paper presents a comprehensive comparative framework for SMTS clustering methods using expert-informed synthetic datasets that emphasise peak consumption behaviours as fundamental cluster concepts. Using a phased methodology, we first evaluated 31 distance measures and 8 representation methods using leave-one-out classification, then examined the better-suited methods in combination with 11 clustering algorithms. We further assessed the robustness of these combinations to systematic changes in key dataset properties that affect clustering performance on real-world datasets, including cluster balance, noise, and the presence of outliers.
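The first phase described above can be illustrated with a minimal sketch. It assumes that leave-one-out classification means each series is labelled by its single nearest neighbour among all the others (1-NN), which is a common way to score a distance measure but is not confirmed by the abstract. The function names, the Euclidean baseline, and the toy load profiles are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def loo_1nn_accuracy(series, labels, dist=euclidean):
    """Leave-one-out 1-NN accuracy of a distance measure (assumed protocol).

    Each series is held out in turn and assigned the label of its nearest
    neighbour among the remaining series; the fraction of correct labels
    is returned. Requires at least two series.
    """
    n = len(series)
    correct = 0
    for i in range(n):
        best_j, best_d = None, np.inf
        for j in range(n):
            if j == i:
                continue  # the held-out series cannot be its own neighbour
            d = dist(series[i], series[j])
            if d < best_d:
                best_j, best_d = j, d
        correct += int(labels[best_j] == labels[i])
    return correct / n

# Hypothetical toy profiles: three morning-peak and three evening-peak series.
morning = [[0, 3, 1, 0, 0, 0], [0, 4, 1, 0, 0, 0], [0, 3, 2, 0, 0, 0]]
evening = [[0, 0, 0, 1, 3, 0], [0, 0, 0, 1, 4, 0], [0, 0, 0, 2, 3, 0]]
X = morning + evening
y = [0, 0, 0, 1, 1, 1]

toy_accuracy = loo_1nn_accuracy(X, y)
print(toy_accuracy)
```

Passing a different callable as `dist` scores another distance measure on the same labelled data, so candidate measures can be ranked before any clustering algorithm is involved.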
Our results revealed that methods accommodating local temporal shifts while maintaining amplitude sensitivity, particularly Dynamic Time Warping and k-sliding distance, consistently outperformed traditional approaches. Among other key findings, we identified that when combined with k-medoids or hierarchical clustering using Ward’s linkage, these methods exhibited consistent robustness across varying dataset characteristics without careful parameter tuning. These and other findings inform actionable recommendations for practitioners, and validation with real-world data demonstrates that our findings translate effectively to practical SMTS clustering tasks. Finally, our datasets and code are publicly available to support the development, evaluation, and comparison of both novel and overlooked methods.
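As a rough illustration of one combination named above, the sketch below pairs a textbook Dynamic Time Warping distance with a simple alternating k-medoids loop on a precomputed distance matrix. Everything here is an assumption for illustration: the band-limited warping `window`, the squared pointwise cost, the deterministic farthest-first initialisation, and the toy profiles are hypothetical choices, not the authors' published implementation or parameter settings.

```python
import numpy as np

def dtw_distance(a, b, window=None):
    """Dynamic Time Warping distance with an optional warping band.

    `window` limits how far apart aligned indices may be (an assumed
    Sakoe-Chiba style constraint); None allows unconstrained warping.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # squared pointwise cost
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

def k_medoids(D, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix D."""
    D = np.asarray(D, dtype=float)
    # Assumed deterministic start: most central point, then farthest-first.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

# Hypothetical toy profiles: same peak shape shifted by a step or two,
# with peak height 3 in one group and 6 in the other.
X = [
    [0, 0, 3, 0, 0, 0, 0, 0], [0, 3, 0, 0, 0, 0, 0, 0], [0, 0, 0, 3, 0, 0, 0, 0],
    [0, 0, 6, 0, 0, 0, 0, 0], [0, 6, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 0, 0, 0],
]
D = np.array([[dtw_distance(x, y, window=2) for y in X] for x in X])
labels, medoids = k_medoids(D, k=2)
print(labels, medoids)
```

In this toy example the small time shifts cost nothing under DTW, while the difference in peak height still separates the two groups, which is the behaviour the abstract describes as accommodating local temporal shifts while maintaining amplitude sensitivity.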
About the journal:
Applied Energy serves as a platform for sharing innovations, research, development, and demonstrations in energy conversion, conservation, and sustainable energy systems. The journal covers topics such as optimal energy resource use, environmental pollutant mitigation, and energy process analysis. It welcomes original papers, review articles, technical notes, and letters to the editor. Authors are encouraged to submit manuscripts that bridge the gap between research, development, and implementation. The journal addresses a wide spectrum of topics, including fossil and renewable energy technologies, energy economics, and environmental impacts. Applied Energy also explores modeling and forecasting, conservation strategies, and the social and economic implications of energy policies, including climate change mitigation. It is complemented by the open-access journal Advances in Applied Energy.