Comparing clustering approaches for smart meter time series: Investigating the influence of dataset properties on performance

Impact factor: 11.0 | CAS Region 1 (Engineering & Technology) | JCR Q1 (Energy & Fuels)

Applied Energy | Pub Date: 2025-04-18 | DOI: 10.1016/j.apenergy.2025.125811

Luke W. Yerbury, Ricardo J.G.B. Campello, G.C. Livingston Jr, Mark Goldsworthy, Lachlan O’Neil

Volume 391, Article 125811. Publisher page: https://www.sciencedirect.com/science/article/pii/S0306261925005410

Citations: 0

Abstract


Comparing clustering approaches for smart meter time series: Investigating the influence of dataset properties on performance

The widespread adoption of smart meters for monitoring energy consumption has generated vast quantities of high-resolution time series data which remain underutilised. While clustering has emerged as a fundamental tool for mining smart meter time series (SMTS) data, selecting appropriate clustering methods remains challenging despite numerous comparative studies. These studies often rely on problematic methodologies and consider a limited scope of methods, frequently overlooking compelling methods from the broader time series clustering literature. Consequently, they struggle to provide dependable guidance for practitioners designing their own clustering approaches.

This paper presents a comprehensive comparative framework for SMTS clustering methods using expert-informed synthetic datasets that emphasise peak consumption behaviours as fundamental cluster concepts. Using a phased methodology, we first evaluated 31 distance measures and 8 representation methods using leave-one-out classification, then examined the better-suited methods in combination with 11 clustering algorithms. We further assessed the robustness of these combinations to systematic changes in key dataset properties that affect clustering performance on real-world datasets, including cluster balance, noise, and the presence of outliers.
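The first phase described above scores each distance measure by leave-one-out classification. A minimal sketch of that idea follows, assuming a 1-nearest-neighbour classifier and using plain Euclidean distance as a stand-in; the function names and the toy profiles are hypothetical and are not taken from the authors' published code.

```python
import math

def euclidean(a, b):
    """Lock-step Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_1nn_accuracy(series, labels, dist):
    """Leave-one-out 1-NN accuracy: each series is classified with the
    label of its nearest neighbour among all the other series."""
    correct = 0
    for i, query in enumerate(series):
        nearest = min(
            (j for j in range(len(series)) if j != i),
            key=lambda j: dist(query, series[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(series)

# Toy daily profiles (hypothetical values): morning peaks vs evening peaks.
profiles = [
    [0, 3, 1, 0, 0, 0],
    [0, 2.8, 1.2, 0, 0, 0],
    [0, 0, 0, 1, 3, 0],
    [0, 0, 0, 1.1, 2.9, 0],
]
peak_labels = ["morning", "morning", "evening", "evening"]
accuracy = loo_1nn_accuracy(profiles, peak_labels, euclidean)
```

Because `dist` is passed in as a function, the same loop can rank any candidate distance measure on a labelled dataset without running a clustering algorithm.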

Our results revealed that methods accommodating local temporal shifts while maintaining amplitude sensitivity, particularly Dynamic Time Warping and k-sliding distance, consistently outperformed traditional approaches. Among other key findings, we identified that when combined with k-medoids or hierarchical clustering using Ward’s linkage, these methods exhibited consistent robustness across varying dataset characteristics without careful parameter tuning. These and other findings inform actionable recommendations for practitioners, and validation with real-world data demonstrates that our findings translate effectively to practical SMTS clustering tasks. Finally, our datasets and code are publicly available to support the development, evaluation, and comparison of both novel and overlooked methods.
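To illustrate one of the combinations named above, here is a rough sketch of Dynamic Time Warping paired with k-medoids on a precomputed distance matrix. It assumes a Sakoe-Chiba band (the `window` parameter) to keep the warping local and a simple alternating update for the medoids; the helper names, the toy profiles, and the chosen initial medoids are all hypothetical, and the paper's own implementation may differ.

```python
import math

def dtw(a, b, window=None):
    """DTW distance with absolute-difference local cost.

    `window` is a Sakoe-Chiba band half-width that limits how far the
    alignment may shift in time; None means unconstrained warping.
    Assumes equal-length series when a window is given.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else window
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def assign_to_medoids(dist, medoids):
    """Label each point with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign_to_medoids(dist, medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # The new medoid minimises total distance to its cluster members;
            # an empty cluster keeps its previous medoid.
            updated.append(min(members, key=lambda p: sum(dist[p][q] for q in members))
                           if members else old)
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign_to_medoids(dist, medoids)

# Toy profiles (hypothetical): two morning peaks and two evening peaks,
# each pair offset by one time step.
profiles = [
    [0, 1, 3, 1, 0, 0, 0, 0],
    [0, 0, 1, 3, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 3, 1, 0],
    [0, 0, 0, 1, 3, 1, 0, 0],
]
dist = [[dtw(a, b, window=1) for b in profiles] for a in profiles]
medoids, labels = k_medoids(dist, initial_medoids=[0, 2])
```

With a band of one step, the one-step offsets inside each pair cost nothing while the larger offset between morning and evening peaks does, so the two pairs end up in separate clusters. Working from a precomputed matrix also means the distance function can be swapped without touching the clustering code.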


Source journal

Applied Energy (Engineering & Technology - Engineering: Chemical)

CiteScore: 21.20

Self-citation rate: 10.70%

Articles published: 1830

Review time: 41 days

About the journal: Applied Energy serves as a platform for sharing innovations, research, development, and demonstrations in energy conversion, conservation, and sustainable energy systems. The journal covers topics such as optimal energy resource use, environmental pollutant mitigation, and energy process analysis. It welcomes original papers, review articles, technical notes, and letters to the editor. Authors are encouraged to submit manuscripts that bridge the gap between research, development, and implementation. The journal addresses a wide spectrum of topics, including fossil and renewable energy technologies, energy economics, and environmental impacts. Applied Energy also explores modeling and forecasting, conservation strategies, and the social and economic implications of energy policies, including climate change mitigation. It is complemented by the open-access journal Advances in Applied Energy.