Electricity Consumption Data Sets: Pitfalls and Opportunities

Christoph Klemenjak, A. Reinhardt, Lucas Pereira, S. Makonin, M. Berges, W. Elmenreich
Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, published 2019-11-13. DOI: 10.1145/3360322.3360867 (https://doi.org/10.1145/3360322.3360867)
Citations: 29

Abstract

Real-world data sets are crucial for developing and testing signal processing and machine learning algorithms that address energy-related problems. Their scope and data resolution are, however, often limited to what the experimenters need for their own objectives, and are further shaped by personal experience, budgetary and time constraints, and the availability of equipment. As a result, data sets differ in numerous ways, e.g., in their sampling rates, the number of sensors deployed, their amplitude resolutions, their storage formats, and the availability and extent of ground-truth annotations. This heterogeneity poses a significant problem for researchers who want to use several data sets comparatively, because it requires data conversion, re-sampling, and adaptation steps. In short, there is a lack of widely agreed best practices for designing, deploying, and operating electrical data collection systems. We address this limitation by dissecting the collection methodologies used in existing data sets. By offering recommendations for data collection, data storage, and data provision, we intend to foster the creation of data sets with increased usability and comparability, and thus a greater benefit to the community.
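The abstract names re-sampling as one of the adaptation steps needed before heterogeneous data sets can be compared. As a minimal illustration (not taken from the paper), the Python sketch below brings two hypothetical power series, one sampled at 1 Hz and one every 10 s, onto a common 60 s grid by averaging the readings that fall into each interval. The function name `resample_mean`, the (timestamp in seconds, watts) tuple layout, and all sample values are assumptions made for this example.

```python
from collections import defaultdict


def resample_mean(samples, period_s):
    """Average (timestamp_s, watts) samples into fixed-width buckets of period_s seconds.

    Hypothetical helper for illustration. Returns a sorted list of
    (bucket_start_s, mean_watts). Buckets that contain no samples are
    omitted rather than filled, so gaps in the source data stay visible.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, p in samples:
        start = (t // period_s) * period_s  # align timestamp to its bucket boundary
        sums[start] += p
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


# Invented example data: data set A at 1 Hz, data set B every 10 s.
a = [(t, 100.0 if t < 60 else 200.0) for t in range(120)]
b = [(t, 150.0) for t in range(0, 120, 10)]

print(resample_mean(a, 60))  # [(0, 100.0), (60, 200.0)]
print(resample_mean(b, 60))  # [(0, 150.0), (60, 150.0)]
```

A plain arithmetic mean is only a fair estimate of average power when samples are evenly spaced within each bucket; for irregularly sampled data, a time-weighted average would be the safer choice. Note also that this only works in one direction: down-sampling discards detail, and no re-sampling step can recover information that a coarser data set never recorded.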