Ignacio Aravena;Chih-Che Sun;Ranyu Shi;Subir Majumder;Weihang Yan;Jhi-Young Joo;Le Xie;Jiyu Wang
IEEE Open Access Journal of Power and Energy, vol. 12, pp. 353-365, published 2025-03-26. DOI: 10.1109/OAJPE.2025.3573958. https://ieeexplore.ieee.org/document/11015807/
Open Power System Datasets and Open Simulation Engines: A Survey Toward Machine Learning Applications
A major factor behind the success of machine learning (ML) models in multiple domains is the availability and accessibility of large, labeled, and well-organized datasets for training and benchmarking. By contrast, power grid datasets face three major challenges: (i) real-world data is often restricted by regulatory, privacy, or security concerns, making it difficult to obtain and work with; (ii) synthetic datasets, which are created to address these limitations, often have incomplete information and are released in tool-specific formats, making them inaccessible to the broader community; and (iii) input-output datasets are difficult for non-experts to generate through simulation because open-source simulators are not widely known outside the power system community. This survey addresses these challenges by serving as an entry point to publicly available datasets and simulators for researchers venturing into this area. We review the current landscape of open-source power network data, electric machine models, consumer demand profiles, renewable generation data, and inverter models. We also examine open-source power system simulators, which are crucial for generating high-quality, high-fidelity power grid datasets. We aim to provide a foundation for overcoming data scarcity and to advance toward a structured web of datasets and simulators that supports the development of ML for power systems.
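To make challenge (iii) concrete, here is a minimal sketch of the basic pattern for generating an input-output dataset by simulation: sample operating conditions, run a solver, and record the results. It is not taken from the survey. It assumes a DC power-flow approximation on a hypothetical 3-bus network with made-up line reactances, and it is written in plain Python rather than with any of the open-source simulators the survey reviews.

```python
import random

# Hypothetical 3-bus network (illustrative values, not from any published case).
# Bus 0 is the slack bus; each line is (from_bus, to_bus, reactance in p.u.).
LINES = [(0, 1, 0.1), (0, 2, 0.2), (1, 2, 0.25)]


def dc_power_flow(p_inj):
    """Solve a DC power flow given net injections [P1, P2] (p.u.) at the
    non-slack buses. Returns line flows (p.u.), in the same order as LINES."""
    # Build the reduced susceptance matrix B' for buses 1 and 2.
    b = [[0.0, 0.0], [0.0, 0.0]]
    for i, j, x in LINES:
        y = 1.0 / x
        for k in (i, j):
            if k != 0:
                b[k - 1][k - 1] += y
        if i != 0 and j != 0:
            b[i - 1][j - 1] -= y
            b[j - 1][i - 1] -= y
    # Solve B' * theta = P for the 2x2 system with Cramer's rule.
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    th1 = (p_inj[0] * b[1][1] - b[0][1] * p_inj[1]) / det
    th2 = (b[0][0] * p_inj[1] - b[1][0] * p_inj[0]) / det
    theta = [0.0, th1, th2]  # slack bus angle fixed at 0 rad
    # Flow on line (i, j) is (theta_i - theta_j) / x_ij.
    return [(theta[i] - theta[j]) / x for i, j, x in LINES]


def generate_dataset(n_samples, seed=0):
    """Sample random loads at buses 1 and 2 and record (loads, flows) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        loads = [rng.uniform(0.5, 1.5), rng.uniform(0.2, 1.0)]
        p_inj = [-d for d in loads]  # a load is a negative injection
        data.append((loads, dc_power_flow(p_inj)))
    return data


if __name__ == "__main__":
    for loads, flows in generate_dataset(3):
        print([round(v, 3) for v in loads], [round(f, 3) for f in flows])
```

For example, loads of 1.0 p.u. at bus 1 and 0.5 p.u. at bus 2 give flows of 1.0, 0.5, and 0.0 p.u. on the three lines. In a realistic workflow, the hand-written solver would be replaced by a full AC power flow from an open-source simulator, and the hypothetical network by a published test case.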