A comparative analysis of PPO and SAC algorithms for energy optimization with country-level energy consumption insights

Impact Factor: 1.8 · JCR Quartile: Q3 · Category: Automation & Control Systems
Enes Bajrami, Andrea Kulakov, Eftim Zdravevski, Petre Lameski
{"title":"A comparative analysis of PPO and SAC algorithms for energy optimization with country-level energy consumption insights","authors":"Enes Bajrami,&nbsp;Andrea Kulakov,&nbsp;Eftim Zdravevski,&nbsp;Petre Lameski","doi":"10.1016/j.ifacsc.2025.100344","DOIUrl":null,"url":null,"abstract":"<div><h3>Background:</h3><div>This study addresses national-scale energy optimization using deep reinforcement learning. Unlike prior works that rely on simulated environments or synthetic datasets, this research integrates real-world energy indicators, including electricity generation, greenhouse gas emissions, renewable energy share, fossil fuel dependency, and oil consumption. These indicators, sourced from the World Energy Consumption dataset, capture both developed and developing energy systems, enabling the evaluation of intelligent control policies across diverse contexts.</div></div><div><h3>Methodology:</h3><div>Two advanced algorithms, Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC), were implemented and trained using PyTorch across multi-phase evaluation runs (300–3000 episodes). Comparative performance analysis was conducted on key metrics: execution speed, action consistency, and reward optimization. A secondary regional analysis focused on contrasting the Balkan and Nordic countries to evaluate algorithm adaptability between highly developed and developing energy infrastructures.</div></div><div><h3>Significant findings:</h3><div>SAC demonstrated superior computational throughput and policy stability, making it suitable for real-time and resource-constrained environments. PPO exhibited stronger action magnitudes, enabling more assertive control signals for high-impact interventions. Both agents significantly outperformed a rule-based baseline in responsiveness and adaptability. The proposed framework represents a novel contribution by combining deep reinforcement learning with interpretable, country-level energy indicators. Future work will extend the evaluation to additional continents, including Asia, Africa, and South America, to assess global scalability and applicability.</div></div>","PeriodicalId":29926,"journal":{"name":"IFAC Journal of Systems and Control","volume":"34 ","pages":"Article 100344"},"PeriodicalIF":1.8000,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IFAC Journal of Systems and Control","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468601825000501","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Background:

This study addresses national-scale energy optimization using deep reinforcement learning. Unlike prior works that rely on simulated environments or synthetic datasets, this research integrates real-world energy indicators, including electricity generation, greenhouse gas emissions, renewable energy share, fossil fuel dependency, and oil consumption. These indicators, sourced from the World Energy Consumption dataset, capture both developed and developing energy systems, enabling the evaluation of intelligent control policies across diverse contexts.
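As a concrete illustration, the sketch below shows one plausible way to assemble the five cited indicators into a normalized per-country state vector for a reinforcement learning environment. The column names, the `country`/`year` keys, and the min-max scaling are assumptions for illustration; the paper does not publish its preprocessing pipeline or the dataset's exact schema.

```python
# A minimal sketch of turning country-level indicators into a state vector.
# Column names below are hypothetical; the World Energy Consumption dataset
# uses its own schema, which the paper does not enumerate.
import numpy as np
import pandas as pd

INDICATORS = [
    "electricity_generation",    # e.g. TWh
    "greenhouse_gas_emissions",  # e.g. Mt CO2-eq
    "renewables_share",          # % of generation
    "fossil_fuel_share",         # % of primary energy
    "oil_consumption",           # e.g. TWh-equivalent
]

def country_state(df: pd.DataFrame, country: str, year: int) -> np.ndarray:
    """Return a normalized indicator vector for one country-year row."""
    row = df[(df["country"] == country) & (df["year"] == year)]
    x = row[INDICATORS].to_numpy(dtype=np.float32).ravel()
    # Min-max normalize against the full dataset so that states from
    # developed and developing energy systems live on a comparable scale.
    lo = df[INDICATORS].min().to_numpy(dtype=np.float32)
    hi = df[INDICATORS].max().to_numpy(dtype=np.float32)
    return (x - lo) / (hi - lo + 1e-8)
```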

Methodology:

Two advanced algorithms, Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC), were implemented and trained in PyTorch across multi-phase evaluation runs (300–3000 episodes). Comparative performance analysis was conducted on key metrics: execution speed, action consistency, and reward optimization. A secondary regional analysis contrasted the Balkan and Nordic countries to evaluate how each algorithm adapts across developing and highly developed energy infrastructures.
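For readers unfamiliar with the two algorithms, the following minimal PyTorch sketch shows the core policy objective each one optimizes: PPO's clipped surrogate loss and SAC's entropy-regularized actor loss. These are textbook formulations, not the authors' implementation; the clipping coefficient `epsilon = 0.2` and temperature `alpha = 0.2` are conventional defaults, not values reported in the paper.

```python
# Textbook policy objectives for PPO and SAC; illustrative only.
import torch

def ppo_clip_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  epsilon: float = 0.2) -> torch.Tensor:
    """PPO clipped surrogate objective (returned negated, to be minimized)."""
    ratio = torch.exp(log_probs_new - log_probs_old)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    return -torch.min(unclipped, clipped).mean()

def sac_actor_loss(log_probs: torch.Tensor,
                   q_values: torch.Tensor,
                   alpha: float = 0.2) -> torch.Tensor:
    """SAC actor loss: trade off expected Q-value against policy entropy."""
    return (alpha * log_probs - q_values).mean()

if __name__ == "__main__":
    # Smoke test on random batches of 64 transitions.
    lp_new = torch.randn(64)
    lp_old = lp_new.detach() + 0.1 * torch.randn(64)
    adv = torch.randn(64)
    print(ppo_clip_loss(lp_new, lp_old, adv))
    print(sac_actor_loss(torch.randn(64), torch.randn(64)))
```

PPO's clipping keeps each update close to the data-collecting policy, while SAC's entropy term rewards exploratory, stable stochastic policies; this design difference is one plausible reading of the stability-versus-assertiveness contrast reported in the findings below.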

Significant findings:

SAC demonstrated superior computational throughput and policy stability, making it suitable for real-time and resource-constrained environments. PPO exhibited stronger action magnitudes, enabling more assertive control signals for high-impact interventions. Both agents significantly outperformed a rule-based baseline in responsiveness and adaptability. The proposed framework represents a novel contribution by combining deep reinforcement learning with interpretable, country-level energy indicators. Future work will extend the evaluation to additional continents, including Asia, Africa, and South America, to assess global scalability and applicability.
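The paper does not specify the rules of its baseline; the hypothetical sketch below illustrates the kind of fixed-threshold policy such rule-based baselines typically use, operating on the normalized state vector from the earlier sketch. All thresholds and index positions are assumptions.

```python
# A hypothetical fixed-threshold baseline policy; the paper's actual
# rule set is not published, so all values here are illustrative.
import numpy as np

def rule_based_action(state: np.ndarray,
                      renewables_idx: int = 2,
                      fossil_idx: int = 3) -> float:
    """Emit a scalar control signal from hard-coded indicator thresholds."""
    if state[fossil_idx] > 0.5 and state[renewables_idx] < 0.3:
        return 1.0   # heavy fossil dependency: strong pro-renewables signal
    if state[renewables_idx] > 0.6:
        return 0.0   # renewables already dominant: hold steady
    return 0.5       # otherwise: moderate adjustment
```

Because such a policy cannot adapt its thresholds to context, it is unsurprising that both learned agents outperform it in responsiveness and adaptability.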