A comparative analysis of PPO and SAC algorithms for energy optimization with country-level energy consumption insights

Enes Bajrami, Andrea Kulakov, Eftim Zdravevski, Petre Lameski

IFAC Journal of Systems and Control, Volume 34, Article 100344 (published 2025-10-03). DOI: 10.1016/j.ifacsc.2025.100344
Abstract
Background:
This study addresses national-scale energy optimization using deep reinforcement learning. Unlike prior works that rely on simulated environments or synthetic datasets, this research integrates real-world energy indicators, including electricity generation, greenhouse gas emissions, renewable energy share, fossil fuel dependency, and oil consumption. These indicators, sourced from the World Energy Consumption dataset, capture both developed and developing energy systems, enabling the evaluation of intelligent control policies across diverse contexts.
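For concreteness, the sketch below shows one way such country-level indicators could be assembled into a normalized state vector for a reinforcement-learning agent. The column names, the CSV filename, and the min-max normalization are illustrative assumptions, not the paper's actual schema.

```python
# Illustrative sketch only: column names, the CSV filename, and the
# normalization scheme are assumptions, not the paper's actual setup.
import numpy as np
import pandas as pd

INDICATORS = [
    "electricity_generation",    # assumed column names for the five
    "greenhouse_gas_emissions",  # indicators named in the abstract
    "renewables_share_energy",
    "fossil_fuel_consumption",
    "oil_consumption",
]

def country_state(df: pd.DataFrame, country: str, year: int) -> np.ndarray:
    """Return one country-year as a normalized indicator vector."""
    row = df[(df["country"] == country) & (df["year"] == year)][INDICATORS]
    x = row.to_numpy(dtype=np.float32).ravel()
    # Min-max normalize against the whole dataset so that states from
    # large and small energy systems live on a comparable scale.
    lo = df[INDICATORS].min().to_numpy(dtype=np.float32)
    hi = df[INDICATORS].max().to_numpy(dtype=np.float32)
    return (x - lo) / (hi - lo + 1e-8)

df = pd.read_csv("world_energy_consumption.csv")  # assumed filename
state = country_state(df, "Norway", 2020)         # e.g., a Nordic country
```

Normalizing against dataset-wide extrema keeps states from developed and developing countries on a comparable scale, which matters for the cross-regional comparison described below.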
Methodology:
Two advanced algorithms, Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC), were implemented and trained in PyTorch across multi-phase evaluation runs (300–3000 episodes). Comparative performance was analyzed on three key metrics: execution speed, action consistency, and reward optimization. A secondary regional analysis contrasted the Balkan and Nordic countries to evaluate how well each algorithm adapts across highly developed and developing energy infrastructures.
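The paper's environment and training code are not reproduced here. As a hedged stand-in, the minimal harness below benchmarks PPO against SAC using stable-baselines3 (a PyTorch-based library the authors do not necessarily use) on a placeholder continuous-control task, timing training as the execution-speed metric and evaluating mean reward.

```python
# A hedged stand-in, not the authors' code: stable-baselines3 (PyTorch-
# based) on a placeholder continuous-control task instead of the paper's
# custom energy environment, which is not public.
import time
import gymnasium as gym
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")  # placeholder for the energy environment

for Algo in (PPO, SAC):
    model = Algo("MlpPolicy", env, verbose=0)
    start = time.perf_counter()
    model.learn(total_timesteps=50_000)    # stands in for 300-3000 episodes
    elapsed = time.perf_counter() - start  # execution-speed metric
    mean_r, std_r = evaluate_policy(model, env, n_eval_episodes=20)
    print(f"{Algo.__name__}: {elapsed:6.1f}s train, "
          f"eval reward {mean_r:.1f} +/- {std_r:.1f}")
```

The same harness extends naturally to per-region runs (e.g., separate Balkan and Nordic environments) by swapping the environment while holding the training budget fixed.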
Significant findings:
SAC demonstrated superior computational throughput and policy stability, making it well suited to real-time and resource-constrained environments. PPO produced larger action magnitudes, yielding more assertive control signals for high-impact interventions. Both agents significantly outperformed a rule-based baseline in responsiveness and adaptability. The proposed framework's novel contribution is the combination of deep reinforcement learning with interpretable, country-level energy indicators. Future work will extend the evaluation to additional continents, including Asia, Africa, and South America, to assess global scalability and applicability.
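As an illustration of the two behavioral signals discussed above, the sketch below computes mean absolute action ("magnitude") and the standard deviation of actions across rollouts (a rough "consistency" proxy). These metric definitions are assumptions, not necessarily the paper's own.

```python
# Assumed metric definitions, not necessarily the paper's: mean absolute
# action as "magnitude", across-rollout action std as a "consistency" proxy.
import numpy as np

def action_stats(model, env, n_episodes: int = 10):
    """Collect deterministic actions over rollouts and summarize them."""
    actions = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            act, _ = model.predict(obs, deterministic=True)
            actions.append(act)
            obs, _, terminated, truncated, _ = env.step(act)
            done = terminated or truncated
    a = np.asarray(actions, dtype=np.float32)
    return float(np.abs(a).mean()), float(a.std())

# `model` and `env` as in the training sketch above
magnitude, spread = action_stats(model, env)
```

Under these definitions, PPO's reported assertiveness would surface as a larger magnitude, while SAC's policy stability would surface as a smaller spread.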