A comparative analysis of PPO and SAC algorithms for energy optimization with country-level energy consumption insights

Enes Bajrami, Andrea Kulakov, Eftim Zdravevski, Petre Lameski

IFAC Journal of Systems and Control, Volume 34, Article 100344 (published 2025-10-03). DOI: 10.1016/j.ifacsc.2025.100344
Abstract
Background:
This study addresses national-scale energy optimization using deep reinforcement learning. Unlike prior works that rely on simulated environments or synthetic datasets, this research integrates real-world energy indicators, including electricity generation, greenhouse gas emissions, renewable energy share, fossil fuel dependency, and oil consumption. These indicators, sourced from the World Energy Consumption dataset, capture both developed and developing energy systems, enabling the evaluation of intelligent control policies across diverse contexts.
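For concreteness, the sketch below shows one way such country-level indicators could be assembled into a normalized state vector for a reinforcement-learning agent. The column names, the CSV filename, and the min-max normalization are illustrative assumptions, not the paper's actual schema.

```python
# Illustrative sketch only: column names, the CSV filename, and the
# normalization scheme are assumptions, not the paper's actual setup.
import numpy as np
import pandas as pd

INDICATORS = [
    "electricity_generation",    # assumed column names for the five
    "greenhouse_gas_emissions",  # indicators named in the abstract
    "renewables_share_energy",
    "fossil_fuel_consumption",
    "oil_consumption",
]

def country_state(df: pd.DataFrame, country: str, year: int) -> np.ndarray:
    """Return one country-year as a normalized indicator vector."""
    row = df[(df["country"] == country) & (df["year"] == year)][INDICATORS]
    x = row.to_numpy(dtype=np.float32).ravel()
    # Min-max normalize against the whole dataset so that states from
    # large and small energy systems live on a comparable scale.
    lo = df[INDICATORS].min().to_numpy(dtype=np.float32)
    hi = df[INDICATORS].max().to_numpy(dtype=np.float32)
    return (x - lo) / (hi - lo + 1e-8)

df = pd.read_csv("world_energy_consumption.csv")  # assumed filename
state = country_state(df, "Norway", 2020)         # e.g., a Nordic country
```

Normalizing against dataset-wide extrema keeps states from developed and developing countries on a comparable scale, which matters for the cross-regional comparison described below.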
Methodology:
Two advanced algorithms, Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC), were implemented and trained in PyTorch across multi-phase evaluation runs (300–3000 episodes). Comparative performance was analyzed on three key metrics: execution speed, action consistency, and reward optimization. A secondary regional analysis contrasted the Balkan and Nordic countries to evaluate how well each algorithm adapts across highly developed and developing energy infrastructures.
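The paper's environment and training code are not reproduced here. As a hedged stand-in, the minimal harness below benchmarks PPO against SAC using stable-baselines3 (a PyTorch-based library the authors do not necessarily use) on a placeholder continuous-control task, timing training as the execution-speed metric and evaluating mean reward.

```python
# A hedged stand-in, not the authors' code: stable-baselines3 (PyTorch-
# based) on a placeholder continuous-control task instead of the paper's
# custom energy environment, which is not public.
import time
import gymnasium as gym
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")  # placeholder for the energy environment

for Algo in (PPO, SAC):
    model = Algo("MlpPolicy", env, verbose=0)
    start = time.perf_counter()
    model.learn(total_timesteps=50_000)    # stands in for 300-3000 episodes
    elapsed = time.perf_counter() - start  # execution-speed metric
    mean_r, std_r = evaluate_policy(model, env, n_eval_episodes=20)
    print(f"{Algo.__name__}: {elapsed:6.1f}s train, "
          f"eval reward {mean_r:.1f} +/- {std_r:.1f}")
```

The same harness extends naturally to per-region runs (e.g., separate Balkan and Nordic environments) by swapping the environment while holding the training budget fixed.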
Significant findings:
SAC demonstrated superior computational throughput and policy stability, making it well suited to real-time and resource-constrained environments. PPO produced larger action magnitudes, yielding more assertive control signals for high-impact interventions. Both agents significantly outperformed a rule-based baseline in responsiveness and adaptability. The proposed framework's novel contribution is the combination of deep reinforcement learning with interpretable, country-level energy indicators. Future work will extend the evaluation to additional continents, including Asia, Africa, and South America, to assess global scalability and applicability.
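As an illustration of the two behavioral signals discussed above, the sketch below computes mean absolute action ("magnitude") and the standard deviation of actions across rollouts (a rough "consistency" proxy). These metric definitions are assumptions, not necessarily the paper's own.

```python
# Assumed metric definitions, not necessarily the paper's: mean absolute
# action as "magnitude", across-rollout action std as a "consistency" proxy.
import numpy as np

def action_stats(model, env, n_episodes: int = 10):
    """Collect deterministic actions over rollouts and summarize them."""
    actions = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            act, _ = model.predict(obs, deterministic=True)
            actions.append(act)
            obs, _, terminated, truncated, _ = env.step(act)
            done = terminated or truncated
    a = np.asarray(actions, dtype=np.float32)
    return float(np.abs(a).mean()), float(a.std())

# `model` and `env` as in the training sketch above
magnitude, spread = action_stats(model, env)
```

Under these definitions, PPO's reported assertiveness would surface as a larger magnitude, while SAC's policy stability would surface as a smaller spread.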