Carbon trading supply chain management based on constrained deep reinforcement learning
Authors: Qinghao Wang, Yaodong Yang
Journal: Autonomous Agents and Multi-Agent Systems, 38(2)
Publication date: 2024-08-06
Publication type: Journal Article
DOI: 10.1007/s10458-024-09669-2
URL: https://link.springer.com/article/10.1007/s10458-024-09669-2
Impact factor: 2.0 (JCR Q3, Automation & Control Systems)
Citation count: 0
Abstract
Carbon emissions are a critical global concern, and reducing energy consumption and emissions effectively is a challenge for the industrial sector, particularly in supply chain management. The complexity arises from the intricate coupling between carbon trading and ordering. The large state space and the various constraints involved make cost optimization difficult, and carbon quota constraints and sequential decision-making add to the challenge for businesses. Existing research relies on rule-based and heuristic numerical simulation, which struggles to adapt to time-varying environments. We develop a unified framework from the perspective of Constrained Markov Decision Processes (CMDP). Constrained Deep Reinforcement Learning (DRL), which combines the high-dimensional representational power of neural networks with effective decision-making under constraints, offers a potential solution for supply chain management that includes carbon trading. Constrained DRL is thus a crucial tool for studying cost optimization for enterprises. This paper constructs a DRL algorithm for Double Order based on PPO-Lagrangian (DOPPOL), which addresses a supply chain management model that integrates carbon trading decisions and ordering decisions. The results indicate that businesses can optimize both business and carbon costs, thereby increasing overall profits, and can adapt to various demand uncertainties. DOPPOL outperforms the traditional (s, S) method in fluctuating demand scenarios. By introducing carbon trading, enterprises can adjust supply chain orders and carbon emissions interactively and improve operational efficiency. Finally, we emphasize the significant role of carbon pricing in enterprise contracts for profitability, as reasonable prices can help control carbon emissions and reduce costs. Our research contributes to climate change control and to promoting sustainability.
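The abstract names two building blocks without giving details: the (s, S) baseline and a Lagrangian treatment of the constraint. The sketch below shows the textbook form of each. It is not the authors' DOPPOL implementation. All function names, parameters, and numeric values are hypothetical, and the "order when inventory is at or below s" convention is an assumption.

```python
def s_S_order(inventory: float, s: float, S: float) -> float:
    """Textbook (s, S) rule: when inventory is at or below the reorder
    point s, order enough to bring it back up to S; otherwise order nothing.
    (Assumed convention; some texts use a strict inequality.)"""
    if s > S:
        raise ValueError("reorder point s must not exceed order-up-to level S")
    return S - inventory if inventory <= s else 0.0


def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Lagrangian relaxation of a constrained objective: the constraint
    cost (for example, emissions) is subtracted from the reward, weighted
    by the multiplier lam."""
    return reward - lam * cost


def update_multiplier(lam: float, avg_cost: float, limit: float,
                      lr: float = 0.05) -> float:
    """Dual ascent step: raise lam when the average constraint cost exceeds
    the limit, lower it otherwise, and project back onto lam >= 0."""
    return max(0.0, lam + lr * (avg_cost - limit))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(s_S_order(inventory=3, s=5, S=20))    # inventory below s: order up to S
    print(s_S_order(inventory=8, s=5, S=20))    # inventory above s: no order
    lam = update_multiplier(lam=0.0, avg_cost=12.0, limit=10.0)
    print(lam, penalized_reward(reward=100.0, cost=12.0, lam=lam))
```

In PPO-Lagrangian-style methods, the policy is trained on the penalized signal while the multiplier is adjusted between policy updates, so the penalty grows only while the constraint is violated. A fixed (s, S) rule has no such adaptive term.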
About the journal:
This is the official journal of the International Foundation for Autonomous Agents and Multi-Agent Systems. It provides a leading forum for disseminating significant original research results in the foundations, theory, development, analysis, and applications of autonomous agents and multi-agent systems. Coverage in Autonomous Agents and Multi-Agent Systems includes, but is not limited to:
Agent decision-making architectures and their evaluation, including: cognitive models; knowledge representation; logics for agency; ontological reasoning; planning (single and multi-agent); reasoning (single and multi-agent)
Cooperation and teamwork, including: distributed problem solving; human-robot/agent interaction; multi-user/multi-virtual-agent interaction; coalition formation; coordination
Agent communication languages, including: their semantics, pragmatics, and implementation; agent communication protocols and conversations; agent commitments; speech act theory
Ontologies for agent systems, agents and the semantic web, agents and semantic web services, Grid-based systems, and service-oriented computing
Agent societies and societal issues, including: artificial social systems; environments, organizations and institutions; ethical and legal issues; privacy, safety and security; trust, reliability and reputation
Agent-based system development, including: agent development techniques, tools and environments; agent programming languages; agent specification or validation languages
Agent-based simulation, including: emergent behavior; participatory simulation; simulation techniques, tools and environments; social simulation
Agreement technologies, including: argumentation; collective decision making; judgment aggregation and belief merging; negotiation; norms
Economic paradigms, including: auction and mechanism design; bargaining and negotiation; economically-motivated agents; game theory (cooperative and non-cooperative); social choice and voting
Learning agents, including: computational architectures for learning agents; evolution, adaptation; multi-agent learning.
Robotic agents, including: integrated perception, cognition, and action; cognitive robotics; robot planning (including action and motion planning); multi-robot systems.
Virtual agents, including: agents in games and virtual environments; companion and coaching agents; modeling personality, emotions; multimodal interaction; verbal and non-verbal expressiveness
Significant, novel applications of agent technology
Comprehensive reviews and authoritative tutorials of research and practice in agent systems
Comprehensive and authoritative reviews of books dealing with agents and multi-agent systems.