Carbon trading supply chain management based on constrained deep reinforcement learning

IF 2 3区计算机科学 Q3 AUTOMATION & CONTROL SYSTEMS

Autonomous Agents and Multi-Agent Systems Pub Date : 2024-08-06 DOI:10.1007/s10458-024-09669-2

Qinghao Wang, Yaodong Yang

{"title":"Carbon trading supply chain management based on constrained deep reinforcement learning","authors":"Qinghao Wang, Yaodong Yang","doi":"10.1007/s10458-024-09669-2","DOIUrl":null,"url":null,"abstract":"<div><p>The issue of carbon emissions is a critical global concern, and how to effectively reduce energy consumption and emissions is a challenge faced by the industrial sector, which is highly emphasized in supply chain management. The complexity arises from the intricate coupling mechanism between carbon trading and ordering. T he large-scale state space involved and various constraints make cost optimization difficult. Carbon quota constraints and sequential decision-making exacerbate the challenges for businesses. Existing research implements rule-based and heuristic numerical simulation, which struggles to adapt to time-varying environments. We develop a unified framework from the perspective of Constrained Markov Decision Processes (CMDP). Constrained Deep Reinforcement Learning (DRL) with its powerful high-dimensional representations of neural networks and effective decision-making capabilities under constraints, provides a potential solution for supply chain management that includes carbon trading. DRL with constraints is a crucial tool to study cost optimization for enterprises. This paper constructs a DRL algorithm for Double Order based on PPO-Lagrangian (DOPPOL), aimed at addressing a supply chain management model that integrates carbon trading decisions and ordering decisions. The results indicate that businesses can optimize both business and carbon costs, thereby increasing overall profits, as well as adapt to various demand uncertainties. DOPPOL outperforms the traditional method (<i>s</i>, <i>S</i>) in fluctuating demand scenarios. By introducing carbon trading, enterprises are able to adjust supply chain orders and carbon emissions through interaction, and improve operational efficiency. Finally, we emphasize the significant role of carbon pricing in enterprise contracts in terms of profitability, as reasonable prices can help control carbon emissions and reduce costs. Our research is of great importance in achieving climate change control, as well as promoting sustainability.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 2","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Autonomous Agents and Multi-Agent Systems","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10458-024-09669-2","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

The issue of carbon emissions is a critical global concern, and how to effectively reduce energy consumption and emissions is a challenge faced by the industrial sector, which is highly emphasized in supply chain management. The complexity arises from the intricate coupling mechanism between carbon trading and ordering. T he large-scale state space involved and various constraints make cost optimization difficult. Carbon quota constraints and sequential decision-making exacerbate the challenges for businesses. Existing research implements rule-based and heuristic numerical simulation, which struggles to adapt to time-varying environments. We develop a unified framework from the perspective of Constrained Markov Decision Processes (CMDP). Constrained Deep Reinforcement Learning (DRL) with its powerful high-dimensional representations of neural networks and effective decision-making capabilities under constraints, provides a potential solution for supply chain management that includes carbon trading. DRL with constraints is a crucial tool to study cost optimization for enterprises. This paper constructs a DRL algorithm for Double Order based on PPO-Lagrangian (DOPPOL), aimed at addressing a supply chain management model that integrates carbon trading decisions and ordering decisions. The results indicate that businesses can optimize both business and carbon costs, thereby increasing overall profits, as well as adapt to various demand uncertainties. DOPPOL outperforms the traditional method (s, S) in fluctuating demand scenarios. By introducing carbon trading, enterprises are able to adjust supply chain orders and carbon emissions through interaction, and improve operational efficiency. Finally, we emphasize the significant role of carbon pricing in enterprise contracts in terms of profitability, as reasonable prices can help control carbon emissions and reduce costs. Our research is of great importance in achieving climate change control, as well as promoting sustainability.

Abstract Image

查看原文本刊更多论文

基于约束深度强化学习的碳交易供应链管理

碳排放问题是全球关注的重要问题，如何有效减少能源消耗和排放是工业部门面临的挑战，而供应链管理则是其中的重中之重。其复杂性源于碳交易与订货之间错综复杂的耦合机制。其中涉及的大规模状态空间和各种约束条件给成本优化带来了困难。碳配额约束和顺序决策加剧了企业面临的挑战。现有研究采用基于规则和启发式的数值模拟，难以适应时变环境。我们从受限马尔可夫决策过程（CMDP）的角度开发了一个统一的框架。约束深度强化学习（DRL）具有强大的神经网络高维表示和约束条件下的有效决策能力，为包括碳交易在内的供应链管理提供了潜在的解决方案。带有约束条件的 DRL 是研究企业成本优化的重要工具。本文构建了一种基于 PPO-拉格朗日（DOPPOL）的双订货 DRL 算法，旨在解决将碳交易决策与订货决策相结合的供应链管理模式。结果表明，企业可以同时优化业务成本和碳成本，从而提高整体利润，并适应各种需求不确定性。在需求波动情况下，DOPPOL优于传统方法（s，S）。通过引入碳交易，企业能够通过互动调整供应链订单和碳排放，提高运营效率。最后，我们强调了碳定价在企业合约中对盈利的重要作用，因为合理的价格有助于控制碳排放和降低成本。我们的研究对于实现气候变化控制以及促进可持续发展具有重要意义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Autonomous Agents and Multi-Agent Systems 工程技术-计算机：人工智能

CiteScore

6.00

自引率

5.30%

发文量

审稿时长

>12 weeks

期刊介绍： This is the official journal of the International Foundation for Autonomous Agents and Multi-Agent Systems. It provides a leading forum for disseminating significant original research results in the foundations, theory, development, analysis, and applications of autonomous agents and multi-agent systems. Coverage in Autonomous Agents and Multi-Agent Systems includes, but is not limited to: Agent decision-making architectures and their evaluation, including: cognitive models; knowledge representation; logics for agency; ontological reasoning; planning (single and multi-agent); reasoning (single and multi-agent) Cooperation and teamwork, including: distributed problem solving; human-robot/agent interaction; multi-user/multi-virtual-agent interaction; coalition formation; coordination Agent communication languages, including: their semantics, pragmatics, and implementation; agent communication protocols and conversations; agent commitments; speech act theory Ontologies for agent systems, agents and the semantic web, agents and semantic web services, Grid-based systems, and service-oriented computing Agent societies and societal issues, including: artificial social systems; environments, organizations and institutions; ethical and legal issues; privacy, safety and security; trust, reliability and reputation Agent-based system development, including: agent development techniques, tools and environments; agent programming languages; agent specification or validation languages Agent-based simulation, including: emergent behavior; participatory simulation; simulation techniques, tools and environments; social simulation Agreement technologies, including: argumentation; collective decision making; judgment aggregation and belief merging; negotiation; norms Economic paradigms, including: auction and mechanism design; bargaining and negotiation; economically-motivated agents; game theory (cooperative and non-cooperative); social choice and voting Learning agents, including: computational architectures for learning agents; evolution, adaptation; multi-agent learning. Robotic agents, including: integrated perception, cognition, and action; cognitive robotics; robot planning (including action and motion planning); multi-robot systems. Virtual agents, including: agents in games and virtual environments; companion and coaching agents; modeling personality, emotions; multimodal interaction; verbal and non-verbal expressiveness Significant, novel applications of agent technology Comprehensive reviews and authoritative tutorials of research and practice in agent systems Comprehensive and authoritative reviews of books dealing with agents and multi-agent systems.