{"title":"供应链订单分类的深度学习和策略优化方法","authors":"Ramakrishna Garine , Ripon K. Chakrabortty","doi":"10.1016/j.sca.2025.100166","DOIUrl":null,"url":null,"abstract":"<div><div>Timely delivery is a critical performance metric in supply chain management, yet achieving consistent on-time delivery has become increasingly challenging in the face of global uncertainties and complex logistics networks. Recent disruptions, such as pandemics, extreme weather events, and geopolitical conflicts, have exposed vulnerabilities in supply chains, resulting in frequent delivery delays. While traditional heuristics and simple statistical methods have proven inadequate to capture the myriad factors that contribute to delays in modern supply chains, Machine learning (ML) and Deep Learning (DL) approaches have emerged as powerful tools to improve the accuracy and reliability of delivery delay prediction. Consequently, this study presents a hybrid predictive framework that integrates DL models with Reinforcement Learning (RL) to improve binary classification of order status (on-time vs. late). We first benchmark several DL architectures, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bi-LSTM, and Stacked LSTM, enhanced with regularization and extended training epochs, alongside a fine-tuned eXtreme Gradient Boost (XGBoost) model. These models are evaluated using accuracy, precision, recall, and the F1-score, with Bi-LSTM and Stacked LSTM achieving strong generalization performance. Building on this, we deploy a Proximal Policy Optimization (PPO) agent that incorporates deep learning outputs as part of its observation space. The RL agent uses a reward-based feedback loop to improve adaptability under dynamic conditions. Experimental results show that the hybrid DL-RL model achieves superior classification accuracy and an F1-score greater than 0.99, outperforming standalone methods. 
Although the PPO agent alone struggled with detecting minorities due to imbalance, integrating DL features mitigated this limitation. The findings support the use of hybrid architectures for real-time order status prediction and provide a scalable pathway for intelligent supply chain decision making. Future work will address class imbalance and enhance policy robustness through cost-sensitive and explainable RL strategies.</div></div>","PeriodicalId":101186,"journal":{"name":"Supply Chain Analytics","volume":"12 ","pages":"Article 100166"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A deep learning and policy optimization approach for supply chain order classification\",\"authors\":\"Ramakrishna Garine , Ripon K. Chakrabortty\",\"doi\":\"10.1016/j.sca.2025.100166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Timely delivery is a critical performance metric in supply chain management, yet achieving consistent on-time delivery has become increasingly challenging in the face of global uncertainties and complex logistics networks. Recent disruptions, such as pandemics, extreme weather events, and geopolitical conflicts, have exposed vulnerabilities in supply chains, resulting in frequent delivery delays. While traditional heuristics and simple statistical methods have proven inadequate to capture the myriad factors that contribute to delays in modern supply chains, Machine learning (ML) and Deep Learning (DL) approaches have emerged as powerful tools to improve the accuracy and reliability of delivery delay prediction. Consequently, this study presents a hybrid predictive framework that integrates DL models with Reinforcement Learning (RL) to improve binary classification of order status (on-time vs. late). 
We first benchmark several DL architectures, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bi-LSTM, and Stacked LSTM, enhanced with regularization and extended training epochs, alongside a fine-tuned eXtreme Gradient Boost (XGBoost) model. These models are evaluated using accuracy, precision, recall, and the F1-score, with Bi-LSTM and Stacked LSTM achieving strong generalization performance. Building on this, we deploy a Proximal Policy Optimization (PPO) agent that incorporates deep learning outputs as part of its observation space. The RL agent uses a reward-based feedback loop to improve adaptability under dynamic conditions. Experimental results show that the hybrid DL-RL model achieves superior classification accuracy and an F1-score greater than 0.99, outperforming standalone methods. Although the PPO agent alone struggled with detecting minorities due to imbalance, integrating DL features mitigated this limitation. The findings support the use of hybrid architectures for real-time order status prediction and provide a scalable pathway for intelligent supply chain decision making. 
Future work will address class imbalance and enhance policy robustness through cost-sensitive and explainable RL strategies.</div></div>\",\"PeriodicalId\":101186,\"journal\":{\"name\":\"Supply Chain Analytics\",\"volume\":\"12 \",\"pages\":\"Article 100166\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Supply Chain Analytics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949863525000664\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Supply Chain Analytics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949863525000664","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A deep learning and policy optimization approach for supply chain order classification
Timely delivery is a critical performance metric in supply chain management, yet achieving consistent on-time delivery has become increasingly challenging in the face of global uncertainties and complex logistics networks. Recent disruptions, such as pandemics, extreme weather events, and geopolitical conflicts, have exposed vulnerabilities in supply chains, resulting in frequent delivery delays. While traditional heuristics and simple statistical methods have proven inadequate for capturing the myriad factors that contribute to delays in modern supply chains, machine learning (ML) and deep learning (DL) approaches have emerged as powerful tools for improving the accuracy and reliability of delivery delay prediction. Consequently, this study presents a hybrid predictive framework that integrates DL models with Reinforcement Learning (RL) to improve binary classification of order status (on-time vs. late). We first benchmark several DL architectures, namely a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bi-LSTM, and Stacked LSTM, enhanced with regularization and extended training epochs, alongside a fine-tuned eXtreme Gradient Boosting (XGBoost) model. These models are evaluated using accuracy, precision, recall, and the F1-score, with Bi-LSTM and Stacked LSTM achieving strong generalization performance. Building on this, we deploy a Proximal Policy Optimization (PPO) agent that incorporates the deep learning outputs as part of its observation space. The RL agent uses a reward-based feedback loop to improve adaptability under dynamic conditions. Experimental results show that the hybrid DL-RL model achieves superior classification accuracy and an F1-score greater than 0.99, outperforming standalone methods. Although the PPO agent alone struggled to detect the minority class due to class imbalance, integrating DL features mitigated this limitation.
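The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score) can be illustrated with a minimal sketch for the binary on-time (0) vs. late (1) setting. The labels below are purely illustrative, not from the paper's dataset.

```python
# Minimal sketch: binary classification metrics for on-time (0) vs. late (1).
# Illustrative only; the paper's actual evaluation pipeline is not shown here.

def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example with fabricated labels: 1 = late order, 0 = on-time order.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)  # all four metrics equal 0.75 here
```

Precision and recall matter here precisely because of the class imbalance the abstract mentions: with few late orders, a classifier can score high accuracy while missing most of the minority class, which F1 penalizes.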
The findings support the use of hybrid architectures for real-time order status prediction and provide a scalable pathway for intelligent supply chain decision-making. Future work will address class imbalance and enhance policy robustness through cost-sensitive and explainable RL strategies.
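The hybrid mechanism the abstract describes, feeding DL outputs into the PPO agent's observation space, can be sketched as a simple feature augmentation. The function name and the example order features below are hypothetical; the paper's actual state representation is not specified here.

```python
# Hedged sketch of the hybrid DL-RL idea: the DL model's predicted
# late-delivery probability is appended to the raw order features, and the
# combined vector becomes the RL agent's observation. Feature names and
# values are illustrative assumptions, not the paper's implementation.
import numpy as np

def build_observation(order_features: np.ndarray, dl_late_prob: float) -> np.ndarray:
    """Concatenate raw order features with the DL model's output probability."""
    return np.concatenate([order_features, [dl_late_prob]])

raw = np.array([3.0, 120.5, 0.0])  # e.g. shipping mode, distance, priority flag
obs = build_observation(raw, dl_late_prob=0.87)
# The observation gains one extra dimension carrying the DL model's signal.
```

Augmenting the observation this way lets the policy condition on a learned delay estimate rather than raw features alone, which is one plausible reading of why the hybrid agent recovers minority-class detection that PPO alone missed.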