An autonomous pharmaceutical supply chain network for resilience and optimization: A multi-agent deep reinforcement learning framework
DOI: https://doi.org/10.31181/jdaic10002042026m

Keywords: pharmaceutical supply chain, multi-agent reinforcement learning, demand forecasting, inventory optimization, TimeGAN, resilience, cold-chain logistics

Abstract
The pharmaceutical supply chain must sustain high service levels under demand shocks, cold-chain constraints, and data-sharing limitations. However, many studies still separate forecasting from replenishment, rely on static policies, or evaluate resilience without a clearly specified control model. This paper develops an Autonomous Pharmaceutical Supply Chain Network (APSCN) that couples demand forecasting, synthetic time-series generation, and multi-agent deep reinforcement learning for resilient inventory control. The case study represents an anonymized regional United States vaccine distribution network with one central manufacturing hub and ten regional distribution centers operating over a 52-week horizon. A long short-term memory (LSTM) forecaster and an extreme gradient boosting (XGBoost) benchmark are trained on historical demand and time-series generative adversarial network (TimeGAN)-augmented disruption scenarios informed by a susceptible-infected-recovered (SIR) epidemic signal. Inventory decisions are then coordinated by multi-agent deep deterministic policy gradient (MADDPG) agents under explicit shelf-life, service-level, and cost constraints. The paper contributes a formal optimization and reward framework for cold-chain pharmaceutical logistics, a richer case-study description linking geography, network topology, and disruption design, and a comparative evaluation against autoregressive integrated moving average (ARIMA), Prophet, XGBoost, static rules, and single-agent reinforcement learning. In the simulated disruption scenario, the proposed LSTM + MADDPG configuration reduces mean absolute percentage error from 11.7% to 6.0%, lowers total cost from $1.25M to $0.85M, maintains a 99.1% service level, eliminates stockout incidents, and shortens recovery time from 28 to 4 days. The findings indicate that autonomous, decentralized control can materially improve both efficiency and resilience in pharmaceutical distribution networks.
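Two of the quantitative building blocks described above, the SIR epidemic signal that drives disruption scenarios and the MAPE metric used to compare forecasters, can be sketched in a few lines. The parameter values below (transmission rate, recovery rate, penalty weights) are illustrative assumptions, not the paper's calibrated values, and the reward function is only a simplified stand-in for the full shelf-life/service-level/cost formulation.

```python
def sir_signal(beta=0.3, gamma=0.1, i0=1e-3, weeks=52):
    """Discrete-time SIR epidemic curve (normalized population) usable as a
    demand-disruption signal over a 52-week horizon."""
    s, i, r = 1.0 - i0, i0, 0.0
    infected = []
    for _ in range(weeks):
        new_inf = beta * s * i          # new infections this week
        new_rec = gamma * i             # new recoveries this week
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals assumed nonzero)."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def step_reward(holding, stockout, waste, h=1.0, p=10.0, w=5.0):
    """Illustrative per-period reward: negated weighted sum of holding,
    stockout, and expiry-waste costs (weights h, p, w are assumptions)."""
    return -(h * holding + p * stockout + w * waste)
```

For example, `mape([100.0, 200.0], [110.0, 180.0])` evaluates to 10.0, and `sir_signal()` produces the characteristic rise-and-fall infection curve that can modulate baseline vaccine demand in disruption scenarios.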
References
Al-Hourani, S., & Weraikat, D. (2025). A systematic review of artificial intelligence (AI) and machine learning (ML) in pharmaceutical supply chain (PSC) resilience: Current trends and future directions. Sustainability, 17(14), 6591.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. Holden-Day.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). New York: Association for Computing Machinery.
Choi, T. Y., Dooley, K. J., & Rungtusanatham, M. (2001). Supply networks and complex adaptive systems: Control versus emergence. Journal of Operations Management, 19(3), 351-366.
Dehaybe, H., Catanzaro, D., & Chevalier, P. (2024). Deep reinforcement learning for inventory optimization with non-stationary uncertain demand. European Journal of Operational Research, 314(2), 433-445.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Hu, J., Xia, L., Huang, T., & Wu, H. (2025). A multi-agent deep reinforcement learning approach for multi-echelon inventory optimization and its application to the beer game. Transportation Research Part E: Logistics and Transportation Review, 203, 104367.
Ivanov, D. (2018). Revealing interfaces of supply chain resilience and sustainability: A simulation study. International Journal of Production Research, 56(10), 3507-3523.
Jafarian, M., Mahdavi, I., Tajdin, A., & Tirkolaee, E. B. (2025). A multi-stage machine learning model to design a sustainable-resilient-digitalized pharmaceutical supply chain. Socio-Economic Planning Sciences, 98, 102165.
Kaur, A., & Prakash, G. (2025). Intelligent inventory management: AI-driven solution for the pharmaceutical supply chain. Societal Impacts, 5, 100109.
Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, 115(772), 700-721.
Kumar, V., Goodarzian, F., Ghasemi, P., Chan, F. T. S., & Gupta, N. (2025). Artificial intelligence applications in healthcare supply chain networks under disaster conditions. International Journal of Production Research, 63(2), 395-403.
Lei, C., Zhang, H., Wang, Z., & Miao, Q. (2025). Deep learning for demand forecasting: A framework incorporating variational mode decomposition and attention mechanism. Processes, 13(2), 594.
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In: Guyon, I., Von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., & Garnett, R. (eds.), Advances in Neural Information Processing Systems, vol. 30 (pp. 6379-6390). Long Beach, USA: Curran Associates.
Lu, X., Wang, H., Peng, Z., Liao, C., & Liu, C. (2025). Dynamic optimization of multi-echelon supply chain inventory policies under disruptive scenarios: A deep reinforcement learning approach. Symmetry, 17(12), 2078.
Papalexi, M., Vafadarnikjoo, A., Bamford, D., & Dehe, B. (2026). Developing pharmaceutical supply chain resilient capabilities: The role of Industry 4.0 technologies. Supply Chain Management: An International Journal, 31(7), 1-20.
Pettit, T. J., Croxton, K. L., & Fiksel, J. (2019). The evolution of resilience in supply chain management: A retrospective on ensuring supply chain resilience. Journal of Business Logistics, 40(1), 56-65.
Riachy, C., He, M., Joneidy, S., Qin, S., Payne, T., Boulton, G., Occhipinti, A., & Angione, C. (2025). Enhancing deep learning for demand forecasting to address large data gaps. Expert Systems with Applications, 268, 126200.
Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3), 345-383.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
Vlachos, I., & Reddy, P. G. (2025). Machine learning in supply chain management: Systematic literature review and future research agenda. International Journal of Production Research, 63(16), 5987-6016.
Yang, Y., Wang, M., Wang, J., Li, P., & Zhou, M. (2025). Multi-agent deep reinforcement learning for integrated demand forecasting and inventory optimization in sensor-enabled retail supply chains. Sensors, 25(8), 2428.
Yoon, J., Jarrett, D., & van der Schaar, M. (2019). Time-series generative adversarial networks. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., & Garnett, R. (eds.), Advances in Neural Information Processing Systems, vol. 32 (pp. 5509-5519). Vancouver, Canada: Curran Associates.
Zhang, B., Tan, W. J., Cai, W., & Zhang, A. N. (2024). Leveraging multi-agent reinforcement learning for digital transformation in supply chain inventory optimization. Sustainability, 16(22), 9996.
Copyright (c) 2026 Divanshu Mittal

This work is licensed under a Creative Commons Attribution 4.0 International License.