
Computer Engineering ›› 2026, Vol. 52 ›› Issue (4): 140-162. doi: 10.19678/j.issn.1000-3428.0070166

• Computational Intelligence and Pattern Recognition •

Deep Reinforcement Learning Algorithms for Heterogeneous Multiple Knapsack Problems

LI Bin1,2,*, GUO Yi2,3

  1. School of Mechanical and Automotive Engineering, Fujian University of Technology, Fuzhou 350118, Fujian, China
    2. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350118, Fujian, China
    3. School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, Fujian, China
  • Received:2024-07-23 Revised:2024-09-13 Online:2026-04-15 Published:2024-11-13
  • Contact: LI Bin

  • About the authors:

    LI Bin (CCF senior member), male, professor and postdoctoral researcher; his main research interests include machine learning, swarm intelligence, and smart ports and shipping

    GUO Yi, master's student

  • Funding:
    Humanities and Social Sciences Research Planning Fund of the Ministry of Education (19YJA630031)

Abstract:

Motivated by the traditional multiple Knapsack Problem (KP) arising in typical logistics system operations, this study abstracts a Heterogeneous Multiple Knapsack Problem (HMKP) and develops an improved Deep Deterministic Policy Gradient (DDPG) algorithm to solve it. The standard DDPG algorithm tends to fall into local optima when solving the 0-1 KP. To address this issue, a Dynamic Randomization Mechanism (DRM) and a Dynamic Penalty Mechanism (DPM) are adopted, an improved Transformer module is embedded, and a tabu list is added to prevent repeated searches, yielding the dynamic DDPG algorithm based on the improved Transformer module (TDP-DDPG). The TDP-DDPG algorithm demonstrates efficient search capability across multiple benchmark instances: it finds the optimal value for all 39 classical instances in test sets 1 and 2 (spanning low to high dimensionality) and in the higher-dimensional test set 3, and for three of the six instances in the large-scale test set 4. Experiments show that the TDP-DDPG algorithm achieves stronger optimization capability after incorporating the improved strategies. On this basis, the BPD-DDPG algorithm is designed on top of TDP-DDPG to solve the more complex HMKP, and is evaluated on high-dimensional instances constructed by combining several classical 0-1 KP instances. The results show that, compared with the commercial solver Gurobi, the BPD-DDPG algorithm requires longer solution times but achieves higher solution accuracy on three low-scale instances. The BPD-DDPG algorithm can efficiently solve high-dimensional, large-scale HMKPs at low computational cost within an acceptable time.
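Two of the abstract's ingredients, a penalty for capacity violations and a tabu list that blocks repeated searches, can be illustrated in isolation. The sketch below is not the paper's TDP-DDPG implementation; the function names, the linear penalty form, and the set-based tabu list are illustrative assumptions, showing only how a penalized 0-1 KP reward and a repeat-search filter might look.

```python
import numpy as np

def knapsack_reward(selection, values, weights, capacity, penalty_coef):
    """Reward for a candidate 0-1 KP solution: total value of the
    selected items, minus a penalty proportional to any capacity
    overload. Raising penalty_coef over training would make the
    penalty dynamic, in the spirit of a dynamic penalty mechanism."""
    total_value = float(np.dot(selection, values))
    overload = max(0.0, float(np.dot(selection, weights)) - capacity)
    return total_value - penalty_coef * overload

def accept_candidate(selection, tabu_list):
    """Reject solutions already visited, mirroring a tabu list that
    prevents repeated searches; new solutions are recorded."""
    key = tuple(selection)
    if key in tabu_list:
        return False
    tabu_list.add(key)
    return True

# Tiny 3-item instance: values, weights, capacity 8.
values = np.array([10.0, 7.0, 4.0])
weights = np.array([5.0, 4.0, 3.0])
tabu = set()
sel = np.array([1, 1, 0])          # pick items 0 and 1 (weight 9 > 8)
if accept_candidate(sel, tabu):
    r = knapsack_reward(sel, values, weights, capacity=8.0, penalty_coef=2.0)
    # value 17, overload 1, penalty 2 -> reward 15.0
```

In a DDPG-style loop the reward would feed the critic, and the tabu check would filter the decoded binary actions before evaluation.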

Key words: Deep Reinforcement Learning (DRL), 0-1 Knapsack Problem (KP), Heterogeneous Multiple Knapsack Problem (HMKP), Transformer module, dynamic penalty mechanism, tabu list
