Neural Combinatorial Optimization Model Based on Bidirectional Construction Strategy

doi:10.19678/j.issn.1000-3428.0252115

Abstract

Combinatorial optimization problems have important applications in areas such as logistics route planning, but their solution space grows exponentially with problem size, which poses severe challenges for traditional methods. In recent years, neural combinatorial optimization methods based on reinforcement learning have achieved solution quality close to that of traditional solvers while keeping solving time short. The mainstream method POMO (Policy Optimization with Multiple Optima) improves training stability by exploiting symmetry, but its unidirectional sequence generation mechanism still has two limitations: first, conventional constructive methods cannot fully exploit the symmetry of the problem; second, information about the end point cannot effectively contribute to decisions made at nodes far from it. To address these problems, this paper proposes a POMO model based on a Bidirectional Construction Strategy (BCS), named BCS-POMO. It constructs the solution sequence in parallel from both the start point and the end point and dynamically selects the extension direction with higher confidence, which prevents the model from being trapped in the dilemmas that unidirectional construction can create. The model exploits the symmetry of the construction sequence to share weight parameters and improves efficiency through batched parallel computation. Experiments show that BCS-POMO effectively strengthens the role of end-point information as a decision aid during construction, reducing the error by 16% on the traveling salesman problem (TSP) and by 18% on the capacitated vehicle routing problem (CVRP). These results verify the effectiveness of the bidirectional construction strategy in exploiting end-point information and the advantages of symmetry modeling.
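
The bidirectional construction idea described above can be illustrated with a minimal sketch for the TSP. It is not the authors' BCS-POMO implementation: the attention-based encoder and decoder are replaced by a hypothetical hand-written scoring function (`policy`, a softmax over negative distances with an assumed `temperature`), and the "confidence" of each direction is assumed to be the largest action probability at that end. The same `policy` is reused at the head and at the tail, mirroring the weight sharing that the symmetric construction allows, and `multi_start` loosely imitates POMO's use of multiple start nodes; pairing each start node with its nearest neighbour as the end node is likewise an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def policy(anchor: Point, candidates: Sequence[Point], temperature: float = 0.1) -> List[float]:
    """Hypothetical stand-in for a learned decoder: softmax over negative distances.

    The same function (the same "weights") scores moves at both ends of the
    partial path, mirroring the parameter sharing that symmetry permits.
    """
    logits = [-dist(anchor, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_construct(points: Sequence[Point], start: int, end: int) -> List[int]:
    """Grow a path from both ends, extending whichever end is more confident."""
    head = [start]  # built forward from the start node
    tail = [end]    # built backward from the end node
    remaining = [i for i in range(len(points)) if i not in (start, end)]
    while remaining:
        cands = [points[i] for i in remaining]
        p_head = policy(points[head[-1]], cands)
        p_tail = policy(points[tail[-1]], cands)
        # Assumed confidence measure: the largest action probability at each end.
        if max(p_head) >= max(p_tail):
            head.append(remaining.pop(p_head.index(max(p_head))))
        else:
            tail.append(remaining.pop(p_tail.index(max(p_tail))))
    # Reverse the tail so the result reads start -> ... -> end.
    return head + tail[::-1]


def tour_length(points: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour, including the edge back to the first node."""
    n = len(tour)
    return sum(dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def multi_start(points: Sequence[Point]) -> Tuple[float, List[int]]:
    """Multiple rollouts (one per start node); keep the shortest tour. Needs >= 2 points."""
    best_len, best_tour = math.inf, []
    for s in range(len(points)):
        # Hypothetical choice: use the start node's nearest neighbour as the end node.
        e = min((j for j in range(len(points)) if j != s),
                key=lambda j: dist(points[s], points[j]))
        tour = bidirectional_construct(points, s, e)
        length = tour_length(points, tour)
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(20)]
    length, tour = multi_start(pts)
    print(f"tour length {length:.3f}: {tour}")
```

On the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]` with start 0 and end 1, the head claims node 3 first and then node 2, giving the tour `[0, 3, 2, 1]` of length 4. In this toy setting, letting the more confident end move first means a node that is an obvious neighbour of the end point can be taken by the tail early instead of being left for the head to reach late in the rollout.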

WANG Chaoyang, SUN Weiwei. Neural Combinatorial Optimization Model Based on Bidirectional Construction Strategy[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252115.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252115