
Computer Engineering (计算机工程) ›› 2025, Vol. 51 ›› Issue (6): 136-145. doi: 10.19678/j.issn.1000-3428.0069109

• Artificial Intelligence and Pattern Recognition •

Online 3D Bin Packing Model Based on Hierarchical Reinforcement Learning

QI Mingkai, WANG Di, ZHANG Liye*

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China
  • Received: 2023-12-27 Published: 2025-06-15 Released: 2024-05-28
  • Contact: ZHANG Liye
  • Funding:
    Natural Science Foundation of Shandong Province (ZR2023MF015)

Abstract:

How artificial intelligence can represent perception and action planning hierarchically, across multiple levels of abstraction and multiple timescales, has become an active research topic. Owing to technical limitations, most existing work still decomposes tasks manually. In the 3D Bin Packing Problem (3D-BPP), for example, heuristic rules guide the neural network in resolving packing points and help the agent decompose the state space, turning a large, complex space into a set of subspaces and giving the network better candidate solutions. This approach is bounded by the rules themselves: if the rules cannot decompose the problem perfectly, the fixed-rule assistance limits the network's performance, because better solutions are overlooked by the rules. To address this, this study proposes an improved Packing Configuration Tree (PCT) model based on a heuristic rule fusion strategy. Following the idea of hierarchical reinforcement learning, the problem is split into layers, and a graph attention classification model is introduced to decide which space-point expansion scheme is best in the current situation. This offers more ways to combine and arrange the decomposition of space points inside the bin and the search for feasible positions. Experimental results show that the improved model outperforms the original model on multiple datasets. On datasets containing additional density information, the average packing utilization reaches 77.2%, 1.7 percentage points higher than the original model, and the model gives better-performing solutions within a reasonable time.

Key words: hierarchical reinforcement learning, 3D bin packing, Graph Attention Network (GAT), heuristic space expansion, deep reinforcement learning
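
The two-level decision described in the abstract can be illustrated with a toy sketch. It is not the paper's model: the graph attention classifier and the PCT policy are replaced by hand-written stand-ins, and everything named here (`corner_points`, `top_points`, `select_scheme`, `select_point`, `pack_online`, `utilization`) is hypothetical. The high level picks one space-point expansion scheme per incoming item, the low level picks a position among that scheme's feasible points, and utilization is the packed volume divided by the bin volume. Support and stability constraints are ignored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int, int]
Size = Tuple[int, int, int]


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    z: int
    l: int
    w: int
    h: int


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share interior volume."""
    return (a.x < b.x + b.l and b.x < a.x + a.l and
            a.y < b.y + b.w and b.y < a.y + a.w and
            a.z < b.z + b.h and b.z < a.z + a.h)


def feasible(p: Point, size: Size, bin_size: Size, placed: List[Box]) -> bool:
    """Inside the bin and collision-free (support/stability is NOT checked)."""
    inside = all(c + s <= b for c, s, b in zip(p, size, bin_size))
    cand = Box(*p, *size)
    return inside and not any(overlaps(cand, q) for q in placed)


def corner_points(placed: List[Box]) -> List[Point]:
    """Toy scheme 1: the origin plus three corner points per placed box."""
    pts = {(0, 0, 0)}
    for b in placed:
        pts |= {(b.x + b.l, b.y, b.z), (b.x, b.y + b.w, b.z), (b.x, b.y, b.z + b.h)}
    return sorted(pts)


def top_points(placed: List[Box]) -> List[Point]:
    """Toy scheme 2: the origin plus the point directly above each placed box."""
    return sorted({(0, 0, 0)} | {(b.x, b.y, b.z + b.h) for b in placed})


SCHEMES: Dict[str, Callable[[List[Box]], List[Point]]] = {
    "corner": corner_points,
    "top": top_points,
}


def select_scheme(cands: Dict[str, List[Point]]) -> str:
    """High-level stand-in (assumption): smallest non-empty candidate set.
    The paper uses a learned graph attention classifier here instead."""
    usable = {name: pts for name, pts in cands.items() if pts}
    return min(usable, key=lambda name: len(usable[name]))


def select_point(points: List[Point]) -> Point:
    """Low-level stand-in (assumption): lowest z, then smallest y, then x.
    The paper uses the learned PCT placement policy here instead."""
    return min(points, key=lambda p: (p[2], p[1], p[0]))


def pack_online(items: List[Size], bin_size: Size) -> Tuple[List[Box], List[str]]:
    """Place items one by one in arrival order; stop at the first misfit."""
    placed: List[Box] = []
    log: List[str] = []
    for size in items:
        cands = {name: [p for p in gen(placed) if feasible(p, size, bin_size, placed)]
                 for name, gen in SCHEMES.items()}
        if not any(cands.values()):
            break  # no feasible position under any scheme: close the bin
        name = select_scheme(cands)
        placed.append(Box(*select_point(cands[name]), *size))
        log.append(name)
    return placed, log


def utilization(placed: List[Box], bin_size: Size) -> float:
    """Packed volume divided by bin volume."""
    volume = sum(b.l * b.w * b.h for b in placed)
    return volume / (bin_size[0] * bin_size[1] * bin_size[2])
```

Calling `pack_online([(2, 2, 2)] * 8, (4, 4, 4))` places all eight cubes, and the returned log shows the high level switching between the two toy schemes from item to item. In the paper's setting, that per-item choice is what the classifier learns.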