Computer Engineering ›› 2025, Vol. 51 ›› Issue (7): 31-46. doi: 10.19678/j.issn.1000-3428.0068808

• Hot Topics and Reviews •

Review of Multi-table Join Order Selection in Databases Based on Machine Learning

WANG Hao*, GAO Jintao, WANG Jie

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, Ningxia, China
  • Received: 2023-11-10 Online: 2025-07-15 Published: 2025-07-14
  • Contact: WANG Hao
  • Funding:
    National Natural Science Foundation of China (62102201); Natural Science Foundation of Ningxia Hui Autonomous Region (2022AAC05010); Natural Science Foundation of Ningxia Hui Autonomous Region (2021BEB04054); Natural Science Foundation of Ningxia Hui Autonomous Region (2021AAC03034)

Abstract:

Multi-table join order selection is the process of choosing, during query optimization, the join sequence among the tables involved in a query that yields the best execution performance. In complex queries, different join orders can significantly affect query efficiency. In the era of big data, traditional join order selection algorithms, which are typically based on heuristic rules, are challenged by massive datasets, diverse application scenarios, and complex query workloads. Because they cannot adapt dynamically to environmental changes or improve through learning, these algorithms generalize poorly and often produce suboptimal join orders that can severely degrade query performance. With the rapid advancement of machine learning, Artificial Intelligence for Databases (AI4DB) has emerged as a leading approach to query optimization. Machine learning-based techniques address the limitations of traditional methods by enabling self-learning and adaptation to the application scenario. This study first reviews classical join order selection algorithms and analyzes their inherent limitations. It then summarizes the mainstream machine learning models for multi-table join order selection, describes their core technical designs, and compares them in terms of effectiveness and applicable scenarios, offering a useful reference for future research in this field.

Key words: database, query optimization, machine learning, join order, Artificial Intelligence for Databases (AI4DB)