
Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 71-77,88. doi: 10.19678/j.issn.1000-3428.0064560

• Artificial Intelligence and Pattern Recognition •

NL2SQL Model Based on Self-Pruning Heterogeneous Graph

HUANG Junyang1, WANG Zhenyu2, LIANG Jiaqing1, XIAO Yanghua1   

1. School of Software, Fudan University, Shanghai 200433, China;
  2. Science and Technology on Information Systems Engineering Laboratory, Nanjing 210007, China
• Received: 2022-04-26; Revised: 2022-05-26; Published: 2022-09-08

• About the authors: HUANG Junyang (b. 1996), male, master's student; his research interests include natural language processing and knowledge graphs. WANG Zhenyu, engineer, master's degree. LIANG Jiaqing, postdoctoral researcher. XIAO Yanghua, professor, Ph.D.
• Funding:
  National Natural Science Foundation of China (62102095); China Postdoctoral Science Foundation (2020M681173, 2021T140124); Shanghai Science and Technology Innovation Action Plan (19511120400); Open Fund of the Science and Technology on Information Systems Engineering Laboratory (05202002).

Abstract: Natural Language to Structured Query Language (NL2SQL) is a critical task in semantic parsing. The goal of NL2SQL lies in the joint learning of the natural language query and the database schema. Existing approaches construct a heterogeneous graph to jointly encode the entire database schema and the natural language query. However, the heterogeneous graph constructed in this way introduces a large amount of useless information and ignores the varying importance of different pieces of schema information. To improve the logical and execution accuracy of NL2SQL, a novel model based on a self-pruning heterogeneous graph and a relative position attention mechanism, called SPRELA, is proposed. The model adopts a sequence-to-sequence architecture and uses the pre-trained language model ELECTRA as its backbone. An initial heterogeneous graph is constructed by introducing expert knowledge about the database schema and the natural language query. This graph is then self-pruned conditioned on the natural language query, and multi-head relative position attention is used to encode the pruned database schema together with the query. The target SQL statement is generated by a tree-structured decoder with a predefined SQL grammar. On the Spider dataset, SPRELA achieves an execution accuracy of 71.1%, an improvement of 1.1 percentage points over the Relation-Aware Semi-autoregressive Semantic Parsing for NL2SQL (RaSaP) model at the same parameter scale. The results demonstrate that SPRELA better aligns the database schema with the natural language question and better understands the semantic information in natural language queries.
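To make the self-pruning step concrete, the following is a minimal illustrative sketch of question-conditioned schema pruning. The matching criterion here (keeping tables and columns whose name tokens appear in the question, and retaining all columns of a directly mentioned table) is an assumption for illustration only; the paper's actual pruning mechanism operates on the learned heterogeneous graph and is not reproduced here.

```python
def self_prune_schema_graph(question_tokens, schema, keep_table_columns=True):
    """Illustrative sketch of question-conditioned schema pruning.

    question_tokens: list of lowercase question tokens
    schema:          dict mapping table name -> list of column names
    Returns a pruned schema keeping only tables/columns whose name tokens
    overlap the question. NOTE: this lexical criterion is a simplifying
    assumption, not the paper's learned pruning mechanism.
    """
    q = set(question_tokens)
    kept = {}
    for table, columns in schema.items():
        # A table is a hit if any underscore-separated name token occurs in the question.
        table_hit = any(tok in q for tok in table.lower().split("_"))
        column_hits = [
            c for c in columns
            if any(tok in q for tok in c.lower().split("_"))
        ]
        if table_hit or column_hits:
            # For a directly mentioned table, optionally keep all its columns
            # so that joins and aggregations remain expressible.
            kept[table] = columns if (table_hit and keep_table_columns) else column_hits
    return kept


schema = {"singer": ["name", "age", "country"], "concert": ["concert_id", "year"]}
print(self_prune_schema_graph("what is the average age of singers".split(), schema))
# -> {'singer': ['age']}   ("concert" is pruned entirely)
```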

Key words: Natural Language to Structured Query Language(NL2SQL), heterogeneous graph, self-pruning mechanism, semantic parsing, pre-trained language model
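The relative position attention used by the encoder follows the general relation-aware self-attention pattern (keys and values are augmented with pairwise relation embeddings). Below is a single-head NumPy sketch of that pattern; the identity query/key/value projections and the shapes are simplifying assumptions, not the SPRELA implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relation_aware_attention(x, rel_k, rel_v):
    """Single-head relation-aware self-attention sketch.

    x:     (n, d)    states of question tokens and schema nodes
    rel_k: (n, n, d) pairwise relation embeddings added to keys
    rel_v: (n, n, d) pairwise relation embeddings added to values
    Identity projections stand in for the learned Q/K/V matrices.
    """
    n, d = x.shape
    q = k = v = x
    # scores[i, j] = q_i . (k_j + rel_k[i, j]) / sqrt(d)
    scores = np.einsum("id,ijd->ij", q, k[None, :, :] + rel_k) / np.sqrt(d)
    weights = softmax(scores)
    # out_i = sum_j weights[i, j] * (v_j + rel_v[i, j])
    return np.einsum("ij,ijd->id", weights, v[None, :, :] + rel_v)
```

With all relation embeddings set to zero this reduces to plain scaled dot-product self-attention, which is a convenient sanity check for the implementation.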


