Computer Engineering

   

DARE-SQL: A Multi-Candidate SQL Generation and Selection Framework Based on Question Ambiguity

  

  • Published: 2026-04-21

DARE-SQL:基于问题歧义的多候选SQL生成与选择框架

Abstract: The Text-to-SQL task converts natural language queries (NLQ) into Structured Query Language (SQL). Although Large Language Models (LLMs) have redefined the paradigm for this task, most existing studies use prompt engineering to improve the model's schema awareness and SQL generation, and often neglect the semantic ambiguity that is prevalent in natural language. As a result, models tend to misinterpret complex questions. To address this, we propose DARE-SQL, a Text-to-SQL framework with Disambiguation, Analysis, Refinement, and Election. The framework first uses the semantic reasoning capabilities of LLMs to build a semantic expansion module. For each potentially ambiguous question, this module generates an expanded set of questions that covers the user's possible intents, making the ambiguous semantics explicit. Next, differentiated generation strategies are applied to questions from different sources, and a refinement mechanism based on execution feedback corrects the generated results, yielding a high-quality set of candidate SQL queries. Finally, a two-stage selection strategy based on question consensus selects the candidate that best balances accuracy and execution performance. Experimental results show that DARE-SQL achieves an Execution Accuracy (EX) of 71.71% and a Valid Efficiency Score (VES) of 70.41 on the challenging BIRD benchmark, and 88.10% EX on the Spider dataset. These results validate the effectiveness of explicitly modeling semantic ambiguity for complex Text-to-SQL tasks.

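The abstract does not specify how the two-stage selection is implemented, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each candidate SQL query is tagged with the (original or expanded) question that produced it, that "consensus" means the execution result supported by the most distinct questions, and that "execution performance" is approximated by a single wall-clock timing on SQLite. The names `run_candidate` and `select_sql` are hypothetical.

```python
import sqlite3
import time
from collections import defaultdict


def run_candidate(conn, sql):
    """Execute one candidate; return (result_key, seconds), or None on error."""
    start = time.perf_counter()
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # Assumption: failing candidates are simply dropped here. The paper
        # instead describes refining them with execution feedback.
        return None
    elapsed = time.perf_counter() - start
    # Order-insensitive key that still preserves duplicate rows.
    return tuple(sorted(rows, key=repr)), elapsed


def select_sql(conn, candidates):
    """Pick one SQL string from (question_id, sql) pairs, or None if all fail."""
    groups = defaultdict(list)
    for question_id, sql in candidates:
        outcome = run_candidate(conn, sql)
        if outcome is not None:
            key, elapsed = outcome
            groups[key].append((elapsed, sql, question_id))
    if not groups:
        return None
    # Stage 1 (assumed): keep the result supported by the most distinct questions.
    best_group = max(groups.values(), key=lambda g: len({q for _, _, q in g}))
    # Stage 2 (assumed): within that group, prefer the fastest query.
    # A single timing is noisy; a real system would average repeated runs.
    return min(best_group, key=lambda item: item[0])[1]
```

Grouping by execution result rather than by SQL text lets syntactically different but equivalent queries vote together, which is the property a consensus-based selector needs.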