ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL
2511.00985v2
cs.DB, cs.AI, cs.CL
2025-11-06
Авторы:
Yiwen Jiao, Tonghui Ren, Yuche Gao, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang
Abstract
Large Language Models (LLMs) have demonstrated remarkable progress in
translating natural language to SQL, but a significant semantic gap persists
between their general knowledge and domain-specific semantics of databases.
Historical translation logs constitute a rich source of this missing in-domain
knowledge, where SQL queries inherently encapsulate real-world usage patterns
of database schema. Existing methods primarily enhance the reasoning process
for individual translations but fail to accumulate in-domain knowledge from
past translations. We introduce ORANGE, an online self-evolutionary framework
that constructs database-specific knowledge bases by parsing SQL queries from
translation logs. By accumulating in-domain knowledge that contains schema and
data semantics, ORANGE progressively reduces the semantic gap and enhances the
accuracy of subsequent SQL translations. To ensure reliability, we propose a
novel nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic
tracking, which reduces semantic errors during knowledge generation.
Experiments on multiple benchmarks confirm the practicality of ORANGE,
demonstrating its effectiveness for real-world Text-to-SQL deployment,
particularly in handling complex and domain-specific queries.
Ссылки и действия
Дополнительные ресурсы: