ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL

2511.00985v2 cs.DB, cs.AI, cs.CL 2025-11-06
Authors:

Yiwen Jiao, Tonghui Ren, Yuche Gao, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang

Abstract

Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL, but a significant semantic gap persists between their general knowledge and domain-specific semantics of databases. Historical translation logs constitute a rich source of this missing in-domain knowledge, where SQL queries inherently encapsulate real-world usage patterns of database schema. Existing methods primarily enhance the reasoning process for individual translations but fail to accumulate in-domain knowledge from past translations. We introduce ORANGE, an online self-evolutionary framework that constructs database-specific knowledge bases by parsing SQL queries from translation logs. By accumulating in-domain knowledge that contains schema and data semantics, ORANGE progressively reduces the semantic gap and enhances the accuracy of subsequent SQL translations. To ensure reliability, we propose a novel nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, which reduces semantic errors during knowledge generation. Experiments on multiple benchmarks confirm the practicality of ORANGE, demonstrating its effectiveness for real-world Text-to-SQL deployment, particularly in handling complex and domain-specific queries.
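
The abstract does not spell out how the knowledge base is built, so the following is only a minimal sketch of the general idea of accumulating in-domain knowledge from logged SQL, not ORANGE's actual method. It assumes a hypothetical log of (question, SQL) pairs and uses simple regular expressions in place of a real SQL parser. The log contents, the table and column names, and the function `build_knowledge_base` are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical translation log of (question, SQL) pairs; invented for illustration.
LOG = [
    ("How many active users are there?",
     "SELECT COUNT(*) FROM users WHERE status = 'A'"),
    ("List orders from active users",
     "SELECT o.id FROM orders o JOIN users u ON o.user_id = u.id "
     "WHERE u.status = 'A'"),
]

# Table names that follow FROM or JOIN (a rough stand-in for real SQL parsing).
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", re.IGNORECASE)
# Equality predicates against string literals, e.g. status = 'A' or u.status = 'A'.
LITERAL_RE = re.compile(r"\b(?:[A-Za-z_]\w*\.)?([A-Za-z_]\w*)\s*=\s*'([^']*)'")


def build_knowledge_base(log):
    """Accumulate table usage counts and column-literal usage from logged SQL."""
    kb = {"tables": defaultdict(int), "values": defaultdict(set)}
    for question, sql in log:
        for table in TABLE_RE.findall(sql):
            kb["tables"][table.lower()] += 1
        for column, literal in LITERAL_RE.findall(sql):
            # Keep the question alongside the literal as a hint to its meaning.
            kb["values"][column.lower()].add((literal, question))
    return kb


kb = build_knowledge_base(LOG)
```

In this toy setup, the accumulated entries for `status` pair the literal `'A'` with questions that mention "active" users, which is the kind of data semantics a later translation could draw on. A real system would need a proper SQL parser and a way to verify the extracted knowledge.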

Related articles

From Documents to Database: Failure Modes for Industrial Assets

## Context Industrial asset infrastructure is widely used across various industries, but its effective management...

2025-09-24

Text to Query Plans for Question Answering on Large Tables

## Context In today's world, where data volumes are growing exponentially, the effective use of large tabular da...

2025-08-28