OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models

2512.04738v1 cs.CL, cs.AI, cs.DB 2025-12-06
Authors:

Zhuoyue Wan, Wentao Hu, Chen Jason Zhang, Yuanfeng Song, Shuaimin Li, Ruiqiang Xiao, Xiao-Yong Wei, Raymond Chi-Wing Wong

Abstract

Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction, existing solutions often rely on large-scale closed-source models that suffer from high inference costs, limited transparency, and lack of adaptability for lightweight deployment. In this paper, we present OsmT, an open-source tag-aware language model specifically designed to bridge natural language and Overpass Query Language (OverpassQL), a structured query language for accessing large-scale OpenStreetMap (OSM) data. To enhance the accuracy and structural validity of generated queries, we introduce a Tag Retrieval Augmentation (TRA) mechanism that incorporates contextually relevant tag knowledge into the generation process. This mechanism is designed to capture the hierarchical and relational dependencies present in the OSM database, addressing the topological complexity inherent in geospatial query formulation. In addition, we define a reverse task, OverpassQL-to-Text, which translates structured queries into natural language explanations to support query interpretation and improve user accessibility. We evaluate OsmT on a public benchmark against strong baselines and observe consistent improvements in both query generation and interpretation. Despite using significantly fewer parameters, our model achieves competitive accuracy, demonstrating the effectiveness of open-source pre-trained language models in bridging natural language and structured query languages within schema-rich geospatial environments.
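The abstract does not specify how Tag Retrieval Augmentation is implemented, so the following is only a minimal sketch of the general idea: look up OSM tags that seem relevant to a question and prepend them to the model input. Everything here is an assumption made for illustration, not the authors' method: the tiny `TAG_DESCRIPTIONS` table, the word-overlap scoring, the function names `retrieve_tags` and `build_prompt`, and the prompt layout are all hypothetical.

```python
import re

# Hypothetical mini tag vocabulary. The key=value pairs are real OSM tags,
# but the descriptions are hand-written for this sketch.
TAG_DESCRIPTIONS = {
    "amenity=cafe": "cafe coffee shop serving drinks and snacks",
    "amenity=restaurant": "restaurant place to eat meals",
    "tourism=hotel": "hotel accommodation lodging for travellers",
    "highway=bus_stop": "bus stop public transport",
    "leisure=park": "park green public open space",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_tags(question, k=2):
    """Return up to k tags whose description shares words with the question.

    Word overlap stands in for whatever retriever a real system would use.
    """
    question_words = tokenize(question)
    scored = []
    for tag, description in TAG_DESCRIPTIONS.items():
        overlap = len(question_words & tokenize(description))
        if overlap > 0:
            scored.append((overlap, tag))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:k]]


def build_prompt(question, k=2):
    """Prepend the retrieved tags to the question (assumed prompt layout)."""
    tags = retrieve_tags(question, k)
    return (
        "Relevant tags: " + ", ".join(tags) + "\n"
        "Question: " + question + "\n"
        "OverpassQL:"
    )


if __name__ == "__main__":
    print(build_prompt("Find coffee shops in Berlin"))
```

Given the augmented prompt, a Text-to-OverpassQL model would be expected to produce a query along the lines of `area["name"="Berlin"]->.a; nwr["amenity"="cafe"](area.a); out;`. The retrieved tag hints at the correct key and value, so the model does not have to guess them.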
