Multi Language Models for On-the-Fly Syntax Highlighting
2510.04166v1
cs.SE, cs.AI
2025-10-08
Авторы:
Marco Edoardo Palma, Pooja Rani, Harald C. Gall
Abstract
Syntax highlighting is a critical feature in modern software development
environments, enhancing code readability and developer productivity. However,
delivering accurate highlighting in real time remains challenging for online
and web-based development tools due to strict time and memory constraints on
backend services. These systems must serve highlights rapidly and frequently,
even when code is partially valid or invalid. This has led to on-the-fly syntax
highlighting, where visual annotations are generated just before content is
served, often at high request rates and under incomplete input conditions. To
meet these demands efficiently, state-of-the-art models use deep learning to
learn the behavior of brute-force syntax highlighting resolvers, tools that are
easy to implement but too slow for production. Through the Deep Abstraction
process, brute-force strategies are encoded into fast statistical models that
achieve both high accuracy and low-latency inference. Despite their success,
such models face key challenges: they support only one programming language per
model, require large datasets from slow brute-force generators, and involve
resource-intensive training. In multi-language environments, this means
maintaining multiple independent models, increasing system complexity and
operational cost. This work addresses these issues by introducing a unified
model capable of highlighting up to six mainstream programming languages,
reducing deployment complexity by a factor of six and improving performance on
unseen languages. A novel normalization technique significantly enhances model
generalization, while few-shot learning experiments show that a small number of
oracle samples can replace large datasets, minimizing dependence on brute-force
generators. Combined, these innovations enable efficient, scalable, and
cost-effective syntax highlighting across diverse programming languages.
Ссылки и действия
Дополнительные ресурсы: