NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
2509.25757v1
cs.AI, cs.CL, cs.CV, cs.SC
2025-10-02
Авторы:
Danial Kamali, Parisa Kordjamshidi
Abstract
Modern Vision-Language Models (VLMs) have achieved impressive performance in
various tasks, yet they often struggle with compositional reasoning, the
ability to decompose and recombine concepts to solve novel problems. While
neuro-symbolic approaches offer a promising direction, they are typically
constrained by crisp logical execution or predefined predicates, which limit
flexibility. In this work, we introduce NePTune, a neuro-symbolic framework
that overcomes these limitations through a hybrid execution model that
integrates the perception capabilities of foundation vision models with the
compositional expressiveness of symbolic reasoning. NePTune dynamically
translates natural language queries into executable Python programs that blend
imperative control flow with soft logic operators capable of reasoning over
VLM-generated uncertainty. Operating in a training-free manner, NePTune, with a
modular design, decouples perception from reasoning, yet its differentiable
operations support fine-tuning. We evaluate NePTune on multiple visual
reasoning benchmarks and various domains, utilizing adversarial tests, and
demonstrate a significant improvement over strong base models, as well as its
effective compositional generalization and adaptation capabilities in novel
environments.