Symbolic Neural Generation with Applications to Lead Discovery in Drug Design
2510.23379v1
cs.LG, cs.AI, cs.NE, q-bio.BM, I.2.6; I.2.1; J.3
2025-10-29
Authors:
Ashwin Srinivasan, A Baskar, Tirtharaj Dash, Michael Bain, Sanjay Kumar Dey, Mainak Banerjee
Abstract
We investigate a relatively underexplored class of hybrid neurosymbolic
models integrating symbolic learning with neural reasoning to construct data
generators meeting formal correctness criteria. In \textit{Symbolic Neural
Generators} (SNGs), symbolic learners examine logical specifications of
feasible data from a small set of instances -- sometimes just one. Each
specification in turn constrains the conditional information supplied to a
neural-based generator, which rejects any instance violating the symbolic
specification. Like other neurosymbolic approaches, SNG exploits the
complementary strengths of symbolic and neural methods. The outcome of an SNG
is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible
instances constructed from data, $X$ a set of generated new instances that
satisfy the description, and $W$ an associated weight. We introduce a semantics
for such systems, based on the construction of appropriate \textit{base} and
\textit{fibre} partially-ordered sets combined into an overall partial order,
and outline a probabilistic extension relevant to practical applications. In
this extension, SNGs result from searching over a weighted partial ordering. We
implement an SNG combining a restricted form of Inductive Logic Programming
(ILP) with a large language model (LLM) and evaluate it on early-stage drug
design. Our main interest is the description and the set of potential inhibitor
molecules generated by the SNG. On benchmark problems -- where drug targets are
well understood -- SNG performance is statistically comparable to
state-of-the-art methods. On exploratory problems with poorly understood
targets, generated molecules exhibit binding affinities on par with leading
clinical candidates. Experts further find the symbolic specifications useful as
preliminary filters, with several generated molecules identified as viable for
synthesis and wet-lab testing.
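
The generate-and-reject loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the ILP learner and the LLM are replaced by hypothetical stand-ins (a hand-written predicate and a random proposer over integers instead of molecules), and the names `SNGResult` and `run_sng` are invented for illustration. The abstract does not define how $W$ is computed, so the sketch uses the fraction of accepted proposals purely as a placeholder.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SNGResult:
    """Toy analogue of the triple (H, X, W) from the abstract."""
    H: str          # symbolic description of feasible instances
    X: List[int]    # generated instances that satisfy H
    W: float        # associated weight (placeholder definition, see below)


def run_sng(
    description: str,
    satisfies: Callable[[int], bool],
    propose: Callable[[random.Random], int],
    n_proposals: int,
    seed: int = 0,
) -> SNGResult:
    """Draw proposals and keep only those that satisfy the specification.

    `satisfies` stands in for the learned symbolic specification and
    `propose` stands in for the neural generator; both are hypothetical.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_proposals):
        candidate = propose(rng)
        if satisfies(candidate):  # reject any instance violating the spec
            accepted.append(candidate)
    # ASSUMPTION: acceptance rate used as the weight, for illustration only.
    weight = len(accepted) / n_proposals if n_proposals else 0.0
    return SNGResult(H=description, X=accepted, W=weight)


# Toy usage: the "specification" accepts even integers below 100.
result = run_sng(
    description="x is even and 0 <= x < 100",
    satisfies=lambda x: x % 2 == 0 and 0 <= x < 100,
    propose=lambda rng: rng.randint(0, 199),
    n_proposals=50,
)
print(result.H, len(result.X), result.W)
```

In this sketch every element of `X` satisfies `H` by construction, which mirrors the correctness property the abstract attributes to SNG outputs; how the specification conditions the generator, rather than merely filtering its output, is not modelled here.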