SHA-256 Infused Embedding-Driven Generative Modeling of High-Energy Molecules in Low-Data Regimes
2510.25788v1
cs.LG, cond-mat.mtrl-sci
2025-11-01
Authors:
Siddharth Verma, Alankar Alankar
Abstract
High-energy materials (HEMs) are critical for propulsion and defense domains,
yet their discovery remains constrained by scarce experimental data and restricted
access to testing facilities. This work presents a novel approach to discovering
high-energy molecules that combines Long Short-Term Memory (LSTM) networks for
molecular generation with Attentive Graph Neural Networks (GNNs) for property
prediction. We propose a transformative embedding space construction strategy
that integrates fixed SHA-256 embeddings with partially trainable
representations. Unlike conventional regularization techniques, this changes
the representational basis itself, reshaping the molecular input space before
learning begins. Without recourse to pretraining, the generator achieves 67.5%
validity and 37.5% novelty. The generated library exhibits a mean Tanimoto
coefficient of 0.214 relative to the training set, indicating that the
framework can generate a diverse chemical space. We identified 37 new super
explosives with predicted detonation velocities above 9 km/s.
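
As an illustration of the embedding strategy described above, the following is a minimal sketch of how a fixed SHA-256 embedding might be paired with a trainable representation. It is not the authors' implementation: the byte-to-float scaling, the concatenation of fixed and trainable parts, and the names `sha256_embedding` and `HybridEmbedding` are all assumptions made for this example, and the abstract does not specify how the two parts are actually combined.

```python
import hashlib
import random


def sha256_embedding(token, dim=32):
    """Map a token to a fixed vector in [-1, 1] derived from its SHA-256 digest.

    Assumption: each digest byte (0..255) is rescaled linearly to [-1, 1].
    """
    if not 1 <= dim <= 32:
        raise ValueError("dim must be in 1..32 (a SHA-256 digest has 32 bytes)")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:dim]]


class HybridEmbedding:
    """Hypothetical lookup table: frozen hash part plus a trainable part.

    Assumption: the two parts are concatenated. Only `self.trainable`
    would be updated by an optimizer; `self.fixed` never changes.
    """

    def __init__(self, vocab, fixed_dim=32, trainable_dim=8, seed=0):
        rng = random.Random(seed)
        # Fixed part: fully determined by the token string, no learning.
        self.fixed = {t: sha256_embedding(t, fixed_dim) for t in vocab}
        # Trainable part: small random initial values.
        self.trainable = {
            t: [rng.uniform(-0.1, 0.1) for _ in range(trainable_dim)]
            for t in vocab
        }

    def __call__(self, token):
        # Concatenate the frozen and trainable components.
        return self.fixed[token] + self.trainable[token]


# Example with a few SMILES-like tokens (illustrative vocabulary only).
embed = HybridEmbedding(["C", "N", "O", "=", "(", ")"])
vector = embed("N")
```

Because the hash part depends only on the token string, it is identical across runs and is defined before any training takes place, which matches the abstract's description of reshaping the input space before learning begins.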