Layer Probing Improves Kinase Functional Prediction with Protein Language Models

2512.00376v1 q-bio.QM, cs.AI, cs.LG 2025-12-02
Авторы:

Ajit Kumar, IndraPrakash Jha

Abstract

Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in unsupervised Adjusted Rand Index and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that transformer depth contains functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.

Ссылки и действия