Few-shot Protein Fitness Prediction via In-context Learning and Test-time Training

2512.02315v1 q-bio.BM, cs.LG 2025-12-04

Authors:

Felix Teufel, Aaron W. Kollasch, Yining Huang, Ole Winther, Kevin K. Yang, Pascal Notin, Debora S. Marks

Abstract

Accurately predicting protein fitness with minimal experimental data is a persistent challenge in protein engineering. We introduce PRIMO (PRotein In-context Mutation Oracle), a transformer-based framework that leverages in-context learning and test-time training to adapt rapidly to new proteins and assays without large task-specific datasets. By encoding sequence information, auxiliary zero-shot predictions, and sparse experimental labels from many assays as a unified token set within a masked-language-modeling pre-training paradigm, PRIMO learns to prioritize promising variants through a preference-based loss function. Across diverse protein families and properties, and for both substitution and indel mutations, PRIMO outperforms zero-shot and fully supervised baselines. This work underscores the power of combining large-scale pre-training with efficient test-time adaptation to tackle challenging protein design tasks where data collection is expensive and labels are scarce.
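
The abstract does not state the exact form of PRIMO's preference-based loss. As an illustration only, the sketch below assumes a common choice for learning to rank variants from sparse labels: a pairwise Bradley-Terry (logistic) loss that rewards the model for scoring the fitter variant of each labeled pair higher, independent of the absolute score scale. The function names `softplus` and `pairwise_preference_loss` are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(scores: list[float], labels: list[float]) -> float:
    """Mean Bradley-Terry loss over all pairs of variants with distinct labels.

    Hypothetical sketch (assumed loss form, not PRIMO's published objective):
    for each pair where labels[hi] > labels[lo], add -log sigmoid(scores[hi] - scores[lo]).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # tied labels carry no preference signal
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
        total += softplus(-(scores[hi] - scores[lo]))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    labels = [0.1, 0.9, 0.5]  # toy assay measurements for three variants
    print(pairwise_preference_loss([0.0, 2.0, 1.0], labels))  # correct ranking: low loss
    print(pairwise_preference_loss([2.0, 0.0, 1.0], labels))  # reversed ranking: high loss
```

Because only score differences enter the loss, a ranking-style objective like this is insensitive to assay-specific units and offsets, which is one reason such losses are attractive when labels come from many heterogeneous assays.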

Related papers

Unlocking hidden biomolecular conformational landscapes in diffusion models at i...

EnzyCLIP: A Cross-Attention Dual Encoder Framework with Contrastive Learning for...

Compact Artificial Neural Network Models for Predicting Protein Residue -- RNA B...

EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbon...

Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
