ACS Catal. 2026 Mar 30;16(8):7669-7682. doi: 10.1021/acscatal.6c00759. eCollection 2026 Apr 17.

ABSTRACT

Large language models (LLMs) have shown limitations in designing active proteins that rely on intricate intramolecular interactions, particularly in the engineering of biocatalysts. Real-world studies based on targeted laboratory assays have become the de facto standard for artificial intelligence (AI) research on complex biological tasks. In this study, we present a standardized strategy that uses function-targeted models to decode the subtle effects of sequence variation on function. Unlike affinity-oriented protein-protein interaction studies using LLMs, our model targets specific functional interpretation, thereby guiding enzyme evolution. We established the VERnet model using deep mutational scanning data that underwent self-distillation, achieving an optimal accuracy of 93.5% in interpreting CYP2C9 variants. Through directed evolution at conserved positions enhanced by generative AI, we identified multiple CYP2C9 variants exhibiting a broad range of functional alterations. Additionally, a fine-tuned model optimized with AlphaFold3 significantly improved the prediction of variants carrying two amino acid substitutions. Molecular dynamics simulations revealed the structural and dynamic features underlying the catalytic alterations in the evolved variants. In vitro validation of metabolic activity strongly corroborated the in silico predictions, highlighting the substantial potential of AI models for predicting functional evolution.

PMID:42022779 | PMC:PMC13097142 | DOI:10.1021/acscatal.6c00759