By Insights Team in AI — 22 Jul 2025

Oxford Team Trains Antibody-Antigen Model on Million-Scale Data: How Far Can Large Models Go?

Oxford researchers built an antibody-antigen model and a synthetic dataset of nearly a million data points to explore how much data, and what kind, AI models need to predict changes in binding affinity, and where the limits of such models lie.

Antibody drugs are powerful tools against cancer and viruses, but their effectiveness hinges on how strongly antibodies bind their antigens. A key quantity is ΔΔG, the change in binding free energy caused by a mutation. Predicting it has long challenged researchers: experiments are costly, and AI models have been hampered by limited data.
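For reference, ΔΔG is commonly defined as the binding free energy of the mutant complex minus that of the wild type, and a binding free energy can be derived from a dissociation constant as ΔG = RT ln K_d. The short Python sketch below assumes that sign convention (positive ΔΔG means weaker binding) and a temperature of 298.15 K; datasets differ on both, and the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in molar units."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ddg_from_kd(kd_wt: float, kd_mut: float, temp_k: float = 298.15) -> float:
    """DDG = DG_mut - DG_wt; under this convention, positive means the mutation weakens binding."""
    return delta_g_from_kd(kd_mut, temp_k) - delta_g_from_kd(kd_wt, temp_k)


# A mutation that raises Kd from 1 nM to 10 nM (tenfold weaker binding)
print(round(ddg_from_kd(1e-9, 1e-8), 2))  # 1.36
```

A tenfold loss in affinity at room temperature therefore corresponds to roughly 1.4 kcal/mol.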

Recently, a team at the University of Oxford developed Graphinity, an equivariant graph neural network built directly from antibody-antigen structures. Although it reached a Pearson correlation coefficient (r) of 0.87 for ΔΔG prediction, it was overfitting the limited training data.

To address this, they generated synthetic datasets of nearly 1 million ΔΔG values with FoldX and over 20,000 with Rosetta Flex ddG, and used them to study how much data, and what types, accurate prediction requires.

This research, titled "Investigating the volume and diversity of data needed for generalizable antibody–antigen ΔΔG prediction", was published on July 8, 2025, in Nature Computational Science.

Link to paper: https://www.nature.com/articles/s43588-025-00823-8

Why is antibody development so challenging?

Antibodies exert their therapeutic effects by binding specifically to their targets, so controlling affinity during development is crucial. Experimental affinity measurements are slow, which has pushed researchers toward machine learning.

Tools such as FoldX and Rosetta Flex ddG rely on physics-based and empirical energy functions; they take minutes to hours per mutation, and their accuracy varies. Early machine learning models trained on mutation datasets such as AB-Bind and SKEMPI generalized poorly: their correlation (r) dropped to 0.17–0.26 once strict sequence-similarity cutoffs were enforced between training and test sets.
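What "enforcing a sequence-similarity cutoff" means can be sketched in a few lines: a test example is kept only if its sequence is less similar than the cutoff to every training sequence, so a stricter (lower) cutoff removes near-duplicates that let a model score well by memorization. This Python sketch assumes the sequences (for example CDRs) are already aligned to equal length; the helper names and toy sequences are hypothetical, and real pipelines typically use a proper alignment or numbering scheme.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_test_set(train: list[str], test: list[str], cutoff: float) -> list[str]:
    """Keep only test sequences whose identity to every training sequence is below cutoff."""
    return [t for t in test if all(identity(t, s) < cutoff for s in train)]


train = ["ARGDYW"]
test = ["ARGDYF", "KLTSPW"]  # the first differs from the training sequence at one position
print(filter_test_set(train, test, cutoff=0.7))  # ['KLTSPW']
print(filter_test_set(train, test, cutoff=0.9))  # ['ARGDYF', 'KLTSPW']
```

At the looser 90% cutoff the near-duplicate "ARGDYF" (5/6 identical) stays in the test set; at 70% it is excluded.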

Graphinity takes the structures of the wild-type and mutant complexes as input, represents each as a graph, and processes the pair with Siamese equivariant graph neural networks (EGNNs) to predict ΔΔG. Its high initial performance (r = 0.87) turned out to reflect overfitting: accuracy dropped sharply as the sequence-similarity cutoff between training and test sets was made stricter.
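To make the Siamese idea concrete, here is a toy NumPy sketch, not Graphinity's implementation: a simplified layer in the style of E(n)-equivariant GNNs (Satorras et al., 2021) that updates node features from squared inter-node distances only, one shared encoder applied to both the wild-type and the mutant graph, and a linear readout of the embedding difference. The fully connected graph, untrained random weights, sum pooling, feature sizes, and class names are all assumptions for illustration.

```python
import numpy as np


class InvariantEGNNLayer:
    """Simplified EGNN-style layer: messages depend on node features and squared
    inter-node distances only, so the updated features are invariant to rotations,
    reflections, and translations. Coordinates are not updated in this sketch."""

    def __init__(self, dim: int, rng: np.random.Generator):
        # Edge network input: [h_i, h_j, |x_i - x_j|^2]
        self.w_edge = rng.normal(scale=0.1, size=(2 * dim + 1, dim))
        # Node network input: [h_i, sum_j m_ij]
        self.w_node = rng.normal(scale=0.1, size=(2 * dim, dim))

    def __call__(self, h: np.ndarray, x: np.ndarray) -> np.ndarray:
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]          # (n, n, 3) pairwise offsets
        d2 = (diff ** 2).sum(axis=-1, keepdims=True)  # (n, n, 1) squared distances
        hi = np.repeat(h[:, None, :], n, axis=1)      # (n, n, dim) features of node i
        hj = np.repeat(h[None, :, :], n, axis=0)      # (n, n, dim) features of node j
        m = np.tanh(np.concatenate([hi, hj, d2], axis=-1) @ self.w_edge)
        m = m * (1.0 - np.eye(n))[:, :, None]         # drop self-messages
        agg = m.sum(axis=1)                           # aggregate messages over neighbours j
        return h + np.tanh(np.concatenate([h, agg], axis=-1) @ self.w_node)


class SiameseDDGModel:
    """One shared encoder embeds both complexes; DDG is read out from the difference."""

    def __init__(self, dim: int, n_layers: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.layers = [InvariantEGNNLayer(dim, rng) for _ in range(n_layers)]
        self.readout = rng.normal(scale=0.1, size=dim)

    def embed(self, h: np.ndarray, x: np.ndarray) -> np.ndarray:
        for layer in self.layers:
            h = layer(h, x)
        return h.sum(axis=0)  # permutation-invariant pooling over nodes

    def predict(self, wild_type, mutant) -> float:
        z_wt = self.embed(*wild_type)
        z_mut = self.embed(*mutant)
        return float(self.readout @ (z_mut - z_wt))


# Toy "complex": 8 nodes with 4-dimensional features and 3D coordinates
rng = np.random.default_rng(1)
h_wt = rng.normal(size=(8, 4))
x = rng.normal(size=(8, 3))
h_mut = h_wt.copy()
h_mut[3] = rng.normal(size=4)  # "mutate" one node's features

model = SiameseDDGModel(dim=4, n_layers=2)
ddg = model.predict((h_wt, x), (h_mut, x))

# Apply an orthogonal transform plus a translation to both structures
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
x_moved = x @ q.T + np.array([1.0, -2.0, 0.5])
ddg_moved = model.predict((h_wt, x_moved), (h_mut, x_moved))
print(np.isclose(ddg, ddg_moved))  # True
```

Sharing weights between the two branches guarantees that an unchanged complex predicts exactly zero and that swapping wild type and mutant flips the sign. In a real setting the mutant would also have its own modeled coordinates.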

The core problem is that experimental data are scarce and biased. To explore it, the researchers used FoldX to exhaustively mutate antibody-antigen complexes from structural antibody datasets, generating nearly 1 million ΔΔG values that capture key molecular interactions.
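Exhaustive mutation means substituting every position of interest with each of the 19 alternative amino acids, so N positions yield 19N single mutants, each of which is then scored (here, by FoldX). The Python sketch below only enumerates the mutation list. Its label format is loosely modeled on FoldX-style mutation strings (wild-type residue, chain, residue number, mutant residue), and the function name, chain ID, and sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def exhaustive_point_mutations(chain_id: str, sequence: str, positions: list[int]) -> list[str]:
    """Every single-residue substitution at the given 0-based positions.

    The 1-based index in each label stands in for real PDB residue numbering
    (an assumption of this sketch).
    """
    mutations = []
    for pos in positions:
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutations.append(f"{wild_type}{chain_id}{pos + 1}{aa}")
    return mutations


muts = exhaustive_point_mutations("H", "GYTF", positions=[0, 2])
print(len(muts), muts[:3])  # 38 ['GH1A', 'GH1C', 'GH1D']
```

Two positions give 2 × 19 = 38 mutants, which shows how quickly exhaustive scanning over many complexes reaches the million scale.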

In cross-validation with a 90% CDR sequence-similarity cutoff, the model achieved an impressive r = 0.89. To quantify how much data reliable prediction requires, the team varied the training-set size: the Pearson correlation stabilized above 0.85 only once at least 90,000 mutations were used, in an experiment drawing on 94,126 data points in total.
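A learning-curve experiment of this kind can be sketched as follows: train on nested random subsets of increasing size, record the test Pearson r for each, and report the smallest size from which r stays above the threshold at every larger size. The paper's exact protocol (splits, repeats) is not described here, and the helper names and curve values below are hypothetical, not results from the study.

```python
import random


def nested_subsets(data: list, sizes: list[int], seed: int = 0) -> dict[int, list]:
    """Nested random training subsets: each smaller subset is a prefix of the larger ones."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes)}


def smallest_stable_size(curve: list[tuple[int, float]], threshold: float = 0.85):
    """Smallest training size from which Pearson r stays above threshold at every larger size.

    curve: (n_train, pearson_r) pairs from a learning-curve experiment.
    Returns None if even the largest size does not exceed the threshold.
    """
    stable = None
    for n, r in sorted(curve, reverse=True):  # walk from the largest size downward
        if r <= threshold:
            break
        stable = n
    return stable


# Illustrative numbers only, not results from the paper
curve = [(1_000, 0.55), (10_000, 0.78), (50_000, 0.86), (70_000, 0.84), (90_000, 0.87), (94_000, 0.88)]
print(smallest_stable_size(curve))  # 90000
```

Nested subsets keep the comparison fair: each larger training set contains the smaller ones, so changes in r reflect added data rather than a different random draw. The 50,000 point above shows why "stabilized" matters: a single size crossing the threshold is not enough if a larger one falls back below it.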

Analysis of the prediction distributions showed that models trained on smaller datasets tend to regress toward the mean, compressing their predictions even while keeping a high correlation with the true values. Quantity is not the only factor; diversity also matters, and the team evaluated it with three metrics: antibody sequence diversity, the variety of amino acid substitution types, and the distribution of mutation locations.

Training on 100,000-mutation subsets of differing diversity, the team found that antibody sequence diversity and the chemical diversity of substitutions are critical, while mutation location (core vs. periphery) has less impact.
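Two of these diversity dimensions can be quantified in simple ways: coverage of the 380 possible substitution types (20 wild-type × 19 mutant residues) and the evenness of mutation locations, measured here as Shannon entropy. These metric choices, the helper names, and the toy data are assumptions for illustration and may not match the definitions used in the paper; sequence diversity could be scored similarly, for example from pairwise sequence identity.

```python
import math
from collections import Counter

N_SUBSTITUTION_TYPES = 20 * 19  # ordered (wild-type, mutant) pairs among the 20 amino acids


def substitution_coverage(mutations: list[tuple[str, str, str]]) -> float:
    """Fraction of the 380 possible substitution types present in a dataset."""
    pairs = {(wt, mt) for wt, mt, _region in mutations}
    return len(pairs) / N_SUBSTITUTION_TYPES


def shannon_entropy(labels: list[str]) -> float:
    """Entropy (bits) of a label distribution; higher means more evenly spread."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical mutations: (wild-type residue, mutant residue, location label)
data = [("A", "G", "core"), ("A", "G", "periphery"), ("Y", "W", "core"), ("D", "E", "core")]
print(f"{substitution_coverage(data):.4f}")                       # 0.0079
print(f"{shannon_entropy([region for *_, region in data]):.3f}")  # 0.811
```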

In a test on 36,391 experimental data points, Graphinity achieved a ROC AUC of 0.90 and an average precision (AP) of 0.82, suggesting that it learns the distribution of mutational effects rather than simply memorizing the training data.
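ROC AUC and AP are threshold-free ranking metrics, which implies a binary framing of this test; the positive class is not stated here, so the labels below are hypothetical. A pure-Python sketch of both metrics follows; scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities for untied scores, with their own tie handling.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean precision at the rank of each positive, scanning scores from high to low.

    Assumes no tied scores; tie handling varies between libraries.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_score, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits


labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3), round(average_precision(labels, scores), 3))  # 0.778 0.806
```

Note that a random ranking scores an AP roughly equal to the fraction of positives, so an AP value is best read against the class balance of the test set.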

Data scarcity remains a challenge

Despite the large-scale validation, the team concluded that the main obstacle to experimental ΔΔG prediction is data availability, not model architecture. Based on their synthetic-data experiments, they estimate that at least 90,000 ΔΔG values are needed to reach a Pearson correlation above 0.85, and they stress the need for more diverse, high-quality datasets.

Existing data cover only a limited range of antibody sequences and amino acid substitutions, which underscores the importance of generating machine-learning-grade data and developing methods for generalizable affinity prediction.
