AbFold -- an AlphaFold Based Transfer Learning Model for Accurate Antibody Structure Prediction

Peng Chao1, Zelong Wang1, Peize Zhao3, Weifeng Ge1,2,*, Charles Huang2,3,*
1 College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
2 Hong Kong Graduate School of Advanced Studies, Hong Kong
3 Palindromic Labs Limited, Hong Kong
* Corresponding author

Abstract

Antibodies are proteins produced by B cells and are crucial to the immune system. Their importance in pharmaceutics and biotherapeutics continues to grow. Despite recent advances in general protein 3D structure prediction pioneered by AlphaFold, accurate structure prediction of antibodies still lags behind, primarily because the complementarity-determining regions (CDRs) are difficult to model, especially the most variable CDR-H3 loop.

This paper presents AbFold, a transfer-learning model for antibody structure prediction that uses 3D point cloud refinement and unsupervised learning techniques. AbFold consistently produces state-of-the-art prediction accuracy on the six CDR loops. Its predictions achieve an average RMSD of 1.51 Å for both heavy and light chains and an average RMSD of 3.04 Å for CDR-H3, outperforming the existing models AlphaFold and IgFold. AbFold will contribute to antibody structure prediction and design processes.

Approach

Starting from antibody amino acid sequences, AbFold extracts information from homologous sequences and also uses a pre-trained antibody language model to extract sequence features. A deep neural network fuses the two feature sets, and AbFold then models the three-dimensional structure of the antibody dynamically.

AbFold takes the heavy-chain and light-chain sequences of an antibody as input. First, it uses MMseqs2 to search the UniRef90 protein sequence database for homologs of the input sequences and builds a multiple sequence alignment (MSA). Next, transfer learning from AlphaFold2 yields two representations: multi-dimensional features of the homologous sequences (denoted s1) and the interaction information between each pair of amino acids in the sequence (denoted z). In parallel, the antibody sequences are passed to AntiBERTy, a protein language model pre-trained on antibody sequences, to extract language model features; these are then passed through the IgFold feature encoding module to produce a second set of multi-dimensional features (denoted s2).

A deep neural network then fuses s1 and s2 into a combined sequence representation (denoted s). A triangular self-attention module updates s and the pairwise representation z against each other. Finally, the updated s and z are passed to the structure prediction module, which dynamically predicts the three-dimensional structure of the antibody using an invariant point attention mechanism and a variational autoencoder.
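The fusion of s1 and s2 into s can be illustrated with a minimal sketch. It assumes the two feature sets are aligned per residue, concatenated, and passed through a small two-layer fully connected network. The layer sizes, the ReLU activation, the random weights, and the function names (`fuse`, `linear`, `init_linear`) are all hypothetical choices for illustration and are not taken from the AbFold implementation.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector (plain Python lists)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]

def init_linear(n_in, n_out, rng):
    """Small random weights; a stand-in for trained parameters."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias

def fuse(s1, s2, params):
    """Fuse per-residue features: concatenate s1[i] and s2[i], then apply a 2-layer MLP."""
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must describe the same number of residues")
    (w1, b1), (w2, b2) = params
    fused = []
    for a, b in zip(s1, s2):
        hidden = relu(linear(a + b, w1, b1))  # a + b concatenates the two lists
        fused.append(linear(hidden, w2, b2))
    return fused

# Hypothetical sizes: 5 residues, 8-dim MSA features, 6-dim language model features.
rng = random.Random(0)
n_res, d1, d2, d_hidden, d_out = 5, 8, 6, 16, 8
s1 = [[rng.gauss(0, 1) for _ in range(d1)] for _ in range(n_res)]
s2 = [[rng.gauss(0, 1) for _ in range(d2)] for _ in range(n_res)]
params = (init_linear(d1 + d2, d_hidden, rng), init_linear(d_hidden, d_out, rng))

s = fuse(s1, s2, params)
print(len(s), len(s[0]))  # 5 8
```

Concatenation followed by a learned projection lets the network weight the evolutionary signal and the language model signal separately for each residue, which is one plausible reading of the fusion step described above.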


Full-Atom Precision Prediction of Antibody Structures

AbFold enables end-to-end modeling of antibody conformations with full-atom precision. First, it uses MMseqs2 to search the UniRef90 sequence database for homologs of the antibody sequence and builds their multiple sequence alignment (MSA); it also extracts language model features with the antibody language model AntiBERTy. Transfer learning is then used to extract homologous sequence features from AlphaFold-Multimer and language model features from IgFold. These two types of features are fused by an information fusion module based on fully connected layers to produce the final sequence features, which are passed to a neural network based on a variational autoencoder for dynamic modeling of antibody structures.
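The page does not say how the variational autoencoder is wired into the structure module. As background only, the sketch below shows the two generic ingredients shared by most variational autoencoders: sampling a latent vector with the reparameterization trick, and the KL divergence between the encoder's Gaussian and a standard normal prior. The function names and the toy values are assumptions and do not describe AbFold's actual network.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    log_var holds log(sigma^2), so sigma = exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu = [0.5, -1.0, 0.0]        # toy encoder means
log_var = [0.0, -1.0, 0.2]   # toy encoder log-variances

z = reparameterize(mu, log_var, rng)
print(len(z))                                  # 3
print(kl_to_standard_normal([0.0], [0.0]))     # 0.0
```

Because the latent vector is sampled rather than fixed, decoding repeated samples can yield a distribution of outputs, which is consistent with the "dynamic modeling" framing used above.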

Higher-Precision CDR H3 Loop

We compared AbFold with other antibody structure prediction methods on 153 test antibodies. In the structural prediction of CDR loops, AbFold shows the lowest error, with predicted structures that are closer to the native structures. AbFold achieves the lowest average error in five of the six CDR loops (all except H1), and the average error of its H1 loop prediction is only 0.02 Å higher than the best result. For the CDR H3 loop, which is the most difficult to model, AbFold has an average RMSD of 2.30 Å. In CDR regions with strong structural variability, AbFold's RMSD is significantly lower than that of other methods, including AlphaFold2-Multimer, indicating that AbFold has a stronger ability to model dynamic structures.
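For reference, the RMSD values quoted above measure the root-mean-square distance between corresponding atoms of a predicted and a native structure. The sketch below computes it for two coordinate lists, assuming they are already superimposed and listed in the same atom order. The alignment protocol and atom selection used in the AbFold benchmark are not given on this page, and the coordinates here are made up for illustration.

```python
import math

def rmsd(pred, native):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(pred) != len(native) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum squared differences over every coordinate of every atom pair.
    sq = sum((p - q) ** 2 for a, b in zip(pred, native) for p, q in zip(a, b))
    return math.sqrt(sq / len(pred))

# Toy example: every predicted atom is displaced by 2 Å along z.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0), (3.0, 0.0, 2.0)]

print(rmsd(pred, native))  # 2.0
```

An average RMSD over a test set, as reported above, would then be the mean of this value across all test antibodies for the chosen region.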

Our Other Research

iMano

AI-powered humanization of various types of antibodies.

AbDiff

Generative Prediction of Antibody Conformational Ensembles Using Denoising Diffusion Probabilistic Models.

BibTeX

@article {Peng2023.04.20.537598,
	author = {Peng, Chao and Wang, Zelong and Zhao, Peize and Ge, Weifeng and Huang, Charles},
	title = {AbFold -- an AlphaFold Based Transfer Learning Model for Accurate Antibody Structure Prediction},
	elocation-id = {2023.04.20.537598},
	year = {2023},
	doi = {10.1101/2023.04.20.537598},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2023/04/21/2023.04.20.537598},
	eprint = {https://www.biorxiv.org/content/early/2023/04/21/2023.04.20.537598.full.pdf},
	journal = {bioRxiv}
}