In the wake of a growing number of COVID-19 cases, DeepMind has used its AlphaFold system to predict a variety of protein structures associated with the disease. Given a sequence of amino acids, the building blocks of proteins, AlphaFold predicts a three-dimensional protein structure. Going from a sequence of amino acids to a three-dimensional structure is typically a long and intensive process that relies on experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography.
However, AlphaFold, which recently won the CASP13 competition (Critical Assessment of Techniques for Protein Structure Prediction), bypasses these techniques with a deep neural network that predicts the distances and angles between amino acid residues. Those predictions define a score for candidate structures, and that score is then optimized with gradient descent. AlphaFold uses free modeling, meaning that it does not rely on previously solved structures of similar proteins when making predictions. This is particularly helpful for COVID-19, as few similar protein structures are readily available.
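To make the idea of optimizing a distance-based score with gradient descent concrete, here is a minimal sketch. It is not DeepMind's implementation: it assumes a plain squared-error score between current and target pairwise distances, and the function names, the four-residue target, the learning rate, and the step count are all hypothetical choices made for illustration.

```python
import math
import random


def pairwise_distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_loss(coords, target):
    """Sum of squared errors between current and target pairwise distances."""
    n = len(coords)
    return sum(
        (pairwise_distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def gradient_step(coords, target, lr):
    """Move every point one step down the gradient of distance_loss."""
    n = len(coords)
    grads = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pairwise_distance(coords[i], coords[j])
            if d == 0.0:
                continue  # gradient is undefined for coincident points
            scale = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
    return [
        [c - lr * g for c, g in zip(coords[i], grads[i])] for i in range(n)
    ]


def fold(target, steps=500, lr=0.01, seed=0):
    """Start from random coordinates and descend toward the target distances."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(len(target))]
    history = [distance_loss(coords, target)]
    for _ in range(steps):
        coords = gradient_step(coords, target, lr)
        history.append(distance_loss(coords, target))
    return coords, history


# Hypothetical "predicted" distances (in Å) for four residues, derived here
# from a made-up square layout purely so the example has a known answer.
reference = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]]
target = [[pairwise_distance(a, b) for b in reference] for a in reference]

coords, history = fold(target)
print(f"loss before: {history[0]:.3f}, after: {history[-1]:.6f}")
```

In this toy version the target distances are single numbers. A score built from a network's predicted distances and angles would be richer, but the optimization loop would follow the same pattern: compute the score, take its gradient with respect to the coordinates, and step downhill.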
AlphaFold is composed of three distinct layers of deep neural networks. The first layer is a variational autoencoder stacked with an attention model, which generates realistic-looking fragments based on a single sequence’s amino acids. The second layer is split into two sublayers. The first sublayer optimizes inter-residue distances using a 1D CNN on a contact map, a 2D representation of the distances between amino acid residues, which is projected onto a single dimension before being fed into the CNN. The second sublayer optimizes a scoring network, which uses a 3D CNN to measure how much the generated substructures look like a protein. After regularizing, the team adds a third neural network layer that scores the generated protein against the actual model.
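As a small illustration of what a contact map is, the sketch below builds one from a list of residue coordinates. The 8 Å cutoff is a commonly used convention and is an assumption here, not something stated above; the function name and the straight-chain coordinates are also hypothetical.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return an n x n matrix with 1 where two residues are closer than threshold.

    coords: one (x, y, z) position per residue, in Å.
    threshold: assumed contact cutoff in Å.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) < threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain of five residues spaced 3.8 Å apart along the x axis.
chain = [(i * 3.8, 0.0, 0.0) for i in range(5)]
cmap = contact_map(chain)

for row in cmap:
    print(row)
# first row: [1, 1, 1, 0, 0]
```

The map is symmetric, and each residue is trivially in contact with itself, so the informative entries are the off-diagonal ones, especially those far from the diagonal, which mark residues that are distant in the sequence but close in space.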
The model was trained on the Protein Data Bank, a freely accessible database that contains the three-dimensional structures of large biological molecules such as proteins and nucleic acids. It takes several inputs, including aatype (a one-hot encoding of amino acid type), the deletion probability (the fraction of sequences that had a deletion at a given position), and a gap matrix, which gives an indication of the variance due to gap states. The output includes a distogram, a histogram of predicted inter-residue distances, along with the predicted secondary structure and accessible surface area.
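Two of these inputs are simple enough to sketch directly. The example below is an illustration under stated assumptions rather than the model's actual feature pipeline: it assumes a 20-letter amino acid alphabet in a fixed order, and a toy alignment in which `-` marks a deletion relative to the query sequence. The function names and the alignment itself are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def one_hot_aatype(sequence):
    """Encode each residue as a 20-element vector with a single 1."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[index[aa]] = 1
        rows.append(row)
    return rows


def deletion_probability(alignment):
    """Fraction of aligned sequences that have a deletion ('-') at each position."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    return [
        sum(seq[pos] == "-" for seq in alignment) / n_seqs for pos in range(length)
    ]


# Hypothetical three-residue query and a toy alignment of four sequences.
query = "MKV"
alignment = ["MKV", "M-V", "MK-", "M-V"]

aatype = one_hot_aatype(query)
del_prob = deletion_probability(alignment)

print(aatype[0].index(1))  # position of 'M' in the assumed alphabet: 10
print(del_prob)            # [0.0, 0.5, 0.25]
```

Both features are per-residue: each position in the sequence gets one one-hot row and one deletion fraction, so they can be stacked side by side into a single input array.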
After validating their results on the COVID-19 spike protein against the structures determined experimentally by the Francis Crick Institute, DeepMind released their predictions for proteins whose structures are not readily determined. These proteins include the membrane protein, protein 3a, nsp2, nsp4, nsp6, and the papain-like C-terminal domain. These protein structures may contain docking sites for new drugs or therapeutics, and are intended to help with future drug development in the effort to contain COVID-19.
Several other groups are applying AI technologies to assist in the fight against COVID-19. For example, a thoracic imaging group used a ResNet50 backbone connected to a 3D CNN via a max pooling layer to distinguish COVID-19 from community-acquired pneumonia. BlueDot used an online natural language processing ML algorithm to predict the location of the next outbreak.
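The role of the max pooling layer in the imaging example can be shown with a short sketch. This is one possible reading of that architecture, not the group's code: it assumes the backbone produces one feature vector per image slice, and it stands in for those features with made-up numbers. The function name is hypothetical.

```python
def max_pool_slices(slice_features):
    """Combine per-slice feature vectors by taking the element-wise maximum.

    slice_features: list of equal-length feature vectors, one per slice.
    Returns a single vector of the same length, whatever the slice count.
    """
    return [max(values) for values in zip(*slice_features)]


# Made-up features for three slices of one scan (three features per slice).
scan = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
    [0.7, 0.5, 0.6],
]

pooled = max_pool_slices(scan)
print(pooled)  # [0.7, 0.9, 0.8]
```

Because the maximum is taken across slices, the pooled vector has a fixed size regardless of how many slices a scan contains, which is what lets a single downstream classifier handle scans of different depths.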