Prof. Marc Riedel is using machine learning, GPUs, and supercomputers to understand the SARS-CoV-2 virus
Prof. Marc Riedel recently received an NSF EArly-concept Grant for Exploratory Research (EAGER) to conduct research on computationally predicting which individuals will mount a successful antiviral response to SARS-CoV-2, the virus that causes COVID-19.
The $200,000 grant was awarded by the Division of Computing and Communication Foundations, part of the Directorate for Computer and Information Science and Engineering, for a period of two years. Prof. Riedel is collaborating on the project with Dr. George Vasmatzis in the Department of Molecular Medicine at the Mayo Clinic.
A Primer on How the COVID-19 Virus Attacks
One person’s response to a pathogen can differ from another’s, leading to widely varied outcomes. Such variation depends partly on each person’s genes, which is particularly relevant in the context of the current COVID-19 pandemic.
The S protein of the SARS-CoV-2 virus (the spike structure seen in illustrations of the virus) attaches itself to the ACE2 protein (angiotensin-converting enzyme 2, which helps regulate blood pressure) on the surface of cells, triggering changes that result in the viral RNA entering the cell. Once inside, the virus takes over the host cell’s protein-making machinery, rapidly making copies of itself, which then invade other healthy cells.
To fight off the invader, the body’s immune system responds by raising body temperature, ingesting infected cells, and creating antibodies. From this point, outcomes begin to differ. In some individuals, the attack stays confined to the upper respiratory tract, and they begin to recover relatively quickly. If the immune system cannot defeat the virus at this stage, however, the virus travels down the respiratory system and takes hold in the lungs, which can have deadly consequences. Even among patients whose lungs are under attack, some recover with low levels of support (such as oxygen), while others develop acute respiratory distress syndrome (ARDS). Patients with ARDS typically need ventilators, and many die.
The Relevance of Prof. Riedel’s Research
Riedel’s project, titled “Computationally Predicting and Characterizing the Immune Response to Viral Infections,” seeks to forecast who among us will be able to successfully fight off the virus. Such knowledge can be critical in exploring treatment options and tailoring therapy to the specific individual.
The immune response to SARS-CoV-2 hinges on whether the viral protein fragments bind into a groove in cell-surface proteins, like a key into a lock. With the full set of proteins associated with SARS-CoV-2 now available, the project aims to predict, through purely computational means, whether such binding happens for all viral protein fragments and for all common variants of the cell-surface proteins: in other words, for all keys into all types of locks.
What Does This Mean Technically?
The human leukocyte antigen (HLA) system is a group of proteins, found on the surface of nearly all our cells, encoded by our major histocompatibility complex (MHC) genes. Cells transport protein fragments from viruses to their surface where they can bind to the HLA proteins. Once bound, they become targets for killer T cells. This provides a first, critical level of defense against viral infections. Cells that present the protein fragments are quickly killed off.
The goal of the project is to predict which protein fragments from SARS-CoV-2 will bind to each genetic variant of HLA proteins commonly found in the U.S. population. HLA typing, a standard laboratory test (it is routinely used to match transplant donors and recipients and has also been used in paternity testing), can establish which genetic variant an individual has.
The conventional approach is to simulate at the atomic level: the trajectories of all atoms in the system are determined by numerically solving Newton’s equations of motion. Simulating a single binding event takes days of supercomputing time. There are 374 variants of the relevant genes in the population, each coding for slightly different 3D structures of the cell-surface proteins. There are about 38,000 viral protein fragments from SARS-CoV-2. Covering every pairing of variant and fragment would therefore take about 14 million such simulations (374 × 38,000 ≈ 14.2 million).
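To make the two ideas in this paragraph concrete, here is a minimal Python sketch. It is an illustration only, not the project's software: the function names, the single particle on a spring, and the time step are all assumptions chosen for brevity. The first function advances Newton's equation of motion with the velocity Verlet scheme, a standard integrator in molecular dynamics; the second multiplies out the number of variant-fragment pairings quoted above.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m * x'' = F(x) for one particle in 1D.

    Toy stand-in for an atomic-level simulation: a real binding simulation
    tracks many thousands of atoms in 3D with a full molecular force field.
    """
    a = force(x) / mass
    for _ in range(n_steps):
        # Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the acceleration at the new position.
        a_new = force(x) / mass
        # Advance the velocity using the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


def harmonic_force(x, k=1.0):
    """Hypothetical force law (a spring), used only to exercise the integrator."""
    return -k * x


def count_simulations(n_variants, n_fragments):
    """Number of (HLA variant, viral fragment) pairings to simulate."""
    return n_variants * n_fragments


if __name__ == "__main__":
    # One particle on a spring, starting at rest at x = 1 (assumed toy setup).
    x, v = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
    print("position and velocity after 1000 steps:", x, v)
    # Figures quoted in the article: 374 variants, about 38,000 fragments.
    print("simulations needed:", count_simulations(374, 38_000))
```

With the article's figures, `count_simulations(374, 38_000)` returns 14,212,000, which is the "about 14 million" cited above. At days of supercomputing time per simulation, running them all by brute force is clearly impractical.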
This is a daunting computational problem. Riedel’s group will tackle it with a three-tiered approach. First, they will try machine-learning algorithms. These are quick, but they will provide limited information: most of the protein fragments of the SARS-CoV-2 virus are new to science, so data for training these algorithms is limited. Next, the group will develop atomic-level simulation software, deployed on graphics processing units (GPUs). These are remarkable devices: essentially supercomputers inside our PCs, which most people use to play video games. The GPU simulations will narrow the set of protein fragments to consider for each variant of HLA molecule. Finally, Riedel’s group will deploy their algorithms at scale on supercomputing clusters in the cloud to compute binding strengths with a high degree of accuracy.
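One way to picture the three tiers is as a funnel, in which each cheaper stage trims the list of candidates handed to the next, more expensive one. The Python sketch below is a hedged illustration of that reading, not the group's actual pipeline: the function `tiered_screen`, the three scorer callbacks, the cutoff values, and the toy labels and scores are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

# A candidate is a pairing of an HLA variant label and a viral fragment label.
Pair = Tuple[str, str]
Scorer = Callable[[Pair], float]


def tiered_screen(
    pairs: Iterable[Pair],
    fast_score: Scorer,      # stand-in for a quick machine-learning predictor
    medium_score: Scorer,    # stand-in for a GPU atomic-level simulation
    accurate_score: Scorer,  # stand-in for a large-scale cluster computation
    fast_cutoff: float,
    medium_cutoff: float,
) -> Dict[Pair, float]:
    """Run the costly scorer only on pairs that survive the cheaper stages."""
    # Tier 1: keep pairs the fast scorer rates at or above its cutoff.
    stage1 = [p for p in pairs if fast_score(p) >= fast_cutoff]
    # Tier 2: narrow further with the medium-cost scorer.
    stage2 = [p for p in stage1 if medium_score(p) >= medium_cutoff]
    # Tier 3: compute the expensive score for the survivors only.
    return {p: accurate_score(p) for p in stage2}


if __name__ == "__main__":
    # Made-up labels and scores, purely to show the control flow.
    pairs = [("variant_1", "frag_a"), ("variant_1", "frag_b"), ("variant_2", "frag_a")]
    fast = {pairs[0]: 0.9, pairs[1]: 0.2, pairs[2]: 0.7}
    medium = {pairs[0]: 0.8, pairs[2]: 0.3}
    accurate = {pairs[0]: -9.5}
    result = tiered_screen(
        pairs, fast.__getitem__, medium.__getitem__, accurate.__getitem__, 0.5, 0.5
    )
    print(result)
```

The point of the design is cost control: in this toy run the expensive scorer is called for only one of the three pairings, because the other two are screened out earlier.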
If successful, the project would be a major step toward understanding how the pandemic has developed. The computational infrastructure at work here could potentially be used in the future for other pandemics caused by viruses or bacteria. It could also be transformative in characterizing the human immune system and its response to pathogens, and it could guide vaccine development.