These authors jointly supervised this work: Jan Baumbach, Josch Konstantin Pauling.
LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
Responding quickly to unknown pathogens, such as the novel coronavirus, is crucial to stop the uncontrolled spread of diseases that lead to epidemics and to keep protective measures at a level that causes as little social and economic harm as possible. This can be achieved through computational approaches that significantly speed up drug discovery. A powerful approach is to restrict the search to existing drugs through drug repurposing, which can vastly accelerate the usually long approval process. In this Review, we examine a representative set of computational approaches currently used to identify repurposable drugs for COVID-19, as well as their underlying data resources. Furthermore, we compare drug candidates predicted by computational methods with drugs being assessed in clinical trials. Finally, we discuss lessons learned from the reviewed research efforts, including how to successfully connect computational approaches with experimental studies, and propose a unified drug repurposing strategy for better preparedness in future outbreaks.
The novel SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pathogen has infected around 60 million people and caused more than a million deaths worldwide (https://covid19.who.int/; as of November 2020). As a result, there is a need to find treatments that can be applied immediately to reduce mortality or morbidity.
Repurposing existing drugs is a rapid and effective way to provide such treatments: it identifies new uses for drugs that already have well-established pharmacological and safety profiles. Many drugs originally developed for other diseases have already been successfully repurposed and approved for new indications. Although repurposing can be conducted at any point in drug development, its potential is greatest for drugs that are already approved. In the COVID-19 pandemic, it offers a fast and cost-efficient way to identify novel treatments.
Recent studies have increasingly employed computational methods to systematically predict new drug targets or drug repurposing candidates. In contrast to experimental high-throughput screening, in silico approaches are faster, lower-cost, and can serve as an initial filtering step for evaluating thousands of compounds. Thus, they are useful for prioritizing drugs that warrant further evaluation and experimental validation. This requires the application of suitable algorithmic approaches to identify mechanisms relevant or specific to the disease.
This Review discusses current in silico drug repurposing efforts for COVID-19, followed by a discussion of the lessons learned from different perspectives (from data resources to the quality of predictions) and a proposed unified strategy to improve the response in potential future outbreaks. The covered studies employed standard drug repurposing workflows and data-driven algorithms.
As new studies are published almost every day, a comprehensive overview of all repurposing studies is not feasible. Hence, this Review focuses on computational methods for drug repurposing and on their application, availability and feasibility in a selection of studies (peer-reviewed and preprint) chosen to cover a wide variety of methods. It is worth noting that most of these studies have not yet been clinically successful. Nevertheless, it is important to properly evaluate and improve the predictive power of in silico approaches that can exploit information on existing drugs as well as host and virus biology, even when data on a newly emerging pathogen are limited. Such approaches enable a rapid and practical response to infection and can therefore improve outcomes in future pandemics, particularly in curbing rising case numbers in the early stages of a pandemic or before vaccines become available.
Besides experimental datasets, the rapid availability of resources that integrate different data types is crucial in a pandemic. Sharing data accelerates research, as computational methods depend on high-quality datasets and experimental labs do not need to collect the information themselves. The large number of resources used in COVID-19 drug repurposing studies has shown that data can be quickly generated and gathered through strong community efforts. This section presents a selection of data resources used in the reviewed studies to illustrate the resource types that accelerated computational drug repurposing: most of them are general data resources that were established before the pandemic and have since been extended with COVID-19- or SARS-CoV-2-specific data. The resources used in the reviewed studies are listed in Supplementary Table . A list of COVID-19-specific data resources that were not used in the reviewed studies but may become relevant in the future is given in Supplementary Table .
All molecular data used in the reviewed publications were extracted from established, general data resources that were quickly extended with SARS-CoV-2-specific data. Resources such as GenBank, the GISAID initiative and UniProt provide genomic and proteomic sequence information about the host and SARS-CoV-2. Structural resources collecting information about proteins, such as the Protein Data Bank (PDB), were extended with various SARS-CoV-2 protein structures. Finally, transcriptome resources that collect gene expression data were used in several COVID-19 drug repurposing approaches. For instance, the Genotype-Tissue Expression (GTEx) program offers insights into tissue-specific gene expression. Expression in lung tissue is of particular interest in COVID-19 drug repurposing research and was often integrated into computational models. Other resources, such as the LINCS L1000 database, profile gene expression changes under specific drug treatments and were used to identify drugs whose expression profiles reverse those of samples infected with SARS-CoV-2.
Protein–protein interaction (PPI) networks enable visualization and analysis of the interactions between host or virus proteins and other host proteins. Furthermore, PPI networks allow for tailored adaptations and search strategies (for example, edge filtering) and can be connected to drug resources. Gordon et al. identified 332 high-confidence virus–host interactions between SARS-CoV-2 and human proteins. This was the only newly created, exclusively SARS-CoV-2-related resource used in the reviewed publications. VirHostNet, a virus–host PPI resource that existed before the 2019/2020 SARS-CoV-2 outbreak, was expanded with 167 new SARS-CoV-2 interactions. In contrast to virus–host PPIs, host PPIs are not virus specific. All such resources used in the reviewed studies were already available before the pandemic but have since been widely used in COVID-19 drug repurposing approaches. Besides molecular networks, knowledge graphs such as the Global Network of Biomedical Relationships (GNBR) have demonstrated their utility for drug repurposing. These networks comprise various types of biological relationships assembled from the literature and were integrated into COVID-19 drug repurposing approaches.
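To make the idea of connecting PPI networks to drug resources concrete, the following minimal Python sketch merges virus–host edges, confidence-filtered host PPIs and drug–target links into one undirected network, then lists drugs whose targets bind a viral protein or sit one interaction away from such a protein. All protein and drug identifiers, the confidence scores and the 0.7 threshold are hypothetical placeholders, and the one-hop rule is an illustrative assumption rather than a method taken from the reviewed studies.

```python
def build_network(virus_host, host_ppi, drug_targets, min_score=0.7):
    """Undirected virus-host-drug network; host PPIs scoring below min_score are dropped."""
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for viral_protein, host_protein in virus_host:
        add_edge(viral_protein, host_protein)
    for a, b, score in host_ppi:
        if score >= min_score:  # edge filtering by (hypothetical) interaction confidence
            add_edge(a, b)
    for drug, target in drug_targets:
        add_edge(drug, target)
    return adj


def drugs_near_virus_interactors(adj, virus_host, drug_targets):
    """Drugs whose target binds a viral protein or is one PPI away from such a host protein."""
    interactors = {host for _, host in virus_host}
    hits = set()
    for drug, target in drug_targets:
        if target in interactors or adj[target] & interactors:
            hits.add(drug)
    return sorted(hits)


# Hypothetical toy data: viral proteins, host proteins H1-H5 and three drugs.
virus_host = [("viral_1", "H1"), ("viral_2", "H2")]
host_ppi = [("H1", "H3", 0.9), ("H2", "H4", 0.4), ("H3", "H5", 0.8)]
drug_targets = [("drug_a", "H3"), ("drug_b", "H4"), ("drug_c", "H5")]

adj = build_network(virus_host, host_ppi, drug_targets)
print(drugs_near_virus_interactors(adj, virus_host, drug_targets))
```

Lowering `min_score` to 0.3 keeps the low-confidence H2–H4 edge and additionally returns `drug_b`, which illustrates how strongly an edge-filtering choice can change a candidate list.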
Drug databases that already existed before the pandemic and that are continuously extended with newly developed drugs were used to connect the results of different approaches to potential drugs. A widely used drug database is DrugBank, with more than 13,000 drug entries of approved and in-trial drugs, including drug targets. On the other hand, ChEMBL and ZINC15 contain millions of compounds that exhibit drug-like properties.
Drug repurposing approaches also benefited from trial databases, which can be used to check whether predicted drugs are already in trials or have not yet been evaluated. Examples of such resources are the EU Clinical Trials Register (https://www.clinicaltrialsregister.eu/) and ClinicalTrials.gov (https://clinicaltrials.gov/). The latter contains more than 350,000 research studies from 219 countries.
Various clinical, experimental and computational drug repurposing efforts have been rapidly mobilized to prioritize compounds and identify promising drug candidates against SARS-CoV-2. In this section, we examine a selection of studies representing the different computational approaches used to identify potential new targets and repurposable drugs for COVID-19.
Virus-targeting approaches mostly rely on structure-based drug screening methods, which use the three-dimensional structures of target proteins to predict the binding affinities or interaction energies of known chemical compounds for these proteins (Fig. ). These methods were mainly used to identify candidate drugs that target viral proteins, so we refer to them as virus-targeting approaches, although they can also be applied to host proteins. Two main methodological workflows were applied: standard docking-based and deep-learning (DL)-based drug screening. Here, we describe these methods and compare 23 COVID-19 drug repurposing studies.
The input data consist of protein structure information (experimental or predicted) and the chemical structures of drugs from public databases. Two analysis workflows can be applied: a standard analysis consisting of docking followed by molecular dynamics (MD) simulations, and a DL-based analysis. The output of both approaches generally consists of a ranking of drugs based on their (predicted) docking scores. The drugs can then be further evaluated according to whether they are in clinical trials.
The first step in structure-based screening is the selection of the drug library and the target protein. For COVID-19, antivirals were the intuitive candidates for targeting viral proteins, and many studies therefore limited their search to these. The number of screened drugs ranged from 3 (ref. ) to 123 antiviral drugs. Broader studies, such as that by Chen et al., combined compounds from the KEGG (Kyoto Encyclopedia of Genes and Genomes) and DrugBank databases to screen 7,173 drugs.
The other crucial step is the selection of the target protein and its corresponding three-dimensional structure (experimental or predicted). Wu et al. screened 19 proteins encoded by the virus. By comparison, most other studies focused on 3CLpro, the envelope (E) and spike proteins, the RNA polymerase and the methyltransferase.
Virtual screening of the drug libraries used established software, such as AutoDock and Glide. Candidate drugs were selected using the respective scoring methods and then validated with molecular dynamics simulations.
Most drugs were predicted for 3CLpro (Supplementary Table ), which was also the focus of most studies (17 studies), followed by RdRp and PLpro. For 3CLpro, the number of predicted drugs ranged from 2 (ref. ) to 27 (ref. ) per study. The five most frequently predicted drugs were ritonavir (8 studies), lopinavir (6 studies), and nelfinavir, remdesivir and saquinavir (5 studies each). However, 99 of the candidate drugs were predicted in only one study, showing high variability among the resulting candidate sets. Interestingly, the studies that screened full databases also predicted antiviral drugs as top scorers (Supplementary Table ). Of the 23 studies, 10 have not yet been peer-reviewed, which we discuss in the section on ‘A unified drug repurposing strategy’.
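The variability described above can be quantified by counting, for each drug, how many studies predicted it. The short Python sketch below does this with `collections.Counter`; the study labels and drug lists are invented for illustration and do not reproduce the data in the Supplementary Tables.

```python
from collections import Counter

# Hypothetical toy candidate lists (study label -> drugs predicted for 3CLpro).
# They illustrate the counting only and are not the reviewed studies' results.
predictions = {
    "study_A": ["ritonavir", "lopinavir", "nelfinavir"],
    "study_B": ["ritonavir", "remdesivir", "saquinavir"],
    "study_C": ["ritonavir", "lopinavir", "drug_x"],
    "study_D": ["nelfinavir", "drug_y"],
}


def drug_frequencies(preds):
    """Count in how many studies each drug appears (repeats within one study count once)."""
    counts = Counter()
    for drugs in preds.values():
        counts.update(set(drugs))
    return counts


counts = drug_frequencies(predictions)
# Sort by descending study count, breaking ties alphabetically for a stable order.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
# Share of studies predicting each drug, and drugs predicted by a single study only.
share = {drug: n / len(predictions) for drug, n in counts.items()}
singletons = sorted(drug for drug, n in counts.items() if n == 1)
print(ranked[:3])
print(singletons)
```

A large number of singletons relative to the total number of distinct drugs is the kind of signal that points to low agreement between studies.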
DL models can predict binding affinities or docking scores and have shown advantages over conventional docking protocols. Whereas standard docking protocols are limited to screening millions of compounds, DL approaches can analyze billions. This allows them to be applied to whole databases, which increases the diversity of the tested compounds and the likelihood of finding unconventional compounds. Furthermore, they can process more (physico-)chemical features and can identify features associated with unfavorable docking. However, most of these methods require training data, which often come from actual docking simulations; thus, the performance of many DL-based approaches still relies on the accuracy of the docking software used to generate the training data.
Ton et al. developed DeepDocking, which uses quantitative structure–activity relationship (QSAR) models trained to predict the docking scores of compounds targeting the SARS-CoV-2 3CLpro protein. It requires fewer docking runs, since it docks only subsets of compounds, and it produces a reduced list of compounds that is enriched in potential top hits.
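The idea of docking only subsets of a library and letting a learned model discard the rest can be sketched as follows. This is not the DeepDocking implementation: the original trains deep neural networks on molecular representations, whereas this sketch assumes a toy two-dimensional feature vector per compound, a hypothetical `dock` function standing in for real docking software, and a k-nearest-neighbour regressor as the surrogate model.

```python
import random


def dock(features):
    """Hypothetical stand-in for an expensive docking run; lower score = better binding."""
    x, y = features
    return (x - 0.2) ** 2 + (y - 0.7) ** 2 - 1.0


def knn_predict(train, query, k=3):
    """Toy surrogate model: mean docking score of the k docked compounds closest to query."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query))
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)


def iterative_docking(library, n_iter=3, sample_size=20, keep_fraction=0.5, seed=0):
    """Dock small random samples, then discard compounds the surrogate predicts to score poorly."""
    rng = random.Random(seed)
    remaining = list(library)
    docked = []  # (features, score) pairs obtained from real docking calls
    for _ in range(n_iter):
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        docked.extend((f, dock(f)) for f in sample)
        sampled = set(sample)
        undocked = [f for f in remaining if f not in sampled]
        # Keep only the fraction of undocked compounds with the best predicted scores.
        undocked.sort(key=lambda f: knn_predict(docked, f))
        remaining = undocked[: max(1, int(len(undocked) * keep_fraction))]
    # Explicitly dock only the enriched survivors.
    docked.extend((f, dock(f)) for f in remaining)
    hits = sorted(docked, key=lambda item: item[1])
    return hits, len(docked)


# Toy library: a 30 x 30 grid of feature vectors in the unit square.
library = [(i / 29, j / 29) for i in range(30) for j in range(30)]
hits, n_calls = iterative_docking(library)
print(n_calls, "docking calls instead of", len(library))
print("best hit:", hits[0])
```

With the default settings, the sketch makes 155 docking calls for the 900-compound toy library (three samples of 20 compounds plus 95 enriched survivors). The trade-off is that compounds the surrogate misjudges in an early round are never docked.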
Nguyen et al. developed MathDL, which feeds low-dimensional mathematical representations of drug–target protein complex structures into DL algorithms that predict the binding energies of drug–protein complexes. For SARS-CoV-2, the authors trained on experimental binding affinity data from SARS-CoV ligand–3CLpro complexes in PDBbind and on SARS-CoV protease inhibitors, and then predicted binding energies of DrugBank compounds for SARS-CoV-2 3CLpro (ref. ). Unlike many DL-based approaches, MathDL does not depend on docking software.
Beck et al. developed a DL-based drug–target interaction prediction model named Molecule Transformer-Drug Target Interaction (MT-DTI). It takes simplified molecular-input line-entry system (SMILES) representations of drugs and the sequences of target proteins as input for training and predicts their binding affinities. For SARS-CoV-2, the model was trained on commercially available antiviral drugs and viral target proteins. Antiviral drugs already used against SARS-CoV-2 were among the identified candidate drugs.
Host-targeting approaches aim to identify drugs that interfere with host mechanisms contributing to viral pathogenesis; because they act on host rather than viral proteins, they are also less prone to drug resistance. In addition, SARS-CoV-2 infection can trigger a hyper-reactive immune response characterized by the excessive release of pro-inflammatory cytokines and chemokines. Thus, drugs that modulate the host immune response may benefit critically ill patients with COVID-19 by targeting specific dysregulated pathways.
Signature-based approaches primarily use transcriptome datasets from samples infected with SARS-CoV-2 or closely related human coronaviruses to identify candidate drugs through connectivity mapping (Fig. ), a well-established approach that searches for drug-induced expression signatures whose profiles are the reverse of a disease signature. Several studies adopted this as their primary method for identifying new therapeutics for COVID-19. Loganathan et al. performed differential expression analysis of virus-infected cells and extracted genes that were consistently dysregulated under infection. These genes were used to query the Connectivity Map database for drug perturbation profiles with anti-correlated expression signatures. Jia et al. implemented a modified approach in which expression data from infected and healthy individuals served as input to a pathway-guided drug repurposing framework. They identified disease co-expression clusters and performed enrichment analyses before reverse signature matching.
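A simplified version of reverse signature matching can be written as a rank correlation between a disease signature and each drug-induced signature over their shared genes, where strongly negative values flag drugs that reverse the disease profile. The Connectivity Map itself uses a Kolmogorov–Smirnov-style enrichment score on up- and down-regulated gene sets; the Spearman correlation below is a swapped-in, simpler statistic, and the gene names (G1 to G5), fold changes and drugs are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def reversal_score(disease_sig, drug_sig):
    """Spearman correlation over shared genes; values near -1 indicate a reversed profile."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    return pearson(
        average_ranks([disease_sig[g] for g in genes]),
        average_ranks([drug_sig[g] for g in genes]),
    )


# Hypothetical log fold changes (infected vs. control) and drug-induced signatures.
disease = {"G1": 2.1, "G2": 1.8, "G3": 1.5, "G4": -1.2, "G5": -0.9}
drugs = {
    "drug_A": {"G1": -1.5, "G2": -1.1, "G3": -0.8, "G4": 0.9, "G5": 0.6},
    "drug_B": {g: 0.5 * v for g, v in disease.items()},  # mimics the disease
}
scores = {name: reversal_score(disease, sig) for name, sig in drugs.items()}
candidates = sorted(scores, key=scores.get)  # most negative (best reversal) first
print(scores, candidates)
```

In this toy example, `drug_A` exactly reverses the gene ranking and scores -1, while `drug_B` reproduces it and scores +1.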
Signature-based methods involve finding drug-induced expression profiles that exhibit reverse patterns to the coronavirus disease signature. Network-based approaches typically assemble heterogeneous networks from diverse data types, including gene–disease associations or drug–target associations. Algorithms such as network proximity, random walk/diffusion-based methods, or network enrichment are then employed. Some studies combined them with machine-learning-based methods, particularly autoencoders and graph convolutional networks. The outputs can be ranked lists of host targets or drug candidates.
The general network-based approach applied in drug repurposing studies on COVID-19 integrates multiple data sources, including virus–host interactions, PPIs, co-expression networks, functional associations or drug–target interactions (Fig. ). Network-based algorithms or topology measures are applied to the assembled networks to identify relevant host protein targets or regions of the host interactome that can be targeted.
Multiple studies implemented random-walk-based algorithms as their primary method for identifying new putative drug targets. Law et al. applied several algorithms to a virus–host interactome to identify additional SARS-CoV-2 interactors. The coronavirus spike protein is well established as the primary mediator of viral entry into host cells. Similarly, but focusing on this specific context, Messina et al. explored the pathogenic mechanisms triggered by the spike protein using data from three closely related coronaviruses. They applied a random walk algorithm to assembled molecular networks, with the spike protein as the seed, to identify relevant targets for COVID-19. In addition, CoVex implements TrustRank, a variant of the PageRank algorithm, to propagate scores from user-defined seeds to the other host proteins and thereby rank host drug targets.
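Random walk with restart, equivalently personalized PageRank, captures the seed-based propagation described above: at each step a walker moves to a random neighbour but jumps back to the seed set with a fixed probability, so proteins close to the seeds accumulate higher scores. TrustRank follows the same principle. The sketch below assumes an undirected toy interactome with hypothetical host proteins H1 to H6 and a restart probability of 0.15; it is not the code used by CoVex or the cited studies.

```python
def random_walk_with_restart(adj, seeds, restart=0.15, tol=1e-10, max_iter=1000):
    """Personalized PageRank: with probability `restart` the walker jumps back to a seed."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                share = (1 - restart) * p[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:  # dangling node: its mass restarts at the seeds
                for m in nodes:
                    nxt[m] += (1 - restart) * p[n] * p0[m]
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy interactome; H1 plays the role of a seed (e.g. a virus-bound host protein).
adj = {
    "H1": ["H2", "H3"],
    "H2": ["H1", "H4"],
    "H4": ["H2"],
    "H3": ["H1", "H5"],
    "H5": ["H3", "H6"],
    "H6": ["H5"],
}
scores = random_walk_with_restart(adj, seeds={"H1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On this path-shaped network the seed H1 receives the highest score and the most distant protein, H6, the lowest; raising `restart` concentrates the scores even more tightly around the seeds.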
Network proximity relies on the principle that a drug is more likely to be effective if it targets proteins in the network neighborhood of disease-associated proteins in the interactome. Zhou et al. applied this concept to compute the network proximity between drug targets and coronavirus-associated proteins in the human interactome. They also used the ‘complementary exposure’ pattern, in which the targets of two drugs both hit the disease module but lie in separate neighborhoods of the interactome, to identify potential drug combinations for treating COVID-19 patients.
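The closest-distance form of network proximity averages, over a drug's targets, the shortest-path distance to the nearest disease-associated protein. The sketch below computes this with breadth-first search on an unweighted toy network (hypothetical proteins P1 to P5 and hypothetical drugs). Published proximity analyses typically also compare this distance with distances obtained for randomly selected, degree-matched protein sets to obtain a z-score; that step is omitted here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distance(adj, targets, disease_proteins):
    """d_c: mean over drug targets of the distance to the nearest disease protein.
    Assumes every target can reach at least one disease protein."""
    total = 0
    for t in targets:
        dist = bfs_distances(adj, t)
        total += min(dist[g] for g in disease_proteins if g in dist)
    return total / len(targets)


# Hypothetical path-shaped interactome P1-P2-P3-P4-P5 with disease protein P1.
adj = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}
disease = {"P1"}
d_x = closest_distance(adj, ["P2"], disease)        # drug_x targets P2
d_y = closest_distance(adj, ["P4", "P5"], disease)  # drug_y targets P4 and P5
print(d_x, d_y)
```

A smaller `d_c` (here for the hypothetical `drug_x`) means the drug's targets sit closer to the disease proteins and would rank higher as a repurposing candidate.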
Several studies combined multiple network-based strategies to predict drug candidates. Gysi et al. characterized and extracted a COVID-19 disease module using experimentally determined SARS-CoV-2 interactors. They performed network-based analyses that accounted for tissue specificity and potential disease comorbidities. They applied a multimodal approach to the virus–host interactome, integrating network proximity, diffusion state distance and graph convolutional networks (GCNs), to identify drugs that can perturb the activity of host proteins associated with the COVID-19 disease module. The final drug list was obtained by aggregating the rankings from the different pipelines.
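The final rank-aggregation step can be illustrated with the Borda count, a simple scheme in which each pipeline awards points according to a drug's position in its ranked list. This is a swapped-in technique chosen for brevity, not necessarily the aggregation method used by Gysi et al., and the pipeline names and drug lists are hypothetical.

```python
def borda_aggregate(rankings):
    """Combine ranked lists: a drug at position i in a list of length n earns n - i points.
    Drugs missing from a list earn no points from it; ties are broken alphabetically."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for i, drug in enumerate(ranking):
            scores[drug] = scores.get(drug, 0) + (n - i)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked outputs of three pipelines (best candidate first).
proximity = ["drug_a", "drug_b", "drug_c"]
diffusion = ["drug_b", "drug_a", "drug_d"]
gcn = ["drug_b", "drug_c", "drug_a"]

consensus = borda_aggregate([proximity, diffusion, gcn])
print(consensus)
```

Here `drug_b` collects 8 points and `drug_a` 6, so the consensus list rewards drugs that rank well in several pipelines rather than first in just one.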
CoVex is a web platform for exploring SARS-CoV and SARS-CoV-2 virus–host–drug interactomes. Users can predict drug targets and drug candidates with several graph analysis methods that accept custom seed proteins as input. For instance, KeyPathwayMiner is a network enrichment tool that identifies condition-specific subnetworks by extracting a maximal connected subnetwork of the host interactome, starting from the seeds. CoVex also implements a weighted multi-Steiner tree method that aggregates several non-unique approximations of Steiner trees (subnetworks of minimum cost connecting all seeds) into a single subnetwork.
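Because computing an exact minimum Steiner tree is NP-hard, tools rely on approximations. The sketch below implements one classic approximation, the shortest-path heuristic: starting from one seed, it repeatedly attaches the nearest remaining seed through a shortest path. It assumes an unweighted, undirected toy network with hypothetical node names and builds a single tree, whereas the CoVex method described above uses weighted edges and aggregates several such approximations.

```python
from collections import deque


def steiner_tree(adj, seeds):
    """Shortest-path heuristic: grow a tree from the first seed, each time attaching the
    closest remaining seed via a shortest path (unweighted, undirected graph assumed)."""
    seeds = list(seeds)
    tree_nodes = {seeds[0]}
    tree_edges = set()
    remaining = set(seeds[1:]) - tree_nodes
    while remaining:
        # Multi-source BFS from all current tree nodes; tree nodes have no parent.
        parent = {n: None for n in tree_nodes}
        queue = deque(sorted(tree_nodes))
        found = None
        while queue and found is None:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in parent:
                    parent[nbr] = node
                    if nbr in remaining:
                        found = nbr
                        break
                    queue.append(nbr)
        if found is None:
            raise ValueError("remaining seeds are unreachable from the tree")
        # Walk back from the newly reached seed to the tree, adding the connecting path.
        node = found
        while parent[node] is not None:
            tree_edges.add(frozenset((node, parent[node])))
            tree_nodes.add(node)
            node = parent[node]
        remaining.discard(found)
    return tree_nodes, tree_edges


# Hypothetical network: seeds S1, S2, S3 connected through linker proteins A, B, C.
adj = {
    "S1": ["A"],
    "A": ["S1", "B", "C"],
    "B": ["A", "S2"],
    "S2": ["B"],
    "C": ["A", "S3", "D"],
    "S3": ["C"],
    "D": ["C"],
}
nodes, edges = steiner_tree(adj, ["S1", "S2", "S3"])
print(sorted(nodes), len(edges))
```

The non-seed nodes that end up in the tree (here A, B and C) are the candidate linker proteins; D is left out because it does not help connect any seed.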
Other studies additionally used machine learning to predict drug candidates against SARS-CoV-2. Belyaeva et al. implemented a hybrid of signature matching and network-based methods. Using autoencoders, they learned drug embeddings from drug-induced expression profiles to identify drugs whose profiles reverse the SARS-CoV-2 infection signature. Steiner tree and causal network discovery algorithms were then used to extract the mechanisms mediated by both SARS-CoV-2 infection and aging. Ge et al. constructed a virus-related knowledge graph and applied a GCN algorithm. The resulting drug candidates were further filtered for existing evidence of antiviral activity through text mining. Similarly, Zeng et al. assembled a large-scale knowledge graph derived from PubMed articles and applied a GCN model to learn low-dimensional embeddings of its nodes and edges.
In the following, we examine the quality and potential of the reviewed data resources and computational methods in order to improve the response in future pandemics.
The availability of molecular datasets is a precondition for developing drug repurposing methods quickly. In addition, network-based resources were a major driver of drug repurposing. However, a large proportion of the publications rely on only a few primary resources, which increases the risk of propagating bias or measurement errors. Moreover, PPI networks were the only type of molecular interaction network used. High-confidence PPIs are nevertheless essential since, for instance, none of the approaches incorporated structural data. In the future, other network types, such as gene regulatory networks, should be considered. Other data resources, such as off-label use data for drugs, should also be integrated into drug repurposing studies.
Finally, existing drug and trial resources were widely used to develop drug repurposing pipelines. However, we observed no standardization across trial resources, which made it hard to analyze trials for specific drugs because of differing names, spellings or typing errors. Standardization is usually implemented in drug resources (for example, DrugBank), but some drugs undergoing trials could not be found in these databases. Keeping these resources up to date and interconnected should be a priority and will improve accessibility.
Assessing the quality of predictions is challenging, since many studies are not peer-reviewed, do not perform experimental evaluation, or rely solely on clinical trial databases. We examined the quality of predictions by determining the overlap between the final candidate drug lists of the individual studies and the drugs undergoing clinical trials listed in the ClinicalTrials.gov (https://clinicaltrials.gov/) and BioRender (https://biorender.com/covid-vaccine-tracker) databases. In addition, we provide supplementary in vitro screening data, such as IC50 values for viral targets and inhibition indices from cell culture studies of SARS-CoV-2 (Supplementary Data ). Compiling these data showed that a substantial number of predictions have not been experimentally tested.
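Computing such an overlap requires normalizing drug names first, which is the practical consequence of the missing standardization in trial resources. The sketch below lowercases names, strips a few salt words, maps trade names through a synonym table and splits fixed-dose combinations before intersecting predictions with trial entries. The synonym and salt tables are small hypothetical examples (Kaletra and Veklury are the brand names of lopinavir/ritonavir and remdesivir), and `drug_z` is a placeholder; a real analysis would draw on a curated vocabulary such as DrugBank synonyms.

```python
import re

# Hypothetical lookup tables for illustration only; not a curated vocabulary.
SYNONYMS = {"kaletra": "lopinavir/ritonavir", "veklury": "remdesivir"}
SALT_WORDS = {"hydrochloride", "sulfate", "phosphate", "mesylate"}


def components(name):
    """Normalize a drug name and split fixed-dose combinations into single drugs."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    key = SYNONYMS.get(key, key)
    parts = set()
    for part in key.split("/"):
        words = [w for w in part.split() if w not in SALT_WORDS]
        if words:
            parts.add(" ".join(words))
    return parts


def trial_overlap(predicted, trial_entries):
    """Return predicted drugs found in trial entries and the fraction of predictions they make up."""
    in_trials = set().union(*(components(e) for e in trial_entries))
    pred = set().union(*(components(d) for d in predicted))
    hits = sorted(pred & in_trials)
    return hits, len(hits) / len(pred)


predicted = ["Ritonavir", "lopinavir", "Nelfinavir", "Chloroquine phosphate", "drug_z"]
trials = ["Kaletra", "Hydroxychloroquine sulfate", "Chloroquine", "Remdesivir", "Nelfinavir Mesylate"]
hits, fraction = trial_overlap(predicted, trials)
print(hits, fraction)
```

Without normalization, an exact string comparison of these two toy lists would find no overlap at all, which shows how easily inconsistent naming can hide matches.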
We identified 53 drugs predicted by docking simulations that are currently in clinical trials (Supplementary Table ). Wu et al. identified most of these drugs (36); however, they were predicted for multiple viral proteins (for example, chlorhexidine for 11 and methotrexate for 6 different viral proteins), indicating that their approach did not yield specific and feasible candidates. After excluding this study, each of the remaining drugs was predicted for only one specific protein, except for chloroquine (3CLpro and PLpro) and remdesivir (3CLpro and RdRp). The top five drugs in clinical trials predicted by docking simulations against the 3CLpro main protease were predicted by only 14.3% (darunavir), 19.0% (remdesivir) and 23.8% (lopinavir, nelfinavir, ritonavir) of the included docking studies (Supplementary Table ), showing that for each drug, most studies did not predict it. The DL approach of Beck et al. identified similar drugs, namely ritonavir, lopinavir and remdesivir, which are being tested in multiple clinical trials. However, these antiviral drugs have not yet shown well-defined results in patients. For ritonavir/lopinavir, only four trials have been completed, and preliminary results suggest no difference in outcome after treatment; further investigation is required. For remdesivir, some trials have been completed, and preliminary results in patients and human cell lines suggested that it could be effective in treating SARS-CoV-2 infection.
Antiviral drugs consistently appear as top hits when large drug databases are screened, indicating high accuracy of the methods. These drugs are good candidates for experimental screening or clinical trials, regardless of how reliable the computational predictions are. The additional drugs identified by these approaches are more interesting candidates; however, little experimental validation is available for them, and most do not enter clinical trials. A similar situation is observed in the emerging field of DL approaches, where most studies have focused on demonstrating the accuracy of their predictions and on developing benchmarking datasets. DL and docking-based approaches are promising tools for identifying repurposable drugs, given their capacity to deliver results quickly. While a standard workflow is already established for docking simulations, DL-based approaches may also come to deliver testable candidate drugs robustly. However, docking studies in particular were rarely peer-reviewed, yielded very different candidate sets and partly used different scores for evaluation and ranking. This makes it necessary to validate their results through systematic experimental comparisons.
Host-targeting approaches typically integrate and analyze multiple omics data types and employ data-driven network-based methods; thus, a major limitation is the lack of gold-standard datasets and the scarcity of data from the MERS-CoV (Middle East respiratory syndrome coronavirus) and SARS-CoV outbreaks. Before sufficient SARS-CoV-2-specific data became available, earlier studies used preliminary data or augmented their analyses with data from closely related viruses. While the quality of the predictions is highly data-dependent, the continued generation of SARS-CoV-2-specific omics data and pending results from clinical studies are expected to improve the predictions. Clinical expert knowledge remains crucial for filtering the drug predictions based on criteria such as toxicity and pharmacological properties. However, the efficacy of these candidate drugs in trials remains to be established, and firm conclusions cannot be drawn because of the limited data availability.
The degree of overlap with drugs in clinical trials was generally low (Supplementary Tables and ), but more than half of the drugs (26 out of 41) predicted by an ensemble method based primarily on knowledge graphs are also undergoing clinical trials. Although the drugs registered for clinical trials at the time of writing also served as that study's validation set, more of its predicted drugs were registered for clinical trials later on. We also noted several drugs that were predicted by both signature-based and network-based approaches and therefore warrant further examination (Supplementary Table ). Ribavirin was predicted by four out of six studies, which together provide a mechanistic basis for its predicted efficacy. Methotrexate, which is indicated for rheumatoid arthritis, was also predicted by three studies.
It is worth noting that several predicted compounds are currently used to treat critically ill COVID-19 patients. One example is dexamethasone (predicted by one signature-based and two network-based studies), whose use was supported by the RECOVERY trial. Hydrocortisone (predicted by three studies) has also demonstrated efficacy in critically ill patients. Dexamethasone and hydrocortisone are corticosteroids that act by modulating the overactive immune response typically observed in severely ill COVID-19 patients.
Notably, the drugs reaching advanced phases of clinical trials were not selected based on in silico predictions; they were repurposed based on clinical experience from the previous SARS and MERS outbreaks and on their known effects in alleviating disease symptoms. Furthermore, in the majority of the reviewed studies, the predictions were not followed up by experimental validation. This translational gap between computational drug repurposing efforts and clinical application is a major and widely recognized bottleneck in drug repurposing and in medicine in general. Results from systematic validation efforts will also be important for identifying the algorithms and datasets that are best suited to drug repurposing in the COVID-19 context. Given the urgency of identifying effective therapies in a pandemic, close collaboration between clinicians, experimental biologists and computational biologists could help close this gap.
Although overlaps between computationally predicted drug repurposing and clinical trials exist, there are no indications that clinical trials were conducted based on computational predictions, despite their promising potential. For future pandemics, computational tools should be able to deliver promising sets of candidates, which could then be validated in trials or screenings. Therefore, a unified strategy is necessary. In the following, we identify important issues and discuss potential solutions to make computational drug repurposing more effective.
Newly developed methods often rely on the same data types (Fig. ). The fast generation of different kinds of data in future disease outbreaks is a key initial step. Notable examples are the interaction data from Gordon et al. and the publication of the 3CLpro structure, which were both used by many subsequent studies. However, experimental replication of datasets obtained from different laboratories and the integration of different data types are crucial to increase robustness and require improvement.
a, Availability of standardized data. b, Accessible workflows for computational predictions. c, Combination of predictions from different methods. d, Feedback from clinical experts of drug candidate sets and screening parameters. e