INDRA applications and models for COVID-19


INDRA integrates multiple text-mining systems and pathway databases to automatically extract mechanistic knowledge from the biomedical literature and, through a process of knowledge assembly, builds executable models and causal networks. Based on profiling and perturbational data, these models can be contextualized to be cell-type specific and used to explain experimental observations or to make predictions.

In the context of the ongoing COVID-19 pandemic, the INDRA team at the Laboratory of Systems Pharmacology, Harvard Medical School, is using INDRA to understand the mechanisms by which SARS-CoV-2 infects cells and the host response that follows, with the goal of finding new therapeutics.

Results

A self-updating model of COVID-19 literature

EMMAA (Ecosystem of Machine-maintained Models with Automated Analysis) makes available a set of computational models that are kept up-to-date using automated machine reading, knowledge-assembly, and model generation, integrating new discoveries immediately as they become available.

The EMMAA COVID-19 model integrates all literature made available under the COVID-19 Open Research Dataset Challenge (CORD-19) and combines it with newly appearing papers from PubMed (about 300 every day) as well as bioRxiv and medRxiv preprints. It also integrates content from CTD, DrugBank, VirHostNet, and many other pathway databases.

The model is also used to construct causal, mechanistic explanations for around 2,800 drug-virus effects.

The EMMAA COVID-19 model is also on Twitter (@covid19_emmaa), where it posts updates on the findings it learns from the literature and on new experimental observations (such as drug effects on viruses, as described above) that it can explain based on this new knowledge.

INDRA aligned with the COVID-19 Disease Map

The COVID-19 Disease Map brings together top pathway curators and modelers from around the world to create a set of models to elucidate the molecular mechanisms behind COVID-19.

We used INDRA statements assembled from all available biomedical literature and a multitude of pathway databases to find evidence for all interactions in the COVID-19 Disease Map, and to suggest other mechanisms that haven’t yet been included. The results are available here.

Based on the above alignment, we also implemented a feature to find small-molecule inhibitors for a given pathway in the COVID-19 Disease Map. The results for the Interferon Type I pathway are available here.

We also used our Gilda system to find appropriate groundings (database identifiers) for ungrounded entities used in the Disease Map. The results are available here.

Reports on drugs affecting targets relevant for COVID-19

We used INDRA to assemble all known small molecules that can inhibit a set of protein targets of particular interest in treating COVID-19. These reports are organized as browsable web pages that allow drilling down into specific literature evidence, linking to supporting publications, and curating any incorrect relationships. The target-specific reports are available here: ACE2, TMPRSS2, CTSB, CTSL, FURIN.

We also compiled similar reports on the downstream effects of some specific drugs of interest to our collaborators. These can be found here: amodiaquine, hydroxychloroquine.

While we added some customizations to these reports, similar results can be obtained by querying the INDRA DB directly.

CORD-19 documents prioritized for pathway curators

To support the COVID-19 Disease Map curator community, we generated a ranking of articles in the CORD-19 corpus by the amount of molecular mechanistic information they were likely to contain. For each article, the dataset lists 1) the total number of mechanistic events extracted by all NLP systems supported by INDRA, 2) the number of unique events extracted from the document, and 3) the number of unique events where subject and object were both molecular entities (i.e., protein or chemical). Because the CORD-19 corpus contains many documents that are not directly relevant to coronavirus biology, we also generated rankings for the subset of documents tagged with the MeSH term for “coronavirus” in PubMed (MeSH ID D017934). The datasets are available at the following links: All CORD-19 articles; Coronavirus articles only.
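As a rough illustration of the three per-article counts described above, the following Python sketch computes them for a handful of made-up events and sorts the articles accordingly. The `Event` tuple, the entity type labels, the example data, and the sort order are all assumptions made for illustration; this is not INDRA's data model, nor the code used to produce the published datasets.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for one extracted mechanistic event.
Event = namedtuple(
    "Event", ["doc_id", "subj", "subj_type", "rel", "obj", "obj_type"]
)

# Entity types counted as "molecular" (protein or chemical, per the text).
MOLECULAR_TYPES = {"protein", "chemical"}


def rank_documents(events):
    """Compute total, unique, and molecular-unique event counts per document.

    The sort order (molecular count first, then unique, then total) is an
    assumption; the source does not state which count drives the ranking.
    """
    per_doc = {}
    for ev in events:
        per_doc.setdefault(ev.doc_id, []).append(ev)

    rows = []
    for doc_id, doc_events in per_doc.items():
        # Two events are treated as identical if everything but doc_id matches.
        unique = {ev[1:] for ev in doc_events}
        # After slicing off doc_id, index 1 is subj_type and index 4 is obj_type.
        molecular = {
            u for u in unique
            if u[1] in MOLECULAR_TYPES and u[4] in MOLECULAR_TYPES
        }
        rows.append({
            "doc_id": doc_id,
            "total": len(doc_events),
            "unique": len(unique),
            "molecular": len(molecular),
        })

    rows.sort(
        key=lambda r: (r["molecular"], r["unique"], r["total"]), reverse=True
    )
    return rows


# Made-up example events (not real extraction output).
EXAMPLE_EVENTS = [
    Event("doc1", "S protein", "protein", "binds", "ACE2", "protein"),
    Event("doc1", "S protein", "protein", "binds", "ACE2", "protein"),
    Event("doc1", "camostat", "chemical", "inhibits", "TMPRSS2", "protein"),
    Event("doc2", "infection", "process", "increases", "IL6", "protein"),
    Event("doc2", "IL6", "protein", "activates", "STAT3", "protein"),
]

ranking = rank_documents(EXAMPLE_EVENTS)
for row in ranking:
    print(row)
```

In this toy input, the duplicated binding event in `doc1` counts toward the total but only once toward the unique count, and the `doc2` event whose subject is a process is excluded from the molecular count.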


Semantic search over INDRA COVID-19 results

Another interface for browsing INDRA COVID-19 literature assembly results is available via semviz.org on this page (login: semvizuser/semviz). SemViz is an approach to semantic browsing of biomedical relations developed at Brandeis University. A tutorial video on using this interface with INDRA results to construct hypotheses about COVID-19 is available here.

Integrations and collaborations

CoronaWhy

CoronaWhy is a globally distributed, volunteer-powered research organisation, assisting the medical community’s ability to answer key questions related to COVID-19.

INDRA is a key part of the CoronaWhy software infrastructure, serving as an entry point for accessing multiple text-mining systems and pathway databases and for assembling causal models from these sources.

COVIDminer

INDRA coupled to Reach serves as the back-end for the COVIDminer application developed by Rupert Overall. COVIDminer allows searching for entities of interest for COVID-19 and visualizing the set of interactions in their neighborhood as a graph. By clicking on graph nodes or edges, users can learn more about each entity as well as the supporting publication and the specific sentence serving as evidence for relations.

General technologies for COVID-19

We have developed several applications that are generally applicable to biomedical research and can therefore also be used to study COVID-19.

Funding

This work is funded under the DARPA Communicating with Computers (W911NF-15-1-0544), DARPA Automating Scientific Knowledge Extraction (HR00111990009) and DARPA Automated Scientific Discovery Framework (W911NF-18-1-0124) programs.