INDRA integrates multiple text-mining systems and pathway databases to automatically extract mechanistic knowledge from the biomedical literature and, through a process of knowledge assembly, build executable models and causal networks. Based on profiling and perturbational data, these models can be contextualized to be cell-type specific and used to explain experimental observations or to make predictions.
In the context of the ongoing COVID-19 pandemic, the
INDRA team at the
Laboratory of Systems Pharmacology, Harvard Medical School
is working on understanding the mechanisms by which SARS-CoV-2 infects
cells and the subsequent host response process, with the goal
of finding new therapeutics using INDRA.
EMMAA (Ecosystem of Machine-maintained Models with Automated Analysis) makes available a set of computational models that are kept up-to-date using automated machine reading, knowledge-assembly, and model generation, integrating new discoveries immediately as they become available.
The EMMAA COVID-19 model
integrates all literature made available under the
COVID-19 Open Research Dataset Challenge (CORD-19) and combines it with newly appearing
papers from PubMed (about 300 every day) as well as bioRxiv and
medRxiv preprints. It also integrates content from CTD, DrugBank, VirHostNet,
and many other pathway databases.
The model is also used to construct causal, mechanistic explanations for around 2,800 drug-virus effects:
MITRE COVID-19 Therapeutic Information Browser available
The EMMAA COVID-19 model is also on Twitter (
@covid19_emmaa) where it provides updates on the findings that it learns from the literature and also
new experimental observations (such as drug effects on viruses, as described
above) that it can explain based on these new pieces of knowledge.
COVID-19 Disease Map
brings together top pathway curators and modelers from around the world
to create a set of models to elucidate the molecular mechanisms behind
We used INDRA statements assembled from all available biomedical
literature and a multitude of pathway databases to find evidence
for all interactions in the COVID-19 Disease Map, and to suggest other
mechanisms that haven’t yet been included. The results are available
We used INDRA to assemble all known small molecules that can inhibit a set of
protein targets that are of particular interest in treating COVID-19.
These reports are organized as browseable web pages that allow drilling down
into specific literature evidence, linking to supporting publications, and
curating any incorrect relationships. The target-specific reports are available
While we added some customizations to these reports, similar results can
be obtained by querying the
INDRA DB directly.
To support the COVID-19 Disease Map curator community, we generated a ranking
of articles in the CORD-19 corpus by the amount of molecular mechanistic
information they were likely to contain. For each article, the dataset lists 1)
the total number of mechanistic events extracted by all NLP systems supported by
INDRA, 2) the number of unique events extracted from the document, and
3) the number of unique events where subject and object were both molecular
entities (i.e., protein or chemical). Because the CORD-19 corpus contains
many documents that are not directly relevant to coronavirus biology, we
also generated rankings for the subset of documents tagged with the MeSH
term for “coronavirus” in PubMed (MeSH ID D017934). The datasets are available
at the links below:
All CORD-19 articles
Coronavirus articles only
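The three per-article counts described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to produce the datasets: the `Event` record, the entity type labels, the placeholder entity names, and the choice of sort key are all assumptions made for the example. Real INDRA Statements carry much more information than this simplified tuple.

```python
from collections import namedtuple

# Hypothetical, simplified event record (assumption for this sketch);
# actual INDRA Statements are richer objects.
Event = namedtuple("Event", ["subj", "subj_type", "relation", "obj", "obj_type"])

# Entity types treated as "molecular" in this sketch.
MOLECULAR_TYPES = {"protein", "chemical"}


def article_metrics(events):
    """Return the three counts for one article's extracted events."""
    unique = set(events)  # identical events collapse to a single entry
    molecular = {
        e for e in unique
        if e.subj_type in MOLECULAR_TYPES and e.obj_type in MOLECULAR_TYPES
    }
    return {
        "total": len(events),                # 1) all extracted events
        "unique": len(unique),               # 2) unique events
        "unique_molecular": len(molecular),  # 3) unique, both ends molecular
    }


def rank_articles(events_by_article):
    """Sort articles from most to least mechanistic content.

    The sort key (molecular count first, then unique, then total)
    is an assumption of this sketch.
    """
    rows = [(aid, article_metrics(evs)) for aid, evs in events_by_article.items()]
    rows.sort(
        key=lambda r: (r[1]["unique_molecular"], r[1]["unique"], r[1]["total"]),
        reverse=True,
    )
    return rows


# Toy input with placeholder names; not real extraction results.
toy = {
    "doc_a": [
        Event("P1", "protein", "Activation", "P2", "protein"),
        Event("P1", "protein", "Activation", "P2", "protein"),  # duplicate
        Event("C1", "chemical", "Inhibition", "P1", "protein"),
    ],
    "doc_b": [
        Event("P1", "protein", "Activation", "BP1", "biological_process"),
    ],
}

ranking = rank_articles(toy)
for article_id, metrics in ranking:
    print(article_id, metrics)
```

Counting the third metric over unique events only, rather than over all extractions, keeps an article from ranking highly just because one relation is repeated many times.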
Another interface for browsing INDRA COVID-19 literature assembly results
is available via
this page (login: semvizuser/semviz), an approach to semantic
browsing of biomedical relations developed at
A tutorial video of using this interface with INDRA results to construct
hypotheses about COVID-19 is available
CoronaWhy is a globally distributed, volunteer-powered research organisation that assists the medical community in answering key questions related to COVID-19.
INDRA is a key part of the
CoronaWhy software infrastructure,
serving as an entry point for accessing multiple text-mining systems and pathway databases
and for assembling causal models from these sources.
INDRA coupled to Reach serves as the back-end for the
COVIDminer application developed by
Rupert Overall. COVIDminer allows searching for
entities of interest for COVID-19 and visualizing the set of interactions in
their neighborhood as a graph. By clicking on graph nodes or edges, users can
learn more about each entity as well as the supporting publication and the
specific sentence serving as evidence for relations.
We have developed several applications that are generally applicable to biomedical research and can therefore also be used to study COVID-19.
INDRA: INDRA can be used as a
Python package or a
web service to collect relevant information from the literature and pathway databases and build custom COVID-19 models.
INDRA database: The INDRA database website provides a search interface to find INDRA Statements assembled from the biomedical literature, browse their supporting evidence, and curate any errors. An example search relevant to COVID-19 is Object: TMPRSS2 to find entities that regulate the TMPRSS2 protease, which is crucial for SARS-CoV-2 entry into human cells.
INDRA network search: The INDRA network search allows finding causal paths, shared regulators, and common targets between two entities. An example search relevant to COVID-19 is Subject: ACE2, Object: MTOR (see
Dialogue.bio: The dialogue.bio website allows launching dedicated human-machine dialogue sessions where you can upload your data (e.g., DE gene lists or gene expression profiles), discuss relevant mechanisms, and build model hypotheses using simple English dialogue. For instance, you could try the following series of questions: “what is ACE2?”, “what does it regulate?”, “which of those are transcription factors?”.
here. It is currently deployed in multiple workspaces and has answered hundreds of questions from COVID-19 researchers since the pandemic began. Please
contact us if you would like to install CLARE in your Slack workspace.
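The three query types mentioned for the INDRA network search (causal paths, shared regulators, and common targets) can be illustrated on a toy directed graph. The sketch below is not the INDRA network search implementation and does not call any INDRA API. The graph, the node names, and the function names are all placeholders invented for the example, and a plain breadth-first search stands in for whatever path-finding the real service uses.

```python
from collections import deque

# Toy directed graph mapping each regulator to its direct targets.
# Node names are placeholders, not real biological relations.
EDGES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "E": ["B", "D"],
    "D": [],
}


def shortest_path(graph, source, target):
    """Breadth-first search for one shortest directed path, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def common_targets(graph, x, y):
    """Nodes directly downstream of both x and y."""
    return set(graph.get(x, [])) & set(graph.get(y, []))


def shared_regulators(graph, x, y):
    """Nodes directly upstream of both x and y."""
    def regulators(node):
        return {src for src, targets in graph.items() if node in targets}
    return regulators(x) & regulators(y)


print(shortest_path(EDGES, "A", "D"))
print(common_targets(EDGES, "A", "E"))
print(shared_regulators(EDGES, "B", "D"))
```

In a real causal network, edges would also carry a sign (activation or inhibition) and supporting evidence, which this sketch leaves out.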
This work is funded under the DARPA Communicating with Computers (W911NF-15-1-0544), DARPA Automating Scientific Knowledge Extraction (HR00111990009) and DARPA Automated Scientific Discovery Framework (W911NF-18-1-0124) programs.