INDRA integrates multiple text-mining systems and pathway databases to automatically extract mechanistic knowledge from the biomedical literature and, through a process of knowledge assembly, build executable models and causal networks. Based on profiling and perturbational data, these models can be contextualized to be cell-type specific and used to explain experimental observations or to make predictions.
In the context of the ongoing COVID-19 pandemic, the
INDRA team
at the
Laboratory of Systems Pharmacology, Harvard Medical School
is working on understanding the mechanisms by which SARS-CoV-2 infects
cells and the subsequent host response process, with the goal
of finding new therapeutics using INDRA.
EMMAA (Ecosystem of Machine-maintained Models with Automated Analysis) makes available a set of computational models that are kept up-to-date using automated machine reading, knowledge-assembly, and model generation, integrating new discoveries immediately as they become available.
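To make "machine-maintained" concrete, here is a minimal Python sketch of incremental model updating. It assumes a drastically simplified representation: each extracted mechanism is a hypothetical `Mechanism(kind, subject, obj)` record, and a model is a map from unique mechanisms to the sources supporting them. This is not EMMAA's or INDRA's actual code; INDRA's real assembly also handles grounding, more and less specific versions of the same statement, and belief scoring.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mechanism:
    """Hypothetical, simplified stand-in for an extracted mechanistic statement."""
    kind: str      # e.g. "Inhibition" or "Activation"
    subject: str
    obj: str


@dataclass
class Model:
    # Each unique mechanism maps to the set of sources (e.g. paper IDs) supporting it.
    evidence: dict = field(default_factory=dict)

    def update(self, extractions):
        """Merge new (mechanism, source) pairs; return mechanisms not seen before."""
        added = []
        for mech, source in extractions:
            if mech not in self.evidence:
                self.evidence[mech] = set()
                added.append(mech)
            # A mechanism already in the model only gains evidence.
            self.evidence[mech].add(source)
        return added


# Usage with made-up placeholder entities and paper IDs.
model = Model()
model.update([(Mechanism("Inhibition", "drugA", "proteinB"), "paper1")])
new = model.update([
    (Mechanism("Inhibition", "drugA", "proteinB"), "paper2"),
    (Mechanism("Activation", "proteinB", "proteinC"), "paper2"),
])
print(new)                  # only the Activation mechanism is new
print(len(model.evidence))  # 2
```

Keying the model on the mechanism itself is what lets a daily batch of new papers strengthen existing knowledge without duplicating it.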
The EMMAA COVID-19 model
integrates all literature made available under the
COVID-19 Open Research Dataset Challenge (CORD-19)
and combines it with newly appearing
papers from PubMed (about 300 every day) as well as bioRxiv and
medRxiv preprints. It also integrates content from CTD, DrugBank, VirHostNet,
and many other pathway databases.
The model is available here.
The model is also used to construct causal, mechanistic explanations for around 2,800 drug-virus effects from the MITRE COVID-19 Therapeutic Information Browser. These explanations are available here.
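One way to picture what "explaining" a drug-virus effect means: treat the assembled model as a signed, directed graph and look for a path from the drug to a virus-related readout whose overall sign matches the observed effect. The Python sketch below assumes that simplification and uses a hypothetical `explain` function over placeholder node names; it is not EMMAA's implementation.

```python
from collections import deque


def explain(edges, source, target, observed_sign):
    """Find a shortest path from source to target whose overall sign matches observed_sign.

    edges: (upstream, downstream, sign) triples, sign +1 = activation, -1 = inhibition.
    The sign of a path is the product of its edge signs.
    Returns the list of nodes on the path, or None if no consistent path exists.
    """
    graph = {}
    for up, down, sign in edges:
        graph.setdefault(up, []).append((down, sign))

    # Breadth-first search over (node, accumulated sign) states.
    start = (source, 1)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, sign = state
        if node == target and sign == observed_sign:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, edge_sign in graph.get(node, []):
            nxt_state = (nxt, sign * edge_sign)
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


# Hypothetical toy network; names are placeholders, not real findings.
edges = [
    ("drugX", "protease1", -1),
    ("protease1", "viral_entry", +1),
    ("drugX", "kinase2", +1),
    ("kinase2", "viral_entry", +1),
]
print(explain(edges, "drugX", "viral_entry", -1))  # ['drugX', 'protease1', 'viral_entry']
```

Tracking the accumulated sign as part of the search state matters: an observed inhibition of viral entry is explained by the path through the inhibited protease, not by the path through the activated kinase.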
The EMMAA COVID-19 model is also on Twitter (@covid19_emmaa), where it posts updates on the findings it learns from the literature and on
new experimental observations (such as drug effects on viruses, as described
above) that it can explain based on these new pieces of knowledge.
The COVID-19 Disease Map
brings together top pathway curators and modelers from around the world
to create a set of models to elucidate the molecular mechanisms behind
COVID-19.
We used INDRA statements assembled from all available biomedical
literature and a multitude of pathway databases to find evidence
for all interactions in the COVID-19 Disease Map, and to suggest other
mechanisms that haven’t yet been included. The results are available here.
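The alignment described above can be sketched in Python as follows, assuming both the Disease Map and the assembled INDRA knowledge have been reduced to (subject, object) pairs of already-grounded entity names. Statement types and polarity are ignored here, and `align` is a hypothetical helper, not part of INDRA.

```python
def align(map_edges, assembled):
    """Compare curated map interactions with assembled literature/database interactions.

    map_edges: set of (subject, object) pairs curated in a pathway map.
    assembled: dict mapping (subject, object) -> list of evidence identifiers.
    Returns (supported, unsupported, suggestions):
      supported   - map edges with assembled evidence, and that evidence
      unsupported - map edges with no assembled evidence
      suggestions - assembled interactions between map entities missing from the map
    """
    map_nodes = {node for edge in map_edges for node in edge}
    supported = {e: assembled[e] for e in map_edges if e in assembled}
    unsupported = map_edges - set(supported)
    suggestions = {
        e: ev for e, ev in assembled.items()
        if e not in map_edges and e[0] in map_nodes and e[1] in map_nodes
    }
    return supported, unsupported, suggestions


# Placeholder entities and evidence IDs for illustration only.
map_edges = {("A", "B"), ("B", "C")}
assembled = {("A", "B"): ["ev1"], ("C", "A"): ["ev2"], ("A", "D"): ["ev3"]}
supported, unsupported, suggestions = align(map_edges, assembled)
print(supported)    # {('A', 'B'): ['ev1']}
print(unsupported)  # {('B', 'C')}
print(suggestions)  # {('C', 'A'): ['ev2']}
```

Restricting suggestions to interactions whose endpoints are both already in the map keeps them focused on the pathway; ("A", "D") is left out because D is not a map entity.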
We also implemented a feature, based on the above alignment, to find
small molecule inhibitors for a given pathway in the COVID-19 Disease Map.
The results for the Interferon Type I pathway are available here.
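Finding inhibitors for a pathway can be sketched in Python as filtering assembled statements for chemicals that inhibit any pathway member. The sketch assumes statements simplified to hypothetical (type, subject, subject kind, object) tuples, and ranking by the number of pathway members hit is our own illustrative choice, not necessarily how the published results are ordered.

```python
from collections import defaultdict


def pathway_inhibitors(statements, pathway_genes):
    """Rank small molecules by how many pathway members they are reported to inhibit.

    statements: iterable of (stmt_type, subject, subject_kind, obj) tuples,
    a simplified, hypothetical stand-in for assembled statements.
    """
    hits = defaultdict(set)
    for stmt_type, subj, subj_kind, obj in statements:
        # Keep only chemical -> pathway-member inhibitions.
        if stmt_type == "Inhibition" and subj_kind == "chemical" and obj in pathway_genes:
            hits[subj].add(obj)
    # Most pathway members hit first; ties broken alphabetically.
    return sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))


pathway = {"G1", "G2", "G3"}  # placeholder pathway members
statements = [
    ("Inhibition", "chem1", "chemical", "G1"),
    ("Inhibition", "chem1", "chemical", "G2"),
    ("Inhibition", "chem2", "chemical", "G3"),
    ("Activation", "chem3", "chemical", "G1"),  # not an inhibition
    ("Inhibition", "prot1", "protein", "G2"),   # not a small molecule
    ("Inhibition", "chem4", "chemical", "X9"),  # target outside the pathway
]
for chem, targets in pathway_inhibitors(statements, pathway):
    print(chem, sorted(targets))
```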
We also used our Gilda system to find appropriate groundings (database
identifiers) for ungrounded entities used in the Disease Map. The results are
available here.
We used INDRA to assemble all known small molecules that can inhibit a set of
protein targets that are of particular interest in treating COVID-19.
These reports are organized as browsable web pages that allow drilling down
into specific literature evidence, linking to supporting publications, and
curating any incorrect relationships. The target-specific reports are available
here: ACE2, TMPRSS2, CTSB, CTSL, FURIN.
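The structure of such a target-specific report (inhibitors per target, each with its supporting publications, minus relationships a curator has flagged) can be sketched in Python as below. The `target_report` helper, the (inhibitor, target, PMID) input format, and all names and PMIDs in the usage are hypothetical placeholders.

```python
from collections import defaultdict


def target_report(evidence, targets, incorrect=frozenset()):
    """Group inhibitor evidence by target, skipping pairs curated as incorrect.

    evidence: iterable of (inhibitor, target, pmid) triples.
    incorrect: (inhibitor, target) pairs a curator has marked as wrong.
    Returns {target: [(inhibitor, sorted PMIDs), ...]}, best-supported first.
    """
    grouped = {t: defaultdict(set) for t in targets}
    for inhibitor, target, pmid in evidence:
        if target in grouped and (inhibitor, target) not in incorrect:
            grouped[target][inhibitor].add(pmid)
    return {
        t: sorted(((inh, sorted(pmids)) for inh, pmids in rows.items()),
                  key=lambda row: (-len(row[1]), row[0]))
        for t, rows in grouped.items()
    }


# Placeholder names and PMIDs for illustration only.
evidence = [
    ("c1", "T1", "111"), ("c1", "T1", "222"), ("c2", "T1", "333"),
    ("c3", "T2", "444"), ("c4", "T3", "555"),
]
report = target_report(evidence, ["T1", "T2"], incorrect={("c3", "T2")})
print(report)  # {'T1': [('c1', ['111', '222']), ('c2', ['333'])], 'T2': []}
```

Keeping the PMIDs per inhibitor is what makes drilling down possible; applying curations as a filter at report time means a flagged relationship disappears without altering the underlying evidence.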
We also compiled similar reports on the downstream effects of some specific
drugs of interest to our collaborators. These can be found here:
amodiaquine, hydroxychloroquine.
While we added some customizations to these reports, similar results can
be obtained by querying the INDRA DB
directly.
To support the COVID-19 Disease Map curator community, we generated a ranking
of articles in the CORD-19 corpus by the amount of molecular mechanistic
information they were likely to contain. For each article, the dataset lists 1)
the total number of mechanistic events extracted by all NLP systems supported by
INDRA, 2) the number of unique events extracted from the document, and
3) the number of unique events where subject and object were both molecular
entities (i.e., protein or chemical). Because the CORD-19 corpus contains
many documents that are not directly relevant to coronavirus biology, we
also generated rankings for the subset of documents tagged with the MeSH
term for “coronavirus” in PubMed (MeSH ID D017934). The datasets are available
at the links below:
All CORD-19 articles
Coronavirus articles only
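The three per-article counts can be computed as in this Python sketch, assuming each extraction has already been reduced to a hypothetical (subject, subject type, relation, object, object type) tuple. Two simplifications are ours: "unique" here means exact tuple equality (INDRA's assembly deduplicates more carefully), and since the text does not say which count drives the ranking, sorting by unique molecular events is an assumption.

```python
MOLECULAR_TYPES = {"protein", "chemical"}


def article_stats(events):
    """Return (total events, unique events, unique molecular events) for one article.

    events: one (subj, subj_type, relation, obj, obj_type) tuple per extraction;
    the same event may appear several times (e.g. found by several reading systems).
    """
    unique = set(events)
    molecular = {e for e in unique
                 if e[1] in MOLECULAR_TYPES and e[4] in MOLECULAR_TYPES}
    return len(events), len(unique), len(molecular)


def rank_articles(corpus, subset=None):
    """Rank articles; subset (e.g. IDs of MeSH-tagged documents) optionally restricts the corpus."""
    rows = [(doc_id, *article_stats(events))
            for doc_id, events in corpus.items()
            if subset is None or doc_id in subset]
    # Assumed sort key: unique molecular events, then unique events (both descending).
    return sorted(rows, key=lambda r: (-r[3], -r[2], r[0]))


# Placeholder documents and entities for illustration only.
corpus = {
    "doc1": [("P1", "protein", "activates", "P2", "protein")] * 2
            + [("P1", "protein", "activates", "fever", "phenotype")],
    "doc2": [("C1", "chemical", "inhibits", "P3", "protein"),
             ("C1", "chemical", "inhibits", "P4", "protein")],
}
print(rank_articles(corpus))                   # [('doc2', 2, 2, 2), ('doc1', 3, 2, 1)]
print(rank_articles(corpus, subset={"doc1"}))  # [('doc1', 3, 2, 1)]
```

The `subset` argument mirrors the second, coronavirus-only ranking: the same counts, computed only over documents carrying the MeSH tag.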
Another interface for browsing INDRA COVID-19 literature assembly results
is available via semviz.org, an approach to semantic browsing of biomedical
relations developed at Brandeis University, on this page (login: semvizuser/semviz).
A tutorial video on using this interface with INDRA results to construct
hypotheses about COVID-19 is available here.
CoronaWhy is a globally distributed, volunteer-powered research organisation, assisting the medical community’s ability to answer key questions related to COVID-19.
INDRA is a key part of the CoronaWhy software infrastructure, serving
as an entry point for accessing multiple text-mining systems and pathway databases
and for assembling causal models from these sources.
INDRA coupled to Reach serves as the back-end for the
COVIDminer
application developed by Rupert Overall. COVIDminer allows searching for
entities of interest for COVID-19 and visualizing the set of interactions in
their neighborhood as a graph. By clicking on graph nodes or edges, users can
learn more about each entity as well as the supporting publication and the
specific sentence serving as evidence for relations.
We have developed several applications that are generally applicable to biomedical research and can therefore also be used to study COVID-19.
INDRA: INDRA can be used as a Python package or a web service to collect
relevant information from the literature and pathway databases and build
custom COVID-19 models.
INDRA database: The INDRA database website provides a search interface to
find INDRA Statements assembled from the biomedical literature, browse their
supporting evidence, and curate any errors. An example search relevant to
COVID-19 is Object: TMPRSS2, which finds entities that regulate the TMPRSS2
protease, crucial for SARS-CoV-2 entry into human cells.
INDRA network search: The INDRA network search allows finding causal paths,
shared regulators, and common targets between two entities. An example search
relevant to COVID-19 is Subject: ACE2, Object: MTOR (see here).
Dialogue.bio: The dialogue.bio website allows launching dedicated
human-machine dialogue sessions where you can upload your data (e.g., DE gene
lists or gene expression profiles), discuss relevant mechanisms, and build
model hypotheses using simple English dialogue. For instance, you could try
the following series of questions: “what is ACE2?”, “what does it regulate?”,
“which of those are transcription factors?”. More information is available here.
CLARE is currently deployed in multiple Slack workspaces and has answered
hundreds of questions from COVID-19 researchers since the pandemic began.
Please contact us if you would like to install CLARE in your Slack workspace.
This work is funded under the DARPA Communicating with Computers (W911NF-15-1-0544), DARPA Automating Scientific Knowledge Extraction (HR00111990009) and DARPA Automated Scientific Discovery Framework (W911NF-18-1-0124) programs.