COVID-19 and Linguamatics

In the rapidly evolving fight against COVID-19, IQVIA is committed to deploying our resources and capabilities to help everyone in healthcare do what needs to be done and keep things moving forward. Pharmaceutical and healthcare organizations, governments, and the broader scientific community around the world are working to assess the impact of the virus and determine how it can be tackled.

As part of this effort, it’s critical to have access to the best evidence from a broad range of data, much of which is locked in unstructured text. This is where Linguamatics can help. Our Natural Language Processing (NLP) technology enables fast, systematic, and comprehensive insight generation from sources such as scientific literature, clinical trial records, preprints, internal documents, social media, and news. Capturing key information from these many sources and synthesizing it in one place (an Evidence Hub) gives users a deeper understanding of everything that’s going on. This approach can speed up answers to key questions in confronting the COVID-19 pandemic, such as:

  • Short-term: What are the best potential drugs for repurposing efforts? How can we find new candidates? How do I prioritize new candidates? How do I find recent trials, the most up-to-date research, and the key opinion leaders and researchers in this field?
  • Medium-term: What’s known about SARS-CoV-2 to enable vaccine development? Who in the population is most at risk of severe disease? What are the key co-morbidities, and are there therapeutics that can prevent COVID-19 from affecting these populations?
  • Long-term: How will my patient population respond to my drugs after COVID-19 infection? What additional care pathways are needed?

Linguamatics’ experience and domain expertise allow us to quickly develop resources to track and collect the latest content on COVID-19. We are providing access to a number of resources that we hope will help our clients and the broader healthcare community track and access the most current scientific and clinical evidence from key unstructured text sources. These include a publicly available COVID-19 Global Dashboard, as well as new Linguamatics OnDemand Cloud COVID-19 resources.

More information on the dashboard and resources is provided below. In addition, we are happy to help you transform your internal text sources into evidence to combat COVID-19, and to integrate these with our efforts. If you want to find out more or have any questions about how NLP can help in the fight against COVID-19, please contact us.

COVID-19 Global Dashboard

Using Linguamatics NLP to extract COVID-19-relevant abstracts and trial locations from MEDLINE® and ClinicalTrials.gov, we have created a simple dashboard at https://covid19.linguamatics.com. This global view tracks existing efforts: blue circles represent the location of the first author’s affiliation in each MEDLINE® abstract, and orange circles represent all study locations listed in ClinicalTrials.gov records.

In some cases, the location is approximate. Where a location could not be calculated automatically, the document is omitted from the map. The data is refreshed weekly to keep you up to date.
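The marker rules described above can be illustrated with a minimal Python sketch. This is not the Linguamatics implementation: the `LocatedDocument` record, the `build_markers` function, the source labels, and the sample coordinates are all hypothetical, and the sketch assumes each document has already been geocoded (or has failed geocoding) by some earlier step.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shape for illustration; not the Linguamatics data model.
@dataclass
class LocatedDocument:
    doc_id: str
    source: str  # assumed labels: "medline" or "clinicaltrials"
    latitude: Optional[float] = None   # None when geocoding failed
    longitude: Optional[float] = None


# Color convention taken from the dashboard description:
# blue for MEDLINE first-author affiliations, orange for trial locations.
MARKER_COLORS = {"medline": "blue", "clinicaltrials": "orange"}


def build_markers(documents):
    """Return (markers, skipped_ids).

    Documents with no resolved coordinates, or with a source label that
    has no assigned color, are left off the map and reported as skipped.
    """
    markers, skipped = [], []
    for doc in documents:
        unresolved = doc.latitude is None or doc.longitude is None
        if unresolved or doc.source not in MARKER_COLORS:
            skipped.append(doc.doc_id)
            continue
        markers.append({
            "id": doc.doc_id,
            "color": MARKER_COLORS[doc.source],
            "lat": doc.latitude,
            "lon": doc.longitude,
        })
    return markers, skipped


if __name__ == "__main__":
    # Illustrative records only; IDs and coordinates are made up.
    sample = [
        LocatedDocument("abstract-1", "medline", 51.5, -0.1),
        LocatedDocument("trial-1", "clinicaltrials", 40.7, -74.0),
        LocatedDocument("abstract-2", "medline"),  # location not resolved
    ]
    markers, skipped = build_markers(sample)
    print(markers)
    print(skipped)
```

Returning the skipped identifiers alongside the markers makes it easy to see how many documents are absent from the map after each refresh.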

New Linguamatics OnDemand Cloud COVID-19 Resources

Linguamatics is providing new COVID-19-related sources, updated terminologies and focused queries to all Linguamatics OnDemand Cloud users.

As well as the MEDLINE® and ClinicalTrials.gov indexes mentioned above, other relevant sources such as FDA Drug Labels and NIH Grants are available via the Linguamatics Content Store. These sources provide focused COVID-19 data as well as broader context for digging into drug candidates and mechanism-of-action information. In addition, we are updating our terminologies to include more specific coronavirus and COVID-19 terms (i.e. species and disease), and adding new queries that make the best use of these new terminologies and data sources.

We are also providing access to two new data sources:

CORD-19 Dataset

The COVID-19 Open Research Dataset (CORD-19) is a collection of scholarly articles about COVID-19 and the coronavirus family of viruses, compiled by the Allen Institute for AI. It contains peer-reviewed articles and preprints from PubMed Central, bioRxiv, and medRxiv, plus a corpus of research articles maintained by the World Health Organization (WHO), with links to other resources such as Microsoft Academic Graph, PubMed, and Semantic Scholar.

Elsevier Coronavirus Dataset

Elsevier has kindly made all of its content related to coronavirus and COVID-19 available to the wider scientific community. Linguamatics has processed these documents into a new index using our ScienceDirect indexing settings. More information about the dataset, including how long it will remain available, can be found in the Elsevier Coronavirus Center.


As long as the sources are publicly available, Linguamatics will process them and keep them updated on a regular basis. If you want to find out more or have any questions about how NLP can help in the fight against COVID-19, please contact us.