
Text analytics for Rare Disease drug discovery & development

I attended the Findacure “Drug Repurposing for Rare Diseases” event last week: a small symposium with an interesting mix of attendees, including academics, pharma, patient groups and vendors. The main focus was networking, prompted by a series of short talks (see the Findacure blog for more information).

Some key facts about rare diseases:

  • 6,000 to 8,000 identified rare diseases (prevalence less than 5 in 10,000)
  • Only around 200 have licensed treatments, leaving a large unmet need
  • 1 in 17 people (6-8% of the population) will develop a rare disease
  • 30-40 million people in the US, and 30-40 million in Europe
  • 75% of all rare diseases affect children

With the landscape shifting from “blockbuster” towards more personalised “nichebuster” therapeutics, and with incentives provided by regulatory bodies (such as the FDA’s Orphan Drug Designation), rare diseases are an increasing focus for many of Linguamatics’ pharma and biotech customers.

So, I hear you ask, how does text analytics fit into rare disease drug discovery? It’s simple: information associated with rare diseases is essential at many stages of drug discovery and development, and this information is often buried in unstructured text, spread across different data sources with differing formats and vocabularies.

Over the past couple of years we have been involved in customer projects extracting key facts around rare diseases from sources such as Orphanet, MEDLINE, OMIM, Citeline’s PharmaProjects and others.  In particular, expanding the rare disease vocabulary from Orphanet, and making this accessible for text mining searches, has increased recall considerably.
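To illustrate why vocabulary expansion matters for recall, here is a minimal Python sketch of dictionary-based term matching. It is not how I2E works internally; the single vocabulary entry, the example sentences and the helper names `build_pattern` and `matching_docs` are hypothetical, chosen only to show that searching on a preferred disease name alone misses mentions that use alternative names.

```python
import re

# Hypothetical vocabulary entry: a preferred name plus synonyms.
# (Pompe disease is also known as glycogen storage disease type II
# and acid maltase deficiency.)
VOCAB = {
    "Pompe disease": [
        "glycogen storage disease type II",
        "acid maltase deficiency",
    ],
}

# Hypothetical sentences standing in for abstracts to be searched.
DOCUMENTS = [
    "Enzyme replacement therapy improved outcomes in Pompe disease.",
    "We report two infants with acid maltase deficiency.",
    "Glycogen storage disease type II presents with cardiomyopathy.",
    "No metabolic disorder was identified in this cohort.",
]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as whole words."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def matching_docs(terms, docs):
    """Return the indices of documents that mention any of the given terms."""
    pattern = build_pattern(terms)
    return [i for i, doc in enumerate(docs) if pattern.search(doc)]


for name, synonyms in VOCAB.items():
    preferred_only = matching_docs([name], DOCUMENTS)
    expanded = matching_docs([name, *synonyms], DOCUMENTS)
    print(f"{name}: preferred name only -> {preferred_only}")
    print(f"{name}: with synonyms       -> {expanded}")
```

Here the preferred name alone finds one of the three relevant sentences, while the synonym-expanded query finds all three; the unrelated fourth sentence is matched by neither.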

Questions researchers have used I2E to address include:

  • What is the relationship between diseases X and Y?
  • How can I identify shared pathways, genes and targets?
  • Can we triage/prioritise a list of targets further?
  • Can we search for a particular gene and uncover all associated diseases and potential indications?

The semantic and linguistic search capabilities of I2E enabled these researchers to develop a greater understanding of disease-related concepts in orphan diseases, including the impact of a disease on the patient population, the biology and genetics of mutations, clinical presentation, and treatment options and outcomes.


Case study: Researchers at Agios Pharmaceuticals used I2E to develop a virtual portfolio and systematically map the space around a specific class of rare diseases, Inborn Errors of Metabolism (IEM), in order to link diseases to targets. They used I2E to extract key facts from MEDLINE, OMIM and other sources, identifying 469 genes linked to 647 diseases, and then pinpointed the key monogenic disorders for further clinical validation and prioritisation.
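The aggregation step behind a figure like "469 genes linked to 647 diseases" can be sketched in a few lines of Python. This assumes the text mining has already produced (gene, disease) pairs; the pairs below, the helper names `build_map` and `single_gene_diseases`, and the "exactly one linked gene" filter are all hypothetical illustrations, not the method or data Agios used.

```python
from collections import defaultdict

# Hypothetical (gene, disease) pairs standing in for facts extracted
# from MEDLINE, OMIM and other sources. Not real case-study data.
EXTRACTED_PAIRS = [
    ("GENE_A", "disorder_1"),
    ("GENE_B", "disorder_2"),
    ("GENE_C", "disorder_2"),
    ("GENE_D", "disorder_3"),
    ("GENE_A", "disorder_4"),
    ("GENE_D", "disorder_3"),  # repeated mention of the same association
]


def build_map(pairs):
    """Map each disease to the set of genes linked to it (repeats collapse)."""
    disease_to_genes = defaultdict(set)
    for gene, disease in pairs:
        disease_to_genes[disease].add(gene)
    return disease_to_genes


def single_gene_diseases(disease_to_genes):
    """Diseases linked to exactly one gene: a crude, assumed triage filter."""
    return sorted(d for d, genes in disease_to_genes.items() if len(genes) == 1)


disease_to_genes = build_map(EXTRACTED_PAIRS)
all_genes = {g for genes in disease_to_genes.values() for g in genes}
print(f"{len(all_genes)} genes linked to {len(disease_to_genes)} diseases")
print("single-gene candidates:", single_gene_diseases(disease_to_genes))
```

Storing genes in a set per disease means repeated mentions of the same association across sources are counted once. A real pipeline would also need to normalise gene and disease names to standard identifiers before counting.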

Figure: Systematic map of associations between genes and inborn metabolic disorders
