Pharmacogenomics across the drug discovery pipeline – are we nearly there?

August 20 2014

Since the human genome was published in 2001, we have been talking about the potential application of this knowledge to personalized medicine, and in the last couple of years, we seem at last to be approaching this goal.

A better understanding of the molecular basis of diseases is key to development of personalized medicine across pharmaceutical R&D, as was discussed last year by Janet Woodcock, Director of the FDA’s Center for Drug Evaluation and Research (CDER).

FDA CDER has been urging adoption of pharmacogenomics strategies and pursuit of targeted therapies for a variety of reasons. These include the potential for decreasing the variability of response, improving safety, and increasing the size of treatment effect, by stratifying patient populations.

Pharmacogenomics is the study of the role an individual’s genome plays in drug response, which can vary from adverse drug reactions to lack of therapeutic efficacy. With the recent explosion in sequence data from next generation sequencing (NGS) technologies, one of the bottlenecks in application of genomic variation data to understanding disease is access to annotation.

From NGS workflows, scientists can quickly identify long lists of candidate genes that differ between two conditions (case-control, or family hierarchies, for example). Gene annotations are essential to interpret these gene lists and to discover fundamental properties like gene function and disease relevance.

Key sources for these annotations include the ever-growing biomedical literature. Some of this knowledge is captured in structured databases (such as COSMIC, GAD, DGA), but much valuable information sits in textual sources such as PubMed Central, MEDLINE, and OMIM.

Extracting actionable insight rapidly and accurately from text documents is greatly helped by advanced text analytics – and users of our I2E text analytics solution have been asking for access to OMIM, which will soon be available in our OnDemand portfolio.

In particular, there are two common use cases that I2E users want to address with enhanced text analytics over the OMIM data:

One use case comes from the clinical side of drug discovery-development; the clinicians provide information on a particular case or phenotype, and I2E is used to extract from OMIM the potential genes that might be relevant to sequence from the clinical samples to see if there is involvement in the disease pathway.

The other use case comes from early on in the drug discovery-development pipeline, at the initial stages of a project, where for a new disease area I2E is used to pull out a set of potential targets from OMIM. Obviously, before any lab work starts, more in-depth research is needed, but this provides an excellent seed for entry into a new therapeutic area.

Utilizing I2E to access OMIM brings benefits such as:

  • High quality results from this manually curated, fact-dense data source, compared to, for example, querying original articles in peer-reviewed literature
  • The use of our domain-specific ontologies (e.g. for diseases, genes, mutations and other gene variants) enables high recall compared to searching via the OMIM interface (for example, using ontologies to search for “liver cancer”, and being able to also find records with annotations for “liver neoplasm”, “hepatic cancer”, “cancer of the liver”, etc.)
  • Clustering of synonyms and variant expressions under Preferred Terms (PT), such as Gene Symbols
  • The ability to build in-depth queries, such as extraction of gene-gene interactions, and to hit a wide variety of concepts and synonyms, for example many different ways in which gene/protein mutations may be named (see figure legend)
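
To make the ontology point concrete, here is a minimal Python sketch of synonym expansion under a preferred term. It is not how I2E or its ontologies are implemented; the tiny `ONTOLOGY` dictionary, the choice of “liver neoplasm” as the preferred term, the `RECORDS` list and the function names are all hypothetical, invented purely for illustration.

```python
import re

# Hypothetical toy ontology: preferred term -> synonyms.
# The synonyms are the ones quoted in the bullet above; choosing
# "liver neoplasm" as the preferred term is an arbitrary assumption.
ONTOLOGY = {
    "liver neoplasm": [
        "liver cancer",
        "liver neoplasm",
        "hepatic cancer",
        "cancer of the liver",
    ],
}

# Hypothetical record snippets, not real OMIM content.
RECORDS = [
    "Record A: annotated with hepatic cancer",
    "Record B: annotated with cancer of the liver",
    "Record C: annotated with stroke",
]


def preferred_term(query):
    """Return the preferred term whose synonym list contains the query, else None."""
    q = query.lower()
    for pt, synonyms in ONTOLOGY.items():
        if q in synonyms:
            return pt
    return None


def search(records, query):
    """Return records mentioning the query or any synonym sharing its preferred term."""
    pt = preferred_term(query)
    terms = ONTOLOGY[pt] if pt else [query.lower()]
    pattern = re.compile(
        "|".join(r"\b" + re.escape(t) + r"\b" for t in terms),
        re.IGNORECASE,
    )
    return [r for r in records if pattern.search(r)]


hits = search(RECORDS, "liver cancer")
```

In this toy data no record contains the literal string “liver cancer”, so a plain keyword search would return nothing, while the expanded search returns records A and B. That gap is the recall gain the bullet describes.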


The image shows a network of Disease – Gene – Mutation relationships from I2E results in Cytoscape. I2E was used to extract gene (green squares) and mutation (circles) information for stroke (central red triangle), showing some overlap of gene interactions with a related disease, cerebral infarction. Utilising the Linguamatics Mutation resource enables easy extraction of precise information patterns (e.g. “an ACTA2 mutation”; “proline for serine at codon 116”; “4895A/G”; “a 4-bp deletion”; “Q193Sfs*12”; “a 377A-T transversion”), which would be hugely time-consuming to do by manual curation.
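
To give a feel for why these patterns are hard to catch by hand, the sketch below shows one naive way to recognize the six example mentions quoted above with regular expressions in Python. It is an illustration only and not the Linguamatics Mutation resource: the pattern labels, the regular expressions and the `find_mutations` function are assumptions written for this post's examples, and real mutation nomenclature is far more varied than these five patterns cover.

```python
import re

# Hypothetical, deliberately narrow patterns covering only the
# example mentions quoted in the figure legend.
PATTERNS = [
    # e.g. "ACTA2 mutation": an upper-case gene-like symbol followed by "mutation"
    ("gene_mutation", re.compile(r"\b[A-Z][A-Z0-9]+ mutation\b")),
    # e.g. "proline for serine at codon 116"
    ("amino_acid_substitution", re.compile(r"\b[a-z]+ for [a-z]+ at codon \d+\b")),
    # e.g. "4895A/G" or "377A-T": position, base, separator, base
    ("nucleotide_substitution", re.compile(r"\b\d+[ACGT][/-][ACGT]\b")),
    # e.g. "4-bp deletion"
    ("indel", re.compile(r"\b\d+-bp (?:deletion|insertion)\b")),
    # e.g. "Q193Sfs*12": residue, position, residue, "fs*", length
    ("frameshift", re.compile(r"\b[A-Z]\d+[A-Z]fs\*\d+")),
]


def find_mutations(text):
    """Return (label, matched text) pairs for every pattern hit in the text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found


examples = [
    "an ACTA2 mutation",
    "proline for serine at codon 116",
    "4895A/G",
    "a 4-bp deletion",
    "Q193Sfs*12",
    "a 377A-T transversion",
]
results = {text: find_mutations(text) for text in examples}
```

Each example string yields exactly one labelled hit here, but only because the patterns were written around these six strings. A curated resource has to cope with many more naming conventions, which is where the time saving over manual curation comes from.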

Combining OMIM access with extraction of genotype-phenotype relationships from MEDLINE and PubMed Central will give I2E users an excellent resource for NGS annotation, target discovery, and clinical genomics, in order to better target the molecular basis of disease.

If you are interested in accessing OMIM using the power of I2E (or any other of the current content options on I2E OnDemand, i.e. MEDLINE, FDA Drug Labels, NIH Grants, PubMed Central Open Subset, and Patents), please get in touch and we can provide more information and keep you updated on our progress.