The 2014 Ebola outbreak is officially the deadliest in history. Governments and organizations are searching for ways to halt the spread – both responding with humanitarian help, and looking for treatments to prevent or cure the viral infection. 

Ebola virus disease (or Ebola haemorrhagic fever) is caused by the Ebola filovirus.

A couple of weeks ago we received a tweet from Chris Southan, who has been looking at crowdsourcing anti-Ebola medicinal chemistry. He asked us to mine Ebola C07D patents (i.e. those for heterocyclic small molecules, the standard chemistry for most drugs) using our text analytics tool I2E, and provide him with the resulting chemical structures.

We wanted to help. What anti-Ebola research has been patented that might provide value to the scientific community? Searching patents for chemistry with an automated approach is notoriously tricky: patent documents are long and often purposefully obfuscated. Chemicals are frequently obscured by the complex language used to describe them, corrupted by OCR errors, or lost in the overall poor formatting of the patents.
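The selection step in the request above (patents classified under C07D that also mention Ebola) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use I2E, the record layout (`id`, `ipc_codes`, `text`) is hypothetical, and the three records are made-up examples. It also does nothing about the hard part described above, which is extracting the chemical structures themselves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatentRecord:
    # Hypothetical record layout, for illustration only.
    id: str
    ipc_codes: List[str]
    text: str


def select_patents(records: List[PatentRecord],
                   class_prefix: str = "C07D",
                   keyword: str = "ebola") -> List[str]:
    """Return ids of records in the given class that mention the keyword."""
    keyword = keyword.lower()
    selected = []
    for record in records:
        # A record qualifies if any of its classification codes starts with the prefix.
        in_class = any(code.upper().startswith(class_prefix)
                       for code in record.ipc_codes)
        # Case-insensitive substring match on the document text.
        mentions = keyword in record.text.lower()
        if in_class and mentions:
            selected.append(record.id)
    return selected


# Made-up example records (not real patents).
records = [
    PatentRecord("EXAMPLE-1", ["C07D 487/04", "A61K 31/519"],
                 "Compounds for treating Ebola virus infection"),
    PatentRecord("EXAMPLE-2", ["A61K 39/12"],
                 "Vaccine against Ebola virus"),
    PatentRecord("EXAMPLE-3", ["C07D 213/02"],
                 "Pyridine derivatives as herbicides"),
]

hits = select_patents(records)
```

Only the first record satisfies both conditions. A plain substring match like this is exactly the kind of keyword search that struggles with obfuscated language and OCR errors, which is why a text analytics approach is needed for the real task.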


Since the human genome was published in 2001, we have been talking about the potential application of this knowledge to personalized medicine, and in the last couple of years, we seem at last to be approaching this goal.

A better understanding of the molecular basis of diseases is key to the development of personalized medicine across pharmaceutical R&D, as was discussed last year by Janet Woodcock, Director of the FDA’s Center for Drug Evaluation and Research (CDER).

FDA CDER has been urging adoption of pharmacogenomics strategies and pursuit of targeted therapies for a variety of reasons. These include the potential for decreasing the variability of response, improving safety, and increasing the size of treatment effect, by stratifying patient populations.

Pharmacogenomics is the study of the role an individual’s genome plays in drug response, which can vary from adverse drug reactions to lack of therapeutic efficacy. With the recent explosion in sequence data from next generation sequencing (NGS) technologies, one of the bottlenecks in applying genomic variation data to understanding disease is access to annotation.

From NGS workflows, scientists can quickly identify long lists of candidate genes that differ between two conditions (case-control, or family hierarchies, for example). Gene annotations are essential to interpret these gene lists and to discover fundamental properties like gene function and disease relevance.
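As a rough illustration of what annotating a gene list involves, the sketch below looks up each candidate gene in a small annotation table and flags the genes with no entry. The table is a tiny hand-written stand-in for a curated annotation resource, the field names are assumptions, and `GENE_X` is a hypothetical placeholder for an unannotated candidate.

```python
from typing import Dict, List, Optional

# Illustrative stand-in for a curated annotation resource (assumed layout).
ANNOTATIONS: Dict[str, Dict[str, str]] = {
    "CYP2D6": {"function": "cytochrome P450 enzyme involved in drug metabolism",
               "relevance": "variable response to many prescribed drugs"},
    "TPMT": {"function": "thiopurine S-methyltransferase",
             "relevance": "thiopurine toxicity"},
    "VKORC1": {"function": "vitamin K epoxide reductase complex subunit 1",
               "relevance": "warfarin dose requirement"},
}


def annotate(genes: List[str]) -> List[Dict[str, Optional[str]]]:
    """Attach function and relevance to each gene; None where nothing is known."""
    rows = []
    for gene in genes:
        # Gene symbols are matched case-insensitively.
        info = ANNOTATIONS.get(gene.upper())
        rows.append({
            "gene": gene,
            "function": info["function"] if info else None,
            "relevance": info["relevance"] if info else None,
        })
    return rows


# Candidate list as it might come out of an NGS comparison (made-up example).
candidates = ["CYP2D6", "tpmt", "GENE_X"]
annotated = annotate(candidates)
unannotated = [row["gene"] for row in annotated if row["function"] is None]
```

The `unannotated` list is where the bottleneck shows up in practice: genes with no entry in a structured resource are the ones for which annotation has to be sought elsewhere, for example in the literature.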


In the current competitive marketplace for healthcare, pharmaceutical and medical technology companies must be able to demonstrate clinical and economic evidence of benefit to providers, healthcare decision-makers and payers.

Now more than ever, pricing pressure and regulatory restrictions are generating increased demand for this kind of outcomes evidence.

Health Economics and Outcomes Research (HEOR) aims to assess the direct and indirect health care costs associated with a disease or a therapeutic area, and associated interventions in real-world clinical practice.

These costs include:

  • Direct economic loss
  • Economic loss through hospitalization
  • Indirect costs from loss of wider societal productivity

The availability of an increasing amount of data on patients, prescriptions, markets, and scientific literature, combined with the wider use of comparative effectiveness, makes traditional keyword-based search techniques ineffectual. I2E can provide the starting point for efficiently performing evidence-based systematic reviews over very large sets of scientific literature, enabling researchers to answer questions such as:

• What is the economic burden of disease within the healthcare system? Across states, and globally?

• Does new intervention XYZ merit funding? What are the economic implications of its use?

• How do the incremental costs compare with the anticipated benefits for specific patient groups?

• How does treatment XYZ affect quality of life? Activities of daily living? Health status indicators? Patient satisfaction?
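The question above about incremental costs versus anticipated benefits is commonly summarized as an incremental cost-effectiveness ratio (ICER): the difference in cost between two interventions divided by the difference in their effect. The sketch below is a minimal illustration, not part of I2E; the function name and all figures are hypothetical.

```python
def icer(cost_new: float, cost_old: float,
         effect_new: float, effect_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        # The ratio is undefined when both interventions have the same effect.
        raise ValueError("interventions have identical effect; ICER is undefined")
    return delta_cost / delta_effect


# Hypothetical figures: cost per patient and effect in quality-adjusted life years.
ratio = icer(cost_new=12000.0, cost_old=8000.0, effect_new=6.5, effect_old=6.0)
```

With these made-up inputs the new intervention costs 4000 more and yields 0.5 more units of effect, so the ratio is 8000 per unit gained. The costs and effects themselves are what a systematic review of the literature has to supply.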