Posts from July 2015

Giving a presentation on NLP text mining a couple of weeks ago*, I was asked whether our text analytics solution can help with one of the extra Vs of big data – Veracity. This is a much-discussed topic at the moment and, after Volume, Velocity and Variety, seems to be the most important of the additional Vs (see Seth Grimes' blog for a good discussion on some more “wanna-Vs”).

Veracity, when it comes to data and decision making, can mean many things:

  • Does my conclusion make sense?
  • Is this particular data point accurate?
  • Do I trust this publication?
  • Is this assertion evidenced reliably?

 - but the bottom line is, if I am making an important business decision, how can I be sure it’s made using the best possible data?

This is obviously a tricky question, and it has been thrown into public view over recent years by studies that tried to replicate critical experimental data and found reproducibility frighteningly low (e.g. PLoS). So, how can a text analytics tool shed any light in such a minefield?

Scientists in the United States spend $28 billion each year on basic biomedical research that cannot be repeated successfully. That is the conclusion of a study published on 9 June 2015 in PLoS Biology that attempts to quantify the causes, and costs, of irreproducibility.


On July 16, delegates across the life sciences, biotech, healthcare and other knowledge-driven industries gathered in Princeton for Linguamatics’ one-day seminar: “From bench to bedside, unlocking key insights in your data”.  

We heard from Regeneron Pharmaceuticals, Johnson & Johnson, Copyright Clearance Center (CCC) and Linguamatics on how NLP technology is moving into new application areas to improve patient outcomes and unlock key insights across the drug discovery, development and delivery continuum. Delegates were very engaged and many stayed long after the talks had finished, to continue the day’s discussions.  

Jim Dixon, Senior Application Specialist, gave us an introduction to I2E NLP text mining, the new features in the latest I2E release, and the industry’s first federated text mining platform. Whatever the content, I2E can mine and extract with precision and at scale. You can use Linguamatics I2E to provide valuable intelligence from text, getting you to the answers faster so you can make smarter and better informed decisions.

Dr. Peng Zhang’s presentation showed us a real-life use case of I2E’s potential at Regeneron. Eliminating or modifying a single gene in the mouse genome can provide insight into the role that gene plays in normal physiology and disease pathogenesis, but keeping up-to-date with novel information is time-consuming. Dr. Zhang uses I2E to systematically mine the scientific literature for any reported gene knockout in mice and any associated autoimmune phenotype.