Earlier this year, Linguamatics announced our new Connected Data Technology for federated search, and in our newest version, I2E 4.4, we build on this to take another step along the path of better data interoperability. I2E 4.4 introduces a more powerful way to customize your text analytics results using enhanced linkouts in the HTML output, enabling you, for example, to connect your text-mined data to structured content.

Linkouts enable you to link out to, or pull in, additional information relating to the preferred terms (PTs) or concept identifiers (NodeIDs) in your query results. They can be hyperlinks, images or customized output. For example, you can configure linkouts to see information from an external website by clicking on the concept in the text-mined query results. Alternatively, it is possible to enable the interface to display an image in the query results, such as a chemical structure, instead of the preferred term.

This new functionality means you can use linkouts to enrich query results with related information that adds context or metadata to your search. So, for example, a search for chemicals from ChEBI could link directly from the preferred term in your results to the webpage for that concept on the EBI website (e.g. Cyclosporine), whilst a gene name in the same result links to EntrezGene (e.g. ICAM1).
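To make the idea concrete, here is a minimal Python sketch of how a linkout could turn a concept identifier into a hyperlink. It is an illustration only and does not show how I2E is configured: the `LINKOUT_TEMPLATES` table, the `build_linkout` function, the URL patterns and the example identifiers are all assumptions made for this example, and should be checked against the target websites before use.

```python
import html
from urllib.parse import quote

# Hypothetical table mapping a vocabulary name to a URL template.
# The URL patterns are assumptions for illustration, not I2E settings.
LINKOUT_TEMPLATES = {
    "ChEBI": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId={node_id}",
    "EntrezGene": "https://www.ncbi.nlm.nih.gov/gene/{node_id}",
}


def build_linkout(source, node_id, preferred_term):
    """Return an HTML link for a concept, or the plain term if no template exists."""
    template = LINKOUT_TEMPLATES.get(source)
    if template is None:
        # No linkout configured for this vocabulary: show the preferred term only.
        return html.escape(preferred_term)
    # Percent-encode the identifier so characters such as ':' are URL-safe.
    url = template.format(node_id=quote(str(node_id), safe=""))
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(preferred_term)
    )


# Illustrative identifiers only; verify them against the source databases.
print(build_linkout("ChEBI", "CHEBI:4031", "Cyclosporine"))
print(build_linkout("EntrezGene", "3383", "ICAM1"))
```

Keeping the URL pattern in a lookup table, separate from the code that renders the link, means a new vocabulary can be supported by adding one entry rather than changing the rendering logic.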


At the October Text Mining Summit, we had speakers from pharma, biotech and academia presenting on an amazing range of different applications of text analytics to provide value within the drug discovery-development pipeline. Over a day and a half we heard from a dozen external speakers from healthcare and pharma, all sharing their enthusiasm for the value that text analytics can bring to the drug discovery, development and delivery environments.

Work presented by UNCC researchers using I2E to understand the potential health effects of plant phytochemicals: a network map of text-mined associations linking plants to phytochemicals, phytochemicals to human genes, human genes to biological pathways, and pathways to human health phenotypes.

The life science applications included safety, target discovery and alerting, genotype-phenotype annotations, clinical trial analytics, phytochemicals as potential nutraceuticals, and patent landscaping for antibody-drug conjugates.

Back by popular demand, Wendy Cornell (ex-Merck) presented on gaining value from internal preclinical safety reports using I2E, which we’ve discussed in blog posts here before.


There seems to be a certain buzz around rare and orphan diseases. Following the Findacure meeting I attended last month, there are two recent events I’d like to mention.

Firstly, I attended the first Cambridge Rare Disease Network summit, held in Cambridge, UK, where a fantastic line-up of speakers from a range of professions discussed current and new initiatives in rare disease. The debates ranged from the use of next generation sequencing for diagnostics, to crowd-sourcing both for science and funding, to drug repurposing, to the views of payers and the issues around pricing.

For me it was also a reminder, particularly from some of the parent speakers, of the impact that rare disease has on individuals and families. All too often we are so busy with the day-to-day of research and business that it's easy to lose sight of the ideal end-goal - treatments for all adults, all children, affected by these disparate and often devastating diseases.

Secondly, this month the FDA released new draft guidance “to navigate the difficult road to approval of drugs for rare diseases”.


I attended the Findacure “Drug Repurposing for Rare Diseases” event last week, a small symposium with an interesting mix of attendees – academics, pharma, patient groups, vendors. The main focus was networking, inspired by a series of short talks (see Findacure blog for more information).

  • 6,000 to 8,000 identified rare diseases (prevalence less than 5 in 10,000)
  • Only approximately 200 have licensed treatments – large unmet need
  • 1 in 17 people (6-8% of population) will develop a rare disease
  • 30-40 million people in US, 30-40 million in Europe
  • 75% of all rare diseases affect children

With the changing landscape from “blockbuster” to more personalised “nichebuster” therapeutics, and the incentives provided by regulatory bodies (such as FDA’s Orphan Drug Designation), rare diseases are an increasing focus of many of Linguamatics’ pharma and biotech customers.

So, I hear you ask – how does text analytics fit into rare disease drug discovery? It’s simple: information associated with rare diseases is essential at many stages of drug discovery and development. And this essential information is often buried in unstructured text, spread across different data sources with differing formats, vocabularies, etc.


Giving a presentation on NLP text mining a couple of weeks ago*, I was asked whether our text analytics solution can help with one of the extra Vs of big data – Veracity. This is a much-discussed topic at the moment and, after Volume, Velocity and Variety, seems to be the most important of the additional Vs (see Seth Grimes’s blog for a good discussion on some more “wanna-Vs”).

Veracity, when it comes to data and decision making, can mean many things:

  • Does my conclusion make sense?
  • Is this particular data point accurate?
  • Do I trust this publication?
  • Is this assertion evidenced reliably?

 - but the bottom line is, if I am making an important business decision, how can I be sure it’s made using the best possible data?

This is obviously a tricky question, and it has been thrown into public view over recent years by studies that tried to replicate critical experimental data and found reproducibility frighteningly low (e.g. PLoS). So, how can a text analytics tool shed any light in such a minefield?

Scientists in the United States spend $28 billion each year on basic biomedical research that cannot be repeated successfully. That is the conclusion of a study published on 9 June 2015 in PLoS Biology that attempts to quantify the causes, and costs, of irreproducibility.